181 Commits

Author SHA1 Message Date
hiyouga
e66b8ade4d support unsloth generate
Former-commit-id: 0ef1ad9f505dba71db9342f524cc3a7565e5e09e
2024-04-24 04:46:53 +08:00
hiyouga
460da206f6 refactor patcher
Former-commit-id: 263cfe1294f5c3188f5e8d65791f35ee0d87315a
2024-04-24 03:02:23 +08:00
hiyouga
33f525455e reenable sdpa and fast tok by default
Former-commit-id: 9e00902dbedc71d55743d1bf237843506a557891
2024-04-24 02:18:44 +08:00
hiyouga
83b8bc8937 fix #3347 #3387
Former-commit-id: c253c18185a29b59190f3e0ed236c2bb4c788085
2024-04-24 01:30:16 +08:00
hiyouga
04da91e84e fix #3365
Former-commit-id: 415ce41e8fa887e980e5bd575c8e95bd4076b90b
2024-04-21 19:20:18 +08:00
hiyouga
366c0eb1c5 fix mod stuff
Former-commit-id: cf3988226e6398c67bb2955578e436fc505aa5c5
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
5c3922713a Merge pull request #3338 from astramind-ai/main
Adding Mixture of Depth

Former-commit-id: 4da2ece53353b63e672ff529d6beba41ff710c14
2024-04-21 18:05:52 +08:00
hoshi-hiyouga
7279a7014c fix #3348
Former-commit-id: aa5e921c00f60074eceb2f9d4d8837cc713edba6
2024-04-20 10:34:09 +08:00
Marco
68dbd5d220 Added Mixture of Depths
Former-commit-id: 75dd98b9abc847e22cb263c17ebcd2ca5dd98345
2024-04-18 20:31:24 +02:00
hiyouga
0d0c6612a5 fix #3324
Former-commit-id: 5e710c4ac331f3400534d33b2646c4108c898d98
2024-04-18 15:34:45 +08:00
hiyouga
dd992dcce9 fix #3316
Former-commit-id: 7395e9e90a209228ff563ab54319955608850fc3
2024-04-17 22:54:34 +08:00
hoshi-hiyouga
e8667f9c90 Merge pull request #3287 from Ledzy/badam
[Feature] Add BAdam algorithm

Former-commit-id: 10a5e1e65b34b03e5ca2a41bf6ded09a3fb25f0c
2024-04-16 17:32:16 +08:00
hoshi-hiyouga
b5c3d23a22 Update utils.py
Former-commit-id: 7edf4dbed88b8034282f14fd6e0cb6f7f9e5f805
2024-04-16 17:29:30 +08:00
hoshi-hiyouga
88a080ced4 Update patcher.py
Former-commit-id: 494e6a1e05b38f5ff61d83327303614f53c92e64
2024-04-16 17:29:19 +08:00
hoshi-hiyouga
12f43694be Update adapter.py
Former-commit-id: 8f7b75b26f020d8ae85baab7b082475c3bfeb512
2024-04-16 17:28:12 +08:00
Jonery
2ba03e6ef3 resolve gradient checkpointing issue.
Former-commit-id: 6df9135d063bb6102f0cbcdf0d702076f5febbae
2024-04-16 12:05:27 +08:00
hiyouga
ba4efe3ff6 support unsloth 2024.4
Former-commit-id: 14a83f8bc4fe44783252378fce59198194a96bb8
2024-04-16 00:25:03 +08:00
hiyouga
2aa1d1476e add codegemma
Former-commit-id: 9324176525c2eda22962b0ca1895009b6237e6e3
2024-04-16 00:11:15 +08:00
hiyouga
19874e39ee support cohere commandR #3184
Former-commit-id: e077c36872740f6b2ac255aee9da6c4c70f28977
2024-04-15 23:26:42 +08:00
Jonery
22188f1fa3 Feature BAdam
Former-commit-id: d8d2807fbcf587c37f7fd34a23e9397d2775ceed
2024-04-15 23:15:27 +08:00
hiyouga
be206df674 update examples
Former-commit-id: 369294b31c8a03a1cafcee83eb31a817007d3c49
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
740d89e9df Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral
Former-commit-id: 07bbaf5c67d00a152e5304e81b15fd9189e7bb99
2024-04-15 15:38:16 +08:00
hiyouga
506276c9cb fix #3273
Former-commit-id: 3b20c89b342a068356ffc29c3724b645775c65db
2024-04-15 15:32:58 +08:00
liuzc
44c86150c9 fix: mixtral output_router_logits
Former-commit-id: ab3171ea97ec968b972287287ef9ee2502c6d37c
2024-04-15 12:11:49 +08:00
hiyouga
a97f8d1fa8 release v0.6.2
Former-commit-id: f92ad0a62d957b595f6a76a5403216b163eb3d17
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
db51e05205 Update adapter.py
Former-commit-id: 720fde3683529ed7e08ac27c7c4598c6bdc30d44
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
bfb090ed7a Update adapter.py
Former-commit-id: a84b8d17dbf221259212e81931d80bcdd6284ad7
2024-04-10 00:57:30 +08:00
Erich Schubert
cc2ff3065f Pass additional_target to unsloth
Fixes #3200

Former-commit-id: f8f87f5b0549cba6a011749c42064047f82ba577
2024-04-09 17:53:40 +02:00
hiyouga
f8609236ab fix quant infer and qwen2moe
Former-commit-id: b75d16767f35c36e2cf2aaab8a3844135085bccf
2024-04-09 17:12:59 +08:00
hiyouga
d97150c571 fix resize vocab at inference #3022
Former-commit-id: c243720b89eec0af2872fa3c7980a0026d893f4d
2024-04-03 18:14:24 +08:00
hiyouga
75819c1220 simplify readme
Former-commit-id: 0da6ec2d516326fe9c7583ba71cd1778eb838178
2024-04-02 20:07:43 +08:00
hiyouga
76ba7b51c1 add moe aux loss control #3085
Former-commit-id: c9187ebc944e2de454ace3304b7d28eabb1b1a81
2024-04-02 14:26:31 +08:00
hiyouga
e645b9f42b fix #3022
Former-commit-id: dac2f617bda9470ac8d85c7e9def09cc04970506
2024-04-02 13:58:39 +08:00
hiyouga
357a32d7a0 fix #3083
Former-commit-id: ff9a3f73961a362d0ddc22079f80a85465fffda8
2024-04-01 22:53:52 +08:00
hiyouga
8365522ce2 fix #3077
Former-commit-id: d0340391e8075cff0d84b3ef879c2101b66ca1dc
2024-04-01 21:35:18 +08:00
hiyouga
4e3ee3b703 support infer 4bit model on GPUs #3023
Former-commit-id: 950a9dab9055839990656b2b40956792b253573d
2024-04-01 17:34:04 +08:00
hiyouga
0c96919aa5 tiny fix
Former-commit-id: ba4a9b3c01e2f7467fbc5be268f47c0d003caa65
2024-03-31 00:10:29 +08:00
marko1616
e9060f37e4 fix blank line contains whitespace
Former-commit-id: 7bc3bcc64353d5a1d4870c6a9509b64cff710492
2024-03-30 23:46:55 +08:00
marko1616
fb6e653443 Fix Llama model save for full param train
Former-commit-id: ca17b5db4f97c3ec9fe2004877f150e8f51ab4b5
2024-03-30 23:45:04 +08:00
hiyouga
27f5c967e4 update trainers
Former-commit-id: d0dd6eefed0b86895ed00a7cafb331e5193db645
2024-03-28 18:16:27 +08:00
hiyouga
52eb06e2ee fix #2961
Former-commit-id: 616917bb3be7f71073b56ad8c7bc4e164b08b9b5
2024-03-26 17:26:14 +08:00
hiyouga
6d7c325f19 fix #2928
Former-commit-id: 9558ee87bc7260a6596385aaa375df544862bfa9
2024-03-24 00:34:54 +08:00
hiyouga
06019b7ee3 fix #2941
Former-commit-id: 3775ab52017f0b610ddd8199cccfb8c001eda507
2024-03-24 00:28:44 +08:00
hiyouga
b590e82d41 support fsdp + qlora
Former-commit-id: b894bf8e84be689db258021f0638e9ac939abcbc
2024-03-21 00:36:06 +08:00
hiyouga
6302cd94c8 fix #2346
Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993
2024-03-20 17:56:33 +08:00
hiyouga
2e81c03f41 fix patcher
Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5
2024-03-15 19:18:42 +08:00
S3Studio
bada9f71a7 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False.

Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479
2024-03-15 08:59:13 +08:00
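The config patch described in the commit message above can be sketched as follows. This is a minimal illustration only: the helper name `patch_qwen_config` is hypothetical, and it assumes the Qwen config object exposes a `use_flash_attn` attribute defaulting to "auto", as the commit message states.

```python
def patch_qwen_config(config):
    """Force-disable FlashAttention on a Qwen config (hypothetical sketch).

    Qwen configs reportedly default ``use_flash_attn`` to "auto", which makes
    the model pick up an installed flash-attn library automatically. On a
    pre-Ampere GPU that raises at train time, so when the --flash_attn flag
    is not set we pin the value to False instead.
    """
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config
```

An explicit value (True or False) set by the user is left untouched; only the "auto" default is overridden.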
hiyouga
0d245f67ea fix export
Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b
2024-03-14 18:17:01 +08:00
hiyouga
4ef67ed4dd improve lora+ impl.
Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39
2024-03-13 23:32:51 +08:00
hiyouga
d233c4a71b fix #2802
Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690
2024-03-13 12:33:45 +08:00