LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2026-05-02 00:28:53 +08:00

Author	SHA1	Message	Date
hiyouga	c05027d14a	remove redundant code Former-commit-id: 4a7a7ad2bcdc493458084f5f3d384239228b7d5a	2024-04-24 05:02:18 +08:00
hiyouga	5420905a2e	support unsloth generate Former-commit-id: 0ef1ad9f505dba71db9342f524cc3a7565e5e09e	2024-04-24 04:46:53 +08:00
hiyouga	03f2e3284a	refactor patcher Former-commit-id: 263cfe1294f5c3188f5e8d65791f35ee0d87315a	2024-04-24 03:02:23 +08:00
hiyouga	d2bb1b3a6b	reenable sdpa and fast tok by default Former-commit-id: 9e00902dbedc71d55743d1bf237843506a557891	2024-04-24 02:18:44 +08:00
hiyouga	35c4a2c212	fix #3347 #3387 Former-commit-id: c253c18185a29b59190f3e0ed236c2bb4c788085	2024-04-24 01:30:16 +08:00
hiyouga	1d341dcd83	fix #3365 Former-commit-id: 415ce41e8fa887e980e5bd575c8e95bd4076b90b	2024-04-21 19:20:18 +08:00
hiyouga	f8e219dc81	fix mod stuff Former-commit-id: cf3988226e6398c67bb2955578e436fc505aa5c5	2024-04-21 18:11:10 +08:00
hoshi-hiyouga	3365cc8cf0	Merge pull request #3338 from astramind-ai/main Adding Mixture of Depth Former-commit-id: 4da2ece53353b63e672ff529d6beba41ff710c14	2024-04-21 18:05:52 +08:00
hoshi-hiyouga	3a5e68b7d9	fix #3348 Former-commit-id: aa5e921c00f60074eceb2f9d4d8837cc713edba6	2024-04-20 10:34:09 +08:00
Marco	44cda2eece	Added Mixture of Depths Former-commit-id: 75dd98b9abc847e22cb263c17ebcd2ca5dd98345	2024-04-18 20:31:24 +02:00
hiyouga	9e1bd6420d	fix #3324 Former-commit-id: 5e710c4ac331f3400534d33b2646c4108c898d98	2024-04-18 15:34:45 +08:00
hiyouga	bee796f6b5	fix #3316 Former-commit-id: 7395e9e90a209228ff563ab54319955608850fc3	2024-04-17 22:54:34 +08:00
hoshi-hiyouga	42084e08ae	Merge pull request #3287 from Ledzy/badam [Feature] Add BAdam algorithm Former-commit-id: 10a5e1e65b34b03e5ca2a41bf6ded09a3fb25f0c	2024-04-16 17:32:16 +08:00
hoshi-hiyouga	c7c216069c	Update utils.py Former-commit-id: 7edf4dbed88b8034282f14fd6e0cb6f7f9e5f805	2024-04-16 17:29:30 +08:00
hoshi-hiyouga	cde9d1b917	Update patcher.py Former-commit-id: 494e6a1e05b38f5ff61d83327303614f53c92e64	2024-04-16 17:29:19 +08:00
hoshi-hiyouga	96213f04b0	Update adapter.py Former-commit-id: 8f7b75b26f020d8ae85baab7b082475c3bfeb512	2024-04-16 17:28:12 +08:00
Jonery	6dd6b3e396	resolve gradient checkpointing issue. Former-commit-id: 6df9135d063bb6102f0cbcdf0d702076f5febbae	2024-04-16 12:05:27 +08:00
hiyouga	efa808069a	support unsloth 2024.4 Former-commit-id: 14a83f8bc4fe44783252378fce59198194a96bb8	2024-04-16 00:25:03 +08:00
hiyouga	b5c5283dd6	add codegemma Former-commit-id: 9324176525c2eda22962b0ca1895009b6237e6e3	2024-04-16 00:11:15 +08:00
hiyouga	b638c65519	support cohere commandR #3184 Former-commit-id: e077c36872740f6b2ac255aee9da6c4c70f28977	2024-04-15 23:26:42 +08:00
Jonery	d4d471450f	Feature BAdam Former-commit-id: d8d2807fbcf587c37f7fd34a23e9397d2775ceed	2024-04-15 23:15:27 +08:00
hiyouga	276f2cb24e	update examples Former-commit-id: 369294b31c8a03a1cafcee83eb31a817007d3c49	2024-04-15 22:14:34 +08:00
hoshi-hiyouga	0c80751e87	Merge pull request #3276 from liu-zichen/fix_mixtral fix: turn on output_router_logits of mixtral Former-commit-id: 07bbaf5c67d00a152e5304e81b15fd9189e7bb99	2024-04-15 15:38:16 +08:00
hiyouga	9338f878a3	fix #3273 Former-commit-id: 3b20c89b342a068356ffc29c3724b645775c65db	2024-04-15 15:32:58 +08:00
liuzc	fde3d91242	fix: mixtral output_router_logits Former-commit-id: ab3171ea97ec968b972287287ef9ee2502c6d37c	2024-04-15 12:11:49 +08:00
hiyouga	7468f2535c	release v0.6.2 Former-commit-id: f92ad0a62d957b595f6a76a5403216b163eb3d17	2024-04-11 20:08:51 +08:00
hoshi-hiyouga	7856f98965	Update adapter.py Former-commit-id: 720fde3683529ed7e08ac27c7c4598c6bdc30d44	2024-04-10 00:57:51 +08:00
hoshi-hiyouga	e25ddef08c	Update adapter.py Former-commit-id: a84b8d17dbf221259212e81931d80bcdd6284ad7	2024-04-10 00:57:30 +08:00
Erich Schubert	95a4589bbf	Pass additional_target to unsloth Fixes #3200 Former-commit-id: f8f87f5b0549cba6a011749c42064047f82ba577	2024-04-09 17:53:40 +02:00
hiyouga	566d71b7a9	fix quant infer and qwen2moe Former-commit-id: b75d16767f35c36e2cf2aaab8a3844135085bccf	2024-04-09 17:12:59 +08:00
hiyouga	1348f7d860	fix resize vocab at inference #3022 Former-commit-id: c243720b89eec0af2872fa3c7980a0026d893f4d	2024-04-03 18:14:24 +08:00
hiyouga	b12176d818	simplify readme Former-commit-id: 0da6ec2d516326fe9c7583ba71cd1778eb838178	2024-04-02 20:07:43 +08:00
hiyouga	117b67ea30	add moe aux loss control #3085 Former-commit-id: c9187ebc944e2de454ace3304b7d28eabb1b1a81	2024-04-02 14:26:31 +08:00
hiyouga	03e20bb5c6	fix #3022 Former-commit-id: dac2f617bda9470ac8d85c7e9def09cc04970506	2024-04-02 13:58:39 +08:00
hiyouga	1dc963caa6	fix #3083 Former-commit-id: ff9a3f73961a362d0ddc22079f80a85465fffda8	2024-04-01 22:53:52 +08:00
hiyouga	40211db275	fix #3077 Former-commit-id: d0340391e8075cff0d84b3ef879c2101b66ca1dc	2024-04-01 21:35:18 +08:00
hiyouga	e7f13098c6	support infer 4bit model on GPUs #3023 Former-commit-id: 950a9dab9055839990656b2b40956792b253573d	2024-04-01 17:34:04 +08:00
hiyouga	526111a303	tiny fix Former-commit-id: ba4a9b3c01e2f7467fbc5be268f47c0d003caa65	2024-03-31 00:10:29 +08:00
marko1616	1f617c6e08	fix blank line contains whitespace Former-commit-id: 7bc3bcc64353d5a1d4870c6a9509b64cff710492	2024-03-30 23:46:55 +08:00
marko1616	a6858a36c0	Fix Llama model save for full param train Former-commit-id: ca17b5db4f97c3ec9fe2004877f150e8f51ab4b5	2024-03-30 23:45:04 +08:00
hiyouga	59e6ebf039	update trainers Former-commit-id: d0dd6eefed0b86895ed00a7cafb331e5193db645	2024-03-28 18:16:27 +08:00
hiyouga	3336422760	fix #2961 Former-commit-id: 616917bb3be7f71073b56ad8c7bc4e164b08b9b5	2024-03-26 17:26:14 +08:00
hiyouga	c548ad5e69	fix #2928 Former-commit-id: 9558ee87bc7260a6596385aaa375df544862bfa9	2024-03-24 00:34:54 +08:00
hiyouga	a57d839e1d	fix #2941 Former-commit-id: 3775ab52017f0b610ddd8199cccfb8c001eda507	2024-03-24 00:28:44 +08:00
hiyouga	935ee0a023	support fsdp + qlora Former-commit-id: b894bf8e84be689db258021f0638e9ac939abcbc	2024-03-21 00:36:06 +08:00
hiyouga	d8073488be	fix #2346 Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993	2024-03-20 17:56:33 +08:00
hiyouga	e194efab10	fix patcher Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5	2024-03-15 19:18:42 +08:00
S3Studio	096869c7b6	Use official Nvidia base image. Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows: FlashAttention only supports Ampere GPUs or newer. So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False. Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479	2024-03-15 08:59:13 +08:00
hiyouga	aabe90343e	fix export Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b	2024-03-14 18:17:01 +08:00
hiyouga	46f99ff277	improve lora+ impl. Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39	2024-03-13 23:32:51 +08:00


182 Commits