LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2025-11-07 03:12:13 +08:00

Author	SHA1	Message	Date
Jonery	2ba03e6ef3	resolve gradient checkpointing issue. Former-commit-id: 6df9135d063bb6102f0cbcdf0d702076f5febbae	2024-04-16 12:05:27 +08:00
Jonery	22188f1fa3	Feature BAdam Former-commit-id: d8d2807fbcf587c37f7fd34a23e9397d2775ceed	2024-04-15 23:15:27 +08:00
hiyouga	be206df674	update examples Former-commit-id: 369294b31c8a03a1cafcee83eb31a817007d3c49	2024-04-15 22:14:34 +08:00
hoshi-hiyouga	740d89e9df	Merge pull request #3276 from liu-zichen/fix_mixtral fix: turn on output_router_logits of mixtral Former-commit-id: 07bbaf5c67d00a152e5304e81b15fd9189e7bb99	2024-04-15 15:38:16 +08:00
hiyouga	506276c9cb	fix #3273 Former-commit-id: 3b20c89b342a068356ffc29c3724b645775c65db	2024-04-15 15:32:58 +08:00
liuzc	44c86150c9	fix: mixtral output_router_logits Former-commit-id: ab3171ea97ec968b972287287ef9ee2502c6d37c	2024-04-15 12:11:49 +08:00
hiyouga	a97f8d1fa8	release v0.6.2 Former-commit-id: f92ad0a62d957b595f6a76a5403216b163eb3d17	2024-04-11 20:08:51 +08:00
hoshi-hiyouga	db51e05205	Update adapter.py Former-commit-id: 720fde3683529ed7e08ac27c7c4598c6bdc30d44	2024-04-10 00:57:51 +08:00
hoshi-hiyouga	bfb090ed7a	Update adapter.py Former-commit-id: a84b8d17dbf221259212e81931d80bcdd6284ad7	2024-04-10 00:57:30 +08:00
Erich Schubert	cc2ff3065f	Pass additional_target to unsloth Fixes #3200 Former-commit-id: f8f87f5b0549cba6a011749c42064047f82ba577	2024-04-09 17:53:40 +02:00
hiyouga	f8609236ab	fix quant infer and qwen2moe Former-commit-id: b75d16767f35c36e2cf2aaab8a3844135085bccf	2024-04-09 17:12:59 +08:00
hiyouga	d97150c571	fix resize vocab at inference #3022 Former-commit-id: c243720b89eec0af2872fa3c7980a0026d893f4d	2024-04-03 18:14:24 +08:00
hiyouga	75819c1220	simplify readme Former-commit-id: 0da6ec2d516326fe9c7583ba71cd1778eb838178	2024-04-02 20:07:43 +08:00
hiyouga	76ba7b51c1	add moe aux loss control #3085 Former-commit-id: c9187ebc944e2de454ace3304b7d28eabb1b1a81	2024-04-02 14:26:31 +08:00
hiyouga	e645b9f42b	fix #3022 Former-commit-id: dac2f617bda9470ac8d85c7e9def09cc04970506	2024-04-02 13:58:39 +08:00
hiyouga	357a32d7a0	fix #3083 Former-commit-id: ff9a3f73961a362d0ddc22079f80a85465fffda8	2024-04-01 22:53:52 +08:00
hiyouga	8365522ce2	fix #3077 Former-commit-id: d0340391e8075cff0d84b3ef879c2101b66ca1dc	2024-04-01 21:35:18 +08:00
hiyouga	4e3ee3b703	support infer 4bit model on GPUs #3023 Former-commit-id: 950a9dab9055839990656b2b40956792b253573d	2024-04-01 17:34:04 +08:00
hiyouga	0c96919aa5	tiny fix Former-commit-id: ba4a9b3c01e2f7467fbc5be268f47c0d003caa65	2024-03-31 00:10:29 +08:00
marko1616	e9060f37e4	fix blank line contains whitespace Former-commit-id: 7bc3bcc64353d5a1d4870c6a9509b64cff710492	2024-03-30 23:46:55 +08:00
marko1616	fb6e653443	Fix Llama model save for full param train Former-commit-id: ca17b5db4f97c3ec9fe2004877f150e8f51ab4b5	2024-03-30 23:45:04 +08:00
hiyouga	27f5c967e4	update trainers Former-commit-id: d0dd6eefed0b86895ed00a7cafb331e5193db645	2024-03-28 18:16:27 +08:00
hiyouga	52eb06e2ee	fix #2961 Former-commit-id: 616917bb3be7f71073b56ad8c7bc4e164b08b9b5	2024-03-26 17:26:14 +08:00
hiyouga	6d7c325f19	fix #2928 Former-commit-id: 9558ee87bc7260a6596385aaa375df544862bfa9	2024-03-24 00:34:54 +08:00
hiyouga	06019b7ee3	fix #2941 Former-commit-id: 3775ab52017f0b610ddd8199cccfb8c001eda507	2024-03-24 00:28:44 +08:00
hiyouga	b590e82d41	support fsdp + qlora Former-commit-id: b894bf8e84be689db258021f0638e9ac939abcbc	2024-03-21 00:36:06 +08:00
hiyouga	6302cd94c8	fix #2346 Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993	2024-03-20 17:56:33 +08:00
hiyouga	2e81c03f41	fix patcher Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5	2024-03-15 19:18:42 +08:00
S3Studio	bada9f71a7	Use official Nvidia base image. Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows: FlashAttention only supports Ampere GPUs or newer. So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False. Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479	2024-03-15 08:59:13 +08:00
hiyouga	0d245f67ea	fix export Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b	2024-03-14 18:17:01 +08:00
hiyouga	4ef67ed4dd	improve lora+ impl. Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39	2024-03-13 23:32:51 +08:00
hiyouga	d233c4a71b	fix #2802 Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690	2024-03-13 12:33:45 +08:00
hiyouga	33bef29828	fix kv cache Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b	2024-03-13 01:21:50 +08:00
hiyouga	89770c5a8e	fix #2802 Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f	2024-03-12 17:08:34 +08:00
hiyouga	7538d8e726	fix #2732 Former-commit-id: bc39ad1d102b91d5417daa38b8a581e1e1ab2af9	2024-03-09 22:37:16 +08:00
hiyouga	56565bdbd4	allow non-packing pretraining Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e	2024-03-09 22:21:46 +08:00
hiyouga	e16912b0c0	fix #2756 , patch #2746 Former-commit-id: 627d1c91e675f1d9ebf47bad123cbbf29821da4d	2024-03-09 02:01:26 +08:00
hoshi-hiyouga	5469111c65	Merge pull request #2746 from stephen-nju/main fix deepspeed ppo RuntimeError Former-commit-id: 656c653f0c628f9494b4d7ae12e60c8eeec1ea7a	2024-03-09 01:37:00 +08:00
hiyouga	1dd3f17f79	fix aqlm version Former-commit-id: 05673f81f0295c76957f3247c62f95fda322a63e	2024-03-09 00:09:09 +08:00
stephen_zhu	9e8fe6403d	update Former-commit-id: 295f9ef2eff2e8b5d7a21d3da8dd3e6eb2a42006	2024-03-08 12:47:44 +08:00
stephen	eb1ad9f161	fix ppo runtime error Former-commit-id: 14e2f221e3e720075e59065a3dc42aa4d993a8b6	2024-03-08 11:48:26 +08:00
hiyouga	b5187e4104	fix galore Former-commit-id: 62a3ceeef8f60caef43ccc7f971a0c9184e21296	2024-03-08 00:44:51 +08:00
hiyouga	ddabd699ca	support vllm Former-commit-id: 889f6e910e654d8ec3922c2185042d737ffbf1c3	2024-03-07 20:26:31 +08:00
hiyouga	a02d518edc	fix #2735 Former-commit-id: 416f6333f66b6afd70a3a936d82593efca583235	2024-03-07 16:15:53 +08:00
hiyouga	e7bea6981e	export use balanced gpu Former-commit-id: 710487dc694489bf3dfe54f8d32df80ce46439e4	2024-03-06 16:33:14 +08:00
hiyouga	4aa6db78fb	fix version checking Former-commit-id: 5780da8d640609cca388f55983d0251e5547209a	2024-03-06 14:51:51 +08:00
hiyouga	c60b53a164	improve aqlm optim Former-commit-id: 81be999b407e988c2f42764d827ac859d079ed3e	2024-03-05 20:49:50 +08:00
hiyouga	67bb861040	optimize aqlm training Former-commit-id: 8b42660e4039b3d6475f502f397686ba6b140627	2024-03-05 18:35:41 +08:00
hiyouga	0604a84208	fix dora inference Former-commit-id: 21b3597b0a05169afe51e1609b532787a65ca8ea	2024-03-05 11:51:41 +08:00
hiyouga	bfddb4b468	fix export on cpu device Former-commit-id: e4722a9a627ea4e9a1341cc00a3108dd06a6b550	2024-03-04 17:35:09 +08:00

163 Commits