LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2026-03-13 07:26:00 +08:00

Author	SHA1	Message	Date
hiyouga	6302cd94c8	fix #2346 Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993	2024-03-20 17:56:33 +08:00
hiyouga	2e81c03f41	fix patcher Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5	2024-03-15 19:18:42 +08:00
S3Studio	bada9f71a7	Use official Nvidia base image Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows: FlashAttention only supports Ampere GPUs or newer. So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False. Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479	2024-03-15 08:59:13 +08:00
hiyouga	0d245f67ea	fix export Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b	2024-03-14 18:17:01 +08:00
hiyouga	4ef67ed4dd	improve lora+ impl. Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39	2024-03-13 23:32:51 +08:00
hiyouga	d233c4a71b	fix #2802 Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690	2024-03-13 12:33:45 +08:00
hiyouga	33bef29828	fix kv cache Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b	2024-03-13 01:21:50 +08:00
hiyouga	89770c5a8e	fix #2802 Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f	2024-03-12 17:08:34 +08:00
hiyouga	7538d8e726	fix #2732 Former-commit-id: bc39ad1d102b91d5417daa38b8a581e1e1ab2af9	2024-03-09 22:37:16 +08:00
hiyouga	56565bdbd4	allow non-packing pretraining Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e	2024-03-09 22:21:46 +08:00
hiyouga	e16912b0c0	fix #2756 , patch #2746 Former-commit-id: 627d1c91e675f1d9ebf47bad123cbbf29821da4d	2024-03-09 02:01:26 +08:00
hoshi-hiyouga	5469111c65	Merge pull request #2746 from stephen-nju/main fix deepspeed ppo RuntimeError Former-commit-id: 656c653f0c628f9494b4d7ae12e60c8eeec1ea7a	2024-03-09 01:37:00 +08:00
hiyouga	1dd3f17f79	fix aqlm version Former-commit-id: 05673f81f0295c76957f3247c62f95fda322a63e	2024-03-09 00:09:09 +08:00
stephen_zhu	9e8fe6403d	update Former-commit-id: 295f9ef2eff2e8b5d7a21d3da8dd3e6eb2a42006	2024-03-08 12:47:44 +08:00
stephen	eb1ad9f161	fix ppo runtime error Former-commit-id: 14e2f221e3e720075e59065a3dc42aa4d993a8b6	2024-03-08 11:48:26 +08:00
hiyouga	b5187e4104	fix galore Former-commit-id: 62a3ceeef8f60caef43ccc7f971a0c9184e21296	2024-03-08 00:44:51 +08:00
hiyouga	ddabd699ca	support vllm Former-commit-id: 889f6e910e654d8ec3922c2185042d737ffbf1c3	2024-03-07 20:26:31 +08:00
hiyouga	a02d518edc	fix #2735 Former-commit-id: 416f6333f66b6afd70a3a936d82593efca583235	2024-03-07 16:15:53 +08:00
hiyouga	e7bea6981e	export use balanced gpu Former-commit-id: 710487dc694489bf3dfe54f8d32df80ce46439e4	2024-03-06 16:33:14 +08:00
hiyouga	4aa6db78fb	fix version checking Former-commit-id: 5780da8d640609cca388f55983d0251e5547209a	2024-03-06 14:51:51 +08:00
hiyouga	c60b53a164	improve aqlm optim Former-commit-id: 81be999b407e988c2f42764d827ac859d079ed3e	2024-03-05 20:49:50 +08:00
hiyouga	67bb861040	optimize aqlm training Former-commit-id: 8b42660e4039b3d6475f502f397686ba6b140627	2024-03-05 18:35:41 +08:00
hiyouga	0604a84208	fix dora inference Former-commit-id: 21b3597b0a05169afe51e1609b532787a65ca8ea	2024-03-05 11:51:41 +08:00
hiyouga	bfddb4b468	fix export on cpu device Former-commit-id: e4722a9a627ea4e9a1341cc00a3108dd06a6b550	2024-03-04 17:35:09 +08:00
hiyouga	10845a2fe7	fix #2649 Former-commit-id: 1c850de660c671d92f0bc63f230d338b60b7c0bd	2024-03-01 13:02:41 +08:00
hiyouga	c5cbe2c6f9	fix #2642 Former-commit-id: d8435e7f1850532310e1bee069b45f38cd666e48	2024-02-29 18:32:54 +08:00
hiyouga	c5b9c285b4	tiny fix Former-commit-id: 3b6e1132c4d203e6d5376cf97e81cc160697c822	2024-02-29 17:28:50 +08:00
hiyouga	443d85d80f	release v0.5.3 Former-commit-id: f6bc89581b3cd129448da2defc23848de6f494ed	2024-02-29 00:34:19 +08:00
hiyouga	98a0c8e8bf	support DoRA, AWQ, AQLM #2512 Former-commit-id: 6614cc1f08aa944db083e27e451bbdd733f7dd97	2024-02-28 19:53:28 +08:00
hiyouga	8fd6803294	support lora for llama pro Former-commit-id: f74c78ba95f0545aae89e603e466f494705ad024	2024-02-21 02:17:22 +08:00
hiyouga	a6ff18ab17	fix #2481 Former-commit-id: 2a4e3e4a26a2fad77ccc476be7d45434b8af4a55	2024-02-15 19:07:47 +08:00
hiyouga	562b9d0167	support llama pro #2338 , add rslora Former-commit-id: 40d659b7f30dd5a004703c176ec1f22dc864e505	2024-02-15 02:27:36 +08:00
younesbelkada	4b195603c9	add v1 hf tags Former-commit-id: a29cc9f4472c95cd6a43ea350ab728e0a8069c6e	2024-02-13 05:58:49 +00:00
hiyouga	a52372df01	add option to disable version check Former-commit-id: fd769cb2de696aee3c5e882237e16eace6a9d675	2024-02-10 22:31:23 +08:00
hiyouga	3c35a04280	update gc kwargs Former-commit-id: 0cb81c156bc8c21a4bbdd3289a491f78dfcaf730	2024-02-07 00:38:24 +08:00
hiyouga	1a5edb7144	fix #2438 Former-commit-id: 412d856eeada2abcea598fac0a8d35ae90cc9c01	2024-02-06 15:23:08 +08:00
hiyouga	84ec6b5d01	fix #2420 Former-commit-id: 7a34087e4db62e603c9a9a26d8ff3910d7b10c40	2024-02-04 15:51:47 +08:00
hiyouga	4adb4477bc	bump up transformers version Former-commit-id: 82f4d4301ed9f31b160d6313a1d2d44a22865f4d	2024-02-04 00:01:16 +08:00
hiyouga	80c0ad44a3	add hint for freeze #2412 Former-commit-id: 9600c93633629605573d908019563fa3870ad6f8	2024-02-03 23:38:56 +08:00
hiyouga	434e858c4e	fix #2376 Former-commit-id: 8e2cfa7cca21b7fd4538d72114e36f704bcc82fe	2024-02-03 23:14:31 +08:00
hiyouga	7f28b961a5	fix autoset attn impl, update data readme Former-commit-id: 34a6e5f82baf45cc8dbb11f9f7ab4a480ab7ec5c	2024-01-31 11:58:07 +08:00
hiyouga	eafdae8e94	fix #2320 Former-commit-id: e0b0c4415aaf80e75f6dd4f3777a0616b0e60f84	2024-01-24 16:19:18 +08:00
ldwang	d1a35c3fb1	Add patch_mixtral_replace_moe_impl for full training Mixtral using DeepSpeed Zero3. Signed-off-by: ldwang <ftgreat@gmail.com> Former-commit-id: 5f50c02f0e425737cd80abdf8fde9e25abf13083	2024-01-24 15:25:31 +08:00
ldwang	0df7de7ab0	Add patch_mixtral_replace_moe_impl for full training Mixtral using DeepSpeed Zero3. Signed-off-by: ldwang <ftgreat@gmail.com> Former-commit-id: d1413dcec8a3b1d671f240b82a689c72b54d7b93	2024-01-24 14:43:16 +08:00
hiyouga	115e7af908	add hint Former-commit-id: c540ef41bda61993b83ef8cfe3c84b1d169e984c	2024-01-22 23:32:01 +08:00
hoshi-hiyouga	e3dfc7de11	Update patcher.py Former-commit-id: 33556cc6b0b65cc6db02e66f4f6e75112c33d966	2024-01-22 23:27:39 +08:00
A-Cepheus	fbaa6269d4	🐞 fix: typo Former-commit-id: 57a3687ecd23237559aee0e8e811b782846f2415	2024-01-22 16:04:39 +08:00
A-Cepheus	ff4027e448	🐞 fix: typo, move MoE fix to patcher Former-commit-id: 4ff28e99ff9b48df7150591c6bbd3723f22b7715	2024-01-22 16:01:58 +08:00
A-Cepheus	96f8e94cd4	fix: ZeRO3 does not work with MoE models Former-commit-id: b2844c049a88ea89f8e1812e2d2e8662b4002965	2024-01-22 15:21:14 +08:00
hiyouga	e4fe8c79ee	fix #2268 Former-commit-id: 300ecf9b9d7fd99fbb68f3d086e3ad973c2f894e	2024-01-21 14:11:38 +08:00

137 Commits