LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2026-03-11 06:16:00 +08:00

Author	SHA1	Message	Date
S3Studio	46ef7416e6	Use official Nvidia base image. Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows: "FlashAttention only supports Ampere GPUs or newer." So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False. Former-commit-id: `e75407febd`	2024-03-15 08:59:13 +08:00
hiyouga	2cf95d4efe	fix export Former-commit-id: `3b4a59bfb1`	2024-03-14 18:17:01 +08:00
hiyouga	8b8671817f	improve lora+ impl. Former-commit-id: `72367307df`	2024-03-13 23:32:51 +08:00
hiyouga	8673abbe5e	fix #2802 Former-commit-id: `b9f87cdc11`	2024-03-13 12:33:45 +08:00
hiyouga	a74426df0f	fix kv cache Former-commit-id: `96ce76cd27`	2024-03-13 01:21:50 +08:00
hiyouga	0b7e870b07	fix #2802 Former-commit-id: `8d8956bad5`	2024-03-12 17:08:34 +08:00
hiyouga	276def1897	fix #2732 Former-commit-id: `18ffce36b5`	2024-03-09 22:37:16 +08:00
hiyouga	868444e124	allow non-packing pretraining Former-commit-id: `bdb496644c`	2024-03-09 22:21:46 +08:00
hiyouga	c561b268ef	fix #2756 , patch #2746 Former-commit-id: `e8dd38b7fd`	2024-03-09 02:01:26 +08:00
hoshi-hiyouga	36d65289d0	Merge pull request #2746 from stephen-nju/main fix deepspeed ppo RuntimeError Former-commit-id: `516d0ddc66`	2024-03-09 01:37:00 +08:00
hiyouga	398c261c7c	fix aqlm version Former-commit-id: `10be2f0ecc`	2024-03-09 00:09:09 +08:00
stephen_zhu	c69b9fbe58	update Former-commit-id: `aa71571b77`	2024-03-08 12:47:44 +08:00
stephen	495b858606	fix ppo runtime error Former-commit-id: `cdb7f82869`	2024-03-08 11:48:26 +08:00
hiyouga	5b50458acf	fix galore Former-commit-id: `33a4c24a8a`	2024-03-08 00:44:51 +08:00
hiyouga	34533b2f35	support vllm Former-commit-id: `d07ad5cc1c`	2024-03-07 20:26:31 +08:00
hiyouga	37e40563f1	fix #2735 Former-commit-id: `f74f804a71`	2024-03-07 16:15:53 +08:00
hiyouga	8b6c178249	export use balanced gpu Former-commit-id: `3e84f430b1`	2024-03-06 16:33:14 +08:00
hiyouga	e887aface7	fix version checking Former-commit-id: `3016e65657`	2024-03-06 14:51:51 +08:00
hiyouga	9561809ce9	improve aqlm optim Former-commit-id: `259af60d28`	2024-03-05 20:49:50 +08:00
hiyouga	c776cdfc3e	optimize aqlm training Former-commit-id: `d3d3dac707`	2024-03-05 18:35:41 +08:00
hiyouga	0f2250b831	fix dora inference Former-commit-id: `ddf352f861`	2024-03-05 11:51:41 +08:00
hiyouga	a62d17d009	fix export on cpu device Former-commit-id: `cda2ff8727`	2024-03-04 17:35:09 +08:00
hiyouga	d1e6e02461	fix #2649 Former-commit-id: `4e5fae2fac`	2024-03-01 13:02:41 +08:00
hiyouga	3787d13816	fix #2642 Former-commit-id: `c0be617195`	2024-02-29 18:32:54 +08:00
hiyouga	1853b5c172	tiny fix Former-commit-id: `4a871e80e2`	2024-02-29 17:28:50 +08:00
hiyouga	8e7d50dae4	release v0.5.3 Former-commit-id: `fa5ab21ebc`	2024-02-29 00:34:19 +08:00
hiyouga	5abbca70d3	support DoRA, AWQ, AQLM #2512 Former-commit-id: `cfefacaa37`	2024-02-28 19:53:28 +08:00
hiyouga	0fcb931f18	support lora for llama pro Former-commit-id: `9aeb404a94`	2024-02-21 02:17:22 +08:00
hiyouga	62b78001b7	fix #2481 Former-commit-id: `22acab8aff`	2024-02-15 19:07:47 +08:00
hiyouga	96265ec154	support llama pro #2338 , add rslora Former-commit-id: `7924ffc55d`	2024-02-15 02:27:36 +08:00
younesbelkada	6b98435a53	add v1 hf tags Former-commit-id: `0ca0f08162`	2024-02-13 05:58:49 +00:00
hiyouga	75adbfec79	add option to disable version check Former-commit-id: `91d09a01ac`	2024-02-10 22:31:23 +08:00
hiyouga	bbe5ff0570	update gc kwargs Former-commit-id: `0ae9a16b9d`	2024-02-07 00:38:24 +08:00
hiyouga	caeffc780d	fix #2438 Former-commit-id: `ebf31b62eb`	2024-02-06 15:23:08 +08:00
hiyouga	f6b2bcfa16	fix #2420 Former-commit-id: `19d33ede13`	2024-02-04 15:51:47 +08:00
hiyouga	b1064d2f9b	bump up transformers version Former-commit-id: `38e63bfd28`	2024-02-04 00:01:16 +08:00
hiyouga	0fc8612b97	add hint for freeze #2412 Former-commit-id: `6545c02790`	2024-02-03 23:38:56 +08:00
hiyouga	a9e58740f5	fix #2376 Former-commit-id: `4ecadc3512`	2024-02-03 23:14:31 +08:00
hiyouga	7beeae2209	fix autoset attn impl, update data readme Former-commit-id: `521ad76552`	2024-01-31 11:58:07 +08:00
hiyouga	b8a827faeb	fix #2320 Former-commit-id: `2bc30763e9`	2024-01-24 16:19:18 +08:00
ldwang	323ec3f89f	Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3. Signed-off-by: ldwang <ftgreat@gmail.com> Former-commit-id: `c284665425`	2024-01-24 15:25:31 +08:00
ldwang	db500a2bb6	Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3. Signed-off-by: ldwang <ftgreat@gmail.com> Former-commit-id: `18923b1402`	2024-01-24 14:43:16 +08:00
hiyouga	60a042cc16	add hint Former-commit-id: `e4ba1deedf`	2024-01-22 23:32:01 +08:00
hoshi-hiyouga	68977c8ca4	Update patcher.py Former-commit-id: `bdc9eff635`	2024-01-22 23:27:39 +08:00
A-Cepheus	f00ad6b4f8	🐞 fix: typo Former-commit-id: `b06a31e76a`	2024-01-22 16:04:39 +08:00
A-Cepheus	39d9aba166	🐞 fix: typo, move MoE fix to patcher Former-commit-id: `319a72b48d`	2024-01-22 16:01:58 +08:00
A-Cepheus	8985c43033	fix: ZeRO3 does not work with MoE models Former-commit-id: `e1d5c98519`	2024-01-22 15:21:14 +08:00
hiyouga	fb2d563be5	fix #2268 Former-commit-id: `e0a717aa3a`	2024-01-21 14:11:38 +08:00
hiyouga	b27e91222c	format style Former-commit-id: `638234ceee`	2024-01-20 20:15:56 +08:00
hiyouga	69e8925249	support longlora for main branch Former-commit-id: `38af076a75`	2024-01-20 19:25:22 +08:00

185 Commits