79 Commits

Author SHA1 Message Date
hiyouga
79666c298d fix #3365
Former-commit-id: a1d31ffc8cb7a6a477704efe779d485d83b8b9fb
2024-04-21 19:20:18 +08:00
hiyouga
ec81d45d27 fix mod stuff
Former-commit-id: f58425ab45727f7859583d4b9fda776715e27ff6
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
48fb0be1b9 Update patcher.py
Former-commit-id: a950f3b81de701f5f23ce3efa60ff0382bb40dfe
2024-04-16 17:29:19 +08:00
Jonery
025f329445 Feature BAdam
Former-commit-id: 06c8908d3fe48907ddb585c5fa15677fc5416f94
2024-04-15 23:15:27 +08:00
hoshi-hiyouga
1bdf7e4b9d Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral
Former-commit-id: 0e0942d388bdb0122001a7f8e081315059d5d327
2024-04-15 15:38:16 +08:00
hiyouga
ceccad3419 fix #3273
Former-commit-id: efc345c4b0095ec959ea23bbe54c344278780cbe
2024-04-15 15:32:58 +08:00
liuzc
11f4afc5ad fix: mixtral output_router_logits
Former-commit-id: 9f4fe623866b10b30c6418dee116b36671274f9f
2024-04-15 12:11:49 +08:00
hiyouga
0e08c209c4 fix quant infer and qwen2moe
Former-commit-id: 7f6c2486b83e1d2c96a2314bfa8e1519ca5f574e
2024-04-09 17:12:59 +08:00
hiyouga
bf5ffeeae0 simplify readme
Former-commit-id: 92dab8a90bdd82a72a06559943467b56dde12c71
2024-04-02 20:07:43 +08:00
hiyouga
f4be51f356 add moe aux loss control #3085
Former-commit-id: b267aeb53fc49d2eeb0f3fc5ebe55e643f5db377
2024-04-02 14:26:31 +08:00
hiyouga
c7104f8fab fix #3022
Former-commit-id: 9ddbe2866a4a4433d7635659a5635d16c59800b1
2024-04-02 13:58:39 +08:00
hiyouga
829cf6458a fix #3083
Former-commit-id: 4a6ca621c09d179561acc5957c8c911a4e44184c
2024-04-01 22:53:52 +08:00
hiyouga
34f1de0574 fix #3077
Former-commit-id: aee634cd20e6dfdfbe2fbb47ae57f62b2da2bf9a
2024-04-01 21:35:18 +08:00
hiyouga
b7468ea0a8 support infer 4bit model on GPUs #3023
Former-commit-id: eb259cc5738dfb383e4cc5d32579501c580e11b1
2024-04-01 17:34:04 +08:00
hiyouga
3cf35e57db tiny fix
Former-commit-id: 27776c34741ca0c58ed793bcdf1acd5e4a81fb39
2024-03-31 00:10:29 +08:00
marko1616
5721074af1 fix blank line contains whitespace
Former-commit-id: d9a5134617d494ef13ba73f9c540123e89a8c29c
2024-03-30 23:46:55 +08:00
marko1616
67c05c2031 Fix Llama model save for full param train
Former-commit-id: eb178eaff390a1dc342cc35ab8c7820d654f3717
2024-03-30 23:45:04 +08:00
hiyouga
ec94e5e876 fix #2961
Former-commit-id: 511f6754026fbbf48bd481018015338a6a3ad92f
2024-03-26 17:26:14 +08:00
hiyouga
58aa576ae5 fix #2941
Former-commit-id: a1c8c98c5fecfc0dd0ed1be33ee8dd2ade05b708
2024-03-24 00:28:44 +08:00
hiyouga
7999836fb6 support fsdp + qlora
Former-commit-id: 84082251621e1470b3b5406a56d0a967780a1804
2024-03-21 00:36:06 +08:00
hiyouga
cf149bf43c fix #2346
Former-commit-id: 7b8f5029018f0481f7da83cc5ee4408d95c9beb2
2024-03-20 17:56:33 +08:00
hiyouga
a5537f3ee8 fix patcher
Former-commit-id: 85c376fc1e0bcc854ed6e70e6455a0b00b341655
2024-03-15 19:18:42 +08:00
S3Studio
46ef7416e6 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, the Qwen model's config needs an additional patch to change the default value of use_flash_attn from "auto" to False.

Former-commit-id: e75407febdec086f2bdca723a7f69a92b3b1d63f
2024-03-15 08:59:13 +08:00
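The patch described in the note above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the helper name `patch_use_flash_attn` is hypothetical, and the `SimpleNamespace` stands in for a loaded Qwen config object with a `use_flash_attn` attribute.

```python
from types import SimpleNamespace

def patch_use_flash_attn(config):
    # Older Qwen remote-code configs default `use_flash_attn` to "auto",
    # which enables flash-attn whenever the library is importable --
    # even on pre-Ampere GPUs where it raises at train time.
    # Pin the default to False; explicit user settings are left untouched.
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config

# Stand-in for a loaded Qwen config (hypothetical attribute value).
cfg = SimpleNamespace(use_flash_attn="auto")
patch_use_flash_attn(cfg)
print(cfg.use_flash_attn)  # False
```

With this default pinned, flash-attn is only used when the user opts in explicitly (e.g. via the --flash_attn flag), which avoids the incompatible-GPU crash on hosts where the library is installed but unusable.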
hiyouga
2cf95d4efe fix export
Former-commit-id: 3b4a59bfb1866a270b9934a4a2303197ffdab531
2024-03-14 18:17:01 +08:00
hiyouga
8b8671817f improve lora+ impl.
Former-commit-id: 72367307dfadf936fb989ebe8bc9f0ff229fb933
2024-03-13 23:32:51 +08:00
hiyouga
a74426df0f fix kv cache
Former-commit-id: 96ce76cd2753bc91c781ad13aa8f7a972abe815a
2024-03-13 01:21:50 +08:00
hiyouga
0b7e870b07 fix #2802
Former-commit-id: 8d8956bad542c0e1c0f7edbf4ffc22bb0f8788ae
2024-03-12 17:08:34 +08:00
hiyouga
868444e124 allow non-packing pretraining
Former-commit-id: bdb496644ce2c18806fc4fdae1fedcb3e5b5f808
2024-03-09 22:21:46 +08:00
hiyouga
c561b268ef fix #2756, patch #2746
Former-commit-id: e8dd38b7fdf8e172745d2538eb103895f2839c38
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0 Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError

Former-commit-id: 516d0ddc666c179616a2a610b1353728db57391e
2024-03-09 01:37:00 +08:00
hiyouga
398c261c7c fix aqlm version
Former-commit-id: 10be2f0eccc3963a985afcd24e5b8b8fc638b1c3
2024-03-09 00:09:09 +08:00
stephen_zhu
c69b9fbe58 update
Former-commit-id: aa71571b773c5dc527b17219ec87828e4455b330
2024-03-08 12:47:44 +08:00
stephen
495b858606 fix ppo runtime error
Former-commit-id: cdb7f82869b07d9d5d31b7b2aaf6b033bd00e32e
2024-03-08 11:48:26 +08:00
hiyouga
37e40563f1 fix #2735
Former-commit-id: f74f804a715dfb16bf24a056bc95db6b102f9ed7
2024-03-07 16:15:53 +08:00
hiyouga
8b6c178249 export use balanced gpu
Former-commit-id: 3e84f430b14a94e68f5815d8e412f0d74d28a04c
2024-03-06 16:33:14 +08:00
hiyouga
c776cdfc3e optimize aqlm training
Former-commit-id: d3d3dac7070eb9055bcdc91eaf53f5b3741c0bda
2024-03-05 18:35:41 +08:00
hiyouga
0f2250b831 fix dora inference
Former-commit-id: ddf352f861e04e813cb8adeb4513964b4945081a
2024-03-05 11:51:41 +08:00
hiyouga
a62d17d009 fix export on cpu device
Former-commit-id: cda2ff87272797a062c7addb1bf840ac46208dfd
2024-03-04 17:35:09 +08:00
hiyouga
d1e6e02461 fix #2649
Former-commit-id: 4e5fae2fac85227641bd16159cf296a32e0b18b4
2024-03-01 13:02:41 +08:00
hiyouga
3787d13816 fix #2642
Former-commit-id: c0be617195f43d972681dd59727857b1247eeb7e
2024-02-29 18:32:54 +08:00
hiyouga
1853b5c172 tiny fix
Former-commit-id: 4a871e80e205466262534cdc710b0495954b153e
2024-02-29 17:28:50 +08:00
hiyouga
8e7d50dae4 release v0.5.3
Former-commit-id: fa5ab21ebc0ab738178c0c57578db3bda995ae06
2024-02-29 00:34:19 +08:00
hiyouga
5abbca70d3 support DoRA, AWQ, AQLM #2512
Former-commit-id: cfefacaa37453a15c55866d019887f24e886a577
2024-02-28 19:53:28 +08:00
hiyouga
96265ec154 support llama pro #2338, add rslora
Former-commit-id: 7924ffc55d98e33bfbfbca303e46c8f476435673
2024-02-15 02:27:36 +08:00
hiyouga
bbe5ff0570 update gc kwargs
Former-commit-id: 0ae9a16b9d13bc1093662aa0b9bd990400ec2646
2024-02-07 00:38:24 +08:00
hiyouga
caeffc780d fix #2438
Former-commit-id: ebf31b62eb1b75399cff7c7542c45ac72f6f41dd
2024-02-06 15:23:08 +08:00
hiyouga
a9e58740f5 fix #2376
Former-commit-id: 4ecadc35122340b3e520804270c1c1d16c696830
2024-02-03 23:14:31 +08:00
hiyouga
7beeae2209 fix autoset attn impl, update data readme
Former-commit-id: 521ad765521bb65aff5a29a8125a2b26ef00bff4
2024-01-31 11:58:07 +08:00
hiyouga
b8a827faeb fix #2320
Former-commit-id: 2bc30763e9a40a82484c27b9a472425fdb9b3bd8
2024-01-24 16:19:18 +08:00
ldwang
323ec3f89f Add patch_mixtral_replace_moe_impl for full-parameter training of Mixtral using DeepSpeed ZeRO-3.
Signed-off-by: ldwang <ftgreat@gmail.com>

Former-commit-id: c284665425e8eefcea2d0dd1c835883e7ce18c97
2024-01-24 15:25:31 +08:00