Commit Graph

163 Commits

hiyouga
bd2b758b48 add codegemma
Former-commit-id: 6543f3d449
2024-04-16 00:11:15 +08:00
hiyouga
2dc3343b1c support cohere commandR #3184
Former-commit-id: e0dbac2845
2024-04-15 23:26:42 +08:00
hiyouga
fb385b8c26 update examples
Former-commit-id: cce52351b5
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
1bdf7e4b9d Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral (a sketch follows this entry)
Former-commit-id: 0e0942d388
2024-04-15 15:38:16 +08:00
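The merge above toggles a Transformers config flag: Mixtral only adds its MoE load-balancing auxiliary loss during training when output_router_logits is enabled, and it is off by default. A minimal, hedged sketch (the checkpoint name and loading flow are illustrative, not the PR's actual diff):

```python
# Minimal sketch (illustrative, not the PR's diff): enable Mixtral's router
# logits so the MoE load-balancing auxiliary loss is added to the training loss.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
config.output_router_logits = True  # default is False; required for the aux loss
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", config=config
)
```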
hiyouga
ceccad3419 fix #3273
Former-commit-id: efc345c4b0
2024-04-15 15:32:58 +08:00
liuzc
11f4afc5ad fix: mixtral output_router_logits
Former-commit-id: 9f4fe62386
2024-04-15 12:11:49 +08:00
hiyouga
431e9804ee release v0.6.2
Former-commit-id: 9d4c949461
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
77d16ada1e Update adapter.py
Former-commit-id: 98bc97d8d2
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
e5b4cb62e0 Update adapter.py
Former-commit-id: 2111b586b6
2024-04-10 00:57:30 +08:00
Erich Schubert
3dccd3c67e Pass additional_target to unsloth
Fixes #3200 (a sketch follows this entry)

Former-commit-id: b5eefe5c4c
2024-04-09 17:53:40 +02:00
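A hedged sketch of the mechanism this fix touches. The checkpoint name and kwargs below are assumptions: unsloth's get_peft_model mirrors peft's LoraConfig, and modules_to_save is how extra trainable modules such as additional_target would be forwarded alongside the LoRA targets.

```python
# Hedged sketch (assumed API surface): forwarding extra trainable modules
# to unsloth alongside the LoRA target modules.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("unsloth/llama-2-7b")
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # e.g. additional_target when resizing the vocab
)
```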
hiyouga
0e08c209c4 fix quant infer and qwen2moe
Former-commit-id: 7f6c2486b8
2024-04-09 17:12:59 +08:00
hiyouga
2ecf2bcbf0 fix resize vocab at inference #3022
Former-commit-id: 148bda353f
2024-04-03 18:14:24 +08:00
hiyouga
bf5ffeeae0 simplify readme
Former-commit-id: 92dab8a90b
2024-04-02 20:07:43 +08:00
hiyouga
f4be51f356 add moe aux loss control #3085
Former-commit-id: b267aeb53f
2024-04-02 14:26:31 +08:00
hiyouga
c7104f8fab fix #3022
Former-commit-id: 9ddbe2866a
2024-04-02 13:58:39 +08:00
hiyouga
829cf6458a fix #3083
Former-commit-id: 4a6ca621c0
2024-04-01 22:53:52 +08:00
hiyouga
34f1de0574 fix #3077
Former-commit-id: aee634cd20
2024-04-01 21:35:18 +08:00
hiyouga
b7468ea0a8 support infer 4bit model on GPUs #3023
Former-commit-id: eb259cc573
2024-04-01 17:34:04 +08:00
hiyouga
3cf35e57db tiny fix
Former-commit-id: 27776c3474
2024-03-31 00:10:29 +08:00
marko1616
5721074af1 fix blank line contains whitespace
Former-commit-id: d9a5134617
2024-03-30 23:46:55 +08:00
marko1616
67c05c2031 Fix Llama model save for full param train
Former-commit-id: eb178eaff3
2024-03-30 23:45:04 +08:00
hiyouga
89c400633a update trainers
Former-commit-id: 8c77b10912
2024-03-28 18:16:27 +08:00
hiyouga
ec94e5e876 fix #2961
Former-commit-id: 511f675402
2024-03-26 17:26:14 +08:00
hiyouga
75829c8699 fix #2928
Former-commit-id: 7afbc85dae
2024-03-24 00:34:54 +08:00
hiyouga
58aa576ae5 fix #2941
Former-commit-id: a1c8c98c5f
2024-03-24 00:28:44 +08:00
hiyouga
7999836fb6 support fsdp + qlora
Former-commit-id: 8408225162
2024-03-21 00:36:06 +08:00
hiyouga
cf149bf43c fix #2346
Former-commit-id: 7b8f502901
2024-03-20 17:56:33 +08:00
hiyouga
a5537f3ee8 fix patcher
Former-commit-id: 85c376fc1e
2024-03-15 19:18:42 +08:00
S3Studio
46ef7416e6 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False (a sketch follows this entry).

Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
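A minimal sketch of the patch described above. The attribute name use_flash_attn and its "auto" default come from the commit message; the checkpoint name is illustrative.

```python
# Minimal sketch: disable flash-attn in a Qwen config when --flash_attn is not
# set, so pre-Ampere GPUs do not hit the FlashAttention runtime error.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
if getattr(config, "use_flash_attn", None) == "auto":  # Qwen's default
    config.use_flash_attn = False
```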
hiyouga
2cf95d4efe fix export
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
8b8671817f improve lora+ impl.
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00
hiyouga
8673abbe5e fix #2802
Former-commit-id: b9f87cdc11
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f fix kv cache
Former-commit-id: 96ce76cd27
2024-03-13 01:21:50 +08:00
hiyouga
0b7e870b07 fix #2802
Former-commit-id: 8d8956bad5
2024-03-12 17:08:34 +08:00
hiyouga
276def1897 fix #2732
Former-commit-id: 18ffce36b5
2024-03-09 22:37:16 +08:00
hiyouga
868444e124 allow non-packing pretraining
Former-commit-id: bdb496644c
2024-03-09 22:21:46 +08:00
hiyouga
c561b268ef fix #2756, patch #2746
Former-commit-id: e8dd38b7fd
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0 Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError

Former-commit-id: 516d0ddc66
2024-03-09 01:37:00 +08:00
hiyouga
398c261c7c fix aqlm version
Former-commit-id: 10be2f0ecc
2024-03-09 00:09:09 +08:00
stephen_zhu
c69b9fbe58 update
Former-commit-id: aa71571b77
2024-03-08 12:47:44 +08:00
stephen
495b858606 fix ppo runtime error
Former-commit-id: cdb7f82869
2024-03-08 11:48:26 +08:00
hiyouga
5b50458acf fix galore
Former-commit-id: 33a4c24a8a
2024-03-08 00:44:51 +08:00
hiyouga
34533b2f35 support vllm
Former-commit-id: d07ad5cc1c
2024-03-07 20:26:31 +08:00
hiyouga
37e40563f1 fix #2735
Former-commit-id: f74f804a71
2024-03-07 16:15:53 +08:00
hiyouga
8b6c178249 export use balanced gpu
Former-commit-id: 3e84f430b1
2024-03-06 16:33:14 +08:00
hiyouga
e887aface7 fix version checking
Former-commit-id: 3016e65657
2024-03-06 14:51:51 +08:00
hiyouga
9561809ce9 improve aqlm optim
Former-commit-id: 259af60d28
2024-03-05 20:49:50 +08:00
hiyouga
c776cdfc3e optimize aqlm training
Former-commit-id: d3d3dac707
2024-03-05 18:35:41 +08:00
hiyouga
0f2250b831 fix dora inference
Former-commit-id: ddf352f861
2024-03-05 11:51:41 +08:00
hiyouga
a62d17d009 fix export on cpu device
Former-commit-id: cda2ff8727
2024-03-04 17:35:09 +08:00