BUAADreamer
cfb485eddf
add llava and instructblip
2024-04-25 00:22:43 +08:00
BUAADreamer
4dcb11eab7
add multimodal LLM BLIP-2 and InstructBLIP
2024-04-23 18:45:43 +08:00
hiyouga
a1d31ffc8c
fix #3365
2024-04-21 19:20:18 +08:00
hiyouga
f58425ab45
fix mod stuff
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
d0273787be
Merge pull request #3338 from astramind-ai/main
Adding Mixture of Depths
2024-04-21 18:05:52 +08:00
hoshi-hiyouga
1fa287fd63
fix #3348
2024-04-20 10:34:09 +08:00
Marco
620add7b9f
Added Mixture of Depths
2024-04-18 20:31:24 +02:00
hiyouga
942362d008
fix #3324
2024-04-18 15:34:45 +08:00
hiyouga
c9a477322d
fix #3316
2024-04-17 22:54:34 +08:00
hoshi-hiyouga
4d660c5ade
Merge pull request #3287 from Ledzy/badam
[Feature] Add BAdam algorithm
2024-04-16 17:32:16 +08:00
hoshi-hiyouga
38a56706e0
Update utils.py
2024-04-16 17:29:30 +08:00
hoshi-hiyouga
a950f3b81d
Update patcher.py
2024-04-16 17:29:19 +08:00
hoshi-hiyouga
750cdf2e74
Update adapter.py
2024-04-16 17:28:12 +08:00
Jonery
7ecb61822b
resolve gradient checkpointing issue.
2024-04-16 12:05:27 +08:00
hiyouga
7dc72fb58c
support unsloth 2024.4
2024-04-16 00:25:03 +08:00
hiyouga
6543f3d449
add codegemma
2024-04-16 00:11:15 +08:00
hiyouga
e0dbac2845
support cohere commandR #3184
2024-04-15 23:26:42 +08:00
Jonery
06c8908d3f
Feature BAdam
2024-04-15 23:15:27 +08:00
hiyouga
cce52351b5
update examples
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
0e0942d388
Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral
2024-04-15 15:38:16 +08:00
hiyouga
efc345c4b0
fix #3273
2024-04-15 15:32:58 +08:00
liuzc
9f4fe62386
fix: mixtral output_router_logits
2024-04-15 12:11:49 +08:00
hiyouga
9d4c949461
release v0.6.2
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
98bc97d8d2
Update adapter.py
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
2111b586b6
Update adapter.py
2024-04-10 00:57:30 +08:00
Erich Schubert
b5eefe5c4c
Pass additional_target to unsloth
Fixes #3200
2024-04-09 17:53:40 +02:00
hiyouga
7f6c2486b8
fix quant infer and qwen2moe
2024-04-09 17:12:59 +08:00
hiyouga
148bda353f
fix resize vocab at inference #3022
2024-04-03 18:14:24 +08:00
hiyouga
92dab8a90b
simplify readme
2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f
add moe aux loss control #3085
2024-04-02 14:26:31 +08:00
hiyouga
9ddbe2866a
fix #3022
2024-04-02 13:58:39 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support inferring 4-bit models on GPUs #3023
2024-04-01 17:34:04 +08:00
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank line containing whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model save for full param train
2024-03-30 23:45:04 +08:00
hiyouga
8c77b10912
update trainers
2024-03-28 18:16:27 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
7afbc85dae
fix #2928
2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
S3Studio
e75407febd
Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False (a minimal sketch of such a patch follows this entry).
2024-03-15 08:59:13 +08:00
2024-03-15 08:59:13 +08:00
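The workaround described in the commit message above can be illustrated with a short Python sketch. It is not part of the commit itself; the model id "Qwen/Qwen-7B-Chat" and loading the model via transformers with trust_remote_code are assumptions made for illustration, and the attribute name use_flash_attn follows the first-generation Qwen remote-code config.

# Hypothetical sketch of the patch described above: when --flash_attn is not set
# and the GPU is older than Ampere, flip the Qwen config's use_flash_attn
# default from "auto" to False before loading the model.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen-7B-Chat"  # assumed model id, for illustration only
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

if getattr(config, "use_flash_attn", None) == "auto":
    # avoids "FlashAttention only supports Ampere GPUs or newer" at train time
    config.use_flash_attn = False

model = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)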
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hiyouga
b9f87cdc11
fix #2802
2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00