LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2025-08-06 21:52:50 +08:00

Author	SHA1	Message	Date
hiyouga	1f99c367b3	remove redundant code Former-commit-id: 667ce08b27df9452faee87348419f5f1f0c0cb2f	2024-04-24 05:02:18 +08:00
hiyouga	c0afc4074f	support unsloth generate Former-commit-id: b1deb0a0b920645884e58f8206b1842c144c1c52	2024-04-24 04:46:53 +08:00
hiyouga	8465e54d38	refactor patcher Former-commit-id: aa2b79eb23c60825e6601b0b8cc6b59e3f566b2d	2024-04-24 03:02:23 +08:00
hiyouga	80c8586534	reenable sdpa and fast tok by default Former-commit-id: 07737a3d2d026c973ab964f948953d6ce0e1f2a9	2024-04-24 02:18:44 +08:00
hiyouga	34ecad4af8	fix #3347 #3387 Former-commit-id: 707f0b1d5d42b8e2c5b783c7783f65dfa9890a68	2024-04-24 01:30:16 +08:00
hiyouga	79666c298d	fix #3365 Former-commit-id: a1d31ffc8cb7a6a477704efe779d485d83b8b9fb	2024-04-21 19:20:18 +08:00
hiyouga	ec81d45d27	fix mod stuff Former-commit-id: f58425ab45727f7859583d4b9fda776715e27ff6	2024-04-21 18:11:10 +08:00
hoshi-hiyouga	7c63a9b5fd	Merge pull request #3338 from astramind-ai/main Adding Mixture of Depth Former-commit-id: d0273787be481fb2cbed993580f8239e63d74f7f	2024-04-21 18:05:52 +08:00
hoshi-hiyouga	e9b1aff447	fix #3348 Former-commit-id: 1fa287fd637aad0c5e8893046515a54bbff4c009	2024-04-20 10:34:09 +08:00
Marco	639297a5ef	Added Mixture of Depths Former-commit-id: 620add7b9f634de1a711f7b87b16050adf735e9b	2024-04-18 20:31:24 +02:00
hiyouga	9aa62ffb57	fix #3324 Former-commit-id: 942362d0087345e468e0ae541dcca9b684d74d1a	2024-04-18 15:34:45 +08:00
hiyouga	0170ef83a6	fix #3316 Former-commit-id: c9a477322df82fecdb268ed385e3e0c376c0baeb	2024-04-17 22:54:34 +08:00
hoshi-hiyouga	496396b3bc	Merge pull request #3287 from Ledzy/badam [Feature] Add BAdam algorithm Former-commit-id: 4d660c5ade384df4444fa0543a39edce6220903d	2024-04-16 17:32:16 +08:00
hoshi-hiyouga	b92f690190	Update utils.py Former-commit-id: 38a56706e0f52297501d351d38b51bee73e881dc	2024-04-16 17:29:30 +08:00
hoshi-hiyouga	48fb0be1b9	Update patcher.py Former-commit-id: a950f3b81de701f5f23ce3efa60ff0382bb40dfe	2024-04-16 17:29:19 +08:00
hoshi-hiyouga	ce56ff22af	Update adapter.py Former-commit-id: 750cdf2e74097c8775d03ddf55646cc14d4a686f	2024-04-16 17:28:12 +08:00
Jonery	b3260c7456	resolve gradient checkpointing issue. Former-commit-id: 7ecb61822b37f5d71060d696495830ff98edaa06	2024-04-16 12:05:27 +08:00
hiyouga	b40f266617	support unsloth 2024.4 Former-commit-id: 7dc72fb58cb988418323f63821a21a184ecf0718	2024-04-16 00:25:03 +08:00
hiyouga	bd2b758b48	add codegemma Former-commit-id: 6543f3d4496218f7f90c582cb6aa8c852d716cbf	2024-04-16 00:11:15 +08:00
hiyouga	2dc3343b1c	support cohere commandR #3184 Former-commit-id: e0dbac28450a0e1e0b84e1577ef785fc762c0b46	2024-04-15 23:26:42 +08:00
Jonery	025f329445	Feature BAdam Former-commit-id: 06c8908d3fe48907ddb585c5fa15677fc5416f94	2024-04-15 23:15:27 +08:00
hiyouga	fb385b8c26	update examples Former-commit-id: cce52351b54f70904f33902d9c17411134f9f6eb	2024-04-15 22:14:34 +08:00
hoshi-hiyouga	1bdf7e4b9d	Merge pull request #3276 from liu-zichen/fix_mixtral fix: turn on output_router_logits of mixtral Former-commit-id: 0e0942d388bdb0122001a7f8e081315059d5d327	2024-04-15 15:38:16 +08:00
hiyouga	ceccad3419	fix #3273 Former-commit-id: efc345c4b0095ec959ea23bbe54c344278780cbe	2024-04-15 15:32:58 +08:00
liuzc	11f4afc5ad	fix: mixtral output_router_logits Former-commit-id: 9f4fe623866b10b30c6418dee116b36671274f9f	2024-04-15 12:11:49 +08:00
hiyouga	431e9804ee	release v0.6.2 Former-commit-id: 9d4c949461d232a959c14859ae7fef191faab711	2024-04-11 20:08:51 +08:00
hoshi-hiyouga	77d16ada1e	Update adapter.py Former-commit-id: 98bc97d8d218182c026e9f57bbcbf40ab1e0bc87	2024-04-10 00:57:51 +08:00
hoshi-hiyouga	e5b4cb62e0	Update adapter.py Former-commit-id: 2111b586b648caa150a8e41877c7fede75911da8	2024-04-10 00:57:30 +08:00
Erich Schubert	3dccd3c67e	Pass additional_target to unsloth Fixes #3200 Former-commit-id: b5eefe5c4c084b63a12b023cae877fcd1914d4fc	2024-04-09 17:53:40 +02:00
hiyouga	0e08c209c4	fix quant infer and qwen2moe Former-commit-id: 7f6c2486b83e1d2c96a2314bfa8e1519ca5f574e	2024-04-09 17:12:59 +08:00
hiyouga	2ecf2bcbf0	fix resize vocab at inference #3022 Former-commit-id: 148bda353f0b53af022c51da9a9e59a56f341510	2024-04-03 18:14:24 +08:00
hiyouga	bf5ffeeae0	simplify readme Former-commit-id: 92dab8a90bdd82a72a06559943467b56dde12c71	2024-04-02 20:07:43 +08:00
hiyouga	f4be51f356	add moe aux loss control #3085 Former-commit-id: b267aeb53fc49d2eeb0f3fc5ebe55e643f5db377	2024-04-02 14:26:31 +08:00
hiyouga	c7104f8fab	fix #3022 Former-commit-id: 9ddbe2866a4a4433d7635659a5635d16c59800b1	2024-04-02 13:58:39 +08:00
hiyouga	829cf6458a	fix #3083 Former-commit-id: 4a6ca621c09d179561acc5957c8c911a4e44184c	2024-04-01 22:53:52 +08:00
hiyouga	34f1de0574	fix #3077 Former-commit-id: aee634cd20e6dfdfbe2fbb47ae57f62b2da2bf9a	2024-04-01 21:35:18 +08:00
hiyouga	b7468ea0a8	support infer 4bit model on GPUs #3023 Former-commit-id: eb259cc5738dfb383e4cc5d32579501c580e11b1	2024-04-01 17:34:04 +08:00
hiyouga	3cf35e57db	tiny fix Former-commit-id: 27776c34741ca0c58ed793bcdf1acd5e4a81fb39	2024-03-31 00:10:29 +08:00
marko1616	5721074af1	fix blank line contains whitespace Former-commit-id: d9a5134617d494ef13ba73f9c540123e89a8c29c	2024-03-30 23:46:55 +08:00
marko1616	67c05c2031	Fix Llama model save for full param train Former-commit-id: eb178eaff390a1dc342cc35ab8c7820d654f3717	2024-03-30 23:45:04 +08:00
hiyouga	89c400633a	update trainers Former-commit-id: 8c77b1091296e204dc3c8c1f157c288ca5b236bd	2024-03-28 18:16:27 +08:00
hiyouga	ec94e5e876	fix #2961 Former-commit-id: 511f6754026fbbf48bd481018015338a6a3ad92f	2024-03-26 17:26:14 +08:00
hiyouga	75829c8699	fix #2928 Former-commit-id: 7afbc85daee295cf38dcee9ded5afd87b2c4cfd1	2024-03-24 00:34:54 +08:00
hiyouga	58aa576ae5	fix #2941 Former-commit-id: a1c8c98c5fecfc0dd0ed1be33ee8dd2ade05b708	2024-03-24 00:28:44 +08:00
hiyouga	7999836fb6	support fsdp + qlora Former-commit-id: 84082251621e1470b3b5406a56d0a967780a1804	2024-03-21 00:36:06 +08:00
hiyouga	cf149bf43c	fix #2346 Former-commit-id: 7b8f5029018f0481f7da83cc5ee4408d95c9beb2	2024-03-20 17:56:33 +08:00
hiyouga	a5537f3ee8	fix patcher Former-commit-id: 85c376fc1e0bcc854ed6e70e6455a0b00b341655	2024-03-15 19:18:42 +08:00
S3Studio	46ef7416e6	Use official Nvidia base image. Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows: FlashAttention only supports Ampere GPUs or newer. So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False. Former-commit-id: e75407febdec086f2bdca723a7f69a92b3b1d63f	2024-03-15 08:59:13 +08:00
hiyouga	2cf95d4efe	fix export Former-commit-id: 3b4a59bfb1866a270b9934a4a2303197ffdab531	2024-03-14 18:17:01 +08:00
hiyouga	8b8671817f	improve lora+ impl. Former-commit-id: 72367307dfadf936fb989ebe8bc9f0ff229fb933	2024-03-13 23:32:51 +08:00


182 Commits