hiyouga
7dc72fb58c
support unsloth 2024.4
2024-04-16 00:25:03 +08:00
hiyouga
6543f3d449
add codegemma
2024-04-16 00:11:15 +08:00
hiyouga
e0dbac2845
support cohere commandR #3184
2024-04-15 23:26:42 +08:00
Jonery
06c8908d3f
Feature BAdam
2024-04-15 23:15:27 +08:00
hiyouga
cce52351b5
update examples
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
0e0942d388
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
2024-04-15 15:38:16 +08:00
hiyouga
efc345c4b0
fix #3273
2024-04-15 15:32:58 +08:00
liuzc
9f4fe62386
fix: mixtral output_router_logits
2024-04-15 12:11:49 +08:00
hiyouga
9d4c949461
release v0.6.2
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
98bc97d8d2
Update adapter.py
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
2111b586b6
Update adapter.py
2024-04-10 00:57:30 +08:00
Erich Schubert
b5eefe5c4c
Pass additional_target to unsloth
...
Fixes #3200
2024-04-09 17:53:40 +02:00
hiyouga
7f6c2486b8
fix quant infer and qwen2moe
2024-04-09 17:12:59 +08:00
hiyouga
148bda353f
fix resize vocab at inference #3022
2024-04-03 18:14:24 +08:00
hiyouga
92dab8a90b
simplify readme
2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f
add moe aux loss control #3085
2024-04-02 14:26:31 +08:00
hiyouga
9ddbe2866a
fix #3022
2024-04-02 13:58:39 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support infer 4bit model on GPUs #3023
2024-04-01 17:34:04 +08:00
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank line contains whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model save for full param train
2024-03-30 23:45:04 +08:00
hiyouga
8c77b10912
update trainers
2024-03-28 18:16:27 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
7afbc85dae
fix #2928
2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
S3Studio
e75407febd
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the Qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
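The commit body above describes a config patch: Qwen's config defaults `use_flash_attn` to "auto", which crashes on pre-Ampere GPUs when flash-attn is installed, so the default should be forced to False unless the user explicitly asks for flash attention. A minimal sketch of that logic, using a plain dict in place of the real config object (the function name and actual patcher code are assumptions, not the repository's implementation):

```python
# Sketch of the patch described in the commit above. Qwen's remote config
# defaults `use_flash_attn` to "auto"; with flash-attn installed, "auto"
# raises "FlashAttention only supports Ampere GPUs or newer" on older GPUs,
# so when the --flash_attn flag is absent we force the default to False.
def patch_use_flash_attn(config: dict, flash_attn_requested: bool) -> dict:
    """Disable flash attention unless the user explicitly requested it."""
    if not flash_attn_requested and config.get("use_flash_attn") == "auto":
        config["use_flash_attn"] = False
    return config

# Without --flash_attn, the "auto" default is overridden to False;
# an explicit request leaves the config untouched.
patched = patch_use_flash_attn({"use_flash_attn": "auto"}, flash_attn_requested=False)
```

In the real code the same idea would be applied to the loaded `PretrainedConfig` before model instantiation, so the training process never reaches the incompatible flash-attn kernel.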
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hiyouga
b9f87cdc11
fix #2802
2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00
hiyouga
18ffce36b5
fix #2732
2024-03-09 22:37:16 +08:00
hiyouga
bdb496644c
allow non-packing pretraining
2024-03-09 22:21:46 +08:00
hiyouga
e8dd38b7fd
fix #2756 , patch #2746
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga
10be2f0ecc
fix aqlm version
2024-03-09 00:09:09 +08:00
stephen_zhu
aa71571b77
update
2024-03-08 12:47:44 +08:00
stephen
cdb7f82869
fix ppo runtime error
2024-03-08 11:48:26 +08:00
hiyouga
33a4c24a8a
fix galore
2024-03-08 00:44:51 +08:00
hiyouga
d07ad5cc1c
support vllm
2024-03-07 20:26:31 +08:00
hiyouga
f74f804a71
fix #2735
2024-03-07 16:15:53 +08:00
hiyouga
3e84f430b1
export use balanced gpu
2024-03-06 16:33:14 +08:00
hiyouga
3016e65657
fix version checking
2024-03-06 14:51:51 +08:00
hiyouga
259af60d28
improve aqlm optim
2024-03-05 20:49:50 +08:00
hiyouga
d3d3dac707
optimize aqlm training
2024-03-05 18:35:41 +08:00