hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
S3Studio
e75407febd
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
Therefore, if the --flash_attn flag is not set, an additional patch to the Qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False.
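The patch described above can be sketched as follows. This is a minimal illustration, not the repository's actual patcher code; the config class and the `patch_qwen_config` helper are hypothetical stand-ins, while the `use_flash_attn` attribute and its `"auto"` default follow the Qwen convention mentioned in the message.

```python
class QwenConfigStub:
    """Hypothetical stand-in for the Qwen model config."""
    use_flash_attn = "auto"  # Qwen's default lets the model decide

def patch_qwen_config(config, flash_attn_enabled: bool) -> None:
    # Only override the "auto" default; respect an explicit user setting.
    # Without this, "auto" may pick FlashAttention on hosts whose GPUs
    # (pre-Ampere) cannot run it, crashing mid-training.
    if not flash_attn_enabled and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False

config = QwenConfigStub()
patch_qwen_config(config, flash_attn_enabled=False)
```

With `flash_attn_enabled=False`, the "auto" default is forced to `False`, so the model falls back to standard attention instead of raising the "Ampere GPUs or newer" error.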
2024-03-15 08:59:13 +08:00
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hiyouga
b9f87cdc11
fix #2802
2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00
hiyouga
18ffce36b5
fix #2732
2024-03-09 22:37:16 +08:00
hiyouga
bdb496644c
allow non-packing pretraining
2024-03-09 22:21:46 +08:00
hiyouga
e8dd38b7fd
fix #2756, patch #2746
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga
10be2f0ecc
fix aqlm version
2024-03-09 00:09:09 +08:00
stephen_zhu
aa71571b77
update
2024-03-08 12:47:44 +08:00
stephen
cdb7f82869
fix ppo runtime error
2024-03-08 11:48:26 +08:00
hiyouga
33a4c24a8a
fix galore
2024-03-08 00:44:51 +08:00
hiyouga
d07ad5cc1c
support vllm
2024-03-07 20:26:31 +08:00
hiyouga
f74f804a71
fix #2735
2024-03-07 16:15:53 +08:00
hiyouga
3e84f430b1
export use balanced gpu
2024-03-06 16:33:14 +08:00
hiyouga
3016e65657
fix version checking
2024-03-06 14:51:51 +08:00
hiyouga
259af60d28
improve aqlm optim
2024-03-05 20:49:50 +08:00
hiyouga
d3d3dac707
optimize aqlm training
2024-03-05 18:35:41 +08:00
hiyouga
ddf352f861
fix dora inference
2024-03-05 11:51:41 +08:00
hiyouga
cda2ff8727
fix export on cpu device
2024-03-04 17:35:09 +08:00
hiyouga
4e5fae2fac
fix #2649
2024-03-01 13:02:41 +08:00
hiyouga
c0be617195
fix #2642
2024-02-29 18:32:54 +08:00
hiyouga
4a871e80e2
tiny fix
2024-02-29 17:28:50 +08:00
hiyouga
fa5ab21ebc
release v0.5.3
2024-02-29 00:34:19 +08:00
hiyouga
cfefacaa37
support DoRA, AWQ, AQLM #2512
2024-02-28 19:53:28 +08:00
hiyouga
9aeb404a94
support lora for llama pro
2024-02-21 02:17:22 +08:00
hiyouga
22acab8aff
fix #2481
2024-02-15 19:07:47 +08:00
hiyouga
7924ffc55d
support llama pro #2338, add rslora
2024-02-15 02:27:36 +08:00
younesbelkada
0ca0f08162
add v1 hf tags
2024-02-13 05:58:49 +00:00
hiyouga
91d09a01ac
add option to disable version check
2024-02-10 22:31:23 +08:00
hiyouga
0ae9a16b9d
update gc kwargs
2024-02-07 00:38:24 +08:00
hiyouga
ebf31b62eb
fix #2438
2024-02-06 15:23:08 +08:00
hiyouga
19d33ede13
fix #2420
2024-02-04 15:51:47 +08:00
hiyouga
38e63bfd28
bump up transformers version
2024-02-04 00:01:16 +08:00
hiyouga
6545c02790
add hint for freeze #2412
2024-02-03 23:38:56 +08:00
hiyouga
4ecadc3512
fix #2376
2024-02-03 23:14:31 +08:00
hiyouga
521ad76552
fix autoset attn impl, update data readme
2024-01-31 11:58:07 +08:00
hiyouga
2bc30763e9
fix #2320
2024-01-24 16:19:18 +08:00
ldwang
c284665425
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
2024-01-24 15:25:31 +08:00
ldwang
18923b1402
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
2024-01-24 14:43:16 +08:00
hiyouga
e4ba1deedf
add hint
2024-01-22 23:32:01 +08:00
hoshi-hiyouga
bdc9eff635
Update patcher.py
2024-01-22 23:27:39 +08:00
A-Cepheus
b06a31e76a
🐞 fix: typo
2024-01-22 16:04:39 +08:00
A-Cepheus
319a72b48d
🐞 fix: typo, move MoE fix to patcher
2024-01-22 16:01:58 +08:00
A-Cepheus
e1d5c98519
fix: ZeRO3 does not work with MoE models
2024-01-22 15:21:14 +08:00
hiyouga
e0a717aa3a
fix #2268
2024-01-21 14:11:38 +08:00
hiyouga
638234ceee
format style
2024-01-20 20:15:56 +08:00