S3Studio
46ef7416e6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the Qwen model uses it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So when the --flash_attn flag is not set, the Qwen model's config needs an additional patch that changes the default value of use_flash_attn from "auto" to False.
Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
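The config patch described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `patch_qwen_flash_attn` and `is_ampere_or_newer` are hypothetical, the config is a plain stand-in object rather than a real model config, and the check on `model_type == "qwen"` is an assumption. Only the `use_flash_attn` attribute and its "auto" default come from the commit message.

```python
from types import SimpleNamespace


def is_ampere_or_newer(compute_capability_major: int) -> bool:
    """Ampere GPUs report CUDA compute capability 8.x; newer ones report higher."""
    return compute_capability_major >= 8


def patch_qwen_flash_attn(config, flash_attn_flag: bool):
    """Hypothetical helper: turn off the "auto" default unless the user opted in.

    `config` is any object with `model_type` and `use_flash_attn` attributes.
    `flash_attn_flag` mirrors whether --flash_attn was passed on the command line.
    """
    is_qwen = getattr(config, "model_type", None) == "qwen"  # assumed check
    if is_qwen and not flash_attn_flag:
        if getattr(config, "use_flash_attn", None) == "auto":
            # Without this, "auto" would enable flash-attn whenever it is installed,
            # which fails on GPUs older than Ampere.
            config.use_flash_attn = False
    return config


# Stand-in for a loaded Qwen config (not a real transformers object).
qwen_config = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(qwen_config, flash_attn_flag=False)
print(qwen_config.use_flash_attn)

# When the flag is set, the config is left untouched.
opted_in = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(opted_in, flash_attn_flag=True)
print(opted_in.use_flash_attn)
```

In a real setup, the compute capability could be read with `torch.cuda.get_device_capability()` and passed to a check like `is_ampere_or_newer` before opting in to flash attention.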
hiyouga
2cf95d4efe
fix export
...
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
8b8671817f
improve lora+ impl.
...
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00
hiyouga
8673abbe5e
fix #2802
...
Former-commit-id: b9f87cdc11
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f
fix kv cache
...
Former-commit-id: 96ce76cd27
2024-03-13 01:21:50 +08:00
hiyouga
0b7e870b07
fix #2802
...
Former-commit-id: 8d8956bad5
2024-03-12 17:08:34 +08:00
hiyouga
276def1897
fix #2732
...
Former-commit-id: 18ffce36b5
2024-03-09 22:37:16 +08:00
hiyouga
868444e124
allow non-packing pretraining
...
Former-commit-id: bdb496644c
2024-03-09 22:21:46 +08:00
hiyouga
c561b268ef
fix #2756, patch #2746
...
Former-commit-id: e8dd38b7fd
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 516d0ddc66
2024-03-09 01:37:00 +08:00
hiyouga
398c261c7c
fix aqlm version
...
Former-commit-id: 10be2f0ecc
2024-03-09 00:09:09 +08:00
stephen_zhu
c69b9fbe58
update
...
Former-commit-id: aa71571b77
2024-03-08 12:47:44 +08:00
stephen
495b858606
fix ppo runtime error
...
Former-commit-id: cdb7f82869
2024-03-08 11:48:26 +08:00
hiyouga
5b50458acf
fix galore
...
Former-commit-id: 33a4c24a8a
2024-03-08 00:44:51 +08:00
hiyouga
34533b2f35
support vllm
...
Former-commit-id: d07ad5cc1c
2024-03-07 20:26:31 +08:00
hiyouga
37e40563f1
fix #2735
...
Former-commit-id: f74f804a71
2024-03-07 16:15:53 +08:00
hiyouga
8b6c178249
export use balanced gpu
...
Former-commit-id: 3e84f430b1
2024-03-06 16:33:14 +08:00
hiyouga
e887aface7
fix version checking
...
Former-commit-id: 3016e65657
2024-03-06 14:51:51 +08:00
hiyouga
9561809ce9
improve aqlm optim
...
Former-commit-id: 259af60d28
2024-03-05 20:49:50 +08:00
hiyouga
c776cdfc3e
optimize aqlm training
...
Former-commit-id: d3d3dac707
2024-03-05 18:35:41 +08:00
hiyouga
0f2250b831
fix dora inference
...
Former-commit-id: ddf352f861
2024-03-05 11:51:41 +08:00
hiyouga
a62d17d009
fix export on cpu device
...
Former-commit-id: cda2ff8727
2024-03-04 17:35:09 +08:00
hiyouga
d1e6e02461
fix #2649
...
Former-commit-id: 4e5fae2fac
2024-03-01 13:02:41 +08:00
hiyouga
3787d13816
fix #2642
...
Former-commit-id: c0be617195
2024-02-29 18:32:54 +08:00
hiyouga
1853b5c172
tiny fix
...
Former-commit-id: 4a871e80e2
2024-02-29 17:28:50 +08:00
hiyouga
8e7d50dae4
release v0.5.3
...
Former-commit-id: fa5ab21ebc
2024-02-29 00:34:19 +08:00
hiyouga
5abbca70d3
support DoRA, AWQ, AQLM #2512
...
Former-commit-id: cfefacaa37
2024-02-28 19:53:28 +08:00
hiyouga
0fcb931f18
support lora for llama pro
...
Former-commit-id: 9aeb404a94
2024-02-21 02:17:22 +08:00
hiyouga
62b78001b7
fix #2481
...
Former-commit-id: 22acab8aff
2024-02-15 19:07:47 +08:00
hiyouga
96265ec154
support llama pro #2338, add rslora
...
Former-commit-id: 7924ffc55d
2024-02-15 02:27:36 +08:00
younesbelkada
6b98435a53
add v1 hf tags
...
Former-commit-id: 0ca0f08162
2024-02-13 05:58:49 +00:00
hiyouga
75adbfec79
add option to disable version check
...
Former-commit-id: 91d09a01ac
2024-02-10 22:31:23 +08:00
hiyouga
bbe5ff0570
update gc kwargs
...
Former-commit-id: 0ae9a16b9d
2024-02-07 00:38:24 +08:00
hiyouga
caeffc780d
fix #2438
...
Former-commit-id: ebf31b62eb
2024-02-06 15:23:08 +08:00
hiyouga
f6b2bcfa16
fix #2420
...
Former-commit-id: 19d33ede13
2024-02-04 15:51:47 +08:00
hiyouga
b1064d2f9b
bump up transformers version
...
Former-commit-id: 38e63bfd28
2024-02-04 00:01:16 +08:00
hiyouga
0fc8612b97
add hint for freeze #2412
...
Former-commit-id: 6545c02790
2024-02-03 23:38:56 +08:00
hiyouga
a9e58740f5
fix #2376
...
Former-commit-id: 4ecadc3512
2024-02-03 23:14:31 +08:00
hiyouga
7beeae2209
fix autoset attn impl, update data readme
...
Former-commit-id: 521ad76552
2024-01-31 11:58:07 +08:00
hiyouga
b8a827faeb
fix #2320
...
Former-commit-id: 2bc30763e9
2024-01-24 16:19:18 +08:00
ldwang
323ec3f89f
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
Former-commit-id: c284665425
2024-01-24 15:25:31 +08:00
ldwang
db500a2bb6
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
Former-commit-id: 18923b1402
2024-01-24 14:43:16 +08:00
hiyouga
60a042cc16
add hint
...
Former-commit-id: e4ba1deedf
2024-01-22 23:32:01 +08:00
hoshi-hiyouga
68977c8ca4
Update patcher.py
...
Former-commit-id: bdc9eff635
2024-01-22 23:27:39 +08:00
A-Cepheus
f00ad6b4f8
🐞 fix: typo
...
Former-commit-id: b06a31e76a
2024-01-22 16:04:39 +08:00
A-Cepheus
39d9aba166
🐞 fix: typo, move MoE fix to patcher
...
Former-commit-id: 319a72b48d
2024-01-22 16:01:58 +08:00
A-Cepheus
8985c43033
fix: ZeRO3 does not work with MoE models
...
Former-commit-id: e1d5c98519
2024-01-22 15:21:14 +08:00
hiyouga
fb2d563be5
fix #2268
...
Former-commit-id: e0a717aa3a
2024-01-21 14:11:38 +08:00
hiyouga
b27e91222c
format style
...
Former-commit-id: 638234ceee
2024-01-20 20:15:56 +08:00
hiyouga
69e8925249
support longlora for main branch
...
Former-commit-id: 38af076a75
2024-01-20 19:25:22 +08:00