hiyouga
92dab8a90b
simplify readme
2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f
add moe aux loss control #3085
2024-04-02 14:26:31 +08:00
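For context, the auxiliary loss here is the MoE router's load-balancing loss. A hedged illustration (the exact option added in #3085 may be named differently) of how such a coefficient typically maps onto the Hugging Face config fields:

# Illustrative only: wiring an MoE aux-loss coefficient into a HF MoE config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
config.router_aux_loss_coef = 0.01   # strength of the load-balancing loss
config.output_router_logits = True   # required so the aux loss is computed during training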
hiyouga
9ddbe2866a
fix #3022
2024-04-02 13:58:39 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support inference of 4-bit models on GPUs #3023
2024-04-01 17:34:04 +08:00
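4-bit GPU inference of this kind is usually a bitsandbytes quantization config combined with an automatic device map. A minimal sketch, with a placeholder model name:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",      # placeholder model
    quantization_config=bnb_config,
    device_map="auto",               # shard layers across available GPUs
)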
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank lines containing whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model save for full-parameter training
2024-03-30 23:45:04 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
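The key to combining FSDP with QLoRA is storing the 4-bit quantized weights in the same dtype FSDP uses for its flat parameters. A minimal sketch, assuming bitsandbytes NF4 quantization (model name and values are illustrative):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # must match the FSDP param dtype
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder model
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)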
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
S3Studio
e75407febd
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
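A sketch of the workaround described above, assuming a Qwen config object that exposes use_flash_attn (the helper name is hypothetical):

def disable_qwen_flash_attn(config, flash_attn_requested: bool) -> None:
    # When --flash_attn is not set, override the config default of "auto"
    # so flash-attn is never picked up on incompatible (pre-Ampere) GPUs.
    if not flash_attn_requested and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False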
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00
hiyouga
bdb496644c
allow non-packing pretraining
2024-03-09 22:21:46 +08:00
hiyouga
e8dd38b7fd
fix #2756, patch #2746
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga
10be2f0ecc
fix aqlm version
2024-03-09 00:09:09 +08:00
stephen_zhu
aa71571b77
update
2024-03-08 12:47:44 +08:00
stephen
cdb7f82869
fix ppo runtime error
2024-03-08 11:48:26 +08:00
hiyouga
f74f804a71
fix #2735
2024-03-07 16:15:53 +08:00
hiyouga
3e84f430b1
export use balanced gpu
2024-03-06 16:33:14 +08:00
hiyouga
d3d3dac707
optimize aqlm training
2024-03-05 18:35:41 +08:00
hiyouga
ddf352f861
fix dora inference
2024-03-05 11:51:41 +08:00
hiyouga
cda2ff8727
fix export on cpu device
2024-03-04 17:35:09 +08:00
hiyouga
4e5fae2fac
fix #2649
2024-03-01 13:02:41 +08:00
hiyouga
c0be617195
fix #2642
2024-02-29 18:32:54 +08:00
hiyouga
4a871e80e2
tiny fix
2024-02-29 17:28:50 +08:00
hiyouga
fa5ab21ebc
release v0.5.3
2024-02-29 00:34:19 +08:00
hiyouga
cfefacaa37
support DoRA, AWQ, AQLM #2512
2024-02-28 19:53:28 +08:00
hiyouga
7924ffc55d
support llama pro #2338, add rslora
2024-02-15 02:27:36 +08:00
hiyouga
0ae9a16b9d
update gc kwargs
2024-02-07 00:38:24 +08:00
hiyouga
ebf31b62eb
fix #2438
2024-02-06 15:23:08 +08:00
hiyouga
4ecadc3512
fix #2376
2024-02-03 23:14:31 +08:00
hiyouga
521ad76552
fix autoset attn impl, update data readme
2024-01-31 11:58:07 +08:00
hiyouga
2bc30763e9
fix #2320
2024-01-24 16:19:18 +08:00
ldwang
c284665425
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
2024-01-24 15:25:31 +08:00
ldwang
18923b1402
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
2024-01-24 14:43:16 +08:00
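The idea behind patch_mixtral_replace_moe_impl is to swap the block-sparse MoE forward for a dense one, so every expert's parameters participate in every step and ZeRO-3's parameter all-gathers do not stall on unused experts. A hedged sketch of such a dense forward (attribute names follow MixtralSparseMoeBlock; the actual patch may differ):

import torch

def dense_moe_forward(block, hidden_states):
    batch, seq_len, dim = hidden_states.shape
    flat = hidden_states.view(-1, dim)
    router_logits = block.gate(flat)                    # (tokens, n_experts)
    weights = torch.softmax(router_logits, dim=-1, dtype=torch.float)
    topk_w, topk_idx = torch.topk(weights, block.top_k, dim=-1)
    topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize top-k weights
    mask = torch.zeros_like(weights).scatter_(-1, topk_idx, topk_w)
    out = torch.zeros_like(flat)
    for i, expert in enumerate(block.experts):          # run every expert on all tokens
        out = out + mask[:, i : i + 1].to(flat.dtype) * expert(flat)
    return out.view(batch, seq_len, dim)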
hiyouga
e4ba1deedf
add hint
2024-01-22 23:32:01 +08:00
hoshi-hiyouga
bdc9eff635
Update patcher.py
2024-01-22 23:27:39 +08:00
A-Cepheus
b06a31e76a
🐞 fix: typo
2024-01-22 16:04:39 +08:00
A-Cepheus
319a72b48d
🐞 fix: typo, move MoE fix to patcher
2024-01-22 16:01:58 +08:00
hiyouga
e0a717aa3a
fix #2268
2024-01-21 14:11:38 +08:00
hiyouga
638234ceee
format style
2024-01-20 20:15:56 +08:00
hiyouga
38af076a75
support longlora for main branch
2024-01-20 19:25:22 +08:00