Commit Graph

798 Commits

Author SHA1 Message Date
hiyouga
9bec3c98a2 fix #2777 #2895 2024-03-20 17:59:45 +08:00
hiyouga
7b8f502901 fix #2346 2024-03-20 17:56:33 +08:00
hiyouga
8e04794b2d fix packages 2024-03-17 22:32:03 +08:00
hiyouga
85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
hoshi-hiyouga
113cc04719 Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga
6bc2c23b6d fix export 2024-03-15 15:06:30 +08:00
S3Studio
e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training, as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
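The patch described in the commit message above can be sketched as follows. This is a minimal, hypothetical illustration (the function name and dict-based config are assumptions, not the project's actual code): when the `--flash_attn` flag is not set, flip the Qwen config's `use_flash_attn` default from `"auto"` to `False` so that hosts with pre-Ampere GPUs do not hit the "FlashAttention only supports Ampere GPUs or newer" error.

```python
# Hypothetical sketch of the config patch described above.
# `config` stands in for the qwen model's loaded config values.
def patch_use_flash_attn(config: dict, flash_attn_requested: bool) -> dict:
    # Only override the default when the user did not ask for flash-attn
    # and the model config left the choice to "auto".
    if not flash_attn_requested and config.get("use_flash_attn") == "auto":
        config["use_flash_attn"] = False  # fall back to standard attention
    return config

# Example: --flash_attn not set, so "auto" is forced to False.
cfg = patch_use_flash_attn({"use_flash_attn": "auto"}, flash_attn_requested=False)
```

If `--flash_attn` is set explicitly, the patch leaves the config untouched and the library's own behavior applies.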
hiyouga
6ebde4f23e tiny fix 2024-03-14 21:19:06 +08:00
hiyouga
3b4a59bfb1 fix export 2024-03-14 18:17:01 +08:00
hiyouga
8172530d54 fix bug 2024-03-13 23:55:31 +08:00
hiyouga
714d936dfb fix bug 2024-03-13 23:43:42 +08:00
hiyouga
72367307df improve lora+ impl. 2024-03-13 23:32:51 +08:00
齐保元
a0965cd62c [FEATURE]: ADD LORA+ ALGORITHM 2024-03-13 19:43:27 +08:00
hiyouga
0b4a5bf509 fix #2817 2024-03-13 12:42:03 +08:00
hiyouga
b9f87cdc11 fix #2802 2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27 fix kv cache 2024-03-13 01:21:50 +08:00
hiyouga
19ef482649 support QDoRA 2024-03-12 22:12:42 +08:00
hiyouga
70a3052dd8 patch for gemma cpt 2024-03-12 21:21:54 +08:00
hiyouga
60cc17f3a8 fix plot issues 2024-03-12 18:41:35 +08:00
hiyouga
b3247d6a16 support olmo 2024-03-12 18:30:38 +08:00
hiyouga
8d8956bad5 fix #2802 2024-03-12 17:08:34 +08:00
hiyouga
07f9b754a7 fix #2782 #2798 2024-03-12 15:53:29 +08:00
hiyouga
e874c00906 fix #2775 2024-03-11 00:42:54 +08:00
hiyouga
352693e2dc tiny fix 2024-03-11 00:17:18 +08:00
hiyouga
be99799413 update parser 2024-03-10 13:35:20 +08:00
hiyouga
8664262cde support layerwise galore 2024-03-10 00:24:11 +08:00
hiyouga
18ffce36b5 fix #2732 2024-03-09 22:37:16 +08:00
hiyouga
bdb496644c allow non-packing pretraining 2024-03-09 22:21:46 +08:00
hiyouga
412c52e325 fix #2766 2024-03-09 21:35:24 +08:00
hiyouga
af0e370fb1 use default arg for freeze tuning 2024-03-09 06:08:48 +08:00
hiyouga
393c2de27c update hardware requirements 2024-03-09 03:58:18 +08:00
hiyouga
e8dd38b7fd fix #2756 , patch #2746 2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66 Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga
10be2f0ecc fix aqlm version 2024-03-09 00:09:09 +08:00
stephen_zhu
aa71571b77 update 2024-03-08 12:47:44 +08:00
stephen
cdb7f82869 fix ppo runtime error 2024-03-08 11:48:26 +08:00
hiyouga
5d956e2a51 fix chat engine, update webui 2024-03-08 03:01:53 +08:00
hiyouga
0ac6b40a47 update galore args 2024-03-08 01:17:32 +08:00
hiyouga
33a4c24a8a fix galore 2024-03-08 00:44:51 +08:00
hiyouga
57452a4aa1 add Yi-9B model 2024-03-07 23:11:57 +08:00
hiyouga
28f7862188 support galore 2024-03-07 22:41:36 +08:00
hiyouga
d07ad5cc1c support vllm 2024-03-07 20:26:31 +08:00
hiyouga
f74f804a71 fix #2735 2024-03-07 16:15:53 +08:00
hoshi-hiyouga
2185855bdb Merge pull request #2730 from cx2333-gt/main
fix flash_attn in train_web
2024-03-07 14:37:18 +08:00
cx2333
94b7a1b915 revert choice name 2024-03-07 14:28:55 +08:00
hiyouga
921ee82267 fix chatglm3 template 2024-03-07 14:26:16 +08:00
cx2333
a8889498fa fix flash_attn in train_web 2024-03-07 10:13:55 +08:00
hiyouga
0048a2021e tiny fix 2024-03-06 17:25:08 +08:00
hiyouga
3e84f430b1 export use balanced gpu 2024-03-06 16:33:14 +08:00
hiyouga
9658c63cd9 fix add tokens 2024-03-06 15:04:02 +08:00