Commit Graph

1602 Commits

Author SHA1 Message Date
hiyouga
84c3d509fa fix #2936
Former-commit-id: 140ad4ad56
2024-03-24 00:43:21 +08:00
hiyouga
75829c8699 fix #2928
Former-commit-id: 7afbc85dae
2024-03-24 00:34:54 +08:00
hiyouga
58aa576ae5 fix #2941
Former-commit-id: a1c8c98c5f
2024-03-24 00:28:44 +08:00
hiyouga
7999836fb6 support fsdp + qlora
Former-commit-id: 8408225162
2024-03-21 00:36:06 +08:00
hiyouga
8717e98200 fix #2777 #2895
Former-commit-id: 9bec3c98a2
2024-03-20 17:59:45 +08:00
hiyouga
cf149bf43c fix #2346
Former-commit-id: 7b8f502901
2024-03-20 17:56:33 +08:00
hiyouga
3d483e0914 fix packages
Former-commit-id: 8e04794b2d
2024-03-17 22:32:03 +08:00
hiyouga
a5537f3ee8 fix patcher
Former-commit-id: 85c376fc1e
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
30765baa91 Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support

Former-commit-id: 113cc04719
2024-03-15 19:16:02 +08:00
hiyouga
06860e8f0f fix export
Former-commit-id: 6bc2c23b6d
2024-03-15 15:06:30 +08:00
S3Studio
46ef7416e6 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.


Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
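The config patch described in commit 46ef7416e6 can be sketched as follows. This is a minimal illustration, not the code from that commit: the helper name `disable_auto_flash_attn`, its `flash_attn_flag` parameter, and the `SimpleNamespace` stand-in for a model config are hypothetical, and the `model_type == "qwen"` check is an assumption about how the qwen config identifies itself. Only the `use_flash_attn` attribute and its "auto" default come from the commit message.

```python
from types import SimpleNamespace


def disable_auto_flash_attn(config, flash_attn_flag: bool):
    """Hypothetical helper: turn off qwen's automatic flash-attn selection.

    If the user did not pass --flash_attn and the config still holds the
    default "auto", set use_flash_attn to False so that a GPU older than
    Ampere does not hit the FlashAttention runtime error.
    """
    is_qwen = getattr(config, "model_type", None) == "qwen"  # assumed identifier
    if is_qwen and not flash_attn_flag:
        if getattr(config, "use_flash_attn", None) == "auto":
            config.use_flash_attn = False
    return config


# Stand-in object for illustration only; a real run would pass the loaded model config.
qwen_like = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
disable_auto_flash_attn(qwen_like, flash_attn_flag=False)
print(qwen_like.use_flash_attn)  # False
```

Leaving the value untouched when the flag is set keeps the explicit user choice in control; only the implicit "auto" default is overridden.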
hiyouga
7ef49586be tiny fix
Former-commit-id: 6ebde4f23e
2024-03-14 21:19:06 +08:00
hiyouga
2cf95d4efe fix export
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
edd28dbe2c fix bug
Former-commit-id: 8172530d54
2024-03-13 23:55:31 +08:00
hiyouga
9ff7c99eb1 fix bug
Former-commit-id: 714d936dfb
2024-03-13 23:43:42 +08:00
hiyouga
8b8671817f improve lora+ impl.
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00
齐保元
24c9277488 [FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: a0965cd62c
2024-03-13 19:43:27 +08:00
hiyouga
922bd8864b fix #2817
Former-commit-id: 0b4a5bf509
2024-03-13 12:42:03 +08:00
hiyouga
8673abbe5e fix #2802
Former-commit-id: b9f87cdc11
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f fix kv cache
Former-commit-id: 96ce76cd27
2024-03-13 01:21:50 +08:00
hiyouga
bbf272f96e support QDoRA
Former-commit-id: 19ef482649
2024-03-12 22:12:42 +08:00
hiyouga
096c31bfb6 patch for gemma cpt
Former-commit-id: 70a3052dd8
2024-03-12 21:21:54 +08:00
hiyouga
c28818c39f fix plot issues
Former-commit-id: 60cc17f3a8
2024-03-12 18:41:35 +08:00
hiyouga
14ed926a2d support olmo
Former-commit-id: b3247d6a16
2024-03-12 18:30:38 +08:00
hiyouga
0b7e870b07 fix #2802
Former-commit-id: 8d8956bad5
2024-03-12 17:08:34 +08:00
hiyouga
7124b71676 fix #2782 #2798
Former-commit-id: 07f9b754a7
2024-03-12 15:53:29 +08:00
hiyouga
c88062347e fix #2775
Former-commit-id: e874c00906
2024-03-11 00:42:54 +08:00
hiyouga
f776e738f8 tiny fix
Former-commit-id: 352693e2dc
2024-03-11 00:17:18 +08:00
hiyouga
566bfad930 update parser
Former-commit-id: be99799413
2024-03-10 13:35:20 +08:00
hiyouga
4a4e4b4354 support layerwise galore
Former-commit-id: 8664262cde
2024-03-10 00:24:11 +08:00
hiyouga
276def1897 fix #2732
Former-commit-id: 18ffce36b5
2024-03-09 22:37:16 +08:00
hiyouga
868444e124 allow non-packing pretraining
Former-commit-id: bdb496644c
2024-03-09 22:21:46 +08:00
hiyouga
1173441661 fix #2766
Former-commit-id: 412c52e325
2024-03-09 21:35:24 +08:00
hiyouga
8f6eb1383d use default arg for freeze tuning
Former-commit-id: af0e370fb1
2024-03-09 06:08:48 +08:00
hiyouga
5c00783697 update hardware requirements
Former-commit-id: 393c2de27c
2024-03-09 03:58:18 +08:00
hiyouga
c561b268ef fix #2756 , patch #2746
Former-commit-id: e8dd38b7fd
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0 Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError

Former-commit-id: 516d0ddc66
2024-03-09 01:37:00 +08:00
hiyouga
398c261c7c fix aqlm version
Former-commit-id: 10be2f0ecc
2024-03-09 00:09:09 +08:00
stephen_zhu
c69b9fbe58 update
Former-commit-id: aa71571b77
2024-03-08 12:47:44 +08:00
stephen
495b858606 fix ppo runtime error
Former-commit-id: cdb7f82869
2024-03-08 11:48:26 +08:00
hiyouga
7443ac3116 fix chat engine, update webui
Former-commit-id: 5d956e2a51
2024-03-08 03:01:53 +08:00
hiyouga
2235020cc9 update galore args
Former-commit-id: 0ac6b40a47
2024-03-08 01:17:32 +08:00
hiyouga
5b50458acf fix galore
Former-commit-id: 33a4c24a8a
2024-03-08 00:44:51 +08:00
hiyouga
f373290012 add Yi-9B model
Former-commit-id: 57452a4aa1
2024-03-07 23:11:57 +08:00
hiyouga
2c010c72b8 support galore
Former-commit-id: 28f7862188
2024-03-07 22:41:36 +08:00
hiyouga
34533b2f35 support vllm
Former-commit-id: d07ad5cc1c
2024-03-07 20:26:31 +08:00
hiyouga
37e40563f1 fix #2735
Former-commit-id: f74f804a71
2024-03-07 16:15:53 +08:00
hoshi-hiyouga
90e66c8d94 Merge pull request #2730 from cx2333-gt/main
fix flash_attn in train_web

Former-commit-id: 2185855bdb
2024-03-07 14:37:18 +08:00
cx2333
013c12a135 revert choice name
Former-commit-id: 94b7a1b915
2024-03-07 14:28:55 +08:00
hiyouga
843d3f7a97 fix chatglm3 template
Former-commit-id: 921ee82267
2024-03-07 14:26:16 +08:00