1024 Commits

Author SHA1 Message Date
hiyouga
3d483e0914 fix packages
Former-commit-id: 8e04794b2da067a4123b9d7091a54c5647f44244
2024-03-17 22:32:03 +08:00
hiyouga
a5537f3ee8 fix patcher
Former-commit-id: 85c376fc1e0bcc854ed6e70e6455a0b00b341655
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
30765baa91 Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support

Former-commit-id: 113cc047198325b51dac50d8a7ea70396c51e0d9
2024-03-15 19:16:02 +08:00
hiyouga
06860e8f0f fix export
Former-commit-id: 6bc2c23b6d26b52f54ac37fa6149e6eb3cc18ee6
2024-03-15 15:06:30 +08:00
S3Studio
46ef7416e6 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
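The patch described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `patch_qwen_flash_attn` and its `flash_attn_requested` parameter are hypothetical, and the sketch assumes the model config exposes `model_type` and `use_flash_attn` as plain attributes.

```python
from types import SimpleNamespace


def patch_qwen_flash_attn(config, flash_attn_requested):
    """Hypothetical helper: turn off the qwen config's automatic
    flash-attn selection unless the user explicitly asked for it."""
    # Assumption: only qwen configs carry the `use_flash_attn` switch.
    if getattr(config, "model_type", None) != "qwen":
        return config
    # The user passed --flash_attn, so leave the config untouched.
    if flash_attn_requested:
        return config
    # Replace the "auto" default with False so that GPUs older than
    # Ampere do not hit the FlashAttention exception during training.
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config


# Stand-in config object for illustration only.
demo_config = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(demo_config, flash_attn_requested=False)
print(demo_config.use_flash_attn)
```

Only the "auto" default is rewritten, so a value the user set explicitly in the config is preserved.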


Former-commit-id: e75407febdec086f2bdca723a7f69a92b3b1d63f
2024-03-15 08:59:13 +08:00
S3Studio
dcbc8168a8 improve Docker build and runtime parameters
Modify installation method of extra python library.
Utilize shared memory of the host machine to increase training performance.


Former-commit-id: 6a5693d11d065f6e75c8cdd8b5ed962eb520953c
2024-03-15 08:57:46 +08:00
hiyouga
7ef49586be tiny fix
Former-commit-id: 6ebde4f23e761b8a3e3ea6ca6dff249e657608a1
2024-03-14 21:19:06 +08:00
hiyouga
2cf95d4efe fix export
Former-commit-id: 3b4a59bfb1866a270b9934a4a2303197ffdab531
2024-03-14 18:17:01 +08:00
hiyouga
edd28dbe2c fix bug
Former-commit-id: 8172530d54fbd42a9dd3219f06378563d62424e0
2024-03-13 23:55:31 +08:00
hiyouga
9ff7c99eb1 fix bug
Former-commit-id: 714d936dfbe022c4f2cfa6ff643e3482a3f96012
2024-03-13 23:43:42 +08:00
hiyouga
8b8671817f improve lora+ impl.
Former-commit-id: 72367307dfadf936fb989ebe8bc9f0ff229fb933
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
4000de93ea Merge pull request #2830 from qibaoyuan/lora_plus
[FEATURE]: ADD LORA+ ALGORITHM

Former-commit-id: 4e5e99af4320db661a4eebaabc3284f73815ae4e
2024-03-13 20:15:46 +08:00
齐保元
24c9277488 [FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: a0965cd62c85545aa2364e244295df2963308354
2024-03-13 19:43:27 +08:00
hiyouga
634c44c51a Update wechat.jpg
Former-commit-id: dfd451b722f988a18040b5d7d95945642aad1238
2024-03-13 19:03:00 +08:00
hiyouga
922bd8864b fix #2817
Former-commit-id: 0b4a5bf509a6fbf18337a29a6a498f33d0cbca76
2024-03-13 12:42:03 +08:00
hiyouga
8673abbe5e fix #2802
Former-commit-id: b9f87cdc11b3fe712574b91455dc722b69c60c66
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f fix kv cache
Former-commit-id: 96ce76cd2753bc91c781ad13aa8f7a972abe815a
2024-03-13 01:21:50 +08:00
hiyouga
bbf272f96e support QDoRA
Former-commit-id: 19ef4826490b79e0c2aee20ad67430aa0e4724a7
2024-03-12 22:12:42 +08:00
hiyouga
096c31bfb6 patch for gemma cpt
Former-commit-id: 70a3052dd8a2d1322fa01ab19e369e465842d416
2024-03-12 21:21:54 +08:00
hiyouga
c28818c39f fix plot issues
Former-commit-id: 60cc17f3a8b56c0b2ad76be7c10ca0b4e1738121
2024-03-12 18:41:35 +08:00
hiyouga
14ed926a2d support olmo
Former-commit-id: b3247d6a1604f4cbeb0d7c163d0082ce91afb870
2024-03-12 18:30:38 +08:00
hiyouga
0b7e870b07 fix #2802
Former-commit-id: 8d8956bad542c0e1c0f7edbf4ffc22bb0f8788ae
2024-03-12 17:08:34 +08:00
hiyouga
b983de9f4f fix #2803
Former-commit-id: 06c97083e150d461631a4a9bebb03b33da760098
2024-03-12 16:57:39 +08:00
hiyouga
7124b71676 fix #2782 #2798
Former-commit-id: 07f9b754a7418b489e839bd674aa47094583a92d
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
52f14211e3 Merge pull request #2743 from S3Studio/DockerizeSupport
Add dockerize support

Former-commit-id: c901aa63ff4fb6daea7f7da467782e8bf6224d4d
2024-03-12 00:05:49 +08:00
hiyouga
c88062347e fix #2775
Former-commit-id: e874c00906c765b81c0e5ff9c7b3679557da8e0e
2024-03-11 00:42:54 +08:00
hiyouga
f776e738f8 tiny fix
Former-commit-id: 352693e2dcc8fc039b5d574e1a5709563929b0ce
2024-03-11 00:17:18 +08:00
hiyouga
566bfad930 update parser
Former-commit-id: be99799413e1ba37807a02838bf2d87fd966bf55
2024-03-10 13:35:20 +08:00
hiyouga
4a4e4b4354 support layerwise galore
Former-commit-id: 8664262cde3919e10eaecbd66e8c5d356856362e
2024-03-10 00:24:11 +08:00
hiyouga
276def1897 fix #2732
Former-commit-id: 18ffce36b5ee0809f2e2905c2fd44843a3725ea0
2024-03-09 22:37:16 +08:00
hiyouga
868444e124 allow non-packing pretraining
Former-commit-id: bdb496644ce2c18806fc4fdae1fedcb3e5b5f808
2024-03-09 22:21:46 +08:00
hiyouga
1173441661 fix #2766
Former-commit-id: 412c52e325660e8b871ffd59f5564f84f46a143f
2024-03-09 21:35:24 +08:00
hiyouga
8f6eb1383d use default arg for freeze tuning
Former-commit-id: af0e370fb16f3e0cf2f4c8036301d5253d8249b9
2024-03-09 06:08:48 +08:00
hiyouga
17e50bcbb1 add GaLore results
Former-commit-id: 818726e9bcdedfbd330ea7a60e02ee5b03aed459
2024-03-09 04:11:55 +08:00
hiyouga
5c00783697 update hardware requirements
Former-commit-id: 393c2de27ce0a2dee793092843ec0afa54f49a6d
2024-03-09 03:58:18 +08:00
hiyouga
eb363b04b9 update examples
Former-commit-id: 4c00bcdcaeb675c9fdb3e977c27c3604d7895ae2
2024-03-09 02:30:37 +08:00
hiyouga
c561b268ef fix #2756 , patch #2746
Former-commit-id: e8dd38b7fdf8e172745d2538eb103895f2839c38
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0 Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError

Former-commit-id: 516d0ddc666c179616a2a610b1353728db57391e
2024-03-09 01:37:00 +08:00
hiyouga
247aab9066 Update setup.py
Former-commit-id: 74ff8664d783f428227fb62e6a1313a73cbd337d
2024-03-09 00:14:48 +08:00
hiyouga
398c261c7c fix aqlm version
Former-commit-id: 10be2f0eccc3963a985afcd24e5b8b8fc638b1c3
2024-03-09 00:09:09 +08:00
hiyouga
ccec17f773 fix example params
Former-commit-id: 8a45213440ffc960947dd69ecf3b092aa724bef3
2024-03-08 20:41:43 +08:00
stephen_zhu
c69b9fbe58 update
Former-commit-id: aa71571b773c5dc527b17219ec87828e4455b330
2024-03-08 12:47:44 +08:00
stephen
495b858606 fix ppo runtime error
Former-commit-id: cdb7f82869b07d9d5d31b7b2aaf6b033bd00e32e
2024-03-08 11:48:26 +08:00
S3Studio
de41334055 Add dockerize support
Already tested with the Qwen:1.8B model and the alpaca_data_zh dataset. Some Python libraries were added to the Dockerfile in response to the exception messages displayed during the test procedure.


Former-commit-id: 3d911ae713b901d6680a9f9ac82569cc5878f820
2024-03-08 10:47:28 +08:00
hiyouga
b268215a0e update readme
Former-commit-id: 4a2cc60b9440d245141e9317c35a0ac4c687dbdb
2024-03-08 03:06:21 +08:00
hiyouga
7443ac3116 fix chat engine, update webui
Former-commit-id: 5d956e2a5167201aecdfce2794c25d8a2d84e234
2024-03-08 03:01:53 +08:00
hiyouga
0a0959facf Update setup.py
Former-commit-id: 5cd4947650403490419e5bddf2b1ac7e137edf8b
2024-03-08 01:23:00 +08:00
hiyouga
2235020cc9 update galore args
Former-commit-id: 0ac6b40a4772b61a3476bb74b976d24c408a2c35
2024-03-08 01:17:32 +08:00
hiyouga
5b50458acf fix galore
Former-commit-id: 33a4c24a8a3c153bc62edf74b9246699a0ae3233
2024-03-08 00:44:51 +08:00
hiyouga
f373290012 add Yi-9B model
Former-commit-id: 57452a4aa1d37a047d659f002c1aaa6246f64178
2024-03-07 23:11:57 +08:00