khazic
13bf8b1f91
Updated README with new information
...
Former-commit-id: 0531dac30d
2024-03-20 14:21:16 +08:00
刘一博
5b8725399e
Updated README with new information
...
Former-commit-id: df9b4fb90a
2024-03-20 14:11:28 +08:00
hiyouga
3d483e0914
fix packages
...
Former-commit-id: 8e04794b2d
2024-03-17 22:32:03 +08:00
hiyouga
a5537f3ee8
fix patcher
...
Former-commit-id: 85c376fc1e
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
30765baa91
Merge pull request #2849 from S3Studio/DockerizeSupport
...
Improve Dockerize support
Former-commit-id: 113cc04719
2024-03-15 19:16:02 +08:00
hiyouga
06860e8f0f
fix export
...
Former-commit-id: 6bc2c23b6d
2024-03-15 15:06:30 +08:00
S3Studio
46ef7416e6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, the qwen model's config must additionally be patched to change the default value of use_flash_attn from "auto" to False.
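The patch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the helper name `disable_auto_flash_attn`, its `flash_attn_flag` parameter, and the `SimpleNamespace` stand-in for the model config are all assumptions; only the `use_flash_attn` attribute and its "auto" default come from the message above.

```python
from types import SimpleNamespace


def disable_auto_flash_attn(config, flash_attn_flag: bool) -> None:
    """Hypothetical helper: turn off the qwen config's automatic flash-attn choice.

    `config` stands in for the qwen model config; only the `use_flash_attn`
    attribute named in the commit message is touched.
    """
    if flash_attn_flag:
        # The user explicitly requested FlashAttention; leave the config alone.
        return
    if getattr(config, "use_flash_attn", None) == "auto":
        # "auto" would select flash-attn because the library is installed in
        # the image, which fails on GPUs older than Ampere.
        config.use_flash_attn = False


# Stand-in config object for illustration (not the real qwen config class).
cfg = SimpleNamespace(use_flash_attn="auto")
disable_auto_flash_attn(cfg, flash_attn_flag=False)
print(cfg.use_flash_attn)  # prints False
```

Only the "auto" value is rewritten, so an explicit True or False already present in the config is left untouched.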
Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
S3Studio
dcbc8168a8
improve Docker build and runtime parameters
...
Change how the extra Python library is installed.
Use the host machine's shared memory to improve training performance.
Former-commit-id: 6a5693d11d
2024-03-15 08:57:46 +08:00
hiyouga
7ef49586be
tiny fix
...
Former-commit-id: 6ebde4f23e
2024-03-14 21:19:06 +08:00
hiyouga
2cf95d4efe
fix export
...
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
edd28dbe2c
fix bug
...
Former-commit-id: 8172530d54
2024-03-13 23:55:31 +08:00
hiyouga
9ff7c99eb1
fix bug
...
Former-commit-id: 714d936dfb
2024-03-13 23:43:42 +08:00
hiyouga
8b8671817f
improve lora+ impl.
...
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
4000de93ea
Merge pull request #2830 from qibaoyuan/lora_plus
...
[FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: 4e5e99af43
2024-03-13 20:15:46 +08:00
齐保元
24c9277488
[FEATURE]: ADD LORA+ ALGORITHM
...
Former-commit-id: a0965cd62c
2024-03-13 19:43:27 +08:00
hiyouga
634c44c51a
Update wechat.jpg
...
Former-commit-id: dfd451b722
2024-03-13 19:03:00 +08:00
hiyouga
922bd8864b
fix #2817
...
Former-commit-id: 0b4a5bf509
2024-03-13 12:42:03 +08:00
hiyouga
8673abbe5e
fix #2802
...
Former-commit-id: b9f87cdc11
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f
fix kv cache
...
Former-commit-id: 96ce76cd27
2024-03-13 01:21:50 +08:00
hiyouga
bbf272f96e
support QDoRA
...
Former-commit-id: 19ef482649
2024-03-12 22:12:42 +08:00
hiyouga
096c31bfb6
patch for gemma cpt
...
Former-commit-id: 70a3052dd8
2024-03-12 21:21:54 +08:00
hiyouga
c28818c39f
fix plot issues
...
Former-commit-id: 60cc17f3a8
2024-03-12 18:41:35 +08:00
hiyouga
14ed926a2d
support olmo
...
Former-commit-id: b3247d6a16
2024-03-12 18:30:38 +08:00
hiyouga
0b7e870b07
fix #2802
...
Former-commit-id: 8d8956bad5
2024-03-12 17:08:34 +08:00
hiyouga
b983de9f4f
fix #2803
...
Former-commit-id: 06c97083e1
2024-03-12 16:57:39 +08:00
hiyouga
7124b71676
fix #2782 #2798
...
Former-commit-id: 07f9b754a7
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
52f14211e3
Merge pull request #2743 from S3Studio/DockerizeSupport
...
Add dockerize support
Former-commit-id: c901aa63ff
2024-03-12 00:05:49 +08:00
hiyouga
c88062347e
fix #2775
...
Former-commit-id: e874c00906
2024-03-11 00:42:54 +08:00
hiyouga
f776e738f8
tiny fix
...
Former-commit-id: 352693e2dc
2024-03-11 00:17:18 +08:00
hiyouga
566bfad930
update parser
...
Former-commit-id: be99799413
2024-03-10 13:35:20 +08:00
hiyouga
4a4e4b4354
support layerwise galore
...
Former-commit-id: 8664262cde
2024-03-10 00:24:11 +08:00
hiyouga
276def1897
fix #2732
...
Former-commit-id: 18ffce36b5
2024-03-09 22:37:16 +08:00
hiyouga
868444e124
allow non-packing pretraining
...
Former-commit-id: bdb496644c
2024-03-09 22:21:46 +08:00
hiyouga
1173441661
fix #2766
...
Former-commit-id: 412c52e325
2024-03-09 21:35:24 +08:00
hiyouga
8f6eb1383d
use default arg for freeze tuning
...
Former-commit-id: af0e370fb1
2024-03-09 06:08:48 +08:00
hiyouga
17e50bcbb1
add GaLore results
...
Former-commit-id: 818726e9bc
2024-03-09 04:11:55 +08:00
hiyouga
5c00783697
update hardware requirements
...
Former-commit-id: 393c2de27c
2024-03-09 03:58:18 +08:00
hiyouga
eb363b04b9
update examples
...
Former-commit-id: 4c00bcdcae
2024-03-09 02:30:37 +08:00
hiyouga
c561b268ef
fix #2756 , patch #2746
...
Former-commit-id: e8dd38b7fd
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 516d0ddc66
2024-03-09 01:37:00 +08:00
hiyouga
247aab9066
Update setup.py
...
Former-commit-id: 74ff8664d7
2024-03-09 00:14:48 +08:00
hiyouga
398c261c7c
fix aqlm version
...
Former-commit-id: 10be2f0ecc
2024-03-09 00:09:09 +08:00
hiyouga
ccec17f773
fix example params
...
Former-commit-id: 8a45213440
2024-03-08 20:41:43 +08:00
stephen_zhu
c69b9fbe58
update
...
Former-commit-id: aa71571b77
2024-03-08 12:47:44 +08:00
stephen
495b858606
fix ppo runtime error
...
Former-commit-id: cdb7f82869
2024-03-08 11:48:26 +08:00
S3Studio
de41334055
Add dockerize support
...
Tested with the Qwen:1.8B model and the alpaca_data_zh dataset. Some Python libraries were added to the Dockerfile in response to exception messages raised during testing.
Former-commit-id: 3d911ae713
2024-03-08 10:47:28 +08:00
hiyouga
b268215a0e
update readme
...
Former-commit-id: 4a2cc60b94
2024-03-08 03:06:21 +08:00
hiyouga
7443ac3116
fix chat engine, update webui
...
Former-commit-id: 5d956e2a51
2024-03-08 03:01:53 +08:00
hiyouga
0a0959facf
Update setup.py
...
Former-commit-id: 5cd4947650
2024-03-08 01:23:00 +08:00
hiyouga
2235020cc9
update galore args
...
Former-commit-id: 0ac6b40a47
2024-03-08 01:17:32 +08:00