Commit Graph

1048 Commits

Author SHA1 Message Date
marko1616
645c27e5e2 fix Llama lora merge crash
Former-commit-id: 51349ea1cc
2024-03-24 02:55:23 +08:00
marko1616
c083708433 fix Llama lora merge crash
Former-commit-id: c1e2c4ea45
2024-03-24 02:44:35 +08:00
hiyouga
84c3d509fa fix #2936
Former-commit-id: 140ad4ad56
2024-03-24 00:43:21 +08:00
hiyouga
75829c8699 fix #2928
Former-commit-id: 7afbc85dae
2024-03-24 00:34:54 +08:00
hiyouga
58aa576ae5 fix #2941
Former-commit-id: a1c8c98c5f
2024-03-24 00:28:44 +08:00
hiyouga
c765b4c1ac Update wechat.jpg
Former-commit-id: 564d57aa23
2024-03-22 14:00:37 +08:00
hoshi-hiyouga
4e067329a3 Merge pull request #2919 from 0xez/main
Update README.md, fix the release date of the paper

Former-commit-id: ce261fdd64
2024-03-22 12:12:24 +08:00
0xez
028a8bc532 Update README_zh.md, fix the release date of the paper
Former-commit-id: be0360303d
2024-03-22 10:41:17 +08:00
0xez
3f50d572ed Update README.md, fix the release date of the paper
Former-commit-id: 675ba41562
2024-03-21 22:14:48 +08:00
hiyouga
cfcea16416 move file
Former-commit-id: 96702620c4
2024-03-21 17:05:17 +08:00
hiyouga
63c83f3802 add citation
Former-commit-id: 5eaa50fa01
2024-03-21 17:04:10 +08:00
hiyouga
0684e315be paper release
Former-commit-id: 0581bfdbc7
2024-03-21 13:49:17 +08:00
hiyouga
ada7e20eb4 update readme
Former-commit-id: bfe7a91289
2024-03-21 00:48:42 +08:00
hiyouga
7999836fb6 support fsdp + qlora
Former-commit-id: 8408225162
2024-03-21 00:36:06 +08:00
hiyouga
6646e18c02 add orca_dpo_pairs dataset
Former-commit-id: 3271af2afc
2024-03-20 20:09:06 +08:00
hoshi-hiyouga
e8cf2794cd Merge pull request #2905 from SirlyDreamer/main
Follow HF_ENDPOINT environment variable

Former-commit-id: b2dfbd728f
2024-03-20 18:09:54 +08:00
hiyouga
8717e98200 fix #2777 #2895
Former-commit-id: 9bec3c98a2
2024-03-20 17:59:45 +08:00
hiyouga
cf149bf43c fix #2346
Former-commit-id: 7b8f502901
2024-03-20 17:56:33 +08:00
SirlyDreamer
78359638e3 Follow HF_ENDPOINT environment variable
Former-commit-id: e165965341
2024-03-20 08:31:30 +00:00
hoshi-hiyouga
a9d85cf3c6 Merge pull request #2903 from khazic/main
Updated README with new information

Former-commit-id: a773035709
2024-03-20 16:13:44 +08:00
khazic
c7824c42ff Updated README with new information
Former-commit-id: 8d10fa71c2
2024-03-20 14:38:08 +08:00
khazic
13bf8b1f91 Updated README with new information
Former-commit-id: 0531dac30d
2024-03-20 14:21:16 +08:00
刘一博
5b8725399e Updated README with new information
Former-commit-id: df9b4fb90a
2024-03-20 14:11:28 +08:00
hiyouga
7fbdbc2419 Update wechat.jpg
Former-commit-id: bea31b9b12
2024-03-18 16:48:32 +08:00
hiyouga
3d483e0914 fix packages
Former-commit-id: 8e04794b2d
2024-03-17 22:32:03 +08:00
hiyouga
a5537f3ee8 fix patcher
Former-commit-id: 85c376fc1e
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
30765baa91 Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support

Former-commit-id: 113cc04719
2024-03-15 19:16:02 +08:00
hiyouga
06860e8f0f fix export
Former-commit-id: 6bc2c23b6d
2024-03-15 15:06:30 +08:00
S3Studio
46ef7416e6 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.

Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
S3Studio
dcbc8168a8 improve Docker build and runtime parameters
Modify the installation method for extra Python libraries.
Use the host machine's shared memory to increase training performance.

Former-commit-id: 6a5693d11d
2024-03-15 08:57:46 +08:00
hiyouga
7ef49586be tiny fix
Former-commit-id: 6ebde4f23e
2024-03-14 21:19:06 +08:00
hiyouga
2cf95d4efe fix export
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
edd28dbe2c fix bug
Former-commit-id: 8172530d54
2024-03-13 23:55:31 +08:00
hiyouga
9ff7c99eb1 fix bug
Former-commit-id: 714d936dfb
2024-03-13 23:43:42 +08:00
hiyouga
8b8671817f improve lora+ impl.
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
4000de93ea Merge pull request #2830 from qibaoyuan/lora_plus
[FEATURE]: ADD LORA+ ALGORITHM

Former-commit-id: 4e5e99af43
2024-03-13 20:15:46 +08:00
齐保元
24c9277488 [FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: a0965cd62c
2024-03-13 19:43:27 +08:00
hiyouga
634c44c51a Update wechat.jpg
Former-commit-id: dfd451b722
2024-03-13 19:03:00 +08:00
hiyouga
922bd8864b fix #2817
Former-commit-id: 0b4a5bf509
2024-03-13 12:42:03 +08:00
hiyouga
8673abbe5e fix #2802
Former-commit-id: b9f87cdc11
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f fix kv cache
Former-commit-id: 96ce76cd27
2024-03-13 01:21:50 +08:00
hiyouga
bbf272f96e support QDoRA
Former-commit-id: 19ef482649
2024-03-12 22:12:42 +08:00
hiyouga
096c31bfb6 patch for gemma cpt
Former-commit-id: 70a3052dd8
2024-03-12 21:21:54 +08:00
hiyouga
c28818c39f fix plot issues
Former-commit-id: 60cc17f3a8
2024-03-12 18:41:35 +08:00
hiyouga
14ed926a2d support olmo
Former-commit-id: b3247d6a16
2024-03-12 18:30:38 +08:00
hiyouga
0b7e870b07 fix #2802
Former-commit-id: 8d8956bad5
2024-03-12 17:08:34 +08:00
hiyouga
b983de9f4f fix #2803
Former-commit-id: 06c97083e1
2024-03-12 16:57:39 +08:00
hiyouga
7124b71676 fix #2782 #2798
Former-commit-id: 07f9b754a7
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
52f14211e3 Merge pull request #2743 from S3Studio/DockerizeSupport
Add dockerize support

Former-commit-id: c901aa63ff
2024-03-12 00:05:49 +08:00
hiyouga
c88062347e fix #2775
Former-commit-id: e874c00906
2024-03-11 00:42:54 +08:00