hiyouga
c765b4c1ac
Update wechat.jpg
Former-commit-id: 564d57aa233e3c9f9c1a64bccf95553a7e47acd3
2024-03-22 14:00:37 +08:00
hoshi-hiyouga
4e067329a3
Merge pull request #2919 from 0xez/main
Update README.md, fix the release date of the paper
Former-commit-id: ce261fdd64ae29bb00310eba010a5a5a7384f7d6
2024-03-22 12:12:24 +08:00
0xez
028a8bc532
Update README_zh.md, fix the release date of the paper
Former-commit-id: be0360303d2e7275e14586dc503a9581f80ce303
2024-03-22 10:41:17 +08:00
0xez
3f50d572ed
Update README.md, fix the release date of the paper
Former-commit-id: 675ba41562d812f169c6b2775e57a3f38fc8deee
2024-03-21 22:14:48 +08:00
hiyouga
cfcea16416
move file
Former-commit-id: 96702620c4620cb98a362ac8ea6d3dd82e5e07e3
2024-03-21 17:05:17 +08:00
hiyouga
63c83f3802
add citation
Former-commit-id: 5eaa50fa01a7172408840255d18bcc0ab43a01fb
2024-03-21 17:04:10 +08:00
hiyouga
0684e315be
paper release
Former-commit-id: 0581bfdbc7d6c764e63f8d54271da7663ca354d9
2024-03-21 13:49:17 +08:00
hiyouga
ada7e20eb4
update readme
Former-commit-id: bfe7a9128952bacef93d5478938d3e088bd0480d
2024-03-21 00:48:42 +08:00
hiyouga
7999836fb6
support fsdp + qlora
Former-commit-id: 84082251621e1470b3b5406a56d0a967780a1804
2024-03-21 00:36:06 +08:00
hiyouga
6646e18c02
add orca_dpo_pairs dataset
Former-commit-id: 3271af2afc90f10dcb101aeb9d7e4ef254d2dc0e
2024-03-20 20:09:06 +08:00
hoshi-hiyouga
e8cf2794cd
Merge pull request #2905 from SirlyDreamer/main
Follow HF_ENDPOINT environment variable
Former-commit-id: b2dfbd728fec976235c68ff977e874ea4ac81bbb
2024-03-20 18:09:54 +08:00
hiyouga
8717e98200
fix #2777 #2895
Former-commit-id: 9bec3c98a22c91b1c28fda757db51eb780291641
2024-03-20 17:59:45 +08:00
hiyouga
cf149bf43c
fix #2346
Former-commit-id: 7b8f5029018f0481f7da83cc5ee4408d95c9beb2
2024-03-20 17:56:33 +08:00
SirlyDreamer
78359638e3
Follow HF_ENDPOINT environment variable
Former-commit-id: e165965341a150f6faa2c072a9281ad99d7e5ce8
2024-03-20 08:31:30 +00:00
hoshi-hiyouga
a9d85cf3c6
Merge pull request #2903 from khazic/main
Updated README with new information
Former-commit-id: a77303570994c3a3a2a0c2faae7fc089cac05629
2024-03-20 16:13:44 +08:00
khazic
c7824c42ff
Updated README with new information
Former-commit-id: 8d10fa71c2b4fa2f79ebb08d5e916c3e3f9d7fbe
2024-03-20 14:38:08 +08:00
khazic
13bf8b1f91
Updated README with new information
Former-commit-id: 0531dac30d5cbee56b73e06230cd0a62928ee9ca
2024-03-20 14:21:16 +08:00
刘一博
5b8725399e
Updated README with new information
Former-commit-id: df9b4fb90a076c18f533da32beb7c42ae5b9ed22
2024-03-20 14:11:28 +08:00
hiyouga
7fbdbc2419
Update wechat.jpg
Former-commit-id: bea31b9b12fe18a692590a89d263f9bfbae29698
2024-03-18 16:48:32 +08:00
hiyouga
3d483e0914
fix packages
Former-commit-id: 8e04794b2da067a4123b9d7091a54c5647f44244
2024-03-17 22:32:03 +08:00
hiyouga
a5537f3ee8
fix patcher
Former-commit-id: 85c376fc1e0bcc854ed6e70e6455a0b00b341655
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
30765baa91
Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support
Former-commit-id: 113cc047198325b51dac50d8a7ea70396c51e0d9
2024-03-15 19:16:02 +08:00
hiyouga
06860e8f0f
fix export
Former-commit-id: 6bc2c23b6d26b52f54ac37fa6149e6eb3cc18ee6
2024-03-15 15:06:30 +08:00
S3Studio
46ef7416e6
Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
Therefore, if the --flash_attn flag is not set, the qwen model's config needs an additional patch that changes the default value of use_flash_attn from "auto" to False.
Former-commit-id: e75407febdec086f2bdca723a7f69a92b3b1d63f
2024-03-15 08:59:13 +08:00
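The config patch described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: it assumes only what the commit message states, namely that the qwen config carries a use_flash_attn attribute whose default is "auto". The helper name patch_qwen_flash_attn and its flash_attn_flag parameter are hypothetical, and a SimpleNamespace stands in for the real config object.

```python
from types import SimpleNamespace


def patch_qwen_flash_attn(config, flash_attn_flag: bool):
    """Hypothetical helper (not from the repository).

    If the user did not pass --flash_attn and the config still holds the
    "auto" default, force use_flash_attn to False so the model does not
    pick up the installed flash-attn library on pre-Ampere GPUs.
    """
    if not flash_attn_flag and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    # An explicit True/False set by the user, or an enabled flag, is left untouched.
    return config


# Usage: a stand-in config with the "auto" default and no --flash_attn flag.
cfg = SimpleNamespace(use_flash_attn="auto")
patch_qwen_flash_attn(cfg, flash_attn_flag=False)
print(cfg.use_flash_attn)  # False
```

Only the "auto" value is overridden, so a config where the user explicitly chose a value keeps that choice.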
S3Studio
dcbc8168a8
improve Docker build and runtime parameters
Modify installation method of extra python library.
Utilize shared memory of the host machine to increase training performance.
Former-commit-id: 6a5693d11d065f6e75c8cdd8b5ed962eb520953c
2024-03-15 08:57:46 +08:00
hiyouga
7ef49586be
tiny fix
Former-commit-id: 6ebde4f23e761b8a3e3ea6ca6dff249e657608a1
2024-03-14 21:19:06 +08:00
hiyouga
2cf95d4efe
fix export
Former-commit-id: 3b4a59bfb1866a270b9934a4a2303197ffdab531
2024-03-14 18:17:01 +08:00
hiyouga
edd28dbe2c
fix bug
Former-commit-id: 8172530d54fbd42a9dd3219f06378563d62424e0
2024-03-13 23:55:31 +08:00
hiyouga
9ff7c99eb1
fix bug
Former-commit-id: 714d936dfbe022c4f2cfa6ff643e3482a3f96012
2024-03-13 23:43:42 +08:00
hiyouga
8b8671817f
improve lora+ impl.
Former-commit-id: 72367307dfadf936fb989ebe8bc9f0ff229fb933
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
4000de93ea
Merge pull request #2830 from qibaoyuan/lora_plus
[FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: 4e5e99af4320db661a4eebaabc3284f73815ae4e
2024-03-13 20:15:46 +08:00
齐保元
24c9277488
[FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: a0965cd62c85545aa2364e244295df2963308354
2024-03-13 19:43:27 +08:00
hiyouga
634c44c51a
Update wechat.jpg
Former-commit-id: dfd451b722f988a18040b5d7d95945642aad1238
2024-03-13 19:03:00 +08:00
hiyouga
922bd8864b
fix #2817
Former-commit-id: 0b4a5bf509a6fbf18337a29a6a498f33d0cbca76
2024-03-13 12:42:03 +08:00
hiyouga
8673abbe5e
fix #2802
Former-commit-id: b9f87cdc11b3fe712574b91455dc722b69c60c66
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f
fix kv cache
Former-commit-id: 96ce76cd2753bc91c781ad13aa8f7a972abe815a
2024-03-13 01:21:50 +08:00
hiyouga
bbf272f96e
support QDoRA
Former-commit-id: 19ef4826490b79e0c2aee20ad67430aa0e4724a7
2024-03-12 22:12:42 +08:00
hiyouga
096c31bfb6
patch for gemma cpt
Former-commit-id: 70a3052dd8a2d1322fa01ab19e369e465842d416
2024-03-12 21:21:54 +08:00
hiyouga
c28818c39f
fix plot issues
Former-commit-id: 60cc17f3a8b56c0b2ad76be7c10ca0b4e1738121
2024-03-12 18:41:35 +08:00
hiyouga
14ed926a2d
support olmo
Former-commit-id: b3247d6a1604f4cbeb0d7c163d0082ce91afb870
2024-03-12 18:30:38 +08:00
hiyouga
0b7e870b07
fix #2802
Former-commit-id: 8d8956bad542c0e1c0f7edbf4ffc22bb0f8788ae
2024-03-12 17:08:34 +08:00
hiyouga
b983de9f4f
fix #2803
Former-commit-id: 06c97083e150d461631a4a9bebb03b33da760098
2024-03-12 16:57:39 +08:00
hiyouga
7124b71676
fix #2782 #2798
Former-commit-id: 07f9b754a7418b489e839bd674aa47094583a92d
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
52f14211e3
Merge pull request #2743 from S3Studio/DockerizeSupport
Add dockerize support
Former-commit-id: c901aa63ff4fb6daea7f7da467782e8bf6224d4d
2024-03-12 00:05:49 +08:00
hiyouga
c88062347e
fix #2775
Former-commit-id: e874c00906c765b81c0e5ff9c7b3679557da8e0e
2024-03-11 00:42:54 +08:00
hiyouga
f776e738f8
tiny fix
Former-commit-id: 352693e2dcc8fc039b5d574e1a5709563929b0ce
2024-03-11 00:17:18 +08:00
hiyouga
566bfad930
update parser
Former-commit-id: be99799413e1ba37807a02838bf2d87fd966bf55
2024-03-10 13:35:20 +08:00
hiyouga
4a4e4b4354
support layerwise galore
Former-commit-id: 8664262cde3919e10eaecbd66e8c5d356856362e
2024-03-10 00:24:11 +08:00
hiyouga
276def1897
fix #2732
Former-commit-id: 18ffce36b5ee0809f2e2905c2fd44843a3725ea0
2024-03-09 22:37:16 +08:00
hiyouga
868444e124
allow non-packing pretraining
Former-commit-id: bdb496644ce2c18806fc4fdae1fedcb3e5b5f808
2024-03-09 22:21:46 +08:00