Commit Graph

1069 Commits

Author SHA1 Message Date
zhangzc
449e2aa38e Support custom dataset sampling quantity 2024-03-27 14:22:50 +08:00
hoshi-hiyouga
3bcd41b639 fix ds optimizer 2024-03-26 23:39:56 +08:00
hiyouga
b29d5560f1 fix #2981 2024-03-26 17:53:04 +08:00
hiyouga
3164b4f11b fix bug 2024-03-26 17:30:12 +08:00
hiyouga
511f675402 fix #2961 2024-03-26 17:26:14 +08:00
hiyouga
7ea1a1f5b3 Update wechat.jpg 2024-03-26 16:24:42 +08:00
hiyouga
ba70aca8fb release v0.6.0 (real) 2024-03-25 23:37:48 +08:00
hiyouga
98a42cbdaa tiny fix 2024-03-25 23:28:52 +08:00
hiyouga
7b3d8188f5 update readme 2024-03-25 23:06:13 +08:00
hoshi-hiyouga
f633ac6646 Merge pull request #2967 from Tsumugii24/main
Update README_zh.md
2024-03-25 23:02:22 +08:00
Tsumugii24
1704599503 Update README.md 2024-03-25 22:54:38 +08:00
Tsumugii24
7aa77a3451 Update README_zh.md 2024-03-25 22:54:26 +08:00
hiyouga
1484f76a95 add arg check 2024-03-25 22:42:58 +08:00
hiyouga
6f2b563f12 release v0.6.0 2024-03-25 22:38:56 +08:00
Tsumugii24
bb4ca1691a Update README_zh.md 2024-03-25 22:31:03 +08:00
hoshi-hiyouga
f33a3dfadc Merge pull request #2963 from rkinas/patch-1
Update requirements.txt
2024-03-25 21:49:34 +08:00
Remek Kinas
b02899bf89 Update requirements.txt 2024-03-25 14:30:58 +01:00
hiyouga
558a538724 tiny fix 2024-03-25 21:18:08 +08:00
hoshi-hiyouga
49f9dbb4b1 Merge pull request #2945 from marko1616/bugfix/lora-model-merge
Fix the crash caused by generation config validation when merging LoRA adapters into some models on transformers > 4.36.2
2024-03-25 13:36:08 +08:00
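(Aside: a minimal sketch of the failure mode named in the PR title above, not the PR's actual patch. It assumes only the stock transformers/peft APIs; all model paths are placeholders. On transformers > 4.36.2, save_pretrained() validates the generation config, so sampling parameters left in the config while do_sample=False can raise a ValueError when exporting a merged model.)

```python
# Hypothetical illustration of the crash fixed in the PR above, not its
# actual diff. Newer transformers versions validate GenerationConfig on
# save, and sampling knobs such as temperature combined with
# do_sample=False can abort the export of a merged LoRA model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model")    # placeholder path
model = PeftModel.from_pretrained(base, "lora-adapter")      # placeholder path
model = model.merge_and_unload()                             # merge LoRA weights

gen_config = model.generation_config
if not gen_config.do_sample and (
    gen_config.temperature != 1.0 or gen_config.top_p != 1.0
):
    # Make the config self-consistent so validation passes on save.
    gen_config.do_sample = True

model.save_pretrained("merged-model")                        # placeholder path
```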
marko1616
c8f0d99704 pass ruff check 2024-03-24 16:12:10 +08:00
marko1616
6f080fdba3 fix Llama lora merge crash 2024-03-24 03:06:11 +08:00
marko1616
51349ea1cc fix Llama lora merge crash 2024-03-24 02:55:23 +08:00
marko1616
c1e2c4ea45 fix Llama lora merge crash 2024-03-24 02:44:35 +08:00
hiyouga
140ad4ad56 fix #2936 2024-03-24 00:43:21 +08:00
hiyouga
7afbc85dae fix #2928 2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f fix #2941 2024-03-24 00:28:44 +08:00
hiyouga
564d57aa23 Update wechat.jpg 2024-03-22 14:00:37 +08:00
hoshi-hiyouga
ce261fdd64 Merge pull request #2919 from 0xez/main
Update README.md, fix the release date of the paper
2024-03-22 12:12:24 +08:00
0xez
be0360303d Update README_zh.md, fix the release date of the paper 2024-03-22 10:41:17 +08:00
0xez
675ba41562 Update README.md, fix the release date of the paper 2024-03-21 22:14:48 +08:00
hiyouga
96702620c4 move file 2024-03-21 17:05:17 +08:00
hiyouga
5eaa50fa01 add citation 2024-03-21 17:04:10 +08:00
hiyouga
0581bfdbc7 paper release 2024-03-21 13:49:17 +08:00
hiyouga
bfe7a91289 update readme 2024-03-21 00:48:42 +08:00
hiyouga
8408225162 support fsdp + qlora 2024-03-21 00:36:06 +08:00
hiyouga
3271af2afc add orca_dpo_pairs dataset 2024-03-20 20:09:06 +08:00
hoshi-hiyouga
b2dfbd728f Merge pull request #2905 from SirlyDreamer/main
Follow HF_ENDPOINT environment variable
2024-03-20 18:09:54 +08:00
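(Aside: a guess at the mechanism behind the PR title above, not its actual diff. Hub mirrors are selected by pointing the HF_ENDPOINT environment variable at an alternative host, so download URLs should be derived from that variable rather than a hard-coded https://huggingface.co.)

```python
# Hypothetical sketch of honoring HF_ENDPOINT, not the PR's actual diff.
import os

# Fall back to the official hub when the variable is unset.
HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co")

def dataset_file_url(repo_id: str, filename: str) -> str:
    # Build download URLs from the configured endpoint instead of a
    # hard-coded https://huggingface.co host.
    return f"{HF_ENDPOINT}/datasets/{repo_id}/resolve/main/{filename}"
```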
hiyouga
9bec3c98a2 fix #2777 #2895 2024-03-20 17:59:45 +08:00
hiyouga
7b8f502901 fix #2346 2024-03-20 17:56:33 +08:00
SirlyDreamer
e165965341 Follow HF_ENDPOINT environment variable 2024-03-20 08:31:30 +00:00
hoshi-hiyouga
a773035709 Merge pull request #2903 from khazic/main
Updated README with new information
2024-03-20 16:13:44 +08:00
khazic
8d10fa71c2 Updated README with new information 2024-03-20 14:38:08 +08:00
khazic
0531dac30d Updated README with new information 2024-03-20 14:21:16 +08:00
刘一博
df9b4fb90a Updated README with new information 2024-03-20 14:11:28 +08:00
hiyouga
bea31b9b12 Update wechat.jpg 2024-03-18 16:48:32 +08:00
hiyouga
8e04794b2d fix packages 2024-03-17 22:32:03 +08:00
hiyouga
85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
hoshi-hiyouga
113cc04719 Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga
6bc2c23b6d fix export 2024-03-15 15:06:30 +08:00
S3Studio
e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
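(Aside: a minimal sketch of the config patch suggested in the commit message above, not the repository's actual code. It assumes the qwen model's remote code exposes a use_flash_attn attribute defaulting to "auto", as the message states; the model path is a placeholder.)

```python
# Sketch of the suggested workaround. When --flash_attn is not passed but
# flash-attn is installed in the image, use_flash_attn="auto" lets the
# qwen model pick the library up automatically; on pre-Ampere GPUs
# training then fails with "FlashAttention only supports Ampere GPUs or
# newer", so the default is pinned to False instead.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "Qwen/Qwen-7B",          # placeholder model path
    trust_remote_code=True,  # qwen's config/modeling code live in the repo
)

flash_attn_enabled = False   # i.e. the --flash_attn flag was not set

if not flash_attn_enabled and getattr(config, "use_flash_attn", None) == "auto":
    config.use_flash_attn = False  # disable instead of probing at runtime
```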