hiyouga
dd73a0c248
set dev version
2024-04-01 23:24:08 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
54b7d34908
add qwen1.5 moe
2024-04-01 21:49:40 +08:00
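For context on the commit above: Qwen1.5 MoE checkpoints load through the standard transformers API once a recent enough version is installed. A minimal sketch, assuming transformers >= 4.39 (which added the qwen2_moe architecture) and the public Qwen/Qwen1.5-MoE-A2.7B checkpoint:

```python
# Minimal sketch: loading a Qwen1.5 MoE checkpoint with transformers.
# Assumes transformers >= 4.39 and accelerate (for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # illustrative public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```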
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support inference of 4-bit models on GPUs #3023
2024-04-01 17:34:04 +08:00
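A rough sketch of what 4-bit GPU inference involves, via the bitsandbytes integration in transformers (the checkpoint name is illustrative, not the repository's exact code path):

```python
# Sketch: loading a model in 4-bit precision for GPU inference via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```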
hiyouga
d0842f6828
update webui
2024-04-01 16:23:28 +08:00
hiyouga
816d714146
fix ORPO loss
2024-04-01 14:42:41 +08:00
hiyouga
5b9b40403d
fix IPO and ORPO loss
2024-04-01 14:37:53 +08:00
hiyouga
5907216a1c
fix plots
2024-03-31 19:43:48 +08:00
hiyouga
68aaa4904b
use log1p in orpo loss
https://github.com/huggingface/trl/pull/1491
2024-03-31 19:27:08 +08:00
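The linked TRL PR swaps a numerically unstable log(1 - exp(x)) for log1p(-exp(x)) when forming the odds-ratio term of the ORPO loss. A minimal sketch of that term, with illustrative tensor names (chosen_logps / rejected_logps stand for average per-token log-probabilities of the chosen and rejected responses):

```python
# Sketch of the ORPO odds-ratio loss using log1p for numerical stability.
import torch
import torch.nn.functional as F

def odds_ratio_loss(chosen_logps, rejected_logps, beta=0.1):
    # log-odds difference: log(p_c / (1 - p_c)) - log(p_r / (1 - p_r)).
    # torch.log1p(-torch.exp(x)) is a stable form of log(1 - exp(x)).
    log_odds = (chosen_logps - rejected_logps) - (
        torch.log1p(-torch.exp(chosen_logps))
        - torch.log1p(-torch.exp(rejected_logps))
    )
    return -beta * F.logsigmoid(log_odds)
```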
hiyouga
099db6acc0
update readme
2024-03-31 18:46:34 +08:00
hiyouga
5195add324
support orpo in webui
2024-03-31 18:34:59 +08:00
hiyouga
17bf8a2c3a
support ORPO
2024-03-31 18:29:50 +08:00
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank lines containing whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model saving for full-parameter training
2024-03-30 23:45:04 +08:00
hiyouga
7a086ed333
support save args in webui #2807 #3046
some ideas are borrowed from @marko1616
2024-03-30 23:09:12 +08:00
hiyouga
831c5321ac
upgrade gradio to 4.21.0
2024-03-30 20:37:08 +08:00
hiyouga
ca793028c6
release v0.6.1
2024-03-29 11:36:08 +08:00
hiyouga
8d603f8820
fix #2982
2024-03-28 20:22:31 +08:00
hiyouga
b19c14870d
fix #3010
2024-03-28 18:31:17 +08:00
hiyouga
8c77b10912
update trainers
2024-03-28 18:16:27 +08:00
hoshi-hiyouga
3bcd41b639
fix ds optimizer
2024-03-26 23:39:56 +08:00
hiyouga
3164b4f11b
fix bug
2024-03-26 17:30:12 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
ba70aca8fb
release v0.6.0 (real)
2024-03-25 23:37:48 +08:00
hiyouga
98a42cbdaa
tiny fix
2024-03-25 23:28:52 +08:00
hiyouga
1484f76a95
add arg check
2024-03-25 22:42:58 +08:00
hiyouga
6f2b563f12
release v0.6.0
2024-03-25 22:38:56 +08:00
hiyouga
558a538724
tiny fix
2024-03-25 21:18:08 +08:00
marko1616
c8f0d99704
pass ruff check
2024-03-24 16:12:10 +08:00
marko1616
6f080fdba3
fix Llama lora merge crash
2024-03-24 03:06:11 +08:00
marko1616
51349ea1cc
fix Llama lora merge crash
2024-03-24 02:55:23 +08:00
marko1616
c1e2c4ea45
fix Llama lora merge crash
2024-03-24 02:44:35 +08:00
hiyouga
140ad4ad56
fix #2936
2024-03-24 00:43:21 +08:00
hiyouga
7afbc85dae
fix #2928
2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
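FSDP + QLoRA hinges on storing the 4-bit quantized weights in a regular floating-point dtype so FSDP can flatten and shard them. A sketch of the relevant quantization config, assuming bitsandbytes >= 0.43 and transformers >= 4.39:

```python
# Sketch: the BitsAndBytesConfig that makes QLoRA weights shardable under FSDP.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    # the key knob: store quantized weights as bf16 tensors so FSDP can shard them
    bnb_4bit_quant_storage=torch.bfloat16,
)
```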
hiyouga
9bec3c98a2
fix #2777 #2895
2024-03-20 17:59:45 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
8e04794b2d
fix packages
2024-03-17 22:32:03 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
113cc04719
Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga
6bc2c23b6d
fix export
2024-03-15 15:06:30 +08:00
S3Studio
e75407febd
Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
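The config patch described above could look roughly like this; the use_flash_attn attribute comes from the remote-code Qwen config, so treat the snippet as a sketch rather than the repository's exact code:

```python
# Sketch: disable flash-attn for the remote-code Qwen model on pre-Ampere GPUs.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
if getattr(config, "use_flash_attn", None) == "auto":
    config.use_flash_attn = False  # "auto" would enable flash-attn when installed
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B", config=config, trust_remote_code=True
)
```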
hiyouga
6ebde4f23e
tiny fix
2024-03-14 21:19:06 +08:00
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
8172530d54
fix bug
2024-03-13 23:55:31 +08:00
hiyouga
714d936dfb
fix bug
2024-03-13 23:43:42 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
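LoRA+ trains the low-rank B matrices with a larger learning rate than the A matrices. A minimal sketch of the parameter grouping; loraplus_param_groups is a hypothetical helper, and the 16x ratio is the LoRA+ paper's suggested default, not necessarily this repository's value:

```python
# Sketch of LoRA+ optimizer groups: lora_B parameters get a scaled-up lr.
import torch

def loraplus_param_groups(model, lr=1e-4, lr_ratio=16.0):
    groups = {"lora_a": [], "lora_b": []}
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        key = "lora_b" if "lora_B" in name else "lora_a"
        groups[key].append(param)
    return [
        {"params": groups["lora_a"], "lr": lr},
        {"params": groups["lora_b"], "lr": lr * lr_ratio},  # higher lr for B
    ]

# usage: optimizer = torch.optim.AdamW(loraplus_param_groups(peft_model))
```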