hiyouga
008e3b3b10
fix fsdp model loading
2024-05-15 16:32:28 +08:00
hiyouga
af343034dd
add npu examples
2024-05-14 23:32:53 +08:00
hiyouga
b033232aea
fix llava config
2024-05-12 00:02:49 +08:00
hiyouga
b3e33c703e
fix llava rlhf
2024-04-28 03:01:49 +08:00
hiyouga
fc67b736ba
fix llava qlora
2024-04-26 18:00:23 +08:00
hiyouga
3a7c1286ce
add export_device in webui #3333
2024-04-25 19:02:32 +08:00
hiyouga
aa2b79eb23
refactor patcher
2024-04-24 03:02:23 +08:00
hiyouga
07737a3d2d
re-enable sdpa and fast tokenizer by default
2024-04-24 02:18:44 +08:00
hiyouga
a1d31ffc8c
fix #3365
2024-04-21 19:20:18 +08:00
hiyouga
f58425ab45
fix mod stuff
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
a950f3b81d
Update patcher.py
2024-04-16 17:29:19 +08:00
Jonery
06c8908d3f
Feature BAdam
2024-04-15 23:15:27 +08:00
hoshi-hiyouga
0e0942d388
Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral (see the sketch after this entry)
2024-04-15 15:38:16 +08:00
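A minimal sketch of what this merge enables, assuming the standard transformers MixtralConfig API; how this repository wires it in may differ:

```python
from transformers import MixtralConfig

# Illustrative model id; any Mixtral checkpoint config works the same way.
config = MixtralConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# With output_router_logits left at its False default, the router logits are
# discarded and the MoE load-balancing auxiliary loss is never added to the
# training loss; turning it on restores that loss term during training.
config.output_router_logits = True
```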
hiyouga
efc345c4b0
fix #3273
2024-04-15 15:32:58 +08:00
liuzc
9f4fe62386
fix: mixtral output_router_logits
2024-04-15 12:11:49 +08:00
hiyouga
7f6c2486b8
fix quantized inference and qwen2moe
2024-04-09 17:12:59 +08:00
hiyouga
92dab8a90b
simplify readme
2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f
add moe aux loss control #3085
2024-04-02 14:26:31 +08:00
hiyouga
9ddbe2866a
fix #3022
2024-04-02 13:58:39 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support inference of 4-bit models on GPUs #3023
2024-04-01 17:34:04 +08:00
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank lines containing whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model saving for full-parameter training
2024-03-30 23:45:04 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
S3Studio
e75407febd
Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False (see the sketch after this entry).
2024-03-15 08:59:13 +08:00
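A minimal sketch of the config patch described above, assuming Qwen's remote-code config exposes a use_flash_attn field with an "auto" default (as the commit message states); the model id is illustrative:

```python
from transformers import AutoConfig

# trust_remote_code is required to load Qwen's custom config class.
config = AutoConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

# On hosts whose GPUs predate Ampere, the "auto" default lets the model pick
# the preinstalled flash-attn and fail at training time, so force it off
# when the --flash_attn flag was not requested.
if getattr(config, "use_flash_attn", None) == "auto":
    config.use_flash_attn = False
```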
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00
hiyouga
bdb496644c
allow non-packing pretraining
2024-03-09 22:21:46 +08:00
hiyouga
e8dd38b7fd
fix #2756, patch #2746
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66
Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga
10be2f0ecc
fix aqlm version
2024-03-09 00:09:09 +08:00
stephen_zhu
aa71571b77
update
2024-03-08 12:47:44 +08:00
stephen
cdb7f82869
fix ppo runtime error
2024-03-08 11:48:26 +08:00
hiyouga
f74f804a71
fix #2735
2024-03-07 16:15:53 +08:00
hiyouga
3e84f430b1
use balanced GPUs during export
2024-03-06 16:33:14 +08:00
hiyouga
d3d3dac707
optimize aqlm training
2024-03-05 18:35:41 +08:00
hiyouga
ddf352f861
fix dora inference
2024-03-05 11:51:41 +08:00
hiyouga
cda2ff8727
fix export on cpu device
2024-03-04 17:35:09 +08:00
hiyouga
4e5fae2fac
fix #2649
2024-03-01 13:02:41 +08:00
hiyouga
c0be617195
fix #2642
2024-02-29 18:32:54 +08:00
hiyouga
4a871e80e2
tiny fix
2024-02-29 17:28:50 +08:00
hiyouga
fa5ab21ebc
release v0.5.3
2024-02-29 00:34:19 +08:00