hiyouga
bd2b758b48
add codegemma
...
Former-commit-id: 6543f3d449
2024-04-16 00:11:15 +08:00
hiyouga
2dc3343b1c
support cohere commandR #3184
...
Former-commit-id: e0dbac2845
2024-04-15 23:26:42 +08:00
hiyouga
fb385b8c26
update examples
...
Former-commit-id: cce52351b5
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
1bdf7e4b9d
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
Former-commit-id: 0e0942d388
2024-04-15 15:38:16 +08:00
hiyouga
ceccad3419
fix #3273
...
Former-commit-id: efc345c4b0
2024-04-15 15:32:58 +08:00
liuzc
11f4afc5ad
fix: mixtral output_router_logits
...
Former-commit-id: 9f4fe62386
2024-04-15 12:11:49 +08:00
hiyouga
431e9804ee
release v0.6.2
...
Former-commit-id: 9d4c949461
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
77d16ada1e
Update adapter.py
...
Former-commit-id: 98bc97d8d2
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
e5b4cb62e0
Update adapter.py
...
Former-commit-id: 2111b586b6
2024-04-10 00:57:30 +08:00
Erich Schubert
3dccd3c67e
Pass additional_target to unsloth
...
Fixes #3200
Former-commit-id: b5eefe5c4c
2024-04-09 17:53:40 +02:00
hiyouga
0e08c209c4
fix quant infer and qwen2moe
...
Former-commit-id: 7f6c2486b8
2024-04-09 17:12:59 +08:00
hiyouga
2ecf2bcbf0
fix resize vocab at inference #3022
...
Former-commit-id: 148bda353f
2024-04-03 18:14:24 +08:00
hiyouga
bf5ffeeae0
simplify readme
...
Former-commit-id: 92dab8a90b
2024-04-02 20:07:43 +08:00
hiyouga
f4be51f356
add moe aux loss control #3085
...
Former-commit-id: b267aeb53f
2024-04-02 14:26:31 +08:00
hiyouga
c7104f8fab
fix #3022
...
Former-commit-id: 9ddbe2866a
2024-04-02 13:58:39 +08:00
hiyouga
829cf6458a
fix #3083
...
Former-commit-id: 4a6ca621c0
2024-04-01 22:53:52 +08:00
hiyouga
34f1de0574
fix #3077
...
Former-commit-id: aee634cd20
2024-04-01 21:35:18 +08:00
hiyouga
b7468ea0a8
support infer 4bit model on GPUs #3023
...
Former-commit-id: eb259cc573
2024-04-01 17:34:04 +08:00
hiyouga
3cf35e57db
tiny fix
...
Former-commit-id: 27776c3474
2024-03-31 00:10:29 +08:00
marko1616
5721074af1
fix blank line contains whitespace
...
Former-commit-id: d9a5134617
2024-03-30 23:46:55 +08:00
marko1616
67c05c2031
Fix Llama model save for full param train
...
Former-commit-id: eb178eaff3
2024-03-30 23:45:04 +08:00
hiyouga
89c400633a
update trainers
...
Former-commit-id: 8c77b10912
2024-03-28 18:16:27 +08:00
hiyouga
ec94e5e876
fix #2961
...
Former-commit-id: 511f675402
2024-03-26 17:26:14 +08:00
hiyouga
75829c8699
fix #2928
...
Former-commit-id: 7afbc85dae
2024-03-24 00:34:54 +08:00
hiyouga
58aa576ae5
fix #2941
...
Former-commit-id: a1c8c98c5f
2024-03-24 00:28:44 +08:00
hiyouga
7999836fb6
support fsdp + qlora
...
Former-commit-id: 8408225162
2024-03-21 00:36:06 +08:00
hiyouga
cf149bf43c
fix #2346
...
Former-commit-id: 7b8f502901
2024-03-20 17:56:33 +08:00
hiyouga
a5537f3ee8
fix patcher
...
Former-commit-id: 85c376fc1e
2024-03-15 19:18:42 +08:00
S3Studio
46ef7416e6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training, as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, the qwen model's config must additionally be patched to change the default value of use_flash_attn from "auto" to False.
Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
hiyouga
2cf95d4efe
fix export
...
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
8b8671817f
improve lora+ impl.
...
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00
hiyouga
8673abbe5e
fix #2802
...
Former-commit-id: b9f87cdc11
2024-03-13 12:33:45 +08:00
hiyouga
a74426df0f
fix kv cache
...
Former-commit-id: 96ce76cd27
2024-03-13 01:21:50 +08:00
hiyouga
0b7e870b07
fix #2802
...
Former-commit-id: 8d8956bad5
2024-03-12 17:08:34 +08:00
hiyouga
276def1897
fix #2732
...
Former-commit-id: 18ffce36b5
2024-03-09 22:37:16 +08:00
hiyouga
868444e124
allow non-packing pretraining
...
Former-commit-id: bdb496644c
2024-03-09 22:21:46 +08:00
hiyouga
c561b268ef
fix #2756 , patch #2746
...
Former-commit-id: e8dd38b7fd
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 516d0ddc66
2024-03-09 01:37:00 +08:00
hiyouga
398c261c7c
fix aqlm version
...
Former-commit-id: 10be2f0ecc
2024-03-09 00:09:09 +08:00
stephen_zhu
c69b9fbe58
update
...
Former-commit-id: aa71571b77
2024-03-08 12:47:44 +08:00
stephen
495b858606
fix ppo runtime error
...
Former-commit-id: cdb7f82869
2024-03-08 11:48:26 +08:00
hiyouga
5b50458acf
fix galore
...
Former-commit-id: 33a4c24a8a
2024-03-08 00:44:51 +08:00
hiyouga
34533b2f35
support vllm
...
Former-commit-id: d07ad5cc1c
2024-03-07 20:26:31 +08:00
hiyouga
37e40563f1
fix #2735
...
Former-commit-id: f74f804a71
2024-03-07 16:15:53 +08:00
hiyouga
8b6c178249
export use balanced gpu
...
Former-commit-id: 3e84f430b1
2024-03-06 16:33:14 +08:00
hiyouga
e887aface7
fix version checking
...
Former-commit-id: 3016e65657
2024-03-06 14:51:51 +08:00
hiyouga
9561809ce9
improve aqlm optim
...
Former-commit-id: 259af60d28
2024-03-05 20:49:50 +08:00
hiyouga
c776cdfc3e
optimize aqlm training
...
Former-commit-id: d3d3dac707
2024-03-05 18:35:41 +08:00
hiyouga
0f2250b831
fix dora inference
...
Former-commit-id: ddf352f861
2024-03-05 11:51:41 +08:00
hiyouga
a62d17d009
fix export on cpu device
...
Former-commit-id: cda2ff8727
2024-03-04 17:35:09 +08:00