BUAADreamer
cfb485eddf
add llava and instructblip
2024-04-25 00:22:43 +08:00
BUAADreamer
4dcb11eab7
add multimodal LLM BLIP-2 and InstructBLIP
2024-04-23 18:45:43 +08:00
hiyouga
a1d31ffc8c
fix #3365
2024-04-21 19:20:18 +08:00
hiyouga
f58425ab45
fix mod stuff
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
d0273787be
Merge pull request #3338 from astramind-ai/main
Adding Mixture of Depths
2024-04-21 18:05:52 +08:00
hoshi-hiyouga
1fa287fd63
fix #3348
2024-04-20 10:34:09 +08:00
Marco
620add7b9f
Added Mixture of Depths
2024-04-18 20:31:24 +02:00
hiyouga
942362d008
fix #3324
2024-04-18 15:34:45 +08:00
hiyouga
c9a477322d
fix #3316
2024-04-17 22:54:34 +08:00
hoshi-hiyouga
4d660c5ade
Merge pull request #3287 from Ledzy/badam
[Feature] Add BAdam algorithm
2024-04-16 17:32:16 +08:00
hoshi-hiyouga
38a56706e0
Update utils.py
2024-04-16 17:29:30 +08:00
hoshi-hiyouga
a950f3b81d
Update patcher.py
2024-04-16 17:29:19 +08:00
hoshi-hiyouga
750cdf2e74
Update adapter.py
2024-04-16 17:28:12 +08:00
Jonery
7ecb61822b
resolve gradient checkpointing issue.
2024-04-16 12:05:27 +08:00
hiyouga
7dc72fb58c
support unsloth 2024.4
2024-04-16 00:25:03 +08:00
hiyouga
6543f3d449
add codegemma
2024-04-16 00:11:15 +08:00
hiyouga
e0dbac2845
support cohere commandR #3184
2024-04-15 23:26:42 +08:00
Jonery
06c8908d3f
Feature BAdam
2024-04-15 23:15:27 +08:00
hiyouga
cce52351b5
update examples
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
0e0942d388
Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral
2024-04-15 15:38:16 +08:00
hiyouga
efc345c4b0
fix #3273
2024-04-15 15:32:58 +08:00
liuzc
9f4fe62386
fix: mixtral output_router_logits
2024-04-15 12:11:49 +08:00
hiyouga
9d4c949461
release v0.6.2
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
98bc97d8d2
Update adapter.py
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
2111b586b6
Update adapter.py
2024-04-10 00:57:30 +08:00
Erich Schubert
b5eefe5c4c
Pass additional_target to unsloth
Fixes #3200
2024-04-09 17:53:40 +02:00
hiyouga
7f6c2486b8
fix quant infer and qwen2moe
2024-04-09 17:12:59 +08:00
hiyouga
148bda353f
fix resize vocab at inference #3022
2024-04-03 18:14:24 +08:00
hiyouga
92dab8a90b
simplify readme
2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f
add moe aux loss control #3085
2024-04-02 14:26:31 +08:00
hiyouga
9ddbe2866a
fix #3022
2024-04-02 13:58:39 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support inferring 4-bit models on GPUs #3023
2024-04-01 17:34:04 +08:00
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank line containing whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model save for full param train
2024-03-30 23:45:04 +08:00
hiyouga
8c77b10912
update trainers
2024-03-28 18:16:27 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
7afbc85dae
fix #2928
2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
S3Studio
e75407febd
Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False (a minimal sketch of such a patch follows this entry).
2024-03-15 08:59:13 +08:00
2024-03-15 08:59:13 +08:00
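The workaround described in the commit message above can be illustrated with a short Python sketch. It is not part of the commit itself; the model id "Qwen/Qwen-7B-Chat" and loading the model via transformers with trust_remote_code are assumptions made for illustration, and the attribute name use_flash_attn follows the first-generation Qwen remote-code config.

# Hypothetical sketch of the patch described above: when --flash_attn is not set
# and the GPU is older than Ampere, flip the Qwen config's use_flash_attn
# default from "auto" to False before loading the model.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen-7B-Chat"  # assumed model id, for illustration only
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)

if getattr(config, "use_flash_attn", None) == "auto":
    # avoids "FlashAttention only supports Ampere GPUs or newer" at train time
    config.use_flash_attn = False

model = AutoModelForCausalLM.from_pretrained(model_id, config=config, trust_remote_code=True)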
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hiyouga
b9f87cdc11
fix #2802
2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00