Commit Graph

179 Commits

Author SHA1 Message Date
BUAADreamer
cfb485eddf add llava and instructblip 2024-04-25 00:22:43 +08:00
BUAADreamer
4dcb11eab7 add multimodal LLM BLIP-2 and InstructBLIP 2024-04-23 18:45:43 +08:00
hiyouga
a1d31ffc8c fix #3365 2024-04-21 19:20:18 +08:00
hiyouga
f58425ab45 fix mod stuff 2024-04-21 18:11:10 +08:00
hoshi-hiyouga
d0273787be Merge pull request #3338 from astramind-ai/main
Adding Mixture of Depth
2024-04-21 18:05:52 +08:00
hoshi-hiyouga
1fa287fd63 fix #3348 2024-04-20 10:34:09 +08:00
Marco
620add7b9f Added Mixture of Depths 2024-04-18 20:31:24 +02:00
hiyouga
942362d008 fix #3324 2024-04-18 15:34:45 +08:00
hiyouga
c9a477322d fix #3316 2024-04-17 22:54:34 +08:00
hoshi-hiyouga
4d660c5ade Merge pull request #3287 from Ledzy/badam
[Feature] Add BAdam algorithm
2024-04-16 17:32:16 +08:00
hoshi-hiyouga
38a56706e0 Update utils.py 2024-04-16 17:29:30 +08:00
hoshi-hiyouga
a950f3b81d Update patcher.py 2024-04-16 17:29:19 +08:00
hoshi-hiyouga
750cdf2e74 Update adapter.py 2024-04-16 17:28:12 +08:00
Jonery
7ecb61822b resolve gradient checkpointing issue. 2024-04-16 12:05:27 +08:00
hiyouga
7dc72fb58c support unsloth 2024.4 2024-04-16 00:25:03 +08:00
hiyouga
6543f3d449 add codegemma 2024-04-16 00:11:15 +08:00
hiyouga
e0dbac2845 support cohere commandR #3184 2024-04-15 23:26:42 +08:00
Jonery
06c8908d3f Feature BAdam 2024-04-15 23:15:27 +08:00
hiyouga
cce52351b5 update examples 2024-04-15 22:14:34 +08:00
hoshi-hiyouga
0e0942d388 Merge pull request #3276 from liu-zichen/fix_mixtral
fix: turn on output_router_logits of mixtral
2024-04-15 15:38:16 +08:00
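The change described in the merge above amounts to enabling router logits on the Mixtral config so that the MoE auxiliary (load-balancing) loss is actually computed during training. A minimal sketch of that setting using the transformers MixtralConfig; the checkpoint id is illustrative only:

```python
from transformers import MixtralConfig

# Illustrative checkpoint id; any Mixtral config works the same way.
config = MixtralConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
# Expose router logits so the auxiliary load-balancing loss is computed;
# router_aux_loss_coef on the same config weights that loss.
config.output_router_logits = True
```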
hiyouga
efc345c4b0 fix #3273 2024-04-15 15:32:58 +08:00
liuzc
9f4fe62386 fix: mixtral output_router_logits 2024-04-15 12:11:49 +08:00
hiyouga
9d4c949461 release v0.6.2 2024-04-11 20:08:51 +08:00
hoshi-hiyouga
98bc97d8d2 Update adapter.py 2024-04-10 00:57:51 +08:00
hoshi-hiyouga
2111b586b6 Update adapter.py 2024-04-10 00:57:30 +08:00
Erich Schubert
b5eefe5c4c Pass additional_target to unsloth
Fixes #3200
2024-04-09 17:53:40 +02:00
hiyouga
7f6c2486b8 fix quant infer and qwen2moe 2024-04-09 17:12:59 +08:00
hiyouga
148bda353f fix resize vocab at inference #3022 2024-04-03 18:14:24 +08:00
hiyouga
92dab8a90b simplify readme 2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f add moe aux loss control #3085 2024-04-02 14:26:31 +08:00
hiyouga
9ddbe2866a fix #3022 2024-04-02 13:58:39 +08:00
hiyouga
4a6ca621c0 fix #3083 2024-04-01 22:53:52 +08:00
hiyouga
aee634cd20 fix #3077 2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573 support infer 4bit model on GPUs #3023 2024-04-01 17:34:04 +08:00
hiyouga
27776c3474 tiny fix 2024-03-31 00:10:29 +08:00
marko1616
d9a5134617 fix blank line contains whitespace 2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3 Fix Llama model save for full param train 2024-03-30 23:45:04 +08:00
hiyouga
8c77b10912 update trainers 2024-03-28 18:16:27 +08:00
hiyouga
511f675402 fix #2961 2024-03-26 17:26:14 +08:00
hiyouga
7afbc85dae fix #2928 2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f fix #2941 2024-03-24 00:28:44 +08:00
hiyouga
8408225162 support fsdp + qlora 2024-03-21 00:36:06 +08:00
hiyouga
7b8f502901 fix #2346 2024-03-20 17:56:33 +08:00
hiyouga
85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
S3Studio
e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
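The note above describes overriding the Qwen config so flash-attn is not auto-enabled on GPUs that cannot run it. A minimal sketch of that kind of patch, assuming the config is loaded with transformers' AutoConfig and exposes the use_flash_attn field named in the commit message; the helper function itself is hypothetical:

```python
from transformers import AutoConfig

def load_qwen_config_without_flash_attn(model_path: str):
    """Hypothetical helper: keep Qwen from auto-enabling flash-attn on
    pre-Ampere GPUs, per the commit note above."""
    config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
    # The remote-code Qwen config defaults use_flash_attn to "auto";
    # force it to False so training does not hit the FlashAttention error.
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config
```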
hiyouga
3b4a59bfb1 fix export 2024-03-14 18:17:01 +08:00
hiyouga
72367307df improve lora+ impl. 2024-03-13 23:32:51 +08:00
hiyouga
b9f87cdc11 fix #2802 2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27 fix kv cache 2024-03-13 01:21:50 +08:00
hiyouga
8d8956bad5 fix #2802 2024-03-12 17:08:34 +08:00