hiyouga
1f99c367b3
remove redundant code
...
Former-commit-id: 667ce08b27
2024-04-24 05:02:18 +08:00
hiyouga
c0afc4074f
support unsloth generate
...
Former-commit-id: b1deb0a0b9
2024-04-24 04:46:53 +08:00
hiyouga
8465e54d38
refactor patcher
...
Former-commit-id: aa2b79eb23
2024-04-24 03:02:23 +08:00
hiyouga
80c8586534
reenable sdpa and fast tok by default
...
Former-commit-id: 07737a3d2d
2024-04-24 02:18:44 +08:00
hiyouga
34ecad4af8
fix #3347 #3387
...
Former-commit-id: 707f0b1d5d
2024-04-24 01:30:16 +08:00
hiyouga
79666c298d
fix #3365
...
Former-commit-id: a1d31ffc8c
2024-04-21 19:20:18 +08:00
hiyouga
ec81d45d27
fix mod stuff
...
Former-commit-id: f58425ab45
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
7c63a9b5fd
Merge pull request #3338 from astramind-ai/main
...
Adding Mixture of Depth
Former-commit-id: d0273787be
2024-04-21 18:05:52 +08:00
hoshi-hiyouga
e9b1aff447
fix #3348
...
Former-commit-id: 1fa287fd63
2024-04-20 10:34:09 +08:00
Marco
639297a5ef
Added Mixture of Depths
...
Former-commit-id: 620add7b9f
2024-04-18 20:31:24 +02:00
hiyouga
9aa62ffb57
fix #3324
...
Former-commit-id: 942362d008
2024-04-18 15:34:45 +08:00
hiyouga
0170ef83a6
fix #3316
...
Former-commit-id: c9a477322d
2024-04-17 22:54:34 +08:00
hoshi-hiyouga
496396b3bc
Merge pull request #3287 from Ledzy/badam
...
[Feature] Add BAdam algorithm
Former-commit-id: 4d660c5ade
2024-04-16 17:32:16 +08:00
hoshi-hiyouga
b92f690190
Update utils.py
...
Former-commit-id: 38a56706e0
2024-04-16 17:29:30 +08:00
hoshi-hiyouga
48fb0be1b9
Update patcher.py
...
Former-commit-id: a950f3b81d
2024-04-16 17:29:19 +08:00
hoshi-hiyouga
ce56ff22af
Update adapter.py
...
Former-commit-id: 750cdf2e74
2024-04-16 17:28:12 +08:00
Jonery
b3260c7456
resolve gradient checkpointing issue.
...
Former-commit-id: 7ecb61822b
2024-04-16 12:05:27 +08:00
hiyouga
b40f266617
support unsloth 2024.4
...
Former-commit-id: 7dc72fb58c
2024-04-16 00:25:03 +08:00
hiyouga
bd2b758b48
add codegemma
...
Former-commit-id: 6543f3d449
2024-04-16 00:11:15 +08:00
hiyouga
2dc3343b1c
support cohere commandR #3184
...
Former-commit-id: e0dbac2845
2024-04-15 23:26:42 +08:00
Jonery
025f329445
Feature BAdam
...
Former-commit-id: 06c8908d3f
2024-04-15 23:15:27 +08:00
hiyouga
fb385b8c26
update examples
...
Former-commit-id: cce52351b5
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
1bdf7e4b9d
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
Former-commit-id: 0e0942d388
2024-04-15 15:38:16 +08:00
hiyouga
ceccad3419
fix #3273
...
Former-commit-id: efc345c4b0
2024-04-15 15:32:58 +08:00
liuzc
11f4afc5ad
fix: mixtral output_router_logits
...
Former-commit-id: 9f4fe62386
2024-04-15 12:11:49 +08:00
hiyouga
431e9804ee
release v0.6.2
...
Former-commit-id: 9d4c949461
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
77d16ada1e
Update adapter.py
...
Former-commit-id: 98bc97d8d2
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
e5b4cb62e0
Update adapter.py
...
Former-commit-id: 2111b586b6
2024-04-10 00:57:30 +08:00
Erich Schubert
3dccd3c67e
Pass additional_target to unsloth
...
Fixes #3200
Former-commit-id: b5eefe5c4c
2024-04-09 17:53:40 +02:00
hiyouga
0e08c209c4
fix quant infer and qwen2moe
...
Former-commit-id: 7f6c2486b8
2024-04-09 17:12:59 +08:00
hiyouga
2ecf2bcbf0
fix resize vocab at inference #3022
...
Former-commit-id: 148bda353f
2024-04-03 18:14:24 +08:00
hiyouga
bf5ffeeae0
simplify readme
...
Former-commit-id: 92dab8a90b
2024-04-02 20:07:43 +08:00
hiyouga
f4be51f356
add moe aux loss control #3085
...
Former-commit-id: b267aeb53f
2024-04-02 14:26:31 +08:00
hiyouga
c7104f8fab
fix #3022
...
Former-commit-id: 9ddbe2866a
2024-04-02 13:58:39 +08:00
hiyouga
829cf6458a
fix #3083
...
Former-commit-id: 4a6ca621c0
2024-04-01 22:53:52 +08:00
hiyouga
34f1de0574
fix #3077
...
Former-commit-id: aee634cd20
2024-04-01 21:35:18 +08:00
hiyouga
b7468ea0a8
support infer 4bit model on GPUs #3023
...
Former-commit-id: eb259cc573
2024-04-01 17:34:04 +08:00
hiyouga
3cf35e57db
tiny fix
...
Former-commit-id: 27776c3474
2024-03-31 00:10:29 +08:00
marko1616
5721074af1
fix blank line contains whitespace
...
Former-commit-id: d9a5134617
2024-03-30 23:46:55 +08:00
marko1616
67c05c2031
Fix Llama model save for full param train
...
Former-commit-id: eb178eaff3
2024-03-30 23:45:04 +08:00
hiyouga
89c400633a
update trainers
...
Former-commit-id: 8c77b10912
2024-03-28 18:16:27 +08:00
hiyouga
ec94e5e876
fix #2961
...
Former-commit-id: 511f675402
2024-03-26 17:26:14 +08:00
hiyouga
75829c8699
fix #2928
...
Former-commit-id: 7afbc85dae
2024-03-24 00:34:54 +08:00
hiyouga
58aa576ae5
fix #2941
...
Former-commit-id: a1c8c98c5f
2024-03-24 00:28:44 +08:00
hiyouga
7999836fb6
support fsdp + qlora
...
Former-commit-id: 8408225162
2024-03-21 00:36:06 +08:00
hiyouga
cf149bf43c
fix #2346
...
Former-commit-id: 7b8f502901
2024-03-20 17:56:33 +08:00
hiyouga
a5537f3ee8
fix patcher
...
Former-commit-id: 85c376fc1e
2024-03-15 19:18:42 +08:00
S3Studio
46ef7416e6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False.
Former-commit-id: e75407febd
2024-03-15 08:59:13 +08:00
hiyouga
2cf95d4efe
fix export
...
Former-commit-id: 3b4a59bfb1
2024-03-14 18:17:01 +08:00
hiyouga
8b8671817f
improve lora+ impl.
...
Former-commit-id: 72367307df
2024-03-13 23:32:51 +08:00