hiyouga
79666c298d
fix #3365
...
Former-commit-id: a1d31ffc8cb7a6a477704efe779d485d83b8b9fb
2024-04-21 19:20:18 +08:00
hiyouga
ec81d45d27
fix mod stuff
...
Former-commit-id: f58425ab45727f7859583d4b9fda776715e27ff6
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
48fb0be1b9
Update patcher.py
...
Former-commit-id: a950f3b81de701f5f23ce3efa60ff0382bb40dfe
2024-04-16 17:29:19 +08:00
Jonery
025f329445
Feature BAdam
...
Former-commit-id: 06c8908d3fe48907ddb585c5fa15677fc5416f94
2024-04-15 23:15:27 +08:00
hoshi-hiyouga
1bdf7e4b9d
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
Former-commit-id: 0e0942d388bdb0122001a7f8e081315059d5d327
2024-04-15 15:38:16 +08:00
hiyouga
ceccad3419
fix #3273
...
Former-commit-id: efc345c4b0095ec959ea23bbe54c344278780cbe
2024-04-15 15:32:58 +08:00
liuzc
11f4afc5ad
fix: mixtral output_router_logits
...
Former-commit-id: 9f4fe623866b10b30c6418dee116b36671274f9f
2024-04-15 12:11:49 +08:00
hiyouga
0e08c209c4
fix quant infer and qwen2moe
...
Former-commit-id: 7f6c2486b83e1d2c96a2314bfa8e1519ca5f574e
2024-04-09 17:12:59 +08:00
hiyouga
bf5ffeeae0
simplify readme
...
Former-commit-id: 92dab8a90bdd82a72a06559943467b56dde12c71
2024-04-02 20:07:43 +08:00
hiyouga
f4be51f356
add moe aux loss control #3085
...
Former-commit-id: b267aeb53fc49d2eeb0f3fc5ebe55e643f5db377
2024-04-02 14:26:31 +08:00
hiyouga
c7104f8fab
fix #3022
...
Former-commit-id: 9ddbe2866a4a4433d7635659a5635d16c59800b1
2024-04-02 13:58:39 +08:00
hiyouga
829cf6458a
fix #3083
...
Former-commit-id: 4a6ca621c09d179561acc5957c8c911a4e44184c
2024-04-01 22:53:52 +08:00
hiyouga
34f1de0574
fix #3077
...
Former-commit-id: aee634cd20e6dfdfbe2fbb47ae57f62b2da2bf9a
2024-04-01 21:35:18 +08:00
hiyouga
b7468ea0a8
support infer 4bit model on GPUs #3023
...
Former-commit-id: eb259cc5738dfb383e4cc5d32579501c580e11b1
2024-04-01 17:34:04 +08:00
hiyouga
3cf35e57db
tiny fix
...
Former-commit-id: 27776c34741ca0c58ed793bcdf1acd5e4a81fb39
2024-03-31 00:10:29 +08:00
marko1616
5721074af1
fix blank line contains whitespace
...
Former-commit-id: d9a5134617d494ef13ba73f9c540123e89a8c29c
2024-03-30 23:46:55 +08:00
marko1616
67c05c2031
Fix Llama model save for full param train
...
Former-commit-id: eb178eaff390a1dc342cc35ab8c7820d654f3717
2024-03-30 23:45:04 +08:00
hiyouga
ec94e5e876
fix #2961
...
Former-commit-id: 511f6754026fbbf48bd481018015338a6a3ad92f
2024-03-26 17:26:14 +08:00
hiyouga
58aa576ae5
fix #2941
...
Former-commit-id: a1c8c98c5fecfc0dd0ed1be33ee8dd2ade05b708
2024-03-24 00:28:44 +08:00
hiyouga
7999836fb6
support fsdp + qlora
...
Former-commit-id: 84082251621e1470b3b5406a56d0a967780a1804
2024-03-21 00:36:06 +08:00
hiyouga
cf149bf43c
fix #2346
...
Former-commit-id: 7b8f5029018f0481f7da83cc5ee4408d95c9beb2
2024-03-20 17:56:33 +08:00
hiyouga
a5537f3ee8
fix patcher
...
Former-commit-id: 85c376fc1e0bcc854ed6e70e6455a0b00b341655
2024-03-15 19:18:42 +08:00
S3Studio
46ef7416e6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.
Former-commit-id: e75407febdec086f2bdca723a7f69a92b3b1d63f
2024-03-15 08:59:13 +08:00
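The config patch described in the S3Studio commit above (46ef7416e6) might look roughly like the sketch below. It is not the code from that commit. The helper name patch_qwen_flash_attn, the flash_attn_requested parameter, and the model_type == "qwen" check are all assumptions made for illustration; only the use_flash_attn field and its "auto" default come from the commit message.

```python
from types import SimpleNamespace


def patch_qwen_flash_attn(config, flash_attn_requested: bool) -> None:
    """Hypothetical helper: disable Qwen's automatic flash-attn selection.

    `config` is any object with attribute access; a real model config
    would be passed in practice (assumption).
    """
    # Assumption: Qwen configs identify themselves via model_type == "qwen".
    if getattr(config, "model_type", None) != "qwen":
        return
    # Only override the "auto" default, and only when the user did not
    # ask for flash attention explicitly.
    if not flash_attn_requested and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False


# Stand-in config object used purely for demonstration.
config = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(config, flash_attn_requested=False)
print(config.use_flash_attn)
```

Leaving an explicitly set value untouched and skipping non-qwen configs keeps the patch narrow: it only changes the one default that triggers the Ampere-only error on older GPUs.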
hiyouga
2cf95d4efe
fix export
...
Former-commit-id: 3b4a59bfb1866a270b9934a4a2303197ffdab531
2024-03-14 18:17:01 +08:00
hiyouga
8b8671817f
improve lora+ impl.
...
Former-commit-id: 72367307dfadf936fb989ebe8bc9f0ff229fb933
2024-03-13 23:32:51 +08:00
hiyouga
a74426df0f
fix kv cache
...
Former-commit-id: 96ce76cd2753bc91c781ad13aa8f7a972abe815a
2024-03-13 01:21:50 +08:00
hiyouga
0b7e870b07
fix #2802
...
Former-commit-id: 8d8956bad542c0e1c0f7edbf4ffc22bb0f8788ae
2024-03-12 17:08:34 +08:00
hiyouga
868444e124
allow non-packing pretraining
...
Former-commit-id: bdb496644ce2c18806fc4fdae1fedcb3e5b5f808
2024-03-09 22:21:46 +08:00
hiyouga
c561b268ef
fix #2756, patch #2746
...
Former-commit-id: e8dd38b7fdf8e172745d2538eb103895f2839c38
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
36d65289d0
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 516d0ddc666c179616a2a610b1353728db57391e
2024-03-09 01:37:00 +08:00
hiyouga
398c261c7c
fix aqlm version
...
Former-commit-id: 10be2f0eccc3963a985afcd24e5b8b8fc638b1c3
2024-03-09 00:09:09 +08:00
stephen_zhu
c69b9fbe58
update
...
Former-commit-id: aa71571b773c5dc527b17219ec87828e4455b330
2024-03-08 12:47:44 +08:00
stephen
495b858606
fix ppo runtime error
...
Former-commit-id: cdb7f82869b07d9d5d31b7b2aaf6b033bd00e32e
2024-03-08 11:48:26 +08:00
hiyouga
37e40563f1
fix #2735
...
Former-commit-id: f74f804a715dfb16bf24a056bc95db6b102f9ed7
2024-03-07 16:15:53 +08:00
hiyouga
8b6c178249
export use balanced gpu
...
Former-commit-id: 3e84f430b14a94e68f5815d8e412f0d74d28a04c
2024-03-06 16:33:14 +08:00
hiyouga
c776cdfc3e
optimize aqlm training
...
Former-commit-id: d3d3dac7070eb9055bcdc91eaf53f5b3741c0bda
2024-03-05 18:35:41 +08:00
hiyouga
0f2250b831
fix dora inference
...
Former-commit-id: ddf352f861e04e813cb8adeb4513964b4945081a
2024-03-05 11:51:41 +08:00
hiyouga
a62d17d009
fix export on cpu device
...
Former-commit-id: cda2ff87272797a062c7addb1bf840ac46208dfd
2024-03-04 17:35:09 +08:00
hiyouga
d1e6e02461
fix #2649
...
Former-commit-id: 4e5fae2fac85227641bd16159cf296a32e0b18b4
2024-03-01 13:02:41 +08:00
hiyouga
3787d13816
fix #2642
...
Former-commit-id: c0be617195f43d972681dd59727857b1247eeb7e
2024-02-29 18:32:54 +08:00
hiyouga
1853b5c172
tiny fix
...
Former-commit-id: 4a871e80e205466262534cdc710b0495954b153e
2024-02-29 17:28:50 +08:00
hiyouga
8e7d50dae4
release v0.5.3
...
Former-commit-id: fa5ab21ebc0ab738178c0c57578db3bda995ae06
2024-02-29 00:34:19 +08:00
hiyouga
5abbca70d3
support DoRA, AWQ, AQLM #2512
...
Former-commit-id: cfefacaa37453a15c55866d019887f24e886a577
2024-02-28 19:53:28 +08:00
hiyouga
96265ec154
support llama pro #2338, add rslora
...
Former-commit-id: 7924ffc55d98e33bfbfbca303e46c8f476435673
2024-02-15 02:27:36 +08:00
hiyouga
bbe5ff0570
update gc kwargs
...
Former-commit-id: 0ae9a16b9d13bc1093662aa0b9bd990400ec2646
2024-02-07 00:38:24 +08:00
hiyouga
caeffc780d
fix #2438
...
Former-commit-id: ebf31b62eb1b75399cff7c7542c45ac72f6f41dd
2024-02-06 15:23:08 +08:00
hiyouga
a9e58740f5
fix #2376
...
Former-commit-id: 4ecadc35122340b3e520804270c1c1d16c696830
2024-02-03 23:14:31 +08:00
hiyouga
7beeae2209
fix autoset attn impl, update data readme
...
Former-commit-id: 521ad765521bb65aff5a29a8125a2b26ef00bff4
2024-01-31 11:58:07 +08:00
hiyouga
b8a827faeb
fix #2320
...
Former-commit-id: 2bc30763e9a40a82484c27b9a472425fdb9b3bd8
2024-01-24 16:19:18 +08:00
ldwang
323ec3f89f
Add patch_mixtral_replace_moe_impl for full training of Mixtral using DeepSpeed ZeRO-3.
...
Signed-off-by: ldwang <ftgreat@gmail.com>
Former-commit-id: c284665425e8eefcea2d0dd1c835883e7ce18c97
2024-01-24 15:25:31 +08:00