hiyouga
9aeb88c426
add export_device in webui #3333
...
Former-commit-id: 30ebd3652809d73941e0a5e4a8be11d989faf98d
2024-04-25 19:02:32 +08:00
hiyouga
03f2e3284a
refactor patcher
...
Former-commit-id: 263cfe1294f5c3188f5e8d65791f35ee0d87315a
2024-04-24 03:02:23 +08:00
hiyouga
d2bb1b3a6b
reenable sdpa and fast tok by default
...
Former-commit-id: 9e00902dbedc71d55743d1bf237843506a557891
2024-04-24 02:18:44 +08:00
hiyouga
1d341dcd83
fix #3365
...
Former-commit-id: 415ce41e8fa887e980e5bd575c8e95bd4076b90b
2024-04-21 19:20:18 +08:00
hiyouga
f8e219dc81
fix mod stuff
...
Former-commit-id: cf3988226e6398c67bb2955578e436fc505aa5c5
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
cde9d1b917
Update patcher.py
...
Former-commit-id: 494e6a1e05b38f5ff61d83327303614f53c92e64
2024-04-16 17:29:19 +08:00
Jonery
d4d471450f
Feature BAdam
...
Former-commit-id: d8d2807fbcf587c37f7fd34a23e9397d2775ceed
2024-04-15 23:15:27 +08:00
hoshi-hiyouga
0c80751e87
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
Former-commit-id: 07bbaf5c67d00a152e5304e81b15fd9189e7bb99
2024-04-15 15:38:16 +08:00
hiyouga
9338f878a3
fix #3273
...
Former-commit-id: 3b20c89b342a068356ffc29c3724b645775c65db
2024-04-15 15:32:58 +08:00
liuzc
fde3d91242
fix: mixtral output_router_logits
...
Former-commit-id: ab3171ea97ec968b972287287ef9ee2502c6d37c
2024-04-15 12:11:49 +08:00
hiyouga
566d71b7a9
fix quant infer and qwen2moe
...
Former-commit-id: b75d16767f35c36e2cf2aaab8a3844135085bccf
2024-04-09 17:12:59 +08:00
hiyouga
b12176d818
simplify readme
...
Former-commit-id: 0da6ec2d516326fe9c7583ba71cd1778eb838178
2024-04-02 20:07:43 +08:00
hiyouga
117b67ea30
add moe aux loss control #3085
...
Former-commit-id: c9187ebc944e2de454ace3304b7d28eabb1b1a81
2024-04-02 14:26:31 +08:00
hiyouga
03e20bb5c6
fix #3022
...
Former-commit-id: dac2f617bda9470ac8d85c7e9def09cc04970506
2024-04-02 13:58:39 +08:00
hiyouga
1dc963caa6
fix #3083
...
Former-commit-id: ff9a3f73961a362d0ddc22079f80a85465fffda8
2024-04-01 22:53:52 +08:00
hiyouga
40211db275
fix #3077
...
Former-commit-id: d0340391e8075cff0d84b3ef879c2101b66ca1dc
2024-04-01 21:35:18 +08:00
hiyouga
e7f13098c6
support infer 4bit model on GPUs #3023
...
Former-commit-id: 950a9dab9055839990656b2b40956792b253573d
2024-04-01 17:34:04 +08:00
hiyouga
526111a303
tiny fix
...
Former-commit-id: ba4a9b3c01e2f7467fbc5be268f47c0d003caa65
2024-03-31 00:10:29 +08:00
marko1616
1f617c6e08
fix blank line contains whitespace
...
Former-commit-id: 7bc3bcc64353d5a1d4870c6a9509b64cff710492
2024-03-30 23:46:55 +08:00
marko1616
a6858a36c0
Fix Llama model save for full param train
...
Former-commit-id: ca17b5db4f97c3ec9fe2004877f150e8f51ab4b5
2024-03-30 23:45:04 +08:00
hiyouga
3336422760
fix #2961
...
Former-commit-id: 616917bb3be7f71073b56ad8c7bc4e164b08b9b5
2024-03-26 17:26:14 +08:00
hiyouga
a57d839e1d
fix #2941
...
Former-commit-id: 3775ab52017f0b610ddd8199cccfb8c001eda507
2024-03-24 00:28:44 +08:00
hiyouga
935ee0a023
support fsdp + qlora
...
Former-commit-id: b894bf8e84be689db258021f0638e9ac939abcbc
2024-03-21 00:36:06 +08:00
hiyouga
d8073488be
fix #2346
...
Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993
2024-03-20 17:56:33 +08:00
hiyouga
e194efab10
fix patcher
...
Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5
2024-03-15 19:18:42 +08:00
S3Studio
096869c7b6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479
2024-03-15 08:59:13 +08:00
hiyouga
aabe90343e
fix export
...
Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b
2024-03-14 18:17:01 +08:00
hiyouga
46f99ff277
improve lora+ impl.
...
Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39
2024-03-13 23:32:51 +08:00
hiyouga
9a784fb4f3
fix kv cache
...
Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b
2024-03-13 01:21:50 +08:00
hiyouga
6c1b4aec75
fix #2802
...
Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f
2024-03-12 17:08:34 +08:00
hiyouga
4881f4e631
allow non-packing pretraining
...
Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e
2024-03-09 22:21:46 +08:00
hiyouga
43b2ede0f8
fix #2756 , patch #2746
...
Former-commit-id: 627d1c91e675f1d9ebf47bad123cbbf29821da4d
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
2f095e2017
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 656c653f0c628f9494b4d7ae12e60c8eeec1ea7a
2024-03-09 01:37:00 +08:00
hiyouga
9b97b23ce7
fix aqlm version
...
Former-commit-id: 05673f81f0295c76957f3247c62f95fda322a63e
2024-03-09 00:09:09 +08:00
stephen_zhu
940c00e7ae
update
...
Former-commit-id: 295f9ef2eff2e8b5d7a21d3da8dd3e6eb2a42006
2024-03-08 12:47:44 +08:00
stephen
18cfd5f349
fix ppo runtime error
...
Former-commit-id: 14e2f221e3e720075e59065a3dc42aa4d993a8b6
2024-03-08 11:48:26 +08:00
hiyouga
9a69cadab3
fix #2735
...
Former-commit-id: 416f6333f66b6afd70a3a936d82593efca583235
2024-03-07 16:15:53 +08:00
hiyouga
7578209735
export use balanced gpu
...
Former-commit-id: 710487dc694489bf3dfe54f8d32df80ce46439e4
2024-03-06 16:33:14 +08:00
hiyouga
a10bead9b5
optimize aqlm training
...
Former-commit-id: 8b42660e4039b3d6475f502f397686ba6b140627
2024-03-05 18:35:41 +08:00
hiyouga
3553e301dd
fix dora inference
...
Former-commit-id: 21b3597b0a05169afe51e1609b532787a65ca8ea
2024-03-05 11:51:41 +08:00
hiyouga
2dca53962e
fix export on cpu device
...
Former-commit-id: e4722a9a627ea4e9a1341cc00a3108dd06a6b550
2024-03-04 17:35:09 +08:00
hiyouga
59a9a5994e
fix #2649
...
Former-commit-id: 1c850de660c671d92f0bc63f230d338b60b7c0bd
2024-03-01 13:02:41 +08:00
hiyouga
88fddb879d
fix #2642
...
Former-commit-id: d8435e7f1850532310e1bee069b45f38cd666e48
2024-02-29 18:32:54 +08:00
hiyouga
30855b924a
tiny fix
...
Former-commit-id: 3b6e1132c4d203e6d5376cf97e81cc160697c822
2024-02-29 17:28:50 +08:00
hiyouga
544e7a491b
release v0.5.3
...
Former-commit-id: f6bc89581b3cd129448da2defc23848de6f494ed
2024-02-29 00:34:19 +08:00
hiyouga
b392e6cfb9
support DoRA, AWQ, AQLM #2512
...
Former-commit-id: 6614cc1f08aa944db083e27e451bbdd733f7dd97
2024-02-28 19:53:28 +08:00
hiyouga
596b6828cb
support llama pro #2338 , add rslora
...
Former-commit-id: 40d659b7f30dd5a004703c176ec1f22dc864e505
2024-02-15 02:27:36 +08:00
hiyouga
f67f781fed
update gc kwargs
...
Former-commit-id: 0cb81c156bc8c21a4bbdd3289a491f78dfcaf730
2024-02-07 00:38:24 +08:00
hiyouga
b564b97b7e
fix #2438
...
Former-commit-id: 412d856eeada2abcea598fac0a8d35ae90cc9c01
2024-02-06 15:23:08 +08:00
hiyouga
5fa52e87cb
fix #2376
...
Former-commit-id: 8e2cfa7cca21b7fd4538d72114e36f704bcc82fe
2024-02-03 23:14:31 +08:00