hiyouga
b5c5283dd6
add codegemma
...
Former-commit-id: 9324176525c2eda22962b0ca1895009b6237e6e3
2024-04-16 00:11:15 +08:00
hiyouga
b638c65519
support cohere commandR #3184
...
Former-commit-id: e077c36872740f6b2ac255aee9da6c4c70f28977
2024-04-15 23:26:42 +08:00
hiyouga
276f2cb24e
update examples
...
Former-commit-id: 369294b31c8a03a1cafcee83eb31a817007d3c49
2024-04-15 22:14:34 +08:00
hoshi-hiyouga
0c80751e87
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
Former-commit-id: 07bbaf5c67d00a152e5304e81b15fd9189e7bb99
2024-04-15 15:38:16 +08:00
hiyouga
9338f878a3
fix #3273
...
Former-commit-id: 3b20c89b342a068356ffc29c3724b645775c65db
2024-04-15 15:32:58 +08:00
liuzc
fde3d91242
fix: mixtral output_router_logits
...
Former-commit-id: ab3171ea97ec968b972287287ef9ee2502c6d37c
2024-04-15 12:11:49 +08:00
hiyouga
7468f2535c
release v0.6.2
...
Former-commit-id: f92ad0a62d957b595f6a76a5403216b163eb3d17
2024-04-11 20:08:51 +08:00
hoshi-hiyouga
7856f98965
Update adapter.py
...
Former-commit-id: 720fde3683529ed7e08ac27c7c4598c6bdc30d44
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
e25ddef08c
Update adapter.py
...
Former-commit-id: a84b8d17dbf221259212e81931d80bcdd6284ad7
2024-04-10 00:57:30 +08:00
Erich Schubert
95a4589bbf
Pass additional_target to unsloth
...
Fixes #3200
Former-commit-id: f8f87f5b0549cba6a011749c42064047f82ba577
2024-04-09 17:53:40 +02:00
hiyouga
566d71b7a9
fix quant infer and qwen2moe
...
Former-commit-id: b75d16767f35c36e2cf2aaab8a3844135085bccf
2024-04-09 17:12:59 +08:00
hiyouga
1348f7d860
fix resize vocab at inference #3022
...
Former-commit-id: c243720b89eec0af2872fa3c7980a0026d893f4d
2024-04-03 18:14:24 +08:00
hiyouga
b12176d818
simplify readme
...
Former-commit-id: 0da6ec2d516326fe9c7583ba71cd1778eb838178
2024-04-02 20:07:43 +08:00
hiyouga
117b67ea30
add moe aux loss control #3085
...
Former-commit-id: c9187ebc944e2de454ace3304b7d28eabb1b1a81
2024-04-02 14:26:31 +08:00
hiyouga
03e20bb5c6
fix #3022
...
Former-commit-id: dac2f617bda9470ac8d85c7e9def09cc04970506
2024-04-02 13:58:39 +08:00
hiyouga
1dc963caa6
fix #3083
...
Former-commit-id: ff9a3f73961a362d0ddc22079f80a85465fffda8
2024-04-01 22:53:52 +08:00
hiyouga
40211db275
fix #3077
...
Former-commit-id: d0340391e8075cff0d84b3ef879c2101b66ca1dc
2024-04-01 21:35:18 +08:00
hiyouga
e7f13098c6
support infer 4bit model on GPUs #3023
...
Former-commit-id: 950a9dab9055839990656b2b40956792b253573d
2024-04-01 17:34:04 +08:00
hiyouga
526111a303
tiny fix
...
Former-commit-id: ba4a9b3c01e2f7467fbc5be268f47c0d003caa65
2024-03-31 00:10:29 +08:00
marko1616
1f617c6e08
fix blank line contains whitespace
...
Former-commit-id: 7bc3bcc64353d5a1d4870c6a9509b64cff710492
2024-03-30 23:46:55 +08:00
marko1616
a6858a36c0
Fix Llama model save for full param train
...
Former-commit-id: ca17b5db4f97c3ec9fe2004877f150e8f51ab4b5
2024-03-30 23:45:04 +08:00
hiyouga
59e6ebf039
update trainers
...
Former-commit-id: d0dd6eefed0b86895ed00a7cafb331e5193db645
2024-03-28 18:16:27 +08:00
hiyouga
3336422760
fix #2961
...
Former-commit-id: 616917bb3be7f71073b56ad8c7bc4e164b08b9b5
2024-03-26 17:26:14 +08:00
hiyouga
c548ad5e69
fix #2928
...
Former-commit-id: 9558ee87bc7260a6596385aaa375df544862bfa9
2024-03-24 00:34:54 +08:00
hiyouga
a57d839e1d
fix #2941
...
Former-commit-id: 3775ab52017f0b610ddd8199cccfb8c001eda507
2024-03-24 00:28:44 +08:00
hiyouga
935ee0a023
support fsdp + qlora
...
Former-commit-id: b894bf8e84be689db258021f0638e9ac939abcbc
2024-03-21 00:36:06 +08:00
hiyouga
d8073488be
fix #2346
...
Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993
2024-03-20 17:56:33 +08:00
hiyouga
e194efab10
fix patcher
...
Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5
2024-03-15 19:18:42 +08:00
S3Studio
096869c7b6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479
2024-03-15 08:59:13 +08:00
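The config patch described in the S3Studio commit above could look roughly like the sketch below. It is only an illustration of the idea, not the code from that commit: the function name `patch_qwen_flash_attn`, the `flash_attn_requested` parameter, and the assumption that the config exposes `model_type == "qwen"` and a `use_flash_attn` attribute are all hypothetical stand-ins.

```python
from types import SimpleNamespace


def patch_qwen_flash_attn(config, flash_attn_requested):
    """Hypothetical helper: force use_flash_attn off for qwen configs
    when the user did not pass --flash_attn.

    Assumes (not confirmed by the commit) that the config object has
    `model_type` and `use_flash_attn` attributes.
    """
    # Only qwen-style configs are touched; everything else passes through.
    if getattr(config, "model_type", None) != "qwen":
        return config

    # "auto" would let the model pick flash-attn whenever the library is
    # installed, which fails on pre-Ampere GPUs, so replace it with False
    # unless flash attention was explicitly requested.
    if not flash_attn_requested and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config


# Stand-in config object for illustration; a real run would use the
# config loaded for the model.
cfg = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(cfg, flash_attn_requested=False)
```

With the flag unset, the "auto" default is replaced by False; with the flag set, or for a non-qwen model, the config is left unchanged.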
hiyouga
aabe90343e
fix export
...
Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b
2024-03-14 18:17:01 +08:00
hiyouga
46f99ff277
improve lora+ impl.
...
Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39
2024-03-13 23:32:51 +08:00
hiyouga
6a4e4b9c5b
fix #2802
...
Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690
2024-03-13 12:33:45 +08:00
hiyouga
9a784fb4f3
fix kv cache
...
Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b
2024-03-13 01:21:50 +08:00
hiyouga
6c1b4aec75
fix #2802
...
Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f
2024-03-12 17:08:34 +08:00
hiyouga
c635bbe465
fix #2732
...
Former-commit-id: bc39ad1d102b91d5417daa38b8a581e1e1ab2af9
2024-03-09 22:37:16 +08:00
hiyouga
4881f4e631
allow non-packing pretraining
...
Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e
2024-03-09 22:21:46 +08:00
hiyouga
43b2ede0f8
fix #2756 , patch #2746
...
Former-commit-id: 627d1c91e675f1d9ebf47bad123cbbf29821da4d
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
2f095e2017
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 656c653f0c628f9494b4d7ae12e60c8eeec1ea7a
2024-03-09 01:37:00 +08:00
hiyouga
9b97b23ce7
fix aqlm version
...
Former-commit-id: 05673f81f0295c76957f3247c62f95fda322a63e
2024-03-09 00:09:09 +08:00
stephen_zhu
940c00e7ae
update
...
Former-commit-id: 295f9ef2eff2e8b5d7a21d3da8dd3e6eb2a42006
2024-03-08 12:47:44 +08:00
stephen
18cfd5f349
fix ppo runtime error
...
Former-commit-id: 14e2f221e3e720075e59065a3dc42aa4d993a8b6
2024-03-08 11:48:26 +08:00
hiyouga
e416cecf62
fix galore
...
Former-commit-id: 62a3ceeef8f60caef43ccc7f971a0c9184e21296
2024-03-08 00:44:51 +08:00
hiyouga
056d2d956a
support vllm
...
Former-commit-id: 889f6e910e654d8ec3922c2185042d737ffbf1c3
2024-03-07 20:26:31 +08:00
hiyouga
9a69cadab3
fix #2735
...
Former-commit-id: 416f6333f66b6afd70a3a936d82593efca583235
2024-03-07 16:15:53 +08:00
hiyouga
7578209735
export use balanced gpu
...
Former-commit-id: 710487dc694489bf3dfe54f8d32df80ce46439e4
2024-03-06 16:33:14 +08:00
hiyouga
73d9dfc7ab
fix version checking
...
Former-commit-id: 5780da8d640609cca388f55983d0251e5547209a
2024-03-06 14:51:51 +08:00
hiyouga
46ee267cfc
improve aqlm optim
...
Former-commit-id: 81be999b407e988c2f42764d827ac859d079ed3e
2024-03-05 20:49:50 +08:00
hiyouga
a10bead9b5
optimize aqlm training
...
Former-commit-id: 8b42660e4039b3d6475f502f397686ba6b140627
2024-03-05 18:35:41 +08:00
hiyouga
3553e301dd
fix dora inference
...
Former-commit-id: 21b3597b0a05169afe51e1609b532787a65ca8ea
2024-03-05 11:51:41 +08:00
hiyouga
2dca53962e
fix export on cpu device
...
Former-commit-id: e4722a9a627ea4e9a1341cc00a3108dd06a6b550
2024-03-04 17:35:09 +08:00