S3Studio
bada9f71a7
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the Qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.
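A minimal sketch of the kind of config patch described above, not the actual change made in this commit. The helper name patch_use_flash_attn and its flash_attn_requested parameter are hypothetical, and a SimpleNamespace stands in for the loaded Qwen config; only the use_flash_attn attribute and its "auto" default come from the message above.

```python
from types import SimpleNamespace


def patch_use_flash_attn(config, flash_attn_requested: bool):
    """Turn off flash attention unless the user explicitly asked for it.

    Hypothetical helper: `config` is any object exposing a `use_flash_attn`
    attribute, and `flash_attn_requested` mirrors the --flash_attn flag.
    """
    # Only override the "auto" default; an explicit True/False is left alone.
    if not flash_attn_requested and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config


# Stand-in for a Qwen config whose default is "auto" (assumption for illustration).
config = SimpleNamespace(use_flash_attn="auto")
patch_use_flash_attn(config, flash_attn_requested=False)
print(config.use_flash_attn)
```

With the flag unset, "auto" is replaced by False, so the model would not attempt to use flash-attn on a GPU that does not support it; with the flag set, the config is left untouched.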
Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479
2024-03-15 08:59:13 +08:00
hiyouga
b83aedd040
tiny fix
...
Former-commit-id: bf8123669be334338b4268d0a8f7703ff2cf6255
2024-03-14 21:19:06 +08:00
hiyouga
0d245f67ea
fix export
...
Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b
2024-03-14 18:17:01 +08:00
hiyouga
e49a9aa457
fix bug
...
Former-commit-id: 38c618b797ec219c2c45de960c9cbe50ec524c94
2024-03-13 23:55:31 +08:00
hiyouga
36bf98aea8
fix bug
...
Former-commit-id: 47ee0276830adbed65bc111d5a83049e77ad360a
2024-03-13 23:43:42 +08:00
hiyouga
4ef67ed4dd
improve lora+ impl.
...
Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39
2024-03-13 23:32:51 +08:00
齐保元
3c15d4cbec
[FEATURE]: ADD LORA+ ALGORITHM
...
Former-commit-id: c35b3c3b1e27171f8a703f88ede1dc8a84c80a56
2024-03-13 19:43:27 +08:00
hiyouga
dd374c1bb8
fix #2817
...
Former-commit-id: f1c8b8127b3c1ac095176015af5ec92d37a11efe
2024-03-13 12:42:03 +08:00
hiyouga
d233c4a71b
fix #2802
...
Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690
2024-03-13 12:33:45 +08:00
hiyouga
33bef29828
fix kv cache
...
Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b
2024-03-13 01:21:50 +08:00
hiyouga
d1294ffe2f
support QDoRA
...
Former-commit-id: d8ad1c5ef08e733e52084de271aad762b1613129
2024-03-12 22:12:42 +08:00
hiyouga
7b72952adc
patch for gemma cpt
...
Former-commit-id: fc0b19c62f52a90d78b63761dda3d8970a42f2da
2024-03-12 21:21:54 +08:00
hiyouga
0016732c21
fix plot issues
...
Former-commit-id: 01ae196b4916433da9aeec9c0b5c660c6b34464c
2024-03-12 18:41:35 +08:00
hiyouga
28acb02e80
support olmo
...
Former-commit-id: 2719510e8c6baa591c74458b773e4e47215e6052
2024-03-12 18:30:38 +08:00
hiyouga
89770c5a8e
fix #2802
...
Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f
2024-03-12 17:08:34 +08:00
hiyouga
5322aa2019
fix #2782 #2798
...
Former-commit-id: eb3ab610610a0964bc8a1c9fa015805353f04c31
2024-03-12 15:53:29 +08:00
hiyouga
a45bc9a96b
fix #2775
...
Former-commit-id: a5c7feb3e8089f4deff760b00a9f84425957c419
2024-03-11 00:42:54 +08:00
hiyouga
900e8edd79
tiny fix
...
Former-commit-id: 1d22c87db2449c7d9915842b70fbd59ce9c2dd70
2024-03-11 00:17:18 +08:00
hiyouga
dbaea2ba2f
update parser
...
Former-commit-id: d98258aa08d93494ad50d7786064e7fda15f6ca9
2024-03-10 13:35:20 +08:00
hiyouga
0aacc41252
support layerwise galore
...
Former-commit-id: d43a4da0947897d0be3f62fad3107754d4c89f2b
2024-03-10 00:24:11 +08:00
hiyouga
7538d8e726
fix #2732
...
Former-commit-id: bc39ad1d102b91d5417daa38b8a581e1e1ab2af9
2024-03-09 22:37:16 +08:00
hiyouga
56565bdbd4
allow non-packing pretraining
...
Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e
2024-03-09 22:21:46 +08:00
hiyouga
aa48a7d9b2
fix #2766
...
Former-commit-id: a8cd556230c1d0bc4e090acc2276c035910ce6f6
2024-03-09 21:35:24 +08:00
hiyouga
cad00482d1
use default arg for freeze tuning
...
Former-commit-id: a38fd7c8b39cb59fb61c26fdf80aaa6f2d0623b9
2024-03-09 06:08:48 +08:00
hiyouga
72608bbcb3
update hardware requirements
...
Former-commit-id: 604b3d10fc1448f702943114b66b97bded21e080
2024-03-09 03:58:18 +08:00
hiyouga
e16912b0c0
fix #2756, patch #2746
...
Former-commit-id: 627d1c91e675f1d9ebf47bad123cbbf29821da4d
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
5469111c65
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 656c653f0c628f9494b4d7ae12e60c8eeec1ea7a
2024-03-09 01:37:00 +08:00
hiyouga
1dd3f17f79
fix aqlm version
...
Former-commit-id: 05673f81f0295c76957f3247c62f95fda322a63e
2024-03-09 00:09:09 +08:00
stephen_zhu
9e8fe6403d
update
...
Former-commit-id: 295f9ef2eff2e8b5d7a21d3da8dd3e6eb2a42006
2024-03-08 12:47:44 +08:00
stephen
eb1ad9f161
fix ppo runtime error
...
Former-commit-id: 14e2f221e3e720075e59065a3dc42aa4d993a8b6
2024-03-08 11:48:26 +08:00
hiyouga
a44beb2b20
fix chat engine, update webui
...
Former-commit-id: 8b32dddd7d883bae07735796a517927c79d1c33b
2024-03-08 03:01:53 +08:00
hiyouga
a0b2d41dc5
update galore args
...
Former-commit-id: c7479a7976f773feb36aab4fdb0500be53d83b6a
2024-03-08 01:17:32 +08:00
hiyouga
b5187e4104
fix galore
...
Former-commit-id: 62a3ceeef8f60caef43ccc7f971a0c9184e21296
2024-03-08 00:44:51 +08:00
hiyouga
6e8df5733a
add Yi-9B model
...
Former-commit-id: bfcb0245b832242eefb84de6f70bd75544f3ceb7
2024-03-07 23:11:57 +08:00
hiyouga
3f56782ffa
support galore
...
Former-commit-id: b67a4a46a88d83bb2a3459b3317b66cda15e0171
2024-03-07 22:41:36 +08:00
hiyouga
ddabd699ca
support vllm
...
Former-commit-id: 889f6e910e654d8ec3922c2185042d737ffbf1c3
2024-03-07 20:26:31 +08:00
hiyouga
a02d518edc
fix #2735
...
Former-commit-id: 416f6333f66b6afd70a3a936d82593efca583235
2024-03-07 16:15:53 +08:00
hoshi-hiyouga
6772cfc6b8
Merge pull request #2730 from cx2333-gt/main
...
fix flash_attn in train_web
Former-commit-id: eff0b774fc8e1a5a07a2554d611cb85bef439dec
2024-03-07 14:37:18 +08:00
cx2333
bb4cffc869
revert choice name
...
Former-commit-id: 7832e68072219c7d1f562aee868812a4d655f4e0
2024-03-07 14:28:55 +08:00
hiyouga
1271263fdb
fix chatglm3 template
...
Former-commit-id: 9be0aa70fdd2e9ec208aa1850ace5c287efc8c3a
2024-03-07 14:26:16 +08:00
cx2333
de8d110ab8
fix flash_attn in train_web
...
Former-commit-id: 5f340e362b0e91fec76c19c77c5705bba1db481a
2024-03-07 10:13:55 +08:00
hiyouga
6bec66c192
tiny fix
...
Former-commit-id: c3145afa4164dd28888f17599a154f7dddbe9326
2024-03-06 17:25:08 +08:00
hiyouga
e7bea6981e
export use balanced gpu
...
Former-commit-id: 710487dc694489bf3dfe54f8d32df80ce46439e4
2024-03-06 16:33:14 +08:00
hiyouga
83cc346f18
fix add tokens
...
Former-commit-id: ff5353681a87d033903bf8cf6133c6bdb3fa9e5a
2024-03-06 15:04:02 +08:00
hiyouga
4aa6db78fb
fix version checking
...
Former-commit-id: 5780da8d640609cca388f55983d0251e5547209a
2024-03-06 14:51:51 +08:00
hiyouga
bfd0ed4afc
fix arg dtype
...
Former-commit-id: 999ae05655815ac04ababddae55d9343f5d39f84
2024-03-05 20:53:30 +08:00
hiyouga
c60b53a164
improve aqlm optim
...
Former-commit-id: 81be999b407e988c2f42764d827ac859d079ed3e
2024-03-05 20:49:50 +08:00
hiyouga
67bb861040
optimize aqlm training
...
Former-commit-id: 8b42660e4039b3d6475f502f397686ba6b140627
2024-03-05 18:35:41 +08:00
hiyouga
0604a84208
fix dora inference
...
Former-commit-id: 21b3597b0a05169afe51e1609b532787a65ca8ea
2024-03-05 11:51:41 +08:00
hiyouga
6123a4e713
fix export model
...
Former-commit-id: 7ba2f7bf8da3c559e05d8dde20e93cd1d3d4e8ef
2024-03-05 11:05:41 +08:00