SirlyDreamer
6fc2d7e063
Follow HF_ENDPOINT environment variable
...
Former-commit-id: 22b36a3cfd2909cb624b1bb7385558eda504defe
2024-03-20 08:31:30 +00:00
khazic
e93c7cdb80
Updated README with new information
...
Former-commit-id: b12f12039ce221decf09a25ec9d64e385d9497c7
2024-03-20 14:38:08 +08:00
khazic
c32d6c8250
Updated README with new information
...
Former-commit-id: 90a81c2e52bd44beb3b7feb5d2517b073f7f6ef9
2024-03-20 14:21:16 +08:00
刘一博
757158da63
Updated README with new information
...
Former-commit-id: fddbc29ca1bd9b13372087e6a349f21240abc013
2024-03-20 14:11:28 +08:00
hiyouga
ffdacaa618
fix packages
...
Former-commit-id: 2f9f334a123d43267bfb3dd26aaa1ad285ffe7a5
2024-03-17 22:32:03 +08:00
hiyouga
e194efab10
fix patcher
...
Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
772fc2eac7
Merge pull request #2849 from S3Studio/DockerizeSupport
...
Improve Dockerize support
Former-commit-id: b63cba317266f5ba217de54fda77ec26a4df344d
2024-03-15 19:16:02 +08:00
hiyouga
ed020579dc
fix export
...
Former-commit-id: 4e996f194406d7eb27b2bde290a12c82c41219d0
2024-03-15 15:06:30 +08:00
S3Studio
096869c7b6
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479
2024-03-15 08:59:13 +08:00
S3Studio
c6873211e9
improve Docker build and runtime parameters
...
Modify installation method of extra python library.
Utilize shared memory of the host machine to increase training performance.
Former-commit-id: 97f9901c2f5c29a6ab517a1f8fa028b8e89edf4e
2024-03-15 08:57:46 +08:00
hiyouga
623ee1bd88
tiny fix
...
Former-commit-id: bf8123669be334338b4268d0a8f7703ff2cf6255
2024-03-14 21:19:06 +08:00
hiyouga
aabe90343e
fix export
...
Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b
2024-03-14 18:17:01 +08:00
hiyouga
764cfb506d
fix bug
...
Former-commit-id: 38c618b797ec219c2c45de960c9cbe50ec524c94
2024-03-13 23:55:31 +08:00
hiyouga
249ad56075
fix bug
...
Former-commit-id: 47ee0276830adbed65bc111d5a83049e77ad360a
2024-03-13 23:43:42 +08:00
hiyouga
46f99ff277
improve lora+ impl.
...
Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
73f4513c84
Merge pull request #2830 from qibaoyuan/lora_plus
...
[FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: 456f2aed5811b9f296acd371a1f706daeb37e12a
2024-03-13 20:15:46 +08:00
齐保元
3c91e86268
[FEATURE]: ADD LORA+ ALGORITHM
...
Former-commit-id: c35b3c3b1e27171f8a703f88ede1dc8a84c80a56
2024-03-13 19:43:27 +08:00
hiyouga
42473ec150
fix #2817
...
Former-commit-id: f1c8b8127b3c1ac095176015af5ec92d37a11efe
2024-03-13 12:42:03 +08:00
hiyouga
6a4e4b9c5b
fix #2802
...
Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690
2024-03-13 12:33:45 +08:00
hiyouga
9a784fb4f3
fix kv cache
...
Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b
2024-03-13 01:21:50 +08:00
hiyouga
43fd80a1aa
support QDoRA
...
Former-commit-id: d8ad1c5ef08e733e52084de271aad762b1613129
2024-03-12 22:12:42 +08:00
hiyouga
e6ab1a57ea
patch for gemma cpt
...
Former-commit-id: fc0b19c62f52a90d78b63761dda3d8970a42f2da
2024-03-12 21:21:54 +08:00
hiyouga
282edb9161
fix plot issues
...
Former-commit-id: 01ae196b4916433da9aeec9c0b5c660c6b34464c
2024-03-12 18:41:35 +08:00
hiyouga
dff77004f2
support olmo
...
Former-commit-id: 2719510e8c6baa591c74458b773e4e47215e6052
2024-03-12 18:30:38 +08:00
hiyouga
6c1b4aec75
fix #2802
...
Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f
2024-03-12 17:08:34 +08:00
hiyouga
7814db1b42
fix #2803
...
Former-commit-id: d60498cba1ed124e8a678ce7775d55a018f99537
2024-03-12 16:57:39 +08:00
hiyouga
c9ed3fc3a4
fix #2782 #2798
...
Former-commit-id: eb3ab610610a0964bc8a1c9fa015805353f04c31
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
9ee416a8fc
Merge pull request #2743 from S3Studio/DockerizeSupport
...
Add dockerize support
Former-commit-id: 30751a7b9218770cc2bc6cae857a28950bffbb6c
2024-03-12 00:05:49 +08:00
hiyouga
4f9a47c026
fix #2775
...
Former-commit-id: a5c7feb3e8089f4deff760b00a9f84425957c419
2024-03-11 00:42:54 +08:00
hiyouga
3fcb1c6d09
tiny fix
...
Former-commit-id: 1d22c87db2449c7d9915842b70fbd59ce9c2dd70
2024-03-11 00:17:18 +08:00
hiyouga
7c492864e9
update parser
...
Former-commit-id: d98258aa08d93494ad50d7786064e7fda15f6ca9
2024-03-10 13:35:20 +08:00
hiyouga
7ff8a064f3
support layerwise galore
...
Former-commit-id: d43a4da0947897d0be3f62fad3107754d4c89f2b
2024-03-10 00:24:11 +08:00
hiyouga
c635bbe465
fix #2732
...
Former-commit-id: bc39ad1d102b91d5417daa38b8a581e1e1ab2af9
2024-03-09 22:37:16 +08:00
hiyouga
4881f4e631
allow non-packing pretraining
...
Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e
2024-03-09 22:21:46 +08:00
hiyouga
c631799f5d
fix #2766
...
Former-commit-id: a8cd556230c1d0bc4e090acc2276c035910ce6f6
2024-03-09 21:35:24 +08:00
hiyouga
48846676d8
use default arg for freeze tuning
...
Former-commit-id: a38fd7c8b39cb59fb61c26fdf80aaa6f2d0623b9
2024-03-09 06:08:48 +08:00
hiyouga
f37d481c5d
add GaLore results
...
Former-commit-id: ac05b9bba62924693bdede85917d21b844849b8c
2024-03-09 04:11:55 +08:00
hiyouga
5d7d8bd55c
update hardware requirements
...
Former-commit-id: 604b3d10fc1448f702943114b66b97bded21e080
2024-03-09 03:58:18 +08:00
hiyouga
8ed1463236
update examples
...
Former-commit-id: 38592faa258f7331afb95bc5db4b9bf37f08105d
2024-03-09 02:30:37 +08:00
hiyouga
43b2ede0f8
fix #2756 , patch #2746
...
Former-commit-id: 627d1c91e675f1d9ebf47bad123cbbf29821da4d
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
2f095e2017
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
Former-commit-id: 656c653f0c628f9494b4d7ae12e60c8eeec1ea7a
2024-03-09 01:37:00 +08:00
hiyouga
9b55bb964c
Update setup.py
...
Former-commit-id: 543740fa00dda2c5d16822f7c9f4ef32d916426f
2024-03-09 00:14:48 +08:00
hiyouga
9b97b23ce7
fix aqlm version
...
Former-commit-id: 05673f81f0295c76957f3247c62f95fda322a63e
2024-03-09 00:09:09 +08:00
hiyouga
53ab28533e
fix example params
...
Former-commit-id: 0280748528488d7bee6b9074025255453966124c
2024-03-08 20:41:43 +08:00
stephen_zhu
940c00e7ae
update
...
Former-commit-id: 295f9ef2eff2e8b5d7a21d3da8dd3e6eb2a42006
2024-03-08 12:47:44 +08:00
stephen
18cfd5f349
fix ppo runtime error
...
Former-commit-id: 14e2f221e3e720075e59065a3dc42aa4d993a8b6
2024-03-08 11:48:26 +08:00
S3Studio
6169df1c52
Add dockerize support
...
Already tested with the model of Qwen:1.8B and the dataset of alpaca_data_zh. Some python libraries are added to the Dockerfile as a result of the exception messages displayed throughout test procedure.
Former-commit-id: 897e083bc28ccb15c46909b9d13fc03a674fb254
2024-03-08 10:47:28 +08:00
hiyouga
d46c2bbcba
update readme
...
Former-commit-id: 353db1e28aa8888228a05813bb09c51e7d28728c
2024-03-08 03:06:21 +08:00
hiyouga
48d4364586
fix chat engine, update webui
...
Former-commit-id: 8b32dddd7d883bae07735796a517927c79d1c33b
2024-03-08 03:01:53 +08:00
hiyouga
8042c66a76
Update setup.py
...
Former-commit-id: 76c3ec05258a5f5d1f78430ef6258a5eda527d65
2024-03-08 01:23:00 +08:00