2777 Commits

Author SHA1 Message Date
hoshi-hiyouga
d88a34bc79 Merge pull request #2919 from 0xez/main
Update README.md, fix the release date of the paper

Former-commit-id: e7157cee78688fdd572a873b1e46accc1a32717e
2024-03-22 12:12:24 +08:00
0xez
60cbc9d0e5 Update README_zh.md, fix the release date of the paper
Former-commit-id: 6ea16156b6456216cefab59265dae1edc9dc938f
2024-03-22 10:41:17 +08:00
0xez
d5005e766f Update README.md, fix the release date of the paper
Former-commit-id: 4bf9ef3095376f0208f783f180c13bef88581824
2024-03-21 22:14:48 +08:00
hiyouga
4d0753cffe move file
Former-commit-id: f9017af7fe1dfbe5b799904ca1d900b3051fb719
2024-03-21 17:05:17 +08:00
hiyouga
1cf0f11840 add citation
Former-commit-id: 54199205f2000c0500d29822387646133e06e8b2
2024-03-21 17:04:10 +08:00
hiyouga
052e8b2cc6 paper release
Former-commit-id: 7bd384655244ce6a8c1f34aa6fed54122d0e9da5
2024-03-21 13:49:17 +08:00
hiyouga
8963e89633 update readme
Former-commit-id: ab98d4d617b7193c474f58a29ca9475fea7564aa
2024-03-21 00:48:42 +08:00
hiyouga
935ee0a023 support fsdp + qlora
Former-commit-id: b894bf8e84be689db258021f0638e9ac939abcbc
2024-03-21 00:36:06 +08:00
hiyouga
5ed234ca63 add orca_dpo_pairs dataset
Former-commit-id: af683aacbae462a2a37d76d37df583e217664bd5
2024-03-20 20:09:06 +08:00
hoshi-hiyouga
04884a0911 Merge pull request #2905 from SirlyDreamer/main
Follow HF_ENDPOINT environment variable

Former-commit-id: fa801ff118433b622f6aa47920c5c93ec9b68414
2024-03-20 18:09:54 +08:00
hiyouga
c7af26a9e3 fix #2777 #2895
Former-commit-id: 54d5f62d29456a8d9d0c0dd3d0bbfffe48935803
2024-03-20 17:59:45 +08:00
hiyouga
d8073488be fix #2346
Former-commit-id: c8888c499b0ac51e2fc86c16e8e91c79400a5993
2024-03-20 17:56:33 +08:00
SirlyDreamer
6fc2d7e063 Follow HF_ENDPOINT environment variable
Former-commit-id: 22b36a3cfd2909cb624b1bb7385558eda504defe
2024-03-20 08:31:30 +00:00
khazic
e93c7cdb80 Updated README with new information
Former-commit-id: b12f12039ce221decf09a25ec9d64e385d9497c7
2024-03-20 14:38:08 +08:00
khazic
c32d6c8250 Updated README with new information
Former-commit-id: 90a81c2e52bd44beb3b7feb5d2517b073f7f6ef9
2024-03-20 14:21:16 +08:00
刘一博
757158da63 Updated README with new information
Former-commit-id: fddbc29ca1bd9b13372087e6a349f21240abc013
2024-03-20 14:11:28 +08:00
hiyouga
ffdacaa618 fix packages
Former-commit-id: 2f9f334a123d43267bfb3dd26aaa1ad285ffe7a5
2024-03-17 22:32:03 +08:00
hiyouga
e194efab10 fix patcher
Former-commit-id: 6a5ad99c8cbf6b7def0a130306d49e7d1eb4e5a5
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
772fc2eac7 Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support

Former-commit-id: b63cba317266f5ba217de54fda77ec26a4df344d
2024-03-15 19:16:02 +08:00
hiyouga
ed020579dc fix export
Former-commit-id: 4e996f194406d7eb27b2bde290a12c82c41219d0
2024-03-15 15:06:30 +08:00
S3Studio
096869c7b6 Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
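
The patch described in this message can be illustrated with a minimal sketch. It is not the code from this commit: the helper name `patch_qwen_config` and the use of a plain `SimpleNamespace` as a stand-in for the model config are assumptions made for illustration; only the `use_flash_attn` field and its "auto" default come from the message above.

```python
from types import SimpleNamespace


def patch_qwen_config(config, flash_attn: bool):
    """Hypothetical helper: turn off auto-detected flash attention.

    If the user did not pass the --flash_attn flag and the config still
    carries the "auto" default, force use_flash_attn to False so the
    model does not pick up an installed flash-attn library on a GPU
    that cannot run it.
    """
    if not flash_attn and getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config


# Stand-in for a qwen model config (assumption: a plain attribute container).
config = SimpleNamespace(use_flash_attn="auto")
patch_qwen_config(config, flash_attn=False)
print(config.use_flash_attn)  # False
```

When the flag is set, or the config already holds an explicit value, the sketch leaves `use_flash_attn` unchanged.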


Former-commit-id: cd2f5717d676e1a5afd2f4e7a38402d2e55e7479
2024-03-15 08:59:13 +08:00
S3Studio
c6873211e9 improve Docker build and runtime parameters
Modify installation method of extra python library.
Utilize shared memory of the host machine to increase training performance.


Former-commit-id: 97f9901c2f5c29a6ab517a1f8fa028b8e89edf4e
2024-03-15 08:57:46 +08:00
hiyouga
623ee1bd88 tiny fix
Former-commit-id: bf8123669be334338b4268d0a8f7703ff2cf6255
2024-03-14 21:19:06 +08:00
hiyouga
aabe90343e fix export
Former-commit-id: c9b968b84c97c9a00fbb43194c3adc9354d74f3b
2024-03-14 18:17:01 +08:00
hiyouga
764cfb506d fix bug
Former-commit-id: 38c618b797ec219c2c45de960c9cbe50ec524c94
2024-03-13 23:55:31 +08:00
hiyouga
249ad56075 fix bug
Former-commit-id: 47ee0276830adbed65bc111d5a83049e77ad360a
2024-03-13 23:43:42 +08:00
hiyouga
46f99ff277 improve lora+ impl.
Former-commit-id: 332bad25455a70ad9204e7dd384bb086d789aa39
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
73f4513c84 Merge pull request #2830 from qibaoyuan/lora_plus
[FEATURE]: ADD LORA+ ALGORITHM

Former-commit-id: 456f2aed5811b9f296acd371a1f706daeb37e12a
2024-03-13 20:15:46 +08:00
齐保元
3c91e86268 [FEATURE]: ADD LORA+ ALGORITHM
Former-commit-id: c35b3c3b1e27171f8a703f88ede1dc8a84c80a56
2024-03-13 19:43:27 +08:00
hiyouga
42473ec150 fix #2817
Former-commit-id: f1c8b8127b3c1ac095176015af5ec92d37a11efe
2024-03-13 12:42:03 +08:00
hiyouga
6a4e4b9c5b fix #2802
Former-commit-id: f4c56ccd785790c02f0d1275cd75958677a18690
2024-03-13 12:33:45 +08:00
hiyouga
9a784fb4f3 fix kv cache
Former-commit-id: a9588e36e95bed896eea8d79ba7108447ff08f4b
2024-03-13 01:21:50 +08:00
hiyouga
43fd80a1aa support QDoRA
Former-commit-id: d8ad1c5ef08e733e52084de271aad762b1613129
2024-03-12 22:12:42 +08:00
hiyouga
e6ab1a57ea patch for gemma cpt
Former-commit-id: fc0b19c62f52a90d78b63761dda3d8970a42f2da
2024-03-12 21:21:54 +08:00
hiyouga
282edb9161 fix plot issues
Former-commit-id: 01ae196b4916433da9aeec9c0b5c660c6b34464c
2024-03-12 18:41:35 +08:00
hiyouga
dff77004f2 support olmo
Former-commit-id: 2719510e8c6baa591c74458b773e4e47215e6052
2024-03-12 18:30:38 +08:00
hiyouga
6c1b4aec75 fix #2802
Former-commit-id: 1370db270d7ba1a20468abdb29193ce7534d1b4f
2024-03-12 17:08:34 +08:00
hiyouga
7814db1b42 fix #2803
Former-commit-id: d60498cba1ed124e8a678ce7775d55a018f99537
2024-03-12 16:57:39 +08:00
hiyouga
c9ed3fc3a4 fix #2782 #2798
Former-commit-id: eb3ab610610a0964bc8a1c9fa015805353f04c31
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
9ee416a8fc Merge pull request #2743 from S3Studio/DockerizeSupport
Add dockerize support

Former-commit-id: 30751a7b9218770cc2bc6cae857a28950bffbb6c
2024-03-12 00:05:49 +08:00
hiyouga
4f9a47c026 fix #2775
Former-commit-id: a5c7feb3e8089f4deff760b00a9f84425957c419
2024-03-11 00:42:54 +08:00
hiyouga
3fcb1c6d09 tiny fix
Former-commit-id: 1d22c87db2449c7d9915842b70fbd59ce9c2dd70
2024-03-11 00:17:18 +08:00
hiyouga
7c492864e9 update parser
Former-commit-id: d98258aa08d93494ad50d7786064e7fda15f6ca9
2024-03-10 13:35:20 +08:00
hiyouga
7ff8a064f3 support layerwise galore
Former-commit-id: d43a4da0947897d0be3f62fad3107754d4c89f2b
2024-03-10 00:24:11 +08:00
hiyouga
c635bbe465 fix #2732
Former-commit-id: bc39ad1d102b91d5417daa38b8a581e1e1ab2af9
2024-03-09 22:37:16 +08:00
hiyouga
4881f4e631 allow non-packing pretraining
Former-commit-id: 3fee5cc5a3db9ce874ad90f2500ec092d904bd4e
2024-03-09 22:21:46 +08:00
hiyouga
c631799f5d fix #2766
Former-commit-id: a8cd556230c1d0bc4e090acc2276c035910ce6f6
2024-03-09 21:35:24 +08:00
hiyouga
48846676d8 use default arg for freeze tuning
Former-commit-id: a38fd7c8b39cb59fb61c26fdf80aaa6f2d0623b9
2024-03-09 06:08:48 +08:00
hiyouga
f37d481c5d add GaLore results
Former-commit-id: ac05b9bba62924693bdede85917d21b844849b8c
2024-03-09 04:11:55 +08:00
hiyouga
5d7d8bd55c update hardware requirements
Former-commit-id: 604b3d10fc1448f702943114b66b97bded21e080
2024-03-09 03:58:18 +08:00