Yaowei Zheng
043103e1c9
[webui] support other hub (#8567)
2025-07-07 22:18:48 +08:00
hoshi-hiyouga
e542f95710
[data] fix shared file system (#8179)
2025-05-27 18:36:03 +08:00
hoshi-hiyouga
072bfe29d3
[data] add eval_on_each_dataset arg (#7912)
2025-04-30 06:56:43 +08:00
Eric Tang
6c53471de2
[data] support for specifying a dataset in cloud storage (#7567)
* add support for loading datasets from s3/gcs
* add comments to readme
* run linter and address comments
* add option to pass in kwargs to ray init (i.e. runtime env)
* address comment
* revert mixed up changes
2025-04-10 11:31:35 +08:00
hoshi-hiyouga
aaf2e6ba2a
[model] fix kv cache (#7564)
2025-04-01 23:07:46 +08:00
Billy Cao
5d1cc863a4
[data] shard the dataset to allow multiprocessing when streaming is enabled (#7530)
* Shard the dataset when streaming to allow multiprocessing
* Allow user to not set dataset_shards to ensure backward compatibility
2025-04-01 15:36:23 +08:00
hoshi-hiyouga
9ccfb97a2c
[misc] update format (#7277)
2025-03-13 02:53:08 +08:00
hoshi-hiyouga
7c1640ed5f
[misc] upgrade format to py39 (#7256)
2025-03-12 00:08:41 +08:00
hiyouga
37b844d929
remove exit in preprocess
Former-commit-id: 1a800f9993d28d80d4587a08c20f5a69722436b5
2025-03-11 15:08:25 +08:00
hoshi-hiyouga
df63f05b47
[data] fix loader (#7207)
* fix dataloader
* add test case
* fix type
* fix ci
* fix ci
* fix ci
* disable overwrite cache in ci
Former-commit-id: 8c3f9f6747110107cbbb3695637482e45084dbc1
2025-03-07 17:20:46 +08:00
hoshi-hiyouga
a8c9d5663d
[data] fix predict dataset (#6972)
Former-commit-id: bdb581c4a82d02458766e73c87b7a92ea31796ec
2025-02-17 20:29:40 +08:00
SrWYG
0ad9f7f058
[data] evaluate on each dataset (#5522)
* [Update] loader.py: evaluate will run separate evaluations on each dataset.
`If you pass a dictionary with names of datasets as keys and datasets as values, evaluate will run separate evaluations on each dataset. This can be useful to monitor how training affects other datasets or simply to get a more fine-grained evaluation`
Seq2SeqTrainer supports eval_dataset as a Dict.
* fix format
* fix
* fix
---------
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 1e35967ae159038a66f3203dd0e6ec51eea9208f
2025-02-13 02:19:03 +08:00
hoshi-hiyouga
1679930e00
[breaking change] refactor data pipeline (#6901)
* refactor data
* rename file
Former-commit-id: 617c8ab467d32be5f7d5c94fa89c0e3d7d1963bc
2025-02-13 00:39:20 +08:00
hoshi-hiyouga
1fee69f874
[misc] update license year & fix llama pro (#6814)
* fix llamapro script
* change year
Former-commit-id: e2dc5b952aa22835d5220ba624f44676138b65ac
2025-02-05 01:53:33 +08:00
hiyouga
da542fad18
improve log
Former-commit-id: 47e17dd689840ca9b3c5f34448e5f80265336cca
2025-01-08 09:56:10 +00:00
Yaser Afshar
fe4546a7bb
Add trust_remote_code parameter and remove True
- Introduced a new model parameter `trust_remote_code`
- Set the default value of `trust_remote_code` to `False`
to enhance security
Former-commit-id: 09437763267bc7081159a6878cee9652a2b1ddac
2024-12-17 12:25:12 +00:00
hoshi-hiyouga
92940817e7
lint
Former-commit-id: 6a5074e46695378b76d58aac8ad7768b6b034b9c
2024-12-04 22:08:27 +08:00
wangdepeng
ae09c6c214
fix:tokenized_path not None and load_from_disk return Dataset Trigger stuck
Former-commit-id: 4424d4de8aca0e4d3b92672584978f3cc3fc33da
2024-11-27 16:44:42 +08:00
hiyouga
358708ee97
fix #6149
Former-commit-id: 362d579ce83e63007e6f89f264d06d2698671cc6
2024-11-26 16:03:02 +00:00
hiyouga
e83cb17f97
support rank0 logger
Former-commit-id: c38aa29336f286266553da4909a7267d7ef21f37
2024-11-02 18:31:04 +08:00
hiyouga
1b02915d19
tiny fix
Former-commit-id: 0c22da4f1cc710b471f6d511d50ce878521173ca
2024-10-30 08:56:29 +00:00
hiyouga
0d8aa6e6ef
use pre-commit
Former-commit-id: 21db8ed2f4a0eba203754a92ce0741538e8ee709
2024-10-29 09:07:46 +00:00
hiyouga
e90a1199da
tiny fix
Former-commit-id: 3af57795dda5d236200bad4aa3f2e29ae8930fe2
2024-10-11 23:51:54 +08:00
huniu20
e8e98bb125
add om_hub_token argument
Former-commit-id: 7b91be33c9cd8473453716f0c4c6dec924304efc
2024-10-10 17:16:46 +08:00
huniu20
132c1f1b0f
1. add model and dataset info to support webui
Former-commit-id: 0f669f221a31622ec7a53d0baab5da6a7891f9b6
2024-10-10 16:46:34 +08:00
huniu20
26e897e861
1. add modelers hub support
Former-commit-id: 24ebe187e360753666b768685a0dcc78054bb702
2024-10-09 17:21:37 +08:00
hiyouga
7ccb86b215
add docstrings, refactor logger
Former-commit-id: 54c69059379d77dc9046c144cbe2d0253de3a4da
2024-09-08 00:56:56 +08:00
hiyouga
d5ea05cfff
update get template
Former-commit-id: dabad5570bf4a6b1044c963d8f27717030f373ef
2024-09-04 22:36:20 +08:00
hoshi-hiyouga
1dfd1aaf82
Merge pull request #5323 from naem1023/feat/add-dataset-map-batch-size-argument
Add batch size of map function in the preprocessed dataset
Former-commit-id: 8f441c2b3a5bb84dec2c037a541084c0201726c6
2024-09-04 22:09:36 +08:00
hoshi-hiyouga
8ac74c8ccb
fix #5228
Former-commit-id: 44d6947e554cd61cff23c297248fff32a5f554da
2024-09-04 19:10:30 +08:00
hiyouga
22deca0e9e
lazy image load
Former-commit-id: 47ea97fb1ba77de2e8a561904aa8fdc27c3f5025
2024-09-04 02:27:08 +08:00
naem1023
46695e42cc
feat: add batch size of map function in the preprocessed dataset
Former-commit-id: 209313eeeab8d1a7c320bd9aa90a5f4656082b7c
2024-09-02 13:52:47 +09:00
hiyouga
b5146facff
follow #5115
Former-commit-id: c87023d539875cd8e622d40212a5627c9c182fb8
2024-08-09 18:03:00 +08:00
“Wzw”
13e5fff97a
mask_history args verify valid
Former-commit-id: 2fa1e0b2add60142c178e5e21ebaad7132fa5b00
2024-08-08 10:12:01 +08:00
hoshi-hiyouga
2e9c9471da
Update loader.py
Former-commit-id: a5b809516e7de1d6d5f4583089fee3028d0db01d
2024-07-15 00:50:06 +08:00
codingma
74f0d02eb8
1. add custom eval dataset support
2. merge load dataset and split dataset function
Former-commit-id: 76f3bbcfc0e11aa41f8f5cbebc60b77b987f7901
2024-07-05 15:52:10 +08:00
hoshi-hiyouga
673f27a59e
Update loader.py
Former-commit-id: dddfd516ee66e9937e21f05300832aab45034b12
2024-06-24 23:06:18 +08:00
hiyouga
2946153cea
add license
Former-commit-id: d87108daa68bd40174b262be1ca65fe6e1b7ab56
2024-06-15 17:54:33 +08:00
hiyouga
8fccaf20c5
fix #4221
Former-commit-id: 6baafd4eb3147ad9f7d2952b8eb27c5486940f36
2024-06-13 02:48:21 +08:00
hiyouga
8da149ba40
rename files
Former-commit-id: 74f96efef9bcd63f65d0190c901ff9be54ccd350
2024-06-07 00:09:06 +08:00
hiyouga
e0aadd4b34
fix ppo dataset bug #4012
Former-commit-id: 149610c636bbb974e546d13fa302884ea65a6d38
2024-06-06 19:03:20 +08:00
hiyouga
0eff6a66d5
tiny fix
Former-commit-id: 5a13b3baa63225e7f79e024610722de0f87e0acc
2024-06-04 00:31:10 +08:00
hiyouga
8ecf606230
fix #3992
Former-commit-id: a18acf2abe28e37233bf8c8ed2600618ea3b62e9
2024-06-04 00:17:36 +08:00
hiyouga
64d24842fe
fix data loader hint
Former-commit-id: 49b1e88e3da3be0fb78f53e5f924a9be67568a02
2024-06-03 18:28:27 +08:00
hoshi-hiyouga
7b83c550ab
Update loader.py
Former-commit-id: ca5dd7c6c115a359e4b50e93f4ffcc9f2955ec2f
2024-05-30 00:20:20 +08:00
hoshi-hiyouga
9fc713da89
Update loader.py
Former-commit-id: f9a88b89ca8b8f9a0c5def03b154f9d67f558edf
2024-05-30 00:17:21 +08:00
hoshi-hiyouga
c0f11a280e
Update loader.py
Former-commit-id: b55fb611c57be03fb38218c7da1d96f6848496ba
2024-05-30 00:12:12 +08:00
seanzhang-zhichen
9c8d79fbe3
Merge branch 'main' into add_dataset_sample_num
Former-commit-id: 27cb51f7f86f97ae231abfdcb0114ff245d7af9c
2024-05-24 15:57:47 +08:00
hiyouga
3e729798df
refactor data preprocessing, fix mllm rlhf
Former-commit-id: 3a023bca2a502810a436cfba7708df164754ea62
2024-05-24 04:08:25 +08:00
zhangzc
4b90f04c1f
fix conflict
Former-commit-id: d956041640d9abc5e59919a227d27270fb513a7e
2024-05-20 17:10:01 +08:00