41 Commits

Author SHA1 Message Date
Yaowei Zheng
7242caf0ff [assets] update readme (#8461) 2025-06-25 22:15:03 +08:00
hoshi-hiyouga
763fbc294b [misc] update data readme (#8128) 2025-05-21 22:41:18 +08:00
hoshi-hiyouga
b0c8ba73e0 [deps] update to transformers 4.52 (#8125) 2025-05-21 05:16:18 +08:00
hoshi-hiyouga
b83a38eb98 [data] qwen3 fixes (#8109) 2025-05-20 02:00:30 +08:00
hoshi-hiyouga
8d472c20cb [model] add seed coder and qwen3 quant models (#8039) 2025-05-13 15:59:55 +08:00
Eric Tang
6c53471de2 [data] support for specifying a dataset in cloud storage (#7567)
* add support for loading datasets from s3/gcs

* add comments to readme

* run linter and address comments

* add option to pass in kwargs to ray init (i.e. runtime env)

* address comment

* revert mixed up changes
2025-04-10 11:31:35 +08:00
hoshi-hiyouga
34fdabe005 [data] add coig-p dataset (#7657) 2025-04-09 21:18:25 +08:00
Zhangchi Feng
01915eaf40 [model] support audio (#6701)
* support qwen2_audio

* improve code

* lint

* fix

* fix

* fix

---------

Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 24c7842948
2025-02-05 04:59:09 +08:00
hiyouga
dec6ff046b update data readme
Former-commit-id: 70e36ff2f4
2024-09-05 04:44:49 +08:00
hiyouga
c4d7d76358 update data readme
Former-commit-id: 6055fe02de
2024-09-05 04:25:27 +08:00
hiyouga
51a0016873 optimize predict vram
Former-commit-id: a244f143f4
2024-08-30 23:08:45 +08:00
hoshi-hiyouga
ddbd848e49 Update README.md
Former-commit-id: 9d64507bd5
2024-07-14 21:27:04 +08:00
codingma
74f0d02eb8 1. add custom eval dataset support
2. merge load dataset and split dataset function

Former-commit-id: 76f3bbcfc0
2024-07-05 15:52:10 +08:00
hoshi-hiyouga
eb7ee82f16 Update README.md
Former-commit-id: 3761d7d5dd
2024-05-30 00:04:26 +08:00
zhangzc
4b90f04c1f fix conflict
Former-commit-id: d956041640
2024-05-20 17:10:01 +08:00
hiyouga
c53e626c9a update data readme
Former-commit-id: ca48f90f1e
2024-05-18 21:37:38 +08:00
hiyouga
68c07d3e1e update data readme
Former-commit-id: 18cbf8561d
2024-05-18 21:15:20 +08:00
hiyouga
13d7b48efe improve KTO impl., replace datasets
Former-commit-id: c450ee87a3
2024-05-18 03:44:56 +08:00
hoshi-hiyouga
2186deceac Update README.md
Former-commit-id: b072ec9d1b
2024-05-02 02:13:46 +08:00
khazic
6f0b412265 added the second sharegpt format
Former-commit-id: d1ba32e4bb
2024-04-28 14:27:45 +08:00
hiyouga
d2df4c22ab support mllm hf inference
Former-commit-id: e057c8de48
2024-04-26 05:34:58 +08:00
hiyouga
2f878bde11 support ORPO
Former-commit-id: 17bf8a2c3a
2024-03-31 18:29:50 +08:00
zhangzc
05afeb304d Supports custom data set sampling quantity
Former-commit-id: 449e2aa38e
2024-03-27 14:22:50 +08:00
hiyouga
9cf5d89bd1 update data/readme
Former-commit-id: a754f6e9ec
2024-02-10 21:04:29 +08:00
hiyouga
db2051684b improve aligner
Former-commit-id: 7d2dc83c5e
2024-02-10 16:39:19 +08:00
Mark Mueller
4bd7b8375e Slim Orca data parsing
Former-commit-id: 1d3598afa1
2024-02-08 19:32:20 +01:00
hiyouga
7beeae2209 fix autoset attn impl, update data readme
Former-commit-id: 521ad76552
2024-01-31 11:58:07 +08:00
hiyouga
48cab43cb5 add array param format
Former-commit-id: 486cc8d360
2024-01-21 22:17:48 +08:00
hiyouga
1af13cb737 add models
Former-commit-id: 709ac8870a
2023-12-18 19:09:31 +08:00
hiyouga
1a0bdd305c support system column #1765
Former-commit-id: 0a9c6e0146
2023-12-12 19:45:59 +08:00
hiyouga
b641e9e97e fix #1784
Former-commit-id: 28d5de7e78
2023-12-09 20:53:18 +08:00
hiyouga
b2bf10661b update data readme
Former-commit-id: 2b5e33c338
2023-11-03 00:15:23 +08:00
hiyouga
a9db89a025 update data readme (zh)
Former-commit-id: cc8ffa10d8
2023-11-02 23:42:49 +08:00
hiyouga
a1b0655457 support sharegpt format, add datasets
Former-commit-id: a837172413
2023-11-02 23:10:04 +08:00
hiyouga
a4fd976048 refactor dataset_attr, add eos in pt, fix #757
Former-commit-id: a9d1fb72f7
2023-09-01 19:00:45 +08:00
codemayq
b032dc4c4e add readme for dataset
Former-commit-id: cece66d48a
2023-08-23 19:55:45 +08:00
hiyouga
802494e20a update template
Former-commit-id: 4318347d3f
2023-08-22 19:46:09 +08:00
Peter Pan
23443e9696 add rm dataset explanation
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

Former-commit-id: b0ca8fe634
2023-08-22 01:33:59 -04:00
hiyouga
261ca840d0 update readme, fix web ui postprocess
Former-commit-id: 035c966d5c
2023-07-22 14:29:22 +08:00
hiyouga
fa47c99fa9 add datasets
Former-commit-id: 7159bc54ed
2023-07-19 20:59:15 +08:00
hiyouga
54b8ce7b63 Initial commit
Former-commit-id: 769c6ab56b
2023-05-28 18:09:04 +08:00