Yaowei Zheng
a0d44c650a
[misc] add data files ( #9224 )
2025-10-02 14:02:07 +08:00
Yaowei Zheng
abc6ce6168
[assets] update readme ( #8461 )
2025-06-25 22:15:03 +08:00
hoshi-hiyouga
d2a3036a23
[misc] update data readme ( #8128 )
2025-05-21 22:41:18 +08:00
hoshi-hiyouga
9ae17cd173
[deps] update to transformers 4.52 ( #8125 )
2025-05-21 05:16:18 +08:00
hoshi-hiyouga
9b5baa97f0
[data] qwen3 fixes ( #8109 )
2025-05-20 02:00:30 +08:00
hoshi-hiyouga
dc080399c6
[model] add seed coder and qwen3 quant models ( #8039 )
2025-05-13 15:59:55 +08:00
Eric Tang
a8caf09c7f
[data] support for specifying a dataset in cloud storage ( #7567 )
* add support for loading datasets from s3/gcs
* add comments to readme
* run linter and address comments
* add option to pass in kwargs to ray init (i.e. runtime env)
* address comment
* revert mixed up changes
2025-04-10 11:31:35 +08:00
hoshi-hiyouga
4eec541857
[data] add coig-p dataset ( #7657 )
2025-04-09 21:18:25 +08:00
Kingsley
349c56c51c
[data] Fix bugs of use_audio_in_video in Qwen2.5 Omni ( #7638 )
* cache _mm_inputs
* nit
* support for use_audio_in_video
* remove cache
* fix data
* Update mllm_video_audio_demo.json
2025-04-08 18:40:10 +08:00
hoshi-hiyouga
650a9a9057
[misc] update format ( #7277 )
2025-03-13 02:53:08 +08:00
hoshi-hiyouga
264538cb26
[misc] upgrade format to py39 ( #7256 )
2025-03-12 00:08:41 +08:00
Zhangchi Feng
8f401e37f8
[model] support audio ( #6701 )
* support qwen2_audio
* improve code
* lint
* fix
* fix
* fix
---------
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 5eacb5629e4d7733cd992a63747a1335f2c6a929
2025-02-05 04:59:09 +08:00
hiyouga
248d5daaff
use pre-commit
Former-commit-id: 7cfede95df22a9ff236788f04159b6b16b8d04bb
2024-10-29 09:07:46 +00:00
hiyouga
7f71276ad8
add docstrings, refactor logger
Former-commit-id: c34e489d71f8f539028543ccf8ee92cecedd6276
2024-09-08 00:56:56 +08:00
hiyouga
abd26f5f67
update data readme
Former-commit-id: 0af5f054b7b8da8b39eb44b1dfa76050f0c45667
2024-09-05 04:44:49 +08:00
hiyouga
4d35ace75e
update data readme
Former-commit-id: 81adb153b7d0b30e6cd50c9bf4ca1ccf17458611
2024-09-05 04:25:27 +08:00
hiyouga
1874d579c5
video datasets
Former-commit-id: 33f28ce82d9e44d2615909250dc56d6a4a03cd99
2024-09-05 02:04:17 +08:00
hiyouga
d789b667d7
optimize predict vram
Former-commit-id: a577e44eee351b3ed8011a33ae01cd713354ff97
2024-08-30 23:08:45 +08:00
hiyouga
e4d11a117b
fix up
Former-commit-id: 43a56cb331fae899ca35b0c312730d4ab79d0c42
2024-07-15 01:04:56 +08:00
hoshi-hiyouga
93a6925ec5
Update README.md
Former-commit-id: d9aa6a9437994ac29f3e7a0789ec286f091847d6
2024-07-14 21:27:04 +08:00
codingma
5f2bd04799
1. add custom eval dataset support
2. merge load dataset and split dataset function
Former-commit-id: 963d97ba07e7efa3a4544c4d077283d9e112b3ad
2024-07-05 15:52:10 +08:00
hoshi-hiyouga
91cc571e6e
Update README_zh.md
Former-commit-id: 3007d260ed45169583a74497a53b661337dd5f71
2024-05-30 00:04:47 +08:00
hoshi-hiyouga
890926e60c
Update README.md
Former-commit-id: 65fb69e388c0a04c15ecd11441e567966f51fae5
2024-05-30 00:04:26 +08:00
seanzhang-zhichen
a3b52fd380
Merge branch 'main' into add_dataset_sample_num
Former-commit-id: 26300127c45f24e63b91f1b0cc73e46c3a936a91
2024-05-24 15:57:47 +08:00
hiyouga
09e78272c2
Update README_zh.md
Former-commit-id: 34c4ba6bf9bb89170446fb396aa06ae44d251de0
2024-05-21 18:30:59 +08:00
hiyouga
247cda4b68
fix #3818
Former-commit-id: 3f366e05a34be224f53c5bf8334e57ae5d316004
2024-05-20 21:43:19 +08:00
zhangzc
de9f1583c2
fix conflict
Former-commit-id: 6922b23a748c2459147bf44b96d86daa89f2c96c
2024-05-20 17:10:01 +08:00
hiyouga
57dde7c3bc
update data readme
Former-commit-id: 22c7335b496e4a673383d5a1e4e60bf2cb4e35b3
2024-05-18 21:37:38 +08:00
hiyouga
6b9003f781
update data readme
Former-commit-id: beb864a9367943d3274cb6057423d1eb9aaf85c4
2024-05-18 21:15:20 +08:00
hiyouga
2bff90719b
improve KTO impl., replace datasets
Former-commit-id: e56a57ddcf061de6e4acc8679f7dbf0b68364986
2024-05-18 03:44:56 +08:00
hiyouga
3f7f1daa33
remove big file
Former-commit-id: 8a05242787f810ec25d1b33358257d2867c45497
2024-05-07 22:14:06 +08:00
hoshi-hiyouga
eb99999ca8
Update README_zh.md
Former-commit-id: 1c673d89faca3160627009fcd0a4aa39138570c0
2024-05-02 02:14:55 +08:00
hoshi-hiyouga
ea58cf111e
Update README.md
Former-commit-id: 4fb43b0c9aa48242126252ad755a2a1683b38d6a
2024-05-02 02:13:46 +08:00
Lao
57fcdca336
Update README_zh.md
Former-commit-id: bacc8588dc7b0b43c240189ecf4336bedc299357
2024-04-28 23:31:37 +08:00
khazic
3d88589c0f
Upgrade the second sharegpt format
Former-commit-id: 057f992a666b029d207a3dc7dfc353f9abcf8316
2024-04-28 14:30:05 +08:00
khazic
dfd153cc81
added the second sharegpt format
Former-commit-id: 6d140ac98a78ecc0a713842bb917dc8eb14450cb
2024-04-28 14:27:45 +08:00
hiyouga
23b881bff1
support mllm hf inference
Former-commit-id: 2c7c01282acd7ddabbb17ce3246b8dae4bc4b8cf
2024-04-26 05:34:58 +08:00
hiyouga
0cb596fee1
add dpo mix dataset
Former-commit-id: 6def3f8bfa51b2d9d73af112352ce07db972e4c9
2024-04-20 01:31:38 +08:00
hiyouga
106a0104da
fix #3247
Former-commit-id: bb67c66f80627805b585d157ba807c0ce378d3f2
2024-04-12 17:41:33 +08:00
hiyouga
d764cd8736
support ORPO
Former-commit-id: f44a4c27e2461cdaa1b16865f597a31033c0e6d9
2024-03-31 18:29:50 +08:00
zhangzc
7cdc16abdf
Supports custom data set sampling quantity
Former-commit-id: fa8325401df27595de4611a89dfcc14644956abd
2024-03-27 14:22:50 +08:00
hiyouga
5ed234ca63
add orca_dpo_pairs dataset
Former-commit-id: af683aacbae462a2a37d76d37df583e217664bd5
2024-03-20 20:09:06 +08:00
SirlyDreamer
6fc2d7e063
Follow HF_ENDPOINT environment variable
Former-commit-id: 22b36a3cfd2909cb624b1bb7385558eda504defe
2024-03-20 08:31:30 +00:00
hiyouga
7c492864e9
update parser
Former-commit-id: d98258aa08d93494ad50d7786064e7fda15f6ca9
2024-03-10 13:35:20 +08:00
hiyouga
62b6a7971a
update data/readme
Former-commit-id: aa566e3cea5bc75688b4399a9da07be0b35b921c
2024-02-10 21:04:29 +08:00
hiyouga
1955a8ea5a
improve aligner
Former-commit-id: cc7296b92e10c24967fc753393275b71d300683f
2024-02-10 16:39:19 +08:00
Mark Mueller
1ce82f391a
Slim Orca data parsing
Former-commit-id: f2d8efede7e20edafed0d5446eb64f2d419949b1
2024-02-08 19:32:20 +01:00
hiyouga
5b8712d061
fix autoset attn impl, update data readme
Former-commit-id: 34a6e5f82baf45cc8dbb11f9f7ab4a480ab7ec5c
2024-01-31 11:58:07 +08:00
hiyouga
75be329994
fix #2282 and update tool prompt
Former-commit-id: 1c412f803866bde32b76f7c26c7b464b6b3651f3
2024-01-22 22:27:30 +08:00
hiyouga
fe4d93c6db
add array param format
Former-commit-id: bf910f8a5b21ee552fa9ab069610a3f5f611de57
2024-01-21 22:17:48 +08:00