Yaowei Zheng
ae0ef374a3
[assets] update readme ( #8784 )
2025-07-30 17:57:17 +08:00
Yaowei Zheng
7242caf0ff
[assets] update readme ( #8461 )
2025-06-25 22:15:03 +08:00
Muqi Li
00c4988f89
[assets] fix incorrect user_tag in dataset_info.json to prevent skipped entries ( #8197 )
2025-05-28 18:01:37 +08:00
hoshi-hiyouga
763fbc294b
[misc] update data readme ( #8128 )
2025-05-21 22:41:18 +08:00
hoshi-hiyouga
b0c8ba73e0
[deps] update to transformers 4.52 ( #8125 )
2025-05-21 05:16:18 +08:00
hoshi-hiyouga
b83a38eb98
[data] qwen3 fixes ( #8109 )
2025-05-20 02:00:30 +08:00
hoshi-hiyouga
8d472c20cb
[model] add seed coder and qwen3 quant models ( #8039 )
2025-05-13 15:59:55 +08:00
Eric Tang
6c53471de2
[data] support for specifying a dataset in cloud storage ( #7567 )
* add support for loading datasets from s3/gcs
* add comments to readme
* run linter and address comments
* add option to pass in kwargs to ray init (i.e. runtime env)
* address comment
* revert mixed up changes
2025-04-10 11:31:35 +08:00
hoshi-hiyouga
34fdabe005
[data] add coig-p dataset ( #7657 )
2025-04-09 21:18:25 +08:00
hoshi-hiyouga
39876b85fc
[assets] update readme ( #7644 )
2025-04-09 01:06:06 +08:00
Kingsley
7d8bee96fc
[data] Fix bugs of use_audio_in_video in Qwen2.5 Omni ( #7638 )
* cache _mm_inputs
* nit
* support for use_audio_in_video
* remove cache
* fix data
* Update mllm_video_audio_demo.json
2025-04-08 18:40:10 +08:00
Victor Nogueira
3dff4ecca8
[dataset] fix ultrachat_200k dataset ( #7259 )
The `HuggingFaceH4/ultrachat_200k` dataset doesn't contain the default "train" split. The correct split is "train_sft".
2025-03-13 20:20:18 +08:00
hoshi-hiyouga
9ccfb97a2c
[misc] update format ( #7277 )
2025-03-13 02:53:08 +08:00
hoshi-hiyouga
7c1640ed5f
[misc] upgrade format to py39 ( #7256 )
2025-03-12 00:08:41 +08:00
hoshi-hiyouga
d412301d08
[data] update mm demo data ( #7211 )
Former-commit-id: 1774882f5a73760e104e08dfa76fe592b1d876a1
2025-03-07 20:07:15 +08:00
hoshi-hiyouga
beb1a9f9d9
[data] add r1 distill dataset ( #6983 )
Former-commit-id: 2591a3fa8b37fed8011fb66b266ef15e18404756
2025-02-18 17:25:09 +08:00
hoshi-hiyouga
fcd0f0480d
[dataset] add openthought ( #6866 )
Former-commit-id: 1356f9d8400efaccf677d0b36aaf32a146a09833
2025-02-09 00:53:01 +08:00
Zhangchi Feng
01915eaf40
[model] support audio ( #6701 )
* support qwen2_audio
* improve code
* lint
* fix
* fix
* fix
---------
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 24c78429489809873a1269a735ea5421340b32a2
2025-02-05 04:59:09 +08:00
hiyouga
9822cb7bac
fix dataset
Former-commit-id: 046b6fb118e3ea75062c6a759720a1759639e93c
2024-11-27 06:27:44 +00:00
hiyouga
ab3782b0fa
add marco-o1 and openo1 dataset
Former-commit-id: 17afb7d4103499a9a090a6624896cfa123e9e1d6
2024-11-27 04:20:23 +00:00
hoshi-hiyouga
4f1d5b6396
update dataset
Former-commit-id: 5214d3ea06ac73f1179ca9574d7c7030c92b5ee1
2024-11-25 21:47:04 +08:00
hiyouga
0d8aa6e6ef
use pre-commit
Former-commit-id: 21db8ed2f4a0eba203754a92ce0741538e8ee709
2024-10-29 09:07:46 +00:00
huniu20
132c1f1b0f
1. add model and dataset info to support webui
Former-commit-id: 0f669f221a31622ec7a53d0baab5da6a7891f9b6
2024-10-10 16:46:34 +08:00
hiyouga
7ccb86b215
add docstrings, refactor logger
Former-commit-id: 54c69059379d77dc9046c144cbe2d0253de3a4da
2024-09-08 00:56:56 +08:00
hiyouga
dec6ff046b
update data readme
Former-commit-id: 70e36ff2f4b500d987160f3a57d5fb3d4d2007d5
2024-09-05 04:44:49 +08:00
hiyouga
c4d7d76358
update data readme
Former-commit-id: 6055fe02deb3585b4330a7902bf8821dd41ea5cb
2024-09-05 04:25:27 +08:00
hiyouga
9df7a26e6b
video datasets
Former-commit-id: 8cafc7b055a854f483ad1c67f3d487ffd34b5f89
2024-09-05 02:04:17 +08:00
hiyouga
af8c4b4e20
add vl_feedback dataset
Former-commit-id: 57497135bf0a956af9c6893177ee97504b9f34ac
2024-09-04 03:13:03 +08:00
hiyouga
549adc888b
add pokemon dataset
Former-commit-id: 194064fdae0226dd22522586c9d47c5866a71a8e
2024-09-02 01:02:25 +08:00
hiyouga
bfdcc6bacf
add rlhf-v dataset
Former-commit-id: 8e49940746c1a6ff910f07dbefbec14af9d0f3c6
2024-09-01 22:57:41 +08:00
hiyouga
51a0016873
optimize predict vram
Former-commit-id: a244f143f48a01910ce1cd56c0855ef11d62a72a
2024-08-30 23:08:45 +08:00
hiyouga
a83756b5e9
refactor mm training
Former-commit-id: 3382317e32f88ed377d3e7759bdeaf0f2559d22a
2024-08-30 02:14:31 +08:00
simonJJJ
8a09b1e732
initial-commit
Former-commit-id: aeb85f200bd824748008dae6047c2607dfcdf174
2024-08-28 16:51:35 +08:00
hiyouga
bea270042b
add magpie ultra dataset
Former-commit-id: c75b5b83c4982a6da1512ad6f9cc4d98cc761094
2024-08-09 20:28:55 +08:00
hiyouga
e1e01d7efd
add unittest
Former-commit-id: 608de799a21f37319bf31c04c0aa50c4542ec757
2024-07-19 01:06:27 +08:00
hiyouga
14bc7b0551
fix up
Former-commit-id: 29ebcd75d55f70f2891632eba187b643cc3a9e51
2024-07-15 01:04:56 +08:00
hoshi-hiyouga
ddbd848e49
Update README.md
Former-commit-id: 9d64507bd5d47f096e81c90bfb347690afaaec2b
2024-07-14 21:27:04 +08:00
codingma
74f0d02eb8
1. add custom eval dataset support
2. merge load dataset and split dataset function
Former-commit-id: 76f3bbcfc0e11aa41f8f5cbebc60b77b987f7901
2024-07-05 15:52:10 +08:00
hiyouga
89564e90d7
update data
Former-commit-id: 9ab0401948d02d029134aa669c378e2ad80fb9fb
2024-06-19 02:48:43 +08:00
hiyouga
9e5988717d
tiny fix
Former-commit-id: 344b9a36b2e0b60ee61fba171b35a391e3517fed
2024-06-18 23:32:18 +08:00
Eli Costa
6bbb8b4cd8
Add Magpie and Webinstruct dataset samples
Adds samples of two datasets with claimed superior performance: Magpie (from Allen AI) and Webinstruct (from TIGER-Lab).
Former-commit-id: 74e49cca957d0bacd2c1d688e995a7370bef69f7
2024-06-15 19:31:56 -03:00
hiyouga
e89d1b1ec3
add neo-sft dataset
Former-commit-id: c7a5620ccc72b7574255ea764693ccb866c48263
2024-06-13 01:00:56 +08:00
hiyouga
3547a26f86
add ultrafeedback and fineweb #4085 #4132
Former-commit-id: 12d79f89c5082eb29842b501e1cb88433a248ba3
2024-06-08 02:42:34 +08:00
hoshi-hiyouga
9b6bdf9449
Merge pull request #3829 from seanzhang-zhichen/add_dataset_sample_num
Add dataset sample num
Former-commit-id: 483eb47e5d670e23fb713b942f6890b8259f4363
2024-05-30 00:25:45 +08:00
hoshi-hiyouga
21e7979837
Update README_zh.md
Former-commit-id: c8ae7e0e6571c7ca2e526da3e8adda5f8c9948f1
2024-05-30 00:04:47 +08:00
hoshi-hiyouga
eb7ee82f16
Update README.md
Former-commit-id: 3761d7d5dd97ce2fe0098284e6d4821fc0d63d30
2024-05-30 00:04:26 +08:00
hiyouga
b88ecd71fd
fix full/freeze tuning for mllm
Former-commit-id: 08564838bd02651668845ed74e2e60561e5b6d8c
2024-05-27 20:37:57 +08:00
BUAADreamer
f9ced0480e
Merge branch 'main' of https://github.com/BUAADreamer/LLaMA-Factory
Former-commit-id: 576b0206c27f93ffe19e3b7e6df58a3cd2abbb1d
2024-05-27 20:11:23 +08:00
BUAADreamer
4a958ab909
Merge branch 'hiyouga:main' into main
Former-commit-id: e2022ce4e90b115fb8271ef0f6bf05e8f39c997f
2024-05-27 20:10:58 +08:00
BUAADreamer
ea78a629ba
remove mllm_pt_demo.json
Former-commit-id: f665342a2752ffb5d715f134603d84e5228f55dc
2024-05-27 20:10:31 +08:00