91 Commits

Author SHA1 Message Date
hiyouga
04db03bdfd add rlhf-v dataset
Former-commit-id: 3fd18fc34a0c994a738504746abfd5548e002437
2024-09-01 22:57:41 +08:00
hiyouga
228f745235 refactor mm training
Former-commit-id: 179c0558699e287cbf38a2d73bff47e86d589c5a
2024-08-30 02:14:31 +08:00
simonJJJ
5e728ec221 initial-commit
Former-commit-id: b6a39847a10b417b09db4b5512dd835e9e4ce928
2024-08-28 16:51:35 +08:00
hiyouga
45fc1dfbda add magpie ultra dataset
Former-commit-id: 3317b24329b87e30f13a78936ac5554f211abf7a
2024-08-09 20:28:55 +08:00
hiyouga
8ce43766c6 fix up
Former-commit-id: 43a56cb331fae899ca35b0c312730d4ab79d0c42
2024-07-15 01:04:56 +08:00
codingma
82e941ff61 1. add custom eval dataset support
2. merge load dataset and split dataset function

Former-commit-id: 963d97ba07e7efa3a4544c4d077283d9e112b3ad
2024-07-05 15:52:10 +08:00
hiyouga
33fe274468 tiny fix
Former-commit-id: bb750fa3dde03ec024ae75596ecd4b884cb126c6
2024-06-18 23:32:18 +08:00
Eli Costa
ef578c39a0 Add Magpie and Webinstruct dataset samples
Adds two dataset samples that claim superior performance: Magpie (from Allen AI) and Webinstruct (from TIGER-Lab).

Former-commit-id: 12f4a2bc3172ecd5b6775061d59103f565ac9562
2024-06-15 19:31:56 -03:00
hiyouga
39e3d3fed6 add neo-sft dataset
Former-commit-id: 34863fa7cb641ceca92e3a2eec914126db537b62
2024-06-13 01:00:56 +08:00
hiyouga
d9aa226c08 add ultrafeedback and fineweb #4085 #4132
Former-commit-id: 968e4992e2f2a3ccba73e8668f1654ddc6eb0034
2024-06-08 02:42:34 +08:00
hiyouga
a3dd6f887c fix full/freeze tuning for mllm
Former-commit-id: df5860ddb593d5b82163a585d12160b41dbce0f3
2024-05-27 20:37:57 +08:00
BUAADreamer
fb33f6e528 Merge branch 'main' of https://github.com/BUAADreamer/LLaMA-Factory
Former-commit-id: d544570ce88a7b784beeffa70ff718109696b1f5
2024-05-27 20:11:23 +08:00
BUAADreamer
5a581acac7 Merge branch 'hiyouga:main' into main
Former-commit-id: cc1b82bf49b060987392c455fdbfe125ad667ec5
2024-05-27 20:10:58 +08:00
BUAADreamer
136e64081f remove mllm_pt_demo.json
Former-commit-id: 5402589f021056f9c9e7b68421282039a508d5b9
2024-05-27 20:10:31 +08:00
hiyouga
3f8314d4e6 add llava 1k datasets
Former-commit-id: 345d3355752f4a4dc454696a39f1610fffbbf382
2024-05-27 19:57:33 +08:00
BUAADreamer
aaadaa18f6 support pretraining of llava
Former-commit-id: 6a4c8cf0a6a1674c693b9337f018ff8df7477f8f
2024-05-21 08:57:14 +08:00
hiyouga
d24969bb7e improve KTO impl., replace datasets
Former-commit-id: e56a57ddcf061de6e4acc8679f7dbf0b68364986
2024-05-18 03:44:56 +08:00
enji.zhou
d16a1d9ed0 add kto
Former-commit-id: ec51986cf70b0bdd79b8141e45916670fb97a08e
2024-05-17 13:09:17 +08:00
hiyouga
3e5a099187 remove checksum and fix ui args
Former-commit-id: 0cfdeb1d30efb63211434bc4656bceb59e666289
2024-05-12 01:10:30 +08:00
codingma
82e830f8e7 fix sha1 of glaive_toolcall dataset
Former-commit-id: 25649cd14899f41fe12c99af12619ddcd5a8ba88
2024-05-09 16:33:45 +08:00
hiyouga
c3dbaf6eba remove big file
Former-commit-id: 8a05242787f810ec25d1b33358257d2867c45497
2024-05-07 22:14:06 +08:00
hiyouga
a978b5dc4e fix stop param
Former-commit-id: f0a850c25211b72eddbb357c81679db9b0930d44
2024-05-07 00:41:04 +08:00
hoshi-hiyouga
34dcba1bc6 Merge pull request #3588 from ZeyuTeng96/patch-1
update hf_hub_url for nectar_rm in dataset_info

Former-commit-id: bcf2c749490d45e3b1363352cc30fd6f9ef29a19
2024-05-07 00:06:11 +08:00
hoshi-hiyouga
224f57f83b Update dataset_info.json
Former-commit-id: c55969c2350548a9b2eda5352b067df63ee98b20
2024-05-07 00:05:45 +08:00
hiyouga
6710e27429 update example docs
Former-commit-id: 102cd42768d9eb2cf1219309a25b41e26149067e
2024-05-06 22:51:02 +08:00
ZeyuTeng96
6041dda838 update hf_hub_url for nectar_rm in dataset_info
Hi there,

I cannot find "mlinmg/RLAIF-Nectar" on hf; it seems to have been renamed to "AstraMindAI/RLAIF-Nectar". So I am making a PR to update it.

See: https://huggingface.co/datasets/AstraMindAI/RLAIF-Nectar
Former-commit-id: 98ea76989f6ee9096edd0d353d8a001cdb6ccc5a
2024-05-06 16:44:50 +08:00
hiyouga
e626d15764 update readme
Former-commit-id: c9190fe36f511c3a5149d45c85a10b02a57fa88a
2024-04-26 23:39:19 +08:00
hoshi-hiyouga
94ae4c42e5 Merge pull request #3471 from BUAADreamer/main
add llava_150k en/zh mllm sft data

Former-commit-id: 991d843d56acd104ceff42f6d74d4e7acd5ccb01
2024-04-26 23:36:41 +08:00
hoshi-hiyouga
08460183a9 Update dataset_info.json
Former-commit-id: 7df511cfb76c833e8cc9be8cb45673395f54c32b
2024-04-26 23:34:34 +08:00
BUAADreamer
47c6d405dc add llava_150k en/zh mllm sft data
Former-commit-id: 62b3fb2f15e7e1c56da8011f0bf27cff35025863
2024-04-26 23:18:58 +08:00
hiyouga
190ae7b73d release v0.7.0
Former-commit-id: 45bb89cb4d26a6b3fb5360bc90ab950738fe4920
2024-04-26 23:18:00 +08:00
hiyouga
a635030931 support mllm hf inference
Former-commit-id: 2c7c01282acd7ddabbb17ce3246b8dae4bc4b8cf
2024-04-26 05:34:58 +08:00
hoshi-hiyouga
7b4a31ba22 Update dataset_info.json
Former-commit-id: b3e3749d49ba561929ed708650314e2c9b47c24d
2024-04-26 03:03:36 +08:00
BUAADreamer
0373a3f2a8 merge data part to the text stream
Former-commit-id: 80537d580119d9d5a06ab236a5284aaae2f83b5b
2024-04-25 19:58:47 +08:00
BUAADreamer
69fb4351f5 merge data part to the text stream
Former-commit-id: 7ee20286d9bcc2d5378bfd6bb02cd3648396d873
2024-04-25 19:19:59 +08:00
BUAADreamer
641c97ba74 add llava and instructblip
Former-commit-id: 142fb6f4541a1acfefe66ff2574dabde53b00c06
2024-04-25 00:22:43 +08:00
BUAADreamer
20e05970ab add multimodal LLM BLIP-2 and InstructBLIP
Former-commit-id: a730f89a972f1a9d37c718c716f199cb8d4903b2
2024-04-23 18:45:43 +08:00
hiyouga
bbf462a17e add dpo mix dataset
Former-commit-id: 6def3f8bfa51b2d9d73af112352ce07db972e4c9
2024-04-20 01:31:38 +08:00
hiyouga
b1ae554c83 fix #3247
Former-commit-id: bb67c66f80627805b585d157ba807c0ce378d3f2
2024-04-12 17:41:33 +08:00
li.yunhao
e6e3571232 fix pile dataset hf hub url
Former-commit-id: c06f71f74ee1b177617417d151185757fd4359f5
2024-03-30 16:06:10 +08:00
hiyouga
5f3f0c53f2 add orca_dpo_pairs dataset
Former-commit-id: af683aacbae462a2a37d76d37df583e217664bd5
2024-03-20 20:09:06 +08:00
hiyouga
28f3e60189 update readme, add starcoder2, cosmopedia
Former-commit-id: 1ae7c183640146bb9b06c98942985a1721d2b9c9
2024-03-03 01:01:46 +08:00
hiyouga
9d241d08ae update data
Former-commit-id: bd63af6ede3a103b75ef9c0875557d65e2c4c7f7
2024-03-02 19:37:18 +08:00
hiyouga
5ebd605149 fix #2533
Former-commit-id: 52a81299fcff0fa691e1d6f9a7e9ea9d19751b3a
2024-02-21 22:47:48 +08:00
hiyouga
a6ff18ab17 fix #2481
Former-commit-id: 2a4e3e4a26a2fad77ccc476be7d45434b8af4a55
2024-02-15 19:07:47 +08:00
hiyouga
3e8c3b506a improve aligner
Former-commit-id: cc7296b92e10c24967fc753393275b71d300683f
2024-02-10 16:39:19 +08:00
Mark Mueller
ac5d3811bd Slim Orca data parsing
Former-commit-id: f2d8efede7e20edafed0d5446eb64f2d419949b1
2024-02-08 19:32:20 +01:00
Johann-Peter Hartmann
5a23651531 WS fix
Former-commit-id: 131935346ac06738be5e7c7f54fe2eb7d3769d7a
2024-02-06 20:13:04 +01:00
Johann-Peter Hartmann
66e1781ee9 add ranking to dpo dataset
Former-commit-id: 6a844fb384dd9cac3fd6b845a6b414320c5eb766
2024-02-06 20:12:36 +01:00
Johann-Peter Hartmann
af258902c4 remove comma
Former-commit-id: 57a2f6d35da8cd10fad9859382bc1e983da56705
2024-02-03 08:48:39 +01:00