hoshi-hiyouga
94ae4c42e5
Merge pull request #3471 from BUAADreamer/main
...
add llava_150k en/zh mllm sft data
Former-commit-id: 991d843d56acd104ceff42f6d74d4e7acd5ccb01
2024-04-26 23:36:41 +08:00
hoshi-hiyouga
08460183a9
Update dataset_info.json
...
Former-commit-id: 7df511cfb76c833e8cc9be8cb45673395f54c32b
2024-04-26 23:34:34 +08:00
BUAADreamer
47c6d405dc
add llava_150k en/zh mllm sft data
...
Former-commit-id: 62b3fb2f15e7e1c56da8011f0bf27cff35025863
2024-04-26 23:18:58 +08:00
hiyouga
190ae7b73d
release v0.7.0
...
Former-commit-id: 45bb89cb4d26a6b3fb5360bc90ab950738fe4920
2024-04-26 23:18:00 +08:00
hiyouga
a635030931
support mllm hf inference
...
Former-commit-id: 2c7c01282acd7ddabbb17ce3246b8dae4bc4b8cf
2024-04-26 05:34:58 +08:00
hoshi-hiyouga
7b4a31ba22
Update dataset_info.json
...
Former-commit-id: b3e3749d49ba561929ed708650314e2c9b47c24d
2024-04-26 03:03:36 +08:00
BUAADreamer
0373a3f2a8
merge data part to the text stream
...
Former-commit-id: 80537d580119d9d5a06ab236a5284aaae2f83b5b
2024-04-25 19:58:47 +08:00
BUAADreamer
69fb4351f5
merge data part to the text stream
...
Former-commit-id: 7ee20286d9bcc2d5378bfd6bb02cd3648396d873
2024-04-25 19:19:59 +08:00
BUAADreamer
641c97ba74
add llava and instructblip
...
Former-commit-id: 142fb6f4541a1acfefe66ff2574dabde53b00c06
2024-04-25 00:22:43 +08:00
BUAADreamer
20e05970ab
add multimodal LLM BLIP-2 and InstructBLIP
...
Former-commit-id: a730f89a972f1a9d37c718c716f199cb8d4903b2
2024-04-23 18:45:43 +08:00
hiyouga
bbf462a17e
add dpo mix dataset
...
Former-commit-id: 6def3f8bfa51b2d9d73af112352ce07db972e4c9
2024-04-20 01:31:38 +08:00
hiyouga
b1ae554c83
fix #3247
...
Former-commit-id: bb67c66f80627805b585d157ba807c0ce378d3f2
2024-04-12 17:41:33 +08:00
li.yunhao
e6e3571232
fix pile dataset hf hub url
...
Former-commit-id: c06f71f74ee1b177617417d151185757fd4359f5
2024-03-30 16:06:10 +08:00
hiyouga
5f3f0c53f2
add orca_dpo_pairs dataset
...
Former-commit-id: af683aacbae462a2a37d76d37df583e217664bd5
2024-03-20 20:09:06 +08:00
hiyouga
28f3e60189
update readme, add starcoder2, cosmopedia
...
Former-commit-id: 1ae7c183640146bb9b06c98942985a1721d2b9c9
2024-03-03 01:01:46 +08:00
hiyouga
9d241d08ae
update data
...
Former-commit-id: bd63af6ede3a103b75ef9c0875557d65e2c4c7f7
2024-03-02 19:37:18 +08:00
hiyouga
5ebd605149
fix #2533
...
Former-commit-id: 52a81299fcff0fa691e1d6f9a7e9ea9d19751b3a
2024-02-21 22:47:48 +08:00
hiyouga
a6ff18ab17
fix #2481
...
Former-commit-id: 2a4e3e4a26a2fad77ccc476be7d45434b8af4a55
2024-02-15 19:07:47 +08:00
hiyouga
3e8c3b506a
improve aligner
...
Former-commit-id: cc7296b92e10c24967fc753393275b71d300683f
2024-02-10 16:39:19 +08:00
Mark Mueller
ac5d3811bd
Slim Orca data parsing
...
Former-commit-id: f2d8efede7e20edafed0d5446eb64f2d419949b1
2024-02-08 19:32:20 +01:00
Johann-Peter Hartmann
5a23651531
WS fix
...
Former-commit-id: 131935346ac06738be5e7c7f54fe2eb7d3769d7a
2024-02-06 20:13:04 +01:00
Johann-Peter Hartmann
66e1781ee9
add ranking to dpo dataset
...
Former-commit-id: 6a844fb384dd9cac3fd6b845a6b414320c5eb766
2024-02-06 20:12:36 +01:00
Johann-Peter Hartmann
af258902c4
remove comma
...
Former-commit-id: 57a2f6d35da8cd10fad9859382bc1e983da56705
2024-02-03 08:48:39 +01:00
Johann-Peter Hartmann
912bb5cb03
Add support for German datasets
...
Former-commit-id: bbc038aa236952597e97d1ccf1ae2d64a16339b5
2024-01-30 10:18:01 +01:00
hiyouga
fa9939b2b2
Update dataset_info.json
...
Former-commit-id: 4fe04ac7fc464c0ed705281cd3860839c18d6fc0
2024-01-23 00:10:32 +08:00
hiyouga
54f406a26c
enable cutoff len
...
Former-commit-id: e9513d300c338dfcae98eee7d057bfd00da2da0e
2024-01-18 12:25:42 +08:00
hiyouga
a9fc7dbfa6
support function calling
...
Former-commit-id: 66533b3f65babf2429c92c0f8fafe4eff5e0ff63
2024-01-18 09:54:23 +08:00
hiyouga
6298f4779c
tiny update
...
Former-commit-id: 4417b8ee20b381c964f452f52081667dfa33cd7b
2023-12-25 18:29:34 +08:00
hiyouga
cedf58978e
support autogptq in llama board #246
...
Former-commit-id: fea01226703d1534b5cf511bcb6a49e73bc86ce1
2023-12-16 16:31:30 +08:00
hiyouga
d9f621be13
support system column #1765
...
Former-commit-id: f425584a511c5e42bae8b3ba090eaa898b28adad
2023-12-12 19:45:59 +08:00
hiyouga
aa30233322
fix modelscope data hub
...
Former-commit-id: 5b63e8c22538a4788e4b6c8df50e6e6be93ceeac
2023-12-12 18:33:06 +08:00
hoshi-hiyouga
97d5fb3460
Merge branch 'main' into feat/support_ms
...
Former-commit-id: 698756dffb7d4e602b3e0cab66ef0a4befe7215c
2023-12-12 17:55:32 +08:00
xingjun.wang
d4d4efc9e6
modify guanaco
...
Former-commit-id: ed2746fcc29cd07d4fa796f35f8d67c72bf30be8
2023-12-12 15:00:37 +08:00
xingjun.wang
643fa8e685
update dataset info
...
Former-commit-id: c005716ebcef390cf219e45649778f91e1f6e959
2023-12-12 14:53:59 +08:00
xingjun.wang
c1703c4f75
update args for MsDataset.load
...
Former-commit-id: c5f69357a167cbf99a93607177526e787419ea05
2023-12-12 13:02:54 +08:00
xingjun.wang
55b51fd7b2
add new datasets
...
Former-commit-id: d1e2e8430b3d21ca023d66e9ca28f7e5d2da0029
2023-12-12 12:44:15 +08:00
xingjun.wang
b761416dc1
add open orca
...
Former-commit-id: 7994c809b385bdc2c19e1e5e6fa8680aa9f2b77d
2023-12-12 12:34:04 +08:00
hiyouga
8149cee890
fix #1784
...
Former-commit-id: 4e1af5a5d39d9e2f374c1372e2d67120c63fea09
2023-12-09 20:53:18 +08:00
yuze.zyz
0cacd5147a
fix typo
...
Former-commit-id: 29b07291e6b40e9f0a61632609465363291ae5c7
2023-12-08 18:13:26 +08:00
yuze.zyz
c2432b2e8d
support ms dataset
...
Former-commit-id: 98638b35dc24045ac17b9b01d08d3a02372acef3
2023-12-08 18:00:57 +08:00
hiyouga
1602fe2350
fix #1696
...
Former-commit-id: 722ae14a652af34d9b91f9459e613d7959ecaa7e
2023-12-01 15:34:50 +08:00
Marco
238379e64a
Update dataset_info.json
...
Added the Nectar dataset, already preprocessed and divided into sft and rl, with a preprompt added to each instruction, since this has been seen to improve instruction following
Former-commit-id: 6336e247c1535f356194046607038245bc48464f
2023-11-30 16:21:34 +01:00
hiyouga
c7ab341fcd
update dataset
...
Former-commit-id: a310b22b446118d90dd73906847ed3d01a574b50
2023-11-17 23:19:12 +08:00
hiyouga
685d0c975a
support full-parameter PPO
...
Former-commit-id: 4af967d69475e1c9fdf1a7983cd6b83bd431abff
2023-11-16 02:08:04 +08:00
hiyouga
f697474e67
add template, modify datasets
...
Former-commit-id: 81e54beb4d0f792f4fd7f450643caaf10f2f0b7d
2023-11-09 15:53:23 +08:00
hiyouga
fb3d981496
update data readme (zh)
...
Former-commit-id: b32fb3a984c681732b82f6544d6c05a98c34cf4c
2023-11-02 23:42:49 +08:00
hiyouga
33c47f0ebe
support sharegpt format, add datasets
...
Former-commit-id: 202daf8987ccb7523be03ca535b572b5c9e65994
2023-11-02 23:10:04 +08:00
hiyouga
5daa358aab
add MathInstruct dataset
...
Former-commit-id: 3d1d4b47055739854cf9788a902607e1bbba3723
2023-09-13 22:30:14 +08:00
hiyouga
c5fcf5b3a5
refactor dataset_attr, add eos in pt, fix #757
...
Former-commit-id: 0feec9a830b917b36686b61938a66e842eccf930
2023-09-01 19:00:45 +08:00
codemayq
09f61befc8
add ad gen dataset
...
Former-commit-id: fcd0788aa4dda0cecc1420d369d371032a207810
2023-08-27 20:35:32 +08:00