111 Commits

Author SHA1 Message Date
hoshi-hiyouga
04dce0079e Update README.md
Former-commit-id: 65fb69e388c0a04c15ecd11441e567966f51fae5
2024-05-30 00:04:26 +08:00
seanzhang-zhichen
fc6c31127a Merge branch 'main' into add_dataset_sample_num
Former-commit-id: 26300127c45f24e63b91f1b0cc73e46c3a936a91
2024-05-24 15:57:47 +08:00
hiyouga
cc7bdaa459 Update README_zh.md
Former-commit-id: 34c4ba6bf9bb89170446fb396aa06ae44d251de0
2024-05-21 18:30:59 +08:00
hiyouga
d9c5d4ee64 fix #3818
Former-commit-id: 3f366e05a34be224f53c5bf8334e57ae5d316004
2024-05-20 21:43:19 +08:00
zhangzc
e84b72f806 fix conflict
Former-commit-id: 6922b23a748c2459147bf44b96d86daa89f2c96c
2024-05-20 17:10:01 +08:00
hiyouga
e088b3906f update data readme
Former-commit-id: 22c7335b496e4a673383d5a1e4e60bf2cb4e35b3
2024-05-18 21:37:38 +08:00
hiyouga
d732b72f82 update data readme
Former-commit-id: beb864a9367943d3274cb6057423d1eb9aaf85c4
2024-05-18 21:15:20 +08:00
hiyouga
d24969bb7e improve KTO impl., replace datasets
Former-commit-id: e56a57ddcf061de6e4acc8679f7dbf0b68364986
2024-05-18 03:44:56 +08:00
enji.zhou
d16a1d9ed0 add kto
Former-commit-id: ec51986cf70b0bdd79b8141e45916670fb97a08e
2024-05-17 13:09:17 +08:00
hiyouga
3e5a099187 remove checksum and fix ui args
Former-commit-id: 0cfdeb1d30efb63211434bc4656bceb59e666289
2024-05-12 01:10:30 +08:00
codingma
82e830f8e7 fix sha1 of glaive_toolcall dataset
Former-commit-id: 25649cd14899f41fe12c99af12619ddcd5a8ba88
2024-05-09 16:33:45 +08:00
hiyouga
c3dbaf6eba remove big file
Former-commit-id: 8a05242787f810ec25d1b33358257d2867c45497
2024-05-07 22:14:06 +08:00
hiyouga
a978b5dc4e fix stop param
Former-commit-id: f0a850c25211b72eddbb357c81679db9b0930d44
2024-05-07 00:41:04 +08:00
hoshi-hiyouga
34dcba1bc6 Merge pull request #3588 from ZeyuTeng96/patch-1
update hf_hub_url for nectar_rm in dataset_info
Former-commit-id: bcf2c749490d45e3b1363352cc30fd6f9ef29a19
2024-05-07 00:06:11 +08:00
hoshi-hiyouga
224f57f83b Update dataset_info.json
Former-commit-id: c55969c2350548a9b2eda5352b067df63ee98b20
2024-05-07 00:05:45 +08:00
hiyouga
6710e27429 update example docs
Former-commit-id: 102cd42768d9eb2cf1219309a25b41e26149067e
2024-05-06 22:51:02 +08:00
ZeyuTeng96
6041dda838 update hf_hub_url for nectar_rm in dataset_info
Hi there,

I cannot find the "mlinmg/RLAIF-Nectar" on hf, seems like it changed as "AstraMindAI/RLAIF-Nectar". So, making a PR for updating.

See: https://huggingface.co/datasets/AstraMindAI/RLAIF-Nectar
Former-commit-id: 98ea76989f6ee9096edd0d353d8a001cdb6ccc5a
2024-05-06 16:44:50 +08:00
hoshi-hiyouga
bdf78a9b07 Update README_zh.md
Former-commit-id: 1c673d89faca3160627009fcd0a4aa39138570c0
2024-05-02 02:14:55 +08:00
hoshi-hiyouga
fb76d1f262 Update README.md
Former-commit-id: 4fb43b0c9aa48242126252ad755a2a1683b38d6a
2024-05-02 02:13:46 +08:00
Lao
507349d493 Update README_zh.md
Former-commit-id: bacc8588dc7b0b43c240189ecf4336bedc299357
2024-04-28 23:31:37 +08:00
khazic
d2c8575a78 Upgrade the second sharegpt format
Former-commit-id: 057f992a666b029d207a3dc7dfc353f9abcf8316
2024-04-28 14:30:05 +08:00
khazic
abe8436bb4 added the second sharegpt format
Former-commit-id: 6d140ac98a78ecc0a713842bb917dc8eb14450cb
2024-04-28 14:27:45 +08:00
hiyouga
e626d15764 update readme
Former-commit-id: c9190fe36f511c3a5149d45c85a10b02a57fa88a
2024-04-26 23:39:19 +08:00
hoshi-hiyouga
94ae4c42e5 Merge pull request #3471 from BUAADreamer/main
add llava_150k en/zh mllm sft data
Former-commit-id: 991d843d56acd104ceff42f6d74d4e7acd5ccb01
2024-04-26 23:36:41 +08:00
hoshi-hiyouga
08460183a9 Update dataset_info.json
Former-commit-id: 7df511cfb76c833e8cc9be8cb45673395f54c32b
2024-04-26 23:34:34 +08:00
BUAADreamer
47c6d405dc add llava_150k en/zh mllm sft data
Former-commit-id: 62b3fb2f15e7e1c56da8011f0bf27cff35025863
2024-04-26 23:18:58 +08:00
hiyouga
190ae7b73d release v0.7.0
Former-commit-id: 45bb89cb4d26a6b3fb5360bc90ab950738fe4920
2024-04-26 23:18:00 +08:00
hiyouga
a635030931 support mllm hf inference
Former-commit-id: 2c7c01282acd7ddabbb17ce3246b8dae4bc4b8cf
2024-04-26 05:34:58 +08:00
hoshi-hiyouga
7b4a31ba22 Update dataset_info.json
Former-commit-id: b3e3749d49ba561929ed708650314e2c9b47c24d
2024-04-26 03:03:36 +08:00
hoshi-hiyouga
47ed006c81 Update mllm_demo.json
Former-commit-id: 33fc9f716422e963e6636ca5461c5e20dd378e66
2024-04-26 02:58:45 +08:00
hoshi-hiyouga
b314345e24 Update and rename llava_instruct_example.json to mllm_demo.json
Former-commit-id: 513f521afe0238d2e28a586a7a3e28c2745a2cdf
2024-04-26 02:57:54 +08:00
BUAADreamer
0373a3f2a8 merge data part to the text stream
Former-commit-id: 80537d580119d9d5a06ab236a5284aaae2f83b5b
2024-04-25 19:58:47 +08:00
BUAADreamer
69fb4351f5 merge data part to the text stream
Former-commit-id: 7ee20286d9bcc2d5378bfd6bb02cd3648396d873
2024-04-25 19:19:59 +08:00
BUAADreamer
641c97ba74 add llava and instructblip
Former-commit-id: 142fb6f4541a1acfefe66ff2574dabde53b00c06
2024-04-25 00:22:43 +08:00
BUAADreamer
20e05970ab add multimodal LLM BLIP-2 and InstructBLIP
Former-commit-id: a730f89a972f1a9d37c718c716f199cb8d4903b2
2024-04-23 18:45:43 +08:00
hiyouga
bbf462a17e add dpo mix dataset
Former-commit-id: 6def3f8bfa51b2d9d73af112352ce07db972e4c9
2024-04-20 01:31:38 +08:00
hiyouga
b1ae554c83 fix #3247
Former-commit-id: bb67c66f80627805b585d157ba807c0ce378d3f2
2024-04-12 17:41:33 +08:00
hiyouga
e6c7e6e667 support ORPO
Former-commit-id: f44a4c27e2461cdaa1b16865f597a31033c0e6d9
2024-03-31 18:29:50 +08:00
li.yunhao
e6e3571232 fix pile datset hf hub url
Former-commit-id: c06f71f74ee1b177617417d151185757fd4359f5
2024-03-30 16:06:10 +08:00
zhangzc
5e1cb05b95 Supports custom data set sampling quantity
Former-commit-id: fa8325401df27595de4611a89dfcc14644956abd
2024-03-27 14:22:50 +08:00
hiyouga
5f3f0c53f2 add orca_dpo_pairs dataset
Former-commit-id: af683aacbae462a2a37d76d37df583e217664bd5
2024-03-20 20:09:06 +08:00
SirlyDreamer
a7e96bc329 Follow HF_ENDPOINT environment variable
Former-commit-id: 22b36a3cfd2909cb624b1bb7385558eda504defe
2024-03-20 08:31:30 +00:00
hiyouga
dbaea2ba2f update parser
Former-commit-id: d98258aa08d93494ad50d7786064e7fda15f6ca9
2024-03-10 13:35:20 +08:00
hiyouga
28f3e60189 update readme, add starcoder2, cosmopedia
Former-commit-id: 1ae7c183640146bb9b06c98942985a1721d2b9c9
2024-03-03 01:01:46 +08:00
hiyouga
9d241d08ae update data
Former-commit-id: bd63af6ede3a103b75ef9c0875557d65e2c4c7f7
2024-03-02 19:37:18 +08:00
hiyouga
5ebd605149 fix #2533
Former-commit-id: 52a81299fcff0fa691e1d6f9a7e9ea9d19751b3a
2024-02-21 22:47:48 +08:00
hiyouga
a6ff18ab17 fix #2481
Former-commit-id: 2a4e3e4a26a2fad77ccc476be7d45434b8af4a55
2024-02-15 19:07:47 +08:00
hiyouga
ea2eef508a update data/readme
Former-commit-id: aa566e3cea5bc75688b4399a9da07be0b35b921c
2024-02-10 21:04:29 +08:00
hiyouga
3e8c3b506a improve aligner
Former-commit-id: cc7296b92e10c24967fc753393275b71d300683f
2024-02-10 16:39:19 +08:00
Mark Mueller
ac5d3811bd Slim Orca data parsing
Former-commit-id: f2d8efede7e20edafed0d5446eb64f2d419949b1
2024-02-08 19:32:20 +01:00