Commit Graph

122 Commits

Author SHA1 Message Date
zhangzc
05afeb304d Supports custom data set sampling quantity
Former-commit-id: 449e2aa38e
2024-03-27 14:22:50 +08:00
hiyouga
6646e18c02 add orca_dpo_pairs dataset
Former-commit-id: 3271af2afc
2024-03-20 20:09:06 +08:00
SirlyDreamer
78359638e3 Follow HF_ENDPOINT environment variable
Former-commit-id: e165965341
2024-03-20 08:31:30 +00:00
hiyouga
566bfad930 update parser
Former-commit-id: be99799413
2024-03-10 13:35:20 +08:00
hiyouga
9ae1514a75 update readme, add starcoder2, cosmopedia
Former-commit-id: 894d183214
2024-03-03 01:01:46 +08:00
hiyouga
3b16912235 update data
Former-commit-id: 32884523c5
2024-03-02 19:37:18 +08:00
hiyouga
7e2d8b170a fix #2533
Former-commit-id: 1630a4cb8f
2024-02-21 22:47:48 +08:00
hiyouga
62b78001b7 fix #2481
Former-commit-id: 22acab8aff
2024-02-15 19:07:47 +08:00
hiyouga
9cf5d89bd1 update data/readme
Former-commit-id: a754f6e9ec
2024-02-10 21:04:29 +08:00
hiyouga
db2051684b improve aligner
Former-commit-id: 7d2dc83c5e
2024-02-10 16:39:19 +08:00
Mark Mueller
4bd7b8375e Slim Orca data parsing
Former-commit-id: 1d3598afa1
2024-02-08 19:32:20 +01:00
Johann-Peter Hartmann
ace1770085 WS fix
Former-commit-id: 49c69ea4b9
2024-02-06 20:13:04 +01:00
Johann-Peter Hartmann
6ff4e9e62c add ranking to dpo dataset
Former-commit-id: 1126563505
2024-02-06 20:12:36 +01:00
Johann-Peter Hartmann
77746ad86c remove comma
Former-commit-id: 870182c3a9
2024-02-03 08:48:39 +01:00
Johann-Peter Hartmann
81edbd1472 Merge branch 'hiyouga:main' into main
Former-commit-id: 4e27950acb
2024-01-31 14:05:52 +01:00
hiyouga
7beeae2209 fix autoset attn impl, update data readme
Former-commit-id: 521ad76552
2024-01-31 11:58:07 +08:00
Johann-Peter Hartmann
c264eb4793 Add support for german datasets
Former-commit-id: d9a8301ed4
2024-01-30 10:18:01 +01:00
hiyouga
cd4d38e0cc Update dataset_info.json
Former-commit-id: dbaaa4546e
2024-01-23 00:10:32 +08:00
hiyouga
509e35ffc8 fix #2282 and update tool prompt
Former-commit-id: b2fb0eca56
2024-01-22 22:27:30 +08:00
hiyouga
48cab43cb5 add array param format
Former-commit-id: 486cc8d360
2024-01-21 22:17:48 +08:00
hiyouga
e95c0242a8 fix dataset
Former-commit-id: 487dee066f
2024-01-18 12:59:30 +08:00
hiyouga
7f12aedc08 enable cutoff len
Former-commit-id: f1067d2b58
2024-01-18 12:25:42 +08:00
hiyouga
4e3bfb799d support function calling
Former-commit-id: d9f1cae351
2024-01-18 09:54:23 +08:00
hiyouga
a52aafdbdc tiny update
Former-commit-id: 5b93d545e2
2023-12-25 18:29:34 +08:00
hiyouga
1af13cb737 add models
Former-commit-id: 709ac8870a
2023-12-18 19:09:31 +08:00
hiyouga
f0f9d253d8 support autogptq in llama board #246
Former-commit-id: 71389be37c
2023-12-16 16:31:30 +08:00
hiyouga
1a0bdd305c support system column #1765
Former-commit-id: 0a9c6e0146
2023-12-12 19:45:59 +08:00
hiyouga
cefc0b2f03 fix modelscope data hub
Former-commit-id: d5b2c57a35
2023-12-12 18:33:06 +08:00
hoshi-hiyouga
b67085e13a Merge branch 'main' into feat/support_ms
Former-commit-id: 6382efec52
2023-12-12 17:55:32 +08:00
xingjun.wang
e331e8c200 modify guanaco
Former-commit-id: e80a989d49
2023-12-12 15:00:37 +08:00
xingjun.wang
277790d868 update dataset info
Former-commit-id: 73b50a26b9
2023-12-12 14:53:59 +08:00
xingjun.wang
879209829e update args for MsDataset.load
Former-commit-id: 09533e95ed
2023-12-12 13:02:54 +08:00
xingjun.wang
9f17d36ccf add new datasets
Former-commit-id: fe4acc66b0
2023-12-12 12:44:15 +08:00
xingjun.wang
92fb73abd4 add open orca
Former-commit-id: 0ce18a3782
2023-12-12 12:34:04 +08:00
hiyouga
b641e9e97e fix #1784
Former-commit-id: 28d5de7e78
2023-12-09 20:53:18 +08:00
yuze.zyz
9c30cdb53d fix typo
Former-commit-id: e4cf2a75ca
2023-12-08 18:13:26 +08:00
yuze.zyz
c523613f0a support ms dataset
Former-commit-id: 9c2247d700
2023-12-08 18:00:57 +08:00
hiyouga
9a6b694e12 fix #1696
Former-commit-id: bf6f6aeefe
2023-12-01 15:34:50 +08:00
Marco
a26f68ba47 Update dataset_info.json
Added the Nectar dataset, already preprocessed and split into SFT and RL subsets. I added a preprompt to each instruction, since this has been seen to improve instruction following.

Former-commit-id: 9468ee9012
2023-11-30 16:21:34 +01:00
hiyouga
303956cbb9 update dataset
Former-commit-id: 7b1aa6f63c
2023-11-17 23:19:12 +08:00
hiyouga
f441932bd1 support full-parameter PPO
Former-commit-id: ce78303600
2023-11-16 02:08:04 +08:00
hiyouga
38755bced7 add template, modify datasets
Former-commit-id: 386f590209
2023-11-09 15:53:23 +08:00
hiyouga
b2bf10661b update data readme
Former-commit-id: 2b5e33c338
2023-11-03 00:15:23 +08:00
hiyouga
a9db89a025 update data readme (zh)
Former-commit-id: cc8ffa10d8
2023-11-02 23:42:49 +08:00
hiyouga
a1b0655457 support sharegpt format, add datasets
Former-commit-id: a837172413
2023-11-02 23:10:04 +08:00
hiyouga
1cd0ea1f13 add MathInstruct dataset
Former-commit-id: 026af87e7f
2023-09-13 22:30:14 +08:00
hiyouga
a4fd976048 refactor dataset_attr, add eos in pt, fix #757
Former-commit-id: a9d1fb72f7
2023-09-01 19:00:45 +08:00
codemayq
d9b9d9d1fe add ad gen dataset
Former-commit-id: 604f85487b
2023-08-27 20:35:32 +08:00
codemayq
b032dc4c4e add readme for dataset
Former-commit-id: cece66d48a
2023-08-23 19:55:45 +08:00
codemayq
4b29d9d2b0 add dataset stage and filter dataset when stage chosen in webui
Former-commit-id: c0e4d1e81b
2023-08-23 18:54:23 +08:00