28 Commits

Author SHA1 Message Date
xingjun.wang
55b51fd7b2 add new datasets
Former-commit-id: d1e2e8430b3d21ca023d66e9ca28f7e5d2da0029
2023-12-12 12:44:15 +08:00
xingjun.wang
b761416dc1 add open orca
Former-commit-id: 7994c809b385bdc2c19e1e5e6fa8680aa9f2b77d
2023-12-12 12:34:04 +08:00
yuze.zyz
0cacd5147a fix typo
Former-commit-id: 29b07291e6b40e9f0a61632609465363291ae5c7
2023-12-08 18:13:26 +08:00
yuze.zyz
c2432b2e8d support ms dataset
Former-commit-id: 98638b35dc24045ac17b9b01d08d3a02372acef3
2023-12-08 18:00:57 +08:00
hiyouga
1602fe2350 fix #1696
Former-commit-id: 722ae14a652af34d9b91f9459e613d7959ecaa7e
2023-12-01 15:34:50 +08:00
Marco
238379e64a Update dataset_info.json
Added the Nectar dataset, already preprocessed and split into SFT and RL subsets. I added a preprompt to each instruction, since this has been seen to improve instruction following.

Former-commit-id: 6336e247c1535f356194046607038245bc48464f
2023-11-30 16:21:34 +01:00
hiyouga
c7ab341fcd update dataset
Former-commit-id: a310b22b446118d90dd73906847ed3d01a574b50
2023-11-17 23:19:12 +08:00
hiyouga
685d0c975a support full-parameter PPO
Former-commit-id: 4af967d69475e1c9fdf1a7983cd6b83bd431abff
2023-11-16 02:08:04 +08:00
hiyouga
f697474e67 add template, modify datasets
Former-commit-id: 81e54beb4d0f792f4fd7f450643caaf10f2f0b7d
2023-11-09 15:53:23 +08:00
hiyouga
fb3d981496 update data readme (zh)
Former-commit-id: b32fb3a984c681732b82f6544d6c05a98c34cf4c
2023-11-02 23:42:49 +08:00
hiyouga
33c47f0ebe support sharegpt format, add datasets
Former-commit-id: 202daf8987ccb7523be03ca535b572b5c9e65994
2023-11-02 23:10:04 +08:00
hiyouga
5daa358aab add MathInstruct dataset
Former-commit-id: 3d1d4b47055739854cf9788a902607e1bbba3723
2023-09-13 22:30:14 +08:00
hiyouga
c5fcf5b3a5 refactor dataset_attr, add eos in pt, fix #757
Former-commit-id: 0feec9a830b917b36686b61938a66e842eccf930
2023-09-01 19:00:45 +08:00
codemayq
09f61befc8 add ad gen dataset
Former-commit-id: fcd0788aa4dda0cecc1420d369d371032a207810
2023-08-27 20:35:32 +08:00
codemayq
22cece8acb add dataset stage and filter dataset when stage chosen in webui
Former-commit-id: 26e4136449a4df6028d834fd16a0f4a7c532759d
2023-08-23 18:54:23 +08:00
hiyouga
7ada4f5f6f support DPO training (2305.18290)
Former-commit-id: 6d98de148e4af63a7028dfaeb6cf86eb56a4488f
2023-08-11 03:02:53 +08:00
hiyouga
ed252565f9 update dataset
Former-commit-id: 4a044aabbd19c92a9ae93c1c30536f5086fd47f9
2023-07-26 17:05:12 +08:00
hiyouga
9802398c71 update dataset
Former-commit-id: 4fc2c3293d91d8464527ebd1ddabe572c8355616
2023-07-23 20:01:43 +08:00
hiyouga
e9de1951dd add datasets
Former-commit-id: 02e4b47dea1b25905c61f2ace88bab112610f021
2023-07-19 20:59:15 +08:00
hiyouga
25182c4779 fix Baichuan-13B
Former-commit-id: 6d9d826b3246349454c68f4d13b862da4de986e2
2023-07-13 23:08:45 +08:00
zxbsmk
3f0ee46bd0 Support for WebNovel dataset
Former-commit-id: 655162f530784bc9374962c02d8b414872f83b2f
2023-07-12 17:29:47 +08:00
hiyouga
fe45c17b25 add open assistant dataset
Former-commit-id: 1694cf3078d04a14bce96da04b9d8c52176b1044
2023-06-28 23:09:33 +08:00
hiyouga
e52b201c13 add belle multiturn dataset
Former-commit-id: ac907ae1c37969df3cd09d4ab5f3f7f352eb259c
2023-06-16 20:01:16 +08:00
hiyouga
0da1b7d9ab support RM metrics, add generating Args
Former-commit-id: c461c6190bc124e98dde7f3cf96a59ce40b26fb0
2023-06-12 15:48:48 +08:00
BUAADreamer
2012cb5cbc add code for reading from multi files in one directory
Former-commit-id: b7ebb83a96619e5111b0faa9da9d0feb8d9cdff0
2023-06-10 15:53:47 +08:00
hiyouga
f8d03f3aa9 remove dummy code
Former-commit-id: e6bc89d280945bbf48281107145c40a41d7cbd56
2023-05-30 16:28:00 +08:00
hiyouga
bb6f731461 add pre-training script
Former-commit-id: 935d58de2b3a2eadc4f0bed28c3ad7dee32e9fd5
2023-05-29 21:37:22 +08:00
hiyouga
54574f1dfa Initial commit
Former-commit-id: 5ca8e1d63727e7bcb8cab16542c763c47e48184a
2023-05-28 18:09:04 +08:00