41 Commits

Author SHA1 Message Date
xingjun.wang
643fa8e685 update dataset info
Former-commit-id: c005716ebcef390cf219e45649778f91e1f6e959
2023-12-12 14:53:59 +08:00
xingjun.wang
c1703c4f75 update args for MsDataset.load
Former-commit-id: c5f69357a167cbf99a93607177526e787419ea05
2023-12-12 13:02:54 +08:00
xingjun.wang
55b51fd7b2 add new datasets
Former-commit-id: d1e2e8430b3d21ca023d66e9ca28f7e5d2da0029
2023-12-12 12:44:15 +08:00
xingjun.wang
b761416dc1 add open orca
Former-commit-id: 7994c809b385bdc2c19e1e5e6fa8680aa9f2b77d
2023-12-12 12:34:04 +08:00
yuze.zyz
0cacd5147a fix typo
Former-commit-id: 29b07291e6b40e9f0a61632609465363291ae5c7
2023-12-08 18:13:26 +08:00
yuze.zyz
c2432b2e8d support ms dataset
Former-commit-id: 98638b35dc24045ac17b9b01d08d3a02372acef3
2023-12-08 18:00:57 +08:00
hiyouga
1602fe2350 fix #1696
Former-commit-id: 722ae14a652af34d9b91f9459e613d7959ecaa7e
2023-12-01 15:34:50 +08:00
Marco
238379e64a Update dataset_info.json
Added the Nectar dataset, already preprocessed and split into sft and rl parts. I added a preprompt to each instruction, since this has been seen to improve instruction following.

Former-commit-id: 6336e247c1535f356194046607038245bc48464f
2023-11-30 16:21:34 +01:00
hiyouga
c7ab341fcd update dataset
Former-commit-id: a310b22b446118d90dd73906847ed3d01a574b50
2023-11-17 23:19:12 +08:00
hiyouga
685d0c975a support full-parameter PPO
Former-commit-id: 4af967d69475e1c9fdf1a7983cd6b83bd431abff
2023-11-16 02:08:04 +08:00
hiyouga
f697474e67 add template, modify datasets
Former-commit-id: 81e54beb4d0f792f4fd7f450643caaf10f2f0b7d
2023-11-09 15:53:23 +08:00
hiyouga
2659753600 update data readme
Former-commit-id: 6a65ef44ed58714c611da60b5af96b85352e8735
2023-11-03 00:15:23 +08:00
hiyouga
fb3d981496 update data readme (zh)
Former-commit-id: b32fb3a984c681732b82f6544d6c05a98c34cf4c
2023-11-02 23:42:49 +08:00
hiyouga
33c47f0ebe support sharegpt format, add datasets
Former-commit-id: 202daf8987ccb7523be03ca535b572b5c9e65994
2023-11-02 23:10:04 +08:00
hiyouga
5daa358aab add MathInstruct dataset
Former-commit-id: 3d1d4b47055739854cf9788a902607e1bbba3723
2023-09-13 22:30:14 +08:00
hiyouga
c5fcf5b3a5 refactor dataset_attr, add eos in pt, fix #757
Former-commit-id: 0feec9a830b917b36686b61938a66e842eccf930
2023-09-01 19:00:45 +08:00
codemayq
09f61befc8 add ad gen dataset
Former-commit-id: fcd0788aa4dda0cecc1420d369d371032a207810
2023-08-27 20:35:32 +08:00
codemayq
9ae6abb7b5 add readme for dataset
Former-commit-id: bdcb0ea40e726e4c5752f938b379ed9a18e7e1d0
2023-08-23 19:55:45 +08:00
codemayq
22cece8acb add dataset stage and filter dataset when stage chosen in webui
Former-commit-id: 26e4136449a4df6028d834fd16a0f4a7c532759d
2023-08-23 18:54:23 +08:00
hiyouga
538c404cc0 update template
Former-commit-id: a95f3a4d62de1073a78125401cf4289ec0523156
2023-08-22 19:46:09 +08:00
Peter Pan
44777f77f8 add rm dataset explanation
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

Former-commit-id: 1efb95025be6501f1b30b20e7c711d3590b5d1ee
2023-08-22 01:33:59 -04:00
hiyouga
7ada4f5f6f support DPO training (2305.18290)
Former-commit-id: 6d98de148e4af63a7028dfaeb6cf86eb56a4488f
2023-08-11 03:02:53 +08:00
hiyouga
8f25642087 restore from git lfs
Former-commit-id: 0c734a37113b773ae7c0bc8b8d1af39b15bc0fb2
2023-08-01 16:33:25 +08:00
hiyouga
3dee98ebc6 use git lfs
Former-commit-id: 4886d0071751f68c5a2d926bd9fcee0c93337322
2023-08-01 10:14:08 +08:00
hiyouga
ed252565f9 update dataset
Former-commit-id: 4a044aabbd19c92a9ae93c1c30536f5086fd47f9
2023-07-26 17:05:12 +08:00
hiyouga
9802398c71 update dataset
Former-commit-id: 4fc2c3293d91d8464527ebd1ddabe572c8355616
2023-07-23 20:01:43 +08:00
hiyouga
1f8d45d37e update readme, fix web ui postprocess
Former-commit-id: ba51ab3379100108f7b52a3c2444ccdd99e8a6ef
2023-07-22 14:29:22 +08:00
mrhan1993
097c15b657 add Chinese README based on GLM Efficient Tuning, add server_port to web
Former-commit-id: 29e3acd23eafd891667d7a860ec544a5b05d3c33
2023-07-21 16:57:58 +08:00
hiyouga
e9de1951dd add datasets
Former-commit-id: 02e4b47dea1b25905c61f2ace88bab112610f021
2023-07-19 20:59:15 +08:00
hiyouga
25182c4779 fix Baichuan-13B
Former-commit-id: 6d9d826b3246349454c68f4d13b862da4de986e2
2023-07-13 23:08:45 +08:00
zxbsmk
3f0ee46bd0 Support for WebNovel dataset
Former-commit-id: 655162f530784bc9374962c02d8b414872f83b2f
2023-07-12 17:29:47 +08:00
hiyouga
fe45c17b25 add open assistant dataset
Former-commit-id: 1694cf3078d04a14bce96da04b9d8c52176b1044
2023-06-28 23:09:33 +08:00
hiyouga
e52b201c13 add belle multiturn dataset
Former-commit-id: ac907ae1c37969df3cd09d4ab5f3f7f352eb259c
2023-06-16 20:01:16 +08:00
hiyouga
0da1b7d9ab support RM metrics, add generating Args
Former-commit-id: c461c6190bc124e98dde7f3cf96a59ce40b26fb0
2023-06-12 15:48:48 +08:00
BUAADreamer
463e0762c4 update json line file to .jsonl
Former-commit-id: 85e7676c3c1422795a047ffa8587bd4063ad7511
2023-06-11 18:59:19 +08:00
BUAADreamer
ac00fcd114 add some
Former-commit-id: 6982a53ed1f6f9fa03e99623b98fff56bf00317e
2023-06-11 18:55:53 +08:00
BUAADreamer
a976cba730 add code for reading from multi files in one directory
Former-commit-id: 9b80cf08b9f0d4aee896b228fb76399e9a7c9d8b
2023-06-10 16:27:30 +08:00
BUAADreamer
2012cb5cbc add code for reading from multi files in one directory
Former-commit-id: b7ebb83a96619e5111b0faa9da9d0feb8d9cdff0
2023-06-10 15:53:47 +08:00
hiyouga
f8d03f3aa9 remove dummy code
Former-commit-id: e6bc89d280945bbf48281107145c40a41d7cbd56
2023-05-30 16:28:00 +08:00
hiyouga
bb6f731461 add pre-training script
Former-commit-id: 935d58de2b3a2eadc4f0bed28c3ad7dee32e9fd5
2023-05-29 21:37:22 +08:00
hiyouga
54574f1dfa Initial commit
Former-commit-id: 5ca8e1d63727e7bcb8cab16542c763c47e48184a
2023-05-28 18:09:04 +08:00