Commit Graph

11 Commits

Author SHA1 Message Date
hiyouga 8ea32e4046 change to right-padding, update reward score #803 2023-09-08 20:04:31 +08:00
hiyouga 3ec4351cfd support DPO training (2305.18290) 2023-08-11 03:02:53 +08:00
hiyouga 220175ab24 update trainer 2023-08-07 13:34:35 +08:00
hiyouga b5ba87952a update ppo trainer 2023-08-02 18:46:41 +08:00
hiyouga ac88ce5233 fix RM save model 2023-08-01 11:56:17 +08:00
hiyouga d3a0692d4d fix inference 2023-08-01 00:06:48 +08:00
hiyouga 0411a4b3e1 support streaming data, fix #284 #274 #268 2023-07-31 23:33:00 +08:00
hiyouga d2f18197e3 fix save function 2023-07-21 14:09:07 +08:00
hiyouga ed0e186a13 update web UI, support rm predict #210 2023-07-21 13:27:27 +08:00
hiyouga 22d9a9c2af fix callback 2023-07-15 17:18:16 +08:00
hiyouga f751376613 modity code structure 2023-07-15 16:54:28 +08:00