10 Commits

Author SHA1 Message Date
hiyouga
abdfa26d06 support DPO training (2305.18290)
Former-commit-id: 3ec4351cfdaf2aefcc7d13345e19d79874ed61d3
2023-08-11 03:02:53 +08:00
hiyouga
77f6647e8f update trainer
Former-commit-id: 220175ab2410ce22a553344eb75d5a556ed1a276
2023-08-07 13:34:35 +08:00
hiyouga
569df8ccd6 update ppo trainer
Former-commit-id: b5ba87952ab02ed0720365ebd571e47e92e1cda6
2023-08-02 18:46:41 +08:00
hiyouga
c5ad96375e fix RM save model
Former-commit-id: ac88ce5233248dbf1c7943c5f1197e40ba52fde9
2023-08-01 11:56:17 +08:00
hiyouga
e34fc5fd2e fix inference
Former-commit-id: d3a0692d4d9033a3b58d68357294854144479536
2023-08-01 00:06:48 +08:00
hiyouga
e80b75b560 support streaming data, fix #284 #274 #268
Former-commit-id: 0411a4b3e122e7907441bc7a64b004948741a620
2023-07-31 23:33:00 +08:00
hiyouga
daf81288a1 fix save function
Former-commit-id: d2f18197e379601a60fa878af975c68d7c8b9648
2023-07-21 14:09:07 +08:00
hiyouga
f769c2d3fc update web UI, support rm predict #210
Former-commit-id: ed0e186a134de816d6a9278f4e47baa6250a52d1
2023-07-21 13:27:27 +08:00
hiyouga
70b5232f9a fix callback
Former-commit-id: 22d9a9c2af6674eb832ae4aee80d679f19b7006f
2023-07-15 17:18:16 +08:00
hiyouga
a696148d6b modify code structure
Former-commit-id: f75137661358f9070bc70c341dfa2cc5fd69cf94
2023-07-15 16:54:28 +08:00