18 Commits

Author SHA1 Message Date
hiyouga
9ed4bb63d4 change to right-padding, update reward score #803
Former-commit-id: 8ea32e4046d75ddfa9517669e9de9f48fea720c6
2023-09-08 20:04:31 +08:00
hiyouga
5030f05126 add deepspeed check in PPO training
Former-commit-id: ed1c2c5557bb2714c3341294f0ea86f6496d4b0c
2023-09-07 19:12:40 +08:00
hiyouga
a4fd976048 refactor dataset_attr, add eos in pt, fix #757
Former-commit-id: a9d1fb72f791ae57a4d12f4e3a7e2abccf6a7077
2023-09-01 19:00:45 +08:00
codemayq
c955d9267c add dataset stage check
Former-commit-id: f7fdc088d49564f7d436fd445e7e1987a9a00a0b
2023-08-30 16:23:08 +08:00
hiyouga
38080233a5 fix #649
Former-commit-id: 57146c101f3e8f688b016a44c85e8ad5d1b6f938
2023-08-23 20:21:15 +08:00
hiyouga
03edfd07e7 fix PPO trainer #551, update readme
Former-commit-id: 90205244186df558cd6b0000728d638348db3a10
2023-08-18 11:43:10 +08:00
hiyouga
fceca0bb6a update training resuming
Former-commit-id: 58f13e22da18babed0d2d4348474e07745da8fa5
2023-08-18 01:41:17 +08:00
hiyouga
66771352bb support bf16 ppo #551
Former-commit-id: d125218cde893c7c8527ab27b4d2dfb2474c384d
2023-08-18 00:40:32 +08:00
hiyouga
3f0a2d6adc support rope scaling, fix #475 #476 #478
Former-commit-id: fa940c17b8d3e379af08804003f1a522c1cd6ac4
2023-08-12 20:46:27 +08:00
hiyouga
abdfa26d06 support DPO training (2305.18290)
Former-commit-id: 3ec4351cfdaf2aefcc7d13345e19d79874ed61d3
2023-08-11 03:02:53 +08:00
hiyouga
6404167ab7 support val set in streaming mode
Former-commit-id: d86ea314a197fd821770d895e988c48d46679047
2023-08-09 23:00:26 +08:00
hiyouga
c5ad96375e fix RM save model
Former-commit-id: ac88ce5233248dbf1c7943c5f1197e40ba52fde9
2023-08-01 11:56:17 +08:00
hiyouga
aa4335eac7 release v0.1.4
Former-commit-id: 973a6386657885c7d11ecc8746ebd8804b6b355d
2023-08-01 10:08:47 +08:00
hiyouga
d5d3b2a42f fix arg check
Former-commit-id: 9cb1f119a4757c4fd2dc6db9335589d94f6ab5eb
2023-07-31 23:48:57 +08:00
hiyouga
a437424381 update readme
Former-commit-id: 62dca5bb820b8e75f3e24294d578322b97303b5f
2023-07-31 23:42:32 +08:00
hiyouga
e80b75b560 support streaming data, fix #284 #274 #268
Former-commit-id: 0411a4b3e122e7907441bc7a64b004948741a620
2023-07-31 23:33:00 +08:00
hiyouga
f769c2d3fc update web UI, support rm predict #210
Former-commit-id: ed0e186a134de816d6a9278f4e47baa6250a52d1
2023-07-21 13:27:27 +08:00
hiyouga
a696148d6b modify code structure
Former-commit-id: f75137661358f9070bc70c341dfa2cc5fd69cf94
2023-07-15 16:54:28 +08:00