59 Commits

Author SHA1 Message Date
hiyouga
68c30094e1 support ppo score norm (trl 0.5.1.dev required)
Former-commit-id: 2b25db6d260ec1532281a592e873579346c7d21c
2023-08-18 12:02:42 +08:00
hiyouga
a8dd39ad08 fix PPO trainer #551, update readme
Former-commit-id: faead74849470cebae9e37cde5fab2a71b32aa43
2023-08-18 11:43:10 +08:00
hiyouga
98aa629843 Release v0.1.6
Former-commit-id: 43c8b3c3c8bfb2e32d17fb3e8b194938e37d54bd
2023-08-11 23:25:57 +08:00
hiyouga
7ada4f5f6f support DPO training (2305.18290)
Former-commit-id: 6d98de148e4af63a7028dfaeb6cf86eb56a4488f
2023-08-11 03:02:53 +08:00
hiyouga
0951a9ce92 update args spec
Former-commit-id: a006068346edda6e2851b23d2005fdb218a7287d
2023-08-07 15:23:35 +08:00
hiyouga
bf15c4a03c support Qwen-7B, fix InternLM-7B inference
Former-commit-id: 25d2ca29ecb70cbfd5206333c667042a0c4d2e5a
2023-08-03 15:53:32 +08:00
hiyouga
7622a0e666 fix #194
Former-commit-id: 9792921531efefb4bcddbde4380169a78fe064a6
2023-07-19 17:07:33 +08:00
hiyouga
9b13d04127 create chat model
Former-commit-id: bddf583b2fc099c957a1037418bd8504a837663e
2023-07-15 19:26:20 +08:00
hiyouga
a69b1b1c3a modify code structure
Former-commit-id: 0682ed357210897e0b67c4a6eb31a94b3eb929f1
2023-07-15 16:54:28 +08:00