mirror of https://github.com/hiyouga/LLaMA-Factory.git (synced 2025-08-23 06:12:50 +08:00)
Merge pull request #4961 from khazic/main
Added the reference address for TRL PPO details.

Former-commit-id: 3c424cf69a10846b92a5f969e333e401b691dcb3
commit ab477e1650
@@ -200,6 +200,9 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
| ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
> [!TIP]
> The implementation details of PPO can be found in [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
## Provided Datasets
<details><summary>Pre-training datasets</summary>
@@ -200,6 +200,9 @@ https://github.com/user-attachments/assets/e6ce34b0-52d5-4f3e-a830-592106c4c272
| ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
> [!TIP]
> For the implementation details of PPO, please refer to [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
## Datasets
<details><summary>Pre-training datasets</summary>