Mirror of https://github.com/hiyouga/LLaMA-Factory.git (synced 2025-12-15 11:20:35 +08:00)
LLaMA-Factory/src/llamafactory/data/processors at commit da39715085086c2b5ab487c04a7b31083b783b9e
History

da39715085 — After extensive continued pre-training and comparative experiments, this bug was found: the tokenizer.eos_token that Llama 3 uses during pre-training is '<|end_of_text|>', so that same token must be appended to every sample here, not '<|eot_id|>'; otherwise severe performance degradation easily follows.
...
Former-commit-id: 6979f3f848
2024-06-11 16:23:40 +08:00
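The fix described in this commit message can be sketched as follows. This is a minimal illustration, not the repository's actual code: `preprocess_pretrain_example` is a hypothetical helper, and the token ids shown are only what the Llama 3 tokenizer is commonly reported to use for its special tokens.

```python
# Sketch of the bug fix: in pre-training data processing, each sample must be
# terminated with the tokenizer's pre-training EOS token ('<|end_of_text|>'
# for Llama 3), not the chat-turn terminator '<|eot_id|>'.

# Ids as commonly reported for the Llama 3 tokenizer (assumption, for
# illustration only):
END_OF_TEXT_ID = 128001  # '<|end_of_text|>' -- correct for pre-training
EOT_ID = 128009          # '<|eot_id|>'      -- chat turn end, wrong here

def preprocess_pretrain_example(token_ids: list[int], eos_token_id: int) -> list[int]:
    """Append the pre-training EOS token id to one tokenized sample."""
    return token_ids + [eos_token_id]

# Usage: terminate a tokenized sample with '<|end_of_text|>'.
sample = [101, 102, 103]
processed = preprocess_pretrain_example(sample, END_OF_TEXT_ID)
print(processed)  # [101, 102, 103, 128001]
```

In practice the id would come from `tokenizer.eos_token_id` rather than a hard-coded constant; the point of the commit is that for Llama 3 this must resolve to '<|end_of_text|>' during continued pre-training.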
__init__.py          refactor data preprocessing, fix mllm rlhf   2024-05-24 04:08:25 +08:00
feedback.py          update data processors                       2024-06-07 04:15:40 +08:00
pairwise.py          update data processors                       2024-06-07 04:15:40 +08:00
pretrain.py          After extensive continued pre-training and comparative experiments, this bug was found: the tokenizer.eos_token that Llama 3 uses during pre-training is '<|end_of_text|>', so that same token must be appended to every sample here, not '<|eot_id|>'; otherwise severe performance degradation easily follows.   2024-06-11 16:23:40 +08:00
processor_utils.py   update data processors                       2024-06-07 04:15:40 +08:00
supervised.py        update data processors                       2024-06-07 04:15:40 +08:00
unsupervised.py      update data processors                       2024-06-07 04:15:40 +08:00