[data] add coig-p dataset (#7657)

hoshi-hiyouga
2025-04-09 21:18:25 +08:00
committed by GitHub
parent 7dd35cff8a
commit cca359fb6d
11 changed files with 325 additions and 915 deletions

@@ -85,7 +85,7 @@ Regarding the above dataset, the *dataset description* in `dataset_info.json` sh
### Pre-training Dataset
-- [Example dataset](c4_demo.json)
+- [Example dataset](c4_demo.jsonl)
In pre-training, only the `text` column will be used for model learning.
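
As a rough illustration of the sentence above, here is a minimal sketch of reading a JSON Lines file and keeping only the `text` column. It is not the project's actual loader: the helper name `read_pretraining_texts`, the file name `sample_pretrain.jsonl`, and the extra `source` field are all hypothetical and exist only to show that other columns are ignored.

```python
import json
import tempfile
from pathlib import Path


def read_pretraining_texts(path):
    """Return the `text` value of every record in a JSON Lines file.

    Hypothetical helper for illustration; any other columns are dropped.
    """
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            texts.append(record["text"])
    return texts


# Made-up records: only `text` matters, `source` is an extra column.
records = [
    {"text": "first document", "source": "web"},
    {"text": "second document", "source": "book"},
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_pretrain.jsonl"
    # One JSON object per line, as the .jsonl extension implies.
    sample_path.write_text(
        "\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8"
    )
    texts = read_pretraining_texts(sample_path)

print(texts)
```

The sketch raises `KeyError` if a record lacks `text`, which is a deliberate choice here so that malformed rows surface early rather than being skipped silently.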