[model] add seed coder and qwen3 quant models (#8039)

This commit is contained in:
hoshi-hiyouga
2025-05-13 15:59:55 +08:00
committed by GitHub
parent d374dd08be
commit e0e97d2867
12 changed files with 104 additions and 20 deletions

View File

@@ -1,5 +1,7 @@
The [dataset_info.json](dataset_info.json) contains all available datasets. If you are using a custom dataset, please **make sure** to add a *dataset description* in `dataset_info.json` and specify `dataset: dataset_name` before training to use it.
The `dataset_info.json` file should be put in the `dataset_dir` directory. You can change `dataset_dir` to use another directory. The default value is `./data`.
Currently we support datasets in **alpaca** and **sharegpt** format.
```json

View File

@@ -1,5 +1,7 @@
[dataset_info.json](dataset_info.json) 包含了所有可用的数据集。如果您希望使用自定义数据集,请**务必**在 `dataset_info.json` 文件中添加*数据集描述*,并通过修改 `dataset: 数据集名称` 配置来使用数据集。
其中 `dataset_info.json` 文件应放置在 `dataset_dir` 目录下。您可以通过修改 `dataset_dir` 参数来使用其他目录。默认值为 `./data`
目前我们支持 **alpaca** 格式和 **sharegpt** 格式的数据集。
```json

View File

@@ -559,6 +559,16 @@
"images": "images"
}
},
"rlaif_v": {
"hf_hub_url": "openbmb/RLAIF-V-Dataset",
"ranking": true,
"columns": {
"prompt": "question",
"chosen": "chosen",
"rejected": "rejected",
"images": "image"
}
},
"orca_pairs": {
"hf_hub_url": "Intel/orca_dpo_pairs",
"ranking": true,