[model] add seed coder and qwen3 quant models (#8039)

2026-03-07 12:15:59 +08:00 · 2025-05-13 15:59:55 +08:00
parent d374dd08be
commit e0e97d2867
12 changed files with 104 additions and 20 deletions
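
Note on the README and `dataset_info.json` hunks below: the README change states that `dataset_info.json` is read from `dataset_dir` (default `./data`), and the new `rlaif_v` entry maps the source columns `question` and `image` to the roles `prompt` and `images`. A minimal sketch of that lookup, under stated assumptions: `load_dataset_entry` and `rename_columns` are hypothetical helper names used only for illustration, and this is not the repository's actual loader.

```python
import json
from pathlib import Path


def load_dataset_entry(name: str, dataset_dir: str = "./data") -> dict:
    """Return one dataset description from <dataset_dir>/dataset_info.json.

    Hypothetical helper: it only illustrates the lookup described in the README.
    """
    info_path = Path(dataset_dir) / "dataset_info.json"
    with info_path.open(encoding="utf-8") as f:
        dataset_info = json.load(f)
    if name not in dataset_info:
        raise KeyError(f"{name!r} is not described in {info_path}")
    return dataset_info[name]


def rename_columns(record: dict, columns: dict) -> dict:
    """Map source column names to the role names listed under "columns".

    `columns` has the shape {role: source_column}, as in the `rlaif_v` entry.
    Source columns that are absent from the record are skipped.
    """
    return {role: record[source] for role, source in columns.items() if source in record}
```

With the `rlaif_v` entry, a raw record keyed by `question`, `chosen`, `rejected`, and `image` would come back keyed by `prompt`, `chosen`, `rejected`, and `images`. Passing a different `dataset_dir` points the lookup at another directory, as the README addition describes.
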
--- a/data/README.md
+++ b/data/README.md
@@ -1,5 +1,7 @@
 The [dataset_info.json](dataset_info.json) contains all available datasets. If you are using a custom dataset, please **make sure** to add a *dataset description* in `dataset_info.json` and specify `dataset: dataset_name` before training to use it.

+The `dataset_info.json` file should be put in the `dataset_dir` directory. You can change `dataset_dir` to use another directory. The default value is `./data`.
+
 Currently we support datasets in **alpaca** and **sharegpt** format.

 ```json
--- a/data/README_zh.md
+++ b/data/README_zh.md
@@ -1,5 +1,7 @@
 [dataset_info.json](dataset_info.json) 包含了所有可用的数据集。如果您希望使用自定义数据集，请**务必**在 `dataset_info.json` 文件中添加*数据集描述*，并通过修改 `dataset: 数据集名称` 配置来使用数据集。

+其中 `dataset_info.json` 文件应放置在 `dataset_dir` 目录下。您可以通过修改 `dataset_dir` 参数来使用其他目录。默认值为 `./data`。
+
 目前我们支持 **alpaca** 格式和 **sharegpt** 格式的数据集。

 ```json
--- a/data/dataset_info.json
+++ b/data/dataset_info.json
@@ -559,6 +559,16 @@
      "images": "images"
    }
  },
+  "rlaif_v": {
+    "hf_hub_url": "openbmb/RLAIF-V-Dataset",
+    "ranking": true,
+    "columns": {
+      "prompt": "question",
+      "chosen": "chosen",
+      "rejected": "rejected",
+      "images": "image"
+    }
+  },
  "orca_pairs": {
    "hf_hub_url": "Intel/orca_dpo_pairs",
    "ranking": true,