LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2026-07-28 11:46:09 +08:00


README.md

If you are using a custom dataset, please add its definition to `dataset_info.json` in the following format.

"dataset_name": {
  "hf_hub_url": "the name of the dataset repository on the Hugging Face Hub. (if specified, the three arguments below are ignored)",
  "script_url": "the name of the directory containing a dataset loading script. (if specified, the two arguments below are ignored)",
  "file_name": "the name of the dataset file in this directory. (required if neither of the above is specified)",
  "file_sha1": "the SHA-1 hash value of the dataset file. (optional)",
  "columns": {
    "prompt": "the name of the column in the dataset containing the prompts. (default: instruction)",
    "query": "the name of the column in the dataset containing the queries. (default: input)",
    "response": "the name of the column in the dataset containing the responses. (default: output)",
    "history": "the name of the column in the dataset containing the chat history. (default: None)"
  }
}

where the prompt and response columns should contain non-empty values. The query column is concatenated with the prompt column, and the result is used as the model input. The history column should contain a list in which each element is a tuple of two strings representing a query-response pair from an earlier turn of the conversation.
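As a rough sketch, a local data file could be registered from Python by writing an entry with `file_name`, the optional `file_sha1`, and a `columns` mapping into `dataset_info.json`. Only those keys come from the format in this README. The helper names `sha1_of_file` and `register_local_dataset`, the file name `my_data.json`, the column names `question`/`answer`/`turns`, and the assumption that the data file is a JSON list of records are all hypothetical. This is not part of LLaMA-Factory itself; editing the JSON by hand works just as well.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def sha1_of_file(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_local_dataset(data_dir: Path, dataset_name: str, file_name: str,
                           columns: Optional[Dict[str, str]] = None) -> dict:
    """Hypothetical helper: add a local-file entry to data_dir/dataset_info.json."""
    info_path = data_dir / "dataset_info.json"
    # Keep any datasets that are already registered in the file.
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    entry = {
        "file_name": file_name,                           # required when no hf_hub_url/script_url is given
        "file_sha1": sha1_of_file(data_dir / file_name),  # optional per the format
    }
    if columns:  # only written when the caller supplies a column mapping
        entry["columns"] = columns
    info[dataset_name] = entry
    info_path.write_text(json.dumps(info, indent=2, ensure_ascii=False), encoding="utf-8")
    return entry


# Usage (assumption: the data file is a JSON list of records).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    records = [{"question": "What is 2 + 2?", "answer": "4", "turns": []}]
    (data_dir / "my_data.json").write_text(json.dumps(records), encoding="utf-8")
    entry = register_local_dataset(
        data_dir, "my_dataset", "my_data.json",
        # The column names differ from the defaults, so they are mapped explicitly.
        columns={"prompt": "question", "response": "answer", "history": "turns"},
    )
    print(json.dumps(entry, indent=2))
```

Reading the existing file before writing means several datasets can be registered one after another without overwriting each other.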
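The column semantics (non-empty prompt and response, the query appended to the prompt, history as a list of query-response pairs) can be sketched as a small Python function. This is an illustration, not LLaMA-Factory's own preprocessing code: the name `to_example`, the newline used to join prompt and query, and raising `ValueError` on empty values are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Default column names, taken from the format above.
DEFAULT_COLUMNS: Dict[str, Optional[str]] = {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "history": None,
}


def to_example(record: dict, columns: Optional[Dict[str, Optional[str]]] = None
               ) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Hypothetical helper: turn one record into (model_input, response, history)."""
    cols = {**DEFAULT_COLUMNS, **(columns or {})}
    prompt = record.get(cols["prompt"]) or ""
    response = record.get(cols["response"]) or ""
    if not prompt or not response:
        raise ValueError("prompt and response columns must be non-empty")

    # The query column is optional; when non-empty it is appended to the prompt.
    # The newline separator is an assumption made for this sketch.
    query = record.get(cols["query"]) or ""
    model_input = f"{prompt}\n{query}" if query else prompt

    # History arrives from JSON as [query, response] lists; convert each to a tuple.
    history: List[Tuple[str, str]] = []
    if cols["history"] is not None:
        history = [(str(q), str(r)) for q, r in record.get(cols["history"]) or []]
    return model_input, response, history


record = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
    "history": [["Hi", "Hello! How can I help?"]],
}
print(to_example(record, columns={"history": "history"}))
```

Because `history` defaults to `None`, the history column is only read when the dataset definition names one explicitly.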