LLaMA-Factory/data at cdd887908c62f0811b6cc0275f557d371c5c9245 - LLaMA-Factory - Gitea

423A35C7/LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2026-06-18 05:08:54 +08:00

mrhan1993 cdd887908c Add Chinese README based on GLM Efficient Tuning; add server_port to web

Former-commit-id: 9f0b57b370

2023-07-21 16:57:58 +08:00

belle_multiturn

add belle multiturn dataset

2023-06-16 20:01:16 +08:00

example_dataset

Initial commit

2023-05-28 18:09:04 +08:00

Initial commit

2023-05-28 18:09:04 +08:00

Initial commit

2023-05-28 18:09:04 +08:00

alpaca_data_en_52k.json.REMOVED.git-id

Initial commit

2023-05-28 18:09:04 +08:00

alpaca_data_zh_51k.json.REMOVED.git-id

Initial commit

2023-05-28 18:09:04 +08:00

alpaca_gpt4_data_en.json.REMOVED.git-id

Initial commit

2023-05-28 18:09:04 +08:00

alpaca_gpt4_data_zh.json.REMOVED.git-id

Initial commit

2023-05-28 18:09:04 +08:00

comparison_gpt4_data_en.json.REMOVED.git-id

support RM metrics, add generating Args

2023-06-12 15:48:48 +08:00

comparison_gpt4_data_zh.json.REMOVED.git-id

support RM metrics, add generating Args

2023-06-12 15:48:48 +08:00

dataset_info.json

add datasets

2023-07-19 20:59:15 +08:00

oaast_rm_zh.json

add open assistant dataset

2023-06-28 23:09:33 +08:00

oaast_rm.json.REMOVED.git-id

add open assistant dataset

2023-06-28 23:09:33 +08:00

oaast_sft_zh.json

add open assistant dataset

2023-06-28 23:09:33 +08:00

oaast_sft.json.REMOVED.git-id

add open assistant dataset

2023-06-28 23:09:33 +08:00

README_zh.md

Add Chinese README based on GLM Efficient Tuning; add server_port to web

2023-07-21 16:57:58 +08:00

README.md

add datasets

2023-07-19 20:59:15 +08:00

refgpt_zh_50k_p1.json.REMOVED.git-id

add datasets

2023-07-19 20:59:15 +08:00

refgpt_zh_50k_p2.json.REMOVED.git-id

add datasets

2023-07-19 20:59:15 +08:00

self_cognition.json

add datasets

2023-07-19 20:59:15 +08:00

sharegpt_zh_27k.json.REMOVED.git-id

add datasets

2023-07-19 20:59:15 +08:00

wiki_demo.txt

add pre-training script

2023-05-29 21:37:22 +08:00

README.md

If you are using a custom dataset, add its definition to dataset_info.json in the following format:

"dataset_name": {
    "hf_hub_url": "the name of the dataset repository on the Hugging Face Hub. (if specified, the following 3 arguments are ignored)",
    "script_url": "the name of the directory containing a dataset loading script. (if specified, the following 2 arguments are ignored)",
    "file_name": "the name of the dataset file in this directory. (required if neither of the above is specified)",
    "file_sha1": "the SHA-1 hash value of the dataset file. (optional)",
    "columns": {
        "prompt": "the name of the column in the dataset containing the prompts. (default: instruction)",
        "query": "the name of the column in the dataset containing the queries. (default: input)",
        "response": "the name of the column in the dataset containing the responses. (default: output)",
        "history": "the name of the column in the dataset containing the chat history. (default: None)"
    }
}
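As a rough illustration of the local-file case, the Python sketch below adds an entry to dataset_info.json and fills in the optional file_sha1 field using the standard hashlib module. The helpers register_dataset and sha1_of_file are hypothetical and not part of LLaMA-Factory; the sketch assumes dataset_info.json is a single JSON object keyed by dataset name, as the template above suggests, and that the data file sits in the same directory.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional


def sha1_of_file(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_dataset(
    info_path: Path,
    name: str,
    data_file: Path,
    columns: Optional[Dict[str, str]] = None,
) -> dict:
    """Add or overwrite the entry `name` in dataset_info.json and return it.

    Hypothetical helper: assumes dataset_info.json is one JSON object keyed by
    dataset name and that data_file lives in the same directory.
    """
    entry = {
        "file_name": data_file.name,           # bare file name, relative to the data directory
        "file_sha1": sha1_of_file(data_file),  # optional SHA-1 hash of the dataset file
    }
    if columns:  # omit "columns" to keep the defaults (instruction / input / output)
        entry["columns"] = columns

    # Load the existing definitions (if any), add this entry, and write everything back.
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = entry
    info_path.write_text(json.dumps(info, indent=2, ensure_ascii=False), encoding="utf-8")
    return entry
```

For example, calling register_dataset(Path("data/dataset_info.json"), "my_dataset", Path("data/my_dataset.json")) would add a hypothetical my_dataset entry without a columns block, so the default column names apply. Note that the helper re-serializes the whole file with indent=2, so any custom formatting in dataset_info.json is not preserved.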

Here, the prompt and response columns should contain non-empty values. The query column is concatenated with the prompt column to form the model input. The history column should contain a list in which each element is a pair of strings (a tuple) representing one query-response pair.
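The column rules above can be sketched in Python as follows. This is a hypothetical illustration, not LLaMA-Factory's actual preprocessing code: the function names are made up, and the newline separator used when joining prompt and query is an assumption, since the README only states that the two are concatenated.

```python
from typing import List, Optional, Sequence, Tuple


def validate_record(
    prompt: str,
    response: str,
    history: Optional[Sequence[Sequence[str]]] = None,
) -> List[Tuple[str, str]]:
    """Check one record against the column rules; return history as (query, response) tuples."""
    if not prompt:
        raise ValueError("prompt must be non-empty")
    if not response:
        raise ValueError("response must be non-empty")
    turns: List[Tuple[str, str]] = []
    for turn in history or []:
        # Each history element must be exactly two strings: a query and its response.
        if isinstance(turn, str) or len(turn) != 2 or not all(isinstance(s, str) for s in turn):
            raise ValueError(f"history entries must be (query, response) string pairs, got {turn!r}")
        turns.append((turn[0], turn[1]))
    return turns


def build_input(prompt: str, query: Optional[str] = None, sep: str = "\n") -> str:
    """Concatenate prompt and (optional) query into the model input.

    The newline separator is an assumption for illustration only.
    """
    return (prompt + sep + query) if query else prompt
```

An empty or missing query simply leaves the prompt unchanged. How the history turns are rendered into the final prompt depends on the chat template in use, which this sketch does not model.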