From 64bf750a74d65bf4e7a0f6c6478b7807db7c6486 Mon Sep 17 00:00:00 2001
From: hiyouga
Date: Fri, 13 Oct 2023 13:53:43 +0800
Subject: [PATCH] update readme

Former-commit-id: cb426766944487c72c10ca6e59ffb9888ca8b1e2
---
 README.md    | 18 ++----------------
 README_zh.md | 20 +++-----------------
 setup.py     |  4 ++--
 3 files changed, 7 insertions(+), 35 deletions(-)

diff --git a/README.md b/README.md
index 0315a3a9..9acf6e61 100644
--- a/README.md
+++ b/README.md
@@ -20,8 +20,6 @@

 [23/09/10] We supported using **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)** for the LLaMA models. Try `--flash_attn` argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.

-[23/08/18] We supported **resuming training**, upgrade `transformers` to `4.31.0` to enjoy this feature.
-
 [23/08/12] We supported **RoPE scaling** to extend the context length of the LLaMA models. Try `--rope_scaling linear` argument in training and `--rope_scaling dynamic` argument at inference to extrapolate the position embeddings.

 [23/08/11] We supported **[DPO training](https://arxiv.org/abs/2305.18290)** for instruction-tuned models. See [this example](#dpo-training) to train your models.
@@ -60,7 +58,7 @@
 > [!NOTE]
 > **Default module** is used for the `--lora_target` argument, you can use `--lora_target all` to specify all the available modules.
 >
-> For the "base" models, the `--template` argument can be chosen from `default`, `alpaca`, `vicuna` etc. But make sure to use the corresponding template for the "chat" models.
+> For the "base" models, the `--template` argument can be chosen from `default`, `alpaca`, `vicuna` etc. But make sure to use the **corresponding template** for the "chat" models.

 ## Supported Training Approaches

@@ -449,19 +447,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \

 This repository is licensed under the [Apache-2.0 License](LICENSE).

-Please follow the model licenses to use the corresponding model weights:
-
-- [LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)
-- [LLaMA-2](https://ai.meta.com/llama/license/)
-- [BLOOM](https://huggingface.co/spaces/bigscience/license)
-- [Falcon](LICENSE)
-- [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B/resolve/main/baichuan-7B%20%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf)
-- [Baichuan2](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf)
-- [InternLM](https://github.com/InternLM/InternLM#open-source-license)
-- [Qwen](https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/LICENSE)
-- [XVERSE](https://github.com/xverse-ai/XVERSE-13B/blob/main/MODEL_LICENSE.pdf)
-- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B/blob/main/MODEL_LICENSE)
-- [Phi-1.5](https://huggingface.co/microsoft/phi-1_5/resolve/main/Research%20License.docx)
+Please follow the model licenses to use the corresponding model weights: [LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) / [LLaMA-2](https://ai.meta.com/llama/license/) / [BLOOM](https://huggingface.co/spaces/bigscience/license) / [Falcon](LICENSE) / [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B/resolve/main/baichuan-7B%20%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf) / [Baichuan2](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf) / [InternLM](https://github.com/InternLM/InternLM#open-source-license) / [Qwen](https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/LICENSE) / [XVERSE](https://github.com/xverse-ai/XVERSE-13B/blob/main/MODEL_LICENSE.pdf) / [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B/blob/main/MODEL_LICENSE) / [Phi-1.5](https://huggingface.co/microsoft/phi-1_5/resolve/main/Research%20License.docx)

 ## Citation

diff --git a/README_zh.md b/README_zh.md
index 0e8e7059..5902ccc3 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -18,9 +18,7 @@

 [23/09/23] 我们在项目中集成了 MMLU、C-Eval 和 CMMLU 评估集。使用方法请参阅[此示例](#模型评估)。

-[23/09/10] 我们针对 LLaMA 模型支持了 **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)**。如果您使用的是 RTX4090、A100 或 H100 GPU,请使用 `--flash_attn` 参数以启用 FlashAttention-2(实验性功能)。
-
-[23/08/18] 我们支持了**训练状态恢复**,请将 `transformers` 升级至 `4.31.0` 以启用此功能。
+[23/09/10] 我们针对 LLaMA 模型支持了 **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)**。如果您使用的是 RTX4090、A100 或 H100 GPU,请使用 `--flash_attn` 参数以启用 FlashAttention-2。

 [23/08/12] 我们支持了 **RoPE 插值**来扩展 LLaMA 模型的上下文长度。请使用 `--rope_scaling linear` 参数训练模型或使用 `--rope_scaling dynamic` 参数评估模型。

@@ -60,7 +58,7 @@
 > [!NOTE]
 > **默认模块**应作为 `--lora_target` 参数的默认值,可使用 `--lora_target all` 参数指定全部模块。
 >
-> 对于所有“基座”(Base)模型,`--template` 参数可以是 `default`, `alpaca`, `vicuna` 等任意值。但“对话”(Chat)模型请务必使用对应的模板。
+> 对于所有“基座”(Base)模型,`--template` 参数可以是 `default`, `alpaca`, `vicuna` 等任意值。但“对话”(Chat)模型请务必使用**对应的模板**。

 ## 训练方法

@@ -448,19 +446,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \

 本仓库的代码依照 [Apache-2.0](LICENSE) 协议开源。

-使用模型权重时,请遵循对应的模型协议:
-
-- [LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)
-- [LLaMA-2](https://ai.meta.com/llama/license/)
-- [BLOOM](https://huggingface.co/spaces/bigscience/license)
-- [Falcon](LICENSE)
-- [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B/resolve/main/baichuan-7B%20%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf)
-- [Baichuan2](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf)
-- [InternLM](https://github.com/InternLM/InternLM#open-source-license)
-- [Qwen](https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/LICENSE)
-- [XVERSE](https://github.com/xverse-ai/XVERSE-13B/blob/main/MODEL_LICENSE.pdf)
-- [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B/blob/main/MODEL_LICENSE)
-- [Phi-1.5](https://huggingface.co/microsoft/phi-1_5/resolve/main/Research%20License.docx)
+使用模型权重时,请遵循对应的模型协议:[LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) / [LLaMA-2](https://ai.meta.com/llama/license/) / [BLOOM](https://huggingface.co/spaces/bigscience/license) / [Falcon](LICENSE) / [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B/resolve/main/baichuan-7B%20%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf) / [Baichuan2](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf) / [InternLM](https://github.com/InternLM/InternLM#open-source-license) / [Qwen](https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/LICENSE) / [XVERSE](https://github.com/xverse-ai/XVERSE-13B/blob/main/MODEL_LICENSE.pdf) / [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B/blob/main/MODEL_LICENSE) / [Phi-1.5](https://huggingface.co/microsoft/phi-1_5/resolve/main/Research%20License.docx)

 ## 引用

diff --git a/setup.py b/setup.py
index 930dabb2..7638eaab 100644
--- a/setup.py
+++ b/setup.py
@@ -25,12 +25,12 @@ def main():
         version=get_version(),
         author="hiyouga",
         author_email="hiyouga" "@" "buaa.edu.cn",
-        description="Easy-to-use fine-tuning framework using PEFT",
+        description="Easy-to-use LLM fine-tuning framework",
         long_description=open("README.md", "r", encoding="utf-8").read(),
         long_description_content_type="text/markdown",
         keywords=["LLaMA", "BLOOM", "Falcon", "LLM", "ChatGPT", "transformer", "pytorch", "deep learning"],
         license="Apache 2.0 License",
-        url="https://github.com/hiyouga/LLaMA-Efficient-Tuning",
+        url="https://github.com/hiyouga/LLaMA-Factory",
         package_dir={"": "src"},
         packages=find_packages("src"),
         python_requires=">=3.8.0",