diff --git a/README.md b/README.md
index be019fa9..0fdea14c 100644
--- a/README.md
+++ b/README.md
@@ -415,7 +415,7 @@ To enable FlashAttention-2 on the Windows platform, you need to install the prec
 For Ascend NPU users
 
-To install LLaMA Factory on Ascend NPU devices, please specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
 
 ```bash
 # replace the url according to your CANN version and devices
@@ -444,6 +444,33 @@ If you cannot infer model on NPU devices, try setting `do_sample: false` in the
 Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
 
+To use nf4 QLoRA quantization based on bitsandbytes on Ascend NPU devices, please follow these three steps:
+
+1. Manually compile bitsandbytes: refer to the [installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to compile and install it. Compilation requires CMake 3.22.1 or later and g++ 12.x or later.
+```bash
+# Install bitsandbytes from source
+# Clone the bitsandbytes repo; the Ascend NPU backend currently lives on the multi-backend-refactor branch
+git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
+cd bitsandbytes/
+
+# Install dependencies
+pip install -r requirements-dev.txt
+
+# Install the build toolchain. The exact command varies by operating system; the following is for Debian/Ubuntu-like systems
+apt-get install -y build-essential cmake
+
+# Compile & install
+cmake -DCOMPUTE_BACKEND=npu -S .
+make
+pip install -e .
+```
+2. Install the main branch of transformers from source.
+```bash
+git clone https://github.com/huggingface/transformers.git
+cd transformers
+pip install .
+```
+3. Set the `double_quantization` parameter to `false` in the training configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml) for guidance.
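The snippet below is a minimal sanity check of the resulting setup, written directly against the transformers API rather than through LLaMA Factory: it loads the model from the example config with an nf4 quantization config whose `bnb_4bit_use_double_quant=False` corresponds to the `double_quantization: false` setting. The `npu:0` device map is an assumption for a single-card machine; adjust as needed.

```python
import torch
import torch_npu  # noqa: F401  # registers the NPU device with PyTorch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# nf4 4-bit quantization with double quantization disabled, mirroring
# `quantization_bit: 4` and `double_quantization: false` in the example config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map={"": "npu:0"},  # assumption: a single Ascend NPU card
)
print(next(model.parameters()).device)  # expect an npu device if the stack is working
```

If this loads successfully, the QLoRA run itself can be launched with `llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq_npu.yaml`.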
 ### Data Preparation
diff --git a/README_zh.md b/README_zh.md
index 50ec1acc..72e090f6 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -416,7 +416,7 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl
 For Ascend NPU users
 
-To install LLaMA Factory on Ascend NPU devices, you need to specify extra dependencies and install with `pip install -e ".[torch-npu,metrics]"`. In addition, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**; please refer to the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or use the following commands:
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to 3.10 or higher, then specify extra dependencies and install with `pip install -e ".[torch-npu,metrics]"`. In addition, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**; please refer to the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or use the following commands:
 
 ```bash
 # replace the URL with the one matching your CANN version and device model
@@ -445,6 +445,33 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
 Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
 
+To use nf4 QLoRA quantization based on bitsandbytes on Ascend NPU devices, please follow these three steps:
+1. Manually compile bitsandbytes: refer to the [installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to compile and install it. Compilation requires CMake 3.22.1 or later and g++ 12.x or later.
+```bash
+# Install bitsandbytes from source
+# Clone the bitsandbytes repo; the Ascend NPU backend currently lives on the multi-backend-refactor branch
+git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
+cd bitsandbytes/
+
+# Install dependencies
+pip install -r requirements-dev.txt
+
+# Install the build toolchain. The exact command varies by operating system; the following is for reference
+apt-get install -y build-essential cmake
+
+# Compile & install
+cmake -DCOMPUTE_BACKEND=npu -S .
+make
+pip install -e .
+```
+2. Install the main branch of transformers from source.
+```bash
+git clone https://github.com/huggingface/transformers.git
+cd transformers
+pip install .
+```
+3. Set the `double_quantization` parameter to `false` in the training arguments; see the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml) for reference.
+
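For a quick check that the pieces from steps 1 and 2 are picked up, an import check like the following can be run. It is only a sketch; it assumes `torch-npu` is installed and that the `torch.npu` namespace becomes available after importing it.

```python
import bitsandbytes as bnb
import torch
import torch_npu  # noqa: F401  # makes the torch.npu namespace available
import transformers

print("transformers :", transformers.__version__)  # a dev version indicates the main-branch install
print("bitsandbytes :", bnb.__version__)           # the source-built multi-backend version
print("NPU available:", torch.npu.is_available())  # should be True once CANN and torch-npu are set up
```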
 ### Data Preparation
diff --git a/examples/train_qlora/llama3_lora_sft_otfq_npu.yaml b/examples/train_qlora/llama3_lora_sft_otfq_npu.yaml
new file mode 100644
index 00000000..983acd39
--- /dev/null
+++ b/examples/train_qlora/llama3_lora_sft_otfq_npu.yaml
@@ -0,0 +1,43 @@
+### model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+quantization_bit: 4
+quantization_method: bitsandbytes  # choices: [bitsandbytes (4/8), hqq (2/3/4/5/6/8), eetq (8)]
+double_quantization: false
+trust_remote_code: true
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: all
+
+### dataset
+dataset: identity,alpaca_en_demo
+template: llama3
+cutoff_len: 2048
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 1.0e-4
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
diff --git a/src/llamafactory/model/model_utils/longlora.py b/src/llamafactory/model/model_utils/longlora.py
index 89457846..53043c2b 100644
--- a/src/llamafactory/model/model_utils/longlora.py
+++ b/src/llamafactory/model/model_utils/longlora.py
@@ -23,14 +23,7 @@ from typing import TYPE_CHECKING, Optional, Tuple
 import torch
 import torch.nn as nn
 import transformers
-from transformers.models.llama.modeling_llama import (
-    Cache,
-    LlamaAttention,
-    LlamaFlashAttention2,
-    LlamaSdpaAttention,
-    apply_rotary_pos_emb,
-    repeat_kv,
-)
+from transformers.models.llama.modeling_llama import Cache, apply_rotary_pos_emb, repeat_kv
 
 from ...extras import logging
 from ...extras.constants import SUPPORTED_CLASS_FOR_S2ATTN
@@ -38,6 +31,10 @@ from ...extras.misc import check_version
 from ...extras.packages import is_transformers_version_greater_than
 
 
+if not is_transformers_version_greater_than("4.48.0"):
+    from transformers.models.llama.modeling_llama import LlamaAttention, LlamaFlashAttention2, LlamaSdpaAttention
+
+
 if TYPE_CHECKING:
     from transformers import PretrainedConfig
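The longlora.py change guards the attention-class imports because transformers 4.48.0 removed `LlamaFlashAttention2` and `LlamaSdpaAttention` in its attention refactor, while `LlamaAttention` remains available. Below is a standalone sketch of the same pattern using `packaging` directly instead of the project's `is_transformers_version_greater_than` helper (assumed here to compare against the installed transformers version):

```python
from packaging import version

import transformers

# Import the per-backend attention classes only where they still exist.
if version.parse(transformers.__version__) < version.parse("4.48.0"):
    # pre-4.48.0: one class per attention implementation
    from transformers.models.llama.modeling_llama import (
        LlamaAttention,
        LlamaFlashAttention2,
        LlamaSdpaAttention,
    )

    attention_classes = (LlamaAttention, LlamaFlashAttention2, LlamaSdpaAttention)
else:
    # 4.48.0 and later: a single LlamaAttention dispatches to the configured backend
    from transformers.models.llama.modeling_llama import LlamaAttention

    attention_classes = (LlamaAttention,)

print(transformers.__version__, [cls.__name__ for cls in attention_classes])
```

Keeping the import behind the version check lets the LongLoRA patching code run against both old and new transformers releases without import errors.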