add nf4 qlora support on Ascend NPU (#6601)

* add nf4 qlora support on Ascend NPU

* add transformers version check

* add python>=3.10 requirement description for npu

* tiny fix

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
Author: codingma
Date: 2025-01-13 19:43:36 +08:00
Committed by: GitHub
Commit: 03de5ac912 (parent: 3077f20339)
4 changed files with 104 additions and 10 deletions

@@ -415,7 +415,7 @@ To enable FlashAttention-2 on the Windows platform, you need to install the prec
<details><summary>For Ascend NPU users</summary>
To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
```bash
# replace the url according to your CANN version and devices
@@ -444,6 +444,33 @@ If you cannot infer model on NPU devices, try setting `do_sample: false` in the
Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
To use nf4 QLoRA quantization based on bitsandbytes on Ascend NPU devices, follow these three steps:
1. Manually compile bitsandbytes: Refer to [the installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to compile and install it. The compilation requires cmake 3.22.1 or higher and g++ 12.x or higher.
```bash
# Install bitsandbytes from source
# Clone the bitsandbytes repo; the Ascend NPU backend is currently enabled on the multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes/
# Install dependencies
pip install -r requirements-dev.txt
# Install the compilation toolchain. The commands vary by operating system; the following is provided for reference (Debian/Ubuntu)
apt-get install -y build-essential cmake
# Compile & install
cmake -DCOMPUTE_BACKEND=npu -S .
make
pip install -e .
```
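After the build, a quick sanity check can confirm that bitsandbytes imports cleanly and that PyTorch can see the NPU (assuming `torch` and `torch-npu` are already installed per the instructions above):
```bash
# Verify the build: bitsandbytes should import, and the NPU should be visible to PyTorch
python -c "import bitsandbytes as bnb; print(bnb.__version__)"
python -c "import torch, torch_npu; print(torch.npu.is_available())"
```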
2. Install and use the `main` branch of transformers.
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .
```
3. Set the `double_quantization` parameter to `false` in the training configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml) for guidance, and to the sketch below.
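For reference, a minimal sketch of the relevant keys in such a configuration (key names follow LLaMA-Factory's quantization arguments; the model path and the method section are illustrative placeholders, so see the linked example for a complete file):
```yaml
### model (illustrative path)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
quantization_bit: 4                # nf4 QLoRA via bitsandbytes
quantization_method: bitsandbytes
double_quantization: false         # required on Ascend NPU

### method (illustrative)
stage: sft
finetuning_type: lora
lora_target: all
```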
</details>
### Data Preparation