add nf4 qlora support on Ascend NPU (#6601)
* add nf4 qlora support on Ascend NPU
* add transformers version check
* add python>=3.10 requirement description for npu
* tiny fix

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
README.md
@@ -415,7 +415,7 @@ To enable FlashAttention-2 on the Windows platform, you need to install the prec
<details><summary>For Ascend NPU users</summary>
- To install LLaMA Factory on Ascend NPU devices, please specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
+ To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
```bash
# replace the url according to your CANN version and devices
@@ -444,6 +444,33 @@ If you cannot infer model on NPU devices, try setting `do_sample: false` in the
Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
To use NF4 QLoRA quantization based on bitsandbytes on Ascend NPU, follow these three steps:
1. Manually compile bitsandbytes: refer to [the installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to build and install it from source. Compilation requires CMake >= 3.22.1 and g++ >= 12.x.
```bash
# Install bitsandbytes from source
# Clone the bitsandbytes repo; the Ascend NPU backend is currently enabled on the multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes/
# Install dependencies
pip install -r requirements-dev.txt
# Install the build tools needed for compilation. The commands for this step vary by operating system; the following is provided for reference
apt-get install -y build-essential cmake
# Compile & install
cmake -DCOMPUTE_BACKEND=npu -S .
make
pip install -e .
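# Optional sanity check (not part of the original steps): verify that the freshly built package imports
python -c "import bitsandbytes; print(bitsandbytes.__version__)"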
```
2. Install and use the main branch version of transformers.
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .
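# Optional check (not part of the original steps): a main-branch install reports a development version, e.g. ending in ".dev0"
python -c "import transformers; print(transformers.__version__)"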
```
3. Set the `double_quantization` parameter to `false` in the training configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml) for guidance, or to the sketch below.
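The quantization-related part of such a configuration might look like this minimal sketch; the key names follow step 3 and the referenced example, while the model path and the `quantization_method` value are illustrative assumptions rather than required settings:

```yaml
### model (placeholder; use the model you actually train)
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### quantization (sketch for NF4 QLoRA on Ascend NPU)
quantization_bit: 4                # load the base model in 4-bit NF4
quantization_method: bitsandbytes  # assumption: on-the-fly quantization via bitsandbytes
double_quantization: false         # required on Ascend NPU, as described in step 3
```

Training can then be launched as usual, e.g. with `llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq_npu.yaml`.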
</details>
### Data Preparation