	add nf4 qlora support on Ascend NPU (#6601)
* add nf4 qlora support on Ascend NPU
* add transformers version check
* add python>=3.10 requirement description for npu
* tiny fix

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 7912d1acac5f10dab22145fe729a90c57aad8d85
This commit is contained in: parent 73c1c15b62, commit 11c38b9173

README.md: 29 lines changed

@@ -415,7 +415,7 @@ To enable FlashAttention-2 on the Windows platform, you need to install the prec

<details><summary>For Ascend NPU users</summary>

-To install LLaMA Factory on Ascend NPU devices, please specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:

```bash
# replace the url according to your CANN version and devices

@@ -444,6 +444,33 @@ If you cannot infer model on NPU devices, try setting `do_sample: false` in the

Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)
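
The exact container launch command is not part of this commit; as a rough sketch, an Ascend image of this kind is typically started with the host's NPU device files and driver directories mounted into the container, along the lines of the following (the image tag and paths are placeholders to adjust for your setup):

```bash
# rough sketch only: replace <image> with the tag of the downloaded image and
# adjust the device/driver paths to match your host's Ascend driver installation
docker run -it --rm \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  <image> /bin/bash
```

Inside the container, `npu-smi info` should list the mounted devices.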

To use nf4 QLoRA quantization based on bitsandbytes on Ascend NPU, please follow these three steps:

1. Manually compile bitsandbytes (bnb): refer to the [installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to complete its compilation and installation. The compilation requires CMake 3.22.1 or later and g++ 12.x or later.
```bash
# Install bitsandbytes from source
# Clone the bitsandbytes repo; the Ascend NPU backend is currently enabled on the multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes/

# Install dependencies
pip install -r requirements-dev.txt

# Install the dependencies for the compilation tools. Note that the commands for this step may vary depending on the operating system. The following are provided for reference
apt-get install -y build-essential cmake

# Compile & install
cmake -DCOMPUTE_BACKEND=npu -S .
make
pip install -e .
```
2. Install and use the main branch version of transformers.
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .
```
3. Set the `double_quantization` parameter to `false` in the training configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml) for guidance; a short verification and launch sketch follows below.
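
The commands below are an optional sanity check and launch sketch rather than part of the documented workflow; they assume the standard `llamafactory-cli` entry point and use `ASCEND_RT_VISIBLE_DEVICES` to pick the NPU, so adjust the device index and config path to your environment.

```bash
# verify the build toolchain required to compile bitsandbytes (cmake >= 3.22.1, g++ >= 12)
cmake --version
g++ --version

# confirm that bitsandbytes and the freshly installed transformers import cleanly
python -c "import bitsandbytes, transformers; print(bitsandbytes.__version__, transformers.__version__)"

# launch QLoRA fine-tuning with the NPU example config (which sets double_quantization: false)
ASCEND_RT_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq_npu.yaml
```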

</details>

### Data Preparation

README_zh.md: 29 lines changed

@@ -416,7 +416,7 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl

<details><summary>For Ascend NPU users</summary>

-To install LLaMA Factory on Ascend NPU devices, please specify extra dependencies and install with `pip install -e ".[torch-npu,metrics]"`. In addition, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**; for the installation procedure, please refer to the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or use the following commands:
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher, then specify extra dependencies and install with `pip install -e ".[torch-npu,metrics]"`. In addition, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**; for the installation procedure, please refer to the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or use the following commands:

```bash
# replace the URL with the one matching your CANN version and device model

@@ -445,6 +445,33 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh

Download the pre-built Docker images: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)

To use nf4 QLoRA quantization based on bitsandbytes on Ascend NPU, please follow these three steps:

1. Manually compile bitsandbytes (bnb): refer to the [installation documentation](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to complete its compilation and installation. The compilation requires CMake 3.22.1 or later and g++ 12.x or later.
```bash
# Install bitsandbytes from source
# Clone the bitsandbytes repo; the Ascend NPU backend is currently supported on the multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd bitsandbytes/

# Install dependencies
pip install -r requirements-dev.txt

# Install the dependencies for the compilation tools; the commands differ between operating systems and are provided for reference
apt-get install -y build-essential cmake

# Compile & install
cmake -DCOMPUTE_BACKEND=npu -S .
make
pip install -e .
```
2. Install and use the main branch version of transformers.
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install .
```
3. Set the `double_quantization` parameter in the training arguments to `false`. You can refer to the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml).

</details>

### Data Preparation

examples/train_qlora/llama3_lora_sft_otfq_npu.yaml: 43 lines (new file)

@@ -0,0 +1,43 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
quantization_bit: 4
quantization_method: bitsandbytes  # choices: [bitsandbytes (4/8), hqq (2/3/4/5/6/8), eetq (8)]
double_quantization: false
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
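
To adapt this example configuration to your own data, it is usually enough to copy the file and edit a few fields. The commands below are only an illustrative sketch: the dataset name and output path are placeholders rather than part of this commit, and a custom dataset must first be registered in `data/dataset_info.json`.

```bash
# copy the NPU example and point it at your own dataset and output directory (placeholder values)
cp examples/train_qlora/llama3_lora_sft_otfq_npu.yaml my_qlora_npu.yaml
sed -i 's#^dataset: .*#dataset: my_dataset#' my_qlora_npu.yaml
sed -i 's#^output_dir: .*#output_dir: saves/my-model/qlora/sft#' my_qlora_npu.yaml
```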

@@ -23,14 +23,7 @@ from typing import TYPE_CHECKING, Optional, Tuple
 import torch
 import torch.nn as nn
 import transformers
-from transformers.models.llama.modeling_llama import (
-    Cache,
-    LlamaAttention,
-    LlamaFlashAttention2,
-    LlamaSdpaAttention,
-    apply_rotary_pos_emb,
-    repeat_kv,
-)
+from transformers.models.llama.modeling_llama import Cache, apply_rotary_pos_emb, repeat_kv

 from ...extras import logging
 from ...extras.constants import SUPPORTED_CLASS_FOR_S2ATTN
@@ -38,6 +31,10 @@ from ...extras.misc import check_version
 from ...extras.packages import is_transformers_version_greater_than


+if not is_transformers_version_greater_than("4.48.0"):
+    from transformers.models.llama.modeling_llama import LlamaAttention, LlamaFlashAttention2, LlamaSdpaAttention
+
+
 if TYPE_CHECKING:
     from transformers import PretrainedConfig