From eb2aa2c073985b5436e817739873ad48b90ee3d5 Mon Sep 17 00:00:00 2001
From: codemayq
Date: Tue, 20 Feb 2024 11:26:22 +0800
Subject: [PATCH 1/4] 1. update the version of pre-built bitsandbytes library
 2. add pre-built flash-attn library

Former-commit-id: 9b40eddf7aeb6b3bcf58374d43cbe44eb24f3849
---
 README.md    | 2 ++
 README_zh.md | 6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 32150a7a..c7b55d8b 100644
--- a/README.md
+++ b/README.md
@@ -267,6 +267,8 @@ If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you wi
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
 ```
 
+To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
+
 ### Use ModelScope Hub (optional)
 
 If you have trouble with downloading models and datasets from Hugging Face, you can use LLaMA-Factory together with ModelScope in the following manner.
diff --git a/README_zh.md b/README_zh.md
index f99f91bc..6067ebfa 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -261,12 +261,14 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```
 
-如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.1.
+如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.2.
 
 ```bash
-pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
+如果要在 Windows 平台上开启Flash Attention, 需要安装预编译的 `flash-attn` 库,支持CUDA 12.1 到12.2, 请根据需求到 [flash-attention](https://github.com/bdashore3/flash-attention/releases) 下载对应版本安装
+
 ### 使用魔搭社区(可跳过)
 
 如果您在 Hugging Face 模型和数据集的下载中遇到了问题,可以通过下述方法使用魔搭社区。

From e52e0d9b075d747276e6b3524c3c0a722e1d27b4 Mon Sep 17 00:00:00 2001
From: codemayq
Date: Tue, 20 Feb 2024 11:28:25 +0800
Subject: [PATCH 2/4] 1. update the version of pre-built bitsandbytes library
 2. add pre-built flash-attn library

Former-commit-id: 2b76a300995a74398ee11d9274e5c0eb6ef53403
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c7b55d8b..3aef67c8 100644
--- a/README.md
+++ b/README.md
@@ -261,10 +261,10 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```
 
-If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of `bitsandbytes` library, which supports CUDA 11.1 to 12.1.
+If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of `bitsandbytes` library, which supports CUDA 11.1 to 12.2.
 
 ```bash
-pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
 To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.

From 0158812afb6eb9f290736623c79eb47663c845c4 Mon Sep 17 00:00:00 2001
From: hoshi-hiyouga
Date: Tue, 20 Feb 2024 16:06:59 +0800
Subject: [PATCH 3/4] Update README_zh.md

Former-commit-id: 4c3310651b67bbea8c893d503de2b5736184daaf
---
 README_zh.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README_zh.md b/README_zh.md
index 6067ebfa..e00ff5a3 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -261,13 +261,13 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```
 
-如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.2.
+如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.2。
 
 ```bash
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
-如果要在 Windows 平台上开启Flash Attention, 需要安装预编译的 `flash-attn` 库,支持CUDA 12.1 到12.2, 请根据需求到 [flash-attention](https://github.com/bdashore3/flash-attention/releases) 下载对应版本安装
+如果要在 Windows 平台上开启 FlashAttention-2,需要安装预编译的 `flash-attn` 库,支持 CUDA 12.1 到 12.2,请根据需求到 [flash-attention](https://github.com/bdashore3/flash-attention/releases) 下载对应版本安装。
 
 ### 使用魔搭社区(可跳过)
 

From 688adad66505c87a7a50dd6e054da56e6a7204fb Mon Sep 17 00:00:00 2001
From: hoshi-hiyouga
Date: Tue, 20 Feb 2024 16:07:55 +0800
Subject: [PATCH 4/4] Update README.md

Former-commit-id: 8a7a02fcba077778a84164a16ff2cf33ec813dc4
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3aef67c8..e98b18b0 100644
--- a/README.md
+++ b/README.md
@@ -267,7 +267,7 @@ If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you wi
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
-To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
+To enable FlashAttention-2 on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
 
 ### Use ModelScope Hub (optional)
 
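The README text in this series ties the pre-built `bitsandbytes` 0.40.0 wheel to CUDA 11.1 to 12.2 and the pre-built `flash-attn` wheels to CUDA 12.1 to 12.2. The script below is a minimal sketch for checking an environment against those ranges before installing. It is not part of LLaMA-Factory. The helper names (`SUPPORTED_CUDA`, `parse_cuda`, `is_supported`, `detect_torch_cuda`) are hypothetical. It also assumes that the CUDA version PyTorch was built with (`torch.version.cuda`) is the one the wheels must match, and that only major.minor matters.

```python
"""Hypothetical pre-install check against the CUDA ranges quoted in the README."""
from typing import Optional, Tuple

# Inclusive (major, minor) ranges copied from the README text in this patch series.
SUPPORTED_CUDA = {
    "bitsandbytes": ((11, 1), (12, 2)),
    "flash-attn": ((12, 1), (12, 2)),
}


def parse_cuda(version: str) -> Tuple[int, int]:
    """Parse a version string such as '12.1' or '11.8.0' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(package: str, cuda_version: str) -> bool:
    """Return True if cuda_version lies inside the README's stated range for package."""
    low, high = SUPPORTED_CUDA[package]
    # Tuples compare element-wise, so (12, 1) <= (12, 2) <= (12, 2) holds.
    return low <= parse_cuda(cuda_version) <= high


def detect_torch_cuda() -> Optional[str]:
    """Return torch.version.cuda, or None if torch is missing or is a CPU-only build."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    cuda = detect_torch_cuda()
    if cuda is None:
        print("No CUDA-enabled PyTorch build detected.")
    else:
        for pkg in SUPPORTED_CUDA:
            status = "OK" if is_supported(pkg, cuda) else "outside the stated range"
            print(f"{pkg}: CUDA {cuda} -> {status}")
```

For example, a PyTorch build reporting CUDA 11.8 passes the `bitsandbytes` check but falls outside the stated `flash-attn` range, which is exactly the case where only the QLoRA instructions apply.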