From 16599f23766dafba1c82e9f2a2e34f6b8e014766 Mon Sep 17 00:00:00 2001
From: codemayq
Date: Tue, 20 Feb 2024 11:26:22 +0800
Subject: [PATCH 1/4] 1. update the version of pre-built bitsandbytes library
 2. add pre-built flash-attn library

Former-commit-id: 95f53a46bd1d0012d90a1b6198d35807de1dc100
---
 README.md    | 2 ++
 README_zh.md | 6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 32150a7a..c7b55d8b 100644
--- a/README.md
+++ b/README.md
@@ -267,6 +267,8 @@ If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you wi
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
 ```
 
+To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
+
 ### Use ModelScope Hub (optional)
 
 If you have trouble with downloading models and datasets from Hugging Face, you can use LLaMA-Factory together with ModelScope in the following manner.
diff --git a/README_zh.md b/README_zh.md
index f99f91bc..6067ebfa 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -261,12 +261,14 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```
 
-如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.1.
+如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.2.
 
 ```bash
-pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
+如果要在 Windows 平台上开启Flash Attention, 需要安装预编译的 `flash-attn` 库,支持CUDA 12.1 到12.2, 请根据需求到 [flash-attention](https://github.com/bdashore3/flash-attention/releases) 下载对应版本安装
+
 ### 使用魔搭社区(可跳过)
 
 如果您在 Hugging Face 模型和数据集的下载中遇到了问题,可以通过下述方法使用魔搭社区。

From e649ddd99fa8099429de5375338363b170d40f0d Mon Sep 17 00:00:00 2001
From: codemayq
Date: Tue, 20 Feb 2024 11:28:25 +0800
Subject: [PATCH 2/4] 1. update the version of pre-built bitsandbytes library
 2. add pre-built flash-attn library

Former-commit-id: d47e40633a9428175db9319f6778eb7c98df02e0
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c7b55d8b..3aef67c8 100644
--- a/README.md
+++ b/README.md
@@ -261,10 +261,10 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```
 
-If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of `bitsandbytes` library, which supports CUDA 11.1 to 12.1.
+If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of `bitsandbytes` library, which supports CUDA 11.1 to 12.2.
 
 ```bash
-pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
 To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.

From 4a9eee7e25a7967caa4957f701bc27d90c71efa2 Mon Sep 17 00:00:00 2001
From: hoshi-hiyouga
Date: Tue, 20 Feb 2024 16:06:59 +0800
Subject: [PATCH 3/4] Update README_zh.md

Former-commit-id: 175a48d79d869da9d162fa881c3bed0e43235ee2
---
 README_zh.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README_zh.md b/README_zh.md
index 6067ebfa..e00ff5a3 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -261,13 +261,13 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```
 
-如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.2.
+如果要在 Windows 平台上开启量化 LoRA(QLoRA),需要安装预编译的 `bitsandbytes` 库, 支持 CUDA 11.1 到 12.2。
 
 ```bash
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
-如果要在 Windows 平台上开启Flash Attention, 需要安装预编译的 `flash-attn` 库,支持CUDA 12.1 到12.2, 请根据需求到 [flash-attention](https://github.com/bdashore3/flash-attention/releases) 下载对应版本安装
+如果要在 Windows 平台上开启 FlashAttention-2,需要安装预编译的 `flash-attn` 库,支持 CUDA 12.1 到 12.2,请根据需求到 [flash-attention](https://github.com/bdashore3/flash-attention/releases) 下载对应版本安装。
 
 ### 使用魔搭社区(可跳过)
 

From 48dab3ad37eb09aad31776c746a41a469a5e8202 Mon Sep 17 00:00:00 2001
From: hoshi-hiyouga
Date: Tue, 20 Feb 2024 16:07:55 +0800
Subject: [PATCH 4/4] Update README.md

Former-commit-id: 869fd208a81efd8a2e4785549684978fc2e17d64
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3aef67c8..e98b18b0 100644
--- a/README.md
+++ b/README.md
@@ -267,7 +267,7 @@ If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you wi
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```
 
-To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
+To enable FlashAttention-2 on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
 
 ### Use ModelScope Hub (optional)
 
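The patches above revolve around CUDA version ranges: the pre-built `bitsandbytes` 0.40.0 wheel supports CUDA 11.1 to 12.2, while the pre-built `flash-attn` wheels support CUDA 12.1 to 12.2. A minimal sketch of the compatibility check a user would do before picking a wheel — the helper function and constants below are illustrative only, not part of either project's API:

```python
def wheel_supported(cuda_version: str, lo: tuple, hi: tuple) -> bool:
    """Return True if `cuda_version` (e.g. "12.1") falls within the
    inclusive [lo, hi] range advertised for a pre-built wheel."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return lo <= (major, minor) <= hi

# Ranges quoted in the READMEs above (illustrative constants):
BNB_RANGE = ((11, 1), (12, 2))    # pre-built bitsandbytes 0.40.0
FLASH_RANGE = ((12, 1), (12, 2))  # pre-built flash-attn

print(wheel_supported("12.1", *BNB_RANGE))    # True
print(wheel_supported("11.0", *FLASH_RANGE))  # False
```

This also makes visible why the series bumps `bitsandbytes` from 0.39.1 to 0.40.0: a CUDA 12.2 toolchain that the flash-attn wheels target sits outside the 11.1–12.1 range the older wheel supported.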