Mirror of https://github.com/hiyouga/LLaMA-Factory.git

update readme

Former-commit-id: 312d4f90784800dc8db4eaa7d908e6761115bc51
parent: 32dcc5a491
commit: 50224b09cc
README.md (26 changed lines)
@@ -76,10 +76,10 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/

[24/03/07] We supported gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for usage.

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)

<details><summary>Full Changelog</summary>

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)

[24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**). Try `--use_dora` to activate DoRA training.

[24/02/15] We supported **block expansion** proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for usage.
@@ -586,7 +586,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \

> [!TIP]
> Use `--model_name_or_path path_to_export` solely to use the exported model.
>
> Use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
> Use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
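For illustration, a minimal sketch of such an export-plus-quantization run; `--adapter_name_or_path` and `--export_dir` are assumed argument names here, and all paths are placeholders:

```bash
# Hypothetical example: merge LoRA weights and quantize the merged model to
# 4-bit with AutoGPTQ. Paths are placeholders; --adapter_name_or_path and
# --export_dir are assumed argument names of src/export_model.py.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path path_to_base_model \
    --adapter_name_or_path path_to_lora_checkpoint \
    --export_dir path_to_export \
    --export_quantization_bit 4 \
    --export_quantization_dataset data/c4_demo.json
```

The exported model can then be loaded with `--model_name_or_path path_to_export`, as noted in the tip above.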

### Inference with OpenAI-style API
@@ -662,19 +662,23 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \

### Dockerize Training

#### Get ready

A container environment such as Docker or Docker Compose is required.
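As a quick check that the tooling is in place (standard Docker CLI commands, not specific to this repository):

```bash
# Verify the Docker installation; the second command is only needed if you
# plan to use the Docker Compose workflow (Compose v2 plugin).
docker --version
docker compose version
```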

#### Docker support

#### Use Docker

```bash
docker build -f ./Dockerfile -t llama-factory:latest .

docker run --gpus=all -v ./hf_cache:/root/.cache/huggingface/ -v ./data:/app/data -v ./output:/app/output -p 7860:7860 --shm-size 16G --name llama_factory -d llama-factory:latest

docker run --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface/ \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -e CUDA_VISIBLE_DEVICES=0 \
    -p 7860:7860 \
    --shm-size 16G \
    --name llama_factory \
    -d llama-factory:latest
```
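Once the container is up, a quick sanity check might look like the following; it assumes `nvidia-smi` is available inside the image (typical for CUDA base images):

```bash
# Confirm the container is running and that it can see the GPU.
docker ps --filter name=llama_factory
docker exec -it llama_factory nvidia-smi
```

The web UI mapped to port 7860 should then be reachable on the host.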

#### Docker Compose support

#### Use Docker Compose

```bash
docker compose -f ./docker-compose.yml up -d
```

@@ -682,7 +686,7 @@ docker compose -f ./docker-compose.yml up -d
> [!TIP]
> Details about volumes:
> * hf_cache: Utilize Huggingface cache on the host machine. Reassignable if a cache already exists in a different directory.
> * hf_cache: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory.
> * data: Place datasets in this directory on the host machine so that they can be selected in the LLaMA Board GUI.
> * output: Set the export directory to this location so that the merged result can be accessed directly on the host machine.
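For example, if a Hugging Face cache already exists elsewhere on the host, the `hf_cache` mount can point at that directory instead; the host path below is a placeholder, and the remaining flags mirror the `docker run` command above:

```bash
# Reuse an existing Hugging Face cache from a custom host directory.
docker run --gpus=all \
    -v /data/hf_cache:/root/.cache/huggingface/ \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -e CUDA_VISIBLE_DEVICES=0 \
    -p 7860:7860 \
    --shm-size 16G \
    --name llama_factory \
    -d llama-factory:latest
```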

README_zh.md (36 changed lines)
@@ -76,10 +76,10 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd

[24/03/07] We supported the gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for detailed usage.

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for fast, concurrent inference. Use `--infer_backend vllm` to get **270%** inference speed. (LoRA is not yet supported; merge the weights first.)

<details><summary>Full Changelog</summary>

[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for fast, concurrent inference. Use `--infer_backend vllm` to get **270%** inference speed. (LoRA is not yet supported; merge the weights first.)

[24/02/28] We supported **[DoRA](https://arxiv.org/abs/2402.09353)** fine-tuning. Use the `--use_dora` argument for DoRA fine-tuning.

[24/02/15] We supported the **block expansion** method proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for detailed usage.
@@ -585,7 +585,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \

> [!TIP]
> Use `--model_name_or_path path_to_export` only to load the exported model.
>
> After merging the LoRA weights, you can use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ.
> After merging the LoRA weights, you can use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ.

### Inference with OpenAI-style API
@@ -659,6 +659,36 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \

> [!TIP]
> We recommend using `--per_device_eval_batch_size=1` and `--max_target_length 128` when running prediction with quantized models.
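A minimal sketch of such a prediction run; apart from `--per_device_eval_batch_size` and `--max_target_length` from the tip above, the remaining flags (`--stage`, `--do_predict`, `--quantization_bit`, `--dataset`, `--template`, `--output_dir`, `--predict_with_generate`) are assumed here, and the dataset name and paths are placeholders:

```bash
# Hypothetical example: batched prediction with a model quantized to 4 bits.
# dataset_name and the paths are placeholders; most flags are assumptions.
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_predict \
    --model_name_or_path path_to_model \
    --quantization_bit 4 \
    --dataset dataset_name \
    --template default \
    --output_dir path_to_predict_result \
    --per_device_eval_batch_size 1 \
    --max_target_length 128 \
    --predict_with_generate
```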

### Use Containers

#### Use Docker

```bash
docker build -f ./Dockerfile -t llama-factory:latest .

docker run --gpus=all \
    -v ./hf_cache:/root/.cache/huggingface/ \
    -v ./data:/app/data \
    -v ./output:/app/output \
    -e CUDA_VISIBLE_DEVICES=0 \
    -p 7860:7860 \
    --shm-size 16G \
    --name llama_factory \
    -d llama-factory:latest
```

#### Use Docker Compose

```bash
docker compose -f ./docker-compose.yml up -d
```

> [!TIP]
> Details about volumes:
> * hf_cache: use the host machine's Hugging Face cache folder; it can be changed to a different directory.
> * data: the folder on the host machine that stores the datasets, so that they can be selected in the LLaMA Board GUI.
> * output: set the export directory to this path so that the exported model can be accessed directly on the host machine.

## Projects using LLaMA Factory

1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)

docker-compose.yml

@@ -10,6 +10,8 @@ services:
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
    environment:
      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "7860:7860"
    ipc: host
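Once the stack is up, the standard Compose subcommands can be used to check the service and follow its logs:

```bash
# Show the service started from docker-compose.yml and tail its logs.
docker compose -f ./docker-compose.yml ps
docker compose -f ./docker-compose.yml logs -f
```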