35 Commits

Author SHA1 Message Date
Yaowei Zheng
95ac3f2373 [release] Bye 2025 (#9702) 2025-12-31 22:22:40 +08:00
Username_Full
000526908a [core deps] upgrade TRL to be between 0.18 and 0.24 (#9617)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-31 20:54:27 +08:00
fivehaitao
c8d7e85b3e [fix] Fix prediction metrics in scripts/vllm_infer.py to match Transformers (#9701)
Co-authored-by: xuht6 <xuht6@asiainfo.com>
2025-12-31 18:30:00 +08:00
浮梦
16735b9e35 [v1] Refactor kernel plugin (#9669)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-12-31 18:26:48 +08:00
Weize Liu
4e1d69579a [data] add DLR-Web dataset for supervised fine-tuning (#9696) 2025-12-30 20:50:38 +08:00
浮梦
1857fbdd6b [ci] add cuda workflow (#9682)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-29 20:03:00 +08:00
Kingsley
bb1ba31005 [misc] lint mca code (#9692) 2025-12-29 11:44:38 +08:00
Copilot
e97d0474fb [ci] Fix NPU device condition in docker workflow (#9688)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-28 20:04:59 +08:00
Yaowei Zheng
3f0c3dc84d [assets] fix installation (#9687)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-28 19:29:28 +08:00
Hertz
c107cc22d0 [model] support MiniMax-M1&M2 series (#9680)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-28 19:02:05 +08:00
Yaowei Zheng
7ef1fba34a [version] fix gradio (#9685) 2025-12-28 05:00:51 +08:00
Copilot
eceec8ab69 [deps] goodbye python 3.9 (#9677)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
2025-12-27 02:50:44 +08:00
Yaowei Zheng
b44f651e09 [ci] fix docker (#9678) 2025-12-27 02:43:46 +08:00
Yaowei Zheng
55590f5ece [misc] fix ci with uv (#9676) 2025-12-27 01:39:13 +08:00
Copilot
a1b1931b4a [breaking] migrate from setuptools to uv (#9673)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-26 22:47:23 +08:00
Xunpeng Xiao
3c17f2722c [model] Update ernie_vl to adapt new version (#9665) 2025-12-26 19:57:49 +08:00
Copilot
a882e2d5fc [assets] Add GitHub Copilot instructions for repository (#9675)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-26 17:32:48 +08:00
Yaowei Zheng
a754604c11 [misc] fix accelerator (#9661)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-25 02:11:04 +08:00
Xunpeng Xiao
6a2eafbae3 [feat] Models trained and inferred with Mxfp4 are dequantized by default (#9652)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-24 00:26:40 +08:00
Yaowei Zheng
84485406b7 [ci] disable pip cache for ci (#9654) 2025-12-23 18:37:40 +08:00
Kingsley
1c8a42d2f8 [v1&WIP] dataloader init (#9645) 2025-12-23 16:29:47 +08:00
thulyubh22
7901b2f32e [model] efficient tuning for gpt-oss (#9354)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-23 16:28:38 +08:00
Yaowei Zheng
1f1f5a7d1b [ci] remove docker cache (#9640) 2025-12-22 01:03:10 +08:00
Yaowei Zheng
6ef9854713 [misc] fix cache & pin transformers to 4.57.1 (#9638) 2025-12-22 00:20:55 +08:00
Hertz
4923f52a28 [model] support MiMo-V2-Flash model (#9637) 2025-12-21 14:38:18 +08:00
Yaowei Zheng
0894b4f37e [misc] lint (#9636) 2025-12-20 16:19:39 +08:00
ZIYI ZENG
b0d49e137f [misc] Support split eval_dataset when explict set "predict_with_generate" (#9604)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-20 01:46:00 +08:00
Xunpeng Xiao
ddd7dcc722 [data] Fix the video frame sampling issue #9620 (#9634) 2025-12-19 18:36:31 +08:00
浮梦
5204cd2bca [misc] add version check for moe (#9633) 2025-12-19 14:57:37 +08:00
Xunpeng Xiao
8c74dca76a [feat] Models trained and inferred with FP8 are dequantized by default (#9627) 2025-12-18 22:54:35 +08:00
xvxuopop
e8deda53a1 [example] add Qwen3 series examples (#9624)
Co-authored-by: UsernameFull <tohowtodoit@gmail.com>
2025-12-18 21:27:00 +08:00
mrhaoxx
a769fb94b9 [feat] support ktransformers for dpo (#9621)
Co-authored-by: poryfly <porykid@gmail.com>
2025-12-18 21:26:25 +08:00
mrhaoxx
964569751f [kt] refactor ktransformers integration (#9632) 2025-12-18 21:26:04 +08:00
Hertz
9fd4b094d4 [model] support VibeThinker models (#9616) 2025-12-16 21:50:46 +08:00
浮梦
18c21bce5a [test] add allreduce test on npu (#9619)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-12-16 21:33:30 +08:00
216 changed files with 4025 additions and 1992 deletions

.github/copilot-instructions.md (new file)

@@ -0,0 +1,180 @@
# GitHub Copilot Instructions for LLaMA Factory
## Project Overview
LLaMA Factory is an efficient fine-tuning framework for 100+ large language models (LLMs). It provides:
- Support for various models: LLaMA, LLaVA, Mistral, Qwen, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
- Multiple training methods: pre-training, supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO
- Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and QLoRA variants
- Advanced algorithms: GaLore, BAdam, APOLLO, Adam-mini, Muon, OFT, DoRA, etc.
- Web UI (LLaMA Board) and CLI interfaces
### Architecture Versions
LLaMA Factory has two parallel architectures that can be switched via the `USE_V1` environment variable:
**v0 (default)** - File hierarchy:
- `api`, `webui`, `chat`, `eval`, `train`, `data`, `model`, `hparams`, `extras`
**v1** - File hierarchy:
- `trainers`, `core`, `accelerator`, `plugins`, `config`, `utils`
Set `USE_V1=1` to enable v1 architecture.
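For example, a v1 run might look like the following sketch (the config path is illustrative; any `llamafactory-cli` invocation should respect `USE_V1`):
```bash
# Run the same CLI against the v1 architecture (config path is illustrative)
USE_V1=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```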
## Code Structure
### v0 Architecture (Default)
- `src/llamafactory/` - Main package directory
- `api/` - OpenAI-style API implementation
- `chat/` - Chat interface implementation
- `cli.py` - Command-line interface
- `data/` - Data processing and dataset handling
- `eval/` - Model evaluation utilities
- `extras/` - Additional utilities and helpers
- `hparams/` - Hyperparameter definitions
- `model/` - Model loading, patching, and utilities
- `train/` - Training pipeline implementation
- `webui/` - Gradio-based web interface
- `src/train.py` - Training entry script (delegates to `llamafactory.train.tuner`)
- `src/webui.py` - Web UI entry script (delegates to `llamafactory.webui.interface`)
- `src/api.py` - API server entry script (delegates to `llamafactory.api.app`)
- `tests/` - Test suite
- `examples/` - Example configurations for various training scenarios
- `data/` - Dataset definitions and examples
### v1 Architecture (USE_V1=1)
- `src/llamafactory/v1/` - Version 1 package directory
- `trainers/` - Training implementations
- `core/` - Core training utilities
- `accelerator/` - Acceleration and distributed training
- `plugins/` - Pluggable components (model, data, sampler, trainer)
- `config/` - Configuration management
- `utils/` - Utility functions
## Development Practices
### Code Style
- Follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
- Use ruff for linting and formatting
- Line length: 119 characters
- Indentation: 4 spaces
- Quote style: double quotes
- Use Google-style docstrings for documentation
### Import Organization
- Known first-party: `llamafactory`
- Known third-party: `accelerate`, `datasets`, `gradio`, `numpy`, `peft`, `torch`, `transformers`, `trl`
- Use 2 blank lines after imports
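These rules are enforced through ruff; a sketch of checking only the import ordering in isolation (`--select I` restricts ruff to its isort rule set):
```bash
# Check only import ordering (ruff isort rules) for the main package
uvx ruff check --select I src/llamafactory
```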
### Quality Checks
Before committing code, run:
```bash
make style # Auto-fix style issues
make quality # Check code quality
make test # Run test suite
```
Or use the combined command:
```bash
make commit # Run pre-commit hooks
```
### Testing
- Use pytest for testing
- Tests are located in `tests/` and `tests_v1/` directories
- Run tests with: `make test` (which runs `WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ tests_v1/`)
- Disable wandb during testing to avoid external dependencies
- **Note**: Training configurations require GPU machines, so training is typically not tested end-to-end. Use `make test` to validate file-level functionality.
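During development it is often faster to run a single module instead of the whole suite; a minimal sketch (the module path is illustrative):
```bash
# Run one test module with the same flags make test uses (path is illustrative)
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/data/test_template.py
```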
### Building
Build the package with:
```bash
pip3 install build && python3 -m build
```
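With the uv-based tooling introduced in this changeset (see the Makefile diff below), `make build` dispatches to uv when it is available; the equivalent direct command:
```bash
# Equivalent build via uv; `make build` selects this automatically when uv is on PATH
uv build
```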
### License
- All source files must include the Apache 2.0 license header
- Check license headers with: `make license`
## Common Patterns
### Configuration Files
- Training configurations are typically YAML or JSON files in `examples/` directory
- Hyperparameters are defined using dataclasses in `src/llamafactory/hparams/`
### Model Support
- New model support is added through model patches in `src/llamafactory/model/`
- Visual models use the visual utilities in `src/llamafactory/model/model_utils/visual.py`
- Quantization support is in `src/llamafactory/model/model_utils/quantization.py`
### Data Processing
- Dataset definitions are in `data/dataset_info.json`
- Data templates and processors are in `src/llamafactory/data/`
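Assuming the standard `dataset_info.json` layout, a registered entry can be inspected directly; `dlr_web` is the dataset added in this changeset (#9696):
```bash
# Print the registration record for the dlr_web dataset
python -c "import json; print(json.load(open('data/dataset_info.json'))['dlr_web'])"
```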
### Training
- Training pipelines are in `src/llamafactory/train/`
- Support for different training methods: SFT, DPO, PPO, RM, PT, KTO, ORPO
## Key Dependencies
- Python >= 3.9.0
- PyTorch and transformers for model handling
- datasets for data processing
- peft for parameter-efficient fine-tuning
- accelerate for distributed training
- gradio for web UI
- trl for reinforcement learning
- Optional: vllm/sglang for inference, flash-attention-2, unsloth, liger-kernel
## Entry Points
- **CLI Training**: `llamafactory-cli train --config examples/train_lora/llama3_lora_sft.yaml`
- **Web UI**: `llamafactory-cli webui` or `python src/webui.py`
- **API Server**: `llamafactory-cli api` or `python src/api.py`
- **Chat Interface**: `llamafactory-cli chat --model_name_or_path MODEL_PATH`
## Environment Setup
For development:
```bash
pip install -e ".[dev]"
```
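The CI workflows in this changeset perform the same editable install through uv; an equivalent sketch:
```bash
# uv-based equivalent of the editable dev install (mirrors the CI workflows)
uv venv
uv pip install -e ".[dev]"
```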
## Important Notes
- The project supports multiple backends: default PyTorch, vLLM, SGLang
- Megatron-core training is supported via mcore_adapter
- SwanLab and W&B are supported for experiment tracking
- Docker support is available with pre-built images
- Day-0/Day-1 support for latest cutting-edge models
- Multi-modal support for vision and audio understanding tasks
## Contribution Guidelines
1. Fork the repository
2. Create a development branch
3. Set up development environment with `pip install -e ".[dev]"`
4. Make changes following the style guide
5. Run quality checks: `make style && make quality`
6. Run tests: `make test`
7. Submit a pull request
## Common Commands
- `make style` - Format code
- `make quality` - Run linters
- `make test` - Run tests
- `make commit` - Install and run pre-commit hooks
- `make license` - Check license headers

.github/workflows/docker.yml

@@ -7,7 +7,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "docker/**"
       - ".github/workflows/*.yml"
   pull_request:
@@ -15,7 +15,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "docker/**"
       - ".github/workflows/*.yml"
   release:
@@ -29,16 +29,13 @@ jobs:
       matrix:
         include:
           - device: "cuda"
-            npu_type: ""
-          - device: "npu"
-            npu_type: "a2"
-          - device: "npu"
-            npu_type: "a3"
+          - device: "npu-a2"
+          - device: "npu-a3"
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.device }}-${{ matrix.npu_type }}
+      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.device }}
       cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
     environment:
@@ -55,16 +52,11 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.10"
       - name: Get llamafactory version
         id: version
         run: |
           if [ "${{ github.event_name }}" = "release" ]; then
-            echo "tag=$(python setup.py --version)" >> "$GITHUB_OUTPUT"
+            echo "tag=$(grep -oP 'VERSION = "\K[^"]+' src/llamafactory/extras/env.py)" >> "$GITHUB_OUTPUT"
           else
             echo "tag=latest" >> "$GITHUB_OUTPUT"
           fi
@@ -80,7 +72,7 @@ jobs:
           password: ${{ secrets.DOCKERHUB_TOKEN }}
       - name: Login to Quay
-        if: ${{ github.event_name != 'pull_request' && matrix.device == 'npu'}}
+        if: ${{ github.event_name != 'pull_request' && startsWith(matrix.device, 'npu') }}
         uses: docker/login-action@v3
         with:
           registry: quay.io
@@ -93,16 +85,12 @@ jobs:
         with:
           context: .
           file: ./docker/docker-cuda/Dockerfile
-          build-args: |
-            EXTRAS=metrics,deepspeed,liger-kernel
           push: ${{ github.event_name != 'pull_request' }}
           tags: |
             docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
       - name: Build and push Docker image (NPU-A2)
-        if: ${{ matrix.device == 'npu' && matrix.npu_type == 'a2' }}
+        if: ${{ matrix.device == 'npu-a2' }}
         uses: docker/build-push-action@v6
         with:
           context: .
@@ -112,11 +100,9 @@ jobs:
           tags: |
             docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}-npu-a2
             quay.io/ascend/llamafactory:${{ steps.version.outputs.tag }}-npu-a2
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
       - name: Build and push Docker image (NPU-A3)
-        if: ${{ matrix.device == 'npu' && matrix.npu_type == 'a3' }}
+        if: ${{ matrix.device == 'npu-a3' }}
         uses: docker/build-push-action@v6
         with:
           context: .
@@ -128,5 +114,3 @@ jobs:
           tags: |
             docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}-npu-a3
             quay.io/ascend/llamafactory:${{ steps.version.outputs.tag }}-npu-a3
-          cache-from: type=gha
-          cache-to: type=gha,mode=max

.github/workflows/publish.yml

@@ -23,10 +23,11 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
         with:
-          python-version: "3.9"
+          python-version: "3.11"
+          github-token: ${{ github.token }}
       - name: Build package
         run: |

.github/workflows/tests.yml

@@ -7,7 +7,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
   pull_request:
@@ -15,7 +15,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
@@ -25,29 +25,25 @@ jobs:
       fail-fast: false
       matrix:
         python:
-          - "3.9"
-          - "3.10"
           - "3.11"
           - "3.12"
+          - "3.13"
         os:
           - "ubuntu-latest"
           - "windows-latest"
           - "macos-latest"
         transformers:
-          - null
+          - ""
         include: # test backward compatibility
-          - python: "3.9"
-            os: "ubuntu-latest"
-            transformers: "4.49.0"
-          - python: "3.9"
+          - python: "3.11"
             os: "ubuntu-latest"
             transformers: "4.51.0"
-          - python: "3.9"
+          - python: "3.11"
             os: "ubuntu-latest"
             transformers: "4.53.0"
-        exclude: # exclude python 3.9 on macos
-          - python: "3.9"
-            os: "macos-latest"
+          - python: "3.11"
+            os: "ubuntu-latest"
+            transformers: "4.55.0"
     runs-on: ${{ matrix.os }}
@@ -63,22 +59,23 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
         with:
           python-version: ${{ matrix.python }}
-          cache: "pip"
-          cache-dependency-path: "**/requirements*.txt"
+          github-token: ${{ github.token }}
+          enable-cache: false
       - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
-          python -m pip install ".[torch,dev]"
+          uv venv
+          uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+          uv pip install -e ".[dev]"
       - name: Install transformers
         if: ${{ matrix.transformers }}
         run: |
-          python -m pip install "transformers==${{ matrix.transformers }}"
+          uv pip install "transformers==${{ matrix.transformers }}"
       - name: Cache files
         id: hf-hub-cache
@@ -90,18 +87,25 @@ jobs:
       - name: Check quality
         run: |
           make style && make quality
+        env:
+          UV_NO_SYNC: 1
       - name: Check license
         run: |
           make license
+        env:
+          UV_NO_SYNC: 1
       - name: Check build
         run: |
           make build
+        env:
+          UV_NO_SYNC: 1
       - name: Test with pytest
         run: |
           make test
         env:
+          UV_NO_SYNC: 1
           HF_HOME: ${{ runner.temp }}/huggingface
           HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.github/workflows/tests_cuda.yml (new file)

@@ -0,0 +1,88 @@
name: tests_cuda

on:
  workflow_dispatch:
  push:
    branches:
      - "main"
    paths:
      - "**/*.py"
      - "pyproject.toml"
      - "Makefile"
      - ".github/workflows/*.yml"
  pull_request:
    branches:
      - "main"
    paths:
      - "**/*.py"
      - "pyproject.toml"
      - "Makefile"
      - ".github/workflows/*.yml"

jobs:
  tests:
    strategy:
      fail-fast: false
      matrix:
        python:
          - "3.11"
        os:
          - "linux-x86_64-gpu-2"
    runs-on: ${{ matrix.os }}
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os }}-${{ matrix.python }}
      cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          python-version: ${{ matrix.python }}
          github-token: ${{ github.token }}
          enable-cache: false

      - name: Check GPU Status
        run: nvidia-smi

      - name: Install dependencies
        run: |
          uv venv
          uv pip install -e ".[dev]"

      - name: Cache HuggingFace models
        id: hf-hub-cache
        uses: actions/cache@v4
        with:
          path: ${{ runner.temp }}/huggingface
          key: hf-cache-${{ runner.os }}-${{ hashFiles('tests/version.txt') }}

      - name: Check quality
        run: |
          make style && make quality
        env:
          UV_NO_SYNC: 1

      - name: Check license
        run: |
          make license
        env:
          UV_NO_SYNC: 1

      - name: Check build
        run: |
          make build
        env:
          UV_NO_SYNC: 1

      - name: Test with pytest
        run: |
          make test
        env:
          UV_NO_SYNC: 1
          HF_HOME: ${{ runner.temp }}/huggingface
          HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.github/workflows/tests_npu.yml

@@ -7,7 +7,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
   pull_request:
@@ -15,7 +15,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
@@ -48,10 +48,18 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          python-version: ${{ matrix.python }}
+          github-token: ${{ github.token }}
+          enable-cache: false
       - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
-          python -m pip install ".[torch-npu,dev]" torch-npu==${{matrix.pytorch_npu}}
+          uv venv
+          uv pip install torch-npu==${{matrix.pytorch_npu}}
+          uv pip install -e ".[dev]"
       - name: Install node
         run: |
@@ -70,18 +78,25 @@ jobs:
       - name: Check quality
         run: |
           make style && make quality
+        env:
+          UV_NO_SYNC: 1
       - name: Check license
         run: |
           make license
+        env:
+          UV_NO_SYNC: 1
       - name: Check build
         run: |
           make build
+        env:
+          UV_NO_SYNC: 1
       - name: Test with pytest
         run: |
           make test
         env:
+          UV_NO_SYNC: 1
           HF_HOME: /root/.cache/huggingface
           HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.gitignore

@@ -85,7 +85,7 @@ ipython_config.py
 # pyenv
 # For a library or package, you might want to ignore these files since the code is
 # intended to run in multiple environments; otherwise, check them in:
-# .python-version
+.python-version

 # pipenv
 # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.

MANIFEST.in

@@ -1 +1 @@
-include LICENSE requirements.txt
+include LICENSE

Makefile

@@ -1,24 +1,28 @@
 .PHONY: build commit license quality style test

-check_dirs := scripts src tests tests_v1 setup.py
+check_dirs := scripts src tests tests_v1
+RUN := $(shell command -v uv >/dev/null 2>&1 && echo "uv run" || echo "")
+BUILD := $(shell command -v uv >/dev/null 2>&1 && echo "uv build" || echo "python -m build")
+TOOL := $(shell command -v uv >/dev/null 2>&1 && echo "uvx" || echo "")

 build:
-	pip3 install build && python3 -m build
+	$(BUILD)

 commit:
-	pre-commit install
-	pre-commit run --all-files
+	$(TOOL) pre-commit install
+	$(TOOL) pre-commit run --all-files

 license:
-	python3 tests/check_license.py $(check_dirs)
+	$(RUN) python3 tests/check_license.py $(check_dirs)

 quality:
-	ruff check $(check_dirs)
-	ruff format --check $(check_dirs)
+	$(TOOL) ruff check $(check_dirs)
+	$(TOOL) ruff format --check $(check_dirs)

 style:
-	ruff check $(check_dirs) --fix
-	ruff format $(check_dirs)
+	$(TOOL) ruff check $(check_dirs) --fix
+	$(TOOL) ruff format $(check_dirs)

 test:
-	CUDA_VISIBLE_DEVICES= ASCEND_RT_VISIBLE_DEVICES=0 WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ tests_v1/
+	WANDB_DISABLED=true $(RUN) pytest -vv --import-mode=importlib tests/ tests_v1/
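The uv detection used by the new `RUN`/`BUILD`/`TOOL` variables can be exercised on its own; a minimal sketch of the same shell test:
```bash
# Prints "uv run" when uv is on PATH, otherwise an empty string (same logic as the Makefile)
command -v uv >/dev/null 2>&1 && echo "uv run" || echo ""
```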

README.md

@@ -278,27 +278,21 @@ Read technical notes:
 | Model | Model size | Template |
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
-| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
 | [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
-| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
-| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
+| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
+| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
 | [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
-| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
-| [Falcon-H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/34B | falcon_h1 |
+| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
 | [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
-| [GLM-4.1V](https://huggingface.co/zai-org) | 9B | glm4v |
 | [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
 | [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
-| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt |
+| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
-| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
-| [Granite 4](https://huggingface.co/ibm-granite) | 7B | granite4 |
+| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
-| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
 | [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
@@ -312,15 +306,14 @@ Read technical notes:
 | [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
-| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
+| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
 | [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
-| [Ministral(3)/Mistral-Nemo](https://huggingface.co/mistralai) | 3B/8B/12B/14B | ministral/ministral3 |
+| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
+| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
 | [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
-| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
 | [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
@@ -333,12 +326,9 @@ Read technical notes:
 | [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
-| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
-| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
+| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
 | [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |

 > [!NOTE]
@@ -444,6 +434,7 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
 - [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
 - [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
 - [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
+- [DLR-Web (en)](https://huggingface.co/datasets/Attention1115/DLR-Web)
 - [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
 - [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
 - [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
@@ -525,10 +516,12 @@ huggingface-cli login
 ```bash
 git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
-pip install -e ".[torch,metrics]" --no-build-isolation
+pip install -e ".[metrics]"
 ```

-Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, aqlm, vllm, sglang, galore, apollo, badam, adam-mini, qwen, minicpm_v, openmind, swanlab, dev
+Optional dependencies available: `metrics`, `deepspeed`. Install with: `pip install -e ".[metrics,deepspeed]"`
+Additional dependencies for specific features are available in `examples/requirements/`.

 #### Install from Docker Image
@@ -547,13 +540,7 @@ Please refer to [build docker](#build-docker) to build the image yourself.
 Create an isolated Python environment with [uv](https://github.com/astral-sh/uv):

 ```bash
-uv sync --extra torch --extra metrics --prerelease=allow
-```
-
-Run LLaMA-Factory in the isolated environment:
-
-```bash
-uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+uv run llamafactory-cli webui
 ```

 </details>
@@ -590,7 +577,7 @@ To enable FlashAttention-2 on the Windows platform, please use the script from [
 <details><summary>For Ascend NPU users</summary>

-To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher: `pip install -e . torch-npu==2.7.1`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:

 ```bash
 # replace the url according to your CANN version and devices
@@ -609,8 +596,8 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
 | Requirement  | Minimum | Recommend      |
 | ------------ | ------- | -------------- |
 | CANN         | 8.0.RC1 | 8.0.0.alpha002 |
-| torch        | 2.1.0   | 2.4.0          |
-| torch-npu    | 2.1.0   | 2.4.0.post2    |
+| torch        | 2.1.0   | 2.7.1          |
+| torch-npu    | 2.1.0   | 2.7.1          |
 | deepspeed    | 0.13.2  | 0.13.2         |
 | vllm-ascend  | -       | 0.7.3          |
@@ -652,7 +639,7 @@ cd transformers
 pip install .
 ```

-3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
+3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml).

 </details>
@@ -667,12 +654,12 @@ You can also use **[Easy Dataset](https://github.com/ConardLi/easy-dataset)**, *
 ### Quickstart

-Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Llama3-8B-Instruct model, respectively.
+Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Qwen3-4B-Instruct model, respectively.

 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```

 See [examples/README.md](examples/README.md) for advanced usage (including distributed training).
@@ -725,7 +712,6 @@ For CUDA users:
 ```bash
 docker build -f ./docker/docker-cuda/Dockerfile \
     --build-arg PIP_INDEX=https://pypi.org/simple \
-    --build-arg EXTRAS=metrics \
     -t llamafactory:latest .

 docker run -dit --ipc=host --gpus=all \
@@ -742,7 +728,6 @@ For Ascend NPU users:
 ```bash
 docker build -f ./docker/docker-npu/Dockerfile \
     --build-arg PIP_INDEX=https://pypi.org/simple \
-    --build-arg EXTRAS=torch-npu,metrics \
     -t llamafactory:latest .

 docker run -dit --ipc=host \
@@ -767,7 +752,6 @@ For AMD ROCm users:
 ```bash
 docker build -f ./docker/docker-rocm/Dockerfile \
     --build-arg PIP_INDEX=https://pypi.org/simple \
-    --build-arg EXTRAS=metrics \
     -t llamafactory:latest .

 docker run -dit --ipc=host \
@@ -798,7 +782,7 @@ When building the Docker image, use `-v ./hf_cache:/root/.cache/huggingface` arg
 ### Deploy with OpenAI-style API and vLLM

 ```bash
-API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
+API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
 ```

 > [!TIP]

README_zh.md

@@ -280,27 +280,21 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | 模型名 | 参数量 | Template |
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
-| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
 | [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
-| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
-| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
+| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
+| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
 | [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
-| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
-| [Falcon-H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/34B | falcon_h1 |
+| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
 | [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
-| [GLM-4.1V](https://huggingface.co/zai-org) | 9B | glm4v |
 | [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
 | [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
-| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt |
+| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
-| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
-| [Granite 4](https://huggingface.co/ibm-granite) | 7B | granite4 |
+| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
-| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
 | [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
@@ -314,15 +308,14 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
-| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
+| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
 | [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
-| [Ministral(3)/Mistral-Nemo](https://huggingface.co/mistralai) | 3B/8B/12B/14B | ministral/ministral3 |
+| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
+| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
 | [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
-| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
 | [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
@@ -335,12 +328,9 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
-| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
-| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
+| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
 | [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |

 > [!NOTE]
@@ -446,6 +436,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 - [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
 - [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
 - [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
+- [DLR-Web (en)](https://huggingface.co/datasets/Attention1115/DLR-Web)
 - [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
 - [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
 - [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
@@ -527,10 +518,12 @@ huggingface-cli login
 ```bash
 git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
-pip install -e ".[torch,metrics]" --no-build-isolation
+pip install -e ".[metrics]"
 ```

-可选的额外依赖项:torch、torch-npu、metrics、deepspeed、liger-kernel、bitsandbytes、hqq、eetq、gptq、aqlm、vllm、sglang、galore、apollo、badam、adam-mini、qwen、minicpm_v、openmind、swanlab、dev
+可选的额外依赖项:`metrics`、`deepspeed`。使用 `pip install -e ".[metrics,deepspeed]"` 安装。
+其他可选依赖项请参考 `examples/requirements/` 目录下的文件。

 #### 从镜像安装
@@ -549,13 +542,7 @@ docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
 使用 [uv](https://github.com/astral-sh/uv) 创建隔离的 Python 环境:

 ```bash
-uv sync --extra torch --extra metrics --prerelease=allow
-```
-
-在环境中运行 LLaMA-Factory:
-
-```bash
-uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+uv run llamafactory-cli webui
 ```

 </details>
@@ -592,7 +579,7 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl
 <details><summary>昇腾 NPU 用户指南</summary>

-在昇腾 NPU 设备上安装 LLaMA Factory 时,请升级 Python 到 3.10 及以上,并需要指定额外依赖项,使用 `pip install -e ".[torch-npu,metrics]"` 命令安装。此外,还需要安装 **[Ascend CANN Toolkit 与 Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**,安装方法请参考[安装教程](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html)或使用以下命令:
+在昇腾 NPU 设备上安装 LLaMA Factory 时,请升级 Python 到 3.10 及以上,并需要指定额外依赖项,使用 `pip install -e . torch-npu==2.7.1` 命令安装。此外,还需要安装 **[Ascend CANN Toolkit 与 Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**,安装方法请参考[安装教程](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html)或使用以下命令:

 ```bash
 # 请替换 URL 为 CANN 版本和设备型号对应的 URL
@@ -611,8 +598,8 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
 | 依赖项       | 至少    | 推荐           |
 | ------------ | ------- | -------------- |
 | CANN         | 8.0.RC1 | 8.0.0.alpha002 |
-| torch        | 2.1.0   | 2.4.0          |
-| torch-npu    | 2.1.0   | 2.4.0.post2    |
+| torch        | 2.1.0   | 2.7.1          |
+| torch-npu    | 2.1.0   | 2.7.1          |
 | deepspeed    | 0.13.2  | 0.13.2         |
 | vllm-ascend  | -       | 0.7.3          |
@@ -654,7 +641,7 @@ cd transformers
 pip install .
 ```

-3. 在训练参数中设置 `double_quantization: false`,可参考[示例](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml)。
+3. 在训练参数中设置 `double_quantization: false`,可参考[示例](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml)。

 </details>
@@ -669,12 +656,12 @@ pip install .
 ### 快速开始

-下面三行命令分别对 Llama3-8B-Instruct 模型进行 LoRA **微调**、**推理**和**合并**。
+下面三行命令分别对 Qwen3-4B-Instruct 模型进行 LoRA **微调**、**推理**和**合并**。

 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```

 高级用法请参考 [examples/README_zh.md](examples/README_zh.md)(包括多 GPU 微调)。
@@ -800,7 +787,7 @@ docker exec -it llamafactory bash
 ### 利用 vLLM 部署 OpenAI API

 ```bash
-API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
+API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
 ```

 > [!TIP]

data/dataset_info.json

@@ -471,6 +471,14 @@
   "ultrachat_de": {
     "hf_hub_url": "mayflowergmbh/ultra-chat_de"
   },
+  "dlr_web": {
+    "hf_hub_url": "Attention1115/DLR-Web",
+    "split": "full",
+    "columns": {
+      "prompt": "question",
+      "response": "response"
+    }
+  },
   "dpo_en_demo": {
     "file_name": "dpo_en_demo.json",
     "ranking": true,

docker/docker-cuda/Dockerfile

@@ -4,7 +4,6 @@ FROM ${BASE_IMAGE}
 # Installation arguments
 ARG PIP_INDEX=https://pypi.org/simple
-ARG EXTRAS=metrics
 ARG INSTALL_FLASHATTN=false
 ARG HTTP_PROXY=""
@@ -27,17 +26,13 @@ WORKDIR /app
 # Change pip source
 RUN pip config set global.index-url "${PIP_INDEX}" && \
     pip config set global.extra-index-url "${PIP_INDEX}" && \
-    pip install --no-cache-dir --upgrade pip packaging wheel setuptools
+    pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"

-# Install the requirements
-COPY requirements.txt /app
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy the rest of the application into the image
+# Copy the application into the image
 COPY . /app

 # Install LLaMA Factory
-RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
+RUN pip install --no-cache-dir --no-build-isolation -e ".[metrics,deepspeed]"

 # Rebuild flash attention
 RUN if [ "${INSTALL_FLASHATTN}" == "true" ]; then \


@@ -8,7 +8,7 @@ ENV PYPI_MIRROR=https://mirrors.aliyun.com/pypi/simple/
 ENV PYPI_TRUSTED_HOST=mirrors.aliyun.com
 ENV APT_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

-RUN pip install --upgrade pip setuptools wheel --trusted-host ${PYPI_TRUSTED_HOST} --index-url ${PYPI_MIRROR}
+RUN pip install --upgrade pip setuptools wheel "hatchling>=1.18.0" editables --trusted-host ${PYPI_TRUSTED_HOST} --index-url ${PYPI_MIRROR}

 RUN pip uninstall -y torch torchvision torch-tensorrt \
     flash_attn transformer-engine \
@@ -56,14 +56,14 @@ ENV JAVA_HOME /usr/lib/jvm/java-21-openjdk-amd64
 # pip install LLaMA-Factory
 WORKDIR /app
-COPY requirements.txt /app/
-RUN pip install --no-cache-dir -r requirements.txt
+# Copy the application into the image
+COPY . /app
+
+# Install LLaMA Factory
+RUN pip install --no-cache-dir -e ".[metrics]" --no-build-isolation

 RUN pip install "git+https://github.com/alibaba/roll.git#subdirectory=mcore_adapter"

-COPY . /app/
-RUN pip install -e ".[metrics]" --no-build-isolation
-
 # Expose port 7860 for LLaMA Board
 ENV GRADIO_SERVER_PORT=7860
 EXPOSE 7860

docker/docker-cuda/docker-compose.yml

@@ -5,7 +5,6 @@ services:
       context: ../..
       args:
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: metrics
     container_name: llamafactory
     ports:
       - "7860:7860"

docker/docker-npu/Dockerfile

@@ -5,7 +5,6 @@ FROM ${BASE_IMAGE}
 # Installation arguments
 ARG PIP_INDEX=https://pypi.org/simple
-ARG EXTRAS=torch-npu,metrics
 ARG HTTP_PROXY=""
 ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cpu
@@ -28,21 +27,15 @@ WORKDIR /app
 # Change pip source
 RUN pip config set global.index-url "${PIP_INDEX}" && \
     pip config set global.extra-index-url "${PIP_INDEX}" && \
-    pip install --no-cache-dir --upgrade pip packaging wheel setuptools
+    pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
+
+# Copy the application into the image
+COPY . /app

 # Install torch-npu
 RUN pip uninstall -y torch torchvision torchaudio && \
-    pip install --no-cache-dir "torch==2.7.1" "torch-npu==2.7.1" "torchvision==0.22.1" --index-url "${PYTORCH_INDEX}"
-
-# Install the requirements
-COPY requirements.txt /app
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy the rest of the application into the image
-COPY . /app
-
-# Install LLaMA Factory
-RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
+    pip install --no-cache-dir "torch==2.7.1" "torch-npu==2.7.1" "torchvision==0.22.1" "torchaudio==2.7.1" --index-url "${PYTORCH_INDEX}" && \
+    pip install --no-cache-dir -e ".[metrics]" --no-build-isolation

 # Set up volumes
 # VOLUME [ "/root/.cache/huggingface", "/app/shared_data", "/app/output" ]

docker/docker-npu/docker-compose.yml

@@ -5,7 +5,6 @@ services:
       context: ../..
       args:
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: torch-npu,metrics
     container_name: llamafactory-a2
     image: llamafactory:npu-a2
     volumes:
@@ -36,7 +35,6 @@ services:
       args:
         BASE_IMAGE: quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: torch-npu,metrics
     container_name: llamafactory-a3
     image: llamafactory:npu-a3
     volumes:


@@ -4,7 +4,6 @@ FROM ${BASE_IMAGE}
 # Installation arguments
 ARG PIP_INDEX=https://pypi.org/simple
-ARG EXTRAS=metrics
 ARG INSTALL_FLASHATTN=false
 ARG HTTP_PROXY=""
 ARG PYTORCH_INDEX=https://download.pytorch.org/whl/rocm6.3
@@ -28,21 +27,14 @@ WORKDIR /app
 # Change pip source
 RUN pip config set global.index-url "${PIP_INDEX}" && \
     pip config set global.extra-index-url "${PIP_INDEX}" && \
-    pip install --no-cache-dir --upgrade pip packaging wheel setuptools
+    pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
-# Reinstall pytorch rocm
-RUN pip uninstall -y torch torchvision torchaudio && \
-    pip install --no-cache-dir --pre torch torchvision torchaudio --index-url "${PYTORCH_INDEX}"
-# Install the requirements
-COPY requirements.txt /app
-RUN pip install --no-cache-dir -r requirements.txt
-# Copy the rest of the application into the image
+# Copy the application into the image
 COPY . /app
-# Install LLaMA Factory
-RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
+# Reinstall pytorch rocm and install LLaMA Factory
+RUN pip uninstall -y torch torchvision torchaudio && \
+    pip install --no-cache-dir --no-build-isolation --pre -e ".[metrics,deepspeed]" --index-url "${PYTORCH_INDEX}"
 # Rebuild flash attention
 RUN if [ "${INSTALL_FLASHATTN}" == "true" ]; then \


@@ -5,7 +5,6 @@ services:
       context: ../..
       args:
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: metrics
     container_name: llamafactory
     ports:
       - "7860:7860"


@@ -18,19 +18,19 @@ By default, LLaMA-Factory uses all visible computing devices.
 Basic usage:
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 Advanced usage:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
+CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
     learning_rate=1e-5 \
     logging_steps=1
 ```
 ```bash
-bash examples/train_lora/llama3_lora_sft.sh
+bash examples/train_lora/qwen3_lora_sft.sh
 ```
 ## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
 #### (Continuous) Pre-Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
 ```
 #### Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
 ```
 #### DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
 ```
 #### Multimodal DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
 ```
 #### Reward Modeling
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
 ```
-#### PPO Training
-```bash
-llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
-```
 #### KTO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
 ```
 #### Preprocess Dataset
@@ -90,32 +84,26 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
 It is useful for large datasets; set `tokenized_path` in the config to load the preprocessed dataset.
 ```bash
-llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
+llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
 ```
-#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
-```bash
-llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
-```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 #### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
 ```
 #### Supervised Fine-Tuning with Ray on 4 GPUs
 ```bash
-USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
+USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
 ```
 ### QLoRA Fine-Tuning
@@ -123,13 +111,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
 #### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
 ```
 #### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
 ```
 #### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -155,14 +143,14 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
 #### Supervised Fine-Tuning on Single Node
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 ### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -170,13 +158,13 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
 To launch an elastic job with at most `MAX_RESTARTS` failure retries, run the following on at least `MIN_NNODES` and at most `MAX_NNODES` nodes. `RDZV_ID` should be set as a unique job ID shared by all nodes participating in the job. See also [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html).
 ```bash
-FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
 ```
 ### Merging LoRA Adapters and Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
 Note: DO NOT use a quantized model or `quantization_bit` when merging LoRA adapters.
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```
 #### Quantizing Model using AutoGPTQ
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
 ```
 ### Save Ollama modelfile
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
 ```
 ### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
 #### Evaluation using vLLM's Multi-GPU Inference
 ```
-python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
+python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
 python scripts/eval_bleu_rouge.py generated_predictions.jsonl
 ```
 #### Use CLI ChatBox
 ```bash
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Use Web UI ChatBox
 ```bash
-llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Launch OpenAI-style API
 ```bash
-llamafactory-cli api examples/inference/llama3_lora_sft.yaml
+llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
 ```
 ### Extras


@@ -18,19 +18,19 @@ LLaMA-Factory uses all visible computing devices by default.
 Basic usage:
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 Advanced usage:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
+CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
    learning_rate=1e-5 \
    logging_steps=1
 ```
 ```bash
-bash examples/train_lora/llama3_lora_sft.sh
+bash examples/train_lora/qwen3_lora_sft.sh
 ```
 ## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
 #### (Continuous) Pre-Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
 ```
 #### Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
 ```
 #### DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
 ```
 #### Multimodal DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
 ```
 #### Reward Modeling
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
 ```
-#### PPO Training
-```bash
-llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
-```
 #### KTO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
 ```
 #### Preprocess Dataset
@@ -90,20 +84,14 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
 Useful for large datasets; set `tokenized_path` in the config to load the preprocessed dataset.
 ```bash
-llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
+llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
 ```
-#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
-```bash
-llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
-```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 ### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -111,19 +99,19 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
 To launch an elastic, fault-tolerant multi-node fine-tuning job, run the following command on every node. The elastic node count ranges over `MIN_NNODES:MAX_NNODES`, and each node may restart on failure at most `MAX_RESTARTS` times. `RDZV_ID` should be set to a unique job ID shared by all nodes participating in the job. For more information, see the official [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html) documentation.
 ```bash
-FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Using DeepSpeed ZeRO-3 to Evenly Distribute VRAM
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
 ```
 #### Fine-Tuning with Ray on 4 GPUs
 ```bash
-USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
+USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
 ```
 ### QLoRA Fine-Tuning
@@ -131,13 +119,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
 #### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
 ```
 #### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
 ```
 #### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -163,20 +151,20 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
 #### Supervised Fine-Tuning on a Single Node
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
 ```
 ### Merging LoRA Adapters and Model Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
 Note: DO NOT use a quantized model or the `quantization_bit` argument when merging LoRA adapters.
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```
 #### Quantizing a Model with AutoGPTQ
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
 ```
 ### Saving the Ollama Modelfile
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
 ```
 ### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
 #### Evaluation Using vLLM's Multi-GPU Inference
 ```
-python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
+python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
 python scripts/eval_bleu_rouge.py generated_predictions.jsonl
 ```
 #### Using the CLI ChatBox
 ```bash
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Using the Web UI ChatBox
 ```bash
-llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Launching an OpenAI-style API
 ```bash
-llamafactory-cli api examples/inference/llama3_lora_sft.yaml
+llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
 ```
 ### Miscellaneous


@@ -1,16 +1,22 @@
+# Start FSDP2 fine-tuning
+# accelerate launch \
+#     --config_file examples/accelerate/fsdp2_config.yaml \
+#     src/train.py examples/ascend/qwen3_full_sft_fsdp2.yaml
+# Change `num_processes` in fsdp2_config.yaml to 16 in A3
 ### model
-model_name_or_path: Qwen/Qwen3-32B
+model_name_or_path: Qwen/Qwen3-8B
 trust_remote_code: true
 use_v1_kernels: true
+flash_attn: fa2
 ### method
 stage: sft
 do_train: true
 finetuning_type: full
-deepspeed: examples/deepspeed/ds_z2_autotp_config.json
 ### dataset
-dataset: identity,alpaca_en_demo
+dataset: alpaca_en_demo
 template: qwen3
 cutoff_len: 2048
 max_samples: 1000
@@ -19,28 +25,21 @@ preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen3-32b/full/sft_autotp
+output_dir: saves/Qwen3-8B/full/sft
 logging_steps: 1
 save_steps: 500
+max_steps: 500
 plot_loss: true
 overwrite_output_dir: true
 save_only_model: false
 report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
 ### train
-per_device_train_batch_size: 4
+per_device_train_batch_size: 8
 gradient_accumulation_steps: 1
-learning_rate: 1.0e-4
-num_train_epochs: 3.0
+learning_rate: 1.0e-5
 lr_scheduler_type: cosine
 warmup_ratio: 0.1
 bf16: true
-ddp_timeout: 180000000
+ddp_timeout: 1800
 resume_from_checkpoint: null
-### eval
-# eval_dataset: alpaca_en_demo
-# val_size: 0.1
-# per_device_eval_batch_size: 1
-# eval_strategy: steps
-# eval_steps: 500


@@ -0,0 +1,46 @@
# Start FSDP fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp_config.yaml \
# src/train.py examples/ascend/qwen3moe_full_sft_fsdp.yaml
# Change `num_processes` in fsdp_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-30B-A3B-Instruct-2507
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
disable_gradient_checkpointing: false
### dataset
dataset: alpaca_zh
template: qwen3
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-30B-A3B-Instruct-2507/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234


@@ -0,0 +1,48 @@
# Start FSDP2 fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp2_config.yaml \
# src/train.py examples/ascend/qwen3vlmoe_full_sft_fsdp2.yaml
# Change `num_processes` in fsdp2_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-VL-30B-A3B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
disable_gradient_checkpointing: false
### dataset
dataset: llava_1k_en,llava_1k_zh
template: qwen3_vl
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-VL-30B-A3B-Instruct/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234


@@ -1,5 +0,0 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true


@@ -1,4 +1,4 @@
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
-template: qwen2_vl
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
+template: qwen3_nothink
 infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
 trust_remote_code: true


@@ -1,4 +1,4 @@
-model_name_or_path: saves/llama3-8b/full/sft
-template: llama3
+model_name_or_path: saves/qwen3-4b/full/sft
+template: qwen3_nothink
 infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
 trust_remote_code: true


@@ -0,0 +1,5 @@
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true


@@ -1,4 +1,4 @@
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
-template: llama3
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
+template: qwen3_vl_nothink
 infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
 trust_remote_code: true


@@ -1,10 +1,10 @@
 ### model
-model_name_or_path: saves/llama3-8b/full/sft
-template: llama3
+model_name_or_path: saves/qwen3-4b/full/sft
+template: qwen3_nothink
 trust_remote_code: true
 ### export
-export_dir: output/llama3_full_sft
+export_dir: saves/qwen3_sft_merged
 export_size: 5
 export_device: cpu # choices: [cpu, auto]
 export_legacy_format: false


@@ -1,10 +1,10 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
-template: llama3
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
+template: qwen3_nothink
 trust_remote_code: true
 ### export
-export_dir: output/llama3_gptq
+export_dir: saves/qwen3_gptq
 export_quantization_bit: 4
 export_quantization_dataset: data/c4_demo.jsonl
 export_size: 5


@@ -1,13 +1,13 @@
 ### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
-adapter_name_or_path: saves/qwen2_5vl-7b/lora/sft
-template: qwen2_vl
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
+adapter_name_or_path: saves/qwen3-4b/lora/sft
+template: qwen3_nothink
 trust_remote_code: true
 ### export
-export_dir: output/qwen2_5vl_lora_sft
+export_dir: saves/qwen3_sft_merged
 export_size: 5
 export_device: cpu # choices: [cpu, auto]
 export_legacy_format: false


@@ -1,13 +1,13 @@
 ### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
-adapter_name_or_path: saves/llama3-8b/lora/sft
-template: llama3
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
+adapter_name_or_path: saves/qwen3-vl-4b/lora/sft
+template: qwen3_vl_nothink
 trust_remote_code: true
 ### export
-export_dir: output/llama3_lora_sft
+export_dir: saves/qwen3_vl_sft_merged
 export_size: 5
 export_device: cpu # choices: [cpu, auto]
 export_legacy_format: false


@@ -0,0 +1 @@
adam-mini


@@ -0,0 +1 @@
apollo-torch


@@ -0,0 +1 @@
aqlm[gpu]>=1.1.0


@@ -0,0 +1 @@
badam>=1.2.1


@@ -0,0 +1 @@
bitsandbytes>=0.39.0


@@ -0,0 +1 @@
eetq


@@ -0,0 +1,2 @@
transformer_engine[pytorch]>=2.0.0
accelerate>=1.10.0


@@ -0,0 +1,2 @@
torchao>=0.8.0
accelerate>=1.10.0


@@ -0,0 +1 @@
galore-torch


@@ -0,0 +1,2 @@
optimum>=1.24.0
gptqmodel>=2.0.0


@@ -0,0 +1 @@
hqq


@@ -0,0 +1 @@
liger-kernel>=0.5.5


@@ -0,0 +1,8 @@
soundfile
torchvision
torchaudio
vector_quantize_pytorch
vocos
msgpack
referencing
jsonschema_specifications


@@ -0,0 +1 @@
openmind


@@ -0,0 +1,2 @@
sglang[srt]>=0.4.5
transformers==4.51.1


@@ -0,0 +1 @@
swanlab


@@ -0,0 +1 @@
vllm>=0.4.3,<=0.11.0
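These one-line files appear to be per-extra requirement stubs introduced alongside the new packaging. Presumably each stub can be installed directly when the matching extra is wanted; the path below is an assumption, since the diff does not show the files' locations:

```bash
# hypothetical: install the pinned vLLM requirements from its stub file
pip install -r requirements/vllm.txt
```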


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -10,15 +10,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/full/sft
+output_dir: saves/qwen3-4b/full/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
 image_max_pixels: 262144
 video_max_pixels: 16384
 trust_remote_code: true
@@ -15,15 +15,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json
 ### dataset
 dataset: mllm_demo,identity,alpaca_en_demo
-template: qwen2_vl
+template: qwen3_vl_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen2_5vl-7b/full/sft
+output_dir: saves/qwen3-vl-4b/full/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,19 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true
### method
finetuning_type: lora
### dataset
task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5
### output
save_dir: saves/llama3-8b/lora/eval
### eval
batch_size: 4


@@ -1,43 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
reward_model: saves/llama3-8b/lora/reward
trust_remote_code: true
### method
stage: ppo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/ppo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### generate
max_new_tokens: 512
top_k: 0
top_p: 0.9


@@ -1,46 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500


@@ -1,49 +0,0 @@
# pip install git+https://github.com/hiyouga/transformers.git@llama4_train
### model
model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: llama4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama4-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -13,15 +13,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
 ### dataset
 dataset: dpo_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/dpo
+output_dir: saves/qwen3-4b/lora/dpo
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -12,15 +12,14 @@ pref_beta: 0.1
 ### dataset
 dataset: kto_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/kto
+output_dir: saves/qwen3-4b/lora/kto
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -13,12 +13,11 @@ lora_target: all
 dataset: c4_demo
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/pretrain
+output_dir: saves/qwen3-4b/lora/pretrain
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -11,15 +11,14 @@ lora_target: all
 ### dataset
 dataset: dpo_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/reward
+output_dir: saves/qwen3-4b/lora/reward
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -2,7 +2,7 @@
 set -x
-MODEL_PATH=meta-llama/Meta-Llama-3-8B-Instruct
+MODEL_PATH=Qwen/Qwen3-4B-Instruct-2507
 llamafactory-cli train \
     --model_name_or_path ${MODEL_PATH} \
@@ -13,13 +13,12 @@ llamafactory-cli train \
     --lora_rank 8 \
     --lora_target all \
     --dataset identity,alpaca_en_demo \
-    --template llama3 \
+    --template qwen3_nothink \
     --cutoff_len 2048 \
     --max_samples 1000 \
-    --overwrite_cache \
     --preprocessing_num_workers 16 \
     --dataloader_num_workers 4 \
-    --output_dir saves/llama3-8b/lora/sft \
+    --output_dir saves/qwen3-4b/lora/sft \
     --logging_steps 10 \
     --save_steps 500 \
     --plot_loss \


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: openai/gpt-oss-20b
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -11,15 +11,14 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: gpt
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/gpt-20b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -12,15 +12,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct # or use local absolute path
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507 # or use local absolute path
 trust_remote_code: true
 ### method
@@ -12,10 +12,9 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
 dataset_dir: REMOTE:llamafactory/demo_data # or use local absolute path
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
@@ -29,7 +28,7 @@ save_only_model: false
 report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
 ### ray
-ray_run_name: llama3_8b_sft_lora
+ray_run_name: qwen3_4b_sft_lora
 ray_storage_path: ./saves
 ray_num_workers: 4 # Number of GPUs to use.
 placement_strategy: PACK


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -11,13 +11,12 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
-tokenized_path: saves/llama3-8b/dataset/sft
-### output
-output_dir: saves/llama3-8b/lora/sft
+tokenized_path: saves/qwen3-4b/dataset/sft
+### output (not used)
+output_dir: saves/qwen3-4b/lora/sft
 overwrite_output_dir: true

@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
 image_max_pixels: 262144
 video_max_pixels: 16384
 trust_remote_code: true
@@ -15,15 +15,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
 ### dataset
 dataset: rlhf_v
-template: qwen2_vl
+template: qwen3_vl_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen2_5vl-7b/lora/dpo
+output_dir: saves/qwen3-vl-4b/lora/dpo
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
 image_max_pixels: 262144
 video_max_pixels: 16384
 trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
 ### dataset
 dataset: mllm_demo,identity,alpaca_en_demo # video: mllm_video_demo
-template: qwen2_vl
+template: qwen3_vl_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen2_5vl-7b/lora/sft
+output_dir: saves/qwen3-vl-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
 template: llama3
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4


@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
 template: llama3
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4


@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
 template: llama3
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 quantization_bit: 4
 quantization_method: bnb
 double_quantization: false
@@ -14,15 +14,14 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 quantization_bit: 4 # choices: [8 (bnb/hqq/eetq), 4 (bnb/hqq), 3 (hqq), 2 (hqq)]
 quantization_method: bnb # choices: [bnb, hqq, eetq]
 trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,42 +1,122 @@
 [build-system]
-requires = ["setuptools>=61.0"]
-build-backend = "setuptools.build_meta"
+requires = ["hatchling"]
+build-backend = "hatchling.build"
 [project]
 name = "llamafactory"
-requires-python = ">=3.9.0"
-dynamic = [
-    "version",
-    "dependencies",
-    "optional-dependencies",
-    "scripts",
-    "authors",
-    "description",
-    "readme",
-    "license",
-    "keywords",
-    "classifiers"
-]
+dynamic = ["version"]
+description = "Unified Efficient Fine-Tuning of 100+ LLMs"
+readme = "README.md"
+license = "Apache-2.0"
+requires-python = ">=3.11.0"
+authors = [
+    { name = "hiyouga", email = "hiyouga@buaa.edu.cn" }
+]
+keywords = [
+    "AI",
+    "LLM",
+    "GPT",
+    "ChatGPT",
+    "Llama",
+    "Transformer",
+    "DeepSeek",
+    "Pytorch"
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
+    "Intended Audience :: Education",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: Apache Software License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence"
+]
+dependencies = [
+    # core deps
+    "torch>=2.4.0",
+    "torchvision>=0.19.0",
+    "torchaudio>=2.4.0",
+    "transformers>=4.51.0,<=4.57.1,!=4.52.0,!=4.57.0",
+    "datasets>=2.16.0,<=4.0.0",
+    "accelerate>=1.3.0,<=1.11.0",
+    "peft>=0.14.0,<=0.17.1",
+    "trl>=0.18.0,<=0.24.0",
+    "torchdata>=0.10.0,<=0.11.0",
+    # gui
+    "gradio>=4.38.0,<=5.50.0",
+    "matplotlib>=3.7.0",
+    "tyro<0.9.0",
+    # ops
+    "einops",
+    "numpy",
+    "pandas",
+    "scipy",
+    # model and tokenizer
+    "sentencepiece",
+    "tiktoken",
+    "modelscope",
+    "hf-transfer",
+    "safetensors",
+    # python
+    "av",
+    "fire",
+    "omegaconf",
+    "packaging",
+    "protobuf",
+    "pyyaml",
+    "pydantic",
+    # api
+    "uvicorn",
+    "fastapi",
+    "sse-starlette"
+]
+[project.optional-dependencies]
+dev = ["pre-commit", "ruff", "pytest", "build"]
+metrics = ["nltk", "jieba", "rouge-chinese"]
+deepspeed = ["deepspeed>=0.10.0,<=0.16.9"]
+[project.scripts]
+llamafactory-cli = "llamafactory.cli:main"
+lmf = "llamafactory.cli:main"
+[project.urls]
+Homepage = "https://github.com/hiyouga/LLaMA-Factory"
+Repository = "https://github.com/hiyouga/LLaMA-Factory"
+[tool.hatch.build.targets.wheel]
+packages = ["src/llamafactory"]
+[tool.hatch.version]
+path = "src/llamafactory/extras/env.py"
+pattern = "VERSION = \"(?P<version>[^\"]+)\""
 [tool.ruff]
-target-version = "py39"
+target-version = "py311"
 line-length = 119
 indent-width = 4
 [tool.ruff.lint]
 ignore = [
     "C408", # collection
     "C901", # complex
     "E501", # line too long
     "E731", # lambda function
     "E741", # ambiguous var name
+    "UP007", # no upgrade union
+    "UP045", # no upgrade optional
     "D100", # no doc public module
     "D101", # no doc public class
     "D102", # no doc public method
     "D103", # no doc public function
     "D104", # no doc public package
     "D105", # no doc magic method
     "D107", # no doc __init__
 ]
 extend-select = [
     "C", # complexity
@@ -73,23 +153,3 @@ indent-style = "space"
 docstring-code-format = true
 skip-magic-trailing-comma = false
 line-ending = "auto"
-[tool.uv]
-conflicts = [
-    [
-        { extra = "torch-npu" },
-        { extra = "aqlm" },
-    ],
-    [
-        { extra = "torch-npu" },
-        { extra = "vllm" },
-    ],
-    [
-        { extra = "torch-npu" },
-        { extra = "sglang" },
-    ],
-    [
-        { extra = "vllm" },
-        { extra = "sglang" },
-    ],
-]
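Under the hatchling backend the version is read from `src/llamafactory/extras/env.py` through the pattern above, and the declared extras replace the deleted requirements.txt. An editable install then mirrors what the Dockerfiles in this diff do; a sketch (the uv flag assumes a recent uv release):

```bash
# editable install with the metrics extra, as in the Dockerfiles above
pip install -e ".[metrics]" --no-build-isolation

# or resolve the project with uv, selecting the same extra
uv sync --extra metrics
```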


@@ -1,38 +0,0 @@
# core deps
transformers>=4.49.0,<=4.56.2,!=4.52.0; python_version < '3.10'
transformers>=4.49.0,<=4.57.3,!=4.52.0,!=4.57.0; python_version >= '3.10'
datasets>=2.16.0,<=4.0.0
accelerate>=1.3.0,<=1.11.0
peft>=0.14.0,<=0.17.1
trl>=0.8.6,<=0.9.6
# gui
gradio>=4.38.0,<=5.45.0
matplotlib>=3.7.0
tyro<0.9.0
# ops
einops
numpy<2.0.0
pandas>=2.0.0
scipy
# model and tokenizer
sentencepiece
tiktoken
modelscope>=1.14.0
hf-transfer
safetensors<=0.5.3
# python
fire
omegaconf
packaging
protobuf
pyyaml
pydantic<=2.10.6
# api
uvicorn
fastapi
sse-starlette
# media
av
librosa
# yanked
propcache!=0.4.0


@@ -16,7 +16,6 @@
 # limitations under the License.
 import os
-from typing import Optional
 import fire
 import torch
@@ -34,7 +33,7 @@ def convert_mca_to_hf(
     output_path: str = "./output",
     bf16: bool = False,
     fp16: bool = False,
-    convert_model_max_length: Optional[int] = None,
+    convert_model_max_length: int | None = None,
 ):
     """Convert megatron checkpoint to HuggingFace format.
@@ -67,11 +66,11 @@ def convert(
     output_path: str = "./output",
     bf16: bool = False,
     fp16: bool = False,
-    convert_model_max_length: Optional[int] = None,
+    convert_model_max_length: int | None = None,
     tensor_model_parallel_size: int = 1,
     pipeline_model_parallel_size: int = 1,
     expert_model_parallel_size: int = 1,
-    virtual_pipeline_model_parallel_size: Optional[int] = None,
+    virtual_pipeline_model_parallel_size: int | None = None,
 ):
     """Convert checkpoint between MCA and HuggingFace formats.


@@ -14,7 +14,7 @@
 import json
 from dataclasses import dataclass
-from typing import Any, Literal, Optional
+from typing import Any, Literal
 import fire
 import torch
@@ -61,7 +61,7 @@ def calculate_ppl(
     dataset_dir: str = "data",
     template: str = "default",
     cutoff_len: int = 2048,
-    max_samples: Optional[int] = None,
+    max_samples: int | None = None,
     train_on_prompt: bool = False,
 ):
     r"""Calculate the ppl on the dataset of the pre-trained models.


@@ -14,10 +14,12 @@
import gc import gc
import json import json
from typing import Optional import time
import av import av
import fire import fire
from datasets import load_dataset
from eval_bleu_rouge import compute_metrics
from tqdm import tqdm from tqdm import tqdm
from transformers import Seq2SeqTrainingArguments from transformers import Seq2SeqTrainingArguments
@@ -49,18 +51,19 @@ def vllm_infer(
dataset_dir: str = "data", dataset_dir: str = "data",
template: str = "default", template: str = "default",
cutoff_len: int = 2048, cutoff_len: int = 2048,
max_samples: Optional[int] = None, max_samples: int | None = None,
vllm_config: str = "{}", vllm_config: str = "{}",
save_name: str = "generated_predictions.jsonl", save_name: str = "generated_predictions.jsonl",
matrix_save_name: str = None,
temperature: float = 0.95, temperature: float = 0.95,
top_p: float = 0.7, top_p: float = 0.7,
top_k: int = 50, top_k: int = 50,
max_new_tokens: int = 1024, max_new_tokens: int = 1024,
repetition_penalty: float = 1.0, repetition_penalty: float = 1.0,
skip_special_tokens: bool = True, skip_special_tokens: bool = True,
default_system: Optional[str] = None, default_system: str | None = None,
enable_thinking: bool = True, enable_thinking: bool = True,
seed: Optional[int] = None, seed: int | None = None,
pipeline_parallel_size: int = 1, pipeline_parallel_size: int = 1,
image_max_pixels: int = 768 * 768, image_max_pixels: int = 768 * 768,
image_min_pixels: int = 32 * 32, image_min_pixels: int = 32 * 32,
@@ -118,6 +121,7 @@ def vllm_infer(
if isinstance(model_args.vllm_config, dict): if isinstance(model_args.vllm_config, dict):
engine_args.update(model_args.vllm_config) engine_args.update(model_args.vllm_config)
model_preparation_start_time = time.time()
llm = LLM(**engine_args) llm = LLM(**engine_args)
# load datasets # load datasets
@@ -143,6 +147,7 @@ def vllm_infer(
all_prompts, all_preds, all_labels = [], [], [] all_prompts, all_preds, all_labels = [], [], []
need_video_kwargs = _need_video_kwargs(template) need_video_kwargs = _need_video_kwargs(template)
model_predict_start_time = time.time()
# Add batch process to avoid the issue of too many files opened # Add batch process to avoid the issue of too many files opened
for i in tqdm(range(0, len(train_dataset), batch_size), desc="Processing batched inference"): for i in tqdm(range(0, len(train_dataset), batch_size), desc="Processing batched inference"):
vllm_inputs, prompts, labels = [], [], [] vllm_inputs, prompts, labels = [], [], []
@@ -219,6 +224,7 @@ def vllm_infer(
all_labels.extend(labels) all_labels.extend(labels)
gc.collect() gc.collect()
model_predict_end_time = time.time()
# Write all results at once outside the loop # Write all results at once outside the loop
with open(save_name, "w", encoding="utf-8") as f: with open(save_name, "w", encoding="utf-8") as f:
for text, pred, label in zip(all_prompts, all_preds, all_labels): for text, pred, label in zip(all_prompts, all_preds, all_labels):
@@ -228,6 +234,49 @@ def vllm_infer(
print(f"{len(all_prompts)} total generated results have been saved at {save_name}.") print(f"{len(all_prompts)} total generated results have been saved at {save_name}.")
print("*" * 70) print("*" * 70)
# Write all matrix results when matrix_save_name is not None,
# The result matrix is referencing src.llamafactory.train.sft.workflow.run_sft # 127~132
# trainer.save_metrics("predict", predict_results.metrics)
#
# {
# "predict_bleu-4": 4.349975,
# "predict_model_preparation_time": 0.0128,
# "predict_rouge-1": 21.873359375,
# "predict_rouge-2": 4.144340625,
# "predict_rouge-l": 10.83949375,
# "predict_runtime": 131.664,
# "predict_samples_per_second": 0.076,
# "predict_steps_per_second": 0.008
# }
#
if matrix_save_name is not None:
predict_time = model_predict_end_time - model_predict_start_time
preparation_time = model_predict_start_time - model_preparation_start_time
start_time = time.time()
dataset = load_dataset("json", data_files=save_name, split="train")
dataset = dataset.map(compute_metrics, num_proc=8, remove_columns=dataset.column_names)
score_dict = dataset.to_dict()
average_score = {}
for task, scores in sorted(score_dict.items(), key=lambda x: x[0]):
score = sum(scores) / len(scores) if scores else 0.0
print(f"predict_{task}: {score:.4f}")
average_score["predict_" + task] = score
average_score["predict_model_preparation_time"] = preparation_time
average_score["predict_runtime"] = predict_time
num_steps = len(range(0, len(train_dataset), batch_size))
average_score["predict_samples_per_second"] = len(dataset) / predict_time if predict_time > 0 else 0.0
average_score["predict_steps_per_second"] = num_steps / predict_time if predict_time > 0 else 0.0
with open(matrix_save_name, "w", encoding="utf-8") as f:
json.dump(average_score, f, indent=4)
print("*" * 70)
print(f"\nDone in {time.time() - start_time:.3f}s.\nScore file saved to {matrix_save_name}.")
print("*" * 70)
if __name__ == "__main__": if __name__ == "__main__":
fire.Fire(vllm_infer) fire.Fire(vllm_infer)
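The `eval_bleu_rouge` module imported in this file is not shown in the diff. As a minimal sketch of what its per-record `compute_metrics` could look like — assuming the `prompt`/`predict`/`label` keys of the saved JSONL and the jieba/nltk/rouge-chinese stack from the project's `metrics` extra — each mapped sample returns one dict of scores, which `dataset.to_dict()` then exposes as per-metric columns for the averaging loop above:

import jieba  # Chinese-aware word segmentation
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_chinese import Rouge


def compute_metrics(sample: dict) -> dict:
    """Score one {"prompt", "predict", "label"} record with ROUGE and BLEU-4."""
    hypothesis = list(jieba.cut(sample["predict"]))
    reference = list(jieba.cut(sample["label"]))

    if len(" ".join(hypothesis).split()) == 0 or len(" ".join(reference).split()) == 0:
        # Degenerate outputs get zero F-scores instead of crashing Rouge.
        result = {"rouge-1": {"f": 0.0}, "rouge-2": {"f": 0.0}, "rouge-l": {"f": 0.0}}
    else:
        result = Rouge().get_scores(" ".join(hypothesis), " ".join(reference))[0]

    scores = {key: round(value["f"] * 100, 4) for key, value in result.items()}
    scores["bleu-4"] = round(
        sentence_bleu([reference], hypothesis, smoothing_function=SmoothingFunction().method3) * 100, 4
    )
    return scores

Under these assumptions, `dataset.map(compute_metrics, remove_columns=dataset.column_names)` leaves only numeric columns such as `bleu-4` and `rouge-l`, which is exactly the shape the `sum(scores) / len(scores)` averaging expects.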

setup.py (116 lines deleted)

View File

@@ -1,116 +0,0 @@
-# Copyright 2025 the LlamaFactory team.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import re
-
-from setuptools import find_packages, setup
-
-
-def get_version() -> str:
-    with open(os.path.join("src", "llamafactory", "extras", "env.py"), encoding="utf-8") as f:
-        file_content = f.read()
-        pattern = r"{}\W*=\W*\"([^\"]+)\"".format("VERSION")
-        (version,) = re.findall(pattern, file_content)
-        return version
-
-
-def get_requires() -> list[str]:
-    with open("requirements.txt", encoding="utf-8") as f:
-        file_content = f.read()
-        lines = [line.strip() for line in file_content.strip().split("\n") if not line.startswith("#")]
-        return lines
-
-
-def get_console_scripts() -> list[str]:
-    console_scripts = ["llamafactory-cli = llamafactory.cli:main"]
-    if os.getenv("ENABLE_SHORT_CONSOLE", "1").lower() in ["true", "y", "1"]:
-        console_scripts.append("lmf = llamafactory.cli:main")
-
-    return console_scripts
-
-
-extra_require = {
-    "torch": ["torch>=2.0.0", "torchvision>=0.15.0"],
-    "torch-npu": ["torch==2.7.1", "torch-npu==2.7.1", "torchvision==0.22.1", "decorator"],
-    "metrics": ["nltk", "jieba", "rouge-chinese"],
-    "deepspeed": ["deepspeed>=0.10.0,<=0.16.9"],
-    "liger-kernel": ["liger-kernel>=0.5.5"],
-    "bitsandbytes": ["bitsandbytes>=0.39.0"],
-    "hqq": ["hqq"],
-    "eetq": ["eetq"],
-    "gptq": ["optimum>=1.24.0", "gptqmodel>=2.0.0"],
-    "aqlm": ["aqlm[gpu]>=1.1.0"],
-    "vllm": ["vllm>=0.4.3,<=0.11.0"],
-    "sglang": ["sglang[srt]>=0.4.5", "transformers==4.51.1"],
-    "galore": ["galore-torch"],
-    "apollo": ["apollo-torch"],
-    "badam": ["badam>=1.2.1"],
-    "adam-mini": ["adam-mini"],
-    "minicpm_v": [
-        "soundfile",
-        "torchvision",
-        "torchaudio",
-        "vector_quantize_pytorch",
-        "vocos",
-        "msgpack",
-        "referencing",
-        "jsonschema_specifications",
-    ],
-    "openmind": ["openmind"],
-    "swanlab": ["swanlab"],
-    "fp8": ["torchao>=0.8.0", "accelerate>=1.10.0"],
-    "fp8-te": ["transformer_engine[pytorch]>=2.0.0", "accelerate>=1.10.0"],
-    "fp8-all": ["torchao>=0.8.0", "transformer_engine[pytorch]>=2.0.0", "accelerate>=1.10.0"],
-    "dev": ["pre-commit", "ruff", "pytest", "build"],
-}
-
-
-def main():
-    setup(
-        name="llamafactory",
-        version=get_version(),
-        author="hiyouga",
-        author_email="hiyouga@buaa.edu.cn",
-        description="Unified Efficient Fine-Tuning of 100+ LLMs",
-        long_description=open("README.md", encoding="utf-8").read(),
-        long_description_content_type="text/markdown",
-        keywords=["AI", "LLM", "GPT", "ChatGPT", "Llama", "Transformer", "DeepSeek", "Pytorch"],
-        license="Apache 2.0 License",
-        url="https://github.com/hiyouga/LLaMA-Factory",
-        package_dir={"": "src"},
-        packages=find_packages("src"),
-        python_requires=">=3.9.0",
-        install_requires=get_requires(),
-        extras_require=extra_require,
-        entry_points={"console_scripts": get_console_scripts()},
-        classifiers=[
-            "Development Status :: 4 - Beta",
-            "Intended Audience :: Developers",
-            "Intended Audience :: Education",
-            "Intended Audience :: Science/Research",
-            "License :: OSI Approved :: Apache Software License",
-            "Operating System :: OS Independent",
-            "Programming Language :: Python :: 3",
-            "Programming Language :: Python :: 3.9",
-            "Programming Language :: Python :: 3.10",
-            "Programming Language :: Python :: 3.11",
-            "Programming Language :: Python :: 3.12",
-            "Topic :: Scientific/Engineering :: Artificial Intelligence",
-        ],
-    )
-
-
-if __name__ == "__main__":
-    main()
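This deletion is the visible half of the `[breaking] migrate from setuptools to uv` commit: packaging metadata moves out of setup.py into a pyproject-based build. One side effect is that the regex-driven `get_version()` above has no direct successor; a minimal sketch of the usual replacement — assuming the package is installed, e.g. via `uv pip install -e .`, so distribution metadata exists — reads the version at runtime instead:

# Hedged sketch, not the project's actual code: resolve the installed
# version from distribution metadata instead of regex-parsing env.py.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("llamafactory"))
except PackageNotFoundError:
    print("llamafactory is not installed in this environment")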

View File

@@ -16,7 +16,7 @@ import asyncio
 import os
 from contextlib import asynccontextmanager
 from functools import partial
-from typing import Annotated, Optional
+from typing import Annotated

 from ..chat import ChatModel
 from ..extras.constants import EngineName
@@ -79,7 +79,7 @@ def create_app(chat_model: "ChatModel") -> "FastAPI":
     api_key = os.getenv("API_KEY")
     security = HTTPBearer(auto_error=False)

-    async def verify_api_key(auth: Annotated[Optional[HTTPAuthorizationCredentials], Depends(security)]):
+    async def verify_api_key(auth: Annotated[HTTPAuthorizationCredentials | None, Depends(security)]):
         if api_key and (auth is None or auth.credentials != api_key):
             raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key.")
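For context, `verify_api_key` is wired in as a FastAPI dependency. A self-contained sketch of the same bearer-token pattern follows; the route and the hard-coded key below are illustrative assumptions, not code from this diff (the real app reads `os.getenv("API_KEY")`):

from typing import Annotated

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
security = HTTPBearer(auto_error=False)  # auto_error=False so we can raise our own 401
API_KEY = "change-me"  # illustrative only


async def verify_api_key(auth: Annotated[HTTPAuthorizationCredentials | None, Depends(security)]):
    if API_KEY and (auth is None or auth.credentials != API_KEY):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key.")


@app.get("/v1/models", dependencies=[Depends(verify_api_key)])
async def list_models():
    return {"object": "list", "data": []}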

View File

@@ -14,10 +14,9 @@
 import time
 from enum import Enum, unique
-from typing import Any, Optional, Union
+from typing import Any, Literal

 from pydantic import BaseModel, Field
-from typing_extensions import Literal


 @unique
@@ -61,7 +60,7 @@ class FunctionDefinition(BaseModel):

 class FunctionAvailable(BaseModel):
     type: Literal["function", "code_interpreter"] = "function"
-    function: Optional[FunctionDefinition] = None
+    function: FunctionDefinition | None = None


 class FunctionCall(BaseModel):
@@ -77,35 +76,35 @@ class URL(BaseModel):

 class MultimodalInputItem(BaseModel):
     type: Literal["text", "image_url", "video_url", "audio_url"]
-    text: Optional[str] = None
-    image_url: Optional[URL] = None
-    video_url: Optional[URL] = None
-    audio_url: Optional[URL] = None
+    text: str | None = None
+    image_url: URL | None = None
+    video_url: URL | None = None
+    audio_url: URL | None = None


 class ChatMessage(BaseModel):
     role: Role
-    content: Optional[Union[str, list[MultimodalInputItem]]] = None
-    tool_calls: Optional[list[FunctionCall]] = None
+    content: str | list[MultimodalInputItem] | None = None
+    tool_calls: list[FunctionCall] | None = None


 class ChatCompletionMessage(BaseModel):
-    role: Optional[Role] = None
-    content: Optional[str] = None
-    tool_calls: Optional[list[FunctionCall]] = None
+    role: Role | None = None
+    content: str | None = None
+    tool_calls: list[FunctionCall] | None = None


 class ChatCompletionRequest(BaseModel):
     model: str
     messages: list[ChatMessage]
-    tools: Optional[list[FunctionAvailable]] = None
-    do_sample: Optional[bool] = None
-    temperature: Optional[float] = None
-    top_p: Optional[float] = None
+    tools: list[FunctionAvailable] | None = None
+    do_sample: bool | None = None
+    temperature: float | None = None
+    top_p: float | None = None
     n: int = 1
-    presence_penalty: Optional[float] = None
-    max_tokens: Optional[int] = None
-    stop: Optional[Union[str, list[str]]] = None
+    presence_penalty: float | None = None
+    max_tokens: int | None = None
+    stop: str | list[str] | None = None
     stream: bool = False
@@ -118,7 +117,7 @@ class ChatCompletionResponseChoice(BaseModel):

 class ChatCompletionStreamResponseChoice(BaseModel):
     index: int
     delta: ChatCompletionMessage
-    finish_reason: Optional[Finish] = None
+    finish_reason: Finish | None = None


 class ChatCompletionResponseUsage(BaseModel):
@@ -147,7 +146,7 @@ class ChatCompletionStreamResponse(BaseModel):

 class ScoreEvaluationRequest(BaseModel):
     model: str
     messages: list[str]
-    max_length: Optional[int] = None
+    max_length: int | None = None


 class ScoreEvaluationResponse(BaseModel):
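These hunks are the mechanical half of the `goodbye python 3.9` change: every `Optional[X]` becomes the PEP 604 union `X | None`, which is only constructible at runtime from Python 3.10 on — and that matters here because Pydantic evaluates model annotations at runtime. A minimal illustration of the equivalence:

# Minimal illustration: the PEP 604 union is the same type as Optional.
from typing import Optional, Union

assert (int | None) == Optional[int] == Union[int, None]  # Python >= 3.10


def clip(value: float, limit: float | None = None) -> float:
    # `float | None` is a drop-in replacement for Optional[float].
    return value if limit is None else min(value, limit)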

View File

@@ -14,9 +14,9 @@
 import asyncio
 import os
-from collections.abc import AsyncGenerator
+from collections.abc import AsyncGenerator, Callable
 from threading import Thread
-from typing import TYPE_CHECKING, Any, Callable, Optional, Union
+from typing import TYPE_CHECKING, Any, Optional, Union

 import torch
 from transformers import GenerationConfig, TextIteratorStreamer
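The same sweep moves `Callable` from `typing` to `collections.abc`, whose ABCs have been directly subscriptable since Python 3.9 (the `typing` aliases are soft-deprecated). A two-line sketch of the updated idiom:

from collections.abc import AsyncGenerator, Callable

# Both generics subscript directly without typing's aliases.
TokenStream = AsyncGenerator[str, None]
OnToken = Callable[[str], None]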

View File

@@ -15,7 +15,7 @@ import json
 import os
 from abc import abstractmethod
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, Optional, Union
+from typing import TYPE_CHECKING, Any, Union

 from ..extras import logging
 from .data_utils import Role
@@ -40,7 +40,7 @@ class DatasetConverter:
     dataset_attr: "DatasetAttr"
     data_args: "DataArguments"

-    def _find_medias(self, medias: Union["MediaType", list["MediaType"], None]) -> Optional[list["MediaType"]]:
+    def _find_medias(self, medias: Union["MediaType", list["MediaType"], None]) -> list["MediaType"] | None:
         r"""Optionally concatenate media path to media dir when loading from local disk."""
         if medias is None:
             return None

Some files were not shown because too many files have changed in this diff.