35 Commits

Author SHA1 Message Date
Yaowei Zheng
95ac3f2373 [release] Bye 2025 (#9702) 2025-12-31 22:22:40 +08:00
Username_Full
000526908a [core deps] upgrade TRL to be between 0.18 and 0.24 (#9617)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-31 20:54:27 +08:00
fivehaitao
c8d7e85b3e [fix] Fix prediction metrics in scripts/vllm_infer.py to match Transformers (#9701)
Co-authored-by: xuht6 <xuht6@asiainfo.com>
2025-12-31 18:30:00 +08:00
浮梦
16735b9e35 [v1] Refactor kernel plugin (#9669)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-12-31 18:26:48 +08:00
Weize Liu
4e1d69579a [data] add DLR-Web dataset for supervised fine-tuning (#9696) 2025-12-30 20:50:38 +08:00
浮梦
1857fbdd6b [ci] add cuda workflow (#9682)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-29 20:03:00 +08:00
Kingsley
bb1ba31005 [misc] lint mca code (#9692) 2025-12-29 11:44:38 +08:00
Copilot
e97d0474fb [ci] Fix NPU device condition in docker workflow (#9688)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-28 20:04:59 +08:00
Yaowei Zheng
3f0c3dc84d [assets] fix installation (#9687)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-28 19:29:28 +08:00
Hertz
c107cc22d0 [model] support MiniMax-M1&M2 series (#9680)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-28 19:02:05 +08:00
Yaowei Zheng
7ef1fba34a [version] fix gradio (#9685) 2025-12-28 05:00:51 +08:00
Copilot
eceec8ab69 [deps] goodbye python 3.9 (#9677)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
2025-12-27 02:50:44 +08:00
Yaowei Zheng
b44f651e09 [ci] fix docker (#9678) 2025-12-27 02:43:46 +08:00
Yaowei Zheng
55590f5ece [misc] fix ci with uv (#9676) 2025-12-27 01:39:13 +08:00
Copilot
a1b1931b4a [breaking] migrate from setuptools to uv (#9673)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-26 22:47:23 +08:00
Xunpeng Xiao
3c17f2722c [model] Update ernie_vl to adapt new version (#9665) 2025-12-26 19:57:49 +08:00
Copilot
a882e2d5fc [assets] Add GitHub Copilot instructions for repository (#9675)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-26 17:32:48 +08:00
Yaowei Zheng
a754604c11 [misc] fix accelerator (#9661)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-25 02:11:04 +08:00
Xunpeng Xiao
6a2eafbae3 [feat] Models trained and inferred with Mxfp4 are dequantized by default (#9652)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-24 00:26:40 +08:00
Yaowei Zheng
84485406b7 [ci] disable pip cache for ci (#9654) 2025-12-23 18:37:40 +08:00
Kingsley
1c8a42d2f8 [v1&WIP] dataloader init (#9645) 2025-12-23 16:29:47 +08:00
thulyubh22
7901b2f32e [model] efficient tuning for gpt-oss (#9354)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-23 16:28:38 +08:00
Yaowei Zheng
1f1f5a7d1b [ci] remove docker cache (#9640) 2025-12-22 01:03:10 +08:00
Yaowei Zheng
6ef9854713 [misc] fix cache & pin transformers to 4.57.1 (#9638) 2025-12-22 00:20:55 +08:00
Hertz
4923f52a28 [model] support MiMo-V2-Flash model (#9637) 2025-12-21 14:38:18 +08:00
Yaowei Zheng
0894b4f37e [misc] lint (#9636) 2025-12-20 16:19:39 +08:00
ZIYI ZENG
b0d49e137f [misc] Support split eval_dataset when explict set "predict_with_generate" (#9604)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-20 01:46:00 +08:00
Xunpeng Xiao
ddd7dcc722 [data] Fix the video frame sampling issue #9620 (#9634) 2025-12-19 18:36:31 +08:00
浮梦
5204cd2bca [misc] add version check for moe (#9633) 2025-12-19 14:57:37 +08:00
Xunpeng Xiao
8c74dca76a [feat] Models trained and inferred with FP8 are dequantized by default (#9627) 2025-12-18 22:54:35 +08:00
xvxuopop
e8deda53a1 [example] add Qwen3 series examples (#9624)
Co-authored-by: UsernameFull <tohowtodoit@gmail.com>
2025-12-18 21:27:00 +08:00
mrhaoxx
a769fb94b9 [feat] support ktransformers for dpo (#9621)
Co-authored-by: poryfly <porykid@gmail.com>
2025-12-18 21:26:25 +08:00
mrhaoxx
964569751f [kt] refactor ktransformers integration (#9632) 2025-12-18 21:26:04 +08:00
Hertz
9fd4b094d4 [model] support VibeThinker models (#9616) 2025-12-16 21:50:46 +08:00
浮梦
18c21bce5a [test] add allreduce test on npu (#9619)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-12-16 21:33:30 +08:00
216 changed files with 4025 additions and 1992 deletions

.github/copilot-instructions.md (new file)

@@ -0,0 +1,180 @@
# GitHub Copilot Instructions for LLaMA Factory
## Project Overview
LLaMA Factory is an efficient fine-tuning framework for 100+ large language models (LLMs). It provides:
- Support for various models: LLaMA, LLaVA, Mistral, Qwen, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
- Multiple training methods: pre-training, supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO
- Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and QLoRA variants
- Advanced algorithms: GaLore, BAdam, APOLLO, Adam-mini, Muon, OFT, DoRA, etc.
- Web UI (LLaMA Board) and CLI interfaces
### Architecture Versions
LLaMA Factory has two parallel architectures that can be switched via the `USE_V1` environment variable:
**v0 (default)** - File hierarchy:
- `api`, `webui`, `chat`, `eval`, `train`, `data`, `model`, `hparams`, `extras`
**v1** - File hierarchy:
- `trainers`, `core`, `accelerator`, `plugins`, `config`, `utils`
Set `USE_V1=1` to enable v1 architecture.
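For example, a v1 run might look like the following sketch (the config path is illustrative; any `llamafactory-cli` invocation should respect `USE_V1`):
```bash
# Run the same CLI against the v1 architecture (config path is illustrative)
USE_V1=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```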
## Code Structure
### v0 Architecture (Default)
- `src/llamafactory/` - Main package directory
- `api/` - OpenAI-style API implementation
- `chat/` - Chat interface implementation
- `cli.py` - Command-line interface
- `data/` - Data processing and dataset handling
- `eval/` - Model evaluation utilities
- `extras/` - Additional utilities and helpers
- `hparams/` - Hyperparameter definitions
- `model/` - Model loading, patching, and utilities
- `train/` - Training pipeline implementation
- `webui/` - Gradio-based web interface
- `src/train.py` - Training entry script (delegates to `llamafactory.train.tuner`)
- `src/webui.py` - Web UI entry script (delegates to `llamafactory.webui.interface`)
- `src/api.py` - API server entry script (delegates to `llamafactory.api.app`)
- `tests/` - Test suite
- `examples/` - Example configurations for various training scenarios
- `data/` - Dataset definitions and examples
### v1 Architecture (USE_V1=1)
- `src/llamafactory/v1/` - Version 1 package directory
- `trainers/` - Training implementations
- `core/` - Core training utilities
- `accelerator/` - Acceleration and distributed training
- `plugins/` - Pluggable components (model, data, sampler, trainer)
- `config/` - Configuration management
- `utils/` - Utility functions
## Development Practices
### Code Style
- Follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
- Use ruff for linting and formatting
- Line length: 119 characters
- Indentation: 4 spaces
- Quote style: double quotes
- Use Google-style docstrings for documentation
### Import Organization
- Known first-party: `llamafactory`
- Known third-party: `accelerate`, `datasets`, `gradio`, `numpy`, `peft`, `torch`, `transformers`, `trl`
- Use 2 blank lines after imports
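These rules are enforced through ruff; a sketch of checking only the import ordering in isolation (`--select I` restricts ruff to its isort rule set):
```bash
# Check only import ordering (ruff isort rules) for the main package
uvx ruff check --select I src/llamafactory
```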
### Quality Checks
Before committing code, run:
```bash
make style # Auto-fix style issues
make quality # Check code quality
make test # Run test suite
```
Or use the combined command:
```bash
make commit # Run pre-commit hooks
```
### Testing
- Use pytest for testing
- Tests are located in `tests/` and `tests_v1/` directories
- Run tests with: `make test` (which runs `WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ tests_v1/`)
- Disable wandb during testing to avoid external dependencies
- **Note**: Training configurations require GPU machines, so training is typically not tested end-to-end. Use `make test` to validate file-level functionality.
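During development it is often faster to run a single module instead of the whole suite; a minimal sketch (the module path is illustrative):
```bash
# Run one test module with the same flags make test uses (path is illustrative)
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/data/test_template.py
```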
### Building
Build the package with:
```bash
pip3 install build && python3 -m build
```
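With the uv-based tooling introduced in this changeset (see the Makefile diff below), `make build` dispatches to uv when it is available; the equivalent direct command:
```bash
# Equivalent build via uv; `make build` selects this automatically when uv is on PATH
uv build
```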
### License
- All source files must include the Apache 2.0 license header
- Check license headers with: `make license`
## Common Patterns
### Configuration Files
- Training configurations are typically YAML or JSON files in `examples/` directory
- Hyperparameters are defined using dataclasses in `src/llamafactory/hparams/`
### Model Support
- New model support is added through model patches in `src/llamafactory/model/`
- Visual models use the visual utilities in `src/llamafactory/model/model_utils/visual.py`
- Quantization support is in `src/llamafactory/model/model_utils/quantization.py`
### Data Processing
- Dataset definitions are in `data/dataset_info.json`
- Data templates and processors are in `src/llamafactory/data/`
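Assuming the standard `dataset_info.json` layout, a registered entry can be inspected directly; `dlr_web` is the dataset added in this changeset (#9696):
```bash
# Print the registration record for the dlr_web dataset
python -c "import json; print(json.load(open('data/dataset_info.json'))['dlr_web'])"
```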
### Training
- Training pipelines are in `src/llamafactory/train/`
- Support for different training methods: SFT, DPO, PPO, RM, PT, KTO, ORPO
## Key Dependencies
- Python >= 3.9.0
- PyTorch and transformers for model handling
- datasets for data processing
- peft for parameter-efficient fine-tuning
- accelerate for distributed training
- gradio for web UI
- trl for reinforcement learning
- Optional: vllm/sglang for inference, flash-attention-2, unsloth, liger-kernel
## Entry Points
- **CLI Training**: `llamafactory-cli train --config examples/train_lora/llama3_lora_sft.yaml`
- **Web UI**: `llamafactory-cli webui` or `python src/webui.py`
- **API Server**: `llamafactory-cli api` or `python src/api.py`
- **Chat Interface**: `llamafactory-cli chat --model_name_or_path MODEL_PATH`
## Environment Setup
For development:
```bash
pip install -e ".[dev]"
```
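The CI workflows in this changeset perform the same editable install through uv; an equivalent sketch:
```bash
# uv-based equivalent of the editable dev install (mirrors the CI workflows)
uv venv
uv pip install -e ".[dev]"
```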
## Important Notes
- The project supports multiple backends: default PyTorch, vLLM, SGLang
- Megatron-core training is supported via mcore_adapter
- SwanLab and W&B are supported for experiment tracking
- Docker support is available with pre-built images
- Day-0/Day-1 support for latest cutting-edge models
- Multi-modal support for vision and audio understanding tasks
## Contribution Guidelines
1. Fork the repository
2. Create a development branch
3. Set up development environment with `pip install -e ".[dev]"`
4. Make changes following the style guide
5. Run quality checks: `make style && make quality`
6. Run tests: `make test`
7. Submit a pull request
## Common Commands
- `make style` - Format code
- `make quality` - Run linters
- `make test` - Run tests
- `make commit` - Install and run pre-commit hooks
- `make license` - Check license headers

.github/workflows/docker.yml

@@ -7,7 +7,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "docker/**"
       - ".github/workflows/*.yml"
   pull_request:
@@ -15,7 +15,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "docker/**"
       - ".github/workflows/*.yml"
   release:
@@ -29,16 +29,13 @@ jobs:
       matrix:
         include:
           - device: "cuda"
-            npu_type: ""
-          - device: "npu"
-            npu_type: "a2"
-          - device: "npu"
-            npu_type: "a3"
+          - device: "npu-a2"
+          - device: "npu-a3"
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.device }}-${{ matrix.npu_type }}
+      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.device }}
       cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
     environment:
@@ -55,16 +52,11 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.10"
       - name: Get llamafactory version
         id: version
         run: |
           if [ "${{ github.event_name }}" = "release" ]; then
-            echo "tag=$(python setup.py --version)" >> "$GITHUB_OUTPUT"
+            echo "tag=$(grep -oP 'VERSION = "\K[^"]+' src/llamafactory/extras/env.py)" >> "$GITHUB_OUTPUT"
           else
             echo "tag=latest" >> "$GITHUB_OUTPUT"
           fi
@@ -80,7 +72,7 @@ jobs:
           password: ${{ secrets.DOCKERHUB_TOKEN }}
       - name: Login to Quay
-        if: ${{ github.event_name != 'pull_request' && matrix.device == 'npu'}}
+        if: ${{ github.event_name != 'pull_request' && startsWith(matrix.device, 'npu') }}
         uses: docker/login-action@v3
         with:
           registry: quay.io
@@ -93,16 +85,12 @@ jobs:
         with:
           context: .
           file: ./docker/docker-cuda/Dockerfile
-          build-args: |
-            EXTRAS=metrics,deepspeed,liger-kernel
           push: ${{ github.event_name != 'pull_request' }}
           tags: |
             docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
       - name: Build and push Docker image (NPU-A2)
-        if: ${{ matrix.device == 'npu' && matrix.npu_type == 'a2' }}
+        if: ${{ matrix.device == 'npu-a2' }}
         uses: docker/build-push-action@v6
         with:
           context: .
@@ -112,11 +100,9 @@ jobs:
           tags: |
             docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}-npu-a2
             quay.io/ascend/llamafactory:${{ steps.version.outputs.tag }}-npu-a2
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
       - name: Build and push Docker image (NPU-A3)
-        if: ${{ matrix.device == 'npu' && matrix.npu_type == 'a3' }}
+        if: ${{ matrix.device == 'npu-a3' }}
         uses: docker/build-push-action@v6
         with:
           context: .
@@ -128,5 +114,3 @@ jobs:
           tags: |
             docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}-npu-a3
             quay.io/ascend/llamafactory:${{ steps.version.outputs.tag }}-npu-a3
-          cache-from: type=gha
-          cache-to: type=gha,mode=max

.github/workflows/publish.yml

@@ -23,10 +23,11 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
         with:
-          python-version: "3.9"
+          python-version: "3.11"
+          github-token: ${{ github.token }}
       - name: Build package
         run: |

.github/workflows/tests.yml

@@ -7,7 +7,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
   pull_request:
@@ -15,7 +15,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
@@ -25,29 +25,25 @@ jobs:
       fail-fast: false
       matrix:
         python:
-          - "3.9"
-          - "3.10"
           - "3.11"
           - "3.12"
+          - "3.13"
         os:
           - "ubuntu-latest"
           - "windows-latest"
           - "macos-latest"
         transformers:
-          - null
+          - ""
         include: # test backward compatibility
-          - python: "3.9"
-            os: "ubuntu-latest"
-            transformers: "4.49.0"
-          - python: "3.9"
+          - python: "3.11"
             os: "ubuntu-latest"
             transformers: "4.51.0"
-          - python: "3.9"
+          - python: "3.11"
             os: "ubuntu-latest"
             transformers: "4.53.0"
-        exclude: # exclude python 3.9 on macos
-          - python: "3.9"
-            os: "macos-latest"
+          - python: "3.11"
+            os: "ubuntu-latest"
+            transformers: "4.55.0"
     runs-on: ${{ matrix.os }}
@@ -63,22 +59,23 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
         with:
           python-version: ${{ matrix.python }}
-          cache: "pip"
-          cache-dependency-path: "**/requirements*.txt"
+          github-token: ${{ github.token }}
+          enable-cache: false
       - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
-          python -m pip install ".[torch,dev]"
+          uv venv
+          uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+          uv pip install -e ".[dev]"
       - name: Install transformers
         if: ${{ matrix.transformers }}
         run: |
-          python -m pip install "transformers==${{ matrix.transformers }}"
+          uv pip install "transformers==${{ matrix.transformers }}"
       - name: Cache files
         id: hf-hub-cache
@@ -90,18 +87,25 @@ jobs:
       - name: Check quality
         run: |
           make style && make quality
+        env:
+          UV_NO_SYNC: 1
       - name: Check license
         run: |
           make license
+        env:
+          UV_NO_SYNC: 1
       - name: Check build
         run: |
           make build
+        env:
+          UV_NO_SYNC: 1
       - name: Test with pytest
         run: |
           make test
         env:
+          UV_NO_SYNC: 1
           HF_HOME: ${{ runner.temp }}/huggingface
           HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.github/workflows/tests_cuda.yml (new file)

@@ -0,0 +1,88 @@
name: tests_cuda

on:
  workflow_dispatch:
  push:
    branches:
      - "main"
    paths:
      - "**/*.py"
      - "pyproject.toml"
      - "Makefile"
      - ".github/workflows/*.yml"
  pull_request:
    branches:
      - "main"
    paths:
      - "**/*.py"
      - "pyproject.toml"
      - "Makefile"
      - ".github/workflows/*.yml"

jobs:
  tests:
    strategy:
      fail-fast: false
      matrix:
        python:
          - "3.11"
        os:
          - "linux-x86_64-gpu-2"
    runs-on: ${{ matrix.os }}
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os }}-${{ matrix.python }}
      cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          python-version: ${{ matrix.python }}
          github-token: ${{ github.token }}
          enable-cache: false

      - name: Check GPU Status
        run: nvidia-smi

      - name: Install dependencies
        run: |
          uv venv
          uv pip install -e ".[dev]"

      - name: Cache HuggingFace models
        id: hf-hub-cache
        uses: actions/cache@v4
        with:
          path: ${{ runner.temp }}/huggingface
          key: hf-cache-${{ runner.os }}-${{ hashFiles('tests/version.txt') }}

      - name: Check quality
        run: |
          make style && make quality
        env:
          UV_NO_SYNC: 1

      - name: Check license
        run: |
          make license
        env:
          UV_NO_SYNC: 1

      - name: Check build
        run: |
          make build
        env:
          UV_NO_SYNC: 1

      - name: Test with pytest
        run: |
          make test
        env:
          UV_NO_SYNC: 1
          HF_HOME: ${{ runner.temp }}/huggingface
          HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.github/workflows/tests_npu.yml

@@ -7,7 +7,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
   pull_request:
@@ -15,7 +15,7 @@ on:
       - "main"
     paths:
       - "**/*.py"
-      - "requirements.txt"
+      - "pyproject.toml"
       - "Makefile"
       - ".github/workflows/*.yml"
@@ -48,10 +48,18 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+        with:
+          python-version: ${{ matrix.python }}
+          github-token: ${{ github.token }}
+          enable-cache: false
       - name: Install dependencies
         run: |
-          python -m pip install --upgrade pip
-          python -m pip install ".[torch-npu,dev]" torch-npu==${{matrix.pytorch_npu}}
+          uv venv
+          uv pip install torch-npu==${{matrix.pytorch_npu}}
+          uv pip install -e ".[dev]"
       - name: Install node
         run: |
@@ -70,18 +78,25 @@ jobs:
       - name: Check quality
         run: |
           make style && make quality
+        env:
+          UV_NO_SYNC: 1
       - name: Check license
         run: |
           make license
+        env:
+          UV_NO_SYNC: 1
       - name: Check build
         run: |
           make build
+        env:
+          UV_NO_SYNC: 1
       - name: Test with pytest
         run: |
           make test
         env:
+          UV_NO_SYNC: 1
           HF_HOME: /root/.cache/huggingface
           HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.gitignore

@@ -85,7 +85,7 @@ ipython_config.py
 # pyenv
 # For a library or package, you might want to ignore these files since the code is
 # intended to run in multiple environments; otherwise, check them in:
-# .python-version
+.python-version

 # pipenv
 # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.

MANIFEST.in

@@ -1 +1 @@
-include LICENSE requirements.txt
+include LICENSE

Makefile

@@ -1,24 +1,28 @@
 .PHONY: build commit license quality style test

-check_dirs := scripts src tests tests_v1 setup.py
+check_dirs := scripts src tests tests_v1
+RUN := $(shell command -v uv >/dev/null 2>&1 && echo "uv run" || echo "")
+BUILD := $(shell command -v uv >/dev/null 2>&1 && echo "uv build" || echo "python -m build")
+TOOL := $(shell command -v uv >/dev/null 2>&1 && echo "uvx" || echo "")

 build:
-	pip3 install build && python3 -m build
+	$(BUILD)

 commit:
-	pre-commit install
-	pre-commit run --all-files
+	$(TOOL) pre-commit install
+	$(TOOL) pre-commit run --all-files

 license:
-	python3 tests/check_license.py $(check_dirs)
+	$(RUN) python3 tests/check_license.py $(check_dirs)

 quality:
-	ruff check $(check_dirs)
-	ruff format --check $(check_dirs)
+	$(TOOL) ruff check $(check_dirs)
+	$(TOOL) ruff format --check $(check_dirs)

 style:
-	ruff check $(check_dirs) --fix
-	ruff format $(check_dirs)
+	$(TOOL) ruff check $(check_dirs) --fix
+	$(TOOL) ruff format $(check_dirs)

 test:
-	CUDA_VISIBLE_DEVICES= ASCEND_RT_VISIBLE_DEVICES=0 WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ tests_v1/
+	WANDB_DISABLED=true $(RUN) pytest -vv --import-mode=importlib tests/ tests_v1/
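The uv detection used by the new `RUN`/`BUILD`/`TOOL` variables can be exercised on its own; a minimal sketch of the same shell test:
```bash
# Prints "uv run" when uv is on PATH, otherwise an empty string (same logic as the Makefile)
command -v uv >/dev/null 2>&1 && echo "uv run" || echo ""
```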

README.md

@@ -278,27 +278,21 @@ Read technical notes:
 | Model | Model size | Template |
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
-| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
 | [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
-| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
-| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
+| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
+| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
 | [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
-| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
-| [Falcon-H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/34B | falcon_h1 |
+| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
 | [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
-| [GLM-4.1V](https://huggingface.co/zai-org) | 9B | glm4v |
 | [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
 | [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
-| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt |
+| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
-| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
-| [Granite 4](https://huggingface.co/ibm-granite) | 7B | granite4 |
+| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
-| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
 | [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
@@ -312,15 +306,14 @@ Read technical notes:
 | [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
-| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
+| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
 | [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
-| [Ministral(3)/Mistral-Nemo](https://huggingface.co/mistralai) | 3B/8B/12B/14B | ministral/ministral3 |
+| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
+| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
 | [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
-| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
 | [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
@@ -333,12 +326,9 @@ Read technical notes:
 | [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
-| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
-| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
+| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
 | [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |

 > [!NOTE]
@@ -444,6 +434,7 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
 - [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
 - [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
 - [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
+- [DLR-Web (en)](https://huggingface.co/datasets/Attention1115/DLR-Web)
 - [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
 - [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
 - [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
@@ -525,10 +516,12 @@ huggingface-cli login
 ```bash
 git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
-pip install -e ".[torch,metrics]" --no-build-isolation
+pip install -e ".[metrics]"
 ```

-Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, aqlm, vllm, sglang, galore, apollo, badam, adam-mini, qwen, minicpm_v, openmind, swanlab, dev
+Optional dependencies available: `metrics`, `deepspeed`. Install with: `pip install -e ".[metrics,deepspeed]"`
+Additional dependencies for specific features are available in `examples/requirements/`.

 #### Install from Docker Image
@@ -547,13 +540,7 @@ Please refer to [build docker](#build-docker) to build the image yourself.
 Create an isolated Python environment with [uv](https://github.com/astral-sh/uv):

 ```bash
-uv sync --extra torch --extra metrics --prerelease=allow
-```
-
-Run LLaMA-Factory in the isolated environment:
-
-```bash
-uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+uv run llamafactory-cli webui
 ```

 </details>
@@ -590,7 +577,7 @@ To enable FlashAttention-2 on the Windows platform, please use the script from [
 <details><summary>For Ascend NPU users</summary>

-To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
+To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher: `pip install -e . torch-npu==2.7.1`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:

 ```bash
 # replace the url according to your CANN version and devices
@@ -609,8 +596,8 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
 | Requirement  | Minimum | Recommend      |
 | ------------ | ------- | -------------- |
 | CANN         | 8.0.RC1 | 8.0.0.alpha002 |
-| torch        | 2.1.0   | 2.4.0          |
-| torch-npu    | 2.1.0   | 2.4.0.post2    |
+| torch        | 2.1.0   | 2.7.1          |
+| torch-npu    | 2.1.0   | 2.7.1          |
 | deepspeed    | 0.13.2  | 0.13.2         |
 | vllm-ascend  | -       | 0.7.3          |
@@ -652,7 +639,7 @@ cd transformers
 pip install .
 ```

-3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
+3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml).

 </details>
@@ -667,12 +654,12 @@ You can also use **[Easy Dataset](https://github.com/ConardLi/easy-dataset)**, *
 ### Quickstart

-Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Llama3-8B-Instruct model, respectively.
+Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Qwen3-4B-Instruct model, respectively.

 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```

 See [examples/README.md](examples/README.md) for advanced usage (including distributed training).
@@ -725,7 +712,6 @@ For CUDA users:
 ```bash
 docker build -f ./docker/docker-cuda/Dockerfile \
     --build-arg PIP_INDEX=https://pypi.org/simple \
-    --build-arg EXTRAS=metrics \
     -t llamafactory:latest .

 docker run -dit --ipc=host --gpus=all \
@@ -742,7 +728,6 @@ For Ascend NPU users:
 ```bash
 docker build -f ./docker/docker-npu/Dockerfile \
     --build-arg PIP_INDEX=https://pypi.org/simple \
-    --build-arg EXTRAS=torch-npu,metrics \
     -t llamafactory:latest .

 docker run -dit --ipc=host \
@@ -767,7 +752,6 @@ For AMD ROCm users:
 ```bash
 docker build -f ./docker/docker-rocm/Dockerfile \
     --build-arg PIP_INDEX=https://pypi.org/simple \
-    --build-arg EXTRAS=metrics \
     -t llamafactory:latest .

 docker run -dit --ipc=host \
@@ -798,7 +782,7 @@ When building the Docker image, use `-v ./hf_cache:/root/.cache/huggingface` arg
 ### Deploy with OpenAI-style API and vLLM

 ```bash
-API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
+API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
 ```

 > [!TIP]

README_zh.md

@@ -280,27 +280,21 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | 模型名 | 参数量 | Template |
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
-| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
 | [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
-| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
-| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
+| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
+| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
 | [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
-| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
-| [Falcon-H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/34B | falcon_h1 |
+| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
 | [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
-| [GLM-4.1V](https://huggingface.co/zai-org) | 9B | glm4v |
 | [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
 | [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
-| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt |
+| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
-| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
-| [Granite 4](https://huggingface.co/ibm-granite) | 7B | granite4 |
+| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
-| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
 | [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
@@ -314,15 +308,14 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
-| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
+| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
 | [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
-| [Ministral(3)/Mistral-Nemo](https://huggingface.co/mistralai) | 3B/8B/12B/14B | ministral/ministral3 |
+| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
+| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
 | [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
-| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
 | [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
@@ -335,12 +328,9 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
-| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
-| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
+| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
 | [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |

 > [!NOTE]
@@ -446,6 +436,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 - [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
 - [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
 - [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
+- [DLR-Web (en)](https://huggingface.co/datasets/Attention1115/DLR-Web)
 - [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
 - [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
 - [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
@@ -527,10 +518,12 @@ huggingface-cli login
 ```bash
 git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
-pip install -e ".[torch,metrics]" --no-build-isolation
+pip install -e ".[metrics]"
 ```

-可选的额外依赖项:torch、torch-npu、metrics、deepspeed、liger-kernel、bitsandbytes、hqq、eetq、gptq、aqlm、vllm、sglang、galore、apollo、badam、adam-mini、qwen、minicpm_v、openmind、swanlab、dev
+可选的额外依赖项:`metrics`、`deepspeed`。使用 `pip install -e ".[metrics,deepspeed]"` 安装。
+其他可选依赖项请参考 `examples/requirements/` 目录下的文件。

 #### 从镜像安装
@@ -549,13 +542,7 @@ docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
 使用 [uv](https://github.com/astral-sh/uv) 创建隔离的 Python 环境:

 ```bash
-uv sync --extra torch --extra metrics --prerelease=allow
-```
-
-在环境中运行 LLaMA-Factory:
-
-```bash
-uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+uv run llamafactory-cli webui
 ```

 </details>
@@ -592,7 +579,7 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl
 <details><summary>昇腾 NPU 用户指南</summary>

-在昇腾 NPU 设备上安装 LLaMA Factory 时,请升级 Python 到 3.10 及以上,并需要指定额外依赖项,使用 `pip install -e ".[torch-npu,metrics]"` 命令安装。此外,还需要安装 **[Ascend CANN Toolkit 与 Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**,安装方法请参考[安装教程](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html)或使用以下命令:
+在昇腾 NPU 设备上安装 LLaMA Factory 时,请升级 Python 到 3.10 及以上,并需要指定额外依赖项,使用 `pip install -e . torch-npu==2.7.1` 命令安装。此外,还需要安装 **[Ascend CANN Toolkit 与 Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**,安装方法请参考[安装教程](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html)或使用以下命令:

 ```bash
 # 请替换 URL 为 CANN 版本和设备型号对应的 URL
@@ -611,8 +598,8 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
 | 依赖项       | 至少    | 推荐           |
 | ------------ | ------- | -------------- |
 | CANN         | 8.0.RC1 | 8.0.0.alpha002 |
-| torch        | 2.1.0   | 2.4.0          |
-| torch-npu    | 2.1.0   | 2.4.0.post2    |
+| torch        | 2.1.0   | 2.7.1          |
+| torch-npu    | 2.1.0   | 2.7.1          |
 | deepspeed    | 0.13.2  | 0.13.2         |
 | vllm-ascend  | -       | 0.7.3          |
@@ -654,7 +641,7 @@ cd transformers
 pip install .
 ```

-3. 在训练参数中设置 `double_quantization: false`,可参考[示例](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml)。
+3. 在训练参数中设置 `double_quantization: false`,可参考[示例](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml)。

 </details>
@@ -669,12 +656,12 @@ pip install .
 ### 快速开始

-下面三行命令分别对 Llama3-8B-Instruct 模型进行 LoRA **微调**、**推理**和**合并**。
+下面三行命令分别对 Qwen3-4B-Instruct 模型进行 LoRA **微调**、**推理**和**合并**。

 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```

 高级用法请参考 [examples/README_zh.md](examples/README_zh.md)(包括多 GPU 微调)。
@@ -800,7 +787,7 @@ docker exec -it llamafactory bash
 ### 利用 vLLM 部署 OpenAI API

 ```bash
-API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
+API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
 ```

 > [!TIP]

data/dataset_info.json

@@ -471,6 +471,14 @@
   "ultrachat_de": {
     "hf_hub_url": "mayflowergmbh/ultra-chat_de"
   },
+  "dlr_web": {
+    "hf_hub_url": "Attention1115/DLR-Web",
+    "split": "full",
+    "columns": {
+      "prompt": "question",
+      "response": "response"
+    }
+  },
   "dpo_en_demo": {
     "file_name": "dpo_en_demo.json",
     "ranking": true,

docker/docker-cuda/Dockerfile

@@ -4,7 +4,6 @@ FROM ${BASE_IMAGE}
 # Installation arguments
 ARG PIP_INDEX=https://pypi.org/simple
-ARG EXTRAS=metrics
 ARG INSTALL_FLASHATTN=false
 ARG HTTP_PROXY=""
@@ -27,17 +26,13 @@ WORKDIR /app
 # Change pip source
 RUN pip config set global.index-url "${PIP_INDEX}" && \
     pip config set global.extra-index-url "${PIP_INDEX}" && \
-    pip install --no-cache-dir --upgrade pip packaging wheel setuptools
+    pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"

-# Install the requirements
-COPY requirements.txt /app
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy the rest of the application into the image
+# Copy the application into the image
 COPY . /app

 # Install LLaMA Factory
-RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
+RUN pip install --no-cache-dir --no-build-isolation -e ".[metrics,deepspeed]"

 # Rebuild flash attention
 RUN if [ "${INSTALL_FLASHATTN}" == "true" ]; then \


@@ -8,7 +8,7 @@ ENV PYPI_MIRROR=https://mirrors.aliyun.com/pypi/simple/
 ENV PYPI_TRUSTED_HOST=mirrors.aliyun.com
 ENV APT_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/

-RUN pip install --upgrade pip setuptools wheel --trusted-host ${PYPI_TRUSTED_HOST} --index-url ${PYPI_MIRROR}
+RUN pip install --upgrade pip setuptools wheel "hatchling>=1.18.0" editables --trusted-host ${PYPI_TRUSTED_HOST} --index-url ${PYPI_MIRROR}

 RUN pip uninstall -y torch torchvision torch-tensorrt \
     flash_attn transformer-engine \
@@ -56,14 +56,14 @@ ENV JAVA_HOME /usr/lib/jvm/java-21-openjdk-amd64
 # pip install LLaMA-Factory
 WORKDIR /app
-COPY requirements.txt /app/
-RUN pip install --no-cache-dir -r requirements.txt
+# Copy the application into the image
+COPY . /app
+
+# Install LLaMA Factory
+RUN pip install --no-cache-dir -e ".[metrics]" --no-build-isolation

 RUN pip install "git+https://github.com/alibaba/roll.git#subdirectory=mcore_adapter"

-COPY . /app/
-RUN pip install -e ".[metrics]" --no-build-isolation
-
 # Expose port 7860 for LLaMA Board
 ENV GRADIO_SERVER_PORT=7860
 EXPOSE 7860

docker/docker-cuda/docker-compose.yml

@@ -5,7 +5,6 @@ services:
       context: ../..
       args:
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: metrics
     container_name: llamafactory
     ports:
       - "7860:7860"

docker/docker-npu/Dockerfile

@@ -5,7 +5,6 @@ FROM ${BASE_IMAGE}
 # Installation arguments
 ARG PIP_INDEX=https://pypi.org/simple
-ARG EXTRAS=torch-npu,metrics
 ARG HTTP_PROXY=""
 ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cpu
@@ -28,21 +27,15 @@ WORKDIR /app
 # Change pip source
 RUN pip config set global.index-url "${PIP_INDEX}" && \
     pip config set global.extra-index-url "${PIP_INDEX}" && \
-    pip install --no-cache-dir --upgrade pip packaging wheel setuptools
+    pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
+
+# Copy the application into the image
+COPY . /app

 # Install torch-npu
 RUN pip uninstall -y torch torchvision torchaudio && \
-    pip install --no-cache-dir "torch==2.7.1" "torch-npu==2.7.1" "torchvision==0.22.1" --index-url "${PYTORCH_INDEX}"
-
-# Install the requirements
-COPY requirements.txt /app
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy the rest of the application into the image
-COPY . /app
-
-# Install LLaMA Factory
-RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
+    pip install --no-cache-dir "torch==2.7.1" "torch-npu==2.7.1" "torchvision==0.22.1" "torchaudio==2.7.1" --index-url "${PYTORCH_INDEX}" && \
+    pip install --no-cache-dir -e ".[metrics]" --no-build-isolation

 # Set up volumes
 # VOLUME [ "/root/.cache/huggingface", "/app/shared_data", "/app/output" ]

docker/docker-npu/docker-compose.yml

@@ -5,7 +5,6 @@ services:
       context: ../..
       args:
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: torch-npu,metrics
     container_name: llamafactory-a2
     image: llamafactory:npu-a2
     volumes:
@@ -36,7 +35,6 @@ services:
       args:
         BASE_IMAGE: quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: torch-npu,metrics
     container_name: llamafactory-a3
     image: llamafactory:npu-a3
     volumes:


@@ -4,7 +4,6 @@ FROM ${BASE_IMAGE}
 # Installation arguments
 ARG PIP_INDEX=https://pypi.org/simple
-ARG EXTRAS=metrics
 ARG INSTALL_FLASHATTN=false
 ARG HTTP_PROXY=""
 ARG PYTORCH_INDEX=https://download.pytorch.org/whl/rocm6.3
@@ -28,21 +27,14 @@ WORKDIR /app
 # Change pip source
 RUN pip config set global.index-url "${PIP_INDEX}" && \
     pip config set global.extra-index-url "${PIP_INDEX}" && \
-    pip install --no-cache-dir --upgrade pip packaging wheel setuptools
+    pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
-# Reinstall pytorch rocm
-RUN pip uninstall -y torch torchvision torchaudio && \
-    pip install --no-cache-dir --pre torch torchvision torchaudio --index-url "${PYTORCH_INDEX}"
-# Install the requirements
-COPY requirements.txt /app
-RUN pip install --no-cache-dir -r requirements.txt
-# Copy the rest of the application into the image
+# Copy the application into the image
 COPY . /app
-# Install LLaMA Factory
-RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
+# Reinstall pytorch rocm and install LLaMA Factory
+RUN pip uninstall -y torch torchvision torchaudio && \
+    pip install --no-cache-dir --no-build-isolation --pre -e ".[metrics,deepspeed]" --index-url "${PYTORCH_INDEX}"
 # Rebuild flash attention
 RUN if [ "${INSTALL_FLASHATTN}" == "true" ]; then \


@@ -5,7 +5,6 @@ services:
       context: ../..
       args:
         PIP_INDEX: https://pypi.org/simple
-        EXTRAS: metrics
     container_name: llamafactory
     ports:
       - "7860:7860"


@@ -18,19 +18,19 @@ By default, LLaMA-Factory uses all visible computing devices.
 Basic usage:
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 Advanced usage:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
+CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
     learning_rate=1e-5 \
     logging_steps=1
 ```
 ```bash
-bash examples/train_lora/llama3_lora_sft.sh
+bash examples/train_lora/qwen3_lora_sft.sh
 ```
 ## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
 #### (Continuous) Pre-Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
 ```
 #### Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
 ```
 #### DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
 ```
 #### Multimodal DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
 ```
 #### Reward Modeling
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
 ```
-#### PPO Training
-```bash
-llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
-```
 #### KTO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
 ```
 #### Preprocess Dataset
@@ -90,32 +84,26 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
 It is useful for large datasets; set `tokenized_path` in the config to load the preprocessed dataset.
 ```bash
-llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
+llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
 ```
-#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
-```bash
-llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
-```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 #### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
 ```
 #### Supervised Fine-Tuning with Ray on 4 GPUs
 ```bash
-USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
+USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
 ```
 ### QLoRA Fine-Tuning
@@ -123,13 +111,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
 #### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
 ```
 #### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
 ```
 #### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -155,14 +143,14 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
 #### Supervised Fine-Tuning on Single Node
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 ### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -170,13 +158,13 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
 To launch an elastic job with at most `MAX_RESTARTS` failure retries, run the following on at least `MIN_NNODES` and at most `MAX_NNODES` nodes. `RDZV_ID` should be set as a unique job ID shared by all nodes participating in the job. See also [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html).
 ```bash
-FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
 ```
 ### Merging LoRA Adapters and Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
 Note: DO NOT use a quantized model or `quantization_bit` when merging LoRA adapters.
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```
 #### Quantizing Model using AutoGPTQ
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
 ```
 ### Save Ollama modelfile
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
 ```
 ### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
 #### Evaluation using vLLM's Multi-GPU Inference
 ```
-python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
+python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
 python scripts/eval_bleu_rouge.py generated_predictions.jsonl
 ```
 #### Use CLI ChatBox
 ```bash
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Use Web UI ChatBox
 ```bash
-llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Launch OpenAI-style API
 ```bash
-llamafactory-cli api examples/inference/llama3_lora_sft.yaml
+llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
 ```
 ### Extras


@@ -18,19 +18,19 @@ LLaMA-Factory uses all visible computing devices by default.
 Basic usage:
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 Advanced usage:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
+CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
    learning_rate=1e-5 \
    logging_steps=1
 ```
 ```bash
-bash examples/train_lora/llama3_lora_sft.sh
+bash examples/train_lora/qwen3_lora_sft.sh
 ```
 ## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
 #### (Continuous) Pre-Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
 ```
 #### Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
 ```
 #### DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
 ```
 #### Multimodal DPO/ORPO/SimPO Training
 ```bash
-llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
+llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
 ```
 #### Reward Modeling
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
 ```
-#### PPO Training
-```bash
-llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
-```
 #### KTO Training
 ```bash
-llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
+llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
 ```
 #### Preprocess Dataset
@@ -90,20 +84,14 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
 Useful for large datasets; set `tokenized_path` in the config to load the preprocessed dataset.
 ```bash
-llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
+llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
 ```
-#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
-```bash
-llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
-```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
 ```
 ### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -111,19 +99,19 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
 To launch an elastic, fault-tolerant multi-node fine-tuning job, run the following command on every node. The elastic node count ranges over `MIN_NNODES:MAX_NNODES`, and each node may restart on failure at most `MAX_RESTARTS` times. `RDZV_ID` should be set to a unique job ID shared by all nodes participating in the job. For more information, see the official [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html) documentation.
 ```bash
-FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Using DeepSpeed ZeRO-3 to Evenly Distribute VRAM
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
 ```
 #### Fine-Tuning with Ray on 4 GPUs
 ```bash
-USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
+USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
 ```
 ### QLoRA Fine-Tuning
@@ -131,13 +119,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
 #### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
 ```
 #### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
 ```bash
-llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
+llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
 ```
 #### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -163,20 +151,20 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
 #### Supervised Fine-Tuning on a Single Node
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Supervised Fine-Tuning on Multiple Nodes
 ```bash
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
-FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
+FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
 ```
 #### Multimodal Supervised Fine-Tuning
 ```bash
-FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
+FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
 ```
 ### Merging LoRA Adapters and Model Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
 Note: DO NOT use a quantized model or the `quantization_bit` argument when merging LoRA adapters.
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
 ```
 #### Quantizing a Model with AutoGPTQ
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
 ```
 ### Saving the Ollama Modelfile
 ```bash
-llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
+llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
 ```
 ### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
 #### Evaluation Using vLLM's Multi-GPU Inference
 ```
-python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
+python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
 python scripts/eval_bleu_rouge.py generated_predictions.jsonl
 ```
 #### Using the CLI ChatBox
 ```bash
-llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Using the Web UI ChatBox
 ```bash
-llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
+llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
 ```
 #### Launching an OpenAI-style API
 ```bash
-llamafactory-cli api examples/inference/llama3_lora_sft.yaml
+llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
 ```
 ### Miscellaneous


@@ -1,16 +1,22 @@
+# Start FSDP2 fine-tuning
+# accelerate launch \
+#     --config_file examples/accelerate/fsdp2_config.yaml \
+#     src/train.py examples/ascend/qwen3_full_sft_fsdp2.yaml
+# Change `num_processes` in fsdp2_config.yaml to 16 in A3
 ### model
-model_name_or_path: Qwen/Qwen3-32B
+model_name_or_path: Qwen/Qwen3-8B
 trust_remote_code: true
 use_v1_kernels: true
+flash_attn: fa2
 ### method
 stage: sft
 do_train: true
 finetuning_type: full
-deepspeed: examples/deepspeed/ds_z2_autotp_config.json
 ### dataset
-dataset: identity,alpaca_en_demo
+dataset: alpaca_en_demo
 template: qwen3
 cutoff_len: 2048
 max_samples: 1000
@@ -19,28 +25,21 @@ preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen3-32b/full/sft_autotp
+output_dir: saves/Qwen3-8B/full/sft
 logging_steps: 1
 save_steps: 500
+max_steps: 500
 plot_loss: true
 overwrite_output_dir: true
 save_only_model: false
 report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
 ### train
-per_device_train_batch_size: 4
+per_device_train_batch_size: 8
 gradient_accumulation_steps: 1
-learning_rate: 1.0e-4
-num_train_epochs: 3.0
+learning_rate: 1.0e-5
 lr_scheduler_type: cosine
 warmup_ratio: 0.1
 bf16: true
-ddp_timeout: 180000000
+ddp_timeout: 1800
 resume_from_checkpoint: null
-### eval
-# eval_dataset: alpaca_en_demo
-# val_size: 0.1
-# per_device_eval_batch_size: 1
-# eval_strategy: steps
-# eval_steps: 500


@@ -0,0 +1,46 @@
# Start FSDP fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp_config.yaml \
# src/train.py examples/ascend/qwen3moe_full_sft_fsdp.yaml
# Change `num_processes` in fsdp_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-30B-A3B-Instruct-2507
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
disable_gradient_checkpointing: false
### dataset
dataset: alpaca_zh
template: qwen3
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-30B-A3B-Instruct-2507/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234


@@ -0,0 +1,48 @@
# Start FSDP2 fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp2_config.yaml \
# src/train.py examples/ascend/qwen3vlmoe_full_sft_fsdp2.yaml
# Change `num_processes` in fsdp2_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-VL-30B-A3B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
disable_gradient_checkpointing: false
### dataset
dataset: llava_1k_en,llava_1k_zh
template: qwen3_vl
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-VL-30B-A3B-Instruct/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234


@@ -1,5 +0,0 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true


@@ -1,4 +1,4 @@
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
-template: qwen2_vl
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
+template: qwen3_nothink
 infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
 trust_remote_code: true


@@ -1,4 +1,4 @@
-model_name_or_path: saves/llama3-8b/full/sft
-template: llama3
+model_name_or_path: saves/qwen3-4b/full/sft
+template: qwen3_nothink
 infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
 trust_remote_code: true


@@ -0,0 +1,5 @@
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true


@@ -1,4 +1,4 @@
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
-template: llama3
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
+template: qwen3_vl_nothink
 infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
 trust_remote_code: true


@@ -1,10 +1,10 @@
 ### model
-model_name_or_path: saves/llama3-8b/full/sft
-template: llama3
+model_name_or_path: saves/qwen3-4b/full/sft
+template: qwen3_nothink
 trust_remote_code: true
 ### export
-export_dir: output/llama3_full_sft
+export_dir: saves/qwen3_sft_merged
 export_size: 5
 export_device: cpu # choices: [cpu, auto]
 export_legacy_format: false


@@ -1,10 +1,10 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
-template: llama3
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
+template: qwen3_nothink
 trust_remote_code: true
 ### export
-export_dir: output/llama3_gptq
+export_dir: saves/qwen3_gptq
 export_quantization_bit: 4
 export_quantization_dataset: data/c4_demo.jsonl
 export_size: 5


@@ -1,13 +1,13 @@
 ### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
-adapter_name_or_path: saves/qwen2_5vl-7b/lora/sft
-template: qwen2_vl
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
+adapter_name_or_path: saves/qwen3-4b/lora/sft
+template: qwen3_nothink
 trust_remote_code: true
 ### export
-export_dir: output/qwen2_5vl_lora_sft
+export_dir: saves/qwen3_sft_merged
 export_size: 5
 export_device: cpu # choices: [cpu, auto]
 export_legacy_format: false


@@ -1,13 +1,13 @@
 ### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
-adapter_name_or_path: saves/llama3-8b/lora/sft
-template: llama3
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
+adapter_name_or_path: saves/qwen3-vl-4b/lora/sft
+template: qwen3_vl_nothink
 trust_remote_code: true
 ### export
-export_dir: output/llama3_lora_sft
+export_dir: saves/qwen3_vl_sft_merged
 export_size: 5
 export_device: cpu # choices: [cpu, auto]
 export_legacy_format: false


@@ -0,0 +1 @@
adam-mini


@@ -0,0 +1 @@
apollo-torch


@@ -0,0 +1 @@
aqlm[gpu]>=1.1.0


@@ -0,0 +1 @@
badam>=1.2.1


@@ -0,0 +1 @@
bitsandbytes>=0.39.0


@@ -0,0 +1 @@
eetq


@@ -0,0 +1,2 @@
transformer_engine[pytorch]>=2.0.0
accelerate>=1.10.0


@@ -0,0 +1,2 @@
torchao>=0.8.0
accelerate>=1.10.0


@@ -0,0 +1 @@
galore-torch


@@ -0,0 +1,2 @@
optimum>=1.24.0
gptqmodel>=2.0.0


@@ -0,0 +1 @@
hqq


@@ -0,0 +1 @@
liger-kernel>=0.5.5


@@ -0,0 +1,8 @@
soundfile
torchvision
torchaudio
vector_quantize_pytorch
vocos
msgpack
referencing
jsonschema_specifications


@@ -0,0 +1 @@
openmind


@@ -0,0 +1,2 @@
sglang[srt]>=0.4.5
transformers==4.51.1


@@ -0,0 +1 @@
swanlab


@@ -0,0 +1 @@
vllm>=0.4.3,<=0.11.0
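These one-line files appear to be per-extra requirement stubs introduced alongside the new packaging. Presumably each stub can be installed directly when the matching extra is wanted; the path below is an assumption, since the diff does not show the files' locations:

```bash
# hypothetical: install the pinned vLLM requirements from its stub file
pip install -r requirements/vllm.txt
```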


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -10,15 +10,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/full/sft
+output_dir: saves/qwen3-4b/full/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
 image_max_pixels: 262144
 video_max_pixels: 16384
 trust_remote_code: true
@@ -15,15 +15,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json
 ### dataset
 dataset: mllm_demo,identity,alpaca_en_demo
-template: qwen2_vl
+template: qwen3_vl_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen2_5vl-7b/full/sft
+output_dir: saves/qwen3-vl-4b/full/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,19 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true
### method
finetuning_type: lora
### dataset
task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5
### output
save_dir: saves/llama3-8b/lora/eval
### eval
batch_size: 4


@@ -1,43 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
reward_model: saves/llama3-8b/lora/reward
trust_remote_code: true
### method
stage: ppo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/ppo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### generate
max_new_tokens: 512
top_k: 0
top_p: 0.9


@@ -1,46 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500


@@ -1,49 +0,0 @@
# pip install git+https://github.com/hiyouga/transformers.git@llama4_train
### model
model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: llama4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama4-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -13,15 +13,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
 ### dataset
 dataset: dpo_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/dpo
+output_dir: saves/qwen3-4b/lora/dpo
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -12,15 +12,14 @@ pref_beta: 0.1
 ### dataset
 dataset: kto_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/kto
+output_dir: saves/qwen3-4b/lora/kto
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -13,12 +13,11 @@ lora_target: all
 dataset: c4_demo
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/pretrain
+output_dir: saves/qwen3-4b/lora/pretrain
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -11,15 +11,14 @@ lora_target: all
 ### dataset
 dataset: dpo_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/reward
+output_dir: saves/qwen3-4b/lora/reward
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -2,7 +2,7 @@
 set -x
-MODEL_PATH=meta-llama/Meta-Llama-3-8B-Instruct
+MODEL_PATH=Qwen/Qwen3-4B-Instruct-2507
 llamafactory-cli train \
     --model_name_or_path ${MODEL_PATH} \
@@ -13,13 +13,12 @@ llamafactory-cli train \
     --lora_rank 8 \
     --lora_target all \
     --dataset identity,alpaca_en_demo \
-    --template llama3 \
+    --template qwen3_nothink \
     --cutoff_len 2048 \
     --max_samples 1000 \
-    --overwrite_cache \
     --preprocessing_num_workers 16 \
     --dataloader_num_workers 4 \
-    --output_dir saves/llama3-8b/lora/sft \
+    --output_dir saves/qwen3-4b/lora/sft \
     --logging_steps 10 \
     --save_steps 500 \
     --plot_loss \


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: openai/gpt-oss-20b
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -11,15 +11,14 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: gpt
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/gpt-20b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -12,15 +12,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct # or use local absolute path
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507 # or use local absolute path
 trust_remote_code: true
 ### method
@@ -12,10 +12,9 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
 dataset_dir: REMOTE:llamafactory/demo_data # or use local absolute path
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
@@ -29,7 +28,7 @@ save_only_model: false
 report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
 ### ray
-ray_run_name: llama3_8b_sft_lora
+ray_run_name: qwen3_4b_sft_lora
 ray_storage_path: ./saves
 ray_num_workers: 4 # Number of GPUs to use.
 placement_strategy: PACK


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 trust_remote_code: true
 ### method
@@ -11,13 +11,12 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
-tokenized_path: saves/llama3-8b/dataset/sft
-### output
-output_dir: saves/llama3-8b/lora/sft
+tokenized_path: saves/qwen3-4b/dataset/sft
+### output (not used)
+output_dir: saves/qwen3-4b/lora/sft
 overwrite_output_dir: true

@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
 image_max_pixels: 262144
 video_max_pixels: 16384
 trust_remote_code: true
@@ -15,15 +15,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
 ### dataset
 dataset: rlhf_v
-template: qwen2_vl
+template: qwen3_vl_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen2_5vl-7b/lora/dpo
+output_dir: saves/qwen3-vl-4b/lora/dpo
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
+model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
 image_max_pixels: 262144
 video_max_pixels: 16384
 trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
 ### dataset
 dataset: mllm_demo,identity,alpaca_en_demo # video: mllm_video_demo
-template: qwen2_vl
+template: qwen3_vl_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/qwen2_5vl-7b/lora/sft
+output_dir: saves/qwen3-vl-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
 template: llama3
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4


@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
 template: llama3
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4


@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
 template: llama3
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 quantization_bit: 4
 quantization_method: bnb
 double_quantization: false
@@ -14,15 +14,14 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,5 +1,5 @@
 ### model
-model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
 quantization_bit: 4 # choices: [8 (bnb/hqq/eetq), 4 (bnb/hqq), 3 (hqq), 2 (hqq)]
 quantization_method: bnb # choices: [bnb, hqq, eetq]
 trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
 ### dataset
 dataset: identity,alpaca_en_demo
-template: llama3
+template: qwen3_nothink
 cutoff_len: 2048
 max_samples: 1000
-overwrite_cache: true
 preprocessing_num_workers: 16
 dataloader_num_workers: 4
 ### output
-output_dir: saves/llama3-8b/lora/sft
+output_dir: saves/qwen3-4b/lora/sft
 logging_steps: 10
 save_steps: 500
 plot_loss: true


@@ -1,42 +1,122 @@
 [build-system]
-requires = ["setuptools>=61.0"]
-build-backend = "setuptools.build_meta"
+requires = ["hatchling"]
+build-backend = "hatchling.build"
 [project]
 name = "llamafactory"
-requires-python = ">=3.9.0"
-dynamic = [
-    "version",
-    "dependencies",
-    "optional-dependencies",
-    "scripts",
-    "authors",
-    "description",
-    "readme",
-    "license",
-    "keywords",
-    "classifiers"
-]
+dynamic = ["version"]
+description = "Unified Efficient Fine-Tuning of 100+ LLMs"
+readme = "README.md"
+license = "Apache-2.0"
+requires-python = ">=3.11.0"
+authors = [
+    { name = "hiyouga", email = "hiyouga@buaa.edu.cn" }
+]
+keywords = [
+    "AI",
+    "LLM",
+    "GPT",
+    "ChatGPT",
+    "Llama",
+    "Transformer",
+    "DeepSeek",
+    "Pytorch"
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
+    "Intended Audience :: Education",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: Apache Software License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence"
+]
+dependencies = [
+    # core deps
+    "torch>=2.4.0",
+    "torchvision>=0.19.0",
+    "torchaudio>=2.4.0",
+    "transformers>=4.51.0,<=4.57.1,!=4.52.0,!=4.57.0",
+    "datasets>=2.16.0,<=4.0.0",
+    "accelerate>=1.3.0,<=1.11.0",
+    "peft>=0.14.0,<=0.17.1",
+    "trl>=0.18.0,<=0.24.0",
+    "torchdata>=0.10.0,<=0.11.0",
+    # gui
+    "gradio>=4.38.0,<=5.50.0",
+    "matplotlib>=3.7.0",
+    "tyro<0.9.0",
+    # ops
+    "einops",
+    "numpy",
+    "pandas",
+    "scipy",
+    # model and tokenizer
+    "sentencepiece",
+    "tiktoken",
+    "modelscope",
+    "hf-transfer",
+    "safetensors",
+    # python
+    "av",
+    "fire",
+    "omegaconf",
+    "packaging",
+    "protobuf",
+    "pyyaml",
+    "pydantic",
+    # api
+    "uvicorn",
+    "fastapi",
+    "sse-starlette"
+]
+[project.optional-dependencies]
+dev = ["pre-commit", "ruff", "pytest", "build"]
+metrics = ["nltk", "jieba", "rouge-chinese"]
+deepspeed = ["deepspeed>=0.10.0,<=0.16.9"]
+[project.scripts]
+llamafactory-cli = "llamafactory.cli:main"
+lmf = "llamafactory.cli:main"
+[project.urls]
+Homepage = "https://github.com/hiyouga/LLaMA-Factory"
+Repository = "https://github.com/hiyouga/LLaMA-Factory"
+[tool.hatch.build.targets.wheel]
+packages = ["src/llamafactory"]
+[tool.hatch.version]
+path = "src/llamafactory/extras/env.py"
+pattern = "VERSION = \"(?P<version>[^\"]+)\""
 [tool.ruff]
-target-version = "py39"
+target-version = "py311"
 line-length = 119
 indent-width = 4
 [tool.ruff.lint]
 ignore = [
     "C408", # collection
     "C901", # complex
     "E501", # line too long
     "E731", # lambda function
     "E741", # ambiguous var name
+    "UP007", # no upgrade union
+    "UP045", # no upgrade optional
     "D100", # no doc public module
     "D101", # no doc public class
     "D102", # no doc public method
     "D103", # no doc public function
     "D104", # no doc public package
     "D105", # no doc magic method
     "D107", # no doc __init__
 ]
 extend-select = [
     "C", # complexity
@@ -73,23 +153,3 @@ indent-style = "space"
 docstring-code-format = true
 skip-magic-trailing-comma = false
 line-ending = "auto"
-[tool.uv]
-conflicts = [
-    [
-        { extra = "torch-npu" },
-        { extra = "aqlm" },
-    ],
-    [
-        { extra = "torch-npu" },
-        { extra = "vllm" },
-    ],
-    [
-        { extra = "torch-npu" },
-        { extra = "sglang" },
-    ],
-    [
-        { extra = "vllm" },
-        { extra = "sglang" },
-    ],
-]
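Under the hatchling backend the version is read from `src/llamafactory/extras/env.py` through the pattern above, and the declared extras replace the deleted requirements.txt. An editable install then mirrors what the Dockerfiles in this diff do; a sketch (the uv flag assumes a recent uv release):

```bash
# editable install with the metrics extra, as in the Dockerfiles above
pip install -e ".[metrics]" --no-build-isolation

# or resolve the project with uv, selecting the same extra
uv sync --extra metrics
```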


@@ -1,38 +0,0 @@
# core deps
transformers>=4.49.0,<=4.56.2,!=4.52.0; python_version < '3.10'
transformers>=4.49.0,<=4.57.3,!=4.52.0,!=4.57.0; python_version >= '3.10'
datasets>=2.16.0,<=4.0.0
accelerate>=1.3.0,<=1.11.0
peft>=0.14.0,<=0.17.1
trl>=0.8.6,<=0.9.6
# gui
gradio>=4.38.0,<=5.45.0
matplotlib>=3.7.0
tyro<0.9.0
# ops
einops
numpy<2.0.0
pandas>=2.0.0
scipy
# model and tokenizer
sentencepiece
tiktoken
modelscope>=1.14.0
hf-transfer
safetensors<=0.5.3
# python
fire
omegaconf
packaging
protobuf
pyyaml
pydantic<=2.10.6
# api
uvicorn
fastapi
sse-starlette
# media
av
librosa
# yanked
propcache!=0.4.0


@@ -16,7 +16,6 @@
 # limitations under the License.
 import os
-from typing import Optional
 import fire
 import torch
@@ -34,7 +33,7 @@ def convert_mca_to_hf(
     output_path: str = "./output",
     bf16: bool = False,
     fp16: bool = False,
-    convert_model_max_length: Optional[int] = None,
+    convert_model_max_length: int | None = None,
 ):
     """Convert megatron checkpoint to HuggingFace format.
@@ -67,11 +66,11 @@ def convert(
     output_path: str = "./output",
     bf16: bool = False,
     fp16: bool = False,
-    convert_model_max_length: Optional[int] = None,
+    convert_model_max_length: int | None = None,
     tensor_model_parallel_size: int = 1,
     pipeline_model_parallel_size: int = 1,
     expert_model_parallel_size: int = 1,
-    virtual_pipeline_model_parallel_size: Optional[int] = None,
+    virtual_pipeline_model_parallel_size: int | None = None,
 ):
     """Convert checkpoint between MCA and HuggingFace formats.


@@ -14,7 +14,7 @@
 import json
 from dataclasses import dataclass
-from typing import Any, Literal, Optional
+from typing import Any, Literal
 import fire
 import torch
@@ -61,7 +61,7 @@ def calculate_ppl(
     dataset_dir: str = "data",
     template: str = "default",
     cutoff_len: int = 2048,
-    max_samples: Optional[int] = None,
+    max_samples: int | None = None,
     train_on_prompt: bool = False,
 ):
     r"""Calculate the ppl on the dataset of the pre-trained models.


@@ -14,10 +14,12 @@
import gc import gc
import json import json
from typing import Optional import time
import av import av
import fire import fire
from datasets import load_dataset
from eval_bleu_rouge import compute_metrics
from tqdm import tqdm from tqdm import tqdm
from transformers import Seq2SeqTrainingArguments from transformers import Seq2SeqTrainingArguments
@@ -49,18 +51,19 @@ def vllm_infer(
dataset_dir: str = "data", dataset_dir: str = "data",
template: str = "default", template: str = "default",
cutoff_len: int = 2048, cutoff_len: int = 2048,
max_samples: Optional[int] = None, max_samples: int | None = None,
vllm_config: str = "{}", vllm_config: str = "{}",
save_name: str = "generated_predictions.jsonl", save_name: str = "generated_predictions.jsonl",
matrix_save_name: str = None,
temperature: float = 0.95, temperature: float = 0.95,
top_p: float = 0.7, top_p: float = 0.7,
top_k: int = 50, top_k: int = 50,
max_new_tokens: int = 1024, max_new_tokens: int = 1024,
repetition_penalty: float = 1.0, repetition_penalty: float = 1.0,
skip_special_tokens: bool = True, skip_special_tokens: bool = True,
default_system: Optional[str] = None, default_system: str | None = None,
enable_thinking: bool = True, enable_thinking: bool = True,
seed: Optional[int] = None, seed: int | None = None,
pipeline_parallel_size: int = 1, pipeline_parallel_size: int = 1,
image_max_pixels: int = 768 * 768, image_max_pixels: int = 768 * 768,
image_min_pixels: int = 32 * 32, image_min_pixels: int = 32 * 32,
@@ -118,6 +121,7 @@ def vllm_infer(
if isinstance(model_args.vllm_config, dict): if isinstance(model_args.vllm_config, dict):
engine_args.update(model_args.vllm_config) engine_args.update(model_args.vllm_config)
model_preparation_start_time = time.time()
llm = LLM(**engine_args) llm = LLM(**engine_args)
# load datasets # load datasets
@@ -143,6 +147,7 @@ def vllm_infer(
all_prompts, all_preds, all_labels = [], [], [] all_prompts, all_preds, all_labels = [], [], []
need_video_kwargs = _need_video_kwargs(template) need_video_kwargs = _need_video_kwargs(template)
model_predict_start_time = time.time()
# Add batch process to avoid the issue of too many files opened # Add batch process to avoid the issue of too many files opened
for i in tqdm(range(0, len(train_dataset), batch_size), desc="Processing batched inference"): for i in tqdm(range(0, len(train_dataset), batch_size), desc="Processing batched inference"):
vllm_inputs, prompts, labels = [], [], [] vllm_inputs, prompts, labels = [], [], []
@@ -219,6 +224,7 @@ def vllm_infer(
all_labels.extend(labels) all_labels.extend(labels)
gc.collect() gc.collect()
model_predict_end_time = time.time()
# Write all results at once outside the loop # Write all results at once outside the loop
with open(save_name, "w", encoding="utf-8") as f: with open(save_name, "w", encoding="utf-8") as f:
for text, pred, label in zip(all_prompts, all_preds, all_labels): for text, pred, label in zip(all_prompts, all_preds, all_labels):
@@ -228,6 +234,49 @@ def vllm_infer(
print(f"{len(all_prompts)} total generated results have been saved at {save_name}.") print(f"{len(all_prompts)} total generated results have been saved at {save_name}.")
print("*" * 70) print("*" * 70)
# Write all matrix results when matrix_save_name is not None,
# The result matrix is referencing src.llamafactory.train.sft.workflow.run_sft # 127~132
# trainer.save_metrics("predict", predict_results.metrics)
#
# {
# "predict_bleu-4": 4.349975,
# "predict_model_preparation_time": 0.0128,
# "predict_rouge-1": 21.873359375,
# "predict_rouge-2": 4.144340625,
# "predict_rouge-l": 10.83949375,
# "predict_runtime": 131.664,
# "predict_samples_per_second": 0.076,
# "predict_steps_per_second": 0.008
# }
#
if matrix_save_name is not None:
predict_time = model_predict_end_time - model_predict_start_time
preparation_time = model_predict_start_time - model_preparation_start_time
start_time = time.time()
dataset = load_dataset("json", data_files=save_name, split="train")
dataset = dataset.map(compute_metrics, num_proc=8, remove_columns=dataset.column_names)
score_dict = dataset.to_dict()
average_score = {}
for task, scores in sorted(score_dict.items(), key=lambda x: x[0]):
score = sum(scores) / len(scores) if scores else 0.0
print(f"predict_{task}: {score:.4f}")
average_score["predict_" + task] = score
average_score["predict_model_preparation_time"] = preparation_time
average_score["predict_runtime"] = predict_time
num_steps = len(range(0, len(train_dataset), batch_size))
average_score["predict_samples_per_second"] = len(dataset) / predict_time if predict_time > 0 else 0.0
average_score["predict_steps_per_second"] = num_steps / predict_time if predict_time > 0 else 0.0
with open(matrix_save_name, "w", encoding="utf-8") as f:
json.dump(average_score, f, indent=4)
print("*" * 70)
print(f"\nDone in {time.time() - start_time:.3f}s.\nScore file saved to {matrix_save_name}.")
print("*" * 70)
if __name__ == "__main__": if __name__ == "__main__":
fire.Fire(vllm_infer) fire.Fire(vllm_infer)
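The `eval_bleu_rouge` module imported in this file is not shown in the diff. As a minimal sketch of what its per-record `compute_metrics` could look like — assuming the `prompt`/`predict`/`label` keys of the saved JSONL and the jieba/nltk/rouge-chinese stack from the project's `metrics` extra — each mapped sample returns one dict of scores, which `dataset.to_dict()` then exposes as per-metric columns for the averaging loop above:

import jieba  # Chinese-aware word segmentation
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_chinese import Rouge


def compute_metrics(sample: dict) -> dict:
    """Score one {"prompt", "predict", "label"} record with ROUGE and BLEU-4."""
    hypothesis = list(jieba.cut(sample["predict"]))
    reference = list(jieba.cut(sample["label"]))

    if len(" ".join(hypothesis).split()) == 0 or len(" ".join(reference).split()) == 0:
        # Degenerate outputs get zero F-scores instead of crashing Rouge.
        result = {"rouge-1": {"f": 0.0}, "rouge-2": {"f": 0.0}, "rouge-l": {"f": 0.0}}
    else:
        result = Rouge().get_scores(" ".join(hypothesis), " ".join(reference))[0]

    scores = {key: round(value["f"] * 100, 4) for key, value in result.items()}
    scores["bleu-4"] = round(
        sentence_bleu([reference], hypothesis, smoothing_function=SmoothingFunction().method3) * 100, 4
    )
    return scores

Under these assumptions, `dataset.map(compute_metrics, remove_columns=dataset.column_names)` leaves only numeric columns such as `bleu-4` and `rouge-l`, which is exactly the shape the `sum(scores) / len(scores)` averaging expects.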

setup.py (116 lines deleted)

View File

@@ -1,116 +0,0 @@
-# Copyright 2025 the LlamaFactory team.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import re
-
-from setuptools import find_packages, setup
-
-
-def get_version() -> str:
-    with open(os.path.join("src", "llamafactory", "extras", "env.py"), encoding="utf-8") as f:
-        file_content = f.read()
-        pattern = r"{}\W*=\W*\"([^\"]+)\"".format("VERSION")
-        (version,) = re.findall(pattern, file_content)
-        return version
-
-
-def get_requires() -> list[str]:
-    with open("requirements.txt", encoding="utf-8") as f:
-        file_content = f.read()
-        lines = [line.strip() for line in file_content.strip().split("\n") if not line.startswith("#")]
-        return lines
-
-
-def get_console_scripts() -> list[str]:
-    console_scripts = ["llamafactory-cli = llamafactory.cli:main"]
-    if os.getenv("ENABLE_SHORT_CONSOLE", "1").lower() in ["true", "y", "1"]:
-        console_scripts.append("lmf = llamafactory.cli:main")
-
-    return console_scripts
-
-
-extra_require = {
-    "torch": ["torch>=2.0.0", "torchvision>=0.15.0"],
-    "torch-npu": ["torch==2.7.1", "torch-npu==2.7.1", "torchvision==0.22.1", "decorator"],
-    "metrics": ["nltk", "jieba", "rouge-chinese"],
-    "deepspeed": ["deepspeed>=0.10.0,<=0.16.9"],
-    "liger-kernel": ["liger-kernel>=0.5.5"],
-    "bitsandbytes": ["bitsandbytes>=0.39.0"],
-    "hqq": ["hqq"],
-    "eetq": ["eetq"],
-    "gptq": ["optimum>=1.24.0", "gptqmodel>=2.0.0"],
-    "aqlm": ["aqlm[gpu]>=1.1.0"],
-    "vllm": ["vllm>=0.4.3,<=0.11.0"],
-    "sglang": ["sglang[srt]>=0.4.5", "transformers==4.51.1"],
-    "galore": ["galore-torch"],
-    "apollo": ["apollo-torch"],
-    "badam": ["badam>=1.2.1"],
-    "adam-mini": ["adam-mini"],
-    "minicpm_v": [
-        "soundfile",
-        "torchvision",
-        "torchaudio",
-        "vector_quantize_pytorch",
-        "vocos",
-        "msgpack",
-        "referencing",
-        "jsonschema_specifications",
-    ],
-    "openmind": ["openmind"],
-    "swanlab": ["swanlab"],
-    "fp8": ["torchao>=0.8.0", "accelerate>=1.10.0"],
-    "fp8-te": ["transformer_engine[pytorch]>=2.0.0", "accelerate>=1.10.0"],
-    "fp8-all": ["torchao>=0.8.0", "transformer_engine[pytorch]>=2.0.0", "accelerate>=1.10.0"],
-    "dev": ["pre-commit", "ruff", "pytest", "build"],
-}
-
-
-def main():
-    setup(
-        name="llamafactory",
-        version=get_version(),
-        author="hiyouga",
-        author_email="hiyouga@buaa.edu.cn",
-        description="Unified Efficient Fine-Tuning of 100+ LLMs",
-        long_description=open("README.md", encoding="utf-8").read(),
-        long_description_content_type="text/markdown",
-        keywords=["AI", "LLM", "GPT", "ChatGPT", "Llama", "Transformer", "DeepSeek", "Pytorch"],
-        license="Apache 2.0 License",
-        url="https://github.com/hiyouga/LLaMA-Factory",
-        package_dir={"": "src"},
-        packages=find_packages("src"),
-        python_requires=">=3.9.0",
-        install_requires=get_requires(),
-        extras_require=extra_require,
-        entry_points={"console_scripts": get_console_scripts()},
-        classifiers=[
-            "Development Status :: 4 - Beta",
-            "Intended Audience :: Developers",
-            "Intended Audience :: Education",
-            "Intended Audience :: Science/Research",
-            "License :: OSI Approved :: Apache Software License",
-            "Operating System :: OS Independent",
-            "Programming Language :: Python :: 3",
-            "Programming Language :: Python :: 3.9",
-            "Programming Language :: Python :: 3.10",
-            "Programming Language :: Python :: 3.11",
-            "Programming Language :: Python :: 3.12",
-            "Topic :: Scientific/Engineering :: Artificial Intelligence",
-        ],
-    )
-
-
-if __name__ == "__main__":
-    main()
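This deletion is the visible half of the `[breaking] migrate from setuptools to uv` commit: packaging metadata moves out of setup.py into a pyproject-based build. One side effect is that the regex-driven `get_version()` above has no direct successor; a minimal sketch of the usual replacement — assuming the package is installed, e.g. via `uv pip install -e .`, so distribution metadata exists — reads the version at runtime instead:

# Hedged sketch, not the project's actual code: resolve the installed
# version from distribution metadata instead of regex-parsing env.py.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("llamafactory"))
except PackageNotFoundError:
    print("llamafactory is not installed in this environment")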

View File

@@ -16,7 +16,7 @@ import asyncio
 import os
 from contextlib import asynccontextmanager
 from functools import partial
-from typing import Annotated, Optional
+from typing import Annotated

 from ..chat import ChatModel
 from ..extras.constants import EngineName
@@ -79,7 +79,7 @@ def create_app(chat_model: "ChatModel") -> "FastAPI":
     api_key = os.getenv("API_KEY")
     security = HTTPBearer(auto_error=False)

-    async def verify_api_key(auth: Annotated[Optional[HTTPAuthorizationCredentials], Depends(security)]):
+    async def verify_api_key(auth: Annotated[HTTPAuthorizationCredentials | None, Depends(security)]):
         if api_key and (auth is None or auth.credentials != api_key):
             raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key.")
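For context, `verify_api_key` is wired in as a FastAPI dependency. A self-contained sketch of the same bearer-token pattern follows; the route and the hard-coded key below are illustrative assumptions, not code from this diff (the real app reads `os.getenv("API_KEY")`):

from typing import Annotated

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
security = HTTPBearer(auto_error=False)  # auto_error=False so we can raise our own 401
API_KEY = "change-me"  # illustrative only


async def verify_api_key(auth: Annotated[HTTPAuthorizationCredentials | None, Depends(security)]):
    if API_KEY and (auth is None or auth.credentials != API_KEY):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key.")


@app.get("/v1/models", dependencies=[Depends(verify_api_key)])
async def list_models():
    return {"object": "list", "data": []}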

View File

@@ -14,10 +14,9 @@
 import time
 from enum import Enum, unique
-from typing import Any, Optional, Union
+from typing import Any, Literal

 from pydantic import BaseModel, Field
-from typing_extensions import Literal


 @unique
@@ -61,7 +60,7 @@ class FunctionDefinition(BaseModel):

 class FunctionAvailable(BaseModel):
     type: Literal["function", "code_interpreter"] = "function"
-    function: Optional[FunctionDefinition] = None
+    function: FunctionDefinition | None = None


 class FunctionCall(BaseModel):
@@ -77,35 +76,35 @@ class URL(BaseModel):

 class MultimodalInputItem(BaseModel):
     type: Literal["text", "image_url", "video_url", "audio_url"]
-    text: Optional[str] = None
-    image_url: Optional[URL] = None
-    video_url: Optional[URL] = None
-    audio_url: Optional[URL] = None
+    text: str | None = None
+    image_url: URL | None = None
+    video_url: URL | None = None
+    audio_url: URL | None = None


 class ChatMessage(BaseModel):
     role: Role
-    content: Optional[Union[str, list[MultimodalInputItem]]] = None
-    tool_calls: Optional[list[FunctionCall]] = None
+    content: str | list[MultimodalInputItem] | None = None
+    tool_calls: list[FunctionCall] | None = None


 class ChatCompletionMessage(BaseModel):
-    role: Optional[Role] = None
-    content: Optional[str] = None
-    tool_calls: Optional[list[FunctionCall]] = None
+    role: Role | None = None
+    content: str | None = None
+    tool_calls: list[FunctionCall] | None = None


 class ChatCompletionRequest(BaseModel):
     model: str
     messages: list[ChatMessage]
-    tools: Optional[list[FunctionAvailable]] = None
-    do_sample: Optional[bool] = None
-    temperature: Optional[float] = None
-    top_p: Optional[float] = None
+    tools: list[FunctionAvailable] | None = None
+    do_sample: bool | None = None
+    temperature: float | None = None
+    top_p: float | None = None
     n: int = 1
-    presence_penalty: Optional[float] = None
-    max_tokens: Optional[int] = None
-    stop: Optional[Union[str, list[str]]] = None
+    presence_penalty: float | None = None
+    max_tokens: int | None = None
+    stop: str | list[str] | None = None
     stream: bool = False
@@ -118,7 +117,7 @@ class ChatCompletionResponseChoice(BaseModel):

 class ChatCompletionStreamResponseChoice(BaseModel):
     index: int
     delta: ChatCompletionMessage
-    finish_reason: Optional[Finish] = None
+    finish_reason: Finish | None = None


 class ChatCompletionResponseUsage(BaseModel):
@@ -147,7 +146,7 @@ class ChatCompletionStreamResponse(BaseModel):

 class ScoreEvaluationRequest(BaseModel):
     model: str
     messages: list[str]
-    max_length: Optional[int] = None
+    max_length: int | None = None


 class ScoreEvaluationResponse(BaseModel):
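These hunks are the mechanical half of the `goodbye python 3.9` change: every `Optional[X]` becomes the PEP 604 union `X | None`, which is only constructible at runtime from Python 3.10 on — and that matters here because Pydantic evaluates model annotations at runtime. A minimal illustration of the equivalence:

# Minimal illustration: the PEP 604 union is the same type as Optional.
from typing import Optional, Union

assert (int | None) == Optional[int] == Union[int, None]  # Python >= 3.10


def clip(value: float, limit: float | None = None) -> float:
    # `float | None` is a drop-in replacement for Optional[float].
    return value if limit is None else min(value, limit)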

View File

@@ -14,9 +14,9 @@
 import asyncio
 import os
-from collections.abc import AsyncGenerator
+from collections.abc import AsyncGenerator, Callable
 from threading import Thread
-from typing import TYPE_CHECKING, Any, Callable, Optional, Union
+from typing import TYPE_CHECKING, Any, Optional, Union

 import torch
 from transformers import GenerationConfig, TextIteratorStreamer
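The same sweep moves `Callable` from `typing` to `collections.abc`, whose ABCs have been directly subscriptable since Python 3.9 (the `typing` aliases are soft-deprecated). A two-line sketch of the updated idiom:

from collections.abc import AsyncGenerator, Callable

# Both generics subscript directly without typing's aliases.
TokenStream = AsyncGenerator[str, None]
OnToken = Callable[[str], None]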

View File

@@ -15,7 +15,7 @@ import json
 import os
 from abc import abstractmethod
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, Optional, Union
+from typing import TYPE_CHECKING, Any, Union

 from ..extras import logging
 from .data_utils import Role
@@ -40,7 +40,7 @@ class DatasetConverter:
     dataset_attr: "DatasetAttr"
     data_args: "DataArguments"

-    def _find_medias(self, medias: Union["MediaType", list["MediaType"], None]) -> Optional[list["MediaType"]]:
+    def _find_medias(self, medias: Union["MediaType", list["MediaType"], None]) -> list["MediaType"] | None:
         r"""Optionally concatenate media path to media dir when loading from local disk."""
         if medias is None:
             return None

Some files were not shown because too many files have changed in this diff.