65 Commits

Author SHA1 Message Date
Yaowei Zheng
95ac3f2373 [release] Bye 2025 (#9702) 2025-12-31 22:22:40 +08:00
Username_Full
000526908a [core deps] upgrade TRL to be between 0.18 and 0.24 (#9617)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-31 20:54:27 +08:00
fivehaitao
c8d7e85b3e [fix] Fix prediction metrics in scripts/vllm_infer.py to match Transformers (#9701)
Co-authored-by: xuht6 <xuht6@asiainfo.com>
2025-12-31 18:30:00 +08:00
浮梦
16735b9e35 [v1] Refactor kernel plugin (#9669)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-12-31 18:26:48 +08:00
Weize Liu
4e1d69579a [data] add DLR-Web dataset for supervised fine-tuning (#9696) 2025-12-30 20:50:38 +08:00
浮梦
1857fbdd6b [ci] add cuda workflow (#9682)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-29 20:03:00 +08:00
Kingsley
bb1ba31005 [misc] lint mca code (#9692) 2025-12-29 11:44:38 +08:00
Copilot
e97d0474fb [ci] Fix NPU device condition in docker workflow (#9688)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-28 20:04:59 +08:00
Yaowei Zheng
3f0c3dc84d [assets] fix installation (#9687)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-28 19:29:28 +08:00
Hertz
c107cc22d0 [model] support MiniMax-M1&M2 series (#9680)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-28 19:02:05 +08:00
Yaowei Zheng
7ef1fba34a [version] fix gradio (#9685) 2025-12-28 05:00:51 +08:00
Copilot
eceec8ab69 [deps] goodbye python 3.9 (#9677)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
2025-12-27 02:50:44 +08:00
Yaowei Zheng
b44f651e09 [ci] fix docker (#9678) 2025-12-27 02:43:46 +08:00
Yaowei Zheng
55590f5ece [misc] fix ci with uv (#9676) 2025-12-27 01:39:13 +08:00
Copilot
a1b1931b4a [breaking] migrate from setuptools to uv (#9673)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-26 22:47:23 +08:00
Xunpeng Xiao
3c17f2722c [model] Update ernie_vl to adapt new version (#9665) 2025-12-26 19:57:49 +08:00
Copilot
a882e2d5fc [assets] Add GitHub Copilot instructions for repository (#9675)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2025-12-26 17:32:48 +08:00
Yaowei Zheng
a754604c11 [misc] fix accelerator (#9661)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-25 02:11:04 +08:00
Xunpeng Xiao
6a2eafbae3 [feat] Models trained and inferred with Mxfp4 are dequantized by default (#9652)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-24 00:26:40 +08:00
Yaowei Zheng
84485406b7 [ci] disable pip cache for ci (#9654) 2025-12-23 18:37:40 +08:00
Kingsley
1c8a42d2f8 [v1&WIP] dataloader init (#9645) 2025-12-23 16:29:47 +08:00
thulyubh22
7901b2f32e [model] efficient tuning for gpt-oss (#9354)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-23 16:28:38 +08:00
Yaowei Zheng
1f1f5a7d1b [ci] remove docker cache (#9640) 2025-12-22 01:03:10 +08:00
Yaowei Zheng
6ef9854713 [misc] fix cache & pin transformers to 4.57.1 (#9638) 2025-12-22 00:20:55 +08:00
Hertz
4923f52a28 [model] support MiMo-V2-Flash model (#9637) 2025-12-21 14:38:18 +08:00
Yaowei Zheng
0894b4f37e [misc] lint (#9636) 2025-12-20 16:19:39 +08:00
ZIYI ZENG
b0d49e137f [misc] Support split eval_dataset when explict set "predict_with_generate" (#9604)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-20 01:46:00 +08:00
Xunpeng Xiao
ddd7dcc722 [data] Fix the video frame sampling issue #9620 (#9634) 2025-12-19 18:36:31 +08:00
浮梦
5204cd2bca [misc] add version check for moe (#9633) 2025-12-19 14:57:37 +08:00
Xunpeng Xiao
8c74dca76a [feat] Models trained and inferred with FP8 are dequantized by default (#9627) 2025-12-18 22:54:35 +08:00
xvxuopop
e8deda53a1 [example] add Qwen3 series examples (#9624)
Co-authored-by: UsernameFull <tohowtodoit@gmail.com>
2025-12-18 21:27:00 +08:00
mrhaoxx
a769fb94b9 [feat] support ktransformers for dpo (#9621)
Co-authored-by: poryfly <porykid@gmail.com>
2025-12-18 21:26:25 +08:00
mrhaoxx
964569751f [kt] refactor ktransformers integration (#9632) 2025-12-18 21:26:04 +08:00
Hertz
9fd4b094d4 [model] support VibeThinker models (#9616) 2025-12-16 21:50:46 +08:00
浮梦
18c21bce5a [test] add allreduce test on npu (#9619)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-12-16 21:33:30 +08:00
sunyi0505
a0179772ab [example] add deepspeed autotp config and example (#9602) 2025-12-15 15:15:26 +08:00
Yaowei Zheng
aeda079014 [v1] model loader (#9613) 2025-12-14 11:50:52 +08:00
Xunpeng Xiao
fdd24276ed [feat] support new function call value (#9610)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-14 00:20:33 +08:00
Yaowei Zheng
110d21713e [v1] add dp & mp mesh (#9611) 2025-12-13 01:44:28 +08:00
Yaowei Zheng
203069e11c [v1] add accelerator (#9607) 2025-12-12 19:22:06 +08:00
tangefly
4fd94141a4 [model] Add Ministral3 (#9582)
Co-authored-by: kingsley <kingsleydodonow@gmail.com>
2025-12-10 15:57:24 +08:00
Kingsley
22d6ac29d5 [model] Rename GLMV template (#9595) 2025-12-10 13:27:47 +08:00
DoubleWheat
cff4483392 [config] Fix RoPE scaling patch for resuming from a scaled model (#9588) 2025-12-09 20:37:37 +08:00
Yaowei Zheng
5d56817e2b [misc] lint (#9593)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 18:00:35 +08:00
Yaowei Zheng
1bbb461f76 [assets] update readme (#9587) 2025-12-09 12:22:54 +08:00
Hertz
c1f5f8fff6 [model] support GLM4.6v (#9586) 2025-12-09 11:06:42 +08:00
Yaowei Zheng
5744f1ea94 [v1] add models & accelerator (#9579) 2025-12-08 02:30:25 +08:00
tangefly
739954910a [deps] Update for Transformers v5 (#9569) 2025-12-08 01:13:32 +08:00
xvxuopop
109162dc56 [fix] fix the issue when using fsdp2 with gradient checkpointing. (#9541)
Co-authored-by: jin-yongxu <jinyongxu@h-partners.com>
2025-12-06 16:04:51 +08:00
jiaqiw09
165f3f073a [examples] add fsdp config for mutiple nodes (#9575)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-12-05 23:22:48 +08:00
jiaqiw09
efb13b7483 [V1] Refactor ascend MoE kernel patch logic & Support Qwen3-MoE (#9557) 2025-12-02 00:22:03 +08:00
Username_Full
e43a972b25 [test] add npu test yaml and add ascend a3 docker file (#9547)
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
2025-11-30 09:37:08 +08:00
Kingsley
22be45c78c [misc] fix omni thinker load (#9552)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-30 09:36:36 +08:00
浮梦
d1f585f80a [test] update test cmd (#9544)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-11-27 17:59:42 +08:00
xvxuopop
955396e8a5 [example] correct the parameter errors in the examples file. (#9543) 2025-11-27 17:38:38 +08:00
xvxuopop
231756a5bf [chat] fix the error when the vLLM version is greater than 0.10.0 (#9539)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2025-11-27 02:14:53 +08:00
xvxuopop
2c4fb3c97e [v1] Support fused moe kernel for qwen3vlmoe model. (#9532) 2025-11-27 02:13:33 +08:00
浮梦
2b6f16f261 [model] temporarily support npu fused options on v0, powered by v1 kernels (#9520)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-11-27 02:08:36 +08:00
浮梦
f17efde693 [v1] support automatic discovery of registered kernels. (#9509)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-11-27 01:47:22 +08:00
Hertz
591fc9ed02 [model] support ERNIE-4.5-VL Models (#9521) 2025-11-24 16:48:06 +08:00
Peilin Li
3140c242f0 [assets] add README with KT+llamafactory (#9514) 2025-11-19 16:50:45 +08:00
Peilin Li
887c562d60 [example] Add KTransformers Qwen3MoE example (#9511)
Co-authored-by: unknown <xiongchenhui@hisense.ad>
Co-authored-by: Kingsley <kingsleydodonow@gmail.com>
2025-11-19 00:53:28 +08:00
Edge-Seven
9779b1f361 [misc] fix typos in some files (#9505)
Co-authored-by: khanhkhanhlele <namkhanh20xx@gmail.com>
2025-11-18 20:36:01 +08:00
Yinlei Sun
45f0437a14 [v1] Add support for ShareGPT format. (#9486) 2025-11-18 13:44:08 +08:00
浮梦
d4e120423d [data] fix qwen3omni moe model (#9501)
Co-authored-by: frozenleaves <frozen@Mac.local>
2025-11-18 13:43:22 +08:00
231 changed files with 6473 additions and 1924 deletions

.github/copilot-instructions.md (new file, 180 lines)
View File

@@ -0,0 +1,180 @@
# GitHub Copilot Instructions for LLaMA Factory
## Project Overview
LLaMA Factory is an efficient fine-tuning framework for 100+ large language models (LLMs). It provides:
- Support for various models: LLaMA, LLaVA, Mistral, Qwen, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
- Multiple training methods: pre-training, supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO
- Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and QLoRA variants
- Advanced algorithms: GaLore, BAdam, APOLLO, Adam-mini, Muon, OFT, DoRA, etc.
- Web UI (LLaMA Board) and CLI interfaces
### Architecture Versions
LLaMA Factory has two parallel architectures that can be switched via the `USE_V1` environment variable:
**v0 (default)** - File hierarchy:
- `api`, `webui`, `chat`, `eval`, `train`, `data`, `model`, `hparams`, `extras`
**v1** - File hierarchy:
- `trainers`, `core`, `accelerator`, `plugins`, `config`, `utils`
Set `USE_V1=1` to enable v1 architecture.
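For example (a minimal sketch; the config path is illustrative):

```bash
# v0 pipeline (default)
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

# v1 pipeline, assuming the same entry point honors the flag
USE_V1=1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```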
## Code Structure
### v0 Architecture (Default)
- `src/llamafactory/` - Main package directory
- `api/` - OpenAI-style API implementation
- `chat/` - Chat interface implementation
- `cli.py` - Command-line interface
- `data/` - Data processing and dataset handling
- `eval/` - Model evaluation utilities
- `extras/` - Additional utilities and helpers
- `hparams/` - Hyperparameter definitions
- `model/` - Model loading, patching, and utilities
- `train/` - Training pipeline implementation
- `webui/` - Gradio-based web interface
- `src/train.py` - Training entry script (delegates to `llamafactory.train.tuner`)
- `src/webui.py` - Web UI entry script (delegates to `llamafactory.webui.interface`)
- `src/api.py` - API server entry script (delegates to `llamafactory.api.app`)
- `tests/` - Test suite
- `examples/` - Example configurations for various training scenarios
- `data/` - Dataset definitions and examples
### v1 Architecture (USE_V1=1)
- `src/llamafactory/v1/` - Version 1 package directory
- `trainers/` - Training implementations
- `core/` - Core training utilities
- `accelerator/` - Acceleration and distributed training
- `plugins/` - Pluggable components (model, data, sampler, trainer)
- `config/` - Configuration management
- `utils/` - Utility functions
## Development Practices
### Code Style
- Follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
- Use ruff for linting and formatting
- Line length: 119 characters
- Indentation: 4 spaces
- Quote style: double quotes
- Use Google-style docstrings for documentation
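Under the hood, `make style` and `make quality` invoke ruff over the checked directories; a sketch of the equivalent direct commands, assuming `uvx` is available (see the Makefile):

```bash
# equivalent of `make style`
uvx ruff check scripts src tests tests_v1 --fix
uvx ruff format scripts src tests tests_v1

# equivalent of `make quality`
uvx ruff check scripts src tests tests_v1
uvx ruff format --check scripts src tests tests_v1
```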
### Import Organization
- Known first-party: `llamafactory`
- Known third-party: `accelerate`, `datasets`, `gradio`, `numpy`, `peft`, `torch`, `transformers`, `trl`
- Use 2 blank lines after imports
### Quality Checks
Before committing code, run:
```bash
make style # Auto-fix style issues
make quality # Check code quality
make test # Run test suite
```
Or use the combined command:
```bash
make commit # Run pre-commit hooks
```
### Testing
- Use pytest for testing
- Tests are located in `tests/` and `tests_v1/` directories
- Run tests with: `make test` (which runs `WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ tests_v1/`)
- Disable wandb during testing to avoid external dependencies
- **Note**: Training configurations require GPU machines, so training is typically not tested end-to-end. Use `make test` to validate file-level functionality.
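To iterate on a single file, pytest can also be invoked directly (a sketch; the test path is illustrative):

```bash
# run one test file with the same flags `make test` uses
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/test_example.py  # hypothetical path
```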
### Building
Build the package with:
```bash
pip3 install build && python3 -m build
```
### License
- All source files must include the Apache 2.0 license header
- Check license headers with: `make license`
## Common Patterns
### Configuration Files
- Training configurations are typically YAML or JSON files in the `examples/` directory
- Hyperparameters are defined using dataclasses in `src/llamafactory/hparams/`
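A typical launch pairs an example config with optional command-line overrides (a sketch; paths and values are illustrative):

```bash
# hyperparameters come from the YAML; trailing key=value pairs override them
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
    learning_rate=1e-5 \
    logging_steps=1
```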
### Model Support
- New model support is added through model patches in `src/llamafactory/model/`
- Visual models use the visual utilities in `src/llamafactory/model/model_utils/visual.py`
- Quantization support is in `src/llamafactory/model/model_utils/quantization.py`
### Data Processing
- Dataset definitions are in `data/dataset_info.json`
- Data templates and processors are in `src/llamafactory/data/`
### Training
- Training pipelines are in `src/llamafactory/train/`
- Support for different training methods: SFT, DPO, PPO, RM, PT, KTO, ORPO
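The method is selected by the config a run is launched with, e.g. (a sketch; config names are illustrative):

```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml  # SFT
llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml  # DPO
llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml  # KTO
```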
## Key Dependencies
- Python >= 3.9.0
- PyTorch and transformers for model handling
- datasets for data processing
- peft for parameter-efficient fine-tuning
- accelerate for distributed training
- gradio for web UI
- trl for reinforcement learning
- Optional: vllm/sglang for inference, flash-attention-2, unsloth, liger-kernel
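To verify the installed toolchain, the CLI can print its environment (assuming the `env` subcommand backed by `src/llamafactory/extras/env.py`):

```bash
# prints versions of llamafactory, torch, transformers and related packages
llamafactory-cli env
```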
## Entry Points
- **CLI Training**: `llamafactory-cli train --config examples/train_lora/llama3_lora_sft.yaml`
- **Web UI**: `llamafactory-cli webui` or `python src/webui.py`
- **API Server**: `llamafactory-cli api` or `python src/api.py`
- **Chat Interface**: `llamafactory-cli chat --model_name_or_path MODEL_PATH`
## Environment Setup
For development:
```bash
pip install -e ".[dev]"
```
## Important Notes
- The project supports multiple backends: default PyTorch, vLLM, SGLang
- Megatron-core training is supported via mcore_adapter
- SwanLab and W&B are supported for experiment tracking
- Docker support is available with pre-built images
- Day-0/Day-1 support for latest cutting-edge models
- Multi-modal support for vision and audio understanding tasks
## Contribution Guidelines
1. Fork the repository
2. Create a development branch
3. Set up development environment with `pip install -e ".[dev]"`
4. Make changes following the style guide
5. Run quality checks: `make style && make quality`
6. Run tests: `make test`
7. Submit a pull request
## Common Commands
- `make style` - Format code
- `make quality` - Run linters
- `make test` - Run tests
- `make commit` - Install and run pre-commit hooks
- `make license` - Check license headers

View File

@@ -7,7 +7,7 @@ on:
- "main"
paths:
- "**/*.py"
- "requirements.txt"
- "pyproject.toml"
- "docker/**"
- ".github/workflows/*.yml"
pull_request:
@@ -15,7 +15,7 @@ on:
- "main"
paths:
- "**/*.py"
- "requirements.txt"
- "pyproject.toml"
- "docker/**"
- ".github/workflows/*.yml"
release:
@@ -27,9 +27,10 @@ jobs:
strategy:
fail-fast: false
matrix:
device:
- "cuda"
- "npu"
include:
- device: "cuda"
- device: "npu-a2"
- device: "npu-a3"
runs-on: ubuntu-latest
@@ -51,16 +52,11 @@ jobs:
- name: Checkout
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- name: Get llamafactory version
id: version
run: |
if [ "${{ github.event_name }}" = "release" ]; then
echo "tag=$(python setup.py --version)" >> "$GITHUB_OUTPUT"
echo "tag=$(grep -oP 'VERSION = "\K[^"]+' src/llamafactory/extras/env.py)" >> "$GITHUB_OUTPUT"
else
echo "tag=latest" >> "$GITHUB_OUTPUT"
fi
@@ -76,7 +72,7 @@ jobs:
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Login to Quay
if: ${{ github.event_name != 'pull_request' && matrix.device == 'npu' }}
if: ${{ github.event_name != 'pull_request' && startsWith(matrix.device, 'npu') }}
uses: docker/login-action@v3
with:
registry: quay.io
@@ -89,16 +85,12 @@ jobs:
with:
context: .
file: ./docker/docker-cuda/Dockerfile
build-args: |
EXTRAS=metrics,deepspeed,liger-kernel
push: ${{ github.event_name != 'pull_request' }}
tags: |
docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Build and push Docker image (NPU)
if: ${{ matrix.device == 'npu' }}
- name: Build and push Docker image (NPU-A2)
if: ${{ matrix.device == 'npu-a2' }}
uses: docker/build-push-action@v6
with:
context: .
@@ -108,5 +100,17 @@ jobs:
tags: |
docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}-npu-a2
quay.io/ascend/llamafactory:${{ steps.version.outputs.tag }}-npu-a2
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Build and push Docker image (NPU-A3)
if: ${{ matrix.device == 'npu-a3' }}
uses: docker/build-push-action@v6
with:
context: .
platforms: linux/amd64,linux/arm64
file: ./docker/docker-npu/Dockerfile
build-args: |
BASE_IMAGE=quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
push: ${{ github.event_name != 'pull_request' }}
tags: |
docker.io/hiyouga/llamafactory:${{ steps.version.outputs.tag }}-npu-a3
quay.io/ascend/llamafactory:${{ steps.version.outputs.tag }}-npu-a3

View File

@@ -23,10 +23,11 @@ jobs:
- name: Checkout
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
python-version: "3.9"
python-version: "3.11"
github-token: ${{ github.token }}
- name: Build package
run: |

View File

@@ -7,14 +7,16 @@ on:
- "main"
paths:
- "**/*.py"
- "requirements.txt"
- "pyproject.toml"
- "Makefile"
- ".github/workflows/*.yml"
pull_request:
branches:
- "main"
paths:
- "**/*.py"
- "requirements.txt"
- "pyproject.toml"
- "Makefile"
- ".github/workflows/*.yml"
jobs:
@@ -23,29 +25,25 @@ jobs:
fail-fast: false
matrix:
python:
- "3.9"
- "3.10"
- "3.11"
- "3.12"
- "3.13"
os:
- "ubuntu-latest"
- "windows-latest"
- "macos-latest"
transformers:
- null
- ""
include: # test backward compatibility
- python: "3.9"
os: "ubuntu-latest"
transformers: "4.49.0"
- python: "3.9"
- python: "3.11"
os: "ubuntu-latest"
transformers: "4.51.0"
- python: "3.9"
- python: "3.11"
os: "ubuntu-latest"
transformers: "4.53.0"
exclude: # exclude python 3.9 on macos
- python: "3.9"
os: "macos-latest"
- python: "3.11"
os: "ubuntu-latest"
transformers: "4.55.0"
runs-on: ${{ matrix.os }}
@@ -61,22 +59,23 @@ jobs:
- name: Checkout
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
python-version: ${{ matrix.python }}
cache: "pip"
cache-dependency-path: "**/requirements*.txt"
github-token: ${{ github.token }}
enable-cache: false
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install ".[torch,dev]"
uv venv
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
uv pip install -e ".[dev]"
- name: Install transformers
if: ${{ matrix.transformers }}
run: |
python -m pip install "transformers==${{ matrix.transformers }}"
uv pip install "transformers==${{ matrix.transformers }}"
- name: Cache files
id: hf-hub-cache
@@ -88,18 +87,25 @@ jobs:
- name: Check quality
run: |
make style && make quality
env:
UV_NO_SYNC: 1
- name: Check license
run: |
make license
env:
UV_NO_SYNC: 1
- name: Check build
run: |
make build
env:
UV_NO_SYNC: 1
- name: Test with pytest
run: |
make test
env:
UV_NO_SYNC: 1
HF_HOME: ${{ runner.temp }}/huggingface
HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.github/workflows/tests_cuda.yml (new file, 88 lines)
View File

@@ -0,0 +1,88 @@
name: tests_cuda
on:
workflow_dispatch:
push:
branches:
- "main"
paths:
- "**/*.py"
- "pyproject.toml"
- "Makefile"
- ".github/workflows/*.yml"
pull_request:
branches:
- "main"
paths:
- "**/*.py"
- "pyproject.toml"
- "Makefile"
- ".github/workflows/*.yml"
jobs:
tests:
strategy:
fail-fast: false
matrix:
python:
- "3.11"
os:
- "linux-x86_64-gpu-2"
runs-on: ${{ matrix.os }}
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os }}-${{ matrix.python }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
python-version: ${{ matrix.python }}
github-token: ${{ github.token }}
enable-cache: false
- name: Check GPU Status
run: nvidia-smi
- name: Install dependencies
run: |
uv venv
uv pip install -e ".[dev]"
- name: Cache HuggingFace models
id: hf-hub-cache
uses: actions/cache@v4
with:
path: ${{ runner.temp }}/huggingface
key: hf-cache-${{ runner.os }}-${{ hashFiles('tests/version.txt') }}
- name: Check quality
run: |
make style && make quality
env:
UV_NO_SYNC: 1
- name: Check license
run: |
make license
env:
UV_NO_SYNC: 1
- name: Check build
run: |
make build
env:
UV_NO_SYNC: 1
- name: Test with pytest
run: |
make test
env:
UV_NO_SYNC: 1
HF_HOME: ${{ runner.temp }}/huggingface
HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.github/workflows/tests_npu.yml (new file, 102 lines)
View File

@@ -0,0 +1,102 @@
name: tests_npu
on:
workflow_dispatch:
push:
branches:
- "main"
paths:
- "**/*.py"
- "pyproject.toml"
- "Makefile"
- ".github/workflows/*.yml"
pull_request:
branches:
- "main"
paths:
- "**/*.py"
- "pyproject.toml"
- "Makefile"
- ".github/workflows/*.yml"
jobs:
tests:
strategy:
fail-fast: false
matrix:
python:
- "3.11"
os:
- "linux-aarch64-a2-4"
pytorch_npu:
- "2.7.1"
runs-on: ${{ matrix.os }}
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.os }}-${{ matrix.python }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
container:
image: ascendai/cann:8.3.rc2-910b-ubuntu22.04-py3.11
env:
HF_ENDPOINT: https://hf-mirror.com
HF_TOKEN: ${{ secrets.HF_TOKEN }}
OS_NAME: ${{ matrix.os }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
python-version: ${{ matrix.python }}
github-token: ${{ github.token }}
enable-cache: false
- name: Install dependencies
run: |
uv venv
uv pip install torch-npu==${{matrix.pytorch_npu}}
uv pip install -e ".[dev]"
- name: Install node
run: |
apt-get update || true
apt-get install -y curl
curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt-get install -y nodejs
- name: Cache files
id: hf-hub-cache
uses: actions/cache@v4
with:
path: ${{ runner.temp }}/huggingface
key: huggingface-${{ matrix.os }}-${{ matrix.python }}-${{ hashFiles('tests/version.txt') }}
- name: Check quality
run: |
make style && make quality
env:
UV_NO_SYNC: 1
- name: Check license
run: |
make license
env:
UV_NO_SYNC: 1
- name: Check build
run: |
make build
env:
UV_NO_SYNC: 1
- name: Test with pytest
run: |
make test
env:
UV_NO_SYNC: 1
HF_HOME: /root/.cache/huggingface
HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"

.gitignore (5 lines changed)
View File

@@ -85,7 +85,7 @@ ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
@@ -165,6 +165,9 @@ cython_debug/
# uv
uv.lock
# macOS
.DS_Store
# custom .gitignore
hf_cache/
ms_cache/

View File

@@ -1 +1 @@
include LICENSE requirements.txt
include LICENSE

View File

@@ -1,24 +1,28 @@
.PHONY: build commit license quality style test
check_dirs := scripts src tests tests_v1 setup.py
check_dirs := scripts src tests tests_v1
RUN := $(shell command -v uv >/dev/null 2>&1 && echo "uv run" || echo "")
BUILD := $(shell command -v uv >/dev/null 2>&1 && echo "uv build" || echo "python -m build")
TOOL := $(shell command -v uv >/dev/null 2>&1 && echo "uvx" || echo "")
build:
pip3 install build && python3 -m build
$(BUILD)
commit:
pre-commit install
pre-commit run --all-files
$(TOOL) pre-commit install
$(TOOL) pre-commit run --all-files
license:
python3 tests/check_license.py $(check_dirs)
$(RUN) python3 tests/check_license.py $(check_dirs)
quality:
ruff check $(check_dirs)
ruff format --check $(check_dirs)
$(TOOL) ruff check $(check_dirs)
$(TOOL) ruff format --check $(check_dirs)
style:
ruff check $(check_dirs) --fix
ruff format $(check_dirs)
$(TOOL) ruff check $(check_dirs) --fix
$(TOOL) ruff format $(check_dirs)
test:
CUDA_VISIBLE_DEVICES= WANDB_DISABLED=true pytest -vv tests/
WANDB_DISABLED=true $(RUN) pytest -vv --import-mode=importlib tests/ tests_v1/

View File

@@ -96,7 +96,7 @@ Read technical notes:
- **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
- **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
- **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [APOLLO](https://github.com/zhuhanqing/APOLLO), [Adam-mini](https://github.com/zyushun/Adam-mini), [Muon](https://github.com/KellerJordan/Muon), [OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and PiSSA.
- **Practical tricks**: [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), [Unsloth](https://github.com/unslothai/unsloth), [Liger Kernel](https://github.com/linkedin/Liger-Kernel), RoPE scaling, NEFTune and rsLoRA.
- **Practical tricks**: [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), [Unsloth](https://github.com/unslothai/unsloth), [Liger Kernel](https://github.com/linkedin/Liger-Kernel), [KTransformers](https://github.com/kvcache-ai/ktransformers/), RoPE scaling, NEFTune and rsLoRA.
- **Wide tasks**: Multi-turn dialogue, tool using, image understanding, visual grounding, video recognition, audio understanding, etc.
- **Experiment monitors**: LlamaBoard, TensorBoard, Wandb, MLflow, [SwanLab](https://github.com/SwanHubX/SwanLab), etc.
- **Faster inference**: OpenAI-style API, Gradio UI and CLI with [vLLM worker](https://github.com/vllm-project/vllm) or [SGLang worker](https://github.com/sgl-project/sglang).
@@ -115,6 +115,7 @@ Read technical notes:
>
> Website: https://blog.llamafactory.net/en/
- 💡 [KTransformers Fine-Tuning × LLaMA Factory: Fine-tuning 1000 Billion models with 2 4090-GPU + CPU](https://blog.llamafactory.net/en/posts/ktransformers/) (English)
- 💡 [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/GVzlwYcRFiR8OLkHbL6cQpYin7g) (English)
- [Fine-tune a mental health LLM using LLaMA-Factory](https://www.lab4ai.cn/project/detail?id=25cce32ec131497b9e06a93336a0817f&type=project&utm_source=LLaMA-Factory) (Chinese)
- [Fine-tune GPT-OSS for Role-Playing using LLaMA-Factory](https://docs.llamafactory.com.cn/docs/documents/best-practice/gptroleplay/?utm_source=LLaMA-Factory) (Chinese)
@@ -277,27 +278,21 @@ Read technical notes:
| Model | Model size | Template |
| ----------------------------------------------------------------- | -------------------------------- | -------------------- |
| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
| [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
| [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
| [Falcon-H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/34B | falcon_h1 |
| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
| [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
| [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
| [GLM-4.1V](https://huggingface.co/zai-org) | 9B | glm4v |
| [GLM-4.5/GLM-4.5V](https://huggingface.co/zai-org) | 106B/355B | glm4_moe/glm4v_moe |
| [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
| [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt |
| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
| [Granite 4](https://huggingface.co/ibm-granite) | 7B | granite4 |
| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
| [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
| [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
| [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
@@ -311,15 +306,14 @@ Read technical notes:
| [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
| [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
| [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
| [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
| [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
@@ -332,12 +326,9 @@ Read technical notes:
| [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
| [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
| [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
| [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
> [!NOTE]
@@ -443,6 +434,7 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
- [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
- [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
- [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
- [DLR-Web (en)](https://huggingface.co/datasets/Attention1115/DLR-Web)
- [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
- [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
- [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
@@ -524,10 +516,12 @@ huggingface-cli login
```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
pip install -e ".[metrics]"
```
Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, aqlm, vllm, sglang, galore, apollo, badam, adam-mini, qwen, minicpm_v, openmind, swanlab, dev
Optional dependencies available: `metrics`, `deepspeed`. Install with: `pip install -e ".[metrics,deepspeed]"`
Additional dependencies for specific features are available in `examples/requirements/`.
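For instance, a feature-specific requirement file might be installed like this (a sketch; the exact file names under `examples/requirements/` may differ):

```bash
pip install -r examples/requirements/vllm.txt  # hypothetical file name
```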
#### Install from Docker Image
@@ -546,13 +540,7 @@ Please refer to [build docker](#build-docker) to build the image yourself.
Create an isolated Python environment with [uv](https://github.com/astral-sh/uv):
```bash
uv sync --extra torch --extra metrics --prerelease=allow
```
Run LLaMA-Factory in the isolated environment:
```bash
uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
uv run llamafactory-cli webui
```
</details>
@@ -589,7 +577,7 @@ To enable FlashAttention-2 on the Windows platform, please use the script from [
<details><summary>For Ascend NPU users</summary>
To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher: `pip install -e . torch-npu==2.7.1`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/600alphaX/softwareinstall/instg/atlasdeploy_03_0031.html) or use the following commands:
```bash
# replace the url according to your CANN version and devices
@@ -608,8 +596,8 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
| Requirement | Minimum | Recommend |
| ------------ | ------- | -------------- |
| CANN | 8.0.RC1 | 8.0.0.alpha002 |
| torch | 2.1.0 | 2.4.0 |
| torch-npu | 2.1.0 | 2.4.0.post2 |
| torch | 2.1.0 | 2.7.1 |
| torch-npu | 2.1.0 | 2.7.1 |
| deepspeed | 0.13.2 | 0.13.2 |
| vllm-ascend | - | 0.7.3 |
@@ -651,7 +639,7 @@ cd transformers
pip install .
```
3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml).
</details>
@@ -666,12 +654,12 @@ You can also use **[Easy Dataset](https://github.com/ConardLi/easy-dataset)**, *
### Quickstart
Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Llama3-8B-Instruct model, respectively.
Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Qwen3-4B-Instruct model, respectively.
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
See [examples/README.md](examples/README.md) for advanced usage (including distributed training).
@@ -724,7 +712,6 @@ For CUDA users:
```bash
docker build -f ./docker/docker-cuda/Dockerfile \
--build-arg PIP_INDEX=https://pypi.org/simple \
--build-arg EXTRAS=metrics \
-t llamafactory:latest .
docker run -dit --ipc=host --gpus=all \
@@ -741,7 +728,6 @@ For Ascend NPU users:
```bash
docker build -f ./docker/docker-npu/Dockerfile \
--build-arg PIP_INDEX=https://pypi.org/simple \
--build-arg EXTRAS=torch-npu,metrics \
-t llamafactory:latest .
docker run -dit --ipc=host \
@@ -766,7 +752,6 @@ For AMD ROCm users:
```bash
docker build -f ./docker/docker-rocm/Dockerfile \
--build-arg PIP_INDEX=https://pypi.org/simple \
--build-arg EXTRAS=metrics \
-t llamafactory:latest .
docker run -dit --ipc=host \
@@ -797,7 +782,7 @@ When building the Docker image, use `-v ./hf_cache:/root/.cache/huggingface` arg
### Deploy with OpenAI-style API and vLLM
```bash
API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
```
> [!TIP]

View File

@@ -98,7 +98,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
- **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
- **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
- **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [APOLLO](https://github.com/zhuhanqing/APOLLO), [Adam-mini](https://github.com/zyushun/Adam-mini), [Muon](https://github.com/KellerJordan/Muon), [OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and PiSSA.
- **Practical tricks**: [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), [Unsloth](https://github.com/unslothai/unsloth), [Liger Kernel](https://github.com/linkedin/Liger-Kernel), RoPE scaling, NEFTune and rsLoRA.
- **Practical tricks**: [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), [Unsloth](https://github.com/unslothai/unsloth), [Liger Kernel](https://github.com/linkedin/Liger-Kernel), [KTransformers](https://github.com/kvcache-ai/ktransformers/), RoPE scaling, NEFTune and rsLoRA.
- **Wide tasks**: Multi-turn dialogue, tool using, image understanding, visual grounding, video recognition, audio understanding, etc.
- **Experiment monitors**: LlamaBoard, TensorBoard, Wandb, MLflow, [SwanLab](https://github.com/SwanHubX/SwanLab), etc.
- **Faster inference**: OpenAI-style API, web UI and CLI based on [vLLM](https://github.com/vllm-project/vllm) or [SGLang](https://github.com/sgl-project/sglang).
@@ -117,6 +117,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
>
> Website: https://blog.llamafactory.net/
- 💡 [KTransformers Fine-Tuning × LLaMA Factory: Fine-tuning 1000B-scale models with two 4090-class GPUs + CPU](https://swcil84qspu.feishu.cn/wiki/Z1sSwb2poijybxkyPEkcDG6enVc) (Chinese)
- 💡 [Easy Dataset × LLaMA Factory: Enabling LLMs to Efficiently Learn Domain Knowledge](https://buaa-act.feishu.cn/wiki/KY9xwTGs1iqHrRkjXBwcZP9WnL9) (Chinese)
- [Fine-tune a mental health LLM using LLaMA-Factory](https://www.lab4ai.cn/project/detail?id=25cce32ec131497b9e06a93336a0817f&type=project&utm_source=LLaMA-Factory) (Chinese)
- [Fine-tune GPT-OSS for Role-Playing using LLaMA-Factory](https://docs.llamafactory.com.cn/docs/documents/best-practice/gptroleplay/?utm_source=LLaMA-Factory) (Chinese)
@@ -279,27 +280,21 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
| Model | Model size | Template |
| ----------------------------------------------------------------- | -------------------------------- | -------------------- |
| [Baichuan 2](https://huggingface.co/baichuan-inc) | 7B/13B | baichuan2 |
| [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| [ChatGLM3](https://huggingface.co/THUDM) | 6B | chatglm3 |
| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
| [DeepSeek 2.5/3](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
| [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
| [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
| [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
| [Falcon](https://huggingface.co/tiiuae) | 7B/11B/40B/180B | falcon |
| [Falcon-H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/34B | falcon_h1 |
| [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
| [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
| [GLM-4/GLM-4-0414/GLM-Z1](https://huggingface.co/zai-org) | 9B/32B | glm4/glmz1 |
| [GLM-4.1V](https://huggingface.co/zai-org) | 9B | glm4v |
| [GLM-4.5/GLM-4.5V](https://huggingface.co/zai-org) | 106B/355B | glm4_moe/glm4v_moe |
| [GLM-4.5/GLM-4.5(6)V](https://huggingface.co/zai-org) | 9B/106B/355B | glm4_moe/glm4_5v |
| [GPT-2](https://huggingface.co/openai-community) | 0.1B/0.4B/0.8B/1.5B | - |
| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt |
| [Granite 3.0-3.3](https://huggingface.co/ibm-granite) | 1B/2B/3B/8B | granite3 |
| [Granite 4](https://huggingface.co/ibm-granite) | 7B | granite4 |
| [GPT-OSS](https://huggingface.co/openai) | 20B/120B | gpt_oss |
| [Granite 3-4](https://huggingface.co/ibm-granite) | 1B/2B/3B/7B/8B | granite3/granite4 |
| [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
| [Index](https://huggingface.co/IndexTeam) | 1.9B | index |
| [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
| [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
@@ -313,15 +308,14 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
| [LLaVA-1.5](https://huggingface.co/llava-hf) | 7B/13B | llava |
| [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B | mimo |
| [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
| [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
| [Ministral/Mistral-Nemo](https://huggingface.co/mistralai) | 8B/12B | ministral |
| [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
| [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
| [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
| [Mistral Small](https://huggingface.co/mistralai) | 24B | mistral_small |
| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
| [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
| [Phi-1.5/Phi-2](https://huggingface.co/microsoft) | 1.3B/2.7B | - |
| [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
| [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
@@ -334,12 +328,9 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
| [Qwen2-VL/Qwen2.5-VL/QVQ](https://huggingface.co/Qwen) | 2B/3B/7B/32B/72B | qwen2_vl |
| [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
| [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
| [Skywork o1](https://huggingface.co/Skywork) | 8B | skywork_o1 |
| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
| [TeleChat2](https://huggingface.co/Tele-AI) | 3B/7B/35B/115B | telechat2 |
| [XVERSE](https://huggingface.co/xverse) | 7B/13B/65B | xverse |
| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
| [Yi-VL](https://huggingface.co/01-ai) | 6B/34B | yi_vl |
| [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
> [!NOTE]
@@ -445,6 +436,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
- [Chinese-DeepSeek-R1-Distill (zh)](https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT)
- [LLaVA mixed (en&zh)](https://huggingface.co/datasets/BUAADreamer/llava-en-zh-300k)
- [Pokemon-gpt4o-captions (en&zh)](https://huggingface.co/datasets/jugg1024/pokemon-gpt4o-captions)
- [DLR-Web (en)](https://huggingface.co/datasets/Attention1115/DLR-Web)
- [Open Assistant (de)](https://huggingface.co/datasets/mayflowergmbh/oasst_de)
- [Dolly 15k (de)](https://huggingface.co/datasets/mayflowergmbh/dolly-15k_de)
- [Alpaca GPT4 (de)](https://huggingface.co/datasets/mayflowergmbh/alpaca-gpt4_de)
@@ -526,10 +518,12 @@ huggingface-cli login
```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
pip install -e ".[metrics]"
```
Optional extra dependencies: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, aqlm, vllm, sglang, galore, apollo, badam, adam-mini, qwen, minicpm_v, openmind, swanlab, dev
Optional dependencies available: `metrics`, `deepspeed`. Install with: `pip install -e ".[metrics,deepspeed]"`
Additional dependencies for specific features are available in `examples/requirements/`.
#### Install from Docker Image
@@ -548,13 +542,7 @@ docker run -it --rm --gpus=all --ipc=host hiyouga/llamafactory:latest
Create an isolated Python environment with [uv](https://github.com/astral-sh/uv):
```bash
uv sync --extra torch --extra metrics --prerelease=allow
```
Run LLaMA-Factory in the isolated environment:
```bash
uv run --prerelease=allow llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
uv run llamafactory-cli webui
```
</details>
@@ -591,7 +579,7 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl
<details><summary>For Ascend NPU users</summary>
To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher and specify extra dependencies: `pip install -e ".[torch-npu,metrics]"`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or use the following commands:
To install LLaMA Factory on Ascend NPU devices, please upgrade Python to version 3.10 or higher: `pip install -e . torch-npu==2.7.1`. Additionally, you need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**. Please follow the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or use the following commands:
```bash
# replace the url according to your CANN version and devices
@@ -610,8 +598,8 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh
| Requirement | Minimum | Recommend |
| ------------ | ------- | -------------- |
| CANN | 8.0.RC1 | 8.0.0.alpha002 |
| torch | 2.1.0 | 2.4.0 |
| torch-npu | 2.1.0 | 2.4.0.post2 |
| torch | 2.1.0 | 2.7.1 |
| torch-npu | 2.1.0 | 2.7.1 |
| deepspeed | 0.13.2 | 0.13.2 |
| vllm-ascend | - | 0.7.3 |
@@ -653,7 +641,7 @@ cd transformers
pip install .
```
3. Set `double_quantization: false` in the training arguments; refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
3. Set `double_quantization: false` in the training arguments; refer to the [example](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml).
</details>
@@ -668,12 +656,12 @@ pip install .
### Quickstart
Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Llama3-8B-Instruct model, respectively.
Use the following 3 commands to run LoRA **fine-tuning**, **inference** and **merging** of the Qwen3-4B-Instruct model, respectively.
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
See [examples/README_zh.md](examples/README_zh.md) for advanced usage (including multi-GPU fine-tuning).
@@ -799,7 +787,7 @@ docker exec -it llamafactory bash
### Deploy OpenAI-style API with vLLM
```bash
API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
```
> [!TIP]

View File

@@ -471,6 +471,14 @@
"ultrachat_de": {
"hf_hub_url": "mayflowergmbh/ultra-chat_de"
},
"dlr_web": {
"hf_hub_url": "Attention1115/DLR-Web",
"split": "full",
"columns": {
"prompt": "question",
"response": "response"
}
},
"dpo_en_demo": {
"file_name": "dpo_en_demo.json",
"ranking": true,

View File

@@ -1,4 +1,4 @@
dpo_zh_demo:
hf_hub_url: HuggingFaceH4/orca_dpo_pairs
path: HuggingFaceH4/orca_dpo_pairs
split: train_prefs
converter: pair

View File

@@ -1,8 +1,9 @@
identity:
file_name: identity.json
path: data/identity.json
source: local
converter: alpaca
alpaca_en_demo:
file_name: alpaca_en_demo.json
dataset_dir: ~/data
path: data/alpaca_en_demo.json
source: local
converter: alpaca
num_samples: 500
size: 500

View File

@@ -4,7 +4,6 @@ FROM ${BASE_IMAGE}
# Installation arguments
ARG PIP_INDEX=https://pypi.org/simple
ARG EXTRAS=metrics
ARG INSTALL_FLASHATTN=false
ARG HTTP_PROXY=""
@@ -27,17 +26,13 @@ WORKDIR /app
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
pip install --no-cache-dir --upgrade pip packaging wheel setuptools
pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
# Install the requirements
COPY requirements.txt /app
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application into the image
# Copy the application into the image
COPY . /app
# Install LLaMA Factory
RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
RUN pip install --no-cache-dir --no-build-isolation -e ".[metrics,deepspeed]"
# Rebuild flash attention
RUN if [ "${INSTALL_FLASHATTN}" == "true" ]; then \

View File

@@ -8,7 +8,7 @@ ENV PYPI_MIRROR=https://mirrors.aliyun.com/pypi/simple/
ENV PYPI_TRUSTED_HOST=mirrors.aliyun.com
ENV APT_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/ubuntu/
RUN pip install --upgrade pip setuptools wheel --trusted-host ${PYPI_TRUSTED_HOST} --index-url ${PYPI_MIRROR}
RUN pip install --upgrade pip setuptools wheel "hatchling>=1.18.0" editables --trusted-host ${PYPI_TRUSTED_HOST} --index-url ${PYPI_MIRROR}
RUN pip uninstall -y torch torchvision torch-tensorrt \
flash_attn transformer-engine \
@@ -56,14 +56,14 @@ ENV JAVA_HOME /usr/lib/jvm/java-21-openjdk-amd64
# pip install LLaMA-Factory
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application into the image
COPY . /app
# Install LLaMA Factory
RUN pip install --no-cache-dir -e ".[metrics]" --no-build-isolation
RUN pip install "git+https://github.com/alibaba/roll.git#subdirectory=mcore_adapter"
COPY . /app/
RUN pip install -e ".[metrics]" --no-build-isolation
# Expose port 7860 for LLaMA Board
ENV GRADIO_SERVER_PORT=7860
EXPOSE 7860

View File

@@ -5,7 +5,6 @@ services:
context: ../..
args:
PIP_INDEX: https://pypi.org/simple
EXTRAS: metrics
container_name: llamafactory
ports:
- "7860:7860"

View File

@@ -1,14 +1,10 @@
# https://hub.docker.com/r/ascendai/cann/tags
# default base image build for A2, if build for A3, using this image:
# ARG BASE_IMAGE=ascendai/cann:8.3.rc1-a3-ubuntu22.04-py3.11
ARG BASE_IMAGE=ascendai/cann:8.3.rc1-910b-ubuntu22.04-py3.11
ARG BASE_IMAGE=quay.io/ascend/cann:8.3.rc2-910b-ubuntu22.04-py3.11
FROM ${BASE_IMAGE}
# Installation arguments
ARG PIP_INDEX=https://pypi.org/simple
ARG EXTRAS=torch-npu,metrics
ARG HTTP_PROXY=""
ARG PYTORCH_INDEX=https://download.pytorch.org/whl/cpu
@@ -31,21 +27,15 @@ WORKDIR /app
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
pip install --no-cache-dir --upgrade pip packaging wheel setuptools
pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
# Copy the application into the image
COPY . /app
# Install torch-npu
RUN pip uninstall -y torch torchvision torchaudio && \
pip install --no-cache-dir "torch==2.7.1" "torch-npu==2.7.1" "torchvision==0.22.1" --index-url "${PYTORCH_INDEX}"
# Install the requirements
COPY requirements.txt /app
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application into the image
COPY . /app
# Install LLaMA Factory
RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
pip install --no-cache-dir "torch==2.7.1" "torch-npu==2.7.1" "torchvision==0.22.1" "torchaudio==2.7.1" --index-url "${PYTORCH_INDEX}" && \
pip install --no-cache-dir -e ".[metrics]" --no-build-isolation
# Set up volumes
# VOLUME [ "/root/.cache/huggingface", "/app/shared_data", "/app/output" ]

View File

@@ -1,12 +1,12 @@
services:
llamafactory:
llamafactory-a2:
build:
dockerfile: ./docker/docker-npu/Dockerfile
context: ../..
args:
PIP_INDEX: https://pypi.org/simple
EXTRAS: torch-npu,metrics
container_name: llamafactory
container_name: llamafactory-a2
image: llamafactory:npu-a2
volumes:
- /usr/local/dcmi:/usr/local/dcmi
- /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
@@ -26,3 +26,33 @@ services:
- /dev/devmm_svm
- /dev/hisi_hdc
restart: unless-stopped
llamafactory-a3:
profiles: ["a3"]
build:
dockerfile: ./docker/docker-npu/Dockerfile
context: ../..
args:
BASE_IMAGE: quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
PIP_INDEX: https://pypi.org/simple
container_name: llamafactory-a3
image: llamafactory:npu-a3
volumes:
- /usr/local/dcmi:/usr/local/dcmi
- /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
- /usr/local/Ascend/driver:/usr/local/Ascend/driver
- /etc/ascend_install.info:/etc/ascend_install.info
ports:
- "7861:7860"
- "8001:8000"
ipc: host
tty: true
# shm_size: "16gb" # ipc: host is set
stdin_open: true
command: bash
devices:
- /dev/davinci0
- /dev/davinci_manager
- /dev/devmm_svm
- /dev/hisi_hdc
restart: unless-stopped

View File

@@ -4,7 +4,6 @@ FROM ${BASE_IMAGE}
# Installation arguments
ARG PIP_INDEX=https://pypi.org/simple
ARG EXTRAS=metrics
ARG INSTALL_FLASHATTN=false
ARG HTTP_PROXY=""
ARG PYTORCH_INDEX=https://download.pytorch.org/whl/rocm6.3
@@ -28,21 +27,14 @@ WORKDIR /app
# Change pip source
RUN pip config set global.index-url "${PIP_INDEX}" && \
pip config set global.extra-index-url "${PIP_INDEX}" && \
pip install --no-cache-dir --upgrade pip packaging wheel setuptools
pip install --no-cache-dir --upgrade pip packaging wheel setuptools editables "hatchling>=1.18.0"
# Reinstall pytorch rocm
RUN pip uninstall -y torch torchvision torchaudio && \
pip install --no-cache-dir --pre torch torchvision torchaudio --index-url "${PYTORCH_INDEX}"
# Install the requirements
COPY requirements.txt /app
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application into the image
# Copy the application into the image
COPY . /app
# Install LLaMA Factory
RUN pip install --no-cache-dir -e ".[${EXTRAS}]" --no-build-isolation
# Reinstall pytorch rocm and install LLaMA Factory
RUN pip uninstall -y torch torchvision torchaudio && \
pip install --no-cache-dir --no-build-isolation --pre -e ".[metrics,deepspeed]" --index-url "${PYTORCH_INDEX}"
# Rebuild flash attention
RUN if [ "${INSTALL_FLASHATTN}" == "true" ]; then \

View File

@@ -5,7 +5,6 @@ services:
context: ../..
args:
PIP_INDEX: https://pypi.org/simple
EXTRAS: metrics
container_name: llamafactory
ports:
- "7860:7860"

View File

@@ -18,19 +18,19 @@ By default, LLaMA-Factory uses all visible computing devices.
Basic usage:
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
Advanced usage:
```bash
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
learning_rate=1e-5 \
logging_steps=1
```
```bash
bash examples/train_lora/llama3_lora_sft.sh
bash examples/train_lora/qwen3_lora_sft.sh
```
## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
#### (Continuous) Pre-Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
```
#### Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
#### Multimodal Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
```
#### DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
```
#### Multimodal DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
```
#### Reward Modeling
```bash
llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
```
#### PPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
```
#### KTO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
```
#### Preprocess Dataset
@@ -90,32 +84,26 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
It is useful for large datasets; use `tokenized_path` in the config to load the preprocessed dataset.
```bash
llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
```
#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
```bash
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
```
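A two-step sketch of how the preprocessed cache is reused; the `tokenized_path` value assumes the preprocess config wrote its output there, and the key=value override uses the advanced-usage syntax shown above:
```bash
llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
    tokenized_path=saves/qwen3-4b/dataset/sft
```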
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
#### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
```
#### Supervised Fine-Tuning with Ray on 4 GPUs
```bash
USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
```
### QLoRA Fine-Tuning
@@ -123,13 +111,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
```
#### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
```
#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -155,14 +143,14 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
#### Supervised Fine-Tuning on Single Node
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -170,13 +158,13 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
To launch an elastic job with up to `MAX_RESTARTS` restarts on failure, run the following on at least `MIN_NNODES` and at most `MAX_NNODES` nodes. `RDZV_ID` should be set to a unique job ID (shared by all nodes participating in the job). See also [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html).
```bash
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
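For reference, these environment variables correspond roughly to the rendezvous flags documented for raw `torchrun`; a sketch only (the per-node GPU count is an assumption, and the exact command the CLI builds may differ):
```bash
torchrun --nnodes=1:3 --nproc-per-node=8 --max-restarts=3 \
    --rdzv-id=llamafactory --rdzv-backend=c10d \
    --rdzv-endpoint=192.168.0.1:29500 \
    src/train.py examples/train_full/qwen3_full_sft.yaml
```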
#### Multimodal Supervised Fine-Tuning
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
```
### Merging LoRA Adapters and Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
Note: DO NOT use a quantized model or `quantization_bit` when merging LoRA adapters.
```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
#### Quantizing Model using AutoGPTQ
```bash
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
```
### Save Ollama modelfile
```bash
llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
```
### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
#### Evaluation using vLLM's Multi-GPU Inference
```
python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
python scripts/eval_bleu_rouge.py generated_predictions.jsonl
```
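The updated script (see the `scripts/vllm_infer.py` diff below) also accepts a `matrix_save_name` argument; when set, the averaged BLEU/ROUGE scores and runtime statistics are written to a JSON file:
```bash
python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 \
    --template qwen3_nothink --dataset alpaca_en_demo \
    --matrix_save_name predict_results.json
```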
#### Use CLI ChatBox
```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
```
#### Use Web UI ChatBox
```bash
llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
```
#### Launch OpenAI-style API
```bash
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
```
### Extras

View File

@@ -18,19 +18,19 @@ By default, LLaMA-Factory uses all visible computing devices.
Basic usage:
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
Advanced usage:
```bash
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
learning_rate=1e-5 \
logging_steps=1
```
```bash
bash examples/train_lora/llama3_lora_sft.sh
bash examples/train_lora/qwen3_lora_sft.sh
```
## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
#### (Continuous) Pre-Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
```
#### Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
#### Multimodal Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
```
#### DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
```
#### Multimodal DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
```
#### Reward Modeling
```bash
llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
```
#### PPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
```
#### KTO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
```
#### Preprocess Dataset
@@ -90,20 +84,14 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
It is useful for large datasets; use `tokenized_path` in the config to load the preprocessed dataset.
```bash
llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
```
#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
```bash
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
```
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -111,19 +99,19 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
To launch a multi-node, elastic, fault-tolerant fine-tuning job, run the following command on every node. The number of elastic nodes ranges over `MIN_NNODES:MAX_NNODES`, and each node may restart up to `MAX_RESTARTS` times on failure. `RDZV_ID` should be set to a unique job ID shared by all nodes participating in the job. See also [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html).
```bash
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Evenly Distributing GPU Memory with DeepSpeed ZeRO-3
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
```
#### Fine-Tuning with Ray on 4 GPUs
```bash
USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
```
### QLoRA Fine-Tuning
@@ -131,13 +119,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
```
#### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on NPU
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
```
#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -163,20 +151,20 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
#### Supervised Fine-Tuning on a Single Node
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Multimodal Supervised Fine-Tuning
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
```
### Merging LoRA Adapters and Model Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
Note: DO NOT use a quantized model or `quantization_bit` when merging LoRA adapters.
```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
#### Quantizing a Model with AutoGPTQ
```bash
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
```
### Save Ollama Modelfile
```bash
llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
```
### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
#### Evaluation using vLLM's Multi-GPU Inference
```
python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
python scripts/eval_bleu_rouge.py generated_predictions.jsonl
```
#### Use CLI ChatBox
```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
```
#### Use Web UI ChatBox
```bash
llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
```
#### Launch OpenAI-style API
```bash
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
```
### Extras

View File

@@ -0,0 +1,22 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_cpu_ram_efficient_loading: true
fsdp_offload_params: false
fsdp_reshard_after_forward: true
fsdp_state_dict_type: FULL_STATE_DICT
fsdp_version: 2
machine_rank: 0
main_training_function: main
mixed_precision: bf16 # or fp16
num_machines: 1 # the number of nodes
num_processes: 2 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
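A minimal launch sketch for this config; the training YAML chosen here is one of this repo's examples, and the Ascend examples later in this diff use the same pattern:
```bash
accelerate launch \
    --config_file examples/accelerate/fsdp2_config.yaml \
    src/train.py examples/train_full/qwen3_full_sft.yaml
```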

View File

@@ -0,0 +1,34 @@
# If you want to run this example on multiple nodes, you need to set the following parameters:
# - num_machines: the number of nodes
# - num_processes: the number of GPUs in all nodes, num_machines * num_processes_per_machine
# - main_process_ip: the IP address of the main process, please keep it the same across all nodes
# - main_process_port: the port of all nodes, please keep it the same across all nodes
# - machine_rank: the rank of the current machine, starting from 0, and it should be 0 for main_process_ip
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_forward_prefetch: false
fsdp_cpu_ram_efficient_loading: true
fsdp_offload_params: false
fsdp_sharding_strategy: FULL_SHARD
fsdp_state_dict_type: FULL_STATE_DICT
fsdp_sync_module_states: true
fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16 # or fp16
main_process_ip: 192.168.0.1
main_process_port: 29500
num_machines: 2 # the number of nodes
num_processes: 16 # the number of GPUs in all nodes, num_machines * num_processes_per_machine
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

View File

@@ -0,0 +1,45 @@
# Start FSDP2 fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp2_config.yaml \
# src/train.py examples/ascend/qwen3_full_sft_fsdp2.yaml
# Change `num_processes` in fsdp2_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-8B
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
### dataset
dataset: alpaca_en_demo
template: qwen3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-8B/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 1800
resume_from_checkpoint: null

View File

@@ -0,0 +1,46 @@
# Start FSDP fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp_config.yaml \
# src/train.py examples/ascend/qwen3moe_full_sft_fsdp.yaml
# Change `num_processes` in fsdp_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-30B-A3B-Instruct-2507
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
disable_gradient_checkpointing: false
### dataset
dataset: alpaca_zh
template: qwen3
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-30B-A3B-Instruct-2507/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234

View File

@@ -0,0 +1,48 @@
# Start FSDP2 fine-tuning
# accelerate launch \
# --config_file examples/accelerate/fsdp2_config.yaml \
# src/train.py examples/ascend/qwen3vlmoe_full_sft_fsdp2.yaml
# Change `num_processes` in fsdp2_config.yaml to 16 in A3
### model
model_name_or_path: Qwen/Qwen3-VL-30B-A3B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
use_v1_kernels: true
flash_attn: fa2
### method
stage: sft
do_train: true
finetuning_type: full
disable_gradient_checkpointing: false
### dataset
dataset: llava_1k_en, llava_1k_zh
template: qwen3_vl
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/Qwen3-VL-30B-A3B-Instruct/full/sft
logging_steps: 1
save_steps: 500
max_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234

View File

@@ -0,0 +1,42 @@
### model
model_name_or_path: Qwen/Qwen3-VL-30B-A3B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
use_v1_kernels: true # replaced kernels: [NpuRMSNormKernel, NpuRoPEKernel, NpuQwen3VLMoEFusedMoEKernel]
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
disable_gradient_checkpointing: false
flash_attn: disabled
### dataset
dataset: alpaca_zh_demo, alpaca_en_demo
template: qwen3_vl
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen3vlmoe/lora/sft
logging_steps: 1
plot_loss: true
overwrite_output_dir: true
save_only_model: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234
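Unlike the FSDP examples above, this LoRA config carries no launch comment; a plain CLI launch should suffice (the YAML path is hypothetical, mirroring where the surrounding Ascend examples live):
```bash
llamafactory-cli train examples/ascend/qwen3vlmoe_lora_sft.yaml
```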

View File

@@ -0,0 +1,32 @@
{
"_comment": "suooprted model list: https://www.deepspeed.ai/tutorials/automatic-tensor-parallelism/#supported-models",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": "auto"
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"round_robin_gradients": true
},
"tensor_parallel": {
"autotp_size": 2
}
}
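To use this automatic tensor-parallel config, point a training YAML's `deepspeed` key at it, the same way other examples reference `ds_z3_config.json`; the file name below is an assumption:
```bash
llamafactory-cli train examples/train_full/qwen3_full_sft.yaml \
    deepspeed=examples/deepspeed/ds_z2_autotp_config.json
```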

View File

@@ -1,5 +0,0 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,4 +1,4 @@
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
template: qwen2_vl
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,4 +1,4 @@
model_name_or_path: saves/llama3-8b/full/sft
template: llama3
model_name_or_path: saves/qwen3-4b/full/sft
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -0,0 +1,5 @@
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,4 +1,4 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
template: qwen3_vl_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -0,0 +1,10 @@
model_name_or_path: Qwen/Qwen3-235B-A22B-Instruct-2507
adapter_name_or_path: saves/Kllama_Qwen3MoE_235bA22b
template: qwen3_nothink
infer_backend: ktransformers # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true
use_kt: true # use KTransformers as the LoRA SFT backend for inference
kt_optimize_rule: examples/kt_optimize_rules/Qwen3Moe-sft-amx.yaml
cpu_infer: 32
chunk_size: 8192

View File

@@ -0,0 +1,80 @@
- match:
class: ktransformers.models.modeling_qwen2_moe.Qwen2MoeRotaryEmbedding
replace:
class: ktransformers.operators.RoPE.RotaryEmbedding
kwargs:
generate_device: "cuda"
prefill_device: "cuda"
- match:
name: "^lm_head$" # regular expression
class: torch.nn.Linear # only match modules matching name and class simultaneously
replace:
class: ktransformers.operators.linear.KTransformersLinear # optimized Kernel on quantized data types
kwargs:
generate_device: "cuda"
prefill_device: "cuda"
generate_op: "KLinearTorch"
prefill_op: "KLinearTorch"
# - match:
# name: "^model\\.layers\\..*$" # regular expression
# class: torch.nn.Linear # only match modules matching name and class simultaneously
# replace:
# class: ktransformers.operators.linear.KTransformersLinear # optimized Kernel on quantized data types
# kwargs:
# generate_device: "cuda"
# prefill_device: "cuda"
# generate_op: "KLinearTorch"
# prefill_op: "KLinearTorch"
- match:
name: "^model\\.layers\\.(?!.*mlp\\.shared_expert_gate).*$" # regular expression
class: torch.nn.Linear # only match modules matching name and class simultaneously
replace:
class: ktransformers.operators.linear.KTransformersLinear # optimized Kernel on quantized data types
kwargs:
generate_device: "cuda"
prefill_device: "cuda"
generate_op: "KLinearTorch"
prefill_op: "KLinearTorch"
- match:
name: "^model\\.layers\\..*\\.mlp$"
replace:
class: ktransformers.operators.experts.KQwen3MoeSparseMoeBlock # mlp module with custom forward function
kwargs:
generate_device: "cuda"
prefill_device: "cuda"
- match:
name: "^model\\.layers\\..*\\.mlp\\.experts$"
replace:
class: ktransformers.operators.experts.KTransformersExperts # custom MoE Kernel with expert parallelism
kwargs:
prefill_device: "cuda"
prefill_op: "KExpertsTorch"
generate_device: "cpu"
generate_op: "KSFTExpertsCPU"
out_device: "cuda"
backend: "AMXInt8" # or "AMXBF16" or "AMXInt8"
recursive: False # don't recursively inject submodules of this module
- match:
name: "^model\\.layers\\..*\\.self_attn$"
replace:
class: ktransformers.operators.attention.KQwen3MoeAttention # optimized attention implementation
kwargs:
generate_device: "cuda"
prefill_device: "cuda"
- match:
name: "^model.embed_tokens"
replace:
class: "default"
kwargs:
generate_device: "cpu"
prefill_device: "cpu"
- match:
name: "^model$"
replace:
class: "ktransformers.operators.models.KQwen3MoeModel"
kwargs:
per_layer_prefill_intput_threshold: 0

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-235B-A22B-Instruct-2507
trust_remote_code: true
### method
@@ -10,18 +10,18 @@ lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
dataset: identity, alpaca_en_demo
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/Kllama_Qwen3MoE_235bA22b
logging_steps: 10
save_steps: 500
save_steps: 200
plot_loss: true
overwrite_output_dir: true
save_only_model: false
@@ -31,13 +31,19 @@ report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### ktransformers
use_kt: true # use KTransformers as the LoRA SFT backend
kt_optimize_rule: examples/kt_optimize_rules/Qwen3Moe-sft-amx.yaml
cpu_infer: 32
chunk_size: 8192
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1

View File

@@ -1,10 +1,10 @@
### model
model_name_or_path: saves/llama3-8b/full/sft
template: llama3
model_name_or_path: saves/qwen3-4b/full/sft
template: qwen3_nothink
trust_remote_code: true
### export
export_dir: output/llama3_full_sft
export_dir: saves/qwen3_sft_merged
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false

View File

@@ -1,10 +1,10 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
template: qwen3_nothink
trust_remote_code: true
### export
export_dir: output/llama3_gptq
export_dir: saves/qwen3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.jsonl
export_size: 5

View File

@@ -1,13 +1,13 @@
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
adapter_name_or_path: saves/qwen2_5vl-7b/lora/sft
template: qwen2_vl
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3_nothink
trust_remote_code: true
### export
export_dir: output/qwen2_5vl_lora_sft
export_dir: saves/qwen3_sft_merged
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false

View File

@@ -1,13 +1,13 @@
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
adapter_name_or_path: saves/qwen3-vl-4b/lora/sft
template: qwen3_vl_nothink
trust_remote_code: true
### export
export_dir: output/llama3_lora_sft
export_dir: saves/qwen3_vl_sft_merged
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false

View File

@@ -0,0 +1 @@
adam-mini

View File

@@ -0,0 +1 @@
apollo-torch

View File

@@ -0,0 +1 @@
aqlm[gpu]>=1.1.0

View File

@@ -0,0 +1 @@
badam>=1.2.1

View File

@@ -0,0 +1 @@
bitsandbytes>=0.39.0

View File

@@ -0,0 +1 @@
eetq

View File

@@ -0,0 +1,2 @@
transformer_engine[pytorch]>=2.0.0
accelerate>=1.10.0

View File

@@ -0,0 +1,2 @@
torchao>=0.8.0
accelerate>=1.10.0

View File

@@ -0,0 +1 @@
galore-torch

View File

@@ -0,0 +1,2 @@
optimum>=1.24.0
gptqmodel>=2.0.0

View File

@@ -0,0 +1 @@
hqq

View File

@@ -0,0 +1 @@
liger-kernel>=0.5.5

View File

@@ -0,0 +1,8 @@
soundfile
torchvision
torchaudio
vector_quantize_pytorch
vocos
msgpack
referencing
jsonschema_specifications

View File

@@ -0,0 +1 @@
openmind

View File

@@ -0,0 +1,2 @@
sglang[srt]>=0.4.5
transformers==4.51.1

View File

@@ -0,0 +1 @@
swanlab

View File

@@ -0,0 +1 @@
vllm>=0.4.3,<=0.11.0

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -10,15 +10,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/full/sft
output_dir: saves/qwen3-4b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
@@ -15,15 +15,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: qwen2_vl
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen2_5vl-7b/full/sft
output_dir: saves/qwen3-vl-4b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,19 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true
### method
finetuning_type: lora
### dataset
task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5
### output
save_dir: saves/llama3-8b/lora/eval
### eval
batch_size: 4

View File

@@ -1,43 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
reward_model: saves/llama3-8b/lora/reward
trust_remote_code: true
### method
stage: ppo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/ppo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### generate
max_new_tokens: 512
top_k: 0
top_p: 0.9

View File

@@ -1,49 +0,0 @@
# pip install git+https://github.com/hiyouga/transformers.git@llama4_train
### model
model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: llama4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama4-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -13,15 +13,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
### dataset
dataset: dpo_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/dpo
output_dir: saves/qwen3-4b/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -12,15 +12,14 @@ pref_beta: 0.1
### dataset
dataset: kto_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/kto
output_dir: saves/qwen3-4b/lora/kto
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -13,12 +13,11 @@ lora_target: all
dataset: c4_demo
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/pretrain
output_dir: saves/qwen3-4b/lora/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -11,15 +11,14 @@ lora_target: all
### dataset
dataset: dpo_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/reward
output_dir: saves/qwen3-4b/lora/reward
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -2,7 +2,7 @@
set -x
MODEL_PATH=meta-llama/Meta-Llama-3-8B-Instruct
MODEL_PATH=Qwen/Qwen3-4B-Instruct-2507
llamafactory-cli train \
--model_name_or_path ${MODEL_PATH} \
@@ -13,13 +13,12 @@ llamafactory-cli train \
--lora_rank 8 \
--lora_target all \
--dataset identity,alpaca_en_demo \
--template llama3 \
--template qwen3_nothink \
--cutoff_len 2048 \
--max_samples 1000 \
--overwrite_cache \
--preprocessing_num_workers 16 \
--dataloader_num_workers 4 \
--output_dir saves/llama3-8b/lora/sft \
--output_dir saves/qwen3-4b/lora/sft \
--logging_steps 10 \
--save_steps 500 \
--plot_loss \

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: openai/gpt-oss-20b
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -11,15 +11,14 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: gpt
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/gpt-20b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -12,15 +12,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct # or use local absolute path
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507 # or use local absolute path
trust_remote_code: true
### method
@@ -12,10 +12,9 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
dataset_dir: REMOTE:llamafactory/demo_data # or use local absolute path
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
@@ -29,7 +28,7 @@ save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### ray
ray_run_name: llama3_8b_sft_lora
ray_run_name: qwen3_4b_sft_lora
ray_storage_path: ./saves
ray_num_workers: 4 # Number of GPUs to use.
placement_strategy: PACK

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -11,13 +11,12 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
tokenized_path: saves/llama3-8b/dataset/sft
tokenized_path: saves/qwen3-4b/dataset/sft
### output
output_dir: saves/llama3-8b/lora/sft
### output (not used)
output_dir: saves/qwen3-4b/lora/sft
overwrite_output_dir: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
@@ -15,15 +15,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
### dataset
dataset: rlhf_v
template: qwen2_vl
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen2_5vl-7b/lora/dpo
output_dir: saves/qwen3-vl-4b/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
### dataset
dataset: mllm_demo,identity,alpaca_en_demo # video: mllm_video_demo
template: qwen2_vl
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen2_5vl-7b/lora/sft
output_dir: saves/qwen3-vl-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

View File

@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

View File

@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
quantization_bit: 4
quantization_method: bnb
double_quantization: false
@@ -14,15 +14,14 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
quantization_bit: 4 # choices: [8 (bnb/hqq/eetq), 4 (bnb/hqq), 3 (hqq), 2 (hqq)]
quantization_method: bnb # choices: [bnb, hqq, eetq]
trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,42 +1,122 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "llamafactory"
requires-python = ">=3.9.0"
dynamic = [
"version",
"dependencies",
"optional-dependencies",
"scripts",
"authors",
"description",
"readme",
"license",
"keywords",
"classifiers"
dynamic = ["version"]
description = "Unified Efficient Fine-Tuning of 100+ LLMs"
readme = "README.md"
license = "Apache-2.0"
requires-python = ">=3.11.0"
authors = [
{ name = "hiyouga", email = "hiyouga@buaa.edu.cn" }
]
keywords = [
"AI",
"LLM",
"GPT",
"ChatGPT",
"Llama",
"Transformer",
"DeepSeek",
"Pytorch"
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Intended Audience :: Education",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Topic :: Scientific/Engineering :: Artificial Intelligence"
]
dependencies = [
# core deps
"torch>=2.4.0",
"torchvision>=0.19.0",
"torchaudio>=2.4.0",
"transformers>=4.51.0,<=4.57.1,!=4.52.0,!=4.57.0",
"datasets>=2.16.0,<=4.0.0",
"accelerate>=1.3.0,<=1.11.0",
"peft>=0.14.0,<=0.17.1",
"trl>=0.18.0,<=0.24.0",
"torchdata>=0.10.0,<=0.11.0",
# gui
"gradio>=4.38.0,<=5.50.0",
"matplotlib>=3.7.0",
"tyro<0.9.0",
# ops
"einops",
"numpy",
"pandas",
"scipy",
# model and tokenizer
"sentencepiece",
"tiktoken",
"modelscope",
"hf-transfer",
"safetensors",
# python
"av",
"fire",
"omegaconf",
"packaging",
"protobuf",
"pyyaml",
"pydantic",
# api
"uvicorn",
"fastapi",
"sse-starlette"
]
[project.optional-dependencies]
dev = ["pre-commit", "ruff", "pytest", "build"]
metrics = ["nltk", "jieba", "rouge-chinese"]
deepspeed = ["deepspeed>=0.10.0,<=0.16.9"]
[project.scripts]
llamafactory-cli = "llamafactory.cli:main"
lmf = "llamafactory.cli:main"
[project.urls]
Homepage = "https://github.com/hiyouga/LLaMA-Factory"
Repository = "https://github.com/hiyouga/LLaMA-Factory"
[tool.hatch.build.targets.wheel]
packages = ["src/llamafactory"]
[tool.hatch.version]
path = "src/llamafactory/extras/env.py"
pattern = "VERSION = \"(?P<version>[^\"]+)\""
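# the pattern above assumes env.py defines the version as a plain string, e.g. VERSION = "x.y.z"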
[tool.ruff]
target-version = "py39"
target-version = "py311"
line-length = 119
indent-width = 4
[tool.ruff.lint]
ignore = [
"C408", # collection
"C901", # complex
"E501", # line too long
"E731", # lambda function
"E741", # ambiguous var name
"D100", # no doc public module
"D101", # no doc public class
"D102", # no doc public method
"D103", # no doc public function
"D104", # no doc public package
"D105", # no doc magic method
"D107", # no doc __init__
"C408", # collection
"C901", # complex
"E501", # line too long
"E731", # lambda function
"E741", # ambiguous var name
"UP007", # no upgrade union
"UP045", # no upgrade optional
"D100", # no doc public module
"D101", # no doc public class
"D102", # no doc public method
"D103", # no doc public function
"D104", # no doc public package
"D105", # no doc magic method
"D107", # no doc __init__
]
extend-select = [
"C", # complexity
@@ -73,23 +153,3 @@ indent-style = "space"
docstring-code-format = true
skip-magic-trailing-comma = false
line-ending = "auto"
[tool.uv]
conflicts = [
[
{ extra = "torch-npu" },
{ extra = "aqlm" },
],
[
{ extra = "torch-npu" },
{ extra = "vllm" },
],
[
{ extra = "torch-npu" },
{ extra = "sglang" },
],
[
{ extra = "vllm" },
{ extra = "sglang" },
],
]

View File

@@ -1,38 +0,0 @@
# core deps
transformers>=4.49.0,<=4.56.2,!=4.52.0; python_version < '3.10'
transformers>=4.49.0,<=4.57.1,!=4.52.0,!=4.57.0; python_version >= '3.10'
datasets>=2.16.0,<=4.0.0
accelerate>=1.3.0,<=1.11.0
peft>=0.14.0,<=0.17.1
trl>=0.8.6,<=0.9.6
# gui
gradio>=4.38.0,<=5.45.0
matplotlib>=3.7.0
tyro<0.9.0
# ops
einops
numpy<2.0.0
pandas>=2.0.0
scipy
# model and tokenizer
sentencepiece
tiktoken
modelscope>=1.14.0
hf-transfer
safetensors<=0.5.3
# python
fire
omegaconf
packaging
protobuf
pyyaml
pydantic<=2.10.6
# api
uvicorn
fastapi
sse-starlette
# media
av
librosa
# yanked
propcache!=0.4.0

View File

@@ -16,7 +16,6 @@
# limitations under the License.
import os
from typing import Optional
import fire
import torch
@@ -34,7 +33,7 @@ def convert_mca_to_hf(
output_path: str = "./output",
bf16: bool = False,
fp16: bool = False,
convert_model_max_length: Optional[int] = None,
convert_model_max_length: int | None = None,
):
"""Convert megatron checkpoint to HuggingFace format.
@@ -67,11 +66,11 @@ def convert(
output_path: str = "./output",
bf16: bool = False,
fp16: bool = False,
convert_model_max_length: Optional[int] = None,
convert_model_max_length: int | None = None,
tensor_model_parallel_size: int = 1,
pipeline_model_parallel_size: int = 1,
expert_model_parallel_size: int = 1,
virtual_pipeline_model_parallel_size: Optional[int] = None,
virtual_pipeline_model_parallel_size: int | None = None,
):
"""Convert checkpoint between MCA and HuggingFace formats.

View File

@@ -14,7 +14,7 @@
import json
from dataclasses import dataclass
from typing import Any, Literal, Optional
from typing import Any, Literal
import fire
import torch
@@ -61,7 +61,7 @@ def calculate_ppl(
dataset_dir: str = "data",
template: str = "default",
cutoff_len: int = 2048,
max_samples: Optional[int] = None,
max_samples: int | None = None,
train_on_prompt: bool = False,
):
r"""Calculate the ppl on the dataset of the pre-trained models.

View File

@@ -14,10 +14,12 @@
import gc
import json
from typing import Optional
import time
import av
import fire
from datasets import load_dataset
from eval_bleu_rouge import compute_metrics
from tqdm import tqdm
from transformers import Seq2SeqTrainingArguments
@@ -49,18 +51,19 @@ def vllm_infer(
dataset_dir: str = "data",
template: str = "default",
cutoff_len: int = 2048,
max_samples: Optional[int] = None,
max_samples: int | None = None,
vllm_config: str = "{}",
save_name: str = "generated_predictions.jsonl",
matrix_save_name: str | None = None,
temperature: float = 0.95,
top_p: float = 0.7,
top_k: int = 50,
max_new_tokens: int = 1024,
repetition_penalty: float = 1.0,
skip_special_tokens: bool = True,
default_system: Optional[str] = None,
default_system: str | None = None,
enable_thinking: bool = True,
seed: Optional[int] = None,
seed: int | None = None,
pipeline_parallel_size: int = 1,
image_max_pixels: int = 768 * 768,
image_min_pixels: int = 32 * 32,
@@ -118,6 +121,7 @@ def vllm_infer(
if isinstance(model_args.vllm_config, dict):
engine_args.update(model_args.vllm_config)
model_preparation_start_time = time.time()
llm = LLM(**engine_args)
# load datasets
@@ -143,6 +147,7 @@ def vllm_infer(
all_prompts, all_preds, all_labels = [], [], []
need_video_kwargs = _need_video_kwargs(template)
model_predict_start_time = time.time()
# Process in batches to avoid opening too many files at once
for i in tqdm(range(0, len(train_dataset), batch_size), desc="Processing batched inference"):
vllm_inputs, prompts, labels = [], [], []
@@ -219,6 +224,7 @@ def vllm_infer(
all_labels.extend(labels)
gc.collect()
model_predict_end_time = time.time()
# Write all results at once outside the loop
with open(save_name, "w", encoding="utf-8") as f:
for text, pred, label in zip(all_prompts, all_preds, all_labels):
@@ -228,6 +234,49 @@ def vllm_infer(
print(f"{len(all_prompts)} total generated results have been saved at {save_name}.")
print("*" * 70)
# Write aggregate metric results when matrix_save_name is not None.
# The output format mirrors src.llamafactory.train.sft.workflow.run_sft (lines 127~132):
# trainer.save_metrics("predict", predict_results.metrics)
#
# {
# "predict_bleu-4": 4.349975,
# "predict_model_preparation_time": 0.0128,
# "predict_rouge-1": 21.873359375,
# "predict_rouge-2": 4.144340625,
# "predict_rouge-l": 10.83949375,
# "predict_runtime": 131.664,
# "predict_samples_per_second": 0.076,
# "predict_steps_per_second": 0.008
# }
#
if matrix_save_name is not None:
predict_time = model_predict_end_time - model_predict_start_time
preparation_time = model_predict_start_time - model_preparation_start_time
start_time = time.time()
dataset = load_dataset("json", data_files=save_name, split="train")
dataset = dataset.map(compute_metrics, num_proc=8, remove_columns=dataset.column_names)
score_dict = dataset.to_dict()
average_score = {}
for task, scores in sorted(score_dict.items(), key=lambda x: x[0]):
score = sum(scores) / len(scores) if scores else 0.0
print(f"predict_{task}: {score:.4f}")
average_score["predict_" + task] = score
average_score["predict_model_preparation_time"] = preparation_time
average_score["predict_runtime"] = predict_time
num_steps = len(range(0, len(train_dataset), batch_size))
average_score["predict_samples_per_second"] = len(dataset) / predict_time if predict_time > 0 else 0.0
average_score["predict_steps_per_second"] = num_steps / predict_time if predict_time > 0 else 0.0
with open(matrix_save_name, "w", encoding="utf-8") as f:
json.dump(average_score, f, indent=4)
print("*" * 70)
print(f"\nDone in {time.time() - start_time:.3f}s.\nScore file saved to {matrix_save_name}.")
print("*" * 70)
if __name__ == "__main__":
fire.Fire(vllm_infer)

Some files were not shown because too many files have changed in this diff.