CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

# Code style (auto-fix)
make style

# Code quality check (no modifications)
make quality

# Run all tests
make test

# Run a single test file
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/path/to/test_file.py

# Run tests matching a pattern
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ -k "test_name"

# License header check
make license

# Build package
make build

The project uses uv as the preferred package manager. Commands automatically use uv run / uvx if uv is available.

Architecture

LlamaFactory has two parallel architectures controlled by the USE_V1 environment variable:

v0 (default): api, webui > chat, eval, train > data, model > hparams > extras
v1 (experimental, USE_V1=1): trainers > core > accelerator, plugins, config > utils

Most active development happens in v0. The v1 architecture lives in src/llamafactory/v1/.

Entry Points

CLI entry point is llamafactory-cli / lmf → src/llamafactory/cli.py:main(), which dispatches to launcher.py based on USE_V1.

Available subcommands: train, chat, api, export, webchat, webui, env, version, help.

Training Flow (v0)

run_exp() [tuner.py]
  → read_args() → parse YAML/JSON config
  → get_train_args() → produces typed argument dataclasses
  → routes to: run_sft / run_dpo / run_ppo / run_rm / run_pt / run_kto
  → optional: export_model()

Training is invoked with a YAML config: llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

Configuration System

All training parameters are YAML/JSON config files. Argument parsing in src/llamafactory/hparams/parser.py produces four typed dataclasses:

ModelArguments — model/tokenizer selection, quantization
DataArguments — datasets, templates, preprocessing
FinetuningArguments — LoRA rank/target, training method (sft/dpo/ppo/rm/pt/kto)
TrainingArguments — extends HuggingFace's TrainingArguments

Key Modules

Module	Purpose
`src/llamafactory/model/loader.py`	Loads model + tokenizer; applies quantization, LoRA, patches
`src/llamafactory/model/patcher.py`	Model-specific compatibility patches
`src/llamafactory/data/template.py`	Prompt templates; `TEMPLATES` dict maps model family → format
`src/llamafactory/data/mm_plugin.py`	Multi-modal (image/video/audio) data handling
`src/llamafactory/data/processor/`	Per-stage data processors (supervised, pairwise, pretrain, etc.)
`src/llamafactory/train/sft/`	SFT trainer; other stages follow same structure
`src/llamafactory/chat/`	Inference engines: `hf_engine`, `vllm_engine`, `sglang_engine`, `kt_engine`
`src/llamafactory/extras/constants.py`	Enums and constants used across the project

Adding Support for a New Model

Add a prompt template to src/llamafactory/data/template.py in the TEMPLATES dict
Add any necessary model patches in src/llamafactory/model/patcher.py
Add multi-modal support in src/llamafactory/data/mm_plugin.py if needed

Distributed Training

Multi-GPU automatically uses torchrun. Additional backends:

Ray: Optional Ray cluster support
HyperParallel FSDP2: src/llamafactory/train/hyper_parallel/
Megatron-core: src/llamafactory/train/mca/

Testing

tests/ — v0 tests; tests_v1/ — v1 tests
Most training tests require GPU hardware
pytest markers: @pytest.mark.slow, @pytest.mark.runs_on(['cuda'])
Always set WANDB_DISABLED=true when running tests

Code Style

Ruff for linting and formatting (line length 119, Google-style docstrings)
Python 3.11+ syntax
Double quotes for strings
All new files must include Apache 2.0 license header (checked by make license)

3.8 KiB Raw Blame History