3.8 KiB
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Commands
# Code style (auto-fix)
make style
# Code quality check (no modifications)
make quality
# Run all tests
make test
# Run a single test file
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/path/to/test_file.py
# Run tests matching a pattern
WANDB_DISABLED=true pytest -vv --import-mode=importlib tests/ -k "test_name"
# License header check
make license
# Build package
make build
The project uses uv as the preferred package manager. Commands automatically use uv run / uvx if uv is available.
Architecture
LlamaFactory has two parallel architectures controlled by the USE_V1 environment variable:
- v0 (default):
api, webui > chat, eval, train > data, model > hparams > extras - v1 (experimental,
USE_V1=1):trainers > core > accelerator, plugins, config > utils
Most active development happens in v0. The v1 architecture lives in src/llamafactory/v1/.
Entry Points
CLI entry point is llamafactory-cli / lmf → src/llamafactory/cli.py:main(), which dispatches to launcher.py based on USE_V1.
Available subcommands: train, chat, api, export, webchat, webui, env, version, help.
Training Flow (v0)
run_exp() [tuner.py]
→ read_args() → parse YAML/JSON config
→ get_train_args() → produces typed argument dataclasses
→ routes to: run_sft / run_dpo / run_ppo / run_rm / run_pt / run_kto
→ optional: export_model()
Training is invoked with a YAML config: llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
Configuration System
All training parameters are YAML/JSON config files. Argument parsing in src/llamafactory/hparams/parser.py produces four typed dataclasses:
ModelArguments— model/tokenizer selection, quantizationDataArguments— datasets, templates, preprocessingFinetuningArguments— LoRA rank/target, training method (sft/dpo/ppo/rm/pt/kto)TrainingArguments— extends HuggingFace'sTrainingArguments
Key Modules
| Module | Purpose |
|---|---|
src/llamafactory/model/loader.py |
Loads model + tokenizer; applies quantization, LoRA, patches |
src/llamafactory/model/patcher.py |
Model-specific compatibility patches |
src/llamafactory/data/template.py |
Prompt templates; TEMPLATES dict maps model family → format |
src/llamafactory/data/mm_plugin.py |
Multi-modal (image/video/audio) data handling |
src/llamafactory/data/processor/ |
Per-stage data processors (supervised, pairwise, pretrain, etc.) |
src/llamafactory/train/sft/ |
SFT trainer; other stages follow same structure |
src/llamafactory/chat/ |
Inference engines: hf_engine, vllm_engine, sglang_engine, kt_engine |
src/llamafactory/extras/constants.py |
Enums and constants used across the project |
Adding Support for a New Model
- Add a prompt template to
src/llamafactory/data/template.pyin theTEMPLATESdict - Add any necessary model patches in
src/llamafactory/model/patcher.py - Add multi-modal support in
src/llamafactory/data/mm_plugin.pyif needed
Distributed Training
Multi-GPU automatically uses torchrun. Additional backends:
- Ray: Optional Ray cluster support
- HyperParallel FSDP2:
src/llamafactory/train/hyper_parallel/ - Megatron-core:
src/llamafactory/train/mca/
Testing
tests/— v0 tests;tests_v1/— v1 tests- Most training tests require GPU hardware
- pytest markers:
@pytest.mark.slow,@pytest.mark.runs_on(['cuda']) - Always set
WANDB_DISABLED=truewhen running tests
Code Style
- Ruff for linting and formatting (line length 119, Google-style docstrings)
- Python 3.11+ syntax
- Double quotes for strings
- All new files must include Apache 2.0 license header (checked by
make license)