[release] Bye 2025 (#9702)

Yaowei Zheng
2025-12-31 22:22:40 +08:00
committed by GitHub
parent 000526908a
commit 95ac3f2373
59 changed files with 154 additions and 401 deletions

View File

@@ -27,7 +27,7 @@ jobs:
python:
- "3.11"
- "3.12"
# - "3.13" # enable after trl is upgraded
- "3.13"
os:
- "ubuntu-latest"
- "windows-latest"

View File

@@ -639,7 +639,7 @@ cd transformers
pip install .
```
3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
3. Set `double_quantization: false` in the configuration. You can refer to the [example](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml).
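The same flag can alternatively be passed on the command line using the `key=value` override syntax the CLI accepts (see the quickstart below); a minimal sketch:
```bash
# Disable double quantization via a CLI override instead of editing the YAML
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml \
    double_quantization=false
```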
</details>
@@ -654,12 +654,12 @@ You can also use **[Easy Dataset](https://github.com/ConardLi/easy-dataset)**, *
### Quickstart
Use the following three commands to run LoRA **fine-tuning**, **inference**, and **merging** of the Llama3-8B-Instruct model, respectively.
Use the following three commands to run LoRA **fine-tuning**, **inference**, and **merging** of the Qwen3-4B-Instruct model, respectively.
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
See [examples/README.md](examples/README.md) for advanced usage (including distributed training).
@@ -782,7 +782,7 @@ When building the Docker image, use `-v ./hf_cache:/root/.cache/huggingface` arg
### Deploy with OpenAI-style API and vLLM
```bash
API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
```
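Once the server is running, a minimal smoke test against the OpenAI-compatible endpoint might look like the following (the `model` field is illustrative; a single-model server typically accepts any name):
```bash
# Send one chat request to the OpenAI-style route on the API_PORT above
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-4B-Instruct-2507", "messages": [{"role": "user", "content": "Hello!"}]}'
```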
> [!TIP]

View File

@@ -641,7 +641,7 @@ cd transformers
pip install .
```
3. Set `double_quantization: false` in the training arguments. You can refer to the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).
3. Set `double_quantization: false` in the training arguments. You can refer to the [example](examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml).
</details>
@@ -656,12 +656,12 @@ pip install .
### Quickstart
Use the following three commands to run LoRA **fine-tuning**, **inference**, and **merging** of the Llama3-8B-Instruct model, respectively.
Use the following three commands to run LoRA **fine-tuning**, **inference**, and **merging** of the Qwen3-4B-Instruct model, respectively.
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
See [examples/README_zh.md](examples/README_zh.md) for advanced usage (including multi-GPU fine-tuning).
@@ -787,7 +787,7 @@ docker exec -it llamafactory bash
### Deploy an OpenAI-style API with vLLM
```bash
API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml infer_backend=vllm vllm_enforce_eager=true
API_PORT=8000 llamafactory-cli api examples/inference/qwen3.yaml infer_backend=vllm vllm_enforce_eager=true
```
> [!TIP]

View File

@@ -18,19 +18,19 @@ By default, LLaMA-Factory uses all visible computing devices.
Basic usage:
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
Advanced usage:
```bash
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
learning_rate=1e-5 \
logging_steps=1
```
```bash
bash examples/train_lora/llama3_lora_sft.sh
bash examples/train_lora/qwen3_lora_sft.sh
```
## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
#### (Continuous) Pre-Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
```
#### Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
#### Multimodal Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
```
#### DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
```
#### Multimodal DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
```
#### Reward Modeling
```bash
llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
```
#### PPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
```
#### KTO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
```
#### Preprocess Dataset
@@ -90,32 +84,26 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
This is useful for large datasets; use `tokenized_path` in the config to load the preprocessed dataset.
```bash
llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
```
#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
```bash
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
```
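Subsequent runs can then skip tokenization entirely. A minimal sketch, assuming the cache path from the example config and the CLI's `key=value` override syntax:
```bash
# Step 1: tokenize once and write the cache to disk
llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
# Step 2: reuse the cached tokens in an actual training run
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
    tokenized_path=saves/qwen3-4b/dataset/sft
```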
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
#### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
```
#### Supervised Fine-Tuning with Ray on 4 GPUs
```bash
USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
```
### QLoRA Fine-Tuning
@@ -123,13 +111,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
```
#### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on Ascend NPU
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
```
#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -155,14 +143,14 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
#### Supervised Fine-Tuning on Single Node
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -170,13 +158,13 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
To launch an elastic job that retries up to `MAX_RESTARTS` times on failure, run the following on at least `MIN_NNODES` and at most `MAX_NNODES` nodes. `RDZV_ID` should be set to a unique job ID (shared by all nodes participating in the job). See also [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html).
```bash
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
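For reference, these environment variables appear to map onto torchrun's elastic flags; an approximate equivalent, assuming `src/train.py` is the training entrypoint:
```bash
# Rough torchrun equivalent of the command above (flag mapping is an assumption)
torchrun --nnodes=1:3 --max-restarts=3 \
    --rdzv-id=llamafactory --rdzv-backend=c10d \
    --rdzv-endpoint=192.168.0.1:29500 \
    src/train.py examples/train_full/qwen3_full_sft.yaml
```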
#### Multimodal Supervised Fine-Tuning
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
```
### Merging LoRA Adapters and Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
Note: DO NOT use quantized model or `quantization_bit` when merging LoRA adapters.
```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
#### Quantizing Model using AutoGPTQ
```bash
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
```
### Save Ollama modelfile
```bash
llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
```
### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
#### Evaluation using vLLM's Multi-GPU Inference
```bash
python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
python scripts/eval_bleu_rouge.py generated_predictions.jsonl
```
#### Use CLI ChatBox
```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
```
#### Use Web UI ChatBox
```bash
llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
```
#### Launch OpenAI-style API
```bash
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
```
### Extras

View File

@@ -18,19 +18,19 @@ By default, LLaMA-Factory uses all visible computing devices.
Basic usage:
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
Advanced usage:
```bash
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml \
learning_rate=1e-5 \
logging_steps=1
```
```bash
bash examples/train_lora/llama3_lora_sft.sh
bash examples/train_lora/qwen3_lora_sft.sh
```
## Examples
@@ -40,49 +40,43 @@ bash examples/train_lora/llama3_lora_sft.sh
#### (Continuous) Pre-Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_pretrain.yaml
```
#### Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
#### Multimodal Supervised Fine-Tuning
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_sft.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_sft.yaml
```
#### DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_dpo.yaml
```
#### Multimodal DPO/ORPO/SimPO Training
```bash
llamafactory-cli train examples/train_lora/qwen2_5vl_lora_dpo.yaml
llamafactory-cli train examples/train_lora/qwen3vl_lora_dpo.yaml
```
#### Reward Modeling
```bash
llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
```
#### PPO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_reward.yaml
```
#### KTO Training
```bash
llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
llamafactory-cli train examples/train_lora/qwen3_lora_kto.yaml
```
#### Preprocess Dataset
@@ -90,20 +84,14 @@ llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
This is useful for large datasets; use `tokenized_path` in the config to load the preprocessed dataset.
```bash
llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
```
#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks
```bash
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
llamafactory-cli train examples/train_lora/qwen3_preprocess.yaml
```
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen3_lora_sft.yaml
```
### Elastic and Fault-Tolerant Supervised Fine-Tuning on Multiple Nodes
@@ -111,19 +99,19 @@ FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500
To launch a multi-node supervised fine-tuning job with node elasticity and fault tolerance, run the following command on every node. The number of nodes may range from `MIN_NNODES` to `MAX_NNODES`, and each node may restart up to `MAX_RESTARTS` times after a failure. `RDZV_ID` should be set to a unique job ID shared by all nodes participating in the job. For more information, see the official [torchrun](https://docs.pytorch.org/docs/stable/elastic/run.html) documentation.
```bash
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 MIN_NNODES=1 MAX_NNODES=3 MAX_RESTARTS=3 RDZV_ID=llamafactory MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Evenly Distributing VRAM with DeepSpeed ZeRO-3
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ds3.yaml
```
#### Supervised Fine-Tuning with Ray on 4 GPUs
```bash
USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
USE_RAY=1 llamafactory-cli train examples/train_lora/qwen3_lora_sft_ray.yaml
```
### QLoRA Fine-Tuning
@@ -131,13 +119,13 @@ USE_RAY=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ray.yaml
#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_otfq.yaml
```
#### Supervised Fine-Tuning with 4-bit Bitsandbytes Quantization on NPU
```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_bnb_npu.yaml
llamafactory-cli train examples/train_qlora/qwen3_lora_sft_bnb_npu.yaml
```
#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
@@ -163,20 +151,20 @@ llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
#### Supervised Fine-Tuning on a Single Node
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Supervised Fine-Tuning on Multiple Nodes
```bash
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/qwen3_full_sft.yaml
```
#### Multimodal Supervised Fine-Tuning
```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen3vl_full_sft.yaml
```
### Merging LoRA Adapters and Quantization
@@ -186,19 +174,19 @@ FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.y
Note: DO NOT use a quantized model or the `quantization_bit` argument when merging LoRA adapters.
```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_lora_sft.yaml
```
#### Quantizing Model using AutoGPTQ
```bash
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
llamafactory-cli export examples/merge_lora/qwen3_gptq.yaml
```
### Save Ollama Modelfile
```bash
llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
llamafactory-cli export examples/merge_lora/qwen3_full_sft.yaml
```
### Inferring LoRA Fine-Tuned Models
@@ -206,26 +194,26 @@ llamafactory-cli export examples/merge_lora/llama3_full_sft.yaml
#### Evaluation using vLLM's Multi-GPU Inference
```bash
python scripts/vllm_infer.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --template llama3 --dataset alpaca_en_demo
python scripts/vllm_infer.py --model_name_or_path Qwen/Qwen3-4B-Instruct-2507 --template qwen3_nothink --dataset alpaca_en_demo
python scripts/eval_bleu_rouge.py generated_predictions.jsonl
```
#### Use CLI ChatBox
```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/qwen3_lora_sft.yaml
```
#### Use Web UI ChatBox
```bash
llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
llamafactory-cli webchat examples/inference/qwen3_lora_sft.yaml
```
#### Launch OpenAI-style API
```bash
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
llamafactory-cli api examples/inference/qwen3_lora_sft.yaml
```
### Extras

View File

@@ -1,5 +0,0 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,4 +1,4 @@
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
template: qwen2_vl
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,4 +1,4 @@
model_name_or_path: saves/llama3-8b/full/sft
template: llama3
model_name_or_path: saves/qwen3-4b/full/sft
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -0,0 +1,5 @@
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,4 +1,4 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
template: qwen3_vl_nothink
infer_backend: huggingface # choices: [huggingface, vllm, sglang, ktransformers]
trust_remote_code: true

View File

@@ -1,10 +1,10 @@
### model
model_name_or_path: saves/llama3-8b/full/sft
template: llama3
model_name_or_path: saves/qwen3-4b/full/sft
template: qwen3_nothink
trust_remote_code: true
### export
export_dir: output/llama3_full_sft
export_dir: saves/qwen3_sft_merged
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false

View File

@@ -1,10 +1,10 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
template: qwen3_nothink
trust_remote_code: true
### export
export_dir: output/llama3_gptq
export_dir: saves/qwen3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.jsonl
export_size: 5

View File

@@ -1,13 +1,13 @@
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
adapter_name_or_path: saves/qwen2_5vl-7b/lora/sft
template: qwen2_vl
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
adapter_name_or_path: saves/qwen3-4b/lora/sft
template: qwen3_nothink
trust_remote_code: true
### export
export_dir: output/qwen2_5vl_lora_sft
export_dir: saves/qwen3_sft_merged
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false

View File

@@ -1,13 +1,13 @@
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
adapter_name_or_path: saves/qwen3-vl-4b/lora/sft
template: qwen3_vl_nothink
trust_remote_code: true
### export
export_dir: output/llama3_lora_sft
export_dir: saves/qwen3_vl_sft_merged
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -10,15 +10,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/full/sft
output_dir: saves/qwen3-4b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,46 +0,0 @@
### model
model_name_or_path: Qwen/Qwen3-32B
trust_remote_code: true
use_v1_kernels: true
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_autotp_config.json
### dataset
dataset: identity,alpaca_en_demo
template: qwen3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen3-32b/full/sft_autotp
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
@@ -15,15 +15,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: qwen2_vl
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen2_5vl-7b/full/sft
output_dir: saves/qwen3-vl-4b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,19 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
trust_remote_code: true
### method
finetuning_type: lora
### dataset
task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5
### output
save_dir: saves/llama3-8b/lora/eval
### eval
batch_size: 4

View File

@@ -1,43 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
reward_model: saves/llama3-8b/lora/reward
trust_remote_code: true
### method
stage: ppo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/ppo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### generate
max_new_tokens: 512
top_k: 0
top_p: 0.9

View File

@@ -1,46 +0,0 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500

View File

@@ -1,49 +0,0 @@
# pip install git+https://github.com/hiyouga/transformers.git@llama4_train
### model
model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]
### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: llama4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama4-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -13,15 +13,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
### dataset
dataset: dpo_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/dpo
output_dir: saves/qwen3-4b/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -12,15 +12,14 @@ pref_beta: 0.1
### dataset
dataset: kto_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/kto
output_dir: saves/qwen3-4b/lora/kto
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -13,12 +13,11 @@ lora_target: all
dataset: c4_demo
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/pretrain
output_dir: saves/qwen3-4b/lora/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -11,15 +11,14 @@ lora_target: all
### dataset
dataset: dpo_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/reward
output_dir: saves/qwen3-4b/lora/reward
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -2,7 +2,7 @@
set -x
MODEL_PATH=meta-llama/Meta-Llama-3-8B-Instruct
MODEL_PATH=Qwen/Qwen3-4B-Instruct-2507
llamafactory-cli train \
--model_name_or_path ${MODEL_PATH} \
@@ -13,13 +13,12 @@ llamafactory-cli train \
--lora_rank 8 \
--lora_target all \
--dataset identity,alpaca_en_demo \
--template llama3 \
--template qwen3_nothink \
--cutoff_len 2048 \
--max_samples 1000 \
--overwrite_cache \
--preprocessing_num_workers 16 \
--dataloader_num_workers 4 \
--output_dir saves/llama3-8b/lora/sft \
--output_dir saves/qwen3-4b/lora/sft \
--logging_steps 10 \
--save_steps 500 \
--plot_loss \

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: openai/gpt-oss-20b
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -11,15 +11,14 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: gpt
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/gpt-20b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -12,15 +12,14 @@ deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json,
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct # or use local absolute path
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507 # or use local absolute path
trust_remote_code: true
### method
@@ -12,10 +12,9 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
dataset_dir: REMOTE:llamafactory/demo_data # or use local absolute path
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
@@ -29,7 +28,7 @@ save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### ray
ray_run_name: llama3_8b_sft_lora
ray_run_name: qwen3_4b_sft_lora
ray_storage_path: ./saves
ray_num_workers: 4 # Number of GPUs to use.
placement_strategy: PACK

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
trust_remote_code: true
### method
@@ -11,13 +11,12 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
tokenized_path: saves/llama3-8b/dataset/sft
tokenized_path: saves/qwen3-4b/dataset/sft
### output
output_dir: saves/llama3-8b/lora/sft
### output (not used)
output_dir: saves/qwen3-4b/lora/sft
overwrite_output_dir: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
@@ -15,15 +15,14 @@ pref_loss: sigmoid # choices: [sigmoid (dpo), orpo, simpo]
### dataset
dataset: rlhf_v
template: qwen2_vl
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen2_5vl-7b/lora/dpo
output_dir: saves/qwen3-vl-4b/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
### dataset
dataset: mllm_demo,identity,alpaca_en_demo # video: mllm_video_demo
template: qwen2_vl
template: qwen3_vl_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/qwen2_5vl-7b/lora/sft
output_dir: saves/qwen3-vl-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

View File

@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

View File

@@ -14,7 +14,6 @@ dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
quantization_bit: 4
quantization_method: bnb
double_quantization: false
@@ -14,15 +14,14 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -1,5 +1,5 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
model_name_or_path: Qwen/Qwen3-4B-Instruct-2507
quantization_bit: 4 # choices: [8 (bnb/hqq/eetq), 4 (bnb/hqq), 3 (hqq), 2 (hqq)]
quantization_method: bnb # choices: [bnb, hqq, eetq]
trust_remote_code: true
@@ -13,15 +13,14 @@ lora_target: all
### dataset
dataset: identity,alpaca_en_demo
template: llama3
template: qwen3_nothink
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: saves/llama3-8b/lora/sft
output_dir: saves/qwen3-4b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true

View File

@@ -41,8 +41,7 @@ dependencies = [
"torch>=2.4.0",
"torchvision>=0.19.0",
"torchaudio>=2.4.0",
"transformers>=4.51.0,<=4.56.2,!=4.52.0; python_version < '3.10'",
"transformers>=4.51.0,<=4.57.1,!=4.52.0,!=4.57.0; python_version >= '3.10'",
"transformers>=4.51.0,<=4.57.1,!=4.52.0,!=4.57.0",
"datasets>=2.16.0,<=4.0.0",
"accelerate>=1.3.0,<=1.11.0",
"peft>=0.14.0,<=0.17.1",

View File

@@ -18,9 +18,10 @@ import time
import av
import fire
from datasets import load_dataset
from eval_bleu_rouge import compute_metrics
from tqdm import tqdm
from transformers import Seq2SeqTrainingArguments
from datasets import load_dataset
from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.extras.constants import IGNORE_INDEX
@@ -29,8 +30,6 @@ from llamafactory.extras.packages import is_vllm_available
from llamafactory.hparams import get_infer_args
from llamafactory.model import load_tokenizer
from eval_bleu_rouge import compute_metrics
if is_vllm_available():
from vllm import LLM, SamplingParams
@@ -235,10 +234,10 @@ def vllm_infer(
print(f"{len(all_prompts)} total generated results have been saved at {save_name}.")
print("*" * 70)
# Write all matrix results when matrix_save_name is not None,
# Write all matrix results when matrix_save_name is not None,
# The result matrix is referencing src.llamafactory.train.sft.workflow.run_sft # 127~132
# trainer.save_metrics("predict", predict_results.metrics)
#
#
# {
# "predict_bleu-4": 4.349975,
# "predict_model_preparation_time": 0.0128,
@@ -265,11 +264,11 @@ def vllm_infer(
print(f"predict_{task}: {score:.4f}")
average_score["predict_" + task] = score
average_score['predict_model_preparation_time'] = preparation_time
average_score['predict_runtime'] = predict_time
average_score["predict_model_preparation_time"] = preparation_time
average_score["predict_runtime"] = predict_time
num_steps = len(range(0, len(train_dataset), batch_size))
average_score['predict_samples_per_second'] = len(dataset) / predict_time if predict_time > 0 else 0.0
average_score['predict_steps_per_second'] = num_steps / predict_time if predict_time > 0 else 0.0
average_score["predict_samples_per_second"] = len(dataset) / predict_time if predict_time > 0 else 0.0
average_score["predict_steps_per_second"] = num_steps / predict_time if predict_time > 0 else 0.0
with open(matrix_save_name, "w", encoding="utf-8") as f:
json.dump(average_score, f, indent=4)
@@ -280,4 +279,4 @@ def vllm_infer(
if __name__ == "__main__":
fire.Fire(vllm_infer)
fire.Fire(vllm_infer)

View File

@@ -19,7 +19,7 @@
from collections import OrderedDict
VERSION = "0.9.4.dev0"
VERSION = "0.9.4"
def print_env() -> None:

View File

@@ -12,8 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import pathlib
import sys
from unittest.mock import patch
from llamafactory.v1.config.arg_parser import get_args