diff --git a/README.md b/README.md
index 9d63b967a..f53ac70d5 100644
--- a/README.md
+++ b/README.md
@@ -92,7 +92,7 @@ Read technical notes:
 ## Features
 
-- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
+- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen3, Qwen3-VL, DeepSeek, Gemma, GLM, Phi, etc.
 - **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
 - **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
 - **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [APOLLO](https://github.com/zhuhanqing/APOLLO), [Adam-mini](https://github.com/zyushun/Adam-mini), [Muon](https://github.com/KellerJordan/Muon), [OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and PiSSA.
@@ -279,11 +279,10 @@ Read technical notes:
 | Model | Model size | Template |
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
 | [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
 | [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
-| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
+| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
 | [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
@@ -295,7 +294,7 @@ Read technical notes:
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
-| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
+| [Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
 | [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
 | [Ling 2.0 (mini/flash)](https://huggingface.co/inclusionAI) | 16B/100B | bailing_v2 |
 | [LFM 2.5 (VL)](https://huggingface.co/LiquidAI) | 1.2B/1.6B | lfm2/lfm2_vl |
@@ -308,18 +307,17 @@ Read technical notes:
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
 | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
-| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
+| [MiniCPM 4](https://huggingface.co/openbmb) | 0.5B/8B | cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
 | [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
 | [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
 | [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
 | [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
-| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2 (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
 | [Qwen3 (MoE/Instruct/Thinking/Next)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
 | [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
 | [Qwen2.5-Omni](https://huggingface.co/Qwen) | 3B/7B | qwen2_omni |
@@ -328,9 +326,6 @@ Read technical notes:
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
-| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Youtu-LLM](https://huggingface.co/tencent/) | 2B | youtu |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
 
 > [!NOTE]
diff --git a/README_zh.md b/README_zh.md
index 751d14a92..52f75de00 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -94,7 +94,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 ## 项目特色
 
-- **多种模型**:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Qwen2-VL、DeepSeek、Yi、Gemma、ChatGLM、Phi 等等。
+- **多种模型**:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen3、Qwen3-VL、DeepSeek、Gemma、GLM、Phi 等等。
 - **集成方法**:(增量)预训练、(多模态)指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
 - **多种精度**:16 比特全参数微调、冻结微调、LoRA 微调和基于 AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ 的 2/3/4/5/6/8 比特 QLoRA 微调。
 - **先进算法**:[GaLore](https://github.com/jiaweizzhao/GaLore)、[BAdam](https://github.com/Ledzy/BAdam)、[APOLLO](https://github.com/zhuhanqing/APOLLO)、[Adam-mini](https://github.com/zyushun/Adam-mini)、[Muon](https://github.com/KellerJordan/Muon)、[OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft)、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ 和 PiSSA。
@@ -281,11 +281,10 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | 模型名 | 参数量 | Template |
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
 | [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
 | [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
-| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
+| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
 | [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
@@ -297,7 +296,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
-| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
+| [Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
 | [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
 | [Ling 2.0 (mini/flash)](https://huggingface.co/inclusionAI) | 16B/100B | bailing_v2 |
 | [LFM 2.5 (VL)](https://huggingface.co/LiquidAI) | 1.2B/1.6B | lfm2/lfm2_vl |
@@ -310,18 +309,17 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
 | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
-| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
+| [MiniCPM 4](https://huggingface.co/openbmb) | 0.5B/8B | cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
 | [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
 | [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
 | [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
 | [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
-| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2 (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
 | [Qwen3 (MoE/Instruct/Thinking/Next)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
 | [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
 | [Qwen2.5-Omni](https://huggingface.co/Qwen) | 3B/7B | qwen2_omni |
@@ -330,9 +328,6 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
-| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Youtu-LLM](https://huggingface.co/tencent/) | 2B | youtu |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
 
 > [!NOTE]
diff --git a/pyproject.toml b/pyproject.toml
index 0faa5d9e0..146474782 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -30,7 +30,6 @@ classifiers = [
     "License :: OSI Approved :: Apache Software License",
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3",
-    "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
     "Programming Language :: Python :: 3.13",
diff --git a/src/llamafactory/data/template.py b/src/llamafactory/data/template.py
index 665372ee8..c3a2bc1ba 100644
--- a/src/llamafactory/data/template.py
+++ b/src/llamafactory/data/template.py
@@ -649,42 +649,6 @@ register_template(
 )
 
 
-register_template(
-    name="aquila",
-    format_user=StringFormatter(slots=["Human: {{content}}###Assistant:"]),
-    format_assistant=StringFormatter(slots=["{{content}}###"]),
-    format_system=StringFormatter(slots=["System: {{content}}###"]),
-    default_system=(
-        "A chat between a curious human and an artificial intelligence assistant. "
-        "The assistant gives helpful, detailed, and polite answers to the human's questions."
-    ),
-    stop_words=["</s>"],
-)
-
-
-register_template(
-    name="atom",
-    format_user=StringFormatter(
-        slots=[{"bos_token"}, "Human: {{content}}\n", {"eos_token"}, {"bos_token"}, "Assistant:"]
-    ),
-    format_assistant=StringFormatter(slots=["{{content}}\n", {"eos_token"}]),
-)
-
-
-register_template(
-    name="baichuan",
-    format_user=StringFormatter(slots=[{"token": "<reserved_102>"}, "{{content}}", {"token": "<reserved_103>"}]),
-    efficient_eos=True,
-)
-
-
-register_template(
-    name="baichuan2",
-    format_user=StringFormatter(slots=["<reserved_106>{{content}}<reserved_107>"]),
-    efficient_eos=True,
-)
-
-
 register_template(
     name="bailing",
     format_user=StringFormatter(slots=["<role>HUMAN</role>{{content}}<role>ASSISTANT</role>"]),
@@ -712,20 +676,6 @@ register_template(
 )
 
 
-register_template(
-    name="belle",
-    format_user=StringFormatter(slots=["Human: {{content}}\n\nBelle: "]),
-    format_assistant=StringFormatter(slots=["{{content}}", {"eos_token"}, "\n\n"]),
-    format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
-)
-
-
-register_template(
-    name="bluelm",
-    format_user=StringFormatter(slots=[{"token": "[|Human|]:"}, "{{content}}", {"token": "[|AI|]:"}]),
-)
-
-
 register_template(
     name="breeze",
     format_user=StringFormatter(slots=["[INST] {{content}} [/INST] "]),
@@ -734,14 +684,6 @@ register_template(
 )
 
 
-register_template(
-    name="chatglm2",
-    format_user=StringFormatter(slots=["[Round {{idx}}]\n\n问:{{content}}\n\n答:"]),
-    format_prefix=EmptyFormatter(slots=[{"token": "[gMASK]"}, {"token": "sop"}]),
-    efficient_eos=True,
-)
-
-
 register_template(
     name="chatglm3",
     format_user=StringFormatter(slots=[{"token": "<|user|>"}, "\n", "{{content}}", {"token": "<|assistant|>"}]),
@@ -784,29 +726,6 @@ register_template(
 )
 
 
-register_template(
-    name="codegeex2",
-    format_prefix=EmptyFormatter(slots=[{"token": "[gMASK]"}, {"token": "sop"}]),
-)
-
-
-register_template(
-    name="codegeex4",
-    format_user=StringFormatter(slots=["<|user|>\n{{content}}<|assistant|>\n"]),
-    format_system=StringFormatter(slots=["<|system|>\n{{content}}"]),
-    format_function=FunctionFormatter(slots=["{{content}}"], tool_format="glm4"),
-    format_observation=StringFormatter(slots=["<|observation|>\n{{content}}<|assistant|>\n"]),
-    format_tools=ToolFormatter(tool_format="glm4"),
-    format_prefix=EmptyFormatter(slots=["[gMASK]<sop>"]),
-    default_system=(
-        "你是一位智能编程助手,你叫CodeGeeX。你会为用户回答关于编程、代码、计算机方面的任何问题,"
-        "并提供格式规范、可以执行、准确安全的代码,并在必要时提供详细的解释。"
-    ),
-    stop_words=["<|user|>", "<|observation|>"],
-    efficient_eos=True,
-)
-
-
 register_template(
     name="cohere",
     format_user=StringFormatter(
@@ -822,25 +741,6 @@ register_template(
 )
 
 
-register_template(
-    name="cpm",
-    format_user=StringFormatter(slots=["<用户>{{content}}<AI>"]),
-    format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
-)
-
-
-# copied from chatml template
-register_template(
-    name="cpm3",
-    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
-    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
-    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
-    format_observation=StringFormatter(slots=["<|im_start|>tool\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
-    format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
-    stop_words=["<|im_end|>"],
-)
-
-
 # copied from chatml template
 register_template(
     name="cpm4",
@@ -1238,23 +1138,6 @@ register_template(
 )
 
 
-register_template(
-    name="intern",
-    format_user=StringFormatter(slots=["<|User|>:{{content}}\n<|Bot|>:"]),
-    format_assistant=StringFormatter(slots=["{{content}}\n"]),
-    format_system=StringFormatter(slots=["<|System|>:{{content}}\n"]),
-    format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
-    default_system=(
-        "You are an AI assistant whose name is InternLM (书生·浦语).\n"
-        "- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory "
-        "(上海人工智能实验室). It is designed to be helpful, honest, and harmless.\n"
-        "- InternLM (书生·浦语) can understand and communicate fluently in the language "
-        "chosen by the user such as English and 中文."
-    ),
-    stop_words=["<eoa>"],
-)
-
-
 register_template(
     name="intern2",
     format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
@@ -1617,23 +1500,6 @@ register_template(
 )
 
 
-# copied from chatml template
-register_template(
-    name="marco",
-    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
-    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
-    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
-    format_observation=StringFormatter(slots=["<|im_start|>tool\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
-    default_system=(
-        "你是一个经过良好训练的AI助手,你的名字是Marco-o1."
-        "由阿里国际数字商业集团的AI Business创造.\n## 重要!!!!!\n"
-        "当你回答问题时,你的思考应该在<Thought>内完成,<Output>内输出你的结果。\n"
-        "<Thought>应该尽可能是英文,但是有2个特例,一个是对原文中的引用,另一个是是数学应该使用markdown格式,<Output>内的输出需要遵循用户输入的语言。\n"
-    ),
-    stop_words=["<|im_end|>"],
-)
-
-
 # copied from qwen template
 register_template(
     name="mimo",
@@ -1845,13 +1711,6 @@ register_template(
 )
 
 
-register_template(
-    name="orion",
-    format_user=StringFormatter(slots=["Human: {{content}}\n\nAssistant: ", {"eos_token"}]),
-    format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
-)
-
-
 register_template(
     name="paligemma",
     format_user=StringFormatter(slots=["{{content}}\n"]),
@@ -2156,41 +2015,6 @@ register_template(
 )
 
 
-# copied from llama3 template
-register_template(
-    name="skywork_o1",
-    format_user=StringFormatter(
-        slots=[
-            (
-                "<|start_header_id|>user<|end_header_id|>\n\n{{content}}<|eot_id|>"
-                "<|start_header_id|>assistant<|end_header_id|>\n\n"
-            )
-        ]
-    ),
-    format_assistant=StringFormatter(slots=["{{content}}<|eot_id|>"]),
-    format_system=StringFormatter(slots=["<|start_header_id|>system<|end_header_id|>\n\n{{content}}<|eot_id|>"]),
-    format_function=FunctionFormatter(slots=["{{content}}<|eot_id|>"], tool_format="llama3"),
-    format_observation=StringFormatter(
-        slots=[
-            (
-                "<|start_header_id|>ipython<|end_header_id|>\n\n{{content}}<|eot_id|>"
-                "<|start_header_id|>assistant<|end_header_id|>\n\n"
-            )
-        ]
-    ),
-    format_tools=ToolFormatter(tool_format="llama3"),
-    format_prefix=EmptyFormatter(slots=[{"bos_token"}]),
-    default_system=(
-        "You are Skywork-o1, a thinking model developed by Skywork AI, specializing in solving complex problems "
-        "involving mathematics, coding, and logical reasoning through deep thought. When faced with a user's request, "
-        "you first engage in a lengthy and in-depth thinking process to explore possible solutions to the problem. "
-        "After completing your thoughts, you then provide a detailed explanation of the solution process "
-        "in your response."
-    ),
-    stop_words=["<|eot_id|>", "<|eom_id|>"],
-)
-
-
 register_template(
     name="smollm",
     format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
@@ -2227,13 +2051,6 @@ register_template(
 )
 
 
-register_template(
-    name="telechat",
-    format_user=StringFormatter(slots=["<_user>{{content}}<_bot>"]),
-    format_system=StringFormatter(slots=["<_system>{{content}}<_end>"]),
-)
-
-
 register_template(
     name="telechat2",
     format_user=StringFormatter(slots=["<_user>{{content}}<_bot>"]),
@@ -2277,32 +2094,6 @@ register_template(
 )
 
 
-register_template(
-    name="xverse",
-    format_user=StringFormatter(slots=["Human: {{content}}\n\nAssistant: "]),
-)
-
-
-register_template(
-    name="yayi",
-    format_user=StringFormatter(slots=[{"token": "<|Human|>"}, ":\n{{content}}\n\n", {"token": "<|YaYi|>"}, ":"]),
-    format_assistant=StringFormatter(slots=["{{content}}\n\n"]),
-    format_system=StringFormatter(slots=[{"token": "<|System|>"}, ":\n{{content}}\n\n"]),
-    default_system=(
-        "You are a helpful, respectful and honest assistant named YaYi "
-        "developed by Beijing Wenge Technology Co.,Ltd. "
-        "Always answer as helpfully as possible, while being safe. "
-        "Your answers should not include any harmful, unethical, "
-        "racist, sexist, toxic, dangerous, or illegal content. "
-        "Please ensure that your responses are socially unbiased and positive in nature.\n\n"
-        "If a question does not make any sense, or is not factually coherent, "
-        "explain why instead of answering something not correct. "
-        "If you don't know the answer to a question, please don't share false information."
-    ),
-    stop_words=["<|End|>"],
-)
-
-
 # copied from chatml template
 register_template(
     name="yi",
@@ -2359,10 +2150,3 @@ register_template(
     format_system=StringFormatter(slots=["<|system|>\n{{content}}", {"eos_token"}]),
     default_system="You are Zephyr, a helpful assistant.",
 )
-
-
-register_template(
-    name="ziya",
-    format_user=StringFormatter(slots=["<human>:{{content}}\n<bot>:"]),
-    format_assistant=StringFormatter(slots=["{{content}}\n"]),
-)
diff --git a/src/llamafactory/extras/constants.py b/src/llamafactory/extras/constants.py
index 0208de822..a7a9f6337 100644
--- a/src/llamafactory/extras/constants.py
+++ b/src/llamafactory/extras/constants.py
@@ -181,51 +181,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "Baichuan-7B-Base": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan-7B",
-            DownloadSource.MODELSCOPE: "baichuan-inc/baichuan-7B",
-        },
-        "Baichuan-13B-Base": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan-13B-Base",
-            DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan-13B-Base",
-        },
-        "Baichuan-13B-Chat": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan-13B-Chat",
-            DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan-13B-Chat",
-        },
-    },
-    template="baichuan",
-)
-
-
-register_model_group(
-    models={
-        "Baichuan2-7B-Base": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-7B-Base",
-            DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-7B-Base",
-        },
-        "Baichuan2-13B-Base": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-13B-Base",
-            DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-13B-Base",
-            DownloadSource.OPENMIND: "Baichuan/Baichuan2_13b_base_pt",
-        },
-        "Baichuan2-7B-Chat": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-7B-Chat",
-            DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-7B-Chat",
-            DownloadSource.OPENMIND: "Baichuan/Baichuan2_7b_chat_pt",
-        },
-        "Baichuan2-13B-Chat": {
-            DownloadSource.DEFAULT: "baichuan-inc/Baichuan2-13B-Chat",
-            DownloadSource.MODELSCOPE: "baichuan-inc/Baichuan2-13B-Chat",
-            DownloadSource.OPENMIND: "Baichuan/Baichuan2_13b_chat_pt",
-        },
-    },
-    template="baichuan2",
-)
-
-
 register_model_group(
     models={
         "BLOOM-560M": {
@@ -262,21 +217,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "BlueLM-7B-Base": {
-            DownloadSource.DEFAULT: "vivo-ai/BlueLM-7B-Base",
-            DownloadSource.MODELSCOPE: "vivo-ai/BlueLM-7B-Base",
-        },
-        "BlueLM-7B-Chat": {
-            DownloadSource.DEFAULT: "vivo-ai/BlueLM-7B-Chat",
-            DownloadSource.MODELSCOPE: "vivo-ai/BlueLM-7B-Chat",
-        },
-    },
-    template="bluelm",
-)
-
-
 register_model_group(
     models={
         "Breeze-7B": {
@@ -290,17 +230,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "ChatGLM2-6B-Chat": {
-            DownloadSource.DEFAULT: "zai-org/chatglm2-6b",
-            DownloadSource.MODELSCOPE: "ZhipuAI/chatglm2-6b",
-        }
-    },
-    template="chatglm2",
-)
-
-
 register_model_group(
     models={
         "ChatGLM3-6B-Base": {
@@ -347,17 +276,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "CodeGeeX4-9B-Chat": {
-            DownloadSource.DEFAULT: "zai-org/codegeex4-all-9b",
-            DownloadSource.MODELSCOPE: "ZhipuAI/codegeex4-all-9b",
-        },
-    },
-    template="codegeex4",
-)
-
-
 register_model_group(
     models={
         "CodeGemma-7B": {
@@ -642,15 +560,15 @@ register_model_group(
 register_model_group(
     models={
-        "ERNIE-4.5-0.3B-PT": {
+        "ERNIE-4.5-0.3B-Instruct": {
             DownloadSource.DEFAULT: "baidu/ERNIE-4.5-0.3B-PT",
             DownloadSource.MODELSCOPE: "PaddlePaddle/ERNIE-4.5-0.3B-PT",
         },
-        "ERNIE-4.5-21B-A3B-PT": {
+        "ERNIE-4.5-21B-A3B-Instruct": {
             DownloadSource.DEFAULT: "baidu/ERNIE-4.5-21B-A3B-PT",
             DownloadSource.MODELSCOPE: "PaddlePaddle/ERNIE-4.5-21B-A3B-PT",
         },
-        "ERNIE-4.5-300B-A47B-PT": {
+        "ERNIE-4.5-300B-A47B-Instruct": {
             DownloadSource.DEFAULT: "baidu/ERNIE-4.5-300B-A47B-PT",
             DownloadSource.MODELSCOPE: "PaddlePaddle/ERNIE-4.5-300B-A47B-PT",
         },
@@ -661,7 +579,7 @@ register_model_group(
 register_model_group(
     models={
-        "ERNIE-4.5-VL-28B-A3B-PT": {
+        "ERNIE-4.5-VL-28B-A3B-Instruct": {
             DownloadSource.DEFAULT: "baidu/ERNIE-4.5-VL-28B-A3B-PT",
             DownloadSource.MODELSCOPE: "PaddlePaddle/ERNIE-4.5-VL-28B-A3B-PT",
         },
@@ -669,7 +587,7 @@ register_model_group(
             DownloadSource.DEFAULT: "baidu/ERNIE-4.5-VL-28B-A3B-Thinking",
             DownloadSource.MODELSCOPE: "PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking",
         },
-        "ERNIE-4.5-VL-424B-A47B-Base-PT": {
+        "ERNIE-4.5-VL-424B-A47B-Instruct": {
             DownloadSource.DEFAULT: "baidu/ERNIE-4.5-VL-424B-A47B-PT",
             DownloadSource.MODELSCOPE: "PaddlePaddle/ERNIE-4.5-VL-424B-A47B-PT",
         },
@@ -1266,29 +1184,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "InternLM-7B": {
-            DownloadSource.DEFAULT: "internlm/internlm-7b",
-            DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-7b",
-        },
-        "InternLM-20B": {
-            DownloadSource.DEFAULT: "internlm/internlm-20b",
-            DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-20b",
-        },
-        "InternLM-7B-Chat": {
-            DownloadSource.DEFAULT: "internlm/internlm-chat-7b",
-            DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-chat-7b",
-        },
-        "InternLM-20B-Chat": {
-            DownloadSource.DEFAULT: "internlm/internlm-chat-20b",
-            DownloadSource.MODELSCOPE: "Shanghai_AI_Laboratory/internlm-chat-20b",
-        },
-    },
-    template="intern",
-)
-
-
 register_model_group(
     models={
         "InternLM2-7B": {
@@ -1483,16 +1378,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "LingoWhale-8B": {
-            DownloadSource.DEFAULT: "deeplang-ai/LingoWhale-8B",
-            DownloadSource.MODELSCOPE: "DeepLang/LingoWhale-8B",
-        }
-    },
-)
-
-
 register_model_group(
     models={
         "LFM2.5-1.2B": {
@@ -1828,17 +1713,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "Marco-o1-Chat": {
-            DownloadSource.DEFAULT: "AIDC-AI/Marco-o1",
-            DownloadSource.MODELSCOPE: "AIDC-AI/Marco-o1",
-        },
-    },
-    template="marco",
-)
-
-
 register_model_group(
     models={
         "MiMo-7B-Base": {
@@ -1909,33 +1783,6 @@ register_model_group(
 )
 
 
-register_model_group(
-    models={
-        "MiniCPM-2B-SFT-Chat": {
-            DownloadSource.DEFAULT: "openbmb/MiniCPM-2B-sft-bf16",
-            DownloadSource.MODELSCOPE: "OpenBMB/miniCPM-bf16",
- }, - "MiniCPM-2B-DPO-Chat": { - DownloadSource.DEFAULT: "openbmb/MiniCPM-2B-dpo-bf16", - DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-2B-dpo-bf16", - }, - }, - template="cpm", -) - - -register_model_group( - models={ - "MiniCPM3-4B-Chat": { - DownloadSource.DEFAULT: "openbmb/MiniCPM3-4B", - DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM3-4B", - DownloadSource.OPENMIND: "LlamaFactory/MiniCPM3-4B", - }, - }, - template="cpm3", -) - - register_model_group( models={ "MiniCPM4-0.5B-Chat": { @@ -1973,26 +1820,10 @@ register_model_group( DownloadSource.DEFAULT: "openbmb/MiniCPM-V-2_6", DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-V-2_6", }, - }, - template="minicpm_v", - multimodal=True, -) - - -register_model_group( - models={ "MiniCPM-V-4": { DownloadSource.DEFAULT: "openbmb/MiniCPM-V-4", DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-V-4", }, - }, - template="minicpm_v", - multimodal=True, -) - - -register_model_group( - models={ "MiniCPM-V-4.5": { DownloadSource.DEFAULT: "openbmb/MiniCPM-V-4_5", DownloadSource.MODELSCOPE: "OpenBMB/MiniCPM-V-4_5", @@ -2250,33 +2081,6 @@ register_model_group( ) -register_model_group( - models={ - "Orion-14B-Base": { - DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Base", - DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Base", - }, - "Orion-14B-Chat": { - DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Chat", - DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Chat", - }, - "Orion-14B-Long-Chat": { - DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-LongChat", - DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-LongChat", - }, - "Orion-14B-RAG-Chat": { - DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Chat-RAG", - DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Chat-RAG", - }, - "Orion-14B-Plugin-Chat": { - DownloadSource.DEFAULT: "OrionStarAI/Orion-14B-Chat-Plugin", - DownloadSource.MODELSCOPE: "OrionStarAI/Orion-14B-Chat-Plugin", - }, - }, - template="orion", -) - - register_model_group( models={ "PaliGemma-3B-pt-224": { @@ -2373,20 +2177,6 
@@ register_model_group( ) -register_model_group( - models={ - "Phi-1.5-1.3B": { - DownloadSource.DEFAULT: "microsoft/phi-1_5", - DownloadSource.MODELSCOPE: "allspace/PHI_1-5", - }, - "Phi-2-2.7B": { - DownloadSource.DEFAULT: "microsoft/phi-2", - DownloadSource.MODELSCOPE: "AI-ModelScope/phi-2", - }, - } -) - - register_model_group( models={ "Phi-3-4B-4k-Instruct": { @@ -2465,228 +2255,6 @@ register_model_group( ) -register_model_group( - models={ - "Qwen-1.8B": { - DownloadSource.DEFAULT: "Qwen/Qwen-1_8B", - DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B", - }, - "Qwen-7B": { - DownloadSource.DEFAULT: "Qwen/Qwen-7B", - DownloadSource.MODELSCOPE: "Qwen/Qwen-7B", - }, - "Qwen-14B": { - DownloadSource.DEFAULT: "Qwen/Qwen-14B", - DownloadSource.MODELSCOPE: "Qwen/Qwen-14B", - }, - "Qwen-72B": { - DownloadSource.DEFAULT: "Qwen/Qwen-72B", - DownloadSource.MODELSCOPE: "Qwen/Qwen-72B", - }, - "Qwen-1.8B-Chat": { - DownloadSource.DEFAULT: "Qwen/Qwen-1_8B-Chat", - DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B-Chat", - }, - "Qwen-7B-Chat": { - DownloadSource.DEFAULT: "Qwen/Qwen-7B-Chat", - DownloadSource.MODELSCOPE: "Qwen/Qwen-7B-Chat", - }, - "Qwen-14B-Chat": { - DownloadSource.DEFAULT: "Qwen/Qwen-14B-Chat", - DownloadSource.MODELSCOPE: "Qwen/Qwen-14B-Chat", - }, - "Qwen-72B-Chat": { - DownloadSource.DEFAULT: "Qwen/Qwen-72B-Chat", - DownloadSource.MODELSCOPE: "Qwen/Qwen-72B-Chat", - }, - "Qwen-1.8B-Chat-Int8": { - DownloadSource.DEFAULT: "Qwen/Qwen-1_8B-Chat-Int8", - DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B-Chat-Int8", - }, - "Qwen-1.8B-Chat-Int4": { - DownloadSource.DEFAULT: "Qwen/Qwen-1_8B-Chat-Int4", - DownloadSource.MODELSCOPE: "Qwen/Qwen-1_8B-Chat-Int4", - }, - "Qwen-7B-Chat-Int8": { - DownloadSource.DEFAULT: "Qwen/Qwen-7B-Chat-Int8", - DownloadSource.MODELSCOPE: "Qwen/Qwen-7B-Chat-Int8", - }, - "Qwen-7B-Chat-Int4": { - DownloadSource.DEFAULT: "Qwen/Qwen-7B-Chat-Int4", - DownloadSource.MODELSCOPE: "Qwen/Qwen-7B-Chat-Int4", - }, - "Qwen-14B-Chat-Int8": { - 
DownloadSource.DEFAULT: "Qwen/Qwen-14B-Chat-Int8", - DownloadSource.MODELSCOPE: "Qwen/Qwen-14B-Chat-Int8", - }, - "Qwen-14B-Chat-Int4": { - DownloadSource.DEFAULT: "Qwen/Qwen-14B-Chat-Int4", - DownloadSource.MODELSCOPE: "Qwen/Qwen-14B-Chat-Int4", - }, - "Qwen-72B-Chat-Int8": { - DownloadSource.DEFAULT: "Qwen/Qwen-72B-Chat-Int8", - DownloadSource.MODELSCOPE: "Qwen/Qwen-72B-Chat-Int8", - }, - "Qwen-72B-Chat-Int4": { - DownloadSource.DEFAULT: "Qwen/Qwen-72B-Chat-Int4", - DownloadSource.MODELSCOPE: "Qwen/Qwen-72B-Chat-Int4", - }, - }, - template="qwen", -) - - -register_model_group( - models={ - "Qwen1.5-0.5B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B", - }, - "Qwen1.5-1.8B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B", - }, - "Qwen1.5-4B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B", - }, - "Qwen1.5-7B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B", - }, - "Qwen1.5-14B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B", - }, - "Qwen1.5-32B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-32B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-32B", - }, - "Qwen1.5-72B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B", - }, - "Qwen1.5-110B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-110B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-110B", - }, - "Qwen1.5-MoE-A2.7B": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-MoE-A2.7B", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-MoE-A2.7B", - }, - "Qwen1.5-0.5B-Chat": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B-Chat", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B-Chat", - }, - "Qwen1.5-1.8B-Chat": { - DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B-Chat", - DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B-Chat", - }, - "Qwen1.5-4B-Chat": { - 
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B-Chat",
-        },
-        "Qwen1.5-7B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B-Chat",
-        },
-        "Qwen1.5-14B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B-Chat",
-        },
-        "Qwen1.5-32B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-32B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-32B-Chat",
-        },
-        "Qwen1.5-72B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B-Chat",
-        },
-        "Qwen1.5-110B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-110B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-110B-Chat",
-        },
-        "Qwen1.5-MoE-A2.7B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-MoE-A2.7B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-MoE-A2.7B-Chat",
-        },
-        "Qwen1.5-0.5B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8",
-        },
-        "Qwen1.5-0.5B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-0.5B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-0.5B-Chat-AWQ",
-        },
-        "Qwen1.5-1.8B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8",
-        },
-        "Qwen1.5-1.8B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-1.8B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-1.8B-Chat-AWQ",
-        },
-        "Qwen1.5-4B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B-Chat-GPTQ-Int8",
-        },
-        "Qwen1.5-4B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-4B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-4B-Chat-AWQ",
-        },
-        "Qwen1.5-7B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",
-        },
-        "Qwen1.5-7B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-7B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-7B-Chat-AWQ",
-        },
-        "Qwen1.5-14B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B-Chat-GPTQ-Int8",
-        },
-        "Qwen1.5-14B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-14B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-14B-Chat-AWQ",
-        },
-        "Qwen1.5-32B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-32B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-32B-Chat-AWQ",
-        },
-        "Qwen1.5-72B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B-Chat-GPTQ-Int8",
-        },
-        "Qwen1.5-72B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-72B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-72B-Chat-AWQ",
-        },
-        "Qwen1.5-110B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-110B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-110B-Chat-AWQ",
-        },
-        "Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4": {
-            DownloadSource.DEFAULT: "Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
-            DownloadSource.MODELSCOPE: "Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
-        },
-        "CodeQwen1.5-7B": {
-            DownloadSource.DEFAULT: "Qwen/CodeQwen1.5-7B",
-            DownloadSource.MODELSCOPE: "Qwen/CodeQwen1.5-7B",
-        },
-        "CodeQwen1.5-7B-Chat": {
-            DownloadSource.DEFAULT: "Qwen/CodeQwen1.5-7B-Chat",
-            DownloadSource.MODELSCOPE: "Qwen/CodeQwen1.5-7B-Chat",
-        },
-        "CodeQwen1.5-7B-Chat-AWQ": {
-            DownloadSource.DEFAULT: "Qwen/CodeQwen1.5-7B-Chat-AWQ",
-            DownloadSource.MODELSCOPE: "Qwen/CodeQwen1.5-7B-Chat-AWQ",
-        },
-    },
-    template="qwen",
-)
 
 
 register_model_group(
     models={
         "Qwen2-0.5B": {
@@ -3454,27 +3022,6 @@ register_model_group(
 )
-
-
-register_model_group(
-    models={
-        "Skywork-13B-Base": {
-            DownloadSource.DEFAULT: "Skywork/Skywork-13B-base",
-            DownloadSource.MODELSCOPE: "skywork/Skywork-13B-base",
-        }
-    }
-)
-
-
-register_model_group(
-    models={
-        "Skywork-o1-Open-Llama-3.1-8B": {
-            DownloadSource.DEFAULT: "Skywork/Skywork-o1-Open-Llama-3.1-8B",
-            DownloadSource.MODELSCOPE: "AI-ModelScope/Skywork-o1-Open-Llama-3.1-8B",
-        }
-    },
-    template="skywork_o1",
-)
 
 
 register_model_group(
     models={
         "SmolLM-135M": {
@@ -3569,30 +3116,6 @@ register_model_group(
 )
-
-
-register_model_group(
-    models={
-        "TeleChat-1B-Chat": {
-            DownloadSource.DEFAULT: "Tele-AI/TeleChat-1B",
-            DownloadSource.MODELSCOPE: "TeleAI/TeleChat-1B",
-        },
-        "TeleChat-7B-Chat": {
-            DownloadSource.DEFAULT: "Tele-AI/telechat-7B",
-            DownloadSource.MODELSCOPE: "TeleAI/telechat-7B",
-            DownloadSource.OPENMIND: "TeleAI/TeleChat-7B-pt",
-        },
-        "TeleChat-12B-Chat": {
-            DownloadSource.DEFAULT: "Tele-AI/TeleChat-12B-v2",
-            DownloadSource.MODELSCOPE: "TeleAI/TeleChat-12B-v2",
-            DownloadSource.OPENMIND: "TeleAI/TeleChat-12B-pt",
-        },
-        "TeleChat-52B-Chat": {
-            DownloadSource.DEFAULT: "Tele-AI/TeleChat-52B",
-        },
-    },
-    template="telechat",
-)
 
 
 register_model_group(
     models={
         "TeleChat2-3B-Chat": {
@@ -3707,80 +3230,6 @@ register_model_group(
 )
-
-
-register_model_group(
-    models={
-        "XVERSE-7B": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-7B",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-7B",
-        },
-        "XVERSE-13B": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-13B",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-13B",
-        },
-        "XVERSE-65B": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-65B",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-65B",
-        },
-        "XVERSE-65B-2": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-65B-2",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-65B-2",
-        },
-        "XVERSE-7B-Chat": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-7B-Chat",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-7B-Chat",
-        },
-        "XVERSE-13B-Chat": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-13B-Chat",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-13B-Chat",
-        },
-        "XVERSE-65B-Chat": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-65B-Chat",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-65B-Chat",
-        },
-        "XVERSE-MoE-A4.2B": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-MoE-A4.2B",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-MoE-A4.2B",
-        },
-        "XVERSE-7B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-7B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-7B-Chat-GPTQ-Int8",
-        },
-        "XVERSE-7B-Chat-GPTQ-Int4": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-7B-Chat-GPTQ-Int4",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-7B-Chat-GPTQ-Int4",
-        },
-        "XVERSE-13B-Chat-GPTQ-Int8": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-13B-Chat-GPTQ-Int8",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-13B-Chat-GPTQ-Int8",
-        },
-        "XVERSE-13B-Chat-GPTQ-Int4": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-13B-Chat-GPTQ-Int4",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-13B-Chat-GPTQ-Int4",
-        },
-        "XVERSE-65B-Chat-GPTQ-Int4": {
-            DownloadSource.DEFAULT: "xverse/XVERSE-65B-Chat-GPTQ-Int4",
-            DownloadSource.MODELSCOPE: "xverse/XVERSE-65B-Chat-GPTQ-Int4",
-        },
-    },
-    template="xverse",
-)
-
-
-register_model_group(
-    models={
-        "Yayi-7B": {
-            DownloadSource.DEFAULT: "wenge-research/yayi-7b-llama2",
-            DownloadSource.MODELSCOPE: "AI-ModelScope/yayi-7b-llama2",
-        },
-        "Yayi-13B": {
-            DownloadSource.DEFAULT: "wenge-research/yayi-13b-llama2",
-            DownloadSource.MODELSCOPE: "AI-ModelScope/yayi-13b-llama2",
-        },
-    },
-    template="yayi",
-)
 
 
 register_model_group(
     models={
         "Yi-6B": {
diff --git a/src/llamafactory/v1/accelerator/interface.py b/src/llamafactory/v1/accelerator/interface.py
index 2837cbc74..b464198b2 100644
--- a/src/llamafactory/v1/accelerator/interface.py
+++ b/src/llamafactory/v1/accelerator/interface.py
@@ -35,7 +35,7 @@
 from torch.distributed import barrier, destroy_process_group, init_process_group
 from torch.distributed.device_mesh import DeviceMesh, init_device_mesh
 
 from ..utils import logging
-from ..utils.types import DistributedConfig, ProcessGroup, Tensor, TensorLike
+from ..utils.types import DistributedConfig, ProcessGroup, TensorLike
 from . import helper
@@ -214,7 +214,7 @@ class DistributedInterface:
         """Get parallel local world size."""
         return self._local_world_size
 
-    def all_gather(self, data: Tensor, dim: Dim | None = Dim.DP) -> Tensor:
+    def all_gather(self, data: TensorLike, dim: Dim | None = Dim.DP) -> TensorLike:
         """Gather tensor across specified parallel group."""
         if self.model_device_mesh is not None:
             return helper.operate_tensorlike(helper.all_gather, data, group=self.get_group(dim))