3028 Commits

Author SHA1 Message Date
sunyi0505
7e20db5735 [v1] support liger_kernel (#10493) 2026-05-21 11:44:56 +08:00
浮梦
2322bf1cc2 [v1] add cuda fused moe kernel, implementing with triton (#10481) 2026-05-20 20:49:42 +08:00
浮梦
368c48968f [callback] add torch profiler callback (#10463) 2026-05-20 20:47:52 +08:00
浮梦
8b5ea65770 [v1] support reward training stage (#10431)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-20 20:46:52 +08:00
Dennis Huang
40e786d016 [data] add missing return statement in MiniCPM V Plugin (#10500)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 01:50:00 +08:00
xvxuopop
6b9df75ab9 [docker] update npu docker (#10479) 2026-05-13 20:56:43 +08:00
马境远
ca50f22c38 [fix] Fix MiniCPM-V-4.6 image preprocessing behavior (#10478) 2026-05-12 11:35:23 +08:00
马境远
53e77a9bfa [model] support MiniCPM-V-4.6 (#10472) 2026-05-08 18:14:34 +08:00
浮梦
55bd4944b6 [fix] fix qwen3_6 template doc (#10470) 2026-05-08 11:47:02 +08:00
Tai An
7e09152275 fix(data/converter): handle None tool_calls in OpenAI-style messages (#10455) 2026-05-07 17:44:41 +08:00
simulikeit
1e503a982d [assets] correct typo in examples/README_zh.md (#10462) 2026-05-07 00:42:01 +08:00
luca-888
8752280dd7 [data] Optimize QwenVL video dataset preprocessing (#10404)
Co-authored-by: Kingsley <kingsleydodonow@gmail.com>
2026-05-03 18:36:56 +08:00
Kingsley
468723c5d9 [packing] fix GDN crash when meeting dummy image (#10453) 2026-05-01 12:10:13 +08:00
Peilin Li
887ee2b121 [refactor] Add KTransformers AMX MoE SFT support via Accelerate (#10430)
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 01:47:58 +08:00
Kingsley
6b08b948c9 [misc] bump transformers version upperbound (#10446) 2026-05-01 01:30:11 +08:00
Hertz
f7f3bfcbd7 [model] support Hy3-Preview (#10432) 2026-04-29 23:21:13 +08:00
Kingsley
3475198d1e [fa2] fix IMA when train qwen3_5 (#10448) 2026-04-29 20:20:55 +08:00
sunyi0505
50945ef850 [v1] fix device_mesh and sp for fsdp2 (#10429) 2026-04-28 11:20:11 +08:00
Octopus
2f0bef207a [export] handle NotImplementedError in export_model for transformers>=5.0 (fixes #10410) (#10438)
Co-authored-by: octo-patch <octo-patch@github.com>
2026-04-27 23:36:23 +08:00
curnane-lab
2092abc217 [npu] add Qwen3.5 support with Partial RoPE and Hybrid Attention (#10421)
Co-authored-by: Curnane <mingliangfu@users.noreply.github.com>
2026-04-27 23:36:07 +08:00
Kingsley
99464b3d03 [misc] code lint (#10439)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-27 14:07:31 +08:00
jiaqiw09
9a0cfdccfa [v1] fix init on meta in transformers v5 (#10414) 2026-04-27 00:37:09 +08:00
Kingsley
c8890c32db [data] support discard history cot for multiturn (#10435) 2026-04-27 00:32:44 +08:00
Kingsley
79c8332e4c [train] add qwen35 patch for neat_packing (#10436) 2026-04-27 00:31:49 +08:00
jiaqiw09
e0bc3c1971 [v1] fix epoch and steps (#10422) 2026-04-23 17:29:06 +08:00
浮梦
ecca167eb4 [model] support qwen3.6 models (#10415)
Co-authored-by: frozenleaves <frozen@Mac.local>
2026-04-22 19:44:01 +08:00
jiaqiw09
28a6ea1cdc [v1] add deepspeed zero3 trigger for low memory usage weight loading (#10300) 2026-04-21 14:09:52 +08:00
sunyi0505
f5d739b132 [v1] fix device mesh and clip_grad_norm for ulysses cp (#10366) 2026-04-21 10:54:54 +08:00
浮梦
c4bbac49b2 [v1] support resume training from checkpoint (#10280)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-20 20:28:08 +08:00
Cocoon-Break
c5aecaf31d [data] fix SeedToolUtils.tool_extractor returns content when no tool calls found (#10408)
Signed-off-by: Cocoon-Break <54054995+kuishou68@users.noreply.github.com>
2026-04-20 12:22:55 +08:00
Kingsley
436d26bc28 fix: projector lookup for gemma4 modules (#10382)
Co-authored-by: yiluoAK_47 <yiluoAK_47@163.com>
2026-04-12 08:32:14 +08:00
Kingsley
c109c061e5 [model] set mm_projectors for omni models (#10378) 2026-04-10 18:12:57 +08:00
Kingsley
fa09c01c36 fix: gemma4 mm_token_type_ids padding (#10359) 2026-04-06 13:14:45 +08:00
Kingsley
eae6f0b541 [model] gemma4 (#10346) 2026-04-05 12:10:28 +08:00
Kingsley
acac63ef35 [data] fix qwen3vl timestamp (#10338) 2026-04-01 22:40:12 +08:00
浮梦
e5e8546493 [misc] fix moe (#10334)
Co-authored-by: frozenleaves <frozen@Mac.local>
2026-03-31 23:04:45 +08:00
Cui-yshoho
97433c53b6 [feat] support LlamaFactory SFT training by HyperParallel FSDP2 backend (#10289) 2026-03-30 10:47:20 +08:00
sunyi0505
b5afabe3d2 [v1] support ulysses cp for fsdp2 (#10262) 2026-03-27 16:22:48 +08:00
jiaqiw09
df2e6edb7e [v1] add init on rank0 for fsdp2 (#10264) 2026-03-27 14:54:03 +08:00
Goalina
d02fcd3588 [ci] add nginx cache config for Ascend NPU CI environment (#10323) 2026-03-27 10:04:16 +08:00
jiaqiw09
c340aa2a33 [v1] add callbacks (#10255) 2026-03-26 19:59:57 +08:00
Hertz
1e536733c6 [data] fix mimo-v2 tool call (#10315) 2026-03-26 17:37:22 +08:00
Yutong Wu
97d479fa92 [model] support Qwen3.5 liger kernel (#10313) 2026-03-24 18:25:33 +08:00
Kingsley
ffbff33af3 chore: mca workflow compatible with qwen-vl series (#10303) 2026-03-22 02:28:52 +08:00
Kingsley
833f6027b1 [fix] fit neat_packing & mrope model packing (#10283)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2026-03-20 16:50:11 +08:00
robertglools
d91d8af89e [data] add SGSC zero-hallucination B2B dataset (NOO-Protocol) (#10284)
Co-authored-by: GloolsGuan <GloolsGuan@gmail.com>
2026-03-20 15:49:03 +08:00
xxddccaa
e67ab9e2f2 fix:MiniCPMVPlugin IndexError in process_messages when training with video (#10276)
Co-authored-by: xxddccaa <xxddccaa@users.noreply.github.com>
2026-03-18 19:18:06 +08:00
LincolnBurrows2017
2c4f121817 [fix] handle empty content list in system message (#10291)
Co-authored-by: AI Assistant <assistant@example.com>
2026-03-18 12:05:49 +08:00
xvxuopop
487f8b8191 [v1] add qwen3 templates and fix rendering plugin. (#10212)
Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn>
2026-03-18 11:30:50 +08:00
SnowCharm
78cad1e332 [fix] unused keys in ray example (#10290) 2026-03-18 00:23:53 +08:00