3046 Commits

Author SHA1 Message Date
jiaqiw09
8669a22e9c [fix] fix liger kernel patch for npu (#10583) 2026-06-16 18:21:52 +08:00
Hao Liang
897a44386c [docs] add DataFlow and DataFlex blog tutorials (#10582)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-16 14:20:36 +08:00
jiaqiw09
7a1e9630f2 [fix] update ascend doc link (#10572) 2026-06-15 13:55:53 +08:00
souljoy
cabe59a343 [model] add MiniCPM5-1B-Chat (#10558) 2026-06-10 16:18:27 +08:00
Co-Cl2
9ca4026efe [model] handle unsloth model loading fallback during checkpoint resume (#7156) (#10551) 2026-06-09 01:01:01 +08:00
Ximing Xing
0b7aaf8f6a [fix] correctly place new token embeddings when embedding is padded (#10547) 2026-06-05 10:47:51 +08:00
codingma
8a4f6a3da5 [model] add gemma-4-12B-it (#10549) 2026-06-04 23:43:20 +08:00
A1waysBeenHere
409e8a477f [model] Patch GDN for NPU (#10504)
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
2026-06-04 16:39:02 +08:00
Cui-yshoho
053d43c0ac [feat] support HyperParallel PT training and activation optimization (#10370) 2026-06-02 22:39:32 +08:00
Zhao73
a98a1ef101 [docs] fix README citation typo (#10540) 2026-06-01 21:04:53 +08:00
Yaowei Zheng
8ef7335b6a [misc] set dev version (#10533) 2026-05-31 00:16:07 +08:00
Yaowei Zheng
7af909522a [version] release v0.9.5 (#10532) v0.9.5 2026-05-30 23:57:09 +08:00
xvxuopop
e016d2480e [fix] Fix NPU FusedMoE and RMSNorm (#10512) 2026-05-30 21:42:54 +08:00
jiaqiw09
7d719182c9 [model] fix non-packing batch (bsz>1) for Qwen3.5 with flash attention (#10529) 2026-05-30 21:41:41 +08:00
jiaqiw09
01398eb18d [v1] fix padding free with sp (#10513) 2026-05-26 23:49:21 +08:00
cxy
8e68764b65 [v1] Implement dynamic padding-free strategy for batching (#10507)
Co-authored-by: cxy-thinkbook <xuanyuchen@seu.edu.cn>
2026-05-25 20:40:21 +08:00
Copilot
16ff5a23cb [fix] use getattr for profiler attrs to support MCA TrainingArguments (#10506)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
2026-05-21 17:26:29 +08:00
jiaqiw09
bdcb92d035 [v1] Add FlashAttention selection and implement normal / padding-free / dynamic batching (#10469) 2026-05-21 17:14:19 +08:00
sunyi0505
7e20db5735 [v1] support liger_kernel (#10493) 2026-05-21 11:44:56 +08:00
浮梦
2322bf1cc2 [v1] add cuda fused moe kernel, implementing with triton (#10481) 2026-05-20 20:49:42 +08:00
浮梦
368c48968f [callback] add torch profiler callback (#10463) 2026-05-20 20:47:52 +08:00
浮梦
8b5ea65770 [v1] support reward training stage (#10431)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-05-20 20:46:52 +08:00
Dennis Huang
40e786d016 [data] add missing return statement in MiniCPM V Plugin (#10500)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 01:50:00 +08:00
xvxuopop
6b9df75ab9 [docker] update npu docker (#10479) 2026-05-13 20:56:43 +08:00
马境远
ca50f22c38 [fix] Fix MiniCPM-V-4.6 image preprocessing behavior (#10478) 2026-05-12 11:35:23 +08:00
马境远
53e77a9bfa [model] support MiniCPM-V-4.6 (#10472) 2026-05-08 18:14:34 +08:00
浮梦
55bd4944b6 [fix] fix qwen3_6 template doc (#10470) 2026-05-08 11:47:02 +08:00
Tai An
7e09152275 fix(data/converter): handle None tool_calls in OpenAI-style messages (#10455) 2026-05-07 17:44:41 +08:00
simulikeit
1e503a982d [assets] correct typo in examples/README_zh.md (#10462) 2026-05-07 00:42:01 +08:00
luca-888
8752280dd7 [data] Optimize QwenVL video dataset preprocessing (#10404)
Co-authored-by: Kingsley <kingsleydodonow@gmail.com>
2026-05-03 18:36:56 +08:00
Kingsley
468723c5d9 [packing] fix GDN crash when meeting dummy image (#10453) 2026-05-01 12:10:13 +08:00
Peilin Li
887ee2b121 [refactor] Add KTransformers AMX MoE SFT support via Accelerate (#10430)
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 01:47:58 +08:00
Kingsley
6b08b948c9 [misc] bump transformers version upperbound (#10446) 2026-05-01 01:30:11 +08:00
Hertz
f7f3bfcbd7 [model] support Hy3-Preview (#10432) 2026-04-29 23:21:13 +08:00
Kingsley
3475198d1e [fa2] fix IMA when train qwen3_5 (#10448) 2026-04-29 20:20:55 +08:00
sunyi0505
50945ef850 [v1] fix device_mesh and sp for fsdp2 (#10429) 2026-04-28 11:20:11 +08:00
Octopus
2f0bef207a [export] handle NotImplementedError in export_model for transformers>=5.0 (fixes #10410) (#10438)
Co-authored-by: octo-patch <octo-patch@github.com>
2026-04-27 23:36:23 +08:00
curnane-lab
2092abc217 [npu] add Qwen3.5 support with Partial RoPE and Hybrid Attention (#10421)
Co-authored-by: Curnane <mingliangfu@users.noreply.github.com>
2026-04-27 23:36:07 +08:00
Kingsley
99464b3d03 [misc] code lint (#10439)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-27 14:07:31 +08:00
jiaqiw09
9a0cfdccfa [v1] fix init on meta in transformers v5 (#10414) 2026-04-27 00:37:09 +08:00
Kingsley
c8890c32db [data] support discard history cot for multiturn (#10435) 2026-04-27 00:32:44 +08:00
Kingsley
79c8332e4c [train] add qwen35 patch for neat_packing (#10436) 2026-04-27 00:31:49 +08:00
jiaqiw09
e0bc3c1971 [v1] fix epoch and steps (#10422) 2026-04-23 17:29:06 +08:00
浮梦
ecca167eb4 [model] support qwen3.6 models (#10415)
Co-authored-by: frozenleaves <frozen@Mac.local>
2026-04-22 19:44:01 +08:00
jiaqiw09
28a6ea1cdc [v1] add deepspeed zero3 trigger for low memory usage weight loading (#10300) 2026-04-21 14:09:52 +08:00
sunyi0505
f5d739b132 [v1] fix device mesh and clip_grad_norm for ulysses cp (#10366) 2026-04-21 10:54:54 +08:00
浮梦
c4bbac49b2 [v1] support resume training from checkpoint (#10280)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-20 20:28:08 +08:00
Cocoon-Break
c5aecaf31d [data] fix SeedToolUtils.tool_extractor returns content when no tool calls found (#10408)
Signed-off-by: Cocoon-Break <54054995+kuishou68@users.noreply.github.com>
2026-04-20 12:22:55 +08:00
Kingsley
436d26bc28 fix: projector lookup for gemma4 modules (#10382)
Co-authored-by: yiluoAK_47 <yiluoAK_47@163.com>
2026-04-12 08:32:14 +08:00
Kingsley
c109c061e5 [model] set mm_projectors for omni models (#10378) 2026-04-10 18:12:57 +08:00