jiaqiw09
|
8669a22e9c
|
[fix] fix liger kernel patch for npu (#10583)
|
2026-06-16 18:21:52 +08:00 |
|
Hao Liang
|
897a44386c
|
[docs] add DataFlow and DataFlex blog tutorials (#10582)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-06-16 14:20:36 +08:00 |
|
jiaqiw09
|
7a1e9630f2
|
[fix] update ascend doc link (#10572)
|
2026-06-15 13:55:53 +08:00 |
|
souljoy
|
cabe59a343
|
[model] add MiniCPM5-1B-Chat (#10558)
|
2026-06-10 16:18:27 +08:00 |
|
Co-Cl2
|
9ca4026efe
|
[model] handle unsloth model loading fallback during checkpoint resume (#7156) (#10551)
|
2026-06-09 01:01:01 +08:00 |
|
Ximing Xing
|
0b7aaf8f6a
|
[fix] correctly place new token embeddings when embedding is padded (#10547)
|
2026-06-05 10:47:51 +08:00 |
|
codingma
|
8a4f6a3da5
|
[model] add gemma-4-12B-it (#10549)
|
2026-06-04 23:43:20 +08:00 |
|
A1waysBeenHere
|
409e8a477f
|
[model] Patch GDN for NPU (#10504)
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
|
2026-06-04 16:39:02 +08:00 |
|
Cui-yshoho
|
053d43c0ac
|
[feat] support HyperParallel PT training and activation optimization (#10370)
|
2026-06-02 22:39:32 +08:00 |
|
Zhao73
|
a98a1ef101
|
[docs] fix README citation typo (#10540)
|
2026-06-01 21:04:53 +08:00 |
|
Yaowei Zheng
|
8ef7335b6a
|
[misc] set dev version (#10533)
|
2026-05-31 00:16:07 +08:00 |
|
Yaowei Zheng
|
7af909522a
|
[version] release v0.9.5 (#10532)
v0.9.5
|
2026-05-30 23:57:09 +08:00 |
|
xvxuopop
|
e016d2480e
|
[fix] Fix NPU FusedMoE and RMSNorm (#10512)
|
2026-05-30 21:42:54 +08:00 |
|
jiaqiw09
|
7d719182c9
|
[model] fix non-packing batch (bsz>1) for Qwen3.5 with flash attention (#10529)
|
2026-05-30 21:41:41 +08:00 |
|
jiaqiw09
|
01398eb18d
|
[v1] fix padding free with sp (#10513)
|
2026-05-26 23:49:21 +08:00 |
|
cxy
|
8e68764b65
|
[v1] Implement dynamic padding-free strategy for batching (#10507)
Co-authored-by: cxy-thinkbook <xuanyuchen@seu.edu.cn>
|
2026-05-25 20:40:21 +08:00 |
|
Copilot
|
16ff5a23cb
|
[fix] use getattr for profiler attrs to support MCA TrainingArguments (#10506)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
|
2026-05-21 17:26:29 +08:00 |
|
jiaqiw09
|
bdcb92d035
|
[v1] Add FlashAttention selection and implement normal / padding-free / dynamic batching (#10469)
|
2026-05-21 17:14:19 +08:00 |
|
sunyi0505
|
7e20db5735
|
[v1] support liger_kernel (#10493)
|
2026-05-21 11:44:56 +08:00 |
|
浮梦
|
2322bf1cc2
|
[v1] add cuda fused moe kernel, implementing with triton (#10481)
|
2026-05-20 20:49:42 +08:00 |
|
浮梦
|
368c48968f
|
[callback] add torch profiler callback (#10463)
|
2026-05-20 20:47:52 +08:00 |
|
浮梦
|
8b5ea65770
|
[v1] support reward training stage (#10431)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-20 20:46:52 +08:00 |
|
Dennis Huang
|
40e786d016
|
[data] add missing return statement in MiniCPM V Plugin (#10500)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-20 01:50:00 +08:00 |
|
xvxuopop
|
6b9df75ab9
|
[docker] update npu docker (#10479)
|
2026-05-13 20:56:43 +08:00 |
|
马境远
|
ca50f22c38
|
[fix] Fix MiniCPM-V-4.6 image preprocessing behavior (#10478)
|
2026-05-12 11:35:23 +08:00 |
|
马境远
|
53e77a9bfa
|
[model] support MiniCPM-V-4.6 (#10472)
|
2026-05-08 18:14:34 +08:00 |
|
浮梦
|
55bd4944b6
|
[fix] fix qwen3_6 template doc (#10470)
|
2026-05-08 11:47:02 +08:00 |
|
Tai An
|
7e09152275
|
fix(data/converter): handle None tool_calls in OpenAI-style messages (#10455)
|
2026-05-07 17:44:41 +08:00 |
|
simulikeit
|
1e503a982d
|
[assets] correct typo in examples/README_zh.md (#10462)
|
2026-05-07 00:42:01 +08:00 |
|
luca-888
|
8752280dd7
|
[data] Optimize QwenVL video dataset preprocessing (#10404)
Co-authored-by: Kingsley <kingsleydodonow@gmail.com>
|
2026-05-03 18:36:56 +08:00 |
|
Kingsley
|
468723c5d9
|
[packing] fix GDN crash when meeting dummy image (#10453)
|
2026-05-01 12:10:13 +08:00 |
|
Peilin Li
|
887ee2b121
|
[refactor] Add KTransformers AMX MoE SFT support via Accelerate (#10430)
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-05-01 01:47:58 +08:00 |
|
Kingsley
|
6b08b948c9
|
[misc] bump transformers version upperbound (#10446)
|
2026-05-01 01:30:11 +08:00 |
|
Hertz
|
f7f3bfcbd7
|
[model] support Hy3-Preview (#10432)
|
2026-04-29 23:21:13 +08:00 |
|
Kingsley
|
3475198d1e
|
[fa2] fix IMA when train qwen3_5 (#10448)
|
2026-04-29 20:20:55 +08:00 |
|
sunyi0505
|
50945ef850
|
[v1] fix device_mesh and sp for fsdp2 (#10429)
|
2026-04-28 11:20:11 +08:00 |
|
Octopus
|
2f0bef207a
|
[export] handle NotImplementedError in export_model for transformers>=5.0 (fixes #10410) (#10438)
Co-authored-by: octo-patch <octo-patch@github.com>
|
2026-04-27 23:36:23 +08:00 |
|
curnane-lab
|
2092abc217
|
[npu] add Qwen3.5 support with Partial RoPE and Hybrid Attention (#10421)
Co-authored-by: Curnane <mingliangfu@users.noreply.github.com>
|
2026-04-27 23:36:07 +08:00 |
|
Kingsley
|
99464b3d03
|
[misc] code lint (#10439)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-27 14:07:31 +08:00 |
|
jiaqiw09
|
9a0cfdccfa
|
[v1] fix init on meta in transformers v5 (#10414)
|
2026-04-27 00:37:09 +08:00 |
|
Kingsley
|
c8890c32db
|
[data] support discard history cot for multiturn (#10435)
|
2026-04-27 00:32:44 +08:00 |
|
Kingsley
|
79c8332e4c
|
[train] add qwen35 patch for neat_packing (#10436)
|
2026-04-27 00:31:49 +08:00 |
|
jiaqiw09
|
e0bc3c1971
|
[v1] fix epoch and steps (#10422)
|
2026-04-23 17:29:06 +08:00 |
|
浮梦
|
ecca167eb4
|
[model] support qwen3.6 models (#10415)
Co-authored-by: frozenleaves <frozen@Mac.local>
|
2026-04-22 19:44:01 +08:00 |
|
jiaqiw09
|
28a6ea1cdc
|
[v1] add deepspeed zero3 trigger for low memory usage weight loading (#10300)
|
2026-04-21 14:09:52 +08:00 |
|
sunyi0505
|
f5d739b132
|
[v1] fix device mesh and clip_grad_norm for ulysses cp (#10366)
|
2026-04-21 10:54:54 +08:00 |
|
浮梦
|
c4bbac49b2
|
[v1] support resume training from checkpoint (#10280)
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-20 20:28:08 +08:00 |
|
Cocoon-Break
|
c5aecaf31d
|
[data] fix SeedToolUtils.tool_extractor returns content when no tool calls found (#10408)
Signed-off-by: Cocoon-Break <54054995+kuishou68@users.noreply.github.com>
|
2026-04-20 12:22:55 +08:00 |
|
Kingsley
|
436d26bc28
|
fix: projector lookup for gemma4 modules (#10382)
Co-authored-by: yiluoAK_47 <yiluoAK_47@163.com>
|
2026-04-12 08:32:14 +08:00 |
|
Kingsley
|
c109c061e5
|
[model] set mm_projectors for omni models (#10378)
|
2026-04-10 18:12:57 +08:00 |
|