| hiyouga | 6cdbfa980e | fix | 2025-12-27 08:46:17 +08:00 |
| hiyouga | 5eea54f888 | fix | 2025-12-27 08:18:20 +08:00 |
| hiyouga | f69c8efd27 | fix | 2025-12-27 08:15:02 +08:00 |
| hiyouga | e374b615f2 | fix | 2025-12-27 08:08:57 +08:00 |
| hiyouga | d7201fd6ae | fix | 2025-12-27 08:06:42 +08:00 |
| hiyouga | 42b436a4fc | fix | 2025-12-27 08:05:25 +08:00 |
| hiyouga | 5f3e5eb5f8 | fix | 2025-12-27 07:53:52 +08:00 |
| hiyouga | 6de0c3da9b | fix | 2025-12-27 07:50:18 +08:00 |
| hiyouga | 1622bad7d4 | fix | 2025-12-27 07:47:43 +08:00 |
| hiyouga | c439924e74 | fix | 2025-12-27 07:38:10 +08:00 |
| hiyouga | a24d8cc78c | fix | 2025-12-27 07:36:30 +08:00 |
| hiyouga | 66e6aa8f37 | fixup | 2025-12-27 07:35:18 +08:00 |
| Copilot | a1b1931b4a | [breaking] migrate from setuptools to uv (#9673); Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>; Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com> | 2025-12-26 22:47:23 +08:00 |
| Xunpeng Xiao | 3c17f2722c | [model] Update ernie_vl to adapt new version (#9665) | 2025-12-26 19:57:49 +08:00 |
| Copilot | a882e2d5fc | [assets] Add GitHub Copilot instructions for repository (#9675); Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>; Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com> | 2025-12-26 17:32:48 +08:00 |
| Yaowei Zheng | a754604c11 | [misc] fix accelerator (#9661); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-12-25 02:11:04 +08:00 |
| Xunpeng Xiao | 6a2eafbae3 | [feat] Models trained and inferred with Mxfp4 are dequantized by default (#9652); Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn> | 2025-12-24 00:26:40 +08:00 |
| Yaowei Zheng | 84485406b7 | [ci] disable pip cache for ci (#9654) | 2025-12-23 18:37:40 +08:00 |
| Kingsley | 1c8a42d2f8 | [v1&WIP] dataloader init (#9645) | 2025-12-23 16:29:47 +08:00 |
| thulyubh22 | 7901b2f32e | [model] efficient tuning for gpt-oss (#9354); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-12-23 16:28:38 +08:00 |
| Yaowei Zheng | 1f1f5a7d1b | [ci] remove docker cache (#9640) | 2025-12-22 01:03:10 +08:00 |
| Yaowei Zheng | 6ef9854713 | [misc] fix cache & pin transformers to 4.57.1 (#9638) | 2025-12-22 00:20:55 +08:00 |
| Hertz | 4923f52a28 | [model] support MiMo-V2-Flash model (#9637) | 2025-12-21 14:38:18 +08:00 |
| Yaowei Zheng | 0894b4f37e | [misc] lint (#9636) | 2025-12-20 16:19:39 +08:00 |
| ZIYI ZENG | b0d49e137f | [misc] Support split eval_dataset when explict set "predict_with_generate" (#9604); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-12-20 01:46:00 +08:00 |
| Xunpeng Xiao | ddd7dcc722 | [data] Fix the video frame sampling issue #9620 (#9634) | 2025-12-19 18:36:31 +08:00 |
| 浮梦 | 5204cd2bca | [misc] add version check for moe (#9633) | 2025-12-19 14:57:37 +08:00 |
| Xunpeng Xiao | 8c74dca76a | [feat] Models trained and inferred with FP8 are dequantized by default (#9627) | 2025-12-18 22:54:35 +08:00 |
| xvxuopop | e8deda53a1 | [example] add Qwen3 series examples (#9624); Co-authored-by: UsernameFull <tohowtodoit@gmail.com> | 2025-12-18 21:27:00 +08:00 |
| mrhaoxx | a769fb94b9 | [feat] support ktransformers for dpo (#9621); Co-authored-by: poryfly <porykid@gmail.com> | 2025-12-18 21:26:25 +08:00 |
| mrhaoxx | 964569751f | [kt] refactor ktransformers integration (#9632) | 2025-12-18 21:26:04 +08:00 |
| Hertz | 9fd4b094d4 | [model] support VibeThinker models (#9616) | 2025-12-16 21:50:46 +08:00 |
| 浮梦 | 18c21bce5a | [test] add allreduce test on npu (#9619); Co-authored-by: frozenleaves <frozen@Mac.local> | 2025-12-16 21:33:30 +08:00 |
| sunyi0505 | a0179772ab | [example] add deepspeed autotp config and example (#9602) | 2025-12-15 15:15:26 +08:00 |
| Yaowei Zheng | aeda079014 | [v1] model loader (#9613) | 2025-12-14 11:50:52 +08:00 |
| Xunpeng Xiao | fdd24276ed | [feat] support new function call value (#9610); Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn> | 2025-12-14 00:20:33 +08:00 |
| Yaowei Zheng | 110d21713e | [v1] add dp & mp mesh (#9611) | 2025-12-13 01:44:28 +08:00 |
| Yaowei Zheng | 203069e11c | [v1] add accelerator (#9607) | 2025-12-12 19:22:06 +08:00 |
| tangefly | 4fd94141a4 | [model] Add Ministral3 (#9582); Co-authored-by: kingsley <kingsleydodonow@gmail.com> | 2025-12-10 15:57:24 +08:00 |
| Kingsley | 22d6ac29d5 | [model] Rename GLMV template (#9595) | 2025-12-10 13:27:47 +08:00 |
| DoubleWheat | cff4483392 | [config] Fix RoPE scaling patch for resuming from a scaled model (#9588) | 2025-12-09 20:37:37 +08:00 |
| Yaowei Zheng | 5d56817e2b | [misc] lint (#9593); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2025-12-09 18:00:35 +08:00 |
| Yaowei Zheng | 1bbb461f76 | [assets] update readme (#9587) | 2025-12-09 12:22:54 +08:00 |
| Hertz | c1f5f8fff6 | [model] support GLM4.6v (#9586) | 2025-12-09 11:06:42 +08:00 |
| Yaowei Zheng | 5744f1ea94 | [v1] add models & accelerator (#9579) | 2025-12-08 02:30:25 +08:00 |
| tangefly | 739954910a | [deps] Update for Transformers v5 (#9569) | 2025-12-08 01:13:32 +08:00 |
| xvxuopop | 109162dc56 | [fix] fix the issue when using fsdp2 with gradient checkpointing. (#9541); Co-authored-by: jin-yongxu <jinyongxu@h-partners.com> | 2025-12-06 16:04:51 +08:00 |
| jiaqiw09 | 165f3f073a | [examples] add fsdp config for mutiple nodes (#9575); Co-authored-by: Yaowei Zheng <hiyouga@buaa.edu.cn> | 2025-12-05 23:22:48 +08:00 |
| jiaqiw09 | efb13b7483 | [V1] Refactor ascend MoE kernel patch logic & Support Qwen3-MoE (#9557) | 2025-12-02 00:22:03 +08:00 |
| Username_Full | e43a972b25 | [test] add npu test yaml and add ascend a3 docker file (#9547); Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com> | 2025-11-30 09:37:08 +08:00 |