Commit Graph

  • 1e503a982d [assets] correct typo in examples/README_zh.md (#10462) main simulikeit 2026-05-07 00:42:01 +08:00
  • 8752280dd7 [data] Optimize QwenVL video dataset preprocessing (#10404) luca-888 2026-05-03 18:36:56 +08:00
  • 468723c5d9 [packing] fix GDN crash when meeting dummy image (#10453) Kingsley 2026-05-01 12:10:13 +08:00
  • 887ee2b121 [refactor] Add KTransformers AMX MoE SFT support via Accelerate (#10430) Peilin Li 2026-05-01 01:47:58 +08:00
  • 6b08b948c9 [misc] bump transformers version upperbound (#10446) Kingsley 2026-05-01 01:30:11 +08:00
  • f7f3bfcbd7 [model] support Hy3-Preview (#10432) Hertz 2026-04-29 23:21:13 +08:00
  • 3475198d1e [fa2] fix IMA when train qwen3_5 (#10448) Kingsley 2026-04-29 20:20:55 +08:00
  • 50945ef850 [v1] fix device_mesh and sp for fsdp2 (#10429) sunyi0505 2026-04-28 11:20:11 +08:00
  • 2f0bef207a [export] handle NotImplementedError in export_model for transformers>=5.0 (fixes #10410) (#10438) Octopus 2026-04-27 23:36:23 +08:00
  • 2092abc217 [npu] add Qwen3.5 support with Partial RoPE and Hybrid Attention (#10421) curnane-lab 2026-04-27 23:36:07 +08:00
  • 99464b3d03 [misc] code lint (#10439) Kingsley 2026-04-27 14:07:31 +08:00
  • 9a0cfdccfa [v1] fix init on meta in transformers v5 (#10414) jiaqiw09 2026-04-27 00:37:09 +08:00
  • c8890c32db [data] support discard history cot for multiturn (#10435) Kingsley 2026-04-27 00:32:44 +08:00
  • 79c8332e4c [train] add qwen35 patch for neat_packing (#10436) Kingsley 2026-04-27 00:31:49 +08:00
  • e0bc3c1971 [v1] fix epoch and steps (#10422) jiaqiw09 2026-04-23 17:29:06 +08:00
  • ecca167eb4 [model] support qwen3.6 models (#10415) 浮梦 2026-04-22 19:44:01 +08:00
  • 28a6ea1cdc [v1] add deepspeed zero3 trigger for low memory usage weight loading (#10300) jiaqiw09 2026-04-21 14:09:52 +08:00
  • f5d739b132 [v1] fix device mesh and clip_grad_norm for ulysses cp (#10366) sunyi0505 2026-04-21 10:54:54 +08:00
  • c4bbac49b2 [v1] support resume training from checkpoint (#10280) 浮梦 2026-04-20 20:28:08 +08:00
  • c5aecaf31d [data] fix SeedToolUtils.tool_extractor returns content when no tool calls found (#10408) Cocoon-Break 2026-04-20 12:22:55 +08:00
  • 436d26bc28 fix: projector lookup for gemma4 modules (#10382) Kingsley 2026-04-12 08:32:14 +08:00
  • c109c061e5 [model] set mm_projectors for omni models (#10378) Kingsley 2026-04-10 18:12:57 +08:00
  • fa09c01c36 fix: gemma4 mm_token_type_ids padding (#10359) Kingsley 2026-04-06 13:14:45 +08:00
  • eae6f0b541 [model] gemma4 (#10346) Kingsley 2026-04-05 12:10:28 +08:00
  • acac63ef35 [data] fix qwen3vl timestamp (#10338) Kingsley 2026-04-01 22:40:12 +08:00
  • e5e8546493 [misc] fix moe (#10334) 浮梦 2026-03-31 23:04:45 +08:00
  • 97433c53b6 [feat] support LlamaFactory SFT training by HyperParallel FSDP2 backend (#10289) Cui-yshoho 2026-03-30 10:47:20 +08:00
  • b5afabe3d2 [v1] support ulysses cp for fsdp2 (#10262) sunyi0505 2026-03-27 16:22:48 +08:00
  • df2e6edb7e [v1] add init on rank0 for fsdp2 (#10264) jiaqiw09 2026-03-27 14:54:03 +08:00
  • d02fcd3588 [ci] add nginx cache config for Ascend NPU CI environment (#10323) Goalina 2026-03-27 10:04:16 +08:00
  • c340aa2a33 [v1] add callbacks (#10255) jiaqiw09 2026-03-26 19:59:57 +08:00
  • 1e536733c6 [data] fix mimo-v2 tool call (#10315) Hertz 2026-03-26 17:37:22 +08:00
  • 97d479fa92 [model] support Qwen3.5 liger kernel (#10313) Yutong Wu 2026-03-24 18:25:33 +08:00
  • ffbff33af3 chore: mca workflow compatible with qwen-vl series (#10303) Kingsley 2026-03-22 02:28:52 +08:00
  • 833f6027b1 [fix] fit neat_packing & mrope model packing (#10283) Kingsley 2026-03-20 16:50:11 +08:00
  • d91d8af89e [data] add SGSC zero-hallucination B2B dataset (NOO-Protocol) (#10284) robertglools 2026-03-20 15:49:03 +08:00
  • e67ab9e2f2 fix:MiniCPMVPlugin IndexError in process_messages when training with video (#10276) xxddccaa 2026-03-18 19:18:06 +08:00
  • 2c4f121817 [fix] handle empty content list in system message (#10291) LincolnBurrows2017 2026-03-18 12:05:49 +08:00
  • 487f8b8191 [v1] add qwen3 templates and fix rendering plugin. (#10212) xvxuopop 2026-03-18 11:30:50 +08:00
  • 78cad1e332 [fix] unused keys in ray example (#10290) SnowCharm 2026-03-18 00:23:53 +08:00
  • 70653026f5 [fix] make position_id_per_seconds configurable for Qwen2OmniPlugin (#10281) LincolnBurrows2017 2026-03-16 19:42:38 +08:00
  • 246192abd2 [data] correct gpt_oss template format_assistant (#10269) Ruijie Hou 2026-03-10 21:36:38 +08:00
  • 0258dc14d0 [docker] update npu docker (#10268) 浮梦 2026-03-10 19:37:27 +08:00
  • 3045adf0ba [fix] fallback to audio_processor when feature_extractor is missing (#10267) xxddccaa 2026-03-10 19:36:41 +08:00
  • a3d44e3152 [mca] support qwen3.5 (#10265) Kingsley 2026-03-10 10:55:16 +08:00
  • edeb953bc7 [data] convert filter() to list in read_cloud_json to fix broken empty-check (#10260) JiangNan 2026-03-09 17:12:53 +08:00
  • d045794387 [docs] fix Python version requirement from 3.10 to >=3.11.0 (#10259) yizhouChen 2026-03-09 16:44:07 +08:00
  • 9501c3308a [train] fix compatibility issue with HuggingFace Dataset Column when sav… (#10254) pyx 2026-03-06 18:44:57 +08:00
  • 0ee1c42c2b [v1] Support meta loading for full and free (#10236) jiaqiw09 2026-03-05 23:15:27 +08:00
  • 3061f48d55 [ray] fix get ray head ip (#10252) SnowCharm 2026-03-05 23:14:38 +08:00
  • 2d9bd2aa14 [fix] qwen3.5 projector path (#10242) LittleYanlin 2026-03-04 01:31:09 +08:00
  • c0245c43fc [model] support Qwen3.5 all series models (#10237) Hertz 2026-03-03 17:34:59 +08:00
  • eb976d75a2 [tracker] Add Trackio Integration for LlamaFactory (#10165) Parag Ekbote 2026-03-03 14:49:37 +05:30
  • b5cb7cb0e6 [misc] fix constants (#10232) Yaowei Zheng 2026-03-02 11:10:48 +08:00
  • 0779846513 [infer] support mixed multimodal payloads (#10225) Philip Ottesen 2026-02-28 13:26:53 +01:00
  • 45d335c709 [v1] add seed for training and fix gradient checkpointing (#10211) jiaqiw09 2026-02-28 18:16:06 +08:00
  • 816480012f [fix] register visual part for Qwen3.5 (#10227) Kingsley 2026-02-28 16:39:24 +08:00
  • d3bf882e87 [docker] upgrade to ROCm 7.2 base image, drop PyTorch reinstall (#10223) Mikko Tukiainen 2026-02-27 14:16:33 +02:00
  • 589da21d32 [model] support Aeva (#10214) 娄宗志 2026-02-26 23:03:13 +08:00
  • 122cd46084 [model] update constants (#10220) Yaowei Zheng 2026-02-26 21:13:56 +08:00
  • 2b8b871475 [model] Adapt Qwen3.5 (#10213) 浮梦 2026-02-26 20:45:02 +08:00
  • aab9b400bb [model] Add DeepSpeed Z3 leaf module for Qwen3-Next (#10194) Shanay Mehta 2026-02-24 17:24:37 +05:30
  • 50599c719b [misc] remove safe_serialization arg for transformers v5 compatibility (#10208) P. Clawmogorov 2026-02-24 04:14:19 +01:00
  • a0f3ad0cee [mca] update supported models (#10196) Kingsley 2026-02-20 22:02:49 +08:00
  • f80e15dbb4 [ci] fix ut huggingface hub 429 error when transformers>=5.0.0 (#10155) jiaqiw09 2026-02-12 22:14:10 +08:00
  • 991267fd3b [v1] support quantization (#10161) sunyi0505 2026-02-12 20:37:41 +08:00
  • 5c52afa30d [v1] support deepspeed (#10181) 浮梦 2026-02-12 17:24:30 +08:00
  • 675ce8cc7f [algo] add ASFT (#10174) Junyou Su 2026-02-12 13:12:14 +08:00
  • ab073f4c13 [v1] add LoRA/Freeze support and merge workflow (#10157) jiaqiw09 2026-02-12 13:02:09 +08:00
  • 184304b5b4 [model] add liger kernel support for Qwen3-Next (#10176) Shanay Mehta 2026-02-10 19:17:48 +05:30
  • d3ebd5678d [model] support GLM-OCR SFT (#10183) Xue Yadong 2026-02-10 21:41:01 +08:00
  • 1d5e8ebcd0 [v1] init commit for v1 docs (#10145) 浮梦 2026-02-09 19:43:55 +08:00
  • ea644d04ec [model] support GLM-4.7-Flash SFT (#10173) Shanay Mehta 2026-02-09 08:10:44 +05:30
  • 92fa3df4c4 [trainer] add dpo/kto fsdp fsdp2 support (#10127) Username_Full 2026-02-04 23:27:12 +08:00
  • 8bedfafa4e [model] support MiniCPM-o-4.5 (#10163) Hertz 2026-02-04 23:21:27 +08:00
  • 1a02717fa8 [assets] update readme (#10159) Yaowei Zheng 2026-02-03 19:11:15 +08:00
  • e7cb145f5d [logging] Fix race condition in LoggerHandler during multi-GPU training (#10156) ゆり 2026-02-03 11:14:07 +08:00
  • b53d7037c2 [model] support youtu-vl model (#10152) Hertz 2026-02-02 21:42:43 +08:00
  • bf04ca6af8 [deps] adapt to transformers v5 (#10147) 浮梦 2026-02-02 12:07:19 +08:00
  • 762b480131 [feature] support using ray.remote to start distributed training. (#10109) xvxuopop 2026-01-28 16:05:29 +08:00
  • 9640f79ae5 [fix] add visual.pos_embed to Qwen3-VL visual model keys (#10139) Jewon Lee 2026-01-27 17:33:01 +09:00
  • 7ef19eea00 [v0] Fix reward model training safetensors saving (#10137) jiaqiw09 2026-01-27 16:27:14 +08:00
  • f9f11dcb97 [v1] support training with fsdp2 (#9773) 浮梦 2026-01-25 19:41:58 +08:00
  • 641bfdd482 chore: Update outdated GitHub Actions versions (#10123) Pádraic Slattery 2026-01-25 12:12:39 +01:00
  • e70651ac58 [feat] support all_exhausted_without_replacement in datasets.interleave_datasets (#10112) Meng WANG 2026-01-20 15:54:07 +08:00
  • db2f794f7b [misc] update mcore related docker and mca supported models (#10114) Kingsley 2026-01-19 14:55:16 +08:00
  • 44eadbda1c [v1] fix kernel moe patch (#9867) jiaqiw09 2026-01-17 09:24:54 +08:00
  • 9829ae0a77 [ci] using mp to run kernel test (#9754) 浮梦 2026-01-13 19:43:59 +08:00
  • 958b9c3468 [v1] add sft (#9752) Yaowei Zheng 2026-01-12 03:15:01 +08:00
  • 4d3621e3d3 [model] fixed&added Hunyuan models (#9750) Hertz 2026-01-12 01:15:00 +08:00
  • a296723697 [v1] upgrade batching (#9751) Yaowei Zheng 2026-01-12 00:21:36 +08:00
  • 15b87f3125 [model] support HY-MT model (#9746) Hertz 2026-01-11 16:25:56 +08:00
  • 9f73a6eb23 [deps] fix package (#9745) Yaowei Zheng 2026-01-10 04:27:53 +08:00
  • b2effbd77c [v1] add batch generator (#9744) Yaowei Zheng 2026-01-10 04:24:09 +08:00
  • d7d734d54c [misc] fix fp8 (#9742) Yaowei Zheng 2026-01-09 16:17:26 +08:00
  • 8abb8fb533 [v1] use async streamer (#9741) Yaowei Zheng 2026-01-09 16:07:40 +08:00
  • 766d5ae6ad [ci] fix workflow (#9738) Yaowei Zheng 2026-01-09 14:48:16 +08:00
  • 5cccaeec82 [model] clean obsolete models (#9736) Yaowei Zheng 2026-01-09 14:08:18 +08:00
  • 5fb5d7ebd3 [model] support for microsoft's Phi-4-mini (#9734) Jackey 2026-01-09 12:24:45 +08:00
  • 03a70ba8dd [fix] correct ktransformers example config paths and templates (#9732) Peilin Li 2026-01-08 10:52:50 +08:00