562 Commits

Author SHA1 Message Date
hoshi-hiyouga
ae392e054c [model] add qwen3 (#7885) 2025-04-29 09:34:05 +08:00
hoshi-hiyouga
2b7d564e3b [assets] update model readme (#7804) 2025-04-22 16:43:56 +08:00
hoshi-hiyouga
a62cba3d05 [example] add bash usage (#7794) 2025-04-22 00:25:51 +08:00
Juanxi Tian
d128382d3c [trainer] Add Muon Optimizer (#7749)
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-04-21 23:38:37 +08:00
hoshi-hiyouga
278df4308d [parser] support omegaconf (#7793) 2025-04-21 23:30:30 +08:00
hoshi-hiyouga
a4455e3021 [assets] update wechat (#7792) 2025-04-21 21:29:42 +08:00
hoshi-hiyouga
610f164c69 [trainer] fix pt loss (#7748)
* fix pt loss

* robust

* fix

* test
2025-04-17 03:15:35 +08:00
hoshi-hiyouga
0a0cfeb782 [breaking] bump transformers to 4.45.0 & improve ci (#7746)
* update ci

* fix

* fix

* fix

* fix

* fix
2025-04-17 02:36:48 +08:00
Kingsley
125513fa5c [model] support intern-VL 2.5-3 series (#7258)
* add internvl and rebase

* fix for internvl2&3

* remove lines

* fix video_inputs & lint

* nit

* add constants

* remove lines

* fix

* fix error

* pass ci

* pass ci

* skip internvl & nit
2025-04-17 00:31:30 +08:00
hoshi-hiyouga
ac8c6fdd3a [assets] update model readme (#7724) 2025-04-15 00:41:09 +08:00
hoshi-hiyouga
1fd4d14fbb [deps] upgrade transformers (#7704) 2025-04-13 18:11:34 +08:00
Yuxuan Zhang
481ecbf9c5 [model] add GLM-4-0414 (#7695)
* Update README_zh.md

* update
2025-04-13 17:10:45 +08:00
Eric Tang
6c53471de2 [data] support for specifying a dataset in cloud storage (#7567)
* add support for loading datasets from s3/gcs

* add comments to readme

* run linter and address comments

* add option to pass in kwargs to ray init (i.e. runtime env)

* address comment

* revert mixed up changes
2025-04-10 11:31:35 +08:00
hoshi-hiyouga
34fdabe005 [data] add coig-p dataset (#7657) 2025-04-09 21:18:25 +08:00
hoshi-hiyouga
39876b85fc [assets] update readme (#7644) 2025-04-09 01:06:06 +08:00
hoshi-hiyouga
5817cda37e [misc] fix packing and eval plot (#7623) 2025-04-07 18:20:57 +08:00
hoshi-hiyouga
7e0cdb1a76 [assets] update readme (#7612) 2025-04-06 13:58:49 +08:00
hoshi-hiyouga
aaf2e6ba2a [model] fix kv cache (#7564) 2025-04-01 23:07:46 +08:00
Billy Cao
5d1cc863a4 [data] shard the dataset to allow multiprocessing when streaming is enabled (#7530)
* Shard the dataset when streaming to allow multiprocessing

* Allow user to not set dataset_shards to ensure backward compatibility
2025-04-01 15:36:23 +08:00
Kingsley
185c76f6ad [model] add Qwen2.5-Omni model (#7537)
* preserve image_sizes

* preserve image_sizes

* init plugin

* support audio-text2text lora

* nit

* support image/video-text2text, audio-text2text

* remove args

* remove lines

* add docs && nit

* remove some comments

* fix && add merge part script

* add license
2025-03-31 20:39:35 +08:00
hoshi-hiyouga
59e12bffe8 [model] add qwen2vl 32b & upgrade peft (#7469)
* add qwen2vl 32b

* fix ci

* upgrade peft to 0.15

* fix ci

* fix ci
2025-03-25 12:15:58 +08:00
hoshi-hiyouga
833edc7c73 [assets] fix gemma3 readme (#7449) 2025-03-24 10:31:25 +08:00
hoshi-hiyouga
48a6584fb1 [assets] update videos (#7340)
* Update README.md

* Update README_zh.md
2025-03-17 15:48:02 +08:00
Hertz
a71e685021 [model] support hunyuan 7b (#7317)
* [Model]supported tencent-hunyuan model

* [Model]supported tencent-hunyuan model(fix)

* [Model]supported tencent-hunyuan model(fix)
2025-03-15 20:55:24 +08:00
Qiaolin Yu
30038d9ce7 [inference] support sglang backend (#7278)
* Mimic SGLang offline Engine

* Add more tests and args

* Pass all current tests

* Clean Code

* fix sample_params

* clean code

* Fix Stream Chat

* change sglang from engine mode to server mode

* fix

* Fix Review Issues

* Use SGLang Built-In Utilities

* Fix test SGLang

* Some Doc Issue

* fix sglang engine

* add readme

---------

Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
2025-03-15 04:37:58 +08:00
hoshi-hiyouga
e9b427d535 [assets] update video (#7287) 2025-03-13 18:45:47 +08:00
hoshi-hiyouga
165d3ed084 [model] support gemma3 (#7273) 2025-03-13 01:35:23 +08:00
hoshi-hiyouga
317d0855d2 [infer] fix vllm args (#7235)
Former-commit-id: ef7af457fc
2025-03-11 01:15:35 +08:00
hoshi-hiyouga
5a29f49fb1 [config] update args (#7231)
Former-commit-id: ed8b12e3cb
2025-03-10 23:04:43 +08:00
hoshi-hiyouga
5a0fd22c05 [assets] update readme (#7209)
Former-commit-id: cdf8fc6478
2025-03-07 17:27:49 +08:00
hoshi-hiyouga
e7556b591e [deps] upgrade vllm (#7183)
Former-commit-id: d739fddb10
2025-03-06 15:25:08 +08:00
hoshi-hiyouga
54a090079c [assets] update wechat (#7106)
Former-commit-id: d1863bbbaa
2025-02-28 12:01:04 +08:00
leo-pony
e86cb8a4fa [npu] update cann base image and torch 2.4 (#7061)
* Update base NPU container image version: Hugging Face Transformers requires Python >= 3.10

* Fix the bug: the arg type of INSTALL_DEEPSPEED should be a string now.

* Update Ascend CANN, CANN-Kernel and corresponding torch and torch-npu version

* Upgrade torch-npu needs packages' version: torch==2.1.0 and torch-npu==2.4.0.post2

Former-commit-id: acc52e0fe7
2025-02-25 23:32:01 +08:00
hoshi-hiyouga
f4aa0a146c [misc] fix project toml (#7067)
Former-commit-id: 96fd510e6a
2025-02-25 23:22:48 +08:00
hoshi-hiyouga
9359ee18ad [assets] update readme (#7051)
Former-commit-id: fe6dd92c84
2025-02-24 20:45:06 +08:00
hoshi-hiyouga
15f3087b96 [assets] update wechat (#7019)
Former-commit-id: 1481af5dc9
2025-02-20 20:32:33 +08:00
hoshi-hiyouga
beb1a9f9d9 [data] add r1 distill dataset (#6983)
Former-commit-id: 2591a3fa8b
2025-02-18 17:25:09 +08:00
hoshi-hiyouga
3fbd4848e8 [version] support transformers 449 (#6982)
* support transformers 449

* fix mm plugin

Former-commit-id: b00b290c07
2025-02-18 17:05:40 +08:00
hoshi-hiyouga
9b852ebe25 [misc] update readme (#6918)
Former-commit-id: 8956c93d9b
2025-02-13 01:01:41 +08:00
hoshi-hiyouga
07aa7b71a3 [misc] update readme (#6917)
Former-commit-id: 499ea45d1f
2025-02-13 00:58:10 +08:00
hoshi-hiyouga
1b02183da9 [misc] update readme (#6903)
Former-commit-id: 18179a3823
2025-02-11 22:51:26 +08:00
hoshi-hiyouga
c6be9e242c [misc] support export ollama modelfile (#6899)
* support export ollama modelfile

* update config

* add system and num ctx

Former-commit-id: 9184a6e0ed
2025-02-11 19:52:25 +08:00
Zhangchi Feng
5433b318bb [data] fix minicpmv plugin (#6890)
* fix template name

* tiny fix

* support minicpm-o-2.6

* support inference of minicpmv

* update readme

* support dpo of minicpmv

* update init audio

* update init audio

* [model]fix image process in minicpmo

* fix no mm inputs

Former-commit-id: 764627645a
2025-02-11 13:30:44 +08:00
hoshi-hiyouga
fcd0f0480d [dataset] add openthought (#6866)
Former-commit-id: 1356f9d840
2025-02-09 00:53:01 +08:00
Zhangchi Feng
01915eaf40 [model] support audio (#6701)
* support qwen2_audio

* improve code

* lint

* fix

* fix

* fix

---------

Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 24c7842948
2025-02-05 04:59:09 +08:00
neavo
32163e7ce0 [readme] update flash attention installation instruction on win platform (#6788)
* Update README_zh.md

* Update README.md

Former-commit-id: a417bcf8d9
2025-02-01 12:43:29 +08:00
hoshi-hiyouga
445d643ef3 [model] add mistral small models (#6786)
Former-commit-id: 94803d8133
2025-02-01 04:31:38 +08:00
hoshi-hiyouga
e8c1979b79 [model] add qwen2.5 vl models (#6779)
Former-commit-id: 999c7c8fe0
2025-01-31 03:00:29 +08:00
hoshi-hiyouga
f6779b0e0c [breaking] support transformers 4.48 (#6628)
Former-commit-id: 15357cdad9
2025-01-31 01:36:33 +08:00
hoshi-hiyouga
245de012ca [webui] improve webui & reasoning mode (#6778)
Former-commit-id: 45e68b9f09
2025-01-31 00:09:21 +08:00