LLaMA-Factory

mirror of https://github.com/hiyouga/LLaMA-Factory.git synced 2025-11-06 02:42:15 +08:00

Author	SHA1	Message	Date
Billy Cao	51e741ec85	[data] shard the dataset to allow multiprocessing when streaming is enabled (#7530) * Shard the dataset when streaming to allow multiprocessing * Allow user to not set dataset_shards to ensure backward compatibility	2025-04-01 15:36:23 +08:00
hoshi-hiyouga	1b1964714e	[misc] update format (#7277)	2025-03-13 02:53:08 +08:00
hoshi-hiyouga	efa86e730c	[misc] upgrade format to py39 (#7256)	2025-03-12 00:08:41 +08:00
hiyouga	ec251b4614	remove exit in preprocess Former-commit-id: f369b6ef41ffd9586ba568b88c5ff32a1af4bace	2025-03-11 15:08:25 +08:00
hoshi-hiyouga	8c7917d1a2	[data] fix loader (#7207) * fix dataloader * add test case * fix type * fix ci * fix ci * fix ci * disable overwrite cache in ci Former-commit-id: e84af0e140b1aafd1a6d6fe185a8e41c8fc5f831	2025-03-07 17:20:46 +08:00
hoshi-hiyouga	4e661b63e3	[data] fix predict dataset (#6972) Former-commit-id: f9a82e527877b1ed47cabb3d34f4d155705f4048	2025-02-17 20:29:40 +08:00
SrWYG	d9ea4baf00	[data] evaluate on each dataset (#5522) * [Update] loader.py , evaluate will run separate evaluations on each dataset. `If you pass a dictionary with names of datasets as keys and datasets as values, evaluate will run separate evaluations on each dataset. This can be useful to monitor how training affects other datasets or simply to get a more fine-grained evaluation` seq2seqtrainner support eval_dataset as Dict. * fix format * fix * fix --------- Co-authored-by: hiyouga <hiyouga@buaa.edu.cn> Former-commit-id: cf00f78650a442c85678ce805e030d2b96cbecd7	2025-02-13 02:19:03 +08:00
hoshi-hiyouga	efef4eaefc	[breaking change] refactor data pipeline (#6901) * refactor data * rename file Former-commit-id: 7a1a4ce6451cb782573d0bd9dd27a5e443e3a18b	2025-02-13 00:39:20 +08:00
hoshi-hiyouga	40b6e9045d	[misc] update license year & fix llama pro (#6814) * fix llamapro script * change year Former-commit-id: d9ae594178796994d400a5f207d6499712816f89	2025-02-05 01:53:33 +08:00
hiyouga	760dea0787	improve log Former-commit-id: a6abf375975ffea3d51e1b944c9855b5f62ffac8	2025-01-08 09:56:10 +00:00
Yaser Afshar	45596f5ae0	Add trust_remote_code parameter and remove True - Introduced a new model parameter `trust_remote_code` - Set the default value of `trust_remote_code` to `False` to enhance security Former-commit-id: 4bf23f406cf5235c16f9f8139850c53354901814	2024-12-17 12:25:12 +00:00
hoshi-hiyouga	50782abce6	lint Former-commit-id: 191ccc585399ad4c6c2c4f280b144b2c0a4869f3	2024-12-04 22:08:27 +08:00
wangdepeng	5c496f2db8	fix:tokenized_path not None and load_from_disk return Dataset Trigger stuck Former-commit-id: cbf9da35728daaf98d92e699e891e334c74af1e5	2024-11-27 16:44:42 +08:00
hiyouga	41c1d014a9	fix #6149 Former-commit-id: b581b272793314a9602f4dc2fb646a988a6249df	2024-11-26 16:03:02 +00:00
hiyouga	a117731ecb	support rank0 logger Former-commit-id: 84528eabe560091bfd866b6a0ca864085af7529b	2024-11-02 18:31:04 +08:00
hiyouga	4329e5d4d8	tiny fix Former-commit-id: b8f4b145506851cf5488cd8551a04d1c7603019b	2024-10-30 08:56:29 +00:00
hiyouga	dbbfb5f5dc	use pre-commit Former-commit-id: 7cfede95df22a9ff236788f04159b6b16b8d04bb	2024-10-29 09:07:46 +00:00
hiyouga	916804d11a	tiny fix Former-commit-id: 1fe424323b212094856f423351dc2a15774d39c3	2024-10-11 23:51:54 +08:00
huniu20	28c602357e	add om_hub_token argument Former-commit-id: b3214e69d32067a1c22dbd60c2cde1545ba75b19	2024-10-10 17:16:46 +08:00
huniu20	4affb39ca2	1. add model and dataset info to support webui Former-commit-id: 92f6226f3fecbd9af744a7232dda2c68b2bb0d86	2024-10-10 16:46:34 +08:00
huniu20	c3a040b4a5	1. add modelers hub support Former-commit-id: 14678eb444d8181176745d18d4a6865fd6860f58	2024-10-09 17:21:37 +08:00
hiyouga	fb90faf19a	add docstrings, refactor logger Former-commit-id: c34e489d71f8f539028543ccf8ee92cecedd6276	2024-09-08 00:56:56 +08:00
hiyouga	3756553c25	update get template Former-commit-id: 21ea0d0786f91c0bce79630963e66b815a6792a0	2024-09-04 22:36:20 +08:00
hoshi-hiyouga	1a76d3e09c	Merge pull request #5323 from naem1023/feat/add-dataset-map-batch-size-argument Add batch size of map function in the preprocessed dataset Former-commit-id: c3428c5807500d087cdee4386798e10e39c9cf30	2024-09-04 22:09:36 +08:00
hoshi-hiyouga	7f94451034	fix #5228 Former-commit-id: 0d332ca8d0987c0331361934ab110fafa6402a7e	2024-09-04 19:10:30 +08:00
hiyouga	54d4c3fca7	lazy image load Former-commit-id: cdd733b575411e003bc5ffd6560dd8eff8aa09cf	2024-09-04 02:27:08 +08:00
naem1023	961a1d5ae1	feat: add batch size of map function in the preprocessed dataset Former-commit-id: 94b6cf06c2f84d0619b1a2dccaf8abb51de9951c	2024-09-02 13:52:47 +09:00
hiyouga	727193cdc9	follow #5115 Former-commit-id: 7d917e03e2df570139bae18227d9c7303a12de2a	2024-08-09 18:03:00 +08:00
“Wzw”	a53c99ecda	mask_history args verify valid Former-commit-id: 2f8388b4f4195d934400ad9267d72e10ca4105a3	2024-08-08 10:12:01 +08:00
hoshi-hiyouga	f78cd9f9da	Update loader.py Former-commit-id: 860e3eb374947b72dcae88cab0a93ef561e3bfb3	2024-07-15 00:50:06 +08:00
codingma	82e941ff61	1. add custom eval dataset support 2. merge load dataset and split dataset function Former-commit-id: 963d97ba07e7efa3a4544c4d077283d9e112b3ad	2024-07-05 15:52:10 +08:00
hoshi-hiyouga	70410aedc1	Update loader.py Former-commit-id: afa59d61844595e6b615227e6bfdc0b16c8015dd	2024-06-24 23:06:18 +08:00
hiyouga	acfae2e677	add license Former-commit-id: 69cfc98d7c81756a5ab6bf962240e393e449fef0	2024-06-15 17:54:33 +08:00
hiyouga	e8885443a9	fix #4221 Former-commit-id: 05a3be4853b941909e7d193c31e8d62c8c5f879b	2024-06-13 02:48:21 +08:00
hiyouga	0b1f4a34f8	rename files Former-commit-id: e1a8431770fc36c0c9ee7fed4abbc3d7fdcc5efd	2024-06-07 00:09:06 +08:00
hiyouga	56a6db6d84	fix ppo dataset bug #4012 Former-commit-id: 7fc51b2e93698ae5e012566af8481f4d861c873d	2024-06-06 19:03:20 +08:00
hiyouga	1cc9508fb3	tiny fix Former-commit-id: f9d50501aac1f60a3b445ca3fee9aa60995461ee	2024-06-04 00:31:10 +08:00
hiyouga	920b091581	fix #3992 Former-commit-id: a48321fbf5196b88a11106cf74a74fbcea2ea50b	2024-06-04 00:17:36 +08:00
hiyouga	2e843a4cf6	fix data loader hint Former-commit-id: 25b56126a11591b0155e2f72b673dd8f45a6c8c9	2024-06-03 18:28:27 +08:00
hoshi-hiyouga	ae773f9355	Update loader.py Former-commit-id: 0aa59322906d91c5e385c9c02ebb5dd64ba060f3	2024-05-30 00:20:20 +08:00
hoshi-hiyouga	88f4c583d3	Update loader.py Former-commit-id: aa7f335e3ad5a78e4ed5f99c120be28e9733ea2e	2024-05-30 00:17:21 +08:00
hoshi-hiyouga	d5ee485440	Update loader.py Former-commit-id: 19d8fd62c18ee3ba0e431fc241f7d315cb716fef	2024-05-30 00:12:12 +08:00
seanzhang-zhichen	fc6c31127a	Merge branch 'main' into add_dataset_sample_num Former-commit-id: 26300127c45f24e63b91f1b0cc73e46c3a936a91	2024-05-24 15:57:47 +08:00
hiyouga	664cba05e3	refactor data preprocessing, fix mllm rlhf Former-commit-id: 53ff2dd24f9121ea30c95063bb72e49a9b31e980	2024-05-24 04:08:25 +08:00
zhangzc	e84b72f806	fix conflict Former-commit-id: 6922b23a748c2459147bf44b96d86daa89f2c96c	2024-05-20 17:10:01 +08:00
hiyouga	d24969bb7e	improve KTO impl., replace datasets Former-commit-id: e56a57ddcf061de6e4acc8679f7dbf0b68364986	2024-05-18 03:44:56 +08:00
enji.zhou	d16a1d9ed0	add kto Former-commit-id: ec51986cf70b0bdd79b8141e45916670fb97a08e	2024-05-17 13:09:17 +08:00
hiyouga	ee759aa0d8	rename package Former-commit-id: a07ff0c083558cfe6f474d13027642d3052fee08	2024-05-16 18:39:08 +08:00

48 Commits