Billy Cao 00409ff28a [data] shard the dataset to allow multiprocessing when streaming is enabled (#7530)
* Shard the dataset when streaming to allow multiprocessing

* Allow user to not set dataset_shards to ensure backward compatibility
2025-04-01 15:36:23 +08:00
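The idea in the commit message can be sketched as follows. This is a minimal illustration of round-robin sharding, not the code from this commit: `shard_stream` and `make_shards` are hypothetical helpers, and only the `dataset_shards` name comes from the commit message. Leaving `dataset_shards` unset falls back to a single shard, which mirrors the backward-compatibility note.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def shard_stream(stream: Iterable[T], num_shards: int, index: int) -> Iterator[T]:
    """Yield every num_shards-th item, starting at position index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    # islice(start=index, stop=None, step=num_shards) gives a disjoint slice
    # of the stream for each worker index.
    return islice(stream, index, None, num_shards)


def make_shards(examples: List[T], dataset_shards: Optional[int] = None) -> List[List[T]]:
    """Split examples into shards; None keeps one shard (the old behavior)."""
    n = 1 if dataset_shards is None else dataset_shards
    if n < 1:
        raise ValueError("dataset_shards must be a positive integer")
    # Each shard could be handed to a separate worker process.
    return [list(shard_stream(examples, n, i)) for i in range(n)]
```

Each shard is disjoint and together they cover the whole stream, so worker processes can consume them in parallel without duplicating examples.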