Summary:
## Changes:
- Added the Accelerate library and refactored `experiment.py` to use it
- Moved `init_optimizer` and `ExperimentConfig` to a separate file so they stay compatible with submitit/hydra
- Made some modifications to the data loaders etc. so they work with Accelerate's DDP wrappers
- Loading/saving checkpoints now incorporates an unwrapping step to strip the DDP wrapper from the model (see the sketch below)
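A minimal sketch of the unwrapping step using the standard Accelerate API; the model/optimizer/loader stand-ins and the checkpoint path are placeholders, not the actual Implicitron code:
```python
import torch
from accelerate import Accelerator

# Hypothetical stand-ins for the real Implicitron model, optimizer and loader.
model = torch.nn.Linear(3, 3)
optimizer = torch.optim.Adam(model.parameters())
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(8, 3)), batch_size=4
)

accelerator = Accelerator()
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

# Saving: strip the DDP wrapper added by prepare(), and write only from the
# main process so the ranks do not clobber each other's files.
if accelerator.is_main_process:
    unwrapped = accelerator.unwrap_model(model)
    torch.save(unwrapped.state_dict(), "checkpoint.pth")

# Loading: load into the unwrapped module so the state-dict keys match.
state_dict = torch.load("checkpoint.pth", map_location="cpu")
accelerator.unwrap_model(model).load_state_dict(state_dict)
```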
## Tests
Tested with both `torchrun` and `submitit/hydra` on two GPUs locally. Here are the commands:
**Torchrun**
Modules loaded:
```sh
1) anaconda3/2021.05  2) cuda/11.3  3) NCCL/2.9.8-3-cuda.11.3  4) gcc/5.2.0  (unload gcc when using submitit)
```
```sh
torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test
```
**Submitit/Hydra Local test**
```sh
~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1
```
**Submitit/Hydra distributed test**
```sh
~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320
```
## TODOS:
- Fix distributed evaluation: it currently fails because the input to the evaluation function is not in a format that can be gathered across GPUs (gathering needs nested lists/tuples/dicts of objects satisfying `is_torch_tensor`), while `frame_data` contains objects of `Cameras` type. See the sketch after this list.
- Refactor the `accelerator` object to be accessible by all functions instead of being passed around everywhere? Perhaps introduce a `Trainer` class and attach the accelerator to it?
- Update the README with installation instructions for Accelerate, plus the commands for running jobs with torchrun and submitit/hydra
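To make the first TODO concrete, here is a hedged sketch of the constraint and one possible workaround (the camera construction is illustrative, not the real evaluation path): `accelerator.gather` only traverses nested lists/tuples/dicts of tensors, so a `Cameras` object would need to be decomposed into its tensor fields before gathering and rebuilt afterwards.
```python
import torch
from accelerate import Accelerator
from pytorch3d.renderer.cameras import PerspectiveCameras

accelerator = Accelerator()
cameras = PerspectiveCameras(
    R=torch.eye(3)[None], T=torch.zeros(1, 3), device=accelerator.device
)

# accelerator.gather(cameras)  # fails: Cameras is not a (nested) tensor

# Possible workaround: gather the tensor fields, then rebuild the cameras.
gathered = accelerator.gather({"R": cameras.R, "T": cameras.T})
all_cameras = PerspectiveCameras(R=gathered["R"], T=gathered["T"])
```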
X-link: https://github.com/fairinternal/pytorch3d/pull/37
Reviewed By: davnov134, kjchalup
Differential Revision: D37543870
Pulled By: bottler
fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
Summary: The `ImplicitronDataset` class corresponds to `JsonIndexDatasetMapProvider`
Reviewed By: shapovalov
Differential Revision: D36661396
fbshipit-source-id: 80ca2ff81ef9ecc2e3d1f4e1cd14b6f66a7ec34d
Summary: Replace `dataset_zoo` with a pluggable `DatasetMapProvider`. The logic now lives in `annotated_file_dataset_map_provider`.
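A rough sketch of what plugging in a provider could look like; `registry.register` is Implicitron's existing mechanism, but the import path and method name here are assumptions:
```python
from pytorch3d.implicitron.tools.config import registry
# Import path assumed; DatasetMapProvider is the new replaceable base class.
from pytorch3d.implicitron.dataset.dataset_map_provider import DatasetMapProvider

@registry.register
class MyDatasetMapProvider(DatasetMapProvider):
    """A custom provider, selectable from configs instead of dataset_zoo."""

    def get_dataset_map(self):
        # Return the train/val/test datasets for this source.
        ...
```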
Reviewed By: shapovalov
Differential Revision: D36443965
fbshipit-source-id: 9087649802810055e150b2fbfcc3c197a761f28a
Summary: Separate `ImplicitronDatasetBase` and `FrameData` (to be used by all data sources) from `ImplicitronDataset` (which is specific).
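Roughly, the split lets a new data source depend only on the shared pieces; a hedged sketch (the import path and dataset shape are assumptions):
```python
# Import path is an assumption; these are the shared, source-agnostic types.
from pytorch3d.implicitron.dataset.dataset_base import (
    FrameData,
    ImplicitronDatasetBase,
)

class MyDataset(ImplicitronDatasetBase):
    """A custom data source: it only has to yield FrameData objects and
    never touches the format-specific ImplicitronDataset."""

    def __len__(self) -> int:
        ...

    def __getitem__(self, index: int) -> FrameData:
        ...
```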
Reviewed By: shapovalov
Differential Revision: D36413111
fbshipit-source-id: 3725744cde2e08baa11aff4048237ba10c7efbc6
Summary:
Move `dataset_args` and `dataloader_args` from `ExperimentConfig` into a new member called `datasource` so that it can contain replaceables.
Also add an enum `Task` for the task type.
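Schematically, the resulting config shape could look like the sketch below; `DataSourceArgs` and its field types are assumptions, and the `Task` members are a guess at the two task types:
```python
from dataclasses import dataclass, field
from enum import Enum

class Task(Enum):
    # Member names/values are assumptions about the two task types.
    SINGLE_SEQUENCE = "singlesequence"
    MULTI_SEQUENCE = "multisequence"

@dataclass
class DataSourceArgs:
    # Hypothetical shape of the new `datasource` member: grouping these
    # here lets the member be swapped for a replaceable implementation.
    dataset_args: dict = field(default_factory=dict)
    dataloader_args: dict = field(default_factory=dict)

@dataclass
class ExperimentConfig:
    # Previously dataset_args/dataloader_args sat directly on this class.
    datasource: DataSourceArgs = field(default_factory=DataSourceArgs)
```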
Reviewed By: shapovalov
Differential Revision: D36201719
fbshipit-source-id: 47d6967bfea3b7b146b6bbd1572e0457c9365871
Summary:
To avoid `model_zoo`, we need to make `GenericModel` pluggable.
I also align the creation APIs for convenience.
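A hedged sketch of the pluggability, using Implicitron's registry; the base-class import path and the config key are assumptions:
```python
from pytorch3d.implicitron.tools.config import registry
# Import path assumed for the replaceable base that GenericModel implements.
from pytorch3d.implicitron.models.base_model import ImplicitronModelBase

@registry.register
class MyModel(ImplicitronModelBase):
    """A drop-in alternative to GenericModel, selected from configs
    (e.g. via something like `model_class_type: MyModel`) rather than
    hard-coded in a model zoo."""

    def forward(self, **kwargs):
        ...
```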
Reviewed By: bottler, davnov134
Differential Revision: D35933093
fbshipit-source-id: 8228926528eb41a795fbfbe32304b8019197e2b1
Summary: Using the API from D35012121 everywhere.
Reviewed By: bottler
Differential Revision: D35045870
fbshipit-source-id: dab112b5e04160334859bbe8fa2366344b6e0f70