Summary:
## Changes:
- Added the Accelerate library and refactored `experiment.py` to use it
- Moved `init_optimizer` and `ExperimentConfig` to a separate file so they stay compatible with submitit/hydra
- Made some modifications to the data loaders etc. so they work with Accelerate's DDP wrappers
- Loading/saving checkpoints now incorporates an unwrapping step to strip the DDP wrapper from the model (see the sketch below)
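A minimal sketch of the unwrapping step using the standard Accelerate API; the model/optimizer/loader stand-ins and the checkpoint path are placeholders, not the actual Implicitron code:
```python
import torch
from accelerate import Accelerator

# Hypothetical stand-ins for the real Implicitron model, optimizer and loader.
model = torch.nn.Linear(3, 3)
optimizer = torch.optim.Adam(model.parameters())
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(8, 3)), batch_size=4
)

accelerator = Accelerator()
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

# Saving: strip the DDP wrapper added by prepare(), and write only from the
# main process so the ranks do not clobber each other's files.
if accelerator.is_main_process:
    unwrapped = accelerator.unwrap_model(model)
    torch.save(unwrapped.state_dict(), "checkpoint.pth")

# Loading: load into the unwrapped module so the state-dict keys match.
state_dict = torch.load("checkpoint.pth", map_location="cpu")
accelerator.unwrap_model(model).load_state_dict(state_dict)
```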
## Tests
Tested with both `torchrun` and `submitit/hydra` on two GPUs locally. Here are the commands:
**Torchrun**
Modules loaded:
```sh
1) anaconda3/2021.05  2) cuda/11.3  3) NCCL/2.9.8-3-cuda.11.3  4) gcc/5.2.0  (unload gcc when using submitit)
```
```sh
torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test
```
**Submitit/Hydra Local test**
```sh
~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1
```
**Submitit/Hydra distributed test**
```sh
~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320
```
## TODOS:
- Fix distributed evaluation: it currently fails because the input to the evaluation function is not in a format that can be gathered across GPUs (gathering needs nested lists/tuples/dicts of objects satisfying `is_torch_tensor`), while `frame_data` contains objects of `Cameras` type. See the sketch after this list.
- Refactor the `accelerator` object to be accessible by all functions instead of being passed around everywhere? Perhaps introduce a `Trainer` class and attach the accelerator to it?
- Update the README with installation instructions for Accelerate, plus the commands for running jobs with torchrun and submitit/hydra
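To make the first TODO concrete, here is a hedged sketch of the constraint and one possible workaround (the camera construction is illustrative, not the real evaluation path): `accelerator.gather` only traverses nested lists/tuples/dicts of tensors, so a `Cameras` object would need to be decomposed into its tensor fields before gathering and rebuilt afterwards.
```python
import torch
from accelerate import Accelerator
from pytorch3d.renderer.cameras import PerspectiveCameras

accelerator = Accelerator()
cameras = PerspectiveCameras(
    R=torch.eye(3)[None], T=torch.zeros(1, 3), device=accelerator.device
)

# accelerator.gather(cameras)  # fails: Cameras is not a (nested) tensor

# Possible workaround: gather the tensor fields, then rebuild the cameras.
gathered = accelerator.gather({"R": cameras.R, "T": cameras.T})
all_cameras = PerspectiveCameras(R=gathered["R"], T=gathered["T"])
```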
X-link: https://github.com/fairinternal/pytorch3d/pull/37
Reviewed By: davnov134, kjchalup
Differential Revision: D37543870
Pulled By: bottler
fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
Summary: The `ImplicitronDataset` class corresponds to `JsonIndexDatasetMapProvider`
Reviewed By: shapovalov
Differential Revision: D36661396
fbshipit-source-id: 80ca2ff81ef9ecc2e3d1f4e1cd14b6f66a7ec34d
Summary: Replace `dataset_zoo` with a pluggable `DatasetMapProvider`. The logic now lives in `annotated_file_dataset_map_provider`.
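A rough sketch of what plugging in a provider could look like; `registry.register` is Implicitron's existing mechanism, but the import path and method name here are assumptions:
```python
from pytorch3d.implicitron.tools.config import registry
# Import path assumed; DatasetMapProvider is the new replaceable base class.
from pytorch3d.implicitron.dataset.dataset_map_provider import DatasetMapProvider

@registry.register
class MyDatasetMapProvider(DatasetMapProvider):
    """A custom provider, selectable from configs instead of dataset_zoo."""

    def get_dataset_map(self):
        # Return the train/val/test datasets for this source.
        ...
```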
Reviewed By: shapovalov
Differential Revision: D36443965
fbshipit-source-id: 9087649802810055e150b2fbfcc3c197a761f28a
Summary: Separate `ImplicitronDatasetBase` and `FrameData` (to be used by all data sources) from `ImplicitronDataset` (which is specific).
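Roughly, the split lets a new data source depend only on the shared pieces; a hedged sketch (the import path and dataset shape are assumptions):
```python
# Import path is an assumption; these are the shared, source-agnostic types.
from pytorch3d.implicitron.dataset.dataset_base import (
    FrameData,
    ImplicitronDatasetBase,
)

class MyDataset(ImplicitronDatasetBase):
    """A custom data source: it only has to yield FrameData objects and
    never touches the format-specific ImplicitronDataset."""

    def __len__(self) -> int:
        ...

    def __getitem__(self, index: int) -> FrameData:
        ...
```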
Reviewed By: shapovalov
Differential Revision: D36413111
fbshipit-source-id: 3725744cde2e08baa11aff4048237ba10c7efbc6
Summary:
Move `dataset_args` and `dataloader_args` from `ExperimentConfig` into a new member called `datasource` so that it can contain replaceables.
Also add an enum `Task` for the task type.
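Schematically, the resulting config shape could look like the sketch below; `DataSourceArgs` and its field types are assumptions, and the `Task` members are a guess at the two task types:
```python
from dataclasses import dataclass, field
from enum import Enum

class Task(Enum):
    # Member names/values are assumptions about the two task types.
    SINGLE_SEQUENCE = "singlesequence"
    MULTI_SEQUENCE = "multisequence"

@dataclass
class DataSourceArgs:
    # Hypothetical shape of the new `datasource` member: grouping these
    # here lets the member be swapped for a replaceable implementation.
    dataset_args: dict = field(default_factory=dict)
    dataloader_args: dict = field(default_factory=dict)

@dataclass
class ExperimentConfig:
    # Previously dataset_args/dataloader_args sat directly on this class.
    datasource: DataSourceArgs = field(default_factory=DataSourceArgs)
```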
Reviewed By: shapovalov
Differential Revision: D36201719
fbshipit-source-id: 47d6967bfea3b7b146b6bbd1572e0457c9365871
Summary:
To avoid `model_zoo`, we need to make `GenericModel` pluggable.
I also align the creation APIs for convenience.
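A hedged sketch of the pluggability, using Implicitron's registry; the base-class import path and the config key are assumptions:
```python
from pytorch3d.implicitron.tools.config import registry
# Import path assumed for the replaceable base that GenericModel implements.
from pytorch3d.implicitron.models.base_model import ImplicitronModelBase

@registry.register
class MyModel(ImplicitronModelBase):
    """A drop-in alternative to GenericModel, selected from configs
    (e.g. via something like `model_class_type: MyModel`) rather than
    hard-coded in a model zoo."""

    def forward(self, **kwargs):
        ...
```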
Reviewed By: bottler, davnov134
Differential Revision: D35933093
fbshipit-source-id: 8228926528eb41a795fbfbe32304b8019197e2b1
Summary: Using the API from D35012121 everywhere.
Reviewed By: bottler
Differential Revision: D35045870
fbshipit-source-id: dab112b5e04160334859bbe8fa2366344b6e0f70