21 Commits

Author SHA1 Message Date
Jeremy Reizenstein
9b2e570536 option to avoid accelerate
Summary: For debugging, introduce PYTORCH3D_NO_ACCELERATE env var.

Reviewed By: shapovalov

Differential Revision: D37885393

fbshipit-source-id: de080080c0aa4b6d874028937083a0113bb97c23
2022-07-17 13:15:59 -07:00
Roman Shapovalov
4261e59f51 Fix: making visualisation work again
Summary:
1. Respecting `visdom_show_preds` parameter when it is False.
2. Clipping the images pre-visualisation, which is important for methods like SRN that are not arare of pixel value range.

Reviewed By: bottler

Differential Revision: D37786439

fbshipit-source-id: 8dbb5104290bcc5c2829716b663cae17edc911bd
2022-07-13 05:29:09 -07:00
Nikhila Ravi
aa8b03f31d Updates to support Accelerate and multigpu training (#37)
Summary:
## Changes:
- Added Accelerate Library and refactored experiment.py to use it
- Needed to move `init_optimizer` and `ExperimentConfig` to a separate file to be compatible with submitit/hydra
- Needed to make some modifications to data loaders etc to work well with the accelerate ddp wrappers
- Loading/saving checkpoints incorporates an unwrapping step so remove the ddp wrapped model

## Tests

Tested with both `torchrun` and `submitit/hydra` on two gpus locally. Here are the commands:

**Torchrun**

Modules loaded:
```sh
1) anaconda3/2021.05   2) cuda/11.3   3) NCCL/2.9.8-3-cuda.11.3   4) gcc/5.2.0. (but unload gcc when using submit)
```

```sh
torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test
```

**Submitit/Hydra Local test**

```sh
~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs  hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1
```

**Submitit/Hydra distributed test**

```sh
~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs  hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320
```

## TODOS:
- Fix distributed evaluation: currently this doesn't work as the input format to the evaluation function is not suitable for gathering across gpus (needs to be nested list/tuple/dicts of objects that satisfy `is_torch_tensor`) and currently `frame_data`  contains `Cameras` type.
- Refactor the `accelerator` object to be accessible by all functions instead of needing to pass it around everywhere? Maybe have a `Trainer` class and add it as a method?
- Update readme with installation instructions for accelerate and also commands for running jobs with torchrun and submitit/hydra

X-link: https://github.com/fairinternal/pytorch3d/pull/37

Reviewed By: davnov134, kjchalup

Differential Revision: D37543870

Pulled By: bottler

fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
2022-07-11 19:29:58 -07:00
Jeremy Reizenstein
efb721320a extract camera_difficulty_bin_breaks
Summary: As part of removing Task, move camera difficulty bin breaks from hard code to the top level.

Reviewed By: davnov134

Differential Revision: D37491040

fbshipit-source-id: f2d6775ebc490f6f75020d13f37f6b588cc07a0b
2022-07-06 07:13:41 -07:00
Jeremy Reizenstein
40fb189c29 typing for trainer
Summary: Enable pyre checking of the trainer code.

Reviewed By: shapovalov

Differential Revision: D36545438

fbshipit-source-id: db1ea8d1ade2da79a2956964eb0c7ba302fa40d1
2022-07-06 07:13:41 -07:00
Jeremy Reizenstein
4e87c2b7f1 get_all_train_cameras
Summary: As part of removing Task, make the dataset code generate the source cameras for itself. There's a small optimization available here, in that the JsonIndexDataset could avoid loading images.

Reviewed By: shapovalov

Differential Revision: D37313423

fbshipit-source-id: 3e5e0b2aabbf9cc51f10547a3523e98c72ad8755
2022-07-06 07:13:41 -07:00
Jeremy Reizenstein
65f667fd2e loading llff and blender datasets
Summary: Copy code from NeRF for loading LLFF data and blender synthetic data, and create dataset objects for them

Reviewed By: shapovalov

Differential Revision: D35581039

fbshipit-source-id: af7a6f3e9a42499700693381b5b147c991f57e5d
2022-06-16 03:09:15 -07:00
Jeremy Reizenstein
023a2369ae test configs are loadable
Summary: Add test that the yaml files deserialize.

Reviewed By: davnov134

Differential Revision: D36830673

fbshipit-source-id: b785d8db97b676686036760bfa2dd3fa638bda57
2022-06-10 12:22:46 -07:00
Jeremy Reizenstein
c0f88e04a0 make ExperimentConfig Configurable
Summary: Preparing for pluggables in experiment.py

Reviewed By: davnov134

Differential Revision: D36830674

fbshipit-source-id: eab499d1bc19c690798fbf7da547544df7e88fa5
2022-06-10 12:22:46 -07:00
Jeremy Reizenstein
c31bf85a23 test runner for experiment.py
Summary: Add simple interactive testrunner for experiment.py

Reviewed By: shapovalov

Differential Revision: D35316221

fbshipit-source-id: d424bcba632eef89eefb56e18e536edb58ec6f85
2022-05-26 05:33:03 -07:00
Jeremy Reizenstein
fbd3c679ac rename ImplicitronDataset to JsonIndexDataset
Summary: The ImplicitronDataset class corresponds to JsonIndexDatasetMapProvider

Reviewed By: shapovalov

Differential Revision: D36661396

fbshipit-source-id: 80ca2ff81ef9ecc2e3d1f4e1cd14b6f66a7ec34d
2022-05-25 10:16:59 -07:00
Jeremy Reizenstein
0f12c51646 data_loader_map_provider
Summary: replace dataloader_zoo with a pluggable DataLoaderMapProvider.

Reviewed By: shapovalov

Differential Revision: D36475441

fbshipit-source-id: d16abb190d876940434329928f2e3f2794a25416
2022-05-20 07:50:30 -07:00
Jeremy Reizenstein
79c61a2d86 dataset_map_provider
Summary: replace dataset_zoo with a pluggable DatasetMapProvider. The logic is now in annotated_file_dataset_map_provider.

Reviewed By: shapovalov

Differential Revision: D36443965

fbshipit-source-id: 9087649802810055e150b2fbfcc3c197a761f28a
2022-05-20 07:50:30 -07:00
Jeremy Reizenstein
69c6d06ed8 New file for ImplicitronDatasetBase
Summary: Separate ImplicitronDatasetBase and FrameData (to be used by all data sources) from ImplicitronDataset (which is specific).

Reviewed By: shapovalov

Differential Revision: D36413111

fbshipit-source-id: 3725744cde2e08baa11aff4048237ba10c7efbc6
2022-05-20 07:50:30 -07:00
Jeremy Reizenstein
73dc109dba data_source
Summary:
Move dataset_args and dataloader_args from ExperimentConfig into a new member called datasource so that it can contain replaceables.

Also add enum Task for task type.

Reviewed By: shapovalov

Differential Revision: D36201719

fbshipit-source-id: 47d6967bfea3b7b146b6bbd1572e0457c9365871
2022-05-20 07:50:30 -07:00
Jeremy Reizenstein
2c1901522a return types for dataset_zoo, dataloader_zoo
Summary: Stronger typing for these functions

Reviewed By: shapovalov

Differential Revision: D36170489

fbshipit-source-id: a2104b29dbbbcfcf91ae1d076cd6b0e3d2030c0b
2022-05-13 05:38:14 -07:00
Roman Shapovalov
a6dada399d Extracted ImplicitronModelBase and unified API for GenericModel and ModelDBIR
Summary:
To avoid model_zoo, we need to make GenericModel pluggable.
I also align creation APIs for convenience.

Reviewed By: bottler, davnov134

Differential Revision: D35933093

fbshipit-source-id: 8228926528eb41a795fbfbe32304b8019197e2b1
2022-05-09 15:23:07 -07:00
Jeremy Reizenstein
e10a90140d enable_get_default_args to allow pickling get_default_args(f)
Summary:
Try again to solve https://github.com/facebookresearch/pytorch3d/issues/1144 pickling problem.
D35258561 (24260130ce) didn't work.

When writing a function or vanilla class C which you want people to be able to call get_default_args on, you must add the line enable_get_default_args(C) to it. This causes autogeneration of a hidden dataclass in the module.

Reviewed By: davnov134

Differential Revision: D35364410

fbshipit-source-id: 53f6e6fff43e7142ae18ca3b06de7d0c849ef965
2022-04-06 03:32:31 -07:00
Jeremy Reizenstein
24260130ce _allow_untyped for get_default_args
Summary:
ListConfig and DictConfig members of get_default_args(X) when X is a callable will contain references to a temporary dataclass and therefore be unpicklable. Avoid this in a few cases.

Fixes https://github.com/facebookresearch/pytorch3d/issues/1144

Reviewed By: shapovalov

Differential Revision: D35258561

fbshipit-source-id: e52186825f52accee9a899e466967a4ff71b3d25
2022-03-31 06:31:45 -07:00
Roman Shapovalov
645a47d054 Return a typed structured config from default_args for callables
Summary:
Before the fix, running get_default_args(C: Callable) returns an unstructured DictConfig which causes Enums to be handled incorrectly. This is a fix.

WIP update: Currently tests still fail whenever a function signature contains an untyped argument: This needs to be somehow fixed.

Reviewed By: bottler

Differential Revision: D34932124

fbshipit-source-id: ecdc45c738633cfea5caa7480ba4f790ece931e8
2022-03-25 07:08:01 -07:00
Jeremy Reizenstein
cdd2142dd5
implicitron v0 (#1133)
Co-authored-by: Jeremy Francis Reizenstein <bottler@users.noreply.github.com>
2022-03-21 13:20:10 -07:00