Summary:
Allow a module's param_groups member to specify overrides to the param groups of its members or their members.
Also adds logging for param group assignments.
This allows defining `params.basis_matrix` in the param_groups of a voxel_grid.
Reviewed By: shapovalov
Differential Revision: D41080667
fbshipit-source-id: 49f3b0e5b36e496f78701db0699cbb8a7e20c51e
Summary:
Allows loading of multiple categories.
Multiple categories are provided as a comma-separated list of category names.
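Presumably the provider splits the comma-separated string into category names roughly like this (the category names below are just illustrative):

```python
# Illustrative only: split a comma-separated category string into a list,
# tolerating stray whitespace around the commas.
raw = "teddybear, hydrant,apple"
categories = [c.strip() for c in raw.split(",") if c.strip()]
```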
Reviewed By: bottler, shapovalov
Differential Revision: D40803297
fbshipit-source-id: 863938be3aa6ffefe9e563aede4a2e9e66aeeaa8
Summary: Add an option to flat-pad the last delta. Might help when training on RGB only.
Reviewed By: shapovalov
Differential Revision: D40587475
fbshipit-source-id: c763fa38948600ea532c730538dc4ff29d2c3e0a
Summary: Make Implicitron run without visdom installed.
Reviewed By: shapovalov
Differential Revision: D40587974
fbshipit-source-id: dc319596c7a4d10a4c54c556dabc89ad9d25c2fb
Summary:
Adds the ability to have different learning rates for different parts of the model. The trainable parts of Implicitron have a new member
param_groups: a dictionary where keys are names of individual parameters
or of a module's members, and values are the parameter group the
parameter/member will be sorted into. The "self" key is used to denote the
parameter group at the module level. Possible keys, including "self",
do not have to be defined. By default all parameters are put into the "default"
parameter group and have the learning rate defined in the optimizer;
this can be overridden at the:
- module level with the "self" key: all the parameters and child
modules' parameters will be put into that parameter group
- member level, which is the same as if `param_groups` in that
member had key="self" and value equal to that parameter group.
This is useful if members do not have `param_groups`, for
example torch.nn.Linear.
- parameter level: the parameter with the same name as the key
will be put into that parameter group.
In the optimizer factory, parameters and their learning rates are then gathered recursively.
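A minimal sketch of the recursive gathering described above (this is not the actual Implicitron implementation; the function name and the override-resolution order follow the text of this summary):

```python
import torch

def gather_param_groups(module, inherited="default"):
    """Recursively assign each parameter to a named parameter group,
    honoring "self", member-level, and parameter-level overrides."""
    overrides = getattr(module, "param_groups", {})
    # "self" overrides the group for this module and all its descendants
    group = overrides.get("self", inherited)
    result = {}
    for name, param in module.named_parameters(recurse=False):
        # a parameter-level override beats the module-level group
        result.setdefault(overrides.get(name, group), []).append(param)
    for child_name, child in module.named_children():
        # a member-level override acts like "self" inside that child
        child_result = gather_param_groups(child, overrides.get(child_name, group))
        for g, params in child_result.items():
            result.setdefault(g, []).extend(params)
    return result

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Linear(8, 2))
model.param_groups = {"0": "slow"}  # member-level override for the first Linear
groups = gather_param_groups(model)
lrs = {"default": 1e-3, "slow": 1e-4}
optimizer = torch.optim.Adam(
    [{"params": p, "lr": lrs[g]} for g, p in groups.items()]
)
```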
Reviewed By: shapovalov
Differential Revision: D40145802
fbshipit-source-id: 631c02b8d79ee1c0eb4c31e6e42dbd3d2882078a
Summary: Loads the whole dataset, moves it to the device, and sends it for sampling to enable full-dataset heterogeneous raysampling.
Reviewed By: bottler
Differential Revision: D39263009
fbshipit-source-id: c527537dfc5f50116849656c9e171e868f6845b1
Summary:
Changed ray_sampler and metrics to be able to use mixed-frame raysampling.
Ray_sampler now has a new member which it passes to the PyTorch3D raysampler.
If the ray bundle is heterogeneous, metrics now samples images by padding xys first. This reduces memory consumption.
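The padding step could look roughly like this sketch (variable names are illustrative, not Implicitron's):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Per-image ray xy locations of different lengths are zero-padded to a
# common length so they can be sampled as one batched tensor.
xys_list = [torch.rand(n, 2) for n in (3, 5, 4)]  # rays per image differ
padded_xys = pad_sequence(xys_list, batch_first=True)  # -> (3, 5, 2)
```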
Reviewed By: bottler, kjchalup
Differential Revision: D39542221
fbshipit-source-id: a6fec23838d3049ae5c2fd2e1f641c46c7c927e3
Summary: Allow using the new `foreach` option on optimizers.
Reviewed By: shapovalov
Differential Revision: D39694843
fbshipit-source-id: 97109c245b669bc6edff0f246893f95b7ae71f90
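For reference, the `foreach` flag selects PyTorch's multi-tensor optimizer implementation and is passed straight to the optimizer constructor (available on recent PyTorch versions; this snippet is only a usage illustration):

```python
import torch

# foreach=True enables the multi-tensor update path; training is otherwise
# unchanged.
model = torch.nn.Linear(2, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=True)
opt.zero_grad()
model(torch.randn(1, 2)).sum().backward()
opt.step()
```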
Summary: Various fixes to get visualize_reconstruction running, and an interactive test for it.
Reviewed By: kjchalup
Differential Revision: D39286691
fbshipit-source-id: 88735034cc01736b24735bcb024577e6ab7ed336
Summary: Workaround for oddity with new hydra.
Reviewed By: davnov134
Differential Revision: D39280639
fbshipit-source-id: 76e91947f633589945446db93cf2dbc259642f8a
Summary:
Move the flyaround rendering function into core implicitron.
This unblocks an example in the facebookresearch/co3d repo.
Reviewed By: bottler
Differential Revision: D39257801
fbshipit-source-id: 6841a88a43d4aa364dd86ba83ca2d4c3cf0435a4
Summary:
Adds yaml configs to train selected methods on CO3Dv2.
A few more updates:
1) moved some fields to base classes so that we can check is_multisequence in experiment.py
2) skip loading all train cameras for multisequence datasets; without this, co3d-fewview is untrainable
3) fix bug in json index dataset provider v2
Reviewed By: kjchalup
Differential Revision: D38952755
fbshipit-source-id: 3edac6fc8e20775aa70400bd73a0e6d52b091e0c
Summary:
generic_model_args no longer exists. Update some references to it, mostly in doc.
This fixes the testing of all the yaml files in test_forward pass.
Reviewed By: shapovalov
Differential Revision: D38789202
fbshipit-source-id: f11417efe772d7f86368b3598aa66c52b1309dbf
Summary:
**filename**: projects/nerf/nerf/implicit_function.py
**warning_type**: Incompatible variable type [9]
**warning_message**: input_skips is declared to have type `Tuple[int]` but is used as type `Tuple[]`.
**warning_line**: 256
**fix**: `input_skips: Tuple[int, ...] = ()`
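For context, the typing distinction behind the fix can be shown in isolation:

```python
from typing import Tuple

# Tuple[int] means a tuple of exactly one int, so the empty default ()
# is rejected by the type checker; Tuple[int, ...] allows any length.
input_skips: Tuple[int, ...] = ()
more_skips: Tuple[int, ...] = (4, 8)
```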
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/1288
Reviewed By: kjchalup
Differential Revision: D38615188
Pulled By: bottler
fbshipit-source-id: a014344dd6cf2125f564f948a3c905ceb84cf994
Summary: Linear followed by exponential LR progression. Needed to make Blender scenes converge.
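Such a schedule can be sketched with a `LambdaLR`; the warmup length and decay factor below are made-up values, not this diff's settings:

```python
import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=1.0)
warmup, gamma = 10, 0.9

def lr_lambda(step):
    if step < warmup:
        return (step + 1) / warmup   # linear ramp up to the base lr
    return gamma ** (step - warmup)  # exponential decay afterwards

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
```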
Reviewed By: kjchalup
Differential Revision: D38557007
fbshipit-source-id: ad630dbc5b8fabcb33eeb5bdeed5e4f31360bac2
Summary:
LLFF (and most/all non-synth datasets) will have no background/foreground distinction. Add support for data with no fg mask.
Also, we had a bug in stats loading, like this:
* Load stats
* One of the stats has a history of length 0
* That's fine, e.g. maybe it's fg_error but the dataset has no notion of fg/bg. So leave it as len 0
* Check whether all the stats have the same history length as an arbitrarily chosen "reference-stat"
* Ooops the reference-stat happened to be the stat with length 0
* assert (legit_stat_len == reference_stat_len (=0)) ---> failed assert
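The fix amounts to picking the reference length from stats that actually have history, sketched here with illustrative names:

```python
# Legitimately empty stats (e.g. fg_error with no fg/bg notion) must not
# become the reference; derive the reference length from non-empty ones.
histories = {"loss": [0.9, 0.8], "psnr": [20.0, 21.0], "fg_error": []}
nonempty_lens = {len(h) for h in histories.values() if h}
assert len(nonempty_lens) <= 1, "inconsistent stat history lengths"
reference_len = nonempty_lens.pop() if nonempty_lens else 0
```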
Also some minor fixes (from Jeremy's other diff) to support LLFF
Reviewed By: davnov134
Differential Revision: D38475272
fbshipit-source-id: 5b35ac86d1d5239759f537621f41a3aa4eb3bd68
Summary: Don't copy from one part of config to another, rather do the copy within GenericModel.
Reviewed By: davnov134
Differential Revision: D38248828
fbshipit-source-id: ff8af985c37ea1f7df9e0aa0a45a58df34c3f893
Summary:
Stats are logically connected to the training loop, not to the model; hence, moving them to the training loop.
Also removing resume_epoch from OptimizerFactory in favor of a single place, ModelFactory. This removes the need for config consistency checks etc.
Reviewed By: kjchalup
Differential Revision: D38313475
fbshipit-source-id: a1d188a63e28459df381ff98ad8acdcdb14887b7
Summary: Before this diff, train_stats.py would not be created by default, except when resuming training. This makes them appear from the start.
Reviewed By: shapovalov
Differential Revision: D38320341
fbshipit-source-id: 8ea5b99ec81c377ae129f58e78dc2eaff94821ad
Summary: Remove the dataset's need to provide the task type.
Reviewed By: davnov134, kjchalup
Differential Revision: D38314000
fbshipit-source-id: 3805d885b5d4528abdc78c0da03247edb9abf3f7
Summary:
Added _NEED_CONTROL to JsonIndexDatasetMapProviderV2 and made dataset_tweak_args use it.
Reviewed By: bottler
Differential Revision: D38313914
fbshipit-source-id: 529847571065dfba995b609a66737bd91e002cfe
Summary: Only import it if you ask for it.
Reviewed By: kjchalup
Differential Revision: D38327167
fbshipit-source-id: 3f05231f26eda582a63afc71b669996342b0c6f9
Summary: Currently, seeds are set only inside the train loop. But this does not ensure that the model weights are initialized the same way everywhere, which makes experiments irreproducible. This diff fixes it.
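The idea of the fix, sketched with an assumed `seed_all` helper (not the actual diff's code): seed before model construction, not only inside the train loop, so weight init is reproducible.

```python
import random
import torch

def seed_all(seed: int) -> None:
    random.seed(seed)
    torch.manual_seed(seed)

seed_all(42)
w1 = torch.nn.Linear(4, 4).weight.clone()
seed_all(42)
w2 = torch.nn.Linear(4, 4).weight.clone()  # identical init to w1
```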
Reviewed By: bottler
Differential Revision: D38315840
fbshipit-source-id: 3d2ecebbc36072c2b68dd3cd8c5e30708e7dd808
Summary: Make a dummy single-scene dataset using the code from generate_cow_renders (used in existing NeRF tutorials)
Reviewed By: kjchalup
Differential Revision: D38116910
fbshipit-source-id: 8db6df7098aa221c81d392e5cd21b0e67f65bd70
Summary:
This large diff rewrites a significant portion of Implicitron's config hierarchy. The new hierarchy, and some of the default implementation classes, are as follows:
```
Experiment
data_source: ImplicitronDataSource
dataset_map_provider
data_loader_map_provider
model_factory: ImplicitronModelFactory
model: GenericModel
optimizer_factory: ImplicitronOptimizerFactory
training_loop: ImplicitronTrainingLoop
evaluator: ImplicitronEvaluator
```
1) Experiment (used to be ExperimentConfig) is now a top-level Configurable whose members are mainly (mostly new) high-level factory Configurables.
2) Experiment's job is to run factories, do some accelerate setup and then pass the results to the main training loop.
3) ImplicitronOptimizerFactory and ImplicitronModelFactory are new high-level factories that create the optimizer, scheduler, model, and stats objects.
4) TrainingLoop is a new configurable that runs the main training loop and the inner train-validate step.
5) Evaluator is a new configurable that TrainingLoop uses to run validation/test steps.
6) GenericModel is not the only model choice anymore. Instead, ImplicitronModelBase (by default instantiated with GenericModel) is a member of Experiment and can be easily replaced by a custom implementation by the user.
All the new Configurables are children of ReplaceableBase, and can be easily replaced with custom implementations.
In addition, I added support for the exponential LR schedule, updated the config files and the test, as well as added a config file that reproduces NERF results and a test to run the repro experiment.
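An illustrative-only mini registry showing the "replaceable" pattern described above; the real Implicitron machinery (ReplaceableBase and its registry) is considerably more elaborate than this sketch:

```python
registry = {}

def register(cls):
    # map the class name to the implementation so configs can select it
    registry[cls.__name__] = cls
    return cls

class ImplicitronModelBase:  # stand-in for the real base class
    pass

@register
class GenericModel(ImplicitronModelBase):  # the default implementation
    pass

@register
class MyCustomModel(ImplicitronModelBase):  # a user-provided replacement
    pass

# A config naming "MyCustomModel" would select the custom class instead
# of the GenericModel default.
model_cls = registry.get("MyCustomModel", GenericModel)
```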
Reviewed By: bottler
Differential Revision: D37723227
fbshipit-source-id: b36bee880d6aa53efdd2abfaae4489d8ab1e8a27
Summary: Avoid calculating all_train_cameras before it is needed, because it is slow in some datasets.
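The deferral can be sketched with a cached property (class and method names here are illustrative, not Implicitron's):

```python
import functools

class DataSource:
    calls = 0  # counts how often the slow gather actually runs

    @functools.cached_property
    def all_train_cameras(self):
        DataSource.calls += 1
        return ["cam0", "cam1"]  # placeholder for the slow camera gather

ds = DataSource()
_ = ds.all_train_cameras
_ = ds.all_train_cameras  # cached: the slow gather ran only once
```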
Reviewed By: shapovalov
Differential Revision: D38037157
fbshipit-source-id: 95461226655cde2626b680661951ab17ebb0ec75
Summary: Add the conditioning types to the repro yaml files. In particular, this fixes test_conditioning_type.
Reviewed By: shapovalov
Differential Revision: D37914537
fbshipit-source-id: 621390f329d9da662d915eb3b7bc709206a20552
Summary:
I tried to run `experiment.py` and `pytorch3d_implicitron_runner` and faced the failure with this traceback: https://www.internalfb.com/phabricator/paste/view/P515734086
It seems to be due to the new release of OmegaConf (version=2.2.2) which requires different typing. This fix helped to overcome it.
Reviewed By: bottler
Differential Revision: D37881644
fbshipit-source-id: be0cd4ced0526f8382cea5bdca9b340e93a2fba2
Summary:
1. Random sampling of num batches without replacement is not supported.
2. Providers should implement the interface for the training loop to work.
Reviewed By: bottler, davnov134
Differential Revision: D37815388
fbshipit-source-id: 8a2795b524e733f07346ffdb20a9c0eb1a2b8190
Summary: Accelerate is an additional implicitron dependency, so document it.
Reviewed By: shapovalov
Differential Revision: D37786933
fbshipit-source-id: 11024fe604107881f8ca29e17cb5cbfe492fc7f9
Summary:
1. Respecting `visdom_show_preds` parameter when it is False.
2. Clipping the images pre-visualisation, which is important for methods like SRN that are not aware of the pixel value range.
Reviewed By: bottler
Differential Revision: D37786439
fbshipit-source-id: 8dbb5104290bcc5c2829716b663cae17edc911bd
Summary:
## Changes:
- Added Accelerate Library and refactored experiment.py to use it
- Needed to move `init_optimizer` and `ExperimentConfig` to a separate file to be compatible with submitit/hydra
- Needed to make some modifications to data loaders etc to work well with the accelerate ddp wrappers
- Loading/saving checkpoints incorporates an unwrapping step to remove the DDP-wrapped model
## Tests
Tested with both `torchrun` and `submitit/hydra` on two gpus locally. Here are the commands:
**Torchrun**
Modules loaded:
```sh
1) anaconda3/2021.05 2) cuda/11.3 3) NCCL/2.9.8-3-cuda.11.3 4) gcc/5.2.0. (but unload gcc when using submit)
```
```sh
torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test
```
**Submitit/Hydra Local test**
```sh
~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1
```
**Submitit/Hydra distributed test**
```sh
~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320
```
## TODOS:
- Fix distributed evaluation: currently this doesn't work as the input format to the evaluation function is not suitable for gathering across gpus (needs to be nested list/tuple/dicts of objects that satisfy `is_torch_tensor`) and currently `frame_data` contains `Cameras` type.
- Refactor the `accelerator` object to be accessible by all functions instead of needing to pass it around everywhere? Maybe have a `Trainer` class and add it as a method?
- Update readme with installation instructions for accelerate and also commands for running jobs with torchrun and submitit/hydra
X-link: https://github.com/fairinternal/pytorch3d/pull/37
Reviewed By: davnov134, kjchalup
Differential Revision: D37543870
Pulled By: bottler
fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
Summary: As part of removing Task, move camera difficulty bin breaks from hard code to the top level.
Reviewed By: davnov134
Differential Revision: D37491040
fbshipit-source-id: f2d6775ebc490f6f75020d13f37f6b588cc07a0b