23 Commits

Author SHA1 Message Date
Jeremy Reizenstein
e20cbe9b0e test fixes and lints
Summary:
- followup recent pyre change D63415925
- make tests remove temporary files
- weights_only=True in torch.load (see the sketch below)
- lint fixes
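
For context on the `weights_only=True` item, a minimal sketch of the stricter loading behavior (the checkpoint path is invented):

```python
import torch

# Round-tripping a plain state dict works under the stricter mode.
torch.save({"w": torch.zeros(3)}, "/tmp/ckpt.pt")

# weights_only=True restricts unpickling to tensors and primitive
# containers, so loading cannot execute arbitrary pickled code.
state = torch.load("/tmp/ckpt.pt", weights_only=True)
```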

3 test fixes from VRehnberg in https://github.com/facebookresearch/pytorch3d/issues/1914
- imageio channels fix
- frozen decorator in test_config
- load_blobs positional

Reviewed By: MichaelRamamonjisoa

Differential Revision: D66162167

fbshipit-source-id: 7737e174691b62f1708443a4fae07343cec5bfeb
2024-11-20 09:15:51 -08:00
generatedunixname89002005307016
38afdcfc68 upgrade pyre version in fbcode/vision - batch 2
Reviewed By: bottler

Differential Revision: D60992234

fbshipit-source-id: 899db6ed590ef966ff651c11027819e59b8401a3
2024-08-09 02:07:45 -07:00
Conner Nilsen
a27755db41 Pyre Configurationless migration for [batch:85/112] [shard:6/N]
Reviewed By: inseokhwang

Differential Revision: D54438157

fbshipit-source-id: a6acfe146ed29fff82123b5e458906d4b4cee6a2
2024-03-04 18:30:37 -08:00
generatedunixname89002005307016
f74fc450e8 suppress errors in vision/fair/pytorch3d
Differential Revision: D51645956

fbshipit-source-id: 1ae7279efa0a27bb9bc5255527bafebb84fdafd0
2023-11-28 19:10:06 -08:00
Jeremy Reizenstein
f68371d398 lints
Summary: simple

Reviewed By: shapovalov

Differential Revision: D46438865

fbshipit-source-id: 0f41cb3ddd7e7aca4513267d33299531f7e8d373
2023-06-16 06:48:04 -07:00
Virendra Kumar Pathak
d08fe6d45a Softly deprecate the get_str=False flag.
Summary: We don't want to use print directly in the stats.print() method. Instead, the method will return the output string to the caller.
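
A sketch of the new calling convention, assuming `stats` is an existing Implicitron `Stats` instance:

```python
# stats.print() now returns the formatted output string instead of
# printing it, so the caller decides where the text goes.
output = stats.print()  # `stats`: an already-constructed Stats object
print(output)           # or hand it to a logger instead
```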

Reviewed By: shapovalov

Differential Revision: D45356240

fbshipit-source-id: 2cabe3cdfb9206bf09aa7b3cdd2263148a5ba145
2023-05-14 01:24:31 -07:00
Roman Shapovalov
d561f1913e Cleaning up camera difficulty
Summary: We don’t see much value in reporting metrics by camera difficulty, while supporting it in new datasets is quite painful; hence we are deprecating training cameras in the data API and ignoring them in evaluation.

Reviewed By: bottler

Differential Revision: D42678879

fbshipit-source-id: aad511f6cb2ca82745f31c19594e1d80594b61d7
2023-01-23 10:38:56 -08:00
David Novotny
35f8cb9430 Downgrade "Assigning param_group " msg to DEBUG
Summary: <See title>

Reviewed By: bottler

Differential Revision: D41534524

fbshipit-source-id: 9c39198b9b8d5fc95f857b03ad39bfe0bd720cbb
2022-11-28 02:58:15 -08:00
Jeremy Reizenstein
7be49bf46f allow dots in param_groups
Summary:
Allow a module's param_groups member to specify overrides for the param groups of its members or their members.
Also adds logging for param group assignments.

This allows defining `params.basis_matrix` in the param_groups of a voxel_grid.
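
A hypothetical illustration (the group name is invented):

```python
# On a voxel_grid's param_groups, a dotted key reaches through the
# `params` member to a single parameter of a child module.
param_groups = {"params.basis_matrix": "basis_group"}
```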

Reviewed By: shapovalov

Differential Revision: D41080667

fbshipit-source-id: 49f3b0e5b36e496f78701db0699cbb8a7e20c51e
2022-11-07 06:41:40 -08:00
Jeremy Reizenstein
fe5bdb2fb5 different learning rate for different parts
Summary:
Adds the ability to have different learning rates for different parts of the model. The trainable parts of Implicitron have a new member:

       param_groups: dictionary where keys are names of individual parameters
            or of a module's members, and values are the parameter group to
            which the parameter/member will be assigned. The "self" key denotes
            the parameter group at the module level. Possible keys, including
            "self", do not have to be defined. By default all parameters are put
            into the "default" parameter group and have the learning rate
            defined in the optimizer; this can be overridden at the:
                - module level, with the "self" key: all the parameters and
                    child modules' parameters will be put into that parameter
                    group
                - member level, which is the same as if `param_groups` in that
                    member had key="self" and a value equal to that parameter
                    group. This is useful if members do not have `param_groups`,
                    for example torch.nn.Linear.
                - parameter level: a parameter with the same name as the key
                    will be put into that parameter group.

And in the optimizer factory, parameters and their learning rates are recursively gathered.
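
As a sketch of how such a mapping and the resulting optimizer might look (member names, group names, and learning rates are all invented for the example):

```python
import torch

# Hypothetical param_groups on a trainable Implicitron component:
param_groups = {
    "self": "decoder",       # module-level group for this module's parameters
    "linear": "fast",        # a member without its own param_groups, e.g. torch.nn.Linear
    "basis_matrix": "slow",  # a single parameter, addressed by name
}

# After the optimizer factory recursively gathers parameters, the groups
# become ordinary torch.optim parameter groups with their own rates:
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.Adam(
    [
        {"params": [model.weight], "lr": 1e-2},  # the "fast" group
        {"params": [model.bias], "lr": 1e-4},    # the "slow" group
    ],
    lr=1e-3,  # rate for the "default" group
)
```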

Reviewed By: shapovalov

Differential Revision: D40145802

fbshipit-source-id: 631c02b8d79ee1c0eb4c31e6e42dbd3d2882078a
2022-10-18 15:58:18 -07:00
Darijan Gudelj
37bd280d19 load whole dataset in train loop
Summary: Loads the whole dataset, moves it to the device, and sends it for sampling, to enable full-dataset heterogeneous raysampling.

Reviewed By: bottler

Differential Revision: D39263009

fbshipit-source-id: c527537dfc5f50116849656c9e171e868f6845b1
2022-10-03 08:36:47 -07:00
Jeremy Reizenstein
209c160a20 foreach optimizers
Summary: Allow using the new `foreach` option on optimizers.
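
For reference, `foreach` is a stock PyTorch optimizer flag; a minimal sketch:

```python
import torch

model = torch.nn.Linear(8, 8)
# foreach=True batches the per-parameter update math into
# multi-tensor kernels, which is typically faster on GPU.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=True)
```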

Reviewed By: shapovalov

Differential Revision: D39694843

fbshipit-source-id: 97109c245b669bc6edff0f246893f95b7ae71f90
2022-09-22 05:11:56 -07:00
Pyre Bot Jr
c80e5fd07a suppress errors in vision/fair/pytorch3d
Reviewed By: kjchalup

Differential Revision: D39198333

fbshipit-source-id: 3f4ebcf625215f21d165073837578ff69b05f72d
2022-09-01 11:46:55 -07:00
David Novotny
1163eaab43 CO3Dv2 trainer configs
Summary:
Adds yaml configs to train selected methods on CO3Dv2.

A few more updates:
1) moved some fields to base classes so that we can check is_multisequence in experiment.py
2) skip loading all train cameras for multisequence datasets; without this, co3d-fewview is untrainable
3) fixed a bug in the json index dataset provider v2

Reviewed By: kjchalup

Differential Revision: D38952755

fbshipit-source-id: 3edac6fc8e20775aa70400bd73a0e6d52b091e0c
2022-08-30 13:42:19 -07:00
Jeremy Reizenstein
a39cad40f4 LinearExponential LR
Summary: Linear followed by exponential LR progression. Needed for making Blender scenes converge.
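
One way to sketch such a schedule with a stock `LambdaLR` (the warmup length and decay constants are invented; Implicitron's actual implementation may differ):

```python
import torch

def linear_exponential(step, warmup=1000, gamma=0.1, decay_steps=10000):
    # Linear ramp from 0 to 1 over `warmup` steps, then exponential decay.
    if step < warmup:
        return step / warmup
    return gamma ** ((step - warmup) / decay_steps)

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, linear_exponential)
```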

Reviewed By: kjchalup

Differential Revision: D38557007

fbshipit-source-id: ad630dbc5b8fabcb33eeb5bdeed5e4f31360bac2
2022-08-09 18:18:46 -07:00
Krzysztof Chalupka
c83ec3555d Mods and bugfixes for LLFF and Blender repros
Summary:
LLFF (and most/all non-synth datasets) will have no background/foreground distinction. Add support for data with no fg mask.

Also, we had a bug in stats loading, like this:
  * Load stats
  * One of the stats has a history of length 0
  * That's fine, e.g. maybe it's fg_error but the dataset has no notion of fg/bg. So leave it as len 0
  * Check whether all the stats have the same history length as an arbitrarily chosen "reference-stat"
  * Oops: the reference-stat happened to be the stat with length 0
  * assert (legit_stat_len == reference_stat_len (=0)) ---> failed assert
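
One way to guard the check, as a sketch (the actual fix in the diff may differ):

```python
# Hypothetical stat histories; fg_error is legitimately empty because
# the dataset has no fg/bg notion.
histories = {"loss": [0.9, 0.7], "psnr": [20.0, 21.0], "fg_error": []}

# Take the reference length from the longest history, so an empty stat
# can never become the reference; empty histories are explicitly allowed.
reference_len = max((len(h) for h in histories.values()), default=0)
assert all(len(h) in (0, reference_len) for h in histories.values())
```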

Also some minor fixes (from Jeremy's other diff) to support LLFF

Reviewed By: davnov134

Differential Revision: D38475272

fbshipit-source-id: 5b35ac86d1d5239759f537621f41a3aa4eb3bd68
2022-08-09 15:04:44 -07:00
David Novotny
c3f8dad55c Move load_stats to TrainingLoop
Summary:
Stats are logically connected to the training loop, not to the model; hence, they move to the training loop.

Also removes resume_epoch from OptimizerFactory in favor of a single place, ModelFactory. This removes the need for config consistency checks, etc.

Reviewed By: kjchalup

Differential Revision: D38313475

fbshipit-source-id: a1d188a63e28459df381ff98ad8acdcdb14887b7
2022-08-02 15:40:53 -07:00
Krzysztof Chalupka
b7b188bf54 Fix train_stats.pdf: they now work by default
Summary: Before this diff, train_stats.pdf would not be created by default, except when resuming training. This makes the plots appear from the start.

Reviewed By: shapovalov

Differential Revision: D38320341

fbshipit-source-id: 8ea5b99ec81c377ae129f58e78dc2eaff94821ad
2022-08-02 08:50:50 -07:00
Jeremy Reizenstein
f8bf528043 remove get_task
Summary: Remove the dataset's need to provide the task type.

Reviewed By: davnov134, kjchalup

Differential Revision: D38314000

fbshipit-source-id: 3805d885b5d4528abdc78c0da03247edb9abf3f7
2022-08-02 07:55:42 -07:00
David Novotny
80fc0ee0b6 Better seeding of random engines
Summary: Currently, seeds are set only inside the train loop, but this does not ensure that the model weights are initialized the same way everywhere, which makes experiments irreproducible. This diff fixes that.
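
A typical shape for such global seeding, as a sketch:

```python
import random
import numpy as np
import torch

def seed_all(seed: int) -> None:
    # Seeding before model construction makes weight initialization
    # reproducible, not just the training loop.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds the CUDA RNGs

seed_all(42)
model = torch.nn.Linear(3, 3)  # now initialized deterministically
```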

Reviewed By: bottler

Differential Revision: D38315840

fbshipit-source-id: 3d2ecebbc36072c2b68dd3cd8c5e30708e7dd808
2022-08-01 10:03:09 -07:00
Krzysztof Chalupka
1b0584f7bd Replace pluggable components to create a proper Configurable hierarchy.
Summary:
This large diff rewrites a significant portion of Implicitron's config hierarchy. The new hierarchy, and some of the default implementation classes, are as follows:
```
Experiment
    data_source: ImplicitronDataSource
        dataset_map_provider
        data_loader_map_provider
    model_factory: ImplicitronModelFactory
        model: GenericModel
    optimizer_factory: ImplicitronOptimizerFactory
    training_loop: ImplicitronTrainingLoop
        evaluator: ImplicitronEvaluator
```

1) Experiment (used to be ExperimentConfig) is now a top-level Configurable whose members are mainly (mostly new) high-level factory Configurables.
2) Experiment's job is to run the factories, perform some Accelerate setup, and then pass the results to the main training loop.
3) ImplicitronOptimizerFactory and ImplicitronModelFactory are new high-level factories that create the optimizer, scheduler, model, and stats objects.
4) TrainingLoop is a new configurable that runs the main training loop and the inner train-validate step.
5) Evaluator is a new configurable that TrainingLoop uses to run validation/test steps.
6) GenericModel is no longer the only model choice. Instead, ImplicitronModelBase (by default instantiated with GenericModel) is a member of Experiment and can easily be replaced with a custom user implementation.

All the new Configurables are children of ReplaceableBase, and can be easily replaced with custom implementations.

In addition, I added support for the exponential LR schedule, updated the config files and the test, and added a config file that reproduces NeRF results along with a test that runs the repro experiment.
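
A sketch of the replacement mechanism described in point 6 (the class body is a stub; the subclass name is invented):

```python
from pytorch3d.implicitron.models.base_model import ImplicitronModelBase
from pytorch3d.implicitron.tools.config import registry

# Registering a subclass of a ReplaceableBase makes it selectable from
# config wherever the hierarchy declares an ImplicitronModelBase member.
@registry.register
class MyModel(ImplicitronModelBase):
    def forward(self, **kwargs):
        raise NotImplementedError  # stub for illustration
```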

Reviewed By: bottler

Differential Revision: D37723227

fbshipit-source-id: b36bee880d6aa53efdd2abfaae4489d8ab1e8a27
2022-07-29 17:32:51 -07:00
Iurii Makarov
0f966217e5 Fixed typing to have compatibility with OmegaConf 2.2.2 in Pytorch3D
Summary:
I tried to run `experiment.py` and `pytorch3d_implicitron_runner` and hit a failure with this traceback: https://www.internalfb.com/phabricator/paste/view/P515734086

It seems to be due to the new OmegaConf release (version 2.2.2), which requires different typing. This fix overcomes it.
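
The diff doesn't show the exact annotations, but as a general illustration of the stricter typing OmegaConf 2.2 enforces on structured configs (the config class and field are invented):

```python
from dataclasses import dataclass
from typing import Optional

from omegaconf import OmegaConf

@dataclass
class RenderConfig:
    # Under OmegaConf 2.2, a None default is only accepted when the
    # annotation is Optional; a bare `str = None` field is rejected.
    bg_color: Optional[str] = None

cfg = OmegaConf.structured(RenderConfig)
```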

Reviewed By: bottler

Differential Revision: D37881644

fbshipit-source-id: be0cd4ced0526f8382cea5bdca9b340e93a2fba2
2022-07-15 05:55:03 -07:00
Nikhila Ravi
aa8b03f31d Updates to support Accelerate and multigpu training (#37)
Summary:
## Changes:
- Added the Accelerate library and refactored experiment.py to use it
- Needed to move `init_optimizer` and `ExperimentConfig` to a separate file to be compatible with submitit/hydra
- Needed to make some modifications to data loaders etc. to work well with the Accelerate DDP wrappers
- Loading/saving checkpoints incorporates an unwrapping step to remove the DDP wrapper from the model
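
A minimal sketch of the Accelerate pattern the bullets describe (model, data, and path are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.Adam(model.parameters())
loader = torch.utils.data.DataLoader([torch.zeros(4)] * 8)

# prepare() wraps the model for DDP and shards the loader per process.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

# Saving goes through the unwrapping step mentioned above, so the
# checkpoint holds the plain (non-DDP) weights.
torch.save(accelerator.unwrap_model(model).state_dict(), "/tmp/model.pt")
```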

## Tests

Tested with both `torchrun` and `submitit/hydra` on two gpus locally. Here are the commands:

**Torchrun**

Modules loaded:
```sh
1) anaconda3/2021.05   2) cuda/11.3   3) NCCL/2.9.8-3-cuda.11.3   4) gcc/5.2.0 (but unload gcc when using submitit)
```

```sh
torchrun --nnodes=1 --nproc_per_node=2 experiment.py --config-path ./configs --config-name repro_singleseq_nerf_test
```

**Submitit/Hydra Local test**

```sh
~/pytorch3d/projects/implicitron_trainer$ HYDRA_FULL_ERROR=1 python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs  hydra/launcher=submitit_local hydra.launcher.gpus_per_node=2 hydra.launcher.tasks_per_node=2 hydra.launcher.nodes=1
```

**Submitit/Hydra distributed test**

```sh
~/implicitron/pytorch3d$ python3.9 experiment.py --config-name repro_singleseq_nerf_test --multirun --config-path ./configs  hydra/launcher=submitit_slurm hydra.launcher.gpus_per_node=8 hydra.launcher.tasks_per_node=8 hydra.launcher.nodes=1 hydra.launcher.partition=learnlab hydra.launcher.timeout_min=4320
```

## TODOS:
- Fix distributed evaluation: currently this doesn't work as the input format to the evaluation function is not suitable for gathering across gpus (needs to be nested list/tuple/dicts of objects that satisfy `is_torch_tensor`) and currently `frame_data`  contains `Cameras` type.
- Refactor the `accelerator` object to be accessible by all functions instead of needing to pass it around everywhere? Maybe have a `Trainer` class and add it as a method?
- Update readme with installation instructions for accelerate and also commands for running jobs with torchrun and submitit/hydra

X-link: https://github.com/fairinternal/pytorch3d/pull/37

Reviewed By: davnov134, kjchalup

Differential Revision: D37543870

Pulled By: bottler

fbshipit-source-id: be9eb4e91244d4fe3740d87dafec622ae1e0cf76
2022-07-11 19:29:58 -07:00