Fix up docstrings

Summary:
One of the docstrings is a disaster, see https://pytorch3d.readthedocs.io/en/latest/modules/ops.html

Also some minor fixes I encountered when browsing the code

Reviewed By: bottler

Differential Revision: D38581595

fbshipit-source-id: 3b6ca97788af380a44df9144a6a4cac782c6eab8
Krzysztof Chalupka 2022-08-23 14:58:49 -07:00 committed by Facebook GitHub Bot
parent c4545a7cbc
commit 6653f4400b
3 changed files with 29 additions and 28 deletions

View File

@@ -39,20 +39,20 @@ def corresponding_cameras_alignment(
     such that the following holds:
     Under the change of coordinates using a similarity transform
-    (R_A, T_A, s_A) a 3D point X' is mapped to X with:
-    ```
+    (R_A, T_A, s_A) a 3D point X' is mapped to X with: ::
     X = (X' R_A + T_A) / s_A
-    ```
-    Then, for all cameras `i`, we assume that the following holds:
-    ```
+    Then, for all cameras `i`, we assume that the following holds: ::
     X R_i + T_i = s' (X' R_i' + T_i'),
-    ```
     i.e. an adjusted point X' is mapped by a camera (R_i', T_i')
     to the same point as imaged from camera (R_i, T_i) after resolving
     the scale ambiguity with a global scalar factor s'.
-    Substituting for X above gives rise to the following:
-    ```
+    Substituting for X above gives rise to the following: ::
     (X' R_A + T_A) / s_A R_i + T_i = s' (X' R_i' + T_i')  // · s_A
     (X' R_A + T_A) R_i + T_i s_A = (s' s_A) (X' R_i' + T_i')
     s' := 1 / s_A  # without loss of generality
@@ -60,10 +60,11 @@ def corresponding_cameras_alignment(
     X' R_A R_i + T_A R_i + T_i s_A = X' R_i' + T_i'
        ^^^^^^^   ^^^^^^^^^^^^^^^^^
        ~= R_i'      ~= T_i'
-    ```
     i.e. after estimating R_A, T_A, s_A, the aligned source cameras have
-    extrinsics:
-    `cameras_src_align = (R_A R_i, T_A R_i + T_i s_A) ~= (R_i', T_i')`
+    extrinsics: ::
+    cameras_src_align = (R_A R_i, T_A R_i + T_i s_A) ~= (R_i', T_i')
     We support two ways `R_A, T_A, s_A` can be estimated:
     1) `mode=='centers'`
@@ -73,12 +74,12 @@ def corresponding_cameras_alignment(
     2) `mode=='extrinsics'`
     Defines the alignment problem as a system
-    of the following equations:
-    ```
+    of the following equations: ::
     for all i:
     [ R_A   0 ] x [ R_i         0 ] = [ R_i' 0 ]
     [ T_A^T 1 ]   [ (s_A T_i^T) 1 ]   [ T_i' 1 ]
-    ```
     `R_A, T_A` and `s_A` are then obtained by solving the
     system in the least squares sense.
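
For context, a minimal usage sketch of `corresponding_cameras_alignment` in both modes described by the docstring above; the camera poses here are random placeholders, not taken from the diff:

```python
import torch
from pytorch3d.ops import corresponding_cameras_alignment
from pytorch3d.renderer import PerspectiveCameras
from pytorch3d.transforms import random_rotations

# Two batches of 10 cameras that should be related by a single similarity transform.
R = random_rotations(10)
T = torch.randn(10, 3)
cameras_tgt = PerspectiveCameras(R=R, T=T)
cameras_src = PerspectiveCameras(R=R, T=T + 0.01 * torch.randn(10, 3))

# Estimate R_A, T_A, s_A from the full extrinsics and return the aligned source cameras.
cameras_src_aligned = corresponding_cameras_alignment(
    cameras_src, cameras_tgt, estimate_scale=True, mode="extrinsics"
)

# Alternatively, solve the alignment using only the camera centers.
cameras_src_aligned_centers = corresponding_cameras_alignment(
    cameras_src, cameras_tgt, estimate_scale=True, mode="centers"
)
```

With `mode=='extrinsics'` the full rotations and translations enter the least-squares system shown above; with `mode=='centers'` only the camera centers are aligned.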

View File

@@ -36,15 +36,15 @@ class CamerasBase(TensorProperties):
     For cameras, there are four different coordinate systems (or spaces)
     - World coordinate system: This is the system the object lives - the world.
-    - Camera view coordinate system: This is the system that has its origin on the camera
-      and the and the Z-axis perpendicular to the image plane.
+    - Camera view coordinate system: This is the system that has its origin on
+      the camera and the Z-axis perpendicular to the image plane.
       In PyTorch3D, we assume that +X points left, and +Y points up and
       +Z points out from the image plane.
       The transformation from world --> view happens after applying a rotation (R)
       and translation (T)
     - NDC coordinate system: This is the normalized coordinate system that confines
-      in a volume the rendered part of the object or scene. Also known as view volume.
-      For square images, given the PyTorch3D convention, (+1, +1, znear)
+      points in a volume the rendered part of the object or scene, also known as
+      view volume. For square images, given the PyTorch3D convention, (+1, +1, znear)
       is the top left near corner, and (-1, -1, zfar) is the bottom right far
       corner of the volume.
       The transformation from view --> NDC happens after applying the camera
@@ -54,10 +54,9 @@ class CamerasBase(TensorProperties):
     - Screen coordinate system: This is another representation of the view volume with
       the XY coordinates defined in image space instead of a normalized space.
-    A better illustration of the coordinate systems can be found in
-    pytorch3d/docs/notes/cameras.md.
-    It defines methods that are common to all camera models:
+    An illustration of the coordinate systems can be found in pytorch3d/docs/notes/cameras.md.
+    CameraBase defines methods that are common to all camera models:
     - `get_camera_center` that returns the optical center of the camera in
       world coordinates
     - `get_world_to_view_transform` which returns a 3D transform from
@@ -167,8 +166,8 @@ class CamerasBase(TensorProperties):
     as keyword arguments to override the default values
     set in __init__.
-    Setting T here will update the values set in init as this
-    value may be needed later on in the rendering pipeline e.g. for
+    Setting R or T here will update the values set in init as these
+    values may be needed later on in the rendering pipeline e.g. for
     lighting calculations.
     Returns:
@@ -237,8 +236,9 @@ class CamerasBase(TensorProperties):
         self, points, eps: Optional[float] = None, **kwargs
     ) -> torch.Tensor:
         """
-        Transform input points from world to camera space with the
-        projection matrix defined by the camera.
+        Transform input points from world to camera space.
+        If camera is defined in NDC space, the projected points are in NDC space.
+        If camera is defined in screen space, the projected points are in screen space.
         For `CamerasBase.transform_points`, setting `eps > 0`
        stabilizes gradients since it leads to avoiding division
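
To make the updated `transform_points` docstring above concrete, a small sketch; the point values are arbitrary and the camera is NDC-defined by default:

```python
import torch
from pytorch3d.renderer import PerspectiveCameras

cameras = PerspectiveCameras()        # defined in NDC space by default
points_world = torch.rand(1, 100, 3)  # (minibatch, num_points, 3)

# The projected points are in NDC space; eps > 0 guards the homogeneous
# division and stabilizes gradients, as the docstring describes.
points_ndc = cameras.transform_points(points_world, eps=1e-4)
```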
@@ -492,7 +492,7 @@ class FoVPerspectiveCameras(CamerasBase):
     """
     A class which stores a batch of parameters to generate a batch of
     projection matrices by specifying the field of view.
-    The definition of the parameters follow the OpenGL perspective camera.
+    The definitions of the parameters follow the OpenGL perspective camera.
     The extrinsics of the camera (R and T matrices) can also be set in the
     initializer or passed in to `get_full_projection_transform` to get
@@ -780,7 +780,7 @@ class FoVOrthographicCameras(CamerasBase):
     """
     A class which stores a batch of parameters to generate a batch of
     projection matrices by specifying the field of view.
-    The definition of the parameters follow the OpenGL orthographic camera.
+    The definitions of the parameters follow the OpenGL orthographic camera.
     """
     # For __getitem__
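
For reference, a rough sketch of constructing the FoV cameras mentioned above and of the `CamerasBase` methods named earlier; all parameter values are arbitrary:

```python
from pytorch3d.renderer import (
    FoVOrthographicCameras,
    FoVPerspectiveCameras,
    look_at_view_transform,
)

# Extrinsics (R, T) can be set in the initializer, as the docstrings note.
R, T = look_at_view_transform(dist=2.7, elev=10.0, azim=20.0)

persp = FoVPerspectiveCameras(znear=0.1, zfar=100.0, fov=60.0, R=R, T=T)
ortho = FoVOrthographicCameras(znear=0.1, zfar=100.0, R=R, T=T)

centers = persp.get_camera_center()                   # optical centers in world coordinates
world_to_view = persp.get_world_to_view_transform()   # Transform3d: world -> view
world_to_ndc = persp.get_full_projection_transform()  # Transform3d: world -> NDC
```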

View File

@@ -165,7 +165,7 @@ class Transform3d:
             raise ValueError('"matrix" has to be a 2- or a 3-dimensional tensor.')
         if matrix.shape[-2] != 4 or matrix.shape[-1] != 4:
             raise ValueError(
-                '"matrix" has to be a tensor of shape (minibatch, 4, 4)'
+                '"matrix" has to be a tensor of shape (minibatch, 4, 4) or (4, 4).'
             )
         # set dtype and device from matrix
         dtype = matrix.dtype
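
For illustration, a minimal sketch of the two shapes this check accepts (values arbitrary):

```python
import torch
from pytorch3d.transforms import Transform3d

# A single (4, 4) matrix or a (minibatch, 4, 4) batch are both accepted.
t_single = Transform3d(matrix=torch.eye(4))
t_batch = Transform3d(matrix=torch.eye(4).repeat(8, 1, 1))

# Anything else raises the ValueError shown in the diff above, e.g.:
# Transform3d(matrix=torch.eye(3))  # '"matrix" has to be a tensor of shape (minibatch, 4, 4) or (4, 4).'
```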