camera refactoring

Summary: Refactor cameras
* `CamerasBase` was enhanced with `transform_points_screen`, which transforms projected points from NDC to screen space
* OpenGLPerspective, OpenGLOrthographic -> FoVPerspective, FoVOrthographic
* SfMPerspective, SfMOrthographic -> Perspective, Orthographic
* PerspectiveCamera can optionally be constructed with screen space parameters
* A note on cameras and coordinate systems was added

Reviewed By: nikhilaravi

Differential Revision: D23168525

fbshipit-source-id: dd138e2b2cc7e0e0d9f34c45b8251c01266a2063
committed by Facebook GitHub Bot
parent 9242e7e65d
commit 57a22e7306
docs/notes/cameras.md (new file, 63 lines)
@@ -0,0 +1,63 @@
# Cameras

## Camera Coordinate Systems

When working with 3D data, there are 4 coordinate systems users need to know about:
* **World coordinate system**
This is the system in which the object/scene lives - the world.
* **Camera view coordinate system**
This is the system that has its origin on the image plane and the `Z`-axis perpendicular to the image plane. In PyTorch3D, we assume that `+X` points left, `+Y` points up and `+Z` points out from the image plane. The transformation from world to view coordinates is a rotation (`R`) followed by a translation (`T`).
* **NDC coordinate system**
This is the normalized coordinate system that confines the rendered part of the object/scene in a volume, also known as the view volume. Under the PyTorch3D convention, `(+1, +1, znear)` is the top left near corner and `(-1, -1, zfar)` is the bottom right far corner of the volume. The transformation from view to NDC coordinates is the application of the camera projection matrix (`P`).
* **Screen coordinate system**
This is another representation of the view volume, with the `XY` coordinates defined in pixel space instead of a normalized space.

An illustration of the 4 coordinate systems is shown below.

![cameras](https://user-images.githubusercontent.com/4369065/90317960-d9b8db80-dee1-11ea-8088-39c414b1e2fa.png)

## Defining Cameras in PyTorch3D

Cameras in PyTorch3D transform an object/scene from world to NDC by first transforming the object/scene to view (via the transforms `R` and `T`) and then projecting the 3D object/scene to NDC (via the projection matrix `P`, also known as the camera matrix). Thus, the camera parameters in `P` are assumed to be in NDC space. If the user has camera parameters in screen space, which is a common use case, the parameters should be transformed to NDC (see below for an example).
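
As a minimal sketch of this two-step decomposition (the viewpoint values below are arbitrary, chosen only for illustration):

```
import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

# An arbitrary example viewpoint: distance 2.7, elev 10, azim 20 (degrees).
R, T = look_at_view_transform(dist=2.7, elev=10, azim=20)
cameras = FoVPerspectiveCameras(R=R, T=T)

world_to_view = cameras.get_world_to_view_transform()   # applies R, T
view_to_ndc = cameras.get_projection_transform()        # applies P
world_to_ndc = cameras.get_full_projection_transform()  # composition of the two

# Projecting in one step or in two steps should agree (up to numerics).
points = torch.rand(1, 5, 3)  # (batch, num_points, XYZ) in world coordinates
assert torch.allclose(
    world_to_ndc.transform_points(points),
    view_to_ndc.transform_points(world_to_view.transform_points(points)),
    atol=1e-5,
)
```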

We describe the camera types in PyTorch3D and the convention for the camera parameters provided at construction time.

### Camera Types

All cameras inherit from `CamerasBase`, the base class for all camera models. PyTorch3D provides four different camera types. `CamerasBase` defines methods that are common to all camera models (a usage sketch follows the list):
* `get_camera_center` which returns the optical center of the camera in world coordinates
* `get_world_to_view_transform` which returns a 3D transform from world coordinates to camera view coordinates (`R`, `T`)
* `get_full_projection_transform` which composes the projection transform (`P`) with the world-to-view transform (`R`, `T`)
* `transform_points` which takes a set of input points in world coordinates and projects them to NDC coordinates, ranging from `[-1, -1, znear]` to `[+1, +1, zfar]`
* `transform_points_screen` which takes a set of input points in world coordinates and projects them to screen coordinates, ranging from `[0, 0, znear]` to `[W-1, H-1, zfar]`
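
A short usage sketch of these methods; the viewpoint and the 512 x 512 image size are arbitrary, and the batched `((W, H),)` format passed to `transform_points_screen` is an assumption about its signature:

```
import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

R, T = look_at_view_transform(dist=2.7, elev=10, azim=20)
cameras = FoVPerspectiveCameras(R=R, T=T)
points = torch.rand(1, 5, 3)  # (batch, num_points, XYZ) in world coordinates

center = cameras.get_camera_center()       # (1, 3) optical center in world coords
ndc = cameras.transform_points(points)     # world -> NDC
screen = cameras.transform_points_screen(  # world -> screen (pixel) coordinates
    points, image_size=((512, 512),)
)
```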

Users can easily customize their own cameras. For each new camera, users should implement the `get_projection_transform` routine that returns the mapping `P` from camera view coordinates to NDC coordinates.
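
For illustration only, a hypothetical custom camera could look like the sketch below; the class name and the trivial identity projection are made up, and a real camera would build `P` from its own parameters (PyTorch3D transforms follow a row-vector convention, so a non-symmetric `P` would need to be transposed):

```
import torch
from pytorch3d.renderer.cameras import CamerasBase
from pytorch3d.transforms import Transform3d

class IdentityCameras(CamerasBase):
    # Hypothetical camera whose projection P is the identity:
    # view coordinates pass through to NDC unchanged.
    def __init__(self, R=torch.eye(3)[None], T=torch.zeros(1, 3), device="cpu"):
        super().__init__(device=device, R=R, T=T)

    def get_projection_transform(self, **kwargs):
        P = torch.eye(4, device=self.device)[None]  # (1, 4, 4)
        return Transform3d(matrix=P, device=self.device)
```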

#### FoVPerspectiveCameras, FoVOrthographicCameras

These two cameras follow the OpenGL convention for perspective and orthographic cameras respectively. The user provides the near `znear` and far `zfar` fields, which confine the view volume in the `Z` axis. The view volume in the `XY` plane is defined by the field of view angle (`fov`) in the case of `FoVPerspectiveCameras` and by `min_x, min_y, max_x, max_y` in the case of `FoVOrthographicCameras`.
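
For example (the near/far planes and volume extents below are arbitrary):

```
from pytorch3d.renderer import FoVPerspectiveCameras, FoVOrthographicCameras

# Perspective: the XY extent of the view volume is set by the field of view.
persp = FoVPerspectiveCameras(znear=0.1, zfar=100.0, fov=60.0)

# Orthographic: the XY extent of the view volume is set explicitly.
ortho = FoVOrthographicCameras(znear=0.1, zfar=100.0,
                               min_x=-1.0, max_x=1.0, min_y=-1.0, max_y=1.0)
```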

#### PerspectiveCameras, OrthographicCameras

These two cameras follow the Multi-View Geometry convention for cameras. The user provides the focal length (`fx`, `fy`) and the principal point (`px`, `py`). For example, `camera = PerspectiveCameras(focal_length=((fx, fy),), principal_point=((px, py),))`

As mentioned above, the focal length and principal point are used to convert a point `(X, Y, Z)` from view coordinates to NDC coordinates, as follows:

```
# for perspective
x_ndc = fx * X / Z + px
y_ndc = fy * Y / Z + py
z_ndc = 1 / Z

# for orthographic
x_ndc = fx * X + px
y_ndc = fy * Y + py
z_ndc = Z
```

Commonly, users have access to the focal length (`fx_screen`, `fy_screen`) and the principal point (`px_screen`, `py_screen`) in screen space. In that case, to construct the camera the user additionally needs to provide `image_size = ((image_width, image_height),)`. More precisely, `camera = PerspectiveCameras(focal_length=((fx_screen, fy_screen),), principal_point=((px_screen, py_screen),), image_size=((image_width, image_height),))`. Internally, the camera parameters are converted from screen to NDC as follows:

```
fx = fx_screen * 2.0 / image_width
fy = fy_screen * 2.0 / image_height

px = - (px_screen - image_width / 2.0) * 2.0 / image_width
py = - (py_screen - image_height / 2.0) * 2.0 / image_height
```
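
As a sanity check, the sketch below uses made-up intrinsics for a 128 x 128 image and verifies that the two construction paths agree; it is illustrative rather than normative:

```
import torch
from pytorch3d.renderer import PerspectiveCameras

# Made-up screen-space intrinsics for a 128 x 128 image.
W = H = 128
fx_screen = fy_screen = 100.0
px_screen = py_screen = 64.0

# Convert to NDC with the formulas above.
fx = fx_screen * 2.0 / W               # 1.5625
fy = fy_screen * 2.0 / H               # 1.5625
px = -(px_screen - W / 2.0) * 2.0 / W  # 0.0: principal point at the image center
py = -(py_screen - H / 2.0) * 2.0 / H  # 0.0

ndc_cam = PerspectiveCameras(focal_length=((fx, fy),), principal_point=((px, py),))
screen_cam = PerspectiveCameras(focal_length=((fx_screen, fy_screen),),
                                principal_point=((px_screen, py_screen),),
                                image_size=((W, H),))

# Both cameras should project world points to the same NDC coordinates.
points = torch.rand(1, 5, 3) + torch.tensor([0.0, 0.0, 2.0])  # keep Z > 0
assert torch.allclose(ndc_cam.transform_points(points),
                      screen_cam.transform_points(points), atol=1e-5)
```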
@@ -39,7 +39,7 @@ Rendering requires transformations between several different coordinate frames:

<img src="assets/transformations_overview.png" width="1000">

For example, given a teapot mesh, the world coordinate frame, camera coordinate frame and image are shown in the figure below. Note that the world and camera coordinate frames have the +Z direction pointing into the page.

<img src="assets/world_camera_image.png" width="1000">
@@ -47,8 +47,8 @@ For example, given a teapot mesh, the world coordinate frame, camera coordinate

**NOTE: PyTorch3D vs OpenGL**

While we tried to emulate several aspects of OpenGL, there are differences in the coordinate frame conventions.
- The default world coordinate frame in PyTorch3D has +Z pointing into the screen, whereas in OpenGL +Z points out of the screen. Both are right handed.
- The NDC coordinate system in PyTorch3D is **right-handed** compared with a **left-handed** NDC coordinate system in OpenGL (the projection matrix switches the handedness).

<img align="center" src="assets/opengl_coordframes.png" width="300">
@@ -61,14 +61,14 @@ A renderer in PyTorch3D is composed of a **rasterizer** and a **shader**. Create

```
# Imports
from pytorch3d.renderer import (
-    OpenGLPerspectiveCameras, look_at_view_transform,
+    FoVPerspectiveCameras, look_at_view_transform,
    RasterizationSettings, BlendParams,
    MeshRenderer, MeshRasterizer, HardPhongShader
)

# Initialize an OpenGL perspective camera.
R, T = look_at_view_transform(2.7, 10, 20)
-cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)
+cameras = FoVPerspectiveCameras(device=device, R=R, T=T)

# Define the settings for rasterization and shading. Here we set the output image to be of size
# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1
@@ -102,7 +102,7 @@
"\n",
"# rendering components\n",
"from pytorch3d.renderer import (\n",
-"    OpenGLPerspectiveCameras, look_at_view_transform, look_at_rotation, \n",
+"    FoVPerspectiveCameras, look_at_view_transform, look_at_rotation, \n",
"    RasterizationSettings, MeshRenderer, MeshRasterizer, BlendParams,\n",
"    SoftSilhouetteShader, HardPhongShader, PointLights\n",
")"
@@ -217,8 +217,8 @@
},
"outputs": [],
"source": [
-"# Initialize an OpenGL perspective camera.\n",
-"cameras = OpenGLPerspectiveCameras(device=device)\n",
+"# Initialize a perspective camera.\n",
+"cameras = FoVPerspectiveCameras(device=device)\n",
"\n",
"# To blend the 100 faces we set a few parameters which control the opacity and the sharpness of \n",
"# edges. Refer to blending.py for more details. \n",
@@ -129,7 +129,7 @@
"from pytorch3d.structures import Meshes, Textures\n",
"from pytorch3d.renderer import (\n",
"    look_at_view_transform,\n",
-"    OpenGLPerspectiveCameras, \n",
+"    FoVPerspectiveCameras, \n",
"    PointLights, \n",
"    DirectionalLights, \n",
"    Materials, \n",
@@ -309,16 +309,16 @@
"# the cow is facing the -z direction. \n",
"lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])\n",
"\n",
-"# Initialize an OpenGL perspective camera that represents a batch of different \n",
+"# Initialize a camera that represents a batch of different \n",
"# viewing angles. All the cameras helper methods support mixed type inputs and \n",
"# broadcasting. So we can view the camera from a distance of dist=2.7, and \n",
"# then specify elevation and azimuth angles for each viewpoint as tensors. \n",
"R, T = look_at_view_transform(dist=2.7, elev=elev, azim=azim)\n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
"\n",
"# We arbitrarily choose one particular view that will be used to visualize \n",
"# results\n",
-"camera = OpenGLPerspectiveCameras(device=device, R=R[None, 1, ...], \n",
+"camera = FoVPerspectiveCameras(device=device, R=R[None, 1, ...], \n",
"                               T=T[None, 1, ...]) \n",
"\n",
"# Define the settings for rasterization and shading. Here we set the output \n",
@@ -361,7 +361,7 @@
"# Our multi-view cow dataset will be represented by these 2 lists of tensors,\n",
"# each of length num_views.\n",
"target_rgb = [target_images[i, ..., :3] for i in range(num_views)]\n",
-"target_cameras = [OpenGLPerspectiveCameras(device=device, R=R[None, i, ...], \n",
+"target_cameras = [FoVPerspectiveCameras(device=device, R=R[None, i, ...], \n",
"                                        T=T[None, i, ...]) for i in range(num_views)]"
],
"execution_count": null,
@@ -925,4 +925,4 @@
]
}
]
}
@@ -64,7 +64,7 @@
"from pytorch3d.structures import Pointclouds\n",
"from pytorch3d.renderer import (\n",
"    look_at_view_transform,\n",
-"    OpenGLOrthographicCameras, \n",
+"    FoVOrthographicCameras, \n",
"    PointsRasterizationSettings,\n",
"    PointsRenderer,\n",
"    PointsRasterizer,\n",
@@ -147,9 +147,9 @@
"metadata": {},
"outputs": [],
"source": [
-"# Initialize an OpenGL perspective camera.\n",
+"# Initialize a camera.\n",
"R, T = look_at_view_transform(20, 10, 0)\n",
-"cameras = OpenGLOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
+"cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
"\n",
"# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
"# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -195,9 +195,9 @@
"metadata": {},
"outputs": [],
"source": [
-"# Initialize an OpenGL perspective camera.\n",
+"# Initialize a camera.\n",
"R, T = look_at_view_transform(20, 10, 0)\n",
-"cameras = OpenGLOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
+"cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
"\n",
"# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
"# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -90,7 +90,7 @@
"from pytorch3d.structures import Meshes, Textures\n",
"from pytorch3d.renderer import (\n",
"    look_at_view_transform,\n",
-"    OpenGLPerspectiveCameras, \n",
+"    FoVPerspectiveCameras, \n",
"    PointLights, \n",
"    DirectionalLights, \n",
"    Materials, \n",
@@ -286,11 +286,11 @@
},
"outputs": [],
"source": [
-"# Initialize an OpenGL perspective camera.\n",
+"# Initialize a camera.\n",
"# With world coordinates +Y up, +X left and +Z in, the front of the cow is facing the -Z direction. \n",
"# So we move the camera by 180 in the azimuth direction so it is facing the front of the cow. \n",
"R, T = look_at_view_transform(2.7, 0, 180) \n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
"\n",
"# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
"# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -444,7 +444,7 @@
"source": [
"# Rotate the object by increasing the elevation and azimuth angles\n",
"R, T = look_at_view_transform(dist=2.7, elev=10, azim=-150)\n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
"\n",
"# Move the light location so the light is shining on the cow's face. \n",
"lights.location = torch.tensor([[2.0, 2.0, -2.0]], device=device)\n",
@@ -519,7 +519,7 @@
"# view the camera from the same distance and specify dist=2.7 as a float,\n",
"# and then specify elevation and azimuth angles for each viewpoint as tensors. \n",
"R, T = look_at_view_transform(dist=2.7, elev=elev, azim=azim)\n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
"\n",
"# Move the light back in front of the cow which is facing the -z direction.\n",
"lights.location = torch.tensor([[0.0, 0.0, -3.0]], device=device)"