Summary: Collection of spelling fixes, mostly in docs / tutorials.

Reviewed By: gkioxari

Differential Revision: D26101323

fbshipit-source-id: 652f62bc9d71a4ff872efa21141225e43191353a
Author: Jeremy Reizenstein
Date: 2021-04-09 09:57:55 -07:00
Committed by: Facebook GitHub Bot
Parent: c2e62a5087
Commit: 124bb5e391
75 changed files with 220 additions and 217 deletions

View File

@@ -5,7 +5,7 @@ sidebar_label: Batching
# Batching
In deep learning, every optimization step operates on multiple input examples for robust training. Thus, efficient batching is crucial. For image inputs, batching is straighforward; N images are resized to the same height and width and stacked as a 4 dimensional tensor of shape `N x 3 x H x W`. For meshes, batching is less straighforward.
In deep learning, every optimization step operates on multiple input examples for robust training. Thus, efficient batching is crucial. For image inputs, batching is straightforward; N images are resized to the same height and width and stacked as a 4 dimensional tensor of shape `N x 3 x H x W`. For meshes, batching is less straightforward.
<img src="assets/batch_intro.png" alt="batch_intro" align="middle"/>
@@ -21,7 +21,7 @@ Assume you want to construct a batch containing two meshes, with `mesh1 = (v1: V
## Use cases for batch modes
The need for different mesh batch modes is inherent to the way pytorch operators are implemented. To fully utilize the optimized pytorch ops, the [Meshes][meshes] data structure allows for efficient conversion between the different batch modes. This is crucial when aiming for a fast and efficient training cycle. An example of this is [Mesh R-CNN][meshrcnn]. Here, in the same forward pass different parts of the network assume different inputs, which are computed by converting between the different batch modes. In particular, [vert_align][vert_align] assumes a *padded* input tensor while immediately after [graph_conv][graphconv] assumes a *packed* input tensor.
The need for different mesh batch modes is inherent to the way PyTorch operators are implemented. To fully utilize the optimized PyTorch ops, the [Meshes][meshes] data structure allows for efficient conversion between the different batch modes. This is crucial when aiming for a fast and efficient training cycle. An example of this is [Mesh R-CNN][meshrcnn]. Here, in the same forward pass different parts of the network assume different inputs, which are computed by converting between the different batch modes. In particular, [vert_align][vert_align] assumes a *padded* input tensor while immediately after [graph_conv][graphconv] assumes a *packed* input tensor.
<img src="assets/meshrcnn.png" alt="meshrcnn" width="700" align="middle" />

View File

@@ -13,7 +13,7 @@ This is the system the object/scene lives - the world.
* **Camera view coordinate system**
This is the system that has its origin on the image plane and the `Z`-axis perpendicular to the image plane. In PyTorch3D, we assume that `+X` points left, and `+Y` points up and `+Z` points out from the image plane. The transformation from world to view happens after applying a rotation (`R`) and translation (`T`).
* **NDC coordinate system**
This is the normalized coordinate system that confines in a volume the renderered part of the object/scene. Also known as view volume. Under the PyTorch3D convention, `(+1, +1, znear)` is the top left near corner, and `(-1, -1, zfar)` is the bottom right far corner of the volume. The transformation from view to NDC happens after applying the camera projection matrix (`P`).
This is the normalized coordinate system that confines in a volume the rendered part of the object/scene. Also known as view volume. Under the PyTorch3D convention, `(+1, +1, znear)` is the top left near corner, and `(-1, -1, zfar)` is the bottom right far corner of the volume. The transformation from view to NDC happens after applying the camera projection matrix (`P`).
* **Screen coordinate system**
This is another representation of the view volume with the `XY` coordinates defined in pixel space instead of a normalized space.
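As a small illustrative sketch of these transforms (assuming an FoV perspective camera; the distance and angles are arbitrary):

```
import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

# R, T map world coordinates into the camera view coordinate system.
R, T = look_at_view_transform(dist=2.7, elev=10.0, azim=30.0)
cameras = FoVPerspectiveCameras(R=R, T=T)

pts_world = torch.rand(1, 8, 3)  # a small batch of points in world coordinates

# World -> view: apply only the rotation R and translation T.
pts_view = cameras.get_world_to_view_transform().transform_points(pts_world)

# World -> NDC: R, T followed by the camera projection matrix P.
pts_ndc = cameras.transform_points(pts_world)
```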

View File

@@ -20,5 +20,5 @@ mesh = IO().load_mesh("mymesh.ply", device=device)
and to save a pointcloud you might do
```
pcl = Pointclouds(...)
IO().save_point_cloud(pcl, "output_poincloud.obj")
IO().save_point_cloud(pcl, "output_pointcloud.obj")
```
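For context, the `Pointclouds` object being saved here can be constructed from a points tensor and optional per-point features; a minimal sketch with illustrative shapes:

```
import torch
from pytorch3d.structures import Pointclouds

points = torch.rand(1, 1000, 3)    # one cloud of 1000 points
colors = torch.rand(1, 1000, 3)    # optional per-point RGB features
pcl = Pointclouds(points=points, features=colors)
```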

View File

@@ -6,13 +6,13 @@ hide_title: true
# Meshes and IO
The Meshes object represents a batch of triangulated meshes, and is central to
much of the functionality of pytorch3d. There is no insistence that each mesh in
much of the functionality of PyTorch3D. There is no insistence that each mesh in
the batch has the same number of vertices or faces. When available, it can store
other data which pertains to the mesh, for example face normals, face areas
and textures.
Two common file formats for storing single meshes are ".obj" and ".ply" files,
and pytorch3d has functions for reading these.
and PyTorch3D has functions for reading these.
## OBJ
@@ -60,7 +60,7 @@ The `load_objs_as_meshes` function provides this procedure.
## PLY
Ply files are flexible in the way they store additional information, pytorch3d
Ply files are flexible in the way they store additional information. PyTorch3D
provides a function just to read the vertices and faces from a ply file.
The call
```
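A brief sketch of the two loading paths discussed here, `load_objs_as_meshes` for OBJ files and `load_ply` for reading just vertices and faces from a PLY file (the file names and the `device` variable are placeholders):

```
from pytorch3d.io import load_objs_as_meshes, load_ply

# Load one or more .obj files (with any associated textures) into a single Meshes batch.
meshes = load_objs_as_meshes(["model1.obj", "model2.obj"], device=device)

# Read just the vertices and faces of a .ply file as plain tensors.
verts, faces = load_ply("mymesh.ply")
```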

View File

@@ -84,7 +84,7 @@ For mesh texturing we offer several options (in `pytorch3d/renderer/mesh/texturi
1. **Vertex Textures**: D dimensional textures for each vertex (for example an RGB color) which can be interpolated across the face. This can be represented as an `(N, V, D)` tensor. This is a fairly simple representation though and cannot model complex textures if the mesh faces are large.
2. **UV Textures**: vertex UV coordinates and **one** texture map for the whole mesh. For a point on a face with given barycentric coordinates, the face color can be computed by interpolating the vertex uv coordinates and then sampling from the texture map. This representation requires two tensors (UVs: `(N, V, 2), Texture map: `(N, H, W, 3)`), and is limited to only support one texture map per mesh.
3. **Face Textures**: In more complex cases such as ShapeNet meshes, there are multiple texture maps per mesh and some faces have texture while other do not. For these cases, a more flexible representation is a texture atlas, where each face is represented as an `(RxR)` texture map where R is the texture resolution. For a given point on the face, the texture value can be sampled from the per face texture map using the barycentric coordinates of the point. This representation requires one tensor of shape `(N, F, R, R, 3)`. This texturing method is inspired by the SoftRasterizer implementation. For more details refer to the [`make_material_atlas`](https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/io/mtl_io.py#L123) and [`sample_textures`](https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/renderer/mesh/textures.py#L452) functions. **NOTE:**: The `TextureAtlas` texture sampling is only differentiable with respect to the texture atlas but not differentiable with respect to the barycentric coordinates.
3. **Face Textures**: In more complex cases such as ShapeNet meshes, there are multiple texture maps per mesh and some faces have texture while other do not. For these cases, a more flexible representation is a texture atlas, where each face is represented as an `(RxR)` texture map where R is the texture resolution. For a given point on the face, the texture value can be sampled from the per face texture map using the barycentric coordinates of the point. This representation requires one tensor of shape `(N, F, R, R, 3)`. This texturing method is inspired by the SoftRasterizer implementation. For more details refer to the [`make_material_atlas`](https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/io/mtl_io.py#L123) and [`sample_textures`](https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/renderer/mesh/textures.py#L452) functions. **NOTE:**: The `TexturesAtlas` texture sampling is only differentiable with respect to the texture atlas but not differentiable with respect to the barycentric coordinates.
<img src="assets/texturing.jpg" width="1000">
@@ -116,7 +116,7 @@ raster_settings = RasterizationSettings(
faces_per_pixel=1,
)
# Create a phong renderer by composing a rasterizer and a shader. Here we can use a predefined
# Create a Phong renderer by composing a rasterizer and a shader. Here we can use a predefined
# PhongShader, passing in the device on which to initialize the default parameters
renderer = MeshRenderer(
rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
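For reference, a complete rasterizer-plus-shader composition typically looks like the following sketch. `HardPhongShader` is one concrete choice of Phong shader; `device` and a textured `meshes` batch are assumed to exist, and the settings are illustrative:

```
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRenderer,
    MeshRasterizer, HardPhongShader, PointLights, look_at_view_transform,
)

R, T = look_at_view_transform(dist=2.7, elev=10.0, azim=30.0)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])

raster_settings = RasterizationSettings(
    image_size=512,
    blur_radius=0.0,
    faces_per_pixel=1,
)

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=HardPhongShader(device=device, cameras=cameras, lights=lights),
)
images = renderer(meshes)   # (N, H, W, 4) RGBA images
```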

View File

@@ -387,7 +387,7 @@
" device = device,\n",
" )\n",
"\n",
" # compute the relative cameras as a compositon of the absolute cameras\n",
" # compute the relative cameras as a composition of the absolute cameras\n",
" cameras_relative_composed = \\\n",
" get_relative_camera(cameras_absolute, relative_edges)\n",
"\n",

View File

@@ -223,9 +223,9 @@
"source": [
"### Create a renderer\n",
"\n",
"A **renderer** in PyTorch3D is composed of a **rasterizer** and a **shader** which each have a number of subcomponents such as a **camera** (orthgraphic/perspective). Here we initialize some of these components and use default values for the rest. \n",
"A **renderer** in PyTorch3D is composed of a **rasterizer** and a **shader** which each have a number of subcomponents such as a **camera** (orthographic/perspective). Here we initialize some of these components and use default values for the rest. \n",
"\n",
"For optimizing the camera position we will use a renderer which produces a **silhouette** of the object only and does not apply any **lighting** or **shading**. We will also initialize another renderer which applies full **phong shading** and use this for visualizing the outputs. "
"For optimizing the camera position we will use a renderer which produces a **silhouette** of the object only and does not apply any **lighting** or **shading**. We will also initialize another renderer which applies full **Phong shading** and use this for visualizing the outputs. "
]
},
{
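A sketch of the silhouette half of this pair (the Phong renderer is composed analogously with a Phong shader). `device` is assumed, and the soft-rasterization settings follow the usual pattern of a blurred, many-faces-per-pixel rasterizer feeding a `SoftSilhouetteShader`:

```
import numpy as np
from pytorch3d.renderer import (
    BlendParams, FoVPerspectiveCameras, MeshRasterizer, MeshRenderer,
    RasterizationSettings, SoftSilhouetteShader,
)

cameras = FoVPerspectiveCameras(device=device)

# A non-zero blur radius and many faces per pixel make the rendered
# silhouette differentiable with respect to the camera position.
blend_params = BlendParams(sigma=1e-4, gamma=1e-4)
raster_settings_silhouette = RasterizationSettings(
    image_size=256,
    blur_radius=np.log(1.0 / 1e-4 - 1.0) * blend_params.sigma,
    faces_per_pixel=100,
)

silhouette_renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings_silhouette),
    shader=SoftSilhouetteShader(blend_params=blend_params),
)
```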
@@ -266,7 +266,7 @@
")\n",
"\n",
"\n",
"# We will also create a phong renderer. This is simpler and only needs to render one face per pixel.\n",
"# We will also create a Phong renderer. This is simpler and only needs to render one face per pixel.\n",
"raster_settings = RasterizationSettings(\n",
" image_size=256, \n",
" blur_radius=0.0, \n",
@@ -322,15 +322,15 @@
"R, T = look_at_view_transform(distance, elevation, azimuth, device=device)\n",
"\n",
"# Render the teapot providing the values of R and T. \n",
"silhouete = silhouette_renderer(meshes_world=teapot_mesh, R=R, T=T)\n",
"silhouette = silhouette_renderer(meshes_world=teapot_mesh, R=R, T=T)\n",
"image_ref = phong_renderer(meshes_world=teapot_mesh, R=R, T=T)\n",
"\n",
"silhouete = silhouete.cpu().numpy()\n",
"silhouette = silhouette.cpu().numpy()\n",
"image_ref = image_ref.cpu().numpy()\n",
"\n",
"plt.figure(figsize=(10, 10))\n",
"plt.subplot(1, 2, 1)\n",
"plt.imshow(silhouete.squeeze()[..., 3]) # only plot the alpha channel of the RGBA image\n",
"plt.imshow(silhouette.squeeze()[..., 3]) # only plot the alpha channel of the RGBA image\n",
"plt.grid(False)\n",
"plt.subplot(1, 2, 2)\n",
"plt.imshow(image_ref.squeeze())\n",
@@ -377,7 +377,7 @@
" def forward(self):\n",
" \n",
" # Render the image using the updated camera position. Based on the new position of the \n",
" # camer we calculate the rotation and translation matrices\n",
" # camera we calculate the rotation and translation matrices\n",
" R = look_at_rotation(self.camera_position[None, :], device=self.device) # (1, 3, 3)\n",
" T = -torch.bmm(R.transpose(1, 2), self.camera_position[None, :, None])[:, :, 0] # (1, 3)\n",
" \n",

View File

@@ -190,7 +190,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can retrieve a model by indexing into the loaded dataset. For both ShapeNetCore and R2N2, we can examine the category this model belongs to (in the form of a synset id, equivalend to wnid described in ImageNet's API: http://image-net.org/download-API), its model id, and its vertices and faces."
"We can retrieve a model by indexing into the loaded dataset. For both ShapeNetCore and R2N2, we can examine the category this model belongs to (in the form of a synset id, equivalent to wnid described in ImageNet's API: http://image-net.org/download-API), its model id, and its vertices and faces."
]
},
{
@@ -254,11 +254,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Training deep learning models, usually requires passing in batches of inputs. The `torch.utils.data.DataLoader` from Pytorch helps us do this. PyTorch3D provides a function `collate_batched_meshes` to group the input meshes into a single `Meshes` object which represents the batch. The `Meshes` datastructure can then be used directly by other PyTorch3D ops which might be part of the deep learning model (e.g. `graph_conv`).\n",
"Training deep learning models, usually requires passing in batches of inputs. The `torch.utils.data.DataLoader` from PyTorch helps us do this. PyTorch3D provides a function `collate_batched_meshes` to group the input meshes into a single `Meshes` object which represents the batch. The `Meshes` datastructure can then be used directly by other PyTorch3D ops which might be part of the deep learning model (e.g. `graph_conv`).\n",
"\n",
"For R2N2, if all the models in the batch have the same number of views, the views, rotation matrices, translation matrices, intrinsic matrices and voxels will also be stacked into batched tensors.\n",
"\n",
"**NOTE**: All models in the `val` split of R2N2 have 24 views, but there are 8 models that split their 24 views between `train` and `test` splits, in which case `collate_batched_meshes` will only be able to join the matrices, views and voxels as lists. However, this can be avoided by laoding only one view of each model by setting `return_all_views = False`."
"**NOTE**: All models in the `val` split of R2N2 have 24 views, but there are 8 models that split their 24 views between `train` and `test` splits, in which case `collate_batched_meshes` will only be able to join the matrices, views and voxels as lists. However, this can be avoided by loading only one view of each model by setting `return_all_views = False`."
]
},
{
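A sketch of the batched loading described above. `SHAPENET_PATH` is a placeholder for a local ShapeNetCore directory, and the assumption that the collated batch exposes the meshes under the `"mesh"` key follows the usage in this tutorial:

```
from torch.utils.data import DataLoader
from pytorch3d.datasets import ShapeNetCore, collate_batched_meshes

shapenet_dataset = ShapeNetCore(SHAPENET_PATH)  # SHAPENET_PATH: local dataset root (placeholder)
loader = DataLoader(shapenet_dataset, batch_size=12, collate_fn=collate_batched_meshes)

batch = next(iter(loader))
batch_meshes = batch["mesh"]   # a single Meshes object representing the whole batch
```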
@@ -295,7 +295,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Render ShapeNetCore models with PyTorch3D's differntiable renderer"
"## 3. Render ShapeNetCore models with PyTorch3D's differentiable renderer"
]
},
{
@@ -450,7 +450,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will visualize PyTorch3d's renderings:"
"Next, we will visualize PyTorch3D's renderings:"
]
},
{

View File

@@ -17,7 +17,7 @@
"\n",
"This tutorial shows how to fit Neural Radiance Field given a set of views of a scene using differentiable implicit function rendering.\n",
"\n",
"More specificially, this tutorial will explain how to:\n",
"More specifically, this tutorial will explain how to:\n",
"1. Create a differentiable implicit function renderer with either image-grid or Monte Carlo ray sampling.\n",
"2. Create an Implicit model of a scene.\n",
"3. Fit the implicit function (Neural Radiance Field) based on input images using the differentiable implicit renderer. \n",
@@ -158,9 +158,9 @@
"The following initializes an implicit renderer that emits a ray from each pixel of a target image and samples a set of uniformly-spaced points along the ray. At each ray-point, the corresponding density and color value is obtained by querying the corresponding location in the neural model of the scene (the model is described & instantiated in a later cell).\n",
"\n",
"The renderer is composed of a *raymarcher* and a *raysampler*.\n",
"- The *raysampler* is responsible for emiting rays from image pixels and sampling the points along them. Here, we use two different raysamplers:\n",
"- The *raysampler* is responsible for emitting rays from image pixels and sampling the points along them. Here, we use two different raysamplers:\n",
" - `MonteCarloRaysampler` is used to generate rays from a random subset of pixels of the image plane. The random subsampling of pixels is carried out during **training** to decrease the memory consumption of the implicit model.\n",
" - `NDCGridRaysampler` which follows the standard PyTorch3d coordinate grid convention (+X from right to left; +Y from bottom to top; +Z away from the user). In combination with the implicit model of the scene, `NDCGridRaysampler` consumes a large amount of memory and, hence, is only used for visualizing the results of the training at **test** time.\n",
" - `NDCGridRaysampler` which follows the standard PyTorch3D coordinate grid convention (+X from right to left; +Y from bottom to top; +Z away from the user). In combination with the implicit model of the scene, `NDCGridRaysampler` consumes a large amount of memory and, hence, is only used for visualizing the results of the training at **test** time.\n",
"- The *raymarcher* takes the densities and colors sampled along each ray and renders each ray into a color and an opacity value of the ray's source pixel. Here we use the `EmissionAbsorptionRaymarcher` which implements the standard Emission-Absorption raymarching algorithm."
]
},
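A sketch of how the two raysamplers and the raymarcher described above are typically composed into implicit renderers (image size, ray counts and depth range are illustrative):

```
from pytorch3d.renderer import (
    ImplicitRenderer, MonteCarloRaysampler, NDCGridRaysampler, EmissionAbsorptionRaymarcher,
)

render_size = 128  # illustrative

# Full image grid of rays, used for visualization at test time.
raysampler_grid = NDCGridRaysampler(
    image_height=render_size,
    image_width=render_size,
    n_pts_per_ray=128,
    min_depth=0.1,
    max_depth=3.0,
)

# Random subset of rays, used during training to save memory.
raysampler_mc = MonteCarloRaysampler(
    min_x=-1.0, max_x=1.0,
    min_y=-1.0, max_y=1.0,
    n_rays_per_image=750,
    n_pts_per_ray=128,
    min_depth=0.1,
    max_depth=3.0,
)

raymarcher = EmissionAbsorptionRaymarcher()

renderer_grid = ImplicitRenderer(raysampler=raysampler_grid, raymarcher=raymarcher)
renderer_mc = ImplicitRenderer(raysampler=raysampler_mc, raymarcher=raymarcher)
```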
@@ -186,7 +186,7 @@
"# 1) Instantiate the raysamplers.\n",
"\n",
"# Here, NDCGridRaysampler generates a rectangular image\n",
"# grid of rays whose coordinates follow the PyTorch3d\n",
"# grid of rays whose coordinates follow the PyTorch3D\n",
"# coordinate conventions.\n",
"raysampler_grid = NDCGridRaysampler(\n",
" image_height=render_size,\n",
@@ -236,7 +236,7 @@
"\n",
"The `forward` function of `NeuralRadianceField` (NeRF) receives as input a set of tensors that parametrize a bundle of rendering rays. The ray bundle is later converted to 3D ray points in the world coordinates of the scene. Each 3D point is then mapped to a harmonic representation using the `HarmonicEmbedding` layer (defined in the next cell). The harmonic embeddings then enter the _color_ and _opacity_ branches of the NeRF model in order to label each ray point with a 3D vector and a 1D scalar ranging in [0-1] which define the point's RGB color and opacity respectively.\n",
"\n",
"Since NeRF has a large memory footprint, we also implement the `NeuralRadianceField.forward_batched` method. The method splits the input rays into batches and executes the `forward` function for each batch separately in a for loop. This allows to render a large set of rays without running out of GPU memory. Standardly, `forward_batched` would be used to render rays emitted from all pixels of an image in order to produce a full-sized render of a scene.\n"
"Since NeRF has a large memory footprint, we also implement the `NeuralRadianceField.forward_batched` method. The method splits the input rays into batches and executes the `forward` function for each batch separately in a for loop. This lets us render a large set of rays without running out of GPU memory. Standardly, `forward_batched` would be used to render rays emitted from all pixels of an image in order to produce a full-sized render of a scene.\n"
]
},
{
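The harmonic embedding itself is simple; a standalone functional sketch of the idea (not the exact `HarmonicEmbedding` class used below, and the defaults are illustrative):

```
import torch

def harmonic_embedding(x, n_harmonic_functions=60, omega0=0.1):
    """Map x to [sin(f_i * x), ..., cos(f_i * x), ...] with frequencies f_i = omega0 * 2**i."""
    frequencies = omega0 * (2.0 ** torch.arange(n_harmonic_functions, dtype=x.dtype))
    # Multiply each input coordinate by every frequency, then flatten.
    embed = (x[..., None] * frequencies).reshape(*x.shape[:-1], -1)
    return torch.cat([embed.sin(), embed.cos()], dim=-1)
```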
@@ -266,7 +266,7 @@
" ]\n",
" \n",
" Note that `x` is also premultiplied by `omega0` before\n",
" evaluting the harmonic functions.\n",
" evaluating the harmonic functions.\n",
" \"\"\"\n",
" super().__init__()\n",
" self.register_buffer(\n",
@@ -417,7 +417,7 @@
"\n",
" Returns:\n",
" rays_densities: A tensor of shape `(minibatch, ..., num_points_per_ray, 1)`\n",
" denoting the opacitiy of each ray point.\n",
" denoting the opacity of each ray point.\n",
" rays_colors: A tensor of shape `(minibatch, ..., num_points_per_ray, 3)`\n",
" denoting the color of each ray point.\n",
" \"\"\"\n",
@@ -457,11 +457,11 @@
" This function is used to allow for memory efficient processing\n",
" of input rays. The input rays are first split to `n_batches`\n",
" chunks and passed through the `self.forward` function one at a time\n",
" in a for loop. Combined with disabling Pytorch gradient caching\n",
" in a for loop. Combined with disabling PyTorch gradient caching\n",
" (`torch.no_grad()`), this allows for rendering large batches\n",
" of rays that do not all fit into GPU memory in a single forward pass.\n",
" In our case, batched_forward is used to export a fully-sized render\n",
" of the radiance field for visualisation purposes.\n",
" of the radiance field for visualization purposes.\n",
" \n",
" Args:\n",
" ray_bundle: A RayBundle object containing the following variables:\n",
@@ -477,7 +477,7 @@
"\n",
" Returns:\n",
" rays_densities: A tensor of shape `(minibatch, ..., num_points_per_ray, 1)`\n",
" denoting the opacitiy of each ray point.\n",
" denoting the opacity of each ray point.\n",
" rays_colors: A tensor of shape `(minibatch, ..., num_points_per_ray, 3)`\n",
" denoting the color of each ray point.\n",
"\n",
@@ -576,12 +576,12 @@
" intermediate results of the learning. \n",
" \n",
" Since the `NeuralRadianceField` suffers from\n",
" a large memory footprint, which does not allow to\n",
" a large memory footprint, which does not let us\n",
" render the full image grid in a single forward pass,\n",
" we utilize the `NeuralRadianceField.batched_forward`\n",
" function in combination with disabling the gradient caching.\n",
" This chunks the set of emitted rays to batches and \n",
" evaluates the implicit function on one-batch at a time\n",
" evaluates the implicit function on one batch at a time\n",
" to prevent GPU memory overflow.\n",
" \"\"\"\n",
" \n",
@@ -720,7 +720,7 @@
" rendered_images_silhouettes.split([3, 1], dim=-1)\n",
" )\n",
" \n",
" # Compute the silhoutte error as the mean huber\n",
" # Compute the silhouette error as the mean huber\n",
" # loss between the predicted masks and the\n",
" # sampled target silhouettes.\n",
" silhouettes_at_rays = sample_images_at_mc_locs(\n",
@@ -818,7 +818,7 @@
" fov=target_cameras.fov[0],\n",
" device=device,\n",
" )\n",
" # Note that we again render with `NDCGridSampler`\n",
" # Note that we again render with `NDCGridRaySampler`\n",
" # and the batched_forward function of neural_radiance_field.\n",
" frames.append(\n",
" renderer_grid(\n",
@@ -841,7 +841,7 @@
"source": [
"## 6. Conclusion\n",
"\n",
"In this tutorial, we have shown how to optimize an implicit representation of a scene such that the renders of the scene from known viewpoints match the observed images for each viewpoint. The rendering was carried out using the Pytorch3D's implicit function renderer composed of either a `MonteCarloRaysampler` or `NDCGridRaysampler`, and an `EmissionAbsorptionRaymarcher`."
"In this tutorial, we have shown how to optimize an implicit representation of a scene such that the renders of the scene from known viewpoints match the observed images for each viewpoint. The rendering was carried out using the PyTorch3D's implicit function renderer composed of either a `MonteCarloRaysampler` or `NDCGridRaysampler`, and an `EmissionAbsorptionRaymarcher`."
]
}
],

View File

@@ -193,11 +193,11 @@
"source": [
"### 1. Load a mesh and texture file\n",
"\n",
"Load an `.obj` file and it's associated `.mtl` file and create a **Textures** and **Meshes** object. \n",
"Load an `.obj` file and its associated `.mtl` file and create a **Textures** and **Meshes** object. \n",
"\n",
"**Meshes** is a unique datastructure provided in PyTorch3D for working with batches of meshes of different sizes. \n",
"\n",
"**TexturesVertex** is an auxillary datastructure for storing vertex rgb texture information about meshes. \n",
"**TexturesVertex** is an auxiliary datastructure for storing vertex rgb texture information about meshes. \n",
"\n",
"**Meshes** has several class methods which are used throughout the rendering pipeline."
]
@@ -315,7 +315,7 @@
"# purposes only we will set faces_per_pixel=1 and blur_radius=0.0. Refer to \n",
"# rasterize_meshes.py for explanations of these parameters. We also leave \n",
"# bin_size and max_faces_per_bin to their default values of None, which sets \n",
"# their values using huristics and ensures that the faster coarse-to-fine \n",
"# their values using heuristics and ensures that the faster coarse-to-fine \n",
"# rasterization method is used. Refer to docs/notes/renderer.md for an \n",
"# explanation of the difference between naive and coarse-to-fine rasterization. \n",
"raster_settings = RasterizationSettings(\n",
@@ -324,8 +324,8 @@
" faces_per_pixel=1, \n",
")\n",
"\n",
"# Create a phong renderer by composing a rasterizer and a shader. The textured \n",
"# phong shader will interpolate the texture uv coordinates for each vertex, \n",
"# Create a Phong renderer by composing a rasterizer and a shader. The textured \n",
"# Phong shader will interpolate the texture uv coordinates for each vertex, \n",
"# sample from a texture image and apply the Phong lighting model\n",
"renderer = MeshRenderer(\n",
" rasterizer=MeshRasterizer(\n",
@@ -386,7 +386,7 @@
"id": "gOb4rYx65E8z"
},
"source": [
"Later in this tutorial, we will fit a mesh to the rendered RGB images, as well as to just images of just the cow silhouette. For the latter case, we will render a dataset of silhouette images. Most shaders in PyTorch3D will output an alpha channel along with the RGB image as a 4th channel in an RGBA image. The alpha channel encodes the probability that each pixel belongs to the foreground of the object. We contruct a soft silhouette shader to render this alpha channel."
"Later in this tutorial, we will fit a mesh to the rendered RGB images, as well as to just images of just the cow silhouette. For the latter case, we will render a dataset of silhouette images. Most shaders in PyTorch3D will output an alpha channel along with the RGB image as a 4th channel in an RGBA image. The alpha channel encodes the probability that each pixel belongs to the foreground of the object. We construct a soft silhouette shader to render this alpha channel."
]
},
{
@@ -607,7 +607,7 @@
"id": "QLc9zK8lEqFS"
},
"source": [
"We write an optimization loop to iteratively refine our predicted mesh from the sphere mesh into a mesh that matches the sillhouettes of the target images:"
"We write an optimization loop to iteratively refine our predicted mesh from the sphere mesh into a mesh that matches the silhouettes of the target images:"
]
},
{
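A sketch of the kind of objective such a loop typically minimizes: a silhouette term from the soft silhouette renderer plus PyTorch3D's mesh regularizers. The names `renderer_silhouette`, `camera`, `lights`, `target_silhouette`, `src_mesh` and `deform_verts`, as well as the loss weights, are assumptions of this sketch:

```
from pytorch3d.loss import (
    mesh_edge_loss, mesh_laplacian_smoothing, mesh_normal_consistency,
)

# deform_verts is the per-vertex offset tensor being optimized.
new_src_mesh = src_mesh.offset_verts(deform_verts)

# Silhouette term: compare the rendered alpha channel with the target silhouette.
images_predicted = renderer_silhouette(new_src_mesh, cameras=camera, lights=lights)
loss_silhouette = ((images_predicted[..., 3] - target_silhouette) ** 2).mean()

# Regularizers keep the deformed mesh smooth and well behaved.
loss = (
    loss_silhouette
    + 1.0 * mesh_edge_loss(new_src_mesh)
    + 1.0 * mesh_laplacian_smoothing(new_src_mesh)
    + 0.01 * mesh_normal_consistency(new_src_mesh)
)
loss.backward()
```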

View File

@@ -17,7 +17,7 @@
"\n",
"This tutorial shows how to fit a volume given a set of views of a scene using differentiable volumetric rendering.\n",
"\n",
"More specificially, this tutorial will explain how to:\n",
"More specifically, this tutorial will explain how to:\n",
"1. Create a differentiable volumetric renderer.\n",
"2. Create a Volumetric model (including how to use the `Volumes` class).\n",
"3. Fit the volume based on the images using the differentiable volumetric renderer. \n",
@@ -138,7 +138,7 @@
"The following initializes a volumetric renderer that emits a ray from each pixel of a target image and samples a set of uniformly-spaced points along the ray. At each ray-point, the corresponding density and color value is obtained by querying the corresponding location in the volumetric model of the scene (the model is described & instantiated in a later cell).\n",
"\n",
"The renderer is composed of a *raymarcher* and a *raysampler*.\n",
"- The *raysampler* is responsible for emiting rays from image pixels and sampling the points along them. Here, we use the `NDCGridRaysampler` which follows the standard PyTorch3D coordinate grid convention (+X from right to left; +Y from bottom to top; +Z away from the user).\n",
"- The *raysampler* is responsible for emitting rays from image pixels and sampling the points along them. Here, we use the `NDCGridRaysampler` which follows the standard PyTorch3D coordinate grid convention (+X from right to left; +Y from bottom to top; +Z away from the user).\n",
"- The *raymarcher* takes the densities and colors sampled along each ray and renders each ray into a color and an opacity value of the ray's source pixel. Here we use the `EmissionAbsorptionRaymarcher` which implements the standard Emission-Absorption raymarching algorithm."
]
},
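Putting the pieces together, a sketch of a volumetric renderer plus a `Volumes` object (sizes, depth range and the `cameras` object are illustrative or assumed):

```
import torch
from pytorch3d.renderer import VolumeRenderer, NDCGridRaysampler, EmissionAbsorptionRaymarcher
from pytorch3d.structures import Volumes

render_size = 128
raysampler = NDCGridRaysampler(
    image_width=render_size, image_height=render_size,
    n_pts_per_ray=150, min_depth=0.1, max_depth=3.0,
)
renderer = VolumeRenderer(raysampler=raysampler, raymarcher=EmissionAbsorptionRaymarcher())

# A 128^3 volume with per-voxel densities and RGB colors.
densities = torch.zeros(1, 1, 128, 128, 128)
colors = torch.zeros(1, 3, 128, 128, 128)
volumes = Volumes(densities=densities, features=colors, voxel_size=3.0 / 128)

# Rendering returns per-pixel RGB + opacity for the given cameras.
rendered, _ = renderer(cameras=cameras, volumes=volumes)
```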
@@ -161,11 +161,11 @@
"\n",
"# 1) Instantiate the raysampler.\n",
"# Here, NDCGridRaysampler generates a rectangular image\n",
"# grid of rays whose coordinates follow the pytorch3d\n",
"# grid of rays whose coordinates follow the PyTorch3D\n",
"# coordinate conventions.\n",
"# Since we use a volume of size 128^3, we sample n_pts_per_ray=150,\n",
"# which roughly corresponds to a one ray-point per voxel.\n",
"# We futher set the min_depth=0.1 since there is no surface within\n",
"# We further set the min_depth=0.1 since there is no surface within\n",
"# 0.1 units of any camera plane.\n",
"raysampler = NDCGridRaysampler(\n",
" image_width=render_size,\n",
@@ -334,7 +334,7 @@
" batch_cameras\n",
" ).split([3, 1], dim=-1)\n",
" \n",
" # Compute the silhoutte error as the mean huber\n",
" # Compute the silhouette error as the mean huber\n",
" # loss between the predicted masks and the\n",
" # target silhouettes.\n",
" sil_err = huber(\n",

View File

@@ -157,7 +157,7 @@
"source": [
"## Create a renderer\n",
"\n",
"A renderer in PyTorch3D is composed of a **rasterizer** and a **shader** which each have a number of subcomponents such as a **camera** (orthgraphic/perspective). Here we initialize some of these components and use default values for the rest.\n",
"A renderer in PyTorch3D is composed of a **rasterizer** and a **shader** which each have a number of subcomponents such as a **camera** (orthographic/perspective). Here we initialize some of these components and use default values for the rest.\n",
"\n",
"In this example we will first create a **renderer** which uses an **orthographic camera**, and applies **alpha compositing**. Then we learn how to vary different components using the modular API. \n",
"\n",

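A sketch of that composition (an orthographic camera, a points rasterizer and an alpha compositor); `device` and a `Pointclouds` object `point_cloud` with RGB features are assumed, and the settings are illustrative:

```
from pytorch3d.renderer import (
    FoVOrthographicCameras, PointsRasterizationSettings, PointsRasterizer,
    PointsRenderer, AlphaCompositor, look_at_view_transform,
)

R, T = look_at_view_transform(dist=20, elev=10, azim=0)
cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)

raster_settings = PointsRasterizationSettings(
    image_size=512,
    radius=0.003,          # point radius in NDC units
    points_per_pixel=10,
)

renderer = PointsRenderer(
    rasterizer=PointsRasterizer(cameras=cameras, raster_settings=raster_settings),
    compositor=AlphaCompositor(),
)
images = renderer(point_cloud)   # (N, H, W, 3)
```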
View File

@@ -277,7 +277,7 @@
"\n",
"**Meshes** is a unique datastructure provided in PyTorch3D for working with batches of meshes of different sizes.\n",
"\n",
"**TexturesUV** is an auxillary datastructure for storing vertex uv and texture maps for meshes."
"**TexturesUV** is an auxiliary datastructure for storing vertex uv and texture maps for meshes."
]
},
{
@@ -320,7 +320,7 @@
"# Place a point light in front of the person. \n",
"lights = PointLights(device=device, location=[[0.0, 0.0, 2.0]])\n",
"\n",
"# Create a phong renderer by composing a rasterizer and a shader. The textured phong shader will \n",
"# Create a Phong renderer by composing a rasterizer and a shader. The textured Phong shader will \n",
"# interpolate the texture uv coordinates for each vertex, sample from a texture image and \n",
"# apply the Phong lighting model\n",
"renderer = MeshRenderer(\n",

View File

@@ -191,11 +191,11 @@
"source": [
"### 1. Load a mesh and texture file\n",
"\n",
"Load an `.obj` file and it's associated `.mtl` file and create a **Textures** and **Meshes** object. \n",
"Load an `.obj` file and its associated `.mtl` file and create a **Textures** and **Meshes** object. \n",
"\n",
"**Meshes** is a unique datastructure provided in PyTorch3D for working with batches of meshes of different sizes. \n",
"\n",
"**TexturesUV** is an auxillary datastructure for storing vertex uv and texture maps for meshes. \n",
"**TexturesUV** is an auxiliary datastructure for storing vertex uv and texture maps for meshes. \n",
"\n",
"**Meshes** has several class methods which are used throughout the rendering pipeline."
]
@@ -317,7 +317,7 @@
"\n",
"A renderer in PyTorch3D is composed of a **rasterizer** and a **shader** which each have a number of subcomponents such as a **camera** (orthographic/perspective). Here we initialize some of these components and use default values for the rest.\n",
"\n",
"In this example we will first create a **renderer** which uses a **perspective camera**, a **point light** and applies **phong shading**. Then we learn how to vary different components using the modular API. "
"In this example we will first create a **renderer** which uses a **perspective camera**, a **point light** and applies **Phong shading**. Then we learn how to vary different components using the modular API. "
]
},
{
@@ -352,7 +352,7 @@
"# -z direction. \n",
"lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])\n",
"\n",
"# Create a phong renderer by composing a rasterizer and a shader. The textured phong shader will \n",
"# Create a Phong renderer by composing a rasterizer and a shader. The textured Phong shader will \n",
"# interpolate the texture uv coordinates for each vertex, sample from a texture image and \n",
"# apply the Phong lighting model\n",
"renderer = MeshRenderer(\n",
@@ -418,7 +418,7 @@
"source": [
"## 4. Move the light behind the object and re-render\n",
"\n",
"We can pass arbirary keyword arguments to the `rasterizer`/`shader` via the call to the `renderer` so the renderer does not need to be reinitialized if any of the settings change/\n",
"We can pass arbitrary keyword arguments to the `rasterizer`/`shader` via the call to the `renderer` so the renderer does not need to be reinitialized if any of the settings change/\n",
"\n",
"In this case, we can simply update the location of the lights and pass them into the call to the renderer. \n",
"\n",
@@ -579,7 +579,7 @@
},
"outputs": [],
"source": [
"# We can pass arbirary keyword arguments to the rasterizer/shader via the renderer\n",
"# We can pass arbitrary keyword arguments to the rasterizer/shader via the renderer\n",
"# so the renderer does not need to be reinitialized if any of the settings change.\n",
"images = renderer(meshes, cameras=cameras, lights=lights)"
]

View File

@@ -113,15 +113,15 @@ def generate_cow_renders(
# purposes only we will set faces_per_pixel=1 and blur_radius=0.0. Refer to
# rasterize_meshes.py for explanations of these parameters. We also leave
# bin_size and max_faces_per_bin to their default values of None, which sets
# their values using huristics and ensures that the faster coarse-to-fine
# their values using heuristics and ensures that the faster coarse-to-fine
# rasterization method is used. Refer to docs/notes/renderer.md for an
# explanation of the difference between naive and coarse-to-fine rasterization.
raster_settings = RasterizationSettings(
image_size=128, blur_radius=0.0, faces_per_pixel=1
)
# Create a phong renderer by composing a rasterizer and a shader. The textured
# phong shader will interpolate the texture uv coordinates for each vertex,
# Create a Phong renderer by composing a rasterizer and a shader. The textured
# Phong shader will interpolate the texture uv coordinates for each vertex,
# sample from a texture image and apply the Phong lighting model
blend_params = BlendParams(sigma=1e-4, gamma=1e-4, background_color=(0.0, 0.0, 0.0))
renderer = MeshRenderer(