Summary:
https://github.com/facebookresearch/pytorch3d/issues/561
https://github.com/facebookresearch/pytorch3d/issues/790
Divide by zero fix (NaN fix). When perspective_correct=True, BarycentricPerspectiveCorrectionForward and BarycentricPerspectiveCorrectionBackward in ../csrc/utils/geometry_utils.cuh are called. The denominator (denom) values should not be allowed to go to zero. This PR resolves the issue for me locally, and I am submitting it for the team's review.
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/891
Reviewed By: patricklabatut
Differential Revision: D31829695
Pulled By: bottler
fbshipit-source-id: a3517b8362f6e60d48c35731258d8ce261b1d912
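A minimal Python sketch of the idea behind the divide by zero fix above, assuming the usual perspective-correction formula in which each barycentric coordinate is weighted by the depths of the other two vertices. The function name, the `eps` value, and the use of a simple `max` clamp are illustrative assumptions, not the code in geometry_utils.cuh.

```python
K_EPSILON = 1e-8  # assumed guard value, not taken from the source


def perspective_correct_barycentrics(bary, z, eps=K_EPSILON):
    """Perspective-correct the barycentric coordinates of one pixel.

    bary: (b0, b1, b2) screen-space barycentric coordinates.
    z:    (z0, z1, z2) camera-space depths of the triangle vertices.
    """
    b0, b1, b2 = bary
    z0, z1, z2 = z
    w0_top = b0 * z1 * z2
    w1_top = z0 * b1 * z2
    w2_top = z0 * z1 * b2
    # Clamp the denominator so it can never reach zero.
    denom = max(w0_top + w1_top + w2_top, eps)
    return (w0_top / denom, w1_top / denom, w2_top / denom)
```

With all depths equal the function returns the input coordinates unchanged, and with degenerate depths it returns finite values instead of dividing by zero.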
Summary: Keep using at:: instead of torch:: so we don't need torch/extension.h and can keep other compilers happy.
Reviewed By: patricklabatut
Differential Revision: D31688436
fbshipit-source-id: 1825503da0104acaf1558d17300c02ef663bf538
Summary: A few tweaks to make the CUDA build on Windows happier, as remarked in #876.
Reviewed By: patricklabatut
Differential Revision: D31688188
fbshipit-source-id: 20816d6215f2e3ec898f81ae4221b1c2ff24b64f
Summary: The epsilon value is important for determining whether vertices are inside/outside a plane.
Reviewed By: gkioxari
Differential Revision: D31485247
fbshipit-source-id: 5517575de7c02f1afa277d00e0190a81f44f5761
Summary: Added CUDA implementation to match the new, still unused, C++ function for the core of points2vols.
Reviewed By: nikhilaravi
Differential Revision: D29548608
fbshipit-source-id: 16ebb61787fcb4c70461f9215a86ad5f97aecb4e
Summary: Single C++ function for the core of points2vols, not used anywhere yet. Added ability to control align_corners and the weight of each point, which may be useful later.
Reviewed By: nikhilaravi
Differential Revision: D29548607
fbshipit-source-id: a5cda7ec2c14836624e7dfe744c4bbb3f3d3dfe2
Summary: C++ Implementation of algorithm to compute 3D bounding boxes for batches of bboxes of shape (N, 8, 3) and (M, 8, 3).
Reviewed By: gkioxari
Differential Revision: D30905190
fbshipit-source-id: 02e2cf025cd4fa3ff706ce5cf9b82c0fb5443f96
Summary: Attempt to fix #659, an observation that the rasterizer is nondeterministic, by resolving tied faces by picking those with lower index.
Reviewed By: nikhilaravi, patricklabatut
Differential Revision: D30699039
fbshipit-source-id: 39ed797eb7e9ce7370ae71259ad6b757f9449923
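A small sketch of the tie-breaking rule described above, assuming each pixel collects `(depth, face_idx)` candidates and keeps the nearest `k`. The helper name `top_k_faces` is hypothetical; the real kernels keep a fixed-size sorted buffer per pixel rather than calling a sort.

```python
def top_k_faces(candidates, k):
    """Return the k nearest (depth, face_idx) hits for one pixel.

    Sorting on the (depth, face_idx) pair means that faces at exactly
    the same depth are ordered by index, so the result does not depend
    on the order in which candidates were discovered.
    """
    return sorted(candidates, key=lambda c: (c[0], c[1]))[:k]
```

Feeding the same candidates in a different order gives the same output, which is the property the nondeterminism report was missing.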
Summary: Unlike other cu files, sigmoid_alpha_blend uses torch/extension.h. Avoid it, for a possible build speed win and because of a reported problem (#843) on Windows with CUDA 11.4.
Reviewed By: nikhilaravi
Differential Revision: D31054121
fbshipit-source-id: 53a1f985a1695a044dfd2ee1a5b0adabdf280595
Summary: Rename sample_farthest_point.cpp to not match its CUDA equivalent.
Reviewed By: nikhilaravi
Differential Revision: D31006645
fbshipit-source-id: 135b511cbde320d2b3e07fc5b027971ef9210aa9
Summary: Remove use of nonstandard C++. Noticed on windows in issue https://github.com/facebookresearch/pytorch3d/issues/843. (We use `__restrict__` in CUDA, where it is fine, even on windows)
Reviewed By: nikhilaravi
Differential Revision: D31006516
fbshipit-source-id: 929ba9b3216cb70fad3ffa3274c910618d83973f
Summary:
CUDA implementation of farthest point sampling algorithm.
## Visual comparison
Compared to random sampling, farthest point sampling gives better coverage of the shape.
{F658631262}
## Reduction
Parallelized block reduction to find the max value at each iteration happens as follows:
1. First split the points into two equal sized parts (e.g. for a list with 8 values):
`[20, 27, 6, 8 | 11, 10, 2, 33]`
2. Use half of the threads (4 threads) to compare pairs of elements from each half (e.g. elements [0, 4], [1, 5] etc.) and store the result in the first half of the list:
`[20, 27, 6, 33 | 11, 10, 2, 33]`
3. Now we no longer care about the second part but again divide the first part into two:
`[20, 27 | 6, 33 | -, -, -, -]`
Now we can use 2 threads to compare the 4 elements
4. Finally we have gotten down to a single pair
`[20 | 33 | -, - | -, -, -, -]`
Use 1 thread to compare the remaining two elements
5. The max will now be at thread id = 0
`[33 | - | -, - | -, -, -, -]`
The reduction will give the farthest point for the selected batch index at this iteration.
Reviewed By: bottler, jcjohnson
Differential Revision: D30401803
fbshipit-source-id: 525bd5ae27c4b13b501812cfe62306bb003827d2
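The reduction steps in the farthest point sampling summary above can be simulated serially in Python. This is only a sketch: the inner loop stands in for the threads of one block, the function name is made up, and the input length is assumed to be a power of two. Tracking the index alongside the value is an assumption about what the kernel needs in order to report which point is farthest.

```python
def block_max_reduce(values):
    """Simulate a tree-style max reduction over a power-of-two list.

    Returns (max_value, index_of_max).
    """
    buf = list(values)
    idx = list(range(len(buf)))  # original position of each value
    stride = len(buf) // 2
    while stride > 0:
        # Each of the `stride` "threads" compares element t with
        # element t + stride and keeps the larger one in slot t.
        for t in range(stride):
            if buf[t + stride] > buf[t]:
                buf[t] = buf[t + stride]
                idx[t] = idx[t + stride]
        stride //= 2  # halve the active range, as in steps 2 to 4
    return buf[0], idx[0]


print(block_max_reduce([20, 27, 6, 8, 11, 10, 2, 33]))  # (33, 7)
```

On a GPU the iterations of the inner loop run concurrently and a block-level synchronization is needed between strides; the serial version needs none because slot `t` only reads from slot `t + stride`.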
Summary:
There has historically been a lot of duplication between the coarse rasterization logic for point clouds and meshes. This diff factors out the shared logic, so coarse rasterization of point clouds and meshes share the same core logic.
Previously the only difference between the coarse rasterization kernels for points and meshes was the logic for checking whether a {point / triangle} intersects a tile in the image. We implement a generic coarse rasterization kernel that takes a set of 2D bounding boxes rather than geometric primitives; we then implement separate kernels that compute 2D bounding boxes for points and triangles.
This change does not affect the Python API at all. It also should not change any rasterization behavior, since this diff is just a refactoring of the existing logic.
I see this diff as the first in a few pieces of rasterizer refactoring. Followup diffs should do the following:
- Add a check for bin overflow in the generic coarse rasterizer kernel: allocate a global scalar to flag bin overflow which kernel worker threads can write to in case they detect bin overflow. The C++ launcher function can then check this flag after the kernel returns and issue a warning to the user in case of overflow.
- As a slightly more involved mechanism, if bin overflow is detected then the coarse kernel can continue running in order to count how many elements fall into each bin, without actually writing out their indices to the coarse output tensor. Then the actual number of entries per bin can be used to re-allocate the output tensor and re-run the coarse rasterization kernel so that bin overflow can be automatically avoided.
- The unification of the coarse and fine rasterization kernels also allows us to insert an extra CUDA kernel prior to coarse rasterization that filters out primitives outside the view frustum. This would be helpful for rendering full scenes (e.g. Matterport data) where only a small piece of the mesh is actually visible at any one time.
Reviewed By: bottler
Differential Revision: D25710361
fbshipit-source-id: 9c9dea512cb339c42adb3c92e7733fedd586ce1b
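As a rough illustration of the refactoring described above, the following Python sketch bins primitives purely from their 2D bounding boxes and raises an overflow flag when a bin is full, as proposed in the first follow-up item. All names, the square image, and pixel-space coordinates are assumptions made for the example; the actual kernels work in NDC and run in parallel.

```python
def point_bbox(center, radius):
    """Bounding box (xmin, ymin, xmax, ymax) of a point splat."""
    x, y = center
    return (x - radius, y - radius, x + radius, y + radius)


def triangle_bbox(verts):
    """Bounding box of a triangle given three (x, y) vertices."""
    xs = [v[0] for v in verts]
    ys = [v[1] for v in verts]
    return (min(xs), min(ys), max(xs), max(ys))


def coarse_rasterize(bboxes, image_size, bin_size, max_per_bin):
    """Assign each bounding box to every bin (tile) it overlaps.

    Returns (bins, overflow) where bins[by][bx] lists primitive indices
    and overflow is True if any bin ran out of space.
    """
    num_bins = (image_size + bin_size - 1) // bin_size
    bins = [[[] for _ in range(num_bins)] for _ in range(num_bins)]
    overflow = False
    for prim_idx, (xmin, ymin, xmax, ymax) in enumerate(bboxes):
        # Range of bins touched by the box, clamped to the image.
        bx0 = max(int(xmin // bin_size), 0)
        bx1 = min(int(xmax // bin_size), num_bins - 1)
        by0 = max(int(ymin // bin_size), 0)
        by1 = min(int(ymax // bin_size), num_bins - 1)
        for by in range(by0, by1 + 1):
            for bx in range(bx0, bx1 + 1):
                if len(bins[by][bx]) < max_per_bin:
                    bins[by][bx].append(prim_idx)
                else:
                    overflow = True  # flag for the caller to check
    return bins, overflow
```

Because the binning function only sees boxes, points and triangles differ only in which `*_bbox` helper produces its input.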
Summary: Renaming parts of the mesh coarse rasterization and separating the bounding box calculation. All in preparation for sharing code with point rasterization.
Reviewed By: bottler
Differential Revision: D30369112
fbshipit-source-id: 3508c0b1239b355030cfa4038d5f3d6a945ebbf4
Summary: In preparation for sharing coarse rasterization between point clouds and meshes, move the functions to a new file. No code changes.
Reviewed By: bottler
Differential Revision: D30367812
fbshipit-source-id: 9e73835a26c4ac91f5c9f61ff682bc8218e36c6a
Summary: Implement the sample_pdf function from the NeRF project as compiled operators. The binary search (in searchsorted) is replaced with a low tech linear search, but this is not a problem for the envisaged numbers of bins.
Reviewed By: gkioxari
Differential Revision: D26312535
fbshipit-source-id: df1c3119cd63d944380ed1b2657b6ad81d743e49
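To illustrate why a linear search can stand in for the binary search mentioned above, here is a sketch that scans a sorted CDF and returns the same index as a right-sided binary search. The function name and the right-sided convention are assumptions for the example; the compiled operators may differ in detail.

```python
import bisect


def linear_searchsorted_right(cdf, u):
    """Index of the first entry in the sorted list `cdf` that is > u."""
    i = 0
    while i < len(cdf) and cdf[i] <= u:
        i += 1
    return i


cdf = [0.0, 0.1, 0.4, 0.9, 1.0]
for u in (0.0, 0.05, 0.4, 0.95, 1.0):
    # Both searches agree; only the cost per query differs.
    assert linear_searchsorted_right(cdf, u) == bisect.bisect_right(cdf, u)
```

The scan is O(number of bins) per sample instead of O(log bins), which matters little when the number of bins is small.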
Summary:
Implementation of ball query from PointNet++. This function is similar to KNN (find the neighbors in p2 for all points in p1). These are the key differences:
- It will return the **first** K neighbors within a specified radius as opposed to the **closest** K neighbors.
- As all the points in p2 do not need to be considered to find the closest K, the algorithm is much faster than KNN when p2 has a large number of points.
- The neighbors are not sorted
- Due to the radius threshold it is not guaranteed that there will be K neighbors even if there are more than K points in p2.
- The padding value for `idx` is -1 instead of 0.
# Note:
- Some of the code is very similar to KNN so it could be possible to modify the KNN forward kernels to support ball query.
- Some users might want to use kNN with ball query - for this we could provide a wrapper function around the current `knn_points` which enables applying the radius threshold afterwards as an alternative. This could be called `ball_query_knn`.
Reviewed By: jcjohnson
Differential Revision: D30261362
fbshipit-source-id: 66b6a7e0114beff7164daf7eba21546ff41ec450
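The differences from KNN listed above can be seen in a brute-force Python sketch for a single, unbatched pair of point lists. The function below is illustrative only: its name and signature are not the library's API, and the strict `<` comparison against the squared radius is an assumption.

```python
def ball_query_sketch(p1, p2, K, radius):
    """For each point in p1, return the indices of the first K points
    of p2 that lie within `radius`, padded with -1."""
    r2 = radius * radius
    out = []
    for q in p1:
        idx = []
        for j, p in enumerate(p2):
            d2 = sum((a - b) ** 2 for a, b in zip(q, p))
            if d2 < r2:
                idx.append(j)
                if len(idx) == K:
                    break  # stop early: the rest of p2 is never visited
        out.append(idx + [-1] * (K - len(idx)))  # pad with -1, not 0
    return out
```

For a query at the origin and `p2 = [(0.5, 0, 0), (5, 0, 0), (0.1, 0, 0), (0.2, 0, 0)]` with `K=2` and `radius=1`, this returns indices `[0, 2]` (the first two hits in storage order), whereas the two closest points are at indices 2 and 3.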
Summary: An early-return test for gradient calculation did not include the opacity gradient calculation - hence would also return early without calculating gradients even if opacity gradients are required.
Reviewed By: bottler
Differential Revision: D29505684
fbshipit-source-id: 575e820b8f58b19476b2fe3288702806733e840b
Summary:
Fix small face issue for point_mesh distance computation.
The issue lies in the computation of `IsInsideTriangle` which is unstable and non-symmetrical when faces with small areas are given as input. This diff fixes the issue by returning `False` for `IsInsideTriangle` when small faces are given as input.
Reviewed By: bottler
Differential Revision: D29163052
fbshipit-source-id: be297002f26b5e6eded9394fde00553a37406bee
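A sketch of the guard described above, assuming the point already lies in the plane of the triangle and that "small" is decided by comparing the triangle area against a threshold. The threshold value `min_area` and the helper names are hypothetical; only the early `False` return for small faces comes from the summary.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def is_inside_triangle(p, v0, v1, v2, min_area=5e-3):
    """Barycentric inside test that rejects near-degenerate triangles."""
    e0, e1 = _sub(v1, v0), _sub(v2, v0)
    n = _cross(e0, e1)
    area = 0.5 * _dot(n, n) ** 0.5
    if area < min_area:
        # The barycentric solve below is unstable for tiny faces,
        # so report "not inside" and let the caller fall back to
        # edge distances.
        return False
    ep = _sub(p, v0)
    d00, d01, d11 = _dot(e0, e0), _dot(e0, e1), _dot(e1, e1)
    d20, d21 = _dot(ep, e0), _dot(ep, e1)
    denom = d00 * d11 - d01 * d01  # equals 4 * area**2, so nonzero here
    b1 = (d11 * d20 - d01 * d21) / denom
    b2 = (d00 * d21 - d01 * d20) / denom
    b0 = 1.0 - b1 - b2
    return all(0.0 <= b <= 1.0 for b in (b0, b1, b2))
```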
Summary:
- Fix the calculation of the non square NDC range when H and W are not integer multiples of each other.
- Add test for this case
Reviewed By: gkioxari
Differential Revision: D26613213
fbshipit-source-id: df6763cac602e9f1d516b41b432c4d2cfbaa356d
Summary: One step in finding all the pairs of vertices which share faces is a simple calculation but annoying to parallelize. It was implemented in pure Python. We move it to C++. We still pull the data to the CPU and put the answer back on the device.
Reviewed By: nikhilaravi, gkioxari
Differential Revision: D26073475
fbshipit-source-id: ffbf4e2c347a511ab5084bceff600465812b6a52
Summary:
Fixes mostly related to the "main" build on circleci.
- Avoid error to do with tuple copy from initializer_list which is `explicit` on old compiler.
- Add better reporting to copyright test.
- Move to PackedTensorAccessor64 from the deprecated PackedTensorAccessor
- Avoid some warnings about mismatched comparisons.
The "main" build is the only one that runs the test_build stuff. In that area:
- Fix my bad copyright fix D26275931 (3463f418b8) / 965c9c
- Add test that all tutorials are valid json.
Reviewed By: nikhilaravi
Differential Revision: D26366466
fbshipit-source-id: c4ab8b7e6647987069f7cb7144aa6ab7c24bcdac
Summary:
- Updated the C++/CUDA mesh rasterization kernels to handle the clipped faces. In particular this required careful handling of the distance calculation for faces which are cut into a quadrilateral by the image plane and then split into two sub triangles i.e. both sub triangles can't be part of the top K faces.
- Updated `rasterize_meshes.py` to use the utils functions to clip the meshes and convert the fragments back in terms of the unclipped mesh
- Added end to end tests
Reviewed By: jcjohnson
Differential Revision: D26169685
fbshipit-source-id: d64cd0d656109b965f44a35c301b7c81f451cfa0
Summary:
Fixes the assertion that `p1` and `p2` have the same last dimension. The issue was that `D` is set to equal `p2.size(2)`, and then `D` is compared to `p2.size(2)`. The fix instead compares `D` to `p1.size(2)`.
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/524
Reviewed By: bottler
Differential Revision: D26008688
Pulled By: nikhilaravi
fbshipit-source-id: e32afe9da127d81b1a411d3c223b539a7400597b
Summary:
Similar to non square image rasterization for meshes, apply the same updates to the pointcloud rasterizer.
Main API Change:
- PointRasterizationSettings now accepts a tuple/list of (H, W) for the image size.
Reviewed By: jcjohnson
Differential Revision: D25465206
fbshipit-source-id: 7370d83c431af1b972158cecae19d82364623380
Summary:
There are a couple of options for supporting non square images:
1) NDC stays at [-1, 1] in both directions with the distance calculations all modified by (W/H). There are a lot of distance based calculations (e.g. triangle areas for barycentric coordinates etc) so this requires changes in many places.
2) NDC is scaled by (W/H) so the smallest side has [-1, 1]. In this case none of the distance calculations need to be updated and only the pixel to NDC calculation needs to be modified.
I decided to go with option 2 after trying option 1!
API Changes:
- Image size can now be specified optionally as a tuple
TODO:
- add a benchmark test for the non square case.
Reviewed By: jcjohnson
Differential Revision: D24404975
fbshipit-source-id: 545efb67c822d748ec35999b35762bce58db2cf4
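Option 2 above can be written down as a one-line pixel-to-NDC mapping. The sketch below is an assumption-laden illustration: the function name is made up, pixel centers are assumed to sit at half-integer positions, and the axis direction flip used by the renderer's coordinate convention is ignored. Using float division for the aspect ratio also covers sizes that are not integer multiples of each other.

```python
def pix_to_ndc(i, size, other_size):
    """NDC coordinate of the center of pixel i along an axis of `size`
    pixels, when the other image axis has `other_size` pixels.

    The shorter side spans [-1, 1]; the longer side spans
    [-size / other_size, size / other_size].
    """
    half = max(size / other_size, 1.0)
    return -half + (2 * i + 1) * half / size


# H=2, W=4: x spans [-2, 2], y spans [-1, 1]
print([pix_to_ndc(i, 4, 2) for i in range(4)])  # [-1.5, -0.5, 0.5, 1.5]
print([pix_to_ndc(i, 2, 4) for i in range(2)])  # [-0.5, 0.5]
```

Because only this mapping changes, distance calculations such as triangle areas for barycentric coordinates can stay as they are.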
Summary: This fixes #442 by declaring two math functions to be device-only.
Reviewed By: bottler
Differential Revision: D24896992
fbshipit-source-id: a15918d06d2a3e6ee5cf250fec7af5f2f50a6164
Summary:
Changes to CI and some minor fixes now that pulsar is part of pytorch3d. Most significantly, add CUB to CI builds.
Make CUB_HOME override the CUB already in cudatoolkit (important for cuda11.0 which uses cub 1.9.9 which pulsar doesn't work well with).
Make imageio available for testing.
Lint fixes.
Fix some test verbosity.
Avoid use of atomicAdd_block on older GPUs.
Reviewed By: nikhilaravi, classner
Differential Revision: D24773716
fbshipit-source-id: 2428356bb2e62735f2bc0c15cbe4cff35b1b24b8
Summary:
Removes the now-unnecessary kernels from point mesh edge file
Migrates all point mesh functionality into one file.
Reviewed By: gkioxari
Differential Revision: D24550086
fbshipit-source-id: f924996cd38a7c2c1cf189d8a01611de4506cfa3
Summary: This diff creates the generic MeshBackwardKernel which can handle distance calculations between point, edge and faces in either direction. Replaces only point_mesh_face code for now.
Reviewed By: gkioxari
Differential Revision: D24549374
fbshipit-source-id: 2853c1da1c2a6b6de8d0e40007ba0735b8959044
Summary: This diff creates the generic MeshForwardKernel which can handle distance calculations between point, edge and faces in either direction. Replaces only point_mesh_face code for now.
Reviewed By: gkioxari
Differential Revision: D24543316
fbshipit-source-id: 302707d7cec2d77a899738adf40481035c240da8
Summary: Added missing include for cstdint for Windows and removed problematic inline assembly.
Reviewed By: bottler
Differential Revision: D24838053
fbshipit-source-id: 95496be841c2c22a82068073d4740e98ee8a02ac
Summary: This diff updates the documentation and tutorials with information about the new pulsar backend. For more information about the pulsar backend, see the release notes and the paper (https://arxiv.org/abs/2004.07484). For information on how to use the backend, see the point cloud rendering notebook and the examples in the folder docs/examples.
Reviewed By: nikhilaravi
Differential Revision: D24498129
fbshipit-source-id: e312b0169a72b13590df6e4db36bfe6190d742f9
Summary:
This diff integrates the pulsar renderer source code into PyTorch3D as an alternative backend for the PyTorch3D point renderer. This diff is the first of a series of three diffs to complete that migration and focuses on the packaging and integration of the source code.
For more information about the pulsar backend, see the release notes and the paper (https://arxiv.org/abs/2004.07484). For information on how to use the backend, see the point cloud rendering notebook and the examples in the folder `docs/examples`.
Tasks addressed in the following diffs:
* Add the PyTorch3D interface,
* Add notebook examples and documentation (or adapt the existing ones to feature both interfaces).
Reviewed By: nikhilaravi
Differential Revision: D23947736
fbshipit-source-id: a5e77b53e6750334db22aefa89b4c079cda1b443
Summary:
faces_uvs_packed and verts_uvs_packed were only used in one place and the definition of the former was ambiguous. This meant that the wrong coordinates could be used for meshes other than the first in the batch. I have therefore removed both functions and build their common result inline. Added a test that a simple batch of two meshes is rendered consistently with the rendering of each alone. This test would have failed before.
I hope this fixes https://github.com/facebookresearch/pytorch3d/issues/283.
Some other small improvements to the textures code.
Reviewed By: nikhilaravi
Differential Revision: D23161936
fbshipit-source-id: f99b560a46f6b30262e07028b049812bc04350a7
Summary: A triangle is culled if any vertex in a triangle is behind the camera. This fixes incorrect rendering of triangles that are partially behind the camera, where screen coordinate calculations are strange. It doesn't work for triangles that are partially behind the camera but still intersect with the view frustum.
Reviewed By: nikhilaravi
Differential Revision: D22856181
fbshipit-source-id: a9cbaa1327d89601b83d0dfd3e4a04f934a4a213
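The culling rule above amounts to a per-face test on vertex depths. A minimal sketch, assuming camera-space depths with positive z in front of the camera; the function name and the `z_eps` threshold are illustrative assumptions.

```python
def cull_mask(face_verts_z, z_eps=1e-8):
    """Return True for each face that should be discarded.

    face_verts_z: list of (z0, z1, z2) vertex depths per face.
    A face is culled as soon as ANY of its vertices is behind
    (or on) the camera plane.
    """
    return [any(z < z_eps for z in zs) for zs in face_verts_z]


print(cull_mask([(1.0, 2.0, 3.0), (1.0, -1.0, 2.0), (-1.0, -2.0, -3.0)]))
# [False, True, True]
```

The second face in the example is the partially visible case: it is dropped entirely rather than clipped.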
Summary:
Added support for barycentric clipping in the C++/CUDA rasterization kernels which can be switched on/off via a rasterization setting.
Added tests and a benchmark to compare with the current implementation in PyTorch - for some cases of large image size/faces per pixel the cuda version is 10x faster.
Reviewed By: gkioxari
Differential Revision: D21705503
fbshipit-source-id: e835c0f927f1e5088ca89020aef5ff27ac3a8769
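For readers unfamiliar with barycentric clipping, a sketch of the operation on one set of coordinates follows: clamp each coordinate to [0, 1], then renormalize so they sum to one. The function name and the `eps` guard are assumptions for the example, not the kernel code.

```python
def clip_barycentrics(bary, eps=1e-5):
    """Clamp barycentric coordinates to [0, 1] and renormalize."""
    clipped = [min(max(b, 0.0), 1.0) for b in bary]
    total = max(sum(clipped), eps)  # guard against an all-zero result
    return tuple(b / total for b in clipped)
```

For a point just outside a triangle, such as `(-0.2, 0.6, 0.6)`, this gives `(0.0, 0.5, 0.5)`, which lies on the nearest edge in barycentric terms.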
Summary:
C++/CUDA implementation of forward and backward passes for the sigmoid alpha blending function.
This is slightly faster than the vectorized implementation in Python, but more importantly uses less memory due to fewer tensors being created.
Reviewed By: gkioxari
Differential Revision: D19980671
fbshipit-source-id: 0779055d2c68b1f20fb0870e60046077ef4613ff
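A per-pixel sketch of the forward pass, assuming the blending function computes alpha as one minus the product over faces of `1 - sigmoid(-dist / sigma)`, with negative distances meaning the pixel is inside the face and `-1` marking empty slots. The function name and argument layout are illustrative assumptions, and no overflow guard is included for very large `dist / sigma`.

```python
import math


def sigmoid_alpha(dists, pix_to_face, sigma):
    """Alpha for one pixel from its K candidate faces."""
    prod = 1.0
    for d, f in zip(dists, pix_to_face):
        if f < 0:
            continue  # empty slot, contributes nothing
        prob = 1.0 / (1.0 + math.exp(d / sigma))  # sigmoid(-d / sigma)
        prod *= 1.0 - prob
    return 1.0 - prod
```

A loop like this keeps only a running product per pixel, which is one way to see why a fused kernel needs fewer intermediate tensors than a vectorized version.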
Summary: When rendering meshes with Phong shading, interpolate_face_attributes was taking up a nontrivial fraction of the overall shading time. This diff replaces our Python implementation of this function with a CUDA implementation.
Reviewed By: nikhilaravi
Differential Revision: D21610763
fbshipit-source-id: 2bb362a28f698541812aeab539047264b125ebb8
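What the function computes can be sketched in plain Python for a flat list of pixels: each pixel's value is the barycentric-weighted sum of the attributes at the three vertices of the face it hits. The argument layout below is a simplification chosen for the example, and filling empty pixels with zeros is an assumption.

```python
def interpolate_face_attributes_sketch(pix_to_face, barycentric_coords, face_attrs):
    """Interpolate per-vertex face attributes at each pixel.

    pix_to_face:        face index per pixel, -1 for background.
    barycentric_coords: (b0, b1, b2) per pixel.
    face_attrs:         face_attrs[f][v] is the attribute vector of
                        vertex v (0..2) of face f.
    """
    dim = len(face_attrs[0][0])
    out = []
    for f, bary in zip(pix_to_face, barycentric_coords):
        if f < 0:
            out.append([0.0] * dim)  # background pixel
            continue
        out.append([
            sum(bary[v] * face_attrs[f][v][d] for v in range(3))
            for d in range(dim)
        ])
    return out
```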
Summary: Fix the new CPU implementation of point_mesh functionality to be compatible with older C++.
Reviewed By: nikhilaravi
Differential Revision: D22066785
fbshipit-source-id: a245849342019a93ff68e186a10985458b007436