Summary: Couldn't build p3d on devfair because C++17 is unsupported. Two structured bindings sneaked in.
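For illustration only, a minimal sketch of how a structured binding can be rewritten for a pre-C++17 toolchain. `MinMax` and `Range` are hypothetical names, not functions from this codebase.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper returning two values, standing in for any function
// whose result was unpacked with a structured binding.
std::pair<int, int> MinMax(int a, int b) {
  return (a < b) ? std::make_pair(a, b) : std::make_pair(b, a);
}

// C++17 form (rejected by older toolchains):
//   auto [lo, hi] = MinMax(a, b);
// C++14-compatible form using std::tie:
int Range(int a, int b) {
  int lo = 0;
  int hi = 0;
  std::tie(lo, hi) = MinMax(a, b);
  assert(lo <= hi);
  return hi - lo;
}
```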
Reviewed By: bottler
Differential Revision: D40280967
fbshipit-source-id: 9627f3f9f76247a6cefbeac067fdead67c6f4e14
Summary:
Torch C++ extension for Marching Cubes
- Add torch C++ extension for marching cubes. Observe a speedup of ~255x-324x (over varying batch sizes and spatial resolutions).
- Add C++ impl in existing unit-tests.
(Note: this ignores all push blocking failures!)
Reviewed By: kjchalup
Differential Revision: D39590638
fbshipit-source-id: e44d2852a24c2c398e5ea9db20f0dfaa1817e457
Summary:
Threaded the for loop:
```
for (int yi = 0; yi < H; ++yi) {...}
```
in function `RasterizeMeshesNaiveCpu()`.
Chunk size is approx equal.
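As a rough sketch of what "approx equal" chunking could look like, the helper below splits the rows `[0, H)` into contiguous ranges whose sizes differ by at most one. `SplitRows` is a hypothetical name and the actual partitioning in `RasterizeMeshesNaiveCpu()` may differ; each range would then be handed to one worker thread.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits rows [0, H) into at most num_threads contiguous [begin, end) ranges
// whose sizes differ by at most one row.
std::vector<std::pair<int, int>> SplitRows(int H, int num_threads) {
  assert(H >= 0 && num_threads > 0);
  std::vector<std::pair<int, int>> chunks;
  const int base = H / num_threads;
  const int extra = H % num_threads;  // the first `extra` chunks get one more row
  int begin = 0;
  for (int t = 0; t < num_threads && begin < H; ++t) {
    const int size = base + (t < extra ? 1 : 0);
    chunks.emplace_back(begin, begin + size);
    begin += size;
  }
  return chunks;
}
```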
Reviewed By: bottler
Differential Revision: D40063604
fbshipit-source-id: 09150269405538119b0f1b029892179501421e68
Summary: D38919607 (c4545a7cbc) and D38858887 (06cbba2628) were premature: it turns out CUDA 10.2 doesn't support C++17.
Reviewed By: bottler
Differential Revision: D39156205
fbshipit-source-id: 5e2e84cc4a57d1113a915166631651d438540d56
Summary: Fix EPS issue that causes numerical instabilities when boxes are very close
Reviewed By: kjchalup
Differential Revision: D38661465
fbshipit-source-id: d2b6753cba9dc2f0072ace5289c9aa815a1a29f6
Summary: Remove the hardcoded block reduction operation from `sample_farthest_points.cu` and replace it with `cub::BlockReduce`, reducing the complexity of the code and letting an established library do the thinking for us.
Reviewed By: bottler
Differential Revision: D38617147
fbshipit-source-id: b230029c55f05cda0aab1648d3105a8d3e92d27b
Summary:
Added L1 norm for KNN and chamfer op
* The norm is now specified with a variable `norm` which can only be 1 or 2
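A minimal sketch of a distance that switches on `norm`. `PointDistance` is a hypothetical helper, and returning the squared distance for `norm == 2` is an assumption rather than something stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: distance between two points of equal dimension.
//   norm == 1 -> sum of absolute differences (L1)
//   norm == 2 -> sum of squared differences (squared L2, an assumption)
float PointDistance(const std::vector<float>& a,
                    const std::vector<float>& b,
                    int norm) {
  assert(norm == 1 || norm == 2);
  assert(a.size() == b.size());
  float dist = 0.0f;
  for (std::size_t d = 0; d < a.size(); ++d) {
    const float diff = a[d] - b[d];
    dist += (norm == 1) ? std::fabs(diff) : diff * diff;
  }
  return dist;
}
```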
Reviewed By: bottler
Differential Revision: D35419637
fbshipit-source-id: 77813fec650b30c28342af90d5ed02c89133e136
Summary: bin_size should be 0 not -1 for naive rasterization. See https://github.com/facebookresearch/pytorch3d/issues/1129
Reviewed By: patricklabatut
Differential Revision: D35077115
fbshipit-source-id: b81ff74f47c78429977802f7dcadfd1b96676f8c
Summary: Attempt to reduce nvcc trouble on windows by (1) avoiding flag for c++14 and (2) avoiding `torch/extension.h`, which introduces pybind11, in `.cu` files.
Reviewed By: patricklabatut
Differential Revision: D34969868
fbshipit-source-id: f3878d6a2ba9d644e87ae7b6377cb5008b4b6ce3
Summary:
1. changed IsInsideTriangle in geometry_utils to take in min_triangle_area parameter instead of hardcoded value
2. updated point_mesh_cpu.cpp and point_mesh_cuda.[h/cu] to adapt to changes in geometry_utils function signatures
3. updated point_mesh_distance.py and test_point_mesh_distance.py to modify _C. calls
Reviewed By: bottler
Differential Revision: D34459764
fbshipit-source-id: 0549e78713c6d68f03d85fb597a13dd88e09b686
Summary: Small fix by adjusting the area `eps` to account for really small faces when computing point to face distances
Reviewed By: bottler
Differential Revision: D34331336
fbshipit-source-id: 51c4888ea46fefa4e31d5b0bb494a9f9d77813cd
Summary: Lower the epsilon value in the IoU3D calculation to fix small numerical issue from GH#1082
Reviewed By: bottler
Differential Revision: D34371597
fbshipit-source-id: 12443fa359b7755ef4ae60e9adf83734a1a295ae
Summary: Since coarse rasterization on cuda can overflow bins, we detect when this happens for memory safety. See https://github.com/facebookresearch/pytorch3d/issues/348 . Also try to print a warning.
Reviewed By: patricklabatut
Differential Revision: D33065604
fbshipit-source-id: 99b3c576d01b78e6d77776cf1a3e95984506c93a
Summary: Update all FB license strings to the new format.
Reviewed By: patricklabatut
Differential Revision: D33403538
fbshipit-source-id: 97a4596c5c888f3c54f44456dc07e718a387a02c
Summary:
A small numerical fix for IoU for 3D boxes, fixes GH #992
* Adds a check for boxes with zero side areas (invalid boxes)
* Fixes numerical issue when two boxes have coplanar sides
Reviewed By: nikhilaravi
Differential Revision: D33195691
fbshipit-source-id: 8a34b4d1f1e5ec2edb6d54143930da44bdde0906
Summary: Restore compatibility with old C++ after recent torch change. https://github.com/facebookresearch/pytorch3d/issues/995
Reviewed By: patricklabatut
Differential Revision: D33093174
fbshipit-source-id: 841202fb875d601db265e93dcf9cfa4249d02b25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041
`TH_CONCAT_{N}` is still being used by THP so I've moved that into
its own header, but all the compiled code is gone.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D32872477
Pulled By: ngimel
fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a
Summary:
https://github.com/facebookresearch/pytorch3d/issues/561
https://github.com/facebookresearch/pytorch3d/issues/790
Divide by zero fix (NaN fix). When perspective_correct=True, BarycentricPerspectiveCorrectionForward and BarycentricPerspectiveCorrectionBackward in ../csrc/utils/geometry_utils.cuh are called. The denominator (denom) values should not be allowed to go to zero. I'm able to resolve this issue locally with this PR and submit it for the team's review.
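The idea can be sketched on the CPU as follows. This is not the code from `geometry_utils.cuh`: the function name, the `eps` default, and the assumption that depths are non-negative (so clamping from below is enough) are all illustrative.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Perspective-correct barycentric weights with the denominator clamped
// away from zero. `eps` is an illustrative value, not the library constant.
std::array<float, 3> PerspectiveCorrect(const std::array<float, 3>& bary,
                                        float z0, float z1, float z2,
                                        float eps = 1e-8f) {
  assert(eps > 0.0f);
  const float w0_top = bary[0] * z1 * z2;
  const float w1_top = z0 * bary[1] * z2;
  const float w2_top = z0 * z1 * bary[2];
  // Without the clamp, all-zero depths would give 0 / 0 = NaN.
  const float denom = std::max(w0_top + w1_top + w2_top, eps);
  return {{w0_top / denom, w1_top / denom, w2_top / denom}};
}
```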
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/891
Reviewed By: patricklabatut
Differential Revision: D31829695
Pulled By: bottler
fbshipit-source-id: a3517b8362f6e60d48c35731258d8ce261b1d912
Summary: Keep using at:: instead of torch:: so we don't need torch/extension.h and can keep other compilers happy.
Reviewed By: patricklabatut
Differential Revision: D31688436
fbshipit-source-id: 1825503da0104acaf1558d17300c02ef663bf538
Summary: A few tweaks to make the CUDA build on Windows happier, as remarked in #876.
Reviewed By: patricklabatut
Differential Revision: D31688188
fbshipit-source-id: 20816d6215f2e3ec898f81ae4221b1c2ff24b64f
Summary: The epsilon value is important for determining whether vertices are inside/outside a plane.
Reviewed By: gkioxari
Differential Revision: D31485247
fbshipit-source-id: 5517575de7c02f1afa277d00e0190a81f44f5761
Summary: Added CUDA implementation to match the new, still unused, C++ function for the core of points2vols.
Reviewed By: nikhilaravi
Differential Revision: D29548608
fbshipit-source-id: 16ebb61787fcb4c70461f9215a86ad5f97aecb4e
Summary: Single C++ function for the core of points2vols, not used anywhere yet. Added ability to control align_corners and the weight of each point, which may be useful later.
Reviewed By: nikhilaravi
Differential Revision: D29548607
fbshipit-source-id: a5cda7ec2c14836624e7dfe744c4bbb3f3d3dfe2
Summary: C++ Implementation of algorithm to compute 3D bounding boxes for batches of bboxes of shape (N, 8, 3) and (M, 8, 3).
Reviewed By: gkioxari
Differential Revision: D30905190
fbshipit-source-id: 02e2cf025cd4fa3ff706ce5cf9b82c0fb5443f96
Summary: Attempt to fix #659, an observation that the rasterizer is nondeterministic, by resolving tied faces by picking those with lower index.
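A minimal sketch of such a tie-break, assuming hits are ordered by depth first. `Hit`, `HitBefore` and `SortHits` are hypothetical names; the real rasterizer keeps its own per-pixel data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One face covering a pixel: its depth and its index in the mesh.
struct Hit {
  float z;
  int face_idx;
};

// Nearer depth first; on equal depth the lower face index wins, so the
// order no longer depends on which thread happened to write first.
bool HitBefore(const Hit& a, const Hit& b) {
  if (a.z != b.z) {
    return a.z < b.z;
  }
  return a.face_idx < b.face_idx;
}

std::vector<Hit> SortHits(std::vector<Hit> hits) {
  for (const Hit& h : hits) {
    assert(h.face_idx >= 0);
  }
  std::sort(hits.begin(), hits.end(), HitBefore);
  return hits;
}
```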
Reviewed By: nikhilaravi, patricklabatut
Differential Revision: D30699039
fbshipit-source-id: 39ed797eb7e9ce7370ae71259ad6b757f9449923
Summary: Unlike other cu files, sigmoid_alpha_blend uses torch/extension.h. Avoid for possible build speed win and because of a reported problem #843 on windows with CUDA 11.4.
Reviewed By: nikhilaravi
Differential Revision: D31054121
fbshipit-source-id: 53a1f985a1695a044dfd2ee1a5b0adabdf280595
Summary: Rename sample_farthest_point.cpp to not match its CUDA equivalent.
Reviewed By: nikhilaravi
Differential Revision: D31006645
fbshipit-source-id: 135b511cbde320d2b3e07fc5b027971ef9210aa9
Summary: Remove use of nonstandard C++. Noticed on windows in issue https://github.com/facebookresearch/pytorch3d/issues/843. (We use `__restrict__` in CUDA, where it is fine, even on windows)
Reviewed By: nikhilaravi
Differential Revision: D31006516
fbshipit-source-id: 929ba9b3216cb70fad3ffa3274c910618d83973f
Summary:
CUDA implementation of farthest point sampling algorithm.
## Visual comparison
Compared to random sampling, farthest point sampling gives better coverage of the shape.
{F658631262}
## Reduction
Parallelized block reduction to find the max value at each iteration happens as follows:
1. First split the points into two equal sized parts (e.g. for a list with 8 values):
`[20, 27, 6, 8 | 11, 10, 2, 33]`
2. Use half of the threads (4 threads) to compare pairs of elements from each half (e.g. elements [0, 4], [1, 5], etc.) and store the result in the first half of the list:
`[20, 27, 6, 33 | 11, 10, 2, 33]`
3. Now we no longer care about the second part, but again divide the first part into two:
`[20, 27 | 6, 33 | -, -, -, -]`
Now we can use 2 threads to compare the 4 elements.
4. Finally we have gotten down to a single pair
`[20 | 33 | -, - | -, -, -, -]`
Use 1 thread to compare the remaining two elements
5. The max will now be at thread id = 0
`[33 | - | -, - | -, -, -, -]`
The reduction will give the farthest point for the selected batch index at this iteration.
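The steps above can be mimicked sequentially on the CPU. In this sketch each inner-loop iteration stands in for one thread, and the power-of-two length is an assumption; `BlockMax` and `MaxResult` are hypothetical names, not the kernel's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MaxResult {
  float value;
  int index;  // position of the max in the original list
};

// Sequential simulation of the halving max-reduction. On a GPU the
// iterations of one pass run in parallel, with a sync between passes.
MaxResult BlockMax(std::vector<float> vals) {
  const std::size_t n = vals.size();
  assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two length assumed
  std::vector<int> idx(n);
  for (std::size_t i = 0; i < n; ++i) {
    idx[i] = static_cast<int>(i);
  }
  for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
    for (std::size_t tid = 0; tid < stride; ++tid) {
      // "Thread" tid compares its element with the one `stride` away
      // and keeps the larger of the two in the first half.
      if (vals[tid + stride] > vals[tid]) {
        vals[tid] = vals[tid + stride];
        idx[tid] = idx[tid + stride];
      }
    }
  }
  return {vals[0], idx[0]};
}
```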
Reviewed By: bottler, jcjohnson
Differential Revision: D30401803
fbshipit-source-id: 525bd5ae27c4b13b501812cfe62306bb003827d2
Summary:
There has historically been a lot of duplication between the coarse rasterization logic for point clouds and meshes. This diff factors out the shared logic, so coarse rasterization of point clouds and meshes share the same core logic.
Previously the only difference between the coarse rasterization kernels for points and meshes was the logic for checking whether a {point / triangle} intersects a tile in the image. We implement a generic coarse rasterization kernel that takes a set of 2D bounding boxes rather than geometric primitives; we then implement separate kernels that compute 2D bounding boxes for points and triangles.
This change does not affect the Python API at all. It also should not change any rasterization behavior, since this diff is just a refactoring of the existing logic.
I see this diff as the first in a few pieces of rasterizer refactoring. Followup diffs should do the following:
- Add a check for bin overflow in the generic coarse rasterizer kernel: allocate a global scalar to flag bin overflow which kernel worker threads can write to in case they detect bin overflow. The C++ launcher function can then check this flag after the kernel returns and issue a warning to the user in case of overflow.
- As a slightly more involved mechanism, if bin overflow is detected then the coarse kernel can continue running in order to count how many elements fall into each bin, without actually writing out their indices to the coarse output tensor. Then the actual number of entries per bin can be used to re-allocate the output tensor and re-run the coarse rasterization kernel so that bin overflow can be automatically avoided.
- The unification of the coarse and fine rasterization kernels also allows us to insert an extra CUDA kernel prior to coarse rasterization that filters out primitives outside the view frustum. This would be helpful for rendering full scenes (e.g. Matterport data) where only a small piece of the mesh is actually visible at any one time.
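The first two follow-ups (flagging overflow, then counting and re-running) could look roughly like this on the CPU. All names are hypothetical, and a real kernel would write the flag and counts from many threads rather than in a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct CoarseResult {
  std::vector<std::vector<int>> bins;  // element indices kept per bin
  std::vector<int> counts;             // true number of elements per bin
  bool overflow = false;               // set when any bin exceeded capacity
};

// Bins elements with a fixed per-bin capacity. On overflow, keeps counting
// but stops writing indices for the full bin.
CoarseResult BinElements(const std::vector<int>& bin_of_elem,
                         int num_bins, int max_per_bin) {
  assert(num_bins > 0 && max_per_bin >= 0);
  CoarseResult r;
  r.bins.resize(num_bins);
  r.counts.assign(num_bins, 0);
  for (int i = 0; i < static_cast<int>(bin_of_elem.size()); ++i) {
    const int b = bin_of_elem[i];
    assert(b >= 0 && b < num_bins);
    if (r.counts[b] < max_per_bin) {
      r.bins[b].push_back(i);
    } else {
      r.overflow = true;
    }
    ++r.counts[b];
  }
  return r;
}

// Re-runs with the observed maximum count if the first pass overflowed.
CoarseResult BinWithRetry(const std::vector<int>& bin_of_elem,
                          int num_bins, int max_per_bin) {
  CoarseResult r = BinElements(bin_of_elem, num_bins, max_per_bin);
  if (r.overflow) {
    const int needed = *std::max_element(r.counts.begin(), r.counts.end());
    r = BinElements(bin_of_elem, num_bins, needed);
  }
  return r;
}
```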
Reviewed By: bottler
Differential Revision: D25710361
fbshipit-source-id: 9c9dea512cb339c42adb3c92e7733fedd586ce1b
Summary: Renaming parts of the mesh coarse rasterization and separating the bounding box calculation. All in preparation for sharing code with point rasterization.
Reviewed By: bottler
Differential Revision: D30369112
fbshipit-source-id: 3508c0b1239b355030cfa4038d5f3d6a945ebbf4
Summary: In preparation for sharing coarse rasterization between point clouds and meshes, move the functions to a new file. No code changes.
Reviewed By: bottler
Differential Revision: D30367812
fbshipit-source-id: 9e73835a26c4ac91f5c9f61ff682bc8218e36c6a
Summary: Implement the sample_pdf function from the NeRF project as compiled operators. The binary search (in searchsorted) is replaced with a low-tech linear search, but this is not a problem for the envisaged numbers of bins.
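A minimal sketch of a linear search that returns the same index as a right-sided binary search over a sorted CDF. The function name and the right-sided convention are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the number of entries in the sorted `cdf` that are <= u, i.e.
// the insertion index a right-sided binary search would give.
int LinearSearchSorted(const std::vector<float>& cdf, float u) {
  assert(std::is_sorted(cdf.begin(), cdf.end()));
  int i = 0;
  while (i < static_cast<int>(cdf.size()) && cdf[i] <= u) {
    ++i;
  }
  return i;
}
```

The cost is linear in the number of bins instead of logarithmic, which is the trade-off accepted here for small bin counts.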
Reviewed By: gkioxari
Differential Revision: D26312535
fbshipit-source-id: df1c3119cd63d944380ed1b2657b6ad81d743e49
Summary:
Implementation of ball query from PointNet++. This function is similar to KNN (find the neighbors in p2 for all points in p1). These are the key differences:
- It will return the **first** K neighbors within a specified radius as opposed to the **closest** K neighbors.
- Because it can stop once K points within the radius are found, rather than considering every point in p2 to find the closest K, the algorithm is much faster than KNN when p2 has a large number of points.
- The neighbors are not sorted
- Due to the radius threshold it is not guaranteed that there will be K neighbors even if there are more than K points in p2.
- The padding value for `idx` is -1 instead of 0.
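A single-query CPU sketch of these differences: it takes the first K points within the radius, in input order, and pads with -1. `BallQueryOne` is a hypothetical name, and the strict `<` comparison against the radius is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::array<float, 3>;

// Returns indices of the first K points of p2 within `radius` of q,
// in input order (unsorted by distance), padded with -1.
std::vector<int> BallQueryOne(const Point& q, const std::vector<Point>& p2,
                              int K, float radius) {
  assert(K >= 0 && radius >= 0.0f);
  std::vector<int> idx(K, -1);
  const float r2 = radius * radius;
  int found = 0;
  // Stops as soon as K neighbors are found, unlike KNN.
  for (std::size_t j = 0; j < p2.size() && found < K; ++j) {
    float d2 = 0.0f;
    for (int d = 0; d < 3; ++d) {
      const float diff = q[d] - p2[j][d];
      d2 += diff * diff;
    }
    if (d2 < r2) {
      idx[found++] = static_cast<int>(j);
    }
  }
  return idx;
}
```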
# Note:
- Some of the code is very similar to KNN so it could be possible to modify the KNN forward kernels to support ball query.
- Some users might want to use kNN with ball query - for this we could provide a wrapper function around the current `knn_points` which enables applying the radius threshold afterwards as an alternative. This could be called `ball_query_knn`.
Reviewed By: jcjohnson
Differential Revision: D30261362
fbshipit-source-id: 66b6a7e0114beff7164daf7eba21546ff41ec450
Summary: An early-return test for gradient calculation did not account for the opacity gradient, so the function would return early without calculating any gradients even when opacity gradients were required.
Reviewed By: bottler
Differential Revision: D29505684
fbshipit-source-id: 575e820b8f58b19476b2fe3288702806733e840b
Summary:
Fix small face issue for point_mesh distance computation.
The issue lies in the computation of `IsInsideTriangle` which is unstable and non-symmetrical when faces with small areas are given as input. This diff fixes the issue by returning `False` for `IsInsideTriangle` when small faces are given as input.
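A simplified 2D sketch of the guard. The real `IsInsideTriangle` works on 3D points; the names, the 2D setting, and passing the threshold as a parameter are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 {
  float x;
  float y;
};

// Twice the signed area of triangle (a, b, c).
float Cross(const Vec2& a, const Vec2& b, const Vec2& c) {
  return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Returns false for faces smaller than min_triangle_area, where the
// barycentric test below is numerically unreliable.
bool IsInsideTriangle2D(const Vec2& p, const Vec2& v0, const Vec2& v1,
                        const Vec2& v2, float min_triangle_area) {
  assert(min_triangle_area >= 0.0f);
  const float area2 = Cross(v0, v1, v2);
  if (0.5f * std::fabs(area2) < min_triangle_area) {
    return false;
  }
  const float w0 = Cross(v1, v2, p) / area2;
  const float w1 = Cross(v2, v0, p) / area2;
  const float w2 = Cross(v0, v1, p) / area2;
  return w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f;
}
```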
Reviewed By: bottler
Differential Revision: D29163052
fbshipit-source-id: be297002f26b5e6eded9394fde00553a37406bee
Summary:
- Fix the calculation of the non square NDC range when the H and W are not integer multiples.
- Add test for this case
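A sketch of the range calculation, assuming the convention that the shorter image side spans an NDC range of 2 (that is, [-1, 1]) and the longer side is scaled by the aspect ratio. The function name is illustrative; the point is that the ratio must be computed in floating point.

```cpp
#include <cassert>

// NDC extent along the side of length S1, given the other side S2.
// Integer division S1 / S2 would truncate for sizes such as 250 and 200.
float NonSquareNdcRange(int S1, int S2) {
  assert(S1 > 0 && S2 > 0);
  float range = 2.0f;
  if (S1 > S2) {
    range = (S1 * range) / S2;  // float arithmetic, no truncation
  }
  return range;
}
```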
Reviewed By: gkioxari
Differential Revision: D26613213
fbshipit-source-id: df6763cac602e9f1d516b41b432c4d2cfbaa356d
Summary: One step in finding all the pairs of vertices which share faces is a simple calculation but annoying to parallelize. It was implemented in pure Python. We move it to C++. We still pull the data to the CPU and put the answer back on the device.
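For context, the overall task (collecting the unique vertex pairs that share a face) can be sketched as below. Which step was actually moved to C++ is not specified here, so this is only an illustration with hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Collects each unordered vertex pair that appears as an edge of some
// triangular face, once, in sorted order.
std::vector<std::pair<int, int>> UniqueEdges(
    const std::vector<std::array<int, 3>>& faces) {
  std::set<std::pair<int, int>> edges;
  for (const auto& f : faces) {
    for (int k = 0; k < 3; ++k) {
      const int a = f[k];
      const int b = f[(k + 1) % 3];
      assert(a >= 0 && b >= 0);
      if (a != b) {  // skip degenerate edges
        edges.emplace(std::min(a, b), std::max(a, b));
      }
    }
  }
  return std::vector<std::pair<int, int>>(edges.begin(), edges.end());
}
```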
Reviewed By: nikhilaravi, gkioxari
Differential Revision: D26073475
fbshipit-source-id: ffbf4e2c347a511ab5084bceff600465812b6a52
Summary:
Fixes mostly related to the "main" build on circleci.
- Avoid error to do with tuple copy from initializer_list which is `explicit` on old compiler.
- Add better reporting to copyright test.
- Move to PackedTensorAccessor64 from the deprecated PackedTensorAccessor.
- Avoid some warnings about mismatched comparisons.
The "main" build is the only one that runs the test_build stuff. In that area:
- Fix my bad copyright fix D26275931 (3463f418b8) / 965c9c
- Add test that all tutorials are valid json.
Reviewed By: nikhilaravi
Differential Revision: D26366466
fbshipit-source-id: c4ab8b7e6647987069f7cb7144aa6ab7c24bcdac