Summary:
Updates to:
- enable CUDA kernel launches on any GPU (not just the default)
- CUDA and contiguity checks for all kernels
- checks to ensure all tensors are on the same device
- error reporting in the CUDA kernels (see the sketch after this list)
- CUDA tests now run on a random device, not just the default
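The pattern these bullets describe looks roughly like the sketch below, which uses standard ATen/c10 CUDA APIs; `MyKernel` and `MyOpCuda` are hypothetical stand-ins, not the actual PyTorch3D kernels.

```cuda
// Minimal sketch of the check / guard / error-reporting pattern; not the
// actual PyTorch3D source. MyKernel and MyOpCuda are placeholders, and float
// inputs are assumed for brevity.
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/Exceptions.h>
#include <c10/cuda/CUDAGuard.h>

__global__ void MyKernel(const float* in, float* out, int N) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < N) out[i] = in[i];
}

at::Tensor MyOpCuda(const at::Tensor& a, const at::Tensor& b) {
  // CUDA and contiguity checks for every input.
  TORCH_CHECK(a.is_cuda() && b.is_cuda(), "Inputs must be CUDA tensors");
  TORCH_CHECK(a.is_contiguous() && b.is_contiguous(), "Inputs must be contiguous");
  // Check that all tensors are on the same device.
  TORCH_CHECK(a.device() == b.device(), "Inputs must be on the same device");

  // Switch to the GPU holding the inputs so the launch works on any device,
  // not just the default one.
  const c10::cuda::CUDAGuard device_guard(a.device());
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();

  at::Tensor out = at::empty_like(a);
  const int N = static_cast<int>(a.numel());
  const int threads = 256;
  const int blocks = (N + threads - 1) / threads;
  MyKernel<<<blocks, threads, 0, stream>>>(
      a.data_ptr<float>(), out.data_ptr<float>(), N);

  // Report kernel launch errors instead of failing silently.
  AT_CUDA_CHECK(cudaGetLastError());
  return out;
}
```

The `CUDAGuard` is what lets the launch target whichever GPU holds the inputs, and the `cudaGetLastError` check surfaces launch failures that would otherwise pass unnoticed.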
Reviewed By: jcjohnson, gkioxari
Differential Revision: D21215280
fbshipit-source-id: 1bedc9fe6c35e9e920bdc4d78ed12865b1005519
Summary:
We have multiple KNN CUDA implementations. From Python, users can currently request a particular implementation via the `version` flag, but they have no way of knowing which implementations can be used for a given problem.
This diff exposes a function `pytorch3d._C.knn_check_version(version, D, K)` that returns whether a particular version can be used.
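For illustration only, a guard with this contract might look like the hypothetical sketch below; the function name and the per-version limits are made up, not PyTorch3D's actual rules.

```cpp
#include <cstdint>

// Hypothetical sketch of a version-availability check with the same contract
// as the binding exposed to Python as pytorch3d._C.knn_check_version.
// The limits below are placeholders, not the real ones.
bool KnnCheckVersionSketch(const int version, const int64_t D, const int64_t K) {
  switch (version) {
    case 0:
      return true;               // fully generic kernel: always usable
    case 1:
      return D <= 32;            // kernel templated on D: bounded range of D
    case 2:
      return D <= 8 && K <= 32;  // templated on D and K: both must be in range
    default:
      return false;              // unknown version
  }
}
```

From Python, callers would query `pytorch3d._C.knn_check_version(version, D, K)` before requesting that `version` from the KNN op.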
Reviewed By: nikhilaravi
Differential Revision: D21162573
fbshipit-source-id: 6061960bdcecba454fd920b00036f4e9ff3fdbc0
Summary: Interface and working implementation of ragged KNN. Existing benchmarks (which aren't ragged) haven't slowed. A new benchmark shows that ragged KNN is faster than non-ragged KNN of the same shape.
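To show what "ragged" means here, the simplified sketch below (assumed, with K = 1 for brevity and made-up names) bounds the inner scan by per-example lengths rather than by the padded tensor sizes.

```cuda
#include <cfloat>
#include <cstdint>

// Simplified sketch of a ragged nearest-neighbor kernel: each batch element n
// has lengths1[n] valid query points and lengths2[n] valid reference points,
// and the padded tails of p1 and p2 are ignored. Not the actual kernel.
__global__ void RaggedNearestNeighborKernel(
    const float* __restrict__ p1,          // (N, P1, D), zero-padded
    const float* __restrict__ p2,          // (N, P2, D), zero-padded
    const int64_t* __restrict__ lengths1,  // (N,) valid points in each p1[n]
    const int64_t* __restrict__ lengths2,  // (N,) valid points in each p2[n]
    int64_t* __restrict__ idxs,            // (N, P1) nearest-neighbor indices
    const int N, const int P1, const int P2, const int D) {
  for (int t = blockIdx.x * blockDim.x + threadIdx.x; t < N * P1;
       t += gridDim.x * blockDim.x) {
    const int n = t / P1;
    const int i = t % P1;
    if (i >= lengths1[n]) continue;  // padded query point: nothing to do
    float best = FLT_MAX;
    int64_t best_j = -1;
    // Only scan the valid prefix of p2[n], not the padded tail.
    for (int64_t j = 0; j < lengths2[n]; ++j) {
      float dist = 0.f;
      for (int d = 0; d < D; ++d) {
        const float diff = p1[(n * P1 + i) * D + d] - p2[(n * P2 + j) * D + d];
        dist += diff * diff;
      }
      if (dist < best) {
        best = dist;
        best_j = j;
      }
    }
    idxs[n * P1 + i] = best_j;
  }
}
```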
Reviewed By: jcjohnson
Differential Revision: D20696507
fbshipit-source-id: 21b80f71343a3475c8d3ee0ce2680f92f0fae4de
Summary: Run the linter after recent changes. Fix a long comment in knn.h which clang-format had reflowed badly. Add a crude test that the code doesn't call the deprecated `.type()` or `.data()`.
Reviewed By: nikhilaravi
Differential Revision: D20692935
fbshipit-source-id: 28ce0308adae79a870cb41a810b7cf8744f41ab8
Summary:
Implements K-Nearest Neighbors with C++ and CUDA versions.
KNN in CUDA is highly nontrivial. I've implemented a few different versions of the kernel, and we heuristically dispatch to different kernels based on the problem size. Some of the kernels rely on template specialization on either D or K, so we use template metaprogramming to compile specialized versions for ranges of D and K.
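A minimal sketch of the idea, with made-up names and cutoffs rather than the actual dispatch code: kernels are compiled for specific values of K via templates (batch dimension omitted for brevity), and the host picks one at runtime.

```cuda
#include <cfloat>
#include <cstdint>
#include <cuda_runtime.h>

// Brute-force KNN kernel specialized on K at compile time, so the per-thread
// top-K buffers have a fixed size. Illustrative only; not the real kernels.
template <int K>
__global__ void KnnKernelK(
    const float* __restrict__ p1,  // (P1, D) query points
    const float* __restrict__ p2,  // (P2, D) reference points
    int64_t* __restrict__ idxs,    // (P1, K) output indices
    const int P1, const int P2, const int D) {
  float best_d[K];
  int64_t best_i[K];
  for (int q = blockIdx.x * blockDim.x + threadIdx.x; q < P1;
       q += gridDim.x * blockDim.x) {
    for (int k = 0; k < K; ++k) {
      best_d[k] = FLT_MAX;
      best_i[k] = -1;
    }
    for (int j = 0; j < P2; ++j) {
      float dist = 0.f;
      for (int d = 0; d < D; ++d) {
        const float diff = p1[q * D + d] - p2[j * D + d];
        dist += diff * diff;
      }
      if (dist < best_d[K - 1]) {
        // Insert (dist, j) into the sorted top-K list.
        int k = K - 1;
        while (k > 0 && best_d[k - 1] > dist) {
          best_d[k] = best_d[k - 1];
          best_i[k] = best_i[k - 1];
          --k;
        }
        best_d[k] = dist;
        best_i[k] = j;
      }
    }
    for (int k = 0; k < K; ++k) idxs[q * K + k] = best_i[k];
  }
}

// Host-side heuristic dispatch: specialized kernels for small K, with a
// generic fallback (not shown) for everything else. Cutoffs are made up;
// a real dispatcher would also consider D and the number of points.
void LaunchKnnSketch(const float* p1, const float* p2, int64_t* idxs,
                     const int P1, const int P2, const int D, const int K,
                     cudaStream_t stream) {
  const int threads = 256;
  const int blocks = (P1 + threads - 1) / threads;
  switch (K) {
    case 1:
      KnnKernelK<1><<<blocks, threads, 0, stream>>>(p1, p2, idxs, P1, P2, D);
      break;
    case 2:
      KnnKernelK<2><<<blocks, threads, 0, stream>>>(p1, p2, idxs, P1, P2, D);
      break;
    case 4:
      KnnKernelK<4><<<blocks, threads, 0, stream>>>(p1, p2, idxs, P1, P2, D);
      break;
    default:
      // A real implementation would launch a generic, non-templated kernel.
      break;
  }
}
```

The specialized kernels fix the size of the per-thread top-K buffers at compile time, and the runtime choice between them and a generic fallback is in the spirit of the heuristic dispatch described above.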
These kernels are up to 3x faster than our existing 1-nearest-neighbor kernels, so we should also consider swapping out `nn_points_idx` to use these kernels in the backend.
I've been working mostly on the CUDA kernels and haven't yet converged on the right Python API.
I still want to benchmark against FAISS to see how far away we are from their performance.
Reviewed By: bottler
Differential Revision: D19729286
fbshipit-source-id: 608ffbb7030c21fe4008f330522f4890f0c3c21a