Summary:
Update the cuda kernels to:
- remove contiguous checks for the grad tensors and for cpu functions which use accessors
- for cuda implementations call `.contiguous()` on all tensors in the host function before invoking the kernel
Reviewed By: gkioxari
Differential Revision: D21598008
fbshipit-source-id: 9b97bda4582fd4269c8a00999874d4552a1aea2d
Summary:
We have multiple KNN CUDA implementations. From python, users can currently request a particular implementation via the `version` flag, but they have no way of knowing which implementations can be used for a given problem.
This diff exposes a function `pytorch3d._C.knn_check_version(version, D, K)` that returns whether a particular version can be used.
Reviewed By: nikhilaravi
Differential Revision: D21162573
fbshipit-source-id: 6061960bdcecba454fd920b00036f4e9ff3fdbc0
Summary: Interface and working implementation of ragged KNN. Benchmarks (which aren't ragged) haven't slowed. New benchmark shows that ragged is faster than non-ragged of the same shape.
Reviewed By: jcjohnson
Differential Revision: D20696507
fbshipit-source-id: 21b80f71343a3475c8d3ee0ce2680f92f0fae4de
Summary: Run linter after recent changes. Fix long comment in knn.h which clang-format has reflowed badly. Add crude test that code doesn't call deprecated `.type()` or `.data()`.
Reviewed By: nikhilaravi
Differential Revision: D20692935
fbshipit-source-id: 28ce0308adae79a870cb41a810b7cf8744f41ab8
Summary:
Implements K-Nearest Neighbors with C++ and CUDA versions.
KNN in CUDA is highly nontrivial. I've implemented a few different versions of the kernel, and we heuristically dispatch to different kernels based on the problem size. Some of the kernels rely on template specialization on either D or K, so we use template metaprogramming to compile specialized versions for ranges of D and K.
These kernels are up to 3x faster than our existing 1-nearest-neighbor kernels, so we should also consider swapping out `nn_points_idx` to use these kernels in the backend.
I've been working mostly on the CUDA kernels, and haven't converged on the correct Python API.
I still want to benchmark against FAISS to see how far away we are from their performance.
Reviewed By: bottler
Differential Revision: D19729286
fbshipit-source-id: 608ffbb7030c21fe4008f330522f4890f0c3c21a