15 Commits

Author SHA1 Message Date
Eugene Park
2d4d345b6f Improve ball_query() runtime for large-scale cases (#2006)
Summary:
### Overview
The current C++ code for `pytorch3d.ops.ball_query()` performs floating point multiplication for every coordinate of every pair of points (up until the maximum number of neighbor points is reached). This PR modifies the code (for both CPU and CUDA versions) to implement idea presented [here](https://stackoverflow.com/a/3939525): a `D`-cube around the `D`-ball is first constructed, and any point pairs falling outside the cube are skipped, without explicitly computing the squared distances. This change is especially useful for when the dimension `D` and the number of points `P2` are large and the radius is much smaller than the overall volume of space occupied by the point clouds; as much as **~2.5x speedup** (CPU case; ~1.8x speedup in CUDA case) is observed when `D = 10` and `radius = 0.01`. In all benchmark cases, points were uniform randomly distributed inside a unit `D`-cube.

The benchmark code used was different from `tests/benchmarks/bm_ball_query.py` (only the forward part is benchmarked, larger input sizes were used) and is stored in `tests/benchmarks/bm_ball_query_large.py`.

### Average time comparisons

<img width="360" height="270" alt="cpu-03-0 01-avg" src="https://github.com/user-attachments/assets/6cc79893-7921-44af-9366-1766c3caf142" />
<img width="360" height="270" alt="cuda-03-0 01-avg" src="https://github.com/user-attachments/assets/5151647d-0273-40a3-aac6-8b9399ede18a" />
<img width="360" height="270" alt="cpu-03-0 10-avg" src="https://github.com/user-attachments/assets/a87bc150-a5eb-47cd-a4ba-83c2ec81edaf" />
<img width="360" height="270" alt="cuda-03-0 10-avg" src="https://github.com/user-attachments/assets/e3699a9f-dfd3-4dd3-b3c9-619296186d43" />
<img width="360" height="270" alt="cpu-10-0 01-avg" src="https://github.com/user-attachments/assets/5ec8c32d-8e4d-4ced-a94e-1b816b1cb0f8" />
<img width="360" height="270" alt="cuda-10-0 01-avg" src="https://github.com/user-attachments/assets/168a3dfc-777a-4fb3-8023-1ac8c13985b8" />
<img width="360" height="270" alt="cpu-10-0 10-avg" src="https://github.com/user-attachments/assets/43a57fd6-1e01-4c5e-87a9-8ef604ef5fa0" />
<img width="360" height="270" alt="cuda-10-0 10-avg" src="https://github.com/user-attachments/assets/a7c7cc69-f273-493e-95b8-3ba2bb2e32da" />

### Peak time comparisons

<img width="360" height="270" alt="cpu-03-0 01-peak" src="https://github.com/user-attachments/assets/5bbbea3f-ef9b-490d-ab0d-ce551711d74f" />
<img width="360" height="270" alt="cuda-03-0 01-peak" src="https://github.com/user-attachments/assets/30b5ab9b-45cb-4057-b69f-bda6e76bd1dc" />
<img width="360" height="270" alt="cpu-03-0 10-peak" src="https://github.com/user-attachments/assets/db69c333-e5ac-4305-8a86-a26a8a9fe80d" />
<img width="360" height="270" alt="cuda-03-0 10-peak" src="https://github.com/user-attachments/assets/82549656-1f12-409e-8160-dd4c4c9d14f7" />
<img width="360" height="270" alt="cpu-10-0 01-peak" src="https://github.com/user-attachments/assets/d0be8ef1-535e-47bc-b773-b87fad625bf0" />
<img width="360" height="270" alt="cuda-10-0 01-peak" src="https://github.com/user-attachments/assets/e308e66e-ae30-400f-8ad2-015517f6e1af" />
<img width="360" height="270" alt="cpu-10-0 10-peak" src="https://github.com/user-attachments/assets/c9b5bf59-9cc2-465c-ad5d-d4e23bdd138a" />
<img width="360" height="270" alt="cuda-10-0 10-peak" src="https://github.com/user-attachments/assets/311354d4-b488-400c-a1dc-c85a21917aa9" />

### Full benchmark logs

[benchmark-before-change.txt](https://github.com/user-attachments/files/22978300/benchmark-before-change.txt)
[benchmark-after-change.txt](https://github.com/user-attachments/files/22978299/benchmark-after-change.txt)

Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/2006

Reviewed By: shapovalov

Differential Revision: D85356394

Pulled By: bottler

fbshipit-source-id: 9b3ce5fc87bb73d4323cc5b4190fc38ae42f41b2
2025-10-30 05:01:32 -07:00
Thomas Polasek
055ab3a2e3 Convert directory fbcode/vision to use the Ruff Formatter
Summary:
Converts the directory specified to use the Ruff formatter in pyfmt

ruff_dog

If this diff causes merge conflicts when rebasing, please run
`hg status -n -0 --change . -I '**/*.{py,pyi}' | xargs -0 arc pyfmt`
on your diff, and amend any changes before rebasing onto latest.
That should help reduce or eliminate any merge conflicts.

allow-large-files

Reviewed By: bottler

Differential Revision: D66472063

fbshipit-source-id: 35841cb397e4f8e066e2159550d2f56b403b1bef
2024-11-26 02:38:20 -08:00
generatedunixname89002005287564
1f92c4e9d2 vision/fair
Reviewed By: zsol

Differential Revision: D53258682

fbshipit-source-id: 3f006b5f31a2b1ffdc6323d3a3b08ac46c3162ce
2024-01-31 07:43:49 -08:00
Jiali Duan
8b8291830e Marching Cubes cuda extension
Summary:
Torch CUDA extension for Marching Cubes
- MC involving 3 steps:
  - 1st forward pass to collect vertices and occupied state for each voxel
  - Compute compactVoxelArray to skip non-empty voxels
  - 2nd pass to genereate interpolated vertex positions and faces by marching through the grid
- In contrast to existing MC:
   - Bind each interpolated vertex with a global edge_id to address floating-point precision
   - Added deduplication process to remove redundant vertices and faces

Benchmarks (ms):

| N / V(^3)      | python          | C++             |   CUDA   | Speedup |
| 2 / 20          |    12176873  |       24338     |     4363   | 2790x/5x|
| 1 / 100          |     -             |    3070511     |   27126   |    113x    |
| 2 / 100          |     -             |    5968934     |   53129   |    112x    |
| 1 / 256          |     -             |  61278092     | 430900   |    142x    |
| 2 / 256          |     -             |125687930     | 856941   |    146x   |

Reviewed By: kjchalup

Differential Revision: D39644248

fbshipit-source-id: d679c0c79d67b98b235d12296f383d760a00042a
2022-11-15 19:42:04 -08:00
Jiali Duan
0d8608b9f9 Marching Cubes C++ torch extension
Summary:
Torch C++ extension for Marching Cubes

- Add torch C++ extension for marching cubes. Observe a speed up of ~255x-324x speed up (over varying batch sizes and spatial resolutions)

- Add C++ impl in existing unit-tests.

(Note: this ignores all push blocking failures!)

Reviewed By: kjchalup

Differential Revision: D39590638

fbshipit-source-id: e44d2852a24c2c398e5ea9db20f0dfaa1817e457
2022-10-06 11:13:53 -07:00
Gavin Peng
6471893f59 Multithread CPU naive mesh rasterization
Summary:
Threaded the for loop:
```
for (int yi = 0; yi < H; ++yi) {...}
```
in function `RasterizeMeshesNaiveCpu()`.
Chunk size is approx equal.

Reviewed By: bottler

Differential Revision: D40063604

fbshipit-source-id: 09150269405538119b0f1b029892179501421e68
2022-10-06 06:42:58 -07:00
Jiali Duan
03562d87f5 Benchmark Cameras
Summary: Address comments to add benchmarkings for cameras and the new fisheye cameras. The dependency functions in test_cameras have been updated in Diff 1. The following two snapshots show benchmarking results.

Reviewed By: kjchalup

Differential Revision: D38991914

fbshipit-source-id: 51fe9bb7237543e4ee112c9f5068a4cf12a9d482
2022-08-28 11:43:46 -07:00
Jeremy Reizenstein
34f648ede0 move targets
Summary: Move testing targets from pytorch3d/tests/TARGETS to pytorch3d/TARGETS.

Reviewed By: shapovalov

Differential Revision: D36186940

fbshipit-source-id: a4c52c4d99351f885e2b0bf870532d530324039b
2022-05-25 06:16:03 -07:00
Krzysztof Chalupka
7c25d34d22 SplatterPhongShader Benchmarks
Summary:
Benchmarking. We only use num_faces=2 for splatter, because as far as I can see one would never need to use more. Pose optimization and mesh optimization experiments (see next two diffs) showed that Splatter with 2 faces beats Softmax with 50 and 100 faces in terms of accuracy.

Results: We're slower at 64px^2. At 128px and 256px, we're slower than Softmax+50faces, but faster than Softmax+100faces. We're also slower at 10 faces/pix, but expectation as well as results show that more then 2 faces shouldn't be necessary. See also more results in .https://fburl.com/gdoc/ttv7u7hp

Reviewed By: jcjohnson

Differential Revision: D36210575

fbshipit-source-id: c8de28c8a59ce5fe21a47263bd43d2757b15d123
2022-05-24 22:31:12 -07:00
John Reese
bef959c755 formatting changes from black 22.3.0
Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
2022-05-11 19:55:56 -07:00
Jeremy Reizenstein
c2862ff427 use workaround for points_normals
Summary:
Use existing workaround for batched 3x3 symeig because it is faster than torch.symeig.

Added benchmark showing speedup. True = workaround.
```
Benchmark                Avg Time(μs)      Peak Time(μs) Iterations
--------------------------------------------------------------------------------
normals_True_3000            16237           17233             31
normals_True_6000            33028           33391             16
normals_False_3000        18623069        18623069              1
normals_False_6000        36535475        36535475              1
```

Should help https://github.com/facebookresearch/pytorch3d/issues/988

Reviewed By: nikhilaravi

Differential Revision: D33660585

fbshipit-source-id: d1162b277f5d61ed67e367057a61f25e03888dce
2022-01-24 11:41:55 -08:00
Jeremy Reizenstein
3eb4233844 New raysamplers
Summary: New MultinomialRaysampler succeeds GridRaysampler bringing masking and subsampling. Correspondingly, NDCMultinomialRaysampler succeeds NDCGridRaysampler.

Reviewed By: nikhilaravi, shapovalov

Differential Revision: D33256897

fbshipit-source-id: cd80ec6f35b110d1d20a75c62f4e889ba8fa5d45
2022-01-24 10:52:23 -08:00
Jeremy Reizenstein
741777b5b5 More company name & License
Summary: Manual adjustments for license changes.

Reviewed By: patricklabatut

Differential Revision: D33405657

fbshipit-source-id: 8a21735726f3aece9f9164da9e3b272b27db8032
2022-01-04 11:43:38 -08:00
Jeremy Reizenstein
9eeb456e82 Update license for company name
Summary: Update all FB license strings to the new format.

Reviewed By: patricklabatut

Differential Revision: D33403538

fbshipit-source-id: 97a4596c5c888f3c54f44456dc07e718a387a02c
2022-01-04 11:43:38 -08:00
Jeremy Reizenstein
a0e2d2e3c3 move benchmarks to separate directory
Summary: Move benchmarks to a separate directory as tests/ is getting big.

Reviewed By: nikhilaravi

Differential Revision: D32885462

fbshipit-source-id: a832662a494ee341ab77d95493c95b0af0a83f43
2021-12-07 10:26:50 -08:00