pytorch3d

mirror of https://github.com/facebookresearch/pytorch3d.git synced 2026-06-17 20:48:55 +08:00

Author	SHA1	Message	Date
Jeff Daily	b73d735ecf	Port pytorch3d (#2039) Summary: Enables building pytorch3d's `_C` extension against a ROCm-built PyTorch and running the test suite on AMD GPUs, including the pulsar subrenderer. Verified on AMD Instinct MI250X (gfx90a, warpSize=64), HIP 7.2, PyTorch 2.13. ## Mechanics `torch.utils.cpp_extension.BuildExtension` auto-hipifies `.cu` sources of a `CUDAExtension` against a HIP-built torch (`cuda_runtime.h → hip/hip_runtime.h`, `cub:: → hipcub::`, `cudaStream_t → hipStream_t`, etc.), so most of the lift is build-system glue and a small number of CUDA intrinsics that don't have HIP equivalents. - `setup.py`: detect ROCm via `torch.version.hip is not None`; treat `ROCM_HOME` as the GPU-toolkit-root analogue of `CUDA_HOME` (without this, `CUDA_HOME is None` silently demoted the build to a CPU-only `CppExtension`); skip `CUB_HOME`, CUDA-13 visibility flags, and `-ccbin=` on ROCm. - `pytorch3d/csrc/pulsar/gpu/commands.h`: CUDA's `_rn`-suffixed FP rounding intrinsics (`__fadd_rn`, `__fdiv_rn`, `__fsqrt_rn`, `__fmaf_rn`, `__frcp_rn`) and `__saturatef` have no HIP equivalents — AMD's GPU ISA has no instruction-level rounding-mode override, so they expand to plain operators / `sqrtf` / `fmaf` / `1.0f/x` / `fmaxf(0,fminf(1,x))` on the `USE_ROCM` arm, which are rounding-mode-equivalent (both round-to-nearest-even). The HIP compiler may fuse `a+b*c` into a single-rounding FMA where CUDA's `_rn` would have prevented it; if FMA-fusion drift ever becomes a numerical issue, add `-ffp-contract=off` to pulsar's HIPCC flags. `__powf` is replaced with `powf`. `atomicAdd_block` has no HIP function-name equivalent — the semantic equivalent is `__hip_atomic_fetch_add(ptr, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_WORKGROUP)` (plain HIP `atomicAdd` is device-scope, strictly stronger than block-scope and forces L2-coherent atomics). 
- `tests/test_point_mesh_distance.py`: loosen `grad_faces` tolerance in `test_point_face_distance` from `5e-7` to `5e-6` to match the sibling `test_face_point_distance`. The backward kernel uses `atomicAdd` and calls `alertNotDeterministic`; FP add order varies by wavefront width. - The X_t / camera-R/T equality checks in `test_points_alignment.py` and `test_cameras_alignment.py` are now skipped when `n_points <= dim` (resp. `batch_size <= 3` for camera-center alignment in 3D). Mean-centering renders the SVD rank-deficient in those cases, so the rotation around the degenerate axis is non-unique and different BLAS implementations (rocBLAS RDNA vs CDNA, cuBLAS) pick different valid null-space directions. The center-alignment check still runs and verifies the well-defined part of the transformation. Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/2039 Test Plan: All GPU tests pass on both AMD Instinct MI250X (gfx90a, wave64, HIP 7.2) and AMD Radeon Pro W7800 (gfx1100, wave32, HIP 7.2.53211, torch 2.13.0a0). \| Module \| Result \| \|---\|---\| \| knn, ball_query, sample_farthest_points, face_areas_normals \| all pass \| \| rasterize_points, rasterize_meshes, chamfer, packed_to_padded \| all pass \| \| interpolate_face_attributes, blending, compositing, sample_pdf, mesh_normal_consistency \| all pass \| \| point_mesh_distance \| 9/9 pass (with tolerance fix in this PR) \| \| pulsar/test_forward, test_channels, test_depth, test_hands, test_ortho, test_small_spheres \| 10 passed (FB_TEST=1) \| \| test_render_points pulsar tests, test_camera_conversions::test_pulsar_conversion \| 3 passed \| \| points_to_volumes, iou_box3d, marching_cubes \| 20 failures, all env-only \| The 20 env-only failures are `torch.inverse()` on CPU tensors in test reference paths; this verification host's PyTorch was built with `USE_LAPACK: 0` (only `mkl-static` `.a` archives in the conda env; PyTorch's `FindBLAS` looks for `libmkl_intel_lp64.so`). 
Unrelated to the port — re-verifying with a LAPACK-linked PyTorch is left to upstream. Reviewed By: MichaelRamamonjisoa Differential Revision: D106825690 Pulled By: bottler fbshipit-source-id: f7a9b6028e6fb555f3b8c0f9792e88b818327166	2026-06-01 06:08:12 -07:00
Jeremy Reizenstein	dd068703d1	test fixes Summary: Some random seed changes. Skip multigpu tests when there's only one gpu. This is a better fix for what AI is doing in D80600882. Reviewed By: MichaelRamamonjisoa Differential Revision: D80625966 fbshipit-source-id: ac3952e7144125fd3a05ad6e4e6e5976ae10a8ef	2025-08-27 06:55:50 -07:00
Thomas Polasek	055ab3a2e3	Convert directory fbcode/vision to use the Ruff Formatter Summary: Converts the directory specified to use the Ruff formatter in pyfmt ruff_dog If this diff causes merge conflicts when rebasing, please run `hg status -n -0 --change . -I '**/*.{py,pyi}' | xargs -0 arc pyfmt` on your diff, and amend any changes before rebasing onto latest. That should help reduce or eliminate any merge conflicts. allow-large-files Reviewed By: bottler Differential Revision: D66472063 fbshipit-source-id: 35841cb397e4f8e066e2159550d2f56b403b1bef	2024-11-26 02:38:20 -08:00
Jeremy Reizenstein	34f648ede0	move targets Summary: Move testing targets from pytorch3d/tests/TARGETS to pytorch3d/TARGETS. Reviewed By: shapovalov Differential Revision: D36186940 fbshipit-source-id: a4c52c4d99351f885e2b0bf870532d530324039b	2022-05-25 06:16:03 -07:00
Tim Hatch	34bbb3ad32	apply import merging for fbcode/vision/fair (2 of 2) Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: bottler Differential Revision: D35553814 fbshipit-source-id: be49bdb6a4c25264ff8d4db3a601f18736d17be1	2022-04-13 06:51:33 -07:00
Jeremy Reizenstein	9eeb456e82	Update license for company name Summary: Update all FB license strings to the new format. Reviewed By: patricklabatut Differential Revision: D33403538 fbshipit-source-id: 97a4596c5c888f3c54f44456dc07e718a387a02c	2022-01-04 11:43:38 -08:00
Jeremy Reizenstein	b26f4bc33a	test tolerance loosenings Summary: Increase some test tolerances so that they pass in more situations, and re-enable two tests. Reviewed By: nikhilaravi Differential Revision: D31379717 fbshipit-source-id: 06a25470cc7b6d71cd639d9fd7df500d4b84c079	2021-10-07 10:48:12 -07:00
Patrick Labatut	af93f34834	License lint codebase Summary: License lint codebase Reviewed By: theschnitz Differential Revision: D29001799 fbshipit-source-id: 5c59869911785b0181b1663bbf430bc8b7fb2909	2021-06-22 03:45:27 -07:00
Jeremy Reizenstein	124bb5e391	spelling Summary: Collection of spelling things, mostly in docs / tutorials. Reviewed By: gkioxari Differential Revision: D26101323 fbshipit-source-id: 652f62bc9d71a4ff872efa21141225e43191353a	2021-04-09 09:58:54 -07:00
Rong Rong (AI Infra)	1216b5765a	Extract finding directories for test data Summary: Make common functions for finding directories where test data is found, instead of lots of tests using their own `__file__` while trying to get ./tests/data and the tutorials data. Reviewed By: nikhilaravi Differential Revision: D27633701 fbshipit-source-id: 1467bb6018cea16eba3cab097d713116d51071e9	2021-04-08 20:03:04 -07:00
Jeremy Reizenstein	cd9786e787	Disable random-dependent tests Summary: These two tests fail (with non-small differences) when the seed is changed or if certain environmental changes are made. We disable them pending investigation. A small change to the tolerance at the failing assertion doesn't help. The change in common_testing helps diagnose this. Reviewed By: shapovalov Differential Revision: D26233419 fbshipit-source-id: 357afc1786825256c9bade101fb15707e4dea5ed	2021-02-03 18:24:27 -08:00
Georgia Gkioxari	5fb63b4520	move icp_data.pth to tests/data Summary: Move icp_data.pth to tests/data Reviewed By: bottler Differential Revision: D25012575 fbshipit-source-id: 9252d2eeca9141c82ad3bf9d3e3331a2eab5203b	2020-11-18 14:07:12 -08:00
Nikhila Ravi	0eca74fa5f	lint fixes Summary: Ran the linter. TODO: need to update the linter as per D21353065. Reviewed By: bottler Differential Revision: D21362270 fbshipit-source-id: ad0e781de0a29f565ad25c43bc94a19b1828c020	2020-05-04 09:56:44 -07:00
Jeremy Reizenstein	6207c359b1	spelling and flake Summary: mostly recent lintish things Reviewed By: nikhilaravi Differential Revision: D21089003 fbshipit-source-id: 028733c1d875268f1879e4481da475b7100ba0b6	2020-04-17 10:50:22 -07:00
David Novotny	8abbe22ffb	ICP - point-to-point version Summary: The iterative closest point algorithm - point-to-point version. Output of `bm_iterative_closest_point`: Argument key: `batch_size dim n_points_X n_points_Y use_pointclouds` ``` Benchmark Avg Time(μs) Peak Time(μs) Iterations -------------------------------------------------------------------------------- IterativeClosestPoint_1_3_100_100_False 107569 111323 5 IterativeClosestPoint_1_3_100_1000_False 118972 122306 5 IterativeClosestPoint_1_3_1000_100_False 108576 110978 5 IterativeClosestPoint_1_3_1000_1000_False 331836 333515 2 IterativeClosestPoint_1_20_100_100_False 134387 137842 4 IterativeClosestPoint_1_20_100_1000_False 149218 153405 4 IterativeClosestPoint_1_20_1000_100_False 414248 416595 2 IterativeClosestPoint_1_20_1000_1000_False 374318 374662 2 IterativeClosestPoint_10_3_100_100_False 539852 539852 1 IterativeClosestPoint_10_3_100_1000_False 752784 752784 1 IterativeClosestPoint_10_3_1000_100_False 1070700 1070700 1 IterativeClosestPoint_10_3_1000_1000_False 1164020 1164020 1 IterativeClosestPoint_10_20_100_100_False 374548 377337 2 IterativeClosestPoint_10_20_100_1000_False 472764 476685 2 IterativeClosestPoint_10_20_1000_100_False 1457175 1457175 1 IterativeClosestPoint_10_20_1000_1000_False 2195820 2195820 1 IterativeClosestPoint_1_3_100_100_True 110084 115824 5 IterativeClosestPoint_1_3_100_1000_True 142728 147696 4 IterativeClosestPoint_1_3_1000_100_True 212966 213966 3 IterativeClosestPoint_1_3_1000_1000_True 369130 375114 2 IterativeClosestPoint_10_3_100_100_True 354615 355179 2 IterativeClosestPoint_10_3_100_1000_True 451815 452704 2 IterativeClosestPoint_10_3_1000_100_True 511833 511833 1 IterativeClosestPoint_10_3_1000_1000_True 798453 798453 1 -------------------------------------------------------------------------------- ``` Reviewed By: shapovalov, gkioxari Differential Revision: D19909952 fbshipit-source-id: f77fadc88fb7c53999909d594114b182ee2a3def	2020-04-16 14:02:16 -07:00
Jeremy Reizenstein	b87058c62a	fix recent lint Summary: lint clean again Reviewed By: patricklabatut Differential Revision: D20868775 fbshipit-source-id: ade4301c1012c5c6943186432465215701d635a9	2020-04-06 06:41:00 -07:00
Roman Shapovalov	e37085d999	Weighted Umeyama. Summary: 1. Introduced weights to Umeyama implementation. This will be needed for weighted ePnP but is useful on its own. 2. Refactored to use the same code for the Pointclouds mask and passed weights. 3. Added test cases with random weights. 4. Fixed a bug in tests that calls the function with 0 points (fails randomly in PyTorch 1.3, will be fixed in the next release: https://github.com/pytorch/pytorch/issues/31421). Reviewed By: gkioxari Differential Revision: D20070293 fbshipit-source-id: e9f549507ef6dcaa0688a0f17342e6d7a9a4336c	2020-04-03 02:59:11 -07:00
David Novotny	e5b1d6d3a3	Umeyama Summary: Umeyama estimates a rigid motion between two sets of corresponding points. Benchmark output for `bm_points_alignment` ``` Arguments key: [<allow_reflection>_<batch_size>_<dim>_<estimate_scale>_<n_points>_<use_pointclouds>] Benchmark Avg Time(μs) Peak Time(μs) Iterations -------------------------------------------------------------------------------- CorrespodingPointsAlignment_True_1_3_True_100_False 7382 9833 68 CorrespodingPointsAlignment_True_1_3_True_10000_False 8183 10500 62 CorrespodingPointsAlignment_True_1_3_False_100_False 7301 9263 69 CorrespodingPointsAlignment_True_1_3_False_10000_False 7945 9746 64 CorrespodingPointsAlignment_True_1_20_True_100_False 13706 41623 37 CorrespodingPointsAlignment_True_1_20_True_10000_False 11044 33766 46 CorrespodingPointsAlignment_True_1_20_False_100_False 9908 28791 51 CorrespodingPointsAlignment_True_1_20_False_10000_False 9523 18680 53 CorrespodingPointsAlignment_True_10_3_True_100_False 29585 32026 17 CorrespodingPointsAlignment_True_10_3_True_10000_False 29626 36324 18 CorrespodingPointsAlignment_True_10_3_False_100_False 26013 29253 20 CorrespodingPointsAlignment_True_10_3_False_10000_False 25000 33820 20 CorrespodingPointsAlignment_True_10_20_True_100_False 40955 41592 13 CorrespodingPointsAlignment_True_10_20_True_10000_False 42087 42393 12 CorrespodingPointsAlignment_True_10_20_False_100_False 39863 40381 13 CorrespodingPointsAlignment_True_10_20_False_10000_False 40813 41699 13 CorrespodingPointsAlignment_True_100_3_True_100_False 183146 194745 3 CorrespodingPointsAlignment_True_100_3_True_10000_False 213789 231466 3 CorrespodingPointsAlignment_True_100_3_False_100_False 177805 180796 3 CorrespodingPointsAlignment_True_100_3_False_10000_False 184963 185695 3 CorrespodingPointsAlignment_True_100_20_True_100_False 347181 347325 2 CorrespodingPointsAlignment_True_100_20_True_10000_False 363259 363613 2 CorrespodingPointsAlignment_True_100_20_False_100_False 351769 352496 2 
CorrespodingPointsAlignment_True_100_20_False_10000_False 375629 379818 2 CorrespodingPointsAlignment_False_1_3_True_100_False 11155 13770 45 CorrespodingPointsAlignment_False_1_3_True_10000_False 10743 13938 47 CorrespodingPointsAlignment_False_1_3_False_100_False 9578 11511 53 CorrespodingPointsAlignment_False_1_3_False_10000_False 9549 11984 53 CorrespodingPointsAlignment_False_1_20_True_100_False 13809 14183 37 CorrespodingPointsAlignment_False_1_20_True_10000_False 14084 15082 36 CorrespodingPointsAlignment_False_1_20_False_100_False 12765 14177 40 CorrespodingPointsAlignment_False_1_20_False_10000_False 12811 13096 40 CorrespodingPointsAlignment_False_10_3_True_100_False 28823 39384 18 CorrespodingPointsAlignment_False_10_3_True_10000_False 27135 27525 19 CorrespodingPointsAlignment_False_10_3_False_100_False 26236 28980 20 CorrespodingPointsAlignment_False_10_3_False_10000_False 42324 45123 12 CorrespodingPointsAlignment_False_10_20_True_100_False 723902 723902 1 CorrespodingPointsAlignment_False_10_20_True_10000_False 220007 252886 3 CorrespodingPointsAlignment_False_10_20_False_100_False 55593 71636 9 CorrespodingPointsAlignment_False_10_20_False_10000_False 44419 71861 12 CorrespodingPointsAlignment_False_100_3_True_100_False 184768 185199 3 CorrespodingPointsAlignment_False_100_3_True_10000_False 198657 213868 3 CorrespodingPointsAlignment_False_100_3_False_100_False 224598 309645 3 CorrespodingPointsAlignment_False_100_3_False_10000_False 197863 202002 3 CorrespodingPointsAlignment_False_100_20_True_100_False 293484 309459 2 CorrespodingPointsAlignment_False_100_20_True_10000_False 327253 366644 2 CorrespodingPointsAlignment_False_100_20_False_100_False 420793 422194 2 CorrespodingPointsAlignment_False_100_20_False_10000_False 462634 485542 2 CorrespodingPointsAlignment_True_1_3_True_100_True 7664 9909 66 CorrespodingPointsAlignment_True_1_3_True_10000_True 7190 8366 70 CorrespodingPointsAlignment_True_1_3_False_100_True 6549 8316 77 
CorrespodingPointsAlignment_True_1_3_False_10000_True 6534 7710 77 CorrespodingPointsAlignment_True_10_3_True_100_True 29052 32940 18 CorrespodingPointsAlignment_True_10_3_True_10000_True 30526 33453 17 CorrespodingPointsAlignment_True_10_3_False_100_True 28708 32993 18 CorrespodingPointsAlignment_True_10_3_False_10000_True 30630 35973 17 CorrespodingPointsAlignment_True_100_3_True_100_True 264909 320820 3 CorrespodingPointsAlignment_True_100_3_True_10000_True 310902 322604 2 CorrespodingPointsAlignment_True_100_3_False_100_True 246832 250634 3 CorrespodingPointsAlignment_True_100_3_False_10000_True 276006 289061 2 CorrespodingPointsAlignment_False_1_3_True_100_True 11421 13757 44 CorrespodingPointsAlignment_False_1_3_True_10000_True 11199 12532 45 CorrespodingPointsAlignment_False_1_3_False_100_True 11474 15841 44 CorrespodingPointsAlignment_False_1_3_False_10000_True 10384 13188 49 CorrespodingPointsAlignment_False_10_3_True_100_True 36599 47340 14 CorrespodingPointsAlignment_False_10_3_True_10000_True 40702 50754 13 CorrespodingPointsAlignment_False_10_3_False_100_True 41277 52149 13 CorrespodingPointsAlignment_False_10_3_False_10000_True 34286 37091 15 CorrespodingPointsAlignment_False_100_3_True_100_True 254991 258578 2 CorrespodingPointsAlignment_False_100_3_True_10000_True 257999 261285 2 CorrespodingPointsAlignment_False_100_3_False_100_True 247511 248693 3 CorrespodingPointsAlignment_False_100_3_False_10000_True 251807 263865 3 ``` Reviewed By: gkioxari Differential Revision: D19808389 fbshipit-source-id: 83305a58627d2fc5dcaf3c3015132d8148f28c29	2020-04-02 14:46:51 -07:00

18 Commits