Summary:
Enables building pytorch3d's `_C` extension against a ROCm-built PyTorch and running the test suite on AMD GPUs, including the pulsar subrenderer. Verified on AMD Instinct MI250X (gfx90a, warpSize=64), HIP 7.2, PyTorch 2.13.
## Mechanics
`torch.utils.cpp_extension.BuildExtension` auto-hipifies `.cu` sources of a `CUDAExtension` against a HIP-built torch (`cuda_runtime.h → hip/hip_runtime.h`, `cub:: → hipcub::`, `cudaStream_t → hipStream_t`, etc.), so most of the lift is build-system glue and a small number of CUDA intrinsics that don't have HIP equivalents.
- `setup.py`: detect ROCm via `torch.version.hip is not None`; treat `ROCM_HOME` as the GPU-toolkit-root analogue of `CUDA_HOME` (without this, `CUDA_HOME is None` silently demoted the build to a CPU-only `CppExtension`); skip `CUB_HOME`, CUDA-13 visibility flags, and `-ccbin=` on ROCm.
- `pytorch3d/csrc/pulsar/gpu/commands.h`: CUDA's `_rn`-suffixed FP rounding intrinsics (`__fadd_rn`, `__fdiv_rn`, `__fsqrt_rn`, `__fmaf_rn`, `__frcp_rn`) and `__saturatef` have no HIP equivalents — AMD's GPU ISA has no instruction-level rounding-mode override, so they expand to plain operators / `sqrtf` / `fmaf` / `1.0f/x` / `fmaxf(0,fminf(1,x))` on the `USE_ROCM` arm, which are rounding-mode-equivalent (both round-to-nearest-even). The HIP compiler may fuse `a+b*c` into a single-rounding FMA where CUDA's `_rn` would have prevented it; if FMA-fusion drift ever becomes a numerical issue, add `-ffp-contract=off` to pulsar's HIPCC flags. `__powf` is replaced with `powf`. `atomicAdd_block` has no HIP function-name equivalent — the semantic equivalent is `__hip_atomic_fetch_add(ptr, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_WORKGROUP)` (plain HIP `atomicAdd` is device-scope, strictly stronger than block-scope and forces L2-coherent atomics).
- `tests/test_point_mesh_distance.py`: loosen `grad_faces` tolerance in `test_point_face_distance` from `5e-7` to `5e-6` to match the sibling `test_face_point_distance`. The backward kernel uses `atomicAdd` and calls `alertNotDeterministic`; FP add order varies by wavefront width.
- The X_t / camera-R/T equality checks in `test_points_alignment.py` and `test_cameras_alignment.py` are now skipped when `n_points <= dim` (resp. `batch_size <= 3` for camera-center alignment in 3D). Mean-centering renders the SVD rank-deficient in those cases, so the rotation around the degenerate axis is non-unique and different BLAS implementations (rocBLAS RDNA vs CDNA, cuBLAS) pick different valid null-space directions. The center-alignment check still runs and verifies the well-defined part of the transformation.
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/2039
Test Plan:
All GPU tests pass on both AMD Instinct MI250X (gfx90a, wave64, HIP 7.2) and AMD Radeon Pro W7800 (gfx1100, wave32, HIP 7.2.53211, torch 2.13.0a0).
| Module | Result |
|---|---|
| knn, ball_query, sample_farthest_points, face_areas_normals | all pass |
| rasterize_points, rasterize_meshes, chamfer, packed_to_padded | all pass |
| interpolate_face_attributes, blending, compositing, sample_pdf, mesh_normal_consistency | all pass |
| point_mesh_distance | 9/9 pass (with tolerance fix in this PR) |
| pulsar/test_forward, test_channels, test_depth, test_hands, test_ortho, test_small_spheres | 10 passed (FB_TEST=1) |
| test_render_points pulsar tests, test_camera_conversions::test_pulsar_conversion | 3 passed |
| points_to_volumes, iou_box3d, marching_cubes | 20 failures, all env-only |
The 20 env-only failures are `torch.inverse()` on CPU tensors in test reference paths; this verification host's PyTorch was built with `USE_LAPACK: 0` (only `mkl-static` `.a` archives in the conda env; PyTorch's `FindBLAS` looks for `libmkl_intel_lp64.so`). Unrelated to the port — re-verifying with a LAPACK-linked PyTorch is left to upstream.
Reviewed By: MichaelRamamonjisoa
Differential Revision: D106825690
Pulled By: bottler
fbshipit-source-id: f7a9b6028e6fb555f3b8c0f9792e88b818327166
Summary:
Applies new import merging and sorting from µsort v1.0.
When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.
Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.
For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting
Reviewed By: bottler
Differential Revision: D35553814
fbshipit-source-id: be49bdb6a4c25264ff8d4db3a601f18736d17be1
Summary:
1. changed IsInsideTriangle in geometry_utils to take in min_triangle_area parameter instead of hardcoded value
2. updated point_mesh_cpu.cpp and point_mesh_cuda.[h/cu] to adapt to changes in geometry_utils function signatures
3. updated point_mesh_distance.py and test_point_mesh_distance.py to modify _C. calls
Reviewed By: bottler
Differential Revision: D34459764
fbshipit-source-id: 0549e78713c6d68f03d85fb597a13dd88e09b686
Summary: Small fix by adjusting the area `eps` to account for really small faces when computing point to face distances
Reviewed By: bottler
Differential Revision: D34331336
fbshipit-source-id: 51c4888ea46fefa4e31d5b0bb494a9f9d77813cd
Summary: Update all FB license strings to the new format.
Reviewed By: patricklabatut
Differential Revision: D33403538
fbshipit-source-id: 97a4596c5c888f3c54f44456dc07e718a387a02c
Summary:
Fix small face issue for point_mesh distance computation.
The issue lies in the computation of `IsInsideTriangle` which is unstable and non-symmetrical when faces with small areas are given as input. This diff fixes the issue by returning `False` for `IsInsideTriangle` when small faces are given as input.
Reviewed By: bottler
Differential Revision: D29163052
fbshipit-source-id: be297002f26b5e6eded9394fde00553a37406bee
Summary:
point_mesh functions were missing CPU implementations.
The indices returned are not always matching, possibly due to numerical instability.
Reviewed By: gkioxari
Differential Revision: D21594264
fbshipit-source-id: 3016930e2a9a0f3cd8b3ac4c94a92c9411c0989d
Summary:
Pytorch seems to be becoming stricter about integer tensors of shape `(1,)` on GPU and not allowing them to be used as `int`s. For example the following no longer works on pytorch master,
foo = torch.tensor([3, 5, 3], device="cuda:0")
torch.arange(10) + foo[0]
because this is the sum of tensors on different devices.
Here fix tests which recently broke because of this.
Reviewed By: nikhilaravi
Differential Revision: D21929745
fbshipit-source-id: 25374f70468d1c895372766f1a9dd61df0833957
Summary:
Ran the linter.
TODO: need to update the linter as per D21353065.
Reviewed By: bottler
Differential Revision: D21362270
fbshipit-source-id: ad0e781de0a29f565ad25c43bc94a19b1828c020
Summary:
Updates to:
- enable cuda kernel launches on any GPU (not just the default)
- cuda and contiguous checks for all kernels
- checks to ensure all tensors are on the same device
- error reporting in the cuda kernels
- cuda tests now run on a random device not just the default
Reviewed By: jcjohnson, gkioxari
Differential Revision: D21215280
fbshipit-source-id: 1bedc9fe6c35e9e920bdc4d78ed12865b1005519