Summary:
Enables building pytorch3d's `_C` extension against a ROCm-built PyTorch and running the test suite on AMD GPUs, including the pulsar subrenderer. Verified on AMD Instinct MI250X (gfx90a, warpSize=64), HIP 7.2, PyTorch 2.13.
## Mechanics
`torch.utils.cpp_extension.BuildExtension` auto-hipifies `.cu` sources of a `CUDAExtension` against a HIP-built torch (`cuda_runtime.h → hip/hip_runtime.h`, `cub:: → hipcub::`, `cudaStream_t → hipStream_t`, etc.), so most of the lift is build-system glue and a small number of CUDA intrinsics that don't have HIP equivalents.
- `setup.py`: detect ROCm via `torch.version.hip is not None`; treat `ROCM_HOME` as the GPU-toolkit-root analogue of `CUDA_HOME` (without this, `CUDA_HOME is None` silently demoted the build to a CPU-only `CppExtension`); skip `CUB_HOME`, CUDA-13 visibility flags, and `-ccbin=` on ROCm.
- `pytorch3d/csrc/pulsar/gpu/commands.h`: CUDA's `_rn`-suffixed FP rounding intrinsics (`__fadd_rn`, `__fdiv_rn`, `__fsqrt_rn`, `__fmaf_rn`, `__frcp_rn`) and `__saturatef` have no HIP equivalents — AMD's GPU ISA has no instruction-level rounding-mode override, so they expand to plain operators / `sqrtf` / `fmaf` / `1.0f/x` / `fmaxf(0,fminf(1,x))` on the `USE_ROCM` arm, which are rounding-mode-equivalent (both round-to-nearest-even). The HIP compiler may fuse `a+b*c` into a single-rounding FMA where CUDA's `_rn` would have prevented it; if FMA-fusion drift ever becomes a numerical issue, add `-ffp-contract=off` to pulsar's HIPCC flags. `__powf` is replaced with `powf`. `atomicAdd_block` has no HIP function-name equivalent — the semantic equivalent is `__hip_atomic_fetch_add(ptr, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_WORKGROUP)` (plain HIP `atomicAdd` is device-scope, strictly stronger than block-scope and forces L2-coherent atomics).
- `tests/test_point_mesh_distance.py`: loosen `grad_faces` tolerance in `test_point_face_distance` from `5e-7` to `5e-6` to match the sibling `test_face_point_distance`. The backward kernel uses `atomicAdd` and calls `alertNotDeterministic`; FP add order varies by wavefront width.
- The X_t / camera-R/T equality checks in `test_points_alignment.py` and `test_cameras_alignment.py` are now skipped when `n_points <= dim` (resp. `batch_size <= 3` for camera-center alignment in 3D). Mean-centering renders the SVD rank-deficient in those cases, so the rotation around the degenerate axis is non-unique and different BLAS implementations (rocBLAS RDNA vs CDNA, cuBLAS) pick different valid null-space directions. The center-alignment check still runs and verifies the well-defined part of the transformation.
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/2039
Test Plan:
All GPU tests pass on both AMD Instinct MI250X (gfx90a, wave64, HIP 7.2) and AMD Radeon Pro W7800 (gfx1100, wave32, HIP 7.2.53211, torch 2.13.0a0).
| Module | Result |
|---|---|
| knn, ball_query, sample_farthest_points, face_areas_normals | all pass |
| rasterize_points, rasterize_meshes, chamfer, packed_to_padded | all pass |
| interpolate_face_attributes, blending, compositing, sample_pdf, mesh_normal_consistency | all pass |
| point_mesh_distance | 9/9 pass (with tolerance fix in this PR) |
| pulsar/test_forward, test_channels, test_depth, test_hands, test_ortho, test_small_spheres | 10 passed (FB_TEST=1) |
| test_render_points pulsar tests, test_camera_conversions::test_pulsar_conversion | 3 passed |
| points_to_volumes, iou_box3d, marching_cubes | 20 failures, all env-only |
The 20 env-only failures are `torch.inverse()` on CPU tensors in test reference paths; this verification host's PyTorch was built with `USE_LAPACK: 0` (only `mkl-static` `.a` archives in the conda env; PyTorch's `FindBLAS` looks for `libmkl_intel_lp64.so`). Unrelated to the port — re-verifying with a LAPACK-linked PyTorch is left to upstream.
Reviewed By: MichaelRamamonjisoa
Differential Revision: D106825690
Pulled By: bottler
fbshipit-source-id: f7a9b6028e6fb555f3b8c0f9792e88b818327166
Summary: Some random seed changes. Skip multigpu tests when there's only one gpu. This is a better fix for what AI is doing in D80600882.
Reviewed By: MichaelRamamonjisoa
Differential Revision: D80625966
fbshipit-source-id: ac3952e7144125fd3a05ad6e4e6e5976ae10a8ef
Summary:
Converts the directory specified to use the Ruff formatter in pyfmt
ruff_dog
If this diff causes merge conflicts when rebasing, please run
`hg status -n -0 --change . -I '**/*.{py,pyi}' | xargs -0 arc pyfmt`
on your diff, and amend any changes before rebasing onto latest.
That should help reduce or eliminate any merge conflicts.
allow-large-files
Reviewed By: bottler
Differential Revision: D66472063
fbshipit-source-id: 35841cb397e4f8e066e2159550d2f56b403b1bef
Summary:
Applies new import merging and sorting from µsort v1.0.
When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.
Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.
For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting
Reviewed By: bottler
Differential Revision: D35553814
fbshipit-source-id: be49bdb6a4c25264ff8d4db3a601f18736d17be1
Summary: Update all FB license strings to the new format.
Reviewed By: patricklabatut
Differential Revision: D33403538
fbshipit-source-id: 97a4596c5c888f3c54f44456dc07e718a387a02c
Summary: Increase some test tolerances so that they pass in more situations, and re-enable two tests.
Reviewed By: nikhilaravi
Differential Revision: D31379717
fbshipit-source-id: 06a25470cc7b6d71cd639d9fd7df500d4b84c079
Summary: Make common functions for finding directories where test data is found, instead of lots of tests using their own `__file__` while trying to get ./tests/data and the tutorials data.
Reviewed By: nikhilaravi
Differential Revision: D27633701
fbshipit-source-id: 1467bb6018cea16eba3cab097d713116d51071e9
Summary:
These two tests fail (with non-small differences) when the seed is changed or if certain environmental changes are made. We disable them pending investigation.
A small change to the tolerance at the failing assertion doesn't help. The change in common_testing helps diagnose this.
Reviewed By: shapovalov
Differential Revision: D26233419
fbshipit-source-id: 357afc1786825256c9bade101fb15707e4dea5ed
Summary:
Ran the linter.
TODO: need to update the linter as per D21353065.
Reviewed By: bottler
Differential Revision: D21362270
fbshipit-source-id: ad0e781de0a29f565ad25c43bc94a19b1828c020
Summary:
1. Introduced weights to Umeyama implementation. This will be needed for weighted ePnP but is useful on its own.
2. Refactored to use the same code for the Pointclouds mask and passed weights.
3. Added test cases with random weights.
4. Fixed a bug in tests that calls the function with 0 points (fails randomly in Pytorch 1.3, will be fixed in the next release: https://github.com/pytorch/pytorch/issues/31421 ).
Reviewed By: gkioxari
Differential Revision: D20070293
fbshipit-source-id: e9f549507ef6dcaa0688a0f17342e6d7a9a4336c