Port pytorch3d (#2039)

Summary: Enables building pytorch3d's `_C` extension against a ROCm-built PyTorch and running the test suite on AMD GPUs, including the pulsar subrenderer. Verified on AMD Instinct MI250X (gfx90a, warpSize=64), HIP 7.2, PyTorch 2.13.

## Mechanics

`torch.utils.cpp_extension.BuildExtension` auto-hipifies `.cu` sources of a `CUDAExtension` against a HIP-built torch (`cuda_runtime.h → hip/hip_runtime.h`, `cub:: → hipcub::`, `cudaStream_t → hipStream_t`, etc.), so most of the lift is build-system glue and a small number of CUDA intrinsics that don't have HIP equivalents.

- `setup.py`: detect ROCm via `torch.version.hip is not None`; treat `ROCM_HOME` as the GPU-toolkit-root analogue of `CUDA_HOME` (without this, `CUDA_HOME is None` silently demoted the build to a CPU-only `CppExtension`); skip `CUB_HOME`, CUDA-13 visibility flags, and `-ccbin=` on ROCm.
- `pytorch3d/csrc/pulsar/gpu/commands.h`: CUDA's `_rn`-suffixed FP rounding intrinsics (`__fadd_rn`, `__fdiv_rn`, `__fsqrt_rn`, `__fmaf_rn`, `__frcp_rn`) and `__saturatef` have no HIP equivalents. AMD's GPU ISA has no instruction-level rounding-mode override, so they expand to plain operators / `sqrtf` / `fmaf` / `1.0f/x` / `fmaxf(0,fminf(1,x))` on the `USE_ROCM` arm, which are rounding-mode-equivalent (both round-to-nearest-even). The HIP compiler may fuse `a+b*c` into a single-rounding FMA where CUDA's `_rn` would have prevented it; if FMA-fusion drift ever becomes a numerical issue, add `-ffp-contract=off` to pulsar's HIPCC flags. `__powf` is replaced with `powf`. `atomicAdd_block` has no HIP function-name equivalent; the semantic equivalent is `__hip_atomic_fetch_add(ptr, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_WORKGROUP)` (plain HIP `atomicAdd` is device-scope, strictly stronger than block-scope and forces L2-coherent atomics).
- `tests/test_point_mesh_distance.py`: loosen `grad_faces` tolerance in `test_point_face_distance` from `5e-7` to `5e-6` to match the sibling `test_face_point_distance`. The backward kernel uses `atomicAdd` and calls `alertNotDeterministic`; FP add order varies by wavefront width.
- The X_t / camera-R/T equality checks in `test_points_alignment.py` and `test_cameras_alignment.py` are now skipped when `n_points <= dim` (resp. `batch_size <= 3` for camera-center alignment in 3D). Mean-centering renders the SVD rank-deficient in those cases, so the rotation around the degenerate axis is non-unique and different BLAS implementations (rocBLAS RDNA vs CDNA, cuBLAS) pick different valid null-space directions. The center-alignment check still runs and verifies the well-defined part of the transformation.

Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/2039

Test Plan: All GPU tests pass on both AMD Instinct MI250X (gfx90a, wave64, HIP 7.2) and AMD Radeon Pro W7800 (gfx1100, wave32, HIP 7.2.53211, torch 2.13.0a0).

| Module | Result |
|---|---|
| knn, ball_query, sample_farthest_points, face_areas_normals | all pass |
| rasterize_points, rasterize_meshes, chamfer, packed_to_padded | all pass |
| interpolate_face_attributes, blending, compositing, sample_pdf, mesh_normal_consistency | all pass |
| point_mesh_distance | 9/9 pass (with tolerance fix in this PR) |
| pulsar/test_forward, test_channels, test_depth, test_hands, test_ortho, test_small_spheres | 10 passed (FB_TEST=1) |
| test_render_points pulsar tests, test_camera_conversions::test_pulsar_conversion | 3 passed |
| points_to_volumes, iou_box3d, marching_cubes | 20 failures, all env-only |

The 20 env-only failures are `torch.inverse()` on CPU tensors in test reference paths; this verification host's PyTorch was built with `USE_LAPACK: 0` (only `mkl-static` `.a` archives in the conda env; PyTorch's `FindBLAS` looks for `libmkl_intel_lp64.so`). Unrelated to the port; re-verifying with a LAPACK-linked PyTorch is left to upstream.
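
For readers unfamiliar with the `setup.py` change described under Mechanics, the toolkit-root selection can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `pick_extension_kind` and its parameters are hypothetical names, the arguments stand in for `torch.version.hip`, `CUDA_HOME`, `ROCM_HOME`, and `torch.cuda.is_available()`, and the `FORCE_CUDA` override behaviour is an assumption modelled on how the CI job uses it.

```python
from typing import Optional


def pick_extension_kind(
    hip_version: Optional[str],
    cuda_home: Optional[str],
    rocm_home: Optional[str],
    gpu_available: bool,
    force_cuda: bool = False,
) -> str:
    """Return "CUDAExtension" or "CppExtension" for a build environment.

    Hypothetical helper: hip_version stands in for torch.version.hip,
    cuda_home / rocm_home for CUDA_HOME / ROCM_HOME.
    """
    is_rocm = hip_version is not None
    # On a ROCm-built torch, CUDA_HOME is usually None, so the toolkit root
    # has to come from ROCM_HOME instead.
    toolkit_home = rocm_home if is_rocm else cuda_home
    # Assumption: FORCE_CUDA=1 selects the GPU extension even when no GPU
    # is visible (as on a CPU-only CI runner).
    if (gpu_available and toolkit_home is not None) or force_cuda:
        return "CUDAExtension"
    return "CppExtension"


# ROCm host with a GPU: GPU extension is selected via ROCM_HOME.
print(pick_extension_kind("7.2", None, "/opt/rocm", True))
# Checking only CUDA_HOME on the same host would have seen None and
# fallen back to the CPU-only extension.
print(pick_extension_kind(None, None, None, True))
```

The point of the sketch is only the `toolkit_home` line: once the root is chosen per backend, the rest of the decision is unchanged between CUDA and ROCm.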
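
The tolerance change for `grad_faces` rests on floating-point addition not being associative. The toy model below is an illustration under stated assumptions, not the kernel's actual behaviour: it emulates float32 addition in pure Python and groups the additions into chunks of a given width, which only loosely stands in for wavefront-dependent `atomicAdd` ordering. `f32` and `chunked_sum_f32` are hypothetical helper names.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def chunked_sum_f32(values, width):
    """Sum in float32: accumulate each width-sized chunk, then add the partials."""
    total = 0.0
    for start in range(0, len(values), width):
        partial = 0.0
        for v in values[start:start + width]:
            partial = f32(partial + v)
        total = f32(total + partial)
    return total


# Same four addends, two different groupings. Near 1e8 adjacent float32
# values are 8 apart, so adding 1.0 to +/-1e8 is rounded away.
vals = [1e8, 1.0, -1e8, 1.0]
print(chunked_sum_f32(vals, 2))  # 0.0
print(chunked_sum_f32(vals, 4))  # 1.0
```

The example is deliberately extreme (the exact sum is 2.0); for well-scaled gradients the grouping-dependent difference would be expected to sit in the last few bits, which is the kind of drift a slightly looser tolerance is meant to absorb.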

Reviewed By: MichaelRamamonjisoa

Differential Revision: D106825690

Pulled By: bottler

fbshipit-source-id: f7a9b6028e6fb555f3b8c0f9792e88b818327166
2026-08-01 13:36:08 +08:00 · 2026-06-01 06:08:12 -07:00
parent c307c64c70
commit b73d735ecf
9 changed files with 171 additions and 57 deletions
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -21,3 +21,45 @@ jobs:
      run: |-
        conda create --name env --yes --quiet conda-build
        conda run --no-capture-output --name env python3 ./packaging/build_conda.py --use-conda-cuda
+
+  # Build-only verification for the ROCm/HIP code paths. Runs in an AMD ROCm
+  # dev container on a CPU-only GitHub runner; we don't need an AMD GPU just
+  # to compile, and not running tests keeps the CI cost low. Catches build
+  # regressions in the ROCm code paths (USE_ROCM guards, hipify-touched sources,
+  # the pulsar HIP intrinsic replacements, etc.).
+  linux_rocm_build:
+    runs-on: ubuntu-latest
+    container:
+      # `-complete` tag bundles the full ROCm math stack (rocThrust, hipCUB,
+      # rocPRIM, ...). The plain `7.2.3` tag is HIP-runtime-only and fails to
+      # find <thrust/complex.h> when including PyTorch headers.
+      image: rocm/dev-ubuntu-22.04:7.2.3-complete
+    env:
+      PYTORCH_VERSION: "2.11.0"
+      ROCM_INDEX: "rocm7.2"
+    steps:
+    - uses: actions/checkout@v4
+    - name: Install Python and torch+rocm
+      run: |-
+        apt-get update
+        apt-get install -y --no-install-recommends python3 python3-dev python3-pip git
+        python3 -m pip install --upgrade pip
+        python3 -m pip install --index-url https://download.pytorch.org/whl/${ROCM_INDEX} torch==${PYTORCH_VERSION}
+    - name: Verify torch is ROCm-built
+      run: |-
+        python3 -c "import torch; assert torch.version.hip is not None, 'torch is not HIP-built'; print('torch.version.hip:', torch.version.hip)"
+    - name: Build pytorch3d _C extension (build only, no tests)
+      env:
+        # CPU-only runner: torch.cuda.is_available() is False, so force the
+        # CUDAExtension path. ROCM_HOME is auto-detected from /opt/rocm in
+        # the rocm/dev-ubuntu container.
+        FORCE_CUDA: "1"
+      run: |-
+        python3 -m pip install --no-build-isolation -v .
+    - name: Smoke import
+      # cd out of the checkout root so the source-tree pytorch3d/ directory
+      # (which has no _C.so since the build doesn't install in-place) doesn't
+      # shadow the site-packages install via sys.path[0] for `python -c`.
+      run: |-
+        cd /tmp
+        python3 -c "import torch; from pytorch3d import _C; print('PulsarRenderer:', hasattr(_C, 'PulsarRenderer')); print('n_symbols:', len([s for s in dir(_C) if not s.startswith('_')]))"