Summary:
Added an `atol=1e-4` tolerance parameter to the `assertClose` calls on lines 682 and 683 in the `test_inverse` method of the `TestTranslate` class.
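A stdlib-only sketch of what an absolute tolerance like `atol=1e-4` buys (illustrative, not the PyTorch3D `assertClose` implementation): it accepts small numeric drift that an exact comparison would reject.

```python
import math

# Hypothetical illustration: values that differ by 4e-5 fail an exact
# comparison but pass a closeness check with an absolute tolerance of 1e-4.
a, b = 1.00000, 1.00004

exact_equal = (a == b)                                           # bitwise compare
close_with_atol = math.isclose(a, b, rel_tol=0.0, abs_tol=1e-4)  # atol-style compare

print(exact_equal, close_with_atol)  # False True
```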
This is a retry of D90225548
Reviewed By: sgrigory
Differential Revision: D90682979
fbshipit-source-id: ac13f000174dd9962326296e1c3116d0d39c7751
Summary:
## LLM-generated Summary:
Replaces `self.assertTrue(torch.allclose(...))` with `self.assertClose(...)` throughout fbcode/vision/fair/pytorch3d/tests/test_transforms.py. This standardizes numeric closeness assertions for clearer failures and consistency while preserving tolerances and test behavior.
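A plain-Python sketch (not the actual PyTorch3D test utility) of why a dedicated closeness assertion gives clearer failures than `assertTrue(allclose(...))`: the latter can only report "False is not true", while the former can name the values and the difference.

```python
# Hypothetical stand-in for an assertClose-style helper on scalars.
def assert_close(actual, expected, atol=1e-6):
    diff = abs(actual - expected)
    if diff > atol:
        raise AssertionError(
            f"Not close: {actual} vs {expected} (diff {diff}, atol {atol})"
        )

try:
    assert_close(1.0, 1.5, atol=1e-6)
except AssertionError as e:
    print(e)  # the message names both values and the difference
```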
---
Session: DEV34970678
Reviewed By: shapovalov
Differential Revision: D90251428
fbshipit-source-id: cdae842be82f0ba548802e6977be272134e8508c
Summary:
CUDA 13.0 introduced breaking changes that cause build failures in pytorch3d:
**1. Symbol Visibility Changes (pulsar)**
- NVCC now forces `__global__` functions to have hidden ELF visibility by default
- `__global__` function template stubs now have internal linkage
**Fix:** Added NVCC flags (`--device-entity-has-hidden-visibility=false` and `-static-global-template-stub=false`) for fbcode builds with CUDA 13.0+.
**2. cuCtxCreate API Change (pycuda)**
- CUDA 13.0 changed `cuCtxCreate` from 3 to 4 arguments
- pycuda 2022.2 (current default) uses the old signature and fails to compile
- pycuda 2025.1.2 (D83501913) includes the CUDA 13.0 fix
**Fix:** Added CUDA 13.0 constraint to pycuda alias to auto-select pycuda 2025.1.2.
**NCCL Compatibility Note:**
- Current stable NCCL (2.25) is NOT compatible with CUDA 13.0 (`cudaTypedefs.h` removed)
- NCCL 2.27+ works with CUDA 13.0 and will become stable in early January 2026 (per HPC Comms team)
- Until then, CUDA 13.0 builds require `-c hpc_comms.use_nccl=2.27`
References:
- GitHub issue: https://github.com/facebookresearch/pytorch3d/issues/2011
- NVIDIA blog: https://developer.nvidia.com/blog/cuda-c-compiler-updates-impacting-elf-visibility-and-linkage/
- FBGEMM_GPU fix: D86474263
- pycuda 2025.1.2 buckification: D83501913
Reviewed By: bottler
Differential Revision: D88816596
fbshipit-source-id: 1ba666dab8c0e06d1286b8d5bc5d84cfc55c86e6
Summary: When using `sample_farthest_points` with `lengths`, it throws an error because of the device mismatch between `lengths` and `torch.rand(lengths.size())` on GPU.
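A torch-free sketch of the bug pattern (`FakeTensor`, `rand_on`, and `combine` are illustrative stand-ins, not PyTorch APIs). The underlying fix is to create the random tensor on the same device as `lengths`, e.g. `torch.rand(lengths.size(), device=lengths.device)`.

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    size: int
    device: str

def rand_on(size, device="cpu"):
    # Stand-in for torch.rand: defaults to CPU unless a device is passed.
    return FakeTensor(size, device)

def combine(a, b):
    # Stand-in for an op that, like real torch ops, rejects mixed devices.
    if a.device != b.device:
        raise RuntimeError(f"device mismatch: {a.device} vs {b.device}")
    return FakeTensor(a.size, a.device)

lengths = FakeTensor(8, "cuda")
# Buggy pattern: random values default to CPU -> RuntimeError on GPU inputs.
# combine(lengths, rand_on(lengths.size))
# Fixed pattern: propagate the device of `lengths`.
result = combine(lengths, rand_on(lengths.size, device=lengths.device))
print(result.device)  # cuda
```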
Reviewed By: bottler
Differential Revision: D82378997
fbshipit-source-id: 8e929256177d543d1dd1249e8488f70e03e4101f
Summary: Some random seed changes. Skip multi-GPU tests when there is only one GPU. This is a better fix for what AI is doing in D80600882.
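A stdlib sketch of the skip pattern; `available_gpus()` is a stand-in here, while real code would check something like `torch.cuda.device_count()`.

```python
import os
import unittest

def available_gpus():
    # Illustrative stand-in: pretend this host has a single GPU.
    return 1

class MultiGpuTest(unittest.TestCase):
    @unittest.skipIf(available_gpus() < 2, "requires at least 2 GPUs")
    def test_cross_device_copy(self):
        self.fail("would only run on a multi-GPU host")

# Run the suite quietly; on this single-GPU host the test is skipped, not failed.
result = unittest.TextTestRunner(stream=open(os.devnull, "w")).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(MultiGpuTest)
)
print(len(result.skipped))  # 1
```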
Reviewed By: MichaelRamamonjisoa
Differential Revision: D80625966
fbshipit-source-id: ac3952e7144125fd3a05ad6e4e6e5976ae10a8ef
Summary:
Optimizing `sample_farthest_points` by reducing CPU/GPU sync:
1. Replacing the per-batch iterative `randint` calls for starting indices with a single function call, if the length is constant
2. Avoiding a sync when fetching the maximum number of sample points, if we sample the same amount
3. Initializing one tensor for both samples and indices
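A torch-free analogue of point 1 (random indices via the stdlib `random` module as a stand-in): with real tensors, a per-batch-element loop of `randint` calls forces repeated CPU/GPU synchronization, whereas a single batched call such as `torch.randint(0, n, (batch_size,))` does not.

```python
import random

n_points, batch_size = 100, 4

# Per-element version: one random call per batch element (the slow pattern).
per_element = [random.randrange(n_points) for _ in range(batch_size)]

# Batched version: one call produces all starting indices at once.
batched = random.choices(range(n_points), k=batch_size)

print(per_element, batched)
```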
Compare
https://fburl.com/mlhub/7wk0xi98
Before
{F1980383703}
After
{F1980383707}
Histograms match pretty closely
{F1980464338}
Reviewed By: bottler
Differential Revision: D78731869
fbshipit-source-id: 060528ae7a1e0fbbd005d129c151eaf9405841de
Summary:
Fixes hard crashes (bus errors) when using the MPS device (Apple Silicon) by adding CPU checks throughout files in the csrc subdirectories to verify that tensors are on a CPU device.
Note that this is the fourth and final part of a larger change spanning multiple files and directories.
Reviewed By: bottler
Differential Revision: D77698176
fbshipit-source-id: 5bc9e3c5cea61afd486aed7396f390d92775ec6d
Summary:
Adds CHECK_CPU macros that check whether a tensor is on the CPU device throughout the csrc directories and subdirectories up to `pulsar`.
Note that this is the third part of a larger change, and to keep diffs better organized, subsequent diffs will update the remaining directories.
Reviewed By: bottler
Differential Revision: D77696998
fbshipit-source-id: 470ca65b23d9965483b5bdd30c712da8e1131787
Summary:
Adds CHECK_CPU macros that check whether a tensor is on the CPU device throughout the csrc directories up to `marching_cubes`. Directories updated include `gather_scatter`, `interp_face_attrs`, `iou_box3d`, `knn`, and `marching_cubes`.
Note that this is the second part of a larger change, and to keep diffs better organized, subsequent diffs will update the remaining directories.
Reviewed By: bottler
Differential Revision: D77558550
fbshipit-source-id: 762a0fe88548dc8d0901b198a11c40d0c36e173f
Summary:
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/1986
Adds device checks to prevent crashes on unsupported devices in PyTorch3D. Updates `pytorch3d_cutils.h` to include a new macro, CHECK_CPU, that checks whether a tensor is on the CPU device. This macro is then used in the directories from `ball_query` to `face_area_normals` to ensure that tensors are not on unsupported devices such as MPS.
Note that this is the first part of a larger change, and to keep diffs better organized, subsequent diffs will update the remaining directories.
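A Python-level analogue of the CHECK_CPU idea (the real change is a C++ macro in `pytorch3d_cutils.h`; the `device` string here is a stand-in for a tensor's device): rejecting unsupported devices up front turns a hard crash into a clear error.

```python
def check_cpu(device, name):
    # Analogue of the CHECK_CPU macro: fail loudly on non-CPU inputs.
    if device != "cpu":
        raise ValueError(f"{name} must be a CPU tensor, got device '{device}'")

check_cpu("cpu", "points")  # passes silently
try:
    check_cpu("mps", "points")
except ValueError as e:
    print(e)  # a clear error instead of a bus error
```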
Reviewed By: bottler
Differential Revision: D77473296
fbshipit-source-id: 13dc84620dee667bddebad1dade2d2cb5a59c737
Summary:
The current implementation of `matrix_to_quaternion` and `_sqrt_positive_part` uses boolean indexing, which can slow down performance and cause incompatibility with `torch.compile` unless `torch._dynamo.config.capture_dynamic_output_shape_ops` is set to `True`.
To enhance performance and compatibility, I recommend using `torch.gather` to select the best-conditioned quaternions and `F.relu` instead of `x > 0` (bottler's suggestion).
For a detailed comparison of the implementation differences when using `torch.compile`, please refer to my Bento notebook N7438339.
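A scalar sketch of the `_sqrt_positive_part` idea (stdlib `math` stands in for tensor ops): return `sqrt(x)` for positive `x` and 0 otherwise. Here `max(x, 0.0)` plays the role of `F.relu`, avoiding the boolean mask `x > 0`, whose data-dependent output shape is what trips up `torch.compile` unless dynamic output shapes are captured.

```python
import math

def sqrt_positive_part(x):
    # relu-style clamp before the sqrt keeps the input in the valid domain.
    return math.sqrt(max(x, 0.0))

print(sqrt_positive_part(4.0), sqrt_positive_part(-1.0))  # 2.0 0.0
```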
Reviewed By: bottler
Differential Revision: D77176230
fbshipit-source-id: 9a6a2e0015b5865056297d5f45badc3c425b93ce
Summary: Resolved self-assignment warnings in the `renderer.forward.device.h` file by removing redundant assignments of the `stream` variable to itself in `cub::DeviceSelect::Flagged` function calls. This change eliminates the compiler warnings and leaves cleaner code.
Reviewed By: bottler
Differential Revision: D76554140
fbshipit-source-id: 28eae0186246f51a8ac8002644f184349aa49560
Summary:
I could not access https://github.com/NVlabs/cub/issues/172 to understand whether `IntWrapper` was still necessary, but the comment is from 5 years ago and the workaround causes problems for the ROCm build.
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/1964
Reviewed By: MichaelRamamonjisoa
Differential Revision: D71937895
Pulled By: bottler
fbshipit-source-id: 5e0351e1bd8599b670436cd3464796eca33156f6
Summary:
CUDA kernel variables matching the type `(thread|block|grid).(Idx|Dim).(x|y|z)` [have the data type `uint`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#built-in-variables).
Many programmers mistakenly rely on implicit casts to turn these data types into `int`. In fact, the [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/) itself is inconsistent and incorrect in its use of data types in programming examples.
The result of these implicit casts is that our kernels may give unexpected results when exposed to large datasets, i.e., those exceeding roughly 2 billion items.
While we now have linters in place to prevent simple mistakes (D71236150), our codebase has many problematic instances. This diff fixes some of them.
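A Python simulation of the hazard (the grid dimensions below are made up to reach ~3 billion items): `blockIdx.x * blockDim.x + threadIdx.x` is computed on unsigned built-ins, but storing the result in a 32-bit signed `int` wraps past 2^31 - 1 into a negative index.

```python
def to_int32(x):
    # Reinterpret the low 32 bits as a signed 32-bit integer.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# Hypothetical grid addressing the ~3-billionth element.
block_idx, block_dim, thread_idx = 11_718_750, 256, 0

index64 = block_idx * block_dim + thread_idx  # correct 64-bit arithmetic
index32 = to_int32(index64)                   # what `int index = ...` yields

print(index64, index32)  # 3000000000 -1294967296
```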
Reviewed By: dtolnay
Differential Revision: D71355356
fbshipit-source-id: cea44891416d9efd2f466d6c45df4e36008fa036
Summary:
A continuation of https://github.com/facebookresearch/pytorch3d/issues/1948 -- this commit fixes a small numerical issue with `matrix_to_axis_angle(..., fast=True)` near `pi`.
bottler feel free to check this out, it's a single-line change.
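A generic stdlib illustration (not the exact one-line fix in this PR) of why angle extraction is fragile near `pi`: rounding can push an intermediate cosine-like value just outside `[-1, 1]`, where `math.acos` raises a domain error, while clamping first keeps the result finite and equal to `pi` at the boundary.

```python
import math

def angle_unclamped(c):
    return math.acos(c)

def angle_clamped(c):
    # Clamp into the valid acos domain before evaluating.
    return math.acos(min(1.0, max(-1.0, c)))

c = -1.0 - 1e-15  # slightly outside [-1, 1] due to rounding
try:
    angle_unclamped(c)
except ValueError as e:
    print("unclamped:", e)
print("clamped:", angle_clamped(c))  # pi
```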
Pull Request resolved: https://github.com/facebookresearch/pytorch3d/pull/1953
Reviewed By: MichaelRamamonjisoa
Differential Revision: D70088251
Pulled By: bottler
fbshipit-source-id: 54cc7f946283db700cec2cd5575cf918456b7f32