Nikhila Ravi bd04ffaf77 Farthest point sampling CUDA
Summary:
CUDA implementation of farthest point sampling algorithm.

## Visual comparison

Compared to random sampling, farthest point sampling gives better coverage of the shape.

{F658631262}

## Reduction

Parallelized block reduction to find the max value at each iteration happens as follows:

1. First split the points into two equal sized parts (e.g. for a list with 8 values):
`[20, 27, 6, 8 | 11, 10, 2, 33]`
2. Use half of the thread (4 threads) to compare pairs of elements from each half (e.g elements [0, 4], [1, 5] etc) and store the result in the first half of the list:
`[20, 27, 6, 33 | 11, 10, 2, 33]`
Now we no longer care about the second part but again divide the first part into two
`[20, 27 | 6, 33| -, -, -, -]`
Now we can use 2 threads to compare the 4 elements
4. Finally we have gotten down to a single pair
`[20 | 33 | -, - | -, -, -, -]`
Use 1 thread to compare the remaining two elements
5. The max will now be at thread id = 0
`[33 | - | -, - | -, -, -, -]`
The reduction will give the farthest point for the selected batch index at this iteration.

Reviewed By: bottler, jcjohnson

Differential Revision: D30401803

fbshipit-source-id: 525bd5ae27c4b13b501812cfe62306bb003827d2
2021-09-15 13:49:22 -07:00
..
2021-08-03 08:10:52 -07:00
2021-09-15 13:49:22 -07:00
2021-09-01 05:33:25 -07:00
2021-09-15 13:49:22 -07:00
2021-09-13 07:08:58 -07:00
2021-06-25 19:56:25 -07:00
2021-07-30 03:06:15 -07:00
2021-09-13 07:08:58 -07:00
2021-08-03 08:10:52 -07:00