Summary:
Added support for barycentric clipping in the C++/CUDA rasterization kernels which can be switched on/off via a rasterization setting.
Added tests and a benchmark to compare with the current implementation in PyTorch - for some cases of large image size/faces per pixel the cuda version is 10x faster.
Reviewed By: gkioxari
Differential Revision: D21705503
fbshipit-source-id: e835c0f927f1e5088ca89020aef5ff27ac3a8769