In this PR, we switch to a new implementation of the class `SAM2VideoPredictor` in `sam2/sam2_video_predictor.py`, which allows for independent per-object inference.
Specifically, the new `SAM2VideoPredictor`:
* handles the inference of each object independently, as if we were running a separate session for each object
* relaxes the assumptions about prompting:
  * previously, if a frame received clicks for only a subset of objects, the remaining (non-prompted) objects were assumed to be non-existent in that frame
  * now, if a frame receives clicks for only a subset of objects, we make no assumptions about the remaining (non-prompted) objects
* allows adding new objects after tracking starts (see the usage sketch below)
* (The previous implementation is preserved as `SAM2VideoPredictor` in `sam2/sam2_video_predictor_legacy.py`.)
Also, fix a small typo `APP_URL` => `API_URL` in the doc.
Test plan: tested with the video predictor notebook `notebooks/video_predictor_example.ipynb` and the VOS inference script `tools/vos_inference.py`. Also tested with the demo.