switch to a new implementation of the class SAM2VideoPredictor for per-object inference in sam2/sam2_video_predictor.py

In this PR, we switch to a new implementation of the class `SAM2VideoPredictor` in sam2/sam2_video_predictor.py, which allows for independent per-object inference.

Specifically, the new `SAM2VideoPredictor`:
* it handles the inference of each object independently, as if a separate session were opened for each object
* it relaxes the assumptions on prompting
  * previously, if a frame received clicks for only a subset of objects, the remaining (non-prompted) objects were assumed to be absent in that frame
  * now, if a frame receives clicks for only a subset of objects, no assumption is made about the remaining (non-prompted) objects
* it allows adding new objects after tracking has started (see the usage sketch after this list)
* (The previous implementation is backed up to `SAM2VideoPredictor` in sam2/sam2_video_predictor_legacy.py)
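
For illustration, a minimal usage sketch of the per-object workflow (the config/checkpoint paths, video directory, frame indices, and click coordinates below are placeholder assumptions; the API calls follow `notebooks/video_predictor_example.ipynb` and assume a CUDA device):

```python
import numpy as np
import torch

from sam2.build_sam import build_sam2_video_predictor

# Placeholder config/checkpoint paths -- substitute your own.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_b+.yaml",
    "./checkpoints/sam2.1_hiera_base_plus.pt",
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # "videos/bedroom" is a placeholder directory of JPEG frames.
    state = predictor.init_state(video_path="videos/bedroom")

    # Prompt object 1 on frame 0; no assumption is made about any
    # other (non-prompted) objects in this frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = positive click
    )

    # Track object 1 through the video.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        pass  # consume per-frame masks here

    # Add object 2 after tracking has started, then propagate again;
    # each object is handled in its own independent inference session.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=30,  # placeholder frame index
        obj_id=2,
        points=np.array([[120, 80]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        pass
```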

Also, fix a small typo `APP_URL` => `API_URL` in the doc.

Test plan: tested with the predictor notebook `notebooks/video_predictor_example.ipynb` and VOS script `tools/vos_inference.py`. Also tested with the demo.
Ronghang Hu
2024-12-05 07:49:43 +00:00
parent 3297dd0eb0
commit c61e2475e6
3 changed files with 1299 additions and 328 deletions

@@ -105,7 +105,7 @@ cd demo/backend/server/
 ```bash
 PYTORCH_ENABLE_MPS_FALLBACK=1 \
 APP_ROOT="$(pwd)/../../../" \
-APP_URL=http://localhost:7263 \
+API_URL=http://localhost:7263 \
 MODEL_SIZE=base_plus \
 DATA_PATH="$(pwd)/../../data" \
 DEFAULT_VIDEO_PATH=gallery/05_default_juggle.mp4 \