Article thumbnail

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

By Gedas Bertasius and Lorenzo Torresani

Abstract

We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. Our method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip. This allows our system to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip. Clip-level instance tracks generated densely for each frame in the sequence are finally aggregated to produce video-level object instance segmentation and classification. Our experiments demonstrate that our clip-level instance segmentation makes our approach robust to motion blur and object occlusions in video. MaskProp achieves the best reported accuracy on the YouTube-VIS dataset, outperforming the ICCV 2019 video instance segmentation challenge winner despite being much simpler and using orders of magnitude less labeled data (1.3M vs 1B images and 860K vs 14M bounding boxes). The project page is at: https://gberta.github.io/maskprop/.Comment: CVPR 2020 Best Paper Nomine

Topics: Computer Science - Computer Vision and Pattern Recognition
Year: 2020
OAI identifier: oai:arXiv.org:1912.04573
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://arxiv.org/abs/1912.0457... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.