The 2017 DAVIS Challenge on Video Object Segmentation
We present the 2017 DAVIS Challenge on Video Object Segmentation, a public
dataset, benchmark, and competition specifically designed for the task of video
object segmentation. Following in the footsteps of other successful initiatives,
such as ILSVRC and PASCAL VOC, which established the avenue of research in the
fields of scene classification and semantic segmentation, the DAVIS Challenge
comprises a dataset, an evaluation methodology, and a public competition with a
dedicated workshop co-located with CVPR 2017. The DAVIS Challenge follows up on
the recent publication of DAVIS (Densely-Annotated VIdeo Segmentation), which
has fostered the development of several novel state-of-the-art video object
segmentation techniques. In this paper we describe the scope of the benchmark,
highlight the main characteristics of the dataset, define the evaluation
metrics of the competition, and present a detailed analysis of the results of
the challenge participants.
Comment: Challenge website: http://davischallenge.or
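The J (region similarity) and F (contour accuracy) measures that the DAVIS evaluation methodology reports throughout these abstracts can be illustrated with a minimal sketch. This is a simplification, not the official protocol: the real F measure matches boundary pixels within a small tolerance band, which is omitted here, and all function names are my own.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def boundary(mask):
    """Crude boundary extraction: object pixels with a differing 4-neighbour."""
    m = mask.astype(bool)
    b = np.zeros_like(m)
    b[:-1, :] |= m[:-1, :] != m[1:, :]
    b[:, :-1] |= m[:, :-1] != m[:, 1:]
    return b & m

def f_measure(pred, gt):
    """Contour accuracy F: F1 score between the two boundary pixel sets."""
    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 or gb.sum() == 0:
        return float(pb.sum() == gb.sum())
    precision = (pb & gb).sum() / pb.sum()
    recall = (pb & gb).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The benchmark's headline number is the mean of J and F over all objects.
pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), bool); gt[2:6, 2:6] = True
jf = 0.5 * (jaccard(pred, gt) + f_measure(pred, gt))
```

With identical masks both terms are 1, so the mean J&F is 1.0; real methods are scored by averaging these two quantities over every annotated object and frame.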
The 2018 DAVIS Challenge on Video Object Segmentation
We present the 2018 DAVIS Challenge on Video Object Segmentation, a public
competition specifically designed for the task of video object segmentation. It
builds upon the DAVIS 2017 dataset, which was presented in the previous edition
of the DAVIS Challenge, and added 100 videos with multiple objects per sequence
to the original DAVIS 2016 dataset. Motivated by the analysis of the results of
the 2017 edition, the main track of the competition will be the same as in
the previous edition (segmentation given the full mask of the objects in the
first frame -- semi-supervised scenario). This edition, however, also adds an
interactive segmentation teaser track, where the participants will interact
with a web service simulating the input of a human that provides scribbles to
iteratively improve the result.
Comment: Challenge website: http://davischallenge.org
Video Object Segmentation using Tracked Object Proposals
We present an approach to semi-supervised video object segmentation, in the
context of the DAVIS 2017 challenge. Our approach combines category-based
object detection, category-independent object appearance segmentation and
temporal object tracking. We are motivated by the fact that the object's
semantic category tends not to change throughout the video, while its appearance
and location can vary considerably. In order to capture the specific object
appearance independent of its category, for each video we train a fully
convolutional network using augmentations of the given annotated frame. We
refine the appearance segmentation mask with the bounding boxes provided either
by a semantic object detection network, when applicable, or by a previous frame
prediction. By introducing a temporal continuity constraint on the detected
boxes, we are able to improve the object segmentation mask of the appearance
network and achieve competitive results on the DAVIS datasets.
Comment: All authors contributed equally, CVPR-2017 workshop, DAVIS-2017 Challenge
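The temporal continuity constraint on the detected boxes can be sketched as a simple IoU gate between the previous-frame box and the current detections. The threshold and the fall-back behaviour are illustrative assumptions, not the authors' exact procedure.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_detections(prev_box, detections, min_iou=0.3):
    """Keep only detections that overlap the previous-frame box;
    fall back to the previous box when none survive the gate."""
    kept = [d for d in detections if box_iou(prev_box, d) >= min_iou]
    return max(kept, key=lambda d: box_iou(prev_box, d)) if kept else prev_box
```

Gating like this suppresses spurious far-away detections of the same semantic category, which is the role the temporal continuity constraint plays in the pipeline described above.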
PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation
We address semi-supervised video object segmentation, the task of
automatically generating accurate and consistent pixel masks for objects in a
video sequence, given the first-frame ground truth annotations. Towards this
goal, we present the PReMVOS algorithm (Proposal-generation, Refinement and
Merging for Video Object Segmentation). Our method separates this problem into
two steps, first generating a set of accurate object segmentation mask
proposals for each video frame and then selecting and merging these proposals
into accurate and temporally consistent pixel-wise object tracks over a video
sequence in a way which is designed to specifically tackle the difficult
challenges involved with segmenting multiple objects across a video sequence.
Our approach surpasses all previous state-of-the-art results on the DAVIS 2017
video object segmentation benchmark with a J & F mean score of 71.6 on the
test-dev dataset, and achieves first place in both the DAVIS 2018 Video Object
Segmentation Challenge and the YouTube-VOS 1st Large-scale Video Object
Segmentation Challenge.
Comment: Accepted for publication in ACCV1
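The second step, selecting and merging per-frame proposals into consistent tracks, can be caricatured as a greedy linker that trades off a proposal's own confidence against its overlap with the track so far. The weighting and function names here are assumptions for illustration; the actual PReMVOS merging objective is richer than this.

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def link_proposals(first_mask, per_frame_proposals, w_temporal=0.5):
    """Greedily extend a track: in each frame pick the proposal whose score
    combines its own confidence with its overlap with the previous mask.
    per_frame_proposals: list (over frames) of lists of (mask, confidence)."""
    track = [first_mask]
    for proposals in per_frame_proposals:
        best = max(proposals,
                   key=lambda p: (1 - w_temporal) * p[1]
                                 + w_temporal * mask_iou(track[-1], p[0]))
        track.append(best[0])
    return track
```

The temporal term is what keeps a high-confidence but spatially inconsistent proposal from hijacking the track, which is the failure mode the merging step is designed to avoid.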
UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking
We address Unsupervised Video Object Segmentation (UVOS), the task of
automatically generating accurate pixel masks for salient objects in a video
sequence and of tracking these objects consistently through time, without any
input about which objects should be tracked. Towards solving this task, we
present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking)
as a simple and generic algorithm which is able to track and segment a large
variety of objects. This algorithm builds up tracks in a number of stages, first
grouping segments into short tracklets that are spatio-temporally consistent,
before merging these tracklets into long-term consistent object tracks based on
their visual similarity. In order to achieve this we introduce a novel
tracklet-based Forest Path Cutting data association algorithm which builds up a
decision forest of track hypotheses before cutting this forest into paths that
form long-term consistent object tracks. When evaluating our approach on the
DAVIS 2017 Unsupervised dataset we obtain state-of-the-art performance with a
mean J&F score of 67.9% on the val, 58% on the test-dev and 56.4% on the
test-challenge benchmarks, obtaining first place in the DAVIS 2019 Unsupervised
Video Object Segmentation Challenge. UnOVOST performs competitively even with
many semi-supervised video object segmentation algorithms, although it is not
given any input as to which objects should be tracked and segmented.
Comment: Accepted for publication at WACV 202
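The first stage, grouping segments into spatio-temporally consistent tracklets, can be sketched as greedy linking by mask IoU across consecutive frames. The threshold and the greedy order are illustrative assumptions; the paper's Forest Path Cutting stage that merges tracklets afterwards is considerably more involved.

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def build_tracklets(frames, tau=0.5):
    """frames: list (over time) of lists of binary masks.
    Link a segment to an existing tracklet when its IoU with that
    tracklet's latest mask exceeds tau; otherwise start a new tracklet.
    Each tracklet absorbs at most one segment per frame."""
    tracklets = []
    for masks in frames:
        used = set()
        for m in masks:
            best, best_iou = None, tau
            for i, t in enumerate(tracklets):
                if i in used:
                    continue
                iou = mask_iou(t[-1], m)
                if iou > best_iou:
                    best, best_iou = i, iou
            if best is None:
                tracklets.append([m])
            else:
                tracklets[best].append(m)
                used.add(best)
    return tracklets
```

Tracklets built this way are short but reliable; the harder problem, which the visual-similarity merging stage addresses, is re-joining them after occlusions break the IoU chain.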
Learning to Segment Instances in Videos with Spatial Propagation Network
We propose a deep learning-based framework for instance-level object
segmentation. Our method mainly consists of three steps. First, we train a
generic model based on ResNet-101 for foreground/background segmentations.
Second, based on this generic model, we fine-tune it to learn instance-level
models and segment individual objects by using augmented object annotations in
first frames of test videos. To distinguish different instances in the same
video, we compute a pixel-level score map for each object from these
instance-level models. Each score map indicates the objectness likelihood and
is only computed within the foreground mask obtained in the first step. To
further refine this per frame score map, we learn a spatial propagation
network. This network aims to learn how to propagate a coarse segmentation mask
spatially based on the pairwise similarities in each frame. In addition, we
apply a filter on the refined score map that aims to recognize the best
connected region using spatial and temporal consistencies in the video.
Finally, we decide the instance-level object segmentation in each video by
comparing score maps of different instances.
Comment: CVPR 2017 Workshop on DAVIS Challenge. Code is available at
http://github.com/JingchunCheng/Seg-with-SP
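The final step, deciding the instance label per pixel by comparing the score maps of different instances within the foreground mask, amounts to a masked argmax. A minimal sketch, with array shapes and names assumed:

```python
import numpy as np

def assign_instances(score_maps, foreground):
    """score_maps: (K, H, W) per-instance objectness likelihoods.
    foreground: (H, W) boolean mask from the generic foreground model.
    Label each foreground pixel with its highest-scoring instance (1..K);
    background pixels keep label 0."""
    labels = np.zeros(foreground.shape, dtype=np.int64)
    labels[foreground] = np.argmax(score_maps, axis=0)[foreground] + 1
    return labels
```

Restricting the argmax to the foreground mask is exactly what the abstract describes: the instance-level models only compete inside regions the generic model already considers object-like.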
Self-supervised Video Object Segmentation
The objective of this paper is self-supervised representation learning, with
the goal of solving semi-supervised video object segmentation (a.k.a. dense
tracking). We make the following contributions: (i) we propose to improve the
existing self-supervised approach, with a simple, yet more effective memory
mechanism for long-term correspondence matching, which resolves the challenge
caused by the disappearance and reappearance of objects; (ii) by augmenting
the self-supervised approach with an online adaptation module, our method
successfully alleviates tracker drifts caused by spatial-temporal
discontinuity, e.g. occlusions, dis-occlusions, or fast motion; (iii) we
explore the efficiency of self-supervised representation learning for dense
tracking; surprisingly, we show that a powerful tracking model can be trained
with as few as 100 raw video clips (equivalent to a duration of 11 minutes),
indicating that low-level statistics have already been effective for tracking
tasks; (iv) we demonstrate state-of-the-art results among the self-supervised
approaches on DAVIS-2017 and YouTube-VOS, as well as surpassing most methods
trained with millions of manual segmentation annotations, further bridging the
gap between self-supervised and supervised learning. Code is released to
foster further research (https://github.com/fangruizhu/self_sup_semiVOS).
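The memory mechanism for long-term correspondence matching can be sketched as soft label copying through a feature-affinity softmax, the standard formulation in self-supervised dense tracking. The shapes, names, and temperature value below are illustrative, not taken from the paper.

```python
import numpy as np

def propagate_labels(query_feat, mem_feat, mem_labels, temperature=0.1):
    """query_feat: (N, D) per-pixel features of the current frame.
    mem_feat: (M, D) features of pixels stored in the memory.
    mem_labels: (M, K) one-hot object labels of those memory pixels.
    Returns (N, K) soft labels copied from visually similar memory pixels."""
    sim = query_feat @ mem_feat.T / temperature   # (N, M) affinities
    sim -= sim.max(axis=1, keepdims=True)         # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return attn @ mem_labels                      # label propagation
```

Keeping features of earlier frames in the memory is what lets a reappearing object recover its label: its pixels still match the stored features even after frames of total occlusion.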
FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation
Many of the recent successful methods for video object segmentation (VOS) are
overly complicated, heavily rely on fine-tuning on the first frame, and/or are
slow, and are hence of limited practical use. In this work, we propose FEELVOS
as a simple and fast method which does not rely on fine-tuning. In order to
segment a video, for each frame FEELVOS uses a semantic pixel-wise embedding
together with a global and a local matching mechanism to transfer information
from the first frame and from the previous frame of the video to the current
frame. In contrast to previous work, our embedding is only used as an internal
guidance of a convolutional network. Our novel dynamic segmentation head allows
us to train the network, including the embedding, end-to-end for the multiple
object segmentation task with a cross entropy loss. We achieve a new state of
the art in video object segmentation without fine-tuning with a J&F measure of
71.5% on the DAVIS 2017 validation set. We make our code and models available
at https://github.com/tensorflow/models/tree/master/research/feelvos.
Comment: CVPR 2019 camera-ready version
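The global matching mechanism can be sketched as computing, for every current-frame pixel, the distance to its nearest first-frame pixel belonging to the object, yielding a distance map that guides the segmentation head. Shapes and names below are assumptions; the brute-force nearest-neighbour search is only for illustration.

```python
import numpy as np

def global_match(cur_emb, ref_emb, ref_mask):
    """cur_emb: (N, D) pixel embeddings of the current frame.
    ref_emb: (M, D) pixel embeddings of the first frame.
    ref_mask: (M,) boolean, True for pixels of the target object.
    Returns (N,) distance of each current pixel to the nearest object pixel."""
    obj = ref_emb[ref_mask]                                  # (M_obj, D)
    d = np.linalg.norm(cur_emb[:, None, :] - obj[None, :, :], axis=2)
    return d.min(axis=1)                                     # distance map
```

FEELVOS feeds such distance maps (computed globally against the first frame and locally against the previous frame) into the segmentation head as internal guidance rather than thresholding them into a hard decision.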
Fast User-Guided Video Object Segmentation by Interaction-and-Propagation Networks
We present a deep learning method for interactive video object
segmentation. Our method is built upon two core operations, interaction and
propagation, and each operation is conducted by Convolutional Neural Networks.
The two networks are connected both internally and externally so that the
networks are trained jointly and interact with each other to solve the complex
video object segmentation problem. We propose a new multi-round training scheme
for interactive video object segmentation so that the networks can learn
how to understand the user's intention and update incorrect estimations during
the training. At test time, our method produces high-quality results and
also runs fast enough to work with users interactively. We evaluated the
proposed method quantitatively on the interactive track benchmark at the DAVIS
Challenge 2018. We outperformed other competing methods by a significant margin
in both speed and accuracy. We also demonstrated that our method works well
with real user interactions.
Comment: CVPR 201
BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames
Semi-supervised video object segmentation has made significant progress on
real and challenging videos in recent years. The current paradigm for
segmentation methods and benchmark datasets is to segment objects in a video
given a single annotation in the first frame. However, we find that
segmentation performance across the entire video varies dramatically when
selecting an alternative frame for annotation. This paper addresses the problem
of learning to suggest the single best frame across the video for user
annotation; this is, in fact, never the first frame of the video. We achieve this by
introducing BubbleNets, a novel deep sorting network that learns to select
frames using a performance-based loss function that enables the conversion of
expansive amounts of training examples from already existing datasets. Using
BubbleNets, we are able to achieve an 11% relative improvement in segmentation
performance on the DAVIS benchmark without any changes to the underlying method
of segmentation.
Comment: CVPR 201
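The deep sorting idea can be sketched as a bubble sort over frames driven by a learned pairwise comparator. Here `better(a, b)` is a hypothetical stand-in for the network's prediction that annotating frame a would yield better segmentation than annotating frame b; only the sorting scaffold is shown.

```python
def bubble_rank(frames, better):
    """Bubble-sort frames by predicted annotation quality.
    better(a, b): True if frame a is predicted to be the better
    annotation frame. After sorting, order[0] is the suggestion."""
    order = list(frames)
    n = len(order)
    for i in range(n):
        for j in range(n - 1 - i):
            if better(order[j + 1], order[j]):
                order[j], order[j + 1] = order[j + 1], order[j]
    return order
```

Using only relative comparisons sidesteps the need to regress an absolute quality score per frame, which is much harder to supervise; any consistent comparator yields a full ranking.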