F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation
Although deep-learning-based methods have made great progress in
unsupervised video object segmentation, difficult scenarios (e.g., visual
similarity, occlusions, and appearance changes) are still not handled well. To
alleviate these issues, we propose a novel Focus on Foreground Network (F2Net),
which delves into the intra-frame and inter-frame details of the foreground
objects and thus effectively improves segmentation performance. Specifically,
the network consists of three main parts: a Siamese Encoder Module, a Center
Guiding Appearance Diffusion Module, and a Dynamic Information Fusion Module.
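As a rough illustration of the Siamese idea, the sketch below applies one shared set of weights to both frames, so the reference and current features land in the same space and can be compared directly. The abstract does not describe the backbone, so the single linear layer with ReLU, the flattened per-pixel layout, and the names `encode`, `siamese_encode`, `W`, `ref`, and `cur` are all hypothetical.

```python
"""Toy Siamese encoder: one set of weights applied to both frames.

Assumption (not from the paper): each frame is a flattened list of per-pixel
feature vectors, and the "encoder" is a single shared linear layer + ReLU.
"""
from typing import List

Vector = List[float]
Frame = List[Vector]  # flattened H*W pixels, each a C_in-dim vector


def encode(frame: Frame, weights: List[Vector]) -> Frame:
    """Project every pixel with the same weight matrix, then apply ReLU."""
    out = []
    for pixel in frame:
        projected = [sum(w * x for w, x in zip(row, pixel)) for row in weights]
        out.append([max(0.0, v) for v in projected])
    return out


def siamese_encode(reference: Frame, current: Frame, weights: List[Vector]):
    """Both branches share `weights`, so their features are directly comparable."""
    return encode(reference, weights), encode(current, weights)


if __name__ == "__main__":
    W = [[1.0, -1.0], [0.5, 0.5]]   # hypothetical 2x2 shared weights
    ref = [[1.0, 0.0], [0.0, 1.0]]  # two pixels, two input channels
    cur = [[1.0, 0.0], [2.0, 2.0]]
    f_ref, f_cur = siamese_encode(ref, cur, W)
    print(f_ref)  # [[1.0, 0.5], [0.0, 0.5]]
    print(f_cur)  # [[1.0, 0.5], [0.0, 2.0]]
```

Because the weights are shared, identical pixels in the two frames always map to identical features, which is what makes the later dense correspondence between frames meaningful.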
First, we use a Siamese encoder to extract the feature representations of a
pair of frames (a reference frame and the current frame). Then, a Center
Guiding Appearance Diffusion Module is designed to capture three features: the
inter-frame feature (dense correspondences between the reference frame and the
current frame), the intra-frame feature (dense correspondences within the
current frame), and the original semantic feature of the current frame. Within
this module, a Center Prediction Branch predicts the center location of the
foreground object in the current frame, and the predicted center point serves
as a spatial guidance prior that enhances inter-frame and intra-frame feature
extraction, so that the feature representation focuses considerably on the
foreground objects. Finally, we propose a Dynamic Information Fusion Module
that automatically selects the relatively important features among these
three feature levels.
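The center-guided correspondence and fusion steps can be sketched in plain Python. This is not the authors' implementation: the Gaussian form of the center prior, taking the center as the peak of a score map, reusing the encoder features as queries, keys, and values, scaling each attended feature by (1 + prior) at its query position, and the per-pixel softmax gate driven by a single gate vector are all assumptions made for illustration, as are the function names `heatmap_argmax`, `gaussian_center_prior`, `center_guided_attention`, `appearance_diffusion`, and `dynamic_fusion`.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[Vector]  # flattened H*W pixels, each a C-dim vector


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def heatmap_argmax(heatmap: Sequence[float], w: int) -> Tuple[float, float]:
    """Hypothetical stand-in for the Center Prediction Branch: (row, col) of the peak score."""
    idx = max(range(len(heatmap)), key=heatmap.__getitem__)
    return float(idx // w), float(idx % w)


def gaussian_center_prior(h: int, w: int, center: Tuple[float, float],
                          sigma: float) -> List[float]:
    """Flattened H*W map peaking at the center and decaying with distance (assumed form)."""
    cy, cx = center
    return [math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2 * sigma ** 2))
            for r in range(h) for c in range(w)]


def center_guided_attention(queries: FeatureMap, keys: FeatureMap,
                            prior: Sequence[float]) -> FeatureMap:
    """Scaled dot-product attention (keys double as values); each query's
    result is amplified by (1 + prior) at that query's position."""
    dim = len(keys[0])
    out = []
    for q, p in zip(queries, prior):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(logits)
        attended = [sum(wt * k[ch] for wt, k in zip(weights, keys)) for ch in range(dim)]
        out.append([(1.0 + p) * v for v in attended])
    return out


def appearance_diffusion(f_ref: FeatureMap, f_cur: FeatureMap,
                         prior: Sequence[float]) -> Tuple[FeatureMap, FeatureMap, FeatureMap]:
    inter = center_guided_attention(f_cur, f_ref, prior)  # current attends to reference
    intra = center_guided_attention(f_cur, f_cur, prior)  # current attends to itself
    return inter, intra, f_cur                            # plus the original semantic feature


def dynamic_fusion(branches: Sequence[FeatureMap], gate: Vector) -> FeatureMap:
    """Per pixel: score each branch with a gate vector, softmax the scores
    across branches, and return the weighted sum of the branch features."""
    fused = []
    for pixel_feats in zip(*branches):
        alphas = softmax([sum(g * x for g, x in zip(gate, f)) for f in pixel_feats])
        fused.append([sum(a * f[ch] for a, f in zip(alphas, pixel_feats))
                      for ch in range(len(pixel_feats[0]))])
    return fused


if __name__ == "__main__":
    # Toy 2x2 maps (4 pixels, 2 channels); all numbers are illustrative.
    f_ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    f_cur = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.8], [0.4, 0.6]]
    center = heatmap_argmax([0.1, 0.2, 0.9, 0.3], w=2)  # hypothetical center scores
    prior = gaussian_center_prior(2, 2, center, sigma=1.0)
    fused = dynamic_fusion(appearance_diffusion(f_ref, f_cur, prior), gate=[1.0, -1.0])
    print(len(fused), len(fused[0]))  # 4 2
```

Applying the prior at the query position avoids assuming that the object stays in place between the two frames; placing it on the key positions instead would be an equally plausible reading of the abstract.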
Extensive experiments on the DAVIS2016, YouTube-Objects, and FBMS datasets
show that F2Net achieves state-of-the-art performance with significant
improvements.
Comment: Accepted by AAAI202