109 research outputs found
RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception
system to locate the most attractive object(s) in a scene, has been widely
applied to various computer vision tasks. With the advent of depth sensors,
depth maps rich in spatial information, which can help boost SOD performance,
can now be captured easily. Although various RGB-D
based SOD models with promising performance have been proposed over the past
several years, an in-depth understanding of these models and challenges in this
topic remains lacking. In this paper, we provide a comprehensive survey of
RGB-D based SOD models from various perspectives, and review related benchmark
datasets in detail. Further, considering that the light field can also provide
depth maps, we review SOD models and popular benchmark datasets from this
domain as well. Moreover, to investigate the SOD ability of existing models, we
carry out a comprehensive evaluation, as well as attribute-based evaluation of
several representative RGB-D based SOD models. Finally, we discuss several
challenges and open directions of RGB-D based SOD for future research. All
collected models, benchmark datasets, source code links, datasets constructed
for attribute-based evaluation, and codes for evaluation will be made publicly
available at https://github.com/taozh2017/RGBDSODsurvey
Comment: 24 pages, 12 figures. Accepted by Computational Visual Media
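The comprehensive evaluation mentioned above relies on standard SOD metrics. As one illustration (not code from the survey itself), mean absolute error (MAE), one of the most common SOD measures, compares a predicted saliency map against a binary ground-truth mask, both normalized to [0, 1]; a minimal sketch:

```python
import numpy as np

def saliency_mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a predicted saliency map and a
    binary ground-truth mask, both assumed normalized to [0, 1]."""
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    # Average per-pixel absolute difference over the whole map.
    return float(np.mean(np.abs(pred - gt)))

# Toy 2x2 example: prediction vs. ground truth.
pred = np.array([[0.9, 0.1], [0.8, 0.0]])
gt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(saliency_mae(pred, gt))  # 0.1
```

Lower MAE is better; surveys such as this one typically report it alongside structure- and region-aware measures (e.g. S-measure, F-measure).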
Salient Object Detection in RGB-D Videos
Given the widespread adoption of depth-sensing acquisition devices, RGB-D
videos and related data/media have gained considerable traction in various
aspects of daily life. Consequently, conducting salient object detection (SOD)
in RGB-D videos presents a highly promising and evolving avenue. Despite the
potential of this area, SOD in RGB-D videos remains somewhat under-explored,
with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To
explore this emerging field, this paper makes two primary contributions: the
dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D
VSOD dataset with realistic depth and characterized by its diversity of scenes
and rigorous frame-by-frame annotations. We validate the dataset through
comprehensive attribute and object-oriented analyses, and provide training and
testing splits. Moreover, we introduce DCTNet+, a three-stream network tailored
for RGB-D VSOD that emphasizes the RGB modality and treats depth and optical
flow as auxiliary modalities. In pursuit of effective feature enhancement,
refinement, and fusion for precise final prediction, we propose two modules:
the multi-modal attention module (MAM) and the refinement fusion module (RFM).
To enhance interaction and fusion within RFM, we design a universal interaction
module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs)
for refining multi-modal low-level features before reaching RFMs. Comprehensive
experiments, conducted on pseudo RGB-D video datasets alongside our RDVS,
highlight the superiority of DCTNet+ over 17 VSOD models and 14 RGB-D SOD
models. Ablation experiments were performed on both pseudo and realistic RGB-D
video datasets to demonstrate the advantages of individual modules as well as
the necessity of introducing realistic depth. Our code, together with the RDVS
dataset, will be made available at https://github.com/kerenfu/RDVS/
Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection
Co-saliency detection aims to discover the common and salient foregrounds
from a group of relevant images. For this task, we present a novel adaptive
graph convolutional network with attention graph clustering (GCAGC). Three
major contributions have been made, and are experimentally shown to have
substantial practical merits. First, we propose a graph convolutional network
design to extract information cues to characterize the intra- and inter-image
correspondence. Second, we develop an attention graph clustering algorithm to
discriminate the common objects from all the salient foreground objects in an
unsupervised fashion. Third, we present a unified framework with
encoder-decoder structure to jointly train and optimize the graph convolutional
network, attention graph cluster, and co-saliency detection decoder in an
end-to-end manner. We evaluate our proposed GCAGC method on three co-saliency
detection benchmark datasets (iCoseg, Cosal2015, and COCO-SEG). Our GCAGC method
obtains significant improvements over the state-of-the-art on most of them.
Comment: CVPR 2020