RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception
system to locate the most attractive object(s) in a scene, has been widely
applied to various computer vision tasks. Now, with the advent of depth
sensors, depth maps with rich spatial information, which can help boost
the performance of SOD, can easily be captured. Although various RGB-D
based SOD models with promising performance have been proposed over the past
several years, an in-depth understanding of these models and challenges in this
topic remains lacking. In this paper, we provide a comprehensive survey of
RGB-D based SOD models from various perspectives, and review related benchmark
datasets in detail. Further, considering that the light field can also provide
depth maps, we review SOD models and popular benchmark datasets from this
domain as well. Moreover, to investigate the SOD ability of existing models, we
carry out a comprehensive evaluation, as well as an attribute-based evaluation, of
several representative RGB-D based SOD models. Finally, we discuss several
challenges and open directions of RGB-D based SOD for future research. All
collected models, benchmark datasets, source code links, datasets constructed
for attribute-based evaluation, and codes for evaluation will be made publicly
available at https://github.com/taozh2017/RGBDSODsurvey
Comment: 24 pages, 12 figures. Has been accepted by Computational Visual Medi
Middle-level Fusion for Lightweight RGB-D Salient Object Detection
Most existing lightweight RGB-D salient object detection (SOD) models are
based on a two-stream or a single-stream structure. The former first
uses two sub-networks to extract unimodal features from RGB and depth images,
respectively, and then fuses them for SOD, while the latter directly
extracts multi-modal features from the input RGB-D images and then focuses on
exploiting cross-level complementary information. However, two-stream
models inevitably require more parameters, and single-stream
models cannot fully exploit the cross-modal complementary information because
they ignore the modality difference. To address these issues, we propose to
employ a middle-level fusion structure for designing a lightweight RGB-D SOD
model in this paper, which first employs two sub-networks to extract low- and
middle-level unimodal features, respectively, and then fuses the extracted
middle-level unimodal features to extract the corresponding high-level
multi-modal features in a subsequent sub-network. Unlike existing
models, this structure can effectively exploit the cross-modal complementary
information and significantly reduce the network's parameters at the same time.
Therefore, a novel lightweight SOD model is designed, which contains an
information-aware multi-modal feature fusion (IMFF) module for effectively
capturing the cross-modal complementary information and a lightweight
feature-level and decision-level feature fusion (LFDF) module for aggregating
the feature-level and the decision-level saliency information in different
stages with fewer parameters. Our proposed model has only 3.9M parameters and
runs at 33 FPS. The experimental results on several benchmark datasets verify
the effectiveness and superiority of the proposed method over some
state-of-the-art methods.
Comment: 11 pages, 6 figure
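
The parameter argument in the abstract above can be illustrated with a rough
count. The sketch below is not the authors' architecture: the channel widths,
the plain 3x3 convolution stacks, and the assumption that fusion keeps the
middle-level channel count (for example, element-wise addition) are all
hypothetical, and the fusion modules themselves are ignored. It only shows why
sharing the high-level stages lands between the two-stream and single-stream
designs in parameter count.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def stack_params(c_in, widths):
    """Total parameters of a plain stack of 3x3 convolutions."""
    total = 0
    for c_out in widths:
        total += conv_params(c_in, c_out)
        c_in = c_out
    return total


# Hypothetical channel widths, chosen for illustration only.
LOW_MID = [16, 32, 64]   # low- and middle-level stages
HIGH = [128, 256]        # high-level stages
FULL = LOW_MID + HIGH

# Two-stream: a full backbone for RGB (3 channels) and one for depth (1 channel).
two_stream = stack_params(3, FULL) + stack_params(1, FULL)

# Single-stream: one backbone over the stacked 4-channel RGB-D input.
single_stream = stack_params(4, FULL)

# Middle-level fusion: separate low/middle stages per modality, then one
# shared high-level sub-network on the fused features (assumed 64 channels).
middle_fusion = (
    stack_params(3, LOW_MID)
    + stack_params(1, LOW_MID)
    + stack_params(LOW_MID[-1], HIGH)
)

print("two-stream:   ", two_stream)
print("middle fusion:", middle_fusion)
print("single-stream:", single_stream)
```

Under these assumptions most parameters sit in the high-level stages, so
duplicating only the low- and middle-level stages adds little over a
single-stream backbone while keeping the two modalities separate early on.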