107 research outputs found
RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception
system to locate the most attractive object(s) in a scene, has been widely
applied to various computer vision tasks. Now, with the advent of depth
sensors, depth maps rich in spatial information that can help boost SOD
performance can easily be captured. Although various RGB-D
based SOD models with promising performance have been proposed over the past
several years, an in-depth understanding of these models and challenges in this
topic remains lacking. In this paper, we provide a comprehensive survey of
RGB-D based SOD models from various perspectives, and review related benchmark
datasets in detail. Further, considering that the light field can also provide
depth maps, we review SOD models and popular benchmark datasets from this
domain as well. Moreover, to investigate the SOD ability of existing models, we
carry out a comprehensive evaluation, as well as attribute-based evaluation of
several representative RGB-D based SOD models. Finally, we discuss several
challenges and open directions of RGB-D based SOD for future research. All
collected models, benchmark datasets, source code links, datasets constructed
for attribute-based evaluation, and codes for evaluation will be made publicly
available at https://github.com/taozh2017/RGBDSODsurvey.
Comment: 24 pages, 12 figures. Accepted by Computational Visual Media.
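The comprehensive evaluation mentioned above scores each model's predicted saliency map against a binary ground-truth mask. Mean absolute error (MAE) is one of the standard SOD measures; the minimal sketch below shows how such a score is computed (NumPy only; the function name is illustrative, and the survey's exact metric suite may differ):

```python
import numpy as np

def mae(saliency_map, ground_truth):
    """Mean Absolute Error between a predicted saliency map and the
    binary ground-truth mask, both assumed normalized to [0, 1].
    Lower is better; 0 means a pixel-perfect prediction."""
    s = np.asarray(saliency_map, dtype=np.float64)
    g = np.asarray(ground_truth, dtype=np.float64)
    return np.abs(s - g).mean()
```

In practice the MAE is averaged over all images of a benchmark dataset, alongside structure- and boundary-aware measures.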
Bifurcated backbone strategy for RGB-D salient object detection
Multi-level feature fusion is a fundamental topic in computer vision. It has
been exploited to detect, segment and classify objects at various scales. When
multi-level features meet multi-modal cues, the optimal feature aggregation and
multi-modal learning strategy becomes a pressing open problem. In this paper, we leverage
the inherent multi-modal and multi-level nature of RGB-D salient object
detection to devise a novel cascaded refinement network. In particular, first,
we propose to regroup the multi-level features into teacher and student
features using a bifurcated backbone strategy (BBS). Second, we introduce a
depth-enhanced module (DEM) to excavate informative depth cues from the channel
and spatial views. Then, RGB and depth modalities are fused in a complementary
way. Our architecture, named Bifurcated Backbone Strategy Network (BBS-Net), is
simple, efficient, and backbone-independent. Extensive experiments show that
BBS-Net significantly outperforms eighteen SOTA models on eight challenging
datasets under five evaluation measures, demonstrating the superiority of our
approach (an improvement in S-measure over the top-ranked model:
DMRA-iccv2019). In addition, we provide a comprehensive analysis on the
generalization ability of different RGB-D datasets and provide a powerful
training set for future research.
Comment: A preliminary version of this work has been accepted at ECCV 2020.
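The depth-enhanced module described above excavates depth cues from the channel and spatial views before fusing the two modalities. The sketch below illustrates the general channel-then-spatial attention pattern in plain NumPy; all function names are hypothetical, and the actual BBS-Net module uses learned convolutions rather than these fixed poolings:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_enhanced_module(depth_feat):
    """Re-weight a (C, H, W) depth feature map from two views:
    first gate each channel, then gate each spatial position.
    A rough stand-in for a learned channel/spatial attention block."""
    c = depth_feat.shape[0]
    # Channel view: squeeze spatial dims into one descriptor per channel.
    chan_att = _sigmoid(depth_feat.reshape(c, -1).mean(axis=1))   # (C,)
    feat = depth_feat * chan_att[:, None, None]
    # Spatial view: squeeze channels into one descriptor per position.
    spat_att = _sigmoid(feat.mean(axis=0))                        # (H, W)
    return feat * spat_att[None, :, :]

def fuse(rgb_feat, depth_feat):
    """Complementary fusion: add RGB features to the enhanced depth cues."""
    return rgb_feat + depth_enhanced_module(depth_feat)
```

The element-wise addition at the end is only one of several complementary fusion choices; the point is that depth features are cleaned by attention before they ever touch the RGB branch.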
RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection
The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy
for the limited application scenarios of traditional RGB cameras. The RGB-X
tasks, which rely on RGB input and another type of data input to resolve
specific problems, have become a popular research topic in multimedia. A
crucial part in two-branch RGB-X deep neural networks is how to fuse
information across modalities. Given the tremendous information inside RGB-X
networks, previous works typically apply naive fusion (e.g., average or max
fusion) or focus only on feature fusion at the same scale(s). In this
paper, we propose a novel method called RXFOOD for the fusion of features
across different scales within the same modality branch and from different
modality branches simultaneously in a unified attention mechanism. An Energy
Exchange Module is designed for the interaction of each feature map's energy
matrix, which reflects the inter-relationships among different positions and
different channels inside a feature map. The RXFOOD method can be easily
incorporated into any dual-branch encoder-decoder network as a plug-in module,
and help the original backbone network better focus on important positions and
channels for object of interest detection. Experimental results on RGB-NIR
salient object detection, RGB-D salient object detection, and RGB-Frequency
image manipulation detection demonstrate the clear effectiveness of the
proposed RXFOOD.
Comment: 10 pages.
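One way to picture the energy matrix the abstract mentions is as a channel Gram matrix, whose entries capture affinities between channels across all spatial positions. The NumPy sketch below exchanges such statistics between an RGB branch and an X-modality branch; it is an illustrative approximation, not the paper's actual attention-based module, and all names and the blending rule are hypothetical:

```python
import numpy as np

def energy_matrix(feat):
    """Channel Gram matrix of a (C, H, W) feature map: entry (i, j) is
    the affinity between channels i and j over all spatial positions,
    normalized so different branches are comparable in magnitude."""
    c = feat.shape[0]
    f = feat.reshape(c, -1)
    g = f @ f.T                       # (C, C), symmetric
    return g / (np.linalg.norm(g) + 1e-8)

def energy_exchange(feat_rgb, feat_x, alpha=0.5):
    """Blend the two branches' energy matrices and use the mixed channel
    affinities to re-weight both branches. A crude stand-in for the
    learned cross-branch interaction in an Energy Exchange Module."""
    g = alpha * energy_matrix(feat_rgb) + (1 - alpha) * energy_matrix(feat_x)
    w = g.sum(axis=1)                 # per-channel importance score
    w = w / (np.abs(w).max() + 1e-8)  # scale to roughly [-1, 1]
    c = feat_rgb.shape[0]
    w = w.reshape(c, 1, 1)
    return feat_rgb * w, feat_x * w
```

Compared with naive average or max fusion, the idea is that second-order statistics from one modality can tell the other which channels carry shared, salient structure.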
Boundary-semantic collaborative guidance network with dual-stream feedback mechanism for salient object detection in optical remote sensing imagery
With the increasing application of deep learning in various domains, salient
object detection in optical remote sensing images (ORSI-SOD) has attracted
significant attention. However, most existing ORSI-SOD methods predominantly
rely on local information from low-level features to infer salient boundary
cues and supervise them using boundary ground truth, but fail to sufficiently
optimize and protect the local information, and almost all approaches ignore
the potential advantages offered by the last layer of the decoder to maintain
the integrity of saliency maps. To address these issues, we propose a novel
method named boundary-semantic collaborative guidance network (BSCGNet) with
dual-stream feedback mechanism. First, we propose a boundary protection
calibration (BPC) module, which effectively reduces the loss of edge position
information during forward propagation and suppresses noise in low-level
features without relying on boundary ground truth. Second, based on the BPC
module, a dual feature feedback complementary (DFFC) module is proposed, which
aggregates boundary-semantic dual features and provides effective feedback to
coordinate features across different layers, thereby enhancing cross-scale
knowledge communication. Finally, to obtain more complete saliency maps, we
consider the uniqueness of the last layer of the decoder for the first time and
propose the adaptive feedback refinement (AFR) module, which further refines
feature representation and eliminates differences between features through a
unique feedback mechanism. Extensive experiments on three benchmark datasets
demonstrate that BSCGNet exhibits distinct advantages in challenging scenarios
and outperforms 17 state-of-the-art (SOTA) approaches proposed in recent
years. Codes and results have been released on GitHub:
https://github.com/YUHsss/BSCGNet.
Comment: Accepted by TGRS.