RGB-T salient object detection via fusing multi-level CNN features
RGB-based salient object detection has recently witnessed substantial progress, which is attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detectors struggle in challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB-based saliency detection alone, this paper exploits the complementary benefits of RGB and thermal infrared images. Specifically, we propose a novel end-to-end network for multi-modal salient object detection, which turns the challenge of RGB-T saliency detection into a CNN feature fusion problem. To this end, a backbone network (e.g., VGG-16) is first adopted to extract coarse features from each RGB or thermal infrared image individually, and then several adjacent-depth feature combination (ADFC) modules are designed to extract multi-level refined features for each single-modal input image, considering that features captured at different depths differ in semantic information and visual detail. Subsequently, a multi-branch group fusion (MGF) module is employed to capture cross-modal features by fusing the features from the ADFC modules of an RGB-T image pair at each level. Finally, a joint attention guided bi-directional message passing (JABMP) module performs saliency prediction by integrating the multi-level fused features from the MGF modules. Experimental results on several public RGB-T salient object detection datasets demonstrate the superiority of our proposed algorithm over state-of-the-art approaches, especially under challenging conditions such as poor illumination, complex backgrounds and low contrast.
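The abstract names the pipeline stages (per-modality multi-level features, fused level by level) without implementation detail. Below is a minimal, hypothetical PyTorch sketch of one per-level cross-modal fusion step in the spirit of the MGF module; the concat-then-convolve scheme, the layer shapes, and the class name are our assumptions, not the authors' actual design.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical per-level fusion of RGB and thermal features.

    Sketch only: the real MGF module's internals are not described in
    the abstract, so this uses a generic concat -> conv reduction.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Reduce the concatenated RGB+T features back to `channels`.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        # feat_rgb, feat_t: (B, C, H, W) features from the same backbone level.
        return self.fuse(torch.cat([feat_rgb, feat_t], dim=1))

# Usage: fuse level-3 VGG-16 features (256 channels) from both modalities.
fusion = CrossModalFusion(channels=256)
fused = fusion(torch.randn(1, 256, 56, 56), torch.randn(1, 256, 56, 56))
```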
RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception
system to locate the most attractive object(s) in a scene, has been widely
applied to various computer vision tasks. Now, with the advent of depth sensors, depth maps with rich spatial information, which can be beneficial for boosting SOD performance, can easily be captured. Although various RGB-D
based SOD models with promising performance have been proposed over the past
several years, an in-depth understanding of these models and challenges in this
topic remains lacking. In this paper, we provide a comprehensive survey of
RGB-D based SOD models from various perspectives, and review related benchmark
datasets in detail. Further, considering that the light field can also provide
depth maps, we review SOD models and popular benchmark datasets from this
domain as well. Moreover, to investigate the SOD ability of existing models, we
carry out a comprehensive evaluation, as well as attribute-based evaluation of
several representative RGB-D based SOD models. Finally, we discuss several
challenges and open directions of RGB-D based SOD for future research. All
collected models, benchmark datasets, source code links, datasets constructed for attribute-based evaluation, and evaluation code will be made publicly available at https://github.com/taozh2017/RGBDSODsurvey
Comment: 24 pages, 12 figures. Accepted by Computational Visual Media.
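The survey's quantitative comparison presumably rests on standard SOD metrics. As a concrete reference point, here is a minimal sketch of mean absolute error (MAE), one of the most common SOD metrics, computed between a predicted saliency map and a ground-truth mask; the metric choice is our illustration, not a claim about the survey's exact protocol.

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a saliency map and its ground truth.

    pred: predicted saliency map, values in [0, 1] or [0, 255].
    gt:   ground-truth mask with the same shape.
    """
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    # Normalize to [0, 1] if maps are stored as 8-bit images.
    if pred.max() > 1.0:
        pred /= 255.0
    if gt.max() > 1.0:
        gt /= 255.0
    return float(np.mean(np.abs(pred - gt)))
```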
Densely Deformable Efficient Salient Object Detection Network
Salient Object Detection (SOD) using RGB-D data has lately emerged, with some current models producing adequately precise results. However, they have limited generalization ability and high computational complexity. In this paper, inspired by the strong background/foreground separation ability of deformable convolutions, we employ them in our Densely Deformable Network (DDNet) to achieve efficient SOD. The salient regions from densely deformable convolutions are further refined using transposed convolutions to optimally generate the saliency maps. Quantitative and qualitative evaluations on a recent SOD dataset against 22 competing techniques show our method's efficiency and effectiveness. We also evaluate on our newly created cross-dataset, surveillance-SOD (S-SOD), to check the trained models' validity in diverse scenarios. The results indicate that current models have limited generalization potential, demanding further research in this direction. Our code and new dataset will be publicly available at https://github.com/tanveer-hussain/EfficientSO
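The abstract names the two key operations (deformable convolutions for foreground/background separation, transposed convolutions for map generation) without detail. A minimal, hypothetical PyTorch sketch of that pairing, using torchvision's DeformConv2d, follows; the channel sizes and the offset-prediction convolution are our assumptions, not DDNet's actual architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableSaliencyBlock(nn.Module):
    """Hypothetical block: deformable conv for feature extraction,
    transposed conv to upsample toward a full-resolution saliency map."""

    def __init__(self, in_ch: int = 64, out_ch: int = 64):
        super().__init__()
        # A 3x3 deformable kernel needs 2 offsets (x, y) per tap: 2*3*3 = 18.
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Transposed conv doubles spatial resolution and predicts saliency.
        self.up = nn.ConvTranspose2d(out_ch, 1, kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.deform(x, self.offset(x))   # sampling grid adapts to content
        return torch.sigmoid(self.up(feats))     # (B, 1, 2H, 2W) saliency map

block = DeformableSaliencyBlock()
saliency = block(torch.randn(1, 64, 112, 112))  # -> (1, 1, 224, 224)
```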
Dynamic Knowledge Distillation with A Single Stream Structure for RGB-D Salient Object Detection
RGB-D salient object detection (SOD) demonstrates its superiority in detecting salient objects in complex environments, owing to the additional depth information in the data. Inevitably, an independent stream is introduced to extract features from depth images, leading to extra computation and parameters. This methodology, which sacrifices model size to improve detection accuracy, may impede the practical application of SOD. To tackle this dilemma, we propose a dynamic distillation method along with a lightweight framework, which significantly reduces the number of parameters. The method considers the performance of both teacher and student during training and dynamically assigns the distillation weight, instead of applying a fixed weight to the student model. Extensive experiments on five public datasets demonstrate that our method achieves performance competitive with 10 prior methods using a lightweight structure of only 78.2 MB.
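The abstract states only that the distillation weight depends on teacher and student performance during training. One plausible instantiation, sketched below, scales the distillation term by how much the teacher currently outperforms the student on the ground truth; the sigmoid weighting and the loss choices are our assumptions, not the paper's actual scheme.

```python
import torch
import torch.nn.functional as F

def dynamic_distillation_loss(student_pred: torch.Tensor,
                              teacher_pred: torch.Tensor,
                              gt: torch.Tensor) -> torch.Tensor:
    """Hypothetical dynamic KD loss for saliency logits of shape (B, 1, H, W)."""
    # Supervised losses of both models against the ground-truth mask.
    student_task = F.binary_cross_entropy_with_logits(student_pred, gt)
    with torch.no_grad():
        teacher_task = F.binary_cross_entropy_with_logits(teacher_pred, gt)
        # Weight grows when the teacher is better, shrinks when it is not.
        w = torch.sigmoid(student_task - teacher_task)
    # Distillation term: match the teacher's saliency probabilities.
    distill = F.binary_cross_entropy_with_logits(
        student_pred, torch.sigmoid(teacher_pred).detach())
    return student_task + w * distill
```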
Salient Object Detection in RGB-D Videos
Given the widespread adoption of depth-sensing acquisition devices, RGB-D
videos and related data/media have gained considerable traction in various
aspects of daily life. Consequently, conducting salient object detection (SOD)
in RGB-D videos presents a highly promising and evolving avenue. Despite the
potential of this area, SOD in RGB-D videos remains somewhat under-explored,
with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To
explore this emerging field, this paper makes two primary contributions: the
dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D
VSOD dataset with realistic depth and characterized by its diversity of scenes
and rigorous frame-by-frame annotations. We validate the dataset through
comprehensive attribute and object-oriented analyses, and provide training and
testing splits. Moreover, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD, which emphasizes the RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement,
refinement, and fusion for precise final prediction, we propose two modules:
the multi-modal attention module (MAM) and the refinement fusion module (RFM).
To enhance interaction and fusion within RFM, we design a universal interaction
module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs)
for refining multi-modal low-level features before reaching RFMs. Comprehensive
experiments, conducted on pseudo RGB-D video datasets alongside our RDVS,
highlight the superiority of DCTNet+ over 17 VSOD models and 14 RGB-D SOD
models. Ablation experiments on both pseudo and realistic RGB-D video datasets demonstrate the advantages of the individual modules as well as the necessity of introducing realistic depth. Our code, together with the RDVS dataset, will be available at https://github.com/kerenfu/RDVS/
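The internals of the MAM and RFM modules are not specified in the abstract. As a rough illustration of how an auxiliary modality can attentively modulate RGB features, here is a minimal, hypothetical sketch of one multi-modal attention step; the channel-attention formulation and all shapes are our assumptions, not DCTNet+'s actual modules.

```python
import torch
import torch.nn as nn

class AuxAttention(nn.Module):
    """Hypothetical attention: auxiliary features (depth or optical flow)
    produce channel weights that modulate the RGB feature stream."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),              # (B, C, 1, 1) global context
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),                         # per-channel weights in (0, 1)
        )

    def forward(self, feat_rgb: torch.Tensor, feat_aux: torch.Tensor) -> torch.Tensor:
        # Reweight RGB channels by attention derived from the auxiliary
        # modality, keeping a residual path so RGB stays the dominant stream.
        return feat_rgb * self.gate(feat_aux) + feat_rgb

# Usage: modulate RGB features with depth-derived, then flow-derived attention.
att_d, att_f = AuxAttention(256), AuxAttention(256)
rgb = torch.randn(2, 256, 28, 28)
out = att_f(att_d(rgb, torch.randn(2, 256, 28, 28)), torch.randn(2, 256, 28, 28))
```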