Learning RGB-D Salient Object Detection using background enclosure, depth contrast, and top-down features
Recently, deep Convolutional Neural Networks (CNN) have demonstrated strong
performance on RGB salient object detection. Although depth information can
help improve detection results, the exploration of CNNs for RGB-D salient
object detection remains limited. Here we propose a novel deep CNN architecture
for RGB-D salient object detection that exploits high-level, mid-level, and
low-level features. Further, we present novel depth features that capture the ideas
of background enclosure and depth contrast that are suitable for a learned
approach. We show improved results compared to state-of-the-art RGB-D salient
object detection methods. We also show that the low-level and mid-level depth
features both contribute to improvements in the results. In particular, the F-score of
our method is 0.848 on the RGBD1000 dataset, which is 10.7% better than the second
place
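The F-score quoted above combines precision and recall of a binarised saliency map against a ground-truth mask. The following is a minimal sketch of that computation, not the authors' evaluation code: the fixed threshold of 0.5 and the weighting beta_sq = 0.3 (a value commonly used in saliency evaluation) are assumptions, and the function name `f_measure` is hypothetical.

```python
def f_measure(saliency, ground_truth, threshold=0.5, beta_sq=0.3):
    """F-measure of a thresholded saliency map against a binary mask.

    saliency: 2D list of floats in [0, 1]; ground_truth: 2D list of 0/1.
    threshold and beta_sq are assumed values, not taken from the abstract.
    """
    tp = fp = fn = 0
    for sal_row, gt_row in zip(saliency, ground_truth):
        for s, g in zip(sal_row, gt_row):
            predicted = s >= threshold
            if predicted and g:
                tp += 1      # salient pixel correctly detected
            elif predicted:
                fp += 1      # background pixel marked salient
            elif g:
                fn += 1      # salient pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x3 example: two hits, one false positive, one miss.
saliency = [[0.9, 0.8, 0.1],
            [0.7, 0.2, 0.1]]
mask = [[1, 1, 0],
        [0, 1, 0]]
score = f_measure(saliency, mask)
```

In this toy case precision and recall are both 2/3, so the F-measure is also 2/3 regardless of the weighting.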
RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception
system to locate the most attractive object(s) in a scene, has been widely
applied to various computer vision tasks. Now, with the advent of depth
sensors, depth maps with rich spatial information, which can be beneficial in
boosting the performance of SOD, can easily be captured. Although various RGB-D
based SOD models with promising performance have been proposed over the past
several years, an in-depth understanding of these models and challenges in this
topic remains lacking. In this paper, we provide a comprehensive survey of
RGB-D based SOD models from various perspectives, and review related benchmark
datasets in detail. Further, considering that the light field can also provide
depth maps, we review SOD models and popular benchmark datasets from this
domain as well. Moreover, to investigate the SOD ability of existing models, we
carry out a comprehensive evaluation, as well as an attribute-based evaluation, of
several representative RGB-D based SOD models. Finally, we discuss several
challenges and open directions of RGB-D based SOD for future research. All
collected models, benchmark datasets, source code links, datasets constructed
for attribute-based evaluation, and codes for evaluation will be made publicly
available at https://github.com/taozh2017/RGBDSODsurvey
Comment: 24 pages, 12 figures. Has been accepted by Computational Visual Medi
RGB-D Scene Representations for Prosthetic Vision
This thesis presents a new approach to scene representation for prosthetic vision. Structurally salient information from the scene is conveyed through the prosthetic vision display. Given the low resolution and dynamic range of the display, this enables robust identification and reliable interpretation of key structural features that are missed when using standard appearance-based scene representations. Specifically, two different types of salient structure are investigated: salient edge structure, for depiction of scene shape to the user; and salient object structure, for emulation of biological attention deployment when viewing a scene. This thesis proposes and evaluates novel computer vision algorithms for extracting salient edge and salient object structure from RGB-D input.
Extraction of salient edge structure from the scene is first investigated through low-level analysis of surface shape. Our approach is based on the observation that regions of irregular surface shape, such as the boundary between the wall and the floor, tend to be more informative of scene structure than uniformly shaped regions. We detect these surface irregularities through multi-scale analysis of iso-disparity contour orientations, providing a real-time method that robustly identifies important scene structure. This approach is then extended by using a deep CNN to learn high-level information for distinguishing salient edges from structural texture. A novel depth input encoding called the depth surface descriptor (DSD) is presented, which better captures scene geometry that corresponds to salient edges, improving the learned model. These methods provide robust detection of salient edge structure in the scene.
The detection of salient object structure is first achieved by noting that salient objects often have a shape that contrasts with their surroundings. Contrasting shape in the depth image is captured through the proposed histogram of surface orientations (HOSO) feature. This feature is used to modulate depth and colour contrast in a saliency detection framework, improving the precision of saliency seed regions and, in turn, the accuracy of the final detection. After this, a novel formulation of structural saliency is introduced based on the angular measure of local background enclosure (LBE). This formulation addresses fundamental limitations of depth contrast methods and is not reliant on foreground depth contrast in the scene. Saliency is instead measured through the degree to which a candidate patch exhibits foreground structure.
The effectiveness of the proposed approach is evaluated on standard datasets and through user studies that measure the contribution of structure-based representations. Our methods are found to more effectively measure salient structure in the scene than existing methods. Our approach results in improved performance compared to standard methods during practical use of an implant display
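The local background enclosure idea described above can be illustrated with a simplified angular measure: around a candidate pixel, count the fraction of directions in which some nearby pixel lies clearly behind it. This is only a sketch under stated assumptions, not the thesis's formulation: it works per pixel rather than per patch, uses a single depth threshold, and the function name `angular_enclosure` and the parameters `radius`, `depth_gap`, and `bins` are hypothetical.

```python
import math


def angular_enclosure(depth, row, col, radius=2, depth_gap=0.5, bins=8):
    """Fraction of angular bins around (row, col) that contain background.

    A neighbour counts as background when it lies at least depth_gap
    farther from the camera than the centre pixel. radius, depth_gap and
    bins are illustrative values, not taken from the thesis.
    """
    centre = depth[row][col]
    filled = [False] * bins
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if (dr == 0 and dc == 0) or dr * dr + dc * dc > radius * radius:
                continue  # skip the centre and pixels outside the disc
            r, c = row + dr, col + dc
            if not (0 <= r < len(depth) and 0 <= c < len(depth[0])):
                continue  # skip neighbours outside the image
            if depth[r][c] >= centre + depth_gap:
                angle = math.atan2(dr, dc) % (2 * math.pi)
                # Shift by half a bin so the axis directions fall mid-bin.
                index = int((angle + math.pi / bins) / (2 * math.pi) * bins)
                filled[index % bins] = True
    return sum(filled) / bins


# A single near pixel surrounded by a far background is fully enclosed.
near_object = [[3.0] * 5 for _ in range(5)]
near_object[2][2] = 1.0
# A flat surface has no background behind the centre pixel.
flat = [[1.0] * 5 for _ in range(5)]
# A step edge: only the right-hand columns are farther away.
step = [[1.0, 1.0, 1.0, 3.0, 3.0] for _ in range(5)]

enclosed = angular_enclosure(near_object, 2, 2)
unenclosed = angular_enclosure(flat, 2, 2)
partial = angular_enclosure(step, 2, 2)
```

The near pixel scores 1.0, the flat surface 0.0, and the step edge 0.375, which matches the intuition that a foreground object is surrounded by background in most directions, whereas a surface boundary is backed by it in only a few.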
Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks
The use of RGB-D information for salient object detection has been
extensively explored in recent years. However, relatively few efforts have been
put towards modeling salient object detection in real-world human activity
scenes with RGB-D. In this work, we fill the gap by making the following
contributions to RGB-D salient object detection. (1) We carefully collect a new
SIP (salient person) dataset, which consists of ~1K high-resolution images that
cover diverse real-world scenes from various viewpoints, poses, occlusions,
illuminations, and backgrounds. (2) We conduct a large-scale (and, so far, the
most comprehensive) benchmark comparing contemporary methods, which has long
been missing in the field and can serve as a baseline for future research. We
systematically summarize 32 popular models and evaluate 18 of the 32 models
on seven datasets containing a total of about 97K images. (3) We propose a
simple general architecture, called Deep Depth-Depurator Network (D3Net). It
consists of a depth depurator unit (DDU) and a three-stream feature learning
module (FLM), which performs low-quality depth map filtering and cross-modal
feature learning respectively. These components form a nested structure and are
elaborately designed to be learned jointly. D3Net exceeds the performance of
any prior contenders across all five metrics under consideration, thus serving
as a strong model to advance research in this field. We also demonstrate that
D3Net can be used to efficiently extract salient object masks from real scenes,
enabling an effective background-changing application at a speed of 65 fps on a
single GPU. All the saliency maps, our new SIP dataset, the D3Net model, and
the evaluation tools are publicly available at
https://github.com/DengPingFan/D3NetBenchmark.
Comment: Accepted in TNNLS20. 15 pages, 12 figures. Code:
https://github.com/DengPingFan/D3NetBenchmar
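The abstract above says only that the depth depurator unit filters out low-quality depth maps. One plausible way to realise such a gate is sketched below: trust the RGB-D prediction when a depth-only prediction agrees with it, and fall back to an RGB-only prediction otherwise. The agreement test (mean absolute error against a threshold), the threshold value, and the names `mean_abs_error` and `select_prediction` are all assumptions for illustration, not details taken from the abstract or from the D3Net code.

```python
def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized 2D maps."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def select_prediction(pred_rgb, pred_rgbd, pred_depth, max_error=0.15):
    """Return the RGB-D prediction if the depth map looks reliable.

    Depth is treated as reliable when the depth-only saliency map is close
    to the RGB-D one; max_error is an assumed threshold.
    """
    if mean_abs_error(pred_rgbd, pred_depth) <= max_error:
        return pred_rgbd
    return pred_rgb


pred_rgb = [[0.7, 0.3]]
pred_rgbd = [[0.9, 0.1]]
good_depth = [[0.8, 0.2]]   # agrees with the RGB-D prediction
bad_depth = [[0.2, 0.9]]    # contradicts it, so depth is discarded

kept = select_prediction(pred_rgb, pred_rgbd, good_depth)
fallback = select_prediction(pred_rgb, pred_rgbd, bad_depth)
```

With the consistent depth prediction the gate keeps the RGB-D output, while the contradictory one triggers the RGB-only fallback.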