Monocular Object Instance Segmentation and Depth Ordering with CNNs
In this paper we tackle the problem of instance-level segmentation and depth
ordering from a single monocular image. Towards this goal, we take advantage of
convolutional neural nets and train them to directly predict instance-level
segmentations where the instance ID encodes the depth ordering within image
patches. To provide a coherent single explanation of an image we develop a
Markov random field which takes as input the predictions of convolutional
neural nets applied at overlapping patches of different resolutions, as well as
the output of a connected component algorithm. It aims to predict accurate
instance-level segmentation and depth ordering. We demonstrate the
effectiveness of our approach on the challenging KITTI benchmark and show good
performance on both tasks.
Comment: International Conference on Computer Vision (ICCV), 2015
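A minimal sketch (not the authors' code) of the patch-level prediction step
described above: a small CNN maps an image patch to per-pixel logits over K
instance slots, where by convention slot k is the k-th closest instance and
slot 0 is background, so the predicted ID itself encodes the depth ordering.
The slot count K and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

K = 6  # assumed maximum number of instances per patch, ordered front-to-back

class PatchInstanceNet(nn.Module):
    def __init__(self, k=K):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # per-pixel classification over depth-ordered instance slots
        self.head = nn.Conv2d(64, k + 1, 1)  # +1 for the background slot

    def forward(self, patch):
        return self.head(self.features(patch))  # (B, K+1, H, W) logits

patch = torch.randn(1, 3, 128, 128)           # one RGB image patch
ids = PatchInstanceNet()(patch).argmax(1)     # per-pixel ID; lower ID = closer
```

In the paper, such per-patch predictions at overlapping locations and
resolutions are then reconciled by the MRF into a single image-level labeling.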
Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation
This paper addresses the task of designing a modular neural network
architecture that jointly solves different tasks. As an example we use the
tasks of depth estimation and semantic segmentation given a single RGB image.
The main focus of this work is to analyze the cross-modality influence between
depth and semantic prediction maps on their joint refinement. While most
previous works solely focus on measuring improvements in accuracy, we propose a
way to quantify the cross-modality influence. We show that there is a
relationship between final accuracy and cross-modality influence, although not
a simple linear one. Hence a larger cross-modality influence does not
necessarily translate into an improved accuracy. We find that a beneficial
balance between the cross-modality influences can be achieved by network
architecture and conjecture that this relationship can be utilized to
understand different network design choices. Towards this end we propose a
Convolutional Neural Network (CNN) architecture that fuses state-of-the-art
results for depth estimation and semantic labeling. By
balancing the cross-modality influences between depth and semantic prediction,
we achieve improved results for both tasks on the NYU-Depth v2 benchmark.
Comment: Accepted to ICRA 2017
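A minimal sketch, under assumed layer sizes and class count, of the modular
idea the abstract describes: separate depth and semantic branches exchange
their intermediate prediction maps in a refinement stage, so each task's
output is conditioned on the other's. This is an illustration of cross-modality
influence, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 40  # assumption: the common 40-class NYU-Depth v2 labeling

class JointRefineNet(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.depth_head = nn.Conv2d(64, 1, 1)          # coarse depth map
        self.sem_head = nn.Conv2d(64, num_classes, 1)  # coarse semantics
        # each refinement module also sees the *other* modality's prediction
        self.depth_refine = nn.Conv2d(64 + num_classes, 1, 3, padding=1)
        self.sem_refine = nn.Conv2d(64 + 1, num_classes, 3, padding=1)

    def forward(self, rgb):
        f = self.backbone(rgb)
        depth0, sem0 = self.depth_head(f), self.sem_head(f)
        depth = self.depth_refine(torch.cat([f, sem0], dim=1))  # depth <- semantics
        sem = self.sem_refine(torch.cat([f, depth0], dim=1))    # semantics <- depth
        return depth, sem

depth, sem = JointRefineNet()(torch.randn(1, 3, 240, 320))
```

Ablating either cross-connection gives one concrete way to measure how much
one modality's prediction map influences the other's accuracy.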
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo video
sequences recorded in streets from 50 different cities. 5000 of these images
have high-quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
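For readers who want to work with the dataset, a minimal usage sketch with
torchvision's Cityscapes wrapper (not part of the paper itself). It assumes the
official archives have been downloaded and unpacked under ./cityscapes;
mode='fine' selects the 5000 finely annotated images, mode='coarse' the
additional coarsely annotated ones.

```python
from torchvision.datasets import Cityscapes

# pixel-level semantic labels from the finely annotated subset
fine = Cityscapes('./cityscapes', split='train', mode='fine',
                  target_type='semantic')
image, sem_mask = fine[0]  # PIL image and per-pixel label mask

# instance-level labels for the instance segmentation benchmark
inst = Cityscapes('./cityscapes', split='train', mode='fine',
                  target_type='instance')
```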
DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels
In the context of scene understanding, a variety of methods exists to
estimate different information channels from mono or stereo images, including
disparity, depth, and normals. Although several advances have been reported in
recent years for these tasks, the estimated information is often imprecise,
particularly near depth discontinuities or creases. Studies have shown, however,
that precisely such depth edges carry critical cues for the perception of
shape, and play important roles in tasks like depth-based segmentation or
foreground selection. Unfortunately, the currently extracted channels often
carry conflicting signals, making it difficult for subsequent applications to
effectively use them. In this paper, we focus on the problem of obtaining
high-precision depth edges (i.e., depth contours and creases) by jointly
analyzing such unreliable information channels. We propose DepthCut, a
data-driven fusion of the channels using a convolutional neural network trained
on a large dataset with known depth. The resulting depth edges can be used for
segmentation, decomposing a scene into depth layers with relatively flat depth,
or improving the accuracy of the depth estimate near depth edges by
constraining its gradients to agree with these edges. Quantitatively, we
compare against 15 variants of baselines and demonstrate that our depth edges
result in an improved segmentation performance and an improved depth estimate
near depth edges compared to data-agnostic channel fusion. Qualitatively, we
demonstrate that the depth edges result in superior segmentation and depth
orderings.
Comment: 12 pages
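A minimal sketch (illustrative, not the published DepthCut network) of the
data-driven channel fusion the abstract describes: the unreliable estimates,
here assumed to be disparity, normals, and the RGB image, are stacked along the
channel axis and a small CNN predicts a per-pixel depth-edge probability.
Channel counts and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # input: 3 (RGB) + 1 (disparity) + 3 (normals) = 7 channels
        self.net = nn.Sequential(
            nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one depth-edge logit per pixel
        )

    def forward(self, rgb, disparity, normals):
        x = torch.cat([rgb, disparity, normals], dim=1)
        return torch.sigmoid(self.net(x))  # depth-edge probability map

edges = ChannelFusionNet()(torch.randn(1, 3, 128, 128),
                           torch.randn(1, 1, 128, 128),
                           torch.randn(1, 3, 128, 128))
```

Training such a fusion network against ground-truth depth edges is what lets it
arbitrate between the conflicting input channels, in contrast to the
data-agnostic fusion baselines the paper compares against.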