How do neural networks see depth in single images?
Deep neural networks have led to a breakthrough in depth estimation from
single images. Recent work often focuses on the accuracy of the depth map,
where an evaluation on a publicly available test set such as the KITTI vision
benchmark is often the main result of the article. While such an evaluation
shows how well neural networks can estimate depth, it does not show how they do
this. To the best of our knowledge, no work currently exists that analyzes what
these networks have learned.
In this work we take the MonoDepth network by Godard et al. and investigate
what visual cues it exploits for depth estimation. We find that the network
ignores the apparent size of known obstacles in favor of their vertical
position in the image. Using the vertical position requires the camera pose to
be known; however, we find that MonoDepth only partially corrects for changes in
camera pitch and roll and that these influence the estimated depth towards
obstacles. We further show that MonoDepth's use of the vertical image position
allows it to estimate the distance towards arbitrary obstacles, even those not
appearing in the training set, but that it requires a strong edge at the ground
contact point of the object to do so. In future work we will investigate
whether these observations also apply to other neural networks for monocular
depth estimation.
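The vertical-position cue described above has a simple geometric basis: for a level pinhole camera over flat ground, the image row of an object's ground contact point determines its depth. A minimal sketch of that relation (function and parameter names are illustrative, not from the paper):

```python
def depth_from_row(y, horizon_y, f_pixels, cam_height):
    """Depth of a ground contact point under a flat-ground, level-camera model.

    A ground point imaged (y - horizon_y) pixels below the horizon lies at
    depth d = f * H / (y - horizon_y), where f is the focal length in pixels
    and H is the camera height above the ground plane.
    """
    dy = y - horizon_y
    if dy <= 0:
        raise ValueError("ground contact point must lie below the horizon")
    return f_pixels * cam_height / dy

# An obstacle whose ground contact edge sits lower in the image is closer:
near = depth_from_row(y=400, horizon_y=200, f_pixels=720, cam_height=1.5)
far = depth_from_row(y=250, horizon_y=200, f_pixels=720, cam_height=1.5)
```

This also makes the paper's pitch/roll observation concrete: changing the camera pitch shifts `horizon_y`, so a network relying on the vertical position alone will misestimate depth unless it corrects for the new horizon.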
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are
variations in imaging conditions. It is known that albedo (reflectance) is
invariant to all kinds of illumination effects. Thus, using reflectance images
for the semantic segmentation task can be favorable. Additionally, not only
may segmentation benefit from reflectance, but reflectance computation may
also benefit from segmentation. Therefore, in this paper, the tasks of semantic
segmentation and intrinsic image decomposition are considered as a combined
process by exploring their mutual relationship in a joint fashion. To that end,
we propose a supervised end-to-end CNN architecture to jointly learn intrinsic
image decomposition and semantic segmentation. We analyze the gains of
addressing those two problems jointly. Moreover, new cascade CNN architectures
for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as
single tasks. Furthermore, a dataset of 35K synthetic images of natural
environments is created with corresponding albedo and shading (intrinsics), as
well as semantic labels (segmentation) assigned to each object/scene. The
experiments show that joint learning of intrinsic image decomposition and
semantic segmentation is beneficial for both tasks for natural scenes. Dataset
and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
Comment: ECCV 201
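The intrinsic model underlying this abstract treats each observed pixel as the product of albedo (reflectance) and shading; the illumination-invariance of albedo is what makes it attractive as input for segmentation. A toy single-pixel sketch of that model (values are illustrative):

```python
# Intrinsic image model: observed image I = albedo A * shading S (per pixel).
albedo = [0.6, 0.3, 0.8]        # RGB reflectance, invariant to illumination
shading_day, shading_dusk = 1.0, 0.25  # grayscale shading under two lightings

image_day = [a * shading_day for a in albedo]
image_dusk = [a * shading_dusk for a in albedo]

# Dividing out the shading recovers the same albedo under both illuminations,
# which is why reflectance is a stable input for outdoor segmentation.
rec_day = [i / shading_day for i in image_day]
rec_dusk = [i / shading_dusk for i in image_dusk]
```

The joint-learning idea follows from the same equation read in both directions: a segmentation map constrains where albedo should be piecewise constant, and albedo removes the illumination variation that confuses the segmenter.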
Contour Detection from Deep Patch-level Boundary Prediction
In this paper, we present a novel approach for contour detection with
Convolutional Neural Networks. A multi-scale CNN learning framework is designed
to automatically learn the most relevant features for contour patch detection.
Our method uses patch-level measurements to create contour maps with
overlapping patches. We show that the proposed CNN is able to detect large-scale
contours in an image efficiently. We further propose a guided filtering method
to refine the contour maps produced from large-scale contours. Experimental
results on the major contour benchmark databases demonstrate the effectiveness
of the proposed technique. We show our method can achieve good detection of
both fine-scale and large-scale contours.
Comment: IEEE International Conference on Signal and Image Processing 201
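Aggregating patch-level measurements into a contour map, as the abstract describes, amounts to averaging the scores of all overlapping patches that cover each pixel. A schematic sketch with a caller-supplied stand-in for the CNN's patch score (this is not the paper's network, only the aggregation step):

```python
def contour_map(height, width, patch, stride, score_patch):
    """Average overlapping patch-level boundary scores into a per-pixel map.

    score_patch(y, x) stands in for a CNN's boundary probability for the
    patch whose top-left corner is (y, x); here it is a caller-supplied stub.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            s = score_patch(y, x)
            for i in range(y, y + patch):
                for j in range(x, x + patch):
                    acc[i][j] += s
                    cnt[i][j] += 1
    return [[acc[i][j] / cnt[i][j] if cnt[i][j] else 0.0
             for j in range(width)]
            for i in range(height)]

# Uniform patch scores yield a uniform map wherever patches cover the image.
m = contour_map(8, 8, patch=4, stride=2, score_patch=lambda y, x: 0.5)
```

Because each pixel pools several overlapping patch predictions, the raw map is smooth; this is why a refinement step such as the guided filtering the paper proposes is needed to sharpen large-scale contours.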
Playing for Data: Ground Truth from Computer Games
Recent progress in computer vision has been driven by high-capacity models
trained on large datasets. Unfortunately, creating large datasets with
pixel-level labels has been extremely costly due to the amount of human effort
required. In this paper, we present an approach to rapidly creating
pixel-accurate semantic label maps for images extracted from modern computer
games. Although the source code and the internal operation of commercial games
are inaccessible, we show that associations between image patches can be
reconstructed from the communication between the game and the graphics
hardware. This enables rapid propagation of semantic labels within and across
images synthesized by the game, with no access to the source code or the
content. We validate the presented approach by producing dense pixel-level
semantic annotations for 25 thousand images synthesized by a photorealistic
open-world computer game. Experiments on semantic segmentation datasets show
that using the acquired data to supplement real-world images significantly
increases accuracy and that the acquired data enables reducing the amount of
hand-labeled real-world data: models trained with game data and just 1/3 of the
CamVid training set outperform models trained on the complete CamVid training
set.
Comment: Accepted to the 14th European Conference on Computer Vision (ECCV 2016)
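The data-reduction protocol in the last sentence can be sketched schematically: sample a third of the real training set and pool it with the synthetic frames. A hedged sketch (the identifiers and the ~367-image CamVid train split size are assumptions for illustration, not taken from the abstract):

```python
import random

def mixed_training_set(real_images, game_images, real_fraction=1 / 3, seed=0):
    """Combine synthetic game frames with a random subset of real frames.

    Mirrors the protocol of training on game data plus a fraction of the
    real (e.g. CamVid) training set; inputs are placeholder identifiers.
    """
    rng = random.Random(seed)
    k = round(len(real_images) * real_fraction)
    real_subset = rng.sample(real_images, k)
    return real_subset + list(game_images)

real = [f"camvid_{i:03d}" for i in range(367)]    # assumed CamVid train size
game = [f"game_{i:05d}" for i in range(25000)]    # 25k synthetic frames
train = mixed_training_set(real, game)
```

The paper's claim is then that a model trained on `train` outperforms one trained on all of `real` alone, i.e. synthetic labels substitute for roughly two thirds of the hand-labeled data.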