129,638 research outputs found
How do neural networks see depth in single images?
Deep neural networks have lead to a breakthrough in depth estimation from
single images. Recent work often focuses on the accuracy of the depth map,
where an evaluation on a publicly available test set such as the KITTI vision
benchmark is often the main result of the article. While such an evaluation
shows how well neural networks can estimate depth, it does not show how they do
this. To the best of our knowledge, no work currently exists that analyzes what
these networks have learned.
In this work we take the MonoDepth network by Godard et al. and investigate
what visual cues it exploits for depth estimation. We find that the network
ignores the apparent size of known obstacles in favor of their vertical
position in the image. Using the vertical position requires the camera pose to
be known; however we find that MonoDepth only partially corrects for changes in
camera pitch and roll and that these influence the estimated depth towards
obstacles. We further show that MonoDepth's use of the vertical image position
allows it to estimate the distance towards arbitrary obstacles, even those not
appearing in the training set, but that it requires a strong edge at the ground
contact point of the object to do so. In future work we will investigate
whether these observations also apply to other neural networks for monocular
depth estimation.Comment: Submitte
A Generative Model For Zero Shot Learning Using Conditional Variational Autoencoders
Zero shot learning in Image Classification refers to the setting where images
from some novel classes are absent in the training data but other information
such as natural language descriptions or attribute vectors of the classes are
available. This setting is important in the real world since one may not be
able to obtain images of all the possible classes at training. While previous
approaches have tried to model the relationship between the class attribute
space and the image space via some kind of a transfer function in order to
model the image space correspondingly to an unseen class, we take a different
approach and try to generate the samples from the given attributes, using a
conditional variational autoencoder, and use the generated samples for
classification of the unseen classes. By extensive testing on four benchmark
datasets, we show that our model outperforms the state of the art, particularly
in the more realistic generalized setting, where the training classes can also
appear at the test time along with the novel classes
- …