Improved deep depth estimation for environments with sparse visual cues
Most deep learning-based depth estimation models that learn scene structure self-supervised from monocular video base their estimates on visual cues such as vanishing points. In the established depth estimation benchmarks depicting, for example, street navigation or indoor offices, these cues appear consistently, which enables neural networks to predict depth maps from single images. In this work, we address the challenge of depth estimation from a real-world bird's-eye perspective in an industrial environment which, owing to its special geometry, contains minimal visual cues and hence requires incorporating the temporal domain for structure-from-motion estimation. To enable the system to infer structure from motion from pixel translation when facing context-sparse, i.e., visual-cue-sparse, scenery, we propose a novel architecture built upon the structure-from-motion learner, which uses temporal pairs of jointly unrotated and stacked images for depth prediction. To increase overall performance and to avoid blurred depth edges that lie between the edges of the two input images, we integrate a geometric consistency loss into our pipeline. We assess the model's ability to learn structure from motion by introducing a novel industry dataset whose perspective, orthogonal to the floor, contains only minimal visual cues. Through evaluation against ground truth depth, we show that our proposed method outperforms the state of the art in difficult context-sparse environments.

Peer reviewed
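The geometric consistency term mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the integer-pixel shift is a hypothetical simplification of the warping between the two unrotated input views, and the function name and normalization are assumptions, not the paper's implementation.

```python
import numpy as np

def geometric_consistency_loss(depth_a, depth_b, shift):
    """Toy geometric-consistency term between two depth maps.

    Shifts depth_a by an (assumed) integer pixel translation and compares the
    overlapping region against depth_b with a normalized absolute difference,
    a common form for geometric-consistency losses.
    """
    dy, dx = shift
    h, w = depth_a.shape
    # Crop both maps to their overlap after shifting depth_a by (dy, dx).
    a = depth_a[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
    b = depth_b[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return np.mean(np.abs(a - b) / (a + b + 1e-8))
```

In a training pipeline such a term would be averaged over predicted depth pairs and added to the photometric loss; here it only illustrates the consistency idea.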
Vanishing Point Detection with Direct and Transposed Fast Hough Transform inside the neural network
In this paper, we suggest a new neural network architecture for vanishing
point detection in images. The key element is the use of the direct and
transposed Fast Hough Transforms separated by convolutional layer blocks with
standard activation functions. This design yields the network's output in the
coordinate frame of the input image, so the coordinates of the vanishing point
can be obtained by simply selecting the maximum. Moreover, we prove that the
transposed Fast Hough Transform can be computed using the direct one. The use of integral operators
enables the neural network to rely on global rectilinear features in the image,
and so it is ideal for detecting vanishing points. To demonstrate the
effectiveness of the proposed architecture, we use a set of images from a DVR
and show its superiority over existing methods. Note, in addition, that the
proposed neural network architecture essentially repeats the process of direct
and back projection used, for example, in computed tomography.

Comment: 9 pages, 9 figures, submitted to "Computer Optics"; extra experiment added, new theorem proof added, references added; typos corrected.
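The direct/transposed pair can be illustrated with a naive line-sum transform. The functions below are toy stand-ins assumed for illustration, not the paper's Fast Hough Transform: the direct operator sums image values along discretized lines, the transposed operator spreads accumulator values back along the same lines (back-projection), and the two satisfy the adjoint identity <Hx, y> = <x, H^T y>.

```python
import numpy as np

def hough_lines(img, slopes):
    """Naive direct transform: sum img along lines y = y0 + round(s*x/(w-1))."""
    h, w = img.shape
    out = np.zeros((len(slopes), h))
    for si, s in enumerate(slopes):
        for y0 in range(h):
            for x in range(w):
                y = y0 + int(round(s * x / max(w - 1, 1)))
                if 0 <= y < h:
                    out[si, y0] += img[y, x]
    return out

def hough_lines_T(acc, shape, slopes):
    """Transposed transform: back-project accumulator values along the same lines."""
    h, w = shape
    img = np.zeros(shape)
    for si, s in enumerate(slopes):
        for y0 in range(h):
            for x in range(w):
                y = y0 + int(round(s * x / max(w - 1, 1)))
                if 0 <= y < h:
                    img[y, x] += acc[si, y0]
    return img
```

As in the paper's setup, a dominant line produces a peak in the accumulator, so its parameters can be read off with a simple argmax, e.g. `np.unravel_index(np.argmax(acc), acc.shape)`.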
ECO: Egocentric Cognitive Mapping
We present a new method to localize a camera within a previously unseen
environment perceived from an egocentric point of view. Although this is, in
general, an ill-posed problem, humans can effortlessly and efficiently
determine their relative location and orientation and navigate in a
previously unseen environment, e.g., finding a specific item in a new grocery
store. To enable such a capability, we design a new egocentric representation,
which we call ECO (Egocentric COgnitive map). ECO is biologically inspired by
the cognitive map that supports human navigation, and it encodes the surrounding
visual semantics with respect to both distance and orientation. ECO possesses
three main properties: (1) reconfigurability: complex semantics and geometry are
captured via the synthesis of atomic visual representations (e.g., image
patch); (2) robustness: the visual semantics are registered in a geometrically
consistent way (e.g., aligning with respect to the gravity vector,
frontalizing, and rescaling to canonical depth), thus enabling us to learn
meaningful atomic representations; (3) adaptability: a domain adaptation
framework is designed to generalize the learned representation without manual
calibration. As a proof-of-concept, we use ECO to localize a camera within
real-world scenes---various grocery stores---and demonstrate performance
improvements when compared to existing semantic localization approaches.
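The canonical-depth rescaling used to register patches (property 2) can be sketched as follows. This is an illustrative simplification under an assumed pinhole model, where apparent size scales as 1/depth; the gravity alignment and frontalization steps the abstract also mentions are omitted, and the nearest-neighbor resize is a stand-in for whatever interpolation the actual pipeline uses.

```python
import numpy as np

def rescale_to_canonical_depth(patch, depth, canonical_depth):
    """Resize a patch observed at `depth` to how it would appear at canonical_depth.

    Pinhole model: apparent size is proportional to 1/depth, so the scale
    factor is depth / canonical_depth. Uses nearest-neighbor sampling.
    """
    scale = depth / canonical_depth
    h, w = patch.shape[:2]
    nh = max(int(round(h * scale)), 1)
    nw = max(int(round(w * scale)), 1)
    ys = np.arange(nh) * h // nh  # nearest-neighbor row indices
    xs = np.arange(nw) * w // nw  # nearest-neighbor column indices
    return patch[ys][:, xs]
```

Registering every patch at a common canonical depth is what makes the atomic representations comparable across viewpoints, which is the point of ECO's geometric consistency.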