Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video
We introduce a convolutional neural network model for unsupervised learning
of depth and ego-motion from cylindrical panoramic video. Panoramic depth
estimation is an important technology for applications such as virtual reality,
3D modeling, and autonomous robotic navigation. In contrast to previous
approaches for applying convolutional neural networks to panoramic imagery, we
use the cylindrical panoramic projection, which allows for the use of
traditional CNN layers such as convolutional filters and max pooling without
modification. Our evaluation on synthetic and real data shows that unsupervised
learning of depth and ego-motion on cylindrical panoramic images can produce
high-quality depth maps and that an increased field-of-view improves ego-motion
estimation accuracy. We also introduce Headcam, a novel dataset of panoramic
video collected from a helmet-mounted camera while biking in an urban setting.
Comment: Accepted to IEEE AIVR 2019
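One practical consequence of the cylindrical projection (our reading; the abstract does not spell out the implementation) is that the panorama is continuous across its left/right border, so standard convolutions need only wrap-around padding along the width axis. A minimal PyTorch sketch of such a layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CylindricalConv2d(nn.Module):
    """Conv2d with horizontal wrap-around padding for cylindrical panoramas.

    A cylindrical panorama is continuous across its left/right border, so we
    pad circularly along the width axis and with zeros along the height axis
    before applying an otherwise unmodified convolution.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=0)

    def forward(self, x):
        # Wrap around the panorama seam horizontally, zero-pad top/bottom.
        x = F.pad(x, (self.pad, self.pad, 0, 0), mode="circular")
        x = F.pad(x, (0, 0, self.pad, self.pad), mode="constant")
        return self.conv(x)

# Example: a batch of 256x1024 cylindrical panoramas.
out = CylindricalConv2d(3, 32)(torch.randn(1, 3, 256, 1024))
print(out.shape)  # torch.Size([1, 32, 256, 1024])
```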
Neural Illumination: Lighting Prediction for Indoor Environments
This paper addresses the task of estimating the light arriving from all
directions to a 3D point observed at a selected pixel in an RGB image. This
task is challenging because it requires predicting a mapping from a partial
scene observation by a camera to a complete illumination map for a selected
position, which depends on the 3D location of the selection, the distribution
of unobserved light sources, the occlusions caused by scene geometry, etc.
Previous methods attempt to learn this complex mapping directly using a single
black-box neural network, which often fails to estimate high-frequency lighting
details for scenes with complicated 3D geometry. Instead, we propose "Neural
Illumination", a new approach that decomposes illumination prediction into
several simpler differentiable sub-tasks: 1) geometry estimation, 2) scene
completion, and 3) LDR-to-HDR estimation. The advantage of this approach is
that the sub-tasks are relatively easy to learn and can be trained with direct
supervision, while the whole pipeline is fully differentiable and can be
fine-tuned with end-to-end supervision. Experiments show that our approach
performs significantly better quantitatively and qualitatively than prior work.
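As a hedged illustration of the decomposition, the sketch below chains three stand-in image-to-image modules for the sub-tasks; the module definitions and the warp_to_env_map helper are hypothetical simplifications, not the paper's architecture. Each stage can be supervised directly, and gradients flow through the whole chain for end-to-end fine-tuning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_to_env_map(rgb, geom, size=(64, 128)):
    # Hypothetical stand-in: the real step reprojects observed pixels onto a
    # spherical environment map at the query point using the estimated
    # geometry; here we only resample to the environment-map resolution.
    return F.interpolate(rgb, size=size, mode="bilinear", align_corners=False)

class Stage(nn.Module):
    """Minimal image-to-image stand-in for one sub-network."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class NeuralIlluminationPipeline(nn.Module):
    def __init__(self):
        super().__init__()
        self.geometry = Stage(3, 1)    # 1) per-pixel geometry estimation
        self.completion = Stage(3, 3)  # 2) complete the partial LDR env map
        self.ldr2hdr = Stage(3, 3)     # 3) LDR-to-HDR estimation

    def forward(self, rgb):
        geom = self.geometry(rgb)             # trainable with direct supervision
        partial = warp_to_env_map(rgb, geom)  # reproject the observation
        ldr_env = self.completion(partial)
        return self.ldr2hdr(ldr_env)          # differentiable end to end

hdr = NeuralIlluminationPipeline()(torch.randn(1, 3, 240, 320))
print(hdr.shape)  # torch.Size([1, 3, 64, 128])
```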
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
360° cameras can capture complete environments in a single shot, which
makes 360° imagery alluring in many computer vision tasks. However,
monocular depth estimation remains a challenge for 360° data, particularly
for high resolutions like 2K (2048×1024) and beyond that are important for
novel-view synthesis and virtual reality applications. Current CNN-based
methods do not support such high resolutions due to limited GPU memory. In this
work, we propose a flexible framework for monocular depth estimation from
high-resolution 360° images using tangent images. We project the 360°
input image onto a set of tangent planes that produce perspective views, which
are well suited to state-of-the-art perspective
monocular depth estimators. To achieve globally consistent disparity estimates,
we recombine the individual depth estimates using deformable multi-scale
alignment followed by gradient-domain blending. The result is a dense,
high-resolution 360° depth map with a high level of detail, including for
outdoor scenes, which existing methods do not support. Our source code and
data are available at https://manurare.github.io/360monodepth/.
Comment: CVPR 2022. Project page: https://manurare.github.io/360monodepth
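The tangent-image step can be made concrete: cast pinhole-camera rays for a chosen view direction and sample the equirectangular panorama along them. The NumPy sketch below is our own simplification (function name, axis conventions, and nearest-neighbour sampling are assumptions, not the paper's code); the full pipeline renders a set of such views, runs a perspective depth estimator on each, and then aligns and blends the per-view depths.

```python
import numpy as np

def tangent_view(equi, lat0, lon0, fov_deg=80.0, size=256):
    """Render one perspective (tangent-plane) view from an equirectangular
    panorama (H x W x C), centred on viewing direction (lat0, lon0) in
    radians. Latitude increases toward the bottom image row in this sketch."""
    H, W = equi.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    u, v = np.meshgrid(np.arange(size) - size / 2 + 0.5,
                       np.arange(size) - size / 2 + 0.5)
    rays = np.stack([u / f, v / f, np.ones_like(u)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays so the optical axis points at (lat0, lon0): pitch, then yaw.
    cp, sp = np.cos(lat0), np.sin(lat0)
    cy, sy = np.cos(lon0), np.sin(lon0)
    R_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    R_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    d = rays @ (R_yaw @ R_pitch).T

    # Convert directions to spherical coordinates and sample the panorama.
    lon = np.arctan2(d[..., 0], d[..., 2])          # in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # in [-pi/2, pi/2]
    px = ((lon / np.pi + 1) / 2 * W).astype(int) % W
    py = np.clip(((lat / (np.pi / 2) + 1) / 2 * H).astype(int), 0, H - 1)
    return equi[py, px]                             # nearest-neighbour sample

view = tangent_view(np.random.rand(512, 1024, 3), lat0=0.0, lon0=np.pi / 4)
print(view.shape)  # (256, 256, 3)
```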
Neural Contourlet Network for Monocular 360 Depth Estimation
For a monocular 360 image, depth estimation is challenging because the
distortion increases with latitude. To handle the distortion, existing
methods focus on designing deep and complex network architectures. In this
paper, we provide a new perspective that constructs an interpretable and sparse
representation for a 360 image. Considering the importance of the geometric
structure in depth estimation, we utilize the contourlet transform to capture
an explicit geometric cue in the spectral domain and integrate it with an
implicit cue in the spatial domain. Specifically, we propose a neural
contourlet network consisting of a convolutional neural network and a
contourlet transform branch. In the encoder stage, we design a spatial-spectral
fusion module to effectively fuse the two types of cues. In the decoder, by
contrast, we employ the inverse contourlet transform with learned low-pass
subbands and band-pass directional subbands to compose the depth map. Experiments
on three popular panoramic image datasets demonstrate that the proposed
approach outperforms the state-of-the-art schemes with faster convergence. Code
is available at
https://github.com/zhijieshen-bjtu/Neural-Contourlet-Network-for-MODE.
Comment: IEEE Transactions on Circuits and Systems for Video Technology
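For background on the transform itself: a contourlet transform pairs a Laplacian pyramid (a multiscale split into one low-pass subband plus band-pass subbands) with a directional filter bank applied to each band-pass level. The SciPy sketch below implements only the pyramid half, as an illustration of the subband structure the decoder inverts; the directional filtering and its integration into the network are beyond this sketch.

```python
import numpy as np
from scipy import ndimage

def laplacian_pyramid(img, levels=3):
    """Split an image into one low-pass subband plus per-scale band-pass
    subbands; a full contourlet transform would further pass each band-pass
    subband through a directional filter bank to obtain the directional
    (geometric) subbands."""
    bandpass = []
    current = img.astype(float)
    for _ in range(levels):
        low = ndimage.gaussian_filter(current, sigma=1.0)
        down = low[::2, ::2]  # decimate the low-pass image
        up = ndimage.zoom(down, 2.0, order=1)[: current.shape[0], : current.shape[1]]
        bandpass.append(current - up)  # band-pass detail at this scale
        current = down
    return current, bandpass  # low-pass subband, band-pass subbands

low, bands = laplacian_pyramid(np.random.rand(256, 512))
print(low.shape, [b.shape for b in bands])
# (32, 64) [(256, 512), (128, 256), (64, 128)]
```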