FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image
In this work, we introduce the novel problem of identifying dense canonical
3D coordinate frames from a single RGB image. We observe that each pixel in an
image corresponds to a surface in the underlying 3D geometry, where a canonical
frame can be identified as represented by three orthogonal axes, one along its
normal direction and two in its tangent plane. We propose an algorithm to
predict these axes from RGB. Our first insight is that canonical frames
computed automatically with recently introduced direction field synthesis
methods can provide training data for the task. Our second insight is that
networks designed for surface normal prediction provide better results when
trained jointly to predict canonical frames, and even better when trained to
also predict 2D projections of canonical frames. We conjecture this is because
projections of canonical tangent directions often align with local gradients in
images, and because those directions are tightly linked to 3D canonical frames
through projective geometry and orthogonality constraints. In our experiments,
we find that our method predicts 3D canonical frames that can be used in
applications such as surface normal estimation, feature matching, and
augmented reality.
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
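As a quick sanity check on the dataset figures quoted in the abstract, the ratios between them can be computed directly. This is only arithmetic on the stated counts; the variable names are illustrative, and the per-scene figure is an average, not a claim that every scene has the same number of views.

```python
# Counts as stated in the abstract.
panoramic_views = 10_800
rgbd_images = 194_400
scenes = 90

# RGB-D images that make up one panoramic view.
images_per_view = rgbd_images // panoramic_views

# Average number of panoramic views per building-scale scene.
avg_views_per_scene = panoramic_views // scenes

print(images_per_view, avg_views_per_scene)
```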
High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference
We propose a data-driven method for recovering missing parts of 3D shapes.
Our method is based on a new deep learning architecture consisting of two
sub-networks: a global structure inference network and a local geometry
refinement network. The global structure inference network incorporates a long
short-term memorized context fusion module (LSTM-CF) that infers the global
structure of the shape based on multi-view depth information provided as part
of the input. It also includes a 3D fully convolutional (3DFCN) module that
further enriches the global structure representation according to volumetric
information in the input. Under the guidance of the global structure network,
the local geometry refinement network takes as input local 3D patches around
missing regions, and progressively produces a high-resolution, complete surface
through a volumetric encoder-decoder architecture. Our method jointly trains
the global structure inference and local geometry refinement networks in an
end-to-end manner. We perform qualitative and quantitative evaluations on six
object categories, demonstrating that our method outperforms existing
state-of-the-art work on shape completion.
Comment: 8 pages paper, 11 pages supplementary material, ICCV spotlight pape