6,064 research outputs found
ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
We introduce ScanComplete, a novel data-driven approach for taking an
incomplete 3D scan of a scene as input and predicting a complete 3D model along
with per-voxel semantic labels. The key contribution of our method is its
ability to handle large scenes with varying spatial extent, managing the cubic
growth in data size as scene size increases. To this end, we devise a
fully-convolutional generative 3D CNN model whose filter kernels are invariant
to the overall scene size. The model can be trained on scene subvolumes but
deployed on arbitrarily large scenes at test time. In addition, we propose a
coarse-to-fine inference strategy in order to produce high-resolution output
while also leveraging large input context sizes. In an extensive series of
experiments, we carefully evaluate different model design choices, considering
both deterministic and probabilistic models for completion and semantic
inference. Our results show that we outperform other methods not only in the
size of the environments handled and processing efficiency, but also with
regard to completion quality and semantic segmentation performance by a
significant margin.Comment: Video: https://youtu.be/5s5s8iH0NF
Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
We present Im2Pano3D, a convolutional neural network that generates a dense
prediction of 3D structure and a probability distribution of semantic labels
for a full 360 panoramic view of an indoor scene when given only a partial
observation (<= 50%) in the form of an RGB-D image. To make this possible,
Im2Pano3D leverages strong contextual priors learned from large-scale synthetic
and real-world indoor scenes. To ease the prediction of 3D structure, we
propose to parameterize 3D surfaces with their plane equations and train the
model to predict these parameters directly. To provide meaningful training
supervision, we use multiple loss functions that consider both pixel level
accuracy and global context consistency. Experiments demon- strate that
Im2Pano3D is able to predict the semantics and 3D structure of the unobserved
scene with more than 56% pixel accuracy and less than 0.52m average distance
error, which is significantly better than alternative approaches.Comment: Video summary: https://youtu.be/Au3GmktK-S
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification
Learning Less is More - 6D Camera Localization via 3D Surface Regression
Popular research areas like autonomous driving and augmented reality have
renewed the interest in image-based camera localization. In this work, we
address the task of predicting the 6D camera pose from a single RGB image in a
given 3D environment. With the advent of neural networks, previous works have
either learned the entire camera localization process, or multiple components
of a camera localization pipeline. Our key contribution is to demonstrate and
explain that learning a single component of this pipeline is sufficient. This
component is a fully convolutional neural network for densely regressing
so-called scene coordinates, defining the correspondence between the input
image and the 3D scene space. The neural network is prepended to a new
end-to-end trainable pipeline. Our system is efficient, highly accurate, robust
in training, and exhibits outstanding generalization capabilities. It exceeds
state-of-the-art consistently on indoor and outdoor datasets. Interestingly,
our approach surpasses existing techniques even without utilizing a 3D model of
the scene during training, since the network is able to discover 3D scene
geometry automatically, solely from single-view constraints.Comment: CVPR 201
- …