Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
ObjectMatch: Robust Registration using Canonical Object Correspondences
We present ObjectMatch, a semantic and object-centric camera pose estimator
for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct
correspondences of overlapping regions between frames; however, they cannot
align camera frames with little or no overlap. In this work, we propose to
leverage indirect correspondences obtained via semantic object identification.
For instance, when an object is seen from the front in one frame and from the
back in another frame, we can provide additional pose constraints through
canonical object correspondences. We first propose a neural network to predict
such correspondences on a per-pixel level, which we then combine in our energy
formulation with state-of-the-art keypoint matching solved with a joint
Gauss-Newton optimization. In a pairwise setting, our method improves
registration recall over state-of-the-art feature matching, notably from 24% to
45% in pairs with 10% or less inter-frame overlap. In registering RGB-D
sequences, our method outperforms cutting-edge SLAM baselines in challenging,
low-frame-rate scenarios, achieving more than 35% reduction in trajectory error
in multiple scenes.
Comment: Project Page: http://cangumeli.github.io/ObjectMatch Video:
https://www.youtube.com/watch?v=kuXoKVrzUR
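The core geometric idea, that correspondences into a shared canonical object frame constrain the relative camera pose even when two frames do not overlap at all, can be sketched independently of the paper's neural per-pixel predictions and joint Gauss-Newton energy. The toy sketch below (an illustration only, not the authors' pipeline) recovers each camera's pose with respect to the object by a least-squares Kabsch fit to canonical coordinates, then composes the two poses:

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Reflection correction keeps R a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = Q.mean(0) - R @ P.mean(0)
    return R, t

def relative_pose_via_canonical(pts_a, canon_a, pts_b, canon_b):
    """Relative pose A -> B through a shared canonical object frame.

    pts_a / pts_b are 3D points in each camera's frame; canon_a / canon_b
    are their predicted canonical (object-frame) coordinates. The two
    point sets need not share a single common point.
    """
    R_a, t_a = kabsch(pts_a, canon_a)  # camera A -> object
    R_b, t_b = kabsch(pts_b, canon_b)  # camera B -> object
    # A -> B = (B -> object)^{-1} composed with (A -> object).
    R_ab = R_b.T @ R_a
    t_ab = R_b.T @ (t_a - t_b)
    return R_ab, t_ab
```

In the paper, the canonical correspondences are noisy per-pixel network predictions, so they enter as residuals in an energy minimized jointly with keypoint matches rather than a closed-form fit; the sketch only shows why zero inter-frame overlap is not a fundamental obstacle.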
DAC: Detector-Agnostic Spatial Covariances for Deep Local Features
Current deep visual local feature detectors do not model the spatial
uncertainty of detected features, producing suboptimal results in downstream
applications. In this work, we propose two post-hoc covariance estimates that
can be plugged into any pretrained deep feature detector: a simple, isotropic
covariance estimate that uses the predicted score at a given pixel location,
and a full covariance estimate via the local structure tensor of the learned
score maps. Both methods are easy to implement and can be applied to any deep
feature detector. We show that these covariances are directly related to errors
in feature matching, leading to improvements in downstream tasks, including
solving the perspective-n-point problem and motion-only bundle adjustment. Code
is available at https://github.com/javrtg/DA
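Both estimates can be illustrated with a small NumPy sketch (an approximation reconstructed from the abstract; the paper's exact scaling and normalization may differ). The isotropic variant ties positional variance to the detector score at the keypoint, while the full variant inverts the structure tensor of the score map, which measures how sharply the score is localized around the detection:

```python
import numpy as np

def isotropic_covariance(score_map, kp, eps=1e-8):
    """Isotropic 2x2 covariance: variance inversely proportional to the
    detector score at the keypoint (a simple assumed scaling)."""
    y, x = kp
    var = 1.0 / max(score_map[y, x], eps)
    return var * np.eye(2)

def structure_tensor_covariance(score_map, kp, window=3, eps=1e-8):
    """Full 2x2 covariance from the local structure tensor of the score map."""
    gy, gx = np.gradient(score_map.astype(np.float64))
    y, x = kp
    h = window // 2
    Gy = gy[y - h:y + h + 1, x - h:x + h + 1].ravel()
    Gx = gx[y - h:y + h + 1, x - h:x + h + 1].ravel()
    # Structure tensor: sum of outer products of local score gradients.
    T = np.array([[np.sum(Gx * Gx), np.sum(Gx * Gy)],
                  [np.sum(Gx * Gy), np.sum(Gy * Gy)]])
    # Regularized inverse: sharp, well-localized peaks (large gradients)
    # yield small positional uncertainty.
    return np.linalg.inv(T + eps * np.eye(2))
```

Intuitively, a sharp isotropic score peak has large gradients in every direction, giving a small, round covariance, while a ridge-like response gives a covariance elongated along the ridge, exactly the directional uncertainty that downstream pose solvers can exploit.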
DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data
Applying data-driven approaches to non-rigid 3D reconstruction has been difficult, which we believe can be attributed to the lack of a large-scale training corpus. One recent approach proposes self-supervision based on non-rigid reconstruction. Unfortunately, this method fails for important cases such as highly non-rigid deformations. We first address this lack of data by introducing a novel semi-supervised strategy to obtain dense inter-frame correspondences from a sparse set of annotations. This way, we obtain a large dataset of 400 scenes, over 390,000 RGB-D frames, and 2,537 densely aligned frame pairs; in addition, we provide a test set along with several metrics for evaluation. Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline. Here, we propose a new neural network that operates on RGB-D frames, while maintaining robustness under large non-rigid deformations and producing accurate predictions. Our approach significantly outperforms both existing non-rigid reconstruction methods that do not use learned data terms and learning-based approaches that only use self-supervision.