    Fine-To-Coarse Global Registration of RGB-D Scans

    RGB-D scanning of indoor environments is important for many applications, including real estate, interior design, and virtual reality. However, it is still challenging to register RGB-D images from a hand-held camera over a long video sequence into a globally consistent 3D model. Current methods can lose tracking or drift, and thus fail to reconstruct salient structures in large environments (e.g., parallel walls in different rooms). To address this problem, we propose a "fine-to-coarse" global registration algorithm that leverages robust registrations at finer scales to seed the detection and enforcement of new correspondence and structural constraints at coarser scales. To test global registration algorithms, we provide a benchmark with 10,401 manually-clicked point correspondences in 25 scenes from the SUN3D dataset. In experiments with this benchmark, we find that our fine-to-coarse algorithm registers long RGB-D sequences better than previous methods.
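
    To make the fine-to-coarse idea concrete, below is a minimal, self-contained sketch of the principle on a toy 1D trajectory: fine-scale odometry constraints fix relative motion but accumulate drift, while a single coarser structural constraint (standing in for, e.g., a detected shared wall) pins the global shape. All names and values are illustrative assumptions, not the paper's implementation.

    import numpy as np

    n = 6                                # number of camera poses (1D positions)
    odom = np.full(n - 1, 1.05)          # drifting per-step motion (true step = 1.0)

    rows, rhs = [], []
    for i in range(n - 1):               # fine scale: x[i+1] - x[i] = odom[i]
        r = np.zeros(n)
        r[i + 1], r[i] = 1.0, -1.0
        rows.append(r)
        rhs.append(odom[i])

    anchor = np.zeros(n)                 # gauge constraint: first pose at 0
    anchor[0] = 1.0
    rows.append(10.0 * anchor)
    rhs.append(0.0)

    # Coarse scale: a structural constraint ties the last pose to a globally
    # consistent position, the way a shared wall ties distant rooms together.
    wall = np.zeros(n)
    wall[-1] = 1.0
    rows.append(10.0 * wall)
    rhs.append(10.0 * (n - 1))

    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    print(np.round(x, 3))                # drift is redistributed across all steps

    In the algorithm itself, the analogous coarse-scale constraints are structures (e.g., planes) detected from already-reliable finer-scale registrations, and the optimization is over full camera poses rather than 1D positions.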

    Rescan: Inductive Instance Segmentation for Indoor RGBD Scans

    In depth-sensing applications ranging from home robotics to AR/VR, it will be common to acquire 3D scans of interior spaces repeatedly at sparse time intervals (e.g., as part of regular daily use). We propose an algorithm that analyzes these "rescans" to infer a temporal model of a scene with semantic instance information. Our algorithm operates inductively, using the temporal model resulting from past observations to infer an instance segmentation of a new scan, which is then used to update the temporal model. The model contains object instance associations across time and thus can be used to track individual objects, even though observations are sparse. In experiments with a new benchmark for this task, our algorithm outperforms alternative approaches based on state-of-the-art networks for semantic instance segmentation. Comment: IEEE International Conference on Computer Vision 2019.
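
    The inductive loop lends itself to a compact sketch: each new scan is explained using the temporal model built from past observations, and the model is then updated in place. The toy below matches labeled 2D object centroids by proximity; the names, distance test, and threshold are illustrative stand-ins, not the paper's actual association method.

    import math

    model = {}                           # instance id -> (label, position)
    next_id = 0

    def process_rescan(detections, max_dist=0.5):
        """Segment a new scan: associate each (label, (x, y)) detection
        with an existing instance, or start a new one."""
        global next_id
        assignments = []
        for label, pos in detections:
            # Inductive step: prefer explaining the observation with an
            # instance already present in the temporal model.
            best = min(
                (iid for iid, (lab, _) in model.items() if lab == label),
                key=lambda iid: math.dist(model[iid][1], pos),
                default=None,
            )
            if best is None or math.dist(model[best][1], pos) > max_dist:
                best, next_id = next_id, next_id + 1    # new object instance
            model[best] = (label, pos)                  # update temporal model
            assignments.append((best, label))
        return assignments

    print(process_rescan([("chair", (0.0, 0.0)), ("table", (2.0, 1.0))]))
    print(process_rescan([("chair", (0.2, 0.1)), ("chair", (3.0, 3.0))]))

    Because associations persist in the model across calls, the same instance id follows an object through sparse, repeated observations, which is what allows individual objects to be tracked over time.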

    Matterport3D: Learning from RGB-D Data in Indoor Environments

    Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
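
    As one example of a task these annotations enable, view overlap prediction has a simple geometric ground truth: with known poses, depth, and intrinsics, the overlap between two views can be computed by reprojection. The sketch below uses synthetic intrinsics, poses, and depth to illustrate the geometry; it is not the dataset's actual tooling.

    import numpy as np

    def view_overlap(depth_a, K, pose_a, pose_b):
        """Fraction of view A's pixels whose 3D points fall inside view B."""
        h, w = depth_a.shape
        v, u = np.mgrid[0:h, 0:w]
        pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
        pts_a = (np.linalg.inv(K) @ pix) * depth_a.reshape(1, -1)   # camera A
        pts_w = pose_a[:3, :3] @ pts_a + pose_a[:3, 3:4]            # world
        T = np.linalg.inv(pose_b)                                   # world -> B
        pts_b = T[:3, :3] @ pts_w + T[:3, 3:4]
        z = pts_b[2]
        proj = K @ (pts_b / np.where(z > 0, z, np.inf))             # pinhole projection
        inside = ((z > 0) & (proj[0] >= 0) & (proj[0] < w)
                  & (proj[1] >= 0) & (proj[1] < h))
        return inside.mean()

    K = np.array([[100.0, 0, 64], [0, 100.0, 48], [0, 0, 1]])       # toy intrinsics
    pose_a, pose_b = np.eye(4), np.eye(4)
    pose_b[0, 3] = 0.3                   # camera B shifted 0.3 m sideways
    depth = np.full((96, 128), 2.0)      # flat scene 2 m away
    print(round(view_overlap(depth, K, pose_a, pose_b), 3))         # ~0.88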

    RGBD Pipeline for Indoor Scene Reconstruction and Understanding

    In this work, we consider the problem of reconstructing a 3D model from a sequence of color and depth frames. Generating such a model has many important applications, ranging from the entertainment industry to real estate. However, transforming the RGBD frames into high-quality 3D models is a challenging problem, especially if additional semantic information is required. In this document, we introduce three projects, which implement various stages of a robust RGBD processing pipeline. First, we consider the challenges arising during the RGBD data capture process. While depth cameras provide dense, per-pixel depth measurements, the resulting data carries non-trivial error. We discuss the depth generation problem and propose an error reduction technique based on estimating an image-space undistortion field. We describe the capture process for the data required to generate such an undistortion field, and we showcase how correcting the depth measurements improves the reconstruction quality. Second, we address the problem of registering RGBD frames over a long video sequence into a globally consistent 3D model. We propose a "fine-to-coarse" global registration algorithm that leverages robust registrations at finer scales to seed the detection and enforcement of geometrical constraints, modeled as planar structures, at coarser scales. To test global registration algorithms, we provide a benchmark with 10,401 manually-clicked point correspondences in 25 scenes from the SUN3D dataset. We find that our fine-to-coarse algorithm registers long RGBD sequences better than previous methods. Last, we show how repeated scans of the same space can be used to establish associations between different observations. Specifically, we consider a situation where 3D scans are acquired repeatedly at sparse time intervals. We develop an algorithm that analyzes these "rescans" and builds a temporal model of a scene with semantic instance information. The proposed algorithm operates inductively, using the temporal model resulting from past observations to infer an instance segmentation of a new scan. The temporal model is continuously updated to reflect the changes that occur in the scene over time, providing object associations across time. The algorithm outperforms alternative approaches based on state-of-the-art networks for semantic instance segmentation.
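
    For the first stage, the depth-correction step can be illustrated with a small sketch. The per-pixel multiplicative model and the flat-target calibration below are assumptions made for illustration; the document does not spell out the exact parameterization of the undistortion field.

    import numpy as np

    def calibrate_undistortion(raw_depths, true_depths):
        """Estimate a per-pixel correction field from captures of a known
        target (e.g., a flat wall at measured distances)."""
        ratios = np.stack([t / r for r, t in zip(raw_depths, true_depths)])
        return ratios.mean(axis=0)       # average over calibration frames

    def undistort(raw_depth, field):
        return raw_depth * field         # image-space, per-pixel correction

    # Synthetic example: the sensor overestimates depth toward one corner.
    h, w = 4, 4
    bias = 1.0 + 0.1 * np.fromfunction(lambda y, x: (x + y) / (h + w), (h, w))
    true = np.full((h, w), 2.0)
    raw = true * bias
    field = calibrate_undistortion([raw, raw], [true, true])
    print(np.allclose(undistort(raw, field), true))   # True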

    ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

    A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available: current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. The dataset is freely available at http://www.scan-net.org.
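
    As a small illustration of the representation behind one of these tasks, semantic voxel labeling amounts to mapping labeled 3D points into a voxel grid, sketched below with a majority vote per voxel. The grid size, labels, and points are made-up inputs, not ScanNet's actual format or tooling.

    import numpy as np
    from collections import Counter

    def voxelize_labels(points, labels, voxel_size=0.05):
        """Map labeled 3D points to a sparse {voxel: majority label} dict."""
        keys = np.floor(points / voxel_size).astype(int)
        votes = {}
        for key, lab in zip(map(tuple, keys), labels):
            votes.setdefault(key, Counter())[lab] += 1
        return {k: c.most_common(1)[0][0] for k, c in votes.items()}

    pts = np.array([[0.01, 0.02, 0.00], [0.02, 0.01, 0.01], [0.30, 0.00, 0.00]])
    print(voxelize_labels(pts, ["chair", "chair", "floor"]))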