1,183 research outputs found
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
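Since each panorama is assembled from posed RGB-D images, the basic operation behind the dataset's global alignment is lifting a depth map through its camera pose into world coordinates. A minimal sketch, assuming pinhole intrinsics K and a 4x4 camera-to-world pose matrix (names are illustrative, not taken from the Matterport3D tooling):

```python
import numpy as np

def backproject_to_world(depth, K, cam_to_world):
    """Lift a depth map into a world-space point cloud.

    depth: (H, W) array of metric depths (0 = invalid).
    K: 3x3 pinhole intrinsics; cam_to_world: 4x4 pose matrix.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    valid = z > 0
    # Pixel -> camera coordinates via the inverse pinhole model.
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)[:, valid]
    # Camera -> world using the (assumed) camera-to-world pose.
    pts_world = cam_to_world @ pts_cam
    return pts_world[:3].T  # (N, 3)
```

Applying this to every frame of a scene yields clouds that agree in a common world frame, which is what enables tasks like keypoint matching and view-overlap prediction across views.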
3D indoor scene modeling from RGB-D data: a survey
3D scene modeling has long been a fundamental problem in computer graphics and computer vision. With the popularity of consumer-level RGB-D cameras, there is a growing interest in digitizing real-world indoor 3D scenes. However, modeling indoor 3D scenes remains a challenging problem because of the complex structure of interior objects and the poor quality of RGB-D data acquired by consumer-level sensors. Various methods have been proposed to tackle these challenges. In this survey, we provide an overview of recent advances in indoor scene modeling techniques, as well as public datasets and code libraries which can facilitate experiments and evaluation.
Plane-Based Optimization of Geometry and Texture for RGB-D Reconstruction of Indoor Scenes
We present a novel approach to reconstruct RGB-D indoor scenes with plane
primitives. Our approach takes as input an RGB-D sequence and a dense coarse
mesh reconstructed from that sequence by some 3D reconstruction method, and
generates a lightweight, low-polygon mesh with clear face textures and sharp
features, without losing geometric detail from the original scene. To achieve
this, we first partition the input mesh with plane primitives, then simplify it
into a lightweight mesh, optimize plane parameters, camera poses, and
texture colors to maximize photometric consistency across frames, and
finally optimize the mesh geometry to maximize consistency between geometry and
planes. Compared to existing planar reconstruction methods, which only cover
large planar regions in the scene, our method builds the entire scene from
adaptive planes without losing geometric detail, and preserves sharp features in
the final mesh. We demonstrate the effectiveness of our approach by applying it
to several RGB-D scans and comparing it to other state-of-the-art
reconstruction methods.
Published in International Conference on 3D Vision 2018. Models and code: https://github.com/chaowang15/plane-opt-rgbd
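The photometric-consistency objective in the middle step can be illustrated with residuals of the following shape. This is a simplified sketch (grayscale images, nearest-neighbour pixel lookup, no occlusion reasoning), not the paper's actual optimizer:

```python
import numpy as np

def photometric_residuals(vertices, colors, poses, images, K):
    """Sum-of-squares photometric error: each textured vertex should
    project to pixels of similar intensity in every frame that sees it.

    vertices: (N, 3) world-space points on the simplified mesh.
    colors:   (N,) current texture intensities being optimized.
    poses:    list of 4x4 world-to-camera matrices (one per frame).
    images:   list of (H, W) grayscale frames.
    K:        3x3 pinhole intrinsics.
    """
    residuals = []
    for pose, img in zip(poses, images):
        cam = pose[:3, :3] @ vertices.T + pose[:3, 3:4]  # world -> camera
        z = cam[2]
        uv = (K @ cam)[:2] / z                           # perspective projection
        u, v = np.round(uv).astype(int)
        H, W = img.shape
        vis = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        # Difference between optimized texture color and observed intensity.
        residuals.append(colors[vis] - img[v[vis], u[vis]])
    return np.concatenate(residuals)
```

In a full system, residuals of this form would be fed to a nonlinear least-squares solver that jointly updates plane parameters, poses, and texture colors.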
ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
We introduce ScanComplete, a novel data-driven approach for taking an
incomplete 3D scan of a scene as input and predicting a complete 3D model along
with per-voxel semantic labels. The key contribution of our method is its
ability to handle large scenes with varying spatial extent, managing the cubic
growth in data size as scene size increases. To this end, we devise a
fully-convolutional generative 3D CNN model whose filter kernels are invariant
to the overall scene size. The model can be trained on scene subvolumes but
deployed on arbitrarily large scenes at test time. In addition, we propose a
coarse-to-fine inference strategy in order to produce high-resolution output
while also leveraging large input context sizes. In an extensive series of
experiments, we carefully evaluate different model design choices, considering
both deterministic and probabilistic models for completion and semantic
inference. Our results show that we outperform other methods not only in the
size of the environments handled and processing efficiency, but also with
regard to completion quality and semantic segmentation performance by a
significant margin.
Video: https://youtu.be/5s5s8iH0NF
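The size-invariance property comes from using only convolutional layers: with no fully-connected layer, the filter bank is independent of the volume's extent, so a network trained on fixed-size subvolumes runs unchanged on whole scenes. A toy PyTorch sketch of this idea (illustrative architecture, not the ScanComplete network):

```python
import torch
import torch.nn as nn

class FullyConv3D(nn.Module):
    """Toy fully-convolutional 3D network: because it contains no
    fully-connected layers, it accepts any input volume size, so it
    can be trained on subvolumes and deployed on whole scenes."""

    def __init__(self, in_ch=1, num_classes=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(64, num_classes, kernel_size=1),  # per-voxel logits
        )

    def forward(self, x):
        return self.net(x)

model = FullyConv3D()
small = torch.randn(1, 1, 32, 32, 32)    # training subvolume
large = torch.randn(1, 1, 96, 64, 128)   # arbitrarily sized test scene
assert model(small).shape[2:] == small.shape[2:]
assert model(large).shape[2:] == large.shape[2:]
```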
Evaluation of CNN-based Single-Image Depth Estimation Methods
While interest in deep models for single-image depth estimation is growing,
established schemes for their evaluation are still limited. We propose a set
of novel quality criteria, allowing for a more
detailed analysis by focusing on specific characteristics of depth maps. In
particular, we address the preservation of edges and planar regions, depth
consistency, and absolute distance accuracy. In order to employ these metrics
to evaluate and compare state-of-the-art single-image depth estimation
approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera
together with a laser scanner to acquire high-resolution images and highly
accurate depth maps. Experimental results show the validity of our proposed
evaluation protocol.
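The paper's novel criteria (edge preservation, planarity, depth consistency) are not reproduced here; for orientation, this is how the standard pixelwise depth-accuracy metrics that such protocols typically extend are computed:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common pixelwise depth-accuracy metrics (baseline illustration;
    not the paper's novel edge/planarity criteria).

    pred, gt: (H, W) metric depth maps; gt == 0 marks invalid pixels.
    """
    m = gt > 0
    pred, gt = pred[m], gt[m]
    abs_rel = np.mean(np.abs(pred - gt) / gt)     # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))     # root-mean-square error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                # threshold accuracy
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}
```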
Fast Monte-Carlo Localization on Aerial Vehicles using Approximate Continuous Belief Representations
Size-, weight-, and power-constrained platforms impose limits on
computational resources that introduce unique challenges in implementing
localization algorithms. We present a framework to perform fast localization on
such platforms, enabled by the compressive capabilities of Gaussian Mixture
Model representations of point cloud data. Given raw structural data from a
depth sensor and pitch and roll estimates from an on-board attitude reference
system, a multi-hypothesis particle filter localizes the vehicle by exploiting
the likelihood of the data originating from the mixture model. We demonstrate
analysis of this likelihood in the vicinity of the ground truth pose and detail
its utilization in a particle filter-based vehicle localization strategy, and
later present results of real-time implementations on a desktop system and an
off-the-shelf embedded platform that outperform localization results from
running a state-of-the-art algorithm in the same environment.
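The core scoring step, weighting each pose hypothesis by the likelihood of the scan under the Gaussian mixture map, can be sketched as follows. This simplified version assumes planar (x, y, yaw) poses, diagonal covariances, and a thinned 2D scan, unlike the full 3D system described in the paper:

```python
import numpy as np

def weight_particles(particles, scan_points, gmm_means, gmm_vars, gmm_weights):
    """Score pose hypotheses by the scan's likelihood under a GMM map.

    particles:   (P, 3) poses as (x, y, yaw).
    scan_points: (N, 2) sensor-frame points from the depth scan.
    gmm_means, gmm_vars, gmm_weights: K components with diagonal covariances.
    """
    log_w = np.zeros(len(particles))
    for i, (x, y, yaw) in enumerate(particles):
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s], [s, c]])
        pts = scan_points @ R.T + np.array([x, y])   # sensor -> map frame
        # Log-likelihood of each point under the mixture, summed over the scan.
        ll = np.zeros((len(pts), len(gmm_means)))
        for k, (mu, var, w) in enumerate(zip(gmm_means, gmm_vars, gmm_weights)):
            d2 = np.sum((pts - mu) ** 2 / var, axis=1)
            ll[:, k] = np.log(w) - 0.5 * (d2 + np.sum(np.log(2 * np.pi * var)))
        log_w[i] = np.sum(np.logaddexp.reduce(ll, axis=1))
    # Normalize to particle weights via a stable softmax.
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()
```

The compression pays off here: evaluating K mixture components per point is far cheaper than matching against the raw map cloud, which is what makes the filter viable on embedded hardware.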
Distinctive 3D local deep descriptors
We present a simple yet effective method for learning distinctive 3D
local deep descriptors (DIPs) that can be used to register point clouds without
requiring an initial alignment. Point cloud patches are extracted,
canonicalised with respect to their estimated local reference frame and encoded
into rotation-invariant compact descriptors by a PointNet-based deep neural
network. DIPs can effectively generalise across different sensor modalities
because they are learnt end-to-end from locally and randomly sampled points.
Because DIPs encode only local geometric information, they are robust to
clutter, occlusions and missing regions. We evaluate and compare DIPs against
alternative hand-crafted and deep descriptors on several indoor and outdoor
datasets consisting of point clouds reconstructed using different sensors.
Results show that DIPs (i) achieve comparable results to the state-of-the-art
on RGB-D indoor scenes (3DMatch dataset), (ii) outperform state-of-the-art by a
large margin on laser-scanner outdoor scenes (ETH dataset), and (iii)
generalise to indoor scenes reconstructed with the Visual-SLAM system of
Android ARCore. Source code: https://github.com/fabiopoiesi/dip
Published in IEEE International Conference on Pattern Recognition 2020.
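Once descriptors have been computed for two clouds, registration without an initial alignment reduces to descriptor matching followed by robust pose estimation. A generic mutual-nearest-neighbour matching step (any local descriptor, DIP or otherwise, plugs in; names are illustrative):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Mutual nearest-neighbour correspondences between descriptor sets.

    desc_a: (Na, D), desc_b: (Nb, D) L2-normalised descriptors.
    Returns index pairs (i, j) where a_i and b_j pick each other.
    """
    # Pairwise squared Euclidean distances via the Gram matrix.
    d2 = (np.sum(desc_a**2, 1)[:, None] + np.sum(desc_b**2, 1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    nn_ab = np.argmin(d2, axis=1)          # best b for each a
    nn_ba = np.argmin(d2, axis=0)          # best a for each b
    i = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == i             # keep pairs that agree both ways
    return np.stack([i[mutual], nn_ab[mutual]], axis=1)
```

The resulting correspondences would then feed a RANSAC-style rigid-pose estimator; robustness to clutter and occlusion at this stage is exactly where distinctive local descriptors earn their keep.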
Autonomous 3D mapping and surveillance of mines with MAVs
A dissertation submitted to the Faculty of Science, University of the
Witwatersrand, Johannesburg, for the degree of Master of Science.
12 July 2017.
The mapping of mines, both operational and abandoned, is a long, difficult, and occasionally dangerous task, especially in the latter case. Recent developments in active and passive consumer-grade sensors, as well as quadcopter drones, present the opportunity to automate these challenging tasks, providing cost and safety benefits. The goal of this research is to develop an autonomous vision-based mapping system that employs quadrotor drones to explore and map sections of mine tunnels. The system is equipped with inexpensive structured-light depth cameras in place of traditional laser scanners, making the quadrotor setup more viable to produce in bulk. A modified version of Microsoft's Kinect Fusion algorithm is used to construct 3D point clouds in real time as the agents traverse the scene. Finally, the generated and merged point clouds from the system are compared with those produced by current Lidar scanners.
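Such comparisons against a Lidar reference are commonly done with cloud-to-cloud nearest-neighbour distances. A minimal sketch assuming the two clouds are already expressed in the same frame (the dissertation's exact evaluation protocol may differ):

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_error(recon, reference):
    """Accuracy proxy: distance from each reconstructed point to its
    nearest neighbour in the Lidar reference cloud. Assumes the clouds
    are pre-aligned in a common frame.

    recon, reference: (N, 3) and (M, 3) point arrays in metres.
    """
    tree = cKDTree(reference)
    d, _ = tree.query(recon)  # nearest-reference distance per recon point
    return {"mean_m": d.mean(),
            "rmse_m": np.sqrt((d**2).mean()),
            "p95_m": np.percentile(d, 95)}
```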