Volume-based Semantic Labeling with Signed Distance Functions
Research works on the two topics of Semantic Segmentation and SLAM
(Simultaneous Localization and Mapping) have been following separate tracks.
Here, we link them tightly by presenting a category label fusion
technique that embeds semantic information into the dense map
created by a volume-based SLAM algorithm such as KinectFusion. As a result, our
approach is the first to provide a semantically labeled dense reconstruction of
the environment from a stream of RGB-D images. We validate our proposal on a
publicly available semantically annotated RGB-D dataset by a) employing ground
truth labels, b) corrupting such annotations with synthetic noise, and c) deploying
a state-of-the-art semantic segmentation algorithm based on Convolutional
Neural Networks. Comment: Submitted to PSIVT201
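The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common way to accumulate per-frame class predictions in a voxel: it assumes each observation is a class-probability vector (for example from the CNN) and that observations are independent, so fusing amounts to summing log-probabilities. The function names `fuse_label_observation` and `voxel_label` are hypothetical and not taken from the paper.

```python
import math


def fuse_label_observation(log_scores, class_probs, eps=1e-6):
    """Add one class-probability observation to a voxel's accumulated log scores.

    Assumption (not from the paper): observations are treated as independent,
    so fusion is a sum of log-probabilities; eps guards against log(0).
    """
    if len(log_scores) != len(class_probs):
        raise ValueError("class count mismatch")
    return [s + math.log(max(p, eps)) for s, p in zip(log_scores, class_probs)]


def voxel_label(log_scores):
    """Return the index of the class with the highest accumulated score."""
    return max(range(len(log_scores)), key=log_scores.__getitem__)
```

A single confident but wrong observation can be outvoted by several consistent ones, which is the main benefit of fusing labels over a frame stream instead of labeling each frame in isolation.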
An Octree-Based Approach towards Efficient Variational Range Data Fusion
Volume-based reconstruction is usually expensive both in terms of memory
consumption and runtime. Especially for sparse geometric structures, volumetric
representations produce a huge computational overhead. We present an efficient
way to fuse range data via a variational Octree-based minimization approach by
taking the actual range data geometry into account. We transform the data into
Octree-based truncated signed distance fields and show how the optimization can
be conducted on the newly created structures. The main challenge is to uphold
speed and a low memory footprint without sacrificing the solutions' accuracy
during optimization. We explain how to dynamically adjust the optimizer's
geometric structure via joining/splitting of Octree nodes and how to define the
operators. We evaluate on various datasets and assess the approach's suitability in terms
of performance and geometric accuracy. Comment: BMVC 201
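The split/join idea can be illustrated with a toy octree. This is a sketch under assumptions, not the paper's method: each leaf stores a single TSDF value, `split` subdivides a leaf into eight children that inherit its value, and eight leaf children are joined back into their parent when their values agree within a tolerance (a stand-in for "homogeneous" regions far from the surface). `OctreeNode`, `join_recursive` and the tolerance criterion are all hypothetical.

```python
class OctreeNode:
    """Hypothetical octree node holding one TSDF value per leaf."""

    def __init__(self, value=0.0):
        self.value = value
        self.children = None  # None for a leaf, otherwise a list of 8 nodes

    def is_leaf(self):
        return self.children is None

    def split(self):
        """Subdivide a leaf into 8 children that inherit the parent's value."""
        if self.is_leaf():
            self.children = [OctreeNode(self.value) for _ in range(8)]

    def try_join(self, tol):
        """Collapse 8 leaf children into this node if their values agree within tol."""
        if self.is_leaf() or not all(c.is_leaf() for c in self.children):
            return False
        values = [c.value for c in self.children]
        if max(values) - min(values) > tol:
            return False
        self.value = sum(values) / 8.0  # merged node keeps the mean value
        self.children = None
        return True


def join_recursive(node, tol):
    """Bottom-up pass: join children first so merges can propagate upward."""
    if node.is_leaf():
        return
    for child in node.children:
        join_recursive(child, tol)
    node.try_join(tol)


def count_leaves(node):
    if node.is_leaf():
        return 1
    return sum(count_leaves(c) for c in node.children)
```

Joining homogeneous regions is what keeps the memory footprint low; splitting restores resolution where the optimizer needs detail.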
OctNetFusion: Learning Depth Fusion from Data
In this paper, we present a learning-based approach to depth fusion, i.e.,
dense 3D reconstruction from multiple depth images. The most common approach to
depth fusion is based on averaging truncated signed distance functions, which
was originally proposed by Curless and Levoy in 1996. While this method is
simple and provides great results, it is not able to reconstruct (partially)
occluded surfaces and requires a large number of frames to filter out sensor noise
and outliers. Motivated by the availability of large 3D model repositories and
recent advances in deep learning, we present a novel 3D CNN architecture that
learns to predict an implicit surface representation from the input depth maps.
Our learning-based method significantly outperforms the traditional volumetric
fusion approach in terms of noise reduction and outlier suppression. By
learning the structure of real world 3D objects and scenes, our approach is
further able to reconstruct occluded regions and to fill in gaps in the
reconstruction. We demonstrate that our learning-based approach outperforms
both vanilla TSDF fusion and TV-L1 fusion on the task of volumetric
fusion. Further, we demonstrate state-of-the-art 3D shape completion results. Comment: 3DV 2017, https://github.com/griegler/octnetfusio
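The TSDF averaging baseline mentioned above (Curless and Levoy) can be sketched along a single camera ray. This is a simplified illustration, assuming a 1D row of voxels, a constant per-measurement weight of 1 and a normalized distance in [-1, 1]; voxels more than one truncation distance behind the observed surface are left untouched because they are occluded. The surface is then read off as the zero crossing of the averaged field. Function names are hypothetical.

```python
def integrate_depth_1d(tsdf, weights, voxel_centers, depth, trunc, max_weight=None):
    """Fuse one depth measurement along a camera ray into a 1D TSDF (in place).

    voxel_centers are distances from the camera along the ray. The signed
    distance is positive in front of the surface and negative behind it.
    """
    for i, z in enumerate(voxel_centers):
        sdf = depth - z
        if sdf < -trunc:
            continue  # occluded region: this measurement says nothing here
        d = min(1.0, sdf / trunc)  # normalized truncated distance
        w_new = 1.0  # assumed constant measurement weight
        tsdf[i] = (weights[i] * tsdf[i] + w_new * d) / (weights[i] + w_new)
        weights[i] += w_new
        if max_weight is not None:
            weights[i] = min(weights[i], max_weight)


def zero_crossing(tsdf, weights, voxel_centers):
    """Linearly interpolate the first positive-to-non-positive sign change."""
    for i in range(len(tsdf) - 1):
        if weights[i] > 0 and weights[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_centers[i] + t * (voxel_centers[i + 1] - voxel_centers[i])
    return None
```

For example, fusing noisy depths of 1.02, 0.98 and 1.0 along a ray sampled every 0.1 with `trunc=0.3` places the zero crossing at 1.0, showing how averaging filters zero-mean noise. The same averaging cannot invent data for voxels that were never observed, which is the limitation the learned approach targets.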
Real-time High Resolution Fusion of Depth Maps on GPU
A system for live, high-quality surface reconstruction using a single moving
depth camera on commodity hardware is presented. High accuracy and a real-time
frame rate are achieved by exploiting the computing capabilities of graphics hardware
via OpenCL and by using a sparse data structure for the volumetric surface
representation. The depth sensor pose is estimated by combining a serial texture
registration algorithm with the iterative closest point (ICP) algorithm, which aligns
the obtained depth map to the estimated scene model. The aligned surface is then fused
into the scene, and a Kalman filter is used to improve fusion quality. The surface is
represented by a truncated signed distance function (TSDF) stored as a block-based
sparse buffer. Using a sparse data structure greatly increases the accuracy of
scanned surfaces and the maximum scanning area. Traditional GPU implementations of
volumetric rendering and fusion algorithms were modified to exploit sparsity and
achieve the desired performance. Incorporating texture registration for sensor
pose estimation and a Kalman filter for measurement integration improved the accuracy
and robustness of the scanning process.
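The two ingredients named above, a block-based sparse TSDF buffer and Kalman-filtered measurement integration, can be combined in a small CPU sketch. Assumptions not stated in the abstract: blocks are dense 8x8x8 arrays allocated on first touch and keyed by integer block coordinates in a dictionary, and each voxel is filtered with a scalar Kalman update under a static-state model (no process noise), so the gain is `var / (var + meas_var)`. `SparseTSDF`, `BLOCK` and all parameters are hypothetical; this is not the system's OpenCL implementation.

```python
BLOCK = 8  # voxels per block edge (assumed value)


class SparseTSDF:
    """Hypothetical block-based sparse TSDF with per-voxel scalar Kalman updates."""

    def __init__(self, prior_var=1.0):
        self.blocks = {}  # (bx, by, bz) -> {"tsdf": [...], "var": [...]}
        self.prior_var = prior_var

    def _locate(self, x, y, z):
        # Floor division keeps negative coordinates in the correct block.
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        lx, ly, lz = x % BLOCK, y % BLOCK, z % BLOCK
        return key, (lz * BLOCK + ly) * BLOCK + lx

    def _block(self, key):
        if key not in self.blocks:  # allocate only where data arrives
            n = BLOCK ** 3
            self.blocks[key] = {"tsdf": [0.0] * n, "var": [self.prior_var] * n}
        return self.blocks[key]

    def update(self, x, y, z, measurement, meas_var):
        """Scalar Kalman update of one voxel's TSDF value."""
        key, idx = self._locate(x, y, z)
        b = self._block(key)
        p = b["var"][idx]
        k = p / (p + meas_var)  # Kalman gain
        b["tsdf"][idx] += k * (measurement - b["tsdf"][idx])
        b["var"][idx] = (1.0 - k) * p  # posterior variance shrinks

    def get(self, x, y, z):
        """Return the voxel's TSDF value, or None if its block was never allocated."""
        key, idx = self._locate(x, y, z)
        b = self.blocks.get(key)
        return None if b is None else b["tsdf"][idx]
```

Only blocks near observed surfaces are ever allocated, which is why a sparse layout extends the scanning area. With a large prior variance and equal measurement variances, the Kalman update reduces to a running average, so two readings of 0.4 and 0.2 yield about 0.3.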
ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
We introduce ScanComplete, a novel data-driven approach for taking an
incomplete 3D scan of a scene as input and predicting a complete 3D model along
with per-voxel semantic labels. The key contribution of our method is its
ability to handle large scenes with varying spatial extent, managing the cubic
growth in data size as scene size increases. To this end, we devise a
fully-convolutional generative 3D CNN model whose filter kernels are invariant
to the overall scene size. The model can be trained on scene subvolumes but
deployed on arbitrarily large scenes at test time. In addition, we propose a
coarse-to-fine inference strategy in order to produce high-resolution output
while also leveraging large input context sizes. In an extensive series of
experiments, we carefully evaluate different model design choices, considering
both deterministic and probabilistic models for completion and semantic
inference. Our results show that we outperform other methods not only in the
size of the environments handled and processing efficiency, but also with
regard to completion quality and semantic segmentation performance by a
significant margin. Comment: Video: https://youtu.be/5s5s8iH0NF
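Why a fully-convolutional model can be trained on subvolumes and deployed on larger scenes can be seen with a single 3D convolution: the kernel's weights do not depend on the input's extent, and only the output size changes. The pure-Python sketch below (valid padding, nested lists indexed `[z][y][x]`) is an illustration of that property only, not ScanComplete's architecture; `conv3d_valid` and `cube` are hypothetical helpers.

```python
def conv3d_valid(vol, kernel):
    """Apply one 3D kernel with 'valid' padding to a nested-list volume [z][y][x]."""
    kz, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    dz, dy, dx = len(vol), len(vol[0]), len(vol[0][0])
    out = []
    for z in range(dz - kz + 1):
        plane = []
        for y in range(dy - ky + 1):
            row = []
            for x in range(dx - kx + 1):
                s = 0.0
                for a in range(kz):
                    for b in range(ky):
                        for c in range(kx):
                            s += kernel[a][b][c] * vol[z + a][y + b][x + c]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def cube(n, value):
    """Build an n x n x n nested-list volume filled with value."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]
```

The same 3x3x3 kernel applied to a 4^3 and a 10^3 volume yields 2^3 and 8^3 outputs respectively, with no change to its parameters.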
C-blox: A Scalable and Consistent TSDF-based Dense Mapping Approach
In many applications, maintaining a consistent dense map of the environment
is key to enabling robotic platforms to perform higher level decision making.
Several works have addressed the challenge of creating precise dense 3D maps
from visual sensors providing depth information. However, during operation over
longer missions, reconstructions can easily become inconsistent due to
accumulated camera tracking error and delayed loop closure. Without explicitly
addressing the problem of map consistency, recovery from such distortions tends
to be difficult. We present a novel system for dense 3D mapping which addresses
the challenge of building consistent maps while dealing with scalability.
Central to our approach is the representation of the environment as a
collection of overlapping TSDF subvolumes. These subvolumes are localized
through feature-based camera tracking and bundle adjustment. Our main
contribution is a pipeline for identifying stable regions in the map and
fusing the contributing subvolumes. This approach allows us to reduce map growth
while still maintaining consistency. We evaluate the proposed system on a
publicly available dataset and in a simulation engine, and demonstrate the efficacy
of the proposed approach for building consistent and scalable maps. Finally, we
demonstrate our approach running in real time on board a lightweight MAV. Comment: 8 pages, 5 figures, conferenc
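Fusing overlapping subvolumes into one map can be sketched as weighted TSDF averaging in a common frame. To avoid resampling, this sketch makes a strong simplifying assumption: each subvolume's pose is an integer voxel translation, whereas the described system localizes subvolumes with full poses from feature-based tracking and bundle adjustment. Each subvolume is a dictionary from local voxel index to `(tsdf, weight)`. The stable-region detection step is not shown, and `to_global` and `fuse_subvolumes` are hypothetical names.

```python
def to_global(subvolume, offset):
    """Shift local voxel indices by an integer offset (simplifying assumption)."""
    ox, oy, oz = offset
    return {(x + ox, y + oy, z + oz): v for (x, y, z), v in subvolume.items()}


def fuse_subvolumes(subvolumes):
    """Fuse (voxels, offset) pairs into one map by weighted TSDF averaging.

    Overlapping voxels are combined as weight-averaged distances with summed
    weights; non-overlapping voxels are copied through unchanged.
    """
    fused = {}
    for voxels, offset in subvolumes:
        for key, (d, w) in to_global(voxels, offset).items():
            if key in fused:
                d0, w0 = fused[key]
                fused[key] = ((w0 * d0 + w * d) / (w0 + w), w0 + w)
            else:
                fused[key] = (d, w)
    return fused
```

Merging overlapping voxels into one entry is what curbs map growth: two subvolumes that share a voxel contribute a single fused voxel instead of two copies.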
InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Volumetric models have become a popular representation for 3D scenes in
recent years. One breakthrough leading to their popularity was KinectFusion,
which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM
has since also been tackled with very similar approaches. Representing the
reconstruction volumetrically as a TSDF is the source of much of the simplicity and
efficiency that GPU implementations of these systems can achieve.
However, this representation is memory-intensive and limits applicability to
small-scale reconstructions. Several avenues have been explored to overcome
this. With the aim of summarizing them and providing a fast, flexible 3D
reconstruction pipeline, we propose a new, unifying framework called InfiniTAM.
The idea is that steps like camera tracking, scene representation and
integration of new data can easily be replaced and adapted to the user's needs.
This report describes the technical implementation details of InfiniTAM v3,
the third version of our InfiniTAM system. We have added various new features,
as well as making numerous enhancements to the low-level code that
significantly improve our camera tracking performance. The new features that we
expect to be of most interest are (i) a robust camera tracking module; (ii) an
implementation of Glocker et al.'s keyframe-based random ferns camera
relocaliser; (iii) a novel approach to globally-consistent TSDF-based
reconstruction, based on dividing the scene into rigid submaps and optimising
the relative poses between them; and (iv) an implementation of Keller et al.'s
surfel-based reconstruction approach. Comment: This article largely supersedes arxiv:1410.0925 (it describes version
3 of the InfiniTAM framework)
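As we understand Glocker et al.'s relocaliser, each frame is encoded by a set of random ferns, each fern being a few binary tests on pixel values at random locations. A query frame is matched against stored keyframes by a block-wise Hamming distance: the fraction of ferns whose codes differ. The sketch below is a simplified grayscale version under stated assumptions. It uses one intensity channel in [0, 1] and random thresholds, and it omits the keyframe-insertion policy. `FernEncoder`, `KeyframeDatabase` and their parameters are hypothetical, not InfiniTAM's API.

```python
import random


class FernEncoder:
    """Hypothetical random-fern frame encoder (grayscale simplification)."""

    def __init__(self, width, height, num_ferns=50, tests_per_fern=4, seed=0):
        rng = random.Random(seed)
        # Each test: (x, y, threshold); each fern: a list of tests.
        self.ferns = [
            [(rng.randrange(width), rng.randrange(height), rng.random())
             for _ in range(tests_per_fern)]
            for _ in range(num_ferns)
        ]

    def encode(self, image):
        """Return one small integer code per fern; image[y][x] is in [0, 1]."""
        codes = []
        for fern in self.ferns:
            code = 0
            for bit, (x, y, threshold) in enumerate(fern):
                if image[y][x] > threshold:
                    code |= 1 << bit
            codes.append(code)
        return codes


def dissimilarity(codes_a, codes_b):
    """Block-wise Hamming distance: fraction of ferns whose codes differ."""
    return sum(a != b for a, b in zip(codes_a, codes_b)) / len(codes_a)


class KeyframeDatabase:
    """Stores (codes, pose) pairs; the pose is treated as an opaque value here."""

    def __init__(self):
        self.entries = []

    def add(self, codes, pose):
        self.entries.append((codes, pose))

    def nearest(self, codes):
        """Return (dissimilarity, pose) of the most similar stored keyframe."""
        return min(((dissimilarity(codes, c), p) for c, p in self.entries),
                   key=lambda t: t[0])
```

After tracking failure, the pose of the nearest keyframe serves as the starting point for re-initializing the tracker.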