Scalable Surface Reconstruction from Point Clouds with Extreme Scale and Density Diversity
In this paper we present a scalable approach for robustly computing a 3D
surface mesh from multi-scale multi-view stereo point clouds that can handle
extreme jumps of point density (in our experiments three orders of magnitude).
The backbone of our approach is a combination of octree data partitioning,
local Delaunay tetrahedralization and graph cut optimization. Graph cut
optimization is used twice, once to extract surface hypotheses from local
Delaunay tetrahedralizations and once to merge overlapping surface hypotheses
even when the local tetrahedralizations do not share the same topology. This
formulation allows us to obtain a constant memory consumption per sub-problem
while at the same time retaining the density independent interpolation
properties of the Delaunay-based optimization. On multiple public datasets, we
demonstrate that our approach is highly competitive with the state-of-the-art
in terms of accuracy, completeness and outlier resilience. Further, we
demonstrate the multi-scale potential of our approach by processing a newly
recorded dataset with 2 billion points and a point density variation of more
than four orders of magnitude, requiring less than 9 GB of RAM per process.
Comment: This paper was accepted to the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2017. The copyright was transferred to IEEE
(ieee.org). The official version of the paper will be made available on IEEE
Xplore(R) (ieeexplore.ieee.org). This version of the paper also contains the
supplementary material, which will not appear on IEEE Xplore(R).
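The data-partitioning backbone described above can be illustrated with a minimal octree split that caps the number of points per sub-problem, which is what yields the constant memory consumption per cell; the function name and the cap are hypothetical, and the paper's actual pipeline additionally runs local Delaunay tetrahedralization and two graph-cut passes per cell:

```python
import numpy as np

def octree_partition(points, max_points=100, depth=0, max_depth=10):
    """Recursively split a point cloud into octree leaf cells so that each
    sub-problem (leaf) holds at most `max_points` points, keeping per-cell
    memory roughly constant regardless of local point density."""
    if len(points) <= max_points or depth >= max_depth:
        return [points]
    center = (points.min(axis=0) + points.max(axis=0)) / 2.0
    # Assign each point to one of the 8 octants relative to the cell center.
    octant = ((points >= center).astype(int) * [1, 2, 4]).sum(axis=1)
    leaves = []
    for o in range(8):
        sub = points[octant == o]
        if len(sub):
            leaves.extend(octree_partition(sub, max_points, depth + 1, max_depth))
    return leaves
```

Because each leaf is bounded in size, a dense region simply produces more (smaller) cells rather than a larger sub-problem.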
Using Self-Contradiction to Learn Confidence Measures in Stereo Vision
Learned confidence measures gain increasing importance for outlier removal
and quality improvement in stereo vision. However, acquiring the necessary
training data is typically a tedious and time consuming task that involves
manual interaction, active sensing devices and/or synthetic scenes. To overcome
this problem, we propose a new, flexible, and scalable way for generating
training data that only requires a set of stereo images as input. The key idea
of our approach is to use different view points for reasoning about
contradictions and consistencies between multiple depth maps generated with the
same stereo algorithm. This enables us to generate a huge amount of training
data in a fully automated manner. Among other experiments, we demonstrate the
potential of our approach by boosting the performance of three learned
confidence measures on the KITTI2012 dataset by simply training them on a vast
amount of automatically generated training data rather than a limited amount of
laser ground truth data.
Comment: This paper was accepted to the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2016. The copyright was transferred to IEEE
(https://www.ieee.org). The official version of the paper will be made
available on IEEE Xplore(R) (http://ieeexplore.ieee.org). This version of the
paper also contains the supplementary material, which will not appear on IEEE
Xplore(R).
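The consistency-versus-contradiction idea behind the automatic label generation can be sketched in its simplest two-view form, a left-right disparity check (the paper reasons over multiple depth maps from different viewpoints; this two-view analogue, the function name, and the threshold are simplifying assumptions):

```python
import numpy as np

def lr_consistency_labels(disp_left, disp_right, tau=1.0):
    """Label each left-image pixel as confident (1) or not (0) by checking
    whether the right disparity map, sampled at the matched location,
    agrees with the left disparity: consistent matches become positive
    training samples, contradictions become negatives."""
    h, w = disp_left.shape
    cols = np.arange(w)[None, :].repeat(h, axis=0)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    # Location in the right image that the left disparity points to.
    matched = np.clip((cols - np.round(disp_left)).astype(int), 0, w - 1)
    disagreement = np.abs(disp_left - disp_right[rows, matched])
    return (disagreement <= tau).astype(np.uint8)
```

Running the same stereo algorithm on many image pairs and applying such checks yields labels without any manual annotation or active sensor.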
Map-Repair: Deep Cadastre Maps Alignment and Temporal Inconsistencies Fix in Satellite Images
In fast-developing countries it is hard to trace the construction of new
buildings or the destruction of old structures and, as a result, to keep
cadastre maps up to date. Moreover, due to the complexity of urban regions or
inconsistencies in the data used for cadastre map extraction, errors in the
form of misalignments are a common problem. In this work, we propose an
end-to-end deep learning approach which is able to resolve inconsistencies
between the input intensity image and the available building footprints by
correcting label noise and, at the same time, misalignments if needed. The
obtained results demonstrate the robustness of the proposed method even on
severely misaligned examples, which makes it potentially suitable for real
applications, like OpenStreetMap correction.
Prioritized Multi-View Stereo Depth Map Generation Using Confidence Prediction
In this work, we propose a novel approach to prioritize the depth map
computation of multi-view stereo (MVS) to obtain compact 3D point clouds of
high quality and completeness at low computational cost. Our prioritization
approach operates before the MVS algorithm is executed and consists of two
steps. In the first step, we aim to find a good set of matching partners for
each view. In the second step, we rank the resulting view clusters (i.e. key
views with matching partners) according to their impact on the fulfillment of
desired quality parameters such as completeness, ground resolution and
accuracy. In addition to geometric analysis, we use a novel machine learning
technique for training a confidence predictor. The purpose of this confidence
predictor is to estimate the chances of a successful depth reconstruction for
each pixel in each image for one specific MVS algorithm based on the RGB images
and the image constellation. The underlying machine learning technique does not
require any ground truth or manually labeled data for training, but instead
adapts ideas from depth map fusion for providing a supervision signal. The
trained confidence predictor allows us to evaluate the quality of image
constellations and their potential impact on the resulting 3D reconstruction
and thus builds a solid foundation for our prioritization approach. In our
experiments, we are able to reach more than 70% of the maximum achievable
quality fulfillment using only 5% of the available images as key views. For
evaluating our approach within and across different domains, we use two
completely different scenarios, i.e. cultural heritage preservation and
reconstruction of single-family houses.
Comment: This paper was accepted to the ISPRS Journal of Photogrammetry and
Remote Sensing
(https://www.journals.elsevier.com/isprs-journal-of-photogrammetry-and-remote-sensing)
on March 21, 2018. The official version will be made available on
ScienceDirect (https://www.sciencedirect.com).
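The second prioritization step, ranking view clusters by their impact on the desired quality parameters, can be sketched as a greedy marginal-gain selection; the boolean coverage matrix below is a hypothetical stand-in for the learned per-pixel confidence prediction aggregated over scene cells:

```python
import numpy as np

def rank_view_clusters(cluster_coverage, k):
    """Greedily rank view clusters by their marginal gain in scene coverage.
    `cluster_coverage` is a (num_clusters, num_scene_cells) boolean matrix
    predicting which scene cells each cluster can reconstruct; in the paper
    this prediction would come from the confidence predictor."""
    covered = np.zeros(cluster_coverage.shape[1], dtype=bool)
    order = []
    for _ in range(k):
        gains = (cluster_coverage & ~covered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # remaining clusters add nothing new
        order.append(best)
        covered |= cluster_coverage[best]
    return order, covered.mean()
```

Greedy selection naturally explains the reported behaviour: the first few key views capture most of the achievable quality, and later views add diminishing returns.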
Machine-learned Regularization and Polygonization of Building Segmentation Masks
We propose a machine learning based approach for the automatic regularization
and polygonization of building segmentation masks. Taking an image as input, we
first predict building segmentation maps using a generic fully convolutional
network (FCN). A generative adversarial network (GAN) is then employed to
regularize the building boundaries and make them more realistic, i.e., to
produce more rectilinear outlines that form right angles where required. This
is achieved through the interplay between the discriminator, which gives the
probability of the input image being real, and the generator, which learns
from the discriminator's response to create more realistic images. Finally, we
train a backbone convolutional neural network (CNN) which is adapted to
predict sparse outcomes, corresponding to building corners, from the
regularized building segmentation results. Experiments on three building
segmentation datasets demonstrate that the proposed method is not only capable
of obtaining accurate results, but also of producing visually pleasing
building outlines parameterized as polygons.
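The final "dense mask boundary to sparse polygon" step can be illustrated with a classical, non-learned routine, Ramer-Douglas-Peucker simplification; note this is only an analogue of the paper's CNN-based corner prediction, not its method:

```python
import numpy as np

def simplify_outline(points, eps):
    """Ramer-Douglas-Peucker simplification of a building outline: keep only
    vertices deviating more than `eps` from the chord between the endpoints,
    turning a dense boundary trace into a compact polygon."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    norm = np.hypot(chord[0], chord[1])
    if norm == 0:
        dists = np.hypot(points[:, 0] - start[0], points[:, 1] - start[1])
    else:
        # Perpendicular distance of every point to the start-end chord.
        dists = np.abs(chord[0] * (points[:, 1] - start[1])
                       - chord[1] * (points[:, 0] - start[0])) / norm
    idx = int(np.argmax(dists))
    if dists[idx] <= eps:
        return np.array([start, end])
    left = simplify_outline(points[:idx + 1], eps)
    right = simplify_outline(points[idx:], eps)
    return np.vstack([left[:-1], right])
```

A noisy boundary trace with one true corner collapses to just its endpoint and corner vertices.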
ATLAS-MVSNet: Attention Layers for Feature Extraction and Cost Volume Regularization in Multi-View Stereo
We present ATLAS-MVSNet, an end-to-end deep learning architecture relying on local attention layers for depth map inference from multi-view images. Distinct from existing works, we introduce a novel module design for neural networks, which we term the hybrid attention block, that utilizes the latest insights into attention in vision models. We are able to reap the benefits of attention in both the carefully designed multi-stage feature extraction network and the cost volume regularization network. Our new approach displays significant improvement over its counterpart based purely on convolutions. While many state-of-the-art methods need multiple high-end GPUs in the training phase, we are able to train our network on a single consumer-grade GPU. ATLAS-MVSNet exhibits excellent performance, especially in terms of accuracy, on the DTU dataset.
Furthermore, ATLAS-MVSNet ranks amongst the top published methods on the online Tanks and Temples benchmark.
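The core idea of local attention, restricting each position to attend only over a small spatial neighbourhood rather than the whole image, can be sketched as follows; learned query/key/value projections and multi-head structure are omitted (q = k = v = the raw feature map), so this is an illustration of the mechanism, not the paper's block:

```python
import numpy as np

def local_attention(feat, window=1):
    """Single-head local self-attention over an (H, W, C) feature map: each
    position computes softmax-weighted averages over its (2*window+1)^2
    spatial neighbourhood (zero-padded at the borders)."""
    h, w, c = feat.shape
    pad = np.pad(feat, ((window, window), (window, window), (0, 0)))
    out = np.empty_like(feat)
    for y in range(h):
        for x in range(w):
            neigh = pad[y:y + 2 * window + 1, x:x + 2 * window + 1].reshape(-1, c)
            scores = neigh @ feat[y, x] / np.sqrt(c)   # scaled dot-product
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[y, x] = weights @ neigh                # convex combination
    return out
```

Because the attention window is local, the cost grows with the window size rather than the image size, which is what makes such layers affordable inside feature extraction and cost volume regularization.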
Grasping Point Prediction in Cluttered Environment using Automatically Labeled Data
We propose a method to automatically generate high-quality ground truth
annotations for grasping point prediction and show the usefulness of these
annotations by training a deep neural network to predict grasping candidates
for objects in a cluttered environment. First, we acquire sequences of RGBD
images of a real-world picking scenario and leverage the sequential depth
information to extract labels for grasping point prediction. Afterwards, we
train a deep neural network to predict grasping points, establishing a fully
automatic pipeline from acquiring data to a trained network without the need
for human annotators. We show in our experiments that our network trained with
automatically generated labels delivers high-quality results for predicting
grasping candidates, on par with a network trained on human-annotated data.
This work lowers the cost and complexity of creating specific datasets for
grasping and makes it easy to expand the existing dataset without additional
effort.
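The label-extraction idea, using sequential depth to find where a picked object used to be, can be sketched from a before/after depth pair; the function name, threshold, and the centroid-as-grasp-point choice are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def grasp_label_from_depth_pair(depth_before, depth_after, tau=0.01):
    """Derive a grasping-point annotation from two consecutive depth frames
    of a picking scenario: where measured depth increased after a successful
    pick, the removed object used to be, so its footprint (and, crudely, its
    centroid) serve as an automatically generated label."""
    removed = (depth_after - depth_before) > tau  # footprint of picked object
    if not removed.any():
        return None, removed                      # nothing was removed
    ys, xs = np.nonzero(removed)
    grasp_point = (int(ys.mean()), int(xs.mean()))
    return grasp_point, removed
```

Applied over whole picking sequences, this turns every successful pick into a training sample with no human in the loop.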