Efficient Surface-Aware Semi-Global Matching with Multi-View Plane-Sweep Sampling
Online augmentation of an oblique aerial image sequence with structural information is an essential aspect in the process of 3D scene interpretation and analysis. One key aspect of this is efficient dense image matching and depth estimation. Here, the Semi-Global Matching (SGM) approach has proven to be one of the most widely used algorithms for efficient depth estimation, providing a good trade-off between accuracy and computational complexity. However, SGM only models a first-order smoothness assumption, thus favoring fronto-parallel surfaces. In this work, we present a hierarchical algorithm that allows for efficient depth and normal map estimation together with confidence measures for each estimate. Our algorithm relies on a plane-sweep multi-image matching followed by an extended SGM optimization that incorporates local surface orientations, thus achieving more consistent and accurate estimates in areas made up of slanted surfaces, inherent to oblique aerial imagery. We evaluate numerous configurations of our algorithm on two different datasets using an absolute and a relative accuracy measure. In our evaluation, we show that the results of our approach are comparable to the ones achieved by refined Structure-from-Motion (SfM) pipelines, such as COLMAP, which are designed for offline processing. In contrast, however, our approach only considers a confined image bundle of an input sequence, thus allowing online and incremental computation at 1Hz–2Hz.
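To illustrate why standard SGM favors fronto-parallel surfaces, here is a minimal sketch of the classic SGM path aggregation along a single scanline. It is not the surface-aware extension from the abstract. The function name `aggregate_scanline` and the penalty values `p1` and `p2` are assumptions chosen for illustration. Any change in disparity between neighboring pixels is penalized, so a constant disparity is always the cheapest transition, and slanted surfaces pay a penalty at every disparity step.

```python
import numpy as np


def aggregate_scanline(cost, p1=1.0, p2=8.0):
    """Aggregate matching costs along one left-to-right path (classic SGM).

    cost: array of shape (width, num_disparities) holding per-pixel matching costs.
    p1:   penalty for a disparity change of exactly one level (assumed value).
    p2:   penalty for any larger disparity jump (assumed value).
    Returns the aggregated path costs with the same shape as `cost`.
    """
    cost = np.asarray(cost, dtype=np.float64)
    width, num_disp = cost.shape
    agg = np.empty_like(cost)
    agg[0] = cost[0]  # the first pixel has no predecessor on the path

    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        # Transition from disparity d-1 or d+1 costs p1; borders have no neighbor.
        from_lower = np.concatenate(([np.inf], prev[:-1])) + p1
        from_upper = np.concatenate((prev[1:], [np.inf])) + p1
        # Transition from any other disparity costs p2.
        from_any = np.full(num_disp, prev_min + p2)
        # Keeping the same disparity costs nothing extra.
        best = np.minimum(np.minimum(prev, from_lower),
                          np.minimum(from_upper, from_any))
        # Subtracting prev_min keeps the values bounded along the path.
        agg[x] = cost[x] + best - prev_min
    return agg
```

A full SGM implementation sums such path costs over several directions and takes the per-pixel minimum over disparities. The sketch only shows the recurrence for one direction.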
Real-Time Dense 3D Reconstruction from Monocular Video Data Captured by Low-Cost UAVs
Real-time 3D reconstruction enables fast dense mapping of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency. In contrast to most real-time capable approaches, our method does not need an explicit depth sensor. Instead, we only rely on a video stream from a camera and its intrinsic calibration. By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content. To create a 3D model of the scene, we rely on a three-stage processing chain. First, we estimate the rough camera trajectory using a simultaneous localization and mapping (SLAM) algorithm. Once a suitable constellation is found, we estimate depth for local bundles of images using a Multi-View Stereo (MVS) approach and then fuse this depth into a global surfel-based model. For our evaluation, we use 55 video sequences with diverse settings, consisting of both synthetic and real scenes. We evaluate not only the generated reconstruction but also the intermediate products and achieve competitive results both qualitatively and quantitatively. At the same time, our method can keep up with a 30 fps video for a resolution of 768 × 448 pixels.
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
DeepC-MVS: Deep Confidence Prediction for Multi-View Stereo Reconstruction
Deep Neural Networks (DNNs) have the potential to improve the quality of
image-based 3D reconstructions. However, the use of DNNs in the context of 3D
reconstruction from large and high-resolution image datasets is still an open
challenge, due to memory and computational constraints. We propose a pipeline
which takes advantage of DNNs to improve the quality of 3D reconstructions
while being able to handle large and high-resolution datasets. In particular,
we propose a confidence prediction network explicitly tailored for Multi-View
Stereo (MVS) and we use it for both depth map outlier filtering and depth map
refinement within our pipeline, in order to improve the quality of the final 3D
reconstructions. We train our confidence prediction network on (semi-)dense
ground truth depth maps from publicly available real world MVS datasets. With
extensive experiments on popular benchmarks, we show that our overall pipeline
can produce state-of-the-art 3D reconstructions, both qualitatively and
quantitatively. Comment: changes in V3: re-worked confidence prediction scheme, re-organized
text, updated experiments; changes in V2: a reference was updated
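As a rough illustration of confidence-based depth map outlier filtering, the sketch below invalidates depth values whose confidence falls below a threshold. It assumes a per-pixel confidence map is already available, for example from a prediction network, and does not reproduce the pipeline described in the abstract. The function name, the threshold `min_confidence`, and the convention that `0.0` marks an invalid depth are all hypothetical choices.

```python
import numpy as np


def filter_depth_by_confidence(depth, confidence, min_confidence=0.5,
                               invalid_value=0.0):
    """Mark low-confidence depth estimates as invalid.

    depth:          2D array of depth estimates.
    confidence:     2D array of the same shape with values in [0, 1].
    min_confidence: assumed threshold below which a pixel is treated as an outlier.
    invalid_value:  assumed marker written into rejected pixels.
    Returns the filtered depth map and the boolean mask of kept pixels.
    """
    depth = np.asarray(depth, dtype=np.float64)
    confidence = np.asarray(confidence, dtype=np.float64)
    if depth.shape != confidence.shape:
        raise ValueError("depth and confidence must have the same shape")

    keep = confidence >= min_confidence
    filtered = np.where(keep, depth, invalid_value)
    return filtered, keep
```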
Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery
Supervised learning based methods for monocular depth estimation usually
require large amounts of extensively annotated training data. In the case of
aerial imagery, this ground truth is particularly difficult to acquire.
Therefore, in this paper, we present a method for self-supervised learning for
monocular depth estimation from aerial imagery that does not require annotated
training data. For this, we only use an image sequence from a single moving
camera and learn to simultaneously estimate depth and pose information. By
sharing the weights between pose and depth estimation, we achieve a relatively
small model, which favors real-time application. We evaluate our approach on
three diverse datasets and compare the results to conventional methods that
estimate depth maps based on multi-view geometry. We achieve an accuracy
δ1.25 of up to 93.5 %. In addition, we have paid particular attention to
the generalization of a trained model to unknown data and the self-improving
capabilities of our approach. We conclude that, even though the results of
monocular depth estimation are inferior to those achieved by conventional
methods, they are well suited to provide a good initialization for methods that
rely on image matching or to provide estimates in regions where image matching
fails, e.g. occluded or texture-less regions.
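The accuracy figure above can be read with the following minimal sketch. It assumes the threshold accuracy commonly used in depth estimation benchmarks: the fraction of valid pixels for which the ratio between predicted and ground-truth depth, taken in whichever direction is larger, stays below 1.25. The abstract does not spell out the definition, so this is an assumption, and the function name `delta_accuracy` is hypothetical.

```python
import numpy as np


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    pred: predicted depth values.
    gt:   ground-truth depth values; non-positive entries are treated as invalid.
    Returns a float in [0, 1], or NaN if no pixel is valid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Only compare pixels where both depths are strictly positive.
    valid = (gt > 0) & (pred > 0)
    if not valid.any():
        return float("nan")

    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))
```

For example, predictions of 10, 12, 20 and 5 against a ground truth of 10 everywhere give ratios of 1.0, 1.2, 2.0 and 2.0, so half of the pixels fall within the threshold.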