Efficient Online Surface Correction for Real-time Large-Scale 3D Reconstruction
State-of-the-art methods for large-scale 3D reconstruction from RGB-D sensors
usually reduce drift in camera tracking by globally optimizing the estimated
camera poses in real-time without simultaneously updating the reconstructed
surface on pose changes. We propose an efficient on-the-fly surface correction
method for globally consistent dense 3D reconstruction of large-scale scenes.
Our approach uses a dense Visual RGB-D SLAM system that estimates the camera
motion in real-time on a CPU and refines it in a global pose graph
optimization. Consecutive RGB-D frames are locally fused into keyframes, which
are incorporated into a sparse voxel hashed Signed Distance Field (SDF) on the
GPU. On pose graph updates, the SDF volume is corrected on-the-fly using a
novel keyframe re-integration strategy with reduced GPU-host streaming. We
demonstrate in an extensive quantitative evaluation that our method is up to
93% more runtime-efficient than the state of the art and requires
significantly less memory, with only a negligible loss of surface quality.
Overall, our system requires only a single GPU and allows for real-time surface
correction of large environments.
Comment: British Machine Vision Conference (BMVC), London, September 2017
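On pose graph updates, the core operation is to de-integrate each affected keyframe at its old pose and fuse it back at its optimized pose. A minimal Python sketch of that idea (hypothetical names; a toy dict-keyed volume stands in for a GPU-resident sparse voxel hashed SDF):

    import numpy as np

    class SDFVolume:
        """Toy signed-distance volume keyed by integer voxel coordinates."""
        def __init__(self, voxel_size=0.05):
            self.voxel_size = voxel_size
            self.sdf = {}      # (i, j, k) -> running SDF average
            self.weight = {}   # (i, j, k) -> accumulated weight

        def _update(self, key, dist, w):
            w_old = self.weight.get(key, 0.0)
            w_new = w_old + w
            if w_new <= 0.0:                  # fully de-integrated: drop voxel
                self.sdf.pop(key, None)
                self.weight.pop(key, None)
                return
            d_old = self.sdf.get(key, 0.0)
            self.sdf[key] = (d_old * w_old + dist * w) / w_new
            self.weight[key] = w_new

        def integrate(self, points_world, sign=+1.0):
            """Fuse (sign=+1) or de-integrate (sign=-1) a keyframe's points."""
            for p in points_world:
                key = tuple(np.floor(p / self.voxel_size).astype(int))
                # Toy update: treat each point as a zero-crossing observation.
                self._update(key, 0.0, sign * 1.0)

    def correct_on_pose_update(volume, keyframe_points, T_old, T_new):
        """Re-integrate one keyframe after a pose graph update."""
        def transform(T, pts):
            return (T[:3, :3] @ pts.T).T + T[:3, 3]
        volume.integrate(transform(T_old, keyframe_points), sign=-1.0)  # undo
        volume.integrate(transform(T_new, keyframe_points), sign=+1.0)  # redo

Because consecutive frames are locally fused into keyframes first, far fewer re-integrations touch the volume per pose graph update than if every raw frame had to be corrected.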
C-blox: A Scalable and Consistent TSDF-based Dense Mapping Approach
In many applications, maintaining a consistent dense map of the environment
is key to enabling robotic platforms to perform higher level decision making.
Several works have addressed the challenge of creating precise dense 3D maps
from visual sensors providing depth information. However, during operation over
longer missions, reconstructions can easily become inconsistent due to
accumulated camera tracking error and delayed loop closure. Without explicitly
addressing the problem of map consistency, recovery from such distortions tends
to be difficult. We present a novel system for dense 3D mapping which addresses
the challenge of building consistent maps while dealing with scalability.
Central to our approach is the representation of the environment as a
collection of overlapping TSDF subvolumes. These subvolumes are localized
through feature-based camera tracking and bundle adjustment. Our main
contribution is a pipeline for identifying stable regions in the map, and to
fuse the contributing subvolumes. This approach allows us to reduce map growth
while still maintaining consistency. We demonstrate the proposed system on a
publicly available dataset and simulation engine, showing the efficacy of the
proposed approach for building consistent and scalable maps. Finally, we
demonstrate our approach running in real time on board a lightweight MAV.
Comment: 8 pages, 5 figures, conference
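A toy illustration of the weighted fusion of overlapping TSDF subvolumes (not the authors' C-blox pipeline; it assumes the subvolumes were already resampled into a shared world grid, with each voxel holding a (tsdf, weight) pair):

    def fuse_subvolumes(subvolumes):
        """Weighted fusion of overlapping TSDF subvolumes into one grid.

        Each subvolume maps integer voxel keys in a shared world grid to
        (tsdf, weight) pairs; overlapping voxels are averaged by weight.
        """
        fused = {}
        for sub in subvolumes:
            for key, (d, w) in sub.items():
                d0, w0 = fused.get(key, (0.0, 0.0))
                fused[key] = ((d0 * w0 + d * w) / (w0 + w), w0 + w)
        return fused

    # Two toy subvolumes overlapping at voxel (1, 0, 0).
    a = {(0, 0, 0): (0.3, 1.0), (1, 0, 0): (0.1, 2.0)}
    b = {(1, 0, 0): (-0.1, 1.0), (2, 0, 0): (-0.4, 1.0)}
    merged = fuse_subvolumes([a, b])
    print(merged[(1, 0, 0)])  # -> (~0.033, 3.0)

Collapsing the redundant copies held by stable, overlapping subvolumes in this way is what bounds map growth while the feature-based tracking and bundle adjustment keep the subvolume poses consistent.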
Real-Time RGB-D Camera Pose Estimation in Novel Scenes using a Relocalisation Cascade
Camera pose estimation is an important problem in computer vision. Common
techniques either match the current image against keyframes with known poses,
directly regress the pose, or establish correspondences between keypoints in
the image and points in the scene to estimate the pose. In recent years,
regression forests have become a popular alternative to establish such
correspondences. They achieve accurate results, but have traditionally needed
to be trained offline on the target scene, preventing relocalisation in new
environments. Recently, we showed how to circumvent this limitation by adapting
a pre-trained forest to a new scene on the fly. The adapted forests achieved
relocalisation performance that was on par with that of offline forests, and
our approach was able to estimate the camera pose in close to real time. In
this paper, we present an extension of this work that achieves significantly
better relocalisation performance whilst running fully in real time. To achieve
this, we make several changes to the original approach: (i) instead of
accepting the camera pose hypothesis without question, we make it possible to
score the final few hypotheses using a geometric approach and select the most
promising; (ii) we chain several instantiations of our relocaliser together in
a cascade, allowing us to try faster but less accurate relocalisation first,
only falling back to slower, more accurate relocalisation as necessary; and
(iii) we tune the parameters of our cascade to achieve effective overall
performance. These changes allow us to significantly improve upon the
performance our original state-of-the-art method was able to achieve on the
well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional
contributions, we present a way of visualising the internal behaviour of our
forests and show how to entirely circumvent the need to pre-train a forest on a
generic scene.
Comment: Tommaso Cavallari, Stuart Golodetz, Nicholas Lord and Julien Valentin assert joint first authorship.
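The cascade's control flow can be sketched as follows (a hypothetical interface: each stage is assumed to return a pose hypothesis together with a geometric verification score, with the fast stages ordered first):

    from typing import Callable, Optional, Sequence, Tuple
    import numpy as np

    Pose = np.ndarray  # 4x4 camera-to-world matrix

    def relocalise_cascade(
        stages: Sequence[Callable[[np.ndarray], Tuple[Optional[Pose], float]]],
        frame: np.ndarray,
        min_score: float = 0.8,
    ) -> Optional[Pose]:
        """Try fast relocalisers first; fall back to slower, more accurate
        ones only when geometric verification rejects the hypothesis."""
        for relocalise in stages:              # ordered fast -> slow
            pose, score = relocalise(frame)
            if pose is not None and score >= min_score:
                return pose                    # first verified hypothesis wins
        return None                            # every stage failed

Because most frames are handled by the first, fastest stage, the cascade's expected runtime stays close to that stage alone, while the slower stages preserve accuracy on hard frames.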
Large-scale and drift-free surface reconstruction using online subvolume registration
Depth cameras have helped commoditize 3D digitization of the real world. It is now feasible to use a single Kinect-like camera to scan an entire building or other large-scale scenes. At large scales, however, there is an inherent challenge of dealing with distortions and drift due to accumulated pose estimation errors. Existing techniques suffer from one or more of the following: a) requiring an expensive offline global optimization step taking hours to compute; b) needing a full second pass over the input depth frames to correct for accumulated errors; c) relying on RGB data alongside depth data to optimize poses; or d) requiring the user to create explicit loop closures to allow gross alignment errors to be resolved. In this paper, we present a method that addresses all of these issues. Our method supports online model correction, without needing to reprocess or store any input depth data. Even while performing global correction of a large 3D model, our method takes only minutes rather than hours to compute. Our model does not require any explicit loop closures to be detected and, finally, relies on depth data alone, allowing operation in low-lighting conditions. We show qualitative results on many large-scale scenes, highlighting the lack of error and drift in our reconstructions. We compare to state-of-the-art techniques and demonstrate large-scale dense surface reconstruction 'in the dark', a capability not offered by RGB-D techniques.
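At its core, registering overlapping subvolumes means estimating rigid transforms between them. A minimal sketch of one least-squares alignment step via the Kabsch algorithm (correspondences between points sampled from the two subvolumes are assumed given here; the paper itself works from the depth data alone):

    import numpy as np

    def rigid_align(src, dst):
        """One least-squares rigid alignment (Kabsch) between corresponding
        Nx3 point sets sampled from two overlapping subvolumes."""
        c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
        H = (src - c_src).T @ (dst - c_dst)                 # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = c_dst - R @ c_src
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t
        return T  # maps src's frame into dst's frame

Repeating such alignments online over overlapping subvolume pairs, and updating only the subvolume poses rather than the raw depth frames, is what makes global correction possible in minutes without a second pass over the input.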
Large-Scale Textured 3D Scene Reconstruction
Building three-dimensional models of the environment is a fundamental task in computer vision. Reconstructions are useful for a range of applications, such as surveying, the preservation of cultural heritage, or the creation of virtual worlds in the entertainment industry. In automated driving, they help address a variety of challenges, including localization, the annotation of large datasets, and the fully automatic generation of simulation scenarios.
The challenge in 3D reconstruction is the joint estimation of sensor poses and a model of the environment. Redundant and potentially erroneous measurements from different sensors must be integrated into a common representation of the world to obtain a metrically and photometrically correct model. At the same time, the method must use resources efficiently to achieve runtimes that make practical use possible.
In this work, we present a reconstruction method capable of creating photorealistic 3D reconstructions of large areas extending over several kilometers. Range measurements from laser scanners and stereo camera systems are fused jointly using a volumetric reconstruction approach. Loop closures are detected and introduced as additional constraints to obtain a globally consistent map. The resulting mesh is textured from camera images, with individual observations weighted by their quality. For a seamless appearance, the unknown exposure times and parameters of the optical system are estimated alongside, and the images are corrected accordingly.
We evaluate our method on synthetic data, real sensor data from our test vehicle, and publicly available datasets. We show qualitative results for large inner-city areas as well as quantitative evaluations of the vehicle trajectory and the reconstruction quality.
Finally, we present several applications, demonstrating the usefulness of our method for automated driving.
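The seamless texturing step can be pictured as exposure-compensated, quality-weighted blending per surface point. An illustrative Python sketch (the weights and the linear exposure model are placeholder assumptions, not the thesis's estimated camera model):

    import numpy as np

    def blend_vertex_color(observations):
        """Blend colour observations of one surface point.

        Each observation is (rgb, exposure, weight): the raw colour is
        normalised by the estimated exposure time so differently exposed
        images agree in radiance, then averaged with its quality weight
        (e.g. derived from viewing angle and distance)."""
        num, den = np.zeros(3), 0.0
        for rgb, exposure, w in observations:
            num += w * (np.asarray(rgb, float) / exposure)  # radiance estimate
            den += w
        return num / den  # linear radiance; tonemap for display

    # The same surface point seen in a short and a long exposure.
    obs = [((60, 50, 40), 0.5, 1.0), ((130, 105, 85), 1.0, 2.0)]
    print(blend_vertex_color(obs))  # ~[126.7, 103.3, 83.3]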
Scene Mapping and Understanding by Robotic Vision
The first mechanical automaton concept appears in a Chinese text written in the 3rd century BC, while Computer Vision was born in the late 1960s. Visual perception applied to machines (i.e. Machine Vision) is therefore a young and exciting alliance.
When robots came in, the new field of Robotic Vision was born, and these terms began to be erroneously interchanged. In short, we can say that Machine Vision is an engineering domain concerned with the industrial use of vision. Robotic Vision, instead, is a research field that tries to incorporate robotics aspects into computer vision algorithms; Visual Servoing, for example, is one of the problems that cannot be solved by computer vision alone. Accordingly, a large part of this work deals with boosting popular computer vision techniques by exploiting robotics, e.g. the use of kinematics to localize a vision sensor mounted as the robot end-effector. The remainder of this work is dedicated to the counterpart, i.e. the use of computer vision to solve real robotic problems such as grasping objects or navigating while avoiding obstacles. A brief survey is presented of the mapping data structures most widely used in robotics, along with SkiMap, a novel sparse data structure created both for robotic mapping and as a general-purpose 3D spatial index. Several approaches to object detection and manipulation that exploit the aforementioned mapping strategies are then proposed, along with a completely new Machine Teaching facility intended to simplify the training procedure of modern Deep Learning networks.
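As a rough illustration of what such a sparse mapping structure provides, the toy voxel map below indexes occupancy hits by integer grid coordinates (a plain hash map for brevity; the actual SkiMap is organised as a hierarchy of skip lists rather than a hash map):

    from collections import defaultdict

    class SparseVoxelMap:
        """Toy sparse occupancy map keyed by integer voxel coordinates."""
        def __init__(self, resolution=0.05):
            self.resolution = resolution
            self.hits = defaultdict(int)   # (i, j, k) -> observation count

        def _key(self, x, y, z):
            r = self.resolution
            return (int(x // r), int(y // r), int(z // r))

        def integrate_point(self, x, y, z):
            self.hits[self._key(x, y, z)] += 1

        def occupied(self, x, y, z, min_hits=3):
            return self.hits.get(self._key(x, y, z), 0) >= min_hits

    m = SparseVoxelMap()
    m.integrate_point(1.02, 0.51, 0.0)
    print(m.occupied(1.02, 0.51, 0.0))  # False until enough observations

Only occupied cells consume memory, which is what makes such structures practical both for robot mapping and as general-purpose 3D spatial indices.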