Large-Scale Textured 3D Scene Reconstruction
Creating three-dimensional models of the environment is a fundamental task in computer vision. Reconstructions are useful for a range of applications, such as surveying, the preservation of cultural heritage, or the creation of virtual worlds in the entertainment industry. In automated driving, they help address a variety of challenges, including localization, the annotation of large datasets, and the fully automatic generation of simulation scenarios.
The challenge in 3D reconstruction is the joint estimation of sensor poses and an environment model. Redundant and potentially erroneous measurements from different sensors must be integrated into a common representation of the world to obtain a metrically and photometrically correct model. At the same time, the method must use resources efficiently to achieve runtimes that permit practical use.
In this work, we present a reconstruction method capable of producing photorealistic 3D reconstructions of large areas extending over several kilometers. Range measurements from laser scanners and stereo camera systems are fused using a volumetric reconstruction method. Loop closures are detected and introduced as additional constraints to obtain a globally consistent map. The resulting mesh is textured from camera images, with the individual observations weighted by their quality. For a seamless appearance, the unknown exposure times and parameters of the optical system are estimated jointly and the images are corrected accordingly.
We evaluate our method on synthetic data, real sensor data from our experimental vehicle, and publicly available datasets. We show qualitative results for large inner-city areas as well as quantitative evaluations of the vehicle trajectory and the reconstruction quality.
Finally, we present several applications and thereby demonstrate the usefulness of our method for automated driving.
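The quality-weighted texturing mentioned in the abstract can be illustrated with a minimal sketch. The function name `blend_observations`, the RGB tuple representation, and the plain weighted mean are assumptions made for illustration only; the abstract does not state how observation quality is computed or how observations are combined.

```python
def blend_observations(colors, weights):
    """Blend several RGB observations of one surface point.

    colors  -- list of (r, g, b) tuples, one per camera observation
    weights -- list of non-negative quality weights, same length as colors
    Both the data layout and the weighted mean are illustrative assumptions.
    """
    if len(colors) != len(weights):
        raise ValueError("colors and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one observation needs a positive weight")
    # Weighted mean per channel: better observations contribute more.
    return tuple(
        sum(w * c[channel] for c, w in zip(colors, weights)) / total
        for channel in range(3)
    )


# Two observations of the same texel; the second is trusted three times as much.
texel = blend_observations([(100, 0, 0), (200, 0, 0)], [1.0, 3.0])
```

A weighted mean is only one possible choice; a real pipeline might also select the single best observation per texel or blend in a different colour space.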
Structureless Camera Motion Estimation of Unordered Omnidirectional Images
This work aims at providing a novel camera motion estimation pipeline for large collections of unordered omnidirectional images. In order to keep the pipeline as general and flexible as possible, cameras are modelled as unit spheres, which allows any central camera type to be incorporated. For each camera an unprojection lookup is generated from the intrinsics, called a P2S-map (Pixel-to-Sphere-map), which maps pixels to their corresponding positions on the unit sphere. Consequently, the camera geometry becomes independent of the underlying projection model. The pipeline also generates P2S-maps from world map projections known from cartography, which exhibit fewer distortion effects. Using P2S-maps from camera calibration and from world map projections allows omnidirectional camera images to be converted to an appropriate world map projection in order to apply standard feature extraction and matching algorithms for data association. The proposed estimation pipeline combines the flexibility of SfM (Structure from Motion), which handles unordered image collections, with the efficiency of PGO (Pose Graph Optimization), which is used as the back-end in graph-based Visual SLAM (Simultaneous Localization and Mapping) approaches to optimize camera poses from large image sequences. SfM uses BA (Bundle Adjustment) to jointly optimize camera poses (motion) and 3D feature locations (structure), which becomes computationally expensive for large-scale scenarios. In contrast, PGO solves for camera poses (motion) from measured transformations between cameras, which keeps the optimization manageable. The proposed estimation algorithm combines both worlds. It obtains up-to-scale transformations between image pairs using two-view constraints, which are jointly scaled using trifocal constraints. A pose graph is generated from the scaled two-view transformations and solved by PGO to obtain camera motion efficiently even for large image collections.
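The idea of a P2S-map as a per-pixel lookup onto the unit sphere can be sketched for one world map projection. The example below assumes an equirectangular (plate carrée) image, a longitude range of [-π, π), latitude running from +π/2 at the top row to -π/2 at the bottom, and a z-up axis convention. These choices and the function name `equirectangular_p2s_map` are illustrative assumptions and are not taken from the thesis.

```python
import math


def equirectangular_p2s_map(width, height):
    """Build a lookup table mapping each pixel to a point on the unit sphere.

    Returns a list of `height` rows, each holding `width` (x, y, z) tuples.
    Assumed convention: longitude spans the image width, latitude the height,
    and pixel centres sit at half-integer offsets.
    """
    table = []
    for row in range(height):
        # Latitude of the pixel centre: positive in the upper image half.
        lat = math.pi / 2 - (row + 0.5) / height * math.pi
        line = []
        for col in range(width):
            # Longitude of the pixel centre in [-pi, pi).
            lon = (col + 0.5) / width * 2 * math.pi - math.pi
            line.append((
                math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat),
            ))
        table.append(line)
    return table


# A tiny 8 x 4 map; a real map would match the image resolution.
p2s = equirectangular_p2s_map(8, 4)
```

Once such a table exists, downstream geometry only needs the unit vectors, so the same code path can serve any central camera whose calibration yields a comparable table.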
The obtained results can be used as initial pose estimates for further 3D reconstruction, e.g. to build a sparse structure from feature correspondences in an SfM or SLAM framework with further refinement via BA.
The pipeline also incorporates fixed extrinsic constraints from multi-camera setups as well as depth information provided by RGBD sensors. The entire camera motion estimation pipeline does not need to generate a sparse 3D structure of the captured environment and is therefore called SCME (Structureless Camera Motion Estimation).
1 Introduction
1.1 Motivation
1.1.1 Increasing Interest of Image-Based 3D Reconstruction
1.1.2 Underground Environments as Challenging Scenario
1.1.3 Improved Mobile Camera Systems for Full Omnidirectional Imaging
1.2 Issues
1.2.1 Directional versus Omnidirectional Image Acquisition
1.2.2 Structure from Motion versus Visual Simultaneous Localization and Mapping
1.3 Contribution
1.4 Structure of this Work
2 Related Work
2.1 Visual Simultaneous Localization and Mapping
2.1.1 Visual Odometry
2.1.2 Pose Graph Optimization
2.2 Structure from Motion
2.2.1 Bundle Adjustment
2.2.2 Structureless Bundle Adjustment
2.3 Corresponding Issues
2.4 Proposed Reconstruction Pipeline
3 Cameras and Pixel-to-Sphere Mappings with P2S-Maps
3.1 Types
3.2 Models
3.2.1 Unified Camera Model
3.2.2 Polynomial Camera Model
3.2.3 Spherical Camera Model
3.3 P2S-Maps - Mapping onto Unit Sphere via Lookup Table
3.3.1 Lookup Table as Color Image
3.3.2 Lookup Interpolation
3.3.3 Depth Data Conversion
4 Calibration
4.1 Overview of Proposed Calibration Pipeline
4.2 Target Detection
4.3 Intrinsic Calibration
4.3.1 Selected Examples
4.4 Extrinsic Calibration
4.4.1 3D-2D Pose Estimation
4.4.2 2D-2D Pose Estimation
4.4.3 Pose Optimization
4.4.4 Uncertainty Estimation
4.4.5 PoseGraph Representation
4.4.6 Bundle Adjustment
4.4.7 Selected Examples
5 Full Omnidirectional Image Projections
5.1 Panoramic Image Stitching
5.2 World Map Projections
5.3 World Map Projection Generator for P2S-Maps
5.4 Conversion between Projections based on P2S-Maps
5.4.1 Proposed Workflow
5.4.2 Data Storage Format
5.4.3 Real World Example
6 Relations between Two Camera Spheres
6.1 Forward and Backward Projection
6.2 Triangulation
6.2.1 Linear Least Squares Method
6.2.2 Alternative Midpoint Method
6.3 Epipolar Geometry
6.4 Transformation Recovery from Essential Matrix
6.4.1 Cheirality
6.4.2 Standard Procedure
6.4.3 Simplified Procedure
6.4.4 Improved Procedure
6.5 Two-View Estimation
6.5.1 Evaluation Strategy
6.5.2 Error Metric
6.5.3 Evaluation of Estimation Algorithms
6.5.4 Concluding Remarks
6.6 Two-View Optimization
6.6.1 Epipolar-Based Error Distances
6.6.2 Projection-Based Error Distances
6.6.3 Comparison between Error Distances
6.7 Two-View Translation Scaling
6.7.1 Linear Least Squares Estimation
6.7.2 Non-Linear Least Squares Optimization
6.7.3 Comparison between Initial and Optimized Scaling Factor
6.8 Homography to Identify Degeneracies
6.8.1 Homography for Spherical Cameras
6.8.2 Homography Estimation
6.8.3 Homography Optimization
6.8.4 Homography and Pure Rotation
6.8.5 Homography in Epipolar Geometry
7 Relations between Three Camera Spheres
7.1 Three View Geometry
7.2 Crossing Epipolar Planes Geometry
7.3 Trifocal Geometry
7.4 Relation between Trifocal, Three-View and Crossing Epipolar Planes
7.5 Translation Ratio between Up-To-Scale Two-View Transformations
7.5.1 Structureless Determination Approaches
7.5.2 Structure-Based Determination Approaches
7.5.3 Comparison between Proposed Approaches
8 Pose Graphs
8.1 Optimization Principle
8.2 Solvers
8.2.1 Additional Graph Solvers
8.2.2 False Loop Closure Detection
8.3 Pose Graph Generation
8.3.1 Generation of Synthetic Pose Graph Data
8.3.2 Optimization of Synthetic Pose Graph Data
9 Structureless Camera Motion Estimation
9.1 SCME Pipeline
9.2 Determination of Two-View Translation Scale Factors
9.3 Integration of Depth Data
9.4 Integration of Extrinsic Camera Constraints
10 Camera Motion Estimation Results
10.1 Directional Camera Images
10.2 Omnidirectional Camera Images
11 Conclusion
11.1 Summary
11.2 Outlook and Future Work
Appendices
A.1 Additional Extrinsic Calibration Results
A.2 Linear Least Squares Scaling
A.3 Proof Rank Deficiency
A.4 Alternative Derivation Midpoint Method
A.5 Simplification of Depth Calculation
A.6 Relation between Epipolar and Circumferential Constraint
A.7 Covariance Estimation
A.8 Uncertainty Estimation from Epipolar Geometry
A.9 Two-View Scaling Factor Estimation: Uncertainty Estimation
A.10 Two-View Scaling Factor Optimization: Uncertainty Estimation
A.11 Depth from Adjoining Two-View Geometries
A.12 Alternative Three-View Derivation
A.12.1 Second Derivation Approach
A.12.2 Third Derivation Approach
A.13 Relation between Trifocal Geometry and Alternative Midpoint Method
A.14 Additional Pose Graph Generation Examples
A.15 Pose Graph Solver Settings
A.16 Additional Pose Graph Optimization Examples
Bibliography
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component which affects the integrity of visual content acquisition for multi-view video. Currently, linear, convergence, and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two of the prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known camera structures, hence reducing some of the other implementation issues. This thesis explores the use of image-based rendering with and without geometry in the implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of a depth map (geometry) and the use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps have been compared with the ones generated using the dynamic programming approach. In both the geometry and no-geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, the construction of a 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and the computation of a virtual scene from a stereo pair of panoramic images. The quality of these rendered images was assessed through either objective or subjective analysis in Imatest software. Furthermore, metric reconstruction of a scene was performed by re-projection of the pixel points from multiple image samples with a single centre of projection. This was done using a sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge for the efficiency of the optimisation step. The optimised data was then visualised in the Meshlab software environment, hence providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to resolve the hidden parts of any scene objects. To adequately meet the visibility condition for scene objects, and given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explores a trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential of several physical cameras of the same model being sparsely arranged on the edge of an efficient trapezoid graph. This is implemented in both Matlab and Maya. The quality of the depth maps rendered in Matlab are better in Quality