37 research outputs found

    DFM: A Performance Baseline for Deep Feature Matching

    A novel image matching method is proposed that utilizes learned features extracted by an off-the-shelf deep neural network to obtain a promising performance. The proposed method uses a pre-trained VGG architecture as a feature extractor and does not require any additional training specifically to improve matching. Inspired by well-established concepts in psychology, such as the Mental Rotation paradigm, an initial warping is performed as a result of a preliminary geometric transformation estimate. These estimates are simply based on dense matching of nearest neighbors at the terminal layer of the VGG network outputs of the images to be matched. After this initial alignment, the same approach is repeated between the reference and aligned images in a hierarchical manner to reach a good localization and matching performance. Our algorithm achieves overall scores of 0.57 and 0.80 in terms of Mean Matching Accuracy (MMA) for 1-pixel and 2-pixel thresholds, respectively, on the HPatches dataset, which indicates a better performance than the state-of-the-art. Comment: CVPR 2021 Image Matching Workshop Camera Ready Version
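
The Mean Matching Accuracy figures quoted above can be made concrete with a small sketch. It assumes that a match counts as correct when its reprojection error under a ground-truth homography is at most the pixel threshold, and that MMA is the average of the per-pair accuracy; the function names and the `<=` convention are illustrative assumptions, not taken from the paper.

```python
import math


def project(H, pt):
    """Apply a 3x3 homography H (nested lists, row-major) to a 2D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def matching_accuracy(matches, H, threshold):
    """Fraction of matches (pt_a, pt_b) with reprojection error <= threshold pixels."""
    if not matches:
        return 0.0
    correct = 0
    for pt_a, pt_b in matches:
        px, py = project(H, pt_a)
        if math.hypot(px - pt_b[0], py - pt_b[1]) <= threshold:
            correct += 1
    return correct / len(matches)


def mean_matching_accuracy(pairs, threshold):
    """Average the per-pair accuracy over a list of (matches, H) tuples."""
    return sum(matching_accuracy(m, H, threshold) for m, H in pairs) / len(pairs)
```

Evaluating `mean_matching_accuracy` once per threshold (for example 1 and 2 pixels) gives one score per threshold, which is how the two numbers in the abstract would be read under these assumptions.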

    SCENES: Subpixel Correspondence Estimation With Epipolar Supervision

    Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem with particular importance for relative camera pose estimation and structure-from-motion. Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly-accurate matches on the test sets. However, they do not generalise well to new datasets with different characteristics to those they were trained on, unlike classic feature extractors. Instead, they require finetuning, which assumes that ground-truth correspondences or ground-truth camera poses and 3D structure are available. We relax this assumption by removing the requirement of 3D structure, e.g., depth maps or point clouds, and only require camera pose information, which can be obtained from odometry. We do so by replacing correspondence losses with epipolar losses, which encourage putative matches to lie on the associated epipolar line. While weaker than correspondence supervision, we observe that this cue is sufficient for finetuning existing models on new data. We then further relax the assumption of known camera poses by using pose estimates in a novel bootstrapping approach. We evaluate on highly challenging datasets, including an indoor drone dataset and an outdoor smartphone camera dataset, and obtain state-of-the-art results without strong supervision
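
The idea of encouraging putative matches to lie on the associated epipolar line can be illustrated with the standard point-to-epipolar-line distance. This is a minimal sketch, assuming a known fundamental matrix `F` (derivable from camera poses and intrinsics) and a plain mean over matches; the exact loss used in the paper is not given in the abstract, so the function names and the averaging are assumptions.

```python
import math


def epipolar_line(F, pt):
    """Return the epipolar line (a, b, c) in image B for a point in image A."""
    x, y = pt
    return tuple(F[r][0] * x + F[r][1] * y + F[r][2] for r in range(3))


def epipolar_distance(F, pt_a, pt_b):
    """Perpendicular distance of pt_b from the epipolar line induced by pt_a."""
    a, b, c = epipolar_line(F, pt_a)
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * pt_b[0] + b * pt_b[1] + c) / norm


def epipolar_loss(F, matches):
    """Mean epipolar distance over putative matches (pt_a, pt_b)."""
    return sum(epipolar_distance(F, a, b) for a, b in matches) / len(matches)
```

Because the distance only constrains a match to a line rather than to a point, it is a weaker signal than correspondence supervision, which matches the trade-off described in the abstract.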

    A window to the past through modern urban environments: Developing a photogrammetric workflow for the orientation parameter estimation of historical images

    The ongoing process of digitization in archives is providing access to ever-increasing historical image collections. In many of these repositories, images can typically be viewed in a list or gallery view. Due to the growing number of digitized objects, this type of visualization is becoming increasingly complex. Among other things, it is difficult to determine how many photographs show a particular object, and spatial information can only be communicated via metadata. Within the scope of this thesis, research is conducted on the automated determination and provision of this spatial data. Enhanced visualization options make this information more easily accessible to scientists as well as citizens. Different types of visualizations can be presented in three-dimensional (3D), Virtual Reality (VR) or Augmented Reality (AR) applications. However, applications of this type require the estimation of the photographer’s point of view. In the photogrammetric context, this is referred to as estimating the interior and exterior orientation parameters of the camera. For determining the orientation parameters of single images, the established methods are Direct Linear Transformation (DLT) and photogrammetric space resection. Using these methods requires the assignment of measured object points to their homologue image points. This is feasible for single images, but quickly becomes impractical due to the large number of images available in archives. Thus, for larger image collections, usually the Structure-from-Motion (SfM) method is chosen, which allows the simultaneous estimation of the interior as well as the exterior orientation of the cameras. While this method yields good results especially for sequential, contemporary image data, its application to unsorted historical photographs poses a major challenge. 
In the context of this work, which is mainly limited to scenarios of urban terrestrial photographs, the reasons for failure of the SfM process are identified. In contrast to sequential image collections, pairs of images from different points in time or from varying viewpoints show huge differences in terms of scene representation, such as deviations in the lighting situation, building state, or seasonal changes. Since homologue image points have to be found automatically in image pairs or image sequences in the feature matching procedure of SfM, these image differences pose the most complex problem. In order to test different feature matching methods, it is necessary to use a pre-oriented historical dataset. Since such a benchmark dataset did not exist yet, eight historical image triples (corresponding to 24 image pairs) are oriented in this work by manual selection of homologue image points. This dataset allows the evaluation of newly published feature matching methods. The initial methods used, which are based on algorithmic procedures for feature matching (e.g., Scale Invariant Feature Transform (SIFT)), provide satisfactory results for only a few of the image pairs in this dataset. By introducing methods that use neural networks for feature detection and feature description, homologue features can be reliably found for a large fraction of image pairs in the benchmark dataset. In addition to a successful feature matching strategy, determining camera orientation requires an initial estimate of the principal distance. For historical images, however, the principal distance cannot be directly determined, as the camera information is usually lost during the process of digitizing the analog original. A possible solution to this problem is to use three vanishing points that are automatically detected in the historical image and from which the principal distance can then be determined. 
The combination of principal distance estimation and robust feature matching is integrated into the SfM process and allows the determination of the interior and exterior camera orientation parameters of historical images. Based on these results, a workflow is designed that allows archives to be directly connected to 3D applications. A search query in archives is usually performed using keywords, which have to be assigned to the corresponding object as metadata. Therefore, a keyword search for a specific building also results in hits on drawings, paintings, events, interior or detailed views directly connected to this building. However, for the successful application of SfM in an urban context, primarily the photographic exterior view of the building is of interest. While the images for a single building can be sorted by hand, this process is too time-consuming for multiple buildings. Therefore, in collaboration with the Competence Center for Scalable Data Services and Solutions (ScaDS), an approach is developed to filter historical photographs by image similarities. This method reliably enables the search for content-similar views via the selection of one or more query images. By linking this content-based image retrieval with the SfM approach, automatic determination of camera parameters for a large number of historical photographs is possible. The developed method represents a significant improvement over commercial and open-source SfM standard solutions. The result of this work is a complete workflow from archive to application that automatically filters images and calculates the camera parameters. The expected accuracy of a few meters for the camera position is sufficient for the applications presented in this work, but offers further potential for improvement. A connection to archives, which will automatically exchange photographs and positions via interfaces, is currently under development. 
This makes it possible to retrieve interior and exterior orientation parameters directly from historical photography as metadata, which opens up new fields of research. Thesis outline: 1 Introduction; 2 Generation of a benchmark dataset using historical photographs for the evaluation of feature matching methods; 3 Photogrammetry as a link between image repository and 4D applications; 4 An adapted Structure-from-Motion Workflow for the orientation of historical images; 5 Fully automated pose estimation of historical images; 6 Related publications; 7 Synthesis; 8 Appendix.
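
The abstract above mentions determining the principal distance from three vanishing points. A minimal sketch of one textbook way to do this follows. It assumes the three vanishing points belong to mutually orthogonal scene directions and that the camera has square pixels and zero skew; under those assumptions the principal point is the orthocenter of the vanishing-point triangle and the squared principal distance is the negated dot product of two vanishing points taken relative to the principal point. The function names are hypothetical, and the thesis may use a different formulation.

```python
import math


def orthocenter(v1, v2, v3):
    """Orthocenter of the triangle v1, v2, v3 (used as the principal point)."""
    # Altitude through v1 is perpendicular to (v2 - v3); through v2 to (v1 - v3).
    a1 = (v2[0] - v3[0], v2[1] - v3[1])
    a2 = (v1[0] - v3[0], v1[1] - v3[1])
    b1 = v1[0] * a1[0] + v1[1] * a1[1]
    b2 = v2[0] * a2[0] + v2[1] * a2[1]
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear")
    # Solve the 2x2 linear system with Cramer's rule.
    x = (b1 * a2[1] - a1[1] * b2) / det
    y = (a1[0] * b2 - a2[0] * b1) / det
    return (x, y)


def principal_distance(v1, v2, v3):
    """Principal distance (in pixels) from three orthogonal vanishing points."""
    px, py = orthocenter(v1, v2, v3)
    f_squared = -((v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py))
    if f_squared <= 0.0:
        raise ValueError("vanishing points are inconsistent with orthogonal directions")
    return math.sqrt(f_squared)
```

The estimate degrades when one vanishing point lies near infinity (for example, when the vertical building edges are nearly parallel in the image), so in practice such a value would only serve as an initial guess for later refinement.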

    Geometric and photometric affine invariant image registration

    This thesis aims to present a solution to the correspondence problem for the registration of wide-baseline images taken from uncalibrated cameras. We propose an affine invariant descriptor that combines the geometry and photometry of the scene to find correspondences between both views. The geometric affine invariant component of the descriptor is based on the affine arc-length metric, whereas the photometry is analysed by invariant colour moments. A graph structure represents the spatial distribution of the primitive features; i.e. nodes correspond to detected high-curvature points, whereas arcs represent connectivities by extracted contours. After matching, we refine the search for correspondences by using a maximum likelihood robust algorithm. We have evaluated the system over synthetic and real data. The method is susceptible to the propagation of errors introduced by approximations in the system. BAE Systems; Selex Sensors and Airborne Systems

    Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments

    In this work a real-time capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects ranging from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method

    Robust and Accurate Camera Localisation at a Large Scale

    The task of camera-based localization aims to quickly and precisely pinpoint the location (and viewing direction) at which an image was taken, against a pre-stored large-scale map of the environment. This technique can be used in many 3D computer vision applications, e.g., AR/VR and autonomous driving. Mapping the world is the first step to enable camera-based localization since a pre-stored map serves as a reference for a query image/sequence. In this thesis, we exploit three readily available sources: (i) satellite images; (ii) ground-view images; (iii) 3D point clouds. Based on the above three sources, we propose solutions to localize a query camera both effectively and efficiently, i.e., accurately localizing a query camera under a variety of lighting and viewing conditions within a small amount of time. The main contributions are summarized as follows. In chapter 3, we present minimal 4-point and 2-point solvers to estimate the relative and absolute camera pose, respectively. The core idea is exploiting the vertical direction from an IMU or a vanishing point to derive a closed-form solution of a quartic equation and a quadratic equation for the relative and absolute camera pose, respectively. In chapter 4, we localize a ground-view query image against a satellite map. Inspired by the insight that humans commonly use orientation information as an important cue for spatial localization, we propose a method that endows deep neural networks with the 'commonsense' of orientation. We design a Siamese network that explicitly encodes each pixel's orientation of the ground-view and satellite images. Our method boosts the learned deep features' discriminative power, outperforming all previous methods. In chapter 5, we localize a ground-view query image against a ground-view image database. We propose a representation learning method with higher location-discriminating power. The core idea is learning a discriminative image embedding. 
Similarities among intra-place images (viewing the same landmarks) are maximized while similarities among inter-place images (viewing different landmarks) are minimized. The method is easy to implement and pluggable into any CNN. Experiments show that our method outperforms all previous methods. In chapter 6, we localize a ground-view query image against a large-scale 3D point cloud with visual descriptors. To address the ambiguities in direct 2D-3D feature matching, we introduce a global matching method that harnesses global contextual information exhibited both within the query image and among all the 3D points in the map. The core idea is to find the optimal 2D set to 3D set matching. Tests on standard benchmark datasets show the effectiveness of our method. In chapter 7, we localize a ground-view query image against a 3D point cloud with only coordinates. The problem is also known as blind Perspective-n-Point. We propose a deep CNN model that simultaneously solves for both the 6-DoF absolute camera pose and 2D-3D correspondences. The core idea is extracting point-wise 2D and 3D features from their coordinates and matching 2D and 3D features effectively in a global feature matching module. Extensive tests on both real and simulated data have shown that our method substantially outperforms existing approaches. Last, in chapter 8, we study the potential of using 3D lines. Specifically, we study the problem of aligning two partially overlapping 3D line reconstructions in Euclidean space. This technique can be used for localization with respect to a 3D line database when query 3D line reconstructions are available (e.g., from stereo triangulation). We propose a neural network that takes Plücker representations of lines as input, solves for line-to-line matches, and estimates a 6-DoF rigid transformation. Experiments on indoor and outdoor datasets show that our method's registration (rotation and translation) precision outperforms baselines significantly.

    Heterogeneous Collaborative Mapping for Autonomous Mobile Systems

    An accurate map of the environment is essential for autonomous robot navigation. During collaborative simultaneous localization and mapping, the individual robots usually represent the environment as probabilistic occupancy grid maps. These maps can be exchanged among robots and fused to reduce the overall exploration time, which is the main advantage of collaborative systems. Such fusion is challenging due to the unknown initial correspondence problem. This thesis presents a novel feature-based map fusion approach that detects, describes, and matches geometrically consistent features present in the overlapping region between the maps. The main drawback of usual feature-based approaches is their inability to establish enough valid feature correspondences, primarily due to noisy sensor observations. Further, many existing map fusion approaches neglect the heterogeneity that arises from different map resolutions and types of mapping sensors. This thesis shows that exploiting the probabilistic spatial information to refine the maps and utilizing nonlinear diffusion filtering to detect features can drastically improve the feature-matching performance. Additionally, this thesis presents a certainty grid fusion approach based on Bayesian inference to fuse pair-wise grid information. It also presents an extensive comparison of traditional feature detection methods to register map images at different scales. Finally, the effectiveness of the proposed method is illustrated using real-world data under the following map fusion assumptions: homogeneous, hierarchical, and heterogeneous (fusing maps of different resolutions and maps generated using different types of mapping sensors).
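
Bayesian fusion of pair-wise grid information, as mentioned above, can be illustrated with the common independent-opinion-pool rule for occupancy probabilities. This is only a sketch under stated assumptions: the two maps are already aligned cell by cell, each cell holds an occupancy probability, the prior is uniform (0.5), and the observations are conditionally independent. The thesis does not spell out its fusion rule in the abstract, so `fuse_cell` and `fuse_grids` are hypothetical names for a generic formulation.

```python
def fuse_cell(p_a, p_b):
    """Fuse two occupancy probabilities for the same cell under a uniform prior."""
    occupied = p_a * p_b
    free = (1.0 - p_a) * (1.0 - p_b)
    total = occupied + free
    if total == 0.0:
        # The two maps are certain and contradict each other: fall back to unknown.
        return 0.5
    return occupied / total


def fuse_grids(grid_a, grid_b):
    """Fuse two aligned occupancy grids (lists of rows) cell by cell."""
    if len(grid_a) != len(grid_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    ):
        raise ValueError("grids must have identical shape")
    return [
        [fuse_cell(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]
```

With this rule an unknown cell (0.5) leaves the other map's value unchanged, agreeing observations reinforce each other, and equally confident but opposite observations cancel out to 0.5.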

    A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation

    Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques, and numerous algorithms have achieved high localization accuracy and robustness. In MRL, scene maps are represented in various forms, and the representation determines how MRL methods work and how they perform. However, to the best of our knowledge, existing surveys do not systematically review the relationship between MRL solutions and the scene map representations they use. This survey fills the gap by comprehensively reviewing MRL methods from this perspective, promoting further research. 1) We begin with the problem definition of MRL, explore current challenges, and compare our survey with existing ones. 2) Many well-known MRL methods are reviewed and categorized into five classes according to the representation form of the map they use, i.e., geo-tagged frames, visual landmarks, point clouds, vectorized semantic maps, and neural network-based maps. 3) To compare MRL methods with various maps quantitatively and fairly, we introduce some public datasets and report the performance of some state-of-the-art MRL methods. The strengths and weaknesses of MRL methods with different maps are analyzed. 4) We finally introduce some topics of interest in this field and give personal opinions. This survey can serve as a valuable reference for MRL, and a continuously updated summary of this survey is publicly available to the community at: https://github.com/jinyummiao/map-in-mono-reloc. Comment: 33 pages, 10 tables, 16 figures, under review

    Estimation, planning, and mapping for autonomous flight using an RGB-D camera in GPS-denied environments

    RGB-D cameras provide both color images and per-pixel depth estimates. The richness of this data and the recent development of low-cost sensors have combined to present an attractive opportunity for mobile robotics research. In this paper, we describe a system for visual odometry and mapping using an RGB-D camera, and its application to autonomous flight. By leveraging results from recent state-of-the-art algorithms and hardware, our system enables 3D flight in cluttered environments using only onboard sensor data. All computation and sensing required for local position control are performed onboard the vehicle, reducing the dependence on an unreliable wireless link to a ground station. However, even with accurate 3D sensing and position estimation, some parts of the environment have more perceptual structure than others, leading to state estimates that vary in accuracy across the environment. If the vehicle plans a path without regard to how well it can localize itself along that path, it runs the risk of becoming lost or worse. We show how the belief roadmap algorithm [prentice2009belief], a belief space extension of the probabilistic roadmap algorithm, can be used to plan vehicle trajectories that incorporate the sensing model of the RGB-D camera. We evaluate the effectiveness of our system for controlling a quadrotor micro air vehicle, demonstrate its use for constructing detailed 3D maps of an indoor environment, and discuss its limitations. Funding: United States. Office of Naval Research (Grant MURI N00014-07-1-0749); United States. Office of Naval Research (Science of Autonomy Program N00014-09-1-0641); United States. Army Research Office (MAST CTA); United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-09-1-1052); National Science Foundation (U.S.) (Contract IIS-0812671); United States. Army Research Office (Robotics Consortium Agreement W911NF-10-2-0016); National Science Foundation (U.S.). Division of Information, Robotics, and Intelligent Systems (Grant 0546467)

    Advances in Simultaneous Localization and Mapping in Confined Underwater Environments Using Sonar and Optical Imaging.

    This thesis reports on the incorporation of surface information into a probabilistic simultaneous localization and mapping (SLAM) framework used on an autonomous underwater vehicle (AUV) designed for underwater inspection. AUVs operating in cluttered underwater environments, such as ship hulls or dams, are commonly equipped with Doppler-based sensors, which---in addition to navigation---provide a sparse representation of the environment in the form of a three-dimensional (3D) point cloud. The goal of this thesis is to develop perceptual algorithms that take full advantage of these sparse observations for correcting navigational drift and building a model of the environment. In particular, we focus on three objectives. First, we introduce a novel representation of this 3D point cloud as collections of planar features arranged in a factor graph. This factor graph representation probabilistically infers the spatial arrangement of each planar segment and can effectively model smooth surfaces (such as a ship hull). Second, we show how this technique can produce 3D models that serve as input to our pipeline that produces the first-ever 3D photomosaics using a two-dimensional (2D) imaging sonar. Finally, we propose a model-assisted bundle adjustment (BA) framework that allows for robust registration between surfaces observed from a Doppler sensor and visual features detected from optical images. Throughout this thesis, we show methods that produce 3D photomosaics using a combination of triangular meshes (derived from our SLAM framework or given a priori), optical images, and sonar images. Overall, the contributions of this thesis greatly increase the accuracy, reliability, and utility of in-water ship hull inspection with AUVs despite the challenges they face in underwater environments.
We provide results using the Hovering Autonomous Underwater Vehicle (HAUV) for autonomous ship hull inspection, which serves as the primary testbed for the algorithms presented in this thesis. The sensor payload of the HAUV consists primarily of: a Doppler velocity log (DVL) for underwater navigation and ranging, monocular and stereo cameras, and---for some applications---an imaging sonar. PhD; Electrical Engineering: Systems; University of Michigan, Horace H. Rackham School of Graduate Studies; http://deepblue.lib.umich.edu/bitstream/2027.42/120750/1/paulozog_1.pd