150 research outputs found

    Trifocal Relative Pose from Lines at Points and its Efficient Solution

    We present a new minimal problem for relative pose estimation mixing point features with lines incident at points observed in three views and its efficient homotopy continuation solver. We demonstrate the generality of the approach by analyzing and solving an additional problem with mixed point and line correspondences in three views. The minimal problems include correspondences of (i) three points and one line and (ii) three points and two lines through two of the points, which is reported and analyzed here for the first time. These are difficult to solve, as they have 216 and, as shown here, 312 solutions, but cover important practical situations when line and point features appear together, e.g., in urban scenes or when observing curves. We demonstrate that even such difficult problems can be solved robustly using a suitable homotopy continuation technique and we provide an implementation optimized for minimal problems that can be integrated into engineering applications. Our simulated and real experiments demonstrate our solvers in the camera geometry computation task in structure from motion. We show that new solvers allow for reconstructing challenging scenes where the standard two-view initialization of structure from motion fails. Comment: This material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while most authors were in residence at Brown University's Institute for Computational and Experimental Research in Mathematics -- ICERM, in Providence, R
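
As a toy illustration of the homotopy continuation idea mentioned in this abstract (not the authors' trifocal solver), the sketch below tracks the roots of a simple start polynomial x^n - 1 to the roots of a univariate target polynomial. The function names, the fixed complex constant `gamma`, and the step counts are assumptions chosen for illustration only; the target is assumed to have degree n with a nonzero leading coefficient.

```python
import cmath


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) and its derivative via Horner's rule."""
    value, deriv = 0j, 0j
    for c in coeffs:
        deriv = deriv * x + value
        value = value * x + c
    return value, deriv


def track_roots(target, steps=200, newton_iters=5, gamma=0.6 + 0.8j):
    """Track every root of the start system x^n - 1 to a root of `target`.

    Homotopy: H(x, t) = (1 - t) * gamma * (x^n - 1) + t * target(x), t from 0 to 1.
    `gamma` is an assumed non-real constant that keeps the paths away from
    singular points for generic targets.
    """
    n = len(target) - 1
    start = [1.0] + [0.0] * (n - 1) + [-1.0]  # coefficients of x^n - 1
    roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    for step in range(1, steps + 1):
        t = step / steps
        h = [(1 - t) * gamma * s + t * c for s, c in zip(start, target)]
        for i, x in enumerate(roots):
            # Corrector only: a few Newton iterations from the previous solution.
            for _ in range(newton_iters):
                value, deriv = poly_eval(h, x)
                x -= value / deriv
            roots[i] = x
    return roots


if __name__ == "__main__":
    # Target x^2 - 3x + 2, whose roots are 1 and 2.
    print(track_roots([1.0, -3.0, 2.0]))
```

Practical solvers track many paths of a multivariate polynomial system and typically add a predictor step and adaptive step sizes; this sketch keeps only the corrector to stay short.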

    Metric 3D-reconstruction from Unordered and Uncalibrated Image Collections

    In this thesis the problem of Structure from Motion (SfM) for uncalibrated and unordered image collections is considered. The proposed framework is an adaptation of the framework for calibrated SfM proposed by Olsson-Enqvist (2011) to the uncalibrated case. Olsson-Enqvist's framework consists of three main steps: pairwise relative rotation estimation, rotation averaging, and geometry estimation with known rotations. For this to work with uncalibrated images we also perform auto-calibration during the first step. There is a well-known degeneracy for pairwise auto-calibration which occurs when the two principal axes meet in a point. This is unfortunately common for real images. To mitigate this, the rotation estimation is instead performed by estimating image triplets. For image triplets the degenerate configurations are less likely to occur in practice. This is followed by estimation of the pairs which did not get a successful relative rotation from the previous step. The framework is successfully applied to an uncalibrated and unordered collection of images of the cathedral in Lund. It is also applied to the well-known Oxford dinosaur sequence, which consists of turntable motion. Image pairs from the turntable motion are in a degenerate configuration for auto-calibration since they both view the same point on the rotation axis.

    Collaborative Perception From Data Association To Localization

    During the last decade, visual sensors have become ubiquitous. One or more cameras can be found in devices ranging from smartphones to unmanned aerial vehicles and autonomous cars. During the same time, we have witnessed the emergence of large scale networks ranging from sensor networks to robotic swarms. Assume multiple visual sensors perceive the same scene from different viewpoints. In order to achieve consistent perception, the problem of correspondences between observed features must be first solved. Then, it is often necessary to perform distributed localization, i.e. to estimate the pose of each agent with respect to a global reference frame. Having everything set in the same coordinate system and everything having the same meaning for all agents, operation of the agents and interpretation of the jointly observed scene become possible. The questions we address in this thesis are the following: first, can a group of visual sensors agree on what they see, in a decentralized fashion? This is the problem of collaborative data association. Then, based on what they see, can the visual sensors agree on where they are, in a decentralized fashion as well? This is the problem of cooperative localization. The contributions of this work are five-fold. We are the first to address the problem of consistent multiway matching in a decentralized setting. Secondly, we propose an efficient decentralized dynamical systems approach for computing any number of smallest eigenvalues and the associated eigenvectors of a weighted graph with global convergence guarantees with direct applications in group synchronization problems, e.g. permutations or rotations synchronization. Thirdly, we propose a state-of-the-art framework for decentralized collaborative localization for mobile agents under the presence of unknown cross-correlations by solving a minimax optimization problem to account for the missing information.
Fourthly, we are the first to present an approach to the 3-D rotation localization of a camera sensor network from relative bearing measurements. Lastly, we focus on the case of a group of three visual sensors. We propose a novel Riemannian geometric representation of the trifocal tensor which relates projections of points and lines in three overlapping views. The aforementioned representation enables the use of the state-of-the-art optimization methods on Riemannian manifolds and the use of robust averaging techniques for estimating the trifocal tensor.

    Structure from Motion with Higher-level Environment Representations

    Computer vision is an important area focusing on understanding, extracting and using the information from vision-based sensors. It has many applications such as vision-based 3D reconstruction, simultaneous localization and mapping (SLAM) and data-driven understanding of the real world. Vision is a fundamental sensing modality in many different fields of application. While traditional structure from motion mostly uses sparse point-based features, this thesis aims to explore the possibility of using higher-order feature representations. It starts with a joint work which uses straight lines for feature representation and performs bundle adjustment with a straight-line parameterization. Then, we further try an even higher-order representation where we use Bézier splines for parameterization. We start with a simple case in which all contours lie on a plane, use Bézier splines to parametrize the curves in the background, and optimize over both the camera positions and the Bézier splines. For application, we present a complete end-to-end pipeline which produces meaningful dense 3D models from natural data of a 3D object: the target object is placed on a structured but unknown planar background that is modeled with splines. The data is captured using only a hand-held monocular camera. However, this application is limited to a planar scenario, so we extend the parameterization to full 3D. Following the potential of this idea, we introduce a more flexible higher-order extension of points that provides a general model for structural edges in the environment, no matter if straight or curved. Our model relies on linked Bézier curves, the geometric intuition of which proves of great benefit during parameter initialization and regularization. We present the first fully automatic pipeline that is able to generate spline-based representations without any human supervision.
Besides a full graphical formulation of the problem, we introduce both geometric and photometric cues as well as higher-level concepts such as overall curve visibility and viewing angle restrictions to automatically manage the correspondences in the graph. Results prove that curve-based structure from motion with splines is able to outperform state-of-the-art sparse feature-based methods, as well as to model curved edges in the environment.
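
The Bézier curves used as a parameterization in this abstract can be evaluated with de Casteljau's algorithm, which is repeated linear interpolation between neighbouring control points. The following is a minimal sketch; the function name and the sample control points are hypothetical and not taken from the thesis.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1].

    `control_points` is a sequence of equal-length coordinate tuples
    (2D or 3D); the curve degree is len(control_points) - 1.
    """
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Replace each neighbouring pair by its linear interpolation at t.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


if __name__ == "__main__":
    cubic = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
    print(de_casteljau(cubic, 0.5))  # prints (2.0, 1.5)
```

Because the same routine works for tuples of any dimension, it covers both the planar case and the 3D extension described above.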

    A window to the past through modern urban environments: Developing a photogrammetric workflow for the orientation parameter estimation of historical images

    The ongoing process of digitization in archives is providing access to ever-increasing historical image collections. In many of these repositories, images can typically be viewed in a list or gallery view. Due to the growing number of digitized objects, this type of visualization is becoming increasingly complex. Among other things, it is difficult to determine how many photographs show a particular object, and spatial information can only be communicated via metadata. Within the scope of this thesis, research is conducted on the automated determination and provision of this spatial data. Enhanced visualization options make this information more easily accessible to scientists as well as citizens. Different types of visualizations can be presented in three-dimensional (3D), Virtual Reality (VR) or Augmented Reality (AR) applications. However, applications of this type require the estimation of the photographer’s point of view. In the photogrammetric context, this is referred to as estimating the interior and exterior orientation parameters of the camera. For determination of orientation parameters for single images, there are the established methods of Direct Linear Transformation (DLT) or photogrammetric space resection. Using these methods requires the assignment of measured object points to their homologue image points. This is feasible for single images, but quickly becomes impractical due to the large number of images available in archives. Thus, for larger image collections, usually the Structure-from-Motion (SfM) method is chosen, which allows the simultaneous estimation of the interior as well as the exterior orientation of the cameras. While this method yields good results especially for sequential, contemporary image data, its application to unsorted historical photographs poses a major challenge. 
In the context of this work, which is mainly limited to scenarios of urban terrestrial photographs, the reasons for failure of the SfM process are identified. In contrast to sequential image collections, pairs of images from different points in time or from varying viewpoints show huge differences in terms of scene representation such as deviations in the lighting situation, building state, or seasonal changes. Since homologue image points have to be found automatically in image pairs or image sequences in the feature matching procedure of SfM, these image differences pose the most complex problem. In order to test different feature matching methods, it is necessary to use a pre-oriented historical dataset. Since such a benchmark dataset did not exist yet, eight historical image triples (corresponding to 24 image pairs) are oriented in this work by manual selection of homologue image points. This dataset allows the evaluation of newly published feature matching methods. The initial methods used, which are based on algorithmic procedures for feature matching (e.g., Scale Invariant Feature Transform (SIFT)), provide satisfactory results for only a few of the image pairs in this dataset. By introducing methods that use neural networks for feature detection and feature description, homologue features can be reliably found for a large fraction of image pairs in the benchmark dataset. In addition to a successful feature matching strategy, determining camera orientation requires an initial estimate of the principal distance. For historical images, however, the principal distance cannot be directly determined as the camera information is usually lost during the process of digitizing the analog original. A possible solution to this problem is to use three vanishing points that are automatically detected in the historical image and from which the principal distance can then be determined. 
The combination of principal distance estimation and robust feature matching is integrated into the SfM process and allows the determination of the interior and exterior camera orientation parameters of historical images. Based on these results, a workflow is designed that allows archives to be directly connected to 3D applications. A search query in archives is usually performed using keywords, which have to be assigned to the corresponding object as metadata. Therefore, a keyword search for a specific building also results in hits on drawings, paintings, events, interior or detailed views directly connected to this building. However, for the successful application of SfM in an urban context, primarily the photographic exterior view of the building is of interest. While the images for a single building can be sorted by hand, this process is too time-consuming for multiple buildings. Therefore, in collaboration with the Competence Center for Scalable Data Services and Solutions (ScaDS), an approach is developed to filter historical photographs by image similarities. This method reliably enables the search for content-similar views via the selection of one or more query images. By linking this content-based image retrieval with the SfM approach, automatic determination of camera parameters for a large number of historical photographs is possible. The developed method represents a significant improvement over commercial and open-source SfM standard solutions. The result of this work is a complete workflow from archive to application that automatically filters images and calculates the camera parameters. The expected accuracy of a few meters for the camera position is sufficient for the presented applications in this work, but offers further potential for improvement. A connection to archives, which will automatically exchange photographs and positions via interfaces, is currently under development. 
This makes it possible to retrieve interior and exterior orientation parameters directly from historical photography as metadata, which opens up new fields of research.

    Contents:
    1 Introduction
      1.1 Thesis structure
      1.2 Historical image data and archives
      1.3 Structure-from-Motion for historical images
        1.3.1 Terminology
        1.3.2 Selection of images and preprocessing
        1.3.3 Feature detection, feature description and feature matching
          1.3.3.1 Feature detection
          1.3.3.2 Feature description
          1.3.3.3 Feature matching
          1.3.3.4 Geometric verification and robust estimators
          1.3.3.5 Joint methods
        1.3.4 Initial parameterization
        1.3.5 Bundle adjustment
        1.3.6 Dense reconstruction
        1.3.7 Georeferencing
      1.4 Research objectives
    2 Generation of a benchmark dataset using historical photographs for the evaluation of feature matching methods
      2.1 Introduction
        2.1.1 Image differences based on digitization and image medium
        2.1.2 Image differences based on different cameras and acquisition technique
        2.1.3 Object differences based on different dates of acquisition
      2.2 Related work
      2.3 The image dataset
      2.4 Comparison of different feature detection and description methods
        2.4.1 Oriented FAST and Rotated BRIEF (ORB)
        2.4.2 Maximally Stable Extremal Region Detector (MSER)
        2.4.3 Radiation-invariant Feature Transform (RIFT)
        2.4.4 Feature matching and outlier removal
      2.5 Results
      2.6 Conclusions and future work
      References
    3 Photogrammetry as a link between image repository and 4D applications
      3.1 Introduction
      3.2 Multimodal access on repositories
        3.2.1 Conventional access
        3.2.2 Virtual access using online collections
        3.2.3 Virtual museums
      3.3 Workflow and access strategies
        3.3.1 Overview
        3.3.2 Filtering
        3.3.3 Photogrammetry
        3.3.4 Browser access
        3.3.5 VR and AR access
      3.4 Conclusions
      References
    4 An adapted Structure-from-Motion Workflow for the orientation of historical images
      4.1 Introduction
      4.2 Related Research
        4.2.1 Historical images for 3D reconstruction
        4.2.2 Algorithmic Feature Detection and Matching
        4.2.3 Feature Detection and Matching using Convolutional Neural Networks
      4.3 Feature Matching
      4.4 Workflow
        4.4.1 Step 1: Data preparation
        4.4.2 Step 2.1: Feature Detection and Matching
        4.4.3 Step 2.2: Vanishing Point Detection and Principal Distance Estimation
        4.4.4 Step 3: Scene Reconstruction
        4.4.5 Comparison with Three Other State-of-the-Art SfM Workflows
      4.5 Datasets
      4.6 Results
      4.7 Conclusions and Future Work
      4.8 Acknowledgements
      4.A Appendix
      References
    5 Fully automated pose estimation of historical images
      5.1 Introduction
      5.2 Related Work
        5.2.1 Image Retrieval
        5.2.2 Feature Detection and Matching
      5.3 Data Preparation: Image Retrieval
        5.3.1 Experiment and Data
        5.3.2 Methods
          5.3.2.1 Layer Extraction Approach (LEA)
          5.3.2.2 Attentive Deep Local Features (DELF) Approach
        5.3.3 Results and Evaluation
      5.4 Camera Pose Estimation of Historical Images Using Photogrammetric Methods
        5.4.1 Data
          5.4.1.1 Benchmark Datasets
          5.4.1.2 Retrieval Datasets
        5.4.2 Methods
          5.4.2.1 Feature Detection and Matching
          5.4.2.2 Geometric Verification and Camera Pose Estimation
        5.4.3 Results and Evaluation
      5.5 Conclusions and Future Work
      5.A Appendix
      References
    6 Related publications
      6.1 Photogrammetric analysis of historical image repositories for virtual reconstruction in the field of digital humanities
      6.2 Feature matching of historical images based on geometry of quadrilaterals
      6.3 Geo-information technologies for a multimodal access on historical photographs and maps for research and communication in urban history
      6.4 An automated pipeline for a browser-based, city-scale mobile 4D VR application based on historical images
      6.5 Software and content design of a browser-based mobile 4D VR application to explore historical city architecture
    7 Synthesis
      7.1 Summary of the developed workflows
        7.1.1 Error assessment
        7.1.2 Accuracy estimation
        7.1.3 Transfer of the workflow
      7.2 Developments and Outlook
    8 Appendix
      8.1 Setup for the feature matching evaluation
      8.2 Transformation from COLMAP coordinate system to OpenGL
    References
    List of Figures
    List of Tables
    List of Abbreviations
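
The vanishing-point step described in this abstract rests on a standard relation: for three vanishing points of mutually orthogonal directions, assuming square pixels and zero skew, the principal point p is the orthocentre of the vanishing-point triangle and the principal distance c satisfies c^2 = -(v1 - p)·(v2 - p). The sketch below applies that relation; the function names and the sample coordinates are illustrative assumptions, not the thesis implementation, and automatic vanishing point detection is not covered.

```python
import math


def orthocenter(v1, v2, v3):
    """Return the orthocentre of triangle (v1, v2, v3) by intersecting two altitudes."""
    # The altitude through v1 is perpendicular to (v2 - v3),
    # the altitude through v2 is perpendicular to (v1 - v3).
    a11, a12 = v2[0] - v3[0], v2[1] - v3[1]
    a21, a22 = v1[0] - v3[0], v1[1] - v3[1]
    b1 = v1[0] * a11 + v1[1] * a12
    b2 = v2[0] * a21 + v2[1] * a22
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear")
    # Cramer's rule for the 2x2 linear system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def principal_distance(v1, v2, v3):
    """Estimate (c, p) from three finite, mutually orthogonal vanishing points in pixels."""
    p = orthocenter(v1, v2, v3)
    c_squared = -((v1[0] - p[0]) * (v2[0] - p[0]) + (v1[1] - p[1]) * (v2[1] - p[1]))
    if c_squared <= 0:
        raise ValueError("vanishing points are not consistent with orthogonal directions")
    return math.sqrt(c_squared), p


if __name__ == "__main__":
    # Synthetic vanishing points generated from an assumed camera, for illustration.
    c, p = principal_distance((-1680.0, -1760.0), (-180.0, 1240.0), (1320.0, -260.0))
    print(c, p)
```

The relation breaks down when a vanishing point lies at infinity (for example, a facade photographed exactly frontally), so a practical pipeline would need a fallback for such images.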

    Robust convex optimisation techniques for autonomous vehicle vision-based navigation

    This thesis investigates new convex optimisation techniques for motion and pose estimation. Numerous computer vision problems can be formulated as optimisation problems. These optimisation problems are generally solved via linear techniques using the singular value decomposition or iterative methods under an L2 norm minimisation. Linear techniques have the advantage of offering a closed-form solution that is simple to implement. The quantity being minimised is, however, not geometrically or statistically meaningful. Conversely, L2 algorithms rely on iterative estimation, where a cost function is minimised using algorithms such as Levenberg-Marquardt, Gauss-Newton, gradient descent or conjugate gradient. The cost functions involved are geometrically interpretable and can be statistically optimal under an assumption of Gaussian noise. However, in addition to their sensitivity to initial conditions, these algorithms are often slow and bear a high probability of getting trapped in a local minimum or producing infeasible solutions, even for small noise levels. In light of the above, in this thesis we focus on developing new techniques for finding globally optimal solutions via a convex optimisation framework. Convex optimisation techniques have recently shown considerable advantages in motion estimation. Indeed, convex optimisation guarantees a global minimum, and the cost function is geometrically meaningful. Moreover, robust optimisation is a recent approach for optimisation under uncertain data. In recent years the need to cope with uncertain data has become especially acute, particularly where real-world applications are concerned. In such circumstances, robust optimisation aims to recover an optimal solution whose feasibility must be guaranteed for any realisation of the uncertain data. 
Although many researchers avoid uncertainty because of the added complexity of constructing a robust optimisation model and the lack of knowledge about the nature of these uncertainties, and especially their propagation, this thesis investigates robust convex optimisation for the motion estimation problem while estimating the uncertainties at every step. First, a solution using convex optimisation coupled to the recursive least squares (RLS) algorithm and the robust H∞ filter is developed for motion estimation. In another solution, uncertainties and their propagation are incorporated in a robust L∞ convex optimisation framework for monocular visual motion estimation. In this solution, robust least squares is combined with a second order cone program (SOCP). A technique to improve the accuracy and robustness of fundamental matrix estimation is also investigated in this thesis. This technique uses the covariance intersection approach to fuse feature location uncertainties, which leads to more consistent motion estimates. Loop-closure detection is crucial in improving the robustness of navigation algorithms. In practice, after long navigation in an unknown environment, detecting that a vehicle is in a location it has previously visited gives the opportunity to increase the accuracy and consistency of the estimate. In this context, we have developed an efficient appearance-based method for visual loop-closure detection based on the combination of a Gaussian mixture model with the KD-tree data structure. Deploying this technique for loop-closure detection, a robust L∞ convex pose-graph optimisation solution for monocular motion estimation of unmanned aerial vehicles (UAVs) is introduced as well. In the literature, most proposed solutions formulate the pose-graph optimisation as a least-squares problem by minimising a cost function using iterative methods. 
In this work, robust convex optimisation under the L∞ norm is adopted, which efficiently corrects the UAV’s pose after loop-closure detection. To round out the work in this thesis, a system for cooperative monocular visual motion estimation with multiple aerial vehicles is proposed. The cooperative motion estimation employs state-of-the-art approaches for optimisation, individual motion estimation and registration. Three-view geometry algorithms in a convex optimisation framework are deployed on board the monocular vision system for each vehicle. In addition, vehicle-to-vehicle relative pose estimation is performed with a novel robust registration solution in a global optimisation framework. In parallel, and as a complementary solution for the relative pose, a robust non-linear H∞ solution is designed to fuse measurements from the UAVs’ on-board inertial sensors with the visual estimates. The suggested contributions have been exhaustively evaluated over a number of real-image data experiments in the laboratory using monocular vision systems and range imaging devices. In this thesis, we propose several solutions towards the goal of robust visual motion estimation using convex optimisation. We show that the convex optimisation framework may be extended to include uncertainty information, to achieve robust and optimal solutions. We observed that convex optimisation is a practical and very appealing alternative to linear techniques and iterative methods.
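
The abstract above names covariance intersection as the way feature location uncertainties are fused. The following is a minimal sketch of the generic textbook form of that technique, not the thesis's implementation. The function name `covariance_intersection`, the trace-minimising choice of the weight, and the grid search over the weight are all assumptions made for illustration.

```python
import numpy as np


def covariance_intersection(a, A, b, B, steps=101):
    """Fuse two estimates (a, A) and (b, B) with unknown cross-correlation.

    Generic covariance intersection:
        P^-1 = w * A^-1 + (1 - w) * B^-1
        x    = P (w * A^-1 a + (1 - w) * B^-1 b)
    Assumption: w is picked by a simple grid search that minimises trace(P).
    """
    A_inv = np.linalg.inv(A)
    B_inv = np.linalg.inv(B)
    best = None
    for w in np.linspace(0.0, 1.0, steps):
        P = np.linalg.inv(w * A_inv + (1.0 - w) * B_inv)
        cost = np.trace(P)
        if best is None or cost < best[0]:
            x = P @ (w * A_inv @ a + (1.0 - w) * B_inv @ b)
            best = (cost, x, P, w)
    _, x, P, w = best
    return x, P, w


if __name__ == "__main__":
    # Two hypothetical 2D feature locations with complementary uncertainty.
    a, A = np.array([1.0, 0.0]), np.diag([1.0, 4.0])
    b, B = np.array([0.0, 1.0]), np.diag([4.0, 1.0])
    x, P, w = covariance_intersection(a, A, b, B)
    print("fused location:", x)
    print("fused covariance:\n", P)
    print("weight:", w)
```

Unlike a plain Kalman-style update, this fusion never assumes the two inputs are independent, which is why the fused covariance stays conservative.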

    Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation

    Traditional visual-inertial state estimation targets absolute camera poses and spatial landmark locations while first-order kinematics are typically resolved as an implicitly estimated sub-state. However, this poses a risk in velocity-based control scenarios, as the quality of the estimation of kinematics depends on the stability of absolute camera and landmark coordinates estimation. To address this issue, we propose a novel solution to tight visual-inertial fusion directly at the level of first-order kinematics by employing a dynamic vision sensor instead of a normal camera. More specifically, we leverage trifocal tensor geometry to establish an incidence relation that directly depends on events and camera velocity, and demonstrate how velocity estimates in highly dynamic situations can be obtained over short time intervals. Noise and outliers are dealt with using a nested two-layer RANSAC scheme. Additionally, smooth velocity signals are obtained from a tight fusion with pre-integrated inertial signals using a sliding window optimizer. Experiments on both simulated and real data demonstrate that the proposed tight event-inertial fusion leads to continuous and reliable velocity estimation in highly dynamic scenarios independently of absolute coordinates. Furthermore, in extreme cases, it achieves more stable and more accurate estimation of kinematics than traditional, point-position-based visual-inertial odometry. Comment: Accepted by IEEE Transactions on Robotics (T-RO)
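
The abstract above handles noise and outliers with a nested two-layer RANSAC scheme. As a reminder of the basic building block only, here is a minimal single-layer RANSAC sketch on a toy 2D line-fitting problem. It is not the paper's nested scheme or its trifocal model. The function name `ransac_line`, the vertical-residual inlier test, and all parameter values are assumptions chosen for illustration.

```python
import random


def ransac_line(points, iterations=200, threshold=0.1, seed=0):
    """Fit y = m*x + c to 2D points that contain outliers.

    Each iteration draws a minimal sample (two points), builds a candidate
    line, and counts the points whose vertical residual is below `threshold`.
    The candidate with the largest consensus set wins.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line: not representable as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers


if __name__ == "__main__":
    # Hypothetical data: 20 points on y = 2x + 1 plus 5 gross outliers.
    pts = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    pts += [(1.0, 10.0), (3.0, -5.0), (5.0, 30.0), (7.0, 0.0), (9.0, -12.0)]
    model, inliers = ransac_line(pts)
    print("model (m, c):", model)
    print("consensus size:", len(inliers))
```

A nested scheme would run one such loop inside another, for example with the inner loop scoring hypotheses generated by the outer loop. How the two layers are actually split in the paper is not stated in the abstract.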