9 research outputs found

    Quasi-Dense Reconstruction and 3D Models from an Image Sequence

    This paper proposes a quasi-dense reconstruction from a sequence of uncalibrated images, together with an associated system for reconstructing 3D models. The main innovation is that all of the geometry is computed from subsampled quasi-dense correspondences rather than the usual sparse points of interest. This yields a reconstruction that is not only more accurate (in terms of uncertainties) and more robust, thanks to highly redundant correspondences that are well distributed across the images, but also better suited (because denser) to surface reconstruction. Experiments on real sequences show that quasi-dense reconstructions outperform sparse ones in both robustness and uncertainty. Moreover, the surfaces of numerous objects have been obtained from the reconstructed quasi-dense points.
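    The core idea above, growing quasi-dense correspondences from sparse seed matches, can be pictured as a best-first propagation driven by a photometric score. The sketch below is a minimal illustration under assumptions of our own (ZNCC windows, a greedy heap, a fixed threshold), not the authors' implementation:

```python
# A rough sketch of best-first quasi-dense match propagation; all names and
# thresholds here are illustrative assumptions, not the paper's code.
import heapq
import numpy as np

def zncc(img1, img2, p1, p2, w=2):
    """Zero-mean normalized cross-correlation over a (2w+1)x(2w+1) window."""
    a = img1[p1[1]-w:p1[1]+w+1, p1[0]-w:p1[0]+w+1].astype(float)
    b = img2[p2[1]-w:p2[1]+w+1, p2[0]-w:p2[0]+w+1].astype(float)
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else -1.0

def inside(img, p, w):
    return w <= p[0] < img.shape[1] - w and w <= p[1] < img.shape[0] - w

def propagate(img1, img2, seeds, zncc_min=0.8, w=2):
    """Grow a quasi-dense correspondence set from sparse seed matches."""
    matched1, matched2, matches = set(), set(), []
    heap = [(-zncc(img1, img2, p1, p2, w), p1, p2) for p1, p2 in seeds]
    heapq.heapify(heap)
    while heap:
        _, p1, p2 = heapq.heappop(heap)           # best remaining candidate
        if p1 in matched1 or p2 in matched2:
            continue                              # enforce uniqueness per image
        matched1.add(p1); matched2.add(p2); matches.append((p1, p2))
        for dx in (-1, 0, 1):                     # explore the neighbourhood
            for dy in (-1, 0, 1):
                q1, q2 = (p1[0]+dx, p1[1]+dy), (p2[0]+dx, p2[1]+dy)
                if q1 in matched1 or q2 in matched2:
                    continue
                if inside(img1, q1, w) and inside(img2, q2, w):
                    s = zncc(img1, img2, q1, q2, w)
                    if s > zncc_min:
                        heapq.heappush(heap, (-s, q1, q2))
    return matches
```

    Subsampling the resulting correspondences, as the abstract describes, would then feed the geometry computation.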

    Extraction of Unfoliaged Trees from Terrestrial Image Sequences

    This thesis presents a generative statistical approach for the fully automatic three-dimensional (3D) extraction and reconstruction of unfoliaged deciduous trees from wide-baseline image sequences. Tree models improve the realism of 3D Geoinformation systems (GIS) by adding a natural touch. Unfoliaged trees are, however, difficult to reconstruct from images due to partially weak contrast, background clutter, occlusions, and particularly the possibly varying order of branches in images from different viewpoints. The proposed approach combines generative modeling by L-systems and statistical maximum a posteriori (MAP) estimation for the extraction of the 3D branching structure of trees. Background estimation is conducted by means of mathematical (gray-scale) morphology as a basis for generative modeling. A Gaussian likelihood function based on intensity differences is employed to evaluate the hypotheses. A mechanism has been devised to control the sampling sequence of multiple parameters in the Markov chain, taking into account their characteristics and their performance in the previous step. After the extraction of the first level of branches, a tree is classified into one of three typical branching types, and more specific L-system production rules are used accordingly. Generic prior distributions for the parameters are refined based on already extracted branches in a Bayesian framework and integrated into the MAP estimation. By these means, most of the branching structure apart from tiny twigs can be reconstructed. Results are presented in the form of VRML (Virtual Reality Modeling Language) models, demonstrating the potential of the approach as well as its current shortcomings.
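    As a small illustration of the generative side described above: an L-system rewrites an axiom with production rules to produce branching structures, and in the thesis candidate structures generated this way are scored by a Gaussian likelihood within a MAP framework. The toy rule below is an assumption for illustration, not one of the thesis' refined production rules:

```python
# A minimal L-system expansion; the axiom and rule are illustrative only.
def expand(axiom, rules, depth):
    """Rewrite the axiom `depth` times using the production rules."""
    s = axiom
    for _ in range(depth):
        s = "".join(rules.get(c, c) for c in s)
    return s

# F = grow a branch segment, [ and ] = push/pop turtle state, + and - = rotate
rules = {"F": "F[+F][-F]"}
print(expand("F", rules, 2))  # -> F[+F][-F][+F[+F][-F]][-F[+F][-F]]
```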

    PERCEPTION FOR SURVEILLANCE: LEARNING SELF-LOCALISATION AND INTRUDERS DETECTION FROM MONOCULAR IMAGES OF AN AERIAL ROBOT IN OUTDOOR URBAN ENVIRONMENTS

    Unmanned aerial vehicles (UAVs), more commonly called drones, are among the most versatile robotic platforms thanks to their high mobility and low-cost design, and they have been applied to numerous civil applications. These robots can generally complete autonomous or semi-autonomous missions by performing complex calculations on their autopilot system, based on the sensors' observations, to control their attitude and speed and to plan and track a trajectory for navigating in a possibly unknown environment without human intervention. However, to enable higher degrees of autonomy, the perception system is paramount for extracting the knowledge that allows interaction with the external world. This thesis therefore aims to solve the core perception challenges of an autonomous surveillance application carried out by an aerial robot in an outdoor urban environment. We address a simplified use case of patrolling missions to monitor a confined area around buildings that is assumed to be under access restriction, and we identify the main research questions involved in this application context. On the one hand, the drone has to locate itself in a controlled navigation environment, keep track of its pose while flying, and understand the geometrical structure of the 3D scene around it. On the other hand, the surveillance mission entails detecting and localising people in the monitored area. Consequently, we develop several methodologies to address these challenging questions. Constraining the UAV's sensor array to a monocular RGB camera, we approach these problems with computer vision algorithms. First, we train a neural network with an unsupervised learning paradigm to predict the drone's ego-motion and the geometrical scene structure, and we introduce a novel algorithm that integrates a model-free epipolar method to correct online the rotational drift of the trajectory estimated by the trained pose network. Second, we employ an efficient Convolutional Neural Network (CNN) architecture to regress the UAV's global metric pose directly from a single colour image, and we investigate how dynamic objects in the camera's field of view affect the localisation performance of such an approach. We then discuss the implementation of an object detection network and derive the equations to find the 3D position of the detected people in a reconstructed environment. Next, we describe the theory behind structure-from-motion and use it to recreate a 3D model of a dataset recorded with a drone at the University of Luxembourg's Belval campus. Finally, we perform multiple experiments to validate our proposed algorithms and evaluate them against other state-of-the-art methodologies. The results show the superiority of our methods across different metrics, and our analysis determines the limitations and highlights the benefits of the adopted strategies compared to other approaches. In addition, the introduced dataset provides a further tool for benchmarking perception algorithms and future application developments.
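    For the people localisation step, one standard derivation (sketched here under our own assumptions of a calibrated camera with known pose and a flat ground plane; this is not necessarily the exact formulation used in the thesis) intersects the viewing ray through a detection's foot point with the ground:

```python
# Back-project a detection's foot pixel onto a ground plane n.X + d = 0.
# K (intrinsics) and R, t (world-to-camera pose) are assumed known.
import numpy as np

def pixel_to_ground(u, v, K, R, t, n, d):
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                           # ray in world frame
    origin = -R.T @ t                                   # camera centre in world
    s = -(n @ origin + d) / (n @ ray_world)             # ray-plane intersection
    return origin + s * ray_world                       # 3D point on the ground

# Example: identity pose, the plane z = 10 lying ten metres ahead of the camera.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
n, d = np.array([0.0, 0.0, 1.0]), -10.0
print(pixel_to_ground(400, 300, K, R, t, n, d))  # -> [ 1.    0.75 10.  ]
```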

    Line Primitives and Their Applications in Geometric Computer Vision

    Line primitives are widely found in structured scenes, where they provide higher-level structural information about the scene than point primitives. Furthermore, line primitives in space are closely related to Euclidean transformations, because the dual vector (also known as Pluecker coordinates) representation of 3D lines is the counterpart of the dual quaternion, which depicts a Euclidean transformation. These geometric properties of line primitives motivate the work in this thesis, with the following contributions. Firstly, by combining local appearances of lines and geometric constraints between line pairs in images, a line segment matching algorithm is developed which constructs a novel line band descriptor to depict the local appearance of a line and builds a relational graph to measure the pairwise consistency between line correspondences. Experiments show that the matching algorithm is robust to various image transformations and more efficient than conventional graph-based line matching algorithms. Secondly, by investigating the symmetry of line directions in space, this thesis presents a complete analysis of the solutions of the Perspective-3-Line (P3L) problem, which estimates the camera pose from three reference lines in space and their 2D projections. For three spatial lines in general configuration, a P3L polynomial is derived and employed to develop a solution to the Perspective-n-Line (PnL) problem. The proposed robust PnL algorithm can efficiently and accurately estimate the camera pose for both small and large numbers of line correspondences. For three spatial lines in special configurations (e.g., in a Manhattan world, which consists of three mutually orthogonal dominant directions), the solution of the P3L problem is employed to solve the vanishing point estimation and line classification problem. The proposed vanishing point estimation algorithm achieves high accuracy and efficiency by thoroughly exploiting the Manhattan world characteristic. Another advantage of the proposed framework is that it can easily be generalized to images taken by central catadioptric cameras or uncalibrated cameras. The third major contribution of this thesis concerns structure-from-motion using line primitives. To circumvent the Pluecker constraint on the Pluecker coordinates of lines, a Cayley representation of lines is developed, inspired by the geometric properties of the Pluecker coordinates. To build the line observation model, two derivations of line projection functions are presented: one based on the dual relationship between points and lines, and the other based on the relationship between Pluecker coordinates and the Pluecker matrix. The motion and structure parameters are then initialized by an incremental approach and optimized by sparse bundle adjustment. Quantitative validations show an increase in performance when compared to conventional line reconstruction algorithms.
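    For readers unfamiliar with the representation discussed above, a short numeric sketch: the Pluecker coordinates of a 3D line through points P and Q are the pair (direction, moment), and they always satisfy the quadratic Pluecker constraint that the thesis' Cayley representation is designed to circumvent. The example values are arbitrary:

```python
import numpy as np

def pluecker_from_points(P, Q):
    """Pluecker coordinates (d, m) of the line through points P and Q."""
    d = Q - P             # direction of the line
    m = np.cross(P, Q)    # moment of the line about the origin
    return d, m

P, Q = np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])
d, m = pluecker_from_points(P, Q)
assert abs(d @ m) < 1e-12  # Pluecker constraint: d . m = 0 always holds
```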

    LEARNING VISUAL FEATURES FOR GRASP SELECTION AND CONTROL

    J. J. Gibson suggested that objects in our environment can be represented by an agent in terms of the types of actions that the agent may perform on or with that object. This affordance representation allows the agent to make the connection between the perception of key properties of an object and these actions. In this dissertation, I explore the automatic construction of visual representations that are associated with components of objects that afford certain types of grasping actions. I propose that the type of grasp used on a class of objects should form the basis of these visual representations; the visual categories are driven by grasp types. A grasp type is defined as a cluster of grasp samples in the 6D hand position and orientation space relative to the object. Specifically, for each grasp type, a set of view-dependent visual operators can be learned that match the appearance of the part of the object that is to be grasped. By focusing on object parts, as opposed to entire objects, the resulting visual operators can generalize across different object types that exhibit some similarities in 3D shape. In this dissertation, the training/testing data set is composed of a large set of example grasps made by a human teacher and includes a set of fifty unique objects. Each grasp example consists of a stereo image pair of the object alone, a stereo image pair of the object being grasped, and information about the 3D pose of the hand relative to the object. The grasp regions in a training/testing image that correspond to locations at which certain grasp types could be applied to the object are automatically estimated. First, I show that classes of objects can be formed on the basis of how the individual objects are grasped. Second, I show that visual models based on Pair of Adjacent Segments (PAS) features can capture view-dependent similarities in object part appearance for different objects of the same class. Third, I show that these visual operators can suggest grasp types and hand locations and orientations for novel objects in novel scenarios. Given a novel image of a novel object, the proposed algorithm matches the learned shape models to this image; a match of a shape model in a novel image is interpreted as indicating that the corresponding part of the image affords a particular grasp action. Experimental results show that the proposed algorithm is capable of identifying the occurrence of learned grasp options in images containing novel objects.
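    A minimal sketch of the grasp-type notion defined above, clustering grasp samples in a 6D pose space; the synthetic data, the plain Euclidean treatment of the orientation part, and the use of k-means are simplifying assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# synthetic grasp samples: 3D hand position + axis-angle orientation
top_grasps = rng.normal([0.10, 0.00, 0.20, 0.0, 0.0, 1.5], 0.02, (40, 6))
side_grasps = rng.normal([0.00, 0.20, 0.10, 1.5, 0.0, 0.0], 0.02, (40, 6))
samples = np.vstack([top_grasps, side_grasps])

# each resulting cluster plays the role of one "grasp type"
grasp_types = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(samples)
```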

    Visual localization in challenging environments

    Visual localization, the method of self-localization based on camera images, has established itself as an additional, GNSS-free technology that is being investigated in increasingly realistic and challenging applications. Particularly demanding is the self-localization of first responders in unstructured and unknown environments, where visual localization can substantially increase the situational awareness and safety of first responders. Challenges arise from operation under adverse conditions, on computationally restricted platforms, and in the presence of dynamic objects. Current solutions are quickly pushed to their limits, and the development of more robust approaches is in high demand. This thesis investigates the application of visual localization in dynamic, adverse environments to identify challenges and, accordingly, to increase robustness, using a dedicated visual-inertial navigation system as an example. The methodological contributions of this work relate to the introduction of semantic understanding, improvements in error propagation, and the development of a digital twin. The geometric visual odometry component is extended to a hybrid approach that includes a deep neural network for semantic segmentation to ignore distracting image areas of certain object classes. A Sensor-AI approach complements this method by directly training the network to segment image areas that are critical for the considered visual odometry system. Another improvement results from analyses and modifications of the existing error propagation in visual odometry. Furthermore, a digital twin is presented that closely replicates geometric and radiometric properties of the real sensor system in simulation in order to multiply the experimental possibilities. The experiments are based on datasets from inspections that are used to motivate three first responder scenarios, namely indoor rescue, flood disaster and wildfire. The datasets were recorded in corridor, mall, coast, river and fumarole environments and aim to analyze the influence of the dynamic elements person, water and smoke. Each investigation starts with extensive in-depth analyses in simulation based on synthetic video clones of the respective dynamic environments. Specifically, a combined sensitivity analysis makes it possible to jointly consider environment, system design, sensor property and calibration error parameters in order to account for adverse conditions. All investigations are verified with experiments based on the real system. The results show the susceptibility of geometric approaches to dynamic objects in challenging scenarios. The introduction of the segmentation aid within the hybrid system contributes substantially to robustness by preventing significant failures, but understandably it cannot compensate for a lack of visible static background. As a consequence, future visual localization systems require both the ability of semantic understanding and its integration into a complementary multi-sensor system.
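    The hybrid idea above, keeping the geometric front end away from dynamic image areas, can be sketched as masked feature detection; here the binary mask of dynamic pixels is assumed to come from the segmentation network, and the OpenCV-based front end is our illustrative stand-in for the actual system:

```python
import cv2
import numpy as np

def detect_static_features(gray, dynamic_mask, max_corners=500):
    """Detect corners only where the segmentation marks static background."""
    static = np.where(dynamic_mask > 0, 0, 255).astype(np.uint8)
    return cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=7,
                                   mask=static)
```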

    Robust and Accurate Detection of Mid-Level Primitives for 3D Reconstruction in Man-Made Environments

    The detection of geometric primitives such as points, lines and arcs is a fundamental step in computer vision techniques like image analysis, pattern recognition and 3D scene reconstruction. In this thesis, we present a framework that enables reliable detection of geometric primitives in images. The focus is on application in man-made environments, although the method is not limited to them. The method provides robust and subpixel-accurate detection of points, lines and arcs, and builds a graph describing the topological relationships between the detected features. The detection method works directly on distorted perspective and fisheye images. The additional recognition of repetitive structures in images ensures the unambiguity of the features in their local environment. We show that our approach achieves a localization accuracy comparable to state-of-the-art methods while being more robust against disturbances caused by noise. In addition, our approach allows extracting finer details in the images. The detection accuracy achieved on real-world scenes is consistently above that achieved by the other methods. Furthermore, our method can reliably distinguish between line and arc segments. The additional topological information extracted by our method is largely consistent over several images of a scene and can therefore support subsequent processing steps such as matching and correspondence search. We show how the detection method can be integrated into a complete feature-based 3D reconstruction pipeline and present a novel reconstruction method that uses the topological relationships of the features to create a highly abstract but semantically rich 3D model of the reconstructed scenes, in which certain geometric structures can easily be detected.
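    The topological graph described above can be pictured as primitives for nodes and shared junctions for edges; the tiny example below (with networkx and made-up coordinates) only illustrates the data structure, not the detection method itself:

```python
import networkx as nx

G = nx.Graph()
G.add_node("L1", kind="line", endpoints=((10.2, 5.1), (80.7, 5.3)))
G.add_node("A1", kind="arc", endpoints=((80.7, 5.3), (95.0, 30.4)))
G.add_node("P1", kind="point", position=(80.7, 5.3))
G.add_edge("L1", "P1")  # the line ends at junction P1
G.add_edge("A1", "P1")  # the arc starts at the same junction
print(list(G.neighbors("P1")))  # -> ['L1', 'A1']
```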

    Sparse Representations and Feature Learning for Image Set Classification and Correspondence Estimation

    The use of effective features is a key component in solving many computer vision tasks, including, but not limited to, image (set) classification and correspondence estimation. Many research directions have focused on finding good features for the task under consideration, traditionally by hand-crafting and more recently by machine learning. In our work, we present algorithms for feature extraction and sparse representation for the classification of image sets, together with an approach for deep metric learning for correspondence estimation. We start by benchmarking various image set classification methods on a mobile video dataset that we have collected and made public. The videos were acquired under three different ambient conditions to capture the type of variations caused by the 'mobility' of the devices. An inspection of these videos reveals a combination of favorable and challenging properties unique to smartphone face videos. Besides mobility, the dataset has other challenges, including partial faces, occasional pose changes, blur, and fiducial point localization errors. Based on the evaluation, the recognition rates drop dramatically when enrollment and test videos come from different sessions. We then present Bayesian Representation-based Classification (BRC), an approach based on sparse Bayesian regression and subspace clustering for image set classification. A Bayesian statistical framework is used to compare BRC with similar existing approaches such as Collaborative Representation-based Classification (CRC) and Sparse Representation-based Classification (SRC), where it is shown that BRC employs precision hyperpriors that are more non-informative than those of CRC/SRC. Furthermore, we present a robust probe image set handling strategy that balances the trade-off between efficiency and accuracy. Experiments on three datasets illustrate the effectiveness of our algorithm compared to state-of-the-art set-based methods. We then propose to represent image sets as dictionaries of hand-crafted descriptors based on Symmetric Positive Definite (SPD) matrices, which are more robust to local deformations and fiducial point location errors. We learn a tangent map for transforming the SPD matrix logarithms into a lower-dimensional Log-Euclidean space such that the transformed gallery atoms adhere to a more discriminative subspace structure. A query image set is then classified by first mapping its SPD descriptors into the computed Log-Euclidean tangent space and then using the sparse representation over the tangent space to decide a label for the image set. Experiments on four public datasets show that representation-based classification based on the proposed features outperforms many state-of-the-art methods. We then present Nonlinear Subspace Feature Enhancement (NSFE), an approach for nonlinearly embedding image sets into a space where they adhere to a more discriminative subspace structure. We describe how the structured loss function of NSFE can be optimized batch by batch with a two-step alternating algorithm. The algorithm makes very few assumptions about the form of the embedding to be learned and is compatible with stochastic gradient descent and back-propagation. We evaluate NSFE with different types of input features and nonlinear embeddings and show that NSFE compares favorably to state-of-the-art image set classification methods.
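    For context on the SRC baseline that BRC is compared against above, a compact sketch: sparse-code a query over the stacked gallery atoms and assign the class with the smallest class-wise reconstruction residual. The lasso solver and penalty below are our own assumptions; SRC is usually posed as an l1-minimization:

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(D, labels, y, alpha=0.01):
    """D: (d, n) gallery atoms as columns; labels: (n,); y: (d,) query."""
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D, y).coef_
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)  # smallest residual wins
```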
    Finally, we propose a hierarchical approach to deep metric learning and descriptor matching for the task of point correspondence estimation. Our idea is motivated by the observation that existing metric learning approaches, which supervise and match with only the deepest layer, produce features that are in some respects inferior to shallower features. Instead, the best matching performance, as we show empirically, is obtained by combining the high invariance of deeper features with the geometric sensitivity and higher precision of shallower features. We compare our method to state-of-the-art networks as well as fusion baselines inspired by existing semantic segmentation networks, and we show empirically that our method is more accurate and better suited to correspondence estimation.
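    The fusion idea above, pairing the invariance of deep features with the precision of shallow ones, can be sketched by concatenating L2-normalised descriptors taken from two depths of a backbone; the ResNet-18 backbone and layer choices below are illustrative assumptions, not the network proposed in the thesis:

```python
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18(weights=None)
children = list(backbone.children())
shallow = torch.nn.Sequential(*children[:5])  # conv1 .. layer1 (precise)
deep = torch.nn.Sequential(*children[:7])     # conv1 .. layer3 (invariant)

def hierarchical_descriptor(patch):
    """patch: (B, 3, H, W) image patches -> (B, 64 + 256) fused descriptors."""
    s = F.adaptive_avg_pool2d(shallow(patch), 1).flatten(1)
    d = F.adaptive_avg_pool2d(deep(patch), 1).flatten(1)
    return torch.cat([F.normalize(s, dim=1), F.normalize(d, dim=1)], dim=1)
```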