14 research outputs found

    A Unified Hybrid Formulation for Visual SLAM

    Visual Simultaneous Localization and Mapping (Visual SLAM, or VSLAM) is the process of estimating the six-degrees-of-freedom ego-motion of a camera from its video feed while simultaneously constructing a 3D model of the observed environment. Extensive research in the field over the past two decades has yielded real-time and efficient algorithms for VSLAM, enabling applications in augmented reality, cultural heritage, robotics and the automotive industry, to name a few. The underlying formula behind VSLAM is a mixture of image processing, geometry, graph theory, optimization and machine learning; the theoretical and practical development of these building blocks has led to a wide variety of algorithms, each leveraging different assumptions to achieve superiority under the presumed conditions of operation. An exhaustive survey on the topic outlined seven main components in a generic VSLAM pipeline, namely: the matching paradigm, visual initialization, data association, pose estimation, topological/metric map generation, optimization, and global localization. Before VSLAM can be declared a solved problem, numerous challenges pertaining to robustness in each of these components have to be addressed, namely: resilience to a wide variety of scenes (poorly textured or self-repeating scenarios), resilience to dynamic changes (moving objects), and scalability for long-term operation (awareness and management of computational resources). Furthermore, current state-of-the-art VSLAM pipelines are tailored towards static, basic point cloud reconstructions, an impediment to perception applications such as path planning, obstacle avoidance and object tracking. To address these limitations, this work proposes a hybrid scene representation, where different sources of information extracted solely from the video feed are fused in a hybrid VSLAM system. 
The proposed pipeline allows for seamless integration of data from pixel-based intensity measurements and geometric entities to produce and make use of a coherent scene representation. The goal is threefold: 1) increase camera tracking accuracy under challenging motions, 2) improve robustness to challenging poorly textured environments and varying illumination conditions, and 3) ensure scalability and long-term operation by efficiently maintaining a global reusable map representation.

    Cartographie dense basée sur une représentation compacte RGB-D dédiée à la navigation autonome

    Our aim is to build ego-centric topometric maps represented as a graph of keyframe nodes which can be efficiently used by autonomous agents. Each keyframe node, which combines a spherical image and a depth map (an augmented visual sphere), synthesises information collected in a local area of space by an embedded acquisition system. The representation of the global environment consists of a collection of augmented visual spheres that provide the necessary coverage of an operational area. A "pose" graph that links these spheres together in six degrees of freedom also defines the domain potentially exploitable for navigation tasks in real time. As part of this research, an approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both the photometric and geometric information available from our augmented spherical database; how to determine the quantity and optimal placement of these augmented spheres to cover an environment completely; how to model sensor uncertainties and update the dense information of the augmented spheres; how to compactly represent the information contained in the augmented sphere to ensure robustness, accuracy and stability along an explored trajectory by making use of saliency maps. In this work, we propose an efficient representation of the environment suited to the problem of autonomous navigation. This topometric representation consists of a graph of visual spheres augmented with depth information. Locally, the augmented visual sphere constitutes a complete ego-centric representation of the nearby environment. The graph of spheres makes it possible to cover and represent a large-scale environment. The six-degrees-of-freedom "poses" computed between spheres are readily exploitable by real-time navigation tasks. 
In this thesis, the following issues have been considered: how to integrate geometric and photometric information in a robust visual odometry approach; how to determine the number and placement of the augmented spheres to represent an environment completely; how to model uncertainties in order to fuse observations with the aim of increasing the accuracy of the representation; how to use saliency maps to increase the accuracy and stability of the visual odometry process.

    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities including video surveillance systems, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware moving object detection and Tracking (SPCT) system. The proposed visual tracker is composed of one master tracker that usually relies on visual object features and two auxiliary trackers based on object temporal motion information that are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise and to improve target localization accuracy and robustness. We chose a pre-selected set of seven-channel complementary features, including RGB color, intensity and a spatial pyramid of HoG, to encode object color, shape and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architectures and are applied to fast spatio-temporal median computations and 3D face reconstruction texturing. We propose a multi-component framework based on semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. The experiments on the extensive VOTC2016 benchmark dataset and aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery. Comment: PhD Dissertation (162 pages)
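The integral histogram mentioned in this abstract can be illustrated with a minimal sketch. It shows only the plain (unweighted) integral histogram, not the spatially weighted extension or the GPU variants the dissertation describes. The function names and the assumption that pixels are already quantized into integer bin indices are illustrative choices, not taken from the source.

```python
import numpy as np

def integral_histogram(bin_image, num_bins):
    """Build a cumulative histogram table.

    integral[r, c, b] counts the pixels with bin index b inside
    bin_image[:r, :c]. bin_image holds integer bin indices.
    """
    rows, cols = bin_image.shape
    # One-hot encode every pixel along a third "bin" axis.
    one_hot = np.zeros((rows, cols, num_bins), dtype=np.int64)
    one_hot[np.arange(rows)[:, None], np.arange(cols)[None, :], bin_image] = 1
    # Pad with a leading row and column of zeros so lookups need no branches.
    integral = np.zeros((rows + 1, cols + 1, num_bins), dtype=np.int64)
    integral[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return integral

def region_histogram(integral, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1.

    Uses four table lookups, so the cost does not depend on region size.
    """
    return (integral[bottom, right] - integral[top, right]
            - integral[bottom, left] + integral[top, left])
```

Once the table is built in a single pass, the histogram of any axis-aligned window costs four lookups per bin, which is what makes exhaustive window search affordable in a tracker.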

    Semi-dense filter-based visual odometry for automotive augmented reality applications

    In order to integrate virtual objects convincingly into a real scene, Augmented Reality (AR) systems typically need to solve two problems: Firstly, the movement and position of the AR system within the environment needs to be known to be able to compensate the motion of the AR system in order to make placement of the virtual objects stable relative to the real world and to provide overall correct placement of virtual objects. Secondly, an AR system needs to have a notion of the geometry of the real environment to be able to properly integrate virtual objects into the real scene via techniques such as the determination of the occlusion relation between real and virtual objects or context-aware positioning of virtual content. To solve the second problem, the following two approaches have emerged: A simple solution is to create a map of the real scene a priori by whatever means and to then use this map in real-time operation of the AR system. A more challenging, but also more flexible solution is to create a map of the environment dynamically from real time data of sensors of the AR-system. Our target applications are Augmented Reality in-car infotainment systems in which a video of a forward facing camera is augmented. Using map data to determine the geometry of the environment of the vehicle is limited by the fact that currently available digital maps only provide a rather coarse and abstract picture of the world. Furthermore, map coverage and amount of detail vary greatly regionally and between different maps. Hence, the objective of the presented thesis is to obtain the geometry of the environment in real time from vehicle sensors. More specifically, the aim is to obtain the scene geometry by triangulating it from the camera images at different camera positions (i.e. stereo computation) while the vehicle moves. 
The problem of estimating geometry from camera images where the camera positions are not (exactly) known is investigated in the (overlapping) fields of visual odometry (VO) and structure from motion (SfM). Since Augmented Reality applications have tight latency requirements, it is necessary to obtain an estimate of the current scene geometry for each frame of the video stream without delay. Furthermore, Augmented Reality applications need detailed information about the scene geometry, which means dense (or semi-dense) depth estimation, that is one depth estimate per pixel. The capability of low-latency geometry estimation is currently only found in filter based VO methods, which model the depth estimates of the pixels as the state vector of a probabilistic filter (e.g. Kalman filter). However, such filters maintain a covariance matrix for the uncertainty of the pixel depth estimates whose complexity is quadratic in the number of estimated pixel depths, which causes infeasible complexity for dense depth estimation. To resolve this conflict, the (full) covariance matrix will be replaced by a matrix requiring only linear complexity in processing and storage. This way, filter-based VO methods can be combined with dense estimation techniques and efficiently scaled up to arbitrarily large image sizes while allowing easy parallelization. For treating the covariance matrix of the filter state, two methods are introduced and discussed. These methods are implemented as modifications to the (existing) VO method LSD-SLAM, yielding the "continuous" variant C-LSD-SLAM. In the first method, a diagonal matrix is used as the covariance matrix. In particular, the correlation between different scene point estimates is neglected. For stabilizing the resulting VO method in forward motion, a reweighting scheme is introduced based on how far scene point estimates are moved when reprojecting them from one frame to the next frame. 
This way, erroneous scene point estimates are prevented from causing the VO method to diverge. The second method for treating the covariance matrix models the correlation of the scene point estimates caused by camera pose uncertainty by approximating the combined influence of all camera pose estimates in a small subspace of the scene point estimates. This subspace has fixed dimension 15, which forces the complexity of the replacement of the covariance matrix to be linear in the number of scene point estimates.
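The effect of the first method (a diagonal covariance matrix) can be sketched as follows. When correlations between scene points are neglected, the Kalman measurement update decouples into one scalar update per pixel, so storage and computation grow linearly with the number of depth estimates. This is a generic textbook update under that assumption, not the C-LSD-SLAM implementation; the function and parameter names are hypothetical.

```python
import numpy as np

def diagonal_kalman_update(depth_mean, depth_var, measured_depth, measured_var):
    """Per-pixel scalar Kalman update with a diagonal covariance.

    All arguments are arrays of the same shape (one entry per pixel).
    Only the variances are stored, i.e. n numbers instead of an
    n-by-n covariance matrix.
    """
    gain = depth_var / (depth_var + measured_var)
    new_mean = depth_mean + gain * (measured_depth - depth_mean)
    new_var = (1.0 - gain) * depth_var
    return new_mean, new_var
```

Because every pixel is updated independently, the operation is a handful of element-wise array expressions and parallelizes trivially, which matches the scalability argument made in the abstract.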

    Short-Term Visual Object Tracking in Real-Time

    In the thesis, we propose two novel short-term object tracking methods, the Flock of Trackers (FoT) and the Scale-Adaptive Mean-Shift (ASMS), a framework for fusion of multiple trackers and a detector, and contributions to the problem of tracker evaluation within the Visual Object Tracking (VOT) initiative. The Flock of Trackers partitions the object of interest into equally sized parts. For each part, the FoT computes an optical flow correspondence and estimates its reliability. Reliable correspondences are used to robustly estimate a target pose using the RANSAC technique, which allows for a range of complex rigid transformations (e.g. affine transformation) of a target. The scale-adaptive mean-shift tracker is a gradient optimization method that iteratively moves a search window to the position which minimizes the distance between an appearance model extracted from the search window and the target model. The ASMS proposes a theoretically justified modification of the mean-shift framework that addresses one of the drawbacks of mean-shift trackers, namely the fixed size of the search window, i.e. the target scale. Moreover, the ASMS introduces a technique that incorporates background information into the gradient optimization to reduce tracker failures in the presence of background clutter. To take advantage of the strengths of the previous methods, we introduce a novel tracking framework, HMMTxD, that fuses multiple tracking methods together with a proposed feature-based online detector. The framework utilizes a hidden Markov model (HMM) to learn online how well each tracking method performs, using sparsely "annotated" data provided by a detector, which are assumed to be correct, and the confidence provided by the trackers. The HMM estimates the probability that a tracker is correct in the current frame given the previously learned HMM model and the current tracker confidence. 
This tracker fusion alleviates the drawbacks of the individual tracking methods, since the HMMTxD learns which trackers are performing well and switches off the rest. All of the proposed trackers were extensively evaluated on several benchmarks and publicly available tracking sequences and achieve excellent results in various evaluation criteria. The FoT achieved state-of-the-art performance in the VOT2013 benchmark, finishing second. Today, the FoT is used as a building block in complex applications such as multi-object tracking frameworks. The ASMS achieved state-of-the-art results in the VOT2015 benchmark and was chosen as the best performing method in terms of a trade-off between performance and running time. The HMMTxD demonstrated state-of-the-art performance in multiple benchmarks (VOT2014, VOT2015 and OTB). The thesis also contributes to, and provides an overview of, the Visual Object Tracking (VOT) evaluation methodology. This methodology provides a means for unbiased comparison of different tracking methods across publications, which is crucial for advancement of the state of the art over a longer timespan, and also provides tools for deeper performance analysis of tracking methods. Furthermore, annual workshops are organized at major computer vision conferences, where authors are encouraged to submit their novel methods to compete against each other and where advances in visual object tracking are discussed. Katedra kybernetik
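The iterative window movement at the core of a mean-shift tracker can be sketched in a few lines. This is the basic fixed-bandwidth procedure with a flat kernel, so it shows exactly the limitation (fixed window size) that ASMS is said to address, and it includes neither the scale estimation nor the background weighting. The per-pixel weights are assumed to be positive and to come from comparing the target and candidate appearance models; all names are hypothetical.

```python
import numpy as np

def mean_shift(points, weights, start, bandwidth, max_iterations=20):
    """Move a window centre to a local weighted mode.

    points: (n, 2) array of pixel coordinates.
    weights: (n,) array of positive per-pixel similarity weights.
    start: initial window centre; bandwidth: fixed window radius.
    """
    center = np.asarray(start, dtype=float)
    for _ in range(max_iterations):
        distances = np.linalg.norm(points - center, axis=1)
        inside = distances <= bandwidth
        if not inside.any():
            break  # empty window: nothing to shift towards
        # Flat kernel: the new centre is the weighted mean of the window.
        new_center = np.average(points[inside], axis=0, weights=weights[inside])
        if np.allclose(new_center, center):
            break  # converged
        center = new_center
    return center
```

In a tracker, the returned centre becomes the target position for the current frame and the starting point for the next one.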

    The Fifth NASA/DOD Controls-Structures Interaction Technology Conference, part 2

    This publication is a compilation of the papers presented at the Fifth NASA/DoD Controls-Structures Interaction (CSI) Technology Conference held in Lake Tahoe, Nevada, March 3-5, 1992. The conference, which was jointly sponsored by the NASA Office of Aeronautics and Space Technology and the Department of Defense, was organized by the NASA Langley Research Center. The purpose of this conference was to report to industry, academia, and government agencies on the current status of controls-structures interaction technology. The agenda covered ground testing, integrated design, analysis, flight experiments and concepts.

    Neural Radiance Fields: Past, Present, and Future

    Aspects such as modeling and interpreting 3D environments and surroundings have driven research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the possible scope of high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D models has gained traction among researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields by building on the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications. Comment: 413 pages, 9 figures, 277 citations

    Loop closure for topological mapping and navigation with omnidirectional images

    In mobile robotics, significant progress has been made in mapping and localization over the last three decades. Most research projects address the metric SLAM problem. The resulting techniques are sensitive to drift-related errors, which restricts their use to small-scale environments. In large-scale environments, topological maps, which are independent of metric information, stand as an alternative to metric approaches. This thesis mainly addresses the problem of building topological maps for mobile robot navigation in large-scale urban environments using omnidirectional cameras. The main contribution of this thesis is an efficient and accurate solution to the loop closure problem, which lies at the heart of any topological mapping algorithm. The proposed sparse / hierarchical topological mapping framework combines an Image Sequence Partitioning (ISP) approach, which groups visually similar images into a node, with a loop closure detection approach that connects these nodes. The resulting topological graph represents the robot's environment. The hierarchical loop closure algorithm developed first retrieves the similar nodes and then the most similar image. This hierarchical loop closure detection is made efficient by storing the content of the sparse maps in an indexing data structure called the Hierarchical Inverted File (HIF). We propose combining the TFIDF weighting score with spatial constraints and the frequency of the detected landmarks to improve the robustness of loop closure. 
The density and accuracy of the resulting maps, as well as efficiency, are evaluated and compared with results obtained by state-of-the-art approaches on omnidirectional image sequences acquired outdoors. In terms of loop detection accuracy, results similar to those of other approaches were observed, but without a verification step using epipolar geometry. Although efficient, the HIF-based approach has drawbacks such as the low density of the maps and the low loop detection rate. A second loop closure technique was therefore developed to address these shortcomings. The low map density is caused by over-partitioning of the image sequence. This is resolved by using Vectors of Locally Aggregated Descriptors (VLAD) during the ISP step. A similarity measure based on a spatial constraint specific to the structure of omnidirectional images was also developed. More accurate results are obtained, even with few matches. Success rates are better than with FABMAP 2.0, currently the most widely used method, without an additional geometric verification step. The environment is often assumed to be invariant over time: the map of the environment is built during a learning phase and is not modified afterwards. Long-term memory management is needed to account for changes in the environment over time. The second contribution of this thesis is the formulation of a long-term visual memory management approach that can be used with both topological and metric visual maps. The first results obtained are encouraging. (...) Over the last three decades, research in mobile robotic mapping and localization has seen significant progress. 
However, most research casts these problems in the SLAM framework, trying to map and localize metrically. As metrical mapping techniques are vulnerable to errors caused by drift, their ability to produce consistent maps is limited to small-scale environments. Consequently, topological mapping approaches, which are independent of metrical information, stand as an alternative to metrical approaches in large-scale environments. This thesis mainly deals with the loop closure problem, which is the crux of any topological mapping algorithm. Our main aim is to solve the loop closure problem efficiently and accurately using an omnidirectional imaging sensor. Sparse topological maps can be built by representing groups of visually similar images of a sequence as nodes of a topological graph. We propose a sparse / hierarchical topological mapping framework which uses Image Sequence Partitioning (ISP) to group visually similar images of a sequence as nodes, which are then connected on occurrence of loop closures to form a topological graph. A hierarchical loop closure algorithm is used that first retrieves the similar nodes and then performs an image similarity analysis on the retrieved nodes. An indexing data structure called the Hierarchical Inverted File (HIF) is proposed to store the sparse maps and facilitate efficient hierarchical loop closure. TFIDF weighting is combined with spatial and frequency constraints on the detected features for improved loop closure robustness. The sparsity, efficiency and accuracy of the resulting maps are evaluated and compared to those of two other existing techniques on publicly available outdoor omnidirectional image sequences. Modest loop closure recall rates have been observed without using the epipolar geometry verification step common in other approaches. Although efficient, the HIF-based approach has certain disadvantages, such as low sparsity of maps and a low recall rate of loop closure. 
To address these shortcomings, another loop closure technique using a spatial-constraint-based similarity measure on omnidirectional images has been proposed. The low sparsity of maps caused by over-partitioning of the input sequence has been overcome by using the Vector of Locally Aggregated Descriptors (VLAD) for ISP. The poor resolution of the omnidirectional images causes fewer feature matches in image pairs, resulting in reduced recall rates. A spatial constraint exploiting the omnidirectional image structure is used for feature matching, which gives accurate results even with fewer feature matches. Recall rates better than those of the contemporary FABMAP 2.0 approach have been observed without the additional geometric verification. The second contribution of this thesis is the formulation of a visual memory management approach suitable for long-term operability of mobile robots. The formulated approach is suitable for both topological and metrical visual maps. Initial results which demonstrate the capabilities of this approach have been provided. Finally, a detailed description of the acquisition and construction of our multi-sensor dataset is provided. The aim of this dataset is to serve researchers working in the mobile robotics and vision communities for evaluating applications like visual SLAM, mapping and visual odometry. This is the first dataset with omnidirectional images acquired on a car-like vehicle driven along a trajectory with multiple loops. The dataset consists of 6 sequences with data from 11 sensors including 7 cameras, stretching 18 kilometers in a semi-urban environmental setting with complete and precise ground-truth.
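The retrieval step behind inverted-file loop closure can be sketched with a flat inverted file and TF-IDF scoring. This is a simplification under stated assumptions: it has a single level rather than the two-level hierarchy of the HIF, it omits the spatial and frequency constraints the thesis adds, and it assumes each image feature has already been quantized to an integer visual word. The function names and data layout are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_inverted_file(node_words):
    """Map each visual word to {node_id: term frequency in that node}.

    node_words: {node_id: list of visual word ids seen in the node}.
    """
    inverted = defaultdict(dict)
    for node_id, words in node_words.items():
        for word, count in Counter(words).items():
            inverted[word][node_id] = count / len(words)
    return inverted

def score_nodes(inverted, num_nodes, query_words):
    """Accumulate a TF-IDF similarity score for every node sharing a word."""
    scores = defaultdict(float)
    for word, query_count in Counter(query_words).items():
        postings = inverted.get(word)
        if not postings:
            continue  # word never seen in the map
        # Rare words are more informative than words present everywhere.
        idf = math.log(num_nodes / len(postings))
        query_tf = query_count / len(query_words)
        for node_id, node_tf in postings.items():
            scores[node_id] += (query_tf * idf) * (node_tf * idf)
    return dict(scores)
```

Only nodes that share at least one visual word with the query are ever touched, which is why an inverted file keeps loop closure queries cheap as the map grows. The highest-scoring node would then be the loop closure candidate passed on to image-level comparison.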

    Robust hybrid control for autonomous vehicle motion planning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2001. This dissertation focuses on the problem of motion planning for agile autonomous vehicles. In realistic situations, the motion planning problem must be solved in real-time, in a dynamic and uncertain environment. The fulfillment of the mission objectives might also require the exploitation of the full maneuvering capabilities of the vehicle. The main contribution of the dissertation is the development of a new computational and modelling framework (the Maneuver Automaton), and related algorithms, for steering underactuated, nonholonomic mechanical systems. The proposed approach is based on a quantization of the system's dynamics, by which the feasible nominal system trajectories are restricted to the family of curves that can be obtained by the interconnection of suitably defined primitives. This can be seen as a formalization of the concept of "maneuver", allowing for the construction of a framework amenable to mathematical programming. This motion planning framework is applicable to all time-invariant dynamical systems which admit dynamic symmetries and relative equilibria. No other assumptions are made on the dynamics, thus resulting in exact motion planning techniques of general applicability. Building on a relatively expensive off-line computation phase, we provide algorithms viable for real-time applications. A fundamental advantage of this approach is the ability to provide a mathematical foundation for generating a provably stable and consistent hierarchical system, and for developing the tools to analyze the robustness of the system in the presence of uncertainty and/or disturbances. In the second part of the dissertation, a randomized algorithm is proposed for real-time motion planning in a dynamic environment. 
By employing the optimal control solution in a free space developed for the maneuver automaton (or for any other general system), we present a motion planning algorithm with probabilistic convergence and performance guarantees, and hard safety guarantees, even in the face of finite computation times. The proposed methodologies are applicable to a very large class of autonomous vehicles: throughout the dissertation, examples, simulation and experimental results are presented and discussed, involving a variety of mechanical systems, ranging from simple academic examples and laboratory setups, to detailed models of small autonomous helicopters. By Emilio Frazzoli.
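The idea of restricting trajectories to interconnections of primitives can be illustrated with a toy graph search. This is a heavy simplification and not the dissertation's formulation: steady motions (relative equilibria) are reduced to named graph nodes, maneuvers to directed edges with a fixed duration, and the vehicle's displacement and the time spent coasting in each steady motion are ignored. Every name and number in it is hypothetical.

```python
import heapq

def fastest_maneuver_sequence(maneuvers, start, goal):
    """Dijkstra search for the minimum-time chain of maneuvers.

    maneuvers: {steady_motion: [(next_steady_motion, maneuver_name, duration), ...]}
    Returns (total_duration, [maneuver names]) or None if goal is unreachable.
    """
    queue = [(0.0, start, [])]
    best = {start: 0.0}
    while queue:
        elapsed, motion, plan = heapq.heappop(queue)
        if motion == goal:
            return elapsed, plan
        if elapsed > best.get(motion, float("inf")):
            continue  # stale queue entry
        for next_motion, name, duration in maneuvers.get(motion, []):
            total = elapsed + duration
            if total < best.get(next_motion, float("inf")):
                best[next_motion] = total
                heapq.heappush(queue, (total, next_motion, plan + [name]))
    return None
```

Because the set of primitives is finite and computed ahead of time, planning reduces to a discrete search like this one, which is one way to read the abstract's point about an expensive off-line phase enabling real-time algorithms.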

    Single-target tracking of arbitrary objects using multi-layered features and contextual information

    This thesis investigated single-target tracking of arbitrary objects. Tracking is a difficult problem due to a variety of challenges such as significant deformations of the target, occlusions, illumination variations, background clutter and camouflage. To achieve robust tracking performance under these severe conditions, this thesis proposed firstly a novel RGB single-target tracker which models the target with multi-layered features and contextual information. The proposed algorithm was tested on two different tracking benchmarks, i.e., VTB and VOT, where it demonstrated significantly more robust performance than other state-of-the-art RGB trackers. Proposed secondly was an extension of the designed RGB tracker to handle RGB-D images using both temporal and spatial constraints to exploit depth information more robustly. For evaluation, the thesis introduced a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis, on which the proposed tracker achieved the best results. Proposed thirdly was a new tracking approach to handle camouflage problems in highly cluttered scenes exploiting global dynamic constraints from the context. To evaluate the tracker, a benchmark dataset was augmented with a new set of clutter sub-attributes. Using this dataset, it was demonstrated that the proposed method outperforms other state-of-the-art single target trackers on highly cluttered scenes.