
    Super-resolution: A comprehensive survey


    Towards Robust Visual Localization in Challenging Conditions

    Visual localization is a fundamental problem in computer vision, with a multitude of applications in robotics, augmented reality and structure-from-motion. The basic problem is to determine, based on one or more images, the position and orientation of the camera that captured them relative to some model of the environment. Current visual localization approaches typically work well when the images to be localized are captured under conditions similar to those present during mapping. However, when the environment exhibits large changes in visual appearance, due to e.g. variations in weather, season, time of day or viewpoint, the traditional pipelines break down. The reason is that the local image features used are based on low-level pixel-intensity information, which is not invariant to these transformations: when the environment changes, a different set of keypoints is detected and their descriptors differ, making long-term visual localization a challenging problem. This thesis includes four papers that present work towards solving the problem of long-term visual localization. Three of the articles present ideas for how semantic information may be included to aid the localization process: one approach relies only on semantic information for visual localization, another shows how semantics can be used to detect outlier feature correspondences, while the third presents a sequential localization algorithm that relies on the consistency of the reprojection of a semantic model instead of traditional features. The final article is a benchmark paper, in which we present three new benchmark datasets aimed at evaluating localization algorithms in the context of long-term visual localization.
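
    The semantic outlier-rejection idea can be illustrated with a minimal sketch (hypothetical names, not the thesis's implementation; it assumes a per-pixel semantic labelling of the query image and a semantic class stored with each 3D map point): a 2D-3D correspondence is kept only if the two labels agree.

        import numpy as np

        def semantic_filter(keypoints_xy, matched_point_labels, semantic_map):
            """Reject 2D-3D correspondences whose semantic labels disagree.

            keypoints_xy         : (N, 2) integer (x, y) pixel coordinates of the query keypoints
            matched_point_labels : (N,) semantic class of each matched 3D map point
            semantic_map         : (H, W) per-pixel semantic labels of the query image
            Returns a boolean mask marking the correspondences kept as inliers.
            """
            cols = keypoints_xy[:, 0]
            rows = keypoints_xy[:, 1]
            keypoint_labels = semantic_map[rows, cols]
            return keypoint_labels == matched_point_labels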

    Patch-based methods for variational image processing problems

    Image processing problems are notoriously difficult. To name a few of these difficulties, they are usually ill-posed, involve a huge number of unknowns (from one to several per pixel!), and images cannot be considered as the linear superposition of a few physical sources, as they contain many different scales and non-linearities. However, if one considers small blocks (or patches) inside the pictures instead of the images as a whole, many of these hurdles vanish and the problems become much easier to solve, at the cost of increasing again the dimensionality of the data to process. Following the seminal NL-means algorithm in 2005-2006, methods that consider only the visual correlation between patches and ignore their spatial relationship are called non-local methods. While powerful, it is an arduous task to define non-local methods without resorting to heuristic formulations or complex mathematical frameworks. On the other hand, another powerful property has brought global image processing algorithms one step further: the sparsity of images in well-chosen representation bases. However, this property is difficult to embed naturally in non-local methods, yielding algorithms that are usually inefficient or convoluted. In this thesis, we explore alternative approaches to non-locality, with the goals of i) developing universal approaches that can handle local and non-local constraints and ii) leveraging the qualities of both non-locality and sparsity. For the first point, we show that embedding the patches of an image into a graph-based framework can yield a simple algorithm that can switch from local to non-local diffusion, which we apply to the problem of large-area image inpainting. For the second point, we first study a fast patch preselection process that is able to group patches according to their visual content. This preselection operator then serves as input to a social sparsity enforcing operator that creates sparse groups of jointly sparse patches, thus exploiting all the redundancies present in the data, within a simple mathematical framework. Finally, we study the problem of reconstructing plausible patches from a few binarized measurements. We show that this task can be achieved in the case of popular binarized image keypoint descriptors, thus demonstrating a potential privacy issue in mobile visual recognition applications, but also opening a promising way towards the design and construction of a new generation of smart cameras.
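
    As a rough illustration of the non-local principle described above (a minimal NL-means-style sketch under simplified assumptions, not one of the thesis's algorithms), a pixel is denoised by averaging other pixels with weights driven purely by the visual similarity of their surrounding patches rather than by their spatial distance.

        import numpy as np

        def nl_means_pixel(image, row, col, patch_radius=3, search_radius=10, h=0.1):
            """Denoise a single pixel by averaging pixels whose patches look similar."""
            pr, sr = patch_radius, search_radius
            padded = np.pad(image, pr, mode="reflect")
            ref = padded[row:row + 2 * pr + 1, col:col + 2 * pr + 1]  # reference patch
            weights, values = [], []
            for r in range(max(row - sr, 0), min(row + sr + 1, image.shape[0])):
                for c in range(max(col - sr, 0), min(col + sr + 1, image.shape[1])):
                    cand = padded[r:r + 2 * pr + 1, c:c + 2 * pr + 1]
                    dist2 = np.mean((ref - cand) ** 2)          # visual patch similarity
                    weights.append(np.exp(-dist2 / (h * h)))    # weight ignores spatial distance
                    values.append(image[r, c])
            weights = np.asarray(weights)
            return float(np.dot(weights, values) / weights.sum())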

    Visual-Inertial State Estimation With Information Deficiency

    State estimation is an essential part of intelligent navigation and mapping systems in which the location of a smartphone, car, robot, or human-worn device must be tracked. For autonomous systems such as micro aerial vehicles and self-driving cars, it is a prerequisite for control and motion planning. For AR/VR applications, it is the first step of image rendering. Visual-inertial odometry (VIO) is the de facto standard algorithm for embedded platforms because it lends itself to lightweight sensors and processors and benefits from mature research and industrial development. Various approaches have been proposed to achieve accurate real-time tracking, and numerous open-source software packages and datasets are available. However, errors and outliers are common due to the complexity of visual measurement processes and environmental changes, and in practice estimation drift is inevitable. In this thesis, we introduce the concept of information deficiency in state estimation and show how to utilize this concept to develop and improve VIO systems. We look into the information deficiencies in visual-inertial state estimation, which are often present and ignored, causing system failures and drift. In particular, we investigate three critical cases of information deficiency in visual-inertial odometry: a low-texture environment with limited computation, monocular visual odometry, and inertial odometry. We consider these systems under three specific application settings: a lightweight quadrotor platform in autonomous flight, driving scenarios, and an AR/VR headset for pedestrians. We address the challenges in each application setting and explore how the tight fusion of deep learning and model-based VIO can improve state-of-the-art system performance and compensate for the lack of information in real time. We identify deep learning as a key technology for tackling the information deficiencies in state estimation. We argue that developing hybrid frameworks that leverage its advantages and enable supervision for performance guarantees provides the most accurate and robust solution to state estimation.
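
    A minimal sketch of the model-based side of such a pipeline, assuming simple dead-reckoning from raw IMU samples (the names and the first-order integration scheme are illustrative, not the thesis's estimator): this propagation step is exactly the part whose error grows without bound when visual information is deficient.

        import numpy as np

        def propagate_imu(p, v, R, accel, gyro, dt, gravity=np.array([0.0, 0.0, -9.81])):
            """One dead-reckoning step from body-frame IMU samples.

            p, v  : position and velocity in the world frame (3,)
            R     : body-to-world rotation matrix (3, 3)
            accel : body-frame accelerometer sample (3,)
            gyro  : body-frame gyroscope sample in rad/s (3,)
            Returns the propagated (p, v, R); errors accumulate without visual updates.
            """
            a_world = R @ accel + gravity                 # specific force in world frame plus gravity
            p_next = p + v * dt + 0.5 * a_world * dt ** 2
            v_next = v + a_world * dt
            wx, wy, wz = gyro * dt
            skew = np.array([[0.0, -wz,  wy],
                             [ wz, 0.0, -wx],
                             [-wy,  wx, 0.0]])
            R_next = R @ (np.eye(3) + skew)               # first-order rotation update
            return p_next, v_next, R_next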

    Fusion de données capteurs étendue pour applications vidéo embarquées (Extended sensor-data fusion for embedded video applications)

    This thesis deals with sensor fusion between camera and inertial sensor measurements in order to provide a robust motion estimation algorithm for embedded video applications. The targeted platforms are mainly smartphones and tablets. We present a real-time, online 2D camera motion estimation algorithm combining inertial and visual measurements. The proposed algorithm extends the preemptive RANSAC motion estimation procedure with inertial sensor data, introducing a dynamic Lagrangian hybrid scoring of the motion models to make the approach adaptive to various image and motion contents. All these improvements come at little computational cost, keeping the complexity of the algorithm low enough for embedded platforms. The approach is compared with purely inertial and purely visual procedures. A novel approach to real-time hybrid monocular visual-inertial odometry for embedded platforms is then introduced. The interaction between vision and inertial sensors is maximized by performing fusion at multiple levels of the algorithm. Through tests conducted on specifically acquired sequences with ground-truth data, we show that our method outperforms classical hybrid techniques in ego-motion estimation.
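
    The hybrid scoring idea can be sketched as follows (hypothetical names, and a fixed penalty weight in place of the dynamic Lagrangian weighting described above): each candidate 2D motion model is scored by its visual inlier count minus a term penalising disagreement with the rotation predicted by integrating the gyroscope.

        import numpy as np

        def hybrid_score(model_angle_deg, residuals_px, gyro_angle_deg,
                         inlier_thresh_px=2.0, lam=0.5):
            """Score a candidate 2D motion model with visual and inertial terms.

            model_angle_deg : in-plane rotation of the candidate motion model
            residuals_px    : reprojection residuals of the tracked features (N,)
            gyro_angle_deg  : rotation predicted by integrating the gyroscope
            lam             : weight of the inertial penalty (fixed here; adaptive in the thesis)
            Higher scores are better; used to rank models in the preemptive RANSAC loop.
            """
            visual_score = np.sum(residuals_px < inlier_thresh_px)        # inlier count
            inertial_penalty = lam * abs(model_angle_deg - gyro_angle_deg)
            return visual_score - inertial_penalty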

    ă‚čăƒšă‚Żăƒˆăƒ«ăźç·šćœąæ€§ă‚’è€ƒæ…źă—ăŸăƒă‚€ăƒ‘ăƒŒă‚čăƒšă‚Żăƒˆăƒ©ăƒ«ç”»ćƒăźăƒŽă‚€ă‚șé™€ćŽ»ăšă‚ąăƒłăƒŸă‚­ă‚·ăƒłă‚°ă«é–ąă™ă‚‹ç ”ç©¶

    This study aims to generalize the color line to an M-dimensional spectral line feature (M>3) and introduces methods for denoising and unmixing of hyperspectral images based on this spectral linearity. For denoising, we propose a local spectral component decomposition method based on the spectral line. We first calculate the spectral line of an M-channel image; then, using the line, we decompose the image into three components: a single M-channel image and two gray-scale images. By virtue of the decomposition, the noise is concentrated in the two gray-scale images, so the algorithm needs to denoise only two gray-scale images, regardless of the number of channels. For unmixing, we propose an algorithm that exploits the low-rank structure of local abundances by applying the nuclear norm to the abundance matrix of local regions in the spatial and abundance domains. In the optimization problem, the local abundance regularizer is combined with the L2,1 norm and the total variation.
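
    The low-rank local-abundance regularizer rests on the nuclear norm, whose proximal operator is singular-value soft-thresholding; below is a minimal sketch applied to a single local abundance block (illustrative only, not the full optimization with the L2,1 and total-variation terms).

        import numpy as np

        def svt(A, tau):
            """Singular-value thresholding: the proximal operator of tau * nuclear norm.

            A   : local abundance matrix (pixels in a local window x endmembers)
            tau : threshold; larger values push the block toward lower rank
            """
            U, s, Vt = np.linalg.svd(A, full_matrices=False)
            s_thresh = np.maximum(s - tau, 0.0)   # shrink the singular values
            return U @ np.diag(s_thresh) @ Vt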

    Dynamic Scene Reconstruction and Understanding

    Traditional approaches to 3D reconstruction have achieved remarkable progress in static scene acquisition. The acquired data serves as priors or benchmarks for many vision and graphics tasks, such as object detection and robotic navigation. Thus, obtaining interpretable and editable representations from a raw monocular RGB-D video sequence is an outstanding goal in scene understanding. However, acquiring an interpretable representation becomes significantly more challenging when a scene contains dynamic activities, for example a moving camera, rigid object movement, and non-rigid motions. These dynamic scene elements introduce a scene factorization problem, i.e., dividing a scene into elements and jointly estimating the elements' motion and geometry. Moreover, the monocular setting brings in the problems of tracking and fusing partially occluded objects, as they are scanned from one viewpoint at a time. This thesis explores several ideas for acquiring an interpretable model in dynamic environments. First, we utilize synthetic assets such as floor plans and object meshes to generate dynamic data for training and evaluation. Then, we explore the idea of learning geometry priors with an instance segmentation module, which predicts the location and grouping of indoor objects. We use the learned geometry priors to infer the occluded object geometry for tracking and reconstruction. While instance segmentation modules usually have a generalization issue, i.e., they struggle to handle unknown objects, we observe that the empty-space information in the background geometry is more reliable for detecting moving objects. We therefore propose a segmentation-by-reconstruction strategy for acquiring rigidly-moving objects and backgrounds. Finally, we present a novel neural representation to learn a factorized scene representation, reconstructing every dynamic element. The proposed model supports both rigid and non-rigid motions without pre-trained templates. We demonstrate that our systems and representation improve the reconstruction quality on synthetic test sets and real-world scans.
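
    The segmentation-by-reconstruction observation, that space the background model has already seen as empty is a reliable cue for motion, can be illustrated with a minimal sketch (hypothetical interface; it assumes the background geometry can be queried for a signed distance to its surface).

        import numpy as np

        def moving_point_mask(points_world, query_free_space, free_margin=0.05):
            """Flag newly observed 3D points that fall inside known empty space.

            points_world     : (N, 3) points back-projected from the current depth frame
            query_free_space : callable mapping (N, 3) points to signed distances to the
                               background surface (positive = in front of / outside it)
            Points well inside space the background model observed as empty cannot belong
            to the static background, so they are attributed to moving objects.
            """
            signed_dist = query_free_space(points_world)
            return signed_dist > free_margin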

    Autonomous vision-based terrain-relative navigation for planetary exploration

    Abstract: The interest of the world's major space agencies in vision sensors for their mission designs has been increasing over the years. Indeed, cameras offer an efficient solution to address ever-increasing performance requirements. In addition, these sensors are multipurpose, lightweight, proven and low-cost. Several researchers in vision sensing for space applications currently focus on navigation systems for autonomous pin-point planetary landing and for sample-return missions to small bodies. In fact, without a Global Positioning System (GPS) or radio beacons around celestial bodies, high-accuracy navigation around them is a complex task. Most navigation systems are based only on accurate initialization of the states and on the integration of acceleration and angular rate measurements from an Inertial Measurement Unit (IMU). This strategy can track sudden motions of short duration very accurately, but the estimate diverges over time and normally leads to large landing errors. In order to improve navigation accuracy, many authors have proposed to fuse these IMU measurements with vision measurements using state estimators, such as Kalman filters. The first proposed vision-based navigation approach relies on feature tracking between sequences of images taken in real time during orbiting and/or landing operations. In that case, image features are image pixels that have a high probability of being recognized between images taken from different camera locations. By detecting and tracking these features through a sequence of images, the relative motion of the spacecraft can be determined. This technique, referred to as Terrain-Relative Relative Navigation (TRRN), relies on relatively simple, robust and well-developed image processing techniques, and it allows the determination of the relative motion (velocity) of the spacecraft. Although this technology has been demonstrated with space-qualified hardware, its gain in accuracy remains limited, since the spacecraft's absolute position is not observable from the vision measurements. The vision-based navigation techniques currently studied consist in identifying features and mapping them to an on-board cartographic database indexed by an absolute coordinate system, thereby providing absolute position determination. This technique, referred to as Terrain-Relative Absolute Navigation (TRAN), relies on very complex Image Processing Software (IPS) with an obvious lack of robustness. In fact, such software often depends on the spacecraft attitude and position, is sensitive to illumination conditions (the elevation and azimuth of the Sun when the geo-referenced database is built must be similar to those present during the mission), is greatly influenced by image noise, and hardly manages the multiple varieties of terrain seen during the same mission (the spacecraft can fly over plains as well as mountainous regions, and the images may contain old craters with noisy rims as well as young craters with clean rims, and so on). To date, no real-time hardware-in-the-loop experiment has been conducted to demonstrate the applicability of this technology to space missions. The main objective of the current study is to develop autonomous vision-based navigation algorithms that provide absolute position and surface-relative velocity during the proximity operations of a planetary mission (orbiting phase and landing phase) using a combined approach of TRRN and TRAN technologies.
    The contributions of the study are: (1) reference mission definition, (2) advancements in TRAN theory (image processing as well as state estimation) and (3) practical implementation of vision-based navigation.
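
    A minimal sketch of the fusion step underlying the combined TRRN/TRAN approach (the state layout, noise values and names are illustrative assumptions, not the study's filter): a Kalman update that corrects the drifting inertial state with an absolute position fix obtained by matching image features to the geo-referenced database.

        import numpy as np

        def position_update(x, P, z_pos, meas_var=25.0):
            """Kalman update of a [position(3), velocity(3)] state with an absolute
            position measurement derived from matching image features to a
            geo-referenced database.

            x : (6,) state estimate, P : (6, 6) covariance, z_pos : (3,) measured position
            """
            H = np.hstack([np.eye(3), np.zeros((3, 3))])   # only position is observed
            R = meas_var * np.eye(3)                        # measurement noise covariance
            y = z_pos - H @ x                               # innovation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
            x_new = x + K @ y
            P_new = (np.eye(6) - K @ H) @ P
            return x_new, P_new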

    Robust Algorithms for Low-Rank and Sparse Matrix Models

    Data in statistical signal processing problems is often inherently matrix-valued, and a natural first step in working with such data is to impose a model with structure that captures the distinctive features of the underlying data. Under the right model, one can design algorithms that can reliably tease weak signals out of highly corrupted data. In this thesis, we study two important classes of matrix structure: low-rankness and sparsity. In particular, we focus on robust principal component analysis (PCA) models that decompose data into the sum of low-rank and sparse (in an appropriate sense) components. Robust PCA models are popular because they are useful models for data in practice and because efficient algorithms exist for solving them. This thesis focuses on developing new robust PCA algorithms that advance the state-of-the-art in several key respects. First, we develop a theoretical understanding of the effect of outliers on PCA and the extent to which one can reliably reject outliers from corrupted data using thresholding schemes. We apply these insights and other recent results from low-rank matrix estimation to design robust PCA algorithms with improved low-rank models that are well-suited for processing highly corrupted data. On the sparse modeling front, we use sparse signal models like spatial continuity and dictionary learning to develop new methods with important adaptive representational capabilities. We also propose efficient algorithms for implementing our methods, including an extension of our dictionary learning algorithms to the online or sequential data setting. The underlying theme of our work is to combine ideas from low-rank and sparse modeling in novel ways to design robust algorithms that produce accurate reconstructions from highly undersampled or corrupted data. We consider a variety of application domains for our methods, including foreground-background separation, photometric stereo, and inverse problems such as video inpainting and dynamic magnetic resonance imaging. PhD thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143925/1/brimoor_1.pd
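
    A minimal sketch of the low-rank-plus-sparse decomposition these robust PCA methods build on (a simplified alternating proximal scheme with illustrative parameter choices, not any of the thesis's algorithms): the low-rank part is updated by singular-value soft-thresholding and the sparse part by entrywise soft-thresholding.

        import numpy as np

        def robust_pca(Y, lam=None, tau=1.0, n_iter=100):
            """Decompose Y into a low-rank L plus a sparse S by alternating proximal steps."""
            if lam is None:
                lam = 1.0 / np.sqrt(max(Y.shape))   # common default weight for the sparse term
            L = np.zeros_like(Y)
            S = np.zeros_like(Y)
            for _ in range(n_iter):
                # low-rank step: singular-value soft-thresholding of the residual
                U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
                L = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
                # sparse step: entrywise soft-thresholding of the residual
                R = Y - L
                S = np.sign(R) * np.maximum(np.abs(R) - lam * tau, 0.0)
            return L, S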
