502 research outputs found

    Calibration of non-conventional imaging systems

    Get PDF

    Modeling the environment with egocentric vision systems

    Get PDF
    Cada vez más sistemas autónomos, ya sean robots o sistemas de asistencia, están presentes en nuestro día a día. Este tipo de sistemas interactúan y se relacionan con su entorno y para ello necesitan un modelo de dicho entorno. En función de las tareas que deben realizar, la información o el detalle necesario del modelo varía. Desde detallados modelos 3D para sistemas de navegación autónomos, a modelos semánticos que incluyen información importante para el usuario como el tipo de área o qué objetos están presentes. La creación de estos modelos se realiza a través de las lecturas de los distintos sensores disponibles en el sistema. Actualmente, gracias a su pequeño tamaño, bajo precio y la gran información que son capaces de capturar, las cámaras son sensores incluidos en todos los sistemas autónomos. El objetivo de esta tesis es el desarrollar y estudiar nuevos métodos para la creación de modelos del entorno a distintos niveles semánticos y con distintos niveles de precisión. Dos puntos importantes caracterizan el trabajo desarrollado en esta tesis: - El uso de cámaras con punto de vista egocéntrico o en primera persona ya sea en un robot o en un sistema portado por el usuario (wearable). En este tipo de sistemas, las cámaras son solidarias al sistema móvil sobre el que van montadas. En los últimos años han aparecido muchos sistemas de visión wearables, utilizados para multitud de aplicaciones, desde ocio hasta asistencia de personas. - El uso de sistemas de visión omnidireccional, que se distinguen por su gran campo de visión, incluyendo mucha más información en cada imagen que las cámara convencionales. Sin embargo plantean nuevas dificultades debido a distorsiones y modelos de proyección más complejos. Esta tesis estudia distintos tipos de modelos del entorno: - Modelos métricos: el objetivo de estos modelos es crear representaciones detalladas del entorno en las que localizar con precisión el sistema autónomo. Ésta tesis se centra en la adaptación de estos modelos al uso de visión omnidireccional, lo que permite capturar más información en cada imagen y mejorar los resultados en la localización. - Modelos topológicos: estos modelos estructuran el entorno en nodos conectados por arcos. Esta representación tiene menos precisión que la métrica, sin embargo, presenta un nivel de abstracción mayor y puede modelar el entorno con más riqueza. %, por ejemplo incluyendo el tipo de área de cada nodo, la localización de objetos importantes o el tipo de conexión entre los distintos nodos. Esta tesis se centra en la creación de modelos topológicos con información adicional sobre el tipo de área de cada nodo y conexión (pasillo, habitación, puertas, escaleras...). - Modelos semánticos: este trabajo también contribuye en la creación de nuevos modelos semánticos, más enfocados a la creación de modelos para aplicaciones en las que el sistema interactúa o asiste a una persona. Este tipo de modelos representan el entorno a través de conceptos cercanos a los usados por las personas. En particular, esta tesis desarrolla técnicas para obtener y propagar información semántica del entorno en secuencias de imágen

    Geodesic Active Fields - A Geometric Framework for Image Registration

    Get PDF
    In this paper we present a novel geometric framework called geodesic active fields for general image registration. In image registration, one looks for the underlying deformation field that best maps one image onto another. This is a classic ill-posed inverse problem, which is usually solved by adding a regularization term. Here, we propose a multiplicative coupling between the registration term and the regularization term, which turns out to be equivalent to embed the deformation field in a weighted minimal surface problem. Then, the deformation field is driven by a minimization flow toward a harmonic map corresponding to the solution of the registration problem. This proposed approach for registration shares close similarities with the well-known geodesic active contours model in image segmentation, where the segmentation term (the edge detector function) is coupled with the regularization term (the length functional) via multiplication as well. As a matter of fact, our proposed geometric model is actually the exact mathematical generalization to vector fields of the weighted length problem for curves and surfaces introduced by Caselles-Kimmel-Sapiro. The energy of the deformation field is measured with the Polyakov energy weighted by a suitable image distance, borrowed from standard registration models. We investigate three different weighting functions, the squared error and the approximated absolute error for monomodal images, and the local joint entropy for multimodal images. As compared to specialized state-of-the-art methods tailored for specific applications, our geometric framework involves important contributions. Firstly, our general formulation for registration works on any parametrizable, smooth and differentiable surface, including non-flat and multiscale images. In the latter case, multiscale images are registered at all scales simultaneously, and the relations between space and scale are intrinsically being accounted for. Secondly, this method is, to the best of our knowledge, the first re-parametrization invariant registration method introduced in the literature. Thirdly, the multiplicative coupling between the registration term, i.e. local image discrepancy, and the regularization term naturally results in a data-dependent tuning of the regularization strength. Finally, by choosing the metric on the deformation field one can freely interpolate between classic Gaussian and more interesting anisotropic, TV-like regularization

    Mobile Robots Navigation

    Get PDF
    Mobile robots navigation includes different interrelated activities: (i) perception, as obtaining and interpreting sensory information; (ii) exploration, as the strategy that guides the robot to select the next direction to go; (iii) mapping, involving the construction of a spatial representation by using the sensory information perceived; (iv) localization, as the strategy to estimate the robot position within the spatial map; (v) path planning, as the strategy to find a path towards a goal location being optimal or not; and (vi) path execution, where motor actions are determined and adapted to environmental changes. The book addresses those activities by integrating results from the research work of several authors all over the world. Research cases are documented in 32 chapters organized within 7 categories next described

    Vision Sensors and Edge Detection

    Get PDF
    Vision Sensors and Edge Detection book reflects a selection of recent developments within the area of vision sensors and edge detection. There are two sections in this book. The first section presents vision sensors with applications to panoramic vision sensors, wireless vision sensors, and automated vision sensor inspection, and the second one shows image processing techniques, such as, image measurements, image transformations, filtering, and parallel computing

    Combining Features and Semantics for Low-level Computer Vision

    Get PDF
    Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.Die visuelle Wahrnehmung von Tiefe und Bewegung spielt eine wichtige Rolle bei dem Verständnis und der Navigation in unserer Umwelt. Die 3D Rekonstruktion von Szenen im Freien und die Schätzung der Bewegung von Videokameras sind von größter Bedeutung für Anwendungen, wie das autonome Fahren. Die Erforschung der entsprechenden Probleme des maschinellen Sehens hat in den letzten Jahrzehnten enorme Fortschritte gemacht, jedoch bleiben einige Aspekte heute noch ungelöst. Beispiele hierfür sind reflektierende und texturlose Oberflächen oder große Bewegungen, bei denen herkömmliche lokale Methoden häufig scheitern. Weitere Herausforderungen sind niedrige Bildraten, Verdeckungen, große Verzerrungen und schwierige Lichtverhältnisse. In dieser Arbeit schlagen wir vor nicht-lokale Interaktionen zu modellieren, die semantische und kontextbezogene Informationen nutzen, um diese Herausforderungen zu meistern. Für die binokulare Stereo Schätzung schlagen wir zuallererst vor zusammenhängende Bereiche mit objektklassen-spezifischen Disparitäts Vorschlägen zu regularisieren, die wir mit inversen Grafik Techniken auf der Grundlage einer spärlichen Disparitätsschätzung und semantischen Segmentierung des Bildes erhalten. Die Disparitäts Vorschläge kodieren die Tatsache, dass die Gegenstände bestimmter Kategorien nicht willkürlich geformt sind, sondern typischerweise regelmäßige Strukturen aufweisen. Wir integrieren sie für die komplexe Objektklasse 'Auto' in Form eines nicht-lokalen Regularisierungsterm in ein Superpixel-basiertes grafisches Modell und zeigen die Vorteile vor allem in reflektierenden Bereichen. Zweitens nutzen wir für die 3D-Rekonstruktion die Tatsache, dass mit der Größe der rekonstruierten Fläche auch die Wahrscheinlichkeit steigt, Objekte von ähnlicher Art und Form in der Szene zu enthalten. Dies gilt besonders für Szenen im Freien, in denen Gebäude und Fahrzeuge oft vorkommen, die unter fehlender Textur oder Reflexionen leiden aber ähnlichkeit in der Form aufweisen. Wir nutzen diese ähnlichkeiten zur Lokalisierung von Objekten mit Detektoren und zur gemeinsamen Rekonstruktion indem ein volumetrisches Modell ihrer Form erlernt wird. Dies ermöglicht auftretendes Rauschen zu reduzieren, während fehlende Flächen vervollständigt werden, da Objekte ähnlicher Form von allen Beobachtungen der jeweiligen Kategorie profitieren. Die Evaluierung auf einem neuen, herausfordernden vorstädtischen Datensatz in Anbetracht von LIDAR-Entfernungsdaten zeigt die Vorteile der Modellierung von strukturellen Abhängigkeiten zwischen Objekten. Zuletzt, motiviert durch den Erfolg von Deep Learning Techniken bei der Mustererkennung, präsentieren wir eine Methode zum Erlernen von kontextbezogenen Merkmalen zur Lösung des optischen Flusses mittels diskreter Optimierung. Dazu stellen wir eine effiziente Methode vor um zusätzlich zu einem Lokalen Netzwerk ein Kontext-Netzwerk zu erlernen, das mit Hilfe von erweiterter Faltung auf Patches ein großes rezeptives Feld besitzt. Für das Feature Matching vergleichen wir mit schnellen GPU-Matrixmultiplikation jedes Pixel im Referenzbild mit jedem Pixel im Zielbild. Das aus dem Netzwerk resultierende Matching Kostenvolumen bildet den Datenterm für eine diskrete MAP Inferenz in einem paarweisen Markov Random Field. Eine umfangreiche Evaluierung zeigt die Relevanz des Kontextes für das Feature Matching

    Método para el registro automático de imágenes basado en transformaciones proyectivas planas dependientes de las distancias y orientado a imágenes sin características comunes

    Get PDF
    Tesis inédita de la Universidad Complutense de Madrid, Facultad de Ciencias Físicas, Departamento de Arquitectura de Computadores y Automática, leída el 18-12-2015Multisensory data fusion oriented to image-based application improves the accuracy, quality and availability of the data, and consequently, the performance of robotic systems, by means of combining the information of a scene acquired from multiple and different sources into a unified representation of the 3D world scene, which is more enlightening and enriching for the subsequent image processing, improving either the reliability by using the redundant information, or the capability by taking advantage of complementary information. Image registration is one of the most relevant steps in image fusion techniques. This procedure aims the geometrical alignment of two or more images. Normally, this process relies on feature-matching techniques, which is a drawback for combining sensors that are not able to deliver common features. For instance, in the combination of ToF and RGB cameras, the robust feature-matching is not reliable. Typically, the fusion of these two sensors has been addressed from the computation of the cameras calibration parameters for coordinate transformation between them. As a result, a low resolution colour depth map is provided. For improving the resolution of these maps and reducing the loss of colour information, extrapolation techniques are adopted. A crucial issue for computing high quality and accurate dense maps is the presence of noise in the depth measurement from the ToF camera, which is normally reduced by means of sensor calibration and filtering techniques. However, the filtering methods, implemented for the data extrapolation and denoising, usually over-smooth the data, reducing consequently the accuracy of the registration procedure...La fusión multisensorial orientada a aplicaciones de procesamiento de imágenes, conocida como fusión de imágenes, es una técnica que permite mejorar la exactitud, la calidad y la disponibilidad de datos de un entorno tridimensional, que a su vez permite mejorar el rendimiento y la operatividad de sistemas robóticos. Dicha fusión, se consigue mediante la combinación de la información adquirida por múltiples y diversas fuentes de captura de datos, la cual se agrupa del tal forma que se obtiene una mejor representación del entorno 3D, que es mucho más ilustrativa y enriquecedora para la implementación de métodos de procesamiento de imágenes. Con ello se consigue una mejora en la fiabilidad y capacidad del sistema, empleando la información redundante que ha sido adquirida por múltiples sensores. El registro de imágenes es uno de los procedimientos más importantes que componen la fusión de imágenes. El objetivo principal del registro de imágenes es la consecución de la alineación geométrica entre dos o más imágenes. Normalmente, este proceso depende de técnicas de búsqueda de patrones comunes entre imágenes, lo cual puede ser un inconveniente cuando se combinan sensores que no proporcionan datos con características similares. Un ejemplo de ello, es la fusión de cámaras de color de alta resolución (RGB) con cámaras de Tiempo de Vuelo de baja resolución (Time-of-Flight (ToF)), con las cuales no es posible conseguir una detección robusta de patrones comunes entre las imágenes capturadas por ambos sensores. Por lo general, la fusión entre estas cámaras se realiza mediante el cálculo de los parámetros de calibración de las mismas, que permiten realizar la trasformación homogénea entre ellas. Y como resultado de este xii Abstract procedimiento, se obtienen mapas de profundad y de color de baja resolución. Con el objetivo de mejorar la resolución de estos mapas y de evitar la pérdida de información de color, se utilizan diversas técnicas de extrapolación de datos. Un factor crucial a tomar en cuenta para la obtención de mapas de alta calidad y alta exactitud, es la presencia de ruido en las medidas de profundidad obtenidas por las cámaras ToF. Este problema, normalmente se reduce mediante la calibración de estos sensores y con técnicas de filtrado de datos. Sin embargo, las técnicas de filtrado utilizadas, tanto para la interpolación de datos, como para la reducción del ruido, suelen producir el sobre-alisamiento de los datos originales, lo cual reduce la exactitud del registro de imágenes...Sección Deptal. de Arquitectura de Computadores y Automática (Físicas)Fac. de Ciencias FísicasTRUEunpu

    Geodesic Active Fields:A Geometric Framework for Image Registration

    Get PDF
    Image registration is the concept of mapping homologous points in a pair of images. In other words, one is looking for an underlying deformation field that matches one image to a target image. The spectrum of applications of image registration is extremely large: It ranges from bio-medical imaging and computer vision, to remote sensing or geographic information systems, and even involves consumer electronics. Mathematically, image registration is an inverse problem that is ill-posed, which means that the exact solution might not exist or not be unique. In order to render the problem tractable, it is usual to write the problem as an energy minimization, and to introduce additional regularity constraints on the unknown data. In the case of image registration, one often minimizes an image mismatch energy, and adds an additive penalty on the deformation field regularity as smoothness prior. Here, we focus on the registration of the human cerebral cortex. Precise cortical registration is required, for example, in statistical group studies in functional MR imaging, or in the analysis of brain connectivity. In particular, we work with spherical inflations of the extracted hemispherical surface and associated features, such as cortical mean curvature. Spatial mapping between cortical surfaces can then be achieved by registering the respective spherical feature maps. Despite the simplified spherical geometry, inter-subject registration remains a challenging task, mainly due to the complexity and inter-subject variability of the involved brain structures. In this thesis, we therefore present a registration scheme, which takes the peculiarities of the spherical feature maps into particular consideration. First, we realize that we need an appropriate hierarchical representation, so as to coarsely align based on the important structures with greater inter-subject stability, before taking smaller and more variable details into account. Based on arguments from brain morphogenesis, we propose an anisotropic scale-space of mean-curvature maps, built around the Beltrami framework. Second, inspired by concepts from vision-related elements of psycho-physical Gestalt theory, we hypothesize that anisotropic Beltrami regularization better suits the requirements of image registration regularization, compared to traditional Gaussian filtering. Different objects in an image should be allowed to move separately, and regularization should be limited to within the individual Gestalts. We render the regularization feature-preserving by limiting diffusion across edges in the deformation field, which is in clear contrast to the indifferent linear smoothing. We do so by embedding the deformation field as a manifold in higher-dimensional space, and minimize the associated Beltrami energy which represents the hyperarea of this embedded manifold as measure of deformation field regularity. Further, instead of simply adding this regularity penalty to the image mismatch in lieu of the standard penalty, we propose to incorporate the local image mismatch as weighting function into the Beltrami energy. The image registration problem is thus reformulated as a weighted minimal surface problem. This approach has several appealing aspects, including (1) invariance to re-parametrization and ability to work with images defined on non-flat, Riemannian domains (e.g., curved surfaces, scalespaces), and (2) intrinsic modulation of the local regularization strength as a function of the local image mismatch and/or noise level. On a side note, we show that the proposed scheme can easily keep up with recent trends in image registration towards using diffeomorphic and inverse consistent deformation models. The proposed registration scheme, called Geodesic Active Fields (GAF), is non-linear and non-convex. Therefore we propose an efficient optimization scheme, based on splitting. Data-mismatch and deformation field regularity are optimized over two different deformation fields, which are constrained to be equal. The constraint is addressed using an augmented Lagrangian scheme, and the resulting optimization problem is solved efficiently using alternate minimization of simpler sub-problems. In particular, we show that the proposed method can easily compete with state-of-the-art registration methods, such as Demons. Finally, we provide an implementation of the fast GAF method on the sphere, so as to register the triangulated cortical feature maps. We build an automatic parcellation algorithm for the human cerebral cortex, which combines the delineations available on a set of atlas brains in a Bayesian approach, so as to automatically delineate the corresponding regions on a subject brain given its feature map. In a leave-one-out cross-validation study on 39 brain surfaces with 35 manually delineated gyral regions, we show that the pairwise subject-atlas registration with the proposed spherical registration scheme significantly improves the individual alignment of cortical labels between subject and atlas brains, and, consequently, that the estimated automatic parcellations after label fusion are of better quality

    Geometry model for marker-based localisation

    Get PDF
    This work presents a novel approach for position estimation from monocularvision. It has been shown that vision systems have great capability in reaching precise and accurate measurements and are becoming the state-of-the-art innavigation. Navigation systems have only been integrated in commercial mobile robots since the early 2000s, and yet localisation in a dynamic environmentthat form the main building block of navigation, has no truly elegant solution.Solutions are many and their strategies and methods differ depending on theapplication. For the lack of a single accurate procedure, methods are combinedwhich make use of different sensors fusion. This thesis focus on the use of monocular vision sensor to develop an accurate Marker-Based positioning system thatcan be used in various applications in outdoor, in agriculture for example, andin other indoor applications. Many contributions arouse here in this context. Amain contribution is in perspective distortion correction in which distortions aremodeled in all its forms with correction process. This is essential when dealingwith measurements and shapes in images. Because of the lack of robustness indepth sensing using monocular vision-based system, a second contribution is inthe novel spherical marker-based approach position captured, which is designedand developed within the concept of relative pose estimation. In this Model-Basedposition estimation, relative position can be extracted instantaneously withoutthe need of prior knowledge of the previous state of the camera, as it relies onmonocular image. This model can as well compensate for the lack of knowledge inthe scale of the real world, for example in the case of Monocular Visual Simultaneous Localisation and Mapping (VSLAM). In addition to these contributions, someexperimental and simulation evidence presented in this work has shown feasibilityof the reading measurements like distance capture and relative pose between themarker-based model and the observer, with reliability and high accuracy. Thesystem has shown the ability to track accurately the object at a farthest possibleposition from low resolution digital images and from a single viewpoint. Whilethe main application field targeted is tracking mobile-robots, other applicationscan profit from this concept like motion capture and application related to thefield of topography
    corecore