
    The Extraction and Use of Image Planes for Three-dimensional Metric Reconstruction

    The three-dimensional (3D) metric reconstruction of a scene from two-dimensional images is a fundamental problem in Computer Vision. The major bottleneck in the process of retrieving such structure lies in the task of recovering the camera parameters. These parameters can be calculated either through a pattern-based calibration procedure, which requires accurate knowledge of the scene, or using a more flexible approach, known as camera autocalibration, which exploits point correspondences across images. While pattern-based calibration requires the presence of a calibration object, autocalibration constraints are typically cast as nonlinear optimization problems that are sensitive to both image noise and initialization. In addition, autocalibration fails for some particular motions of the camera. To overcome these problems, we propose to combine scene and autocalibration constraints and address in this thesis (a) the problem of extracting geometric information about the scene from uncalibrated images, (b) the problem of obtaining a robust estimate of the affine calibration of the camera, and (c) the problem of upgrading and refining the affine calibration into a metric one. In particular, we propose a method for identifying the major planar structures in a scene from images and another method to recognize parallel pairs of planes whenever these are available. The identified parallel planes are then used to obtain a robust estimate of both the affine and metric 3D structure of the scene without resorting to the traditional error-prone calculation of vanishing points. We also propose a refinement method which, unlike existing ones, is capable of simultaneously incorporating plane parallelism and perpendicularity constraints in the autocalibration process. Our experiments demonstrate that the proposed methods are robust to image noise and provide satisfactory results.
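
    A minimal sketch, in standard multi-view geometry notation (not taken from the thesis), of why identified parallel planes constrain the affine upgrade: in a projective reconstruction, the plane at infinity lies in the pencil spanned by any pair of parallel scene planes, so each detected pair constrains the plane at infinity directly, and locating it yields the affine upgrade without computing vanishing points.

        \[
        \pi_\infty \simeq \alpha\,\pi_1 + \beta\,\pi_2
        \quad \text{for any detected pair } \pi_1 \parallel \pi_2,
        \qquad
        H_{PA} = \begin{bmatrix} I_{3\times 3} & \mathbf{0} \\ \mathbf{p}^\top & 1 \end{bmatrix},
        \quad
        \pi_\infty \simeq (\mathbf{p}^\top,\, 1)^\top ,
        \]

    where applying H_{PA} to the projective reconstruction sends the plane at infinity to its canonical position (0, 0, 0, 1)^T, giving an affine reconstruction; perpendicular plane pairs then provide additional constraints on the absolute conic for the metric refinement.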

    3-D Reconstruction of Urban Scenes from Sequences of Images

    In this paper, we address the problem of the recovery of the Euclidean geometry of a scene from a sequence of images without any prior knowledge either about the parameters of the cameras, or about the motion of the camera(s). We do not require any knowledge of the absolute coordinates of some control points in the scene to achieve this goal. Using various computer vision tools, we establish correspondences between images and recover the epipolar geometry of the set of images, from which we show how to compute the complete set of perspective projection matrices for each camera position. These being known, we proceed to reconstruct the scene. This reconstruction is defined up to an unknown projective transformation (i.e. is parameterized with 15 arbitrary parameters). Next we show how to go from this reconstruction to a more constrained class of reconstructions, defined up to an unknown affine transformation (i.e. parameterized with 12 arbitrary parameters) by exploiting known geometr..
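
    As a sanity check on the parameter counts quoted above (standard counting, not specific to this paper): the projective ambiguity is a nonsingular 4x4 homography defined up to scale, while the affine ambiguity additionally fixes the last row, which is exactly the drop from 15 to 12 parameters.

        \[
        X' \simeq H\,X, \qquad H \in \mathbb{R}^{4\times 4},\ \det H \neq 0
        \quad\Rightarrow\quad 16 - 1 = 15 \ \text{dof},
        \]
        \[
        H_A = \begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix},
        \qquad A \in GL(3),\ \mathbf{t} \in \mathbb{R}^3
        \quad\Rightarrow\quad 9 + 3 = 12 \ \text{dof}.
        \]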

    Comparative Study of Model-Based and Learning-Based Disparity Map Fusion Methods

    Creating an accurate depth map has several valuable applications including augmented/virtual reality, autonomous navigation, indoor/outdoor mapping, object segmentation, and aerial topography. Current hardware solutions for precise 3D scanning are relatively expensive. To combat hardware costs, software alternatives based on stereoscopic images have previously been proposed. However, software solutions are less accurate than hardware solutions, such as laser scanning, and are subject to a variety of irregularities. Notably, disparity maps generated from stereo images typically fall short in cases of occlusion, near object boundaries, and in repetitive-texture or texture-less regions. Several post-processing methods are examined in an effort to combine strong algorithm results and alleviate erroneous disparity regions. These methods include basic statistical combinations, histogram-based voting, edge detection guidance, support vector machines (SVMs), and bagged trees. Individual errors and average errors are compared between the newly introduced fusion methods and the existing disparity algorithms. Several acceptable solutions are identified to bridge the gap between 3D scanning and stereo imaging. It is shown that fusing disparity maps can result in lower error rates than individual algorithms across the dataset while maintaining a high level of robustness.
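
    A minimal sketch of the simplest fusion strategy mentioned above, a pixel-wise statistical combination (here, the median) of several disparity maps, together with a bad-pixel error rate for comparison against ground truth. Function names and the NaN convention for invalid pixels are illustrative assumptions, not taken from the thesis.

        import numpy as np

        def fuse_median(disparity_maps):
            """Pixel-wise median fusion of several (H, W) disparity maps.
            Invalid pixels are assumed to be encoded as NaN and are ignored."""
            stack = np.stack(disparity_maps, axis=0)   # (N, H, W)
            return np.nanmedian(stack, axis=0)         # (H, W)

        def bad_pixel_rate(disparity, ground_truth, threshold=2.0):
            """Fraction of valid pixels whose absolute error exceeds `threshold`."""
            valid = ~np.isnan(disparity) & ~np.isnan(ground_truth)
            errors = np.abs(disparity[valid] - ground_truth[valid])
            return float(np.mean(errors > threshold))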

    A theoretical eye model for uncalibrated real-time eye gaze estimation

    Computer vision systems that monitor human activity can be utilized for many diverse applications. Some general applications stemming from such activity monitoring are surveillance, human-computer interfaces, aids for the handicapped, and virtual reality environments. For most of these applications, a non-intrusive system is desirable, either for reasons of covertness or comfort. Also desirable is generality across users, especially for human-computer interfaces and surveillance. This thesis presents a method of gaze estimation that, without calibration, determines a relatively unconstrained user's overall horizontal eye gaze. Utilizing anthropometric data and physiological models, a simple, yet general eye model is presented. The equations that describe the gaze angle of the eye in this model are presented. The procedure for choosing the proper features for gaze estimation is detailed and the algorithms utilized to find these points are described. Results from manual and automatic feature extraction are presented and analyzed. The error observed from this model is around 3° and the error observed from the implementation is around 6°. This amount of error is comparable to previous eye gaze estimation algorithms and it validates this model. The results presented across a set of subjects display consistency, which demonstrates the generality of this model. A real-time implementation that operates at around 17 frames per second displays the efficiency of the algorithms implemented. While there are many interesting directions for future work, the goals of this thesis were achieved.
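
    A hedged illustration of the kind of geometric model described: a spherical eyeball whose horizontal gaze angle follows from the pupil's offset relative to the projected eyeball centre. The radius value and the mapping below are assumptions chosen for demonstration, not the thesis' actual anthropometric model or equations.

        import math

        # Assumed anthropometric eyeball radius, for illustration only.
        EYE_RADIUS_MM = 12.0

        def horizontal_gaze_deg(pupil_x, eye_center_x, mm_per_pixel):
            """Horizontal gaze angle (degrees) from the pupil's horizontal offset,
            in pixels, relative to the eyeball centre projected into the image."""
            offset_mm = (pupil_x - eye_center_x) * mm_per_pixel
            # Clamp to the valid range of asin to guard against noisy measurements.
            ratio = max(-1.0, min(1.0, offset_mm / EYE_RADIUS_MM))
            return math.degrees(math.asin(ratio))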

    Auto-Calibration and Three-Dimensional Reconstruction for Zooming Cameras

    This dissertation proposes new algorithms to recover the calibration parameters and 3D structure of a scene, using 2D images taken by uncalibrated stationary zooming cameras. This is a common configuration, usually encountered in surveillance camera networks, stereo camera systems, and event monitoring vision systems. This problem is known as camera auto-calibration (also called self-calibration), and the motivation behind this work is to obtain the Euclidean three-dimensional reconstruction and metric measurements of the scene, using only the captured images. Under this configuration, the problem of auto-calibrating zooming cameras differs from the classical auto-calibration problem of a moving camera in two major aspects. First, the camera intrinsic parameters change due to zooming. Second, because the cameras are stationary in our case, using classical motion constraints, such as a pure translation for example, is not possible. In order to simplify the non-linear complexity of this problem, i.e., the auto-calibration of zooming cameras, we have followed a geometric stratification approach. In particular, we have taken advantage of the movement of the camera center that results from the zooming process to locate the plane at infinity and, consequently, to obtain an affine reconstruction. Then, using the assumption that typical cameras have rectangular or square pixels, the calculation of the camera intrinsic parameters becomes possible, leading to the recovery of the Euclidean 3D structure. Being linear, the proposed algorithms were easily extended to the case of an arbitrary number of images and cameras. Furthermore, we have devised a sufficient constraint for detecting parallel scene planes, useful information for solving other computer vision problems.
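
    A compact sketch, in standard notation rather than the dissertation's own derivation, of the stratification described above: zooming translates the optical centre but does not rotate the camera, so the homography induced by the plane at infinity between two zoom settings depends only on the intrinsics, and the square-pixel assumption then yields linear constraints on the image of the absolute conic.

        \[
        H_\infty \simeq K_2\,R\,K_1^{-1} \;\overset{R = I}{=}\; K_2\,K_1^{-1},
        \qquad
        \omega_2 \simeq H_\infty^{-\top}\,\omega_1\,H_\infty^{-1},
        \qquad
        \omega_i = (K_i K_i^\top)^{-1},
        \]

    where zero skew and unit aspect ratio (square pixels) make the constraints on each ω_i linear once H_∞, i.e. the plane at infinity, has been located.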

    Plenoptic Signal Processing for Robust Vision in Field Robotics

    This thesis proposes the use of plenoptic cameras for improving the robustness and simplicity of machine vision in field robotics applications. Dust, rain, fog, snow, murky water and insufficient light can cause even the most sophisticated vision systems to fail. Plenoptic cameras offer an appealing alternative to conventional imagery by gathering significantly more light over a wider depth of field, and capturing a rich 4D light field structure that encodes textural and geometric information. The key contributions of this work lie in exploring the properties of plenoptic signals and developing algorithms for exploiting them. It lays the groundwork for the deployment of plenoptic cameras in field robotics by establishing a decoding, calibration and rectification scheme appropriate to compact, lenslet-based devices. Next, the frequency-domain shape of plenoptic signals is elaborated and exploited by constructing a filter which focuses over a wide depth of field rather than at a single depth. This filter is shown to reject noise, improving contrast in low light and through attenuating media, while mitigating occluders such as snow, rain and underwater particulate matter. Next, a closed-form generalization of optical flow is presented which directly estimates camera motion from first-order derivatives. An elegant adaptation of this "plenoptic flow" to lenslet-based imagery is demonstrated, as well as a simple, additive method for rendering novel views. Finally, the isolation of dynamic elements from a static background is considered, a task complicated by the non-uniform apparent motion caused by a mobile camera. Two elegant closed-form solutions are presented dealing with monocular time-series and light field image pairs. This work emphasizes non-iterative, noise-tolerant, closed-form, linear methods with predictable and constant runtimes, making them suitable for real-time embedded implementation in field robotics applications.
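
    A minimal spatial-domain illustration related to the depth-selective filtering described above: naive shift-and-sum refocusing of a 4D light field onto a single depth plane. The thesis' frequency-domain filter focuses over a volume of depths, which this sketch does not attempt; the array layout and names are assumptions.

        import numpy as np
        from scipy.ndimage import shift as nd_shift

        def shift_and_sum_refocus(light_field, slope):
            """Refocus a 4D light field of shape (U, V, H, W), where (u, v) index
            sub-aperture images, by shifting each view by `slope` pixels per unit
            angular step and averaging."""
            U, V, H, W = light_field.shape
            accumulator = np.zeros((H, W), dtype=np.float64)
            for u in range(U):
                for v in range(V):
                    du = (u - (U - 1) / 2.0) * slope
                    dv = (v - (V - 1) / 2.0) * slope
                    accumulator += nd_shift(light_field[u, v], (du, dv), order=1)
            return accumulator / (U * V)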

    Method for automatic image registration based on distance-dependent planar projective transformations, aimed at images without common features

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Ciencias Físicas, Departamento de Arquitectura de Computadores y Automática, defended on 18-12-2015. Multisensory data fusion oriented to image-based applications improves the accuracy, quality and availability of the data, and consequently the performance of robotic systems, by combining the information about a scene acquired from multiple and different sources into a unified representation of the 3D world scene, which is more enlightening and enriching for the subsequent image processing, improving either the reliability, by using the redundant information, or the capability, by taking advantage of complementary information. Image registration is one of the most relevant steps in image fusion techniques. This procedure aims at the geometric alignment of two or more images. Normally, this process relies on feature-matching techniques, which is a drawback when combining sensors that are not able to deliver common features; for instance, in the combination of ToF and RGB cameras, robust feature matching is not reliable. Typically, the fusion of these two sensors has been addressed by computing the cameras' calibration parameters for the coordinate transformation between them. As a result, a low-resolution colour depth map is obtained. To improve the resolution of these maps and reduce the loss of colour information, extrapolation techniques are adopted. A crucial issue for computing high-quality, accurate dense maps is the presence of noise in the depth measurements from the ToF camera, which is normally reduced by means of sensor calibration and filtering techniques. However, the filtering methods implemented for data extrapolation and denoising usually over-smooth the data, consequently reducing the accuracy of the registration procedure...
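
    A minimal sketch of the calibration-based registration step referred to above: back-projecting ToF depth pixels to 3D and reprojecting them into the RGB camera using known intrinsics and extrinsics, which produces the low-resolution colour depth map mentioned. Variable names and conventions (z-depth in metres, ToF-to-RGB transform) are illustrative assumptions.

        import numpy as np

        def project_tof_to_rgb(depth, K_tof, K_rgb, R, t):
            """Project every ToF depth pixel into the RGB image.

            depth : (H, W) z-depth map in metres from the ToF camera.
            K_tof, K_rgb : 3x3 intrinsic matrices.
            R, t : rotation (3, 3) and translation (3,) from the ToF to the RGB frame.
            Returns (u, v) coordinates in the RGB image and the 3D points in the RGB frame.
            """
            H, W = depth.shape
            u, v = np.meshgrid(np.arange(W), np.arange(H))
            pixels = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])   # (3, N) homogeneous
            rays = np.linalg.inv(K_tof) @ pixels                        # back-projected rays, z = 1
            points_tof = rays * depth.ravel()                           # metric 3D points
            points_rgb = R @ points_tof + t.reshape(3, 1)               # change of frame
            proj = K_rgb @ points_rgb
            uv_rgb = proj[:2] / proj[2]                                 # perspective divide
            return uv_rgb, points_rgb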