
    Introduction to the Issue on Emerging Techniques in 3-D

    The fifteen papers in this special section focus on three-dimensional (3-D) content, with particular emphasis on the fusion of conventional camera outputs with those captured by other modalities, such as active sensors, multi-spectral data, or dynamic-range images, as well as on applications that support the measurement and improvement of 3-D content.

    Method for automatic image registration based on distance-dependent planar projective transformations, oriented to images without common features

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Ciencias Físicas, Departamento de Arquitectura de Computadores y Automática, defended on 18-12-2015.
    Multisensory data fusion oriented to image-based applications improves the accuracy, quality, and availability of data, and consequently the performance of robotic systems. It combines the information of a scene acquired from multiple, different sources into a unified representation of the 3D world scene, which is more enlightening and enriching for subsequent image processing, improving either reliability, by using the redundant information, or capability, by taking advantage of complementary information. Image registration is one of the most relevant steps in image fusion. This procedure aims at the geometric alignment of two or more images. Normally, it relies on feature-matching techniques, which is a drawback when combining sensors that cannot deliver common features. For instance, in the combination of ToF and RGB cameras, robust feature matching is not reliable. Typically, the fusion of these two sensors has been addressed by computing the cameras' calibration parameters for coordinate transformation between them. As a result, a low-resolution colour depth map is obtained. To improve the resolution of these maps and reduce the loss of colour information, extrapolation techniques are adopted. A crucial issue for computing high-quality, accurate dense maps is the presence of noise in the depth measurements from the ToF camera, which is normally reduced by means of sensor calibration and filtering techniques.
However, the filtering methods implemented for data extrapolation and denoising usually over-smooth the data, consequently reducing the accuracy of the registration procedure.
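The calibration-based coordinate transformation this abstract describes, where ToF depth pixels are mapped into the RGB image plane, can be sketched minimally as follows. This is an illustrative pinhole-camera sketch, not the thesis's actual method; all function and parameter names are assumptions.

```python
import numpy as np

def register_depth_to_rgb(depth, K_tof, K_rgb, R, t, rgb_shape):
    """Project each ToF depth pixel into the RGB image plane.

    depth     : (H, W) depth map in metres from the ToF camera
    K_tof     : (3, 3) ToF camera intrinsics
    K_rgb     : (3, 3) RGB camera intrinsics
    R, t      : rotation (3, 3) and translation (3,) from ToF to RGB frame
    rgb_shape : (H_rgb, W_rgb) of the colour image
    Returns a sparse depth map in the RGB frame (zeros where no sample lands).
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0
    # Back-project valid ToF pixels to 3D points in the ToF camera frame.
    pix = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])[:, valid]
    pts_tof = np.linalg.inv(K_tof) @ pix * z[valid]
    # Rigid transform into the RGB camera frame, then project.
    pts_rgb = R @ pts_tof + t[:, None]
    proj = K_rgb @ pts_rgb
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    out = np.zeros(rgb_shape)
    inside = (u >= 0) & (u < rgb_shape[1]) & (v >= 0) & (v < rgb_shape[0]) \
             & (pts_rgb[2] > 0)
    out[v[inside], u[inside]] = pts_rgb[2, inside]
    return out
```

Because the ToF sensor has far fewer pixels than the RGB camera, the resulting map is sparse, which is exactly why the extrapolation and denoising steps discussed above become necessary.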

    Full-parallax 3D display from stereo-hybrid 3D camera system

    In this paper, we propose an innovative approach for producing the microimages ready to display on an integral-imaging monitor. Our main contribution is the use of a stereo-hybrid 3D camera system for picking up a 3D data pair and composing a denser point cloud. An intrinsic difficulty is that hybrid sensors have dissimilarities and therefore must be equalized. The processed data then allow an integral image to be generated by computationally projecting the information through a virtual pinhole array. We illustrate this procedure with imaging experiments that provide microimages of enhanced quality. After projection of these microimages onto the integral-imaging monitor, 3D images are produced with large parallax and viewing angle.
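The computational projection through a virtual pinhole array can be sketched as below: each virtual pinhole generates one microimage, and a z-buffer keeps the nearest point per pixel. This is a simplified illustration under assumed geometry (lens plane at z=0, sensor at z=-gap); the names and parameters are not from the paper.

```python
import numpy as np

def render_microimages(points, colors, n_lens=4, mi_px=8, pitch=0.5, gap=1.0):
    """Render an integral image: one microimage per virtual pinhole.
    points: (N, 3) cloud in front of the lens plane (z > 0); colors: (N, 3).
    The nearest point wins per microimage pixel (z-buffer)."""
    H = n_lens * mi_px
    img = np.zeros((H, H, 3))
    zbuf = np.full((H, H), np.inf)
    for iy in range(n_lens):
        for ix in range(n_lens):
            # Pinhole centre on the virtual lens plane (z = 0).
            cx = (ix - (n_lens - 1) / 2) * pitch
            cy = (iy - (n_lens - 1) / 2) * pitch
            # Project every point through this pinhole onto the sensor at z=-gap.
            u = (points[:, 0] - cx) * gap / points[:, 2]
            v = (points[:, 1] - cy) * gap / points[:, 2]
            px = np.round(u / pitch * mi_px).astype(int) + mi_px // 2
            py = np.round(v / pitch * mi_px).astype(int) + mi_px // 2
            ok = (px >= 0) & (px < mi_px) & (py >= 0) & (py < mi_px)
            for k in np.flatnonzero(ok):
                gy, gx = iy * mi_px + py[k], ix * mi_px + px[k]
                if points[k, 2] < zbuf[gy, gx]:   # keep nearest point only
                    zbuf[gy, gx] = points[k, 2]
                    img[gy, gx] = colors[k]
    return img
```

A single 3D point lands at a slightly different position in each microimage; that per-lens shift is precisely what encodes the parallax reproduced by the integral-imaging monitor.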

    On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

    Learning-based methods for dense 3D vision problems typically train on 3D sensor data. Each measurement principle has its own advantages and drawbacks, which are rarely compared or discussed in the literature owing to a lack of multi-modal datasets. Texture-less regions are problematic for structure from motion and stereo, reflective material poses issues for active sensing, and distances to translucent objects are intricate to measure with existing hardware. Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities. These effects remain unnoticed if the sensor measurement is considered ground truth during evaluation. This paper investigates the effect of sensor errors on the dense 3D vision tasks of depth estimation and reconstruction. We rigorously show the significant impact of sensor characteristics on the learned predictions and identify generalisation issues arising from various technologies in everyday household environments. For evaluation, we introduce a carefully designed dataset (available at https://github.com/Junggy/HAMMER-dataset) comprising measurements from commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular RGB+P. Our study quantifies the considerable sensor-noise impact and paves the way to improved dense vision estimates and targeted data fusion. Accepted at CVPR 2023.

    On Creating Reference Data for Performance Analysis in Image Processing

    This thesis investigates methods for creating reference datasets for image processing, especially for the dense correspondence problem. Three types of reference data can be identified: real datasets with dense ground truth, real datasets with sparse or missing ground truth, and synthetic datasets. For the creation of real datasets with ground truth, an existing method based on depth-map fusion was evaluated. The described method is especially suited for creating large amounts of reference data with known accuracy. The creation of reference datasets with missing ground truth was examined on the example of multiple datasets for the automotive industry. The data was used successfully for verification and evaluation by multiple image processing projects. Finally, it was investigated how methods from computer graphics can be used for creating synthetic reference datasets. In particular, the creation of photorealistic image sequences using global illumination was examined for the task of evaluating algorithms. The results show that while such sequences can be used for evaluation, their creation is hindered by practicality problems. As an application example, a new simulation method for Time-of-Flight depth cameras was developed that can simulate all relevant error sources of these systems.

    Fusing spatial and temporal components for real-time depth data enhancement of dynamic scenes

    The depth images from consumer depth cameras (e.g., structured-light/ToF devices) exhibit a substantial amount of artifacts (e.g., holes, flickering, ghosting) that need to be removed for real-world applications. Existing methods cannot entirely remove them and are slow. This thesis proposes a new real-time spatio-temporal depth-image enhancement filter that completely removes flickering and ghosting and significantly reduces holes. This thesis also presents a novel depth-data capture setup and two data reduction methods to optimize the performance of the proposed enhancement method.
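The kind of spatio-temporal enhancement described above can be illustrated with a minimal per-frame filter: holes (zero depth) are filled from the previous filtered frame, and valid measurements are blended temporally to suppress flicker. This is a hedged sketch of the general idea, not the thesis's actual filter; names and the blending weight are assumptions.

```python
import numpy as np

def enhance_depth(frame, prev, alpha=0.6):
    """One step of a minimal spatio-temporal depth filter.

    frame : (H, W) raw depth, with 0 marking holes (invalid pixels)
    prev  : (H, W) previously filtered depth
    alpha : weight of the current frame in the temporal blend
    """
    # Hole filling: where the current frame is invalid, reuse the past value.
    filled = np.where(frame > 0, frame, prev)
    # Temporal smoothing: blend only where both frames carry valid depth.
    out = np.where((frame > 0) & (prev > 0),
                   alpha * frame + (1 - alpha) * prev,
                   filled)
    return out
```

Running this recursively over a stream (feeding each output back as `prev`) stabilises flickering pixels while keeping the per-frame cost to a few elementwise operations, which is what makes real-time operation plausible.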

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing a visual representation of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or the use of specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the same accessibility as 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render, and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can convey spatial and temporal information of a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type affects reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first, telecommunication, experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object-placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism, and remote assistance.

    Mobile robot map building from time-of-flight camera

    A map-building algorithm for mobile robots is introduced in this paper. The perceived environment is represented in a map whose cells each contain the probability that an object, or part of an object, is present. The environment is represented as a collection of modular occupancy grids, which are added to the map as the mobile robot finds objects outside the existing grids. In this approach, a time-of-flight (ToF) camera is exploited as a range sensor for mapping. Indeed, one area where ToF sensors are well suited is obstacle avoidance, because the detection region is not only horizontal but also vertical, allowing obstacles with complex shapes to be detected. The main steps of the map-building algorithm are described extensively in the paper, and the algorithm is tested in two different indoor environments.
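The per-cell probability representation described above is commonly implemented as a log-odds occupancy grid, updated from sensor hits and misses. The sketch below shows that standard formulation (the class and the particular log-odds increments are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

class OccupancyGrid:
    """Minimal occupancy grid: each cell stores the log-odds of being
    occupied, updated by an inverse sensor model from hits and misses."""

    def __init__(self, shape, l_occ=0.85, l_free=-0.4):
        self.logodds = np.zeros(shape)   # log-odds 0 == probability 0.5
        self.l_occ, self.l_free = l_occ, l_free

    def update(self, cell, hit):
        # Accumulate evidence: a range-sensor hit raises the log-odds,
        # a ray passing through the cell (miss) lowers it.
        self.logodds[cell] += self.l_occ if hit else self.l_free

    def probability(self):
        # Convert log-odds back to occupancy probabilities.
        return 1.0 / (1.0 + np.exp(-self.logodds))
```

Working in log-odds makes each update a single addition per cell and keeps unvisited cells at exactly 0.5, i.e. "unknown"; the paper's modular grids would correspond to allocating new `OccupancyGrid` tiles as the robot observes objects outside the existing ones.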

    Mitigating non-Lambertian surfaces issues in Stereo Matching with Neural Radiance Fields

    Depth estimation from images has long been regarded as a preferable alternative to expensive and intrusive active sensors, such as LiDAR and ToF. The topic has attracted the attention of an increasingly wide audience thanks to its many application domains, such as autonomous driving, robotic navigation, and 3D reconstruction. Among the various techniques employed for depth estimation, stereo matching is one of the most widespread, owing to its robustness, speed, and simplicity of setup. Recent developments have been aided by the abundance of annotated stereo images, which granted deep learning the opportunity to thrive in a research area where deep networks can reach state-of-the-art sub-pixel precision in most cases. Despite these findings, stereo matching still presents many open challenges, two of them being finding pixel correspondences in the presence of objects that exhibit non-Lambertian behaviour, and processing high-resolution images. Recently, a novel dataset named Booster was released, containing high-resolution stereo pairs featuring a large collection of labeled non-Lambertian objects. That work showed that training state-of-the-art deep neural networks on such data improves their generalization capabilities in the presence of non-Lambertian surfaces. Despite being a further step towards tackling the aforementioned challenge, Booster includes a rather small number of annotated images and thus cannot satisfy the intensive training requirements of deep learning. This thesis investigates novel view synthesis techniques to augment the Booster dataset, with the ultimate goal of improving stereo-matching reliability on high-resolution images that display non-Lambertian surfaces.

    Guided direct time-of-flight Lidar for self-driving vehicles

    Self-driving vehicles demand efficient and reliable depth-sensing technologies. Lidar, with its capacity for long-distance, high-precision measurement, is a crucial component in this pursuit. However, conventional mechanical scanning implementations suffer from reliability, cost, and frame-rate limitations. Solid-state lidar solutions have emerged as a promising alternative, but the vast amount of photon data processed and stored using conventional direct time-of-flight (dToF) prevents long-distance sensing unless power-intensive partial-histogram approaches are used. This research introduces a pioneering 'guided' dToF approach, harnessing external guidance from other onboard sensors to narrow down the depth search space for a power- and data-efficient solution. The approach centres on a dToF sensor in which the exposure time window of individual pixels can be dynamically adjusted. A pair of vision cameras is used in this demonstrator to provide the guiding depth estimates. The implemented guided-dToF demonstrator successfully captures a dynamic outdoor scene at 3 fps with distances up to 75 m. Compared to a conventional full-histogram approach, on-chip data is reduced by over 25 times, while the total laser cycles per frame are reduced by at least 6 times compared to any partial-histogram approach. The capability of guided dToF to mitigate multipath reflections is also demonstrated. For self-driving vehicles, where a wealth of sensor data is already available, guided dToF opens new possibilities for efficient solid-state lidar.
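The core of the guidance step, converting a coarse depth estimate from another sensor into a narrow per-pixel exposure window, can be sketched from the round-trip-time relation t = 2d/c. This is an illustrative sketch only; the margin, bin width, and function names are assumptions, not the sensor's actual parameters.

```python
C = 299_792_458.0  # speed of light, m/s

def guided_window(guide_depth_m, margin_m=2.5, bin_s=1e-9):
    """Map a guide depth estimate (e.g. from a stereo camera pair) to a
    narrow dToF exposure window, returned as (start_bin, end_bin) indices
    into the timing histogram. Round-trip time for range d is 2*d/c."""
    near = max(guide_depth_m - margin_m, 0.0)
    far = guide_depth_m + margin_m
    start = int((2 * near / C) // bin_s)          # floor to bin index
    end = int(-(-(2 * far / C) // bin_s))         # ceil to bin index
    return start, end
```

Only the histogram bins inside this window need to be exposed and stored, so instead of the hundreds of bins required to cover the full 75 m range, each pixel records a few dozen around its guide estimate; this is the source of the on-chip data reduction the abstract reports.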