3 research outputs found

    A Few Photons Among Many: Unmixing Signal and Noise for Photon-Efficient Active Imaging

    Conventional LIDAR systems require hundreds or thousands of photon detections to form accurate depth and reflectivity images. Recent photon-efficient computational imaging methods are remarkably effective with only 1.0 to 3.0 detected photons per pixel, but they have not been demonstrated at signal-to-background ratios (SBR) below 1.0 because their imaging accuracy degrades significantly in the presence of high background noise. We introduce a new approach to depth and reflectivity estimation that focuses on unmixing contributions from signal and noise sources. At each pixel in an image, short-duration range gates are adaptively determined and applied to remove detections likely to be due to noise. For pixels with too few detections to perform this censoring accurately, we borrow data from neighboring pixels to improve depth estimates, where the neighborhood formation is also adaptive to scene content. Algorithm performance is demonstrated on experimental data at varying levels of noise. Results show improved performance of both reflectivity and depth estimates over state-of-the-art methods, especially at low signal-to-background ratios. In particular, accurate imaging is demonstrated with SBR as low as 0.04. This validation of a photon-efficient, noise-tolerant method demonstrates the viability of rapid, long-range, and low-power LIDAR imaging.
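The range-gating idea in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single fixed gate width (the abstract describes adaptively determined gates), and the function names `censor_detections` and `estimate_depth` are hypothetical. The sketch keeps only the detections that fall in the densest window of the given width, treats everything else as background, and converts the mean time of flight of the kept detections to depth.

```python
from bisect import bisect_right

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def censor_detections(times, gate_width):
    """Keep the detections inside the densest window of length gate_width.

    Assumption for illustration: signal detections cluster around the true
    time of flight, while background detections are spread over the whole
    acquisition period, so the densest window is most likely signal.
    """
    ts = sorted(times)
    best_start, best_count = 0, 0
    for i, t in enumerate(ts):
        # Number of detections in the closed window [t, t + gate_width].
        j = bisect_right(ts, t + gate_width)
        if j - i > best_count:
            best_start, best_count = i, j - i
    return ts[best_start:best_start + best_count]


def estimate_depth(times, gate_width):
    """Depth in metres from the mean round-trip time of the kept detections."""
    kept = censor_detections(times, gate_width)
    if not kept:
        return None  # no detections at this pixel
    mean_tof = sum(kept) / len(kept)
    return SPEED_OF_LIGHT * mean_tof / 2.0


# Hypothetical pixel: three signal detections near 40 ns plus three
# background detections scattered over the acquisition window.
detections = [12e-9, 40.1e-9, 40.3e-9, 77e-9, 39.9e-9, 95e-9]
kept = censor_detections(detections, gate_width=1e-9)
depth = estimate_depth(detections, gate_width=1e-9)
```

A pixel whose densest window holds only one or two detections gives an unreliable gate, which is the case the abstract handles by borrowing data from neighboring pixels; that step is not sketched here.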

    Scene understanding for interactive applications

    To interact with the environment, it is necessary to understand what is happening in the scene where the action takes place. Decades of research in computer vision have contributed toward achieving this scene understanding automatically from visual information. Scene understanding is a very broad area of research within computer vision. We could say that it tries to replicate the human capability of extracting plenty of information from visual data alone.
For example, we would like to understand how people perceive the world in three dimensions or how they can quickly recognize places and objects despite substantial variation in appearance. One of the basic tasks in scene understanding from visual data is to assign a semantic meaning to every element of the image, i.e., to assign a concept or object label to every pixel. This problem can be formulated as a dense image labeling problem, which assigns specific values (labels) to each pixel or region in the image. Depending on the application, the labels can represent very different concepts, from a physical magnitude, such as depth, to high-level semantic information, such as an object category. The general goal of this thesis is to investigate and develop new ways to automatically incorporate human feedback or prior knowledge into intelligent systems that require scene understanding capabilities. In particular, this thesis explores two common sources of prior information from users: human interaction and human labeling of sample data. The first part of this thesis focuses on learning complex scene information from knowledge that a user provides interactively. Solutions that involve a user impose performance constraints, because feedback to the user must be delivered at interactive rates. This thesis presents an efficient interaction paradigm that approximates any per-pixel magnitude from a few user strokes by propagating the sparse user input to every pixel of the image. We demonstrate the suitability of the proposed paradigm through three interactive image editing applications that require per-pixel knowledge of a certain magnitude: simulated depth of field, dehazing, and HDR tone mapping. Another common strategy for learning from users' prior knowledge is to design supervised machine-learning approaches. In recent years, Convolutional Neural Networks (CNNs) have advanced the state of the art on a broad variety of visual recognition problems.
However, for new tasks, enough training data is not always available, and therefore training from scratch is not always feasible. The second part of this thesis investigates how to improve systems that learn dense semantic labeling of images from user-labeled examples. In particular, we present and validate strategies for semantic segmentation based on common transfer learning approaches. The goal of these strategies is to learn new, specific classes when there is not enough labeled data to train from scratch. We evaluate these strategies across different environments, including autonomous driving scenes, aerial images, and underwater images.

    Probabilistic modeling for single-photon lidar

    Lidar is an increasingly prevalent technology for depth sensing, with applications including scientific measurement and autonomous navigation systems. While conventional systems require hundreds or thousands of photon detections per pixel to form accurate depth and reflectivity images, recent results for single-photon lidar (SPL) systems using single-photon avalanche diode (SPAD) detectors have shown accurate images formed from as few as one photon detection per pixel, even when half of those detections are due to uninformative ambient light. The keys to such photon-efficient image formation are twofold: (i) a precise model of the probability distribution of photon detection times, and (ii) prior beliefs about the structure of natural scenes. Reducing the number of photons needed for accurate image formation enables faster, farther, and safer acquisition. Still, such photon-efficient systems are often limited to laboratory conditions more favorable than the real-world settings in which they would be deployed. This thesis focuses on expanding the photon detection time models to address challenging imaging scenarios and the effects of non-ideal acquisition equipment. The processing derived from these enhanced models, sometimes modified jointly with the acquisition hardware, surpasses the performance of state-of-the-art photon counting systems. We first address the problem of high levels of ambient light, which causes traditional depth and reflectivity estimators to fail. We achieve robustness to strong ambient light through a rigorously derived window-based censoring method that separates signal and background light detections. Spatial correlations both within and between depth and reflectivity images are encoded in superpixel constructions, which fill in holes caused by the censoring.
Accurate depth and reflectivity images can then be formed with an average of 2 signal photons and 50 background photons per pixel, outperforming methods previously demonstrated at a signal-to-background ratio of 1. We next approach the problem of coarse temporal resolution for photon detection time measurements, which limits the precision of depth estimates. To achieve sub-bin depth precision, we propose a subtractively dithered lidar implementation, which uses changing synchronization delays to shift the time-quantization bin edges. We examine the generic noise model resulting from dithering Gaussian-distributed signals and introduce a generalized Gaussian approximation to the noise distribution, along with simple order-statistics-based depth estimators that take advantage of this model. Additional analysis of the generalized Gaussian approximation yields rules of thumb for determining when and how to apply dither to quantized measurements. We implement a dithered SPL system and propose a modification for non-Gaussian pulse shapes that outperforms the Gaussian assumption in practical experiments. The resulting dithered-lidar architecture could be used to design SPAD array detectors that form precise depth estimates despite relaxed temporal quantization constraints. Finally, SPAD dead time effects have been considered a major limitation for fast data acquisition in SPL, since a commonly adopted approach to dead time mitigation is to operate in the low-flux regime, where dead time effects can be ignored. We show that the empirical distribution of detection times converges to the stationary distribution of a Markov chain and demonstrate improvements in depth estimation and histogram correction using our Markov chain model. An example simulation shows that correctly compensating for dead times in a high-flux measurement can yield a 20-fold speedup of data acquisition.
The resulting accuracy at high photon flux could enable real-time applications such as autonomous navigation.
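The sub-bin precision claim for subtractive dither can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the thesis: the measurement is noise-free, the dither is uniform over exactly one bin width, and the estimator is a plain sample mean rather than the order-statistics estimators the abstract mentions. The names `quantize` and `mean_estimate` are hypothetical.

```python
import random


def quantize(x, delta):
    """Uniform quantizer that rounds x to the nearest multiple of delta."""
    return delta * round(x / delta)


def mean_estimate(true_value, delta, n, dithered, rng):
    """Average n quantized measurements, with or without subtractive dither.

    With dither, a known random offset d is added before quantization
    (in a lidar, by shifting the synchronization delay) and subtracted
    again afterwards, so the bin edges move relative to the signal.
    """
    total = 0.0
    for _ in range(n):
        d = rng.uniform(-delta / 2, delta / 2) if dithered else 0.0
        total += quantize(true_value + d, delta) - d
    return total / n


rng = random.Random(0)   # fixed seed so the run is repeatable
true_tof = 40.3          # hypothetical time of flight, in nanoseconds
bin_width = 1.0          # hypothetical time-bin width, in nanoseconds

plain = mean_estimate(true_tof, bin_width, 20_000, dithered=False, rng=rng)
dith = mean_estimate(true_tof, bin_width, 20_000, dithered=True, rng=rng)
```

Without dither, every measurement lands in the same bin, so averaging cannot recover the 0.3 ns offset from the bin centre. With subtractive dither, the quantization error becomes uniform and independent of the signal, so the sample mean approaches the true value as the number of measurements grows.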