88 research outputs found

    Deep Depth From Focus

    Full text link
    Depth from focus (DFF) is one of the classical ill-posed inverse problems in computer vision. Most approaches recover the depth at each pixel based on the focal setting which exhibits maximal sharpness. Yet, it is not obvious how to reliably estimate the sharpness level, particularly in low-textured areas. In this paper, we propose `Deep Depth From Focus (DDFF)' as the first end-to-end learning approach to this problem. One of the main challenges we face is the hunger for data of deep neural networks. In order to obtain a significant amount of focal stacks with corresponding groundtruth depth, we propose to leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us to digitally create focal stacks of varying sizes. Compared to existing benchmarks our dataset is 25 times larger, enabling the use of machine learning for this inverse problem. We compare our results with state-of-the-art DFF methods and we also analyze the effect of several key deep architectural components. These experiments show that our proposed method `DDFFNet' achieves state-of-the-art performance in all scenes, reducing depth error by more than 75% compared to the classical DFF methods.Comment: accepted to Asian Conference on Computer Vision (ACCV) 201

    Design of Belief Propagation Based on FPGA for the Multistereo CAFADIS Camera

    Get PDF
    In this paper we describe a fast, specialized hardware implementation of the belief propagation algorithm for the CAFADIS camera, a new plenoptic sensor patented by the University of La Laguna. This camera captures the lightfield of the scene and can be used to find out at which depth each pixel is in focus. The algorithm has been designed for FPGA devices using VHDL. We propose a parallel and pipeline architecture to implement the algorithm without external memory. Although the BRAM resources of the device increase considerably, we can maintain real-time restrictions by using extremely high-performance signal processing capability through parallelism and by accessing several memories simultaneously. The quantifying results with 16 bit precision have shown that performances are really close to the original Matlab programmed algorithm

    Consideraciones acerca de la viabilidad de un sensor plenóptico en dispositivos de consumo

    Get PDF
    Doctorado en Ingeniería IndustrialPassive distance measurement of the objects in an image gives place to interesting applications that have the potential to revolutionize the field of photography. In this thesis a prototype of plenoptic camera for mobile devices was created and studied. This technique has two main disadvantages: the need for modifying the camera module and the loss of resolution. Because of this, the prototype was discarded in order to utilize another technique: depth from focus. In this technique the capture method consists in taking several images while varying the focus distance. The set of images is called focal-stack. Different focus operators are studied, which give a measure of defocus per pixel and plane of the focal-stack. The curvelet based focus operator is chosen as the most adequate. It is computationally more intensive than other operators but it is capable of decomposing natural images using few coefficients. In order to make viable its usage in mobile devices a new curvelet transform based on the discrete Radon transform is built. The discrete Radon transform has logarithmic complexity, does not use the Fourier transform and uses only integer sums. Lastly, different versions of the Radon transform are analyzed with the goal of achieving an even faster transform. These transforms are implemented to be executed on mobile devices. Additionally, an application of the Radon transform is presented. It consists in the detection of bar-codes that have any orientation in an image.La medida pasiva de distancia a los objetos en una imagen da lugar a interesantes aplicaciones con capacidad para revolucionar la fotografía. En esta tesis se creó y estudió un prototipo de cámara plenóptica para dispositivos móviles. Esta técnica presenta dos inconvenientes: la necesidad de modificar el módulo de cámara y la pérdida de resolución. Por ello, el prototipo fue descartado para utilizar otra técnica: la profundidad a partir del desenfoque. En esta técnica el método de captura consiste en tomar varias imagenes variando la distancia de enfoque. El conjunto de imágenes se denomina focal-stack. Se estudian distintos operadores de desenfoque, que dan una medida de desenfoque por pixel y por plano del focal-stack. Siendo elegido como óptimo el operador de desenfoque curvelet, que es computacionalmente más intensivo que otros operadores pero es capaz de descomponer imagenes naturales utilizando muy pocos coeficientes. Para hacer posible su uso en dispositivos móviles se construye una nueva transformada curvelet basada en la transformada discreta de Radon. La transformada discreta de Radon tiene complejidad linearítmica, no utiliza la transformada de Fourier y usa sólo sumas de enteros. Por último, se analizan distintas versiones de la transformada de Radon con el objetivo de conseguir una transformada aún más rápida y se implementan para ser ejecutadas en dispositivos móviles. Además se presenta una aplicación de la transformada de Radon consistente en la detección de códigos de barras con cualquier orientación en una imagen

    Application for light field inpainting

    Get PDF
    Light Field (LF) imaging is a multimedia technology that can provide more immersive experience when visualizing a multimedia content with higher levels of realism compared to conventional imaging technologies. This technology is mainly promising for Virtual Reality (VR) since it displays real-world scenes in a way that users can experience the captured scenes in every position and every angle, due to its 4-dimensional LF representation. For these reasons, LF is a fast-growing technology, with so many topics to explore, being the LF inpainting the one that was explored in this dissertation. Image inpainting is an editing technique that allows synthesizing alternative content to fill in holes in an image. It is commonly used to fill missing parts in a scene and restore damaged images such that the modifications are correct and visually realistic. Applying traditional 2D inpainting techniques straightforwardly to LFs is very unlikely to result in a consistent inpainting in its all 4 dimensions. Usually, to inpaint a 4D LF content, 2D inpainting algorithms are used to inpaint a particular point of view and then 4D inpainting propagation algorithms propagate the inpainted result for the whole 4D LF data. Based on this idea of 4D inpainting propagation, some 4D LF inpainting techniques have been recently proposed in the literature. Therefore, this dissertation proposes to design and implement an LF inpainting application that can be used by the public that desire to work in this field and/or explore and edit LFs.Campos de luz é uma tecnologia multimédia que fornece uma experiência mais imersiva ao visualizar conteúdo multimédia com níveis mais altos de realismo, comparando a tecnologias convencionais de imagem. Esta tecnologia é promissora, principalmente para Realidade Virtual, pois exibe cenas capturadas do mundo real de forma que utilizadores as possam experimentar em todas as posições e ângulos, devido à sua representação em 4 dimensões. Por isso, esta é tecnologia em rápido crescimento, com tantos tópicos para explorar, sendo o inpainting o explorado nesta dissertação. Inpainting de imagens é uma técnica de edição, permitindo sintetizar conteúdo alternativo para preencher lacunas numa imagem. Comumente usado para preencher partes que faltam numa cena e restaurar imagens danificadas, de forma que as modificações sejam corretas e visualmente realistas. É muito improvável que aplicar técnicas tradicionais de inpainting 2D diretamente a campos de luz resulte num inpainting consistente em todas as suas 4 dimensões. Normalmente, para fazer inpainting num conteúdo 4D de campos de luz, os algoritmos de inpainting 2D são usados para fazer inpainting de um ponto de vista específico e, seguidamente, os algoritmos de propagação de inpainting 4D propagam o resultado do inpainting para todos os dados do campo de luz 4D. Com base nessa ideia de propagação de inpainting 4D, algumas técnicas foram recentemente propostas na literatura. Assim, esta dissertação propõe-se a conceber e implementar uma aplicação de inpainting de campos de luz que possa ser utilizada pelo público que pretenda trabalhar nesta área e/ou explorar e editar campos de luz

    Computational methods for 3D imaging of neural activity in light-field microscopy

    Get PDF
    Light Field Microscopy (LFM) is a 3D imaging technique that captures spatial and angular information from light in a single snapshot. LFM is an appealing technique for applications in biological imaging due to its relatively simple implementation and fast 3D imaging speed. For instance, LFM can help to understand how neurons process information, as shown for functional neuronal calcium imaging. However, traditional volume reconstruction approaches for LFM suffer from low lateral resolution, high computational cost, and reconstruction artifacts near the native object plane. Therefore, in this thesis, we propose computational methods to improve the reconstruction performance of 3D imaging for LFM with applications to imaging neural activity. First, we study the image formation process and propose methods for discretization and simplification of the LF system. Typical approaches for discretization are performed by computing the discrete impulse response at different input locations defined by a sampling grid. Unlike conventional methods, we propose an approach that uses shift-invariant subspaces to generalize the discretization framework used in LFM. Our approach allows the selection of diverse sampling kernels and sampling intervals. Furthermore, the typical discretization method is a particular case of our formulation. Moreover, we propose a description of the system based on filter banks that fit the physics of the system. The periodic-shift invariant property per depth guarantees that the system can be accurately described by using filter banks. This description leads to a novel method to reduce the computational time using singular value decomposition (SVD). Our simplification method capitalizes on the inherent low-rank behaviour of the system. Furthermore, we propose rearranging our filter-bank model into a linear convolution neural network (CNN) that allows more convenient implementation using existing deep-learning software. Then, we study the problem of 3D reconstruction from single light-field images. We propose the shift-invariant-subspace assumption as a prior for volume reconstruction under ideal conditions. We experimentally show that artifact-free reconstruction (aliasing-free) is achievable under these settings. Furthermore, the tools developed to study the forward model are exploited to design a reconstruction algorithm based on ADMM that allows artifact-free 3D reconstruction for real data. Contrary to traditional approaches, our method includes additional priors for reconstruction without dramatically increasing the computational complexity. We extensively evaluate our approach on synthetic and real data and show that our approach performs better than conventional model-based strategies in computational time, image quality, and artifact reduction. Finally, we exploit deep-learning techniques for reconstruction. Specifically, we propose to use two-photon imaging to enhance the performance of LFM when imaging neurons in brain tissues. The architecture of our network is derived from a sparsity-based algorithm for reconstruction named Iterative Shrinkage and Thresholding Algorithm (ISTA). Furthermore, we propose a semi-supervised training based on Generative Adversarial Neural Networks (GANs) that exploits the knowledge of the forward model to achieve remarkable reconstruction quality. We propose efficient architectures to compute the forward model using linear CNNs. This description allows fast computation of the forward model and complements our reconstruction approach. Our method is tested under adverse conditions: lack of training data, background noise, and non-transparent samples. We experimentally show that our method performs better than model-based reconstruction strategies and typical neural networks for imaging neuronal activity in mammalian brain tissue. Our approach enjoys both the robustness of the model-based methods and the reconstruction speed of deep learning.Open Acces

    The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack

    Get PDF
    Post capture refocusing effect in smartphone cameras is achievable by using focal stacks. However, the accuracy of this effect is totally dependent on the combination of the depth layers in the stack. The accuracy of the extended depth of field effect in this application can be improved significantly by computing an accurate depth map which has been an open issue for decades. To tackle this issue, in this paper, a framework is proposed based on Preconditioned Alternating Direction Method of Multipliers (PADMM) for depth from the focal stack and synthetic defocus application. In addition to its ability to provide high structural accuracy and occlusion handling, the optimization function of the proposed method can, in fact, converge faster and better than state of the art methods. The evaluation has been done on 21 sets of focal stacks and the optimization function has been compared against 5 other methods. Preliminary results indicate that the proposed method has a better performance in terms of structural accuracy and optimization in comparison to the current state of the art methods.Comment: 15 pages, 8 figure

    Reconstruction from Spatio-Spectrally Coded Multispectral Light Fields

    Get PDF
    In this work, spatio-spectrally coded multispectral light fields, as taken by a light field camera with a spectrally coded microlens array, are investigated. For the reconstruction of the coded light fields, two methods, one based on the principles of compressed sensing and one deep learning approach, are developed. Using novel synthetic as well as a real-world datasets, the proposed reconstruction approaches are evaluated in detail

    Reconstruction from Spatio-Spectrally Coded Multispectral Light Fields

    Get PDF
    In dieser Arbeit werden spektral kodierte multispektrale Lichtfelder untersucht, wie sie von einer Lichtfeldkamera mit einem spektral kodierten Mikrolinsenarray aufgenommen werden. Für die Rekonstruktion der kodierten Lichtfelder werden zwei Methoden entwickelt, eine basierend auf den Prinzipien des Compressed Sensing sowie eine Deep Learning Methode. Anhand neuartiger synthetischer und realer Datensätze werden die vorgeschlagenen Rekonstruktionsansätze im Detail evaluiert

    Reconstruction from Spatio-Spectrally Coded Multispectral Light Fields

    Get PDF
    In dieser Arbeit werden spektral codierte multispektrale Lichtfelder, wie sie von einer Lichtfeldkamera mit einem spektral codierten Mikrolinsenarray aufgenommen werden, untersucht. Für die Rekonstruktion der codierten Lichtfelder werden zwei Methoden entwickelt und im Detail ausgewertet. Zunächst wird eine vollständige Rekonstruktion des spektralen Lichtfelds entwickelt, die auf den Prinzipien des Compressed Sensing basiert. Um die spektralen Lichtfelder spärlich darzustellen, werden 5D-DCT-Basen sowie ein Ansatz zum Lernen eines Dictionary untersucht. Der konventionelle vektorisierte Dictionary-Lernansatz wird auf eine tensorielle Notation verallgemeinert, um das Lichtfeld-Dictionary tensoriell zu faktorisieren. Aufgrund der reduzierten Anzahl von zu lernenden Parametern ermöglicht dieser Ansatz größere effektive Atomgrößen. Zweitens wird eine auf Deep Learning basierende Rekonstruktion der spektralen Zentralansicht und der zugehörigen Disparitätskarte aus dem codierten Lichtfeld entwickelt. Dabei wird die gewünschte Information direkt aus den codierten Messungen geschätzt. Es werden verschiedene Strategien des entsprechenden Multi-Task-Trainings verglichen. Um die Qualität der Rekonstruktion weiter zu verbessern, wird eine neuartige Methode zur Einbeziehung von Hilfslossfunktionen auf der Grundlage ihrer jeweiligen normalisierten Gradientenähnlichkeit entwickelt und gezeigt, dass sie bisherige adaptive Methoden übertrifft. Um die verschiedenen Rekonstruktionsansätze zu trainieren und zu bewerten, werden zwei Datensätze erstellt. Zunächst wird ein großer synthetischer spektraler Lichtfelddatensatz mit verfügbarer Disparität Ground Truth unter Verwendung eines Raytracers erstellt. Dieser Datensatz, der etwa 100k spektrale Lichtfelder mit dazugehöriger Disparität enthält, wird in einen Trainings-, Validierungs- und Testdatensatz aufgeteilt. Um die Qualität weiter zu bewerten, werden sieben handgefertigte Szenen, so genannte Datensatz-Challenges, erstellt. Schließlich wird ein realer spektraler Lichtfelddatensatz mit einer speziell angefertigten spektralen Lichtfeldreferenzkamera aufgenommen. Die radiometrische und geometrische Kalibrierung der Kamera wird im Detail besprochen. Anhand der neuen Datensätze werden die vorgeschlagenen Rekonstruktionsansätze im Detail bewertet. Es werden verschiedene Codierungsmasken untersucht -- zufällige, reguläre, sowie Ende-zu-Ende optimierte Codierungsmasken, die mit einer neuartigen differenzierbaren fraktalen Generierung erzeugt werden. Darüber hinaus werden weitere Untersuchungen durchgeführt, zum Beispiel bezüglich der Abhängigkeit von Rauschen, der Winkelauflösung oder Tiefe. Insgesamt sind die Ergebnisse überzeugend und zeigen eine hohe Rekonstruktionsqualität. Die Deep-Learning-basierte Rekonstruktion, insbesondere wenn sie mit adaptiven Multitasking- und Hilfslossstrategien trainiert wird, übertrifft die Compressed-Sensing-basierte Rekonstruktion mit anschließender Disparitätsschätzung nach dem Stand der Technik
    corecore