Deep Depth From Focus
Depth from focus (DFF) is one of the classical ill-posed inverse problems in
computer vision. Most approaches recover the depth at each pixel based on the
focal setting which exhibits maximal sharpness. Yet, it is not obvious how to
reliably estimate the sharpness level, particularly in low-textured areas. In
this paper, we propose `Deep Depth From Focus (DDFF)' as the first end-to-end
learning approach to this problem. One of the main challenges we face is the
data hunger of deep neural networks. In order to obtain a significant
number of focal stacks with corresponding ground-truth depth, we propose to
leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us
to digitally create focal stacks of varying sizes. Compared to existing
benchmarks our dataset is 25 times larger, enabling the use of machine learning
for this inverse problem. We compare our results with state-of-the-art DFF
methods and we also analyze the effect of several key deep architectural
components. These experiments show that our proposed method `DDFFNet' achieves
state-of-the-art performance in all scenes, reducing depth error by more than
75% compared to the classical DFF methods.
Comment: accepted to Asian Conference on Computer Vision (ACCV) 201
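The classical per-pixel baseline that DDFF learns to outperform, recovering depth at each pixel from the focal setting of maximal sharpness, can be sketched in a few lines (a generic illustration, not the paper's code; the `depth_from_focus` helper and the Laplacian-energy focus measure are assumptions):

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def depth_from_focus(focal_stack):
    """Classical DFF baseline: for every pixel, pick the focal slice with
    the highest local sharpness. focal_stack: (S, H, W) grayscale slices.
    Returns an (H, W) map of winning slice indices (a proxy for depth)."""
    sharpness = np.stack([
        uniform_filter(laplace(s.astype(float)) ** 2, size=9)  # local Laplacian energy
        for s in focal_stack
    ])
    return np.argmax(sharpness, axis=0)  # focal setting of maximal sharpness
```

In low-textured areas the sharpness measure is nearly flat across slices, which is exactly the failure mode that motivates learning the estimate end-to-end.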
Design of Belief Propagation Based on FPGA for the Multistereo CAFADIS Camera
In this paper we describe a fast, specialized hardware implementation of the belief propagation algorithm for the CAFADIS camera, a new plenoptic sensor patented by the University of La Laguna. This camera captures the light field of the scene and can be used to determine the depth at which each pixel is in focus. The algorithm has been designed for FPGA devices using VHDL. We propose a parallel, pipelined architecture that implements the algorithm without external memory. Although the BRAM usage of the device increases considerably, we can maintain real-time constraints through the extremely high-performance signal processing obtained by parallelism and by accessing several memories simultaneously. Quantitative results with 16-bit precision show that performance is very close to that of the original Matlab implementation of the algorithm.
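For readers unfamiliar with the algorithm being accelerated, min-sum belief propagation can be sketched on a 1-D chain of pixels, where it is exact (the `bp_chain` helper, the label costs, and the linear smoothness term are illustrative assumptions, not the paper's FPGA design):

```python
import numpy as np

def bp_chain(cost, smooth=1.0):
    """Exact min-sum belief propagation along a 1-D chain of pixels.
    cost: (N, L) data costs (e.g. negated focus measures for L depth
    labels); the pairwise term penalizes label jumps linearly.
    Returns the length-N MAP label sequence."""
    N, L = cost.shape
    labels = np.arange(L)
    V = smooth * np.abs(labels[:, None] - labels[None, :])  # pairwise cost
    fwd = np.zeros((N, L))   # messages arriving from the left
    for i in range(1, N):
        fwd[i] = ((fwd[i - 1] + cost[i - 1])[:, None] + V).min(axis=0)
    bwd = np.zeros((N, L))   # messages arriving from the right
    for i in range(N - 2, -1, -1):
        bwd[i] = ((bwd[i + 1] + cost[i + 1])[:, None] + V).min(axis=0)
    belief = cost + fwd + bwd  # min-marginals per pixel
    return belief.argmin(axis=1)
```

On the 2-D grids used for images, the same message update is iterated over four neighbour directions (loopy BP); its regular, local structure is what maps naturally onto a parallel, pipelined FPGA datapath with on-chip message memories.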
Consideraciones acerca de la viabilidad de un sensor plenóptico en dispositivos de consumo
Doctorate in Industrial Engineering. Passive distance measurement of the objects in an image gives rise to interesting applications that have the potential to revolutionize the field of photography. In this thesis, a prototype plenoptic camera for mobile devices was created and studied. This technique has two main disadvantages: the need to modify the camera module and the loss of resolution. Because of this, the prototype was discarded in favour of another technique: depth from focus.
In this technique, the capture method consists of taking several images while varying the focus distance. The set of images is called a focal stack. Different focus operators are studied, which give a measure of defocus per pixel and per plane of the focal stack.
The curvelet-based focus operator is chosen as the most suitable. It is computationally more intensive than other operators, but it is capable of decomposing natural images using few coefficients. To make its use viable on mobile devices, a new curvelet transform based on the discrete Radon transform is built. The discrete Radon transform has linearithmic complexity, does not use the Fourier transform, and uses only integer sums.
Lastly, different versions of the Radon transform are analyzed with the goal of achieving an even faster transform. These transforms are implemented to be executed on mobile devices.
Additionally, an application of the Radon transform is presented: the detection of barcodes at any orientation in an image.
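The core idea of a Fourier-free, integer-only discrete Radon transform can be illustrated with a direct shear-and-sum version (the thesis builds on the fast O(N² log N) recursive variant; this naive O(N³) `shear_radon` and its one-octant slope parametrization are simplifications for clarity):

```python
import numpy as np

def shear_radon(img):
    """Direct shear-and-sum Radon transform over one octant of slopes:
    projection s follows lines of slope s/(N-1); each column is shifted
    by an integer offset and accumulated -- only integer shifts and sums,
    no Fourier transform. Returns out[slope, intercept]."""
    N = img.shape[0]
    out = np.zeros((N, N))
    for s in range(N):                       # slope index
        for x in range(N):                   # column index
            shift = round(s * x / (N - 1))   # integer offset of the line at column x
            out[s] += np.roll(img[:, x], -shift)
    return out
```

Straight image features concentrate into peaks of the transform, which is what makes it useful for detecting barcodes at arbitrary orientations.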
Application for light field inpainting
Light Field (LF) imaging is a multimedia technology that can provide a more immersive experience when visualizing multimedia content, with higher levels of realism than conventional imaging technologies. This technology is especially promising for Virtual Reality (VR), since its 4-dimensional LF representation displays real-world scenes in a way that lets users experience the captured scenes from every position and angle. For these reasons, LF is a fast-growing technology with many topics to explore; LF inpainting is the one explored in this dissertation.
Image inpainting is an editing technique that allows synthesizing alternative content to fill in holes in an image. It is commonly used to fill missing parts of a scene and to restore damaged images such that the modifications are correct and visually realistic. Applying traditional 2D inpainting techniques straightforwardly to LFs is very unlikely to result in an inpainting that is consistent across all 4 dimensions. Usually, to inpaint 4D LF content, a 2D inpainting algorithm is used to inpaint a particular point of view, and 4D inpainting propagation algorithms then propagate the inpainted result to the whole 4D LF data.
Based on this idea of 4D inpainting propagation, some 4D LF inpainting techniques have recently been proposed in the literature. This dissertation therefore proposes to design and implement an LF inpainting application that can be used by anyone who wishes to work in this field and/or explore and edit LFs.
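The inpaint-then-propagate pipeline described above can be sketched naively: inpaint the central sub-aperture view with any 2D method, then scatter the synthesized pixels to the other views using an estimated disparity map (the `propagate_inpainting` helper and its rounding and clipping choices are illustrative assumptions; real propagation algorithms also handle occlusions and cracks):

```python
import numpy as np

def propagate_inpainting(lf, central, mask, disparity, u0, v0):
    """Naive 4D propagation: copy pixels inpainted in the central
    sub-aperture view (u0, v0) into every other view, shifted by the
    per-pixel disparity. lf: (U, V, H, W) light field; central: the
    inpainted (H, W) view; mask: True where content was synthesized."""
    U, V, H, W = lf.shape
    ys, xs = np.nonzero(mask)
    d = disparity[ys, xs]
    for u in range(U):
        for v in range(V):
            # A scene point at (y, x) in the central view appears shifted
            # in proportion to its disparity and the angular offset.
            ty = np.clip(np.round(ys + d * (u - u0)).astype(int), 0, H - 1)
            tx = np.clip(np.round(xs + d * (v - v0)).astype(int), 0, W - 1)
            lf[u, v, ty, tx] = central[ys, xs]
    return lf
```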
Computational methods for 3D imaging of neural activity in light-field microscopy
Light Field Microscopy (LFM) is a 3D imaging technique that captures spatial and angular information from light in a single snapshot. LFM is an appealing technique for applications in biological imaging due to its relatively simple implementation and fast 3D imaging speed. For instance, LFM can help to understand how neurons process information, as shown for functional neuronal calcium imaging. However, traditional volume reconstruction approaches for LFM suffer from low lateral resolution, high computational cost, and reconstruction artifacts near the native object plane. Therefore, in this thesis, we propose computational methods to improve the reconstruction performance of 3D imaging for LFM with applications to imaging neural activity.
First, we study the image formation process and propose methods for discretization and simplification of the LF system. Discretization is typically performed by computing the discrete impulse response at different input locations defined by a sampling grid. Unlike conventional methods, we propose an approach that uses shift-invariant subspaces to generalize the discretization framework used in LFM. Our approach allows the selection of diverse sampling kernels and sampling intervals. Furthermore, the typical discretization method is a particular case of our formulation.
Moreover, we propose a description of the system based on filter banks that fits the physics of the system. The per-depth periodic-shift-invariance property guarantees that the system can be accurately described using filter banks. This description leads to a novel method to reduce the computational time using the singular value decomposition (SVD). Our simplification method capitalizes on the inherent low-rank behaviour of the system. Furthermore, we propose rearranging our filter-bank model into a linear convolutional neural network (CNN) that allows more convenient implementation using existing deep-learning software.
Then, we study the problem of 3D reconstruction from single light-field images. We propose the shift-invariant-subspace assumption as a prior for volume reconstruction under ideal conditions. We experimentally show that artifact-free reconstruction (aliasing-free) is achievable under these settings. Furthermore, the tools developed to study the forward model are exploited to design a reconstruction algorithm based on ADMM that allows artifact-free 3D reconstruction for real data. Contrary to traditional approaches, our method includes additional priors for reconstruction without dramatically increasing the computational complexity. We extensively evaluate our approach on synthetic and real data and show that our approach performs better than conventional model-based strategies in computational time, image quality, and artifact reduction.
Finally, we exploit deep-learning techniques for reconstruction. Specifically, we propose to use two-photon imaging to enhance the performance of LFM when imaging neurons in brain tissues. The architecture of our network is derived from a sparsity-based reconstruction algorithm, the Iterative Shrinkage-Thresholding Algorithm (ISTA). Furthermore, we propose a semi-supervised training scheme based on Generative Adversarial Networks (GANs) that exploits knowledge of the forward model to achieve remarkable reconstruction quality. We propose efficient architectures to compute the forward model using linear CNNs. This description allows fast computation of the forward model and complements our reconstruction approach. Our method is tested under adverse conditions: lack of training data, background noise, and non-transparent samples. We experimentally show that our method performs better than model-based reconstruction strategies and typical neural networks for imaging neuronal activity in mammalian brain tissue. Our approach enjoys both the robustness of model-based methods and the reconstruction speed of deep learning.
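The sparsity-based algorithm that such networks unroll, ISTA, solves min_x 0.5·||Ax − y||² + λ·||x||₁ by alternating a gradient step with soft-thresholding. A generic sketch follows (not the thesis implementation, where A is the LFM forward model realized as a linear CNN):

```python
import numpy as np

def ista(A, y, lam=0.1, step=None, n_iters=200):
    """Iterative Shrinkage-Thresholding Algorithm for the sparse
    recovery problem min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        g = x - step * A.T @ (A @ x - y)                        # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0)  # soft-thresholding
    return x
```

Unrolling a fixed number of these iterations, with the matrices and thresholds replaced by learned layers, yields the network architecture family referred to above.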
The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack
A post-capture refocusing effect in smartphone cameras is achievable using
focal stacks. However, the accuracy of this effect depends entirely on the
combination of the depth layers in the stack. The accuracy of the extended
depth of field effect in this application can be improved significantly by
computing an accurate depth map, which has been an open issue for decades. To
tackle this issue, in this paper, a framework is proposed based on
Preconditioned Alternating Direction Method of Multipliers (PADMM) for depth
from the focal stack and synthetic defocus application. In addition to its
ability to provide high structural accuracy and occlusion handling, the
optimization function of the proposed method converges faster and
better than state-of-the-art methods. The evaluation has been done on 21 sets
of focal stacks and the optimization function has been compared against 5 other
methods. Preliminary results indicate that the proposed method performs better
in terms of structural accuracy and optimization than the current
state-of-the-art methods.
Comment: 15 pages, 8 figures
Reconstruction from Spatio-Spectrally Coded Multispectral Light Fields
In this work, spatio-spectrally coded multispectral light fields, as captured by a light field camera with a spectrally coded microlens array, are investigated. For the reconstruction of the coded light fields, two methods are developed: one based on the principles of compressed sensing and one deep learning approach. Using novel synthetic as well as real-world datasets, the proposed reconstruction approaches are evaluated in detail.
Reconstruction from Spatio-Spectrally Coded Multispectral Light Fields
In this work, spectrally coded multispectral light fields, as captured by a light field camera with a spectrally coded microlens array, are investigated. For the reconstruction of the coded light fields, two methods are developed and evaluated in detail.
First, a full reconstruction of the spectral light field is developed, based on the principles of compressed sensing. To represent the spectral light fields sparsely, 5D DCT bases as well as a dictionary learning approach are investigated. The conventional vectorized dictionary learning approach is generalized to a tensor notation in order to factorize the light field dictionary tensorially. Due to the reduced number of parameters to be learned, this approach enables larger effective atom sizes.
Second, a deep-learning-based reconstruction of the spectral central view and the corresponding disparity map from the coded light field is developed. Here, the desired information is estimated directly from the coded measurements. Different strategies for the corresponding multi-task training are compared. To further improve the reconstruction quality, a novel method for incorporating auxiliary losses based on their respective normalized gradient similarity is developed and shown to outperform previous adaptive methods.
To train and evaluate the different reconstruction approaches, two datasets are created. First, a large synthetic spectral light field dataset with available ground-truth disparity is created using a raytracer. This dataset, containing about 100k spectral light fields with corresponding disparity, is split into a training, a validation, and a test set. To further assess quality, seven hand-crafted scenes, so-called dataset challenges, are created. Finally, a real spectral light field dataset is captured with a custom-built spectral light field reference camera. The radiometric and geometric calibration of the camera is discussed in detail.
Using the new datasets, the proposed reconstruction approaches are evaluated in detail. Different coding masks are investigated -- random, regular, as well as end-to-end optimized coding masks generated with a novel differentiable fractal generation. Furthermore, additional investigations are carried out, for example regarding the dependence on noise, the angular resolution, or depth.
Overall, the results are convincing and show high reconstruction quality. The deep-learning-based reconstruction, particularly when trained with adaptive multi-task and auxiliary-loss strategies, outperforms the compressed-sensing-based reconstruction with subsequent state-of-the-art disparity estimation.
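At its core, spatio-spectral coding reduces to a per-pixel spectral channel selection. Its forward operator and adjoint, the building blocks of both compressed-sensing and learned reconstructions, can be sketched as follows (a hypothetical minimal model: the `code`/`adjoint` names, a single view, and an integer mask of channel indices are assumptions):

```python
import numpy as np

def code(x, mask):
    """Forward model of spatio-spectral coding: each sensor pixel measures
    the single spectral channel selected by the filter mask.
    x: (C, H, W) spectral view; mask: (H, W) channel indices in [0, C)."""
    H, W = mask.shape
    return x[mask, np.arange(H)[:, None], np.arange(W)]

def adjoint(y, mask, C):
    """Adjoint (scatter) of the coding operator: place each measurement
    back into its selected channel, zeros elsewhere."""
    H, W = mask.shape
    x = np.zeros((C, H, W))
    x[mask, np.arange(H)[:, None], np.arange(W)] = y
    return x
```

Iterative reconstruction then alternates applications of these two operators with a sparsity prior (e.g. in a 5D DCT or learned dictionary basis).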