3 research outputs found

    Spatial and Angular Resolution Enhancement of Light Fields Using Convolutional Neural Networks

    Light field imaging extends traditional photography by capturing both the spatial and angular distribution of light, which enables new capabilities, including post-capture refocusing, post-capture aperture control, and depth estimation from a single shot. Micro-lens array (MLA) based light field cameras offer a cost-effective approach to capturing light fields. A major drawback of MLA-based light field cameras is low spatial resolution, because a single image sensor is shared to capture both spatial and angular information. In this paper, we present a learning-based light field enhancement approach in which both the spatial and angular resolution of the captured light field are enhanced using convolutional neural networks. The proposed method is tested on real light field data captured with a Lytro light field camera, clearly demonstrating spatial and angular resolution improvement.
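    The abstract does not describe the network architecture. As an illustration only, the following is a minimal SRCNN-style sketch (assuming PyTorch) of how a single sub-aperture view could be spatially super-resolved with a small stack of convolutions; angular enhancement could be handled analogously by predicting intermediate views from their neighbours. The class name and layer sizes are assumptions, not the paper's model.

```python
# Minimal sketch, NOT the paper's architecture: an SRCNN-style network that
# maps a bicubically upsampled sub-aperture view to a sharper version.
import torch
import torch.nn as nn

class SpatialSRNet(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4),   # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),              # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),    # reconstruction
        )

    def forward(self, x):
        # x: (batch, 1, H, W) luminance of an upsampled sub-aperture view
        return self.body(x)

# Example usage on a dummy 128x128 view
net = SpatialSRNet()
out = net(torch.rand(1, 1, 128, 128))
print(out.shape)  # torch.Size([1, 1, 128, 128])
```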

    Light Field compression and manipulation via residual convolutional neural network

    Light field (LF) imaging has gained significant attention due to its recent success in microscopy, 3-dimensional (3D) display and rendering, and augmented and virtual reality applications. Postprocessing of an LF enables us to extract more information from a scene than traditional cameras allow. However, the use of LFs is still a research novelty because of the current limitations in capturing high-resolution LFs in all four dimensions. While researchers are actively improving methods of capturing high-resolution LFs, simulation makes it possible to explore the properties of a high-quality captured LF. The immediate concerns following LF capture are its storage and processing time. A rich LF occupies a large chunk of memory, on the order of multiple gigabytes per LF. Also, most feature extraction techniques associated with LF postprocessing involve multi-dimensional integration that requires access to the whole LF and is usually time-consuming. Recent advancements in computer processing units have made it possible to simulate realistic images using physically based rendering software. In this work, a transformation function is first proposed for building a camera array (CA) that captures the same portion of the LF of a scene that a standard plenoptic camera (SPC) can acquire. Using this transformation, simulating an LF with the same properties as a plenoptic camera becomes trivial in any rendering software. Artificial intelligence (AI) and machine learning (ML) algorithms, when deployed on the new generation of GPUs, are faster than ever, and it is possible to build and train large networks with millions of trainable parameters that learn very complex features. Here, residual convolutional neural network (RCNN) structures are employed to build networks for compression and feature extraction from an LF. By combining state-of-the-art image compression and RCNNs, I have created a compression pipeline. The proposed pipeline's bit per pixel (bpp) ratio is 0.0047 on average. I show that, at a 1% compression-time cost and with an 18x speedup for decompression, the reconstructed LFs have a better structural similarity index metric (SSIM) and a comparable peak signal-to-noise ratio (PSNR) relative to the state-of-the-art video compression techniques used to compress LFs. Finally, using RCNNs, I created a network called RefNet for extracting a group of 16 refocused images from a raw LF, trained with refocus parameters α = 0.125, 0.250, 0.375, ..., 2.0. I show that RefNet is 134x faster than the state-of-the-art refocusing technique and is also superior in color prediction compared to the state-of-the-art Fourier slice and shift-and-sum methods.
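    For context on the refocusing baseline that RefNet is compared against, below is a simplified shift-and-sum refocusing sketch in NumPy. The function name, the nearest-pixel shifting, and the (u, v, y, x) light field layout are illustrative assumptions; neither the thesis' implementation nor RefNet itself is reproduced here.

```python
# Simplified shift-and-sum refocusing sketch (illustrative assumption, not the
# thesis code). `lf` is a 4D light field indexed as (u, v, y, x); `alpha` is
# the refocus parameter mentioned in the abstract (0.125 ... 2.0).
import numpy as np

def shift_and_sum_refocus(lf, alpha):
    U, V, H, W = lf.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((H, W), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            # Shift each sub-aperture view proportionally to its angular
            # offset; (1 - 1/alpha) selects the synthetic focal plane.
            dy = int(round((u - cu) * (1.0 - 1.0 / alpha)))
            dx = int(round((v - cv) * (1.0 - 1.0 / alpha)))
            out += np.roll(np.roll(lf[u, v], dy, axis=0), dx, axis=1)
    return out / (U * V)

# Example: a stack of 16 refocused images for the alphas used to train RefNet
lf = np.random.rand(9, 9, 64, 64)          # dummy light field
alphas = np.arange(0.125, 2.0 + 1e-9, 0.125)
stack = np.stack([shift_and_sum_refocus(lf, a) for a in alphas])
print(stack.shape)  # (16, 64, 64)
```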

    Scene depth estimation from images captured with a plenoptic camera

    Undergraduate monograph (graduação), Université de Bordeaux, ENSEIRB-MATMECA, Universidade de Brasília, 2013. A plenoptic camera, also known as a light field camera, is a device that employs a microlens array placed between the main lens and the camera sensor to capture the 4D light field information of a scene. Such a light field lets us know the position and angle of incidence of the light rays captured by the camera and can be used to improve solutions to computer graphics and computer vision problems. With a sampled light field acquired from a plenoptic camera, several low-resolution views of the scene are available from which to infer depth. Unlike traditional multiview stereo, these views are captured by the same sensor, which implies that they are acquired with the same camera parameters; the views are also in perfect epipolar geometry. However, other problems arise with such a configuration. The camera sensor uses a Bayer color filter, and demosaicing the RAW data introduces cross-talk between views, creating image artifacts. Rendering the views modifies the color pattern, adding complexity to demosaicing. The resolution of the views we can obtain is another problem: since the angular and spatial positions of the light rays are sampled by the same sensor, there is a trade-off between view resolution and the number of available views. For the Lytro camera, for example, the views are rendered at a resolution of about 0.12 megapixels, resulting in aliasing for most scenes.
    This work presents: a technique to render the views from the RAW image captured by the camera; a disparity estimation method adapted to plenoptic cameras that works even without demosaicing; a new way of representing disparity in the multiview stereo case; and a reconstruction and demosaicing scheme that uses the disparity information and the pixels of neighbouring views.
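    To make the view rendering and the resolution trade-off concrete, the sketch below extracts sub-aperture views from an idealized raw lenslet image, assuming microlenses perfectly aligned to the sensor grid with an integer pixel pitch and ignoring the Bayer pattern. Real Lytro data needs the calibration, rotation handling, and demosaicing-aware rendering that this work addresses; the function and array names here are assumptions.

```python
# Simplified sub-aperture view extraction from an idealized raw lenslet image
# (integer pitch, no rotation, Bayer pattern ignored); illustration only.
import numpy as np

def extract_views(raw, pitch):
    # raw: 2D sensor image; pitch: pixels per microlens side.
    H, W = raw.shape
    ny, nx = H // pitch, W // pitch
    # Group pixels by microlens, then gather the same pixel position (u, v)
    # under every microlens to form a pitch x pitch grid of ny x nx views.
    lenslets = raw[:ny * pitch, :nx * pitch].reshape(ny, pitch, nx, pitch)
    return lenslets.transpose(1, 3, 0, 2)  # (u, v, y, x)

# Example with a synthetic 700x700 raw image and a 10-pixel microlens pitch:
# 10x10 views of only 70x70 pixels each, illustrating the trade-off between
# view resolution and the number of available views discussed above.
raw = np.random.rand(700, 700)
views = extract_views(raw, 10)
print(views.shape)  # (10, 10, 70, 70)
```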