86 research outputs found

    Scalable light field representation and coding

    Get PDF
    This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques to exploit the inherent characteristics of the most popular types of light field representations are proposed and studied, which are normally based on micro-images or sub-aperture-images. To encode micro-images, two solutions are proposed, aiming to exploit the redundancy between neighboring micro-images using a high order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder, respectively. In both cases, the proposed solutions are able to outperform low order prediction solutions. To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency. A novel hybrid light field representation coding approach is also proposed, by exploiting the combined use of both micro-image and sub-aperture-image representation types, instead of using each representation individually. In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, allowing to improve the light field navigation and to reduce the decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.Esta Tese tem como objetivo avançar o estado da arte em representação e codificação de campos de luz. Neste contexto, são também apresentadas propostas para melhorar funcionalidades como o acesso aleatório ao campo de luz e a escalabilidade. Como a representação do campo de luz limita a abordagem de codificação a ser utilizada, são propostas e estudadas várias técnicas de codificação de campos de luz para explorar as características inerentes aos seus tipos mais populares de representação, que são normalmente baseadas em micro-imagens ou imagens de sub-abertura. Para codificar as micro-imagens, são propostas duas soluções, visando explorar a redundância entre micro-imagens vizinhas utilizando um modelo de predição de alta ordem, onde os parâmetros do modelo são explicitamente transmitidos ou inferidos no decodificador, respetivamente. Em ambos os casos, as soluções propostas são capazes de superar as soluções de predição de baixa ordem. Para codificar imagens de sub-abertura, é proposta uma solução baseada em HEVC que explora a inerente redundância intra e inter deste tipo de imagens. Neste caso, a imagem do campo de luz é codificada como uma pseudo-sequência de vídeo, onde a ordem de varrimento é sinalizada, permitindo ao codificador e decodificador otimizar as listas de imagens de referência para melhorar a eficiência da codificação. Também é proposta uma nova abordagem de codificação baseada na representação híbrida do campo de luz, explorando o uso combinado dos tipos de representação de micro-imagem e sub-imagem, em vez de usar cada representação individualmente. A fim de facilitar a rápida implantação da tecnologia de campo de luz, esta Tese também propõe abordagens escaláveis de codificação e representação que permitem uma compatibilidade adequada com monitores tradicionais (e.g., 2D, estereoscópicos ou multivista) e com futuros monitores de campo de luz, mantendo ao mesmo tempo uma alta eficiência de codificação. Além disso, o acesso aleatório de pontos de vista, permitindo melhorar a navegação no campo de luz e reduzir o atraso na descodificação, também é permitido com um equilíbrio flexível entre eficiência de codificação e acesso aleatório de pontos de vista

    Light field image processing: an overview

    Get PDF
    Light field imaging has emerged as a technology allowing to capture richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene integrating the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high-dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data

    Transformées basées graphes pour la compression de nouvelles modalités d’image

    Get PDF
    Due to the large availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omni-directional images, a huge amount of high dimensional data has to be stored and delivered. The ever growing streaming and storage requirements of these new image modalities require novel image coding tools that exploit the complex structure of those data. This thesis aims at exploring novel graph based approaches for adapting traditional image transform coding techniques to the emerging data types where the sampled information are lying on irregular structures. In a first contribution, novel local graph based transforms are designed for light field compact representations. By leveraging a careful design of local transform supports and a local basis functions optimization procedure, significant improvements in terms of energy compaction can be obtained. Nevertheless, the locality of the supports did not permit to exploit long term dependencies of the signal. This led to a second contribution where different sampling strategies are investigated. Coupled with novel prediction methods, they led to very prominent results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omni-directional content. If we move further and give more degree of freedom to the graphs we wish to use, we can learn or define a model (set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to theoretically analyze the effect of the uncertainty on the efficiency of the graph transforms.En raison de la grande disponibilité de nouveaux types de caméras capturant des informations géométriques supplémentaires, ainsi que de l'émergence de nouvelles modalités d'image telles que les champs de lumière et les images omnidirectionnelles, il est nécessaire de stocker et de diffuser une quantité énorme de hautes dimensions. Les exigences croissantes en matière de streaming et de stockage de ces nouvelles modalités d’image nécessitent de nouveaux outils de codage d’images exploitant la structure complexe de ces données. Cette thèse a pour but d'explorer de nouvelles approches basées sur les graphes pour adapter les techniques de codage de transformées d'image aux types de données émergents où les informations échantillonnées reposent sur des structures irrégulières. Dans une première contribution, de nouvelles transformées basées sur des graphes locaux sont conçues pour des représentations compactes des champs de lumière. En tirant parti d’une conception minutieuse des supports de transformées locaux et d’une procédure d’optimisation locale des fonctions de base , il est possible d’améliorer considérablement le compaction d'énergie. Néanmoins, la localisation des supports ne permettait pas d'exploiter les dépendances à long terme du signal. Cela a conduit à une deuxième contribution où différentes stratégies d'échantillonnage sont étudiées. Couplés à de nouvelles méthodes de prédiction, ils ont conduit à des résultats très importants en ce qui concerne la compression quasi sans perte de champs de lumière statiques. La troisième partie de la thèse porte sur la définition de sous-graphes optimisés en distorsion de débit pour le codage de contenu omnidirectionnel. Si nous allons plus loin et donnons plus de liberté aux graphes que nous souhaitons utiliser, nous pouvons apprendre ou définir un modèle (ensemble de poids sur les arêtes) qui pourrait ne pas être entièrement fiable pour la conception de transformées. La dernière partie de la thèse est consacrée à l'analyse théorique de l'effet de l'incertitude sur l'efficacité des transformées basées graphes

    Multiple viewpoint rendering for three-dimensional displays

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1997.Includes bibliographical references (leaves 159-164).Michael W. Halle.Ph.D

    Visual analysis and synthesis with physically grounded constraints

    Get PDF
    The past decade has witnessed remarkable progress in image-based, data-driven vision and graphics. However, existing approaches often treat the images as pure 2D signals and not as a 2D projection of the physical 3D world. As a result, a lot of training examples are required to cover sufficiently diverse appearances and inevitably suffer from limited generalization capability. In this thesis, I propose "inference-by-composition" approaches to overcome these limitations by modeling and interpreting visual signals in terms of physical surface, object, and scene. I show how we can incorporate physically grounded constraints such as scene-specific geometry in a non-parametric optimization framework for (1) revealing the missing parts of an image due to removal of a foreground or background element, (2) recovering high spatial frequency details that are not resolvable in low-resolution observations. I then extend the framework from 2D images to handle spatio-temporal visual data (videos). I demonstrate that we can convincingly fill spatio-temporal holes in a temporally coherent fashion by jointly reconstructing the appearance and motion. Compared to existing approaches, our technique can synthesize physically plausible contents even in challenging videos. For visual analysis, I apply stereo camera constraints for discovering multiple approximately linear structures in extremely noisy videos with an ecological application to bird migration monitoring at night. The resulting algorithms are simple and intuitive while achieving state-of-the-art performance without the need of training on an exhaustive set of visual examples

    Efficient and Accurate Disparity Estimation from MLA-Based Plenoptic Cameras

    Get PDF
    This manuscript focuses on the processing images from microlens-array based plenoptic cameras. These cameras enable the capturing of the light field in a single shot, recording a greater amount of information with respect to conventional cameras, allowing to develop a whole new set of applications. However, the enhanced information introduces additional challenges and results in higher computational effort. For one, the image is composed of thousand of micro-lens images, making it an unusual case for standard image processing algorithms. Secondly, the disparity information has to be estimated from those micro-images to create a conventional image and a three-dimensional representation. Therefore, the work in thesis is devoted to analyse and propose methodologies to deal with plenoptic images. A full framework for plenoptic cameras has been built, including the contributions described in this thesis. A blur-aware calibration method to model a plenoptic camera, an optimization method to accurately select the best microlenses combination, an overview of the different types of plenoptic cameras and their representation. Datasets consisting of both real and synthetic images have been used to create a benchmark for different disparity estimation algorithm and to inspect the behaviour of disparity under different compression rates. A robust depth estimation approach has been developed for light field microscopy and image of biological samples

    Methods for Light Field Display Profiling and Scalable Super-Multiview Video Coding

    Get PDF
    Light field 3D displays reproduce the light field of real or synthetic scenes, as observed by multiple viewers, without the necessity of wearing 3D glasses. Reproducing light fields is a technically challenging task in terms of optical setup, content creation, distributed rendering, among others; however, the impressive visual quality of hologramlike scenes, in full color, with real-time frame rates, and over a very wide field of view justifies the complexity involved. Seeing objects popping far out from the screen plane without glasses impresses even those viewers who have experienced other 3D displays before.Content for these displays can either be synthetic or real. The creation of synthetic (rendered) content is relatively well understood and used in practice. Depending on the technique used, rendering has its own complexities, quite similar to the complexity of rendering techniques for 2D displays. While rendering can be used in many use-cases, the holy grail of all 3D display technologies is to become the future 3DTVs, ending up in each living room and showing realistic 3D content without glasses. Capturing, transmitting, and rendering live scenes as light fields is extremely challenging, and it is necessary if we are about to experience light field 3D television showing real people and natural scenes, or realistic 3D video conferencing with real eye-contact.In order to provide the required realism, light field displays aim to provide a wide field of view (up to 180°), while reproducing up to ~80 MPixels nowadays. Building gigapixel light field displays is realistic in the next few years. Likewise, capturing live light fields involves using many synchronized cameras that cover the same display wide field of view and provide the same high pixel count. Therefore, light field capture and content creation has to be well optimized with respect to the targeted display technologies. Two major challenges in this process are addressed in this dissertation.The first challenge is how to characterize the display in terms of its capabilities to create light fields, that is how to profile the display in question. In clearer terms this boils down to finding the equivalent spatial resolution, which is similar to the screen resolution of 2D displays, and angular resolution, which describes the smallest angle, the color of which the display can control individually. Light field is formalized as 4D approximation of the plenoptic function in terms of geometrical optics through spatiallylocalized and angularly-directed light rays in the so-called ray space. Plenoptic Sampling Theory provides the required conditions to sample and reconstruct light fields. Subsequently, light field displays can be characterized in the Fourier domain by the effective display bandwidth they support. In the thesis, a methodology for displayspecific light field analysis is proposed. It regards the display as a signal processing channel and analyses it as such in spectral domain. As a result, one is able to derive the display throughput (i.e. the display bandwidth) and, subsequently, the optimal camera configuration to efficiently capture and filter light fields before displaying them.While the geometrical topology of optical light sources in projection-based light field displays can be used to theoretically derive display bandwidth, and its spatial and angular resolution, in many cases this topology is not available to the user. Furthermore, there are many implementation details which cause the display to deviate from its theoretical model. In such cases, profiling light field displays in terms of spatial and angular resolution has to be done by measurements. Measurement methods that involve the display showing specific test patterns, which are then captured by a single static or moving camera, are proposed in the thesis. Determining the effective spatial and angular resolution of a light field display is then based on an automated analysis of the captured images, as they are reproduced by the display, in the frequency domain. The analysis reveals the empirical limits of the display in terms of pass-band both in the spatial and angular dimension. Furthermore, the spatial resolution measurements are validated by subjective tests confirming that the results are in line with the smallest features human observers can perceive on the same display. The resolution values obtained can be used to design the optimal capture setup for the display in question.The second challenge is related with the massive number of views and pixels captured that have to be transmitted to the display. It clearly requires effective and efficient compression techniques to fit in the bandwidth available, as an uncompressed representation of such a super-multiview video could easily consume ~20 gigabits per second with today’s displays. Due to the high number of light rays to be captured, transmitted and rendered, distributed systems are necessary for both capturing and rendering the light field. During the first attempts to implement real-time light field capturing, transmission and rendering using a brute force approach, limitations became apparent. Still, due to the best possible image quality achievable with dense multi-camera light field capturing and light ray interpolation, this approach was chosen as the basis of further work, despite the massive amount of bandwidth needed. Decompression of all camera images in all rendering nodes, however, is prohibitively time consuming and is not scalable. After analyzing the light field interpolation process and the data-access patterns typical in a distributed light field rendering system, an approach to reduce the amount of data required in the rendering nodes has been proposed. This approach, on the other hand, requires rectangular parts (typically vertical bars in case of a Horizontal Parallax Only light field display) of the captured images to be available in the rendering nodes, which might be exploited to reduce the time spent with decompression of video streams. However, partial decoding is not readily supported by common image / video codecs. In the thesis, approaches aimed at achieving partial decoding are proposed for H.264, HEVC, JPEG and JPEG2000 and the results are compared.The results of the thesis on display profiling facilitate the design of optimal camera setups for capturing scenes to be reproduced on 3D light field displays. The developed super-multiview content encoding also facilitates light field rendering in real-time. This makes live light field transmission and real-time teleconferencing possible in a scalable way, using any number of cameras, and at the spatial and angular resolution the display actually needs for achieving a compelling visual experience

    Efficient rendering for three-dimensional displays

    Get PDF
    This thesis explores more efficient methods for visualizing point data sets on three-dimensional (3D) displays. Point data sets are used in many scientific applications, e.g. cosmological simulations. Visualizing these data sets in {3D} is desirable because it can more readily reveal structure and unknown phenomena. However, cutting-edge scientific point data sets are very large and producing/rendering even a single image is expensive. Furthermore, current literature suggests that the ideal number of views for 3D (multiview) displays can be in the hundreds, which compounds the costs. The accepted notion that many views are required for {3D} displays is challenged by carrying out a novel human factor trials study. The results suggest that humans are actually surprisingly insensitive to the number of viewpoints with regard to their task performance, when occlusion in the scene is not a dominant factor. Existing stereoscopic rendering algorithms can have high set-up costs which limits their use and none are tuned for uncorrelated {3D} point rendering. This thesis shows that it is possible to improve rendering speeds for a low number of views by perspective reprojection. The novelty in the approach described lies in delaying the reprojection and generation of the viewpoints until the fragment stage of the pipeline and streamlining the rendering pipeline for points only. Theoretical analysis suggests a fragment reprojection scheme will render at least 2.8 times faster than na\"{i}vely re-rendering the scene from multiple viewpoints. Building upon the fragment reprojection technique, further rendering performance is shown to be possible (at the cost of some rendering accuracy) by restricting the amount of reprojection required according to the stereoscopic resolution of the display. A significant benefit is that the scene depth can be mapped arbitrarily to the perceived depth range of the display at no extra cost than a single region mapping approach. Using an average case-study (rendering from a 500k points for a 9-view High Definition 3D display), theoretical analysis suggests that this new approach is capable of twice the performance gains than simply reprojecting every single fragment, and quantitative measures show the algorithm to be 5 times faster than a naïve rendering approach. Further detailed quantitative results, under varying scenarios, are provided and discussed

    Irish Machine Vision and Image Processing Conference Proceedings 2017

    Get PDF
    corecore