1,310 research outputs found

    Overview of MV-HEVC prediction structures for light field video

    Get PDF
    Light field video is a promising technology for delivering the required six-degrees-of-freedom for natural content in virtual reality. Already existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. On the other hand, MV-HEVC treats each view as a separate video sequence, which allows the use of motion compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require any disparity maps to be readily available, and it has a more straightforward implementation since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate structure and it is currently still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion and random access capabilities in MVHEVC light field video coding. The results give an overview of the most optimal solutions developed in the context of this work, and prediction structure algorithms proposed in state-of-the-art literature. This overview provides a useful benchmark for future development of light field video coding solutions

    Motion and disparity estimation with self adapted evolutionary strategy in 3D video coding

    Get PDF
    Real world information, obtained by humans is three dimensional (3-D). In experimental user-trials, subjective assessments have clearly demonstrated the increased impact of 3-D pictures compared to conventional flat-picture techniques. It is reasonable, therefore, that we humans want an imaging system that produces pictures that are as natural and real as things we see and experience every day. Three-dimensional imaging and hence, 3-D television (3DTV) are very promising approaches expected to satisfy these desires. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. In this paper, we propose a novel approach to use Evolutionary Strategy (ES) for joint motion and disparity estimation to compress 3D integral video sequences. We propose to decompose the integral video sequence down to viewpoint video sequences and jointly exploit motion and disparity redundancies to maximize the compression using a self adapted ES. A half pixel refinement algorithm is then applied by interpolating macro blocks in the previous frame to further improve the video quality. Experimental results demonstrate that the proposed adaptable ES with Half Pixel Joint Motion and Disparity Estimation can up to 1.5 dB objective quality gain without any additional computational cost over our previous algorithm.1Furthermore, the proposed technique get similar objective quality compared to the full search algorithm by reducing the computational cost up to 90%

    Dense light field coding: a survey

    Get PDF
    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.info:eu-repo/semantics/publishedVersio

    Spatial prediction based on self-similarity compensation for 3D holoscopic image and video coding

    Get PDF
    WOS:000298962501022 (NÂș de Acesso Web of Science)Holoscopic imaging, also known as integral imaging, provides a solution for glassless 3D, and is promising to change the market for 3D television. To start, this paper briefly describes the general concepts of holoscopic imaging, focusing mainly on the spatial correlations inherent to this new type of content, which appear due to the micro-lens array that is used for both acquisition and display. The micro-images that are formed behind each micro-lens, from which only one pixel is viewed from a given observation point, have a high cross-correlation between them, which can be exploited for coding. A novel scheme for spatial prediction, exploring the particular arrangement of holoscopic images, is proposed. The proposed scheme can be used for both still image coding and intra-coding of video. Experimental results based on an H.264/AVC video codec modified to handle 3D holoscopic images and video are presented, showing the superior performance of this approach

    Improved inter-layer prediction for Light field content coding with display scalability

    Get PDF
    Light field imaging based on microlens arrays - also known as plenoptic, holoscopic and integral imaging - has recently risen up as feasible and prospective technology due to its ability to support functionalities not straightforwardly available in conventional imaging systems, such as: post-production refocusing and depth of field changing. However, to gradually reach the consumer market and to provide interoperability with current 2D and 3D representations, a display scalable coding solution is essential. In this context, this paper proposes an improved display scalable light field codec comprising a three-layer hierarchical coding architecture (previously proposed by the authors) that provides interoperability with 2D (Base Layer) and 3D stereo and multiview (First Layer) representations, while the Second Layer supports the complete light field content. For further improving the compression performance, novel exemplar-based inter-layer coding tools are proposed here for the Second Layer, namely: (i) an inter-layer reference picture construction relying on an exemplar-based optimization algorithm for texture synthesis, and (ii) a direct prediction mode based on exemplar texture samples from lower layers. Experimental results show that the proposed solution performs better than the tested benchmark solutions, including the authors' previous scalable codec.info:eu-repo/semantics/acceptedVersio

    Fast Multi-frame Stereo Scene Flow with Motion Segmentation

    Full text link
    We propose a new multi-frame method for efficiently computing scene flow (dense depth and optical flow) and camera ego-motion for a dynamic scene observed from a moving stereo camera rig. Our technique also segments out moving objects from the rigid scene. In our method, we first estimate the disparity map and the 6-DOF camera motion using stereo matching and visual odometry. We then identify regions inconsistent with the estimated camera motion and compute per-pixel optical flow only at these regions. This flow proposal is fused with the camera motion-based flow proposal using fusion moves to obtain the final optical flow and motion segmentation. This unified framework benefits all four tasks - stereo, optical flow, visual odometry and motion segmentation leading to overall higher accuracy and efficiency. Our method is currently ranked third on the KITTI 2015 scene flow benchmark. Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3 orders of magnitude faster than the top six methods. We also report a thorough evaluation on challenging Sintel sequences with fast camera and object motion, where our method consistently outperforms OSF [Menze and Geiger, 2015], which is currently ranked second on the KITTI benchmark.Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). Our results were submitted to KITTI 2015 Stereo Scene Flow Benchmark in November 201

    Scalable light field representation and coding

    Get PDF
    This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques to exploit the inherent characteristics of the most popular types of light field representations are proposed and studied, which are normally based on micro-images or sub-aperture-images. To encode micro-images, two solutions are proposed, aiming to exploit the redundancy between neighboring micro-images using a high order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder, respectively. In both cases, the proposed solutions are able to outperform low order prediction solutions. To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency. A novel hybrid light field representation coding approach is also proposed, by exploiting the combined use of both micro-image and sub-aperture-image representation types, instead of using each representation individually. In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. Additionally, viewpoint random access, allowing to improve the light field navigation and to reduce the decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.Esta Tese tem como objetivo avançar o estado da arte em representação e codificação de campos de luz. Neste contexto, sĂŁo tambĂ©m apresentadas propostas para melhorar funcionalidades como o acesso aleatĂłrio ao campo de luz e a escalabilidade. Como a representação do campo de luz limita a abordagem de codificação a ser utilizada, sĂŁo propostas e estudadas vĂĄrias tĂ©cnicas de codificação de campos de luz para explorar as caracterĂ­sticas inerentes aos seus tipos mais populares de representação, que sĂŁo normalmente baseadas em micro-imagens ou imagens de sub-abertura. Para codificar as micro-imagens, sĂŁo propostas duas soluçÔes, visando explorar a redundĂąncia entre micro-imagens vizinhas utilizando um modelo de predição de alta ordem, onde os parĂąmetros do modelo sĂŁo explicitamente transmitidos ou inferidos no decodificador, respetivamente. Em ambos os casos, as soluçÔes propostas sĂŁo capazes de superar as soluçÔes de predição de baixa ordem. Para codificar imagens de sub-abertura, Ă© proposta uma solução baseada em HEVC que explora a inerente redundĂąncia intra e inter deste tipo de imagens. Neste caso, a imagem do campo de luz Ă© codificada como uma pseudo-sequĂȘncia de vĂ­deo, onde a ordem de varrimento Ă© sinalizada, permitindo ao codificador e decodificador otimizar as listas de imagens de referĂȘncia para melhorar a eficiĂȘncia da codificação. TambĂ©m Ă© proposta uma nova abordagem de codificação baseada na representação hĂ­brida do campo de luz, explorando o uso combinado dos tipos de representação de micro-imagem e sub-imagem, em vez de usar cada representação individualmente. A fim de facilitar a rĂĄpida implantação da tecnologia de campo de luz, esta Tese tambĂ©m propĂ”e abordagens escalĂĄveis de codificação e representação que permitem uma compatibilidade adequada com monitores tradicionais (e.g., 2D, estereoscĂłpicos ou multivista) e com futuros monitores de campo de luz, mantendo ao mesmo tempo uma alta eficiĂȘncia de codificação. AlĂ©m disso, o acesso aleatĂłrio de pontos de vista, permitindo melhorar a navegação no campo de luz e reduzir o atraso na descodificação, tambĂ©m Ă© permitido com um equilĂ­brio flexĂ­vel entre eficiĂȘncia de codificação e acesso aleatĂłrio de pontos de vista
    • 

    corecore