350 research outputs found

    Overview of 3D Video: Coding Algorithms, Implementations and Standardization

    Get PDF
    Projecte final de carrera fet en col.laboració amb Linköping Institute of TechnologyEnglish: 3D technologies have aroused a great interest over the world in the last years. Television, cinema and videogames are introducing, little by little, 3D technologies into the mass market. This comes as a result of the research done in the 3D field, solving many of its limitations such as quality, contents creation or 3D displays. This thesis focus on 3D video, considering concepts that concerns the coding issues and the video formats. The aim is to provide an overview of the current state of 3D video, including the standardization and some interesting implementations and alternatives that exist. In the report necessary background information is presented in order to understand the concepts developed: compression techniques, the different video formats, their standardization and some advances or alternatives to the processes previously explained. Finally, a comparison between the different concepts is presented to complete the overview, ending with some conclusions and proposed ideas for future works.Castellano: Las tecnologías 3D han despertado un gran interés en todo el mundo en los últimos años. Televisión, cine y videojuegos están introduciendo, poco a poco, ésta tecnología en el mercado. Esto es resultado de la investigación realizada en el campo de las 3D, solucionando muchas de sus limitaciones, como la calidad, la creación de contenidos o las pantallas 3D. Este proyecto se centra en el video 3D, considerando los conceptos relacionados con la codificación y los formatos de vídeo. El objetivo es proporcionar una visión del estado actual del vídeo 3D, incluyendo los estándares y algunas de las implementaciones más interesantes que existen. En la memoria, se presenta información adicional para facilitar el seguimiento de los conceptos desarrollados: técnicas de compresión, formatos de vídeo, su estandarización y algunos avances o alternativas a los procesos explicados. Finalmente, se presentan diferentes comparaciones entre los conceptos tratados, acabando el documento con las conclusiones obtenidas e ideas propuestas para futuros trabajos.Català: Les tecnologies 3D han despertat un gran interès a tot el món en els últims anys. Televisió, cinema i videojocs estan introduint, lentament, aquesta tecnologia en el mercat. Això és resultat de la investigació portada a terme en el camp de les 3D, solucionant moltes de les seves limitacions, com la qualitat, la creació de continguts o les pantalles 3D. Aquest proyecte es centra en el video 3D, considerant els conceptes relacionats amb la codificació i els formats de video. L'objectiu és proporcionar una visió de l'estat actual del video 3D, incloent-hi els estandàrds i algunes de les implementacions més interessants que existeixen. A la memòria, es presenta informació adicional per facilitar el seguiment dels conceptes desenvolupats: tècniques de compressió, formats de video, la seva estandardització i alguns avenços o alternatives als procesos explicats. Finalment, es presenten diferents comparacions entre els conceptes tractats i les conclusions obtingudes, juntament amb propostes per a futurs treballs

    Reducing 3D video coding complexity through more efficient disparity estimation

    Get PDF
    3D video coding for transmission exploits the Disparity Estimation (DE) to remove the inter-view redundancies present within both the texture and the depth map multi-view videos. Good estimation accuracy can be achieved by partitioning the macro-block into smaller subblocks partitions. However, the DE process must be performed on each individual sub-block to determine the optimal mode and their disparity vectors, in terms of ratedistortion efficiency. This vector estimation process is heavy on computational resources, thus, the coding computational cost becomes proportional to the number of search points and the inter-view modes tested during the rate-distortion optimization. In this paper, a solution that exploits the available depth map data, together with the multi-view geometry, is proposed to identify a better DE search area; such that it allows a reduction in its search points. It also exploits the number of different depth levels present within the current macro-block to determine which modes can be used for DE to further reduce its computations. Simulation results demonstrate that this can save up to 95% of the encoding time, with little influence on the coding efficiency of the texture and the depth map multi-view video coding. This makes 3D video coding more practical for any consumer devices, which tend to have limited computational power.peer-reviewe

    Low Complexity Multiview Video Coding

    Get PDF
    3D video is a technology that has seen a tremendous attention in the recent years. Multiview Video Coding (MVC) is an extension of the popular H.264 video coding standard and is commonly used to compress 3D videos. It offers an improvement of 20% to 50% in compression efficiency over simulcast encoding of multiview videos using the conventional H.264 video coding standard. However, there are two important problems associated with it: (i) its superior compression performance comes at the cost of significantly higher computational complexity which hampers the real-world realization of MVC encoder in applications such as 3D live broadcasting and interactive Free Viewpoint Television (FTV), and (ii) compressed 3D videos can suffer from packet loss during transmission, which can degrade the viewing quality of the 3D video at the decoder. This thesis aims to solve these problems by presenting techniques to reduce the computational complexity of the MVC encoder and by proposing a consistent error concealment technique for frame losses in 3D video transmission. The thesis first analyses the complexity of the MVC encoder. It then proposes two novel techniques to reduce the complexity of motion and disparity estimation. The first method achieves complexity reduction in the disparity estimation process by exploiting the relationship between temporal levels, type of macroblocks and search ranges while the second method achieves it by exploiting the geometrical relation- ship between motion and disparity vectors in stereo frames. These two methods are then combined with other state-of-the-art methods in a unique framework where gains add up. Experimental results show that the proposed low-complexity framework can reduce the encoding time of the standard MVC encoder by over 93% while maintaining similar compression efficiency performance. The addition of new View Synthesis Prediction (VSP) modes to the MVC encoding framework improves the compression efficiency of MVC. However, testing additional modes comes at the cost of increased encoding complexity. In order to reduce the encoding complexity, the thesis, next, proposes a bayesian early mode decision technique for a VSP enhanced MVC coder. It exploits the statistical similarities between the RD costs of the VSP SKIP mode in neighbouring views to terminate the mode decision process early. Results indicate that the proposed technique can reduce the encoding time of the enhanced MVC coder by over 33% at similar compression efficiency levels. Finally, compressed 3D videos are usually required to be broadcast to a large number of users where transmission errors can lead to frame losses which can degrade the video quality at the decoder. A simple reconstruction of the lost frames can lead to inconsistent reconstruction of the 3D scene which may negatively affect the viewing experience of a user. In order to solve this problem, the thesis proposes, at the end, a consistency model for recovering frames lost during transmission. The proposed consistency model is used to evaluate inter-view and temporal consistencies while selecting candidate blocks for concealment. Experimental results show that the proposed technique is able to recover the lost frames with high consistency and better quality than two standard error concealment methods and a baseline technique based on the boundary matching algorithm

    Contributions in image and video coding

    Get PDF
    Orientador: Max Henrique Machado CostaTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de ComputaçãoResumo: A comunidade de codificação de imagens e vídeo vem também trabalhando em inovações que vão além das tradicionais técnicas de codificação de imagens e vídeo. Este trabalho é um conjunto de contribuições a vários tópicos que têm recebido crescente interesse de pesquisadores na comunidade, nominalmente, codificação escalável, codificação de baixa complexidade para dispositivos móveis, codificação de vídeo de múltiplas vistas e codificação adaptativa em tempo real. A primeira contribuição estuda o desempenho de três transformadas 3-D rápidas por blocos em um codificador de vídeo de baixa complexidade. O codificador recebeu o nome de Fast Embedded Video Codec (FEVC). Novos métodos de implementação e ordens de varredura são propostos para as transformadas. Os coeficiente 3-D são codificados por planos de bits pelos codificadores de entropia, produzindo um fluxo de bits (bitstream) de saída totalmente embutida. Todas as implementações são feitas usando arquitetura com aritmética inteira de 16 bits. Somente adições e deslocamentos de bits são necessários, o que reduz a complexidade computacional. Mesmo com essas restrições, um bom desempenho em termos de taxa de bits versus distorção pôde ser obtido e os tempos de codificação são significativamente menores (em torno de 160 vezes) quando comparados ao padrão H.264/AVC. A segunda contribuição é a otimização de uma recente abordagem proposta para codificação de vídeo de múltiplas vistas em aplicações de video-conferência e outras aplicações do tipo "unicast" similares. O cenário alvo nessa abordagem é fornecer vídeo com percepção real em 3-D e ponto de vista livre a boas taxas de compressão. Para atingir tal objetivo, pesos são atribuídos a cada vista e mapeados em parâmetros de quantização. Neste trabalho, o mapeamento ad-hoc anteriormente proposto entre pesos e parâmetros de quantização é mostrado ser quase-ótimo para uma fonte Gaussiana e um mapeamento ótimo é derivado para fonte típicas de vídeo. A terceira contribuição explora várias estratégias para varredura adaptativa dos coeficientes da transformada no padrão JPEG XR. A ordem de varredura original, global e adaptativa do JPEG XR é comparada com os métodos de varredura localizados e híbridos propostos neste trabalho. Essas novas ordens não requerem mudanças nem nos outros estágios de codificação e decodificação, nem na definição da bitstream A quarta e última contribuição propõe uma transformada por blocos dependente do sinal. As transformadas hierárquicas usualmente exploram a informação residual entre os níveis no estágio da codificação de entropia, mas não no estágio da transformada. A transformada proposta neste trabalho é uma técnica de compactação de energia que também explora as similaridades estruturais entre os níveis de resolução. A idéia central da técnica é incluir na transformada hierárquica um número de funções de base adaptativas derivadas da resolução menor do sinal. Um codificador de imagens completo foi desenvolvido para medir o desempenho da nova transformada e os resultados obtidos são discutidos neste trabalhoAbstract: The image and video coding community has often been working on new advances that go beyond traditional image and video architectures. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely, scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low complexity video codec. The codec has received the name Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. All implementation is performed using 16-bit integer arithmetic. Only additions and bit shifts are necessary, thus lowering computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved and the encoding time is significantly smaller (around 160 times) when compared to the H.264/AVC standard. The second contribution is the optimization of a recent approach proposed for multiview video coding in videoconferencing applications or other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint video at good compression rates. To achieve such an objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source and an optimum mapping is derived for a typical video source. The third contribution exploits several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders do not require changes in either the other coding and decoding stages or in the bitstream definition. The fourth and last contribution proposes an hierarchical signal dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that can also exploit these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform and the obtained results are discussed in this workDoutoradoTelecomunicações e TelemáticaDoutor em Engenharia Elétric

    Computational Complexity Optimization on H.264 Scalable/Multiview Video Coding

    Get PDF
    The H.264/MPEG-4 Advanced Video Coding (AVC) standard is a high efficiency and flexible video coding standard compared to previous standards. The high efficiency is achieved by utilizing a comprehensive full search motion estimation method. Although the H.264 standard improves the visual quality at low bitrates, it enormously increases the computational complexity. The research described in this thesis focuses on optimization of the computational complexity on H.264 scalable and multiview video coding. Nowadays, video application areas range from multimedia messaging and mobile to high definition television, and they use different type of transmission systems. The Scalable Video Coding (SVC) extension of the H.264/AVC standard is able to scale the video stream in order to adapt to a variety of devices with different capabilities. Furthermore, a rate control scheme is utilized to improve the visual quality under the constraints of capability and channel bandwidth. However, the computational complexity is increased. A simplified rate control scheme is proposed to reduce the computational complexity. In the proposed scheme, the quantisation parameter can be computed directly instead of using the exhaustive Rate-Quantization model. The linear Mean Absolute Distortion (MAD) prediction model is used to predict the scene change, and the quantisation parameter will be increased directly by a threshold when the scene changes abruptly; otherwise, the comprehensive Rate-Quantisation model will be used. Results show that the optimized rate control scheme is efficient on time saving. Multiview Video Coding (MVC) is efficient on reducing the huge amount of data in multiple-view video coding. The inter-view reference frames from the adjacent views are exploited for prediction in addition to the temporal prediction. However, due to the increase in the number of reference frames, the computational complexity is also increased. In order to manage the reference frame efficiently, a phase correlation algorithm is utilized to remove the inefficient inter-view reference frame from the reference list. The dependency between the inter-view reference frame and current frame is decided based on the phase correlation coefficients. If the inter-view reference frame is highly related to the current frame, it is still enabled in the reference list; otherwise, it will be disabled. The experimental results show that the proposed scheme is efficient on time saving and without loss in visual quality and increase in bitrate. The proposed optimization algorithms are efficient in reducing the computational complexity on H.264/AVC extension. The low computational complexity algorithm is useful in the design of future video coding standards, especially on low power handheld devices

    High Performance Multiview Video Coding

    Get PDF
    Following the standardization of the latest video coding standard High Efficiency Video Coding in 2013, in 2014, multiview extension of HEVC (MV-HEVC) was published and brought significantly better compression performance of around 50% for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools such as single-instruction-multiple-data (SIMD) instructions, multi-core CPU, massively parallel GPU, and computer cluster to significantly enhance the MVC encoder performance. The aforementioned computing tools have very different computing characteristics and misuse of the tools may result in poor performance improvement and sometimes even reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed, non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTU, and workload balanced, cluster-based multiple-view parallel are proposed. The result shows proposed parallel optimization techniques, with insignificant loss to coding efficiency, significantly improves the execution time performance. This , in turn, proves modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications

    Improved depth coding for HEVC focusing on depth edge approximation

    Get PDF
    The latest High Efficiency Video Coding (HEVC) standard has greatly improved the coding efficiency compared to its predecessor H.264. An important share of which is the adoption of hierarchical block partitioning structures and an extended number of modes. The structure of existing inter-modes is appropriate mainly to handle the rectangular and square aligned motion patterns. However, they could not be suitable for the block partitioning of depth objects having partial foreground motion with irregular edges and background. In such cases, the HEVC reference test model (HM) normally explores finer level block partitioning that requires more bits and encoding time to compensate large residuals. Since motion detection is the underlying criteria for mode selection, in this work, we use the energy concentration ratio feature of phase correlation to capture different types of motion in depth object. For better motion modeling focusing at depth edges, the proposed technique also uses an extra pattern mode comprising a group of templates with various rectangular and non-rectangular object shapes and edges. As the pattern mode could save bits by encoding only the foreground areas and beat all other inter-modes in a block once selected, the proposed technique could improve the rate-distortion performance. It could also reduce encoding time by skipping further branching using the pattern mode and selecting a subset of modes using innovative pre-processing criteria. Experimentally it could save 29% average encoding time and improve 0.10 dB Bjontegaard Delta peak signal-to-noise ratio compared to the HM

    Dense light field coding: a survey

    Get PDF
    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.info:eu-repo/semantics/publishedVersio

    Fast Motion Estimation Algorithms for Block-Based Video Coding Encoders

    Get PDF
    The objective of my research is reducing the complexity of video coding standards in real-time scalable and multi-view applications
    corecore