182 research outputs found

    Contributions in image and video coding

    Advisor: Max Henrique Machado Costa. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: The image and video coding community has been working on advances that go beyond traditional image and video coding architectures. This work is a set of contributions to several topics that have received increasing attention from researchers in the community, namely scalable coding, low-complexity coding for portable devices, multiview video coding, and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low-complexity video codec, named the Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. The entire implementation uses 16-bit integer arithmetic. Only additions and bit shifts are necessary, which lowers computational complexity. Even with these constraints, reasonable rate-distortion performance can be achieved, and the encoding time is significantly shorter (by a factor of around 160) than that of the H.264/AVC standard. The second contribution is the optimization of a recently proposed approach to multiview video coding for videoconferencing and similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with a free viewpoint at good compression rates. To achieve this objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad hoc mapping between weights and quantization parameters is shown to be quasi-optimal for a Gaussian source, and an optimal mapping is derived for typical video sources. The third contribution explores several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders require no changes to the other coding and decoding stages or to the bitstream definition. The fourth and last contribution proposes a hierarchical signal-dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that also exploits these structural similarities across resolution levels. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed to measure the performance of the new transform, and the results obtained are discussed in this work. Doctorate. Telecommunications and Telematics. Doctor of Electrical Engineering.
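
    As an illustration of a transform that needs only additions and bit shifts, here is a minimal Python sketch of a reversible two-point integer transform (the S-transform, an integer Haar step). It is an assumption chosen for illustration; the abstract does not say which three transforms FEVC uses, and the function names are hypothetical.

```python
def forward_s(a, b):
    """Two-point integer transform built from one subtraction, one addition, one shift."""
    h = a - b              # high-pass: difference of the pair
    l = b + (h >> 1)       # low-pass: floor of the pair's average
    return l, h


def inverse_s(l, h):
    """Exact inverse of forward_s; it undoes the same steps in reverse order."""
    b = l - (h >> 1)
    a = b + h
    return a, b


# Round trip on one pair of samples.
low, high = forward_s(5, 2)
restored = inverse_s(low, high)
```

    Because the inverse subtracts exactly the quantity the forward step added, the round trip is lossless for any pair of integers, including negative ones.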

    Block based Rate-Distortion analysis for quality improvement of synthesized views

    We present a preliminary study on the Rate-Distortion (RD) gain that can be achieved by applying RD optimization techniques in a multiview plus depth encoder. We consider the use of Multiview Video Coding (MVC) for both color and depth sequences, and evaluate the improvement that can be obtained by allowing quantization parameter (QP) assignment on a macroblock basis, compared to the use of a fixed QP for the whole sequence. The optimization criterion is the minimization of the distortion of the synthesized views generated at the receiver. Our motivation for this criterion is to capture the impact of depth coding according to its final purpose: the generation of virtual views. Since no single objective quality metric for evaluating view synthesis artifacts has been established yet, the performance of several algorithms for quality evaluation of the target synthesized view has been compared. Beyond obtaining better RD performance, as could be expected, the results also show that optimized synthesized views achieve lower absolute distortion values than the best result of the approach that uses a fixed QP for the whole sequence.
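
    Per-macroblock QP assignment of this kind is commonly posed as a Lagrangian minimization of D + lambda * R. The sketch below assumes that formulation; the abstract does not state the exact cost function, and the candidate numbers and the name choose_qp are hypothetical.

```python
def choose_qp(candidates, lam):
    """Return the QP whose (rate, distortion) pair minimizes D + lam * R.

    candidates: list of (qp, rate_bits, distortion) tuples for one macroblock,
    where distortion would be measured on the synthesized view.
    """
    return min(candidates, key=lambda c: c[2] + lam * c[1])[0]


# Hypothetical measurements for a single macroblock (not from the paper).
mb = [(26, 900, 10.0), (30, 600, 18.0), (34, 400, 35.0)]

qp_low_lambda = choose_qp(mb, 0.02)   # small lambda: distortion dominates the cost
qp_high_lambda = choose_qp(mb, 0.10)  # large lambda: rate dominates the cost
```

    With these made-up numbers, a small lambda selects the finest QP and a large lambda selects the coarsest, which is the trade-off a fixed QP for the whole sequence cannot make block by block.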

    Methodology and optimizing of multiple frame format buffering within FPGA H.264/AVC decoder with FRExt.

    Digital representation of video data is an inherently resource-demanding problem that continues to necessitate the development and refinement of coding methods. The H.264/AVC standard, along with its recent Fidelity Range Extensions amendment (FRExt), is quickly being adopted as the standard codec for broadcast and distribution of high definition video. The FRExt amendment, while not necessarily affecting the overall decoder architecture, adds the complexity of providing efficient memory management for buffering intermediate frames of various pixel color samplings and depths. This thesis evaluated the design of the frame buffer of a hardware video decoder with integrated support for the H.264/AVC codec plus FRExt. With a focus on organizing external memory data access, the frame buffer was designed to provide intermediate data storage for the decoder, while using an efficient store and load scheme that takes into consideration each frame's pixel format. VHDL was used to model the frame buffer. Exploitation of reconfigurability and post-synthesis FPGA simulations were used to evaluate behavior, scalability and power consumption, while providing an analysis of approaches to adding FRExt to the memory management. Real-time buffer performance was achieved for two common frame formats at 1080 HD resolution, and an innovative pipeline design provides dynamic switching of formats between video sequences. As an additional consequence of verifying the model, a preexisting Baseline H.264/AVC decoder testbench was augmented to support testing of multiple frame formats.
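
    To see why pixel format matters for buffer sizing, the following Python sketch computes the storage one frame needs under different chroma samplings and bit depths. It assumes tightly packed samples and display dimensions of 1920x1080; real hardware may pad 10-bit samples to 16 bits or align the height to macroblock boundaries, and the thesis's actual memory layout is not described in the abstract.

```python
# Chroma samples per luma sample, expressed in halves to stay in integer arithmetic.
CHROMA_HALVES = {"4:0:0": 0, "4:2:0": 1, "4:2:2": 2, "4:4:4": 4}


def frame_bytes(width, height, chroma, bit_depth):
    """Bytes needed for one frame, assuming samples are packed with no padding."""
    luma = width * height
    chroma_samples = luma * CHROMA_HALVES[chroma] // 2
    total_bits = (luma + chroma_samples) * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


size_420_8bit = frame_bytes(1920, 1080, "4:2:0", 8)    # 3,110,400 bytes
size_422_10bit = frame_bytes(1920, 1080, "4:2:2", 10)  # 5,184,000 bytes
```

    Under these assumptions a 4:2:2 10-bit frame needs about 1.67 times the storage of a 4:2:0 8-bit frame, so a buffer that switches formats between sequences has to account for both footprints.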

    Parallel algorithms and architectures for low power video decoding

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 197-204). Parallelism coupled with voltage scaling is an effective approach to achieving high processing performance with low power consumption. This thesis presents parallel architectures and algorithms designed to deliver the power and performance required for current and next-generation video coding. Coding efficiency, area cost and scalability are also addressed. First, a low-power video decoder is presented for the current state-of-the-art video coding standard, H.264/AVC. Parallel architectures are used along with voltage scaling to deliver high definition (HD) decoding at low power levels. Additional architectural optimizations, such as reducing memory accesses and using multiple frequency/voltage domains, are also described. An H.264/AVC Baseline decoder test chip was fabricated in 65-nm CMOS. It can operate at 0.7 V for HD (720p, 30 fps) video decoding with a measured power of 1.8 mW. The highly scalable decoder can trade off power and performance across a >100x range. Second, this thesis demonstrates how serial algorithms, such as Context-based Adaptive Binary Arithmetic Coding (CABAC), can be redesigned for parallel architectures to enable high throughput with low coding efficiency cost. A parallel algorithm called the Massively Parallel CABAC (MP-CABAC) is presented that uses syntax element partitions and interleaved entropy slices to achieve better throughput-coding efficiency and throughput-area tradeoffs than H.264/AVC. The parallel algorithm also improves scalability by providing a third dimension along which to trade off coding efficiency for power and performance. Finally, joint algorithm-architecture optimizations are used to increase performance and reduce area with almost no coding penalty. The MP-CABAC is mapped to a highly parallel architecture with 80 parallel engines, which together deliver >10x higher throughput than existing H.264/AVC CABAC implementations. An MP-CABAC test chip was fabricated in 65-nm CMOS to demonstrate the power-performance-coding efficiency tradeoff. By Vivienne Sze. Ph.D.

    Design, Implementation, and Evaluation of a Point Cloud Codec for Tele-Immersive Video


    Video Quality Assessment in Video Streaming Services:Encoder Performance Comparison


    Hardware Software Synthesis of a H.264 / AVC Baseline Profile Decoder

    The latest video compression standard is a joint effort between ITU and MPEG known as H.264/AVC. As with any video compression standard, H.264/AVC uses computationally intensive algorithms to maximize performance. During decompression these algorithms must be applied in real time, processing 30 frames a second. This can be done in software, specialized hardware, or a combination of the two. Software solutions allow for maximum portability and ease of design, but General Purpose Processors (GPPs) cannot take full advantage of the parallelizable algorithms that the H.264 decoder is based upon. Specialized hardware solutions, on the other hand, allow concurrent data and instruction paths, but do not offer a high level of abstraction for cross-platform development. Recent work by Xilinx has resulted in the advent of the MicroBlaze soft-processor, a stand-alone microcontroller built from an FPGA. The MicroBlaze provides a specialized hardware medium to run software on-chip with VHDL entities. The goal of this thesis was to model and simulate a hybrid software-hardware H.264/AVC Baseline Profile decoder using VHDL and a soft-processor. It was proposed to assign all highly sequential calculations (run-length and CAVLC decoding) and control data flow to software, and to perform the remaining calculations (prediction, inverse transform, inverse quantization, etc.) in hardware modules. The software runs on Xilinx's MicroBlaze soft-processor and the hardware was designed using VHDL. A major advantage of soft-processors over GPPs is that hardware instantiations reside on-chip with the processor. The software and MicroBlaze soft-processor were simulated in a test bench, and the results showed that the MicroBlaze could not handle the encoded bitstream in real time. For this reason the hardware interface and hardware decoder were never fully implemented. The scope of the thesis covers the H.264 Baseline Profile standard, the MicroBlaze processor, the implemented software solution, and the proposed hardware counterpart.

    Depth-based Multi-View 3D Video Coding


    Variable Block Size Motion Compensation In The Redundant Wavelet Domain

    Video is one of the most powerful forms of multimedia because of the extensive information it delivers. Video sequences are highly correlated both temporally and spatially, a fact which makes the compression of video possible. Modern video systems employ motion estimation and motion compensation (ME/MC) to de-correlate a video sequence temporally. ME/MC forms a prediction of the current frame using frames that have already been encoded. Consequently, one needs to transmit only the corresponding residual image instead of the original frame, together with a set of motion vectors that describe the scene motion as observed at the encoder. The redundant wavelet transform (RDWT) provides several advantages over the conventional discrete wavelet transform (DWT). The RDWT overcomes the shift-variance problem of the DWT. Moreover, the RDWT retains all the phase information of the wavelet coefficients and provides multiple prediction possibilities for ME/MC in the wavelet domain. The general idea of the variable size block motion compensation (VSBMC) technique is to partition a frame in such a way that regions with uniform translational motion are divided into larger blocks, while those containing complicated motion are divided into smaller blocks, leading to an adaptive distribution of motion vectors (MVs) across the frame. The research proposed new adaptive partitioning schemes and decision criteria in the RDWT domain that make more effective use of the motion content of a frame in terms of various block sizes. The research also proposed a selective subpixel accuracy algorithm for the motion vectors using a multiband approach. The selective subpixel accuracy reduces the computation required by the conventional subpixel algorithm while maintaining the same accuracy. In addition, overlapped block motion compensation (OBMC) is used to reduce blocking artifacts. Finally, the research extends the application of the proposed VSBMC to 3D video sequences. The experimental results obtained have shown that VSBMC in the RDWT domain can be a powerful tool for video compression.
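
    The partitioning idea (large blocks where motion is uniform, small blocks where it is not) can be sketched as a quadtree split in Python. This is a simplified pixel-domain illustration with a hypothetical uniformity test; the abstract does not give the actual RDWT-domain decision criteria, and the names partition and field are assumptions.

```python
def partition(field, x, y, size, min_size=1):
    """Recursively split a square region until its motion vectors are uniform.

    field: 2-D list of (dx, dy) motion vectors at minimum-block granularity.
    Returns a list of (x, y, size) blocks covering the region.
    """
    vectors = {field[j][i] for j in range(y, y + size) for i in range(x, x + size)}
    if len(vectors) == 1 or size <= min_size:
        return [(x, y, size)]       # uniform motion (or smallest size): keep one block
    half = size // 2
    blocks = []
    for oy in (0, half):            # visit the four quadrants in raster order
        for ox in (0, half):
            blocks += partition(field, x + ox, y + oy, half, min_size)
    return blocks


# Hypothetical 4x4 motion field: uniform except for one cell in the bottom-right corner.
field = [[(1, 0)] * 4 for _ in range(4)]
field[3][3] = (0, 2)
blocks = partition(field, 0, 0, 4)
```

    Here three quadrants stay as single 2x2 blocks and only the quadrant containing the outlier is split further, so motion vectors are spent where the motion is complicated.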

    Foveated Video Streaming for Cloud Gaming

    Video gaming is generally a computationally intensive application, and providing a pleasant user experience may require specialized hardware such as Graphics Processing Units. Computational resources and power consumption are constraints which limit visually complex gaming on, for example, laptops, tablets and smartphones. Cloud gaming may be a possible approach towards providing a pleasant gaming experience on thin clients which have limited computational and energy resources. In a cloud gaming architecture, the gameplay video is rendered and encoded in the cloud and streamed to a client where it is displayed. User inputs are captured at the client and streamed back to the server, where they are relayed to the game. High quality of experience requires the streamed video to be of high visual quality, which translates to substantial downstream bandwidth requirements. The visual perception of the human eye is non-uniform, being maximum along the optical axis of the eye and dropping off rapidly away from it. This phenomenon, called foveation, makes the practice of encoding all areas of a video frame with the same resolution wasteful. In this thesis, foveated video streaming from a cloud gaming server to a cloud gaming client is investigated. A prototype cloud gaming system with foveated video streaming is implemented. The cloud gaming server of the prototype is configured to encode gameplay video in a foveated fashion based on gaze location data provided by the cloud gaming client. The effect of foveated encoding on the output bitrate of the streamed video is investigated. Measurements are performed using games from various genres and with different player points of view to explore changes in video bitrate with different parameters of foveation. Latencies involved in foveated video streaming for cloud gaming, including the latency of the eye tracker used in the thesis, are also briefly discussed.
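
    One way to encode "in a foveated fashion" is to raise the quantization parameter with distance from the gaze point. The Python sketch below assumes a Gaussian falloff; the abstract does not specify the mapping the prototype uses, and qp_offset, sigma and max_offset are hypothetical names and parameters.

```python
import math


def qp_offset(mb_x, mb_y, gaze_x, gaze_y, sigma, max_offset):
    """QP increase for a macroblock, growing smoothly with distance from the gaze point.

    Coordinates are in macroblock units; sigma sets the width of the
    high-quality region and max_offset caps the quality reduction.
    """
    d2 = (mb_x - gaze_x) ** 2 + (mb_y - gaze_y) ** 2
    return round(max_offset * (1.0 - math.exp(-d2 / (2.0 * sigma ** 2))))


# Offsets along one macroblock row, with the gaze at column 20 of that row.
row_offsets = [qp_offset(x, 10, 20, 10, sigma=6.0, max_offset=10) for x in range(40)]
```

    The offset is zero at the gaze location and approaches max_offset in the periphery, so bits are concentrated where visual acuity is highest. Varying sigma and max_offset corresponds to the "parameters of foveation" whose effect on bitrate the thesis measures.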