34 research outputs found

    Register-transfer level design of sum of absolute transformed difference for high efficiency video coding

    High Efficiency Video Coding (HEVC) is the state-of-the-art video coding standard, offering a 50% improvement in coding efficiency over its predecessor, Advanced Video Coding (AVC). Compared to AVC, HEVC supports up to 33 angular intra prediction modes plus a DC mode and a planar mode. This significant rise in the number of intra prediction modes, however, has increased computational complexity. Sum of Absolute Transformed Difference (SATD), a fast Rate Distortion Optimization (RDO) intra prediction algorithm in the HEVC standard, is one of the most complex and compute-intensive parts of the encoding process. SATD alone can take up to 40% of the total encoding time; hence, off-loading it to dedicated hardware accelerators is necessary to meet the increasing need for real-time video coding alongside the push for coding efficiency. This work proposes a Verilog-described N × N SATD hardware architecture based on the Hadamard Transform. The architecture supports variable block sizes from 4 × 4 to 32 × 32 with 1-D horizontal and 1-D vertical Hadamard Transforms. It is also designed to optimize throughput through pipelining and feedthrough control. The performance of the implemented SATD is then evaluated in terms of utilization, timing and power.
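
As a rough software illustration of the computation this abstract describes, the sketch below computes an unnormalized SATD by applying a 1-D Hadamard transform horizontally and then vertically to the difference block. It is not the authors' Verilog design; the function names `hadamard_1d` and `satd` are hypothetical, and any scaling an encoder might apply to the result is left out.

```python
def hadamard_1d(values):
    """Fast Walsh-Hadamard transform of a list whose length is a power of two."""
    out = list(values)
    half = 1
    while half < len(out):
        # Each stage combines pairs that are `half` positions apart (a butterfly).
        for start in range(0, len(out), 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out


def satd(original, predicted):
    """Unnormalized SATD of two N x N blocks (N a power of two, e.g. 4 to 32)."""
    n = len(original)
    diff = [[original[r][c] - predicted[r][c] for c in range(n)] for r in range(n)]
    rows = [hadamard_1d(row) for row in diff]               # 1-D horizontal pass
    cols = [hadamard_1d(list(col)) for col in zip(*rows)]   # 1-D vertical pass
    return sum(abs(coeff) for col in cols for coeff in col)


if __name__ == "__main__":
    block = [[10, 12, 11, 9], [8, 10, 13, 12], [9, 9, 10, 11], [12, 11, 10, 8]]
    flat = [[10] * 4 for _ in range(4)]
    print(satd(block, flat))
```

The same two functions work for any power-of-two block size, which mirrors the variable block size support mentioned above; a hardware design would replace the loops with pipelined butterfly stages.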

    HEVC video compression hardware designs

    High Efficiency Video Coding (HEVC), a recently developed international standard for video compression, offers significantly better compression efficiency than previous international standards. However, this coding gain comes with an increase in computational complexity. Therefore, in this thesis, we first designed a high-performance hardware architecture for the deblocking filter algorithm used in the HEVC standard. Two parallel datapaths are used in the hardware to increase its performance. The proposed hardware is implemented in Verilog HDL. The Verilog RTL code is mapped to a Xilinx XC6VLX240T FPGA and verified to work correctly on a Xilinx ML605 FPGA board, which includes a Xilinx XC6VLX240T FPGA. The FPGA implementation can work at 108 MHz and can code 30 full HD (1920x1080) video frames per second. We then proposed an energy reduction technique for the Sum of Absolute Transformed Difference (SATD) based HEVC intra mode decision algorithm and designed an efficient hardware architecture for this algorithm that includes the proposed technique. This hardware is also implemented in Verilog HDL. The Verilog RTL code is mapped to a Xilinx XC6VLX365T FPGA and verified with post place & route simulations. The FPGA implementation can work at 116 MHz and can code 21 HD (1280x720) video frames per second. The proposed technique reduced its energy consumption by up to 64.6% on this FPGA without any PSNR loss.

    A computation and energy reduction technique for HEVC intra mode decision

    Hadamard transform improvement for HEVC using Intel AVX-512

    High Efficiency Video Coding (HEVC) doubles the data compression ratio compared to the previous generation of compression technology, Moving Picture Experts Group Advanced Video Coding (MPEG-AVC/H.264), without sacrificing image quality. However, this superior compression comes at the cost of a heavier computational load, resulting in longer encoding and decoding times. This work proposes vectorizing the data-heavy HEVC computation algorithms, namely the Hadamard Transform or Sum of Absolute Transformed Difference (SATD) and the Sum of Absolute Difference (SAD), to achieve optimized compression performance. Single Instruction Multiple Data (SIMD) acceleration is based on the Intel AVX-512 (Advanced Vector Extension) Instruction Set Architecture (ISA). Since HEVC supports more coding tree block (CTB) sizes, the SATD and SAD algorithms become more complex than in AVC. As a result, SATD and SAD algorithms with various block sizes are subjected to SIMD acceleration. We provide a performance evaluation of HEVC SATD and SAD across different SIMD ISAs and without SIMD, and found that the AVX-512 optimized implementation was faster than the non-optimized SATD and SAD but slower than the SSE optimized SATD and SAD.
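
To make the vectorization target concrete, the following sketch shows the SAD computation and a back-of-the-envelope count of how many 512-bit registers one block occupies. It is only an illustration: Python has no SIMD, the 8-bit sample width is an assumption, and the helper names are hypothetical. Real AVX-512 code would be written with intrinsics in C or C++. The abstract does not say why SSE was faster, so no cause is implied here.

```python
LANES_8BIT = 512 // 8  # 64 samples per 512-bit register, assuming 8-bit samples


def sad(block_a, block_b):
    """Sum of absolute differences over two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def registers_needed(width, height, lanes=LANES_8BIT):
    """Full-width vector loads needed to cover one block (ceiling division)."""
    return -(-(width * height) // lanes)


def lane_utilization(width, height, lanes=LANES_8BIT):
    """Fraction of loaded lanes that carry real samples for one block."""
    return (width * height) / (registers_needed(width, height, lanes) * lanes)


if __name__ == "__main__":
    for size in (4, 8, 16, 32, 64):
        print(size, registers_needed(size, size), lane_utilization(size, size))
```

Under these assumptions an 8x8 block fills exactly one register, while a 4x4 block uses only a quarter of the lanes unless several blocks are packed together.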

    Hadamard transform and sum of absolute difference improvement on high efficiency video coding using Intel Advanced Vector Extension-512

    High Efficiency Video Coding (HEVC) doubles the data compression ratio compared to the previous generation of compression technology, Moving Picture Experts Group Advanced Video Coding (MPEG-AVC/H.264), without sacrificing image quality. However, this superior compression comes at the cost of a heavier computational load, resulting in longer encoding and decoding times. Hence, the first objective of this thesis is to vectorize the data-heavy HEVC computation algorithms, namely the Hadamard Transform or Sum of Absolute Transformed Difference (SATD) and the Sum of Absolute Difference (SAD), to achieve optimized compression performance. Single Instruction Multiple Data (SIMD) acceleration is based on the Intel AVX-512 (Advanced Vector Extension) Instruction Set Architecture (ISA). Since HEVC supports more coding tree block (CTB) sizes, the SATD and SAD algorithms become more complex than in AVC. As a result, SATD and SAD algorithms with various block sizes are subjected to SIMD acceleration. The second objective is to provide a performance evaluation of HEVC SATD and SAD across different SIMD ISAs and without SIMD. In the end, the AVX-512 optimized implementation was faster than the non-optimized SATD and SAD but showed slower execution times than the SSE optimized SATD and SAD.

    Design of a hardware architecture for the Block Merging algorithm according to the HEVC standard for real-time 4K video transmission

    Current demands on video content quality, together with the growth in bandwidth devoted to video transmission, drive the need for coding standards that are more efficient in terms of the reconstruction quality of decoded images and the number of bits used for coding. In this context, the new HEVC (High Efficiency Video Coding) standard, also known as H.265, has emerged [1]. Although HEVC has greatly surpassed its predecessors by including new tools such as partitioning the frames of a video sequence into variable-size blocks, as well as new motion vector prediction techniques such as AMVP (Advanced Motion Vector Prediction), the coding stage is not entirely efficient. To improve the coding efficiency of HEVC, the Block Merging technique is incorporated. This technique seeks to exploit temporal and spatial redundancies between neighboring prediction blocks, so that blocks meeting certain selection criteria can be coded jointly [2]. The Block Merging algorithm therefore aims to find, from a list of possible candidates, the prediction block (a block of 8x8, 16x16, 32x32 or 64x64 pixels) whose motion vector has the best associated RD (Rate-Distortion) cost. Coding prediction blocks jointly avoids sending parameters (such as motion vectors) repeatedly, which yields a significant reduction in the number of bytes used when coding video frames, considering that 4K and 8K resolutions are increasingly common in audiovisual content [2]. This work focuses on implementing the best-candidate selection stage using the SATD (Sum of Absolute Transformed Differences) algorithm as part of the Block Merging algorithm.
The architecture was described in Verilog HDL and synthesized on Xilinx Kintex-7 FPGA (Field Programmable Gate Array) devices. The architecture was verified using the RTL (Register Transfer Level) simulator of the Vivado Design Suite together with reference software in MATLAB (Matrix Laboratory), achieving an operating frequency of 263.158 MHz. Considering a list of three candidates and variable partitioning with a Quantization Parameter (QP) in the range of 22 to 36, as considered in previous work, the processing rate of the architecture for 4K (3840x2160) video sequences ranges from 77.30 fps to 145.93 fps, which meets the requirements of the HEVC standard for real-time video transmission.
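
The reported clock frequency and frame rates can be turned into a cycle budget with simple arithmetic. The sketch below does this for the two ends of the reported range; the 64x64 CTU size used to count blocks per frame is an assumption (the abstract only lists 64x64 as the largest prediction block), and the function names are hypothetical.

```python
import math

CLOCK_HZ = 263.158e6          # operating frequency reported in the abstract
FRAME_W, FRAME_H = 3840, 2160  # 4K resolution reported in the abstract


def cycles_per_frame(fps, clock_hz=CLOCK_HZ):
    """Clock cycles available per frame at a given frame rate."""
    return clock_hz / fps


def ctus_per_frame(width=FRAME_W, height=FRAME_H, ctu_size=64):
    """CTUs per frame, rounding partial CTUs at the frame edge up (assumed 64x64 CTU)."""
    return math.ceil(width / ctu_size) * math.ceil(height / ctu_size)


if __name__ == "__main__":
    for fps in (77.30, 145.93):
        per_frame = cycles_per_frame(fps)
        print(fps, round(per_frame), round(per_frame / ctus_per_frame(), 1))
```

A lower frame rate corresponds to more cycles spent per frame, so the 77.30 fps end of the range is the heavier workload for the architecture.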

    Energy-efficient architecture for SATD calculation through data reuse

    TCC (undergraduate thesis) - Universidade Federal de Santa Catarina. Centro Tecnológico. Ciências da Computação. Increasing video resolutions create the need for new video coding techniques. Among several tools, Motion Estimation (ME) is one of the most time- and energy-demanding because of the large number of distortion computations it performs, such as the SATD. This work therefore proposes a new SATD architecture that aims to reduce energy consumption through the reuse of calculations. The architecture was synthesized and simulated with Synopsys tools, and five distinct data sets were used as stimuli for simulation: one randomly generated and the remaining four obtained from video sequences. The results show an area reduction of up to 80% relative to state-of-the-art SATD architectures. Moreover, the energy consumption of the proposed architecture was up to 55% lower than that of its counterparts. The proposed architecture is therefore advantageous when performing variable block size ME with SATD as the distortion metric.
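
The abstract does not describe how calculations are reused, so the following is only one possible form of reuse, shown as an assumption: with the natural-ordered (Sylvester) Hadamard matrix, the 8x8 transform of a difference block can be obtained from the four 4x4 transforms of its quadrants with one extra butterfly stage. The sketch returns the four 4x4 SATDs and the 8x8 SATD from the same intermediate results; all function names are hypothetical.

```python
def hadamard_1d(values):
    """Fast Walsh-Hadamard transform (natural order) of a power-of-two-length list."""
    out = list(values)
    half = 1
    while half < len(out):
        for start in range(0, len(out), 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out


def hadamard_2d(block):
    """Separable 2-D transform: rows first, then columns."""
    rows = [hadamard_1d(row) for row in block]
    cols = [hadamard_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def satd_direct(diff):
    """Unnormalized SATD computed directly on the whole difference block."""
    return sum(abs(x) for row in hadamard_2d(diff) for x in row)


def sub_block(block, r0, c0, n=4):
    return [row[c0:c0 + n] for row in block[r0:r0 + n]]


def satd_8x8_with_reuse(diff):
    """Return ([four 4x4 SATDs], 8x8 SATD), reusing the 4x4 transforms."""
    ta, tb, tc, td = (
        hadamard_2d(sub_block(diff, r, c)) for r, c in ((0, 0), (0, 4), (4, 0), (4, 4))
    )
    satd_4x4 = [sum(abs(x) for row in t for x in row) for t in (ta, tb, tc, td)]
    satd_8x8 = 0
    for i in range(4):
        for j in range(4):
            a, b, c, d = ta[i][j], tb[i][j], tc[i][j], td[i][j]
            # One extra butterfly stage combines the four quadrant coefficients.
            satd_8x8 += (abs(a + b + c + d) + abs(a - b + c - d)
                         + abs(a + b - c - d) + abs(a - b - c + d))
    return satd_4x4, satd_8x8
```

Because the smaller transforms are computed anyway when several block sizes are evaluated, this kind of sharing avoids repeating the first butterfly stages for the larger block.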

    A method for reducing redundant interpolation filter computation for fractional motion estimation in HEVC

    Thesis (Master's) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2016. 이혁재 (Lee Hyuk-jae). High-Efficiency Video Coding (HEVC) [1] is the latest video coding standard established by the Joint Collaborative Team on Video Coding (JCT-VC), aiming to achieve twice the encoding efficiency with comparably high video quality compared to its predecessor, the H.264 standard. Motion Estimation (ME), which consists of integer motion estimation (IME) and fractional motion estimation (FME), is the bottleneck of HEVC computation. In the execution of the HM reference software, ME alone accounts for about 50% of the execution time, of which IME contributes about 20% and FME around 30% [2]. FME's enormous computational complexity can be explained by the following two reasons. (1) A large number of FME refinements are processed: in HEVC, a frame is divided into CTUs, whose size is usually 64x64 pixels. One 64x64 CTU consists of 85 CUs, including one 64x64 CU at depth 0, four 32x32 CUs at depth 1, 16 16x16 CUs at depth 2, and 64 8x8 CUs at depth 3. Each CU can be partitioned into PUs according to a set of 8 allowable partition types. An HEVC encoder processes FME refinement for all possible PUs, usually with 4 reference frames, before deciding the best configuration for a CTU. As a result, HEVC's reference software, HM, typically has to process 2,372 FME refinements for one CTU, which consumes a lot of computational resources. (2) The interpolation process is complicated and redundant: conventionally, FME refinement, which consists of interpolation and sum of absolute transformed difference (SATD), is processed for every PU in 4 reference frames. As a result, for a 64x64 CTU, FME needs to interpolate 6,232,900 fractional pixels to process fractional pixel refinement. In addition, in HEVC, fractional pixels, which consist of half fractional pixels and quarter fractional pixels, are interpolated by 8-tap filters and 7-tap filters instead of the 6-tap filters and bilinear filters of previous standards.
As a result, the interpolation process in FME imposes an extreme computational burden on HEVC encoders. This work proposes two algorithms, each tackling one of the two reasons above. The first algorithm, Advanced Decision of PU Partitions and CU Depths for FME, estimates the cost of IMEs and selects the PU partition types at the CU level and the CU depths at the coding tree unit (CTU) level for FME. Experimental results show that the algorithm effectively reduces the complexity by 67.47% with a BD-BR degradation of 1.08%. The second algorithm, A Reduction of the Interpolation Redundancy for FME, reduces interpolation computation by up to 86.46% without any decrease in encoding performance. The combination of the two algorithms forms a coherent solution to reduce the complexity of FME. Considering that interpolation is half of the complexity of an FME refinement, the complexity of FME could be reduced by more than 85% with a BD-BR increase of 1.66%.
Contents: Chapter 1. Introduction: 1. Introduction to Video Coding (1.1. Definition of Video Coding; 1.2. The Need of Video Coding; 1.3. Basics of Video Coding; 1.4. Video Coding Standard); 2. Introduction to HEVC (2.1. HEVC Background and Development; 2.2. Block Partitioning Structure in HEVC). Chapter 2. Fractional Motion Estimation in HEVC and Related Works on Complexity Reduction: 1. Motion Estimation; 2. Fractional Motion Estimation (2.1. Interpolation; 2.2. Sum of Absolute Transformed Difference Calculation; 2.3. Fractional Motion Estimation Procedure). Chapter 3. Complexity Reduction for FME: 1. Problem Statement and Previous Studies (1.1. Problem Statement; 1.2. Previous Studies); 2. Proposed Algorithms (2.1. Advanced Decision of PU Partitions and CU Depths for Fractional Motion Estimation in HEVC; 2.2. Range-based interpolation algorithm). Chapter 4. Experiment Results: 1. Advanced Decision of PU Partitions and CU Depths for Fractional Motion Estimation in HEVC Algorithms (1.1. Advanced Decision of PU Partitions; 1.2. Advanced Decision of CU Partitions; 1.3. Combination of Advanced PU Partition and CU Depth Decision; 1.4. Comparison with Other Similar Works); 2. Range-based Algorithm (2.1. Software Implementation; 2.2. Hardware Implementation of the Algorithm). Chapter 5. Conclusion. Bibliography. Abstract in Korean.
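
The block counts quoted in this abstract can be checked with a few lines of arithmetic. The CU count follows directly from the quadtree description (4^d CUs at depth d). The abstract does not say how the 2,372 FME refinements are derived; the PU counts per CU size below are an assumption that happens to reproduce the figure (13 PUs for CUs of 16x16 and larger, covering 2Nx2N, 2NxN, Nx2N and four asymmetric partitions, and 5 PUs for 8x8 CUs, covering 2Nx2N, 2NxN and Nx2N only). Names such as `ASSUMED_PUS_PER_CU` are hypothetical.

```python
CTU_SIZE = 64


def cus_per_ctu(max_depth=3):
    """CUs in one CTU when every quadtree depth from 0 to max_depth is evaluated."""
    return sum(4 ** depth for depth in range(max_depth + 1))


# Assumed PU count per CU, keyed by CU size (not stated in the abstract).
ASSUMED_PUS_PER_CU = {64: 13, 32: 13, 16: 13, 8: 5}


def fme_refinements_per_ctu(ref_frames=4, pus_per_cu=ASSUMED_PUS_PER_CU):
    """FME refinements per CTU: one per PU per reference frame."""
    total_pus = sum((CTU_SIZE // size) ** 2 * pus for size, pus in pus_per_cu.items())
    return total_pus * ref_frames


if __name__ == "__main__":
    print(cus_per_ctu())              # 85
    print(fme_refinements_per_ctu())  # 2372
```

Under this assumed breakdown, the 8x8 CUs account for 320 of the 593 PUs per reference frame, which suggests why decisions that prune CU depths early can remove a large share of the FME work.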