26 research outputs found

    Low complexity in-loop perceptual video coding

    The tradition of broadcast video is today complemented by user-generated content, as portable devices support video coding. Similarly, computing is becoming ubiquitous, with the Internet of Things (IoT) incorporating heterogeneous networks to communicate with personal and/or infrastructure devices. In both cases the emphasis is on bandwidth and processor efficiency, which means increasing the signalling options in video encoding. Consequently, pixel-difference assessment applies a uniform cost in order to be processor efficient, whereas the Human Visual System (HVS) has non-uniform sensitivity that depends on lighting, edges and textures. Existing perceptual assessments are natively incompatible and processor demanding, making perceptual video coding (PVC) unsuitable for these environments. This research enables existing perceptual assessment at the native level using low complexity techniques, before producing new pixel-based image quality assessments (IQAs). To manage these IQAs, a framework was developed and implemented in the High Efficiency Video Coding (HEVC) encoder. This resulted in bit redistribution, where more bits and smaller partitioning were allocated to perceptually significant regions. Using an HEVC-optimised processor, the timing increase was < +4% and < +6% for video streaming and recording applications respectively, one third of that of an existing low complexity PVC solution. Future work should be directed towards perceptual quantisation, which offers the potential for perceptual coding gain.

    HEVC video compression hardware designs

    High Efficiency Video Coding (HEVC), a recently developed international standard for video compression, offers significantly better video compression efficiency than previous international standards. However, this coding gain comes with an increase in computational complexity. Therefore, in this thesis, we first designed a high performance hardware architecture for the deblocking filter algorithm used in the HEVC standard. Two parallel datapaths are used in the hardware to increase its performance. The proposed hardware is implemented in Verilog HDL. The Verilog RTL code is mapped to a Xilinx XC6VLX240T FPGA, and it is verified to work correctly on a Xilinx ML605 FPGA board, which includes a Xilinx XC6VLX240T FPGA. The FPGA implementation can work at 108 MHz, and it can code 30 full HD (1920x1080) video frames per second. We then proposed an energy reduction technique for the Sum of Absolute Transformed Difference (SATD) based HEVC intra mode decision algorithm. We designed an efficient hardware architecture for the SATD based HEVC intra mode decision algorithm, including the proposed technique. The proposed hardware is implemented in Verilog HDL. The Verilog RTL code is mapped to a Xilinx XC6VLX365T FPGA, and it is verified with post place & route simulations. The FPGA implementation can work at 116 MHz, and it can code 21 HD (1280x720) video frames per second. The proposed technique reduced its energy consumption by up to 64.6% on this FPGA without any PSNR loss.

    Power consumption reduction techniques for H.264 video compression hardware

    Video compression systems are used in many commercial products such as digital camcorders, cellular phones and video teleconferencing systems. H.264 / MPEG4 Part 10, the recently developed international standard for video compression, offers significantly better compression efficiency than previous video compression standards. However, this compression efficiency comes with an increase in encoding complexity and therefore in power consumption. Since portable devices run on batteries, it is important to reduce power consumption so that battery life can be increased. In addition, consuming excessive power degrades the performance of integrated circuits, increases packaging and cooling costs, reduces reliability and may cause device failures. In this thesis, we propose novel computational complexity and power reduction techniques for the intra prediction, deblocking filter (DBF), and intra mode decision modules of H.264 video encoder hardware, and for intra prediction with template matching (TM) hardware. We quantified the computation reductions achieved by these techniques using the H.264 Joint Model reference software encoder. We designed efficient hardware architectures for these video compression algorithms and implemented them in Verilog HDL. We mapped these hardware implementations to Xilinx Virtex FPGAs and estimated their power consumption using the Xilinx XPower Analyzer tool. We integrated the proposed techniques into these hardware implementations and quantified their impact on power consumption on Xilinx Virtex FPGAs. The proposed techniques significantly reduced the power consumption of these FPGA implementations, in some cases with no PSNR loss and in some cases with very small PSNR loss.

    Arquitetura energeticamente eficiente para cálculo da SATD através do reúso de dados

    Undergraduate thesis (TCC) - Universidade Federal de Santa Catarina. Centro Tecnológico. Ciências da Computação. The continuing increase in digital video resolutions brings the need for new video coding techniques. Among several tools, Motion Estimation (ME) is one of the most time and energy demanding, due to the large number of block similarity (distortion) computations it performs, such as the SATD. Thus, this work proposes a new SATD architecture that aims to reduce energy consumption through the reuse of calculations. The architecture was synthesized and simulated with Synopsys tools, and five distinct data sets were used as stimuli for simulation: one randomly generated and the remaining four obtained from video sequences. The results show an area reduction of up to 80% relative to state-of-the-art SATD architectures. Moreover, the proposed architecture was up to 55% more energy efficient than its counterparts. Therefore, the proposed architecture is advantageous when multiple block sizes must be computed, as in variable block size ME using SATD as the distortion metric.
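For readers unfamiliar with the metric named in the abstract above, the following is a minimal sketch of a 4x4 SATD: the residual between two blocks is passed through a 2-D Hadamard transform and the absolute values of the coefficients are summed. The function names `hadamard4` and `satd4x4` are illustrative assumptions, no normalisation factor is applied (encoders differ on this), and the sketch does not model the calculation-reuse architecture proposed in the thesis.

```python
def hadamard4(v):
    """1-D 4-point Hadamard transform using two butterfly stages."""
    a0, a1 = v[0] + v[1], v[0] - v[1]
    a2, a3 = v[2] + v[3], v[2] - v[3]
    # Coefficient order is not the natural Hadamard order; this does not
    # matter for SATD because only the absolute values are summed.
    return [a0 + a2, a1 + a3, a0 - a2, a1 - a3]


def satd4x4(cur, ref):
    """Sum of absolute transformed differences of two 4x4 blocks (unnormalised)."""
    # Residual block: current minus reference, sample by sample.
    diff = [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]
    # Transform every row, then every column of the row-transformed block.
    rows = [hadamard4(row) for row in diff]
    cols = [hadamard4([rows[i][j] for i in range(4)]) for j in range(4)]
    return sum(abs(x) for col in cols for x in col)


if __name__ == "__main__":
    block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    shifted = [[x + 1 for x in row] for row in block]
    print(satd4x4(block, block))    # identical blocks
    print(satd4x4(shifted, block))  # constant offset of 1
```

A constant offset between the blocks concentrates all energy in the single DC coefficient, which is why SATD tends to track coded residual cost more closely than a plain sum of absolute differences.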

    Low power H.264 video compression hardware designs

    Video compression systems are used in many commercial products such as digital camcorders, cellular phones and video teleconferencing systems. H.264 / MPEG4 Part 10, the recently developed international standard for video compression, offers significantly better video compression efficiency than previous international standards. However, this coding gain comes with an increase in encoding complexity and therefore in power consumption. Since portable devices run on batteries, it is important to reduce power consumption so that battery life can be increased. In addition, consuming excessive power degrades the performance of integrated circuits, increases packaging and cooling costs, reduces reliability and may cause device failures. Therefore, power consumption is an important design metric for video compression hardware. In this thesis, we propose low power hardware designs for the Deblocking Filter (DBF), intra prediction and intra mode decision parts of an H.264 video encoder. The proposed hardware architectures are implemented in Verilog HDL and mapped to a Xilinx Virtex II FPGA. We performed a detailed power consumption analysis of the FPGA implementations of these hardware designs using the Xilinx XPower tool. We also measured the power consumption of the DBF hardware implementations on a Xilinx Virtex II FPGA, and there is a good match between the estimated and measured power consumption results. We then worked on decreasing the power consumption of the FPGA implementations of these H.264 video compression hardware designs by reducing switching activity using Register Transfer Level (RTL) low power techniques. We applied several RTL low power techniques, such as clock gating and glitch reduction, to these designs and quantified their impact on the power consumption of the FPGA implementations. We proposed novel computational complexity and power reduction techniques which avoid unnecessary calculations in the DBF, intra prediction and intra mode decision parts of an H.264 video encoder. We quantified the computation reductions achieved by the proposed techniques using the H.264 Joint Model software encoder. We applied these techniques to the proposed hardware designs and quantified their impact on the power consumption of the FPGA implementations of these designs.

    Complexity adaptation in video encoders for power limited platforms

    With the emergence of video services on power limited platforms, it is necessary to consider both performance-centric and constraint-centric signal processing techniques. Traditionally, video applications have a bandwidth constraint, a computational resource constraint, or both. The recent H.264/AVC video compression standard offers significantly improved efficiency and flexibility compared to previous standards, which leads to less emphasis on bandwidth. However, its high computational complexity is a problem for codecs running on power limited platforms. Therefore, a technique that integrates both complexity and bandwidth issues in a single framework should be considered. In this thesis we investigate complexity adaptation of a video coder, which focuses on managing computational complexity and provides significant complexity savings when applied to recent standards. It consists of three sub-functions specially designed for reducing complexity and a framework for using them: Variable Block Size (VBS) partitioning, fast motion estimation, skip macroblock detection, and the complexity adaptation framework. Firstly, the VBS partitioning algorithm based on the Walsh Hadamard Transform (WHT) is presented. The key idea is to segment regions of an image as edges or flat regions, based on the fact that prediction errors are mainly affected by edges. Secondly, a fast motion estimation algorithm called Fast Walsh Boundary Search (FWBS) is presented for the VBS partitioned images. Its results outperform other commonly used fast algorithms. Thirdly, a skip macroblock detection algorithm is proposed for use prior to motion estimation; it works by estimating the Discrete Cosine Transform (DCT) coefficients after quantisation. A new orthogonal transform called the S-transform is presented for predicting Integer DCT coefficients from Walsh Hadamard Transform coefficients. Complexity saving is achieved by deciding which macroblocks need to be processed and which can be skipped without processing. Simulation results show that the proposed algorithm achieves significant complexity savings with a negligible loss in rate-distortion performance. Finally, a complexity adaptation framework which combines all three techniques mentioned above is proposed for maximizing the perceptual quality of coded video on a complexity constrained platform.

    Reuse Buffer Architecture for Reducing the Computational Complexity of Inter Prediction in HEVC encoder

    Thesis (Master's) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2015. 2. 채수익. Motion estimation (ME) consists of integer ME (IME) and fractional ME (FME). IME finds the integer motion vector (IMV) for every individual prediction unit (PU), while FME generates the fractional pixels between integer positions and then finds the fractional motion vector (FMV) for each PU. The initial search point for IME is selected from the AMVP candidate list, which, given the pipeline structure of the HEVC encoder, is built from the MVs of the available neighbours of the current PU. In IME, a low complexity RD cost (LRD cost) based on the sum of absolute difference (SAD) is computed for the search points selected by the TZS algorithm, and the search point with the minimum RD cost is selected as the predicted motion vector. In FME, fractional pixels are first generated by an interpolation filter that uses 7 or 8 integer pixels; then, among the search points around the IMV, the one with the minimum RD cost computed with the sum of absolute transformed difference (SATD) is selected as the fractional motion vector. The RDO search algorithm of the HM encoder finds MVs for all possible PU partitions of all possible CU partitions within a coding tree unit (CTU), for every reference picture. Therefore, when bi-prediction is also considered, more than 1000 motion vectors can exist per CTU. To find one IMV, the TZS algorithm examines more than 100 search points on average. To find an FMV, the FME algorithm examines at least 16 search points. Consequently, SAD, SATD and interpolation account for a large part of the total computational complexity of the HEVC encoder. This thesis presents a modified TZS algorithm that reduces the number of SAD computations while increasing computational redundancy, in order to build a reuse buffer that makes it easier to reduce computational complexity. In addition, it proposes an architecture that stores SAD, SATD and interpolation results in an on-chip buffer and reuses them during ME, thereby reducing redundant computation. In particular, for SAD and SATD, a hierarchical reuse buffer with a cache structure that can be processed in parallel for each CU depth is applied to manage the stored data efficiently. Experiments with the proposed reuse buffer architecture that take parallel execution into account show savings of about 34.4% of all SAD computations using 20 KB of on-chip memory, 18.3% of all SATD computations using 20 KB of on-chip memory, and 50% of all interpolation computations using 256 KB. Considering that SAD, SATD and interpolation account for about 50% of the whole encoder, this means that the proposed reuse buffer can reduce the computational complexity of the whole encoder by about 18%. This comes with a performance degradation of 0.35%, comprising 0.09% due to the modified TZS algorithm and 0.26% caused by the change in initial ME conditions due to the pipeline structure.
    Contents: Chapter 1 Introduction (1.1 Background of the research; 1.2 Scope of the research; 1.3 Prior work; 1.4 Organisation of the thesis). Chapter 2 Inter Prediction Flow (2.1 Inter Mode Decision; 2.2 Motion Estimation: 2.2.1 IME, 2.2.2 FME). Chapter 3 Pipeline organisation of the assumed HEVC encoder (3.1 Organisation of the assumed pipeline; 3.2 BD-rate degradation due to Pseudo-AMVP list construction). Chapter 4 SAD Data Reuse algorithm (4.1 Scope of SAD data reuse; 4.2 Redundancy rate of SAD data; 4.3 TZS algorithm modification: 4.3.1 Star Refinement, 4.3.2 Grid Search, 4.3.3 Raster Search; 4.4 SAD sub sampling; 4.5 SAD Data Reuse Process). Chapter 5 SATD Data Reuse algorithm (5.1 Redundancy rate of SATD data; 5.2 SATD Data Reuse Process). Chapter 6 Interpolation Data Reuse algorithm. Chapter 7 Reuse Buffer architecture (7.1 Reuse Buffer Architecture for SAD/SATD; 7.2 Reuse Buffer Architecture for interpolation). Chapter 8 Experimental results (8.1 SAD results; 8.2 SATD results; 8.3 Interpolation results; 8.4 Data Reuse Throughput: 8.4.1 Design Target, 8.4.2 Throughput calculation, 8.4.3 SAD cycle calculation, 8.4.4 SATD cycle calculation). Chapter 9 Conclusion. References. Abstract.
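The reuse idea in the abstract above can be sketched in software as memoisation: a SAD result is stored under a key and returned from the store when the same block and candidate motion vector are visited again. This is only an illustration under stated assumptions. The class name `SadReuseBuffer`, the key layout and the flat dictionary are hypothetical, and the thesis itself describes a hierarchical, cache-structured on-chip buffer organised per CU depth rather than an unbounded table.

```python
class SadReuseBuffer:
    """Hypothetical flat SAD cache; the thesis uses a hierarchical on-chip buffer."""

    def __init__(self, cur, ref):
        self.cur = cur      # current frame, list of rows of samples
        self.ref = ref      # reference frame, same layout
        self.cache = {}     # (x, y, size, mvx, mvy) -> SAD value
        self.hits = 0
        self.misses = 0

    def sad(self, x, y, size, mvx, mvy):
        """SAD of the size x size block at (x, y) against ref displaced by (mvx, mvy)."""
        key = (x, y, size, mvx, mvy)
        if key in self.cache:
            # Same block and candidate seen before: reuse the stored result.
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        total = 0
        for j in range(size):
            for i in range(size):
                total += abs(self.cur[y + j][x + i]
                             - self.ref[y + j + mvy][x + i + mvx])
        self.cache[key] = total
        return total


if __name__ == "__main__":
    cur = [[10] * 8 for _ in range(8)]
    ref = [[7] * 8 for _ in range(8)]
    buf = SadReuseBuffer(cur, ref)
    first = buf.sad(0, 0, 4, 1, 1)   # computed
    second = buf.sad(0, 0, 4, 1, 1)  # served from the buffer
    print(first, second, buf.hits, buf.misses)
```

The hit and miss counters give a rough software analogue of the redundancy rate the thesis measures; a search pattern that revisits more candidates yields more hits, which is the motivation the abstract gives for modifying TZS.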

    Algorithms & implementation of advanced video coding standards

    Advanced video coding standards have become widely deployed coding techniques used in numerous products, such as broadcast, video conferencing, mobile television and Blu-ray Disc. New compression techniques are gradually included in video coding standards so that a 50% compression rate reduction is achievable every five years. However, this trend has also brought many problems, such as dramatically increased computational complexity, multiple co-existing standards and gradually increasing development time. To address these problems, this thesis investigates efficient algorithms for the latest video coding standard, H.264/AVC. Two aspects of the H.264/AVC standard are examined: (1) speeding up intra4x4 prediction with a parallel architecture; (2) applying an efficient rate control algorithm based on a deviation measure to intra frames. Another aim of this thesis is to work on low-complexity algorithms for an MPEG-2 to H.264/AVC transcoder. The thesis focuses on three main mapping algorithms and a computational complexity reduction algorithm: motion vector mapping, block mapping, field-frame mapping and efficient modes ranking. Finally, a new video coding framework methodology to reduce development time is examined. This thesis explores the implementation of the MPEG-4 simple profile with the RVC framework. A key technique, the automatic generation of variable length decoder tables, is solved in this thesis. Moreover, another important video coding standard, DV/DVCPRO, is modeled with the RVC framework. Consequently, besides the available MPEG-4 simple profile and the China audio/video standard, a new member is added to the RVC framework family. A part of the research work presented in this thesis targets algorithms and implementation of video coding standards. Within this wide topic, three main problems are investigated. The results show that the methodologies presented in this thesis are efficient and encourage