18,436 research outputs found

    Fast algorithm for H.264/AVC intra prediction based on discrete wavelet transform

    Get PDF
    H.264 or MPEG-4 AVC (Advanced Video Coding) is the new world-wide accepted international Standard for video coding, approved by ITU-T and ISO. New Tools have been added to improve the coding efficiency allowing a save up above of 50%, when is compared with previous standards (H.263, MPEG-2 y MPEG-4). From April 2007 there is a new set of profiles known as “all-Intra”. They were born as a sub-set from the “High” profile and have reached a high impact within broadcast industry where the highest quality video formats are demanded. The high efficiency of “all-Intra” mode in H.264 is due to Rate Distortion Optimization (RDO) technique. RDO chooses for each macroblock (MB) the best partition mode and directional prediction. However, the computational burden becomes extremely high due to huge number of prediction-prediction modes that should be evaluated. This article shows a new algorithm for fast partition mode algorithm based on Discrete Wavelet Transform (DWT). It allows reducing the number of candidate modes to those which are strictly defined for each partition. By using the local 2D-DWT over each MB, information of the homogeneity is obtained. It is got from multiresolution analysis of the transformed coefficients in each sub-band. This way it is easier to classify quickly the optimum partition mode avoiding the exhaustive seek made by RDO

    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    Get PDF
    Digital video is becoming extremely important nowadays and its importance has greatly increased in the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard H.264/AVC released in 2003 is inefficient when it comes to UHD videos. The increasing desire for superior compression efficiency to H.264/AVC leads to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers a double compression ratio at the same level of video quality or substantial improvement of video quality at the same video bitrate. Yet, HE-VC/H.265 possesses superior compression efficiency, its complexity is several times more than H.264/AVC, impeding its high throughput implementation. Currently, most of the researchers have focused merely on algorithm level adaptations of HEVC/H.265 standard to reduce computational intensity without considering the hardware feasibility. What’s more, the exploration of efficient hardware architecture design is not exhaustive. Only a few research works have been conducted to explore efficient hardware architectures of HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations, including mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity in rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation is proposed to parallelize syntax elements processing to greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PE) and each PE performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, rate-distortion estimation independently. PU blocks with different PU sizes will be processed by the different prediction engines (PE) simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small scale FCLNN is exploited to refine the mode prediction

    Register-transfer level design of sum of absolute transformed difference for high efficiency video coding

    Get PDF
    High Efficiency Video Coding (HEVC) is the state-of-the-art video coding standard which offers 50% improvement in coding efficiency over its predecessor Advanced Video Coding (AVC). Compared to AVC, HEVC supports up to 33 angular modes, DC mode and planar mode. The significant rise in the number of intra prediction mode however has increased the computational complexity. Sum of Absolute Transformed Difference (SATD), a fast Rate Distortion Optimization (RDO) intra prediction algorithm in the HEVC standard, is one of the most complex and compute-intensive part of the encoding process. SATD alone can takes up to 40% of the total encoding time; hence off-loading it to dedicated hardware accelerators is necessary to address the increasing need for real-time video coding in accordance with the push for coding efficiency. This work proposes a Verilog-described N × N SATD hardware architecture which is based on Hadamard Transform. The architecture would support a variable block size from 4 × 4 to 32 × 32 with 1-D horizontal and 1-D vertical Hadamard Transform. At the same time, it is designed to achieve throughput optimization by pipelining and feedthrough control. The performance of the implemented SATD is then evaluated in terms of utilization, timing and power

    Efficient Coding Tree Unit (CTU) Decision Method for Scalable High-Efficiency Video Coding (SHVC) Encoder

    Get PDF
    High-efficiency video coding (HEVC or H.265) is the latest video compression standard developed by the joint collaborative team on video coding (JCT-VC), finalized in 2013. HEVC can achieve an average bit rate decrease of 50% in comparison with H.264/AVC while still maintaining video quality. To upgrade the HEVC used in heterogeneous access networks, the JVT-VC has been approved scalable extension of HEVC (SHVC) in July 2014. The SHVC can achieve the highest coding efficiency but requires a very high computational complexity such that its real-time application is limited. To reduce the encoding complexity of SHVC, in this chapter, we employ the temporal-spatial and inter-layer correlations between base layer (BL) and enhancement layer (EL) to predict the best quadtree of coding tree unit (CTU) for quality SHVC. Due to exist a high correlation between layers, we utilize the coded information from the CTU quadtree in BL, including inter-layer intra/residual prediction and inter-layer motion parameter prediction, to predict the CTU quadtree in EL. Therefore, we develop an efficient CTU decision method by combing temporal-spatial searching order algorithm (TSSOA) in BL and a fast inter-layer searching algorithm (FILSA) in EL to speed up the encoding process of SHVC. The simulation results show that the proposed efficient CTU decision method can achieve an average time improving ratio (TIR) about 52–78% and 47–69% for low delay (LD) and random access (RA) configurations, respectively. It is clear that the proposed method can efficiently reduce the computational complexity of SHVC encoder with negligible loss of coding efficiency with various types of video sequences

    HEVC-videokoodekin intra-ennustuksen toteutus FPGA-piireille C-kielestä syntesoimalla

    Get PDF
    High Efficiency Video Coding (HEVC) is the latest video coding standard in video compression. With HEVC, it is possible to compress the video with half the bitrate compared to the previous video coding standard, Advanced Video Coding (AVC), with the same video quality. Now even, the complexity of the encoder is significantly larger. As designs become more and more complex, traditional hardware (HW) description languages (HDLs), such as Very High Speed Integrated Circuit Hardware Description Language (VHDL) or Verilog, can not be used to present the designs without increasing effort. The solution for this is a higher abstraction language for describing HW. High-Level Synthesis (HLS) is a way of using a programming language like C or C++ to describe the HW and automatically generating the HDL from it. This makes the code easier to understand and decreases the time used for implementing the design. This Thesis uses Catapult-C to create an HLS-based implementation of HEVC intra prediction for a Field Programmable Gate Array (FPGA). The HEVC encoder used in this Thesis is open source Kvazaar which has been developed at Tampere University of Technology. The objective is to implement an intra prediction accelerator faster than implementing it with register-transfer level (RTL) using VHDL or Verilog and still get comparable area and performance. This Thesis presents six development versions of the intra prediction accelerator. The complexity of the accelerator grows gradually, as more features were added to it. The final version is able to perform the intra prediction, mode cost computation and mode decision for Full HD video at 24.5 fps using 11 662 adaptive logic modules (ALMs) on an Altera Cyclone V FPGA. This Thesis presents the benefits of Catapult-C and HLS. The implementation results were comparable to hand coded RTL but achieved with a fraction of the estimated time for a VHDL implementation. As a rough estimate, if something takes a month to implement in VHDL, it takes a week with HLS. The biggest gain with HLS is the fast process of changes. Only the C implementation needs to change. The testbench and the RTL-code are generated automatically

    Kvazaar HEVC videokooderin pakkaustehokkuuden ja suorituskyvyn optimointi

    Get PDF
    Growing video resolutions have led to an increasing volume of Internet video traffic, which has created a need for more efficient video compression. New video coding standards, such as High Efficiency Video Coding (HEVC), enable a higher level of compression, but the complexity of the corresponding encoder implementations is also higher. Therefore, encoders that are efficient in terms of both compression and complexity are required. In this work, we implement four optimizations to Kvazaar HEVC encoder: 1) uniform inter and intra cost comparison; 2) concurrency-oriented SAO implementation; 3) resolution-adaptive thread allocation; and 4) fast cost estimation of coding coefficients. Optimization 1 changes the selection criterion of the prediction mode in fast configurations, which greatly improves the coding efficiency. Optimization 2 replaces the implementation of one of the in-loop filters with one that better supports concurrent processing. This allows removing some dependencies between encoding tasks, which provides more opportunities for parallel processing to increase coding speed. Optimization 3 reduces the overhead of thread management by spawning fewer threads when there is not enough work for all available threads. Optimization 4 speeds up the computation of residual coefficient coding costs by switching to a faster but less accurate estimation. The impact of the optimizations is measured with two coding configurations of Kvazaar: the ultrafast preset, which aims for the fastest coding speed, and the veryslow preset, which aims for the best coding efficiency. Together, the introduced optimizations give a 2.8× speedup in the ultrafast configuration and a 3.4× speedup in the veryslow configuration. The trade-off for the speedup with the veryslow preset is a 0.15 % bit rate increase. However, with the ultrafast preset, the optimizations also improve coding efficiency by 14.39 %

    Complexity Analysis Of Next-Generation VVC Encoding and Decoding

    Full text link
    While the next generation video compression standard, Versatile Video Coding (VVC), provides a superior compression efficiency, its computational complexity dramatically increases. This paper thoroughly analyzes this complexity for both encoder and decoder of VVC Test Model 6, by quantifying the complexity break-down for each coding tool and measuring the complexity and memory requirements for VVC encoding/decoding. These extensive analyses are performed for six video sequences of 720p, 1080p, and 2160p, under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (a total of 320 encoding/decoding). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex compared to HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of coding tools reveals that in LD on average, motion estimation tools with 53%, transformation and quantization with 22%, and entropy coding with 7% dominate the encoding complexity. In decoding, loop filters with 30%, motion compensation with 20%, and entropy decoding with 16%, are the most complex modules. Moreover, the required memory bandwidth for VVC encoding/decoding are measured through memory profiling, which are 30x and 3x of HEVC. The reported results and insights are a guide for future research and implementations of energy-efficient VVC encoder/decoder.Comment: IEEE ICIP 202

    Fast intra prediction in the transform domain

    Get PDF
    In this paper, we present a fast intra prediction method based on separating the transformed coefficients. The prediction block can be obtained from the transformed and quantized neighboring block generating minimum distortion for each DC and AC coefficients independently. Two prediction methods are proposed, one is full block search prediction (FBSP) and the other is edge based distance prediction (EBDP), that find the best matched transformed coefficients on additional neighboring blocks. Experimental results show that the use of transform coefficients greatly enhances the efficiency of intra prediction whilst keeping complexity low compared to H.264/AVC
    corecore