181 research outputs found

    Real-time complexity constrained encoding

    Get PDF
    Complex software appliances can be deployed on hardware with limited available computational resources. This computational boundary puts an additional constraint on software applications. This can be an issue for real-time applications with a fixed time constraint such as low delay video encoding. In the context of High Efficiency Video Coding (HEVC), a limited number of publications have focused on controlling the complexity of an HEVC video encoder. In this paper, a technique is proposed to control complexity by deciding between 2Nx2N merge mode and full encoding, at different Coding Unit (CU) depths. The technique is demonstrated in two encoders. The results demonstrate fast convergence to a given complexity threshold, and a limited loss in rate-distortion performance (on average 2.84% Bjontegaard delta rate for 40% complexity reduction)

    High-Level Synthesis Based VLSI Architectures for Video Coding

    Get PDF
    High Efficiency Video Coding (HEVC) is state-of-the-art video coding standard. Emerging applications like free-viewpoint video, 360degree video, augmented reality, 3D movies etc. require standardized extensions of HEVC. The standardized extensions of HEVC include HEVC Scalable Video Coding (SHVC), HEVC Multiview Video Coding (MV-HEVC), MV-HEVC+ Depth (3D-HEVC) and HEVC Screen Content Coding. 3D-HEVC is used for applications like view synthesis generation, free-viewpoint video. Coding and transmission of depth maps in 3D-HEVC is used for the virtual view synthesis by the algorithms like Depth Image Based Rendering (DIBR). As first step, we performed the profiling of the 3D-HEVC standard. Computational intensive parts of the standard are identified for the efficient hardware implementation. One of the computational intensive part of the 3D-HEVC, HEVC and H.264/AVC is the Interpolation Filtering used for Fractional Motion Estimation (FME). The hardware implementation of the interpolation filtering is carried out using High-Level Synthesis (HLS) tools. Xilinx Vivado Design Suite is used for the HLS implementation of the interpolation filters of HEVC and H.264/AVC. The complexity of the digital systems is greatly increased. High-Level Synthesis is the methodology which offers great benefits such as late architectural or functional changes without time consuming in rewriting of RTL-code, algorithms can be tested and evaluated early in the design cycle and development of accurate models against which the final hardware can be verified

    Syntax Element Partitioning for high-throughput HEVC CABAC decoding

    Get PDF
    Encoder and decoder implementations of the High Efficiency Video Coding (HEVC) standard have been subject to many optimization approaches since the release in 2013. However, the real-time decoding of high quality and ultra high resolution videos is still a very challenging task. Especially entropy decoding (CABAC) is most often the throughput bottleneck for very high bitrates. Syntax Element Partitioning (SEP) has been proposed for the H.264/AVC video compression standard to address this issue and the limitations of other parallelization techniques. Unfortunately, it has not been adopted in the latest video coding standard, although it allows to multiply the throughput in CABAC decoding. We propose an improved SEP scheme for HEVC CABAC decoding with eight syntax element partitions. Experimental results show throughput improvements up to 5.4× with negligible bitstream overhead, making SEP a useful technique to address the entropy decoding bottleneck in future video compression standards

    A Motion Estimation based Algorithm for Encoding Time Reduction in HEVC

    Get PDF
    High Efficiency Video Coding (HEVC) is a video compression standard that offers 50% more efficiency at the expense of high encoding time contrasted with the H.264 Advanced Video Coding (AVC) standard. The encoding time must be reduced to satisfy the needs of real-time applications. This paper has proposed the Multi- Level Resolution Vertical Subsampling (MLRVS) algorithm to reduce the encoding time. The vertical subsampling minimizes the number of Sum of Absolute Difference (SAD) computations during the motion estimation process. The complexity reduction algorithm is also used for fast coding the coefficients of the quantised block using a flag decision. Two distinct search patterns are suggested: New Cross Diamond Diamond (NCDD) and New Cross Diamond Hexagonal (NCDH) search patterns, which reduce the time needed to locate the motion vectors. In this paper, the MLRVS algorithm with NCDD and MLRVS algorithm with NCDH search patterns are simulated separately and analyzed. The results show that the encoding time of the encoder is decreased by 55% with MLRVS algorithm using NCDD search pattern and 56% with MLRVS using NCDH search pattern compared to HM16.5 with Test Zone (TZ) search algorithm. These results are achieved with a slight increase in bit rate and negligible deterioration in output video quality

    Optimizing HEVC CABAC decoding with a context model cache and application-specific prefetching

    Get PDF
    Context-based Adaptive Binary Arithmetic Coding is the entropy coding module in the most recent JCT-VC video coding standard HEVC/H.265. As in the predecessor H.264/AVC, CABAC is a well-known throughput bottleneck due to its strong data dependencies. Beside other optimizations, the replacement of the context model memory by a smaller cache has been proposed, resulting in an improved clock frequency. However, the effect of potential cache misses has not been properly evaluated. Our work fills this gap and performs an extensive evaluation of different cache configurations. Furthermore, it is demonstrated that application-specific context model prefetching can effectively reduce the miss rate and make it negligible. Best overall performance results were achieved with caches of two and four lines, where each cache line consists of four context models. Four cache lines allow a speed-up of 10% to 12% for all video configurations while two cache lines improve the throughput by 9% to 15% for high bitrate videos and by 1% to 4% for low bitrate videos.EC/H2020/645500/EU/Improving European VoD Creative Industry with High Efficiency Video Delivery/Film26

    Challenges and solutions in H.265/HEVC for integrating consumer electronics in professional video systems

    Get PDF

    Cost and Coding Efficient Motion Estimation Design Considerations for High Efficiency Video Coding (HEVC) Standard

    Get PDF
    This paper focuses on motion estimation engine design in future high-efficiency video coding (HEVC) encoders. First, a methodology is explained to analyze hardware implementation cost in terms of hardware area, memory size and memory bandwidth for various possible motion estimation engine designs. For 11 different configurations, hardware cost as well as the coding efficiency are quantified and are compared through a graphical analysis to make design decisions. It has been shown that using smaller block sizes (e.g. 4 × 4) imposes significantly larger hardware requirements at the expense of modest improvements in coding efficiency. Secondly, based on the analysis on various configurations, one configuration is chosen and algorithm improvements are presented to further reduce hardware implementation cost of the selected configuration. Overall, the proposed changes provide 56 × on-chip bandwidth, 151 × off-chip bandwidth, 4.3 × core area and 4.5 × on-chip memory area savings when compared to the hardware implementation of the HM-3.0 design.Texas Instruments Incorporate

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    Get PDF
    The growing popularity of high resolution video and the continuously increasing demands for high quality video on mobile devices are producing stronger needs for more efficient video encoder. Concerning these desires, HEVC, a newest video coding standard, has been developed by a joint team formed by ISO/IEO MPEG and ITU/T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with an equal or even higher perceptual video quality. Motion Estimation (ME) being as one of the most critical module in video coding contributes almost 50%-70% of computational complexity in the video encoder. This high consumption of the computational resources puts a limit on the performance of encoders, especially for full HD or ultra HD videos, in terms of coding speed, bit-rate and video quality. Thus the major part of this work concentrates on the computational complexity reduction and improvement of timing performance of motion estimation algorithms for HEVC standard. First, a new strategy to calculate the SAD (Sum of Absolute Difference) for motion estimation is designed based on the statistics on property of pixel data of video sequences. This statistics demonstrates the size relationship between the sum of two sets of pixels has a determined connection with the distribution of the size relationship between individual pixels from the two sets. Taking the advantage of this observation, only a small proportion of pixels is necessary to be involved in the SAD calculation. Simulations show that the amount of computations required in the full search algorithm is reduced by about 58% on average and up to 70% in the best case. Secondly, from the scope of parallelization an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictor) and shared MVP. Specifically, resorting to multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engine for PUs within one CU are parallelized based on the MVP sharing scheme on CU (coding unit) level. Moreover, the SAD module for ME engine is also parallelly implemented for PU size of 32×32. Experiments indicate it achieves an appreciable improvement on the throughput and coding efficiency of the HEVC video encoder. In addition, the other part of this thesis is contributed to the VLSI architecture design for finding the first W maximum/minimum values targeting towards high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half of the area of the best reference solution and its critical path delay is comparable with other implementations. While the FPCG (full parallel comparison grid) architecture, which utilizes the optimized comparator-based structure, achieves 3.6 times faster on average on the speed and even 5.2 times faster at best comparing with the reference architectures. Finally the architecture using the partial sorting strategy reaches a good balance on the timing performance and area, which has a slightly lower or comparable speed with FPCG architecture and a acceptable hardware cost
    corecore