5 research outputs found

    Hardware-aware motion estimation search algorithm development for high-efficiency video coding (HEVC) standard

    Get PDF
    This work presents a hardware-aware search algorithm for HEVC motion estimation. Implications of several decisions in search algorithm are considered with respect to their hardware implementation costs (in terms of area and bandwidth). Proposed algorithm provides 3X logic area in integer motion estimation, 16% on-chip reference buffer area and 47X maximum off-chip bandwidth savings when compared to HM-3.0 fast search algorithm.Texas Instruments Incorporate

    An SRAM using output prediction to reduce BL-switching activity and statistically-gated SA for up to 1.9× reduction in energy/access

    Get PDF
    Mobile applications such as tablets pack increasingly more processing capability comparable to workstations or laptops but can do little for cooling or extending the battery life in their form factors. SRAMs account for a large fraction of chip area and are critical in this context. Recent work has focused on voltage scaling in SRAMs, which is an effective way of achieving energy efficiency [1,2]. These conventional SRAMs are mostly general-purpose in the sense that they are designed without considering the specific features of the data they will store. However, application-specific features such as statistics of storage data can be exploited and incorporated into the transistor-level design to provide a new dimension towards achieving the next level of energy savings in addition to the savings provided through voltage scaling. The work in [3] is an example where an inversion bit is added for each word to reduce read-bitline (RBL) transitions in an 8T-cell-based design with a single-ended read port. Similarly, the work in [4] stores only the LSBs of each word in 6T SRAMs where occasional bit-errors at low voltages are tolerable for its application. In this work, we focus on video; however, the ideas can be generalized to different applications. In video encoders, pixel processing is performed over large partitions of image frames (e.g., 192×192 pixels), which are stored in on-chip SRAMs and accessed frequently. Image frames generally consist of smooth backgrounds or large objects where the intensity of pixels is spatially correlated. For the video image frame in Fig. 18.2.1, the deviation of each pixel's intensity from its block average for a 16×16 block shows that 76% of pixels lie within 3 LSB of the average. This additional information can be used to design an SRAM where correlation of data is used to reduce bitline activity factor which, for an 8T SRAM in a 65nm low-power CMOS process, accounts for ~50% of total energy consumption during read a- cesses at 0.6V. In this work, we present a prediction-based reduced-bitline-switching-activity (PB-RBSA) scheme along with a hierarchical sensing network with statistical sense-amplifier gating to exploit the correlation of storage data. Reduction of switching activity on the bitlines and in the sensing network of the memory provide up to 1.9× reduction in energy/access.Texas Instruments Incorporate

    Cost and Coding Efficient Motion Estimation Design Considerations for High Efficiency Video Coding (HEVC) Standard

    Get PDF
    This paper focuses on motion estimation engine design in future high-efficiency video coding (HEVC) encoders. First, a methodology is explained to analyze hardware implementation cost in terms of hardware area, memory size and memory bandwidth for various possible motion estimation engine designs. For 11 different configurations, hardware cost as well as the coding efficiency are quantified and are compared through a graphical analysis to make design decisions. It has been shown that using smaller block sizes (e.g. 4 × 4) imposes significantly larger hardware requirements at the expense of modest improvements in coding efficiency. Secondly, based on the analysis on various configurations, one configuration is chosen and algorithm improvements are presented to further reduce hardware implementation cost of the selected configuration. Overall, the proposed changes provide 56 × on-chip bandwidth, 151 × off-chip bandwidth, 4.3 × core area and 4.5 × on-chip memory area savings when compared to the hardware implementation of the HM-3.0 design.Texas Instruments Incorporate

    VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

    Get PDF
    The growing popularity of high resolution video and the continuously increasing demands for high quality video on mobile devices are producing stronger needs for more efficient video encoder. Concerning these desires, HEVC, a newest video coding standard, has been developed by a joint team formed by ISO/IEO MPEG and ITU/T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with an equal or even higher perceptual video quality. Motion Estimation (ME) being as one of the most critical module in video coding contributes almost 50%-70% of computational complexity in the video encoder. This high consumption of the computational resources puts a limit on the performance of encoders, especially for full HD or ultra HD videos, in terms of coding speed, bit-rate and video quality. Thus the major part of this work concentrates on the computational complexity reduction and improvement of timing performance of motion estimation algorithms for HEVC standard. First, a new strategy to calculate the SAD (Sum of Absolute Difference) for motion estimation is designed based on the statistics on property of pixel data of video sequences. This statistics demonstrates the size relationship between the sum of two sets of pixels has a determined connection with the distribution of the size relationship between individual pixels from the two sets. Taking the advantage of this observation, only a small proportion of pixels is necessary to be involved in the SAD calculation. Simulations show that the amount of computations required in the full search algorithm is reduced by about 58% on average and up to 70% in the best case. Secondly, from the scope of parallelization an enhanced TZ search for HEVC is proposed using novel schemes of multiple MVPs (motion vector predictor) and shared MVP. Specifically, resorting to multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engine for PUs within one CU are parallelized based on the MVP sharing scheme on CU (coding unit) level. Moreover, the SAD module for ME engine is also parallelly implemented for PU size of 32×32. Experiments indicate it achieves an appreciable improvement on the throughput and coding efficiency of the HEVC video encoder. In addition, the other part of this thesis is contributed to the VLSI architecture design for finding the first W maximum/minimum values targeting towards high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half of the area of the best reference solution and its critical path delay is comparable with other implementations. While the FPCG (full parallel comparison grid) architecture, which utilizes the optimized comparator-based structure, achieves 3.6 times faster on average on the speed and even 5.2 times faster at best comparing with the reference architectures. Finally the architecture using the partial sorting strategy reaches a good balance on the timing performance and area, which has a slightly lower or comparable speed with FPCG architecture and a acceptable hardware cost

    Algoritmo de estimação de movimento e sua arquitetura de hardware para HEVC

    Get PDF
    Doutoramento em Engenharia EletrotécnicaVideo coding has been used in applications like video surveillance, video conferencing, video streaming, video broadcasting and video storage. In a typical video coding standard, many algorithms are combined to compress a video. However, one of those algorithms, the motion estimation is the most complex task. Hence, it is necessary to implement this task in real time by using appropriate VLSI architectures. This thesis proposes a new fast motion estimation algorithm and its implementation in real time. The results show that the proposed algorithm and its motion estimation hardware architecture out performs the state of the art. The proposed architecture operates at a maximum operating frequency of 241.6 MHz and is able to process 1080p@60Hz with all possible variables block sizes specified in HEVC standard as well as with motion vector search range of up to ±64 pixels.A codificação de vídeo tem sido usada em aplicações tais como, vídeovigilância, vídeo-conferência, video streaming e armazenamento de vídeo. Numa norma de codificação de vídeo, diversos algoritmos são combinados para comprimir o vídeo. Contudo, um desses algoritmos, a estimação de movimento é a tarefa mais complexa. Por isso, é necessário implementar esta tarefa em tempo real usando arquiteturas de hardware apropriadas. Esta tese propõe um algoritmo de estimação de movimento rápido bem como a sua implementação em tempo real. Os resultados mostram que o algoritmo e a arquitetura de hardware propostos têm melhor desempenho que os existentes. A arquitetura proposta opera a uma frequência máxima de 241.6 MHz e é capaz de processar imagens de resolução 1080p@60Hz, com todos os tamanhos de blocos especificados na norma HEVC, bem como um domínio de pesquisa de vetores de movimento até ±64 pixels
    corecore