Fast Algorithm Designs of Multiple-Mode Discrete Integer Transforms with Cost-Effective and Hardware-Sharing Architectures for Multistandard Video Coding Applications

Author: Fan Chih-Peng
Publication venue: 'IntechOpen'
Publication date: 23/11/2016

In this chapter, we first give a brief overview of transform-based video coding. Second, we describe the basic matrix decomposition scheme for fast-algorithm and hardware-sharing-based integer transform design. Finally, we present two case studies of fast-algorithm and hardware-sharing architecture designs for discrete integer transforms: one for single-standard, multiple-mode video transform coding, and the other for multiple-standard, multiple-mode video transform coding.
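As an illustration of what a matrix decomposition for a fast integer transform can look like, the sketch below uses the widely documented 4-point forward core transform of H.264/AVC. It is not taken from the chapter; the names `H264_T4`, `transform_direct` and `transform_butterfly` are assumptions made for this example. It compares a direct matrix-vector product with a butterfly form that needs only additions, subtractions and shifts.

```python
# 4-point forward core transform matrix of H.264/AVC
# (an integer approximation of the DCT). Illustrative only.
H264_T4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform_direct(x):
    """Direct matrix-vector product: 16 multiplications, 12 additions."""
    return [sum(c * v for c, v in zip(row, x)) for row in H264_T4]


def transform_butterfly(x):
    """Same result from a two-stage butterfly using adds and shifts only."""
    # Stage 1: sums feed the even outputs, differences feed the odd outputs.
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    # Stage 2: multiplication by 2 is a left shift by one bit.
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


if __name__ == "__main__":
    sample = [1, 2, 3, 4]
    print(transform_direct(sample))     # [10, -7, 0, -1]
    print(transform_butterfly(sample))  # [10, -7, 0, -1]
```

The butterfly form shows the kind of structure that such decompositions expose: the first stage is the same for every output, so it is a natural candidate for sharing in hardware.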

IntechOpen

Crossref

Performance analysis of Discrete Cosine Transform in Multibeamforming

Author: Gias Ziad 1983-
Publication venue: 'University of Saskatchewan Library'
Publication date: 11/02/2020

Aperture arrays are widely used in beamforming applications, where element signals are steered to a particular direction of interest and a single beam is formed. Multibeamforming extends single beamforming to fields in which sources located in multiple directions are of interest. The Discrete Fourier Transform (DFT) is usually used in these scenarios to separate the received signals according to their directions of arrival. For broadband signals, the DFT of the data at each sensor of an array decomposes the signal into multiple narrowband signals. However, if hardware cost and implementation complexity are a concern while the desired performance must be maintained, the Discrete Cosine Transform (DCT) outperforms the DFT. In this work, the DCT is used instead of the DFT to decompose the received signal into multiple beams in multiple directions. The DCT offers a simple and efficient hardware implementation. Also, when low-frequency signals are of interest, the DCT can process correlated data and perform close to the ideal Karhunen-Loève Transform (KLT). To further improve accuracy and reduce implementation cost, an efficient technique using Algebraic Integer Quantization (AIQ) of the DCT is presented. Both 8-point and 16-point versions of the DCT using AIQ mapping are presented, and their performance is analyzed in terms of accuracy and hardware complexity. The proposed AIQ DCT is shown to offer considerable hardware savings compared to the DFT and the classical DCT while maintaining the same accuracy of beam steering in the multibeamforming application.
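For readers unfamiliar with the transform itself, the following is a minimal floating-point sketch of the classical N-point DCT-II in its orthonormal form. It does not implement the AIQ mapping or the beamforming system described in the abstract; the function names `dct_matrix` and `transform` are assumptions made for this example.

```python
import math


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size n x n as nested lists."""
    rows = []
    for k in range(n):
        # The DC row uses a different scale so that all rows have unit norm.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([
            scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ])
    return rows


def transform(matrix, x):
    """Apply a transform matrix to a vector."""
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    c8 = dct_matrix(8)
    # A constant (fully correlated) input puts all of its energy in the
    # first coefficient; the remaining seven are zero up to rounding.
    print([round(v, 6) for v in transform(c8, [1.0] * 8)])
```

The same `dct_matrix` call with `n = 16` gives the 16-point version. The cosine entries are irrational, which is the reason fixed-point hardware implementations look for exact or low-error integer representations.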

University of Saskatchewan Research Archive

Multi-standard reconfigurable motion estimation processor for hybrid video codecs

Author: Chen
Cheng
G. Vafiadis
Huang
J.L. Nunez-Yanez
Kao
Li
Nunez-Yanez
Srinivasan
T. Spiteri
Yu
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2011

Crossref

Explore Bristol Research

Video coding algorithm and optimization techniques

Author: Σουφλερή Ευστρατία
Publication date: 01/01/2017

University of Thessaly Institutional Repository

Variable Bit-Depth Processor for 8×8 Transform and Quantization Coding in H.264/AVC

Author: Gustavo A. Ruiz
Juan A. Michell
Publication venue: 'IntechOpen'
Publication date: 05/07/2011

IntechOpen

A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards

Author: Sha Shen
Weiwei Shen
Xiaoyang Zeng
Yibo Fan
Publication date: 01/01/2012

(4096x2048) 30fps video sequence at 191MHz working frequency, with 93K gate count and 18944-bit SRAM. We suggest a normalized criterion called design efficiency to compare with previous works. It shows that this design is 31% more efficient than previous work

CiteSeerX

HIGH-THROUGHPUT AREA-EFFICIENT INTEGER TRANSFORMS FOR VIDEO CODING

Author: DO THI THU TRANG
Publication date: 25/01/2013

Ph.D., Doctor of Philosophy

ScholarBank@NUS

VLSI architectures design for encoders of High Efficiency Video Coding (HEVC) standard

Author: Xiao Guoping
Publication venue: Politecnico di Torino
Publication date: 01/01/2016

The growing popularity of high-resolution video and the continuously increasing demand for high-quality video on mobile devices are creating a stronger need for more efficient video encoders. In response, HEVC, the newest video coding standard, has been developed by a joint team formed by ISO/IEC MPEG and ITU-T VCEG. Its design goal is to achieve a 50% compression gain over its predecessor H.264 with equal or even higher perceptual video quality. Motion estimation (ME), one of the most critical modules in video coding, contributes roughly 50% to 70% of the computational complexity of the video encoder. This high consumption of computational resources limits the performance of encoders, especially for full HD or ultra HD video, in terms of coding speed, bit rate and video quality. The major part of this work therefore concentrates on reducing the computational complexity and improving the timing performance of motion estimation algorithms for the HEVC standard. First, a new strategy for calculating the SAD (Sum of Absolute Differences) for motion estimation is designed, based on statistics of the pixel data of video sequences. These statistics show that the size relationship between the sums of two sets of pixels is closely connected to the distribution of the size relationships between individual pixels from the two sets. Taking advantage of this observation, only a small proportion of the pixels needs to be involved in the SAD calculation. Simulations show that the amount of computation required by the full search algorithm is reduced by about 58% on average and by up to 70% in the best case. Second, from the perspective of parallelization, an enhanced TZ search for HEVC is proposed, using novel schemes of multiple MVPs (motion vector predictors) and a shared MVP. Specifically, with multiple MVPs the initial search process is performed in parallel at multiple search centers, and the ME processing engines for the PUs within one CU are parallelized based on the MVP sharing scheme at the CU (coding unit) level. Moreover, the SAD module of the ME engine is also implemented in parallel for a PU size of 32×32. Experiments indicate an appreciable improvement in the throughput and coding efficiency of the HEVC video encoder. The other part of this thesis is devoted to VLSI architecture design for finding the first W maximum/minimum values, targeting high speed and low hardware cost. The architecture based on the novel bit-wise AND scheme has only half the area of the best reference solution, and its critical path delay is comparable with that of other implementations. The FPCG (full parallel comparison grid) architecture, which uses an optimized comparator-based structure, is 3.6 times faster on average, and up to 5.2 times faster, than the reference architectures. Finally, the architecture using the partial sorting strategy reaches a good balance between timing performance and area, with a speed slightly lower than or comparable to that of the FPCG architecture and an acceptable hardware cost.
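The baseline that the abstract measures its savings against, SAD-based full-search block matching, can be sketched as follows. This is a plain reference version under simplifying assumptions (frames as lists of integer rows, square blocks, integer-pixel displacements only); it does not reproduce the thesis's reduced-pixel SAD strategy or its TZ search, and the names `sad` and `full_search` are chosen for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def full_search(cur, ref, top, left, size, search_range):
    """Exhaustively test every displacement within +/- search_range.

    Returns (best_sad, (dy, dx)) for the size x size block of `cur`
    whose top-left corner is at (top, left), or None if no candidate
    block lies inside `ref`.
    """
    cur_block = [row[left:left + size] for row in cur[top:top + size]]
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            candidate = [row[x:x + size] for row in ref[y:y + size]]
            cost = sad(cur_block, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

Every candidate position costs one full SAD over all pixels of the block, so the work grows with both the block area and the square of the search range. That cost is what pixel-subsampling and fast-search schemes aim to cut.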

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Algoritmo de estimação de movimento e sua arquitetura de hardware para HEVC (Motion estimation algorithm and its hardware architecture for HEVC)

Author: Nalluri Purnachand
Publication venue: Universidade de Aveiro
Publication date: 01/01/2016

Doctorate in Electrical Engineering. Video coding is used in applications such as video surveillance, video conferencing, video streaming, video broadcasting and video storage. In a typical video coding standard, many algorithms are combined to compress a video. Of these algorithms, motion estimation is the most complex task. Hence, it is necessary to implement this task in real time using appropriate VLSI architectures. This thesis proposes a new fast motion estimation algorithm and its real-time implementation. The results show that the proposed algorithm and its motion estimation hardware architecture outperform the state of the art. The proposed architecture operates at a maximum frequency of 241.6 MHz and is able to process 1080p@60Hz video with all the variable block sizes specified in the HEVC standard, as well as with a motion vector search range of up to ±64 pixels.

Repositório Institucional da Universidade de Aveiro

Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

Author: Pourabed Mohammad Ali
Publication date: 08/05/2019

High Efficiency Video Coding (HEVC) is important for image processing, for reducing bandwidth, and for increasing video quality, and it can be implemented in different ways. This thesis focuses on the design and implementation of application-specific accelerators for the IDCT/IDST algorithms of the HEVC standard. These algorithms are inherently parallel, which makes them suitable for execution on heterogeneous multicore platforms with accelerators, which are required for power-efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used as the template for an accelerator. The CGRA plays a major role in a Heterogeneous Accelerator-Rich Platform (HARP), as it is capable of accelerating non-parallel loops with lower loop counts. The thesis covers various algorithms for the IDCT and IDST with different designs and templates, converging on a single final architecture that targets a 4-point IDST together with a 4/8-point IDCT. Different dimensions of the CGRA template are also used in order to obtain different types of accelerator. The CGRAs are combined in a successive arrangement with Reduced Instruction Set Computer (RISC) cores over a Network-on-Chip (NoC). The aim is to study the performance of the accelerators used for the IDCT and the IDST, evaluated through the data movement across the NoC and through the clock cycles the accelerators need, in order to calculate the efficiency of the system. The results show that the 4-point IDST and IDCT can be computed in 56 clock cycles, and the 8-point IDCT in 64 cycles. Power and energy consumption are also considered. The dynamic power dissipation for the routing of data is 4.03 mW, while the energy consumption is 1.76 µJ for the 4-point system (IDCT and IDST) and 3.06 µJ for the 8-point IDCT. Processing Elements (PEs) implement the transform algorithms, and the units were operated at 200 MHz. Finally, the results show that 1080p images at 30 frames per second can be attained using an FPGA.

Trepo - Institutional Repository of Tampere University