
    Fast Algorithm Designs of Multiple-Mode Discrete Integer Transforms with Cost-Effective and Hardware-Sharing Architectures for Multistandard Video Coding Applications

    In this chapter, we first give a brief overview of transform-based video coding. Second, we describe the basic matrix decomposition scheme for fast-algorithm and hardware-sharing-based integer transform design. Finally, two case studies of fast-algorithm and hardware-sharing-based architecture designs for discrete integer transforms are presented: one for the single-standard multiple-mode video transform-coding application, and the other for the multiple-standard multiple-mode video transform-coding application.

    Area and power efficient DCT architecture for image compression


    Performance analysis of Discrete Cosine Transform in Multibeamforming

    Aperture arrays are widely used in beamforming applications, where element signals are steered to a particular direction of interest and a single beam is formed. Multibeamforming is an extension of single beamforming and is desired in fields where sources located in multiple directions are of interest. The Discrete Fourier Transform (DFT) is usually used in these scenarios to segregate the received signals based on their directions of arrival. For broadband signals, the DFT of the data at each sensor of an array decomposes the signal into multiple narrowband signals. However, if hardware cost and implementation complexity are of concern while the desired performance must be maintained, the Discrete Cosine Transform (DCT) outperforms the DFT. In this work, the DCT is used instead of the DFT to decompose the received signal into multiple beams in multiple directions. The DCT offers a simple and efficient hardware implementation. Also, when low-frequency signals are of interest, the DCT can process correlated data and perform close to the ideal Karhunen-Loeve Transform (KLT). To further improve the accuracy and reduce the implementation cost, an efficient technique using Algebraic Integer Quantization (AIQ) of the DCT is presented. Both 8-point and 16-point versions of the DCT using AIQ mapping are presented, and their performance is analyzed in terms of accuracy and hardware complexity. It is shown that the proposed AIQ DCT offers considerable savings in hardware compared to the DFT and the classical DCT while maintaining the same accuracy of beam steering in multibeamforming applications.

    A Cost Shared Quantization Algorithm and its Implementation for Multi-Standard Video CODECS

    The current trend of digital convergence creates the need for a video encoder and decoder system, or codec for short, that supports multiple video standards on a single platform. In a modern video codec, quantization is a key unit used for video compression. In this thesis, a generalized quantization algorithm and its hardware implementation are presented to compute quantized coefficients for six different video codecs, including the emerging High Efficiency Video Coding (HEVC) codec. HEVC, the successor to H.264/MPEG-4 AVC, aims to substantially improve coding efficiency compared to AVC High Profile. The thesis presents a high-performance circuit-shared architecture that can perform the quantization operation for HEVC, H.264/AVC, AVS, VC-1, MPEG-2/4 and Motion JPEG (MJPEG). Since HEVC is still in the drafting stage, the architecture was designed in such a way that any final changes can be accommodated in the design. The proposed quantizer architecture is completely division-free, as the division operation is replaced by multiplication, shift and addition operations. The design was implemented on an FPGA and later synthesized in 0.18 μm CMOS technology. The results show that the proposed design satisfies the requirements of all codecs, with a maximum decoding capability of 60 fps for 1080p HD video at 187.3 MHz on a Xilinx Virtex4 LX60 FPGA. The scheme is also suitable for low-cost implementation in modern multi-codec systems.
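
    The division-free idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's architecture: the function names, the 16-bit shift, and the round-half-up scheme are assumptions chosen for illustration. The reciprocal of the quantization step is precomputed once (as a lookup table entry would be), so the per-coefficient path uses only a multiply, an add, and a shift.

```python
def make_reciprocal(qstep, shift=16):
    """Precompute round(2**shift / qstep) as an integer multiplier.

    Hypothetical helper: this is the offline table-building step, so the
    division here is not on the per-coefficient path.
    """
    return ((1 << shift) + qstep // 2) // qstep


def quantize_division_free(coeff, multiplier, shift=16):
    """Approximate round(|coeff| / qstep) with multiply, add, and shift only."""
    sign = -1 if coeff < 0 else 1
    # Adding half of 2**shift before shifting rounds to the nearest level.
    level = (abs(coeff) * multiplier + (1 << (shift - 1))) >> shift
    return sign * level


# Example with an assumed quantization step of 10.
recip = make_reciprocal(10)
levels = [quantize_division_free(c, recip) for c in (100, -37, 0)]
```

    The accuracy of such an approximation depends on the shift width relative to the coefficient range, which is why the width is left as a parameter here.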

    HEVC 2D-DCT architectures comparison for FPGA and ASIC implementations

    This paper compares ASIC and FPGA implementations of two commonly used architectures for the 2-dimensional discrete cosine transform (DCT): the parallel and folded architectures. The DCT has been designed for sizes 4x4, 8x8, and 16x16, and implemented on a Silterra 180 nm ASIC and a Xilinx Kintex Ultrascale FPGA. The objective is to determine suitable low-energy architectures for each platform, as the platforms differ greatly in cell usage, placement, and routing methods. The parallel and folded DCT architectures for all three sizes have been designed in Verilog HDL, including the basic serializer-deserializer input and output. Results show that for the large 16x16 transform, the parallel architecture on ASIC consumes roughly 30% less energy than the folded architecture. On FPGAs, the folded architecture consumes roughly 34% less energy than the parallel architecture. In terms of overall energy consumption between the 180 nm ASIC and the Xilinx Ultrascale FPGA, the ASIC implementation consumes about 58% less energy than the FPGA.
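
    As background for the 2-D DCT discussed in this abstract, the following is a minimal software sketch of the standard separable (row-column) computation using an orthonormal DCT-II. It is not the paper's Verilog design, and the function names are assumptions. Both passes call the same 1-D routine, which is only loosely analogous to reusing one 1-D unit in hardware; the abstract does not describe the internals of either architecture.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    # zip(*rows) walks the columns of the row-transformed block.
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed [vertical][horizontal].
    return [list(r) for r in zip(*cols)]


# A constant 4x4 block should have all its energy in the DC coefficient.
flat = dct_2d([[1.0] * 4 for _ in range(4)])
```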

    An Efficient Architecture of Forward Transforms and Quantization for H.264/AVC Codecs

    Thanks to many novel coding tools, H.264/AVC has become the most efficient video compression standard, providing much better performance than previous standards. However, this standard comes with extraordinary computational complexity and a huge memory access requirement, which make hardware architecture design much more difficult and costly, especially for real-time applications. Within the framework of an H.264 codec hardware architecture project, this paper presents an efficient architecture of Forward Transform and Quantization (FTQ) for H.264/AVC codecs in mobile applications. To reduce the hardware implementation overhead, the proposed design uses a single unified 1-D transform engine to perform all required transform processes, including the discrete cosine transform and the Walsh-Hadamard transform. This design also allows multipliers that have the same multiplicands to share their common parts. The performance of the design is improved by using a fast multiplier architecture in the quantizer, the most critical component in the design. Experimental results show that our architecture can complete the transform and quantization processes for a 4:2:0 macroblock in 228 clock cycles, and the achieved throughput is 445 Msamples/s at a 250 MHz operating frequency, while the area overhead is very small: 147755 μm² (approximately 15 Kgates) in 130 nm TSMC CMOS technology.
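
    To illustrate why a single 1-D engine can serve both transforms, the sketch below uses the well-known H.264 4x4 forward core transform rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1) and the 4x4 Hadamard rows (1, 1, 1, 1), (1, 1, -1, -1), (1, -1, -1, 1), (1, -1, 1, -1). The two share the same butterfly and differ only in one scaling factor. This is a software illustration, not the paper's architecture; the function name and the `use_hadamard` flag are hypothetical.

```python
def transform_1d(x, use_hadamard=False):
    """One 4-point butterfly covering both transforms.

    k = 2 gives the H.264 forward core (integer DCT-like) transform,
    k = 1 gives the 4-point Hadamard transform.
    """
    k = 1 if use_hadamard else 2
    s0, s1 = x[0] + x[3], x[1] + x[2]  # sums feed the even outputs
    d0, d1 = x[0] - x[3], x[1] - x[2]  # differences feed the odd outputs
    return [s0 + s1, k * d0 + d1, s0 - s1, d0 - k * d1]


core = transform_1d([1, 2, 3, 4])
hadamard = transform_1d([1, 2, 3, 4], use_hadamard=True)
```

    In this form, multiplication by 2 is a one-bit left shift, so the only difference between the two modes is whether that shift is applied.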