thesis

Design of 2D discrete cosine transform using CORDIC architectures in VHDL

Abstract

The Discrete Cosine Transform is one of the most widely transform techniques in digital signal processing. In addition, this is also most computationally intensive transforms which require many multiplications and additions. Real time data processing necessitates the use of special purpose hardware which involves hardware efficiency as well as high throughput. Many DCT algorithms were proposed in order to achieve high speed DCT. Those architectures which involves multipliers, for example Chen’s algorithm has less regular architecture due to complex routing and requires large silicon area. On the other hand, the DCT architecture based on distributed arithmetic (DA) which is also a multiplier less architecture has the inherent disadvantage of less throughputs because of the ROM access time and the need of accumulator. Also this DA algorithm requires large silicon area if it requires large ROM size. Systolic array architecture for the real-time DCT computation may have the large number of gates and clock skew problem. The other ways of implementation of DCT which involves in multiplierless, thus power efficient and which results in regular architecture and less complicated routing, consequently less area, simultaneously lead to high throughput. So for that purpose CORDIC seems to be a best solution. CORDIC offers a unified iterative formulation to efficiently evaluate the rotation operation. This thesis presents the implementation of 2D Discrete Cosine Transform (DCT) using the Angle Recoded (AR) Cordic algorithm, the new scaling less CORDIC algorithm and the conventional Chen’s algorithm which is multiplier dependant algorithm. The 2D DCT is implemented by exploiting the Separability property of 2D Discrete Cosine Transform. Here first one dimensional DCT is designed first and later a transpose buffer which consists of 64 memory elements, fully pipelined is designed. Later all these blocks are joined with the help of a controller element which is a mealy type FSM which produces some status signals also. The three resulting architectures are all well synthesized in Xilinx 9.1ise, simulated in Modelsim 5.6f and the power is calculated in Xilinx Xpower. Results prove that AR Cordic algorithm is better than Chen’s algorithm, even the new scaling less CORDIC algorithm

    Similar works