A low-power discrete cosine transform (DCT) architecture is presented. It is implemented through the Loeffler DCT, based on the coordinate rotation digital computer (Cordic) algorithm and the canonical signed-digit (CSD) multiplier-less architecture. The synthesis results show that the proposed 8-point one-dimensional DCT architecture consumes less power and occupies a smaller area.
Introduction
The DCT is a lossless and reversible mathematical transformation that converts a spatial amplitude representation of data into a spatial frequency representation. One of the advantages of the DCT is its energy compaction property: the signal energy is concentrated in a few components, while most other components are zero or negligibly small.
The DCT was first introduced in 1974, and since then it has been used in many applications such as filtering, speech coding, image coding (still frame, video and image storage), pattern recognition, and image enhancement. The DCT is widely used in image compression applications, especially in lossy image compression. For example, the 2-D DCT is used for JPEG still image compression, MPEG moving image compression, and the H.261 and H.263 video-telephony coding schemes [1].
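The energy compaction property can be illustrated with a short numerical sketch. The code below evaluates the orthonormal 8-point DCT-II directly from its definition (a naive form, not a fast algorithm) on a ramp input. The function name `dct_ii` and the test signal are illustrative assumptions and are not part of the proposed design.

```python
import math

def dct_ii(x):
    """Orthonormal DCT-II of a sequence, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        # Orthonormal scaling: the DC term uses sqrt(1/n), all others sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

# Hypothetical smooth input: a linear ramp of eight samples.
samples = [float(i) for i in range(8)]
coeffs = dct_ii(samples)

total_energy = sum(c * c for c in coeffs)
low_energy = coeffs[0] ** 2 + coeffs[1] ** 2  # energy in the two lowest frequencies

print([round(c, 3) for c in coeffs])
print("fraction of energy in first two coefficients:", low_energy / total_energy)
```

Because this form of the transform is orthonormal, the coefficient energy equals the sample energy, and for a smooth input such as this ramp almost all of it falls into the lowest-frequency coefficients.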
Loeffler based DCT algorithm
The DCT is computationally intensive, which can create performance bottlenecks in systems that use it. To overcome this problem, a number of algorithms have been proposed for computing the transform more efficiently.
In my design I use an 8-point 1-D DCT/IDCT algorithm proposed by Christoph Loeffler [2]. This algorithm is a substantial modification of the original DCT algorithm and provides one of the most computationally efficient 1-D DCT/IDCT calculations. The Loeffler algorithm for calculating the 8-point 1-D DCT is illustrated in Fig. 1.
The implementation of the rotator depicted in Fig. 4 uses four multipliers and two adders to shorten the critical path and improve numerical accuracy. This direct implementation has been shown to suit fixed-point arithmetic well. Other implementations of the rotator are possible, such as one with three multipliers and three adders. These alternative designs, however, have longer critical paths and involve initial additions, which may lead to overflows and may affect the accuracy of the calculations.
For the angle θ = 3π/8, Sun [4] reduces the number of rotation iterations to three and shifts all compensation steps to the final stage. Although the optimized 3π/8 rotation decreases the quality of the results, the effect is not noticeable in video sequence streams or image compression. He has implemented a three-stage unfolded Cordic for the angle θ = 3π/8, as shown in Fig. 5. As illustrated, it needs six add and six shift operations to approximate the 3π/8 rotation.
For a fixed multiplicand, power can be reduced by using the canonical signed-digit (CSD) representation instead of the 2's complement representation. By definition, the CSD representation is a signed-digit number system, with digits drawn from {-1, 0, 1}, in which no two adjacent digits are non-zero. Every number has a unique CSD representation, which expresses it as an algebraic sum or difference of powers of two using no more non-zero digits than its conventional binary form.
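The trade-off between the two rotator implementations mentioned above can be sketched as follows. The sign convention of the rotation, the function names, and the input values are assumptions made for illustration. The scale factor `k` is left as a parameter, and √2 is used only as an example value.

```python
import math

def rotate_direct(x0, x1, theta, k=1.0):
    """Four multiplications, two additions; no addition precedes a multiplier."""
    c = k * math.cos(theta)
    s = k * math.sin(theta)
    y0 = x0 * c + x1 * s
    y1 = -x0 * s + x1 * c
    return y0, y1

def rotate_three_mult(x0, x1, theta, k=1.0):
    """Three multiplications, three additions; starts with the sum x0 + x1."""
    c = k * math.cos(theta)
    s = k * math.sin(theta)
    t = c * (x0 + x1)        # initial addition: the sum needs extra headroom
    y0 = t + (s - c) * x1    # (s - c) and (s + c) would be precomputed constants
    y1 = t - (s + c) * x0
    return y0, y1

theta = 3 * math.pi / 8
direct = rotate_direct(100.0, -50.0, theta, math.sqrt(2))
three = rotate_three_mult(100.0, -50.0, theta, math.sqrt(2))
print(direct)
print(three)
```

Both functions compute the same rotation. The three-multiplier form saves one multiplication, but it places an addition in front of a multiplier, which is the source of the longer critical path and the possible overflow in fixed-point arithmetic.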
A procedure to transform a conventional binary number to CSD representation is described below. As an example, the CSD representation of a constant operand used in the DCT calculation, with 12-bit precision after the binary point, is shown in Eq. 1.
Original format: 1010 01110 0111
CSD format:      1010 100-10 100-1        (1)
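One common way to carry out the binary-to-CSD conversion is sketched below. It is a standard recoding that scans the number from the least significant bit and is not necessarily the procedure used in this design. The function name `to_csd` is hypothetical. Read as an unsigned integer, the original format in Eq. 1 is 5351, which corresponds to 5351/4096 ≈ 1.3064 with 12 bits after the binary point.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, most significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1; n mod 4 == 3 -> digit -1, so that the
            # remaining value becomes divisible by 4 and the next digit is 0.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1] or [0]

binary_value = 0b1010011100111          # original format from Eq. 1
csd_digits = to_csd(binary_value)
print(csd_digits)
print("non-zero digits:", bin(binary_value).count("1"), "->",
      sum(1 for d in csd_digits if d != 0))
```

For the constant in Eq. 1, this recoding reproduces the CSD format given there and lowers the number of non-zero digits from eight to six.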
The CSD representation reduces the number of non-zero digits compared with the conventional binary representation (from eight to six for the constant in Eq. 1). Because 'canonical' means no adjacent non-zero digits, an n-bit number can be represented with fewer non-zero digits, which in turn reduces the number of carry-save adder stages compared to a general-purpose array multiplier. Since fewer non-zero digits imply less computation, less switching activity and less power consumption, the CSD multiplier-less architecture is a good choice for low-power design.
I have found that replacing the 3π/8 rotation with the CSD multiplier-less architecture, while otherwise following the Loeffler DCT architecture, leads to lower power and smaller area than the design proposed by C.-C. Sun [4]. The proposed Cordic and CSD based Loeffler DCT architecture is shown in Fig. 7.
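The link between non-zero digits and hardware cost can be sketched in software: each non-zero CSD digit contributes one shifted copy of the input, which is either added or subtracted. The code below is a behavioural illustration only, not the synthesized datapath. The function name `csd_multiply` and the input value are assumptions, and the digit list is the CSD format from Eq. 1.

```python
# Digits of the constant in Eq. 1, most significant first (-1 is the negative digit).
CSD_DIGITS = [1, 0, 1, 0, 1, 0, 0, -1, 0, 1, 0, 0, -1]
FRACTION_BITS = 12  # precision after the binary point, as stated in the text

def csd_multiply(x, digits):
    """Multiply integer x by the constant encoded in digits using shifts and adds only.

    Returns the product and the number of shifted terms that were combined.
    """
    acc = 0
    terms = 0
    for position, d in enumerate(reversed(digits)):
        if d == 1:
            acc += x << position
            terms += 1
        elif d == -1:
            acc -= x << position
            terms += 1
    return acc, terms

product, terms = csd_multiply(1000, CSD_DIGITS)
print("scaled product:", product >> FRACTION_BITS)
print("shifted terms combined:", terms)
```

With the plain binary form of the same constant, eight shifted terms would have to be combined instead of six, which is where the saving in adder stages comes from.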
Synthesis Results
The design was synthesized with Design Compiler using the Chartered 0.18 µm CMOS design library at 100 MHz. Table 1 compares the Cordic and CSD based Loeffler DCT with the Loeffler based DCT proposed by C.-C. Sun [4]. The power is evaluated with PrimeTime PX.
Summary
The discrete cosine transform (DCT) is one of the most widely used image compression techniques. A real-time software implementation is difficult because it takes too many CPU cycles, and a hardware implementation still requires considerable power. In this paper I propose a low-power Cordic and CSD based Loeffler DCT architecture that achieves lower power and smaller area.
