Abstract
Introduction
The rapid growth of multimedia services running on portable applications demands low-power, high-quality implementations of complex signal processing algorithms. Multimedia systems involve image and video processing, which must be implemented at low cost because battery lifetime is limited. Many papers have been published on reducing the power dissipation of image and video applications, especially on low-power design of the discrete cosine transform (DCT). The DCT is a computation-intensive operation in image and video compression. It is used in image and video compression standards such as JPEG [2], MPEG, H.263 [3] and H.264. A direct implementation of the DCT requires a large number of multipliers and adders. To reduce hardware complexity, many previous works focused on DA-based DCT and multiple constant multiplication (MCM) [4]. To reduce power consumption, distributed arithmetic (DA) is used in place of multipliers [5]. A DCT implementation using DA has several advantages, such as area savings and high-speed operation. A conventional DA implementation achieves high speed by pre-computing all possible values and storing them in ROM. However, ROM-based DA has the disadvantage of redundancy, which is introduced to accommodate all possible bit-pattern combinations of the input signals. A regular and simple DCT architecture can be obtained by using a bit-serial DA-based approach. An MCM-based DCT can be implemented with a smaller number of shift and add operations. Since it was first proposed in 1959 [7], the coordinate rotation digital computer (CORDIC) algorithm has become a well-known and widely used iterative technique. It is used for evaluating basic arithmetic operations, trigonometric and hyperbolic functions, singular value decomposition [8] and so on. Conventional CORDIC can be realized with only additions and shifts, so it has been used for multiplierless low-power DCT architectures. In order to skip the internal
Related Works
Many multimedia applications, such as video conferencing, internet, video streaming and video over wireless, are among the most bandwidth-consuming modes of communication. To reduce power and area, a low-power architecture was presented in [9], which uses a canonical signed digit (CSD) representation of the fixed-point DCT coefficients and implements multiplierless multiplication. To concentrate on image quality, a CORDIC-based DCT was presented in [10]. An algorithmic approach to low-power design, based on the trade-off between image quality and power consumption, was presented in [11], and a design concentrating on low power and high speed was presented in [12]. A scalable multiplier was used in [13] to reduce power consumption; dynamically reconfiguring the width of the multiplier leads to significant power savings.
CORDIC based DCT Architecture

A. CORDIC Architecture:
The CORDIC algorithm is a well-known and widely studied iterative technique [15] for evaluating many arithmetic operations and trigonometric functions. It can be used to evaluate not only trigonometric functions but also transcendental and elementary functions such as square root and division. The basic principle of CORDIC is to iteratively rotate a vector using a rotation matrix [14], which is represented as (1), where x and y are the components of the vector along the x and y axes, i is the iteration step, σ is the sign bit (+1 or -1) that represents the direction of vector rotation, z is the accumulated rotation angle, and α is the predefined angle of each micro-rotation step, α_i = arctan(2^-i).
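The iteration described above can be sketched in a few lines of Python. This is a minimal illustration of the textbook rotation-mode CORDIC, not the architecture of this paper: the function name `cordic_rotate`, the floating-point arithmetic (hardware would use fixed-point shifts), and the sign convention (σ = +1 rotates counter-clockwise) are all assumptions made for the example.

```python
import math

def cordic_rotate(x, y, angle, n_iter=16):
    """Rotate (x, y) by `angle` radians with shift-and-add micro-rotations.

    Assumed textbook form of the iteration:
        x' = x - sigma * 2^-i * y
        y' = y + sigma * 2^-i * x
        z' = z - sigma * arctan(2^-i)
    Valid for |angle| up to roughly 1.74 rad (sum of all micro-angles).
    """
    z = angle                                   # residual angle still to rotate
    for i in range(n_iter):
        sigma = 1 if z >= 0 else -1             # direction of this micro-rotation
        shift = 2.0 ** -i                       # models a right shift by i bits
        x, y = x - sigma * shift * y, y + sigma * shift * x
        z -= sigma * math.atan(shift)           # alpha_i = arctan(2^-i)
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); undo it.
    gain = 1.0
    for i in range(n_iter):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x * gain, y * gain

# Rotating the unit vector (1, 0) yields approximately (cos, sin) of the angle.
c, s = cordic_rotate(1.0, 0.0, math.pi / 6)
print(c, s)
```

Only additions, subtractions, and multiplications by powers of two appear inside the loop, which is why the method maps onto adders and shifters in hardware.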
The CORDIC algorithm operates in two modes. The vectoring mode is used to find the amplitude and argument of a given vector, while the rotation mode is used to obtain the sine and cosine of a given angle [16]. The hardware architecture of the CORDIC iteration is shown in Figure 1. In the CORDIC operation, the magnitude of the rotated vector is scaled at every iteration, and the scaling accumulates according to the following equation (2). The accumulated k_i value in (2) converges to a constant as follows.
lim (n→∞) k(n) ≈ 0.60725…, where n is the number of iterations.
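The convergence of the accumulated scale factor can be checked numerically. The sketch below assumes the standard per-iteration factor 1/sqrt(1 + 2^-2i), which follows from rotating by arctan(2^-i) with shifts and adds; the function name `cordic_gain` is hypothetical.

```python
import math

def cordic_gain(n):
    """Accumulated scale factor after n iterations (i = 0 .. n-1)."""
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

# The product settles quickly; later iterations barely change it.
for n in (1, 2, 4, 8, 16):
    print(n, cordic_gain(n))
```

Because the factor depends only on which iterations are executed and not on the data, it can be computed once and applied as a constant.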
B. CORDIC Based DCT Architecture:
The DCT expresses a signal as a sum of cosine functions with different frequencies. Based on the separable property, the 2-D DCT process is decomposed into 1-D DCTs, each of which is expressed by the following equation.
where c_k = cos(kπ/16). The cosine elements in (6) can be changed into sine elements through trigonometric symmetry, and (6) can be rearranged as the following equations:
where s_m = sin(mπ/16) = c_k and m = 8 - k. The rearranged 1-D DCT equation is now represented as a vector rotation matrix together with the consecutive CORDIC iterations, as shown in Figure 2.
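As a reference point for the decomposition above, the following sketch computes an 8-point 1-D DCT directly from its definition and builds the 8 x 8 2-D DCT from row and column passes. It is not the CORDIC-based structure of Figure 2. The orthonormal scaling and the function names `dct_1d` and `dct_2d` are assumptions for illustration, since the paper's normalization is not reproduced here.

```python
import math

N = 8

def dct_1d(x):
    """8-point DCT-II from the definition (orthonormal scaling assumed)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        # Every cosine argument is a multiple of pi/16, i.e. some +/- c_k.
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
        out.append(scale * acc)
    return out

def dct_2d(block):
    """Separable 2-D DCT: 1-D DCT on each row, then on each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][u] holds vertical frequency u at horizontal frequency c.
    return [[cols[c][u] for c in range(N)] for u in range(N)]

# The symmetry used in the text: sin(m*pi/16) equals cos(k*pi/16) for m = 8 - k.
for k in range(1, 8):
    m = 8 - k
    print(k, m, math.sin(m * math.pi / 16), math.cos(k * math.pi / 16))
```

A flat block is a convenient sanity check: all of its energy should land in the single lowest-frequency coefficient.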
Figure 3. Sensitivity Difference of 8 x 8 2-D DCT Coefficients
Owing to the energy compaction property of the DCT, the low-frequency components carry more of the signal energy of the output data than the high-frequency components. After quantization [17], the high-frequency DCT coefficients become even smaller; in addition, the human eye is more sensitive to low-frequency components than to high-frequency ones. The low-frequency DCT coefficients are therefore more important than the high-frequency ones. The CORDIC-based DCT architecture is designed with this difference in importance between DCT coefficients in mind: a large number of CORDIC iterations is assigned to generate the low-frequency DCT coefficients, while a smaller number of iterations is used for the high-frequency components.
Proposed Low Power CORDIC based DCT
In the proposed low-power CORDIC-based DCT architecture, a different number of iterations is assigned to generate each DCT coefficient, and the iterations are selected carefully to minimize the error between the desired input angle and the corresponding accumulated angle. Table 1 shows the required iterations and the vector rotation directions σ (sign bits). For example, to rotate the vector by 7π/16, only iterations i = 0, 1, 3, 10 are executed, and the rest can be skipped to save power. The look-ahead algorithm for the 7π/16 CORDIC rotator can be written as follows. (8) where σ0 = -1, σ1 = -1, σ3 = -1, σ10 = -1. In Table 1, i = 90° represents the optional first iteration of the CORDIC [14]. The iterations for our DCT architecture are selected such that the error between the desired angle and the corresponding accumulated angle does not exceed 0.004 for all the given angles. The scale factor is determined by which CORDIC iterations are executed. Because the iterations are known ahead of time, the scale factors are predetermined; they are shown in Table 2. One interesting observation is that a look-ahead CORDIC using fewer iterations has a similar effect to removing the high-shift terms from the look-ahead CORDIC. For example, if the CORDIC rotation by 7π/16 is executed using three iterations (i = 0, 1, 3), the look-ahead CORDIC algorithm and its corresponding scale factor are as follows. In (8), when the higher shift terms (terms smaller than 2^-10) are eliminated, the equation changes to (9) and (10).
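The 7π/16 example can be checked numerically. The sketch below assumes micro-rotation angles of arctan(2^-i) and the standard scale factor 1/sqrt(1 + 2^-2i) per executed iteration. The helper names are hypothetical, and the micro-rotations are applied one after another rather than in the unrolled look-ahead form of (8). The `sigma` argument defaults to +1 (counter-clockwise in this sketch); the sign bits of -1 listed in the text correspond to the paper's own direction convention.

```python
import math

def accumulated_angle(iterations):
    """Total angle reached by executing only the listed iterations."""
    return sum(math.atan(2.0 ** -i) for i in iterations)

def scale_factor(iterations):
    """Predetermined gain compensation for the listed iterations."""
    k = 1.0
    for i in iterations:
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def rotate_selected(x, y, iterations, sigma=1):
    """Apply only the selected micro-rotations, all in direction sigma."""
    for i in iterations:
        shift = 2.0 ** -i                       # right shift by i bits
        x, y = x - sigma * shift * y, y + sigma * shift * x
    k = scale_factor(iterations)
    return x * k, y * k

target = 7 * math.pi / 16
for its in ((0, 1, 3, 10), (0, 1, 3)):
    err = abs(target - accumulated_angle(its))
    print(its, "angle error:", err, "scale factor:", scale_factor(its))

# Rotating (1, 0) should land close to (cos(7*pi/16), sin(7*pi/16)).
print(rotate_selected(1.0, 0.0, (0, 1, 3, 10)))
```

Dropping iteration i = 10 increases the angle error but keeps it within the 0.004 bound stated above, which is the trade-off the three-iteration variant relies on.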
Reconfigurable CORDIC based DCT Architecture
As mentioned in the previous section, the high-shift terms of the look-ahead CORDIC can be carefully removed, which has the same effect as using fewer CORDIC iterations. Fewer CORDIC iterations mean lower computational complexity. To further reduce the power consumption, we propose a reconfigurable CORDIC-based DCT architecture. In this section, several modes are presented, and the proposed reconfigurable architecture can dynamically change the number of CORDIC iterations. Generally, in the look-ahead CORDIC, the shift terms for calculating the low-frequency DCT coefficients, i.e., the terms for calculating X(0) and X(1) in (7), are more important than the shift terms for calculating the high-frequency coefficients. Within a look-ahead CORDIC equation, the low-shift terms are more important, whereas the high-shift terms are less important. To save the computation power, the least important shift term X (7) 
For mode 1, the 3π/16 CORDIC rotator is reduced as follows.
As the mode level increases, the number of shift terms is reduced further.
Experimental Results of the Proposed Low Power CORDIC based DCT Architecture
In this section, the experimental results of the proposed CORDIC-based DCT architecture are presented. The number of CORDIC iterations is selected according to the target angle. The power consumption of the DCT architecture is measured with a 50 MHz clock and a 2.5 V supply voltage, and the power value is calculated for three different modes. Because some of the higher-order shift terms in the CORDIC iterations can be removed by considering the difference in importance between DCT coefficients, our proposed DCT architecture shows the lowest gate count and power consumption with improved image quality.
The component requirements of our DCT architecture for the various modes are given in Table 3, which lists the number of adders, subtractors, and multipliers required for the three different modes of the DCT architecture.
Table 3. Component Requirements for Various Modes
The power consumption of our DCT architecture in the different modes is shown in Table 4. The power consumption is measured with a 50 MHz clock and a 2.5 V supply voltage.
Conclusion
CORDIC is a powerful algorithm and a popular choice for various digital signal processing applications. In a DCT architecture, not all computations are equally important in generating the frequency-domain outputs. This paper presented a low-power CORDIC-based DCT architecture in which the difference in importance between DCT coefficients is exploited to allocate the number of CORDIC iterations. The reconfigurable CORDIC-based DCT architecture can dynamically change modes, saving power and improving image quality. The device utilization summary showed that minimum resources were consumed. This idea can assist the low-power design of image and video compression applications.
