In this paper, an array architecture for computing the Complex Discrete Wavelet Transform (CDWT) is proposed. The wavelet filter coefficients are realized using a multiplierless pipelined CORDIC algorithm. Choosing the pipelined CORDIC algorithm over the conventional one for realizing the CDWT filter coefficients is hardware-efficient and also permits high-frequency operation. The controller unit clusters the input samples into even and odd samples, which arrive in proper sequence at each clock cycle. This clustering provides a good amount of parallelism, giving faster filter operation than direct filter realization. The 8-tap filter bank is implemented using the array architecture, resulting in high throughput. The algorithm is implemented on an FPGA of the Virtex XCV100 series.
INTRODUCTION
Image analysis in the transform domain has gained considerable importance in recent years. The wavelet transform [1], [2] has come a long way as an important tool for image analysis. Its multiresolution property has found wide application in reliable lossless image compression and reconstruction. Although the discrete wavelet transform is used extensively in motion vector estimation, real wavelet transforms (e.g. Daubechies, Haar) lack rotation and shift invariance. The Complex Discrete Wavelet Transform (CDWT) [3], [4], [5], [6], a phase-based method, addresses these problems and is very effective in motion estimation and stereo image matching. For real-time applications, hardware implementation of the transform is of great importance.
An array architecture [7] for realizing the CDWT is proposed, which uses the pipelined COordinate Rotation DIgital Computer (CORDIC) [8] as the basic processing element (PE). Because the CORDIC [9], [10] element computes trigonometric functions and performs vectoring through shifts and adds, avoiding any multiplication, the hardware overhead is drastically reduced and the speed is enhanced. The algorithm developed in this paper separates the even and odd input samples to achieve a symmetric parallel architecture for realizing the transform. Section 2 develops the theory of the CDWT and describes the arrangement of the input signal and the padding pattern that produces the symmetry of the architecture. Section 3 proposes the array architecture for realizing the transform and details the realization of the filter coefficients and the data sequencing. Section 4 presents the performance of the proposed architecture.
COMPLEX DISCRETE WAVELET TRANSFORM
Rational-valued complex kernels realize the CDWT. These kernels can be modelled by even-length FIR filters of approximately Gabor form, given by:
with n_0 set to -0.5 to position the Gaussian window symmetrically in the interval [-D, D-1], where D is the window half-length, a_0 and a_1 are the magnitudes, ω_0 and ω_1 are the modulation frequencies, and σ_0 and σ_1 are the window standard deviations. The modulation frequencies ω_0 and ω_1 should be complementary.
In the padded input sequence, the samples to the left of a_0 and to the right of a_N form the padding signal. The notable feature of this padding is that it resembles mirror padding, except that the edge values (a_0 and a_N) are not repeated in the padding sequence. The advantage of this becomes apparent later. The response can be verified for a finite input sequence of length N; let N = 16. The convolution product gives the transfer properties of a digital FIR filter in the time domain:
where N is the filter length, f(k) is the input signal, H(n) is the impulse response of the filter, and y(n) is the output signal. The transform relation of the filters is given by:
From these equations it can be seen that the even- and odd-indexed filter coefficients are multiplied with the even and odd samples of the input sequence, respectively. This is achieved by the modified form of padding, which also performs as well as mirror padding as far as the noise level is concerned.
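The even/odd grouping described above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: it assumes that the filter output is decimated by two (only even-indexed outputs are kept) and that the filter length is even, and the function names, the test signal and the integer-valued 8-tap coefficients are hypothetical placeholders rather than actual CDWT coefficients. It pads a 16-sample sequence by mirroring without repeating the edge values, then compares a direct convolution with a two-branch form in which even-indexed coefficients see only even-indexed samples and odd-indexed coefficients see only odd-indexed samples.

```python
def pad_without_edge_repeat(samples, d):
    """Mirror-pad by d samples per side; the edge values are not repeated."""
    left = samples[d:0:-1]           # samples[d], ..., samples[1]
    right = samples[-2:-d - 2:-1]    # samples[-2], ..., samples[-d-1]
    return left + samples + right


def convolve(x, h):
    """Plain linear convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def decimated_direct(x, h):
    """Direct form: full convolution, then keep even-indexed outputs."""
    return convolve(x, h)[::2]


def decimated_even_odd(x, h):
    """Two-branch form: even taps on even samples, odd taps on odd samples."""
    even_branch = convolve(x[0::2], h[0::2])
    odd_branch = convolve(x[1::2], h[1::2])
    y = [0] * ((len(x) + len(h)) // 2)
    for m, value in enumerate(even_branch):
        y[m] += value
    for m, value in enumerate(odd_branch):
        y[m + 1] += value            # odd branch is delayed by one output
    return y


# Hypothetical data: 16 input samples, half-length D = 4, 8 complex taps.
signal = list(range(1, 17))
padded = pad_without_edge_repeat(signal, 4)
taps = [1 + 1j, -2 + 0j, 3 - 1j, -4 + 2j, 5 + 0j, -6 - 3j, 7 + 1j, -8 + 0j]
same = decimated_direct(padded, taps) == decimated_even_odd(padded, taps)
```

Under these assumptions the two branches are independent FIR filters of half the original length, which is what allows them to operate in parallel.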
ALGORITHM OF ARCHITECTURE
This section describes the architectural design of the CDWT filter using CORDIC as the basic processing element. The symmetry shown in equation (3) is exploited in the design of the architecture. Based on the symmetry of the even and odd samples and filter coefficients, the filter structure is divided into two sections. The basic block-level scheme for realizing the CDWT is given in Fig. 2. Here h0, h1, h2, h3, h4, h5, h6, h7 are the filter coefficients of the 8-tap CDWT, which are realized using the CORDIC algorithm. The even-indexed filter coefficients take the even-indexed input samples, while the odd-indexed filter coefficients take the odd-indexed samples. The controller section generates the addresses for the RAM that stores the input data (ref. Fig. 2). Through this address sequence, the controller section separates the input signal samples into even and odd sequences. The output of the RAM is passed to the filter stage, which performs the multiplication using the CORDIC structure.
As the first sample is given to the CORDIC input, (ref. 
Filter Design
The basic equations for the CDWT (equation (1)) can be written in the following form:
where i = 0 for the low-pass filter and i = 1 for the high-pass filter. The structure of both equations is the same, so equation (4) can be taken as the generalized form for both the high-pass and low-pass filters. The amplitude values a_0 and a_1 are taken to be equal, with magnitude 0.5. This is done without any loss of generality. Thus multiplication by a_0 and a_1 can be achieved using shift operations only. The structure of the filter is shown in Fig. 4. The input sequence can be real or complex. In this paper a complex input sequence is chosen to keep the derivations general.
Fig. 4. Realization of a filter coefficient
When a real signal is convolved with the complex filter response of the CDWT filters, the output is a complex signal. Let the complex input sequence be described as:
Multiplication of a signal coefficient with a filter coefficient of the form A(cosθ + j sinθ) gives the output as:
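$$\begin{bmatrix} F_{re} \\ F_{im} \end{bmatrix} = A \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} f_{re} \\ f_{im} \end{bmatrix} \qquad (7)$$
where F_re is the real part of the output signal and F_im is its imaginary part, with f_re and f_im denoting the real and imaginary parts of the input sample. The above equation essentially represents a plane rotation operation, which can be computed efficiently by applying the CORDIC algorithm.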
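As a quick check of the plane-rotation interpretation, the sketch below (Python, with the hypothetical names f_re and f_im for the real and imaginary parts of an input sample) computes the product with a coefficient A(cosθ + j sinθ) once as a scaled 2x2 rotation and once as an ordinary complex multiplication. The numerical values are arbitrary.

```python
import cmath
import math


def rotate_and_scale(f_re, f_im, amplitude, theta):
    """Apply the scaled plane rotation to the vector (f_re, f_im)."""
    c, s = math.cos(theta), math.sin(theta)
    out_re = amplitude * (c * f_re - s * f_im)
    out_im = amplitude * (s * f_re + c * f_im)
    return out_re, out_im


def complex_product(f_re, f_im, amplitude, theta):
    """Same operation written as a complex multiplication."""
    result = complex(f_re, f_im) * cmath.rect(amplitude, theta)
    return result.real, result.imag


# Arbitrary sample and coefficient, chosen only for illustration.
by_rotation = rotate_and_scale(0.3, -0.7, 0.5, math.pi / 6)
by_product = complex_product(0.3, -0.7, 0.5, math.pi / 6)
```

The two results agree to within floating-point rounding, so a rotation unit followed by a scaling by A is sufficient to realize one filter tap.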
Circular CORDIC
In the CORDIC technique, the plane rotation through an angle α is achieved by decomposing the target angle into several elementary angles and carrying out rotations through each of these angles as follows [8], [10]:
with M being the word length and δ_i = ±1. Since θ_i/2 < θ_{i+1} < θ_i, any arbitrary angle can be expressed in terms of the elementary angles θ_i with their signs properly chosen. An elementary plane rotation in two dimensions may be expressed as
where the value of δ_i decides the direction of rotation. This expression is identical in form to equation (7). For an 8-tap filter, the window half-length D is equal to 4, so n ranges from -4 to 3. For low-pass filters the value of θ, i.e. the orientation of the filter, follows from the modulation term e^{j(n-n_0)ω_0} with n_0 = -0.5 and is given by
$$\theta = (n - n_0)\,\omega_0 = (n + 0.5)\,\omega_0$$
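A minimal floating-point sketch of the circular CORDIC rotation is given below. It is an illustration under stated assumptions rather than the pipelined hardware design of this paper: the elementary angles are taken as θ_i = arctan(2^-i), the multiplications by 2^-i stand in for hardware shifts, the iteration count of 16 is arbitrary, and the scale factor accumulated by the micro-rotations is divided out at the end (in hardware it would be a precomputed constant). The function name cordic_rotate is hypothetical.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians using CORDIC micro-rotations.

    Converges for |angle| up to about 1.74 rad, the sum of the
    elementary angles arctan(2**-i).
    """
    z = angle          # residual angle still to be rotated
    gain = 1.0         # accumulated scale factor of the micro-rotations
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # direction of this micro-rotation
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain


# Rotating the unit vector (1, 0) approximates (cos(angle), sin(angle)).
approx_cos, approx_sin = cordic_rotate(1.0, 0.0, math.pi / 6)
```

Each iteration uses only additions, sign selection and a shift, which is why no multiplier is needed in the processing element; the accuracy is set by the number of iterations.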
Hyperbolic CORDIC
Equation (1) can be broken down into three parts, which are dealt with individually. The first part involves the amplitude coefficients a_0 and a_1. The amplitude is multiplied by a Gaussian distribution function, and the product is in turn multiplied by e^{j(n-n_0)ω}. The Gaussian function is expressed as:
The Gaussian expression can also be evaluated using the hyperbolic CORDIC scheme. In this case the recursion equation is given by:
Equation (10) can be realized by using shifters and adders. Structures for circular and hyperbolic CORDIC (equation (9) and (10) 
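To illustrate how a decaying exponential such as the Gaussian window can be obtained with shifts and adds, a minimal floating-point sketch of hyperbolic CORDIC in rotation mode follows. It is an assumption-laden illustration, not the structure used in this paper: it uses the textbook iteration schedule in which indices 4, 13, 40, ... are repeated to guarantee convergence, computes cosh(t) and sinh(t), and forms exp(-t) = cosh(t) - sinh(t). The function names and the choice of 16 iterations are hypothetical, and the argument must lie within the convergence range of roughly 0 to 1.1.

```python
import math


def hyperbolic_indices(count):
    """Shift indices 1, 2, 3, 4, 4, 5, ..., 13, 13, 14, ... (4, 13, 40 repeat)."""
    indices, i, repeat_at = [], 1, 4
    while len(indices) < count:
        indices.append(i)
        if i == repeat_at and len(indices) < count:
            indices.append(i)                # repeated step for convergence
            repeat_at = 3 * repeat_at + 1
        i += 1
    return indices


def cordic_exp_neg(t, iterations=16):
    """Approximate exp(-t) for 0 <= t <= ~1.1 with hyperbolic CORDIC."""
    indices = hyperbolic_indices(iterations)
    gain = 1.0
    for i in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    # Pre-scaling x by 1/gain cancels the scale factor of the iterations.
    x, y, z = 1.0 / gain, 0.0, t
    for i in indices:
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x + d * y * shift, y + d * x * shift
        z -= d * math.atanh(shift)
    return x - y                             # cosh(t) - sinh(t) = exp(-t)


approx = cordic_exp_neg(0.5)
```

Arguments outside the convergence range would need a range-reduction step before the iterations, which this sketch does not include.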
PERFORMANCE ANALYSIS
To the authors' knowledge, this is the first architecture of its type for the CDWT, so no comparison with other designs is provided. The number of clock cycles needed to compute the CDWT with the proposed architecture for an N-tap filter (N = 2^x, where x is an integer) is log_2 N; thus the time complexity is O(log_2 N). The pipelined CORDIC algorithm reduces the hardware overhead and also improves the speed. The latency of the system is 24 clock cycles (16 for hyperbolic CORDIC + 5 for circular CORDIC + 3), while the throughput is 1. Using pipelined CORDIC, the full parameter space with all possible rotation angles is reduced to five rotation stages for this particular application. To verify the algorithm, the architecture was tested for a 4-tap CDWT on an FPGA of the Xilinx XCV100 series. Figure 6 shows the FPGA implementation of the 4-tap CDWT.
The algorithm described in this paper exploits the symmetry of the relation between the input samples, which results in a regular and parallel structure of the filter architecture. Parallelism results in low-power and high-speed operation. In addition, the tree structure retains a high throughput rate (1/t_0 = frequency of the circuit) compared to the direct form of filter realization.
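The latency and throughput figures quoted above can be turned into a simple cycle count. The sketch below is a generic pipeline timing model and an assumption on our part, not a measurement of the implemented design: it takes the stated stage counts (16 + 5 + 3) as the latency and assumes one new result per clock cycle once the pipeline is full. The function name and parameter names are hypothetical.

```python
def pipeline_cycles(n_results, hyperbolic_stages=16, circular_stages=5,
                    extra_stages=3):
    """Clock cycles until n_results outputs have left a full-rate pipeline."""
    if n_results <= 0:
        return 0
    latency = hyperbolic_stages + circular_stages + extra_stages
    # The first result appears after `latency` cycles, then one per cycle.
    return latency + (n_results - 1)


first_result = pipeline_cycles(1)
sixteen_results = pipeline_cycles(16)
```

Under this model the latency is paid only once, so for long input sequences the cost per output approaches one clock cycle.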
