In this paper, we propose a low-cost, high-performance sequential architecture for implementing the CORDIC algorithm in two computation modes. It is suited to serial operation and performs conversion between polar and rectangular coordinate systems, essentially sin/cos, sinh/cosh and arctan computation.
INTRODUCTION
Fingerprint recognition systems are an active focus of research and development. They enable new types of services for consumers and for industrial applications. This paper is based on a project that aims to develop a fingerprint recognition system. The most difficult functional block to implement is the Fast Fourier Transform (FFT) processor.
The Coordinate Rotation Digital Computer (CORDIC) offers an elegant way to implement it. It is well suited to FPGA applications in which the rotation angles are known in advance, such as the twiddle factors in the FFT and the kernel components in other sinusoidal transforms [1]. The CORDIC scheme has been applied to FFT processor design and found to yield a significant hardware reduction in the implementation of twiddle-factor multiplications.
In this work, we exploit the capacity of FPGA circuits to design a reconfigurable architecture that computes elementary functions such as sine, cosine, exponential and arctangent using this algorithm. We first consider polynomial approximations with fixed coefficients and powers of x to evaluate the errors over a bounded interval. We then use CORDIC evaluation to calculate outputs in fixed-point format. The average error obtained is close to that of the polynomial approximations, which makes our method an attractive solution for signal processing applications.
The remainder of the paper is organized as follows. Section 2 reviews previous work that proposed different types of CORDIC architectures. The CORDIC algorithm is described in Section 3. Section 4 presents the proposed architecture for the rotation and vectoring modes, derived from the algorithm specification. Section 5 reports the results of our implementation and compares the performance of the proposed architecture with other architectures available in the literature. Conclusions are drawn in Section 6.
RELATED WORK
A large number of architectures have been proposed in the literature for the CORDIC algorithm, ranging from bit-serial implementations to word-parallel pipelined architectures. The choice depends on the required computing throughput and on the constraints on area, latency and power dissipation. Traditionally [2], [3], the CORDIC algorithm has been implemented on word-serial architectures using conventional non-redundant arithmetic with radix-2 micro-rotations and a fixed-point internal format. Lang and Ercegovac [4] applied redundant arithmetic to the implementation of conventional radix-2 CORDIC [2], [3]. However, this increased the iteration delay and added cost because of the variable scale factor. Double rotation and correcting rotation methods [5] were proposed to implement constant-scale-factor CORDIC, at the cost of a 50% increase in the number of iterations. The branching algorithm [6] reduces this latency increase, but it requires an additional CORDIC module to perform rotations in both directions whenever the direction cannot be determined from intermediate results. The main disadvantage of the branching method is that two conventional CORDIC iterations must be performed in parallel, which consumes more silicon area than conventional methods. However, this method gives a faster implementation than [5].
A low-latency CORDIC algorithm is proposed in [7], which reduces latency by 25% compared to the method in [5].
In contrast to these methods, a new algorithm is proposed in [8] that avoids determining the direction of rotation from intermediate results of the steering variable. However, it incurs an area cost for registers, because of pipelining at the full-adder level and the n initial register rows needed to skew the input data. This redundant radix-2 CORDIC algorithm has been extended to radix-4 to halve the number of iterations [9]. However, the computation time per iteration increases, since it takes more time to decide among the five micro-rotation direction values and to select the appropriate one of the five elementary angles. Both redundant and higher-radix CORDIC algorithms are still iterative in nature, which greatly restricts the speed of an implementation. The delay of every iteration can be decomposed into two parts: the delay to predict the new rotation direction, and the delay to apply the computed rotation. Improvements have mainly targeted the delay to predict the new micro-rotation direction.
OVERVIEW OF ITERATIVE CORDIC ALGORITHM
The CORDIC computing technique was developed by J. E. Volder in the late 1950s [2] for the computation of trigonometric functions, multiplication and division. In 1971, Walther generalized this algorithm to implement hyperbolic, logarithmic and exponential functions. The algorithm is iterative and decomposes elementary operations into simple shift and addition operations. The number of iterations is determined by the word length of the inputs.
CORDIC modes
The CORDIC method can be employed in two different modes, namely, the rotation mode and the vectoring mode.
In the rotation mode, the coordinate components of a vector and an angle of rotation are given, and the coordinate components of the original vector, after rotation through a given angle, are computed. In the vectoring mode, the coordinate components of a vector are given, and the magnitude and angular argument of the original vector are computed.
The CORDIC algorithm performs the rotation of a vector in both modes as a sequence of micro-rotations by elementary angles [2] read from a ROM. The number of micro-rotations for a given precision is determined by the radix used in the implementation. A graphical representation of CORDIC is shown in Figure 2.
Generalized CORDIC
In the generalized iteration equations of the CORDIC algorithm [3] at the (i+1)th step, k_i denotes the vector amplification factor for the ith iteration, and K is the resultant vector amplification factor after n iterations.
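For reference, a commonly used statement of the generalized iteration is sketched below. The notation is an assumption and is not taken from this paper: m selects the coordinate system (1 circular, 0 linear, -1 hyperbolic), \sigma_i = \pm 1 is the micro-rotation direction, and \alpha_{m,i} is the elementary angle (\arctan 2^{-i}, 2^{-i} or \operatorname{arctanh} 2^{-i} respectively).

```latex
\begin{aligned}
x_{i+1} &= x_i - m\,\sigma_i\, y_i\, 2^{-i} \\
y_{i+1} &= y_i + \sigma_i\, x_i\, 2^{-i} \\
z_{i+1} &= z_i - \sigma_i\, \alpha_{m,i} \\[4pt]
k_i &= \sqrt{1 + m\, 2^{-2i}}, \qquad K = \prod_{i} k_i
\end{aligned}
```

In this form, the rotation mode typically takes \sigma_i = +1 when z_i \ge 0 and -1 otherwise, while the vectoring mode takes \sigma_i = -1 when y_i \ge 0 and +1 otherwise. In the hyperbolic case the index starts at i = 1 and some iterations must be repeated for the sequence to converge.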
Outputs of the CORDIC algorithm
To better understand how the CORDIC processor works, we explain the simplest form of the CORDIC algorithm with 
CORDIC DESIGN
Because CORDIC is an iterative method, it requires many clock cycles to reach the required accuracy. For a given precision, a higher radix requires fewer micro-rotations than radix-2. Our CORDIC module performs 14 iterations for 14-bit precision using a radix-2 number representation (Figure 3), with the constraint that the (i+1)th iteration may begin only after the ith rotation has been completed.
Sine/cosine and exponential function implementation
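The sequential behaviour described above can be illustrated with a small software model. This is only a sketch, not the VHDL design: the names `cordic_rotate`, `ATAN_ROM` and `K_INV` are hypothetical, and the choice of 14 fractional bits is an assumption made to mirror the 14-bit precision mentioned in the text.

```python
import math

N_ITER = 14              # one micro-rotation per bit of precision (radix-2)
FRAC_BITS = 14           # assumed fixed-point format: 14 fractional bits
SCALE = 1 << FRAC_BITS

# Elementary angles atan(2^-i), quantized as a ROM table would store them.
ATAN_ROM = [round(math.atan(2.0 ** -i) * SCALE) for i in range(N_ITER)]

# Reciprocal of the overall gain K = prod(sqrt(1 + 2^-2i)), pre-loaded into x.
K_INV = round(SCALE / math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i))
                                for i in range(N_ITER)))


def cordic_rotate(angle_rad):
    """Rotation mode, circular coordinates: return (cos, sin) of angle_rad.

    Valid only inside the basic convergence range (roughly |angle| < 1.74 rad).
    """
    x, y = K_INV, 0
    z = round(angle_rad * SCALE)
    for i in range(N_ITER):
        # Iteration i+1 uses the x, y, z produced by iteration i,
        # so the loop is inherently sequential.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_ROM[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_ROM[i]
    return x / SCALE, y / SCALE
```

Each pass uses only shifts, additions and a table lookup, which is what makes the scheme attractive for a compact sequential datapath. Python's `>>` on negative integers is an arithmetic shift, matching the behaviour expected of signed fixed-point hardware.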

Arctangent function implementation
To obtain this function, we use the vectoring mode and circular coordinates as described in Table 1 .
The implementation (Figure 5) uses the same architecture as the first design, except that the adder/subtracter is controlled by the sign of the value in register Y.
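A software sketch of the vectoring mode may help make the role of the Y register concrete. It is an illustrative model under assumed parameters (14 iterations, 14 fractional bits), and the name `cordic_arctan` is hypothetical; the only difference from a rotation-mode loop is that the direction is chosen from the sign of y rather than the sign of z.

```python
import math

N_ITER = 14
FRAC_BITS = 14           # assumed fixed-point format: 14 fractional bits
SCALE = 1 << FRAC_BITS

# Elementary angles atan(2^-i), quantized as a ROM table would store them.
ATAN_ROM = [round(math.atan(2.0 ** -i) * SCALE) for i in range(N_ITER)]


def cordic_arctan(y_in, x_in=1.0):
    """Vectoring mode, circular coordinates: return atan(y_in / x_in).

    Assumes x_in > 0 so that the vector lies in the convergence range.
    """
    x = round(x_in * SCALE)
    y = round(y_in * SCALE)
    z = 0
    for i in range(N_ITER):
        # The sign of y selects add or subtract: each step drives y toward 0
        # while z accumulates the angle that has been rotated away.
        if y >= 0:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_ROM[i]
        else:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_ROM[i]
    return z / SCALE
```

The scale factor K does not need to be compensated here, because only the accumulated angle z is read out; x ends up holding the magnitude multiplied by K.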
RESULTS OF FPGA IMPLEMENTATION
The design was implemented in VHDL and simulated with ModelSim SE 6.0 from Mentor Graphics, then verified and synthesized with Quartus II version 8.0 (32-bit) from Altera. The target device is a Stratix III (P3SL150F1152C3). The implementation results are given in Table 2.
PRECISION WITH CORDIC METHOD AND ERROR ANALYSIS
In this section, we conduct simulations to show the effectiveness of the proposed architecture. To analyze the error performance, we define the error as the distance between the ideal rotated point and the feasible rotated point, divided by the magnitude of the ideal rotated point.
In the design flow, one important step is the fixed-point simulation, which helps determine the required word length. If the word length is overestimated, the design suffers from higher cost and slower computation. We therefore use a 14-bit fixed-point format for the data and fix the scaling factors. The following relative error curves show the CORDIC precision, computed from values extracted from the ModelSim simulation driven by the test bench. The mean error for the exponential is about 0.005%, an acceptable error in the specified convergence range. For the arctangent function, the mean error does not exceed 0.01%.
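Written out, this definition can be sketched as follows. The symbols are assumed notation rather than the paper's own: (x_{id}, y_{id}) is the ideal rotated point and (x_{c}, y_{c}) the point actually produced by the fixed-point CORDIC.

```latex
e = \frac{\sqrt{(x_{id} - x_{c})^{2} + (y_{id} - y_{c})^{2}}}{\sqrt{x_{id}^{2} + y_{id}^{2}}}
```

For a scalar output such as the exponential or the arctangent, the same idea reduces to |f_{id} - f_{c}| / |f_{id}|; multiplying by 100 gives a percentage.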
CONCLUSION
This paper proposes a CORDIC architecture as an approach to implementing some operators in a fingerprint recognition application. The CORDIC architecture leads to fast and small operators with up to 14 bits of precision. The principal drawbacks of this algorithm are the need for a scale factor and the slow rate of convergence. The convergence range can be extended over the entire coordinate space by repeating certain iteration steps and by exploiting the symmetry of the coordinate axes. To cover the whole coordinate space, we compute the angle on the interval [0, 90°]; the result of a CORDIC rotation for any angle between 90° and 360° can then be derived from the result of a rotation in [0, 90°]. Our basic CORDIC processor has been designed in VHDL. The implemented architecture is dedicated to the computation of trigonometric, exponential and arctangent functions with an internal word length of 14 bits. Nevertheless, it can be adapted to other functions by reprogramming the FPGA. Our module uses a radix-2 number representation, which leads to small circuits by replacing costly multiplications with a small number of additions. The resulting operators provide a very small average error with a reasonable maximum error, which makes our approach suitable for many applications.
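The quadrant symmetry mentioned above can be sketched in software as follows. This is an illustration only: `first_quadrant_core` is a hypothetical stand-in for the CORDIC core (it simply calls the math library so that the example is self-contained), and `sincos_full_circle` is an assumed name for the wrapper.

```python
import math


def first_quadrant_core(theta_deg):
    """Stand-in for a CORDIC core that is valid on [0, 90] degrees."""
    t = math.radians(theta_deg)
    return math.cos(t), math.sin(t)


def sincos_full_circle(angle_deg, core=first_quadrant_core):
    """Return (cos, sin) for any angle by folding it into [0, 90) degrees."""
    a = angle_deg % 360.0
    quadrant, theta = divmod(a, 90.0)
    quadrant = int(quadrant) % 4      # guards against a == 360.0 after rounding
    c, s = core(theta)
    if quadrant == 0:
        return c, s
    if quadrant == 1:                 # cos(90 + t) = -sin t, sin(90 + t) = cos t
        return -s, c
    if quadrant == 2:                 # cos(180 + t) = -cos t, sin(180 + t) = -sin t
        return -c, -s
    return s, -c                      # cos(270 + t) = sin t, sin(270 + t) = -cos t
```

Only sign changes and a swap of the two outputs are needed outside the first quadrant, so the extension costs almost no extra hardware.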
