I. Introduction
For systems such as calculators, keeping the hardware very small is of prime importance. For these systems, cost (e.g., chip gate count, which has to be minimized) is more important than speed. It is also very important to calculate values with good accuracy and precision. Although better precision can sometimes be obtained by increasing the bit length, it is more important to select a method that gives more accurate results.
The coordinate rotation digital computer (CORDIC) has established its popularity in several important areas of application, such as the generation of sine and cosine functions and the calculation of discrete sinusoidal transforms like the fast Fourier transform (FFT), discrete sine/cosine transforms (DST/DCT), and the Householder transform (HT) [1]-[3]. Many variations have been suggested for efficient implementation of CORDIC with fewer iterations than the conventional CORDIC algorithm. The number of CORDIC iterations is optimized in [4]-[6] by greedy search, at the cost of additional area and time for the implementation of a variable scale-factor. In [7] and [8], efficient scale-factor compensation techniques are proposed, which adversely affect the latency/throughput of computation.
Two area-time efficient CORDIC architectures have been suggested in [9], which involve constant scale-factor multiplication for an adequate range of convergence (RoC). The virtually scale-free CORDIC in [10] also requires multiplication by a constant scale-factor and relatively more area to achieve a respectable RoC. The enhanced scale-free CORDIC in [11] combines a few conventional CORDIC iterations with scaling-free CORDIC iterations for an efficient pipelined CORDIC implementation with improved RoC. However, when used in a recursive CORDIC architecture, combining two different types of CORDIC iterations degrades performance. In this paper, we propose a novel scaling-free CORDIC algorithm for area-time efficient implementation of CORDIC with adequate RoC.
II. Brief Overview Of CORDIC Algorithm

A. Taylor Series
The Taylor series expansion for sine is:
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
This method is one of the oldest and most widely used, but obtaining values of higher accuracy requires calculating higher-order factorials and powers. Moreover, implementing it requires at least a multiplier, a divider, an adder, and a subtractor. For good accuracy, every term must be included in the calculation until the terms become insignificant. This approach therefore has large hardware requirements and is slow.

Many variations have been suggested for efficient implementation of CORDIC with fewer iterations than the conventional CORDIC algorithm [4]-[11]. As noted in the Introduction, these either optimize the number of iterations at the cost of a variable scale-factor [4]-[6], rely on scale-factor compensation that adversely affects latency/throughput [7], [8], require constant scale-factor multiplication to reach an adequate range of convergence (RoC) [9], [10], or combine two types of CORDIC iterations [11], which degrades performance in a recursive architecture. A low-complexity technique for eliminating the scale factor is the use of the Taylor series expansion. The Scaling-Free CORDIC and the modified scale-free CORDIC are techniques based on the Taylor series approach. The former suffers from a low RoC, which renders it unsuitable for practical applications, while the latter extends the RoC but introduces a predictable, constant scale-factor of $1/\sqrt{2}$.
The other hardware efficient architectures require scale-factor compensations to extend the range of convergence to the entire coordinate space.
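To illustrate the cost of the direct Taylor series method described in this subsection, the following minimal Python sketch sums the sine series until the next term becomes insignificant. The function name `taylor_sin` and the tolerance `tol` are hypothetical choices made for this illustration only. Each new term needs multiplications and a division, which is the hardware burden noted above.

```python
import math

def taylor_sin(x, tol=1e-9):
    """Sum the sine Taylor series until the next term is below tol.

    Returns (value, number_of_terms_used).
    """
    term = x          # first term: x^1 / 1!
    total = 0.0
    k = 0             # number of terms accumulated so far
    while abs(term) >= tol:
        total += term
        k += 1
        # Next term from the previous one: multiplications and one division.
        term *= -x * x / ((2 * k) * (2 * k + 1))
    return total, k

value, terms = taylor_sin(math.pi / 4)
print(value, math.sin(math.pi / 4), terms)
```

The number of terms grows with the argument and with the required accuracy, so the latency of this method is data dependent.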
B. Look up
III. Sequential/Iterative CORDIC
The iterative architecture requires the maximum number of clock cycles to calculate an output but has the minimum clock period per iteration. Its variable shifters do not map well onto certain FPGAs because of their high fan-in. A combinational circuit has more delay, but its processing time is reduced compared with the iterative circuit. Its shifters have fixed shifts, so they can be implemented in the wiring, and its constants can be hardwired instead of requiring storage space.

The key concept of CORDIC arithmetic is based on the simple and ancient principles of two-dimensional geometry. The iterative formulation of a computational algorithm for its implementation, however, was first described in 1959 by Jack E. Volder for the computation of trigonometric functions, multiplication, and division. This year therefore marks the completion of 50 years of the CORDIC algorithm. Over these 50 years, not only has a wide variety of CORDIC applications emerged, but a lot of progress has also been made in algorithm design and in the development of architectures for high-performance, low-cost hardware solutions for those applications. CORDIC-based computing received increased attention in 1971, when J. S. Walther showed that, by varying a few simple parameters, it could be used as a single algorithm for unified implementation of a wide range of elementary transcendental functions involving logarithms, exponentials, and square roots, along with those suggested by Volder. During the same time, Cochran benchmarked various algorithms and showed that the CORDIC technique is a better choice for scientific calculator applications.
The popularity of CORDIC was greatly enhanced thereafter, primarily due to its potential for efficient and low-cost implementation of a large class of applications, which include the generation of trigonometric, logarithmic, and transcendental elementary functions; complex number multiplication; eigenvalue computation; matrix inversion; solution of linear systems; and singular value decomposition (SVD) for signal processing, image processing, and general scientific computation. The name CORDIC stands for Coordinate Rotation Digital Computer. Volder [Vold59] developed the underlying method of computing the rotation of a vector in a Cartesian coordinate system and evaluating the length and angle of a vector. The CORDIC method was later expanded to multiplication, division, logarithm, exponential, and hyperbolic functions.
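The iterative vector rotation described in this section can be sketched in a few lines of Python. This is a floating-point model of the conventional (scaled) CORDIC rotation, not of the proposed design; the function name `cordic_rotate` and the default of 16 iterations are assumptions made for illustration. The final division by `gain` is the scale-factor compensation that scaling-free variants aim to avoid.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by angle (radians, |angle| <= about 1.74) with conventional CORDIC."""
    # Elementary angles atan(2^-i) and the constant gain of the iteration chain.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction for this step
        # Shift-and-add step (the right-hand side uses the old x and y).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    # Scale-factor compensation.
    return x / gain, y / gain

c, s = cordic_rotate(1.0, 0.0, math.pi / 6)
print(c, s)   # approximations of cos(pi/6) and sin(pi/6)
```

In hardware, the multiplications by `2.0 ** -i` become shifts, and one iteration is executed per clock cycle on the same adder stage.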
IV. Pipelined Architecture
The principle of pipelining has emerged as a major architectural attribute of most present computer systems. Pipelining is one form of embedding parallelism or concurrency in a computer system. It refers to a segmentation of a computational process (say, an instruction) into several subprocesses, which are executed by dedicated autonomous units (facilities, pipelining segments).

Fig. 3. Logical view of the pipeline architecture.

A parallel CORDIC can be pipelined by inserting registers between the adder stages. In most FPGA architectures, registers are already present in each logic cell, so the pipeline registers have no additional hardware cost. The number of stages after which a pipeline register is inserted can be modeled according to the clock frequency of the system. When operating at a greater clock period, power consumption in later stages reduces due to lesser switching activity in each clock period.
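The effect of inserting registers between stages can be illustrated with a small behavioural model. The sketch below is an assumption-laden illustration rather than the described hardware: `make_stage` and `pipeline_run` are hypothetical names, one register is placed after every stage, and each stage performs one conventional CORDIC micro-rotation with a fixed shift and a hardwired angle.

```python
import math

def make_stage(i):
    """Return stage i of an unrolled CORDIC: fixed shift by i, hardwired angle."""
    shift = 2.0 ** -i
    angle = math.atan(shift)
    def stage(state):
        x, y, z = state
        d = 1.0 if z >= 0 else -1.0
        return (x - d * y * shift, y + d * x * shift, z - d * angle)
    return stage

def pipeline_run(inputs, stage_funcs):
    """Clock inputs through registered stages; return (cycle, result) pairs."""
    n = len(stage_funcs)
    regs = [None] * n                      # one pipeline register after each stage
    results = []
    # Feed one input per clock, then flush the pipeline with empty slots.
    stream = list(inputs) + [None] * n
    for cycle, item in enumerate(stream, start=1):
        # Update from the last register to the first, so each stage reads the
        # value its predecessor latched on the previous clock edge.
        for k in range(n - 1, 0, -1):
            regs[k] = None if regs[k - 1] is None else stage_funcs[k](regs[k - 1])
        regs[0] = None if item is None else stage_funcs[0](item)
        if regs[-1] is not None:
            results.append((cycle, regs[-1]))
    return results

stages = [make_stage(i) for i in range(12)]
for cycle, (x, y, z) in pipeline_run([(1.0, 0.0, 0.3), (1.0, 0.0, -0.5)], stages):
    print(cycle, y / x)    # y/x approximates tan(angle); the gain cancels
```

In this model the first result appears after as many clock cycles as there are stages, and one further result is produced on every following cycle.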
V. Proposed Algorithm For Scaling-Free CORDIC
The proposed design is based on the following key ideas: 1) we use the Taylor series expansion of the sine and cosine functions to avoid the scaling operation, and 2) we suggest a generalized sequence of micro-rotations to obtain an adequate range of convergence (RoC) based on the chosen order of approximation of the Taylor series.
A. Taylor Series Approximation of Sine and Cosine Functions
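The Taylor expansions of sine and cosine of an angle $\alpha$ are given by
$$\sin\alpha = \alpha - \frac{\alpha^3}{3!} + \frac{\alpha^5}{5!} - \cdots, \qquad \cos\alpha = 1 - \frac{\alpha^2}{2!} + \frac{\alpha^4}{4!} - \cdots \tag{2}$$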
We have estimated the maximum error in the evaluation of the sine and cosine functions for different orders of approximation. Based on these estimates, we choose the third order of approximation for the Taylor expansion of the sine and cosine functions.
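A rough way to reproduce this kind of error estimate is sketched below in Python. It is an illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, and the angle range `alpha_max = 0.25` rad is assumed here as a representative bound on the elementary angles.

```python
import math

def taylor_sin_cos(alpha, order):
    """Truncate the sine and cosine series, keeping powers of alpha up to `order`."""
    s = sum((-1) ** k * alpha ** (2 * k + 1) / math.factorial(2 * k + 1)
            for k in range(order // 2 + 1) if 2 * k + 1 <= order)
    c = sum((-1) ** k * alpha ** (2 * k) / math.factorial(2 * k)
            for k in range(order // 2 + 1) if 2 * k <= order)
    return s, c

def max_percent_error(order, alpha_max=0.25, steps=1000):
    """Largest percentage error of the truncated series over (0, alpha_max]."""
    worst_s = worst_c = 0.0
    for j in range(1, steps + 1):
        a = alpha_max * j / steps
        s, c = taylor_sin_cos(a, order)
        worst_s = max(worst_s, abs(s - math.sin(a)) / math.sin(a) * 100)
        worst_c = max(worst_c, abs(c - math.cos(a)) / math.cos(a) * 100)
    return worst_s, worst_c

for order in (2, 3, 4, 5):
    print(order, max_percent_error(order))
```

The errors fall quickly as the order increases, so beyond some order the extra terms are smaller than the resolution of a short word-length.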
1) Representation of Micro-Rotations Using Taylor Series Approximation:
Here, we study the impact of the order of approximation of the Taylor series of the sine and cosine functions on the micro-rotations to be used in CORDIC coordinate calculation. Both theoretical and simulation results are discussed to confirm the appropriate selection of the order of approximation. Using different orders of approximation of the sine and cosine functions in (2), we obtain the five combinations (1a)-(1e). We have used (1) for coordinate calculation to evaluate the best possible combination of approximations, one that satisfies the accuracy and RoC requirements with the minimum possible hardware. In Fig. 1, we have plotted the error in magnitude estimated according to (1) (with respect to the corresponding built-in functions of MATLAB). Since the errors resulting from the five combinations (1a)-(1e) are of very small order, we prefer to use (1a) for coordinate calculation with minimum complexity.
2) Expressions for Micro-Rotations Using Taylor Series Approximation and Factorial Approximation:
Although we find that the Taylor series expansion with the third order of approximation (1a) can be used with the desired accuracy and RoC, (1a) cannot be used directly in the CORDIC shift-add iterations. To implement (1a) by shift-add operations, we need to approximate the factorial terms by powers of 2. Replacing $3!$ by $2^3$ in (1a), we find
$$\sin\alpha \approx \alpha - \frac{\alpha^3}{2^3}, \qquad \cos\alpha \approx 1 - \frac{\alpha^2}{2!}.$$
Also in Fig. 1, we have plotted the error in magnitude using the approximated factorial values and the exact factorial values after a CORDIC rotation of an initial vector with coordinates X = 1 and Y = 1. The maximum percentage error in the sine and cosine values, for both the third order of approximation and the factorial approximation, is 0.0004% and 0.0168%, respectively, within the permissible range of CORDIC elementary angles, $[0, 7\pi/88]$, discussed below.
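The shift-add form of a third-order micro-rotation can be sketched as follows. This is a minimal floating-point illustration, assuming that each elementary angle is a power of two, `alpha = 2**-s` for an integer shift `s`; the function name `micro_rotate` and its flag are hypothetical. With 3! replaced by 2^3, every coefficient becomes a power of two, so a hardware realisation needs only shifts and additions.

```python
import math

def micro_rotate(x, y, s, approx_factorial=True):
    """One third-order micro-rotation by the elementary angle 2**-s."""
    alpha = 2.0 ** -s
    if approx_factorial:
        # 3! replaced by 2**3: all coefficients are powers of two.
        sin_a = 2.0 ** -s - 2.0 ** -(3 * s + 3)
        cos_a = 1.0 - 2.0 ** -(2 * s + 1)
    else:
        # Exact factorials, for comparison only.
        sin_a = alpha - alpha ** 3 / 6.0
        cos_a = 1.0 - alpha ** 2 / 2.0
    return cos_a * x - sin_a * y, sin_a * x + cos_a * y

# Rotate the initial vector (1, 1) and compare the magnitude gain of both forms.
for s in (2, 3, 4):
    for flag in (False, True):
        x1, y1 = micro_rotate(1.0, 1.0, s, flag)
        gain = math.hypot(x1, y1) / math.hypot(1.0, 1.0)
        print(s, flag, x1, y1, gain)
```

Under this substitution the squared magnitude gain of one micro-rotation works out algebraically to 1 + alpha^6/64, which is why no separate scale-factor correction appears in the sketch.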
B. Determination of the Basic-Shift for a Given Order of Approximation of Taylor Series Expansion
One can find that 1) the order of approximation of the Taylor series expansion of the sine and cosine functions determines the basic-shift to be used for CORDIC iterations, and 2) the basic-shift of the CORDIC micro-operation determines the range of convergence. The expressions for the basic-shift, the first elementary angle of rotation $\alpha_1$, and the RoC, for different orders of approximation and different word-lengths $b$ of implementation, are given in (3).
The values in Table I are derived from (3). We find that, with an increase in the order of approximation, the basic-shift decreases, the first elementary angle of rotation increases, and the RoC expands. Very often, inclusion of higher-order terms does not have any impact on the accuracy for smaller word-lengths. The basic-shift for the third order of approximation using (3a), for a 16-bit word-length, is $\lfloor 2.854 \rfloor = 2$.

The proposed recursive architecture has comparable or less area complexity than other existing scaling-free CORDIC algorithms. Moreover, no scale-factor multiplications are required for extending the RoC to the entire coordinate space.
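The quoted value of 2.854 can be checked numerically. The formula in the sketch below is an assumption rather than a quotation of (3a): it requires the first neglected Taylor term, of order n + 1, to fall below the resolution of a b-bit word, and it is used here because it reproduces 2.854 for the third order and 16 bits. The function name `basic_shift` is hypothetical.

```python
import math

def basic_shift(word_length, order):
    """Basic-shift estimate; returns (real value, floor of that value)."""
    n1 = order + 1
    # Assumed form: (b - log2((n + 1)!)) / (n + 1)
    real = (word_length - math.log2(math.factorial(n1))) / n1
    return real, math.floor(real)

real, s = basic_shift(16, 3)
print(round(real, 3), s, 2.0 ** -s)   # real value, basic-shift, first elementary angle

for order in (2, 3, 4, 5):
    print(order, basic_shift(16, order))
```

With this form, the basic-shift falls as the order of approximation rises, in line with the trend described above.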
VI. Proposed CORDIC Architecture
The block diagram of the proposed CORDIC architecture is shown in the figure below. It makes use of the same stage for all the iterations, both for the coordinate calculations and for the generation of shift values. The structure of each stage (shown in Fig. 5) consists of three computing blocks, namely 1) shift-value estimation; 2) coordinate calculation; and 3) micro-rotation sequence generator. The combinatorial circuit for generating the micro-rotation sequence is shown in Fig. 4. The number of iterations required in a CORDIC processor decides the rollover count of the counter. The rollover count is seven for basic-shift = 2 and ten for basic-shift = 3. The expiry of the counter signals the completion of a CORDIC operation; depending on this signal, the multiplexer either loads a new data-set (rotation angle and initial values of x and y) to start a fresh CORDIC operation, or recycles the output of the stage to begin a new iteration of the current CORDIC operation. The input and output register files act as latches for synchronization.
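The recursive operation described above can be modelled behaviourally as follows. This is a sketch under several assumptions, not the authors' circuit: the greedy rule that picks the largest elementary angle `2**-s` not exceeding the residual angle is a hypothetical stand-in for the shift-value estimation and micro-rotation sequence generator, `max_shift` is an assumed resolution limit, and floating-point arithmetic replaces the fixed-point datapath. The rollover count of seven for basic-shift 2 follows the text.

```python
import math

def scaling_free_rotate(x, y, angle, basic_shift=2, rollover=7, max_shift=15):
    """Behavioural model of one recursive CORDIC operation (angle >= 0, radians)."""
    z = angle
    for _ in range(rollover):              # counter expiry ends the operation
        if z < 2.0 ** -max_shift:
            break                          # residual below the smallest step
        # Shift-value estimation (assumed greedy rule): largest 2**-s <= z,
        # with s never smaller than the basic-shift.
        s = max(basic_shift, math.ceil(-math.log2(z)))
        sin_a = 2.0 ** -s - 2.0 ** -(3 * s + 3)
        cos_a = 1.0 - 2.0 ** -(2 * s + 1)
        # Coordinate calculation; the stage output is recycled on the next pass.
        x, y = cos_a * x - sin_a * y, sin_a * x + cos_a * y
        z -= 2.0 ** -s
    return x, y, z

x, y, z = scaling_free_rotate(1.0, 0.0, 0.6)
print(x, y, z)   # approximations of cos(0.6), sin(0.6) and the residual angle
```

Because the same stage is reused on every pass, only the shift amount changes from one iteration to the next, which is what allows a single coordinate-calculation block to serve the whole operation.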
VI. FPGA Implementation
The proposed architecture is coded in Verilog and synthesized using Xilinx ISE 9.2i for implementation on a Xilinx Spartan 2E (XC2S200EPQ208-6) device. The slice-delay product of the proposed architecture is compared with that of existing CORDIC designs in Table IV, where all designs are synthesized on the Xilinx Spartan 2E XC2S200E device to maintain uniformity. The power dissipation of the proposed architecture at different clock frequencies is estimated using the Xilinx XPower tool.
VII. Experimental Result And Discussion
TABLE IV SLICE DELAY PRODUCT
The slice-delay product of the proposed architecture is compared with that of existing CORDIC designs in Table IV. The proposed CORDIC processor has a 17% lower slice-delay product, with a penalty of about 13% increased slice consumption on the Xilinx Spartan 2E device.
VIII. Conclusion
The proposed algorithm provides a scale-free solution for realizing vector rotations using CORDIC. The order of the Taylor series approximation is chosen by the proposed algorithm not only to meet the accuracy requirement but also to attain an adequate range of convergence. The generalized micro-rotation selection technique is suggested to reduce the number of iterations for low-latency implementation. Moreover, a high-speed most-significant-1 detection scheme obviates the complex search algorithms otherwise needed for identifying the micro-rotations. The proposed CORDIC processor has a 17% lower slice-delay product, with a penalty of about 13% increased slice consumption on the Xilinx Spartan 2E device.
