Abstract ASIC and FPGA ASIC and FPGA are 
Introduction
The CORDIC algorithm is well suited to computing transcendental functions such as trigonometric, inverse trigonometric, exponential and logarithmic functions, because it has a simple structure, saves resources and is highly efficient. It is also widely used for matrix operations, and it is especially significant for porting highly complex operations to FPGAs, such as analysis algorithms for multidimensional data arrays [1]. CORDIC is an iterative algorithm commonly used to calculate basic arithmetic functions. Its principle is to approach the desired angle through a sequence of deflections by fixed angles tied to the number base (radix), rather than rotating by the desired angle directly. The principle can therefore be regarded as a numerical approximation method. Because the fixed angles are related to the radix, the computation involves only shifts, additions and subtractions. It therefore costs fewer FPGA (Field Programmable Gate Array) resources than conventional calculation methods built on multiplication and division, which are difficult to implement or cannot meet the designer's requirements. CORDIC was introduced to solve this problem: it greatly reduces FPGA resource usage, maps well onto hardware, and can meet the designer's requirements.
In this paper we discuss how to implement the CORDIC algorithm with FPGA, model it in Verilog hardware description language, simulate it by EDA tools, and analyze the 
Schematic of CORDIC Algorithm Rotation
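Rotating the initial vector $(x_i, y_i)$ by an angle $\theta$ yields a new vector $(x_n, y_n)$, whose coordinates can be expressed as:
\[ x_n = x_i\cos\theta - y_i\sin\theta, \qquad y_n = x_i\sin\theta + y_i\cos\theta \tag{1} \]
With the angle of rotation of each step $\alpha_i = \delta_i \arctan(2^{-i})$, the rotation angle can be approximated as:
\[ \theta \approx \sum_{i=0}^{N-1} \alpha_i = \sum_{i=0}^{N-1} \delta_i \arctan(2^{-i}) \tag{3} \]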
In (3), $\delta_i \in \{1, -1\}$, and the sign of $\delta_i$ determines the direction of rotation: the vector rotates toward the target vector if $\delta_i = 1$ and in the opposite direction if $\delta_i = -1$. The remaining angle after each rotation can then be expressed as $Z_{i+1} = Z_i - \delta_i \arctan(2^{-i})$, where $\delta_i = \mathrm{sign}(Z_i)$.
Assuming that the rotation by the angle θ is completed in N iterations, the rotation process may be represented by (5).

The value of K depends on the number of iterations N, and K is usually called the focus constant or scaling factor. Generally, the scaling factor and the rotation operation can be calculated separately. The rotation operation is carried out according to (7).
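The iteration and scaling factor described above can be sketched in floating-point Python. This is a minimal illustration under stated assumptions, not the authors' hardware design: it assumes the standard rotation-mode recurrences $x_{i+1} = x_i - \delta_i y_i 2^{-i}$, $y_{i+1} = y_i + \delta_i x_i 2^{-i}$ and $K = \prod_i 1/\sqrt{1 + 2^{-2i}}$, and the function names `cordic_gain` and `cordic_rotate` are hypothetical.

```python
import math


def cordic_gain(n_iter):
    """Scaling factor K = prod(1 / sqrt(1 + 2^(-2i))) for i = 0 .. n_iter-1."""
    k = 1.0
    for i in range(n_iter):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


def cordic_rotate(theta, n_iter=16):
    """Approximate (cos(theta), sin(theta)) for theta inside the convergence range.

    Assumption: the start vector is pre-scaled by K, so the scaling factor is
    handled separately from the shift-and-add rotation steps.
    """
    x, y, z = cordic_gain(n_iter), 0.0, theta
    for i in range(n_iter):
        delta = 1.0 if z >= 0 else -1.0   # delta_i = sign(Z_i)
        shift = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x - delta * y * shift, y + delta * x * shift
        z -= delta * math.atan(shift)     # read from a ROM table in hardware
    return x, y


if __name__ == "__main__":
    print(cordic_gain(16))
    print(cordic_rotate(math.radians(30.0)))
```

Because K depends only on the number of iterations, it can be computed once and stored, which is why the rotation steps themselves need only shifts and additions.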
However, the range of $\delta_i$ can be extended to {-1, 0, 1} so as to skip unnecessary rotations and greatly reduce the number of iterations. The gain K is then no longer a constant, so after each rotation step we must update not only the coordinates $x_i$ and $y_i$ but also the scaling factor $k_i$ of that step [4]. In 1971, J. S. Walther proposed a CORDIC algorithm that unified CORDIC rotation under the circular, hyperbolic and linear coordinate systems into a single iteration equation, shown as:

where m = {1, 0, -1} corresponds to the circular, linear and hyperbolic coordinate systems respectively. The constant angle $\alpha_{m,j}$ can be expressed in unified form as:
where $S(m, j)$ is the shift sequence, which can be represented by (10)

The unified scaling factor can be expressed as:
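For reference, the unified iteration, constant angle and scaling factor are commonly written in the following form. This is a sketch in the standard notation of the unified CORDIC literature and may differ in detail from the authors' equations; in particular, defining $K_m$ as the reciprocal of the accumulated gain is an assumption.

```latex
% Unified iteration, with m = 1 (circular), m = 0 (linear), m = -1 (hyperbolic)
\begin{aligned}
x_{j+1} &= x_j - m\,\delta_j\, y_j\, 2^{-S(m,j)} \\
y_{j+1} &= y_j + \delta_j\, x_j\, 2^{-S(m,j)} \\
z_{j+1} &= z_j - \delta_j\, \alpha_{m,j}
\end{aligned}

% Constant angle: arctan(2^{-S}) for m = 1, 2^{-S} for m = 0 (taken as the
% limit m -> 0), and artanh(2^{-S}) for m = -1
\alpha_{m,j} = \frac{1}{\sqrt{m}} \arctan\!\left(\sqrt{m}\; 2^{-S(m,j)}\right)

% Scaling factor accumulated over N iterations
K_m = \prod_{j=0}^{N-1} \frac{1}{\sqrt{1 + m\, 2^{-2S(m,j)}}}
```

In the circular system the shift sequence is usually $S(1, j) = j$. In the hyperbolic system it usually starts at 1 and repeats certain indices (4, 13, 40, ...) so that the iteration converges.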
The unified iteration expression across the different systems laid the theoretical foundation for implementing multiple functions on the same hardware. The results in the different systems and modes for the input $(x_0, y_0, z_0)$ are shown in Table 1. Like the traditional algorithm, the CORDIC algorithm based on angle encoding assembles the rotation angle θ as a linear combination of a series of small angles. The difference is that the rotation direction may be zero, i.e., $\delta_i \in \{-1, 0, 1\}$. A greedy mechanism is used: each step selects the elementary angle nearest to the remaining rotation angle. The pseudo-code of the angle encoder CORDIC algorithm is shown as:
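The greedy selection can be illustrated with a short Python sketch. It is not the authors' pseudo-code: the function name `greedy_angle_recode`, the stopping rule (stop once the residual is smaller than the smallest elementary angle) and the floating-point arithmetic are all assumptions made for illustration.

```python
import math


def greedy_angle_recode(theta, n_bits=16):
    """Greedily decompose theta into signed elementary angles arctan(2^-k).

    Returns (steps, residual): steps is a list of (k, delta) with delta in
    {+1, -1}. Indices k that never appear correspond to delta = 0 (skipped).
    """
    angles = [math.atan(2.0 ** -k) for k in range(n_bits)]
    residual = theta
    steps = []
    # Assumed stopping rule: finish when no elementary angle can improve the result.
    while abs(residual) >= angles[-1]:
        # Pick the elementary angle nearest to the remaining rotation angle.
        k = min(range(n_bits), key=lambda j: abs(abs(residual) - angles[j]))
        delta = 1 if residual > 0 else -1
        steps.append((k, delta))
        residual -= delta * angles[k]
    return steps, residual


if __name__ == "__main__":
    steps, residual = greedy_angle_recode(0.6)
    print(steps, residual)
```

Since the selected indices differ from one input angle to the next, the accumulated scaling factor also differs, which is why it can no longer be treated as a constant.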
To expand the range of convergence to (0, π/2), the algorithm uses an interval folding technique with the following rules: if θ > 2π, let θ = θ - 2π; if θ lies between π and 2π, replacing the 
Implementation of Angle Encoder CORDIC
Since the rotation angles of CORDIC are known in advance, the angle encoder method can reduce the number of iterations by at least 50% at N-bit precision.
The implementation of the angle encoder CORDIC is shown in Figure 2. In Figure 2, the algorithm requires an N-bit adder/subtracter and a comparison unit for finding the minimum. Since all of these operations are on the critical path of each iteration, the iteration time and area consumption increase greatly. The angle encoder CORDIC algorithm can greatly reduce the number of iterations at the cost of a longer single-iteration latency, and because it skips some unnecessary rotation angles, the scaling factor is no longer a constant [6].
Problems and Optimization of CORDIC Algorithm
In this section, we analyze the limitations of the CORDIC algorithm with the rotation angle $\theta_i$ as the object of study, and put forward corresponding optimization measures to solve these problems [6]. 
Relationship between the Number of Iterations and Accuracy of Data
Considering that the rotation angle Z may be negative during the iterative process, we represent Z in b-bit two's complement, so the minimum angle is $2^{-b}$. To make sure that the last rotation is meaningful, its angle must be less than this minimum value. Since all the rotation angles are stored in ROM tables, the minimum angle stored in the table must be less than $2^{-b}$. Assuming that the angles stored in ROM are also represented in a-bit two's complement, the minimum angle $2^{-a}$ should satisfy $a \ge b$. The sequence of angles stored in the ROM table is $\arctan(2^{-i+1})$, $i = 0, 1, 2, \ldots, N-1$, where N is the word length of the CORDIC computation. When i is large enough, $\arctan(2^{-i+1}) \le 2^{-a}$, which gives $N \ge a - 1$. Thus, at least a 15-level pipeline is needed to meet the accuracy of the CORDIC algorithm with a 16-bit computation word length.
Limitations of CORDIC Algorithm
For the N-level pipeline, θ is the rotation angle and i ranges from 0 to N-1, with $\theta_i = \arctan(2^{-i})$. The values of $\theta_i$, which are the inputs of each pipeline level, should be calculated in advance and stored in a ROM table.
The 16-bit values of $\theta_i$ used in the signed computation are given in two's complement in Table 2. Table 2 shows that as the number of pipeline levels increases, the capacity of the ROM table grows exponentially, and the area of the system also increases. As can be seen in Fig. 2, one CORDIC computation requires multiple iterations, and the direction of each iteration must be determined by the result of the previous one. The more iterations there are, the more direction decisions are required, which reduces the speed of operation.
Range of Angle
Assuming the number of iterations is N, the range of rotation is shown as:
Copyright ⓒ 2015 SERSC
The range of angle corresponding to each N can be calculated from the formula given. In Table 3, the maximum range of the rotation angle is -99.88° ≤ θ ≤ 99.88°, which does not cover a complete cycle. To ensure that the CORDIC algorithm converges, the sum of the rotation angles must be larger than the rotation angle actually required, so the input angle must be preprocessed [7]. 
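The ±99.88° bound can be checked numerically. The sketch below assumes that the range of rotation for N iterations is the sum of the elementary angles $\arctan(2^{-i})$ for $i = 0, \ldots, N-1$; the function name `convergence_range_deg` is hypothetical.

```python
import math


def convergence_range_deg(n_iter):
    """Largest rotation angle (in degrees) reachable with n_iter CORDIC steps,
    assumed here to be the sum of arctan(2^-i) for i = 0 .. n_iter-1."""
    return math.degrees(sum(math.atan(2.0 ** -i) for i in range(n_iter)))


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, round(convergence_range_deg(n), 2))
```

The sum saturates quickly, so adding iterations improves precision but barely widens the range. This is why inputs outside roughly ±99.88° have to be mapped into the range before the iteration starts.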
Optimization of CORDIC Algorithm
Since the CORDIC computation needs values of the arctangent function, the usual process is to calculate the data shown in Table 2 in advance and store them in ROM, which takes considerable hardware resources and slows the computation. If we reduce the number of iterations without affecting the computation accuracy, we can effectively improve the speed of operation. This paper proposes a feasible method based on studying the rotation angle $\theta_i$ [8] [9].
Using the Taylor series of the arctangent function, we obtain the following equation (14). 
Obviously, the difference between $\arctan(2^{-i})$ and $2^{-i}$ decreases rapidly as i increases.
If we replace $\arctan(2^{-i})$ with $2^{-i}$ from the m-th iteration onward, the accuracy of the final result is not affected. A shift operation can then be used instead of a ROM table lookup, which speeds up the computation and reduces resource occupancy. The key to the method is the error introduced by replacing $\arctan(2^{-i})$ with $2^{-i}$. In the CORDIC computation, there is no need to look up $\arctan(2^{-i})$ in the ROM table once the iteration index is greater than or equal to m; replacing it with $2^{-i}$ reduces the ROM access time, increases the speed of computation, and reduces the ROM resources by two-thirds [10].
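How quickly the substitution error shrinks can be seen with a small numerical check. The sketch below is illustrative only: the helper names are hypothetical, and the tolerance of $2^{-16}$ in the usage example is an assumed threshold, not necessarily the criterion the authors used to choose m.

```python
import math


def small_angle_error(i):
    """Error made by using 2^-i in place of arctan(2^-i).

    From the Taylor series arctan(x) = x - x^3/3 + x^5/5 - ..., the error is
    about 2^(-3i) / 3, so it shrinks roughly eightfold per iteration.
    """
    x = 2.0 ** -i
    return x - math.atan(x)


def first_index_below(tolerance, max_index=64):
    """Smallest iteration index whose substitution error is below tolerance."""
    for i in range(max_index):
        if small_angle_error(i) < tolerance:
            return i
    raise ValueError("tolerance not reached within max_index iterations")


if __name__ == "__main__":
    for i in range(10):
        print(i, small_angle_error(i))
    print(first_index_below(2.0 ** -16))
```

From that index onward the angle entry no longer has to be read from ROM, because $2^{-i}$ is obtained by a shift alone.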
According to the following Taylor series (16), we can introduce a variable l such that $|\cos\theta_i - 1| \le 2^{-N}$ holds when $i \ge l$; when N = 15, l = 8 is obtained, and when N = 31, l = 16 is obtained [11]. 
So equation (7) can be simplified to (18), where the correction factor is 1, which simplifies the operation.
Similarly, if i ≥ 8, (8) 
If we select $\delta_i$ and $\delta_{i+1}$ properly, two iteration steps can be combined into a single step. We can thus process two adjacent bits of the rotation angle $Z_i$ in one iteration, which in effect replaces the corresponding bits with 0 and drives the rotation angle toward 0. This reduces the number of pipeline stages and improves the speed of computation. The values of $(\delta_i, \delta_{i+1})$ are shown in Table 3. Based on the above analysis, the improved CORDIC algorithm works only when i = max(m, l) and N = 15. Therefore, it is necessary to combine (7), (18) and (19) to calculate a function value. For example, if N = 16, we first perform nine iterations with formula (7) and apply the corresponding modulus correction factor, then perform three iterations for the following six stages, with a modulus correction factor of one [8]. The whole process thus needs twelve iterations, saving three pipeline stages and six storage units, which cuts the consumption of hardware units, reduces the number of ROM accesses, and shortens the computation time. Thus, while system performance is guaranteed, the optimized CORDIC algorithm economizes hardware resources and increases the speed of computation [12-13].
Adjustment of Range of the Input Angle
As mentioned above, the range of the CORDIC rotation angle is from -99.88° to 99.88°, which does not cover the whole circle and limits the scope of the algorithm. The usual solution is to repeat the iteration with i = 0 several times so that the angle range of the CORDIC algorithm covers all four quadrants.
However, the correction factor is then not easy to determine in advance and is difficult to calculate on the fly. This paper takes another approach, called the sub-quadrant method. It exploits the symmetry of the trigonometric functions to convert every angle of the full cycle into the first quadrant. It then preserves the phase information according to the conversion rules and passes the converted angle to the CORDIC computation module. Finally, it uses the phase information to restore the sine and cosine of the original angle.
Assuming the input angle is $z_0$, the following four cases give the specific conversion rules for the 16-bit CORDIC algorithm, as shown in Table 5. 
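The idea of folding into the first quadrant and restoring the result from the saved phase information can be sketched as follows. The four cases below use the ordinary trigonometric symmetries and are an assumption; they are not copied from Table 5, and the names `fold_to_first_quadrant`, `sin_cos_full_circle` and `reference_kernel` are hypothetical. A floating-point kernel stands in for the CORDIC computation module.

```python
import math


def fold_to_first_quadrant(angle_deg):
    """Map an angle to [0, 90] degrees and return (folded, sin_sign, cos_sign)."""
    a = angle_deg % 360.0
    if a < 90.0:                      # first quadrant: unchanged
        return a, 1, 1
    if a < 180.0:                     # second quadrant: sin(180 - a), -cos(180 - a)
        return 180.0 - a, 1, -1
    if a < 270.0:                     # third quadrant: -sin(a - 180), -cos(a - 180)
        return a - 180.0, -1, -1
    return 360.0 - a, -1, 1           # fourth quadrant: -sin(360 - a), cos(360 - a)


def reference_kernel(angle_deg):
    """Stand-in for the first-quadrant CORDIC core (assumption: uses math.sin/cos)."""
    r = math.radians(angle_deg)
    return math.sin(r), math.cos(r)


def sin_cos_full_circle(angle_deg, kernel=reference_kernel):
    """Pre-process the angle, run the first-quadrant kernel, then restore the signs."""
    folded, sin_sign, cos_sign = fold_to_first_quadrant(angle_deg)
    s, c = kernel(folded)
    return sin_sign * s, cos_sign * c


if __name__ == "__main__":
    for angle in (15.0, 45.0, 99.0, 110.0, 200.0):
        print(angle, sin_cos_full_circle(angle))
```

With this arrangement the kernel only ever sees angles between 0° and 90°, which lie well inside the ±99.88° convergence range.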
Implementation of the CORDIC Algorithm based on FPGA
An FPGA is a semi-custom integrated circuit in the ASIC (Application Specific Integrated Circuit) field, which both remedies the shortcomings of full-custom circuits and overcomes the limited gate count of earlier programmable devices.
Introduction of Tools for Development
Most digital devices can be developed with an FPGA, from a 74-series circuit to a CPU (Central Processing Unit). Using an FPGA to develop digital circuits reduces design time and PCB (Printed Circuit Board) area and improves system reliability. The process of FPGA design is shown in Fig. 3. System engineers can connect the internal logic blocks of the FPGA as needed through editable interconnections, much like placing a test circuit board inside a chip. Both the logic blocks and the connections of an FPGA design can be edited by the designer so as to complete the required logic functions [10].
Design of the System
The framework of the optimized CORDIC algorithm based on FPGA is shown in Fig. 4. It consists of five modules: a UART (Universal Asynchronous Receiver/Transmitter) controller, a cache allocator for the initial value, a pre-processing unit, an optimized CORDIC unit and a post-processing unit. 
The UART controller handles the communication between the CORDIC hardware and the serial device. The cache allocator combines two 8-bit data words into one 16-bit word that serves as the initial value of the pre-processing unit. The pre-processing unit converts the initial angle into the first quadrant and triggers the iterative computation of the optimized CORDIC unit. Among the five modules, the optimized CORDIC unit is the core, and it determines the performance of the system.
The flow chart of the optimized CORDIC algorithm is shown in Figure 5 . 
Simulation of CORDIC Algorithm based on FPGA
This part presents the FPGA simulation results of both the traditional CORDIC algorithm and the optimized one.
Simulation of Traditional CORDIC Algorithm
We take a 16-bit CORDIC algorithm as an example, choose the EP1C4F400C6 chip of the Cyclone series developed by Altera, and select some angles in the cycle at random, such as 15°, 45°, 99°, 110° and 200°. The angles between 0° and 360° are represented in 16-bit binary, and the result of the traditional CORDIC is shown in Fig. 6. Since the results are signed, the MSB (Most Significant Bit) is the sign bit and the remaining fifteen bits are the fractional part. The simulation results of the sine and cosine functions are shown in Table 6 and Table 7. As Table 6 and Table 7 show, when the input angles are less than 99.8°, the inaccuracy of the simulation is negligible; otherwise the inaccuracy is large, which verifies the discussion above and indicates that the conversion of the input angle is necessary.
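The signed output format (one sign bit followed by fifteen fractional bits) can be decoded as in the sketch below, which is a convenience for reading the simulation waveforms rather than part of the design. The helper names `to_q15` and `from_q15` are hypothetical, and saturation at the ends of the range is an assumed behaviour.

```python
def to_q15(value):
    """Encode a real number in [-1, 1) as a 16-bit two's complement word
    with 15 fractional bits (out-of-range inputs are saturated)."""
    raw = round(value * 32768)
    raw = max(-32768, min(32767, raw))
    return raw & 0xFFFF


def from_q15(word):
    """Decode a 16-bit two's complement word with 15 fractional bits."""
    if word & 0x8000:          # MSB set: negative value
        word -= 0x10000
    return word / 32768


if __name__ == "__main__":
    print(hex(to_q15(0.5)), from_q15(0xC000))
```

One least significant bit of this format is $2^{-15} \approx 3.05 \times 10^{-5}$, which sets the resolution at which the tabulated sine and cosine values can be compared.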
Simulation of the Optimized CORDIC Algorithm based on FPGA
We again take a 16-bit CORDIC algorithm as an example, with the same chip and the same angles. The angles are represented in 16-bit unsigned binary, and the simulation results are given in 16-bit two's complement, where the MSB is the sign bit and the remaining fifteen bits are the fractional part. The waveform diagram is shown in Fig. 7. The simulation results of the sine and cosine functions are shown in Table 8 and Table 9. The results show that the optimized CORDIC achieves high accuracy while also increasing the operating frequency.
Conclusions
In this paper, we optimize the conventional CORDIC algorithm, resolve the mutually restrictive relationship among speed, area and precision in the design, remove the limitation on angle coverage, provide an optimization for computing various functions with the CORDIC algorithm, and simulate the optimized 16-bit CORDIC algorithm on an FPGA. Comparing the simulation results of the optimized and traditional CORDIC algorithms, we conclude that the optimized algorithm reduces resource consumption and increases the maximum operating frequency, while its accuracy of $10^{-5}$ is the same as that of the traditional one.
