A numerically controlled oscillator (NCO) generates controllable quadrature sine and cosine waves and is an important part of software radio. The traditional NCO module is implemented with a lookup table structure, which requires a large amount of storage resources inside the FPGA. This paper therefore implements the NCO module with the CORDIC algorithm and raises the output accuracy by improving that algorithm. FPGA technology is also characterized by strong reconfigurability, good scalability, and low hardware resource consumption. The module is designed in Verilog HDL. The resulting FPGA-based NCO model has low hardware resource consumption and high output precision. The model was simulated in ModelSim and downloaded for verification to the EP2C35F672C6 target chip on an Altera DE2 board. The numerically controlled oscillator met the design requirements.
Introduction
The numerically controlled oscillator (NCO) is an important component of signal processing systems [1] and is mainly used to generate controllable quadrature sine and cosine waves. With the continuous development of modern communication systems, numerically controlled oscillators have been widely used in digital communication, signal processing, and other fields as an important part of software radio [2].
The traditional NCO implementation is based on a read-only memory look-up table (ROM LUT), which consumes a large amount of storage resources. In this paper, the CORDIC algorithm is improved and given a pipeline structure. Each iteration stage uses an independent arithmetic unit, and the number of calculation data bits is increased, which improves the output precision of the algorithm while reducing resource occupation. FPGA technology itself is characterized by strong reconfigurability, good scalability, and low hardware resource consumption [3]. Therefore, this paper uses FPGA technology to build the NCO model and Verilog HDL to design and implement the module. The resulting FPGA-based NCO uses few hardware resources and has high output accuracy.
The Structure and Principle of the Digitally Controlled Oscillator

NCO Structure and Principle based on Table Lookup Method
The structure of the digitally controlled oscillator is shown in Figure 1. The traditional NCO is implemented by the look-up table (LUT) method: the sine and cosine values for each phase are calculated in advance and stored, and the phase angle is used as the address of the stored value. This forms the phase-to-amplitude conversion circuit, and the sample values of the sine and cosine signals are obtained by looking up the table [4]. The look-up table method requires a large amount of storage resources.
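The table lookup can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 8-bit address and data widths, and the use of a single sine table read with a quarter-period offset for the cosine are all assumptions made for illustration.

```python
import math

def build_sine_rom(addr_bits, data_bits):
    """Precompute one full sine period as signed integers (the ROM contents)."""
    depth = 1 << addr_bits
    scale = (1 << (data_bits - 1)) - 1
    return [round(scale * math.sin(2 * math.pi * k / depth)) for k in range(depth)]

def rom_storage_bits(addr_bits, data_bits):
    """Storage needed by the table: it doubles with every extra address bit."""
    return (1 << addr_bits) * data_bits

def lut_nco(rom, addr_bits, phase_bits, fcw, n_samples):
    """Return (sin, cos) samples, addressing the ROM with the top phase bits."""
    depth = 1 << addr_bits
    mask = (1 << phase_bits) - 1
    phase = 0
    samples = []
    for _ in range(n_samples):
        addr = phase >> (phase_bits - addr_bits)       # phase-to-address truncation
        sin_v = rom[addr]
        cos_v = rom[(addr + depth // 4) % depth]       # cosine leads sine by a quarter period
        samples.append((sin_v, cos_v))
        phase = (phase + fcw) & mask                   # phase accumulator wraps modulo 2**phase_bits
    return samples
```

Because the table depth is `2**addr_bits`, every additional bit of phase resolution doubles the ROM size, which is the storage cost that motivates replacing the table.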
Principle of CORDIC Algorithm
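The Coordinate Rotation Digital Computer (CORDIC) was first proposed by J. E. Volder in 1959. The core idea of the CORDIC algorithm is to approximate the desired angle by successively rotating through a series of fixed, small angles associated with the calculation base [5]. As shown in Fig. 2, the vector $A(x_0, y_0)$ in the xy coordinate system is rotated counterclockwise by the angle $\theta$ around the origin to obtain the vector $B(x_N, y_N)$. The angle $\theta$ between the vector A and the vector B is divided into $N$ small angles $\theta_i$, which gives the expression for a single rotation:
$$x_{i+1} = x_i\cos\theta_i - d_i\,y_i\sin\theta_i, \qquad y_{i+1} = y_i\cos\theta_i + d_i\,x_i\sin\theta_i$$
Here $d_i$ is the direction of rotation, $-1$ when rotating clockwise and $+1$ when rotating counterclockwise. Factoring out $\cos\theta_i$ and letting $\theta_i = \arctan 2^{-i}$, so that $\tan\theta_i = 2^{-i}$, gives:
$$x_{i+1} = \cos\theta_i\,\bigl(x_i - d_i\,y_i\,2^{-i}\bigr), \qquad y_{i+1} = \cos\theta_i\,\bigl(y_i + d_i\,x_i\,2^{-i}\bigr)$$
Since the sum of the $N$ signed angles equals $\theta$ after $N$ rotations, $\theta = \sum_{i=0}^{N-1} d_i\theta_i$. Applying the single rotation $N$ times gives:
$$\begin{bmatrix} x_N \\ y_N \end{bmatrix} = K \prod_{i=0}^{N-1} \begin{bmatrix} 1 & -d_i 2^{-i} \\ d_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{1}$$
where $K$ is the modulus expansion factor, $K = \prod_{i=0}^{N-1} \cos\theta_i = \prod_{i=0}^{N-1} 1/\sqrt{1 + 2^{-2i}}$. When $N \to \infty$, $K$ converges to a constant, $K \approx 0.607252$. Once the number of iterations $N$ is fixed, the value of $K$ is therefore a constant, and (1) can be simplified by removing $\cos\theta_i$ from each rotation:
$$x_{i+1} = x_i - d_i\,y_i\,2^{-i}, \qquad y_{i+1} = y_i + d_i\,x_i\,2^{-i}$$
Finally, the residual angle formula is introduced, $z_{i+1} = z_i - d_i\theta_i$, where $z_0 = \theta$ and $z_i$ is the angle residual value. The direction is chosen as $d_i = +1$ when $z_i \ge 0$ and $d_i = -1$ otherwise, so that each rotation reduces the residual. As $i$ increases, the residual $z_i$ approaches zero. With the initial values $(x_0, y_0, z_0) = (K, 0, \theta)$, $(x_N, y_N)$ converges to $(\cos\theta, \sin\theta)$. The CORDIC algorithm calculates the sine and cosine in this manner.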
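The iteration described above can be sketched in a few lines of Python. This is a floating-point illustration under stated assumptions, not the paper's Verilog design: the function names and the default of 16 iterations are hypothetical, and the input angle is assumed to lie inside the convergence range of the algorithm.

```python
import math

def cordic_gain(n_iter):
    """Modulus factor K: the product of cos(arctan(2**-i)) over the iterations."""
    k = 1.0
    for i in range(n_iter):
        k *= math.cos(math.atan(2.0 ** -i))
    return k

def cordic_sin_cos(theta, n_iter=16):
    """Rotation-mode CORDIC. theta is in radians, within about +/-1.74 rad."""
    x, y, z = cordic_gain(n_iter), 0.0, theta    # start at (K, 0) so the gain cancels
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0              # rotate toward a zero residual
        # Both updates use the values from the previous iteration.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return y, x                                  # (sin(theta), cos(theta))
```

In hardware the multiplications by `2.0 ** -i` become right shifts and the arctangent values become constants, so each iteration needs only shifts and additions.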
CORDIC Algorithm Improvement
Algorithm Pipeline Structure
The traditional CORDIC structure reuses a single set of hardware for every iteration and must continuously feed the output data back to the input, which limits the running speed of the whole system and keeps the throughput low [6]. This paper uses a pipeline structure instead: each CORDIC iteration stage has its own arithmetic unit, which improves data throughput.
In the CORDIC pipeline structure, there is an angular error $\delta$ between the accumulated rotation angle and the actual angle [7], $\theta = \sum_{i} d_i\theta_i + \delta$. This equation shows that as the number of iterations increases, the accumulated angle approaches $\theta$, the angular error $\delta$ approaches zero, and the accuracy of the CORDIC algorithm improves accordingly. Considering the practical situation and the design requirements of the NCO, this paper adopts an 8-stage pipeline structure in which each iteration stage uses an independent arithmetic unit to achieve high-speed, high-precision output. The pipeline hardware structure is shown in Figure 3.
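How quickly the angular error shrinks with the number of stages can be checked numerically. The sketch below is an illustration only: the function names and the sweep over the first quadrant are assumptions. It relies on a standard property of the iteration, namely that after N stages the residual is bounded by the last rotation angle, arctan 2^-(N-1), which is about 0.45° for 8 stages.

```python
import math

def residual_angle(theta, n_iter):
    """Angle left over after n_iter CORDIC micro-rotations."""
    z = theta
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0
        z -= d * math.atan(2.0 ** -i)
    return z

def worst_residual(n_iter, steps=2001):
    """Largest |residual| over a sweep of the first quadrant, 0 to pi/2."""
    return max(
        abs(residual_angle((math.pi / 2) * k / (steps - 1), n_iter))
        for k in range(steps)
    )

def residual_bound(n_iter):
    """Upper bound on the residual: the smallest rotation angle used."""
    return math.atan(2.0 ** -(n_iter - 1))
```

Comparing `worst_residual(n)` with `residual_bound(n)` for a few values of `n` shows the trade-off behind the choice of pipeline depth: each extra stage roughly halves the angular error at the cost of one more arithmetic unit.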
Coverage of the Angle in the Circle
According to the principle of the CORDIC algorithm, the rotation angle is $\theta = \sum_{i} d_i \arctan 2^{-i}$, so the rotation range is [-99.88°, 99.88°] and cannot cover the entire circle. In this paper, 24 data bits are used to represent the rotation angle. When the rotation angle is not in the first quadrant, it is mapped to the first quadrant by cutting off the highest two bits and padding zeros in front, and the angle value is processed accordingly. The iterative operation is then carried out on the mapped angle, and the amplitude value for the original quadrant is restored from the result by symmetry. The angle mapping relationship is shown in Table 1.
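The quadrant folding can be sketched as follows. Since Table 1 is not reproduced here, the mapping in `restore` is an assumption based on the usual symmetry identities, with the top two bits of the 24-bit phase word taken as the quadrant index; the function names are hypothetical, and `math.sin` and `math.cos` stand in for the first-quadrant CORDIC core.

```python
import math

PHASE_BITS = 24
QUAD_SHIFT = PHASE_BITS - 2          # the top two bits select the quadrant

def map_to_first_quadrant(phase):
    """Split a 24-bit phase word into (quadrant, zero-padded 22-bit remainder)."""
    quadrant = (phase >> QUAD_SHIFT) & 0b11
    reduced = phase & ((1 << QUAD_SHIFT) - 1)
    return quadrant, reduced

def restore(quadrant, s, c):
    """Recover (sin, cos) of the full angle from first-quadrant values by symmetry."""
    if quadrant == 0:
        return s, c
    if quadrant == 1:                # theta = 90 deg + phi
        return c, -s
    if quadrant == 2:                # theta = 180 deg + phi
        return -s, -c
    return -c, s                     # theta = 270 deg + phi

def sin_cos_full_circle(phase):
    quadrant, reduced = map_to_first_quadrant(phase)
    phi = reduced * (math.pi / 2) / (1 << QUAD_SHIFT)   # angle in [0, 90 deg)
    # Stand-in for the CORDIC core, which only ever sees first-quadrant angles.
    s, c = math.sin(phi), math.cos(phi)
    return restore(quadrant, s, c)
```

With this arrangement the iterative core always works inside its convergence range, and the quadrant correction costs only sign changes and a swap of the two outputs.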
Increase the Calculated Data Bit
The CORDIC algorithm performs its calculations in finite-precision arithmetic, so another error occurs. The error caused by the limited data bit width is called the rounding error $\varepsilon$. According to the formula $\varepsilon = 2^{-b}$, the operation data bit width $b$ determines the rounding error $\varepsilon$: each additional bit of operand width halves the rounding error. In the internal iteration of the CORDIC designed in this paper, 6 data bits are added, extending the data bit width to 14 bits, which effectively improves the operation precision. The data bit extension is shown in Figure 4.
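The effect of the datapath width can be isolated with a small experiment. The sketch below is not the paper's design: it treats the bit width as the number of fractional bits, keeps the angle path in floating point so that both versions make identical rotation decisions, and models each shift as a truncating arithmetic right shift. All names are hypothetical.

```python
import math

N_STAGES = 8
ANGLES = [math.atan(2.0 ** -i) for i in range(N_STAGES)]
GAIN = math.prod(math.cos(a) for a in ANGLES)      # K for 8 stages

def cordic_float(theta):
    """Reference: 8 iterations with no rounding in the x/y datapath."""
    x, y, z = GAIN, 0.0, theta
    for i, a in enumerate(ANGLES):
        d = 1 if z >= 0 else -1
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return y, x

def cordic_fixed(theta, frac_bits):
    """Same iteration with integer x/y holding frac_bits fractional bits."""
    one = 1 << frac_bits
    x, y, z = round(GAIN * one), 0, theta
    for i, a in enumerate(ANGLES):
        d = 1 if z >= 0 else -1
        x, y = x - d * (y >> i), y + d * (x >> i)  # >> truncates toward -infinity
        z -= d * a
    return y / one, x / one

def worst_rounding_error(frac_bits, steps=500):
    """Largest fixed-vs-float difference over a sweep of the first quadrant."""
    worst = 0.0
    for k in range(steps):
        theta = (math.pi / 2) * k / (steps - 1)
        sf, cf = cordic_float(theta)
        sq, cq = cordic_fixed(theta, frac_bits)
        worst = max(worst, abs(sf - sq), abs(cf - cq))
    return worst
```

Calling `worst_rounding_error(8)` and `worst_rounding_error(14)` gives the rounding contribution alone for the two widths; the truncation errors of the individual stages accumulate, so the total stays within a small multiple of `2**-frac_bits`.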
Design and Implementation of Improved NCO based on FPGA
The top-level structure of the improved FPGA-based NCO is shown in Figure 5. The structure consists of three parts: pre-processing, CORDIC iteration, and truncated output processing.
Figure 5. Top-level structure of the NCO based on the improved CORDIC algorithm
In the pre-processing, the initial value of the phase accumulator is 0. On each clock pulse, the frequency control word is added to the phase accumulator, and the accumulated phase is added to the phase control word to obtain the current phase value. Let the frequency control word be $K$; then the frequency of the output signal is $f_{out} = K f_{clk} / 2^{m}$, where $f_{clk}$ is the input clock frequency and $m$ is the phase accumulator bit width. The minimum frequency resolution is the output frequency for $K = 1$, $\Delta f = f_{clk} / 2^{m}$. Since the number of pipeline stages has been determined, substituting it into the formula for the modulus expansion factor of the CORDIC algorithm gives the value of that factor, which in turn determines the correction factor.
The CORDIC iteration part is an 8-stage pipeline structure. Each stage of the iterative structure contains two shifters and three adder-subtractors and is equivalent to one iteration of the CORDIC algorithm. The hardware structure is shown in Figure 3.
The truncated output processing part rounds and truncates the data and outputs the sine and cosine waveform signals.
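The accumulator arithmetic can be worked through with a short Python sketch. The numbers used in the usage note, a 50 MHz clock and a 24-bit accumulator, are assumptions chosen for illustration and are not stated for this design; the function names are hypothetical.

```python
def output_frequency(fcw, f_clk, acc_bits):
    """f_out = fcw * f_clk / 2**m for frequency control word fcw."""
    return fcw * f_clk / (1 << acc_bits)

def frequency_resolution(f_clk, acc_bits):
    """Smallest frequency step, reached with a frequency control word of 1."""
    return f_clk / (1 << acc_bits)

def fcw_for(f_target, f_clk, acc_bits):
    """Nearest frequency control word for a desired output frequency."""
    return round(f_target * (1 << acc_bits) / f_clk)

def phase_sequence(fcw, pcw, acc_bits, n):
    """Phase values seen by the CORDIC stage on n successive clock pulses."""
    mask = (1 << acc_bits) - 1
    acc = 0                                  # the accumulator starts at 0
    phases = []
    for _ in range(n):
        phases.append((acc + pcw) & mask)    # add the phase control word
        acc = (acc + fcw) & mask             # accumulate; wraps modulo 2**m
    return phases
```

Under these assumed values, `frequency_resolution(50e6, 24)` is roughly 2.98 Hz, and `fcw_for(1e6, 50e6, 24)` returns 335544, which yields an output within one resolution step of 1 MHz.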
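A behavioural model makes the pipeline timing concrete. The Python sketch below is an illustration under stated assumptions, not the Verilog module: it uses floating point instead of fixed-point registers, and the class and function names are hypothetical. Each call to `clock` models one clock pulse, with one register set between consecutive stages.

```python
import math

N_STAGES = 8
ANGLES = [math.atan(2.0 ** -i) for i in range(N_STAGES)]
GAIN = math.prod(math.cos(a) for a in ANGLES)

def stage(i, x, y, z):
    """One micro-rotation: two 2**-i scalings (shifts) and three add/subtracts."""
    d = 1 if z >= 0 else -1
    return x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * ANGLES[i]

class CordicPipeline:
    def __init__(self):
        self.regs = [None] * N_STAGES        # regs[i] holds the output of stage i

    def clock(self, theta=None):
        """Advance one clock. Returns (sin, cos) when a result leaves the last stage."""
        new_regs = [None] * N_STAGES
        if theta is not None:
            new_regs[0] = stage(0, GAIN, 0.0, theta)
        for i in range(1, N_STAGES):
            if self.regs[i - 1] is not None:
                new_regs[i] = stage(i, *self.regs[i - 1])
        self.regs = new_regs
        out = self.regs[-1]
        return None if out is None else (out[1], out[0])
```

Feeding a new angle on every call produces the first result on the eighth call and one result per call after that, which is the throughput advantage of the pipeline over a single iterated unit.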
The design of the NCO module based on FPGA is shown in Figure 6 , where clk is the clock pulse signal, the active level is high, rst_n is the reset signal, phase_in is the input of phase data, and sin_out and cos_out are the sine and cosine signals of the output respectively. 
System Simulation and Results Analysis
This design implements the NCO model in Quartus II 13.0, simulates it with ModelSim, and downloads it to an Altera DE2 development board with the EP2C35F672C6 as the target chip, which verifies the correctness and feasibility of the design method. The simulation results are shown in Figure 7. The figure shows that the sine and cosine waveform signals generated by the improved NCO are in quadrature. Because the angle mapping covers the entire circle, the design can replace the traditional NCO design based on the lookup table structure.
After the code is compiled and synthesized in Quartus II 13.0, the hardware resource information, such as the number of logic cells consumed by the design, is obtained. As shown in Table 3, with 8-bit data and an 8-stage pipeline iteration, the NCO based on the lookup table structure uses 275 logic cells, while the NCO based on the CORDIC algorithm uses only 240 logic cells, saving 13.75% of the hardware resources.
According to the data analysis of Table 4 , in the case of 8-bit data bits, the operation error of the traditional CORDIC algorithm can reach 10 , and the improved CORDIC algorithm can improve the accuracy to 10 10 , so this design improves the computational accuracy while reducing hardware resource consumption. 
Conclusion
This paper designs and improves an FPGA-based NCO module for digital down conversion. The CORDIC algorithm is optimized by using a pipeline structure, full-circle angle coverage, and an increased data bit width. The design is verified with ModelSim. The experimental results show that the design improves the computational accuracy of the system while reducing hardware resource consumption.
