Abstract-With its high integration, high reliability, fast processing speed and online programming, the field-programmable gate array (FPGA) has become a focus of research on digital communication systems. In a suppressed-carrier digital communication system, the quality of carrier extraction has a profound effect on demodulation performance. The traditional analog Costas loop performs poorly because of the imbalance between the in-phase and quadrature branches and the limitations of analog circuits, such as zero drift. A digital circuit implemented on an FPGA balances the in-phase and quadrature branches and avoids these problems. However, a traditional Costas loop occupies a large share of the FPGA resources. This paper adapts the traditional Costas loop structure to the characteristics of FPGA devices and introduces an improved structure that reduces the number of multipliers and adders and improves operation speed and reliability.
I. INTRODUCTION
The FPGA (Field Programmable Gate Array) is a large-scale programmable logic device that developed in the 1990s. With its high integration, high reliability, fast processing speed and online programming, the FPGA has become a focus of application in electronic product design. Hardware and software can be co-designed on a single FPGA chip, which makes hardware design as flexible as software design [1] [2] [3]. With the development of microelectronics technology and processes, computation-intensive FPGA devices have appeared, so that FPGAs can now handle complex operations such as DSP (Digital Signal Processing) [4] [5]. In a suppressed-carrier digital communication system, the quality of carrier extraction has a profound effect on demodulation performance. The Costas loop is a closed-loop automatic adjustment system. The traditional analog Costas loop performs poorly because of the imbalance between the in-phase and quadrature branches and the limitations of analog circuits, such as zero drift and difficult debugging. A digital circuit implemented on an FPGA avoids these problems. In this paper, all basic units of the carrier synchronization system, namely the NCO, multipliers, accumulator, low-pass filter and loop filter, are integrated on a single FPGA chip, which improves the integration and reliability of the system.
II. RELATED WORKS
The key to implementing a digital Costas loop is to determine the word length of each module, the digital loop filter parameters, the system clock frequency and other characteristic quantities. References [5] [6] present the working principle of the phase-locked loop and the composition of each part. References [7] [8] [9] analyze the dynamic equation of the digital Costas loop and the parameter design formulas of the digital loop filter. In this paper, the circuit modules of the digital Costas loop are designed on an FPGA chip using the theory in [9] [10]. During compilation of the modules, it was found that the traditional Costas loop structure occupies a large share of the FPGA resources and constrains the design and realization of the complete transceiver of the communication system, so an improved structure is proposed. The hardware test platform is a DE2-70 development board with a Cyclone II series EP2C35F672C6 chip from Altera Corporation. The test results confirm the correctness and effectiveness of the improved digital Costas loop circuit, which saves hardware computing resources and improves the operation speed of the circuit.
III. THE STRUCTURE OF IMPROVED COSTAS LOOP

Figure 1. Principle diagram of the PLL (blocks: PFD, loop filter, NCO; input phase Φ1(t), output phase Φ2(t)).
The Costas loop (also called the in-phase/quadrature loop) still uses a phase-locked loop (PLL) to extract the carrier, but the local carrier is obtained without squaring the input signal. Structurally, the Costas loop can be regarded as a basic phase feedback system. Figure 1 is the principle diagram of the PLL. The phase-frequency detector (PFD) produces the phase difference between the input and output signals. This phase difference is sent through the loop filter to the voltage-controlled oscillator (VCO), and the VCO output at the updated frequency is fed back to the PFD.
This paper was supported by the National Natural Science Foundation of China, the Youth Science Foundation of China (61501277).
Figure 2 shows a Costas loop structure that is widely used in engineering [6] [7] [8]. The upper and lower branches each constitute a basic PLL structure. The phase detector is realized by a multiplier followed by a low-pass filter.
Multiplying the two signals, each with its own initial phase, produces a double-frequency component and a phase difference component; the low-pass filter then removes the double-frequency component and leaves the phase difference signal. The phase difference signals from the upper and lower branches are multiplied together, and the result is sent to the loop filter, whose output is the control signal of the VCO. The VCO generates a local carrier that tracks the frequency and phase of the carrier from the transmitter. In a digital communication system, the modulated and demodulated signals must pass through AD/DA converters. If the modulated signal is 14 bits wide and the NCO output is also 14 bits wide (see Figure 2), the multiplier output in the phase detector is 28 bits wide, so the signal sent to the filter is also 28 bits wide. Digital filters are built from multipliers and accumulators, so the filter output grows to nearly 50 bits, which consumes a large amount of FPGA logic resources and slows down operation.
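The word-length growth described above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it uses the common conservative bound for a full-precision FIR accumulator, and the 16-bit coefficient width and 50 taps are assumed values chosen for the example, not figures taken from the design.

```python
import math

def multiplier_output_bits(a_bits: int, b_bits: int) -> int:
    """Full-precision width of a signed a_bits x b_bits product."""
    return a_bits + b_bits

def fir_full_precision_bits(in_bits: int, coeff_bits: int, taps: int) -> int:
    """Conservative bound on the FIR accumulator width.

    Each product needs in_bits + coeff_bits, and summing `taps` products
    adds ceil(log2(taps)) guard bits.
    """
    return in_bits + coeff_bits + math.ceil(math.log2(taps))

# 14-bit modulated signal times 14-bit NCO output (values from the text).
mixer_bits = multiplier_output_bits(14, 14)

# ASSUMPTION: 16-bit coefficients and 50 taps, for illustration only.
filter_bits = fir_full_precision_bits(mixer_bits, coeff_bits=16, taps=50)

print(mixer_bits, filter_bits)
```

Under these assumptions the bound lands at the "nearly 50 bits" order of magnitude mentioned in the text; a tighter bound based on the sum of the absolute coefficient values would give a smaller width.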
After repeated programming experiments, the traditional Costas loop was changed into the new structure shown in Figure 3. This structure removes the digital low-pass filters from the two branches. The modulated signal passes through three multipliers, one digital low-pass filter and a digital loop filter, and the result drives the NCO: the output of the digital loop filter controls the NCO to generate the local carrier. The improved Costas loop therefore uses only three multipliers and one low-pass filter. Compared with Figure 2, which has a low-pass filter in each of the upper and lower branches, one low-pass filter is saved. If the FIR filter order is 50, this saves 50 multipliers and 50 adders.
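Why a single low-pass filter after the product can still deliver a phase error can be seen from a short derivation. This is a sketch under assumptions that the text does not state explicitly: a BPSK-like input with m(t) = ±1, an ideal low-pass filter, and a reading of Figure 3 in which the two mixer outputs are multiplied before filtering. The symbols θ (received carrier phase) and θ̂ (local carrier phase) are introduced here for illustration.

```latex
\begin{align}
s(t) &= m(t)\cos(\omega_c t + \theta), \qquad m(t) = \pm 1 \\
I(t) &= s(t)\cos(\omega_c t + \hat\theta), \qquad
Q(t) = s(t)\sin(\omega_c t + \hat\theta) \\
I(t)\,Q(t) &= m^2(t)\cos^2(\omega_c t + \theta)\cdot\tfrac{1}{2}\sin(2\omega_c t + 2\hat\theta) \\
&= \tfrac{1}{4}\sin(2\omega_c t + 2\hat\theta)
 + \tfrac{1}{8}\sin(4\omega_c t + 2\theta + 2\hat\theta)
 + \tfrac{1}{8}\sin\bigl(2(\hat\theta - \theta)\bigr) \\
\mathrm{LPF}\{I(t)\,Q(t)\} &\approx \tfrac{1}{8}\sin\bigl(2(\hat\theta - \theta)\bigr)
\end{align}
```

Because m²(t) = 1, the data modulation drops out of the product, and the only low-frequency term is proportional to sin 2(θ̂ − θ), which is the usual Costas error characteristic. The single filter has to reject the components at 2ω_c and 4ω_c.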
IV. THE CIRCUIT UNITS AND PARAMETERS OF IMPROVED COSTAS LOOP
The hardware test platform uses two DE2 development boards, each with a Cyclone II series EP2C35F672C6 FPGA from Altera. The two boards serve as the transmitter modulator and the receiver demodulator respectively. Each development board is connected to an AD/DA conversion module, and a cable is used as the communication channel. The DA converter is an AD9767 from Analog Devices with 14-bit resolution. The AD converter is an AD9248-65 from Analog Devices, also with 14-bit resolution.
A. The design of numerically controlled oscillator
The NCO works by reading a ROM that holds sine-wave samples, driven by the clock signal, and by adjusting the output frequency and phase according to the frequency word and the phase word. Figure 4 shows the structure of the lookup-table NCO. The frequency of the NCO output signal is determined by the formula:
The phase of the NCO output signal is determined by the formula:
where P is the phase word and P_width is the phase word length. The phase accumulator accuracy, angle resolution, clock frequency, output frequency and other parameters must be chosen according to the actual design requirements. In this design, the NCO sampling frequency is 50 MHz and the design output frequency is 1 MHz.
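The lookup-table NCO described above can be sketched in software as a phase accumulator that addresses a sine ROM. The sketch assumes the standard phase-accumulator relations f_out = FW · f_clk / 2^N for the frequency word FW and phase offset 2π · P / 2^P_width for the phase word P. The class name, the 32-bit accumulator, the 1024-entry ROM and the choice P_width = N are all assumptions made for this example, not parameters from the design.

```python
import math

class LookupTableNCO:
    """Illustrative lookup-table NCO (phase accumulator + sine ROM)."""

    def __init__(self, acc_bits=32, addr_bits=10, amp_bits=14):
        self.acc_bits = acc_bits          # N: accumulator / frequency word width
        self.addr_bits = addr_bits        # ROM address width
        self.mask = (1 << acc_bits) - 1
        full_scale = (1 << (amp_bits - 1)) - 1
        size = 1 << addr_bits
        # One full sine period stored as signed integers.
        self.rom = [round(full_scale * math.sin(2 * math.pi * k / size))
                    for k in range(size)]
        self.phase_acc = 0

    def frequency_word(self, f_out, f_clk):
        """FW = f_out * 2^N / f_clk (assumed standard relation)."""
        return round(f_out * (1 << self.acc_bits) / f_clk)

    def step(self, freq_word, phase_word=0):
        """Return one sample, then advance the accumulator by freq_word."""
        # The phase word is treated as acc_bits wide: offset = 2*pi*P / 2^N.
        phase = (self.phase_acc + phase_word) & self.mask
        addr = phase >> (self.acc_bits - self.addr_bits)
        sample = self.rom[addr]
        self.phase_acc = (self.phase_acc + freq_word) & self.mask
        return sample

nco = LookupTableNCO()
fw = nco.frequency_word(1e6, 50e6)          # 1 MHz output at a 50 MHz clock
samples = [nco.step(fw) for _ in range(50)]  # roughly one output period
```

With a 50 MHz clock and a 1 MHz output, one period spans 50 samples, so the accumulator is almost exactly back at zero after the loop. The small residue comes from rounding the frequency word.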
B. The design of filter
According to linear system response theory, digital filters are divided into infinite impulse response (IIR) filters and finite impulse response (FIR) filters. Their system functions differ, which determines the differences in structure and characteristics between the two types. FIR filters have no feedback from output to input, whereas IIR filters do. An FIR filter can have strictly linear phase, whereas an IIR filter cannot achieve exactly linear phase. This design adopts an FIR filter, implemented with the FIR Compiler IP core provided by Altera. The filter coefficients are generated with the digital filter design tool in MATLAB, converted from decimal to hexadecimal with the dec2hex function, and then imported into the FIR Compiler IP core. The bandwidth of the filter is determined by the baseband signal frequency and the sampling frequency. This design needs a low-pass filter with a pass band of 0 to 0.6 MHz operating at 50 MHz. According to the rule for determining the FIR filter output word length [9] [10], the output is 40 bits wide for a 27-bit input.
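The coefficient flow described above (design, quantize, convert to hexadecimal) can be illustrated without MATLAB. The sketch below swaps in a plain Hamming-windowed sinc design, which is not necessarily the method the MATLAB tool used. The 51 taps, the 16-bit coefficient width and the function names are assumptions for this example; only the 0.6 MHz pass band and the 50 MHz operating frequency come from the text.

```python
import math

def lowpass_fir(taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass, normalised to unity DC gain."""
    fc = cutoff_hz / fs_hz              # normalised cutoff (cycles/sample)
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]

def quantize(coeffs, bits):
    """Scale so the largest coefficient uses the full signed range."""
    scale = (1 << (bits - 1)) - 1
    peak = max(abs(c) for c in coeffs)
    return [round(c / peak * scale) for c in coeffs]

def to_hex(value, bits):
    """Two's-complement hexadecimal string of a signed integer."""
    digits = (bits + 3) // 4
    return format(value & ((1 << bits) - 1), "0{}X".format(digits))

# ASSUMPTION: 51 taps and 16-bit coefficients, chosen for illustration.
h = lowpass_fir(51, 0.6e6, 50e6)
q = quantize(h, 16)
hex_words = [to_hex(v, 16) for v in q]
```

The symmetric impulse response is what gives the FIR filter its linear phase. Scaling to the peak coefficient changes the DC gain by a constant factor, which would have to be accounted for when choosing the output word length.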
C. The design of loop filter
This design uses a second-order loop filter, as shown in Figure 5. The system function of the loop filter can be expressed as:
To implement the delay element, a counting signal is generated and the data of the second branch is loaded into a holding register. In the hardware implementation, the output is the sum of the data from the two branches and the initial frequency word. The period of the counting signal should equal the frequency update period of the NCO. The loop filter parameters C_1 and C_2 are determined by [9] [10]:
In an ideal second-order system, the natural angular frequency ω_n can be calculated by the formula
In this formula, ξ is the damping coefficient, T is the sampling period and K is the loop gain. In engineering practice, ξ is usually set to 0.707.
According to phase-locked loop theory, a phase-locked circuit normally requires
In this design, B_L = 0.05 R_b = 21 kHz, which gives ω_n = 31 kHz. To keep the loop gain K ≈ 1, the frequency word length of the NCO and the input word length of the loop filter must be adjusted. These parameters are determined by the formula
In this design, B_loop = 40 and T_dds = 40 T_s, so K = 0.9817 ≈ 1 when N = 46. Substituting ω_n, T and K into the formula
gives the loop filter coefficients C_1 = 0.00089 ≈ 2^(-10) and C_2 = 0.00000039 ≈ 2^(-21). According to the transfer function of the digital second-order PLL, the poles of the system are 0.9996 ± 0.0004382i, which lie inside the unit circle, so the system is stable.
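The numbers quoted above can be cross-checked numerically. The formulas themselves are not reproduced in the text, so the expressions in this sketch are assumptions: a loop gain K = 2π (T_dds/T_s) 2^(B_loop − 2) / 2^N, the small-angle approximations C_1 ≈ 2ξω_n T / K and C_2 ≈ (ω_n T)² / K, T taken as the 50 MHz sample period, ω_n read as 31 × 10³ rad/s, and a closed-loop characteristic polynomial z² + (K C_1 − 2) z + (1 − K C_1 + K C_2). They are used here because they reproduce the stated values, not because the source confirms them.

```python
import cmath
import math

# Values quoted in the text.
f_s = 50e6                 # sampling frequency, Hz
xi = 0.707                 # damping coefficient
omega_n = 31e3             # ASSUMPTION: interpreted as rad/s
B_loop, N = 40, 46
tdds_over_ts = 40          # T_dds = 40 * T_s

T = 1 / f_s                # ASSUMPTION: T is the 50 MHz sample period

# ASSUMED gain expression; it reproduces K = 0.9817.
K = 2 * math.pi * tdds_over_ts * 2 ** (B_loop - 2) / 2 ** N

# ASSUMED small-angle coefficient formulas.
C1_formula = 2 * xi * omega_n * T / K
C2_formula = (omega_n * T) ** 2 / K

# Poles from the rounded coefficients quoted in the text.
K_r, C1_r, C2_r = 0.9817, 0.00089, 0.00000039
b = K_r * C1_r - 2
c = 1 - K_r * C1_r + K_r * C2_r
disc = cmath.sqrt(b * b - 4 * c)
poles = [(-b + disc) / 2, (-b - disc) / 2]

print(round(K, 4), C1_formula, C2_formula, poles)
```

Under these assumptions the computed gain, coefficients and pole locations agree with the quoted figures, and both poles have magnitude below one, consistent with the stability statement. Rounding C_1 and C_2 to 2^(-10) and 2^(-21) shifts the poles slightly but keeps them inside the unit circle.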
V. THE CIRCUIT IMPLEMENTATION OF IMPROVED COSTAS LOOP ON FPGA
This paper uses an Altera Cyclone II EP2C35F672C6 FPGA; the implementation of the improved Costas loop circuit is shown in Figure 6. The circuit is composed of three multipliers and a low-pass filter, and the 40-bit output of the low-pass filter is sent to the LoopFilter module. The LoopFilter module calculates the frequency word that controls the NCO module to generate the local carrier. The multipliers in the Costas loop are implemented with the hardware multiplier IP core. The bandwidth of the FIR low-pass filter is chosen to pass as much of the useful signal and reject as much noise as possible. The NCO module offers high frequency resolution, fast frequency switching and continuous phase change, and outputs both sine and cosine signals.
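How a LoopFilter module could turn the filtered error into a frequency word is sketched below as an integer model. It assumes a proportional-plus-integral form F(z) = C_1 + C_2 z^(-1) / (1 − z^(-1)) with C_1 = 2^(-10) and C_2 = 2^(-21) realized as arithmetic right shifts. The class and its scaling are hypothetical: the real module's word alignment and update timing are not given in the text.

```python
class LoopFilter:
    """Illustrative second-order loop filter using shift-based gains."""

    def __init__(self, initial_freq_word, c1_shift=10, c2_shift=21):
        self.initial_freq_word = initial_freq_word
        self.c1_shift = c1_shift      # C1 = 2**-c1_shift
        self.c2_shift = c2_shift      # C2 = 2**-c2_shift
        self.integrator = 0           # full-precision running sum of the error

    def update(self, error):
        """Return the next frequency word for one filtered error sample."""
        proportional = error >> self.c1_shift
        # The integral branch uses the sum of PREVIOUS errors (the z^-1 delay).
        integral = self.integrator >> self.c2_shift
        self.integrator += error
        # Output = proportional branch + integral branch + initial frequency word.
        return self.initial_freq_word + proportional + integral

lf = LoopFilter(initial_freq_word=1000)
words = [lf.update(e) for e in (1 << 21, 0, -(1 << 21), 0)]
print(words)
```

Accumulating the error at full precision and shifting afterwards avoids losing small error values that a shift-then-accumulate order would truncate to zero. Power-of-two coefficients let both gains be implemented without multipliers.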
VI. CONCLUSION
This system uses an improved Costas loop structure that reduces FPGA resource usage. Figures 9 and 10 compare the resource usage of the improved and the traditional Costas loop: Figure 9 shows the resource occupancy of the improved Costas loop, and Figure 10 shows that of the traditional Costas loop. The traditional Costas loop occupies 37% of the total logic elements, including 26% of the total combinational functions and 35% of the dedicated logic registers. The improved Costas loop occupies 24% of the total logic elements, including 20% of the total combinational functions and 23% of the dedicated logic registers. In this carrier synchronization system, all basic units such as the NCO, multipliers, accumulator, low-pass filter and loop filter are integrated on a single FPGA chip, which improves the integration and reliability of the system. In addition, all parameters and output data widths of the system are programmable, so the design can be used in private communications and other special fields of digital communications.
