This paper describes a new design approach and an architecture for a Direct Digital Frequency Synthesizer (DDFS) based on Least Square (LS) approximation. It is shown that the architecture can be implemented as a low-cost, low-power, feedforward, and easily pipelined datapath. A prototype IC has been designed and fabricated in TSMC 0.25 um CMOS technology. The IC produces 14-bit sine and cosine outputs with a spurious-free dynamic range of 100 dBc. A 32-bit frequency word gives a tuning resolution of 0.0466 Hz at a 200 MHz sampling rate.
INTRODUCTION
Direct Digital Frequency Synthesizers (DDFS) [1] are important components of modern high-performance communication systems because of their fine frequency resolution, low phase noise, and fast frequency switching. A DDFS essentially consists of a phase accumulator and a phase-to-amplitude converter. A Digital-to-Analog Converter (DAC) then converts the digital samples into an analog sinusoidal waveform, and a low-pass filter smooths the DAC output, as shown in Fig. 1. Given a Frequency Control Word (FCW), the DDFS outputs a sinusoidal waveform of frequency

$$f_{out} = \frac{FCW \cdot f_{CLK}}{2^{L}} \qquad (1)$$

where $f_{CLK}$ is the clock frequency and $L$ is the wordlength of the phase accumulator. The FCW is an integer ranging from 0 to $2^{L-1}$, and the minimum frequency resolution is

$$f_{min} = \frac{f_{CLK}}{2^{L}} \qquad (2)$$

Traditionally, the phase-to-amplitude converter is implemented with a ROM. However, a ROM-based design requires a large area and high power consumption to deliver good signal quality and fine frequency resolution. Many techniques [2] have been proposed to reduce the power and hardware cost. In this paper, we propose a novel ROM-free DDFS design based on the LS algorithm [3]. With the LS algorithm, 4th-order polynomials can be generated that achieve a Spurious Free Dynamic Range (SFDR) of 100 dBc. Moreover, the proposed DDFS design can be implemented with only 2 squarer circuits and 1 multiplier.
This work was supported in part by MediaTek Inc. under the NTU-MTK wireless research project.
Hence, the DDFS can synthesize high-quality sinusoidal signals with fewer arithmetic operations. The FCW is 32 bits wide and the sine/cosine outputs are 14 bits wide, which achieves 100 dBc SFDR. These properties give the DDFS low cost, low power, high frequency resolution, and high spectral purity. The DDFS has been realized in TSMC 0.25 um 1P5M CMOS technology. The core area is 0.58 × 0.58 mm².
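The relationship between the FCW, the clock frequency, and the tuning resolution can be checked numerically. The following is a minimal sketch, assuming the standard DDFS relation in which the output frequency equals FCW times the clock frequency divided by 2^L; the function names and the 10 MHz target are illustrative choices, not taken from the paper.

```python
def tuning_resolution(f_clk_hz, acc_bits):
    """Smallest frequency step: f_clk / 2^L."""
    return f_clk_hz / 2**acc_bits


def fcw_for(f_out_hz, f_clk_hz, acc_bits):
    """Nearest integer FCW for a desired output frequency."""
    return round(f_out_hz * 2**acc_bits / f_clk_hz)


def output_frequency(fcw, f_clk_hz, acc_bits):
    """Output frequency produced by a given FCW."""
    return fcw * f_clk_hz / 2**acc_bits


def accumulate(fcw, acc_bits, n_samples, start=0):
    """Phase accumulator: add FCW every clock and wrap modulo 2^L."""
    mask = (1 << acc_bits) - 1
    phase = start
    phases = []
    for _ in range(n_samples):
        phases.append(phase)
        phase = (phase + fcw) & mask
    return phases


F_CLK = 200e6   # 200 MHz sampling rate, as stated in the abstract
L_BITS = 32     # 32-bit frequency control word

resolution = tuning_resolution(F_CLK, L_BITS)
fcw = fcw_for(10e6, F_CLK, L_BITS)          # hypothetical 10 MHz target
actual = output_frequency(fcw, F_CLK, L_BITS)
print(f"resolution = {resolution:.4f} Hz, FCW = {fcw}, f_out = {actual:.4f} Hz")
```

Because the FCW is an integer, the synthesized frequency can differ from the requested one by at most half the tuning resolution.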
LEAST-SQUARE (LS) APPROXIMATION
The LS algorithm [3] calculates the best-fit curve, that is, the curve with the minimal sum of squared deviations (LS error) from a given set of data. Suppose that the data points are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x$ is the independent variable and $y$ is the dependent variable. The fitting curve $f(x)$ has an error $d_i$ at each data point $y_i$. That is,

$$d_i = y_i - f(x_i), \quad i = 1, 2, \ldots, n \qquad (3)$$

According to Eq. (3), the best-fitting curve is obtained by minimizing

$$E = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2$$
Polynomials are among the most commonly used curves in regression. We therefore take the general case of an $m$th-order polynomial as an example and derive how the LS algorithm approximates the given set of data.
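When an $m$th-order polynomial

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m \qquad (4)$$

is used to approximate the given set of data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $1 \le m < n$, the best-fitting curve $f(x)$ has the least square error

$$E = \sum_{i=1}^{n} \left[ y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m \right) \right]^2$$

Here $a_0, a_1, \ldots, a_m$ are unknown coefficients, whereas all $x_i$ and $y_i$ are given. To obtain the least square error, the unknown coefficients are found by setting the first partial derivatives to zero:

$$\frac{\partial E}{\partial a_k} = -2 \sum_{i=1}^{n} x_i^k \left[ y_i - \left( a_0 + a_1 x_i + \cdots + a_m x_i^m \right) \right] = 0, \quad k = 0, 1, \ldots, m$$

Expanding the above equations, we have

$$a_0 \sum_{i=1}^{n} x_i^{k} + a_1 \sum_{i=1}^{n} x_i^{k+1} + \cdots + a_m \sum_{i=1}^{n} x_i^{k+m} = \sum_{i=1}^{n} x_i^{k} y_i, \quad k = 0, 1, \ldots, m$$

The unknown coefficients $a_0, a_1, \ldots, a_m$ can then be obtained from these $m+1$ linear equations. In this way, the LS algorithm generates an $m$th-order approximation polynomial.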
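As a small worked case, consider the straight-line fit $m = 1$, for which the $m+1$ linear equations reduce to two. The data points used here are hypothetical and chosen only to keep the arithmetic easy to follow.

$$a_0 n + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

For the three points $(0, 1), (1, 3), (2, 5)$ we have $n = 3$, $\sum x_i = 3$, $\sum y_i = 9$, $\sum x_i^2 = 5$, and $\sum x_i y_i = 13$, so

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} = \frac{39 - 27}{15 - 9} = 2, \qquad a_0 = \frac{\sum y_i - a_1 \sum x_i}{n} = \frac{9 - 6}{3} = 1$$

The fitted line is $f(x) = 1 + 2x$, which passes through all three points, so the LS error is zero in this case.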
PHASE-TO-AMPLITUDE CONVERTER
Synthesizing sinusoidal waves with high spectral purity is one of the major goals of frequency synthesizer design, and the phase-to-amplitude converter is the most important factor in achieving it. In this paper, we use the LS algorithm to build a novel phase-to-amplitude converter. We also exploit the symmetry of sinusoidal signals, which both reduces the hardware requirement and improves the approximated waveforms. A comparison with the Taylor-series approach [4] demonstrates the advantage of the LS approach in hardware cost and SFDR performance.
Symmetry Property of Sine and Cosine
As shown in Fig. 2, a smaller phase range gives a better LS approximation of the synthesized sine waveform. Restricting the phase to $[0, \pi/4]$ improves the approximation compared with the full period $[0, 2\pi]$: the approximated sine and cosine waveforms in $[0, \pi/4]$ are much more accurate than those approximated over one full period. When the symmetry of sine and cosine is exploited, the range of amplitude values that must be represented is reduced to one eighth of the original. In this small interval, $[0, \pi/4]$, the complexity of the phase-to-amplitude converter is reduced substantially.
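The octant symmetry can be illustrated in a few lines. The sketch below is an assumption about how the folding might be organized, not the paper's circuit: `core_sin_cos` is a hypothetical stand-in (using the math library) for the polynomial core and is only ever evaluated on [0, π/4], and the table `OCTANT_RULES` records whether sine and cosine are swapped and which signs apply in each octant.

```python
import math

QUARTER_PI = math.pi / 4

# Per octant: (swap sine/cosine, sign of sine, sign of cosine)
OCTANT_RULES = [
    (False, +1, +1),  # [0, pi/4)
    (True, +1, +1),   # [pi/4, pi/2)
    (True, +1, -1),   # [pi/2, 3pi/4)
    (False, +1, -1),  # [3pi/4, pi)
    (False, -1, -1),  # [pi, 5pi/4)
    (True, -1, -1),   # [5pi/4, 3pi/2)
    (True, -1, +1),   # [3pi/2, 7pi/4)
    (False, -1, +1),  # [7pi/4, 2pi)
]


def core_sin_cos(r):
    """Hypothetical stand-in for the polynomial core, valid on [0, pi/4] only."""
    assert 0.0 <= r <= QUARTER_PI
    return math.sin(r), math.cos(r)


def sin_cos_by_octant(theta):
    """Return (sin, cos) of theta using core evaluations on [0, pi/4] only."""
    theta = theta % (2 * math.pi)
    octant = min(int(theta / QUARTER_PI), 7)
    r = theta - octant * QUARTER_PI
    if octant % 2 == 1:
        # Odd octants are mirrored, so measure the angle from the far edge.
        r = QUARTER_PI - r
    r = min(max(r, 0.0), QUARTER_PI)  # guard against rounding at octant edges
    s, c = core_sin_cos(r)
    swap, sin_sign, cos_sign = OCTANT_RULES[octant]
    if swap:
        s, c = c, s
    return sin_sign * s, cos_sign * c


print(sin_cos_by_octant(2.0), (math.sin(2.0), math.cos(2.0)))
```

In hardware, the octant index would typically come from the top three bits of the phase word, so the folding costs only a few multiplexers and sign inversions.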
SFDR Performance
Fig. 3 illustrates the advantage of the proposed LS approximation: it achieves the same SFDR as the Taylor series with a lower polynomial order. For our target SFDR of 100 dBc, the LS approximation requires only a 4th-order polynomial, whereas the Taylor series requires a 6th-order polynomial. The LS algorithm therefore lets us implement a DDFS with higher spectral purity using fewer multipliers.
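The benefit of an LS fit over a truncated Taylor series of the same order can be seen with a small numerical experiment. This is a sketch under stated assumptions: it fits sine only, samples [0, π/4] on a uniform grid of 1001 points, expands the Taylor series about zero, and compares the maximum absolute error rather than SFDR, so the numbers are not the paper's results. The helper names `ls_polyfit` and `horner` are illustrative.

```python
import math


def ls_polyfit(xs, ys, order):
    """Least-squares polynomial fit via the normal equations.

    Returns [a0, a1, ..., a_order]. The linear system is solved by
    Gaussian elimination with partial pivoting.
    """
    size = order + 1
    # Power sums sum(x^k) and right-hand side sum(y * x^j)
    power_sums = [sum(x**k for x in xs) for k in range(2 * order + 1)]
    rhs = [sum(y * x**j for x, y in zip(xs, ys)) for j in range(size)]
    # Augmented matrix: row j holds sum(x^(j+k)) for k = 0..order, then rhs[j]
    aug = [[power_sums[j + k] for k in range(size)] + [rhs[j]] for j in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        known = sum(aug[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (aug[row][size] - known) / aug[row][row]
    return coeffs


def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + am*x^m with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


N = 1001
xs = [i * (math.pi / 4) / (N - 1) for i in range(N)]
ys = [math.sin(x) for x in xs]

ls_coeffs = ls_polyfit(xs, ys, 4)
taylor_coeffs = [0.0, 1.0, 0.0, -1.0 / 6.0, 0.0]  # x - x^3/6, up to 4th order

ls_err = max(abs(horner(ls_coeffs, x) - y) for x, y in zip(xs, ys))
taylor_err = max(abs(horner(taylor_coeffs, x) - y) for x, y in zip(xs, ys))
print(f"max error, LS 4th order:     {ls_err:.3e}")
print(f"max error, Taylor 4th order: {taylor_err:.3e}")
```

The LS fit spreads its error over the whole interval, whereas the Taylor series is exact at the expansion point and degrades toward π/4, which is why the LS polynomial reaches a given accuracy at a lower order.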
SYSTEM ARCHITECTURE

Finite Wordlength Effect
Optimizing DDFS performance involves trading off the finite wordlength and the sinusoid computation method against the spectral purity of the sine and cosine waves and the maximum clock rate. Fig. 4 shows a basic block diagram of a DDFS that identifies the three basic sources of noise inherent to all DDFS designs: phase quantization, amplitude quantization, and sine/cosine function compression distortion. Pipelining is not considered in this figure. In this paper, the accumulator wordlength L is 32 bits, which gives a tuning resolution below 1 Hz, and the phase-to-amplitude converter is based on a 4th-order LS approximation. We use the SFDR criterion to choose the values of W and P. Based on Matlab FFT simulations, we set the truncated phase wordlength to W = 17 bits and the amplitude wordlength to P = 14 bits. These hardware parameters achieve an SFDR of 100 dBc.
System Design
Fig. 5 shows the hardware architecture of the proposed LS-based DDFS design with 100 dBc SFDR. To increase the system clock rate, we arrange 6 pipeline stages in the DDFS design. The output spectrum obtained from fixed-point simulation and FFT is shown in Fig. 6. It confirms that this hardware architecture achieves 100 dBc SFDR.
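The effect of the chosen wordlengths can be sketched in a simple fixed-point model. The code below is illustrative only: it assumes the W-bit phase is taken from the most significant bits of the L-bit accumulator, that the amplitude is a signed P-bit value with a symmetric full scale of 2^(P-1) - 1, and it uses the math library sine as a stand-in for the 4th-order LS polynomial. None of these details are confirmed by the paper.

```python
import math

L_BITS = 32  # accumulator wordlength L
W_BITS = 17  # truncated phase wordlength W
P_BITS = 14  # amplitude wordlength P

FULL_SCALE = (1 << (P_BITS - 1)) - 1  # assumed symmetric signed full scale


def truncate_phase(acc_value):
    """Keep the W most significant bits of the L-bit accumulator value."""
    return acc_value >> (L_BITS - W_BITS)


def quantize_amplitude(value):
    """Round a value in [-1, 1] to a signed P-bit integer."""
    return round(value * FULL_SCALE)


def ddfs_sample(acc_value):
    """One output sample: truncate phase, evaluate sine, quantize amplitude."""
    phase_w = truncate_phase(acc_value)
    theta = 2 * math.pi * phase_w / (1 << W_BITS)
    return quantize_amplitude(math.sin(theta))  # sine stands in for the LS core


# Worst-case sizes of the two quantization steps
phase_step = 2 * math.pi / (1 << W_BITS)
amplitude_lsb = 1 / FULL_SCALE
print(f"phase step = {phase_step:.3e} rad, amplitude LSB = {amplitude_lsb:.3e}")
print(ddfs_sample(1 << 30))  # a quarter of the accumulator range, sine peak
```

A model like this, driven by a phase accumulator and followed by an FFT, is one way to explore how W and P trade off against spur levels before committing to hardware.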
IMPLEMENTATION RESULTS AND COMPARISON
The proposed DDFS design was implemented in TSMC 0.25 um 1P5M CMOS technology. The microphotograph of the LS approximation-based DDFS is shown in Fig. 7. To remove the effect of different fabrication technologies from the comparison, we adopt the Normalized Index [5]. The Normalized Area is the silicon area normalized to a 1 um technology, as shown below:
The Power Efficiency (Pe), which compares the number of DDFS calculations per MHz, is shown below:
From the performance comparison in Table 2, we can see that the proposed LS-DDFS scheme performs well according to the Normalized Index.
CONCLUSIONS
Based on the LS algorithm, we define and realize an LS-based ROM-free DDFS. The LS-based DDFS avoids the speed penalty of a memory device and achieves 100 dBc spectral purity. In the comparison, the proposed DDFS requires less area and power. For current and future communication applications, the proposed LS-based DDFS can meet the needs of portability, cost, power, speed, and spectral purity for modern SoC designs.
