Abstract-A low-power quadrature direct digital frequency synthesizer (DDFS) is presented. Piecewise linear approximation is used to avoid using a ROM look-up table to store the sine values in a conventional DDFS. Significant saving in power consumption, due to the elimination of the ROM, renders the design more suitable for portable wireless communication applications. To demonstrate the proposed technique, a quadrature DDFS has been implemented using a 0.5-µm CMOS process and occupies an active area of 1.4 mm². It consumes 8 mW at 100 MHz and operates from a single 2.7-V supply. The spurious-free dynamic range is better than 59 dBc at low synthesized frequencies and the frequency resolution is 1.5 kHz.
I. INTRODUCTION

IN MODERN wireless communication systems, fast frequency switching with fine frequency steps is crucial. An example of such systems is Bluetooth, where the signal modulation is Gaussian frequency shift keying (GFSK) with about 160-kHz frequency deviation. The traditional phase-locked loop (PLL)-based synthesizer is not suitable in these applications due to the inherent loop delay. Other limitations of a PLL are the small range of frequency locking and the limited frequency resolution. The open-loop voltage-controlled oscillator (VCO) is also not suitable due to the limited control over the output frequency. Conventional ROM-based direct digital frequency synthesizers (DDFSs), as shown in Fig. 1, are able to meet the above requirements by storing the values of the sine function in a ROM and scanning these values at a rate proportional to the desired frequency. The digital ROM output is converted to analog using a digital-to-analog converter (DAC). The main factors that determine the signal purity in this architecture are: 1) the phase quantization due to the finite resolution of the phase accumulator; 2) the amplitude quantization noise due to the finite resolution of the DAC; and 3) static and dynamic nonidealities of the DAC. The ROM size grows exponentially with the desired phase resolution, resulting in a huge area consumed by the ROM for reasonable phase resolutions. Moreover, the ROM should be addressed at a much higher rate than the desired output frequency for moderate spectral purities. Scanning the ROM at high speed makes it power hungry and, thus, unsuitable for portable wireless applications.
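As a rough behavioural illustration of the conventional architecture described above, the following Python sketch models a phase accumulator whose MSBs address a full-period sine ROM. The function name `rom_ddfs`, the 8-bit amplitude wordlength, and the chosen frequency control word are assumptions made for illustration; the DAC and the quarter-wave compression of Fig. 1 are not modelled.

```python
import math

def rom_ddfs(fcw, acc_bits=16, phase_bits=10, amp_bits=8, n_samples=64):
    """Return n_samples output codes of a ROM-based DDFS (behavioural model)."""
    # Sine look-up table: 2**phase_bits entries of amp_bits-bit signed amplitude.
    full_scale = 2 ** (amp_bits - 1) - 1
    rom = [round(full_scale * math.sin(2 * math.pi * k / 2 ** phase_bits))
           for k in range(2 ** phase_bits)]
    acc, out = 0, []
    for _ in range(n_samples):
        # Only the phase_bits MSBs of the accumulator address the ROM.
        out.append(rom[acc >> (acc_bits - phase_bits)])
        # The phase accumulator wraps modulo 2**acc_bits.
        acc = (acc + fcw) % 2 ** acc_bits
    return out

# A control word of 4096 with a 16-bit accumulator gives one period every 16 clocks.
samples = rom_ddfs(fcw=4096)
# ROM depth in words: it doubles with every extra bit of phase resolution.
rom_words = 2 ** 10
```

The ROM depth `2 ** phase_bits` is the quantity that the ROM-less approach in this paper removes.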
Several attempts have been made to reduce the ROM size by using various techniques. The first category of solutions is based on trigonometric identities, the simplest of which is the quarter-wave symmetry in the sine function (see Fig. 1). Other trigonometric formulas have been used to split a large ROM into two smaller coarse and fine ROMs [1]. The second category of solutions approximates the sine function over the first quarter period by another function $f(\theta)$ that can be easily implemented, as illustrated in Fig. 2. In this implementation, a ROM look-up table is used to store only the error, $e(\theta) = \sin\theta - f(\theta)$, which reduces the required memory wordlength. The simplest form is the sine-phase difference method [2], which uses a straight-line approximation for the sine function, i.e., $f(\theta) = 2\theta/\pi$ for $0 \le \theta \le \pi/2$. In this scheme, two bits of memory wordlength are saved. Parabolic approximation has also been introduced [3], which saves four bits of memory wordlength. However, the above techniques still consume considerable area and power and, hence, are not suitable for low-cost portable applications. The third category of solutions uses a combination of a small ROM to store a few sample points and a linear interpolation between these points for full computation of the generated sine function. This technique has been shown to be efficient, and the hardware cost of the additional calculations is lower than that of the first two categories [4].
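The two-bit saving quoted for the sine-phase difference method can be checked numerically. The sketch below assumes a phase normalized to [0, 1] over the first quarter period, so that the straight line is simply the normalized phase itself; the grid size and variable names are illustrative choices, not taken from [2].

```python
import math

# Sample the first quarter period on a normalized phase grid x in [0, 1].
n = 1024
errors = [math.sin(math.pi / 2 * k / n) - k / n for k in range(n + 1)]

# The ROM only has to store this difference instead of the full sine value.
max_error = max(errors)

# Each factor of two by which the stored range shrinks saves one amplitude bit.
bits_saved = math.floor(-math.log2(max_error))
```

Because the stored difference stays below one quarter of full scale, two MSBs of every ROM word become redundant.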
Another approach was adopted in [5] to avoid using a ROM by using a nonlinear DAC, as shown in Fig. 3. In this architecture, the nonlinear DAC performs the phase-to-amplitude (P/A) conversion and the digital-to-analog conversion at the same time. This approach was shown to provide considerable area and power savings compared to the conventional approach because the ROM is removed. There were two options for the DAC implementation. The first implementation is based on a resistive string, which consumes less power but is inherently slow ( MHz) and occupies significant area (1.7 mm × 1.7 mm in 0.5-µm technology) due to the large number of resistors and transistors used. The second implementation of the nonlinear DAC uses current-mode techniques to enhance the speed at the expense of power consumption (92 mW at MHz). To further reduce the power consumption and die area, a technique was proposed in [6] to split the nonlinear DAC into a coarse DAC and a fine DAC.
In this paper, a different approach is used to implement a ROM-less DDFS based on piecewise linear approximation of the sine function, as will be presented in Section II. The proposed architecture is shown to have significant area and power savings at high clock rates. Design considerations of the building blocks will be discussed in Sections II-IV. Testing results of the DDFS, as well as a comparison with recently published work, will be presented in Section V. Finally, concluding remarks are drawn in Section VI.
0018-9200/02$17.00 © 2002 IEEE
II. PROPOSED DDFS ARCHITECTURE
The proposed architecture is based on the idea of breaking the sine function into linear segments as shown in Fig. 4, where four segments are shown for the purpose of illustration. For a given number of segments, the segments' slopes ($m_i$) are chosen to minimize the integrated mean square error between the ideal and the approximate piecewise linear curves. In order to simplify the implementation of such an approximation, the number of segments is chosen to be a power of 2. The breakpoints are selected to be equally spaced to further simplify the design. A MATLAB code is developed to determine the optimum set of slope values, i.e., given the number of piecewise segments, the slope values yielding the minimum mean square error (MMSE) are determined, where the MMSE is expressed as
$$\mathrm{MMSE} = \min_{\{m_i\}} \int_{0}^{\pi/2} \left[\sin\theta - f(\theta)\right]^{2} d\theta \quad (1)$$
and $f(\theta)$ is the piecewise linear approximation built from the slopes $m_i$.
Fig. 4. Sine-wave approximation.
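The slope optimization described above can be sketched as a linear least-squares problem. The Python code below is not the MATLAB code used in this work; it assumes a continuous approximation that starts at zero, equally spaced breakpoints over the first quarter period, and a sampled version of the integrated squared error. The names `ramp`, `mmse_slopes`, and `piecewise_linear` are hypothetical.

```python
import math

def ramp(i, x, h):
    """Contribution of segment i at phase x: 0 before it, h after it, linear inside."""
    return min(max(x - i * h, 0.0), h)

def piecewise_linear(x, slopes):
    """Continuous piecewise-linear curve defined by one slope per segment."""
    h = (math.pi / 2) / len(slopes)
    return sum(m * ramp(i, x, h) for i, m in enumerate(slopes))

def mmse_slopes(num_segments=8, samples=2000):
    """Least-squares slopes for approximating sin(x) on [0, pi/2]."""
    n = num_segments
    h = (math.pi / 2) / n
    xs = [(k + 0.5) * (math.pi / 2) / samples for k in range(samples)]
    # Normal equations A m = b of the sampled squared-error criterion.
    a = [[sum(ramp(i, x, h) * ramp(j, x, h) for x in xs) for j in range(n)]
         for i in range(n)]
    b = [sum(ramp(i, x, h) * math.sin(x) for x in xs) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    m = [0.0] * n
    for r in range(n - 1, -1, -1):
        m[r] = (b[r] - sum(a[r][c] * m[c] for c in range(r + 1, n))) / a[r][r]
    return m

slopes = mmse_slopes(8)
worst = max(abs(math.sin(x) - piecewise_linear(x, slopes))
            for x in [k * (math.pi / 2) / 500 for k in range(501)])
```

With slopes expressed per radian of phase, the first slope comes out close to one and the last slope is roughly an order of magnitude smaller, which matches the expected shape of a sine quarter wave.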
The piecewise linear function is implemented using the block diagram shown in Fig. 5. The $W$ most significant bits (MSBs) of the phase-accumulator output are used in the P/A converter, where $W$ is the phase resolution in bits. The P/A converter consists of the complementor, the linear DAC, and the switched weighted-sum (SWS) blocks. The first two MSBs of the accumulator are used to select the quarter in the sine-wave cycle. The next $W-2$ bits are fed into the complementor, whose output is then split into two parts: the MSB part ($s$ bits long for $2^{s}$ segments), which corresponds to the segment number, and the least-significant-bit (LSB) part ($W-s-2$ bits long), which is applied to the input of the DAC. For a given phase resolution $W$, the proposed architecture therefore uses a $(W-s-2)$-bit linear DAC, whereas the architecture presented in [5] needs a wider nonlinear DAC. Reducing the number of DAC bits saves significant area and promotes faster operation. The output of the SWS block is given by (2): when the enable signal is active, it is the sum of the full contributions of all segments below the current one plus the current segment's slope times the linear DAC output; otherwise, it stays at its minimum value. Here, the linear DAC output spans the range between the two terminal voltages of the resistor string, and the enable is a digital signal that corresponds to the positive/negative half cycle of the sine function. The SWS block in each branch implements half of the sine-wave cycle and, hence, the differential output voltage is a full sine wave.
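The bit-level partitioning of the phase word can be sketched as follows. The Python code assumes a 10-bit phase word, eight segments (a 3-bit segment number), and a 5-bit DAC code; it also assumes that the complementor takes the one's complement in the second and fourth quarters and that the first MSB selects the half cycle, which is the usual quarter-wave arrangement rather than a detail confirmed by Fig. 5. The name `split_phase` is hypothetical.

```python
def split_phase(phase, w=10, s=3):
    """Split a w-bit phase word into (negative_half, segment, dac_code)."""
    quarter_bits = w - 2                      # bits left after the two quadrant MSBs
    quadrant = phase >> quarter_bits          # 0..3
    rest = phase & ((1 << quarter_bits) - 1)
    if quadrant & 1:                          # 2nd and 4th quarters: mirror the phase
        rest ^= (1 << quarter_bits) - 1       # one's complement (assumed complementor)
    dac_bits = quarter_bits - s
    segment = rest >> dac_bits                # MSB part: segment number
    dac_code = rest & ((1 << dac_bits) - 1)   # LSB part: linear DAC input
    negative_half = quadrant >> 1             # selects which SWS branch is enabled
    return negative_half, segment, dac_code

# End of the first quarter and start of the second map to the same segment and code.
end_q1 = split_phase(255)
start_q2 = split_phase(256)
```

The mirrored phase is what lets a single quarter-wave approximation serve the whole half cycle.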
The frequency of the output sine wave is given by
$$f_{out} = \frac{F}{2^{N}} f_{clk} = F f_{min} \quad (3)$$
where $F$ is the digital accumulator input, $N$ is the number of bits in the accumulator, $f_{clk}$ is the input clock frequency, and $f_{min} = f_{clk}/2^{N}$ is the minimum synthesized frequency (frequency resolution) that can be obtained. In the proposed design, $N = 16$ and $f_{clk} = 100$ MHz, yielding a frequency resolution of 1.5 kHz. To allow a high clock rate, a four-stage carry-look-ahead adder (4 bits/stage) is used in the accumulator.
Sources of distortion in the above architecture are the limited number of segments ($2^{s}$), the phase resolution ($W$ bits), and the slope resolution. The effect of each of these parameters on the performance is simulated using MATLAB. Fig. 6 shows the effect of the number of segments on the spurious-free dynamic range (SFDR). Note that the SFDR improves by 12 dB when the number of segments is doubled. The SFDR is 59 and 71 dBc for eight ($s = 3$) and 16 ($s = 4$) segments, respectively. In Bluetooth, the transmitter spurious emissions should be below −20 dBc to −40 dBc at 2 MHz and −40 dBc to −60 dBc at 3 MHz, depending on the transmitter power class. Since the 16-segment DDFS consumes about 50% more power and area than the eight-segment design, as will be shown at the end of Section IV, the eight-segment design is adopted. A passive RC pole can be used at the DDFS output to attenuate the spurs at frequencies far from the fundamental output. For eight segments, the effect of the finite phase resolution is shown in Fig. 7. It shows that no significant improvement in the SFDR is achieved for phase resolutions of more than 10 bits. For this choice of $W = 10$ and $s = 3$, only a 5-bit DAC ($W - s - 2 = 5$) is needed.
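The frequency resolution quoted above follows directly from the accumulator width and the clock frequency. A minimal Python check, with hypothetical function names and a 1-MHz target chosen only as an example, is:

```python
def frequency_resolution(f_clk_hz, acc_bits):
    """Smallest frequency step of a DDFS with an acc_bits-wide accumulator."""
    return f_clk_hz / 2 ** acc_bits

def output_frequency(fcw, f_clk_hz=100e6, acc_bits=16):
    """Synthesized frequency for a given accumulator input word."""
    return fcw * frequency_resolution(f_clk_hz, acc_bits)

f_min = frequency_resolution(100e6, 16)   # about 1.526 kHz
fcw_1mhz = round(1e6 / f_min)             # control word closest to 1 MHz
f_near_1mhz = output_frequency(fcw_1mhz)

# DAC width left after removing the two quadrant bits and the segment bits.
dac_bits = 10 - 3 - 2
```

The same arithmetic shows why a wider accumulator, not a wider P/A converter, is what sets the tuning step.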
The weighted-sum function is implemented using resistive dividers (as will be discussed later). Each resistance is an integer multiple of a unit resistance $R_u$, which determines the slope resolution. The effect of the finite unit resistance is shown in Fig. 8 for eight segments and 10 bits of phase resolution. Note that for a normalized unit resistance of less than 0.4%, there is not much improvement in SFDR. To allow for some margin, a somewhat smaller normalized unit resistance is a good choice for an eight-segment sine shape. Extra care must be taken in the layout of these resistors to achieve the required resolution. Table I summarizes the required values of the phase resolution $W$, the normalized unit resistance, and the corresponding SFDR for different numbers of segments. Note that doubling the number of segments, i.e., increasing $s$ by 1, requires a 2-bit increase in the phase resolution. This implies doubling the size of the linear DAC, i.e., increasing the number of DAC bits by 1.
Quadrature outputs are generated by replicating the P/A converter. The 90° phase shift is implemented by adding 01 to the two MSBs of the accumulator output.
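Adding 01 to the two MSBs is equivalent to adding a quarter of the full phase range, modulo the phase wordlength. The short Python sketch below illustrates this with an assumed 10-bit phase word and an ideal sine in place of the P/A converter; the function names are hypothetical.

```python
import math

def quadrature_phase(phase, w=10):
    """Add binary 01 to the two MSBs of a w-bit phase word (wraps around)."""
    return (phase + (1 << (w - 2))) & ((1 << w) - 1)

def ideal_sine(phase, w=10):
    """Ideal phase-to-amplitude mapping, used here only as a reference."""
    return math.sin(2 * math.pi * phase / (1 << w))

# The shifted branch produces the cosine of the original phase.
worst_mismatch = max(
    abs(ideal_sine(quadrature_phase(p)) - math.cos(2 * math.pi * p / 1024))
    for p in range(1024)
)
```

Since only a 2-bit addition is needed, the quadrature branch costs no additional accumulator.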
III. LINEAR DAC
The 5-bit resistive string DAC is implemented as shown in Fig. 9. In order to have smoother transitions at the corner points between segments in the output sine wave, a 1/2 LSB offset is introduced to the DAC output by using half the unit resistance, $R/2$, for the lowermost and uppermost resistors. A 5-to-32 decoder is used to turn ON the switch corresponding to the digital input word. At any given time, only one switch must be turned ON. Due to unequal delays at the decoder outputs, more than one switch can be turned ON at the same time, or all the switches can be turned OFF. This results in undesirable glitches at the output of the DAC. To solve this problem, the outputs of the decoder are sampled before they drive the switches. The settling time for each digital input word is determined by three factors.
1) The total capacitance at the output node, which is dominated by the drain capacitances of all the switches and the input capacitance of the next stage.
2) The input resistance of the resistive string seen from the corresponding node (m). The closer the node is to the middle of the resistor string, the higher the resistance seen. The value of the unit resistance $R$ is chosen as a tradeoff between power consumption and settling time, and the value adopted in this design was found to be a good compromise.
3) The ON resistance of the corresponding switch.
In order to minimize the load capacitance, different transistor widths are used to have the same settling time for all digital input combinations. To allow for a large overdrive voltage in the nMOS switches, the terminal voltages $V_1$ and $V_2$ of the resistor string are chosen to be 0 and 0.5 V, respectively.
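The tap voltages and the resistance seen at each tap can be illustrated with a small Python model. It assumes a string of 31 unit resistors with half-unit resistors at both ends, ideal voltage sources at the two string terminals, and a normalized unit resistance of 1, since the actual resistor value is not reproduced here; `string_dac` is a hypothetical name.

```python
def string_dac(bits=5, v_low=0.0, v_high=0.5, r_unit=1.0):
    """Tap voltages and Thevenin resistances of a string DAC with 1/2 LSB offset."""
    n = 2 ** bits
    taps, r_seen = [], []
    for m in range(n):
        # Half-unit end resistors put tap m at (m + 0.5) units above the bottom.
        r_below = (m + 0.5) * r_unit
        r_above = (n - m - 0.5) * r_unit
        taps.append(v_low + (v_high - v_low) * r_below / (r_below + r_above))
        # With both terminals at ac ground, the tap sees the two parts in parallel.
        r_seen.append(r_below * r_above / (r_below + r_above))
    return taps, r_seen

taps, r_seen = string_dac()
lsb = taps[1] - taps[0]        # one DAC step
first_tap = taps[0]            # half a step above the lower terminal
middle_resistance = max(r_seen)
```

The resistance seen peaks at the two middle taps, which is why the middle codes set the worst-case settling time.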
IV. SWITCHED WEIGHTED-SUM BLOCK
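Two SWS blocks are used in the system, one for each half of the sine-wave cycle. The details of the SWS block are shown in Fig. 10(a). The output of the SWS block can be written as
$$V_{out} = \sum_{i} a_i V_i \quad (4)$$
where $a_i$ is the weight of segment $i$. The role of the analog demultiplexer is to route $V_1$, $V_{DAC}$, or $V_2$ to $V_i$ based on the value of the MSB part $k$, corresponding to the segment number, and the enable $en$ as follows:
$$V_i = \begin{cases} V_2, & \text{for } i < k \text{ and } en = 1 \\ V_{DAC}, & \text{for } i = k \text{ and } en = 1 \\ V_1, & \text{for } i > k \text{ and } en = 1 \\ V_1, & \text{for } en = 0 \end{cases} \quad (5)$$
where $V_1$ and $V_2$ are the lower and upper terminal voltages of the DAC resistor string and $V_{DAC}$ is the linear DAC output.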
The analog demultiplexer consists of identical cells, one for each output. The basic analog demultiplexer cell for each output $V_i$ is illustrated in Fig. 10(b). The weighted-sum function described in (4) is implemented using resistors and buffers as shown in Fig. 11. Assuming ideal buffers (unity gain and zero output resistance), the weighted-sum output is given by
$$V_{out} = \sum_{i} \frac{R_p}{R_i} V_i \quad (6)$$
where $R_p$ is the parallel combination of all resistors in the weighted-sum network. Comparing (4) and (6) gives
$$a_i = \frac{R_p}{R_i} \quad (7)$$
which is used to obtain the values of $R_i$ from the segments' slopes. Note from (6) that $V_{out}$ depends on the ratio of resistors; hence, the matching of the resistors in the weighted-sum block in each branch is a critical issue. On the other hand, matching between the resistors of the weighted-sum blocks in the positive and negative branches is not critical.
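The resistive weighted sum and the demultiplexer routing can be sketched together in Python. The sketch assumes ideal buffers, string terminal voltages of 0 and 0.5 V, and a parallel resistance of 3 kΩ; the segment weights are taken from the chord slopes of the sine as a stand-in for the MMSE slopes, and the routing rule (upper terminal below the current segment, DAC output at it, lower terminal above it or when disabled) is the reading of the demultiplexer adopted here. All function names are hypothetical.

```python
import math

def weighted_sum_resistors(num_segments=8, r_parallel=3e3):
    """Resistor values whose conductance ratios realize the segment weights."""
    edges = [math.sin(math.pi / 2 * i / num_segments)
             for i in range(num_segments + 1)]
    # Chord weights (stand-in for MMSE slopes); they sum to one.
    weights = [edges[i + 1] - edges[i] for i in range(num_segments)]
    return [r_parallel / a for a in weights]

def sws_output(resistors, segment, v_dac, enable, v1=0.0, v2=0.5):
    """Weighted-sum output for a given segment number and DAC voltage."""
    r_parallel = 1.0 / sum(1.0 / r for r in resistors)
    v_out = 0.0
    for i, r in enumerate(resistors):
        if not enable or i > segment:
            v_i = v1          # segment not reached yet, or branch disabled
        elif i < segment:
            v_i = v2          # segment already fully traversed
        else:
            v_i = v_dac       # current segment follows the linear DAC
        v_out += (r_parallel / r) * v_i
    return v_out

resistors = weighted_sum_resistors()
peak = sws_output(resistors, segment=7, v_dac=0.5, enable=1)
midway = sws_output(resistors, segment=4, v_dac=0.0, enable=1)
```

With these stand-in weights the smallest and largest resistors come out near 15 kΩ and 156 kΩ, the same order as the values used in this design.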
The buffers are the most power-consuming parts in the proposed system. Care must be taken in the design of these buffers to minimize the overall power consumption while keeping good linearity. Fig. 11 also shows the transistor-level design of each buffer. pMOS input transistors are used to eliminate the body effect by shorting the source and body terminals (n-well technology is used). Another advantage of using pMOS input transistors is that the input voltage can be as low as 0 V. Since this voltage is routed from the DAC resistor string to the buffer input using MOS switches (in the DAC and the analog demultiplexer), fast nMOS switches can be used with high overdrive voltage. This helps to minimize the delay associated with these switches without having to use wide transistors. Note that the output voltage of each buffer is dc shifted relative to its input. Since all the buffers are biased using the same biasing voltages, they all have the same nominal dc shift. However, due to mismatches between transistors in different buffers, this dc shift may differ from one buffer to another. These dc-shift mismatches will result in a slight offset in the output sine wave, but will not affect its spectral purity.
Since the buffers have different load resistances, the actual relative slope values of the segmented sine wave will be slightly different from (7) if the buffers are identical, due to the finite buffer-output resistance. One way to dilute this effect is to reduce the buffers' output resistance by increasing the bias current or increasing the size of the input transistor (Fig. 11), which will increase the power consumption or increase the load capacitance to the analog demultiplexer, respectively. Instead, to have the same loading effect on all buffers, each buffer is designed such that the ratio of its output resistance to the load resistance is the same for all buffers. To allow better matching between buffers, each buffer is designed as a number of parallel buffer subcells. These subcells are identical in all buffers, but the number of parallel subcells in each buffer is inversely proportional to the desired output resistance. As a result, the buffers' output resistances will account for a slight attenuation, but the relative slope values of the segments will remain unchanged. Consequently, the total current drain in all buffers will be inversely proportional to the parallel combination $R_p$ of all resistors in the weighted-sum network. Hence, scaling of all resistors is very important in determining the overall power consumption. However, the higher the value of $R_p$, the slower the response of the circuit due to the large time constant $R_p C_L$, where $C_L$ is the input capacitance of the following stage. The settling error within the clock period $T_{clk}$ is given by
$$\text{settling error} = \Delta V \, e^{-T_{clk}/(R_p C_L)} \quad (8)$$
where $\Delta V$ is the output voltage step.
The largest error corresponds to the largest output voltage step, $\Delta V_{max} = 0.5$ V. This largest error should be set to be smaller than the smallest output step, which corresponds to the smallest slope. For the particular case of eight segments ($s = 3$), the smallest slope value (obtained from the MMSE algorithm implemented in MATLAB) is 0.096 for a sine wave of unit amplitude. Therefore, the smallest output step is
$$\text{smallest output step} = \text{DAC step} \times \text{smallest slope} = \frac{0.5\ \text{V}}{32} \times 0.096 \approx 1.5\ \text{mV} \quad (9)$$
By fixing the settling error in (8) to be smaller than the above value, we get
$$R_p C_L < \frac{T_{clk}}{\ln\left(\Delta V_{max} / \text{smallest output step}\right)} \quad (10)$$
The value of $C_L$ determines the integrated output noise due to the resistors, known as $kT/C$ noise. This noise should be sufficiently smaller than the largest output spur, which in this case is 59 dB below the fundamental tone. Setting the noise level to be lower than this spur gives the bound (11) on $C_L$, where $k$ is the Boltzmann constant, $T$ is the temperature (assumed 300 K), and $A$ is the output amplitude (0.5 V).
In the case of the eight-segment approximation, the above equation yields only a small lower bound on $C_L$. This is a rather loose condition for this capacitance, which is typically larger than 0.1 pF. In this design, 0.3 pF is assumed for the load capacitance, for which the maximum parallel resistance is 5.7 kΩ, as given by (10) for $f_{clk} = 100$ MHz. To allow some margin for process variations, all resistances are scaled to have a nominal parallel combination of 3 kΩ. The lowest and highest resistors in the weighted-sum network are 15 and 150 kΩ, respectively.
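The 5.7-kΩ limit can be reproduced from the first-order settling condition. The Python sketch below uses the values quoted in the text (0.3-pF load, 100-MHz clock, 5-bit DAC over 0.5 V, smallest slope of 0.096) and assumes that the largest output step equals the full 0.5-V DAC range; that last value and the function name are assumptions.

```python
import math

def max_parallel_resistance(t_clk, c_load, largest_step, smallest_step):
    """Largest R_p for which a first-order step settles to within smallest_step."""
    return t_clk / (c_load * math.log(largest_step / smallest_step))

dac_step = 0.5 / 2 ** 5              # 5-bit DAC spanning 0 to 0.5 V
smallest_step = dac_step * 0.096     # smallest slope quoted for eight segments
r_max = max_parallel_resistance(
    t_clk=1 / 100e6,                 # one period of the 100-MHz clock
    c_load=0.3e-12,                  # assumed load capacitance from the text
    largest_step=0.5,                # assumption: full DAC range in one step
    smallest_step=smallest_step,
)
```

The result lands close to 5.7 kΩ, so the nominal 3-kΩ choice leaves roughly a factor of two of margin.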
To get some idea of the increase in power consumption and area for 16 segments ($s = 4$), let us apply the same analysis. For 16 segments, the smallest slope is 0.05 and the targeted SFDR is 71 dB. According to Table I, the minimum phase resolution for 71-dB SFDR is 12 bits. Hence, the number of DAC bits is $12 - 4 - 2 = 6$. From (11), a larger minimum $C_L$ results, which may require adding a physical capacitance, and (10) then gives a correspondingly smaller bound on the parallel combined resistance. With the resistors scaled accordingly, this design will consume about 50% more power than the eight-segment design. The buffer area will increase by roughly the same percentage.
V. TESTING RESULTS
The proposed quadrature (I and Q) output DDFS has been implemented through MOSIS in a 0.5-µm AMI CMOS process. The die photo is shown in Fig. 12. The chip active area is 1.4 mm², of which 25% is occupied by the phase accumulator. On-chip buffers are included to drive the pin capacitance. The DDFS operates from a single 2.7-V supply with 3-mA current drain and a clock frequency of 100 MHz. The testing setup incorporates an off-chip instrumentation amplifier for differential-to-single-ended conversion. It uses low-distortion buffer amplifiers (total harmonic distortion (THD) specified at 1 MHz) with 1-pF input capacitance. Fig. 13 shows the single-ended outputs of one branch as well as the differential output. Fig. 14 shows the two quadrature outputs of the I and Q branches at the same output frequency. The peak-to-peak magnitude is about 910 mV, which is slightly less than the ideal 1-V magnitude due to the expected attenuation of the weighted sum and the output buffers. The modulation capabilities of the proposed DDFS have also been tested. Fig. 15 shows an example of frequency modulation where the modulating signal is a square wave of frequency 1 kHz. An example of amplitude modulation is shown in Fig. 16. The modulating signal is applied at the reference terminal of the DAC resistor string (see Fig. 9). It is a sine wave of frequency 1 kHz and peak-to-peak magnitude of 400 mV with a dc offset of 250 mV.
The output spectrum has been measured at a low (kilohertz-range) and a high (megahertz-range) synthesized frequency. The SFDR versus the output synthesized frequency is plotted in Fig. 19. The SFDR is better than 59 dBc for low synthesized frequencies. For high synthesized frequencies, the SFDR is degraded due to the large output steps of the DAC and the switched weighted-sum blocks. Fig. 20 shows the SFDR versus the clock frequency. The DDFS is shown to operate at clock frequencies up to 130 MHz. Higher clock frequencies could not be achieved due to the frequency limitations of the phase accumulator.
The proposed DDFS is compared with the recently reported synthesizers [4]-[6] as listed in Table II. Note that the DDFS presented in [4] does not have an on-chip DAC, and the DDFS presented in [6] does not have quadrature outputs. The power consumption per unit clock frequency of the proposed DDFS (in milliwatts per megahertz) is significantly lower than that of state-of-the-art implementations due to the removal of the ROM and the small DAC size. Therefore, the proposed design is more suitable for low-power portable applications. A better SFDR can be achieved using the proposed architecture by doubling the number of segments to 16. This can increase the SFDR by 12 dB at the expense of increasing the power consumption and active area by roughly 50% and 40%, respectively.
VI. CONCLUSION
A low-power ROM-less quadrature DDFS architecture has been presented. It uses a piecewise linear approximation of the sine function. The proposed DDFS has been implemented in 0.5-µm CMOS technology and occupies an area of 1.4 mm². A 16-bit frequency control word results in a tuning resolution of 1.5 kHz at a 100-MHz clock frequency. The proposed design operates from a single 2.7-V supply while consuming 8 mW. The design features an SFDR that is better than 50 dBc for synthesized frequencies up to . The proposed architecture also incorporates different modulation capabilities. The modulation formats include frequency modulation and amplitude modulation. Since the proposed design consumes significantly less power than other recently reported designs, it is a good candidate for wireless portable communication applications that use frequency modulation such as Bluetooth and GSM. For applications that require higher SFDR, the number of linear segments can be increased at the expense of area and power consumption.
