A simple architecture for direct digital frequency synthesis (DDS) is presented. The proposed architecture uses a sampling-only algorithm (SOA) to achieve a compression ratio of 558 in realizing the sine function lookup table, which is higher than that of the prior art.
INTRODUCTION
The direct digital frequency synthesizer (DDS) offers fast and fine frequency switching, which is required by modern communication systems. The main design tradeoff in realizing a DDS is between the ROM size and the output spurious-free dynamic range (SFDR). The ROM size grows exponentially as the phase truncation is reduced. Large phase truncation, however, causes large output spurs. In this paper, we introduce an improved Taylor series algorithm to reduce the ROM size efficiently. The error introduced by the one's complement operation is compensated by the carry of the output adder. Simulation results show that the output SFDR is determined mainly by the phase truncation rather than by the data compression.
BASIC IDEA 2.1 Algorithm
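The Taylor series of the sine function is:

$$\sin(x+\delta) = \sin(x) + \delta\cos(x) - \frac{\delta^{2}}{2!}\sin(x) - \frac{\delta^{3}}{3!}\cos(x) + \cdots \qquad (1)$$

where $\cos(x) = \sin(\pi/2 - x)$.

So, to first order,

$$\sin(x+\delta) \approx \sin(x) + \delta\,\sin(\pi/2 - x) \qquad (2)$$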
Based on the stored sample 'sin(x)' and the slope 'sin(π/2-x)', the sine value of the object phase 'x+delta' can be calculated according to (1). We also observe that the slope does not need to be stored, because the slope at one sample 'sin(x)' is just the value of another sample 'sin(π/2-x)' if the samples are evenly distributed over the phase range 0~π/2. The phase 'π/2-x' can be easily found by the 2's complement of 'x', provided that the number of divisions is a power of 2. Reference [1] mentions a similar idea but from a different viewpoint. We also observe that if the stored samples (or slopes) are selected at the center of each phase division (Fig. 1(a)), then the approximation (2) is better than when the samples are selected at the beginning of each phase division, because the approximated 'delta' phase is reduced (Fig. 1(b)).
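The effect of the sample position can be checked numerically. The following is a minimal floating-point sketch, not the authors' model: the function name `max_first_order_error`, the grid density, and the use of an XOR to mirror the table index are illustrative assumptions. It evaluates the first-order approximation over the quarter wave, once with samples at the center of each division (where the slope is read from the mirrored table entry) and once with samples at the start of each division.

```python
import math

def max_first_order_error(n_div, points_per_div, centered):
    """Worst |sin(x) + delta*slope - sin(x + delta)| over the quarter wave.

    n_div must be a power of 2 (assumption) so that the mirrored table
    index can be formed by inverting the index bits.
    """
    if n_div & (n_div - 1):
        raise ValueError("n_div must be a power of 2")
    step = (math.pi / 2) / n_div
    offset = 0.5 if centered else 0.0
    table = [math.sin((i + offset) * step) for i in range(n_div)]
    worst = 0.0
    for i in range(n_div):
        x = (i + offset) * step
        if centered:
            # sin(pi/2 - x_i) is simply another stored sample
            slope = table[i ^ (n_div - 1)]
        else:
            # start-of-division sampling: slope computed directly for comparison
            slope = math.cos(x)
        for k in range(points_per_div):
            phase = (i + (k + 0.5) / points_per_div) * step
            delta = phase - x
            approx = table[i] + delta * slope
            worst = max(worst, abs(approx - math.sin(phase)))
    return worst

err_center = max_first_order_error(16, 8, centered=True)
err_start = max_first_order_error(16, 8, centered=False)
```

With centered samples the largest 'delta' is about half a division instead of a full division, so the dominant second-order error term drops by roughly a factor of four in this sketch.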
One's Complement
To further simplify the hardware, we replace every 2's complement with a 1's complement. These complements are used in three places: to map the phase of the 'sample' to the phase of the 'slope'; to map an 'object phase' above π/2 into the phase range 0~π/2; and to map the 'residue phase' within a phase division to the 'delta phase'.
When mapping sample phase 'X' to slope phase 'π/2-X', there should be no offset error because we have already selected the samples at the center of each phase division.
When mapping an object phase 'X' above π/2 into the phase range 0~π/2, if we assume that 'X' is the mid point of each phase sub-division, and that the data from the accumulator is the index of 'X' (not the object phase itself), then there is no offset error either.
When mapping the residue phase 'R' to the delta phase 'D' in each phase division 'P' (Fig. 1(c) and (d)), the phase offset caused by the 1's complement has to be considered. In each phase division, we use the lower PL bits of the truncated data from the accumulator to represent the index of the residue phase 'R'. We also assume that 'R' is the mid point of each phase sub-division. The delta phase 'D' represents the absolute phase difference between the mid point of the phase division and the residue phase. The change in sine value relative to the stored sample is approximated as the product of D and the slope at the mid point of the phase division. If R is above the mid point of the phase division, then D is equal to R minus 0.5P, which can be found by simply setting the MSB of R to zero; this causes a -0.5LSB phase offset. If R is below the mid point of the phase division, then D is equal to 0.5P minus R, which can be found by the 2's complement of R; this causes a +0.5LSB phase offset. If the 2's complement is replaced with the 1's complement in order to reduce hardware, the resulting D has a -0.5LSB phase offset.
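The three offsets stated above can be verified with exact arithmetic. This is a small sketch under the stated mid-point assumption (R sits at index + 0.5 LSB and the division is 2^PL LSBs wide); the function name `delta_phase_offsets` and the dictionary keys are hypothetical labels introduced only for this check.

```python
from fractions import Fraction

def delta_phase_offsets(pl_bits):
    """Offset (estimated D - ideal D, in phase LSBs) for each mapping method."""
    half = 1 << (pl_bits - 1)      # 0.5P expressed in LSBs
    mask = half - 1                # lower PL-1 bits
    offsets = {"clear_msb": set(), "twos_complement": set(), "ones_complement": set()}
    for r in range(1 << pl_bits):
        residue = Fraction(2 * r + 1, 2)   # mid point of sub-division r
        ideal_d = abs(residue - half)      # |R - 0.5P|
        low = r & mask
        if r & half:
            # R above the mid point: drop the MSB of the index
            offsets["clear_msb"].add(low - ideal_d)
        else:
            # R below the mid point: compare both complement styles
            offsets["twos_complement"].add((half - low) - ideal_d)
            offsets["ones_complement"].add((low ^ mask) - ideal_d)
    return offsets

offsets = delta_phase_offsets(7)
```

Every sub-division gives the same offset for a given method, which is why a single fixed correction can be applied per half of the division.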
To accommodate these offsets directly, the size of the multiplier would have to be increased. Fortunately, we can avoid this by adding or subtracting 1LSB at the amplitude output. Because this uses the existing carry of the output adder, the compensation does not require any extra hardware.
ARCHITECTURE 3.1 General Description
Fig. 2 shows the architecture. The truncated W bits from the accumulator are separated into 3MSB+PH+PL bits. Among these, the three most significant bits are used to generate the full sine wave, the following PH bits are the index to the phase divisions in the phase range 0~π/4 or π/4~π/2, and the least significant PL bits are the index to the phase sub-divisions in each phase division mentioned above. The 1st MSB is used as the sign bit of the half sine wave. The 2nd MSB is XOR'ed (one's complement) with the quarter sine wave from the adder output to generate the half sine wave. The 3rd MSB is XOR'ed with the PH bits to generate the index of the quarter sine wave. When the 3rd MSB is 0, the PH bits represent the phases in the 0~0.25π range, which are the index to the sampled sine wave data in ROM1 and also the index to the slope data in ROM2. When the 3rd MSB is 1, the inverted PH bits represent the phases in the 0.5π~0.25π range, which are the index to the sampled sine wave data in ROM2 and also the index to the slope data in ROM1. The stored data in the ROMs are obtained directly from quantized sine values without any optimization. Therefore, the total ROM size is
The PL bits are the index of the 'residue' phase R in the phase division (Fig. 1(c~d)). The width of the phase division is P. If the MSB of PL is 1, then R is above the mid point of the phase division (Fig. 1(c)), and D is equal to R-0.5P, which can be represented by the remaining PL-1 bits directly. If the MSB of PL is 0, then R is below the mid point of the phase division (Fig. 1(d)). D is then equal to 0.5P-R, which can be found by inverting the PL-1 bits. Both operations result in a -0.5LSB phase offset error that should be compensated later. D is then multiplied by the constant π to generate the real 'delta' phase in radians. The resulting product is multiplied by the slope and added to the sampled data to approximate the sine amplitude.
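The field split described in this section can be sketched as a small decoder. This is an illustrative model only: the function name `decode_phase_word`, the dictionary keys, and the 3+4+4 bit example are assumptions, and the sketch covers only the index generation (ROM addressing and the delta index), not the multipliers or the output adder.

```python
def decode_phase_word(word, ph_bits, pl_bits):
    """Split a truncated phase word into 3 MSBs + PH + PL and derive the indices."""
    total_bits = 3 + ph_bits + pl_bits
    if not 0 <= word < (1 << total_bits):
        raise ValueError("phase word out of range")

    ph_mask = (1 << ph_bits) - 1
    pl_mask = (1 << pl_bits) - 1
    d_mask = (1 << (pl_bits - 1)) - 1

    pl = word & pl_mask
    ph = (word >> pl_bits) & ph_mask
    top3 = word >> (ph_bits + pl_bits)
    msb1, msb2, msb3 = (top3 >> 2) & 1, (top3 >> 1) & 1, top3 & 1

    # 3rd MSB XOR'ed with the PH bits gives the ROM index; it also swaps
    # which ROM supplies the sample and which supplies the slope.
    rom_index = ph ^ ph_mask if msb3 else ph
    sample_rom, slope_rom = ("ROM2", "ROM1") if msb3 else ("ROM1", "ROM2")

    # MSB of PL selects above/below the mid point of the phase division.
    above_mid = bool(pl >> (pl_bits - 1))
    low = pl & d_mask
    d_index = low if above_mid else low ^ d_mask  # 1's complement below the mid point

    return {
        "sign": msb1, "msb2": msb2, "msb3": msb3,
        "rom_index": rom_index, "sample_rom": sample_rom, "slope_rom": slope_rom,
        "above_mid": above_mid, "d_index": d_index,
    }

fields = decode_phase_word(0b001_0011_0101, ph_bits=4, pl_bits=4)
```

In this example the 3rd MSB is 1, so the PH value 3 is inverted to ROM index 12 and the sample is read from ROM2; the PL value 5 lies below the mid point, so its lower three bits are inverted to give a delta index of 2.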
Carry Compensation
It is quite simple to compensate for the 0.5LSB phase offset error. When the residue phase is above the mid point, the error is -0.5LSB phase times the slope (due to the add operation), so 1LSB of amplitude is added to the final output. When it is below the mid point, the error is +0.5LSB phase times the slope (due to the subtract operation), so 1LSB of amplitude is subtracted from the final output. The adding and subtracting of 1LSB is done through the carry of the adder at the output end of the DDS, which is referred to as carry compensation. This carry compensation does not consume extra hardware because the carry operation is already required by the subtraction. Although this compensation is not complete, simulation shows that the resulting error is effectively reduced.
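The sign argument behind the carry compensation can be illustrated with a short floating-point sketch. It is a simplification, not the gate-level design: amplitude quantization is ignored, and the names `first_order`, `offset_error` and `carry_direction`, as well as the numeric values, are hypothetical. The sketch compares the first-order estimate computed with the ideal delta phase against the one computed with a delta phase that is 0.5 LSB too small.

```python
import math

def first_order(sample, slope, d_lsb, lsb_rad, above_mid):
    """sample + delta*slope above the mid point, sample - delta*slope below it."""
    sign = 1 if above_mid else -1
    return sample + sign * d_lsb * lsb_rad * slope

def offset_error(x_mid, d_lsb, lsb_rad, above_mid):
    """Amplitude error caused by a -0.5 LSB offset in the delta phase."""
    sample, slope = math.sin(x_mid), math.cos(x_mid)
    ideal = first_order(sample, slope, d_lsb, lsb_rad, above_mid)
    actual = first_order(sample, slope, d_lsb - 0.5, lsb_rad, above_mid)
    return actual - ideal

def carry_direction(above_mid):
    """Direction of the 1 LSB amplitude correction described in the text."""
    return 1 if above_mid else -1

err_above = offset_error(0.3, 5.5, 1e-3, above_mid=True)
err_below = offset_error(0.3, 5.5, 1e-3, above_mid=False)
```

The error is negative above the mid point and positive below it, so the correction always has the opposite sign of the error. Its magnitude is a fixed 1 LSB of amplitude rather than the exact error, which is consistent with the compensation being incomplete.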
Hardware Summary
The hardware used in the proposed DDS is summarized in Table 1. It shows that both the ROM size and the extra blocks are minimized; the extra blocks consist of one small hardwired multiplier, one small multiplier and one adder. Table 2 shows the widths of the data paths for sample designs of 11-bit and 12-bit DDS. The names of the data paths are also shown in the architecture (Fig. 2). The ROM size is 352 bits (32 × 11). The stored data are just the quantized sine values of the corresponding phases, without any optimization. The design is first modeled with a C program and then implemented in Verilog at gate level, using TSMC 0.18 µm CMOS technology and IBM 7hp SiGe technology separately.
DESIGN EXAMPLES 4.1 Implementation

Test Plan
When the least significant bit of the phase accumulator is forced to one, all outputs of the phase accumulator belong to the number theoretic class (
) [4], regardless of the value of the input control word. Thus only one simulation is needed to determine the worst-case spurious response. The selected control word is 1199104, which means that the output signal frequency is about 1/14 of the clock frequency (1199104 ≈ 2^24/14).
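The arithmetic of the test plan can be checked with a few lines. This sketch assumes a 24-bit accumulator (implied by the 2^24 in the text) and models "forcing the LSB to one" as OR-ing 1 into the control word, which is a simplifying assumption; the function names are hypothetical. It relies only on the standard fact that an accumulator with increment F modulo 2^N repeats after 2^N / gcd(F, 2^N) steps.

```python
import math

ACC_BITS = 24            # assumed accumulator width
CONTROL_WORD = 1199104   # control word quoted in the text

def output_ratio(fcw, bits=ACC_BITS):
    """Output frequency as a fraction of the clock frequency."""
    return fcw / (1 << bits)

def accumulator_period(fcw, bits=ACC_BITS):
    """Number of clock cycles before the accumulator sequence repeats."""
    return (1 << bits) // math.gcd(fcw, 1 << bits)

def brute_force_period(fcw, bits):
    """Direct simulation of the accumulator, for cross-checking small widths."""
    modulus = 1 << bits
    acc, steps = fcw % modulus, 1
    while acc != 0:
        acc = (acc + fcw) % modulus
        steps += 1
    return steps

ratio = output_ratio(CONTROL_WORD)
period_forced = accumulator_period(CONTROL_WORD | 1)
```

With the LSB set, the control word is odd and therefore coprime with 2^24, so the accumulator visits every one of its 2^24 states before repeating, whatever the remaining bits of the control word are.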
Simulation Result
The simulated FFT results without and with the carry compensation are shown in Fig. 3(a) and Fig. 3(b), respectively. The carry compensation improves the SFDR to 81 dB. Fig. 4(a) and Fig. 4(b) show the error signal without and with carry compensation (the error signal is the DDS output value minus the ideal sine value). The carry compensation reduces the error signal by a factor of 2. The simulation also shows that the proposed DDS can work at clock frequencies above 1 GHz and 2 GHz for the TSMC and IBM processes, respectively, without any circuit optimization (using standard cells from the library under the slow-slow corner condition). The design is now under optimization.
Comparison
A comparison between the proposed DDS and state-of-the-art designs is given in Table 3. It shows that the ROM size of the proposed DDS is less than half that of the prior designs, and that the effect on output SFDR is minimal.
CONCLUSION
The proposed SOA DDS architecture uses a very small lookup table (352 bits) for the sine function implementation. The design technique is also simple because the stored data do not require any optimization. Combined with small additional circuits, this architecture is suitable for high-speed and low-power DDS designs.
ACKNOWLEDGMENTS
