Abstract. Based on the parabolic approximation recently introduced by the authors, a new architecture for sine-output direct digital frequency synthesizers has been developed. By combining this approximation with several memory-reduction techniques, the proposed architecture needs only 728 bits of read-only memory to map a 12-bit phase address to a 10-bit sine amplitude. The synthesizer has also been implemented, and the experimental results confirm its intended operation and performance.
Introduction
Direct digital frequency synthesizers (DDSs or DDFSs) are increasingly adopted in modern communication systems and precision electronic systems because of their significant advantages over phase-locked loop (PLL)-based synthesizers. A simplified block diagram of a sine-output DDS is shown in Fig. 1.
Although there are several methods for phase to sine-amplitude conversion [1], the difficulty of calculating the sine function means that sine computation in DDSs is typically performed by ROM-based lookup tables. In this technique, increasing the number of memory locations and the length of the memory words improves the frequency and amplitude resolution, but a larger ROM means higher power consumption and lower speed. For this reason, different memory-compression techniques have been reported in the literature in order to decrease the ROM size without degrading the frequency resolution and spectral purity.
In one successful approach, an initial guess for the sine amplitude is produced at each instant by digital hardware. The difference between the initial guess and the accurate value of the sine amplitude, which has already been stored in the associated memory location, is then used to correct it. If the initial approximation is chosen properly, each memory location stores only a small correction instead of the whole amplitude. The closer the approximation is to the ideal sine function, the shorter the memory word length can be.
• The simplest implementation of this idea is to use the value of the phase as the initial guess. This method, called the "sine-phase difference" technique, saves 2 bits of memory word length [2-5].
• A similar implementation of the same idea is the double trigonometric approximation devised by Yamagishi et al. [6], which reduces the memory word length by 3 bits.
• The closest initial guess to the target sinusoid is obtained by using the parabolic approximation, which has recently been introduced by the authors [7, 8] and saves 4 bits of memory word length. Figure 2 shows the general concept of these three initial guesses.
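The word-length savings quoted above can be checked numerically. The sketch below is illustrative only: it normalizes the sine amplitude to 1, takes the phase itself as the guess over a quarter period, and assumes the parabola has the form 4x(1 − x) over a half period (this scaling is an assumption, not taken from [7, 8]). If the worst-case error stays below 2^−k of full scale, the top k bits of the stored word are always zero. The double trigonometric approximation of [6] is not modelled here.

```python
import math

def max_error(guess, target, samples=100001):
    """Largest |guess - target| over x in [0, 1], sampled uniformly."""
    xs = (i / (samples - 1) for i in range(samples))
    return max(abs(guess(x) - target(x)) for x in xs)

def bits_saved(err):
    """Number of leading bits that are always zero when the error is below 2**-k."""
    return math.floor(-math.log2(err))

# Sine-phase difference: x in [0, 1] spans a quarter period (phase 0..pi/2).
sine_phase_err = max_error(lambda x: x,
                           lambda x: math.sin(math.pi * x / 2))

# Parabolic guess (assumed form 4x(1-x)): x in [0, 1] spans a half period (phase 0..pi).
parabolic_err = max_error(lambda x: 4 * x * (1 - x),
                          lambda x: math.sin(math.pi * x))

print(f"sine-phase: max error {sine_phase_err:.4f}, bits saved {bits_saved(sine_phase_err)}")
print(f"parabolic:  max error {parabolic_err:.4f}, bits saved {bits_saved(parabolic_err)}")
```

Under these assumptions the worst-case errors are roughly 0.21 and 0.056 of full scale, which agrees with the 2-bit and 4-bit savings stated above.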
This paper deals with the development and implementation of a novel architecture for sine-output direct digital frequency synthesizers based on the parabolic approximation.
Sine-Output DDS by Using Parabolic Approximation
As the first step in converting the output of the phase accumulator to a sine amplitude, the initial guess for the sine function is generated by digital hardware according to the parabolic approximation. In the block diagram of Fig. 1, the output of the M-bit phase accumulator, m, is a binary number between 0 and 2^M − 1 and corresponds to the relative phase, φ/2π, as:
If the MSB of m is left out, the remaining N = M − 1 bits also represent a digital sweep, n, with half the amplitude and twice the frequency of m. Now, n can be written in terms of φ as:
As mentioned in [7, 8], an N-bit parabola generator, with the block diagram shown in Fig. 3, can produce the required parabolic initial guess, represented as:
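For reference, one set of expressions consistent with the three descriptions above is sketched below. The first two follow directly from the definitions of m and n. The third, which takes the parabola to be the product of n and its bitwise complement, is an assumption chosen to match the 11-bit to 22-bit mapping described later; the exact form and scaling used in [7, 8] may differ.

```latex
\begin{align}
  m &= 2^{M}\,\frac{\varphi}{2\pi}, & 0 \le \varphi < 2\pi, \\
  n &= m \bmod 2^{N} = 2^{N}\,\frac{\varphi \bmod \pi}{\pi}, & N = M - 1, \\
  p(n) &= n\,\bar{n} = n\,\bigl(2^{N} - 1 - n\bigr)
        \;\approx\; 2^{2N-2}\,\sin\!\Bigl(\frac{\pi n}{2^{N}}\Bigr)
        & \text{(assumed form).}
\end{align}
```

With x = n/2^N, the assumed parabola is essentially 4x(1 − x) scaled to full range, which lies on or above sin(πx) over the whole half period.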
Then, the error between the generated parabola and the target sinusoid is corrected by using an error-correcting ROM look-up table. A simplified block diagram of the resulting sine-output direct digital frequency synthesizer, called the Error-Corrected Parabolic DDS (EPDDS), is shown in Fig. 4 along with its most important waveforms.
An M-bit frequency control word determines the frequency of the synthesized waveform. The phase accumulator generates a digital sweep, m. The MSB of m is split off and serves as the sign bit of the synthesizer output, while the remaining bits (n) form the input to the parabola generator and the address to the error-correcting ROM look-up table. Provided that the difference between the initial guess and the desired sinusoid amplitude samples has already been stored in the memory, subtracting the fetched memory words from the generated parabolas corrects the parabolic initial guess to the precise sine amplitude (since the generated parabola is greater than the target sinusoid, the stored data has to be subtracted from the generated parabola). As shown in Fig. 4, the output of the subtractor looks like a full-wave rectified sinusoid. Taking the MSB of m, which is 0 for one sine half period and 1 for the next, as the sign bit for the output of the subtractor gives a digital sinusoid in signed-magnitude format. This digital sinusoid can either be used in the same form or converted to unsigned binary format by a simple digital format-converter block.
To obtain fine frequency resolution, the DDS designed for this work has a 32-bit frequency control word. As is common, the 32-bit output of the phase accumulator is truncated to 12 bits in order to avoid large, slow, and power-consuming memory and digital circuits. The parabola generator converts the 11-bit digital sweep to a 22-bit parabola, which is then truncated to 9 bits. At the end of the frequency synthesis process, adding the sign bit to the 9-bit sinusoid half periods results in a 10-bit synthesized digital sinusoid.
In order to have a faster architecture, two improvements have been considered in the EPDDS architecture:
1. The subtractor has been replaced with an adder; hence, the ROM look-up table has to contain the negated approximation error.
2. To increase the maximum operating frequency of the synthesizer, the long signal paths have been cut by using two latch stages. In this way, the system has been pipelined at the block level.
In addition, a format-converter block is inserted prior to the DAC in order to convert the synthesized output from signed-magnitude to unsigned format. Also, a spur-reduction technique [9] is used in designing the phase accumulator.
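The EPDDS data path described in this section can be modelled behaviourally in a few lines. The following is a minimal sketch, not the authors' implementation: the bit widths (32, 12, 11, 9) come from the text, while the parabola form (n times its bitwise complement), the truncation shift, the floor rounding of the reference sine, and all function names are assumptions made for illustration. Pipelining, the adder substitution, spur reduction, and format conversion are left out.

```python
import math

ACC_BITS = 32                      # accumulator / frequency control word width
PHASE_BITS = 12                    # truncated phase word m
N = PHASE_BITS - 1                 # width of the sweep n (MSB of m removed)
MAG_BITS = 9                       # magnitude bits of the output
MASK_N = (1 << N) - 1
FULL_SCALE = (1 << MAG_BITS) - 1   # 511

def parabola(n):
    """Assumed parabola generator: n times its complement, truncated to 9 bits."""
    product = n * (~n & MASK_N)                 # fits in 2N = 22 bits
    return product >> (2 * N - 2 - MAG_BITS)    # keep the 9 significant bits

def sine_magnitude(n):
    """Reference half-period sine magnitude (floor rounding is an assumption)."""
    return math.floor(FULL_SCALE * math.sin(math.pi * n / (1 << N)))

# Error-correcting table addressed by n: parabola minus target sine.
# Held as plain Python ints; with this rounding a few entries at the far
# end of the half period come out as -1 rather than 0.
ERROR_ROM = [parabola(n) - sine_magnitude(n) for n in range(1 << N)]

def dds_samples(fcw, count):
    """Generate `count` signed output samples for frequency control word `fcw`."""
    acc, out = 0, []
    for _ in range(count):
        m = acc >> (ACC_BITS - PHASE_BITS)      # truncate phase to 12 bits
        sign, n = m >> N, m & MASK_N            # MSB is the sign, the rest is n
        magnitude = parabola(n) - ERROR_ROM[n]  # correct the initial guess
        out.append(-magnitude if sign else magnitude)
        acc = (acc + fcw) & ((1 << ACC_BITS) - 1)
    return out

if __name__ == "__main__":
    wave = dds_samples(1 << 20, 4096)           # one full period, 4096 samples
    print(wave[0], wave[1024], wave[3072])
    print("largest stored correction:", max(ERROR_ROM))
```

In this model the largest stored correction stays below 32, so 5 bits cover the 9-bit magnitude error, in line with the 4-bit word-length saving attributed to the parabolic guess.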
Memory-Reduction Techniques
In the case of 12-bit phase to 10-bit amplitude mapping, an unreduced implementation of the sine memory requires a 2^12 × 10-bit ROM (40 Kbits). As described in [7], the first reduction in the size of the ROM look-up table comes from using the parabolic approximation and storing the approximation error rather than the sinusoid itself. This leads to a word-length reduction of 4 bits. As is common, storing only a quarter period of the approximation error instead of a complete period reduces the number of memory words to one fourth. As an effective method to further reduce the required memory size, the look-up table can be subdivided into coarse and fine memory partitions. Simple coarse/fine partitioning, without losing any information, reduces the total number of memory bits required. In the best case only 2368 bits of memory are needed, which is less than one half of the quarter-period memory.
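The successive reductions above can be followed with simple arithmetic. The figures below are a sketch that assumes the 4-bit saving is applied to the full 10-bit word, leaving 6-bit words; if the saving is instead counted against the 9-bit magnitude, the intermediate sizes change, but the final comparison still holds.

```latex
\begin{align}
  \text{unreduced:} \quad & 2^{12} \times 10 = 40960 \text{ bits} = 40 \text{ Kbits}, \\
  \text{error words only (assumed 6 bits):} \quad & 2^{12} \times 6 = 24576 \text{ bits}, \\
  \text{quarter period:} \quad & 2^{10} \times 6 = 6144 \text{ bits}, \\
  \text{simple coarse/fine partitioning:} \quad & 2368 \text{ bits} < \tfrac{1}{2} \times 6144 = 3072 \text{ bits}.
\end{align}
```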
To achieve a higher degree of memory reduction, there is a more efficient coarse/fine partitioning method, which has been used in the Nicholas and Sunderland architectures [9, 10]. In this method, the N-bit memory address, n, is split into 3 partial addresses: a, b, and c. The address, n, is defined as
where the word length of the variable a is A, the word length of b is B, and that of c is C. The coarse ROM address is then formed by a and b, and the fine ROM address by a and c. The general concept of this method is depicted in Fig. 5.
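The address split described above can be sketched as follows. The sketch assumes that a, b, and c are simple bit fields of n, with a in the most significant position, so that n = a·2^(B+C) + b·2^C + c; the function names and the example widths A, B, C = 4, 3, 3 are illustrative choices, not taken from [9, 10].

```python
def split_address(n, A, B, C):
    """Split an (A+B+C)-bit address into fields a (MSBs), b, and c (LSBs)."""
    assert 0 <= n < (1 << (A + B + C))
    c = n & ((1 << C) - 1)
    b = (n >> C) & ((1 << B) - 1)
    a = n >> (B + C)
    return a, b, c

def rom_addresses(n, A, B, C):
    """Coarse ROM is addressed by (a, b); fine ROM is addressed by (a, c)."""
    a, b, c = split_address(n, A, B, C)
    coarse_addr = (a << B) | b
    fine_addr = (a << C) | c
    return coarse_addr, fine_addr

def total_rom_bits(A, B, C, coarse_width, fine_width):
    """Total storage: 2^(A+B) coarse words plus 2^(A+C) fine words."""
    return (1 << (A + B)) * coarse_width + (1 << (A + C)) * fine_width

if __name__ == "__main__":
    n = 0b1011_010_110                   # example 10-bit address
    print(split_address(n, 4, 3, 3))     # fields a, b, c
    print(rom_addresses(n, 4, 3, 3))     # coarse and fine ROM addresses
```

Because the fine ROM ignores b and the coarse ROM ignores c, each table has far fewer words than a single table addressed by all A + B + C bits, which is where the saving comes from.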
Vankka et al. [3] have presented a valuable comparison between the different techniques used to compress the required memory. They determined by computer simulation that, in both the modified-Sunderland and modified-Nicholas methods, the optimum partitioning of the ROM address lengths is A = 4, B = 3, and C = 3 for 12-bit phase to 10-bit amplitude mapping, which is our case. These values of A, B, and C result in a required memory size of 1280 bits for both the modified-Sunderland and modified-Nicholas architectures. It should be noted that the contents of the fine and coarse memories in the Nicholas and Sunderland architectures are determined in essentially different ways. The Sunderland method is applicable only to memory contents that can be represented by certain trigonometric functions, while the Nicholas method can be used for any function, trigonometric or not. Hence, to further reduce the error-correcting memory size for EPDDS, the memory is partitioned by the Nicholas method, which leads to the Reduced-memory Error-corrected Parabolic DDS (REPDDS).
A computer program has been developed to study the required memory size for REPDDS for different values of A, B, and C. To find the optimum values of A, B, and C, a trade-off between total memory size and maximum spurious level has to be made. Starting from the smallest-memory case, A, B, C = 2, 3, 5 (410 bits), the first acceptable spurious levels are observed for A, B, C = 4, 3, 3 (768 bits). It is interesting that the resulting fine- and coarse-memory word lengths are still 2 bits smaller than those of the commonly used modified-Nicholas and modified-Sunderland architectures (for 12-bit phase to 10-bit sine amplitude conversion), which is a significant memory compression. Table 1 compares the proposed DDS architecture, REPDDS (with the parabolic approximation for the initial guess), with the modified-Sunderland architecture, the modified-Nicholas architecture (with the sine-phase difference method for the initial guess), and Yamagishi's architecture (with the double-trigonometric approximation for the initial guess). In this comparison, the ROM required by Yamagishi's architecture is larger than that of the Nicholas architecture, even though it uses a better initial guess. This is because the memory in Yamagishi's architecture was not reduced by memory partitioning methods.
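As a quick consistency check on the figures above: with A, B, C = 4, 3, 3, the coarse and fine ROMs each hold 2^7 = 128 words, so the quoted totals fix the combined coarse-plus-fine word width. The symbols w_c and w_f for the two word lengths are introduced here for illustration only.

```latex
\begin{align}
  2^{A+B} = 2^{A+C} &= 2^{7} = 128 \text{ words}, \\
  \text{modified-Nicholas / modified-Sunderland:} \quad 128\,(w_c + w_f) &= 1280 \;\Rightarrow\; w_c + w_f = 10, \\
  \text{REPDDS:} \quad 128\,(w_c + w_f) &= 768 \;\Rightarrow\; w_c + w_f = 6.
\end{align}
```

The combined width drops by 4 bits, which agrees with the statement that the fine and coarse word lengths are each 2 bits shorter.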
Gate- and Transistor-Level Simulation
The complete REPDDS (excluding its D/A converter) has been designed for implementation on an FPGA. The output waveform obtained from time-domain gate-level simulation is shown in Fig. 6. The output of the parabola generator is also shown there in order to illustrate the effect of the error-correcting look-up table. While DDS speed is usually limited by the access time of the ROM look-up table, in the simplest REPDDS design, optimized for the least power consumption, the parabola generator is the speed-limiting block. However, the parabola generator can be pipelined to make it faster. Pipelined with 4-bit stages, the parabola generator is no longer the speed-limiting block, and the maximum operating frequency of the REPDDS is determined by the ROM access time. While the two ROMs dissipate a total power of 370 µW/MHz @ 3.3 V, the parabola generators with parallel and pipelined (4-bit stages) structures dissipate only 38.8 and 126.7 µW/MHz @ 3.3 V, respectively. It should be noted that the transistor counts for the parallel and pipelined parabola generators are 3782 and 7115, respectively. Thus, by using the parabola generator (whether parallel or pipelined), the required ROM size, and consequently its power dissipation, has been reduced. The only cost of this benefit is the transistor count and chip area consumed when the parabola generator is designed for very fast operation.
Compared to the RLPDDS architecture introduced in [11], REPDDS has lower frequency-switching latency, is implemented with fewer transistors, and occupies a smaller chip area.
Experimental Results
In [8], the authors used the parabolic approximation to implement a Parabolic DDS (PDDS) capable of synthesizing quasi-sinusoidal waveforms. In this paper, REPDDS has been implemented on an FPGA in order to evaluate its operation. The FPGA and a D/A conversion module were then used to build a test board. The block diagram in Fig. 7 illustrates the DDS test system.
The output of REPDDS in the time domain and its frequency-switching behavior are shown in Figs. 8 and 9, respectively. In addition, Fig. 10 shows a comparison between the generated parabola and the REPDDS output in order to observe the role of the error-correcting look-up table. The power spectrum of the REPDDS output at low frequencies can be seen in Fig. 11. Specifications of the implemented REPDDS are given in Table 2.
As an integrated circuit designer, Dr. Sodagar worked from 1994 to 1995 in the IUST Integrated Circuits Laboratory, during 1997-1998 with the VLSI Circuits & Systems Laboratory at the University of Tehran, and from 1998 to 2000 with EMAD Semicon Company as a senior design engineer. His professional experience and research interests include phase-locked loops (PLLs), PLL-based and direct digital frequency synthesizers, voltage references and regulators, analog front-ends for telemetry-powering applications, and implantable electrical nerve stimulators.
He has also been with S. Rajaee University, Tehran, Iran, from 1992 to 2000 as a lecturer and since 2000 as an assistant professor. He was recognized there as the distinguished faculty member for the 1998-1999 and 1999-2000 academic years.
So far, Dr. Sodagar has published 11 conference papers and 4 journal papers, and has authored one book.
G. Roientan Lahiji

