Turkish Journal of Electrical Engineering and Computer Sciences
Volume 25

Number 5

Article 43

1-1-2017

A low power memoryless ROM design architecture for a direct
digital frequency synthesizer
SALAH ALKURWY
SAWAL ALI
MD.SHABIUL ISLAM
MD. FAIZUL IDROS


Recommended Citation
ALKURWY, SALAH; ALI, SAWAL; ISLAM, MD.SHABIUL; and IDROS, MD. FAIZUL (2017) "A low power
memoryless ROM design architecture for a direct digital frequency synthesizer," Turkish Journal of
Electrical Engineering and Computer Sciences: Vol. 25: No. 5, Article 43. https://doi.org/10.3906/elk-1609-61
Available at: https://journals.tubitak.gov.tr/elektrik/vol25/iss5/43

This Article is brought to you for free and open access by TÜBİTAK Academic Journals. It has been accepted for
inclusion in Turkish Journal of Electrical Engineering and Computer Sciences by an authorized editor of TÜBİTAK
Academic Journals. For more information, please contact academic.publications@tubitak.gov.tr.

Turkish Journal of Electrical Engineering & Computer Sciences
http://journals.tubitak.gov.tr/elektrik/

Turk J Elec Eng & Comp Sci
(2017) 25: 4023 – 4032
© TÜBİTAK
doi:10.3906/elk-1609-61

Research Article

A low power memoryless ROM design architecture for a direct digital frequency
synthesizer
Salah ALKURWY¹,∗, Sawal ALI², Shabiul ISLAM³, Faizul IDROS⁴
¹Department of Electronics Engineering, College of Engineering, Diyala University, Diyala, Iraq
²Department of Electrical, Electronics and System Engineering, Faculty of Engineering, National University of Malaysia, Bangi, Malaysia
³Institute of Microengineering and Nanoelectronics (IMEN), National University of Malaysia, Bangi, Malaysia
⁴Faculty of Engineering, MARA University of Technology, Shah Alam, Malaysia

Received: 06.09.2016 • Accepted/Published Online: 10.02.2017 • Final Version: 05.10.2017

Abstract: This paper presents a novel memoryless read-only memory (ROM) design architecture for a direct digital frequency synthesizer (DDFS). A pipelining technique is proposed to increase the phase accumulator (PA) throughput. However, this technique increases the number of registers as the number of pipeline stages grows, so the shifted clocking technique is used to reduce the pipelined PA registers. The quarter-wave symmetry technique is applied so that only the (0 : π/2) portion of the sine wave is stored. Based on the angular decomposition technique and a trigonometric identity, the ROM is partitioned into three four-bit sub-ROMs. A novel memoryless ROM design technique is proposed and implemented in a 24-bit DDFS system, where it replaces the conventional ROM. Using the memoryless sub-ROM circuits instead of the conventional 12-bit ROM reduces power consumption and area: compared with the conventional ROM circuit, area and dynamic power are reduced by 15% and 14.8%, respectively.
Key words: Phase accumulator, carry look-ahead adder, direct digital frequency synthesizer, read-only memory

1. Introduction
The sinusoidal waveform generated by a direct digital frequency synthesizer (DDFS) has many advantages over that of an analogue phase-locked loop, including high frequency resolution, fast switching between frequency channels, and high spectral purity. The first direct digital synthesizer, designed by Tierney et al. [1], consisted of a phase accumulator (PA) that generated digital phase values over (0 : 2π). A read-only memory (ROM), acting as the phase-to-amplitude converter (PAC), maps each phase value to a sine amplitude, and a digital-to-analogue converter (DAC) followed by a low-pass filter then generates the analogue sinusoidal signal.
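The PA-plus-ROM chain just described can be sketched behaviorally in Python. This is a minimal model under assumed parameters (a full-wave sine table with 2^12 entries and a signed 12-bit amplitude, both hypothetical choices); the DAC and low-pass filter are omitted.

```python
import math

def ddfs_samples(fcw: int, n_bits: int, n_samples: int,
                 lut_bits: int = 12, amp_bits: int = 12) -> list[int]:
    """Behavioral DDFS: N-bit phase accumulator followed by a full-wave sine LUT."""
    mask = (1 << n_bits) - 1
    scale = (1 << (amp_bits - 1)) - 1                 # largest signed amplitude
    lut = [round(scale * math.sin(2 * math.pi * k / (1 << lut_bits)))
           for k in range(1 << lut_bits)]
    phase, out = 0, []
    for _ in range(n_samples):
        out.append(lut[phase >> (n_bits - lut_bits)])  # truncated phase addresses the LUT
        phase = (phase + fcw) & mask                    # accumulate modulo 2**n_bits
    return out

# FCW = 2**24 / 16 produces one sine period every 16 clock cycles.
samples = ddfs_samples(fcw=1 << 20, n_bits=24, n_samples=32)
```

The output period is 2^N / FCW clock cycles, which is the basis of the output-frequency formula used in Section 4.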
Several techniques have been used to improve PA performance. A high-resolution, high-throughput DDFS system requires a high-speed PA. A PA built from D flip-flop (DFF) registers and adders can meet this requirement when fast adders are used and the accumulator is pipelined, as in [2,3]. Several works have proposed pipelining to improve the speed and reduce the complexity of the PA [4]. However, this technique has the disadvantage that the number of registers grows as the number of pipeline stages increases. Therefore, the shifted clock technique was proposed in [5] to reduce the number of repetitive registers while preserving high-speed operation.
∗Correspondence: salahalkurwy@engineering.uodiyala.edu.iq


ROM implementation uses an angular decomposition technique, which segments the quarter phase angle (0 : π/2) into smaller blocks, namely coarse and fine ROMs. Trigonometric identities and approximations reduce the data that these blocks must store, and the (0 : π/2) phase angle is obtained by summing the coarse and fine phase angles. Sunderland et al. [6] and Hutchinson [7] used two segmented angles, coarse (A) and fine (B), together with trigonometric identities and simple multiplication, to generate the quarter sine amplitude waveform.
In this research, we propose a DDFS system whose PA is based on a carry look-ahead adder (CLA). The shifted clocking technique is used in the PA design to reduce the number of registers, and the conventional sub-ROM blocks are replaced by memoryless logic-gate blocks designed with the seven-segment display technique.
2. Phase accumulator architecture
The pipelining technique is used to increase the PA throughput and hence the attainable output frequency. Preskewing D flip-flop (DFF) registers synchronize the frequency control word (FCW) input with the carry input of each pipeline stage. However, as shown in Figure 1, the number of these registers grows with the number of pipeline stages, which leads to higher power consumption. Therefore, the shifted clocking technique is used to reduce the number of registers while preserving high-speed operation.

Figure 1. Conventional 24-bit pipelined phase accumulator design.
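The behavior of a conventional pipelined PA such as that of Figure 1 can be checked with a cycle-level Python sketch. It assumes three 8-bit stages (a 24-bit PA built from eight-bit adders), a registered carry between consecutive stages, and an i-cycle preskew delay line on the FCW slice of stage i; the output slices are then deskewed and compared with a plain, unpipelined accumulator. This is an illustrative model, not the authors' Verilog, and it does not model the shifted clocking scheme.

```python
from collections import deque

def pipelined_pa(fcws, n_bits=24, stages=3):
    """Return the stage registers seen at each clock edge of an L-stage pipelined PA."""
    w = n_bits // stages
    mask = (1 << w) - 1
    # Preskew: stage i receives its FCW slice i cycles late (delay line reset to 0).
    skew = [deque([0] * i) for i in range(stages)]
    acc = [0] * stages        # accumulator slice held by each stage
    carry = [0] * stages      # registered carry into each stage (carry[0] is always 0)
    history = []
    for fcw in fcws:
        history.append(list(acc))
        new_acc, new_carry = [0] * stages, [0] * stages
        for i in range(stages):
            skew[i].append((fcw >> (w * i)) & mask)
            total = acc[i] + skew[i].popleft() + carry[i]
            new_acc[i] = total & mask
            if i + 1 < stages:
                new_carry[i + 1] = total >> w   # carry is consumed one cycle later
        acc, carry = new_acc, new_carry
    return history

def deskew(history, n_bits=24, stages=3):
    """Stage i lags stage 0 by i cycles; realign the slices into full phase words."""
    w = n_bits // stages
    return [sum(history[u + i][i] << (w * i) for i in range(stages))
            for u in range(len(history) - stages + 1)]

def reference_pa(fcws, n_bits=24):
    """Unpipelined accumulator: phase(u + 1) = phase(u) + FCW(u) mod 2**N."""
    phase, out = 0, []
    for fcw in fcws:
        out.append(phase)
        phase = (phase + fcw) & ((1 << n_bits) - 1)
    return out
```

Because each stage only adds w bits plus one registered carry, the critical path is that of a w-bit adder; the price is the preskew delay lines and output deskewing, which is the register overhead that the shifted clocking technique targets.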

The shifted clocking technique uses DFFs to connect each row of pipeline stages. These DFFs are clocked by pipelined clock pulses shifted by one clock cycle (Figure 2), and they control the FCW input registers of the stages. Figure 3 shows the new architecture of the 24-bit PA with the CLA and the shifted clocking technique.
Let N be the number of PA input bits and let the PA be partitioned into L stages (Figure 1). The number of preskewing DFF registers R can then be expressed as follows:

$$R = \frac{N(L+1)}{2} \qquad (1)$$

Applying the shifted clocking method to the proposed design, the number of preskewing registers R can be expressed as follows:

$$R = N + L \qquad (2)$$

Figure 2. Shifted clocking technique.

Figure 3. Block diagram of 24-bit pipeline phase accumulator design, based on eight-bit carry look-ahead adder with shifted clocking technique.

As a result, the number of preskewing registers R decreases by 43.7% (Figure 3).
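Evaluating Eqs. (1) and (2) directly reproduces this figure. The sketch below assumes L = 3, i.e. the 24-bit PA split into three eight-bit CLA stages as in Figure 3.

```python
def preskew_conventional(n: int, l: int) -> float:
    """Eq. (1): preskewing registers of the conventional pipelined PA."""
    return n * (l + 1) / 2

def preskew_shifted(n: int, l: int) -> int:
    """Eq. (2): preskewing registers with the shifted clocking technique."""
    return n + l

N, L = 24, 3   # assumed: 24-bit PA made of three eight-bit CLA stages (Figure 3)
conv, shifted = preskew_conventional(N, L), preskew_shifted(N, L)
reduction = 100 * (conv - shifted) / conv
```

This gives 48 registers for the conventional design and 27 with shifted clocking, a 43.75% reduction, which the text rounds to 43.7%.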

2.1. The proposed carry look-ahead adder
The adder is the key element of the PA, so a fast adder directly improves accumulator performance. The basic CLA concept used in this PA design is as follows. The carry-generate signal produces a carry-out from two input bits when both inputs are 1, regardless of the input carry, while the carry-propagate signal passes the input carry through to the carry-out [8]. Because both signals depend only on the input bits, the carry-out of the CLA used in each pipeline stage can be determined without waiting for the carry to ripple through every bit, so the result is obtained more rapidly. The carry-out of bit i is obtained from the following equation:

$$C_{i+1} = g_i + p_i C_i \qquad (3)$$

where g_i = x_i y_i and p_i = x_i ⊕ y_i are the carry-generate and carry-propagate functions, respectively, x_i and y_i are the input bits, and i is the bit position.
The conventional eight-bit CLA logic circuit requires 80 logic gates, whereas the proposed adder needs only 47. The critical path of the proposed adder is 7 gate delays, as shown in Figure 4. Expanding Eq. (3) over the eight bits gives the carry-out of the stage:

$$C_{out} = g_7 + p_7\left[g_6 + p_6\left(g_5 + p_5 g_4 + p_5 p_4\left(g_3 + p_3\left(g_2 + p_2\left(g_1 + p_1 g_0 + p_1 p_0 C_{in}\right)\right)\right)\right)\right] \qquad (4)$$

Figure 4. Block circuit diagram of eight-bit carry look-ahead adder.
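A bit-level Python sketch of Eqs. (3) and (4) confirms that the carry recursion and the factored carry-out agree with ordinary binary addition. It models only the logic equations, not the 47-gate circuit of Figure 4, and uses the standard CLA sum bit s_i = p_i ⊕ C_i.

```python
def gen_prop(x: int, y: int, width: int = 8):
    """Carry-generate g_i = x_i y_i and carry-propagate p_i = x_i XOR y_i."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]
    return g, p

def cla8(x: int, y: int, cin: int = 0):
    """Eight-bit adder using Eq. (3); evaluated sequentially here for simplicity."""
    g, p = gen_prop(x, y)
    c = [cin]
    for i in range(8):
        c.append(g[i] | (p[i] & c[i]))          # Eq. (3): C_{i+1} = g_i + p_i C_i
    s = sum((p[i] ^ c[i]) << i for i in range(8))
    return s, c[8]

def carry_out_eq4(x: int, y: int, cin: int = 0) -> int:
    """Eq. (4): factored form of the stage carry-out (Python '&' binds tighter than '|')."""
    g, p = gen_prop(x, y)
    return g[7] | p[7] & (g[6] | p[6] & (g[5] | p[5] & g[4] | p[5] & p[4] & (
        g[3] | p[3] & (g[2] | p[2] & (g[1] | p[1] & g[0] | p[1] & p[0] & cin)))))
```

In hardware, the point of Eq. (4) is that every g_i and p_i is available after one gate level, so the carry-out is formed by a fixed-depth AND-OR network instead of an eight-step ripple.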

3. ROM look-up table design
The ROM look-up table (LUT) stores the sine amplitudes and maps each phase output of the PA to an amplitude of the sine wave signal. A high-accuracy DDFS sine waveform requires a large ROM, so reducing ROM area while maintaining high performance is a constant design goal. A simple technique for reducing ROM size is quarter-wave symmetry, in which only one quarter (0 : π/2) of the sine waveform is stored in the ROM and the two most significant bits (MSBs) of the PA output are used to reconstruct the full sine wave. The phase output is used directly in the first and third quadrants, whereas its inverted (complemented) value is needed in the second and fourth quadrants; in addition, the amplitude is negated using the two's complement when the phase lies in (π : 2π). To save power and area, a half least significant bit (LSB) offset is added to the stored memory addresses of the sub-ROMs; with this offset, the two's complement full adder hardware is removed from the proposed design. The angular decomposition technique, based on a trigonometric identity, is used in the proposed design to reduce ROM size. The ROM is partitioned into three four-bit sub-ROMs, namely A, B, and C, such that A < π/2, B < (π/2)(1/2^A), and C < (π/2)(1/2^(A+B)), where A and B in the exponents denote the four-bit widths of the corresponding segments.
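Quarter-wave reconstruction with a half-LSB phase offset can be sketched as follows. The word lengths (a 12-bit quarter-table address plus the two quadrant MSBs, and a signed 12-bit amplitude scale) are assumptions for illustration. With the offset, plain bit inversion of the address mirrors the phase exactly; the amplitude negation is written as ordinary negation for clarity rather than as the hardware used in the design.

```python
import math

M = 12                       # quarter-table address bits (assumed: A + B + C = 12)
AMP = 2**11 - 1              # assumed signed 12-bit amplitude scale

# Quarter-wave table sampled with a half-LSB phase offset:
# entry k holds AMP * sin((k + 0.5) * (pi/2) / 2**M).
QUARTER = [round(AMP * math.sin((k + 0.5) * (math.pi / 2) / 2**M)) for k in range(2**M)]

def sine_from_phase(phase: int) -> int:
    """Map an (M + 2)-bit phase word to a signed amplitude via quarter-wave symmetry."""
    negate = (phase >> (M + 1)) & 1      # MSB: third and fourth quadrants, phase in (pi : 2pi)
    mirror = (phase >> M) & 1            # second MSB: second and fourth quadrants
    addr = phase & (2**M - 1)
    if mirror:
        addr ^= 2**M - 1                 # bit inversion maps (k + 0.5) to (2**M - k - 0.5)
    value = QUARTER[addr]
    return -value if negate else value   # negation written arithmetically for clarity
```

Only 2^M entries are stored, yet every one of the 2^(M+2) phase words is mapped to its correctly signed sample.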
Based on the trigonometric identity and the relative sizes of A, B, and C, the sine function may be approximated as follows [6]:

$$\sin(A + B + C) \approx \sin(A + B) + \cos A \sin C \approx \sin A + \cos A \sin B + \cos A \sin C \qquad (5)$$
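The accuracy of Eq. (5) can be checked numerically over all 4096 combinations of the three four-bit segments. This sketch uses the segment angles implied by Eqs. (6)–(8) without the half-LSB offset or any correction term, so it illustrates only the approximation itself. The neglected terms are dominated by sin A (1 − cos(B + C)), which grows as A approaches π/2.

```python
import math

Q = math.pi / 2

def segment_angles(a: int, b: int, c: int):
    """Angles of the A, B and C segments for 4-bit indices a, b, c."""
    return Q * a / 16, Q * b / 256, Q * c / 4096

def sine_eq5(a: int, b: int, c: int) -> float:
    """Eq. (5): sin A + cos A sin B + cos A sin C."""
    A, B, C = segment_angles(a, b, c)
    return math.sin(A) + math.cos(A) * math.sin(B) + math.cos(A) * math.sin(C)

def sine_exact(a: int, b: int, c: int) -> float:
    return math.sin(sum(segment_angles(a, b, c)))

max_err = max(abs(sine_exact(a, b, c) - sine_eq5(a, b, c))
              for a in range(16) for b in range(16) for c in range(16))
```

The approximation is exact whenever b = c = 0, and its error is largest at the top of the quarter wave.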

The cos A values can be obtained by connecting the complement of the sin A values and logic high (Vcc) to the inputs of XOR logic gates [9]. In this way, a single addressed sub-ROM provides both the sin A and cos A values.
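One way to see why a single sub-ROM suffices, assuming the half-LSB offset described above and that the complement is applied to the four-bit address of sub-ROM A, is that inverting the address reflects the angle about π/4, so the same table returns cos A. The sketch below checks this identity; the function names are hypothetical, and this is an interpretation rather than the circuit of [9].

```python
import math

Q = math.pi / 2

def sin_a_entry(addr: int) -> float:
    """Hypothetical sub-ROM A sample with a half-LSB offset: sin((addr + 0.5) * Q / 16)."""
    return math.sin(Q * (addr + 0.5) / 16)

def cos_a_from_same_rom(addr: int) -> float:
    """XOR each of the four address bits with logic high (i.e., invert them) and reread."""
    return sin_a_entry(addr ^ 0b1111)
```

XOR with a constant 1 is an inverter, so the 15 − a address maps the angle (a + 0.5)·(π/32) to π/2 minus that angle.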
The proposed ROM LUT design, with the angular decomposition technique and three four-bit ROMs, requires only 368 DFF storage bits (176 + 128 + 80 DFF bits for ROM A, B, and C, respectively), a compression ratio of 534.2:1 [9]. Two adders, two multipliers, and XOR gates are required as additional hardware to complete the ROM circuit design.
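The reference size behind the 534.2:1 ratio is not stated here. Assuming it is an uncompressed full-wave ROM with a 14-bit phase address and 12-bit amplitude words (an assumption, not a figure from the text), the arithmetic reproduces the quoted ratio:

```python
STORED_BITS = 368                      # compressed ROM LUT storage quoted in the text
REFERENCE_BITS = 2**14 * 12            # assumed uncompressed ROM: 16,384 words x 12 bits
ratio = REFERENCE_BITS / STORED_BITS   # about 534.26, i.e. the quoted 534.2:1 when truncated
```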
The proposed compressed ROM LUT consists of three segments of four-bit sub-ROM blocks (A, B, and
C). The required stored values in these sub-ROMs are calculated as follows:
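$$\sin A = \left(2^{A+B+C} - 1\right)\sin\left(\frac{\pi}{2}\cdot\frac{[0 : (2^A - 1)]}{2^A}\right) \Rightarrow \left(2^{12} - 1\right)\sin\left(\frac{\pi}{2}\cdot\frac{[0 : 15]}{16}\right) \qquad (6)$$

$$\sin B = \left(2^{A+B+C} - 1\right)\sin\left(\frac{\pi}{2}\cdot\frac{[0 : (2^B - 1)]}{2^B}\cdot\frac{1}{2^A}\right) \Rightarrow \left(2^{12} - 1\right)\sin\left(\frac{\pi}{2}\cdot\frac{[0 : 15]}{16}\cdot\frac{1}{16}\right) \qquad (7)$$

$$\sin C = \left(2^{A+B+C} - 1\right)\sin\left(\frac{\pi}{2}\cdot\frac{[0 : (2^C - 1)]}{2^C}\cdot\frac{1}{2^{A+B}}\right) \Rightarrow \left(2^{12} - 1\right)\sin\left(\frac{\pi}{2}\cdot\frac{[0 : 15]}{16}\cdot\frac{1}{256}\right) \qquad (8)$$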

The calculated values of sin A, sin B, and sin C show that each sin C value lies between two consecutive sin B values; that is, the whole range of sin C is smaller than one step of the sin B series. Similarly, each sin B value lies between two consecutive sin A values.
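The three tables of Eqs. (6)–(8) and the nesting property just described can be reproduced with a short Python sketch. Rounding each entry to the nearest integer is an assumption, since the text does not state the quantization.

```python
import math

FULL_SCALE = 2**12 - 1   # (2^(A+B+C) - 1) with A = B = C = 4

def sub_rom(divisor: int) -> list[int]:
    """Sixteen entries of Eqs. (6)-(8); divisor is 1, 2^A = 16 or 2^(A+B) = 256."""
    return [round(FULL_SCALE * math.sin((math.pi / 2) * (k / 16) / divisor))
            for k in range(16)]

SIN_A, SIN_B, SIN_C = sub_rom(1), sub_rom(16), sub_rom(256)

# Nesting property: each finer table spans less than one step of the coarser one.
nested = max(SIN_C) < SIN_B[1] and max(SIN_B) < SIN_A[1]
```

With this rounding, sin C never exceeds 24 while the first sin B step is 25, and sin B never exceeds 376 while the first sin A step is 401.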
3.1. Proposed memoryless circuit design
The basic seven-segment display operates as follows: a BCD counter provides the numbers 0–9, and each number determines the logic levels of the seven output lines. The display contains seven segments (a, b, c, d, e, f, and g) that together represent all the digits. For instance, the decimal number one is represented by two segments, b and c; therefore, b and c are logic 1 and the rest are logic 0 (abcdefg = 0110000).
The suggested memoryless ROM circuit is designed on the same principle. The values of the (0 : π/2) sine wave generated from Eqs. (6)–(8) play the role of the displayed digits: they are listed as binary digits in rows 0 : 2^A − 1, one row per counter (address) value of the combinational logic circuit. Each column of the required ROM output bits, read against the address lines (X_n, X_{n−1}, …, X_0), defines one logic circuit, as shown in Figure 5.

Figure 5. Quarter sine wave bits and ROM output digit bits.

A Karnaugh map is used to simplify the resulting logic circuits, and one logic circuit is required for each listed output column. Together, these circuits provide the required digital value of the (0 : π/2) sine wave at each clock pulse, so no memory storage (registers and multiplexers) is needed to hold the ROM values.
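The table-to-logic idea can be illustrated with a Python sketch that turns each output-bit column of a 16-entry sub-ROM into a canonical sum of products over the address bits X3 : X0 and then evaluates those expressions in place of the table. The sin C values are recomputed from Eq. (8) with rounding to the nearest integer (an assumption), and no Karnaugh-map minimization is performed, so the expressions are larger than the simplified ones used in the design.

```python
import math

FULL_SCALE = 2**12 - 1
SIN_C = [round(FULL_SCALE * math.sin((math.pi / 2) * (k / 16) / 256)) for k in range(16)]

def minterms(table: list[int], bit: int) -> list[int]:
    """Addresses whose stored value has this output bit set (one output column)."""
    return [addr for addr, value in enumerate(table) if (value >> bit) & 1]

def sop_text(terms: list[int]) -> str:
    """Human-readable canonical sum of products, e.g. 'X3 ~X2 X1 X0 + ...'."""
    if not terms:
        return "0"
    return " + ".join(" ".join(f"X{j}" if (t >> j) & 1 else f"~X{j}" for j in (3, 2, 1, 0))
                      for t in terms)

def eval_sop(terms: list[int], x: int) -> int:
    """OR of AND terms; each AND term checks the four address literals of one minterm."""
    return int(any(all(((x >> j) & 1) == ((t >> j) & 1) for j in range(4)) for t in terms))

def build_logic(table: list[int]) -> list[list[int]]:
    """One logic function per output bit, replacing the stored table."""
    return [minterms(table, bit) for bit in range(max(table).bit_length())]

def logic_output(logic: list[list[int]], x: int) -> int:
    """Reassemble the output word from the per-bit logic functions."""
    return sum(eval_sop(terms, x) << bit for bit, terms in enumerate(logic))

LOGIC_C = build_logic(SIN_C)
```

A Karnaugh map (or any two-level minimizer) would merge adjacent minterms of each function, which is what shrinks these expressions to a handful of AND, OR, and XOR gates.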
The output bits A10 : A0 of the four-bit sin A logic circuit block are obtained from logic equations of the following form:
$$A_{10} = X_0 + X_1 X_2 \qquad (9)$$

$$A_9 = X_1 \overline{X}_2 + X_0 X_3 + X_2\left(X_0 + \overline{X}_1 X_3\right) \qquad (10)$$

where X3 : X0 are the address (input) bits of the sin A block, corresponding to the listed columns in lines. The remaining sin A output bits are derived in the same way, and the sin B (B7 : B0) and sin C (C3 : C0) equations are generated similarly.
The resulting memoryless sin A, sin B, and sin C blocks replace the three conventional sub-ROM blocks and are implemented within the proposed compressed ROM design, as shown in Figure 6.

Figure 6. Application of novel memoryless ROM blocks of sin A, sin B, and sin C in the proposed 12-bit compressed ROM.

4. Hardware implementation of high-speed DDFS
The proposed DDFS design system (consisting of a 24-bit PA and the memoryless circuit) was coded in Verilog hardware description language (HDL) and successfully simulated in ALTERA Quartus II software, reaching an operating speed of 144.22 MHz. It was then programmed on a Cyclone III FPGA kit board. The programmed board was connected through a high-speed mezzanine card (HSMC) to a high-throughput dual 14-bit DAC circuit with a conversion rate of 250 mega-samples per second (MSPS), and the output was verified with waveform and spectrum analyzers. The measured frequency waveform and spectrum are consistent with the expected results.
The output frequency f_out is calculated as

$$f_{out} = \frac{FCW \times f_{clk}}{2^N}$$

where N = 24 (for the 24-bit PA), FCW = 0EFFFF (hexadecimal), and f_clk = 125 MHz (the clock frequency of the Cyclone III FPGA kit board). The expected output frequency is therefore

$$f_{out} = \frac{\mathrm{0EFFFF}_{16} \times 125 \times 10^{6}}{2^{24}} \approx 7.324\ \mathrm{MHz}.$$

The measured output frequency of the sine wave signal, shown in Figure 7, is 7.35 MHz, which closely matches the calculated value.
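The frequency arithmetic can be checked in a few lines of Python. The resolution line (f_clk/2^N, the frequency step for a one-LSB change of FCW) follows from the same formula and is included only for illustration.

```python
N = 24                 # PA word length
FCW = 0x0EFFFF         # frequency control word used in the measurement
F_CLK = 125e6          # Cyclone III board clock (Hz)

f_out = FCW * F_CLK / 2**N        # expected output frequency, about 7.3242 MHz
resolution = F_CLK / 2**N         # frequency step for FCW = 1, about 7.45 Hz
```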

Figure 7. Measured image of the sine wave signal for high-speed DDFS.

The measured signal-to-noise ratio (SNR) of the DDFS output waveform is 68 dB, as shown in Figure 8, which is 4 dB less than the average value for 12-bit ROM DDFS outputs (72 dB). This shortfall stems from the angular decomposition technique and from the wire connection between the FPGA board and the spectrum analyzer.
Figure 8. Measured SNR for DDFS waveform output.

The novel memoryless sub-ROM design was realized as an ASIC using Synopsys software. Compared with the conventional sub-ROMs, the area of the sin A, sin B, and sin C blocks is reduced by 69%, 66.5%, and 51.1%, respectively. Using the memoryless ROM blocks of sin A, sin B, and sin C instead of the conventional sub-ROM blocks reduces the number of cells and the area by up to 22% and 15%, respectively. In terms of power consumption, the proposed architecture reduces cell leakage power by 18% and dynamic power by 14.8%.
Table 1 compares the conventional and memoryless ROM blocks of sin A, sin B, and sin C in terms of the number of cells, nets, and area.
Table 1. Conventional memoryless ROM comparison.

Block | Metric | Conventional ROM | Memoryless ROM | Benefit reduction
Sine A | Cells | 77 | 40 | 48%
Sine A | Nets | 83 | 44 | 47%
Sine A | Area (µm²) | 1716.42 | 532.22 | 69%
Sine B | Cells | 49 | 31 | 36.7%
Sine B | Nets | 55 | 35 | 36.3%
Sine B | Area (µm²) | 1280.66 | 429.10 | 66.5%
Sine C | Cells | 27 | 21 | 22%
Sine C | Nets | 33 | 25 | 24%
Sine C | Area (µm²) | 592.09 | 289.39 | 51.1%

Table 1 shows that, compared with the conventional sub-ROMs, the novel memoryless ROM blocks for sin A, sin B, and sin C reduce the area by 69%, 66.5%, and 51.1% and the number of cells by 48%, 36.7%, and 22%, respectively. The power comparison in Table 2 shows that the memoryless ROM reduces dynamic and cell leakage power by 14.8% and 18%, respectively, compared with the conventional sub-ROM.
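The area reductions in Table 1 follow from (conventional − memoryless)/conventional. This sketch recomputes them from the table and also shows that the combined area of the three blocks falls by about 65.15%, the sub-ROM figure quoted in the discussion of Table 3.

```python
# (conventional, memoryless) area in µm^2, from Table 1
AREA = {"sin A": (1716.42, 532.22), "sin B": (1280.66, 429.10), "sin C": (592.09, 289.39)}

def reduction(conv: float, new: float) -> float:
    """Percentage saved relative to the conventional design."""
    return 100 * (conv - new) / conv

per_block = {name: reduction(*pair) for name, pair in AREA.items()}
total_conv = sum(c for c, _ in AREA.values())
total_new = sum(n for _, n in AREA.values())
combined = reduction(total_conv, total_new)
```

The per-block values round to 69%, 66.5%, and 51.1%; the combined value weights each block by its conventional area, which is why it is higher than the simple mean of the three percentages.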
Table 3 compares the ROM size of the present work with previous DDFS approaches. The comparison covers the PA width, amplitude phase bits, ROM technique, ROM size, spurious-free dynamic range (SFDR), and additional hardware components. Quarter-wave symmetry and compressed ROM techniques were applied to all the ROM designs listed in the table.
The results show that the proposed memoryless ROM requires only 161 logic gates (76, 57, and 28 logic gates for sin A, sin B, and sin C, respectively). The proposed work uses a 24-bit pipelined PA to achieve high frequency resolution, and quarter-wave symmetry and angular decomposition techniques to reduce ROM size. The novel memoryless ROM replaces the conventional sub-ROM blocks of the 12-bit PAC: it removes all the registers and multiplexers required by the sub-ROM circuits and replaces them with a few AND, OR, and XOR logic gates.
The combined area of the three sub-ROMs is reduced by 65.15% (from 3589.17 µm² to 1250.71 µm² in Table 1), and the area of the complete 12-bit compressed PAC is reduced by 15%. Moreover, the dynamic and cell leakage power are reduced by 14.8% and 18%, respectively.

Table 2. ROM blocks memoryless ROM comparison.

12-bit compressed ROM | Conventional ROM | Memoryless ROM | Benefit reduction
# Cells | 610 | 520 | 22%
Area (µm²) | 19,589.16 | 16,658.61 | 15%
Cell leakage power (nW) | 917.9225 | 753.3794 | 18%
Dynamic power (µW) | 176.0596 | 150.0412 | 14.8%

Table 3. Comparison between present ROM size and previous approaches of DDFS works.

Ref. | PA (bit) | Amp. phase (bit) | ROM technique | ROM size (bit) | SFDR (dB) | Additional hardware
[6] | 20 | 12 | Angular decomposition | 3328 | 72 | 2 adders + 2 multipliers
[10] | 28 | 12 | Angular decomposition | 832 | 84 | 2 adders/subtractors + 2 multipliers
[11] | 32 | 12 | Quad line approximation (QLA) | 2176 | NA | 5 adders + 1 MUX + Complementer
[12] | 24 | 12 | Piecewise-polynomial approximation | 480 | 83.6 | Adders tree + multipliers
[13] | 24 | 12 | Piecewise-polynomial approximation | 672 | 80 | –
[14] | 32 | 12 | ROM-less | NA | 42 | DAC (LTC 2624)
[15] | 32 | 18 | Parabolic polynomial interpolation | NA | 68.6 | 2 multipliers + 2 adders + 4 Mux + shift selector
[16] | 32 | 12 | Angular decomposition | 368 | 68* | 2 multipliers + 2 adders + 3 DFF registers
[17] | 24 | 10 | Complementary dual-phase latch | NA | 45.3 | 2 ROM + 3 level AND gate in each stage
[18] | 32 | 9 | Piecewise linear interpolation | NA | 46 | 8 × 6 and 8 × 3 ROMs + thermo decoder
This work | 24 | 12 | Angular decomposition | 0 | 68* | 2 multipliers + 2 adders + 3 DFF Reg. + 161 logic gates

*In [16] and the present work, the value given is the measured signal-to-noise ratio (SNR) of the DDFS output waveform.
5. Conclusion
This paper presented a memoryless ROM design architecture for DDFS. Using the memoryless sin A, sin B, and sin C blocks instead of the conventional sub-ROM blocks of the developed 12-bit compressed ROM saves 22% of the cells, 15% of the area, 14.8% of the dynamic power, and 18% of the cell leakage power. The presented 24-bit DDFS design system was coded in Verilog HDL, programmed on a Cyclone III FPGA kit board, and connected to a DAC, and the complete system was verified with waveform and spectrum analyzers. The improved performance of the PA and the reduced ROM size give the present DDFS design more flexibility for application in wireless communication systems.
References
[1] Tierney C, Rader M, Gold B. A digital frequency synthesizer. IEEE T Acoust Speech 1971; 19: 48-57.
[2] Jung H, Yoo T, Cho J, Baek H. Pipelined phase accumulator using sequential FCW loading scheme for DDFS.
Electr Lett 2012; 48: 1044-1046.
[3] Kim S, Lee J, Hong Y, Kim E, Baek H. Low-power pipelined phase accumulator with sequential clock gating for
DDFSs. Electr Lett 2013; 49: 1445-1446.
[4] Jensen BS, Khafaji M, Johansen K, Krozer V, Scheytt C. Twelve-bit 20-GHz reduced size pipeline accumulator
in 0.25 µm SiGe:C technology for direct digital synthesiser applications. IET Circ Device Syst 2012; 6: 19-27.
[5] Ching Y, Jun W, Hsuan C. A 5.3-GHz 32-bit accumulator designed for direct digital frequency synthesizer. Chinese
Sci Bull 2012; 57: 2480-2487.
[6] Sunderland A, Strauch A, Wharfield S, Peterson T, Cole R. CMOS/SOS frequency synthesizer LSI circuit for spread
spectrum communications. IEEE J Solid-St Circ 1984; 19: 497-506.
[7] Hutchinson J. Contemporary Frequency Synthesis Techniques. New York, NY, USA: IEEE Press, 1975.
[8] Brown S, Vranesic Z. Fundamentals of Digital Logic with VHDL. 2nd ed. New York, NY, USA: McGraw-Hill, 2005.
[9] Alkurwy S, Sawal A, Islam S. Implementation of low power compressed ROM for direct digital frequency synthesizer. In: IEEE 2014 International Conference on Semiconductor Electronics; 27–29 August 2014; Kuala Lumpur,
Malaysia. New York, NY, USA: IEEE. pp. 309-312.
[10] Curticapean F, Niittylahti J. A hardware eﬃcient direct digital frequency synthesizer. In: IEEE 2001 International
Conference on Electronics Circuits and Systems; 2–5 September 2001; Valletta, Malta. New York, NY, USA: IEEE.
pp. 51-54.
[11] Yang BD, Choi JH, Han SH, Kim LS, Yu HK. An 800-MHz low-power direct digital frequency synthesizer with an
on-chip D/A converter. IEEE J Solid-St Circ 2004; 39: 761-774.
[12] De Caro D, Strollo M. High-performance direct digital frequency synthesizers using piecewise-polynomial approximation. IEEE T Circ Syst-I 2005; 52: 324-337.
[13] De Caro D, Petra N, Strollo M. Reducing lookup-table size in direct digital frequency synthesizers using optimized
multipartite table method. IEEE T Circ Syst-I 2008; 55: 2116-2127.
[14] NourEldin M, Yahia M. A novel low-power high-resolution ROM-less DDFS architecture. Int J Adv Res Electron
2013; 2: 990-994.
[15] Hsu H, Wang C. ROM-less DDFS using non-equal division parabolic polynomial interpolation method. In: IEEE
2011 International Symposium on Integrated Circuits; 12–14 December 2011; Singapore, Singapore. New York, NY,
USA: IEEE. pp. 59-62.
[16] Ibrahim S, Sawal A, Islam S. Hardware implementation of 32-bit high-speed direct digital frequency synthesizer.
Sci World J 2014; 131568.
[17] Alonso A, Miyahara M, Matsuzawa A. A novel direct digital frequency synthesizer employing complementary dual-phase latch-based architecture. In: IEEE 2015 International Conference on ASIC; 3–6 November 2015; Chengdu,
China. New York, NY, USA: IEEE. pp. 1-4.
[18] Guo X, Wu D, Zhou L, Liu H, Wu J, Liu X. A 4-GHz 32-bit direct digital frequency synthesizer in 0.25 µm SiGe
HBT with SFDR > 46 dBc up to Nyquist bandwidth. In: IEEE 2016 Bipolar/BiCMOS Circuits and Technology
Meeting; 25–27 September 2016; New Brunswick, NJ, USA. New York, NY, USA: IEEE. pp. 86-89.


