Southern Illinois University Carbondale

OpenSIUC
Conference Proceedings

Department of Electrical and Computer
Engineering

3-2008

A Low-Power Double-Edge-Triggered Address
Pointer Circuit for FIFO Memory Design
Saravanan Ramamoorthy
Southern Illinois University Carbondale

Haibo Wang
Southern Illinois University Carbondale, haibo@engr.siu.edu

Sarma Vrudhula
Arizona State University at the Tempe Campus

Follow this and additional works at: http://opensiuc.lib.siu.edu/ece_confs
Published in Ramamoorthy, S., Wang, H., & Vrudhula, S. (2008). A low-power double-edge-triggered address pointer circuit for FIFO memory design. 9th International Symposium on Quality
Electronic Design, 2008 (ISQED 2008), 123-126. doi: 10.1109/ISQED.2008.4479711 ©2008
IEEE. Personal use of this material is permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for creating new collective works for resale or
redistribution to servers or lists, or to reuse any copyrighted component of this work in other works
must be obtained from the IEEE. This material is presented to ensure timely dissemination of
scholarly and technical work. Copyright and all rights therein are retained by authors or by other
copyright holders. All persons copying this information are expected to adhere to the terms and
constraints invoked by each author's copyright. In most cases, these works may not be reposted
without the explicit permission of the copyright holder.
Recommended Citation
Ramamoorthy, Saravanan; Wang, Haibo; and Vrudhula, Sarma, "A Low-Power Double-Edge-Triggered Address Pointer Circuit for
FIFO Memory Design" (2008). Conference Proceedings. Paper 51.
http://opensiuc.lib.siu.edu/ece_confs/51

This Article is brought to you for free and open access by the Department of Electrical and Computer Engineering at OpenSIUC. It has been accepted
for inclusion in Conference Proceedings by an authorized administrator of OpenSIUC. For more information, please contact opensiuc@lib.siu.edu.

9th International Symposium on Quality Electronic Design

A Low-Power Double-Edge-Triggered Address Pointer Circuit for FIFO
Memory Design
Saravanan Ramamoorthy and Haibo Wang

Sarma Vrudhula

Dept. of Electrical and Computer Engineering
Southern Illinois University
Carbondale, IL 62901

Dept. of Computer Science and Engineering
Arizona State University
Tempe, AZ 85281

Abstract

This paper presents a novel design of an address pointer for FIFO memory circuits. Advantages of the proposed design include a reduced capacitive load on the pointer clock path, the use of a true single-phase clock, and a double-edge-triggering clock scheme. The circuit has low power consumption, is immune to circuit race conditions, and is suitable for high-speed operation. Techniques to implement clock gating in pointer circuit design for further reducing power consumption are also discussed. The proposed circuit is implemented with a 65nm CMOS technology and its performance is compared with previous pointer circuits.

1 Introduction
First-in first-out (FIFO) memories have been widely used in modern electronic systems. A high-speed FIFO is usually implemented using a two-port RAM array (one port for read operations and the other for write operations) and two address pointers for tracing the read and write memory accesses [1, 2, 3, 4]. An address pointer functions as a token-passing circuit which passes a logic 1 (the token) along its outputs, which control the word-line drivers or column selection circuits of the RAM array. A straightforward implementation of the address pointer is a cyclic shift register chain. Since the number of flip-flops in the shift register increases with the size of the RAM array, address pointers designed for large RAM arrays normally occupy large silicon areas and have a heavy cumulative capacitive load on the clock signal paths, resulting in large power consumption and degraded circuit speed.

Previously, several circuit techniques have been proposed to reduce pointer circuit area and the capacitive load on the clock paths of pointer circuits. D flip-flops (DFFs) without clear inputs are used in the address pointer design of [4], which results in a 17% layout area saving. However, due to the lack of a global clear input, the address pointer must go through a lengthy multi-cycle reset operation for initialization. To reduce the pointer clock load, the pointer circuit presented in [1] uses pass-transistors, instead of complementary transmission gates, in the DFF circuit. While resulting in a significant power reduction, this approach potentially suffers from notable speed degradation when a low supply voltage is used. D latches are used in the address pointer circuit presented in [5]. The latches are classified as odd latches and even latches according to their positions in the pointer circuit. Odd and even latches fetch data at different clock phases. Thus, a double-edge-triggering (DET) clock scheme is achieved. Since pass-transistors are also used in the latch circuits of this design, it is not suitable for low-voltage applications either.

In this work, we present a novel address pointer design. At each stage of the proposed pointer circuit, only one transistor is connected to the pointer clock. Thus, the clock load is dramatically reduced in the proposed circuit. Unlike most of the previous pointer circuits, which use both the clock (clk) and its complementary signal ($\overline{clk}$), the proposed circuit only needs a true single-phase clock signal and hence is immune to circuit race conditions caused by clock skew between clk and $\overline{clk}$ [6]. In addition, the proposed design uses a DET clock scheme to accommodate the double data rate technique, which is now widely used in high-throughput system design. Finally, clock gating techniques are presented in the paper to further reduce the power consumption of the address pointer circuit. Experimental results are presented to compare the performance of the proposed circuit with other designs.

The rest of the paper is organized as follows. Section 2 describes the proposed circuit. Clock gating techniques for pointer circuits are discussed in Section 3. Experimental results are presented in Section 4 and the paper is concluded in Section 5.

0-7695-3117-2/08 $25.00 © 2008 IEEE
DOI 10.1109/ISQED.2008.129
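
As a behavioural illustration of the FIFO organisation described in the Introduction (a RAM array addressed by two token-passing pointers), the following Python sketch may help. It is only a minimal model under stated assumptions: the names `RingPointer`, `PointerFifo`, `advance` and `selected` are hypothetical, and the occupancy counter used for the full/empty checks is an added assumption that is not taken from the paper.

```python
class RingPointer:
    """One-hot token ring: exactly one output is 1 and selects a RAM row."""

    def __init__(self, depth):
        self.outputs = [1] + [0] * (depth - 1)  # token starts at stage 0

    def advance(self):
        # Cyclic shift: the last stage feeds the first one.
        self.outputs = [self.outputs[-1]] + self.outputs[:-1]

    def selected(self):
        return self.outputs.index(1)


class PointerFifo:
    """FIFO made of a RAM array plus separate write and read pointers."""

    def __init__(self, depth):
        self.ram = [None] * depth
        self.wr = RingPointer(depth)
        self.rd = RingPointer(depth)
        self.count = 0  # occupancy counter (assumption, for full/empty checks)

    def write(self, word):
        if self.count == len(self.ram):
            raise OverflowError("FIFO full")
        self.ram[self.wr.selected()] = word
        self.wr.advance()
        self.count += 1

    def read(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        word = self.ram[self.rd.selected()]
        self.rd.advance()
        self.count -= 1
        return word


fifo = PointerFifo(depth=4)
for w in ("a", "b", "c"):
    fifo.write(w)
first_two = [fifo.read(), fifo.read()]
fifo.write("d")
fifo.write("e")  # the write pointer wraps around to RAM row 0
rest = [fifo.read(), fifo.read(), fifo.read()]
print(first_two, rest)
```

Each pointer needs one storage stage per RAM row, which is why the clock load of a shift-register pointer grows with the size of the array.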

2 Proposed Design
The proposed address pointer circuit consists of two types of basic cells, referred to as the n-cell and the p-cell. Their circuit structures are shown in Figure 1. Each type of cell has three inputs: clock (CK), data (D), and clear (CLR). In addition, each cell has an output port Q and a complementary output port QB. For an n-cell, the output Q is set to 1 only when the clock and the data input D are both high, and Q is reset to 0 when CLR=0. To ensure proper circuit operation and avoid a large DC current, the pull-up and pull-down networks of the n-cell must never be activated simultaneously. The operation of a p-cell is exactly the reverse of that of the n-cell. The output Q of the p-cell is set to 1 when both the clock and the data input D are low, and Q is reset to 0 when CLR=1. Similarly, the pull-up and pull-down networks of the p-cell are never turned on at the same time. The inverters in the feedback paths of the n-cells and p-cells are weak inverters (implemented by devices with small W/L ratios), which prevent circuit nodes from floating when both the pull-up and pull-down networks of a cell are off. Because of the sporadic nature of FIFO write and read operations, it is important to avoid floating circuit nodes (dynamic circuit behavior) in address pointers. Otherwise, leakage current may corrupt the logic value on a floating node and cause circuit malfunction.

Figure 1. Schematic of the proposed n- and p-cells.

The connections of the n- and p-cells, as well as the overall circuit structure including the starting circuit, are shown in Figure 2. The key points of this structure are:

1. All the cells are initialized to 0 before starting operation. (This can be done by a multi-cycle reset operation similar to the one in [4] or by adding a global reset input to all the cells.)
2. A starting circuit provides the 1 to be injected into the pointer circuit in the first shifting operation. When the 1 reaches the second cell, the SR latch is reset and, from that point onwards, the D input of the first cell is logically connected to the output of the last cell.
3. The data input of a p-cell is connected to the complementary output of the previous n-cell so that the p-cell is set to 1 on the falling edge of the clock when the previous cell output is 1.
4. The data input of an n-cell is connected to the non-inverting output of the previous p-cell so that the n-cell is set to 1 on the rising edge of the clock when the previous cell output is 1.
5. Whenever a cell output is set to 1, its complementary output turns off the previous pointer output.
6. The CLR (clear) input of the n-cell in position j is connected to the complementary output of the n-cell in position j + 2. If the n-cell in position j received a 1 on the jth clock transition, then on the next rising clock transition the 1 will appear in cell j + 2. Hence, the complementary output (which is 0) of cell j + 2 resets cell j to 0. Similarly, the CLR input of the p-cell in position i is connected to the non-inverting output of the p-cell in position i + 2.

Figure 2. The proposed pointer circuit. (a) Connections between n- and p-cells. (b) The overall circuit connection.

The proposed address pointer circuit has a number of advantages. First, each cell in the proposed design contributes only one gate capacitance to the clock net. Also, the proposed design uses fewer transistors than most of the previous designs. This results in a smaller layout area and, consequently, a shorter clock routing path with smaller interconnect parasitic capacitance. These factors dramatically decrease the capacitive load on the pointer clock path. Second, the pointer circuit only needs a true single-phase clock. Thus, it is immune to race conditions, which makes it particularly suitable for high-speed design. The circuit also avoids the use of single-channel pass-transistors and, hence, no threshold voltage loss occurs during signal propagation, making it attractive in designs with a reduced power supply voltage.
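
The set and clear rules described for the n- and p-cells can be checked with a small behavioural model. The Python sketch below is not the authors' circuit but a phase-by-phase abstraction under several assumptions: cells are updated once per clock phase, the D/QB polarities are folded into "the previous cell's Q is 1", the pointer outputs are taken as P_j = Q_j AND QB_(j+1) (an interpretation of the statement that a newly set cell turns off the previous pointer output), and the ring has at least six cells so that a cell's clear source is inactive while it is being set. The function names `step`, `pointer_outputs` and `simulate` are hypothetical.

```python
def step(q, clock_high, start):
    """Advance the cell chain by one clock phase; returns (new_q, new_start).

    q[j] is the Q output of cell j; even j are n-cells, odd j are p-cells.
    """
    n = len(q)
    nxt = list(q)
    for j in range(n):
        is_n_cell = (j % 2 == 0)
        if is_n_cell != clock_high:
            continue  # n-cells respond while the clock is high, p-cells while low
        # First cell: fed by the starting circuit, later by the last cell.
        prev_q = (start or q[-1]) if j == 0 else q[j - 1]
        if prev_q:
            nxt[j] = 1
    # Clear rule: cell j is reset once cell j + 2 holds a 1.
    after_set = list(nxt)
    for j in range(n):
        if after_set[(j + 2) % n]:
            nxt[j] = 0
    # The starting latch is reset when the 1 reaches the second cell.
    return nxt, bool(start and not nxt[1])


def pointer_outputs(q):
    # Assumed AND gating: P_j = Q_j AND QB_(j+1), so the previous output
    # turns off as soon as the next cell is set.
    n = len(q)
    return [q[j] & (1 - q[(j + 1) % n]) for j in range(n)]


def simulate(num_cells=8, num_edges=20):
    q, start = [0] * num_cells, True
    history = []
    for edge in range(num_edges):
        clock_high = (edge % 2 == 0)  # edge 0 is a rising edge
        q, start = step(q, clock_high, start)
        history.append(pointer_outputs(q))
    return history


history = simulate()
for edge, p in enumerate(history[:10]):
    print(edge, "rise" if edge % 2 == 0 else "fall", p)
```

In this model each Q output stays high for two clock transitions, while the gated pointer outputs form a single 1 that moves by one position on every clock edge, rising and falling alike.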

3 Clock Gating Technique in Pointer Design

4. The data input of a n-cell is connected to the noninverting output of the previous p-cell in order that

Further power reduction for the pointer circuit can be achieved by partitioning the whole circuit into several blocks and connecting the clock signal only to the block in which the logic 1 is being shifted. This clock gating technique can be easily implemented using SR latches and AND gates, as shown in Figure 3. The clock is fed into a block only when the output of the corresponding latch is logic 1. The operation of the clock gating circuit is explained by the following example. If the output of the Mth SR latch is high, the clock signal is connected to the Mth block and the logic 1 is being shifted within this block. When the last pointer output in this block is set to 1, the Mth SR latch is reset and the clock signal is disconnected from this block. Meanwhile, the (M + 1)th SR latch is set and the clock signal is connected to the (M + 1)th block. Therefore, after the next clock transition, the logic 1 is transferred from the Mth block to the (M + 1)th block. As the pointer circuit is a cyclic structure, the set port of the first SR latch is connected to the last pointer output. Before starting operation, all the SR latches are reset to 0, except for the first latch, which is set to logic 1. Since both positive and negative clock edges can trigger a shifting operation, the circuit should be designed carefully to prevent additional transitions from being generated at the outputs of the AND gates. Thus, switching the clock signal from one block to another must always be scheduled during the low period of the clock. This implies that the first and last cells of each block should be an n-cell and a p-cell, respectively, resulting in an even number of cells in each block.

Figure 3. Clock gating technique in pointer circuit.

4 Experimental Results
To compare the proposed design with other pointer implementations, four 256-bit DET pointer circuits, referred to as Proposed design, Ref. design 1, Ref. design 2, and Ref. design 3, are implemented using a 65nm CMOS technology. The proposed design uses the technique discussed in this paper. Ref. design 1 is based on the technique presented in [5]. Ref. designs 2 and 3 are shift-register-based implementations using the DET DFFs presented in [7] and [8], respectively. Transistor sizes used in all the designs are selected according to the following principle: the equivalent resistance of every pull-up or pull-down path in the circuits is approximately the same as the equivalent resistance of a minimum-sized NMOS device of the given technology. Circuit simulations are performed to verify the function of the proposed circuit and to compare its performance with the reference designs. A 1V power supply and a 500MHz clock signal are used in the simulations. Figure 4 shows the waveforms of the clock signal and the first three outputs of the proposed pointer circuit. It clearly shows that the DET pointer function is realized by the proposed circuit structure.

Figure 4. Simulated pointer outputs.

The power consumption, clock-to-output delay, and power-delay product of the four implementations are also compared through circuit simulations. For a more accurate comparison, the parasitic capacitances on the clock routing paths are estimated and included in the circuit simulations. The procedure to estimate the wire load capacitance is briefly discussed as follows. First, according to circuit stick diagrams and the design rules of the selected technology, we estimate the width of each stage of the four pointer circuits. Second, we use the clock routing scheme shown in Figure 5. We partition the 256 cells into 16 groups, each containing 16 cells. Cells within a group share a single group clock buffer, and a global clock buffer drives the 16 group clock buffers. Third, we assume that clock interconnects are twice the minimum wire width and are located in low-k trenches. To account for congested routing channels, we assume there are metal layers over and beneath the clock routing layer. With the above assumptions, the wire load capacitances are estimated according to the capacitor parameters of the given technology. The estimated capacitor values are listed in the second and third columns of Table 1. Ref. design 1 uses the fewest transistors and has the smallest area. Thus, it has the smallest wire load on its clock path. In contrast, Ref. design 3 has the largest wire load capacitance due to the use of complicated DET DFFs.

Figure 5. Clock routing path in the pointer circuit.

The power, delay and power-delay product are obtained from simulation and listed in the fourth, fifth and sixth columns of Table 1. The table shows that the proposed circuit has the smallest power consumption and clock-to-output delay, thanks to the reduced overall clock load and the avoidance of single-channel pass-transistors. Compared to Ref. designs 1, 2, and 3, the proposed circuit reduces power consumption by 15.6%, 65.6%, and 86.3%, respectively. The clock-to-output delay is also improved in the proposed design by 6%, 31%, and 44.7%, respectively. The power-delay product is improved by 19.6%, 76.4%, and 92.4% compared to the reference designs.

Table 1. Circuit performance comparison with estimated wire load.

Circuits        C1 (fF)   C2 (fF)   Power (µW)   Delay (ps)   PDP (µW·ps)
Prop. Design    96        6         151          240          36,240
Ref. Design 1   89.6      5.6       179          256          45,056
Ref. Design 2   166.4     10.4      439          350          153,650
Ref. Design 3   244.8     15.3      1102         434          478,268

Simulations are also performed to study circuit performance when a reduced power supply voltage is used. The clock-to-output delays obtained at 0.9V and 0.8V power supply voltages are listed in Table 2. The second and fourth columns of the table show the delays of the four designs. The third and fifth columns list the delay improvement obtained by using the proposed circuit. For example, the number 9.4%, in the third column and the row corresponding to Ref. design 1, means that a 9.4% delay improvement is obtained by using the proposed design instead of Ref. design 1. At a 0.8V power supply, Ref. design 3 fails to operate with a 500MHz clock. This is primarily because of its large parasitic capacitance on the clock path, which severely degrades the clock outputs from the clock buffers. The simulation results demonstrate that the proposed circuit is more suitable for low-voltage applications than the three reference designs.

Finally, circuit simulations are performed to demonstrate the proposed clock gating technique. Figure 6 shows a snapshot of the clock waveforms obtained from simulation. The top waveform is the main clock before the clock gating logic. The second and third waveforms are the clock signals going to the first and second partitioned pointer blocks. The fourth waveform is the last output of the first pointer block and the fifth waveform is the first output of the second pointer block. Clearly, the pointer function is not affected by the clock gating scheme. Simulation results show that a 51% power reduction can be achieved by the clock gating technique.

Figure 6. Simulated clock signals and pointer outputs with clock gating circuits.

5 Conclusions
A novel double-edge-triggered address pointer is developed for FIFO memory design. The proposed design results in a significant reduction of the cumulative capacitive load on the pointer clock path and hence consumes less power than previous designs. It uses a true single-phase clock and is immune to circuit race conditions. The proposed design is suitable for low-voltage and high-speed applications.

References
[1] M. Hashimoto and M. Nomura, “A 20-ns 256K*4 FIFO memory,” IEEE Journal of Solid-State Circuits, vol. 23, pp. 490–499, 1988.
[2] C. C. Wang, Y. H. Hsueh and Y. W. Chen, “A 6.4 Gbps FIFO design for 8-32 two-way data exchange bus,” in IEEE ISCAS 2002, pp. 772–775, 2002.
[3] L. R. Fenstermaker and K. J. O’Conner, “A low power generator-based FIFO using ring pointers and current-mode sensing,” in IEEE ISSCC Dig. Tech. Papers, pp. 242–243, 1993.
[4] N. Shibata, M. Watanabe and Y. Tanabe, “A current-sensed high-speed and low-power FIFO memory using a wordline/bitline-swapped dual-port SRAM cell,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 735–750, 2002.
[5] H. Wang and P. C. Liu, “A double-edge-triggered address pointer,” IEE Electronics Letters, vol. 20, pp. 479–488, 1997.
[6] J. M. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits: A Design Perspective. Prentice Hall, 2003.
[7] R. Hossain, L. D. Wronski and A. Albicki, “Low power design using double edge triggered flip-flops,” IEEE Trans. VLSI Systems, vol. 2, pp. 261–265, 1994.
[8] C. Kim and S. Kang, “A low-swing clock double-edge triggered flip-flop,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 648–652, 2002.

Table 2. Circuit delay with reduced supply voltage.

                VDD = 0.9V               VDD = 0.8V
Circuits        Delay (ps)   Improv.     Delay (ps)   Improv.
Prop. Design    290          -           377          -
Ref. Design 1   320          9.4%        438          13.9%
Ref. Design 2   425          31.8%       555          32.1%
Ref. Design 3   546          46.9%       N/A          N/A
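
The improvement percentages quoted for Table 1 and Table 2 follow from a simple relative-difference formula, (reference - proposed) / reference. The short Python sketch below recomputes them from the tabulated values; the function name `improvement` and the dictionary layout are illustrative assumptions, and the numbers are copied from the two tables.

```python
def improvement(ref, proposed):
    """Percentage by which `proposed` is lower than `ref`."""
    return 100.0 * (ref - proposed) / ref


# (power in uW, delay in ps, listed PDP in uW*ps), copied from Table 1.
table1 = {
    "Prop. Design": (151, 240, 36240),
    "Ref. Design 1": (179, 256, 45056),
    "Ref. Design 2": (439, 350, 153650),
    "Ref. Design 3": (1102, 434, 478268),
}
# Clock-to-output delays in ps from Table 2 (no 0.8 V entry for Ref. Design 3).
table2 = {
    "0.9V": {"Prop. Design": 290, "Ref. Design 1": 320,
             "Ref. Design 2": 425, "Ref. Design 3": 546},
    "0.8V": {"Prop. Design": 377, "Ref. Design 1": 438,
             "Ref. Design 2": 555},
}

p_power, p_delay, p_pdp = table1["Prop. Design"]
gains1 = {}
for name, (power, delay, pdp) in table1.items():
    if name == "Prop. Design":
        continue
    gains1[name] = (
        improvement(power, p_power),
        improvement(delay, p_delay),
        improvement(pdp, p_pdp),
    )
    print(f"{name}: power {gains1[name][0]:.1f}%, delay {gains1[name][1]:.1f}%, "
          f"PDP {gains1[name][2]:.1f}%, power x delay = {power * delay}")

gains2 = {}
for vdd, delays in table2.items():
    for name, delay in delays.items():
        if name != "Prop. Design":
            gains2[(vdd, name)] = improvement(delay, delays["Prop. Design"])
            print(f"{vdd} {name}: delay improvement {gains2[(vdd, name)]:.1f}%")
```

The recomputed percentages agree with the quoted ones to the stated precision. One detail worth noting is that the product of the tabulated power and delay for Ref. Design 1 is 45,824 rather than the listed 45,056, which may reflect rounding of the tabulated power and delay values.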

Authorized licensed use limited to: Southern Illinois University Carbondale. Downloaded on May 29, 2009 at 12:13 from IEEE Xplore. Restrictions apply.

