International Journal of Electronics and Electical Engineering
Volume 2

Issue 2

Article 2

October 2013

A HIGH-PERFORMANCE AND LOW-POWER DELAY BUFFER
GOPALA KRISHNA.M
Dept. of ECE, MIC College of Technology, Vijayawada, AP, India, gopal.ece16@gmail.com

UMA SANKAR.CH
Dept. of ECE, Universal College of Engineering, Guntur, AP, India, sankarch.4u@gmail.com

NEELIMA. S
Dept. of ECE, Gandhiji Institute of Science & Technology, Jagaaiahpet, AP, India,
neelima.sayana@gmail.com

KOTESWARA RAO.P
Dept. of ECE, Madhira Institute of Technology & Sciences, Kodad, AP, India, hodecemitsg7@gmail.com

Follow this and additional works at: https://www.interscience.in/ijeee
Part of the Power and Energy Commons

Recommended Citation
KRISHNA.M, GOPALA; SANKAR.CH, UMA; S, NEELIMA.; and RAO.P, KOTESWARA (2013) "A HIGH-PERFORMANCE AND LOW-POWER DELAY BUFFER," International Journal of Electronics and Electical
Engineering: Vol. 2 : Iss. 2 , Article 2.
DOI: 10.47893/IJEEE.2013.1072
Available at: https://www.interscience.in/ijeee/vol2/iss2/2

This Article is brought to you for free and open access by the Interscience Journals at Interscience Research
Network. It has been accepted for inclusion in International Journal of Electronics and Electical Engineering by an
authorized editor of Interscience Research Network. For more information, please contact
sritampatnaik@gmail.com.

A High-Performance And Low-Power Delay Buffer

A HIGH-PERFORMANCE AND LOW-POWER DELAY BUFFER
1GOPALA KRISHNA.M, 2UMA SANKAR.CH, 3NEELIMA.S & 4KOTESWARA RAO.P

1Assistant Professor, Dept. of ECE, MIC College of Technology, Vijayawada, AP, India
2Assistant Professor, Dept. of ECE, Universal College of Engineering, Guntur, AP, India
3Assistant Professor, Dept. of ECE, Gandhiji Institute of Science & Technology, Jagaaiahpet, AP, India
4Associate Professor, Dept. of ECE, Madhira Institute of Technology & Sciences, Kodad, AP, India
Email: gopal.ece16@gmail.com, sankarch.4u@gmail.com, neelima.sayana@gmail.com, hodecemitsg7@gmail.com

Abstract: This paper presents the circuit design of a low-power delay buffer. The proposed delay buffer uses several new
techniques to reduce its power consumption. Since delay buffers are accessed sequentially, it adopts a ring-counter
addressing scheme. In the ring counter, double-edge-triggered (DET) flip-flops are used to halve the operating frequency,
and a C-element gated-clock strategy is proposed. Both the total transistor count and the number of clocked transistors
in the flip-flop are significantly reduced to improve power consumption and speed. The number of transistors is reduced by
56%-60% and the area-speed-power product is reduced by 56%-63% compared with other double-edge-triggered flip-flops.
This design is suitable for high-speed, low-power CMOS VLSI applications.
Keywords: C-element, delay buffer, first-in–first-out (FIFO), gated-clock, ring-counter.

1. INTRODUCTION
The latest advances in mobile battery-powered devices such as personal digital assistants and
mobile phones have set new goals in digital VLSI design. These goals include the need for
high-speed digital circuits with low power consumption. Flip-flops and latches are used as the
storage elements in a clocking system. A careful design of the storage elements contributes to
increased performance and reduced power consumption of a VLSI system. Delay buffers are
conventionally built either from SRAM or from shift registers. The former approach is convenient
since SRAM compilers are readily available and are optimized to generate memory modules with low
power consumption, high operation speed, and a compact cell size. The latter approach is also
convenient since shift registers can be easily synthesized, though they may consume much power due
to unnecessary data movement.

Since the ring counter is made up of an array of D-type flip-flops (DFFs) triggered by a global
clock signal, and all DFFs except one hold a value of “0,” it is possible to disable the clock
signal to most DFFs. Such a gated-clock ring counter is implemented in [6] to compose a low-power
first-in–first-out (FIFO) memory. In this paper, we propose to use double-edge-triggered (DET)
flip-flops instead of traditional DFFs in the ring counter to halve the operating clock frequency.
A novel approach that uses C-elements instead of R–S flip-flops in the control logic for
generating the clock-gating signals is adopted to avoid increasing the loading of the global clock
signal. In addition to gating the clock signal going to the DET flip-flops in the ring counter, we
also propose to gate the drivers in the clock tree. This technique greatly decreases the loading
on the distribution network of the clock signal for the ring counter and thus the overall power
consumption. The same technique is applied to the input driver and output driver of the memory
part in the delay buffer.

In a delay buffer based on an SRAM cell array such as the one in [6], the read/write circuitry
operates through the bit lines, which work as data buses. In the proposed delay buffer, we use a
tree hierarchy for the read/write circuitry of the memory module. For the write circuitry, in each
level of the driver tree, only one driver along the path leading to the addressed memory word is
activated. Similarly, a tree of multiplexers and gated drivers comprises the read circuitry of the
proposed delay buffer. Simulation results show the effectiveness of the above techniques in power
reduction. As an example, a 256 x 8 delay buffer chip is designed and fabricated. Measured results
indicate much better power performance than a same-size delay buffer based on existing commercial
SRAM.

The rest of this paper is organized as follows. Section II introduces the conventional
architectures for implementing delay buffers. The proposed delay buffer, which uses the new ring
counter and gated driver trees for the read and write circuits of the memory module, is described
in Section III. Section IV presents experimental results of the new delay buffer, together with a
comparison of its power and area with those of conventional SRAM-based delay buffers. Section V
concludes this paper.


2. CONVENTIONAL DELAY BUFFERS
The simplest way to implement a delay buffer is to
use shift registers, as shown in Fig. 1. If the buffer
length is N and the word length is b, then a total of Nb
DFFs are required, and the area can be quite large if a
standard cell is used for each DFF. In addition, this
approach can consume a huge amount of power since
on average Nb/2 binary signals make transitions in
every clock cycle. As a result, this implementation is

International Journal of Electrical and Electronics Engineering (IJEEE), ISSN (PRINT):2231 –5284 Vol-2 Issue-2

usually used in short delay buffers, where area and
power are of less concern.
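
The Nb/2 figure can be checked with a small behavioural model. The sketch below is only an illustration under stated assumptions: input words are uniformly random, every stored bit is one flip-flop, and the function names `shift_register_toggles` and `ring_counter_toggles` are hypothetical, not taken from the paper. For contrast it also counts the toggles in a one-hot ring counter, in which a single “1” circulates.

```python
import random


def shift_register_toggles(n_words, word_bits, cycles, seed=0):
    """Average number of flip-flop outputs that toggle per clock
    in an N-word by b-bit shift-register delay buffer (random data assumed)."""
    rng = random.Random(seed)
    regs = [rng.getrandbits(word_bits) for _ in range(n_words)]
    total = 0
    for _ in range(cycles):
        # Every word moves one stage forward; a new random word enters.
        new = [rng.getrandbits(word_bits)] + regs[:-1]
        # A flip-flop toggles when its old and new bits differ.
        total += sum(bin(old ^ cur).count("1") for old, cur in zip(regs, new))
        regs = new
    return total / cycles


def ring_counter_toggles(n_cells, cycles):
    """Average number of toggles per clock in a one-hot ring counter."""
    state = [1] + [0] * (n_cells - 1)
    total = 0
    for _ in range(cycles):
        new = [state[-1]] + state[:-1]  # rotate the single "1" by one cell
        total += sum(a != b for a, b in zip(state, new))
        state = new
    return total / cycles


if __name__ == "__main__":
    n, b = 256, 8
    print("shift register:", shift_register_toggles(n, b, 200), "expected about", n * b / 2)
    print("ring counter:  ", ring_counter_toggles(n, 200))
```

Under these assumptions the shift register averages close to Nb/2 toggles per clock, while the one-hot counter toggles exactly two cells per clock regardless of its length.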

Figure 1: Delay buffer implemented by shift registers

SRAM-based delay buffers are more popular for long delay buffers because of the compact SRAM cell
size and small total area. Also, the power consumption is much lower than that of shift registers
because only two words are accessed in each clock cycle: one for write-in and the other for
read-out. A binary counter can be used for address generation since the memory words are accessed
sequentially.

Though SRAM-based delay buffers do away with many data transitions, there can still be
considerable power consumption in the SRAM address decoder and the read/write circuits. In fact,
since the memory words are accessed sequentially, we can use a ring counter with only one rotating
active cell to point to the words for write-in and read-out. This method, known as the
pointer-based scheme [5], is illustrated in Fig. 2. The bottom row of D-type flip-flops is
initialized with only one “1” (the active cell) and all the other DFFs are kept at “0.” When a
clock edge triggers the DFFs, this “1” signal is propagated forward. Consequently, the traditional
binary address decoder can be replaced by this “unary-coded” ring counter. Compared to
shift-register delay buffers, this approach propagates only one “1” in the ring counter instead of
propagating N b-bit words. With far fewer data transitions, pointer-based delay buffers can save a
lot of power.

As shown in Fig. 3, when the input of the first DFF in a block is asserted, it sets the output of
the R–S flip-flop to “1” at the next clock edge. Thus, the incoming “1” can be trapped in that
block and continue to propagate inside the block. On the other hand, the successful propagation of
“1” to the first DFF in the next block can henceforth shut down the unnecessary clock signal in
the current block.

Figure 2: Buffer-inserted RC distributed interconnect of length l with m buffers of size W
Figure 3: Ring counter with clock gated by R–S flip-flop

A. Propagation Delay Expression and Minimum Delay Buffer Insertion for RC Interconnects:
The propagation delay of an interconnect of length l with m buffers can be modeled as the sum of
the delays of the individual inverter-interconnect segments, as shown in Fig. 2. Further, the
delay of each inverter-interconnect segment is typically modeled as the sum of the delay of the
inverter, tinv, and the delay of the interconnect, twire [4].

The buffer delay expressions presented are for a rising input signal transition. The delay for a
falling input can be obtained using a similar analysis. Devices are modeled using the th-power law
model [14]. The propagation delay of the inverter can be obtained from the inverter output voltage
VDS(t), as follows:

3. PROPOSED DESIGN
In the proposed delay buffer, several power reduction techniques are adopted. These circuit
techniques are mainly designed to decrease the loading on high fan-out nets, e.g., the clock and
the read/write ports. We propose to replace the R–S flip-flop by a C-element and to use
tree-structured clock drivers with gating so as to greatly reduce the loading on active clock
drivers. Additionally, DET flip-flops are used to halve the clock rate and thus also reduce the
power consumption on the clock signal. The proposed ring counter with hierarchical clock gating
and its control logic is shown in Fig. 4. Each block contains one C-element to control the
delivery of the local clock signal “CLKi,j” to the DET flip-flops, and only the “CKE” signals
along the path passing the global clock source to the local clock signal are active. The “gate”
signal (CKEi,j) can also be derived from the output of the DET flip-flops in the ring counter.
The C-element is an essential element in asynchronous circuits for handshaking. One of its
implementations is shown in Fig. 5(a) [7]. The logic of the C-element is given by

C+ = AB + AC + BC

where A and B are its two inputs and C+ and C are
the next and current outputs, respectively. If A=B,
then the next output C+ will be the same as A.
Otherwise (A≠B), C+ remains unchanged. Since the
output of the C-element can only change when A=B,
it avoids the possibility of glitches, a crucial
property for a clock-gating signal.
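
The hold behaviour described above can be sketched as a small behavioural model. This is an illustration only: it uses the standard majority form of the Muller C-element, C+ = AB + AC + BC, which matches the description (the output follows the inputs when A=B and holds otherwise). The function names `c_element` and `trace` are hypothetical.

```python
def c_element(a, b, c):
    """Next output of a C-element: C+ = AB + AC + BC (majority of A, B, C)."""
    return (a & b) | (a & c) | (b & c)


def trace(inputs, c=0):
    """Apply a sequence of (A, B) input pairs and record the output after each."""
    outputs = []
    for a, b in inputs:
        c = c_element(a, b, c)
        outputs.append(c)
    return outputs


if __name__ == "__main__":
    # The output changes only when both inputs agree; it holds while they differ.
    print(trace([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))  # [0, 0, 1, 1, 0]
```

Because a change on only one input leaves the output untouched, a transient on a single input cannot produce a pulse on the gating signal in this model.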

Figure 5: Circuit diagrams of (a) the C-element [7] and (b) the double-edge-triggered flip-flop

To reduce power further, we replace the DFFs by double-edge-triggered flip-flops [8] [see Fig.
5(b)] and operate the ring counter at half speed. With such changes, the clock gating control
mechanism in Fig. 4(a) is different from the one in Fig. 3. When the input of the last DET
flip-flop in the previous block changes to “1,” making both inputs of the C-element the same, the
clock signal in the current block is turned on. When the output of the first DET flip-flop in the
current block is asserted, both inputs of the C-element in the previous block go to “0” and the
clock for the previous block is disabled.

Figure 4: (a) Ring counter with clock gated by C-elements, (b) tree-structured clock drivers with
gating, and (c) control logic for clock enable signals

To save area, the memory module of a delay buffer is often in the form of an SRAM array with an
input/output data bus, as in [6]. Special read/write circuitry, such as a sense amplifier, is
needed for fast and low-power operation. However, of all the memory cells, only two words are
activated: one is written with the input data and the other is read to the output. Driving the
input signal all the way to all memory cells seems to be a waste of power. The same can be said
for the read circuitry of the output port. In light of the gated-clock tree technique above, we
apply the same idea to the input driving/output sensing circuitry in the memory module of the
delay buffer. The memory words are also grouped into blocks. Each memory block is associated with
one DET flip-flop block in the proposed ring counter, and one DET flip-flop output addresses a
corresponding memory word for read-out and at the same time addresses the word that was read one
clock earlier for write-in. Fig. 6(a) depicts the tree-structured hierarchy of tri-state inverters
used for delivering the input word to the addressed memory word.
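
The saving from gating the driver tree can be illustrated with a short sketch. It is not the authors' circuit: it assumes a complete tree with a fan-out of 4 at every level, a word count that is a power of 4, and hypothetical function names `active_path` and `driver_counts`. The point is only that one driver per level is enabled on the path to the addressed word.

```python
def tree_depth(n_words, fanout=4):
    """Number of driver levels needed to reach n_words leaves (assumed complete tree)."""
    levels, reach = 0, 1
    while reach < n_words:
        reach *= fanout
        levels += 1
    return levels


def active_path(word_index, n_words, fanout=4):
    """Index of the single enabled driver at each level, root level first."""
    levels = tree_depth(n_words, fanout)
    span = fanout ** levels  # words covered by the whole tree
    path = []
    for _ in range(levels):
        span //= fanout      # words covered by one driver at this level
        path.append(word_index // span)
    return path


def driver_counts(n_words, fanout=4):
    """Return (drivers enabled with gating, drivers in the whole tree)."""
    levels = tree_depth(n_words, fanout)
    total = sum(fanout ** (k + 1) for k in range(levels))
    return levels, total


if __name__ == "__main__":
    print(active_path(37, 64))  # [2, 9, 37]
    print(driver_counts(64))    # (3, 84)
```

In this model a 64-word tree enables 3 of its 84 drivers for any one access, which is the kind of reduction in active loading that the gated tree targets.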

Figure 6: (a) Gated-driver tree of input driving circuitry and (b) its timing diagram

4. SIMULATION RESULTS
A delay buffer based on the proposed techniques is designed and implemented in 0.18-μm CMOS
technology. The standard 6-T SRAM cell is used in the delay buffer. Eight DET flip-flops, eight
memory words, and the associated control logic are designed in a full-custom fashion and grouped
as one block. We have simulated the proposed delay buffer with various lengths in 0.18-μm CMOS
technology. The word length is set to 8 bits. The area and power consumption are estimated from
post-layout simulation. In addition, we compared the simulated results with the values provided by
a commercial SRAM compiler in the same technology. Since one read and one write operation are
necessary in each clock cycle for a delay buffer of length N, either one two-port SRAM with N
words or two one-port SRAMs each with N/2 words is required. Fig. 7(a) shows the simulated power
consumption at a 135-MHz operating frequency and a 1.8-V supply voltage. Fig. 7(b) depicts the
occupied area. From Fig. 7, we can see that the proposed delay buffer outperforms both the
two-port and single-port SRAM-based delay buffers in terms of power consumption. In addition, the
area of the proposed delay buffer is smaller than that of the SRAM-based delay buffers when the
buffer length is shorter than 256.

Figure 7: Simulated results of (a) power and (b) area of various delay buffers versus different
lengths

Fig. 8 shows the total power consumption in normal operation mode and the leakage power
consumption in idle (disabled clock) mode for 90-nm and 65-nm technology, respectively. Note that
the total power consumption in normal operation mode is not logarithmically proportional to the
length of the delay buffer. Instead, due to the quad-tree structure of all the driving circuitry,
delay buffers of two different lengths can have approximately the same dynamic power because both
cases activate the same number of drivers. We can see that the superiority of the proposed circuit
is still obvious in 90-nm technology in that the leakage power is almost negligible. Even in the
more advanced 65-nm technology, the leakage power can be controlled to within an acceptable level
for medium-length delay buffers with the dual-Vt approach. For longer delay buffers and more
advanced technologies, other leakage reduction techniques such as “sleep” transistors in SRAM
(latch) cells can help to reduce leakage power [9].
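
The remark that buffers of different lengths can activate the same number of drivers follows from the quad-tree structure, and a rough sketch makes it concrete. The block size of eight words is taken from the text; the assumptions that the tree is built over blocks, that its fan-out is 4 at every level, and that one driver per level is active are illustrative, and the function name `tree_levels` is hypothetical.

```python
def tree_levels(length, words_per_block=8, fanout=4):
    """Levels of a quad tree needed to reach every block of a delay buffer."""
    blocks = -(-length // words_per_block)  # ceiling division
    levels, reach = 0, 1
    while reach < blocks:
        reach *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (64, 128, 256, 512):
        print(n, tree_levels(n))
    # 64 -> 2, 128 -> 2, 256 -> 3, 512 -> 3
```

Under these assumptions, lengths 64 and 128 need the same number of tree levels, as do 256 and 512, so the number of active drivers grows in steps rather than smoothly with the logarithm of the length.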

Figure 8: Simulated power (with leakage power) of the proposed delay buffer architecture in (a)
90-nm CMOS technology and (b) 65-nm CMOS technology

5. CONCLUSION
We presented a low-power delay buffer architecture that adopts several novel techniques to reduce
power consumption. The ring counter with its clock gated by C-elements can effectively eliminate
excessive data transitions without increasing the loading on the global clock signal. The
gated-driver tree technique used for the clock distribution network can eliminate the power wasted
on drivers that need not be activated. A gated-demultiplexer tree and a gated-multiplexer tree are
used for the input and output driving circuitry to decrease the loading of the input and output
data buses. All gating signals are easily generated by a C-element taking inputs from some DET
flip-flop outputs of the ring counter. Measurement results indicate that the proposed architecture
consumes only about 13% to 17% of the power of conventional SRAM-based delay buffers in 0.18-μm
CMOS technology. Further simulations also demonstrate its advantages in nanometer CMOS technology.
If optimization is applied, it is possible to achieve much higher speeds with the proposed Gated
Driver while keeping the power consumption low. It is then concluded that the proposed Gated
Driver appears to be the most suitable delay element in high-speed, low-power VLSI applications.

ACKNOWLEDGEMENTS
The authors would like to thank the anonymous reviewers for their comments, which were very
helpful in improving the quality and presentation of this paper.

REFERENCES:
1. E. K. Tsern and T. H. Meng, “A low-power video-rate pyramid VQ decoder,” IEEE J. Solid-State
Circuits, vol. 31, no. 11, pp. 1789–1794, Nov. 1996.
2. M. L. Liou, P. H. Lin, C. J. Jan, S. C. Lin, and T. D. Chiueh, “Design of an OFDM baseband
receiver with space diversity,” IEE Proc. Commun., vol. 153, no. 6, pp. 894–900, Dec. 2006.
3. N. Shibata, M. Watanabe, and Y. Tanabe, “A current-sensed high-speed and low-power
first-in-first-out memory using a wordline/bitline-swapped dual-port SRAM cell,” IEEE J.
Solid-State Circuits, vol. 37, no. 6, pp. 735–750, Jun. 2002.
4. J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits,
vol. 24, pp. 62–71, 1989.
5. V. Adler and E. G. Friedman, “Uniform repeater insertion in RC trees,” IEEE Trans. Circuits
Syst. I, Fundam. Theory Appl., vol. 47, no. 10, pp. 1515–1523, Oct. 2000.
6. R. Hossain, L. D. Wronski, and A. Albicki, “Low power design using double edge triggered
flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 2, pp. 261–265,
Jun. 1994.
7. K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng,
and M. Bohr, “SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage
reduction,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 895–901, Apr. 2005.

AUTHORS PROFILE:
Gopala Krishna.M, working as Assistant Professor in MIC College of Technology, has 3 years of
teaching experience. He received his B.Tech degree in ECE from Anurag College of Engg, Kodad in
2009.

UmaSankar.CH, working as Assistant Professor in Universal College of Engg, received his B.Tech
degree in ECE from Anurag College of Engg, Kodad in 2009 and M.Tech degree from JNTUH in 2011.

Neelima.S, working as Assistant Professor in Gandhiji Institute of Science & Technology, has 6
years of industry and teaching experience. She received her M.Tech degree in VLSI from JNTUK in
2011.

Koteswara Rao.P, working as Associate Professor in Madhira Institute of Science & Technology, has
12 years of teaching experience. He received his B.Tech degree in ECE from Bapatla Engg College in
1993 and M.Tech degree from JNTUK in 2009.

