Design and Optimization of Sense Amplifier-Based Flip-Flops by Borivoje Nikolić & Vojin G. Oklobdžija
Borivoje Nikolić
Dept. of Electrical and Computer Engineering,
University of California,
Davis, CA 95616, USA
bora@ece.ucdavis.edu
Vojin G. Oklobdžija
Integration Corp.
1285 Grizzly Peak,
Berkeley, CA 94708, USA
vojin@nuc.berkeley.edu
Abstract
An improved design of a sense amplifier-based flip-flop
is presented. The new design overcomes the problem of
floating nodes, a weakness of previously reported
solutions. This is achieved by systematically deriving
the flip-flop equations and rearranging the resulting
network. The resulting flip-flop outperforms earlier
published structures, exhibiting a clock-to-output delay
(TCQ) of 190 ps when driving a 100 fF load in a 0.18 µm
CMOS technology.
1. Introduction
Sense amplifier-based flip-flops (SAFFs) have been
used in high-performance and low-power digital
systems. Recently reported modifications of SAFFs
exhibit very small delay, calculated as the sum of the
setup time and the clock-to-output delay, in high-speed
datapath applications. They have differential outputs,
can be used with single-ended or differential inputs, and
present a small clock load [1,2,3].
The initial SAFF design used a sense amplifier as the
first stage and a NAND-based cross-coupled RS latch as
the second stage (Figure 1) [1].
The modified SAFF improves the output stage to reduce
the overall delay [3].
2. Analysis of Operation
The sense amplifier stage [1] is precharged while the
clock signal is low. After the clock rises, it generates a
falling transition on only one of its outputs, S or R.
As reported in [1], the SAFF did not include transistor
MN5. The outputs of the first stage are incompletely
defined logic functions: the NMOS pull-down trees implement
$S = \overline{Clk \cdot R' \cdot D}$ and
$R = \overline{Clk \cdot S' \cdot \overline{D}}$,
which define where each output is forced to 0, and the
PMOS pull-up trees implement
$S = \overline{Clk \cdot R'}$ and $R = \overline{Clk \cdot S'}$,
which define where each output is forced to VDD. These
functions are represented by the Karnaugh maps in
Figure 2.a and Figure 2.b, where S′ and R′ denote the
previous values of signals S and R.
The entries marked as “don’t cares” (x) in Figure 2
actually represent floating outputs of the logic function,
which are forced neither to 0 nor to VDD. Strictly, the
truth table includes four more don’t cares, since S and
R cannot both be zero at the same time.
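The floating entries can be found by enumeration. The following is a minimal sketch, not the authors' method: it assumes that the pull-down tree of node S conducts when Clk, R′ and D are all 1, and that the pull-up tree conducts whenever Clk and R′ are not both 1. The function name `drive_state_s` is hypothetical.

```python
from itertools import product

def drive_state_s(clk, d, r_prev):
    """Classify node S as driven low ("0"), driven high ("1") or floating ("x").

    Assumed model: NMOS pull-down conducts when Clk*R'*D = 1,
    PMOS pull-up conducts when Clk*R' = 0.
    """
    pull_down = bool(clk and r_prev and d)
    pull_up = not (clk and r_prev)
    if pull_down:
        return "0"
    if pull_up:
        return "1"
    return "x"  # neither network conducts, so the node floats

# Enumerate every (Clk, D, R') combination and keep the floating ones.
floating = [(clk, d, r) for clk, d, r in product((0, 1), repeat=3)
            if drive_state_s(clk, d, r) == "x"]
print(floating)  # [(1, 0, 1)]
```

Under these assumptions the only undriven combination for S is Clk = 1, D = 0, R′ = 1, which is the mirror image of the R entry discussed in the text for D = 1.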
Figure 1: Sense amplifier-based flip-flop.
Ideally, when Clk changes from 0 to 1 and D = 0, the
value of node S should remain unchanged, as should
the value of node R when Clk rises and D = 1.
The consequences of having don’t cares in the Karnaugh
map are explained for the case of R:
(a) Clk = 1, D = 1, R′ = 1: This Karnaugh map entry is
exercised when Clk changes from the low to the high logic
level, triggering the flip-flop. With D high, the branch
evaluating S is pulled to the low logic level while R
floats, causing a glitch (Figure 5) and slowing down the
transition. Eliminating this situation would require
another parallel pull-up PMOS branch, leading to
diminishing returns in terms of speed. Because of this
don’t care, the new value of R is not forced directly by
the branch evaluating it, but indirectly through the
change of S.
(b) Clk = 1, D = 0, R′ = 0: This entry corresponds to
the case of the sense amplifier output R being forced
low at the leading edge of the clock. If the data changes
after the leading edge of the clock, leakage currents
could charge this node and eventually change the state
of the flip-flop.
This problem was noticed in [2], and the proposed
modification is shown in Figure 1. The additional
transistor MN5 allows static operation by providing a
path to ground even after the data changes. This prevents
leakage currents from eventually charging the low
sense-amplifier output to a level at which it could
trigger the latch. This solution has two major
disadvantages: a) the glitch on S is more pronounced
because of the conducting path through MN5 (Figure 5);
b) a direct gate connection to long metal lines (VDD or
ground) is prohibited because of the very thin gate oxide
in deep-submicron technologies.
Figure 2: Karnaugh maps for a) S, b) R. Columns are indexed by Clk, D and rows by S′, R′; x marks a floating output.
3. Flip-Flop Optimization
This problem can be solved in a more consistent way by
covering the Karnaugh maps as shown in Figure 2:
$S = \overline{Clk \cdot \overline{\overline{R' \cdot D} \cdot S'}}$,
$R = \overline{Clk \cdot \overline{\overline{S' \cdot \overline{D}} \cdot R'}}$.
This modification adds one NMOS transistor to each
pull-down tree. Since the implementation of the SAFF
shown in [3] already includes the signals S and R, the
gates of the added transistors are connected to existing
inverter outputs, so their addition does not change the
input loading and reduces the glitch magnitude.
Additional inverters are used to decouple the critical
nodes S and R from the loading of the gates of
transistors MN3 and MN4 [4], speeding up flip-flop
operation.
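The effect of the added transistor can be checked with a small switch-level enumeration. This is a sketch under assumptions, not the authors' derivation: it assumes that the original pull-down tree of node R conducts when Clk · S′ · D̄ = 1, that the added NMOS is in parallel with the data branch and conducts when R′ = 0 (its gate driven by an inverter output), and that the pull-up conducts when Clk · S′ = 0. The function names are hypothetical, and states with S′ = R′ = 0 are excluded because both outputs cannot be low together.

```python
from itertools import product

def r_original(clk, d, s_prev, r_prev):
    """Drive state of node R before the modification (assumed model)."""
    pull_down = clk and s_prev and not d
    pull_up = not (clk and s_prev)
    return "0" if pull_down else "1" if pull_up else "x"

def r_modified(clk, d, s_prev, r_prev):
    """Drive state of node R with the assumed extra NMOS that holds a low R."""
    pull_down = clk and ((s_prev and not d) or not r_prev)
    pull_up = not (clk and s_prev)
    return "0" if pull_down else "1" if pull_up else "x"

def floating_states(model):
    """List valid (Clk, D, S', R') states in which node R is not driven."""
    return [state for state in product((0, 1), repeat=4)
            if (state[2] or state[3]) and model(*state) == "x"]

print(floating_states(r_original))  # [(1, 1, 1, 0), (1, 1, 1, 1)]
print(floating_states(r_modified))  # [(1, 1, 1, 1)]
```

Under these assumptions the low state of R is held actively while the clock is high, even if the data changes. The entry with R′ = 1 is still resolved only indirectly through S, which matches the remark that removing it would need an extra pull-up PMOS branch.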
Figure 3: Modified sense amplifier-based flip-flop.
Figure 4: Falling edge-triggered flip-flop.
4. Results
In addition to the reliability improvement, the proposed
modification of the SAFF operates 5-10% faster than the
design with the cross transistor [2], as shown in
Figure 6. This flip-flop was designed in a 0.18 µm Leff
CMOS technology, with the improved output stage [3], and
optimized to drive loads between 100 and 150 fF. The
clock-to-output delay is 190 ps when driving a 100 fF
load at both outputs.
It is also possible to design a falling edge-triggered
SAFF with the same driving capability (Figure 4). This
flip-flop has the same setup time of -20 ps, and its delay
characteristic is equidistant to that of the rising-edge
SAFF, with a clock-to-output delay of 240 ps when driving
a 100 fF load. This is not the case with the originally
proposed design, based on cross-coupled NOR gates [1,2].
The results in Figure 6 are obtained with the same
transistor sizes in the output stage [3]. The two lines
for the SAFFs from [1] in Figure 6 represent the cases in
which i) only one or ii) both outputs are loaded.
The improved output stage design [3] has equal delays for
both outputs, independent of the load on the other
output.
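As a worked example, the delay metric from the Introduction (setup time plus clock-to-output delay) can be applied to the figures quoted above. This is only an illustrative calculation: it assumes that the -20 ps setup time applies to both the rising-edge and the falling-edge flip-flop at a 100 fF load, and the helper name `total_delay_ps` is hypothetical.

```python
def total_delay_ps(setup_ps, clk_to_q_ps):
    """Flip-flop delay as the sum of setup time and clock-to-output delay."""
    return setup_ps + clk_to_q_ps

# Values quoted in the text for a 100 fF load (setup time assumed equal for both).
rising_edge = total_delay_ps(-20, 190)
falling_edge = total_delay_ps(-20, 240)

print(rising_edge, falling_edge, falling_edge - rising_edge)  # 170 220 50
```

With these numbers the falling-edge version is 50 ps slower at this load.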
5.  Conclusion
A new design of a sense amplifier-based flip-flop is
presented that eliminates floating nodes in the sense
amplifier stage. By using decoupling inverters, the
modification also improves the switching speed.
In addition, a falling-edge flip-flop was designed with an
equidistant delay vs. load characteristic.
References
[1] M. Matsui et al., “A 200 MHz 13 mm² 2-D DCT
Macrocell Using Sense-Amplifying Pipeline Flip-Flop
Scheme,” IEEE Journal of Solid-State Circuits, vol. 29,
no. 12, pp. 1482-1490, Dec. 1994.
[2] D. Dobberpuhl, “The Design of a High Performance
Low Power Microprocessor,” Proceedings International
Symposium on Low-Power Electronics and Design,
August 1996.
[3] B. Nikolic et al., “Sense Amplifier-Based Flip-Flop,”
1999 International Solid-State Circuits Conference,
Digest of Technical Papers, February 1999.
[4] F. Klass, “Semi-Dynamic and Dynamic Flip-Flops
with Embedded Logic,” Symposium on VLSI Circuits,
Digest of Technical Papers, June 1998.
[5] V. Stojanovic, V.G. Oklobdzija, “Comparative
Analysis of Master-Slave Latches and Flip-Flops for
High-Performance and Low-Power Systems,” IEEE
Journal of Solid-State Circuits, vol. 34, no. 4, pp. 536-
548, April 1999.
Figure 5: Typical SAFF waveforms (traces: Clk, S, R, R in [2], and the outputs Q).
Figure 6: Delay vs. load for different SAFFs. Curves: falling-edge SAFF with NOR [1], rising-edge SAFF with NAND [1], falling-edge SAFF (this work), rising-edge SAFF (this work), rising-edge SAFF from [3].