An Overlap-Contention Free True-Single-Phase Clock Dual-Edge-Triggered Flip-Flop by Bonetti, Andrea et al.
Andrea Bonetti, Adam Teman and Andreas Burg
Telecommunications Circuits Lab (TCL), École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Email: {andrea.bonetti, adam.teman, andreas.burg}@epfl.ch
Abstract—Dual-edge-triggered (DET) synchronous operation is
a very attractive option for low-power, high-performance designs.
Compared to conventional single-edge synchronous systems, DET
operation is capable of providing the same throughput at half
the clock frequency. This can lead to significant power savings
on the clock network that is often one of the major contributors
to total system power. However, in order to implement DET
operation, special registers need to be introduced that sample
data on both clock-edges. These registers are more complex
than their single-edge counterparts, and often suffer from a
certain amount of clock-overlap between the main clock and
the internally generated inverted clock. This overlap can cause
contention inside the cell and lead to logic failures, especially
when operating at scaled power supplies and under process
variations that characterize nanometer technologies. This paper
presents a novel, static DET flip-flop (DET-FF) with a
true-single-phase clock that completely avoids clock overlap hazards by
eliminating the need for an inverted clock edge for functionality.
The proposed DET-FF was implemented in a standard 40 nm
CMOS technology, showing full functionality at low-voltage
operating points, where conventional DET-FFs fail. Under a
near-threshold, 500 mV supply voltage, the proposed cell also provides
a 35% lower CK-to-Q delay and the lowest power-delay-product
compared to all considered DET-FF implementations.
I. INTRODUCTION
The design of energy-efficient circuits remains one of the
main challenges in the field of digital integrated systems [1]. A
large portion of the power dissipated in VLSI architectures is
attributed to clock distribution, consuming as much as 45% of
the total system power [2]. Clock networks are characterized
by a 100% activity factor, charging and discharging their
parasitic capacitors during each cycle, and thereby leading
to power dissipation that is directly proportional to clock
frequency. For this reason, among the different applications,
high-speed, high-throughput designs are especially affected by
this issue.
One well-known approach for reducing the clock power is
dual-edge-triggered (DET) synchronous operation. By sampling
data on both the rising and falling edges of the clock, the
clock frequency can be reduced by 50% without changing the
system throughput. This directly cuts the power dissipation of
the clock network in half, leading to significant overall system
power savings. However, implementation of DET operation
requires the introduction of registers that sample, store, and
propagate their input at both clock edges. While these
dual-edge-triggered flip-flops (DET-FFs) are more complex and
generally larger than their single-edge-triggered (SET)
counterparts, they can be designed to be more energy-efficient [3],
thereby providing additional power savings.
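The saving described above can be illustrated with the standard first-order dynamic power relation P = α·C·VDD²·f. The sketch below is only an illustration: the capacitance value and the function names are hypothetical, and it assumes that all non-clock power stays unchanged when moving to DET operation. Only the 100% activity factor and the 45% clock share are taken from the text.

```python
def clock_power(c_clk_farads, vdd_volts, f_hz, activity=1.0):
    """First-order dynamic power of a clock network: P = alpha * C * VDD^2 * f."""
    return activity * c_clk_farads * vdd_volts ** 2 * f_hz


def system_saving(clock_share, freq_ratio=0.5):
    """Fraction of total system power saved when only the clock network
    power scales with freq_ratio (assumption: everything else is unchanged)."""
    return clock_share * (1.0 - freq_ratio)


# Hypothetical numbers: 10 pF of clock capacitance and a 0.5 V supply.
p_set = clock_power(10e-12, 0.5, 1e9)    # single-edge system clocked at 1 GHz
p_det = clock_power(10e-12, 0.5, 500e6)  # DET system with the same throughput

print(p_det / p_set)        # ratio of DET to single-edge clock power
print(system_saving(0.45))  # saving if the clock is 45% of total power
```

Under these assumptions, halving the clock frequency halves the clock power, and a clock network that accounts for 45% of the system power yields an upper bound of roughly 22.5% on the total saving.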
The implementation of storage cells that are triggered on
both clock edges is a well researched subject. Many solutions
for the design of DET-FFs have been proposed [4]–[8]. The
most popular of these cells is the transmission-gate latch-MUX
(DET-TGLM) [4] due to its simple implementation that is
based on two latches and an output multiplexer (MUX). An
alternative configuration can be assembled by replacing the
transmission gates with C2MOS gates [11], resulting in the
C2MOS latch-MUX (DET-C2MOSLM) [5]. A different approach
is to generate a short pulse on each clock edge, thereby
realizing a pulse-triggered DET-FF, as shown in [6]. More
advanced DET-FFs that limit the switching activity through
pulse generation and precharge conditions are the conditional
discharge flip-flop (DET-CDFF) [7] and the symmetric pulse
generator flip-flop (DET-SPGFF) [8].
While these topologies have been demonstrated on various
applications, few of them have been examined in deeply
scaled process technologies under voltage scaling, commonly
used for the implementation of energy-efficient systems. In
particular, in the presence of considerable process variation,
the use of both clock phases usually introduces some extent
of clock-overlap, which can lead to race conditions and other
detrimental circuit behavior. For example, when considering
the traditional DET-TGLM, process, voltage and temperature
(PVT) variations can cause this overlap to increase to a point
where the currently held data is overwritten, resulting in a
fatal logic error.
Contribution: in this paper, we solve this clock overlap
problem by presenting the first static true-single-phase-clock
(TSPC) DET-FF. By implementing the cell with TSPC circuits
and an internal dual-feedback mechanism, completely static
operation is achieved, enabling robust operation under voltage
scaling and process variations. To demonstrate its functionality
in nanoscaled technologies, the cell was implemented
in a 40 nm CMOS process, showing full functionality at a
near-threshold, 500 mV supply voltage (VDD) under extensive
Monte Carlo (MC) statistical simulations. In addition to being
the only topology to continue to operate robustly under these
conditions, the proposed cell also provides the lowest
CK-to-Q delay (tcq) and the best power-delay product (PDP) when
compared to other leading DET-FF solutions.
Outline: the rest of this paper is organized as follows.
Section II presents the clock-overlap hazard in the traditional
DET-TGLM circuit. The proposed static dual-edge-triggered
flip-flop with true-single-phase clock (SDET-TSPCFF) is presented
in Section III to address this hazard and enable
low-voltage operation. Section IV provides simulation results for
the proposed cell and a comparison with other popular
DET-FF implementations. Finally, the conclusions are reported in
Section V.
Fig. 1. Schematic of the DET-TGLM (two latches and an output MUX; transistors M1–M28; signals CK, CKI, CKB, D and QB; storage nodes SNP and SNN).
II. CLOCK-OVERLAP FAILURE RISK IN DET-TGLM CELLS
This section explains the risk of failure in DET-FFs due to
clock-overlap. The DET-TGLM gate was chosen to demonstrate
this hazard, as it is the most popular DET-FF implementation.
Accordingly, a brief overview of the DET-TGLM cell
is provided in Section II-A, followed by a detailed analysis
of the risk of failure due to clock-overlap in Section II-B.
Note that while this discussion is specific to the
DET-TGLM cell, a similar analysis applies to many other DET-FF
implementations.
A. Overview of the DET-TGLM
Among the various DET-FFs, the DET-TGLM is one of
the most commonly implemented topologies, primarily due
to its simple structure and straightforward behavior. Two
values are stored internally in two separate latches that are
connected through an output MUX, as illustrated in Fig. 1. The
latches are implemented with input transmission gates (M3–
M4 and M13–M14), inverters (M5–M6 and M15–M16) and
clocked inverters (M7–M10 and M17–M20), while the MUX
is exclusively composed of transmission gates (M11–M12 and
M21–M22). Each latch is transparent during a different phase
of the clock, and the value stored in the opaque latch is passed
through the MUX to the output.
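The latch-MUX behavior described above can be sketched at the logic level as follows. This is a minimal behavioral model, not a circuit description: the class and method names are hypothetical, signal polarity (for example the inverted QB output) is ignored, and the choice that the bottom latch (SNN) is transparent while the clock is low follows the example discussed in Section II-B.

```python
class DetLatchMux:
    """Logic-level model of a latch-MUX DET flip-flop (polarity ignored)."""

    def __init__(self):
        self.snp = 0  # node held by the top latch
        self.snn = 0  # node held by the bottom latch

    def step(self, ck, d):
        """Apply one time step with clock level ck and data d; return the output."""
        if ck:
            self.snp = d      # top latch is transparent while CK is high
            return self.snn   # MUX selects the opaque (bottom) latch
        self.snn = d          # bottom latch is transparent while CK is low
        return self.snp       # MUX selects the opaque (top) latch


ff = DetLatchMux()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 0)]
outputs = [ff.step(ck, d) for ck, d in stimulus]
print(outputs)  # [0, 1, 1, 0, 0, 1]
```

After every clock transition, rising or falling, the output takes the value that was on D just before that transition, which is the dual-edge-triggered behavior.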
In addition to its simple structure, this cell has been shown
to be one of the most energy-efficient DET-FFs for high-speed
operation [3]. During its respective transparent window, each
latch passes the input data to its cascaded transmission gate
(SNP and SNN in Fig. 1), such that the data only needs to
propagate through the output MUX on the next clock edge.
This provides a short tcq, which makes the DET-TGLM suitable
for high-frequency applications. Finally, this circuit does
not rely on pulse-triggered circuits or precharge (dynamic)
conditions, such as those required by [6]–[8], making it less
sensitive to variations from technology and voltage scaling.
Fig. 2. Voltage waveforms of the signals affected by the clock-overlap in
the DET-TGLM (panels: D, CK, SNN and QB, each in [V] versus time [ns]).
The failure is represented with continuous lines while the
segmented lines show the case when the circuit overcomes the hazard.
B. Clock-Overlap Failure Risk
The static operation of the DET-TGLM provides inherent
robustness; however, one problematic feature remains: its
dependence on both clock phases for functionality. To accommodate
this need, the inverted clock signal is internally
generated with an inverter (M25–M26). A second inverter
(M27–M28) is implemented to internally buffer the input clock
and ensure a controlled and fast slew rate of CKI. Due to the
intrinsic delay of the second inverter, CKB and CKI share the
same value during an interval of time that is ideally equal to
this delay. The time during which both clock signals are high is
defined as positive clock-overlap (PCO), while negative clock-
overlap (NCO) occurs when both clock signals are low. These
overlap phases occur immediately after each transition of the
clock signal that is used to generate the inverted one. Since
both clock signals are equal during such an overlap, there is
always one type of transistor (either NMOS or PMOS) turned
on in each transmission gate of the MUX (M11–M12 and
M21–M22). A conducting path is therefore generated between
the inputs of the MUX in the DET-TGLM, causing an internal
race between the values that are stored in the two latches. This
clock overlap time is heavily dependent on PVT variations and
wire parasitics. If the overlap is too large, the voltage value
stored in one latch will overwrite the value stored in the other
latch, resulting in a storage failure.
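The timing of the overlap windows can be sketched with ideal step edges. The following is an illustrative model under stated assumptions: CKB is taken as CK inverted after a delay `t_inv_ckb`, CKI as CKB inverted after a delay `t_inv_cki`, and the wiring of the MUX transmission-gate devices in `mux_devices_on` is inferred from the description rather than read from the schematic. All names and delay values are hypothetical.

```python
def overlap_windows(ck_edges, t_inv_ckb, t_inv_cki):
    """Return (start, end, kind) overlap windows for a list of CK edges.

    ck_edges: list of (time, 'rise' | 'fall') for the external clock CK.
    CKB switches t_inv_ckb after CK; CKI switches t_inv_cki after CKB.
    """
    windows = []
    for t, direction in ck_edges:
        start = t + t_inv_ckb    # CKB has switched
        end = start + t_inv_cki  # CKI switches only here
        # After a rising CK edge, CKB is already low while CKI is still low.
        kind = "NCO" if direction == "rise" else "PCO"
        windows.append((start, end, kind))
    return windows


def mux_devices_on(cki, ckb):
    """Which device of each MUX transmission gate conducts (assumed wiring)."""
    top = {"nmos": ckb == 1, "pmos": cki == 0}     # top gate passes when CK is low
    bottom = {"nmos": cki == 1, "pmos": ckb == 0}  # bottom gate passes when CK is high
    return top, bottom


# Hypothetical delays in picoseconds.
print(overlap_windows([(0.0, "rise"), (1000.0, "fall")], 20.0, 30.0))
print(mux_devices_on(cki=0, ckb=0))  # NCO: the PMOS of both gates conducts
```

In this simplified view the overlap duration equals the delay of the second inverter, so any variation that slows this inverter directly lengthens the window during which both MUX inputs are connected.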
Fig. 2 demonstrates the behavior of a DET-TGLM gate
under a typical hazardous disruption. In this example, the clock is
initially low and a logic-0 is stored at SNP and passed through
the top transmission gate to the output. During this phase, the
bottom latch is transparent, passing a logic-1 from the input
(D) to SNN. After the rising edge of the clock, both CKI
and CKB are low during the NCO, and the PMOS transistor
in each transmission gate is conducting. If the NMOS that
is pulling down SNP (M6) drives more current than the
PMOS that is driving SNN (M15) and if the overlap time is
sufficiently long, the voltage value on SNN will drop until it
is overwritten by a logic-0 through the cross-coupled feedback
of the bottom latch. Following the overlap period, this logic-0
value is latched and driven through the MUX to provide the
wrong value at the output. The transient waveforms of SNN
and QB during a failing event are shown as a solid line in
Fig. 2, while a case where the circuit overcomes the hazard is
shown with a dotted line. The same failure risk can be studied
for the case where CKB and CKI are high, and a logic-0 value
stored at SNP is the critically affected value.
Fig. 3. Distribution of the minimum voltage value reached by the storage
node SNN during the clock-overlap (histogram: number of events versus
logic-1 minimum voltage level [V]; failure threshold = 0.199 V; 15 failures).
As previously described for the case of NCO, a failure
occurs when the voltage at SNN drops below a critical
threshold that results in a latched logic-0 level. To evaluate
the probability of such an occurrence, we employ statistical
MC simulations, applying global and local process variations
to a DET-TGLM gate during a NCO phase. Fig. 3 displays the
obtained distribution of the minimum voltage level of SNN for
10,000 MC samples applied to a DET-TGLM implemented in
a standard 40 nm CMOS process. The simulations were run
with a near-threshold VDD of 500 mV at 125 °C. Out of the
10 k samples, 15 resulted in a storage failure, as can be seen by
the non-zero probability of voltage levels centered around 0 V.
In addition, the failure threshold can be estimated at 0.199 V,
which is the minimum voltage level for a stored logic-1 that is
still overcome by the gate without causing a failure. However,
it is clear from the presented distribution and the large number
of failures that the DET-TGLM is not a viable candidate for
near-threshold operation.
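To put the reported count into perspective, the observed failure rate and a confidence interval for it can be computed directly from the 15 failures in 10,000 samples. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not a method used in the paper; the function names are hypothetical.

```python
import math


def failure_rate(failures, samples):
    """Observed fraction of Monte Carlo samples that failed."""
    return failures / samples


def wilson_interval(failures, samples, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = failures / samples
    denom = 1.0 + z * z / samples
    center = (p + z * z / (2 * samples)) / denom
    half = z * math.sqrt(p * (1 - p) / samples + z * z / (4 * samples ** 2)) / denom
    return center - half, center + half


rate = failure_rate(15, 10_000)
low, high = wilson_interval(15, 10_000)
print(f"observed failure rate: {rate:.4%}")
print(f"approximate 95% interval: {low:.4%} to {high:.4%}")
```

An observed rate of 0.15% per cell is large for a design that contains many thousands of flip-flops, which supports the conclusion drawn above.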
III. THE PROPOSED SDET-TSPCFF
In the previous section, the traditional DET-TGLM gate was
shown to be unsuitable for near- or sub-threshold operation in
scaled technologies, due to the risk of clock overlap failures. In
order to overcome these risks, we propose a fully-static, TSPC
alternative to the DET-TGLM and other dual-phase solutions.
Other TSPC DET-FFs have been shown in the past [6]–
[10]; however, these gates all rely on temporary dynamic
storage [9], [10] and/or generated pulses [6]–[8], which make
them sensitive to both process variations and voltage scaling.
The schematic of the proposed SDET-TSPCFF is shown
in Fig. 4. Similar to other latch-MUX DET-FFs, new data is
written to an internal storage node during one clock phase
and subsequently latched and driven to the output following
the clock transition. This is achieved without the need for an
inverted clock signal by implementing the storage elements
with a pair of TSPC latch-MUX branches (M1–M18 and
M19–M36). These branches are loosely based on the classic
TSPC latch [12] with the addition of two internal feedback
mechanisms that ensure strong data levels and fully-static
operation to enable robust, low-voltage functionality.
Fig. 4. Schematic of the proposed SDET-TSPCFF (transistors M1–M36; signals CK, D and Q; internal nodes SNP, SNN, DBP and DBN).
To further explain the circuit operation and its feedback
mechanisms, we will focus on the top branch in Fig. 4 (M1–
M18), with the opposite branch operating in a completely
symmetric fashion. When CK is high, devices M1–M8 act
as a buffer, passing the value at D to SNP. This buffer does
not encounter any contention with other parts of the circuit,
as M5, M10 and M11 are all cut off. In addition, in this state,
the output of the top branch presents a high-impedance to
Q, as M17 cuts off the pull-up to this node and M12 pulls
down the gate of M18, cutting off the pull-down to the output.
When CK goes low, the current state of SNP is latched, since
M7 cuts off the pull-down and M2 cuts off the pull-down
path to DBP, disabling a pull-up through M6. It is essential
to ensure that DBP does not drift and possibly turn on M6,
and therefore, a feedback path from SNP to M4 maintains a
logic-1 at DBP if SNP was latched at 0. Moreover, in this
state, devices M9–M15 comprise a cross-coupled inverter that
holds the level at SNP through a strong positive-feedback
loop. Finally, devices M16–M18 function as a tri-state inverter,
selectively and robustly passing the storage value to the output.
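The branch behavior described above can be summarized with a small logic-level sketch. It is an abstraction under stated assumptions: output polarity is ignored, the high-impedance state is represented by `None`, the bottom branch is assumed to be transparent while CK is low (the text only states that it is symmetric), and the internal feedback devices are not modeled. The class and function names are hypothetical.

```python
class TspcBranch:
    """Logic-level model of one latch-MUX branch (output polarity ignored)."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level  # CK level that opens the latch
        self.storage = 0                            # SNP or SNN

    def step(self, ck, d):
        """Return the driven output value, or None when the output is high-impedance."""
        if ck == self.transparent_level:
            self.storage = d   # buffer passes D to the storage node
            return None        # tri-state output stage is cut off
        return self.storage    # latched value is driven to Q


def sdet_step(top, bottom, ck, d):
    """Advance both branches with the same single clock and resolve Q."""
    drivers = [v for v in (top.step(ck, d), bottom.step(ck, d)) if v is not None]
    assert len(drivers) == 1, "exactly one branch should drive Q"
    return drivers[0]


top = TspcBranch(transparent_level=1)
bottom = TspcBranch(transparent_level=0)
stimulus = [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1), (0, 1)]
q = [sdet_step(top, bottom, ck, d) for ck, d in stimulus]
print(q)  # [0, 1, 1, 0, 0, 1]
```

Because both branches are controlled by the same clock level, there is no interval in this model where both drive the output or where both are transparent, which is the property the single-phase design aims for.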
While the 36 transistors required to implement this gate
exceed the 28 required by the DET-TGLM or many of
the other DET-FFs, the additional area enables static
overlap-contention free operation, thereby providing variation-tolerant
functionality at scaled supply voltages and for advanced process
technologies.
IV. SIMULATIONS AND RESULTS
The performance of the proposed cell is evaluated considering
two groups of simulations. First, the resilience of the storage
cell against failures is tested with MC simulations to show
its robustness. Second, the SDET-TSPCFF is characterized in
terms of speed and power consumption in order to compare
it with the other popular static DET-FF implementations. All
circuits were implemented with standard-VT transistors in a
40 nm CMOS technology for comparison.
In the first considered testbench, all possible combinations
of data are written inside the storage cell, and subsequently
checked at the output during the next clock phase. The output
value is continuously sampled and failures are reported if
it differs from the expected value. Both process and mismatch
variations are taken into account while running MC
simulations. Furthermore, near-threshold operation is targeted
by setting VDD to 500 mV. This set of runs is executed for
each of the following temperatures: 0 °C, 25 °C and 125 °C.
An example of the family plots obtained through a set of
simulations is shown in Fig. 5. The proposed cell provided
the correct output for all 10 k samples at each temperature
point, indicating robust functionality under these operating
conditions.
Fig. 5. Monte Carlo family plots of the SDET-TSPCFF (panels: CK, D and Q, each in [V] versus time [ns]).
In order to evaluate the performance of the proposed cell,
it was compared with other latch-MUX based storage cells
that do not rely on pulse generation. The characterization of
the storage cells is performed using the testbench proposed
in [13], where several state-of-the-art DET-FFs are simulated
and compared. All the simulations were applied to 40 nm
implementations of the considered circuits with VDD=0.5 V,
at 25 °C and at a typical process corner. The frequency of the
input clock is 500 MHz, corresponding to a cell throughput of
1 GHz with data activity of 25%.
The results are summarized in Table I, showing that in
addition to solving the clock-overlap failures, the proposed
SDET-TSPCFF also provides the lowest CK-to-Q delay (tcq).
The DET-C2MOSLM shows the worst performance in terms
of speed and dynamic power consumption, as the presence of
four stacked transistors severely compromises its performance
at near-threshold operation. Therefore, much wider transistors
are required in order to operate correctly under these conditions,
resulting in a severe area and power consumption penalty.
The advantage in tcq of the proposed circuit as compared
to the DET-TGLM cell is due to the reduced conductivity
of the DET-TGLM's transmission gates at scaled supply voltages. The
SDET-TSPCFF also provides a lower clock load compared
to the DET-TGLM, defined as the number of minimum-size
transistors controlled by a clock signal. The leakage and total
power consumption of the presented cell are slightly higher
than those of the DET-TGLM; however, its PDP is lower,
confirming that the SDET-TSPCFF is the best option in terms
of energy-efficiency. Note that in any case, the only fully
functional cell at this operating point is the proposed
SDET-TSPCFF, and therefore, it is the unequivocal choice for DET
operation in low-power, nanoscaled systems targeted at
near-threshold operation.

TABLE I
SUMMARY OF THE STORAGE CELLS CHARACTERISTICS (1)

                                     DET-TGLM   DET-C2MOSLM   This work
Transistor Count                     30         22            38
Clock Load                           16         8             14
CK-to-Q Delay (tcq) [ps]             230.9      395.3         149.9
Internal Power (2) (Pint) [nW]       344.4      386.2         287.5
Total Power (2) (Ptot) [nW]          431.6      495.3         449.9
Leakage Current (Ileak) [nA]         12.7       15.4          17.1
Power-Delay Product (3) (PDP) [aJ]   99.7       196.8         67.5

(1) Based on the testbench proposed in [13].
(2) At 500 MHz input clock frequency and 25% data activity.
(3) PDP = tcq · Ptot
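The PDP definition in Table I (PDP = tcq · Ptot) can be checked with a few lines of arithmetic. The sketch below recomputes the product from the tabulated tcq and Ptot values; the dictionary layout and function name are hypothetical. The recomputed values agree with the tabulated PDP column to within about 1 aJ, and the small differences are presumably due to rounding of the tabulated inputs.

```python
TABLE_I = {
    "DET-TGLM":    {"tcq_ps": 230.9, "ptot_nw": 431.6, "pdp_aj": 99.7},
    "DET-C2MOSLM": {"tcq_ps": 395.3, "ptot_nw": 495.3, "pdp_aj": 196.8},
    "This work":   {"tcq_ps": 149.9, "ptot_nw": 449.9, "pdp_aj": 67.5},
}


def pdp_aj(tcq_ps, ptot_nw):
    """PDP = tcq * Ptot; ps * nW = 1e-21 J, so divide by 1000 to obtain aJ."""
    return tcq_ps * ptot_nw / 1000.0


for name, row in TABLE_I.items():
    recomputed = pdp_aj(row["tcq_ps"], row["ptot_nw"])
    print(f"{name}: recomputed {recomputed:.1f} aJ, tabulated {row['pdp_aj']} aJ")

# Relative tcq improvement of the proposed cell over the DET-TGLM.
tcq_reduction = 1.0 - TABLE_I["This work"]["tcq_ps"] / TABLE_I["DET-TGLM"]["tcq_ps"]
print(f"tcq reduction vs DET-TGLM: {tcq_reduction:.1%}")
```

The tcq reduction relative to the DET-TGLM comes out at about 35%, consistent with the figure quoted in the abstract.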
V. CONCLUSIONS
This paper presented a novel dual-edge-triggered flip-flop
topology to solve the inherent clock-overlap risk in the majority
of the previously presented DET-FFs. The failure risk due
to clock-overlap was demonstrated on a popular DET-TGLM
gate, showing an unacceptable error-rate at near-threshold voltages
in a 40 nm CMOS process. The proposed fully-static
true-single-phase-clock DET-FF was shown to be fully functional
at a similar operating point, under local and global process
variations and at a wide range of temperatures. In addition,
the proposed cell was found to provide the best performance
and energy-efficiency among static DET-FF options.
ACKNOWLEDGMENTS
This project is partially funded by Nano-Tera.ch with Swiss
Confederation financing. The authors would like to thank
Nicholas Preyss for his invaluable help in carrying out this
work.
REFERENCES
[1] R.G. Dreslinski et al., “Near-threshold computing: reclaiming Moore’s law through energy efficient integrated circuits,” Proceedings of the IEEE, vol. 98, pp. 253-266, Feb. 2010.
[2] H. Kawaguchi and T. Sakurai, “A reduced clock-swing flip-flop (RCSFF) for 63% power reduction,” IEEE JSSC, vol. 33, pp. 807-811, May 1998.
[3] M. Alioto et al., “DET FF topologies: A detailed investigation in the energy-delay-area domain,” IEEE ISCAS, 2011, pp. 563-566.
[4] R. Llopis and M. Sachdev, “Low power, testable dual edge triggered flip-flops,” ISLPED, 1996, pp. 341-345.
[5] A. Gago et al., “Reduced implementation of D-type DET flip-flops,” IEEE JSSC, vol. 28, pp. 400-402, Mar. 1993.
[6] J. Tschanz et al., “Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors,” ISLPED, 2001, pp. 147-152.
[7] P. Zhao et al., “High-performance and low-power conditional discharge flip-flop,” IEEE TVLSI, vol. 12, pp. 477-484, May 2004.
[8] N. Nedovic et al., “A low power symmetrically pulsed dual edge-triggered flip-flop,” IEEE ESSCIRC, 2002, pp. 399-402.
[9] M. Afghahi, “A robust single phase clocking for low power, high-speed VLSI applications,” IEEE JSSC, vol. 31, pp. 247-254, Feb. 1996.
[10] J.S. Wang, “A new true-single-phase-clocked double-edge-triggered flip-flop for low-power VLSI designs,” IEEE ISCAS, 1997, pp. 1896-1899.
[11] Y. H. Suzuki et al., “Clocked CMOS calculator circuitry,” IEEE JSSC, vol. 8, pp. 462-469, Dec. 1973.
[12] J. Rabaey et al., Digital Integrated Circuits, 2nd ed., Prentice Hall, 2003.
[13] N. Nedovic et al., “A test circuit for measurement of clocked storage element characteristics,” IEEE JSSC, vol. 39, pp. 1294-1304, Aug. 2004.
