REVIEW OF HIGH-SPEED DIGITAL CMOS CIRCUITS by Elrabaa1, Muhammad E. S.
The 6th Saudi Engineering Conference, KFUPM, Dhahran, December 2002  Vol. 4.  167 
REVIEW OF HIGH-SPEED DIGITAL CMOS CIRCUITS 
 
 
Muhammad E. S. Elrabaa1 
 
1:Assistant professor, COE department, KFUPM 
E-mail: elrabaa@kfupm.edu.sa 
 
 
ABSTRACT 
This paper presents a comprehensive review of the major state-of-the-art high-speed CMOS digital
circuits, focusing in particular on dynamic circuits such as conventional DOMINO,
conditional-evaluation DOMINO, and contention-free DOMINO. Some high-performance non-dynamic
(static) circuit techniques, such as dual-threshold (DVT) circuits, are also reviewed. The relative
performance of these circuits in terms of speed, power, and noise margins is presented. The effects
of technology scaling into the deep sub-micron regime on these techniques are also evaluated.
 
Keywords: Digital Circuits, CMOS, Scaling, Leakage, Noise Margins 
SUMMARY (translated from the Arabic abstract)
This paper presents a comprehensive review of the most important modern methods for designing
high-speed digital CMOS circuits, specifically dynamic circuits such as the various types of
DOMINO. Other high-performance circuits, such as dual-threshold circuits, are also presented and
evaluated. These circuits are compared in terms of speed, power consumption, and noise immunity.
The effect of aggressive transistor scaling on these circuits is also evaluated and presented.
   
 
 
1. INTRODUCTION 
In recent years, two principal techniques for achieving high-speed logic in standard
silicon-based CMOS technology have emerged: dynamic CMOS circuits (DOMINO) and low-threshold
(LVT) CMOS circuits. Since their introduction [Krambeck et al., 1982], dynamic CMOS gates
(DOMINO), shown in Figure 1, have been the main option for implementing high-speed logic paths.
Their speed and area advantages over static CMOS circuits [Fletcher, 1994] made them
indispensable for high-performance applications such as microprocessors. However, because of
DOMINO’s lower noise margins, design complexity, and dynamic behavior, LVT CMOS circuits have
started gaining ground as a static high-speed alternative.
 
 
In this paper, both techniques are reviewed along with their advantages and disadvantages, and
recently proposed enhancements are presented. Section 2 evaluates the different DOMINO
implementations, with emphasis on the newly proposed contention-free DOMINO [Elrabaa et al.,],
which circumvents the major problems associated with conventional DOMINO. Section 3 discusses
LVT circuit techniques, along with a detailed discussion of recently proposed modifications
that reduce the leakage. Finally, conclusions are presented in Section 4.
 
All simulation results presented in this paper were generated using HSPICE and a 0.25µm, 
2.5V CMOS technology with nominal threshold voltages of 540 mV and -580 mV for the 
NMOS and PMOS devices, respectively. 
2. Dynamic Circuits (DOMINO) 
DOMINO has dominated high-speed applications since its introduction because of its
single-transition (monotonic) evaluation, which means smaller area, less gate capacitance, and
natural pipelining. However, as chip power grew, lowering the supply voltage emerged as the
most favorable way of reducing power [Brodersen et al., 1993]. To prevent speed degradation,
threshold voltages had to be lowered as well, which in turn increased the leakage currents.
While the increased leakage does not affect the functionality of static CMOS, it reduces the
noise margins (NM) of dynamic circuits and ultimately leads to their failure. Dual-Vt (DVT)
technologies were proposed as a solution to this problem [Thompson et al., 1997]: LVT devices
are used in static CMOS gates and high-Vt (HVT) devices are used in DOMINO gates. This option
is not only more expensive than the single-Vt option, but it also prevents the DOMINO gates
from benefiting from the higher speed of the LVT devices, thus reducing DOMINO’s speed
advantage. Also, as was demonstrated in [Thompson et al., 1997], it is not possible to optimize
two devices with two different Vts for the same channel length. The HVT device would therefore
have a more compromised performance than a similar device in a single-Vt technology, further
degrading DOMINO’s performance. All this makes it very difficult for products to meet their
speed targets as the technology and supplies scale down. Another recent dual-Vt implementation
of DOMINO focused on reducing the leakage while reaping the speed advantages of LVT devices
[Kao, 1999]. However, it paid no attention to noise margins, which is impractical since DOMINO
is especially vulnerable to noise.
 
A modified DOMINO gate that is contention-free (CF-DOMINO) was proposed [Elrabaa et al.,] to
resolve the trade-off between speed and NM and to extend DOMINO’s operation into the deep LVT
regime. In the following, the trade-off between NM and speed for conventional DOMINO is first
demonstrated as a function of Vt, and then the performance of the CF-DOMINO circuit is
presented.
2.1 Conventional DOMINO 
Referring to Figure 1, the conventional DOMINO operates as follows. During the pre-charge
phase (when the clock is low), the DOMINO output is pre-charged to VDD and the keeper is
turned ON. When the clock goes high (the evaluation phase), the DOMINO output is either
discharged to GND or remains high at VDD, depending on the inputs. If all inputs are low, the
keeper holds the output high despite any noise present, so the larger the keeper width
(Wkeeper), the better the NM. When at least one of the inputs is high, the output is pulled low
and the keeper is turned OFF. The larger the keeper, the stronger the contention between the
pull-down NMOS device and the keeper, and the slower the gate. This is the basic speed-NM
trade-off of the conventional DOMINO. The trade-off becomes more severe at lower Vts because of
the increased NMOS leakage. Figure 2 shows the voltage waveforms of a DOMINO gate; the DOMINO
output starts switching while the keeper is still ON (the contention).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3 shows the DC characteristics of an 8-input conventional DOMINO NOR gate for
several Vts with the keeper size held constant. An 8-input NOR was chosen because DOMINO
is mostly favored for wide fan-in NOR gates, and these represent the worst-case scenario for
leakage-induced noise margin degradation. The figure shows that the low NM decreases as
Vt decreases.
 
Figure 1. A 2-input conventional DOMINO NOR gate.

Figure 4 shows the normalized delay of a 3-stage chain of 8-input DOMINO NOR gates with
a fan-out of 3 versus Vt for two cases: constant NM (obtained by varying the keeper’s size) and
constant keeper size (i.e. variable NM). The normalized keeper size for the constant-NM case
is also shown on the same graph. The NM was defined according to the criterion in
[Brodersen et al., 1993]: the input voltage above ground that causes a 10% drop from VDD at
the DOMINO output. The NM was set to 10% of VDD. The figure shows the trade-off
between NM and speed as Vt is reduced. To keep the NM of the DOMINO gate constant, the keeper
size had to be increased. This, in turn, reduced the speed gain at lower Vts and eventually
increased the delay. The constant-keeper curve shows the maximum possible gain in speed from
lowering the Vt. However, in the case of conventional DOMINO, this gain can only be achieved
if the reduction in NM with Vt is wrongfully ignored.
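
The NM criterion used above (the input voltage at which the DOMINO output has dropped 10% below VDD) can be applied to any sampled DC transfer curve. The following is a minimal sketch of that extraction, assuming the curve is available as two lists of voltages and that linear interpolation between samples is acceptable. The function name `noise_margin` and the sweep data are hypothetical and are not taken from the paper's simulations.

```python
from typing import Sequence


def noise_margin(v_in: Sequence[float], v_out: Sequence[float],
                 vdd: float, drop: float = 0.10) -> float:
    """Return the input voltage at which v_out first falls to (1 - drop) * vdd.

    v_in must be increasing; linear interpolation is used between samples.
    """
    target = (1.0 - drop) * vdd
    for i in range(1, len(v_in)):
        prev_out, next_out = v_out[i - 1], v_out[i]
        if prev_out >= target > next_out:
            # Interpolate between the two samples that bracket the target level.
            frac = (prev_out - target) / (prev_out - next_out)
            return v_in[i - 1] + frac * (v_in[i] - v_in[i - 1])
    raise ValueError("output never drops below the target level")


# Hypothetical DC sweep in volts (illustrative only, not data from the paper).
vin = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
vout = [2.5, 2.5, 2.45, 2.05, 0.6, 0.1]
nm = noise_margin(vin, vout, vdd=2.5)
print(f"NM = {nm:.3f} V ({100 * nm / 2.5:.1f}% of VDD)")
```

Sweeping the keeper width in the simulator and re-running such an extraction until the NM reaches 10% of VDD would be one way to produce a constant-NM curve; the paper does not state how the keeper sizes were actually found.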
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2.2 Contention-Free DOMINO 
As shown above, the contention between the keeper’s current and that of the NMOS devices in
the evaluation phase creates a trade-off between NM and speed as Vt decreases. To resolve
this trade-off and remove the contention, the modified DOMINO gate, CF-DOMINO,
shown in Figure 5, was devised. The inverter that drove the keeper in the conventional DOMINO
was replaced by a 2-input static CMOS NAND gate. One input of the NAND gate is
connected to the DOMINO output and the other is connected to the clock. The devices in the
NAND gate that are connected to the clock input are kept at minimum size to avoid
increasing the clock load. This technique is only valid for footless DOMINO, where all inputs
are synchronized with the clock.

Figure 2. The waveforms of a conventional DOMINO gate (Domino output; amplitude (V) vs. time (ns)).

Figure 3. The DC characteristics of DOMINO as a function of Vt (Domino output (V) vs. input (mV)).

Figure 4. The normalized delay and keeper size vs. Vt (mV), for constant keeper size and constant NM.
 
A conditional keeper technique similar to the CF-DOMINO was proposed in [Alvandpour
et al., 2001] and [Alvandpour et al., 1999]. In that technique, the NAND clock input is delayed
(by at least 2 inverters). This means that the DOMINO output is actually without a
keeper for the duration of this delay, a potentially serious noise problem for the already
noise-sensitive DOMINO. Also, for this technique to be efficient in reducing the gate’s delay,
the fan-in has to be very high (>16) [Alvandpour et al., 2001]. The added inverters also
increase the power consumption. In the presented CF-DOMINO technique, if the clock needs to be
delayed (to match the data delays), the gate does not suffer from the above shortcomings.
 
 
 
 
 
 
 
 
 
The circuit operates as follows. During pre-charge, when the clock is low, the NAND’s
output is high and hence the keeper is OFF. When the clock goes high, if the DOMINO
output remains high, the NAND’s output goes low after one gate delay and turns the keeper
ON. However, if the DOMINO output starts going low, the NAND’s output remains high
and the keeper remains OFF. This eliminates the contention between the keeper
and the NMOS devices in the evaluation phase, since they are never ON simultaneously.
Hence, as Vt is lowered, the noise margin can be kept constant (and equal to its value at high
Vt) by increasing the keeper’s size, without increasing the contention. In fact, the noise
margin of the CF-DOMINO circuit can be made identical to that of static CMOS.
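
The difference between the two keeper controls can be summarized with a zero-delay Boolean abstraction. This is only a sketch: it ignores all analog behavior (gate delays, leakage, partial switching), and the function names are hypothetical. It assumes a PMOS keeper that conducts when its gate is low, driven by an inverter in the conventional gate and by the 2-input NAND in the CF-DOMINO gate, as described in the text.

```python
def conventional_keeper_on(dynamic_out: bool) -> bool:
    """Inverter drives the keeper gate: the keeper conducts while the output is high."""
    inverter_out = not dynamic_out
    return not inverter_out


def cf_keeper_on(dynamic_out: bool, clock: bool) -> bool:
    """NAND(output, clock) drives the keeper gate: the keeper conducts only when
    both the clock and the output are high."""
    nand_out = not (dynamic_out and clock)
    return not nand_out


# State just before the rising clock edge: output pre-charged high, clock low.
# This is the keeper state that the pull-down network meets when evaluation starts.
print("conventional keeper ON at start of evaluation:", conventional_keeper_on(True))
print("CF-DOMINO keeper ON at start of evaluation:   ", cf_keeper_on(True, False))

# Later in the evaluation phase (clock high).
print("CF keeper ON if output stayed high:", cf_keeper_on(True, True))
print("CF keeper ON if output went low:   ", cf_keeper_on(False, True))
```

In this abstraction the conventional keeper is already ON when evaluation begins, which is the source of the contention, while the CF-DOMINO keeper turns ON only after the clock is high and the output has stayed high.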
 
This contention-free operation is demonstrated in Figure 6, which shows the voltage
waveforms of the CF-DOMINO gate in the evaluation phase. The NAND’s output (the keeper input)
remains high when the output goes low in the evaluation phase. The small ‘dip’ in the NAND’s
output is far from sufficient to start turning the keeper ON. It is also worth noting that the
CF-DOMINO gate does not require any special clock timing different from that of practical
conventional DOMINO, as is evident from Figures 2 and 6. Just as in regular DOMINO, the clock
is usually delayed such that precharging is delayed along the logic path to increase the
operating frequency [Harris and Horowitz, 1997]. The clock, however, is never delayed to the
point that it gets into the path delay (i.e. the clock is never designed to arrive after the
data). This is usually accomplished by using two inverters to delay the clock between
consecutive DOMINO stages (which have a static CMOS stage in between).

Figure 5. A 2-input CF-DOMINO NOR gate.
 
If the delay between the clock and data becomes too large, the CF-DOMINO operation will
resemble that of the conventional DOMINO (i.e. it will suffer from contention). Finally, if the
DOMINO path contains fed-forward data (i.e. some inputs to a DOMINO gate arrive
earlier than the others), the fast inputs should be delayed to avoid this pitfall.
 
 
 
 
 
 
 
 
 
 
 
2.3 Delay Comparison 
The delay of a 3-stage, 8-input CF-DOMINO NOR chain with a fan-out of 3 is shown in
Figure 7 as a function of Vt for constant NM. The delay curves of conventional
DOMINO (same setup, fan-in, and fan-out) with constant NM and with constant keeper size (from
Figure 4) are re-plotted on the same graph. All delays are normalized to the delay of the
conventional DOMINO at the high Vt of 450 mV. The figure shows that the speed of the
CF-DOMINO continues to improve as Vt is reduced, in a manner similar to that of DOMINO
with constant keeper size. Hence, the speed-NM trade-off is resolved with
the CF-DOMINO. A small speed difference starts to develop between the CF-DOMINO and
the conventional DOMINO with constant keeper size as Vt is reduced further. This is due to the
increased output loading as the keeper and NAND gate are sized up to keep the NM constant.
 
 
 
 
 
 
Figure 6. Voltage waveforms of the CF-DOMINO gate (keeper input and DOMINO output; amplitude (V) vs. time (ns)).

Figure 7. The normalized delay of the CF-DOMINO vs. Vt (mV). Also shown are the two cases of
conventional DOMINO: constant NM and constant keeper size.

2.4 Power Comparison  
A potential concern about the CF-DOMINO circuit is its dynamic power consumption relative
to the conventional DOMINO, because of its slightly higher clock loading. Figure 8
shows a dynamic power comparison between the two circuits at 500 MHz. The NOR chains
that were used to measure the delays were also used to measure the dynamic power. The
dynamic powers, which include the power in the clock circuitry, were normalized to the
power of the CF-DOMINO at high Vt. The figure shows that the CF-DOMINO actually has
significantly lower power consumption than the conventional DOMINO at lower Vts. This
is due to the lack of contention in the CF-DOMINO gate, which means that there are no
rush-through currents between VDD and GND during switching.
2.5 Leakage Comparison  
Figure 9 shows the leakage of the conventional DOMINO versus Vt normalized to its value at 
high Vt. It also shows the leakage ratio between the CF-DOMINO and the conventional 
DOMINO.   
 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
Figure 8. The normalized dynamic powers of CF-DOMINO and conventional
DOMINO vs. Vt (mV) at 500 MHz. Both circuits had equal and constant NM.

Figure 9. The normalized leakage of the conventional DOMINO and the leakage
ratio between the CF-DOMINO and conventional DOMINO vs. Vt (mV).

The leakage of the conventional DOMINO follows a typical exponential curve as Vt is
decreased, and the ratio remains between 93% and 95%. Hence, the CF-DOMINO
actually has slightly lower leakage than the conventional DOMINO. The lower leakage
is due to the use of the NAND gate instead of an inverter. The series-connected NMOS
devices leak less because of the de-biasing of the internal node (which makes VGS
negative for the top NMOS).
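
Both effects mentioned above, the exponential growth of leakage as Vt is lowered and the reduction obtained from a negative VGS in a series stack, follow from the usual first-order subthreshold relation I ∝ 10^((VGS − Vt)/S). The sketch below illustrates this relation only. The subthreshold swing S = 100 mV/decade and the 50 mV internal-node voltage are illustrative placeholders, not values from the paper, and body effect and drain-induced barrier lowering are ignored.

```python
def subthreshold_current(vgs_mv: float, vt_mv: float,
                         swing_mv_per_decade: float = 100.0) -> float:
    """Relative subthreshold current, I ~ 10 ** ((VGS - Vt) / S)."""
    return 10.0 ** ((vgs_mv - vt_mv) / swing_mv_per_decade)


# Lowering Vt by 100 mV at VGS = 0 raises the leakage by one decade when S = 100 mV/decade.
ratio_vt = subthreshold_current(0.0, 350.0) / subthreshold_current(0.0, 450.0)

# An internal stack node sitting 50 mV above ground (assumed value) gives the top
# NMOS a VGS of -50 mV, which reduces its leakage relative to a single device.
ratio_stack = subthreshold_current(-50.0, 450.0) / subthreshold_current(0.0, 450.0)

print(f"leakage increase for a 100 mV lower Vt: {ratio_vt:.2f}x")
print(f"leakage of de-biased top device:        {ratio_stack:.2f}x")
```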
2.6 Area Impact 
The area impact of using the CF-DOMINO instead of the conventional DOMINO for the 
NOR chains used above was less than 3%. This is because the NAND gate that replaced the 
inverter is of minimum size, hence the insignificant increase in the area. 
3. Low-Vt Circuits 
As mentioned earlier, to prevent speed degradation of digital CMOS circuits, the
threshold voltages had to be scaled down aggressively. This, however, increases the
leakage currents and the static power consumption beyond the VLSI integration limits.
Several techniques for reducing the standby leakage currents in low-Vt (LVT) CMOS circuits
have been proposed.

A popular technique uses high-Vt (HVT) MOS devices to gate the supply current
[Mutoh et al., 1995]. An addition to this technique reduces the leakage even further by
over-turning-off the gating devices through a negative VGS [Stan, 1998]; this addition
requires the complexity of multiple supply voltages. These gating techniques
preserve the logic state in standby mode by adding either large resistors [Horiguchi et al.,
1993] or diodes [Makino et al., 1998] in parallel with the gating devices. All these techniques
require the non-trivial design task of sizing the gating devices [Kao et al., 1997] and
[Kao et al., 1998], which significantly impacts both speed and area. They also require a very
high ratio of standby time to active time to be effective (i.e. they do not improve active
leakage).

Other methods exploit the fact that the leakage of series-stacked low-Vt devices is
much smaller than that of non-stacked devices. An input vector that gives a "minimum"
leakage is selected either via a statistically based search algorithm [Halter and Najm, 1997]
or a genetic algorithm [Chen et al., 1998]. The resulting leakage reduction is due to the
reverse-biased VGS that results from internal source nodes charging up above ground (for
NMOS) or discharging below VDD (for PMOS). This reverse bias, however, takes a relatively
long time to develop and cut the leakage completely, so this technique also requires a high
standby-to-active time ratio to be effective. Also, every time the design changes, the
"minimum" leakage input vectors have to be re-calculated using the above-mentioned
time-consuming algorithms.

Another power reduction technique uses automatic feedback
control to set the supply voltage at the minimum value that achieves the operating frequency
[Kuroda et al., 1998]. This technique involves the design of a complex feedback control loop,
speed detectors, and highly efficient on-chip DC-to-DC converters. This would have to be
repeated for different logic blocks operating at different frequencies in the same chip. Also,
the ambiguity of the circuit’s speed makes the design process more difficult, since the designer
does not have a specific supply voltage to target.
 
Recently, a new dual-Vt (DVT) static CMOS circuit technology, the split-gate DVT
(SG-DVT), was proposed [Elrabaa and Elmasry, 2001]. The SG-DVT circuits and their
performance compared to that of all-HVT (AllHVT), all-LVT (AllLVT), and other DVT
circuits are presented below. The thresholds of the LVT devices were set 300 mV lower than
those of their HVT counterparts. This corresponds to an increase of about 25% in
the saturation currents of the LVT devices over those of the HVT devices for the CMOS
technology at hand.
3.1 SG-DVT Circuits 
Two versions of the SG-DVT technique are shown in Figure 10: the SG1-DVT, demonstrated
on a 2-input NAND gate in Figure 10(a), and the SG2-DVT, Figure 10(b). MOS devices drawn with
thicker gate lines are LVT, while the others (with thin gate lines) are HVT. In both versions
the gate is split into two gates. In the SG1-DVT, the gate is split into an
AllHVT gate and an AllLVT gate connected in parallel. In the SG2-DVT, the
gate is split into one gate with an LVT N-block and HVT P-block (LVTN-HVTP)
and another with an LVT P-block and HVT N-block (LVTP-HVTN), as shown in Figure
10(b). The output of the LVTN-HVTP gate (type 1 output), which has a faster high-to-low
edge, drives all the PMOS devices in the subsequent gates. The output of the LVTP-HVTN gate
(type 2 output), which has a faster low-to-high edge, drives all the NMOS
devices in the subsequent gates. Thus both types of devices are driven by signals with the
appropriate fast edge. This also means that the SG2-DVT gate has two outputs, which makes it
appropriate only for dense logic, where the wiring capacitance is a very small
portion of the fan-out capacitance and the outputs do not need to be routed over long distances.
 
Another DVT circuit that is compared to the SG-DVT circuits is the Alt-Gates DVT
option, which represents the equal mixing of AllLVT and AllHVT gates in the logic path.
This results from using LVT gates to reduce the delays of critical paths in a logic block.
Hence both the delay of the Alt-Gates logic path and its static (leakage) power are expected
to be halfway between those of the AllLVT option and the AllHVT option. All the above gates are
fully compatible with other CMOS circuits, both dynamic and static (i.e. they can drive or be
driven by conventional CMOS circuits).
3.2 Delay Comparison 
The delays of the above-mentioned DVT circuits, along with those of the AllHVT and the
AllLVT circuits, were evaluated using 31-stage ring oscillators of 2-input NAND gates with a
fan-out of 1. Ring oscillators were used because this method gives a fairly accurate
estimate of the average gate delay, including the effects of output loading and
input-waveform slope. In all the comparisons, all circuits had equal input capacitances (i.e.
equal total gate areas). NAND gates were selected as test vehicles to account for series gating
effects such as body effect and reverse biasing (negative VGS) of the series-connected devices
in the off state due to leakage. Also, NAND gates are widely favored in static CMOS designs.
The delay was evaluated as a function of the P/N ratio (defined as the ratio of WPMOS to
WNMOS of the NAND gate) and is shown in Figure 11, normalized to the delay of the
AllHVT circuit at a P/N ratio of 0.5. As expected, the AllHVT and AllLVT circuits represent the
upper and lower delay limits. The optimum P/N ratio of all DVT circuits falls between 1.35 and
1.45, similar to standard CMOS circuits. The AllLVT had an optimum delay that is ~22% lower
than that of the AllHVT. This is consistent with published delay analysis data for
series-connected MOSFET circuits [Sakurai and Newton, 1991]. The AllLVT to AllHVT delay ratio
can be approximated as:
   $$\text{Delay ratio} = \frac{(V_{DD} - V_{thHVT})^{n}}{(V_{DD} - V_{thLVT})^{n}}$$
 
where n is the velocity saturation index, which ranges from 1 for very-short-channel MOSFET
devices with total velocity saturation to 2 for long-channel devices with no velocity saturation
[Sakurai and Newton, 1990]. For a 0.25µm technology the value of n is in the range 1.3~1.5
and increases for series-connected devices such as those in the NAND gates used in this work
[Sakurai and Newton, 1991]. A delay ratio of 0.78 for the values of the LVT and HVT
threshold voltages corresponds to an n of about 1.55. The SG1-DVT and the Alt-Gates
circuits are very close and achieved an improvement in the minimum delay equivalent to half
that of the AllLVT (~11%).
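
The delay-ratio approximation can be evaluated in either direction: from n to a delay ratio, or from a measured delay ratio back to n. The following is a minimal sketch of both. The function names are hypothetical, and the threshold values in the example are placeholders built from the nominal NMOS threshold quoted in Section 1 and the 300 mV LVT offset; the text does not state which effective thresholds lead to the quoted n of about 1.55, so this example is not expected to reproduce that figure.

```python
import math


def delay_ratio(vdd: float, vth_hvt: float, vth_lvt: float, n: float) -> float:
    """AllLVT-to-AllHVT delay ratio, ((VDD - VthHVT) / (VDD - VthLVT)) ** n."""
    return ((vdd - vth_hvt) / (vdd - vth_lvt)) ** n


def velocity_saturation_index(ratio: float, vdd: float,
                              vth_hvt: float, vth_lvt: float) -> float:
    """Invert the delay-ratio expression to recover n from a delay ratio."""
    return math.log(ratio) / math.log((vdd - vth_hvt) / (vdd - vth_lvt))


VDD = 2.5        # supply voltage used in the paper (V)
VTH_HVT = 0.54   # placeholder: nominal NMOS threshold from Section 1 (V)
VTH_LVT = 0.24   # placeholder: 300 mV below the HVT value (V)

ratio = delay_ratio(VDD, VTH_HVT, VTH_LVT, n=1.4)
n_back = velocity_saturation_index(ratio, VDD, VTH_HVT, VTH_LVT)
print(f"delay ratio for n = 1.4: {ratio:.3f}")
print(f"n recovered from that ratio: {n_back:.3f}")
```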
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 10. Schematics of the split-gate DVT circuits: (a) the SG1-DVT circuit (inputs in1, in2);
(b) the SG2-DVT circuit (inputs in1_1, in2_1, in1_2, in2_2; outputs O_1, O_2). HVT and LVT
devices are marked in the schematics.
 
The SG2-DVT, however, achieved a 20% improvement in the optimum delay
over the AllHVT. Hence about 90% of the speed improvement of the AllLVT (20% out of ~22%) was
achieved by the SG2-DVT with only half the gate transistors being LVT, a very significant result.
3.3 Energy-Delay Product Comparison
Figure 12 shows the energy-delay (ED) products of the different ring oscillators. Again, the
performance of the DVT circuits lies between that of the AllHVT and that of the AllLVT. The
SG2-DVT had a relatively higher ED product than the other DVT implementations for two reasons:
1) the difference between the edge rates of the inputs to the NMOS and PMOS blocks, which
causes a slight increase in the rush-through currents (currents from VDD to GND) during
switching; and 2) the gate’s low fan-out of 1. As the fan-out increases, the rush-through power
becomes a negligible part of the total switching power. The SG2-DVT still has a lower ED
product than the AllHVT circuit.
Figure 11. The normalized delay versus the P/N ratio at equal input capacitance, for the AllHVT,
AllLVT, Alt-Gates, SG1-DVT, and SG2-DVT circuits.
Figure 12. The energy-delay product (pJ·ns) versus the P/N ratio at equal input capacitance, for
the same circuits.
 
3.4 DC (Static) Leakage
The average leakage of the different circuits is shown in Table 1, normalized to that of the
AllHVT. This leakage was measured using DC simulations on the open ring oscillators (i.e.
chains of 31 NAND gates) at the optimum-delay P/N ratio of each circuit. Hence it
accounts for the reverse biasing of the stacked NMOS devices due to the charging up of
internal source nodes. All DVT circuits have similar average DC leakage, about half that of
the AllLVT.
 
Table 1. The DC leakage of the different circuits at their optimum delay 
P/N ratio and normalized to that of the AllHVT circuit. 
 
 
 
 
The above results represent the average leakage, obtained by averaging the leakage resulting
from the two possible extreme input combinations to the NAND chains (both inputs tied
together). For the AllLVT and AllHVT circuits, the chain leakage is the same for both input
states. This is because in half the stages the stacked N-devices are leaking, while in the
other half the P-block is leaking. For the SG1-DVT and SG2-DVT circuits, the input does not
make a significant difference either, since the leakage of each gate in the chain is always the
same. The input state does make a large difference for the Alt-Gates circuit, because the
leakage varies significantly depending on whether the LVT NMOS stack or the
HVT NMOS stack is leaking. The normalized leakage of the Alt-Gates circuit was found to change
from 275 to 1171 (4.3x), depending on the input state. The SG-DVT’s average leakage is
slightly higher than that of the Alt-Gates circuit because of the splitting of the SG-DVT’s
gate, which reduces the reverse biasing at the internal nodes. If the logic path were composed
of different gates, the leakage of both
the AllLVT and the SG-DVT circuits would have a greater dependency on the input pattern.
The leakage of the SG-DVT circuits, however, would still be about half that of the AllLVT.
Thus the designer does not have to find and set up a specific stand-by
input pattern to get this 50% saving.
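
The averaging described above can be checked with the Alt-Gates figures quoted in the text. The short sketch below uses only numbers stated in this section (the two extreme Alt-Gates values and the AllLVT entry of Table 1); the variable names are hypothetical, and the small difference from the tabulated Alt-Gates average is presumably due to rounding of the quoted extremes.

```python
# Normalized Alt-Gates leakage for the two extreme input states (from the text).
alt_gates_extremes = (275.0, 1171.0)
# Normalized AllLVT leakage (Table 1).
all_lvt = 1444.0

average = sum(alt_gates_extremes) / len(alt_gates_extremes)
spread = max(alt_gates_extremes) / min(alt_gates_extremes)
fraction_of_all_lvt = average / all_lvt

print(f"average Alt-Gates leakage:   {average:.0f}")
print(f"input-state spread:          {spread:.1f}x")
print(f"fraction of AllLVT leakage:  {fraction_of_all_lvt:.2f}")
```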
 
Table 1 data (normalized DC leakage, AllHVT = 1):

Circuit      Normalized DC leakage
AllLVT       1444
Alt-Gates    724
SG1-DVT      760
SG2-DVT      760

3.5 Active Leakage
The above results assume that the stand-by time is very long. In normal operation of most ICs
this is not the case. Figure 13 shows the leakage of the four circuits in
Table 1, normalized to the DC leakage of the AllHVT circuit, as a function of the stand-by time
elapsed after switching. The figure shows that the DC value of leakage is only attained after
about 0.2 µs of idle time. Still, the relative difference between the DVT circuits and the
AllLVT remains the same (~half). Also, for all circuits, the starting leakage is 67% higher
than the DC value. The effect of this active leakage on the energy consumption per operation
(EPO) of these circuits is shown in Figure 14, which plots the EPO as a function of the
activity factor (AF) at a nominal frequency of 1 GHz, normalized to that of the AllHVT at
100% AF. An AF of 100% means the circuit switches 2×10^9 times per second. As the AF
decreases, the EPO increases significantly for all circuits except the AllHVT, due to the effect
of leakage. However, with the exception of the SG2-DVT, the EPO of the DVT circuits
remains significantly below that of the AllLVT. The EPO of the SG2-DVT starts at a value equal
to that of the AllLVT but becomes significantly lower below an AF of 2%. This is
again due to the unity fan-out, which overemphasizes the rush-through currents at high AF.
Hence, at low AF the SG2-DVT circuit achieves 90% of the AllLVT speed at a
significantly lower EPO. The EPO difference grows as the AF decreases further. The AF is
usually very small for general-purpose logic blocks (e.g. ALUs).
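
The growth of the EPO at low activity factors can be illustrated with a first-order model in which each operation pays its switching energy plus the leakage energy accumulated since the previous operation. This is a simplified sketch, not the paper's simulation setup: it assumes a constant leakage power (ignoring the time-dependent decay shown in Figure 13) and folds rush-through energy into the switching term. The function name and the per-gate values are hypothetical.

```python
def energy_per_operation(e_switch_j: float, p_leak_w: float,
                         activity_factor: float, f_hz: float = 1e9) -> float:
    """Switching energy plus the leakage energy spent per operation."""
    if not 0.0 < activity_factor <= 1.0:
        raise ValueError("activity_factor must be in (0, 1]")
    # AF = 1.0 (100%) corresponds to 2 * f switching events per second.
    ops_per_second = activity_factor * 2.0 * f_hz
    return e_switch_j + p_leak_w / ops_per_second


# Hypothetical per-gate values, chosen only for illustration.
E_SWITCH = 10e-15   # joules per switching event
P_LEAK = 1e-9       # watts of leakage power

for af in (1.0, 0.1, 0.01, 0.001):
    epo = energy_per_operation(E_SWITCH, P_LEAK, af)
    print(f"AF = {af:>6.1%}  EPO = {epo / E_SWITCH:.4f} x E_switch")
```

In this model a circuit with higher leakage power sees its EPO rise faster as the AF falls, which is the qualitative behavior described for the AllLVT circuit.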
 
Figure 13. The normalized leakage versus elapsed stand-by time (µs) measured from the end of
switching, for the AllLVT, Alt-Gates, and SG-DVT circuits.
 
4. CONCLUSIONS 
 
DOMINO circuits continue to be the major choice for high-speed logic circuits. However, the
trade-off between noise margins and speed in conventional DOMINO circuits prevented them
from benefiting from the scaling down of technologies and supply voltages, because
they could not tolerate the lower threshold voltages necessary at the lower supply voltages.
This in turn meant that expensive dual-Vt technologies had to be used if DOMINO was to
remain in use in future scaled technologies. The newly developed contention-free
DOMINO resolves this trade-off. The speed of the CF-DOMINO continues to improve as the
threshold voltages are scaled down while its noise margins are kept constant. This enables the
use of single low-Vt devices in scaled-down technologies while retaining the speed
advantage of DOMINO. It was also shown that the CF-DOMINO consumes less dynamic
power than the conventional DOMINO due to its contention-free operation, has less leakage,
and does not impact the area significantly.

A new trend in high-speed logic that uses low-Vt devices to gain speed has emerged in the
last few years. This, however, increased the leakage
beyond practical limits and necessitated the development of dual-Vt circuits that attempt to
reduce the leakage. These techniques impacted both the speed and the design time of these
circuits. A new type of dual-Vt logic, the split-gate logic, was introduced to avoid that. Two
flavors of the SG-DVT were developed: the SG1-DVT and the SG2-DVT. The SG1-DVT
achieved identical performance, in terms of speed and power, to regular DVT circuits
consisting of equally mixed LVT and HVT gates. However, it significantly reduces the
dependency of the stand-by leakage on the logic block’s input pattern. Hence the designer can
get the speed and leakage advantages of regular DVT circuits without performing the laborious
task of determining the optimum input pattern, a task that would otherwise have to be redone
every time the logic block’s design changes. The SG2-DVT, in addition to reducing the
input-pattern dependency of the leakage, achieved 90% of the speed gains of the AllLVT circuits
at half the leakage. The SG2-DVT is especially suited for dense logic blocks since it has two
outputs per gate. These outputs, however, can be tightly routed together since they have the
same polarity, just different edge speeds. Also, the SG2-DVT circuit suffers from relatively
higher rush-through currents due to the slow-edge signals that feed the LVT devices in these
gates. This problem, however, becomes less significant at higher fan-out. The energy per
operation (EPO) was used to evaluate the effect of active leakage on energy consumption.
The dual-Vt circuits achieved much lower energy per operation than the all-low-Vt circuits,
especially at low activity factors. This is a very significant result since most digital blocks
operate at a very low activity factor.
Figure 14. Normalized Energy per Operation vs. Activity Factor at a nominal frequency of 1 GHz. 
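
The dependence of energy per operation on activity factor can be illustrated with a simple first-order model. The sketch below is not the evaluation method used in this paper. It assumes that each operation costs a fixed switching energy, and that leakage power is drawn on every clock cycle while only a fraction of cycles (the activity factor) performs an operation. The function names and all numerical values are hypothetical and chosen only to show the trend.

```python
def energy_per_operation(e_switch_j, p_leak_w, activity, freq_hz=1e9):
    """Energy per useful operation under an assumed first-order model.

    e_switch_j : switching energy of one operation, in joules
    p_leak_w   : active leakage power, in watts
    activity   : fraction of clock cycles that perform an operation, in (0, 1]
    freq_hz    : clock frequency (1 GHz by default)

    Leakage is paid every cycle, so each operation carries 1/activity
    cycles' worth of leakage energy on top of its switching energy.
    """
    if not 0.0 < activity <= 1.0:
        raise ValueError("activity must be in (0, 1]")
    leak_energy_per_cycle = p_leak_w / freq_hz
    return e_switch_j + leak_energy_per_cycle / activity


# Hypothetical values for illustration only; they are NOT taken from the paper.
ALL_LVT = {"e_switch_j": 1.0e-12, "p_leak_w": 2.0e-6}
DUAL_VT = {"e_switch_j": 1.0e-12, "p_leak_w": 1.0e-6}  # assumed: half the leakage


def epo_ratio(activity):
    """Ratio of all-low-Vt EPO to dual-Vt EPO at a given activity factor."""
    return (energy_per_operation(activity=activity, **ALL_LVT)
            / energy_per_operation(activity=activity, **DUAL_VT))


if __name__ == "__main__":
    for pct in (0.1, 1, 10, 100):
        ratio = epo_ratio(pct / 100)
        print(f"activity {pct:>5}%: AllLVT / dual-Vt EPO ratio = {ratio:.4f}")
```

Under these assumptions the leakage term scales as 1/activity, so the energy penalty of the higher-leakage circuit grows as the activity factor falls, which is consistent with the qualitative trend described above.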
 
5. ACKNOWLEDGEMENT 
The author is grateful for the facilities support provided by KFUPM.  
 
REFERENCES 
1. Alvandpour, A., Larsson-Edefors, P., and Svensson, C., 1999, "A Leakage-Tolerant Multi-Phase 
Keeper for Wide DOMINO Circuits," IEEE International Conference on Electronics, Circuits, and 
Systems Tech. Dig., pp. 209-212. 
2. Alvandpour, A., Krishnamurthy, R., Soumyanath, K., and Borkar, S., 2001, "A Conditional Keeper 
Technique for Sub-0.13µ Wide Dynamic Gates," Symp. on VLSI Circuits Tech. Dig., pp. 29-30. 
3. Brodersen, R. W., Chandrakasan, A., and Sheng, S., 1993, "Design Techniques for Portable 
Systems," International Solid-State Circuits Conference Tech. Dig., pp. 168-169. 
4. Chen, Z., Wei, L., Johnson, M., and Roy, K., 1998, "Estimation of Standby Leakage Power in 
CMOS Circuits Considering Accurate Modeling of Transistor Stacks," Int. Symp. on Low Power 
Electronics and Design (ISLPED'98), August, Monterey, California, USA, pp. 1-6. 
5. Elrabaa, M., Anis, M., and Elmasry, M., "A Contention-Free DOMINO Logic for Scaled-Down 
CMOS," to appear in the Institute of Electronics, Information, and Communication Engineers 
Transactions on Electronics (Japan). 
6. Fletcher, T., 1994, "Microprocessor Technology Trends," International Electron Devices Meeting 
Tech. Dig., pp. 269-271. 
7. Halter, J., and Najm, F., 1997, "A Gate-Level Leakage Power Reduction Method for 
Ultra-Low-Power CMOS Circuits," Custom Integrated Circuits Conf. (CICC), May, San Francisco, 
California, USA, pp. 475-478. 
8. Harris, D., and Horowitz, M., 1997, "Skew-Tolerant DOMINO Circuits," IEEE Journal of 
Solid-State Circuits, vol. 32, no. 11, pp. 1702-1711, November. 
9. Horiguchi, M., Sakata, T., and Itoh, K., 1993, "Switched-Source-Impedance CMOS Circuit for 
Low Standby Subthreshold Current Giga-Scale LSIs," IEEE Journal of Solid-State Circuits, 
vol. 28, no. 5, pp. 1131-1135. 
10. Kao, J., Chandrakasan, A., and Antoniadis, D., 1997, "Transistor Sizing Issues and Tool for 
Multi-Threshold CMOS Technology," 34th Design Automation Conf. (DAC'97), June, Anaheim, 
California, USA, pp. 409-414. 
11. Kao, J., 1999, "Dual Threshold Voltage DOMINO Logic," Proceedings of the IEEE 25th European 
Solid-State Circuits Conference, pp. 118-121, September. 
12. Kao, J., Narendra, S., and Chandrakasan, A., 1998, "MTCMOS Hierarchical Sizing Based on 
Mutual Exclusive Discharge Patterns," 35th Design Automation Conf. (DAC'98), June, San 
Francisco, California, USA, pp. 495-500. 
13. Krambeck, R. H., Lee, C. M., and Law, H.-F. S., 1982, "High-Speed Compact Circuits with 
CMOS," IEEE Journal of Solid-State Circuits, vol. 17, no. 3, pp. 614-619, June. 
14. Kuroda, T., Suzuki, K., Mita, S., Fujita, T., Yamane, F., Sano, F., Chiba, A., Watanabe, Y., 
Matsuda, K., Maeda, T., Sakurai, T., and Furuyama, T., 1998, "Variable Supply-Voltage Scheme 
for Low-Power High-Speed CMOS Digital Design," IEEE Journal of Solid-State Circuits, 
vol. 33, no. 3, pp. 454-462. 
15. Elrabaa, M., and Elmasry, M., 2001, "Split-Gate Logic Circuits for Multi-Threshold 
Technologies," ISCAS'01, vol. IV, pp. 798-801, Sydney, Australia. 
16. Makino, H., Tsujihashi, Y., Nii, K., Morishima, C., Hayakawa, Y., Shimizu, T., and Arakawa, 
T., 1998, "An Auto-Backgate-Controlled MT-CMOS Circuit," Symp. on VLSI Circuits, June, 
Honolulu, Hawaii, USA, pp. 42-43. 
17. Mutoh, S., Douseki, T., Matsuya, Y., Aoki, T., Shigematsu, S., and Yamada, J., 1995, "A 1-V 
Power Supply High-Speed Digital Circuit Technology with Multi-threshold Voltage CMOS," 
IEEE Journal of Solid-State Circuits, vol. 30, no. 8, pp. 847-854. 
18. Sakurai, T., and Newton, A. R., 1990, "Alpha-Power Law MOSFET Model and Its Application to 
CMOS Inverter Delay and Other Formulas," IEEE Journal of Solid-State Circuits, vol. 25, no. 4, 
pp. 584-593. 
19. Sakurai, T., and Newton, A. R., 1991, "Delay Analysis of Series-Connected MOSFET Circuits," 
IEEE Journal of Solid-State Circuits, vol. 26, no. 2, pp. 122-131. 
20. Stan, M., 1998, "Low-Threshold CMOS Circuits with Low Standby Currents," Int. Symp. on Low 
Power Electronics and Design (ISLPED'98), August, Monterey, California, USA, pp. 97-98. 
21. Thompson, S., Young, I., Greason, J., and Bohr, M., 1997, "Dual Threshold Voltages and 
Substrate Bias: Keys to High Performance, Low Power, 0.1 µm Logic Designs," Symp. on VLSI 
Technology Tech. Dig., pp. 69-70. 
