NP domino logic gates for Ultra Low Voltage and High Speed applications by Mahmood, Sohail Musa
UNIVERSITY OF OSLO
Department of Informatics
NP domino logic gates for Ultra Low Voltage and High Speed applications
Master thesis
Sohail Musa Mahmood
Spring 2013

Abstract
In this thesis we present different configurations of digital circuits
exploiting the Ultra Low Voltage (ULV) NP domino logic style. The proposed
logic style is realized with the help of floating gate transistors.
The proposed NP domino logic gates are aimed at high speed operation
in Ultra Low Voltage applications. The presented circuits may operate
near the sub-threshold regime, where the supply voltage is close to the
threshold voltage of the transistors. In terms of frequency, speed,
robustness, Power Delay Product (PDP) and Energy Delay Product (EDP),
the proposed ULV NP domino logic gates may offer significant
improvement compared to conventional CMOS logic gates.
Different implementations of NOT, NAND and NOR gates are presented
using both conventional and Pass Transistor Logic styles. Further, NAND
and NOR gates are used to build different configurations of Carry gates,
which are a speed-limiting factor in many arithmetic operations. These ULV
NP domino Carry gates are simulated at different supply voltages in the
range of 100mV to 400mV, and the performance results are presented with
respect to delay, power, PDP and EDP.
The proposed ULV NP domino Carry gates are cascaded together to
perform addition in a 32-bit chain. The circuits are operated in the
worst case scenario, where the carry signal propagates through the whole
chain. Multi-threshold (MTCMOS) and Variable-threshold (VTCMOS)
techniques are employed in the ULV domino 32-bit carry chain in order
to reduce the power consumption while still offering high speed
performance. Although the 32-bit carry chain offers a great speed
improvement in the worst case scenario, the chain also introduces
the drawback of enormous power consumption in the idle mode.
The work in this thesis has resulted in three papers. Two of these papers
present various configurations of 1-bit ULV NP domino Carry gates, while
the third paper examines the performance of one of the proposed ULV NP
domino carry gates in a 32-bit chain.
The simulation results presented in this thesis are obtained using a
90nm TSMC CMOS process.
Contents
1 Introduction
1.1 Motivation
1.2 Previous work
1.3 Thesis Outline
2 Background
2.1 Conventional CMOS logic
2.2 Dynamic logic
2.2.1 Challenges in Dynamic logic
2.3 Domino logic
2.4 NP Domino logic
2.5 Keepers
2.6 Floating gate
2.7 Pass Transistor Logic
2.8 Adder
2.8.1 Half Adder
2.8.2 Full Adder
2.9 Multi-threshold CMOS Technology
2.10 Variable-threshold CMOS Technology
2.11 Delay
2.12 Figure of Merit in logic gates
2.12.1 Power Delay Product
2.12.2 Energy Delay Product
3 Performance of CMOS at ultra low supply voltages
3.1 Challenges at low supply voltages
3.2 F: The strength tunable factor of the transistor
3.2.1 Implementation of Deep n-well
3.2.2 Imbalance factor between nMOS and pMOS
3.3 Power Dissipation in CMOS
3.3.1 Dynamic power dissipation
3.3.2 Static power dissipation
4 ULV NP domino Inverters
4.1 N type ULV domino inverter
4.2 P type ULV domino inverter
4.3 A chain of ULV NP domino inverters
5 ULV NP domino Logic gates
5.1 ULV NP domino NAND Gates
5.2 ULV NP domino NOR gates
5.3 ULV NP domino NAND/NOR gate using Pass Transistor Logic
6 ULV NP domino Carry gates for high speed Full Adders
6.1 Ultra-Low-Voltage and High Speed NP domino Carry circuit
6.2 ULV NP domino Carry gates utilizing Pass Transistor Logic
6.3 NP domino Carry gates Performance
6.3.1 Monte Carlo Simulations
6.3.2 Summary
7 Different configurations of 32-bit Carry chain by exploiting ULV NP domino logic style
7.1 32-bit carry chain using NP domino Carry 1 gates
7.1.1 A solution without Forward Body Biasing on nMOS transistor
7.1.2 32-bit carry chain utilizing Multi-threshold CMOS Technique (MTCMOS)
7.1.3 32-bit carry chain utilizing Variable-threshold CMOS Technique (VTCMOS)
7.1.4 VTCMOS and MTCMOS Technique
7.2 32-bit carry chain using NP domino Carry 2 gates
7.3 32-bit carry chain using NP domino Carry 3 gates
7.4 New implementations of 32-bit carry chain exploiting PTL
7.5 Performance of ULV 32-bit carry chains at different supply voltages
7.6 Summary
8 Results - Overview of the papers
8.1 Paper I
8.2 Paper II
8.3 Paper III
9 Discussion
9.1 Power consumption in the idle mode
9.2 Performance of ULV NP domino carry chains with Pass Transistor Logic
9.3 Leakage at the output nodes
10 Conclusion
10.1 Summary of the contributions
10.2 Innovation throughout the project
10.3 Further work
A Truth Tables
B Publications
List of Figures
2.1 NAND gate.
2.2 Dynamic cascade inverters.
2.3 Domino cascaded inverters.
2.4 NP Domino Logic.
2.5 Floating gate transistor.
3.1 The ON-current ION through an nMOS transistor with different dimensions.
3.2 Deep n-well process architecture.
3.3 Dynamic power dissipation in a conventional CMOS inverter [24].
3.4 Leakage currents in a MOS transistor [26].
4.1 Different configurations of ULV domino inverters [12].
4.2 N type ULV domino inverter.
4.3 Simulation results of N type ULV domino inverter.
4.4 Different configurations of N type ULV domino inverter.
4.5 Simulation results of different configurations of N type ULV domino inverter.
4.6 P type ULV domino inverter.
4.7 Speed performance of P type ULV domino inverter compared with conventional CMOS inverter.
4.8 Different configurations of P type ULV domino inverter.
4.9 Robustness performance of different configurations of P type ULV domino inverter.
4.10 ULV NP domino chain with 8 inverters.
4.11 Simulation results of 8 ULV NP inverters in a domino chain.
5.1 NP ULV domino NAND gate.
5.2 Simulation results of ULV NP domino NAND gates.
5.3 NP ULV domino NOR gate.
5.4 Simulation results of ULV NP domino NOR gates.
5.5 ULV NP domino logic Gates using PTL.
5.6 Simulation results of ULV NP domino logic gates using PTL.
6.1 Four Bits Full Adder.
6.2 N type ULV domino Carry Gate (Carry 1a).
6.3 Simulation results for N type ULV domino Carry gate.
6.4 P type ULV domino Carry Gate (Carry 1b).
6.5 Simulation results for P type ULV domino Carry gate.
6.6 ULV domino Carry Gates using PTL (Carry 2).
6.7 Simulation results for the worst case scenario of ULV NP domino Carry gates implemented in Figure 6.6.
6.8 ULV domino Carry Gates using PTL (Carry 3).
6.9 Simulation results for the worst case scenario of ULV NP domino Carry gates implemented in Figure 6.8.
6.10 Average Delay for the proposed ULV NP domino Carry gates for different supply voltages.
6.11 Delay of proposed ULV domino carry gates relative to conventional CMOS carry gate for different supply voltages.
6.12 Average power consumption per ULV domino Carry gate compared to conventional CMOS carry gate.
6.13 Average energy of ULV domino carry gates relative to the Conventional Carry gate at different supply voltages.
6.14 Average Energy Delay Product of ULV domino Carry gates compared to conventional Carry gate at different supply voltages.
6.15 Average Delay per ULV domino Carry gate with 100 Monte Carlo simulations.
6.16 Average Power consumption per ULV domino Carry gate with 100 Monte Carlo simulations.
6.17 Average PDP per ULV domino Carry gate with 100 Monte Carlo simulations.
6.18 Average EDP per ULV domino Carry gate with 100 Monte Carlo simulations.
7.1 NP domino n-bit carry chain 1.
7.2 Simulation result of 32-bit carry chain 1.
7.3 Simulation result of 32-bit carry chain 1 with FBB on N and P transistors.
7.4 Simulation result of 32-bit carry chain 1 without FBB on nMOS transistor N.
7.5 ULV domino Carry 1 Gates utilizing MTCMOS technology.
7.6 Simulation result of 32-bit carry chain 1 utilizing MTCMOS technique.
7.7 Simulation result of 32-bit carry chain implemented in Circuit 7.1 utilizing VTCMOS technique.
7.8 Simulation result of 32-bit carry chain implemented in Circuit 7.1 utilizing both MTCMOS and VTCMOS techniques.
7.9 NP domino n-bit carry chain 2.
7.10 Simulation result of 32-bit carry chain implemented in Circuit 7.9 when only input bits B get transitions.
7.11 Simulation result of 32-bit carry chain implemented in Circuit 7.9 when only input bits A get transitions.
7.12 NP domino n-bit carry chain 3.
7.13 Simulation result of 32-bit carry chain implemented in Circuit 7.12 when only input bits B get transitions.
7.14 Simulation result of 32-bit carry chain implemented in Circuit 7.12 when only input bits A get transitions.
7.15 ULV domino Carry Gates using PTL (Carry 4).
7.16 Simulation result of 32-bit carry chain 4 implemented in Circuit 7.15 when only input bits A get transitions.
7.17 Simulation result of 32-bit carry chain 4 implemented in Circuit 7.15 when only input bits B get transitions.
7.18 ULV domino Carry Gates using PTL (Carry 5).
7.19 Simulation result of 32-bit carry chain 5 implemented in Circuit 7.18 when only input bits A get transitions.
7.20 Simulation result of 32-bit carry chain 5 implemented in Circuit 7.18 when only input bits B get transitions.
7.21 Delay for two ULV NP domino 32-bit carry chains compared with conventional 32-bit carry chain for different supply voltages.
7.22 Power consumption of ULV domino 32-bit carry chain compared to conventional CMOS carry chain.
7.23 EDP for two ULV NP domino 32-bit carry chains compared with conventional 32-bit carry chain for different supply voltages.
List of Tables
3.1 Relative threshold voltage (thr), ION, PON, Ioff and Poff for various configurations of nMOS transistor.
4.1 Simulation Results of different Delays of N type domino inverter.
4.2 Performance of different configurations of N type ULV inverter relative to conventional CMOS inverter at a supply voltage of 300mV.
4.3 Speed performance of ULV NP domino 8 inverters chain.
6.1 Performance of ULV domino Carry gates compared to conventional CMOS Carry gate at different supply voltages.
6.2 The delay, PDP and EDP of ULV domino carry gates at Minimum Energy Point (250mV) relative to conventional CMOS carry gate.
7.1 The working principle for the NP Carry gates in a Domino chain.
7.2 Strength parameters for different transistors in various configurations of 32-bit carry chains.
7.3 Performance of various configurations of 32-bit carry chains in the worst case scenario.
7.4 Power consumption and deviation of 32-bit carry chains in the Wait Mode I.
7.5 Power consumption and deviation of 32-bit carry chains in the Wait Mode II.
A.1 Truth table of main logical functions
A.2 Truth table: Half Adder
A.3 Truth table: Full Adder
Acronyms
ALU Arithmetic Logic Unit
ASIC Application-Specific Integrated Circuit
Avg Average
CMOS Complementary Metal Oxide Semiconductor
Dev Deviation from the rails
EN nMOS evaluation transistor
EP pMOS evaluation transistor
EDP Energy Delay Product
FBB Forward Body Biasing
FG Floating Gate
FPU Floating Point Unit
GND Ground
H-thr High threshold transistor
KN nMOS keeper transistor
KP pMOS keeper transistor
KHz Kilo Hertz
L-thr Low threshold transistor
MEP Minimum Energy Point
MHz Mega Hertz
MOSFET Metal Oxide Semiconductor Field-Effect Transistor
MTCMOS Multi-Threshold CMOS
N type Output node precharges to 1
NM Noise Margin
nMOS N-channel MOSFET
P type Output node precharges to 0
PDN Pull Down Network
PDP Power Delay Product
pMOS P-channel MOSFET
PTL Pass Transistor Logic
PUN Pull Up Network
RN nMOS recharge transistor
RP pMOS recharge transistor
RBB Reverse Body Biasing
S-thr Standard threshold transistor
TD Propagation Delay
TF Fall Time
TR Rise Time
TSMC Taiwan Semiconductor Manufacturing Company
ULV Ultra Low Voltage
VDD The supply voltage
VTH Threshold Voltage of the transistor
VLSI Very Large-Scale Integration
VTCMOS Variable-Threshold CMOS
Preface
This master thesis was carried out at the Department of Informatics,
Faculty of Mathematics and Natural Sciences, University of Oslo (UiO), in
the period January 2012 - May 2013. The thesis is submitted for the degree
of Master of Science in Nano and Micro-electronics and counts for 60 credits.
Carrying out the master thesis has been both demanding and interesting.
The project has given me valuable experience, the most important part
being the publication of three papers. The work has also given me deeper
knowledge and understanding of the nano-electronics field and of the
challenges that arise as the technology scales down.
First and foremost, I would like to thank my supervisor, Professor
Yngvar Berg, for providing all the valuable guidance and inspiration. Thank
you for believing in my work and for giving me the freedom to do what I
wanted to do. Special thanks go to Amir Hasanbegovic for the technical
support and for being an important source of inspiration and knowledge.
Thanks to my fellow student Øystein Bjørndal for helping me with LaTeX,
which made this thesis even more beautiful.
Many thanks to the master lab buddies, Erlend, Erik, Dag, Patrick
and Alex, for fruitful discussions, quizzes, jokes and video games at the
lab, which made long days short. In addition, I would like to thank all the
employees at the microelectronics group for being so helpful, and for
providing such a great working environment.
Last but not least, many thanks to my parents and my family for their
support and motivation throughout the whole project. And a very special
thanks to my best friend and co-student, Abdul Wahab Majeed, for being
with me through 8 years of study, and for supporting me, motivating me,
standing by me and tolerating me.
SOHAIL MUSA MAHMOOD
2nd May 2013
Chapter 1
Introduction
Since the invention of CMOS in the 1960s, CMOS technology has grown at a
rate unprecedented among the inventions of the modern era. As portable
devices (iPads, laptops, mobile phones) and wireless systems become more
and more common in everyday life, the demand for extended battery life,
low weight and superior speed is becoming more and more challenging to
meet. CMOS is well known for ultra-low power systems such as implantable
medical devices, which require long lifetimes on tiny batteries. The
rapidly growing number of applications on portable devices drains their
batteries very quickly. Thus power consumption is becoming the major
design concern.
1.1 Motivation
Several approaches have been suggested in [1], [2] in order to reduce the
power consumption of Very Large-Scale Integration (VLSI) circuits. Among
those, scaling the supply voltage is one of the most efficient ways to reduce
power and energy consumption, as the dynamic power consumption in digital
CMOS circuits is proportional to the square of the supply voltage. The
circuits then operate at low supply voltages near or below the threshold
voltages of the CMOS transistors. The reduction in the supply voltage
degrades the CMOS transistor performance with respect to speed, as the
nodes are charged and discharged by weak/moderate inversion currents. With
conventional CMOS technology at ultra low supply voltages, the operating
frequencies of digital circuits are reduced to the range of KHz and low
MHz. Several approaches are proposed in [2], [3], [4] in order to achieve
high speed performance in digital CMOS circuits when the supply voltage
is scaled down.
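The quadratic dependence on the supply voltage can be illustrated with a minimal sketch. It assumes the standard first-order switching power model P = a * C * VDD^2 * f, which is not spelled out in this section, and the capacitance, frequency and voltage values are hypothetical numbers chosen only to show the trend.

```python
def dynamic_power(c_load, vdd, freq, activity=1.0):
    """First-order switching power: P = activity * C * VDD^2 * f."""
    return activity * c_load * vdd ** 2 * freq


# Hypothetical values, not taken from the thesis.
C_LOAD = 1e-15   # 1 fF of switched capacitance (assumed)
FREQ = 1e6       # 1 MHz switching frequency (assumed)

p_nominal = dynamic_power(C_LOAD, 1.2, FREQ)  # assumed nominal supply
p_ulv = dynamic_power(C_LOAD, 0.3, FREQ)      # assumed ultra low supply

# At equal frequency the ratio depends only on (1.2 / 0.3)^2.
ratio = p_nominal / p_ulv
```

In this sketch a fourfold reduction in supply voltage gives roughly a sixteenfold reduction in switching power at the same frequency. The achievable frequency also drops at low supply voltage, which is the speed problem the thesis addresses.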
A full adder plays an important role in many arithmetic operations such as
addition, subtraction, multiplication and division. Addition is the most
fundamental arithmetic operation in any kind of processor, and a building
block for all other units. It is used extensively in the Arithmetic Logic
Unit (ALU), Floating Point Unit (FPU) and Application-Specific Integrated
Circuit (ASIC), where high processing speed is critical. The main aim of
this thesis is to implement digital CMOS logic gates by exploiting the
floating-gate technique in order to enhance the speed performance of the
full adder at ultra low supply voltages.
1.2 Previous work
In the late 1980s, floating gate transistors were used in non-volatile
memory elements. During the 1990s, new methods and techniques were
suggested in [5], [6], [7], [8] in order to use floating-gate devices in
different applications, for example in audio recording products and flash
memories.
In recent years, the floating gate (FG) technique has been proposed for
Ultra-Low Voltage, low power applications in both analog and digital
circuits. Floating gate devices can be fabricated using a standard CMOS
process. The capacitor can be either poly-poly, MOS or metal-metal [9],
where an extra capacitance is connected in series with the gate terminal
of the MOS transistor. The gate terminal is then charged and discharged
through this capacitance and is thus floating, as it is not connected to a
fixed potential. By tuning the charge at the floating node, a DC level
different from that provided by the supply voltage headroom can be
achieved. This shifts the threshold voltage of the MOS transistor, which
affects the active current of the transistor. The gates proposed in this
thesis are influenced by the ULV non-volatile FG circuits and the recharge
logic presented in [10] and [11] respectively. The ULV NP domino logic was
first presented in [12].
1.3 Thesis Outline
• Chapter 1 gives a brief introduction to today's technology and
discusses some challenges that arise as the technology scales down.
Further, the motivation for the thesis is given, and previous work
exploiting floating gate technology is reviewed.
• Chapter 2 gives an introduction to conventional CMOS logic,
dynamic logic, domino logic and NP domino logic. This chapter also
provides some common definitions for various CMOS techniques and
figures of merit in CMOS digital circuits.
• Chapter 3 describes the behavior of the CMOS transistors at ultra
low supply voltages. Furthermore, the main challenges at ultra
low supply voltages are briefly discussed with respect to speed
performance, robustness and power consumption.
• Chapter 4 presents a detailed description of the ULV NP domino
inverters, which are built with floating gate transistors. Different
configurations are shown in order to reduce the static current
consumption and increase the robustness of the logic style.
• Chapter 5 presents NAND and NOR logic gates using the NP
domino floating gate logic style. Different implementations of these
gates are shown using conventional and pass transistor logic styles.
• Chapter 6 presents the novel configurations of ULV NP domino
Carry gates, which are implemented with the help of the proposed ULV
NP domino logic gates. The performance of these Carry gates
has been simulated and compared with conventional Carry gates at
different ultra low supply voltages.
• Chapter 7 shows different implementations of 32-bit carry propagation
chains utilizing ULV NP domino Carry gates. Some challenges that
can occur in long domino chains are discussed, and suggestions
are given to address those challenges.
• Chapter 8 presents a review and summary of the three papers
written during the thesis work.
• Chapter 9 discusses some of the main aspects of the thesis.
• Chapter 10 summarizes the main contributions of the thesis. Some
ideas and suggestions for further work are also mentioned in this
chapter.
• Appendix A shows the truth tables for the main digital logic gates
utilized in this thesis.
• Appendix B includes the papers written throughout the thesis.
Chapter 2
Background
2.1 Conventional CMOS logic
Conventional CMOS logic uses a complementary pull-down network (PDN)
and pull-up network (PUN) to drive the output node to 0 and 1 respectively.
Both the PDN and the PUN respond when a transition arrives at the input
nodes. Conventional logic is robust, easy to design and has good noise
margins as long as the circuits operate in strong inversion (super
threshold region). Consider the example of a conventional NAND gate. The
PUN consists of two pMOS transistors in parallel, and the PDN consists of
two serially connected nMOS transistors, which are connected to the power
supply voltage (VDD) and ground (GND) respectively.
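The NAND example can be sketched as a switch-level model. This is a minimal illustration rather than a circuit simulation, and the function name `cmos_nand` is introduced here only for the example.

```python
def cmos_nand(a, b):
    """Switch-level model of a static CMOS NAND gate (inputs are 0 or 1)."""
    pun_on = (a == 0) or (b == 0)    # parallel pMOS: either one conducts
    pdn_on = (a == 1) and (b == 1)   # series nMOS: both must conduct
    # The two networks are complementary: exactly one of them conducts.
    assert pun_on != pdn_on
    return 1 if pun_on else 0


truth_table = {(a, b): cmos_nand(a, b) for a in (0, 1) for b in (0, 1)}
```

Because exactly one network conducts for every input combination, the output is always actively driven, which is the source of the robustness mentioned above.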
The major drawback of the conventional logic style is that the
transistors in both the PDN and the PUN switch when transitions arrive
at the input nodes. This increases the total input capacitance and hence
the delay. This logic style also uses more transistors to perform a logical
operation than dynamic logic, and is thus less suitable for high density
circuits.
2.2 Dynamic logic
To enhance the speed performance for the logic gates, the designers
have implemented dynamic CMOS logic gates. Dynamic circuits typically
use fewer transistors to implement a given logic function, which directly
reduces the amount of capacitance being switched and improves the speed
performance for the circuits. We use a clock signal φ to control the circuits
as shown in Figure 2.1b.
A dynamic circuit operates in two phases. During the precharge phase,
the clock φ is 0, which turns on the pMOS transistor P1, and the output node
precharges to VDD. During the evaluation phase, the clock φ is 1, which
turns off P1. The output may remain high or become low depending upon
the transitions at the input nodes in the evaluation phase. In the case of a
NAND gate, both nMOS transistors N1 and N2 in the PDN must turn on to
discharge the output node to GND.
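The two phases can be sketched with a small behavioural model. It is a simplified illustration that assumes ideal switches, no leakage and inputs held low during precharge (no footer transistor); the class name `DynamicNand` is hypothetical.

```python
class DynamicNand:
    """Behavioural model of a clocked dynamic NAND gate (ideal, no leakage)."""

    def __init__(self):
        self.out = 1  # assume the output node starts precharged

    def step(self, clk, a, b):
        if clk == 0:
            # Precharge phase: pMOS P1 conducts, output pulled to VDD.
            self.out = 1
        elif a == 1 and b == 1:
            # Evaluation phase: series N1 and N2 conduct and discharge the node.
            self.out = 0
        # Otherwise the node floats and keeps its stored value.
        return self.out


gate = DynamicNand()
trace = [
    gate.step(0, 0, 0),  # precharge
    gate.step(1, 1, 1),  # evaluate with A = B = 1: node discharges
    gate.step(1, 0, 1),  # still evaluating: charge cannot be restored
    gate.step(0, 0, 0),  # next precharge restores the node
]
```

The third step shows a property of dynamic gates that the static gate does not have: once the node has been discharged during evaluation, it stays low until the next precharge phase.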
The main advantages of dynamic logic over conventional CMOS logic
are reduced switching activity due to hazards, elimination of short-circuit
dissipation, and reduced parasitic node capacitances[1].
Figure 2.1: NAND gate. (a) Conventional CMOS. (b) Dynamic CMOS.
2.2.1 Challenges in Dynamic logic
Timing and clock synchronization are the most critical tasks in dynamic
logic, as the correct operation of the dynamic gates strongly depends upon
the timing of the clock signal and the transitions at the input nodes [13].
If transitions can arrive at any of the input nodes during the precharge
phase, a footer nMOS transistor must be placed at the bottom of the PDN.
The gate of the footer transistor is controlled by φ. This prevents the
output node from being discharged during the precharge phase.
Another major disadvantage of dynamic circuits is the charge
leakage at the floating nodes. For example, if the PDN is off during the
evaluation phase, the output node should ideally hold the precharged value.
However, the stored charge slowly leaks away through the leakage currents
of the transistors. The evaluation phase must therefore be short enough
that the floating output node keeps its value. Thus the dynamic logic
style is not suitable for low frequency systems.
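A rough feel for this limit can be obtained from a constant-current approximation, dV = I_leak * t / C. The sketch below assumes that approximation, and the leakage current, node capacitance and tolerated droop are hypothetical values, not results from the thesis.

```python
def droop(i_leak, c_node, t_eval):
    """Voltage lost by a floating node, assuming a constant leakage current."""
    return i_leak * t_eval / c_node


def max_eval_time(i_leak, c_node, allowed_droop):
    """Longest evaluation phase that keeps the droop within the allowed value."""
    return allowed_droop * c_node / i_leak


# Hypothetical numbers chosen only for illustration.
I_LEAK = 1e-9     # 1 nA total leakage at the node (assumed)
C_NODE = 1e-15    # 1 fF node capacitance (assumed)
ALLOWED = 0.03    # 30 mV tolerated droop (assumed)

t_max = max_eval_time(I_LEAK, C_NODE, ALLOWED)   # about 30 ns
check = droop(I_LEAK, C_NODE, t_max)             # back to about 30 mV
```

With these assumed values the evaluation phase could last only some tens of nanoseconds, which illustrates why a slow clock is a problem for dynamic nodes without keepers.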
Another major problem occurs when dynamic circuits are cascaded in
a chain as shown in Figure 2.2. As both cascaded inverters are precharged
by the same clock signal φ, the output nodes of both inverters precharge to
VDD. When a positive transition arrives at the input node A, the second
inverter still sees the precharged high level at its input and starts to
discharge before the first inverter has evaluated. This gives a logically
incorrect value at the output node X of the second cascaded inverter.
Consequently, dynamic circuits that share the same clock signal cannot be
cascaded directly.
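The false transition can be reproduced with a deliberately pessimistic behavioural sketch. It assumes that the second stage fully discharges during the delay of the first stage; in a real circuit this is a race whose outcome depends on the timing. The function name is hypothetical.

```python
def cascaded_dynamic_inverters(a):
    """Two dynamic inverters on one clock; returns (y1, x) after evaluation."""
    # Precharge phase: both output nodes are pulled to VDD.
    y1, x = 1, 1
    # Evaluation starts: stage 2 sees the still-precharged y1 = 1 before
    # stage 1 has evaluated, so its pull-down network conducts at once.
    if y1 == 1:
        x = 0  # the lost charge cannot be restored until the next precharge
    # Stage 1 now finishes evaluating its own input.
    if a == 1:
        y1 = 0
    return y1, x


y1, x = cascaded_dynamic_inverters(1)
expected_x = 1 - y1   # what an ideal inverter of y1 would produce
```

For A = 1 the first output correctly falls to 0, so X should be 1, but the model returns X = 0, matching the false transition marked in Figure 2.2.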
Figure 2.2: Dynamic cascade inverters.
2.3 Domino logic
Domino logic is used in digital circuits such as microprocessors,
where high speed and small area are critical. It has many advantages such
as high speed operation, small area and power consumption savings.
Domino logic overcomes the cascading problem faced by dynamic logic
gates.
Figure 2.3: Domino cascaded inverters.
Figure 2.3 shows a chain of cascaded inverters connected in domino
logic. Conventional CMOS inverters are connected to the output
nodes of the dynamic inverters, and their outputs in turn drive the next
dynamic inverters in the chain. During the precharge phase, the output
nodes of the dynamic and the conventional inverters are precharged to VDD
and GND respectively. In the evaluation phase, the output node of the first
dynamic inverter remains high or discharges to GND, depending upon the
transition at the input node. If the input transition is from 0 to 1, the
effect may ripple through the whole chain, from the first to the last
inverter, in the same way as falling dominoes trigger one another from the
first to the last element in the chain.
Domino logic uses a single clock to precharge and evaluate all the logic
gates within the chain. By using the same clock signal φ, precharging
occurs in parallel for each element in the chain, but the evaluation occurs
serially, from the first to the last element in the domino chain. Domino logic
thus solves the cascading problem of dynamic logic, but the inclusion of
conventional CMOS inverters at the output nodes of the fast dynamic
inverters limits the speed performance of this logic style.
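The parallel precharge and serial evaluation can be sketched as follows. This is a purely logical model with ideal stages, and `domino_chain` is a name introduced only for this illustration.

```python
def domino_chain(a, n_stages):
    """Evaluate a chain of domino stages (dynamic inverter + static inverter)."""
    # Precharge (parallel): dynamic nodes at 1, static inverter outputs at 0.
    dyn = [1] * n_stages
    out = [0] * n_stages
    signal = a
    # Evaluation (serial): the effect ripples from the first to the last stage.
    for i in range(n_stages):
        if signal == 1:
            dyn[i] = 0          # pull-down discharges the dynamic node
        out[i] = 1 - dyn[i]     # static inverter restores the polarity
        signal = out[i]
    return dyn, out


rippled = domino_chain(1, 3)   # input goes 0 -> 1
idle = domino_chain(0, 3)      # input stays low
```

Since every static inverter output is precharged to 0, no stage can discharge before its predecessor has evaluated, which removes the false transition of directly cascaded dynamic gates.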
2.4 NP Domino logic
NP domino logic removes the conventional CMOS inverters at the output
nodes of the dynamic inverters in domino logic. The conventional CMOS
inverters are replaced by precharged dynamic gates that use a PUN and
the inverted clock signal φ̄, as shown in Figure 2.4.
During the precharge phase, φ is low and φ̄ is high, which precharges the
PDN and PUN outputs to VDD and GND respectively. During the evaluation
phase, the PDN output discharges to GND and the PUN output charges to
VDD, depending upon the transitions at the input nodes. The input
transitions at the first NP domino gate ripple through the whole chain in
a single evaluation phase.
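A minimal logical sketch of such a chain is given below, assuming that each PDN and PUN block is a single inverter stage. The function name `np_domino_chain` is hypothetical.

```python
def np_domino_chain(v_in, n_stages):
    """Evaluate alternating N-type and P-type dynamic inverters."""
    outputs = []
    signal = v_in
    for i in range(n_stages):
        if i % 2 == 0:
            # N-type stage: precharged to 1, PDN discharges it on a high input.
            node = 0 if signal == 1 else 1
        else:
            # P-type stage: precharged to 0, PUN charges it on a low input.
            node = 1 if signal == 0 else 0
        outputs.append(node)
        signal = node
    return outputs


evaluated = np_domino_chain(1, 4)   # a high input ripples through the chain
precharged = np_domino_chain(0, 4)  # a low input leaves the precharge values
```

With a low input every node keeps its precharged value: the precharged 1 of an N-type stage keeps the following PUN off, and the precharged 0 of a P-type stage keeps the following PDN off, so no static inverter is needed between stages.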
Figure 2.4: NP Domino Logic.
2.5 Keepers
As mentioned earlier, dynamic circuits suffer from charge leakage
at the dynamic nodes. If a dynamic node is precharged high and then
left floating, the voltage on the output node will decrease over time due
to subthreshold, gate and junction leakage. Moreover, dynamic nodes have
poor noise margin. These two problems can be overcome by using keeper
transistors [14]. The keeper is a weak transistor that holds the output node
at the correct level when it would otherwise float. The keeper reduces
the static current consumption by draining one of the transistors at the
output node. The reduction in the static current reduces the overall power
consumption. However, the load capacitance at the output node increases,
which degrades the speed performance slightly.
2.6 Floating gate
A floating gate transistor is a transistor whose gate terminal is not
connected to a fixed potential. The voltage at the floating node can be
determined by capacitive division as shown in Figure 2.5. Vin is the input
voltage, Cin is the capacitance at the floating terminal, Cpar is the parasitic
capacitance of the nMOS transistor EN1 and V is the voltage at the floating
gate terminal. The voltage V at the floating node is given by the following
equation:
V =Vinit+Vin∗
Cin
Cin+Cparasitic
(2.1)
Vinit is the initial voltage at the floating gate terminal. The voltage at
the floating gate terminal is programmed/recharged to an initial voltage
by various means presented in [10]. Most often, Vinit recharges to VDD and
GND for the pMOS and nMOS transistors respectively during the precharge
phase.
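As a minimal sketch of the capacitive division in equation (2.1), the helper below evaluates the floating node voltage. The function name and the numerical values (a 300 mV input step and the two capacitances) are illustrative assumptions, not values taken from the thesis.

```python
def floating_gate_voltage(v_init, v_in, c_in, c_par):
    """Floating node voltage V = Vinit + Vin * Cin / (Cin + Cpar), eq. (2.1)."""
    if c_in < 0 or c_par < 0 or (c_in + c_par) == 0:
        raise ValueError("capacitances must be non-negative and not both zero")
    return v_init + v_in * c_in / (c_in + c_par)


# Assumed example: node recharged to 0 V, 300 mV input step,
# Cin = 1 fF and Cpar = 3 fF, so a quarter of the step is coupled in.
v = floating_gate_voltage(v_init=0.0, v_in=0.3, c_in=1e-15, c_par=3e-15)
print(f"floating node voltage: {v * 1e3:.1f} mV")
```

The larger Cin is relative to the parasitic capacitance, the larger the fraction of the input step that reaches the floating gate.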
Figure 2.5: Floating gate transistor.
2.7 Pass Transistor Logic
Pass transistor logic (PTL) is a logic style that has been widely used
in digital systems[1]. PTL is attractive as fewer transistors are used to
implement the important digital gates, offering a huge advantage in terms
of area consumption. The input capacitance is reduced, which reduces the
overall delay and makes the circuit faster. In PTL, inputs are not only
applied to the gate terminals, but also to the drain and source terminals.
PTL suffers from a threshold voltage drop on the transmitted signal, which
calls for swing restoration at the output node and degrades the
robustness performance.
2.8 Adder
Addition is the most fundamental arithmetic operation in any kind of
processor, and the adder is a building block for many processing units such
as the ALU, FPU and ASICs. Besides the addition task itself, it is also the
nucleus of many other arithmetic operations such as subtraction,
multiplication and division. This makes the adder of great interest for many
digital system designers. The two types of adder circuits are explained below.
2.8.1 Half Adder
If only two input bits are to be added, a half adder is sufficient. It has
two input bits A and B and two output bits Sum and Cout. The logic functions
of the half adder are given in the following equation.
$$\begin{aligned} Sum &= A \oplus B \\ C_{out} &= A \cdot B \end{aligned} \tag{2.2}$$
The Sum logic function corresponds to the XOR operation on the input bits
A and B, while the Cout logic function corresponds to the AND operation on
the input bits.
2.8.2 Full Adder
If the addition of more than two bits is desired, the adders should be
cascaded in a chain. To achieve the correct arithmetic operation, each stage
must take into consideration the Cin bit from the previous adder in the chain.
Thus a full adder has 3 inputs and 2 outputs. The logic functions of the full
adder are given in the following equation.
$$\begin{aligned} Sum &= (A \oplus B) \oplus C_{in} \\ C_{out} &= A \cdot B + C_{in} \cdot (A + B) \end{aligned} \tag{2.3}$$
The Sum and Cout functions of the full adder reduce to those of the half
adder when Cin is 0.
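Equations (2.2) and (2.3) can be checked with a short bit-level sketch. The function names and the `ripple_add` helper are hypothetical; the helper simply cascades full adders so that the carry ripples from the least to the most significant bit.

```python
def half_adder(a, b):
    """Equation (2.2): Sum = A xor B, Cout = A and B."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Equation (2.3): Sum = (A xor B) xor Cin, Cout = A.B + Cin.(A + B)."""
    return (a ^ b) ^ cin, (a & b) | (cin & (a | b))


def ripple_add(x, y, bits=32):
    """Add two unsigned integers by cascading full adders in a chain."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Worst case for a 32-bit chain: the carry propagates through every stage.
print(ripple_add(0xFFFFFFFF, 1))
```

Adding 1 to an all-ones operand is the case in which the carry has to travel through the entire chain, which is why the carry path limits the speed of such an adder.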
2.9 Multi-threshold CMOS Technology
Multi-threshold CMOS (MTCMOS) is a variation of CMOS chip technology that
provides transistors with multiple (typically dual) threshold voltages in
order to optimize power or delay[15]. The MTCMOS technique can be employed
in high speed circuits, where low threshold (L-thr) transistors are used in
the speed critical paths to minimize the delay, while high threshold (H-thr)
transistors are used in the non-critical paths to reduce the leakage power
consumption.
2.10 Variable-threshold CMOS Technology
Variable Threshold CMOS (VTCMOS) is another efficient method to
reduce the leakage power in high speed circuits[16]. The speed critical
transistors only need to be fast in the active mode of operation, so the
substrate bias voltage of these transistors can be varied in order to
achieve a low threshold voltage in the active mode of operation and a high
threshold voltage otherwise. The main drawback is the fabrication of VTCMOS
devices, which requires twin or triple well technology to achieve the
different bias voltage levels.
2.11 Delay
One common way to determine the speed performance of digital circuits
is to measure the propagation delay TD between the input and the output
signals. TD is the time from the instant the input signal reaches 50% of
its logic swing to the instant the output signal reaches 50% of its logic
swing[17]. TD depends upon the parameters given in the following equation:
$$T_D = \frac{V_{DD} \cdot C_L}{I_{on}} \tag{2.4}$$
VDD is the supply voltage, CL is the load capacitance and Ion is the active
current running through the on transistors.
Another common way to measure the delay is to determine the
difference between the rise/fall times of the input and output signals. The
rise time TR is the time for a signal to rise from 20% to 80% of its steady
state value. The fall time TF is the time for a signal to fall from 80% to
20% of its steady state value. These two measures are also used to
characterize the transition times of different signals.
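The 50% and 20%/80% definitions above can be sketched on sampled waveforms as follows. The helper names and the piecewise-linear example waveforms (time in ps, voltage in V, signals swinging from 0 V) are assumptions made for illustration, not simulation data from the thesis.

```python
def crossing_time(times, values, level, rising):
    """Time of the first crossing of `level`, found by linear interpolation."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crosses_up = rising and (v0 < level <= v1)
        crosses_down = (not rising) and (v0 > level >= v1)
        if crosses_up or crosses_down:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t_in, v_in, t_out, v_out, swing, in_rising, out_rising):
    """TD: 50% point of the input to 50% point of the output."""
    half = 0.5 * swing
    return (crossing_time(t_out, v_out, half, out_rising)
            - crossing_time(t_in, v_in, half, in_rising))


def rise_time(times, values, swing):
    """TR: time to rise from 20% to 80% of the swing."""
    return (crossing_time(times, values, 0.8 * swing, True)
            - crossing_time(times, values, 0.2 * swing, True))


# Hypothetical waveforms: rising input, falling (inverted) output.
t_in, v_in = [0.0, 100.0, 200.0], [0.0, 0.3, 0.3]
t_out, v_out = [0.0, 50.0, 150.0, 200.0], [0.3, 0.3, 0.0, 0.0]
td = propagation_delay(t_in, v_in, t_out, v_out, 0.3, True, False)
tr = rise_time(t_in, v_in, 0.3)
print(f"TD = {td:.1f} ps, TR = {tr:.1f} ps")
```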
2.12 Figure of Merit in logic gates
The performance of digital circuits can be expressed in terms of the
Power-Delay-Product (PDP) and the Energy-Delay-Product (EDP). PDP and
EDP are two common figures of merit, which relate to the power and
energy efficiency of digital gates respectively.
2.12.1 Power Delay Product
PDP is the product of the power consumed in a switching event and the
propagation delay TD. Power is determined by multiplying the average
current Ion drawn per transition by the supply voltage VDD. The
formula for PDP is derived in the following equation.
$$\begin{aligned} PDP\,(\mathrm{J}) &= Power \cdot T_D \\ &= (I_{on} \cdot V_{DD}) \cdot \left(\frac{V_{DD} \cdot C_L}{I_{on}}\right) \\ &= V_{DD}^2 \cdot C_L \end{aligned} \tag{2.5}$$
The unit of PDP is the joule (J). PDP depends only upon the supply voltage
VDD and the load capacitance CL, and not on the ON-current Ion running
through the logic gate.
2.12.2 Energy Delay Product
EDP is obtained by multiplying the PDP by the input-output
delay TD of the logic gate. The unit of EDP is the joule second (J·s). The
formula for deriving the EDP is shown in the following equation:
$$\begin{aligned} EDP\,(\mathrm{J\,s}) &= PDP \cdot T_D \\ &= (V_{DD}^2 \cdot C_L) \cdot \left(\frac{V_{DD} \cdot C_L}{I_{on}}\right) \\ &= \frac{V_{DD}^3 \cdot C_L^2}{I_{on}} \end{aligned} \tag{2.6}$$
EDP is a useful figure of merit in high speed digital circuits as it weights
the switching time more than the power consumption. It depends upon the
supply voltage VDD, the load capacitance CL and the ON-current Ion. As EDP
is inversely proportional to Ion, increasing the current results in a lower
EDP.
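Equations (2.4) to (2.6) can be evaluated directly. In the sketch below the function names and the operating point (300 mV supply, 1 fF load, 100 nA ON-current) are assumptions chosen only to illustrate how the three quantities relate to each other.

```python
def delay(vdd, c_load, i_on):
    """Equation (2.4): TD = VDD * CL / Ion, in seconds."""
    return vdd * c_load / i_on


def pdp(vdd, c_load):
    """Equation (2.5): PDP = VDD^2 * CL, in joules."""
    return vdd ** 2 * c_load


def edp(vdd, c_load, i_on):
    """Equation (2.6): EDP = PDP * TD = VDD^3 * CL^2 / Ion, in joule seconds."""
    return pdp(vdd, c_load) * delay(vdd, c_load, i_on)


# Assumed operating point, for illustration only.
vdd, c_load, i_on = 0.3, 1e-15, 100e-9
print(f"TD  = {delay(vdd, c_load, i_on):.3e} s")
print(f"PDP = {pdp(vdd, c_load):.3e} J")
print(f"EDP = {edp(vdd, c_load, i_on):.3e} J s")
```

Doubling Ion in this model halves both TD and EDP but leaves PDP unchanged, which matches the observation that PDP does not depend on the ON-current.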
Chapter 3
Performance of CMOS at
ultra low supply voltages
Scaling down the supply voltage VDD is one of the most efficient ways to
reduce the power consumption in many new applications, such as ambient
intelligence, wireless sensor networks, mobile phones, laptops and other
energy-scavenging systems. It reduces the cost of system maintenance and
extends the battery lifetime.
3.1 Challenges at low supply voltages
Although there are many advantages as the supply voltage scales down
to the near-threshold region, where the transistors may operate in the
weak inversion or moderate inversion region, some major challenges also
arise in the performance of digital CMOS circuits. The major impact is on
the speed performance, as the ON-current Ion degrades exponentially. A
current model for a transistor operating at ultra low supply voltages is
given in the following equation[18]:
$$I = I_0 \cdot \frac{W}{L} \cdot e^{(V_{GS}-V_{TH})/(n v_t)} \cdot \left(1 - e^{-V_{DS}/v_t}\right) \tag{3.1}$$
where I0 is the technology-dependent subthreshold current, vt is the
thermal voltage, n is the subthreshold factor, W/L is the sizing ratio of the
transistor, VGS represents the gate source voltage, VDS represents the drain
source voltage and VTH represents the threshold voltage of the transistor.
When the transistor is switched on, the ON-current Ion degrades
exponentially with the scaling of the supply voltage VDD. As mentioned
earlier, this directly impacts the speed performance of CMOS circuits,
as the switching delay TD is inversely proportional to Ion. To compensate
for the loss in speed, the transistor’s threshold voltage VTH should
be reduced, which corresponds to increasing the strength factor F of the
transistor.
However, lowering the threshold voltage of the transistor causes an
exponential increase in the transistor’s OFF-current Ioff at ultra low supply
voltages. This is due to the exponential dependency of the current I on
VGS−VTH. In the super threshold region, VTH is high enough at VGS = 0 that
I is very small when the transistor is off. However, when the supply voltage
scales down and VTH is reduced to compensate for the loss in speed,
Ioff increases at VGS = 0, because it grows exponentially as VTH is lowered.
Ioff is also known as the weak inversion current or the subthreshold leakage
current Ilkg[19].
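The trade-off between Ion and Ioff can be made concrete with the current model of equation (3.1). The sketch below uses assumed parameter values (I0, n, the two threshold voltages and a 300 mV supply); only the thermal voltage of roughly 25.9 mV at room temperature is a physical constant, and the function name is hypothetical.

```python
import math


def subthreshold_current(vgs, vds, vth, i0=1e-7, w_over_l=1.0, n=1.5, vt=0.0259):
    """Equation (3.1): I = I0 * (W/L) * exp((VGS-VTH)/(n*vt)) * (1 - exp(-VDS/vt))."""
    return (i0 * w_over_l * math.exp((vgs - vth) / (n * vt))
            * (1.0 - math.exp(-vds / vt)))


VDD, N, VT = 0.3, 1.5, 0.0259          # assumed operating point
results = {}
for vth in (0.35, 0.25):                # assumed high and reduced thresholds
    i_on = subthreshold_current(VDD, VDD, vth)    # VGS = VDS = VDD
    i_off = subthreshold_current(0.0, VDD, vth)   # VGS = 0, VDS = VDD
    results[vth] = (i_on, i_off)
    print(f"VTH = {vth:.2f} V: Ion = {i_on:.3e} A, Ioff = {i_off:.3e} A")

# In this model the on/off ratio depends on VDD only, not on VTH.
expected_ratio = math.exp(VDD / (N * VT))
# Lowering VTH by 100 mV scales both currents by the same factor.
expected_gain = math.exp(0.1 / (N * VT))
```

Within this simple model, reducing VTH raises Ion and Ioff by the same exponential factor, so the speed gain is paid for directly in leakage, while the Ion/Ioff ratio is set by VDD alone.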
Scaling down the supply voltage also degrades the robustness
of the circuit to a certain extent. Robustness can be quantified by
the Noise Margin (NM). NM indicates the allowable noise voltage on the
input of a gate such that the output will not be corrupted.
One way to estimate NM is shown in the following equation:
$$NM = \frac{I_{on}}{I_{off}} \tag{3.2}$$
Here NM is the ratio between the ON-current Ion and the OFF-current Ioff.
This ratio decreases at low supply voltages, as Ion decreases while Ioff
increases.
3.2 F: The strength tunable factor of the transistor
The strength factor of the transistor depends upon the threshold
voltage VTH of the transistor, and Ion increases as VTH is lowered. The
effective VTH is set through the parameters of the strength factor F. The
strength tunable factor F is given in the equation below[18]:
$$F = I_0 \frac{W}{L} e^{-(V_{TH0} - \lambda_{BS} V_{SB})/(n v_t)} \tag{3.3}$$
where λBS is the body effect coefficient, VSB is the substrate bias voltage
acting through the body effect and VTH0 is the zero-bias threshold voltage.
The strength factor can be tuned by:
• Adjusting the W /L ratio.
• Selecting the zero-bias threshold VTH0 among the low/standard or
high values available in the adopted technology.
• Adjusting the substrate bias voltage VSB.
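The three tuning knobs listed above can be explored with a small sketch of equation (3.3). All numerical values (I0, n, λBS, the threshold and bias voltages) are assumptions chosen for illustration, the function name is hypothetical, and the sign convention for VSB simply follows the equation as written.

```python
import math


def strength_factor(vth0, vsb=0.0, w_over_l=1.0, i0=1e-7,
                    lambda_bs=0.2, n=1.5, vt=0.0259):
    """Equation (3.3): F = I0 * (W/L) * exp(-(VTH0 - lambda_BS*VSB) / (n*vt))."""
    return i0 * w_over_l * math.exp(-(vth0 - lambda_bs * vsb) / (n * vt))


base = strength_factor(vth0=0.35)
wider = strength_factor(vth0=0.35, w_over_l=2.0)   # knob 1: W/L ratio
low_vth = strength_factor(vth0=0.30)               # knob 2: zero-bias threshold
biased = strength_factor(vth0=0.35, vsb=0.3)       # knob 3: substrate bias

print(f"W/L x2        -> F x {wider / base:.2f}")
print(f"VTH0 -50 mV   -> F x {low_vth / base:.2f}")
print(f"VSB = 300 mV  -> F x {biased / base:.2f}")
```

In this model the W/L ratio acts linearly on F, while VTH0 and VSB act exponentially. The model does not capture the sizing-dependent threshold shifts that Figure 3.1 attributes to the reverse short channel effect.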
The plots in Figure 3.1 show ION through an nMOS transistor
at a supply voltage of 300mV for different sizing parameters of the
transistor. The graph demonstrates that increasing the length L of the
transistor is preferable to increasing the width W in order to achieve a
higher ION at ultra low supply voltages. This is due to the reverse short
channel effect (RSCE), which lowers the threshold voltage VTH of the
transistor as L increases and thereby increases ION[20]. ION increases
linearly as L increases up to 2.5×Lmin; however, increasing L beyond
2.5×Lmin is not helpful due to the inverse proportionality between F and L
shown in the equation above.
Figure 3.1: The ON-current ION through an nMOS transistor with different
dimensions. (Plot of current (A) versus normalized length (L) and width (W),
with one curve for the width and one for the length.)
According to the graph in Figure 3.1, increasing the transistor’s width
W is not an effective knob up to 3×Wmin, as ION decreases in this range. W
should be almost 7×Wmin to increase ION by the same factor as is achieved
by increasing L to 2.5×Lmin.
The second method to increase the transistor strength is to select the
zero-bias threshold VTH0 among the low threshold (L-thr), standard
threshold (S-thr) and high threshold (H-thr) transistors available in the
adopted technology, as the transistor strength F depends exponentially on
VTH0. As shown in Table 3.1, using an L-thr transistor instead of an H-thr
transistor lowers the relative threshold voltage by about 25 percentage
points (from 122% to 97%). However, the L-thr transistor has a higher
OFF-current Ioff. Table 3.1 also lists different performance parameters for
the various configurations of the nMOS transistor. ION and PON represent
the ON-current and the power consumption of the nMOS transistor when the
transistor is on. Ioff and Poff represent the OFF-current and the power
consumption of the nMOS transistor when the transistor is off.
F parameters                            Relative  ION     PON     Ioff    Poff
Transistor  Sizing          Body        thr (%)   (nA)    (nW)    (pA)    (pW)
                            biasing
S-thr       Wmin, Lmin      RBB         100       87.8    26.34   22.97   6.89
L-thr       Wmin, Lmin      RBB         97        97.79   29.34   26.5    7.9
H-thr       Wmin, Lmin      RBB         122       6.37    1.91    1.8     0.567
L-thr       Wmin, 3×Lmin    RBB         71        226.4   67.92   68.77   20.63
L-thr       3×Wmin, Lmin    RBB         118       92.65   27.97   27      8.105
L-thr       Wmin, Lmin      Flt         95        150.8   45.23   54.23   16.27
L-thr       Wmin, Lmin      FBB         90        235     70.54   116.4   34.9
Table 3.1: Relative threshold voltage (thr), ION, PON, Ioff and Poff for various
configurations of nMOS transistor.
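The ION/Ioff ratio defined in equation (3.2) can be computed for each row of Table 3.1. The sketch below only transcribes the ION and Ioff columns of the table; the variable names and the choice to rank the configurations are illustrative additions.

```python
# (transistor, sizing, body biasing, ION in nA, Ioff in pA), from Table 3.1.
rows = [
    ("S-thr", "Wmin, Lmin", "RBB", 87.8, 22.97),
    ("L-thr", "Wmin, Lmin", "RBB", 97.79, 26.5),
    ("H-thr", "Wmin, Lmin", "RBB", 6.37, 1.8),
    ("L-thr", "Wmin, 3xLmin", "RBB", 226.4, 68.77),
    ("L-thr", "3xWmin, Lmin", "RBB", 92.65, 27.0),
    ("L-thr", "Wmin, Lmin", "Flt", 150.8, 54.23),
    ("L-thr", "Wmin, Lmin", "FBB", 235.0, 116.4),
]


def on_off_ratio(row):
    """ION / Ioff with both currents converted to the same unit (pA)."""
    ion_na, ioff_pa = row[3], row[4]
    return ion_na * 1e3 / ioff_pa


for row in rows:
    print(f"{row[0]:6s} {row[1]:13s} {row[2]:4s} ION/Ioff = {on_off_ratio(row):7.0f}")

strongest = max(rows, key=lambda r: r[3])     # highest ON-current
leakiest = min(rows, key=on_off_ratio)        # lowest ION/Ioff ratio
```

For these table values the forward body biased transistor delivers the highest ON-current but also the lowest ION/Ioff ratio, which illustrates the speed versus leakage trade-off discussed in this section.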
The third method to increase the transistor strength is to tune the
substrate bias voltage VSB, as the transistor strength F also depends
exponentially on it. This tuning knob is less effective in above-threshold
circuits, where F has a much weaker dependency on VSB.
Three common techniques for body biasing are Forward Body Biasing
(FBB), Reverse Body Biasing (RBB) and keeping the body terminal floating
(Flt). Conventional CMOS circuits are traditionally connected using the
RBB technique, which increases the threshold voltage VTH of the transistor,
reducing the power consumption at the cost of reduced speed performance.
Flt and FBB are often used in the speed critical paths. The substrate
of an nMOS transistor can either remain floating or be connected to VDD by
utilizing the Flt or FBB scheme respectively. This decreases the threshold
voltage VTH of the transistor, which increases the speed performance
of the gates; the drawback is the increase in power consumption. Table 3.1
shows that the threshold voltage is only 95% and 90% of the RBB reference
value when utilizing Flt and FBB body biasing respectively. The simulation
results thus indicate that Forward Body Biasing is the most effective body
biasing scheme for reducing the threshold voltage of the transistor in order
to achieve high speed performance.
3.2.1 Implementation of Deep n-well
Forward Body Biasing (FBB) and floating bulk terminals are
the most effective biasing schemes to achieve a higher ON-current ION by
lowering the transistor threshold voltage. However, applying these schemes
can be a challenging task and increases the complexity of the circuit during
the layout stage in the TSMC 90nm process, as a deep n-well is required in
order to isolate the body of the nMOS transistors.
Figure 3.2: Deep n-well process architecture.
Generally, a deep n-well is used to isolate the substrates of one or more
nMOS transistors from the substrates of other nMOS transistors[21]. The
main disadvantage of implementing a deep n-well is the increase in area. An
nMOS transistor with a deep n-well can enlarge the area on the chip by a
factor of 10 to 80, depending upon the technology. An nMOS transistor with
a deep n-well is shown in Figure 3.2. This approach is also commonly used
to suppress the substrate noise coupling injected by the digital logic in
mixed-signal/RF environments[21].
A solution that avoids the deep n-well process is to implement
the Dynamic Threshold Voltage MOSFET (DTMOS)[22] process instead of the
standard CMOS process. The DTMOS process uses Silicon On Insulator (SOI)
transistors, which employ an insulating substrate instead of silicon as
the substrate. No wells or substrate contacts are needed in the design of
the SOI process. However, some new challenges arise in the layout stage,
as mentioned in [23].
3.2.2 Imbalance factor between nMOS and pMOS
For a conventional CMOS inverter in the superthreshold region, the
mobilities of the nMOS and pMOS transistors are related by µn ≈ 2µp, thus
the width W of the pMOS transistor is set to 2×Wmin to obtain the same
strength as the nMOS transistor. According to [20, 18], the imbalance factor
IF between the nMOS and pMOS transistors is given in the following equation:
$$IF = \max\left(\frac{\beta_n}{\beta_p}, \frac{\beta_p}{\beta_n}\right) \geq 1 \tag{3.4}$$
IF is defined as the strength ratio between the stronger and the weaker
transistor. IF between the nMOS and pMOS is not a big issue in the
superthreshold region, as the nMOS transistor is only about twice as strong
as the pMOS transistor. However, as the supply voltage scales down, the
transistor strength depends exponentially upon the threshold voltage VTH.
Thus a small difference in VTH results in a higher imbalance factor.
When a logic gate suffers from a high IF, its stronger transistor
increases the leakage current of the logic gate due to its
higher strength. On the other hand, the weaker transistor increases the gate
delay. Thus a large imbalance tends to increase the leakage
power and degrade the performance of the logic gate[20].
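A brief sketch shows why a small threshold mismatch matters at ultra low supply voltages. The `imbalance_factor` helper follows the definition of IF as the ratio between the stronger and the weaker transistor; the subthreshold estimate assumes otherwise identical devices, with strength scaling as exp(−VTH/(n·vt)), and uses assumed values for n and the 50 mV mismatch.

```python
import math


def imbalance_factor(beta_n, beta_p):
    """IF: strength ratio between the stronger and the weaker transistor (>= 1)."""
    return max(beta_n / beta_p, beta_p / beta_n)


def subthreshold_imbalance(delta_vth, n=1.5, vt=0.0259):
    """IF caused by a threshold mismatch alone, assuming exponential strength."""
    return math.exp(abs(delta_vth) / (n * vt))


# Superthreshold example from the text: nMOS about twice as strong as pMOS.
print(imbalance_factor(2.0, 1.0))
# Assumed 50 mV threshold mismatch in the subthreshold region.
print(f"{subthreshold_imbalance(0.05):.2f}")
```

Under these assumptions a mismatch of only 50 mV already produces a larger imbalance than the factor of two caused by the mobility difference.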
The DC analysis of the conventional CMOS inverter at a supply voltage
of 300mV shows that the IF between nMOS and pMOS is quite large.
In this analysis, the bulk terminal of the pMOS transistor remains floating
and the width W of the pMOS transistor is increased to 2×Wmin to approach
the strength of the nMOS transistor, while the nMOS transistor is minimum
sized with the conventional RBB scheme at the body terminal.
3.3 Power Dissipation in CMOS
The total power dissipation in a digital CMOS circuit consists of two main
sources shown in the equation below:
$$P_{Total} = P_{dynamic} + P_{static} \tag{3.5}$$
where Pdynamic is the dynamic power consumption and Pstatic is the static
power consumption.
3.3.1 Dynamic power dissipation
Dynamic power mostly consists of the switching power Pswitching and the
short-circuit power Psc in digital CMOS circuits. When the transistors
switch, Pswitching is dissipated during the charging/discharging of the load
capacitance CL at the output node. The general formula for the
switching power consumption is given in the equation below:
$$P_{switching} = p_t \cdot f_{clk} \cdot C_L \cdot V_{DD}^2 \tag{3.6}$$
where fclk is the switching frequency and pt is the probability that a power
consuming transition occurs, which is also known as the activity factor[1].
Figure 3.3: Dynamic power dissipation in a conventional CMOS
inverter[24].
Short-circuit power Psc is the other main source of dynamic power
dissipation. It occurs due to the direct flow of current Isc from VDD to GND
during a transition at the input node, when both PUN and PDN are partially
on for a short period of time. The conventional CMOS inverter
in Figure 3.3 shows the path of Isc. The grey shaded circle at the negative
input transition indicates the interval during which Isc flows along a
direct path from VDD to GND. Isc flows as long as the input voltage A is
higher than the nMOS threshold voltage (VTHn) and lower than VDD minus the
magnitude of the pMOS threshold voltage (VTHp). According to [25], the
short-circuit power dissipation Psc in a conventional CMOS inverter is
given in the following equation:
$$P_{sc} = K \cdot f_{clk} \cdot T_{R,F} \cdot (V_{DD} - 2V_{TH})^3 \tag{3.7}$$
where K is a constant that depends upon the transistor dimensions and
other process parameters, TR,F is the rise/fall time of the input signal,
fclk is the switching frequency, VDD is the supply voltage and VTH is the
threshold voltage of the transistors. The short-circuit power dissipation
Psc is linearly proportional to TR,F. Thus reducing TR,F leads to a
reduction in Psc.
Dynamic power dissipation is the dominant power source in digital
CMOS circuits in the superthreshold regime, where Pdynamic contributes about
90% of the total power dissipation[24].
However, Pdynamic reduces significantly as the supply voltage VDD scales
down. This is due to the quadratic dependence of the switching power
dissipation Pswitching upon VDD. In addition, reducing VDD offers
a significant reduction in the short-circuit power dissipation Psc due to
the (VDD−2VTH)^3 factor.
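The supply voltage dependence of equations (3.6) and (3.7) can be sketched as follows. The function names and all numerical values are illustrative assumptions. Clamping the short-circuit power to zero when VDD does not exceed 2·VTH is also an assumption of this sketch, reflecting that the nMOS and pMOS transistors are then never above threshold at the same time; it is not part of equation (3.7) itself.

```python
def switching_power(p_t, f_clk, c_load, vdd):
    """Equation (3.6): Pswitching = pt * fclk * CL * VDD^2."""
    return p_t * f_clk * c_load * vdd ** 2


def short_circuit_power(k, f_clk, t_rf, vdd, vth):
    """Equation (3.7): Psc = K * fclk * TR,F * (VDD - 2*VTH)^3.

    Assumption of this sketch: Psc is taken as zero when VDD <= 2*VTH.
    """
    overdrive = vdd - 2.0 * vth
    if overdrive <= 0.0:
        return 0.0
    return k * f_clk * t_rf * overdrive ** 3


# Assumed example values: activity 0.5, 100 MHz clock, 1 fF load.
p_high = switching_power(0.5, 100e6, 1e-15, 1.0)
p_low = switching_power(0.5, 100e6, 1e-15, 0.5)
print(f"halving VDD reduces Pswitching by a factor of {p_high / p_low:.1f}")

# Assumed K = 1 (arbitrary units), 100 ps input edge, VTH = 0.3 V.
print(short_circuit_power(1.0, 100e6, 100e-12, 1.0, 0.3))
print(short_circuit_power(1.0, 100e6, 100e-12, 0.3, 0.3))
```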
3.3.2 Static power dissipation
Figure 3.4: Leakage currents in a MOS transistor[26]. (Labeled currents:
subthreshold leakage current Isub (dominant), gate oxide tunneling leakage
and reverse bias current.)
Dynamic power dissipation is related to the transitions at the gate
terminals of the transistors. Static power consumption, in contrast, is
caused by the leakage currents Ilkg in the absence of any transitions at
the gate terminals. Ideally, CMOS digital circuits should not consume any
power in this mode. However, there are some leakage currents in the
transistors which consume a certain amount of power.
The main leakage current sources Ilkg in a transistor are the subthreshold
leakage current Isub, the gate oxide tunneling current, the gate-induced
drain leakage and the reverse bias current, as shown in Figure 3.4. The
leakage power can be determined by using the formula given in the equation
below:
$$P_{lkg} = I_{lkg} \cdot V_{DD} \tag{3.8}$$
The static power consumption is not a dominant issue when the CMOS
circuit operates in the superthreshold regime. However, it becomes the
dominant power contributor as VDD scales down. This happens because the
transistor’s threshold voltage is reduced in order to enhance the speed
performance, and lowering the threshold voltage has an adverse
effect on the static power consumption.
The subthreshold leakage current Isub is the most dominant among
all the leakage currents. Isub is also known as the off-current Ioff of
the transistor. Isub is the current flowing between the drain and source
terminals of a MOS transistor when the transistor operates in the cut-off
region. Subthreshold leakage power can account for up to 60% of the total
power consumption in 65nm technology[26].
The second most dominant leakage current is the gate oxide tunneling
current. As the technology scales down, the gate oxide becomes thinner.
This aggressive scaling of the oxide thickness gives rise to a high electric
field, which results in a high tunneling current through the transistor’s
gate insulator. The gate leakage current increases exponentially with
decreasing oxide thickness. For gate oxide thicknesses below 15-20 Å, the
gate tunneling current contributes the same amount of leakage current as the
subthreshold leakage current[27].
Chapter 4
ULV NP domino Inverters
This chapter describes how the ULV NP domino logic style can be utilized
in conjunction with floating gate transistors to realize high speed CMOS
inverters. The original ULV domino inverters were first presented in [12]
and are shown in Figure 4.1.
Figure 4.1: Different configurations of ULV domino inverters [12]:
(a) Original ULV, (b) N type ULV, (c) P type ULV.
The roles of the transistors used in the ULV
domino logic style are described below:
• Evaluation transistors labeled EP or EN. The evaluation transistors
are the most important transistors in the proposed logic style,
as they drive the output nodes.
• Recharge transistors labeled RP or RN. The recharge transistors
are used to recharge the semi floating gate terminals of the evaluation
transistors in the precharge phase.
The original ULV domino inverter shown in Figure 4.1a is
configured by applying the clock signals to power the inverter, i.e. the
source terminals of EP and EN are connected to the clock drivers φ and φ̄
respectively. During the precharge phase, φ and φ̄ switch from 1 to 0 and
0 to 1 respectively, and the output node is precharged to VDD/2. In the
evaluation phase, the output node is forced to 0 or 1 depending upon
whether a positive or a negative input transition occurs.
The ULV domino inverters shown in Figure 4.1b and 4.1c are also
configured by applying a clock signal to power the inverters, i.e. either
by connecting the source terminals of EN and EP to φ̄ and VDD respectively
(N type) or by connecting the source terminals of EP and EN to φ and
GND respectively (P type). During the precharge phase, the output node
is precharged to 1 and 0 for the N and P type gate respectively, resembling
the NP domino logic style. In the evaluation phase, the output node is
forced to 0 by a positive input transition for the N type, or to 1 by a
negative input transition for the P type ULV domino inverter.
The main differences between the original and the NP ULV domino gates
are:
1. Precharging. The output nodes are precharged to VDD/2 for the
original ULV domino gates, while the output nodes are precharged to
0 and 1 for the P and N type ULV domino gates respectively.
2. Input transitions. Original ULV domino gates can respond to both
rising and falling input transitions in the evaluation phase, whereas
the NP ULV domino gates respond only to rising input transitions
(N type) or only to falling input transitions (P type).
3. Current level. The current level for the NP ULV domino logic
style is considerably higher than for the original ULV domino logic style
due to the larger input transition, as the input transition for the
original ULV domino gate is |VDD/2| and the input transition for the NP
ULV domino gate is |VDD|.
4.1 N type ULV domino inverter
The ULV NP domino inverters presented in [12] can be modified by
removing one of the input capacitors from the gate terminals of the
evaluation transistors, i.e. EP for the N type and EN for the P type ULV
domino inverters.
The N type ULV domino inverter is shown in Figure 4.2. The clock
drivers φ and φ̄ are used as control signals for the recharge transistors
RP1 and RN1, and φ̄ is used as the power signal for EN1. The precharge and
evaluation phases of the N type ULV domino inverter are characterized by:
• Precharge phase. The precharge phase starts when φ switches
from 1 to 0. This turns on RP1 and recharges the gate of EN1 to VDD.
Meanwhile φ̄ switches from 0 to 1, which turns on RN1 and recharges
the gate of the pMOS transistor P1 to 0. Thus both EN1 and P1 turn on in
the precharge phase and drive the output node Vout to VDD. Figure
4.2a describes the precharge phase of the N type inverter. The gray
shaded lines indicate the components which are not active during the
precharge phase.
• Evaluation phase. The evaluation phase starts when the clock
signals φ and φ̄ switch from 0 to 1 and 1 to 0 respectively. Both
recharge transistors switch off, which leaves the charge on nodes VP
and VN temporarily floating, allowing an input transition to affect the
current running through the evaluation transistor EN1. The output
node Vout floats as well until an input transition occurs. The gray
shaded lines in Figure 4.2b indicate the components which are not
active during the evaluation phase.
Figure 4.2: N type ULV domino inverter: (a) Precharge phase,
(b) Evaluation phase.
The input signal Vin must be monotonically rising to ensure correct
operation of the N type ULV domino inverter[28]. This is only
satisfied if
• Input signal Vin is low at the beginning of the evaluation phase, and
• Vin only makes a positive transition from 0 to 1 in the
evaluation phase.
When a positive transition is applied at the input node Vin, it is coupled
through the input capacitance Cin onto the gate terminal of EN1, so the
voltage at the floating node VN changes. The voltage at VN can be estimated
by using the following equation:
$$V_N = V_{init} + V_{in} \cdot \left(\frac{C_{in}}{C_{in} + C_{parasitic}}\right) \tag{4.1}$$
We may assume that the initial voltage Vinit at the floating node VN is VDD,
as the recharge transistor RP1 has recharged the floating node to VDD in
the previous precharge phase. Vin rises to VDD as well due to the
positive input transition. Cin is the input capacitance at the gate terminal
of EN1 and Cpar (Cparasitic in the equation) is the parasitic capacitance of
EN1. Assuming that Cin and Cpar are equally sized, the potential at the
floating node VN becomes 1.5×VDD. This shows that by coupling the input
through a capacitor onto the floating gate terminal of the transistor, the
floating node can reach a voltage beyond the supply voltage VDD[9]. This
makes the evaluation transistor EN1 strongly biased, which increases the
current level of the transistor. Thus the PDN becomes much stronger than
the PUN and the output node Vout discharges to 0.
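The boosted gate voltage follows from equation (4.1) in a few steps. The worked example below assumes Vinit = Vin = VDD and equally sized capacitances, as in the text; the supply voltage of 300 mV is only an illustrative value.

```latex
\begin{aligned}
V_N &= V_{init} + V_{in} \cdot \frac{C_{in}}{C_{in} + C_{parasitic}} \\
    &= V_{DD} + V_{DD} \cdot \frac{C}{C + C}
       \qquad (V_{init} = V_{in} = V_{DD},\; C_{in} = C_{parasitic} = C) \\
    &= 1.5\,V_{DD} \\
    &= 450\,\mathrm{mV} \quad \text{for an assumed } V_{DD} = 300\,\mathrm{mV}
\end{aligned}
```

Since the capacitive ratio lies between 0 and 1, the same expression bounds the floating node voltage between VDD and 2×VDD for any choice of Cin and Cparasitic.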
Figure 4.3: Simulation results of N type ULV domino inverter (voltage (V)
versus time (ns)):
(a) Waveforms of clock signals for an ULV domino logic style.
(b) Voltage plot representing different nodes of N type domino inverter
(Vin, VN, Vout, VP).
(c) N type domino compared to Conventional CMOS Inverter (Voutc, Vout, Vin;
annotated delays 33ps and 1.678ns).
Simulation results for the N type ULV domino inverter
in Figure 4.2 are shown in Figure 4.3. The clock signals operate at
a frequency of 83.3MHz. To obtain more realistic waveforms and avoid
idealized results for the implemented circuit, the clock signals are
generated by inserting two symmetric conventional inverters between the
ideal voltage sources and the clock nodes. In the same way, the input
signal is generated by inserting a ULV domino inverter between the voltage
source and the input nodes.
The plots in Figure 4.3b show the simulated voltage at every node of the N
type domino inverter. Curve VN indicates the floating node at the gate of
EN1. The voltage at this node varies in the evaluation phase when a positive
input transition is applied. This makes the voltage at VN much higher than
VDD, as predicted by equation (4.1).
Figure 4.3c shows a comparison between the proposed N type ULV
domino inverter and a conventional CMOS inverter. Vout indicates the
output signal from the N type ULV domino inverter, while Voutc indicates
the output signal from the conventional inverter. As shown in Table 4.1, the
fall time TF of the N type domino inverter is almost 34 times shorter than
that of the conventional CMOS inverter in the evaluation phase, when the
output discharges from VDD to GND. The curves in Figure 4.3c also
demonstrate that the N type ULV domino inverter is about 50 times faster
than the conventional inverter, which is determined from the propagation
delay between the input and output signals.
Delay (ps)   φ     φ̄     Vin   Vout   Voutc
TR           846   844   81    X      X
TF           805   813   X     63     2136
TD           X     X     0     33     1678
Table 4.1: Simulation Results of different Delays of N type domino inverter.
Table 4.1 shows a summary of the delays of the most important
curves shown in Figure 4.3. The rise time TR of the input signal Vin is
about 10 times shorter than that of the clock signals. The delay of the
proposed N type ULV domino inverter is only about 2% of the delay of the
conventional CMOS inverter (33ps versus 1678ps). X marks the cases where
the falling or rising edge time is not of interest. As we are dealing with
an N type domino inverter, we are only interested in the rising edge of the
input signal and the falling edge of the output signal.
However, both rising and falling edges must be considered when operating
with sequential circuits, for example latches and flip-flops.
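The ratios quoted in the text can be reproduced from the entries of Table 4.1. The sketch below only transcribes the table values; the variable names are illustrative, and the clock rise time used is the value from the first clock column.

```python
# Delays in ps, transcribed from Table 4.1.
tf_vout_ps, tf_voutc_ps = 63, 2136      # fall times of Vout and Voutc
td_vout_ps, td_voutc_ps = 33, 1678      # propagation delays of Vout and Voutc
tr_clock_ps, tr_vin_ps = 846, 81        # rise times of the clock and of Vin

fall_speedup = tf_voutc_ps / tf_vout_ps
delay_speedup = td_voutc_ps / td_vout_ps
relative_delay_percent = 100 * td_vout_ps / td_voutc_ps
edge_ratio = tr_clock_ps / tr_vin_ps

print(f"fall time ratio:       {fall_speedup:.1f}")
print(f"propagation delay ratio: {delay_speedup:.1f}")
print(f"relative delay:        {relative_delay_percent:.2f} %")
print(f"clock/input rise time: {edge_ratio:.1f}")
```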
The performance of the proposed N type ULV domino inverter degrades
due to a negative transition at the floating node VP at the gate terminal of
transistor P1 in the evaluation phase. This is due to the parasitic
capacitance associated with RN1 as φ̄ switches from 1 to 0. Furthermore,
when a positive input transition occurs, the output node starts to pull
down towards 0, which may add a further negative transition at VP. This
makes P1 slightly stronger in the evaluation phase. Thus the contention
current is increased, as the PUN attempts to hold the precharged value
while the PDN attempts to discharge the output node Vout to GND. Both the
speed and the robustness of the N type ULV domino inverter degrade due to
the floating node at the gate terminal of P1.
The contention problem described in the previous paragraph can be
minimized or eliminated by modifying the NP domino inverter as shown
in Figure 4.4.
Figure 4.4: Different configurations of N type ULV domino inverter:
(a) Pseudo N type, (b) N type with keeper (static).
In Figure 4.4a, the gate terminal of P1 is connected to a fixed potential
(GND), which is not affected by the parasitic capacitance associated with
any recharge transistor. The circuit resembles pseudo nMOS logic.
However, as P1 is still switched on, the current IOFF running through P1
increases the contention current, since both the PDN and the PUN are on,
and adds to the total power consumption.
Another modified N type ULV domino inverter is shown in Figure 4.4b. The
pMOS keeper transistor KP is connected in a feedback configuration to the
gate terminal of P1 in order to increase the ratio between the ON-current
ION and the OFF-current IOFF by decreasing the IOFF running through P1. KP
is not active during the precharge phase, as the output node precharges
to 1. In the evaluation phase, KP does not turn on until the output node
changes from 1 to 0 following a positive transition at the input node Vin.
When KP turns on, the voltage at the floating node VP rises from 0 to 1
and P1 turns partially off. This reduces the IOFF running through P1. The
PUN becomes weaker, and the output node Vout fully discharges to GND. The
keeper transistor thus mitigates the poor noise margin of the proposed ULV
domino logic style by increasing ION/IOFF. The power consumption is
reduced as well, due to the reduction in IOFF.
Different configurations of the N type ULV domino inverter are simulated
and the results are shown in Figure 4.5. Vin represents the monotonically
rising input signal. Npseudo, NKeeper and NULV represent the outputs of the
N type ULV pseudo inverter, the N type ULV inverter with keeper transistor
and the N type ULV inverter, implemented in Figures 4.4a, 4.4b and 4.2
respectively. With respect to speed, the N type ULV pseudo inverter offers
the minimum delay of only 29.7ps, whereas the delay of the N type ULV
inverter with keeper transistor is somewhat increased due to the extra load
at the output node. With respect to robustness, the N type ULV inverter
with keeper transistor provides the best performance as expected, with a
deviation of only 2.8mV from the rail (GND) after the transition at the
output node. The N type ULV pseudo inverter and the N type ULV inverter
show deviations of 5.28mV and 12.99mV respectively.
Figure 4.5: Simulation results of different configurations of N type ULV
domino inverter. (a) Delay of different configurations. (b) Robustness of
different configurations. Axes: Time (ns) and Voltage (V).
Parameters           NULV    Npseudo   NKeeper
Delay (ps)           43.5    29.7      43.5
Power (nW)           38.9    20.81     10.5
Energy (aJ)          1.69    0.618     0.456
EDP (10^-29 Js)      7.36    1.83      1.98
Relative Delay (%)   1.96    1.76      2.6
Relative PDP (%)     37.5    13.7      10.1
Relative EDP (%)     0.97    0.24      0.26
Table 4.2: Performance of different configurations of N type ULV inverter
relative to conventional CMOS inverter at a supply voltage of 300mV.
Table 4.2 summarizes the performance of the different configurations of
the N type ULV domino inverter with respect to speed, power consumption,
PDP and EDP. The performance parameters are also compared with the
conventional CMOS inverter at a supply voltage of 300mV. The N type ULV
inverter with keeper transistor offers the best relative PDP of 10.1% and
the best robustness, while its relative EDP of 0.26% is close to the 0.24%
of the pseudo inverter.
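The energy and EDP rows of Table 4.2 can be reproduced from the delay and
power rows, taking the energy per transition as the power-delay product and
the EDP as that energy multiplied by the delay. The sketch below is
illustrative only; the function name pdp_edp and the dictionary layout are
assumptions, not part of the thesis.

```python
# Delay (ps) and power (nW) for each configuration, copied from Table 4.2.
configs = {
    "NULV":    {"delay_ps": 43.5, "power_nw": 38.9},
    "Npseudo": {"delay_ps": 29.7, "power_nw": 20.81},
    "NKeeper": {"delay_ps": 43.5, "power_nw": 10.5},
}


def pdp_edp(delay_ps, power_nw):
    """Return (energy in aJ, EDP in units of 1e-29 Js) for one gate."""
    delay_s = delay_ps * 1e-12
    power_w = power_nw * 1e-9
    energy_j = power_w * delay_s   # PDP: energy per transition
    edp_js = energy_j * delay_s    # EDP: energy weighted by delay
    return energy_j / 1e-18, edp_js / 1e-29


results = {name: pdp_edp(**values) for name, values in configs.items()}
for name, (energy_aj, edp) in results.items():
    print(f"{name:8s} energy = {energy_aj:.3f} aJ, EDP = {edp:.2f} x 1e-29 Js")
```

The computed values agree with the Energy and EDP rows of the table to
within rounding.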
4.2 P type ULV domino inverter
Figure 4.6: P type ULV domino inverter. (a) Precharge phase.
(b) Evaluation phase.
The circuit in Figure 4.6 is a P type ULV domino inverter, where the
input capacitance is only applied to the gate terminal of the evaluation
transistor EP1. The recharge/precharge phase and the evaluation phase of
the P type ULV domino inverter are characterized below:
• Precharge/Recharge phase. Figure 4.6a shows the precharge
phase, where the grey shaded lines indicate the components which
are not active during the precharge phase. When φ switches from 1
to 0, the circuit operates in the precharge/recharge phase. During
this phase, RP1 turns on and recharges the gate of transistor N1 to 1.
Meanwhile the inverted clock φ̄ switches from 0 to 1, which turns on
RN1 and recharges the gate of EP1 to 0. Thus both evaluation
transistors N1 and EP1 turn on and precharge the output node Vout to
GND.
• Evaluation phase. Figure 4.6b shows the evaluation phase of the P
type ULV domino inverter, where the grey shaded lines indicate the
components which are inactive in this phase. The clock signals φ and φ̄
switch from 0 to 1 and from 1 to 0 respectively. Both recharge
transistors RP1 and RN1 switch off, which leaves the voltages VN and VP
on the gate terminals of N1 and EP1 floating. The output node Vout
floats as well until an input transition occurs.
The input signal Vin must be monotonically falling to ensure the correct
operation for the P type ULV domino inverter. This can only be satisfied if
• input signal Vin is high at the beginning of the evaluation phase, and
• Vin only makes a single transition from 1 to 0 in the evaluation phase.
A negative transition at the input node Vin decreases the voltage at the
floating node VP on the gate terminal of EP1. EP1 thus becomes strongly
biased compared to N1 in the evaluation phase, and the output node Vout
charges to VDD.
Figure 4.7: Speed performance of P type ULV domino inverter compared
with conventional CMOS inverter. Axes: Time (ns) and Voltage (V); curves
Vin, Vout and Voutc; annotations 79ps, 1.647ns and 33mV.
Figure 4.7 shows the simulation results where the speed performance
of the P type ULV domino inverter is compared with the conventional
CMOS inverter. Vin represents the input signal, while Vout and Voutc
represent the output signals of the proposed P type ULV inverter and the
conventional inverter respectively. The P type ULV domino inverter is
about 20 times faster than the conventional inverter at a supply voltage
of 300mV, with a delay of only 4.7% relative to the standard CMOS
inverter. However, the robustness degrades, as the deviation from the rail
(VDD) is almost 33mV when the output node Vout is pulled up to VDD in the
evaluation phase.
Figure 4.8: Different configurations of P type ULV domino inverter.
(a) Pseudo P type. (b) P type with keeper (static).
The robustness problem can be reduced by modifying the P type ULV
domino inverter as shown in Figure 4.8. Figure 4.8a resembles the pseudo
logic style, where the floating gate of N1 is connected to a fixed
potential (VDD). This partially turns off N1, which decreases the
contention current and offers better robustness performance.
A new configuration of the P type inverter is shown in Figure 4.8b, where
a keeper transistor KN is connected in a feedback configuration at the
floating gate terminal of transistor N1. KN is inactive during the
precharge phase, as the output node Vout precharges to 0. In the
evaluation phase, KN does not turn on until Vout switches from 0 to 1
following a correct transition at the input node Vin. When KN turns on,
the voltage at the floating node VN falls from 1 to 0. This partially
turns off the evaluation transistor N1 and lets the output node swing
fully to VDD. The contention current is thereby reduced, which directly
improves the robustness and power consumption of the proposed P type ULV
domino inverter.
Figure 4.9: Robustness performance of different configurations of P type
ULV domino inverter. Axes: Time (ns) and Voltage (V); curves Ppseudo,
PKeeper and PULV; annotated deviations 34mV, 17mV and 12mV.
The graph in Figure 4.9 shows the robustness of the different
configurations of the P type ULV domino inverter in the evaluation
phase. PULV, Ppseudo and PKeeper represent the output signals of the P type
ULV domino inverter, the pseudo P type ULV domino inverter and the P type
ULV domino inverter with keeper respectively. As expected, PKeeper offers
the best robustness, with a deviation of only 12mV from the rail (VDD).
PULV and Ppseudo show deviations of 17mV and 34mV respectively.
4.3 A chain of ULV NP domino inverters
So far, the proposed ULV NP domino inverters are only simulated with
small capacitive loads at the input and output nodes. To obtain more
correct and realistic performance, the proposed logic style should be
simulated in a domino chain. The output node of an N type domino inverter
is connected at the input node of a P type domino inverter, as shown in
Figure 4.10. A chain of 8 NP domino inverters is implemented to observe
speed and robustness behavior of ULV NP domino inverters with a certain
load both at the input and the output of each inverter.
Figure 4.10: ULV NP domino chain with 8 inverters.
Figure 4.11: Simulation results of 8 ULV NP inverters in a domino chain.
(a) Clock drivers presenting precharge and evaluation phase. (b) Output
nodes of P type ULV domino Inverters. (c) Output nodes of N type ULV
domino Inverters. Axes: Time (ns) and Voltage (V).
The simulation results for a domino chain of 8 ULV NP inverters are
shown in Figure 4.11. The curves in Figure 4.11a represent the clock
signals φ and φ̄, which are used both as control and reference signals for
the ULV NP domino inverters. The graph in Figure 4.11b includes the input
signal Vin, which has a positive transition. The input arrives at the
floating node of an N type ULV domino inverter whose output precharges to
1 in the precharge phase. The output signal Vout1 discharges to 0 in the
evaluation phase and is connected to the input node of a P type ULV domino
inverter whose output precharges to 0 in the precharge phase. The remaining
curves in Figure 4.11b represent the outputs of the 2nd, 4th, 6th and 8th
(P type) domino inverters in the chain. In the same graph, the outputs
Voutc of a chain of 8 conventional CMOS inverters are compared with the
domino inverters.
The subgraph in Figure 4.11c represents the outputs of the 1st, 3rd, 5th
and 7th (N type) domino inverters in the chain, again compared with the
conventional inverters. The plots in Figure 4.11 demonstrate the working
principle of domino logic: all output nodes precharge in parallel, while
in the evaluation phase the transition at the input of the 1st N type
domino inverter triggers its output, which in turn triggers the next P
type inverter, and so on through the domino chain. Within a single clock
period, the input signal propagates through all 8 NP domino inverters in
the chain with a total propagation delay of only 711ps.
The propagation delay from the input signal to every output signal
in the chain of 8 ULV NP domino inverters is shown in Table 4.3. The
propagation delay between the input signal and the output of the 8th NP
domino inverter is only 711ps, giving an average propagation delay of
88.8ps per ULV NP domino inverter in the chain. The propagation delay
between the input signal Vin and the output Voutc1 of the 1st conventional
inverter is about 3.033ns. Comparing the first stages (64ps against
3.033ns), the NP domino inverter is almost 50 times faster than the
conventional CMOS inverter when operating in a chain of 8 inverters. As
seen in Figure 4.11, the later output signals of the conventional inverter
chain are obstructed due to the low supply voltage and the high clock
frequency of 83.3MHz.
From    To        Delay (ps)
Vin     Vout1     64
Vin     Vout2     178
Vin     Vout3     239
Vin     Vout4     361
Vin     Vout5     419
Vin     Vout6     548
Vin     Vout7     605
Vin     Vout8     711
Vin     Voutc1    3033
Table 4.3: Speed performance of ULV NP domino 8 inverters chain.
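The cumulative delays in Table 4.3 can be differenced to estimate the
delay contributed by each stage. This breakdown is not given in the text;
the sketch below derives it from the table under the assumption that the
delay of stage k is the difference between consecutive entries. The
variable names are illustrative.

```python
# Cumulative delays (ps) from Vin to Vout1..Vout8, copied from Table 4.3.
cumulative_ps = [64, 178, 239, 361, 419, 548, 605, 711]

# Delay added by each stage: difference between consecutive cumulative values.
stage_ps = [b - a for a, b in zip([0] + cumulative_ps[:-1], cumulative_ps)]

# Odd-numbered stages (1st, 3rd, ...) are N type, even-numbered are P type.
n_stages = stage_ps[0::2]
p_stages = stage_ps[1::2]

avg_all = cumulative_ps[-1] / len(cumulative_ps)
avg_n = sum(n_stages) / len(n_stages)
avg_p = sum(p_stages) / len(p_stages)

print("per-stage delays (ps):", stage_ps)
print(f"average: all = {avg_all:.1f} ps, N type = {avg_n:.1f} ps, P type = {avg_p:.1f} ps")
```

Under this assumption the N type stages contribute about 60ps each and the
P type stages about 118ps each, which averages to the 88.8ps per inverter
quoted above.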
Chapter 5
ULV NP domino Logic gates
In the previous chapter, ULV NP domino inverters were considered. They
depend upon a single input transition in the evaluation phase, which
results in an inverted transition at the output node. To implement larger
and more complex digital systems such as microprocessors and ALUs, we need
gates with multiple inputs. Logic gates like NAND and NOR are the main
building blocks of these complex digital systems.
5.1 ULV NP domino NAND Gates
Figure 5.1: NP ULV domino NAND gate. (a) N type. (b) P type.
Figure 5.1 shows two different implementations of ULV domino NAND
gates. Both implementations use NP domino logic in conjunction with
floating gates on the evaluation transistors. The ULV domino NAND gates
in Figures 5.1a and 5.1b precharge the output node to 1 and 0 respectively
during the precharge phase. The precharge phase operates in the same way
as described in the previous chapter. In the evaluation phase, the main
difference is that two input signals A and B are applied at the floating
nodes of the gate terminals of the evaluation transistors.
In Figure 5.1a, both evaluation transistors EN1 and EN2 are serially
connected, resembling the PDN for the conventional NAND gate. The
output node is precharged to 1 in the precharge phase. In the evaluation
phase, depending upon the transitions at the input nodes A and B, the
output node is discharged to 0. Both evaluation transistors should turn
on in the evaluation phase in order to discharge the output node to GND,
otherwise there will be no direct path from the output node to GND and
the output node will remain high and floating. The only case for the output
node of a NAND gate to become 0 is when both of the inputs become high.
Figure 5.1b represents a P type ULV domino NAND gate which is
implemented using two parallel pMOS evaluation transistors EP1 and EP2
in the PUN which resembles the PUN of the conventional NAND gate.
Transistors EP1, EP2 and N1 precharge the output node to 0 during the
precharge phase. In the evaluation phase, the output node is pulled up
to VDD when one of the two evaluation transistors EP1 or EP2 turns on. The
only case for the output node to hold the precharged value is when both EP1
and EP2 remain switched off in the evaluation phase.
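The series and parallel arrangements described above can be summarized in
a purely logical model. The sketch below ignores all electrical effects
(floating gates, keeper, contention) and looks only at the final input
values of the evaluation phase; the function names are hypothetical. It
illustrates that both the N type gate (precharged to 1, series pull-down)
and the P type gate (precharged to 0, parallel pull-up) realize the NAND
function.

```python
from itertools import product


def n_type_nand(a, b):
    """N type gate: output precharged to 1, series nMOS pull-down network."""
    out = 1                  # precharge phase
    if a == 1 and b == 1:    # both series transistors EN1 and EN2 conduct
        out = 0              # evaluation phase: output discharges to GND
    return out


def p_type_nand(a, b):
    """P type gate: output precharged to 0, parallel pMOS pull-up network."""
    out = 0                  # precharge phase
    if a == 0 or b == 0:     # at least one of EP1, EP2 conducts
        out = 1              # evaluation phase: output is pulled up to VDD
    return out


# Final input values after the evaluation phase, all four combinations.
table = [(a, b, n_type_nand(a, b), p_type_nand(a, b))
         for a, b in product((0, 1), repeat=2)]
for a, b, n_out, p_out in table:
    print(f"A={a} B={b}  N type={n_out}  P type={p_out}")
```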
Figure 5.2: Simulation results of ULV NP domino NAND gates. (a) N type:
N type ULV domino NAND gate vs conventional NAND gate; annotated delays
80.65ps and 3.5ns. (b) P type: P type ULV domino NAND gate vs conventional
NAND gate; annotated delays 134ps and 1.78ns. Axes: Time (ns) and
Voltage (V).
The subplot in Figure 5.2a shows the simulation results of the N type
ULV domino NAND gate, compared against the conventional NAND gate with
respect to speed. A and B are the input signals, with a positive
transition from 0 to 1 in the evaluation phase. NULV is the output of the
proposed N type domino NAND gate, while NC is the output of the
conventional NAND gate. The output is high when both inputs are low, and
it remains high when A has a positive transition because there is still no
direct path from the output node to GND. The output node NULV gets a
negative transition when B changes from 0 to 1 after A has already made
its positive transition. The propagation delay between the input signal
and the output NULV is only 81ps, which is almost 43 times shorter than the
3.5ns propagation delay of the conventional NAND gate. The conventional
NAND gate is barely able to operate at such a high frequency of 83MHz, as
the output node NC needs more time to change its state from 1 to 0.
The simulation results in Figure 5.2b compare the speed of the P type
ULV domino NAND gate and the conventional NAND gate. A and B are the
input signals. PULV and PC are the output nodes of the proposed P type
domino NAND gate and the conventional NAND gate respectively. When one of
the two inputs gets a negative transition in the evaluation phase, one of
the two parallel connected evaluation transistors EP1 or EP2 turns on and
the output node PULV is pulled up to VDD. The propagation delay of the
proposed P type ULV domino NAND gate is only 7.5% of that of the
conventional NAND gate.
5.2 ULV NP domino NOR gates
In this section, NOR gates are proposed using the ULV NP domino logic
style in conjunction with floating gate transistors. In general, a NOR
gate gives a high output only when both of the input signals are low.
Figure 5.3: NP ULV domino NOR gate. (a) N type. (b) P type.
Two implementations of ULV NP domino NOR gates are proposed in
Figure 5.3. The output node of the N type ULV domino NOR gate in Figure
5.3a precharges to 1 in the precharge phase and discharges to 0 if one of the
two input signals A or B switches from 0 to 1 in the evaluation phase.
The output node of the P type ULV domino NOR gate implemented in
Figure 5.3b precharges to 0 in the precharge phase. The reference signal
φ propagates through both serially connected pMOS evaluation transistors
EP1 and EP2 when both transistors switch on in the evaluation phase and
pulls up the output node to VDD. This is only possible when negative
transitions arrive at both input nodes A and B in the evaluation phase.
Figure 5.4: Simulation results of ULV NP domino NOR gates. (a) N type;
annotated delay 94ps. (b) P type; annotated delay 114ps. Axes: Time (ns)
and Voltage (V).
The graphs in Figure 5.4 show the simulation results of the NP ULV
domino NOR gates with respect to speed. The simulation results in Figure
5.4a correspond to the N type ULV domino NOR gate. The output node NULV is
precharged to 1 in the precharge phase, and discharges to 0 when one of
the two inputs A or B becomes high in the evaluation phase. The
propagation delay between the input and the output node is about 94ps.
The graphs in Figure 5.4b correspond to the simulation results for the
P type ULV domino NOR gate. The output node PULV precharges to 0 in
the precharge phase, and is pulled up to VDD when both of the input
signals A and B become 0. The propagation delay from the input transition
to the output response is only 114ps. The output node of the conventional
NOR gate was unable to operate at such a high frequency with a supply
voltage of 300mV.
5.3 ULV NP domino NAND/NOR gate using Pass
Transistor Logic
Various configurations of digital logic circuits are implemented in [29]
exploiting Pass Transistor Logic (PTL) in conjunction with floating gate
transistors. PTL aims to obtain the same logic function using fewer
transistors than the conventional logic style, which helps to reduce the
overall delay and save area on the chip. In this section, ULV NP domino
NAND and NOR gates are proposed exploiting this logic style.
The circuit in Figure 5.5a operates as an N type ULV domino NAND gate.
The evaluation transistor EN1 works as a pass transistor for the inverted
input signal B̄, which is applied to its source terminal, while the input A
is applied to the floating gate. EN1 is strongly biased in the evaluation
phase when a positive transition arrives at the input node A, which
increases its current level. In the evaluation phase, if both input
signals A and B are 0, the output remains at the precharged value. If A is
high and B is low, EN1 turns on and transmits the high B̄ from source to
drain, which does not change the output node. If A is low and B is high,
the output node holds the precharged value as EN1 is switched off. The
only case in which the output node is pulled down to GND is when A
switches from 0 to 1 while B̄ switches from 1 to 0 in the evaluation phase.
Thus the implemented N type ULV domino gate operates as a NAND gate using
only a single evaluation transistor EN1.
Figure 5.5: ULV NP domino logic Gates using PTL. (a) N type NAND gate.
(b) P type NOR gate.
A ULV P type domino NOR gate is implemented in Figure 5.5b by
exploiting PTL. The output node is precharged to 0 in the precharge phase.
The evaluation transistor EP1 works as a pass transistor, with the input
signal A applied at the floating gate of EP1 in the evaluation phase. The
other input is applied in inverted form, B̄, to the source terminal of EP1.
The only possibility for the output node to become high is when A switches
from 1 to 0 and B̄ switches from 0 to 1 in the evaluation phase. Thus the
proposed P type ULV domino gate behaves as a NOR gate for the input
signals A and B.
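The single-transistor behaviour described for Figures 5.5a and 5.5b can be
checked with a small truth-table model. This is a purely logical sketch,
not a circuit simulation: it assumes that the source terminal is driven by
the inverted input B̄ (as labelled in Figure 5.5), that the pass transistor
is either fully on or fully off, and it considers only the final input
values of the evaluation phase. The function names are hypothetical.

```python
from itertools import product


def ptl_n_type(a, b):
    """N type PTL gate: output precharged to 1, nMOS pass transistor EN1."""
    b_bar = 1 - b                # inverted input applied to the source of EN1
    out = 1                      # precharge phase
    if a == 1 and b_bar == 0:    # EN1 conducts and passes a low level
        out = 0
    return out


def ptl_p_type(a, b):
    """P type PTL gate: output precharged to 0, pMOS pass transistor EP1."""
    b_bar = 1 - b                # inverted input applied to the source of EP1
    out = 0                      # precharge phase
    if a == 0 and b_bar == 1:    # EP1 conducts and passes a high level
        out = 1
    return out


table = [(a, b, ptl_n_type(a, b), ptl_p_type(a, b))
         for a, b in product((0, 1), repeat=2)]
for a, b, nand_out, nor_out in table:
    print(f"A={a} B={b}  N type (NAND)={nand_out}  P type (NOR)={nor_out}")
```

In this model the N type gate reproduces the NAND truth table and the P
type gate the NOR truth table, each with a single evaluation transistor.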
The graphs in Figure 5.6 show the speed performance of the proposed ULV
NP domino NAND and NOR gates exploiting PTL. The subplot in Figure 5.6a
corresponds to the proposed ULV N type domino NAND gate with PTL. The only
possibility for the output node NULV to become 0 is when both of the input
signals A and B receive positive transitions. The propagation delay of the
proposed NAND gate is only 59ps, a significant improvement in speed, as
the overall capacitance of the logic gate is reduced by reducing the total
number of transistors.
The simulation results in Figure 5.6b correspond to the P type ULV domino
NOR gate using PTL. The output node PULV precharges to GND in the
precharge phase and is pulled up to VDD only when both of the input nodes
A and B receive negative transitions in the evaluation phase. The
propagation delay of the proposed NOR gate is 241ps, offering high speed
performance.
Figure 5.6: Simulation results of ULV NP domino logic gates using PTL.
(a) N type NAND gate; annotated delay 59ps. (b) P type NOR gate; annotated
delay 241ps. Axes: Time (ns) and Voltage (V).
Chapter 6
ULV NP domino Carry gates
for high speed Full Adders
The full adder plays an important role in many arithmetic operations such
as addition, subtraction, multiplication and division. Addition is the
most fundamental arithmetic operation in any kind of processor, and a
building block for all other operations. It is used extensively in ALUs,
FPUs and ASICs, where high processing speed is critical. Due to the high
speed requirement, the full adder should be able to maintain high speed as
the supply voltage scales down.
Figure 6.1: Four Bits Full Adder. Full adders FA0 to FA3 with inputs
A[0..3] and B[0..3] and sum outputs S[0..3]; the critical path runs from
Cin through C1, C2 and C3 to Cout.
A 4-bit full adder is implemented in Figure 6.1 [28]. Four full adders
are cascaded in a chain. A full adder is in the propagation mode when
the input signals A ≠ B, which makes Cout = Cin. The overall worst case
delay is obtained when all the full adders are in the propagation mode and
the carry signal propagates from the first to the last full adder in the
chain. The carry propagation path is thus the most critical path when an
addition of more than two bits is desired, which makes it a speed limiting
factor for many high speed applications.
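The worst case described above can be illustrated with a bit-level model
of a ripple carry chain. The sketch uses the standard generate/propagate
formulation of the full-adder carry, which is not spelled out in this
section; the function name and the 4-bit operands are chosen only for
illustration.

```python
def ripple_carries(a_bits, b_bits, cin):
    """Return the carry out of every full adder, least significant bit first."""
    carries = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        generate = a & b     # the stage produces a carry on its own
        propagate = a ^ b    # the stage forwards the incoming carry (A != B)
        carry = generate | (propagate & carry)
        carries.append(carry)
    return carries


# Every stage has A != B, so all four full adders are in propagation mode.
a_bits = [1, 0, 1, 0]
b_bits = [0, 1, 0, 1]

carries_high = ripple_carries(a_bits, b_bits, cin=1)
carries_low = ripple_carries(a_bits, b_bits, cin=0)
print("Cin = 1:", carries_high)
print("Cin = 0:", carries_low)
```

In propagation mode every carry output simply follows Cin, so a change at
Cin has to travel through all four stages before Cout is valid.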
Different ULV NP domino logical gates implemented in the previous
chapters operate at ultra low supply voltages while offering excellent speed
performance. In this chapter, ULV NP domino carry gates are implemented
exploiting the same logic style which enhance the speed performance when
the carry signal propagates through all the full adders in the worst case
scenario[30].
The logic function for the carry gate Cout is shown in the following
equation:
Cout = A · B + Cin · (A + B)    (6.1)
Equation 6.1 states that Cout provides an AND function for the input bits
A and B as long as Cin is low. When Cin becomes high, Cout provides an OR
function for the input bits A and B.
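The two cases of Equation 6.1 can be verified exhaustively. The short
sketch below evaluates the carry function for all input combinations and
checks that it reduces to AND for Cin = 0 and to OR for Cin = 1; the
function name carry_out is an illustrative choice.

```python
from itertools import product


def carry_out(a, b, cin):
    """Equation 6.1: Cout = A.B + Cin.(A + B), on single bits."""
    return (a & b) | (cin & (a | b))


pairs = list(product((0, 1), repeat=2))

# Cin low: the carry gate behaves as an AND gate for A and B.
and_when_cin_low = all(carry_out(a, b, 0) == (a & b) for a, b in pairs)

# Cin high: the carry gate behaves as an OR gate for A and B.
or_when_cin_high = all(carry_out(a, b, 1) == (a | b) for a, b in pairs)

print("AND for Cin = 0:", and_when_cin_low)
print("OR for Cin = 1:", or_when_cin_high)
```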
6.1 Ultra-Low-Voltage and High Speed NP domino
Carry circuit
One common way to implement the Carry gate is by combining NAND
and NOR gates. In addition, a control signal Cin is included which
determines whether the output node Cout provides a NAND or a NOR
function for the input bits A and B.
Figure 6.2: N type ULV domino Carry Gate (Carry 1a).
An N type ULV domino Carry gate is implemented in Figure 6.2. The
PUN resembles the PUN of all other N type ULV domino logic gates which
are implemented so far. In the PDN, the PDNs of ULV N type NAND and
NOR gates proposed in the previous chapter are connected in parallel. An
extra input Cin is serially connected with the PDN of ULV NOR gate. The
PDN of the proposed N type ULV domino Carry gate resembles PDN of
conventional CMOS Carry gate implemented in [17] to some extent.
The output node Cout precharges to VDD in the precharge phase. In
the evaluation phase, Cout discharges to GND depending upon the input
transitions at the floating gates of the evaluation transistors. If A and
B switch from 0 to 1 prior to the Cin signal, both serially connected
evaluation transistors EN1 and EN2 turn on and form a direct path from
Cout to GND. This resembles the working behavior of a NAND gate.
When Cin switches from 0 to 1 in the evaluation phase, the evaluation
transistor EN3 turns on. Only one of the two inputs A or B then needs a
positive transition to form a path from Cout to GND. This resembles the
working behavior of a NOR gate for the input bits A and B. Since the gate
output is inverted, the proposed ULV P type domino inverter should be
connected at the output node to obtain the non-inverted Cout signal.
The graphs in Figure 6.3 show the simulation results for the proposed
N type ULV domino Carry gate. In Figure 6.3a, the Cin bit is low, which
lets the circuit operate as a half adder. The output node Cout operates as
a NAND gate for the input bits A and B. The propagation delay is 118ps
between the input and the output signal. The simulated case is not a worst
case, as Cin is low and the full adder is therefore not operating in the
propagation mode.
Figure 6.3: Simulation results for N type ULV domino Carry gate.
(a) Cin = 0; annotated delay 118ps. (b) Cin = 1; annotated delay 54ps.
Axes: Time (ns) and Voltage (V).
Figure 6.3b shows the simulation results for the proposed N type
ULV domino Carry gate when the full adder operates in the propagation
mode. The input signals satisfy A ≠ B and Cin is high, which leads to the
worst case scenario for the full adder. As long as Cin is high, only one
of the two inputs A or B needs to be high to form a path from the output
node Cout to GND. Thus the Carry gate provides a NOR function for the
input bits A and B. The simulation results show that the propagation delay
is only 54ps between the input and output signals.
The circuit in Figure 6.4 is a ULV domino P type Carry gate, where the
output node Cout precharges to 0 in the precharge phase. The proposed P
type Carry gate is designed by connecting in parallel the PUNs of the P
type ULV domino NAND and NOR gates implemented in the previous chapter.
The evaluation transistor EP3 determines whether the Carry gate performs
as a NAND or a NOR gate for the input signals A and B. The PUN of the
proposed P type ULV domino Carry gate resembles the PUN of the
conventional Carry gate implemented in [17]. The working principle of the
proposed P type domino Carry gate is identical to that of the N type ULV
domino Carry gate described in the paragraphs above. However, the
precharged level is opposite, requiring monotonically falling transitions
at the floating gate terminals of the pMOS evaluation transistors.
Figure 6.4: P type ULV domino Carry Gate (Carry 1b).
The simulation results plotted in Figure 6.5 show the speed performance
of the proposed P type ULV domino Carry gate. In both subgraphs, the
output node Cout precharges to 0 in the precharge phase and is pulled
towards 1 in the evaluation phase depending upon the transitions at the
input nodes. The subgraph in Figure 6.5a represents the best case scenario
for the Carry gate, as Cout is pulled to 1 by the input signals A and B,
independent of the carry input signal Cin. The simulated propagation delay
is 225ps between the input and the output node.
The subgraph in Figure 6.5b shows the simulation results for the proposed
P type ULV domino Carry gate in the worst case scenario. A negative
transition arrives at the input node A prior to the Cin bit, while a
negative transition arrives at the Cin node prior to the other input node
B. The simulated worst case propagation delay is only 153ps.
Figure 6.5: Simulation results for P type ULV domino Carry gate.
(a) Cin = 0; annotated delay 225ps. (b) Cin = 1; annotated delay 153ps.
Axes: Time (ns) and Voltage (V).
6.2 ULV NP domino Carry gates utilizing Pass
Transistor Logic
The ULV NP domino Carry gates proposed in the previous section resemble
the conventional CMOS Carry gate [17] to a certain extent, with serially
connected evaluation transistors in the PDN and PUN of the N and P type
ULV domino Carry gates respectively. This may degrade the speed of the
circuit and occupies more area on the chip. These drawbacks can be
overcome by exploiting pass transistor logic (PTL). ULV domino NAND and
NOR gates using PTL were implemented in the previous chapter.
ULV NP domino Carry gates exploiting PTL are proposed in Figure 6.6,
where all the evaluation transistors labeled E operate as pass transistors
with an increased current level by using floating capacitance at the gate
terminals.
Figure 6.6a shows the N type ULV domino Carry gate with PTL, where the
output node Cout precharges to 1 in the precharge phase. As long as Cin is
low, Cout only switches from 1 to 0 when the input signal A switches from
0 to 1 and the inverted input signal B̄ switches from 1 to 0 in the
evaluation phase.
When the Cin bit becomes high, only one of the two parallel connected
evaluation pass transistors EN2 or EN3 needs to switch on to discharge
the output node Cout to GND. EN2 or EN3 then acts as a pass transistor for
the inverted carry input C̄in, which switches from 1 to 0, when one of the
two inputs A or B switches from 0 to 1. In the worst case scenario, the
carry input only has to pass through a single evaluation transistor EN2 or
EN3 to reach the output node.
Figure 6.6: ULV domino Carry Gates using PTL (Carry 2). (a) N type.
(b) P type.
The graphs plotted in Figure 6.7 show the speed performance of the
proposed ULV NP domino Carry gates implemented in Figure 6.6. The worst
case scenario is assumed for both circuits, where the input signal A
arrives prior to the carry input signal Cin. The subgraphs in Figures 6.7a
and 6.7b correspond to the N and P type ULV domino Carry gates
respectively. The propagation delay of the P type Carry gate is 261ps,
which is higher than the propagation delay of the N type Carry gate (29ps)
due to the mobility difference between pMOS and nMOS transistors. The
strength of the pMOS evaluation transistors can be increased by lowering
the threshold voltage of the transistor by the various means mentioned in
Chapter 3.
Figure 6.7: Simulation results for the worst case scenario of ULV NP
domino Carry gates implemented in Figure 6.6. (a) N type Carry gate (delay 29ps).
(b) P type Carry gate (delay 261ps). Both panels plot Voltage (V) against
Time (ns) for the signals A, B, C_in and out.
An alternative ULV NP domino Carry gate using PTL is implemented in
Figure 6.8. It resembles the PTL solution implemented in Figure 6.6 in that
both proposed ULV NP domino Carry gates operate in exactly the same way
as long as the carry input signal Cin is low.
Consider the N type ULV domino Carry gate with PTL implemented in
Figure 6.8a. When the carry input bit Cin switches from 0 to 1 in the
evaluation phase, both parallel connected evaluation transistors EN2 and
EN3 turn on and act as pass transistors for the input bits A and B. In this
case, only one of the two inputs A or B needs to switch from 1 to 0 to pull
the output node Cout to 0.
The working principle of the proposed P type ULV domino Carry gate
with PTL implemented in Figure 6.8b is identical to that of the N type ULV
domino Carry gate described in the two paragraphs above. However, the
precharged level is opposite, requiring opposite transitions at the floating
gate and source terminals of the pMOS evaluation transistors labeled EP.
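The difference between the two PTL variants can be illustrated at the logic level. The sketch below is only a Boolean abstraction, not a circuit model: it ignores precharge, signal polarity and the floating gates, and the helper names (`pass_transistor`, `carry2_style`, `carry3_style`) are hypothetical. It assumes that in Carry 2 the carry input is the passed signal gated by A or B, while in Carry 3 the inputs A and B are the passed signals gated by the carry input, and checks that both forms reduce to the carry (majority) function.

```python
from itertools import product

ALL_INPUTS = list(product((0, 1), repeat=3))


def majority(a, b, cin):
    """Reference carry-out of a full adder: 1 when at least two inputs are 1."""
    return int(a + b + cin >= 2)


def pass_transistor(gate, data):
    """Idealised pass transistor: forwards data when the gate is 1, else contributes nothing."""
    return data if gate else 0


def carry2_style(a, b, cin):
    """Carry 2 abstraction: the carry input is passed, A and B gate the two parallel paths."""
    generate = a & b  # path that is active even when the carry input is 0
    return generate | pass_transistor(a, cin) | pass_transistor(b, cin)


def carry3_style(a, b, cin):
    """Carry 3 abstraction: A and B are passed, the carry input gates both parallel paths."""
    generate = a & b
    return generate | pass_transistor(cin, a) | pass_transistor(cin, b)


for a, b, cin in ALL_INPUTS:
    assert carry2_style(a, b, cin) == majority(a, b, cin)
    assert carry3_style(a, b, cin) == majority(a, b, cin)
print("both PTL abstractions match the majority function")
```

At this level of abstraction the two gates are logically equivalent; the differences reported in this chapter (delay, contention at the output node, power) come from the circuit behaviour that the sketch leaves out.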
Figure 6.8: ULV domino Carry Gates using PTL (Carry 3). (a) N type. (b) P type.
The graphs plotted in Figure 6.9 demonstrate the speed performance of
the proposed ULV NP domino Carry gates implemented in Figure 6.8. The
worst case scenario is considered for both the N and P type ULV domino carry
gates, where the input bits A and Ā arrive prior to the carry input bit Cin, while
no transition arrives at the other input bits B and B̄. The subgraphs in Figure
6.9a and 6.9b correspond to the N and P type ULV domino Carry gates
respectively. The propagation delay of the P type Carry gate is 411ps, while the
propagation delay of the proposed N type ULV domino Carry gate is only
99.4ps.
Figure 6.9: Simulation results for the worst case scenario of ULV NP
domino Carry gates implemented in Figure 6.8. (a) N type Carry gate (delay 99.4ps).
(b) P type Carry gate (delay 411.4ps). Both panels plot Voltage (V) against
Time (ns) for the signals A, B, C_in and out.
6.3 NP domino Carry gates Performance
In this section, different performance parameters of the proposed ULV NP
domino Carry gates implemented in this chapter are compared with the
conventional CMOS carry gate[17] at different ultra low supply voltages.
Table 6.1 lists the speed performance together with the power
consumption and other figures of merit (PDP and EDP), in order to locate
the Minimum Energy Point (MEP) of the proposed ULV domino Carry gates
and compare it with the conventional CMOS carry gate. The table also gives
the delay, PDP and EDP relative to the conventional Carry gate in %.
The power consumed by the clock drivers is not included and must
be taken into consideration for each specific application. In addition,
the table presents the operating limits for the clock frequency, which
change rapidly as the supply voltage varies. In Table 6.1, the styles labeled
N Carry and P Carry represent the proposed N and P type domino Carry
gates respectively. Avg denotes the average delay or power consumption
of the corresponding N and P type ULV domino Carry gates.
Style               Comment               100mV     150mV     200mV     250mV     300mV     350mV     400mV
CLK                 fclk (MHz)            0.83      2.5       8.3       16.67     66.67     83.3      125
Conventional Carry  Delay (ns)            328       101       25.4      10.4      2.56      1.55      0.782
                    Power (nW)            0.008145  0.055     0.34      1.12      6.83      10.35     23
                    PDP (10^-18 J)        2.672     5.55      8.64      11.65     17.48     16.04     17.97
                    EDP (10^-27 Js)       876.4     560.5     219.5     121       44.75     24.9      14.05
N Carry 1           Delay (ns)            53        18.74     2.49      0.318     0.162     0.05      0.022
P Carry 1           Delay (ns)            194       36.83     2.75      0.38      0.19      0.109     0.1086
Carry 1             Avg.Delay (ns)        123.5     27.785    2.62      0.349     0.176     0.0795    0.0653
                    Relative delay (%)    37.65     27.52     10.31     3.36      6.88      5.11      8.36
                    Avg.Power (nW)        0.0358    0.3078    1.907     7.55      41.1      100.8     265
                    Avg.PDP (10^-18 J)    4.423     8.552     4.996     2.635     7.234     8.014     17.305
                    Relative PDP (%)      165       154       57.8      22.6      41.4      50        96.3
                    Avg.EDP (10^-27 Js)   546.4     237.6     13.1      0.919     1.273     0.637     1.13
                    Relative EDP (%)      62.35     42.4      5.97      0.76      2.84      2.56      8.04
N Carry 2           Delay (ns)            141.5     25.38     3.07      0.4127    0.183     0.0725    0.04516
P Carry 2           Delay (ns)            92.06     11.72     1.26      0.2976    0.166     0.1375    0.333
Carry 2             Avg.Delay (ns)        116.8     18.5      2.165     0.35      0.1745    0.105     0.189
                    Relative delay (%)    35.61     18.32     8.52      3.36      6.82      6.77      24.18
                    Avg.Power (nW)        0.01867   0.209     1.461     5.21      28.45     56.9      137
                    Avg.PDP (10^-18 J)    2.181     3.86      3.16      1.823     4.96      5.97      25.9
                    Relative PDP (%)      81.6      69.55     36.57     15.65     28.37     37.22     144.13
                    Avg.EDP (10^-27 Js)   254.7     71.41     6.84      0.638     0.865     0.627     4.897
                    Relative EDP (%)      29.06     12.74     3.116     0.527     1.93      2.518     34.85
N Carry 3           Delay (ns)            141.5     22.5      3.35      0.475     0.22      0.092     0.0715
P Carry 3           Delay (ns)            265.7     31.1      2.75      0.504     0.248     0.182     0.883
Carry 3             Avg.Delay (ns)        203.6     26.8      3.05      0.489     0.234     0.137     0.477
                    Relative delay (%)    62.07     26.53     12        4.7       9.14      8.84      61
                    Avg.Power (nW)        0.01948   0.245     1.572     5.47      30.75     71.15     176.5
                    Avg.PDP (10^-18 J)    3.96      6.58      4.79      2.68      7.19      9.747     84.235
                    Relative PDP (%)      148.4     117.35    55.1      23.01     41.15     60.57     468
                    Avg.EDP (10^-27 Js)   807.5     176.4     14.63     1.312     1.684     1.335     40.2
                    Relative EDP (%)      92.1      31.14     6.62      1.08      3.76      5.34      286
Table 6.1: Performance of ULV domino Carry gates compared to conventional
CMOS Carry gate at different supply voltages.
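The derived rows of Table 6.1 follow from the delay and power rows by simple arithmetic. As a minimal sketch, the snippet below recomputes the 250mV column for the conventional gate and for Carry 2, assuming that PDP is delay times power, that EDP is PDP times delay, and that each relative figure is the ratio to the conventional gate in percent. The function names are illustrative only.

```python
def pdp_aj(delay_ns, power_nw):
    """Power delay product in units of 1e-18 J (ns * nW = 1e-18 J)."""
    return delay_ns * power_nw


def edp_1e27_js(delay_ns, power_nw):
    """Energy delay product in units of 1e-27 Js (PDP times delay in ns)."""
    return pdp_aj(delay_ns, power_nw) * delay_ns


def relative_percent(value, reference):
    """Value expressed as a percentage of the reference."""
    return 100.0 * value / reference


# Values copied from Table 6.1 at VDD = 250mV.
conv_delay, conv_power = 10.4, 1.12   # conventional CMOS carry gate
c2_delay, c2_power = 0.35, 5.21       # Carry 2, average of N and P type

conv_pdp = pdp_aj(conv_delay, conv_power)
c2_pdp = pdp_aj(c2_delay, c2_power)
conv_edp = edp_1e27_js(conv_delay, conv_power)
c2_edp = edp_1e27_js(c2_delay, c2_power)

rel_delay = relative_percent(c2_delay, conv_delay)
rel_pdp = relative_percent(c2_pdp, conv_pdp)
rel_edp = relative_percent(c2_edp, conv_edp)

print(f"conventional: PDP = {conv_pdp:.2f}e-18 J, EDP = {conv_edp:.1f}e-27 Js")
print(f"Carry 2:      PDP = {c2_pdp:.3f}e-18 J, EDP = {c2_edp:.3f}e-27 Js")
print(f"relative delay {rel_delay:.2f}%, PDP {rel_pdp:.2f}%, EDP {rel_edp:.3f}%")
```

Within rounding, the results agree with the tabulated 11.65, 1.823, 3.36%, 15.65% and 0.527%.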
The average propagation delay of the N and P type Carry gates for
the proposed ULV domino logic style is shown in Figure 6.10. As seen
from the subgraph in Figure 6.10a, the propagation delay is in the range of
ns for supply voltages under 225mV and decreases rapidly as the supply
voltage increases. Above 300mV, the propagation delay drops to the range
of tens to hundreds of ps. The logarithmic scale for the propagation delay in
Figure 6.10b shows that the Carry 1 and Carry 2 ULV domino gates have almost
equal propagation delay when the supply voltage varies between 220mV and
320mV. Below 220mV, Carry 2 provides the minimum propagation delay. On
the other hand, Carry 1 gives the minimum propagation delay when the supply
voltage exceeds 320mV.
Carry 3 is the slowest and is less preferable in high speed applications due
to its low noise margin, as both parallel connected evaluation transistors EN2
and EN3 turn on in the worst case scenario while only one of the two inputs
A or B switches from 1 to 0. Thus A and B contend simultaneously
at the output node, which makes the output transition slower and degrades
the robustness. However, it offers a more efficient solution in
terms of area and power consumption than the Carry 1 ULV domino
gate.
Figure 6.10: Average Delay for the proposed ULV NP domino Carry gates
for different supply voltages. (a) Average Delay in ns. (b) Average Delay with
logarithmic scale. Both panels plot the average delay of Carry1, Carry2 and
Carry3 against VDD (mV).
Figure 6.11: Delay of proposed ULV domino carry gates relative to
conventional CMOS carry gate for different supply voltages. The plot shows the
relative delay (%) of Carry1, Carry2 and Carry3 against VDD (mV).
The average delay of the N and P type Carry gates for the proposed
ULV domino logic style is compared with the conventional CMOS Carry
gate in Figure 6.11. The delay of all the proposed ULV domino Carry gates
is less than 20% of the conventional Carry gate delay when the supply
voltage varies between 175mV and 375mV. The overall best relative delay
is achieved by the Carry 2 ULV domino Carry gate with PTL. This
is because the input carry bit Cin only needs to propagate through a
single evaluation transistor to reach the output node. Compared to the
conventional Carry gate, the lowest average relative delay is achieved at a
supply voltage of 275mV, where the Carry 2 ULV domino gate has a relative
delay of only 2.48%.
Figure 6.12: Average power consumption per ULV domino Carry gate
compared to conventional CMOS carry gate. The plot shows the average power
consumption (nW, logarithmic scale) of Carry1, Carry2, Carry3 and the
conventional gate against VDD (mV).
The average power consumption per ULV domino Carry gate is compared
with the conventional Carry gate in Figure 6.12. The total power
consumption per gate increases with the supply voltage. As expected, the
power consumption of the ULV domino Carry gates exceeds that of the
conventional Carry gate; this is the price paid for the superb speed
performance. As shown in Figure 6.12, the ULV domino Carry gates
with PTL consume less power than the other proposed
ULV NP domino Carry gate, as the total number of evaluation transistors
is reduced from 5 to 3 by using PTL.
The average energy consumed by the proposed ULV domino Carry gates
compared with the conventional CMOS carry gate at different supply voltages
is shown in Figure 6.13. The subgraph in Figure 6.13a shows
the energy consumption in joules, while the subgraph in Figure
6.13b shows the average energy of the ULV domino Carry gates relative
to the conventional CMOS Carry gate. The PDP of the proposed ULV
Carry gates is lower than that of the conventional Carry gate for supply
voltages between 175mV and 350mV. This is mainly due to the superb speed
performance of the proposed ULV domino Carry gates relative to the
conventional Carry gate.
Comparing the graphs in Figure 6.11 and 6.13 shows that the minimum
relative PDP coincides with the maximum relative speed for the proposed
ULV NP domino Carry gates. All three domino Carry gates have their
minimum relative PDP, lower than 25%, at a supply voltage of 250mV,
which makes it the Minimum Energy Point (MEP). Carry 2 is the most
efficient solution, as its PDP is only 15.65% of that of the conventional
Carry gate. For supply voltages below 175mV, the relative PDP of Carry
1 and Carry 3 exceeds 100%, while the PDP of Carry 2 is still below
the PDP of the conventional Carry gate. However, the relative PDP of Carry 2
becomes worse than that of Carry 1 as the supply voltage exceeds 375mV.
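The Minimum Energy Point can also be read directly from the tabulated data. The following sketch takes the Relative PDP rows of Table 6.1 and returns, for each gate, the supply voltage with the lowest relative PDP. It only searches the seven simulated supply voltages, so the true minimum of the underlying curves may lie between two table entries; the function name is illustrative.

```python
VDD_MV = [100, 150, 200, 250, 300, 350, 400]

# Relative PDP (%) rows copied from Table 6.1.
RELATIVE_PDP = {
    "Carry 1": [165, 154, 57.8, 22.6, 41.4, 50, 96.3],
    "Carry 2": [81.6, 69.55, 36.57, 15.65, 28.37, 37.22, 144.13],
    "Carry 3": [148.4, 117.35, 55.1, 23.01, 41.15, 60.57, 468],
}


def minimum_energy_point(values):
    """Return (VDD in mV, relative PDP in %) at the smallest tabulated relative PDP."""
    idx = min(range(len(values)), key=values.__getitem__)
    return VDD_MV[idx], values[idx]


mep = {gate: minimum_energy_point(vals) for gate, vals in RELATIVE_PDP.items()}
for gate, (vdd, pdp) in mep.items():
    print(f"{gate}: minimum relative PDP {pdp}% at {vdd}mV")
```

For all three gates the minimum falls on the 250mV column, in line with the MEP stated above.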
Figure 6.13: Average energy of ULV domino carry gates relative to the
Conventional Carry gate at different supply voltages. (a) Average Energy
(10^-18 J): average PDP of Carry1, Carry2, Carry3 and the conventional carry
gate against VDD (mV). (b) Average relative energy (%): PDP of Carry1, Carry2
and Carry3 relative to the conventional carry gate against VDD (mV).
The Energy Delay Product (EDP) of the proposed ULV domino Carry
gates at different supply voltages is shown in Figure 6.14. The subgraph
in Figure 6.14a shows the average EDP in Js, while the subgraph in
Figure 6.14b shows the EDP relative to the conventional Carry gate. The relative
EDP of all the proposed ULV domino Carry gates is less than 30% for
supply voltages between 175mV and 375mV, which corresponds directly to
the supply voltage range where the PDP is at its minimum, as shown in
Figure 6.13.
At the Minimum Energy Point (250mV), the EDP of all the proposed ULV
domino Carry gates is lower than 1.5% of that of the conventional Carry
gate. Carry 2 has the lowest relative EDP, with a value of about
0.527% at 250mV (Table 6.1). The relative EDP of Carry 2 is far better than
that of Carry 1 and Carry 3 when the supply voltage is below 175mV, but becomes
worse than that of Carry 1 when the supply voltage exceeds 375mV.
Figure 6.14: Average Energy Delay Product of ULV domino Carry gates
compared to conventional Carry gate at different supply voltages. (a) Average
EDP (10^-27 Js) of Carry1, Carry2 and Carry3 against VDD (mV). (b) Average
relative EDP (%) of ULV domino Carry gates against VDD (mV).
6.3.1 Monte Carlo Simulations
In this section, the performance of the proposed ULV NP domino Carry gates
is presented at different supply voltages, based on 100 Monte Carlo simulations
including process and mismatch variations at each supply voltage. The
ULV NP domino Carry gates are not compared with the
conventional CMOS Carry gate in the Monte Carlo simulations, as the
conventional CMOS carry gate is unable to operate at low supply voltages
and high operating frequencies.
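The averaging procedure can be illustrated with a toy model. The sketch below is not the circuit simulation used in this thesis: it replaces the gate with a hypothetical subthreshold delay expression in arbitrary units, draws the threshold voltage from a normal distribution to stand in for process and mismatch variation, and averages 100 runs at each supply voltage. All constants (`SLOPE_FACTOR`, `VTH_NOMINAL`, `VTH_SIGMA`) and the function names are assumptions chosen for illustration.

```python
import math
import random

THERMAL_VOLTAGE = 0.026  # V, approximately kT/q at room temperature
SLOPE_FACTOR = 1.5       # assumed subthreshold slope factor n
VTH_NOMINAL = 0.30       # assumed nominal threshold voltage in V
VTH_SIGMA = 0.02         # assumed threshold spread in V


def toy_delay(vdd, vth):
    """Illustrative delay in arbitrary units: VDD divided by an exponential on-current."""
    return vdd * math.exp(-(vdd - vth) / (SLOPE_FACTOR * THERMAL_VOLTAGE))


def monte_carlo_average(vdd, runs=100, seed=0):
    """Average the toy delay over `runs` random threshold voltage samples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    samples = [toy_delay(vdd, rng.gauss(VTH_NOMINAL, VTH_SIGMA)) for _ in range(runs)]
    return sum(samples) / runs


averages = {mv: monte_carlo_average(mv / 1000.0) for mv in range(100, 401, 50)}
for mv, avg in averages.items():
    print(f"VDD = {mv}mV: average toy delay = {avg:.3e} (arbitrary units)")
```

The numbers produced have no quantitative meaning for the proposed gates; the sketch only shows how an average over 100 randomized runs per supply voltage is formed.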
Figure 6.15: Average Delay per ULV domino Carry gate with 100 Monte Carlo
simulations. The plot shows the average delay (ns, logarithmic scale) of
Carry1, Carry2 and Carry3 against VDD (mV).
The graph in Figure 6.15 shows the average delay over 100 Monte Carlo
simulations at each of the supply voltages in the range of 100mV to 400mV.
The average delay is high at low supply voltages and decreases as the
supply voltage increases.
Figure 6.16: Average Power consumption per ULV domino Carry gate with
100 Monte Carlo simulations. The plot shows the average power (nW,
logarithmic scale) of Carry1, Carry2 and Carry3 against VDD (mV).
The graph in Figure 6.16 shows the average power consumption
over 100 Monte Carlo simulations for the proposed ULV NP domino Carry
gates. As expected, Carry 1 consumes slightly more power than the other two
proposed ULV Carry gates. The power consumption of each of the ULV
domino Carry gates increases with the supply voltage.
Figure 6.17: Average PDP per ULV domino Carry gate with 100 Monte Carlo
simulations. The plot shows the average PDP (10^-18 J, logarithmic scale) of
Carry1, Carry2 and Carry3 against VDD (mV).
The average PDP over 100 Monte Carlo simulations at each supply voltage
for the proposed ULV NP domino Carry gates is shown in Figure 6.17. Carry
1 has a worse average PDP than the other proposed Carry
gates at supply voltages below 300mV; however, as the supply voltage
increases, the PDP performance of the Carry 1 domino gate improves
significantly.
Figure 6.18: Average EDP per ULV domino Carry gate with 100 Monte Carlo
simulations. The plot shows the average EDP (10^-27 Js, logarithmic scale) of
Carry1, Carry2 and Carry3 against VDD (mV).
The average EDP over 100 Monte Carlo simulations for the proposed ULV NP
domino Carry gates is presented in Figure 6.18. All the proposed ULV NP
domino Carry gates achieve a better EDP at higher supply
voltages due to the significantly reduced propagation delay.
6.3.2 Summary
Table 6.2 summarizes the speed, PDP and EDP performance of the
proposed ULV NP domino Carry gates relative to the conventional Carry gate
at the Minimum Energy Point. Carry 1 and Carry 2 have the same relative delay
of 3.36%, so both solutions are efficient for ultra low voltage and high
speed applications. For low power applications, however, Carry 2 is the most
efficient solution, as it consumes less power than the other two ULV domino
Carry gates and results in a better PDP and EDP. Carry 3 is the slowest and
may not be the best solution for high speed applications, but it offers
a more efficient solution than Carry 1 in terms of area and power.
                     CARRY 1   CARRY 2   CARRY 3
Relative Delay (%)   3.36      3.36      4.7
Relative PDP (%)     22.6      15.65     23.01
Relative EDP (%)     0.76      0.527     1.08
Table 6.2: The delay, PDP and EDP of ULV domino carry gates at Minimum
Energy Point (250mV) relative to conventional CMOS carry gate.
Chapter 7
Different configurations of 32-bit Carry chain by exploiting ULV NP domino logic style
All the ULV domino Carry gates implemented in the previous chapter are
one-bit Carry gates[31][30], as they add only two single bits. If an arithmetic
operation on more than two bits is desired, the one-bit Carry gates
must be cascaded in a serial chain. In this chapter, we explore various
32-bit Carry chains using the ULV NP domino logic style.
7.1 32-bit carry chain using NP domino Carry 1 gates
The N and P type ULV domino Carry 1 gates implemented in the previous chapter
in Figure 6.2 (on page 40) and 6.4 (on page 42) respectively are serially
cascaded in a chain of n bits in Figure 7.1. Every other Carry gate is P
type, where all the input bits have monotonically falling transitions in the
evaluation phase.
Figure 7.1: NP domino n-bit carry chain 1 (alternating N type and P type
Carry 1 gates from Cin to Cout[n]).
Table 7.1 states the working principle of the NP domino Carry gates in a
chain. The symbols ⇑ and ⇓ indicate a positive and a negative transition in the
evaluation phase. The output C̄n+1 of an N type gate is applied to the following
P type Carry gate. In the same way, Cn+2 is applied to the next N type Carry
gate. In this way, the carry signal propagates through the NP domino chain.
Inputs N type       Output N type    Inputs P type           Output P type
Cn    An    Bn      C̄n+1   Cn+1      C̄n+1   An+1   Bn+1      Cn+2   C̄n+2
0     0     0       1      0         1      0⇓     0⇓        1⇑     0⇓
0     0     1⇑      1      0         1      0⇓     1         0      1
0     1⇑    0       1      0         1      1      0⇓        0      1
0     1⇑    1⇑      0⇓     1⇑        0⇓     1      1         0      1
1⇑    0     0       1      0         1      0⇓     0⇓        1⇑     0⇓
1⇑    0     1⇑      0⇓     1⇑        0⇓     0⇓     1         1⇑     0⇓
1⇑    1⇑    0       0⇓     1⇑        0⇓     1      0⇓        1⇑     0⇓
1⇑    1⇑    1⇑      0⇓     1⇑        0⇓     1      1         0      1
Table 7.1: The working principle for the NP Carry gates in a Domino chain.
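The propagation of the carry through an alternating chain can be sketched with a purely functional model. The code below assumes that the first stage is N type, that an N type output node is precharged to 1 and therefore holds the complemented carry, and that a P type output node is precharged to 0 and holds the true carry. A and B are treated as logical bit values, so the signal polarity actually applied to the P type inputs is abstracted away; the function names are illustrative.

```python
def carry(a, b, cin):
    """Logical carry-out of one bit position (majority of the three inputs)."""
    return (a & b) | (cin & (a | b))


def np_domino_chain(a_bits, b_bits, cin):
    """Evaluate an alternating N/P carry chain, least significant bit first.

    Returns (carries, nodes): the logical carry after each stage and the level
    assumed on that stage's output node. Even stages are treated as N type
    (node holds the complemented carry), odd stages as P type (node holds the
    true carry).
    """
    carries, nodes = [], []
    c = cin
    for stage, (a, b) in enumerate(zip(a_bits, b_bits)):
        c = carry(a, b, c)
        n_type = stage % 2 == 0
        carries.append(c)
        nodes.append(1 - c if n_type else c)
    return carries, nodes


# Propagation mode: every stage has exactly one of A or B set, so the carry
# input has to ripple through all 32 stages.
carries, nodes = np_domino_chain([1] * 32, [0] * 32, cin=1)
print("final carry:", carries[-1])
print("first four output nodes:", nodes[:4])
```

With the carry input held at 0 in the same configuration, every node keeps its precharged level (1 for N type, 0 for P type), which corresponds to the situation where the chain waits for the carry input.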
The simulation results shown in Figure 7.2 are for the 32-bit NP
domino carry chain implemented in Figure 7.1. The simulation
considers the worst case scenario, where all the full adders operate
in the propagation mode. One of the two input signals A or B arrives prior to
the Cin signal, while the other input signal is assumed to retain its precharged
value. All the input signals arrive in parallel at every NP domino Carry
gate, assuming that all the input signals originate from a register or flip
flop. All the transistors except the evaluation transistors labeled N and P
are assumed to be minimum sized in the simulation. The bulks
of the N and P transistors are floating in order to increase the strength of
the transistors by lowering their threshold voltage.
Figure 7.2: Simulation result of 32-bit carry chain 1. (a) Output nodes from
P type Carry gates, C_out[2,4,6,...32], and the input signals C_in, A and B
(annotations: 5.2ns, 139mV). (b) Output nodes from N type Carry gates,
C_out[1,3,5,...31] (annotation: 133mV). Both panels plot Voltage (V) against
Time (ns).
The subgraph in Figure 7.2a shows the input signals to the first N type
Carry gate and the output signals from the P type Carry gates, while the plots
in subgraph 7.2b show the output signals from the N type carry gates. The
total propagation delay, from the moment the Cin bit switches until the last
Cout bit in the chain responds, is 5.2ns. However, the last carry bits are
corrupted.
The reason for the corruption is clear from the plots. During the
evaluation phase, the output nodes of the last Carry gates in the chain
drift to 139mV and 167mV for the P and N type Carry gates respectively
before the correct input bits arrive, which corresponds to deviations of
139mV and 133mV from the precharged levels in Figure 7.2. This turns on the
evaluation transistors labeled EN3 and EP3 in the N and P type ULV domino Carry 1
gates respectively, which results in false rising and falling transitions at the
output nodes of the later Carry 1 gates in the chain. The power signals φ
and φ̄ slowly leak toward the output nodes through the PUN and PDN
for the N and P type gates respectively, while the floating output nodes are
still waiting for the correct input transitions. As all the output nodes are serially
cascaded, a sufficiently large leakage-induced voltage on an output node is
able to turn on the evaluation transistor in the following Carry gate, which
in turn causes false transitions in all the following NP domino Carry gates.
In the simulation results shown in Figure 7.2, this false transition occurs
after bit 20 in the 32-bit carry chain.
To overcome the leakage problem discussed in the above paragraph,
the evaluation transistors labeled N and P should be made strong enough
to hold the precharged value until the correct input transitions occur. The
strength of a transistor is increased by lowering its threshold voltage.
Figure 7.3: Simulation result of 32-bit carry chain 1 with FBB on N and P
transistors. (a) Output nodes from P type Carry gates, C_out[2,4,6,...32], and
the input signals C_in, A and B (annotations: 8.6ns, 34mV). (b) Output nodes
from N type Carry gates, C_out[1,3,5,...31] (annotation: 54mV). Both panels
plot Voltage (V) against Time (ns).
The strength of the evaluation transistors labeled P is increased by
increasing the transistor length L (3×Lmin). Furthermore, FBB is applied at
the bulk terminals of the P transistors by connecting the bulks to GND. To
make the transistor even stronger, the width W of the pMOS is increased
(1.5×Wmin), as increasing the length beyond 3×Lmin is no longer an
effective knob.
The evaluation transistor N should also be made stronger in order
to retain the precharged value. As the nMOS transistor has a much
higher mobility than the pMOS transistor, the N transistor does not need
to be made as strong as P. The strength of the N transistor is increased by
applying FBB at the bulk terminals, connecting the bulks to VDD.
The simulation results of the ULV NP domino 32-bit carry chain with tuned
evaluation transistors N and P are shown in Figure 7.3. All the transistors
except the evaluation transistors N and P are minimum sized low-threshold
transistors.
The simulation results in Figure 7.3 show that the leakage problem
is solved by increasing the strength of the specific transistors. The voltage
deviation at the output nodes of the P type carry gates is reduced from 139mV
to 34mV, and the voltage deviation at the output nodes of the N
type carry gates is reduced from 133mV to 54mV. However, the propagation
delay in the worst case scenario increases from 5.2ns to 8.6ns. The
simulated power consumption is only 667nW for this scenario. The delay
and EDP relative to the conventional CMOS 32-bit carry chain are only 2.68%
and 8.11% respectively.
The simulation results also show that the worst case speed scenario for
a 32-bit carry chain is probably not the worst case power consumption
scenario for the proposed ULV domino logic style. The proposed ULV
domino 32-bit carry chain consumes a certain amount of power in the
evaluation phase when the output nodes hold the precharged value and
wait for the correct transitions at the input nodes. The power consumption
of the proposed 32-bit chain in this Wait mode is 687nW, with an average
deviation of 28mV from the rails at the output nodes.
7.1.1 A solution without Forward Body Biasing on nMOS transistor
Applying a Forward Body Biasing (FBB) scheme at the body of the nMOS
transistors, or keeping the bulks floating, can be a challenging task and increases
the complexity of the layout stage in the TSMC process, as a deep n-well
is needed to isolate the body of the
nMOS transistors. Consider the N type ULV domino Carry 1 gate: all the
nMOS recharge transistors labeled RN and the keeper transistor labeled KN
are minimum sized low threshold transistors sharing the same P-substrate,
which is connected to GND. However, the substrate of the evaluation
transistor labeled N is connected to VDD in order to increase the strength
of the transistor. In this case, the substrate of the evaluation transistor N
must be isolated; otherwise there will be a short circuit through the substrate.
This is done with a deep n-well. The main disadvantage of a deep n-well
is the increased area.
To avoid the problems that FBB of nMOS transistors can cause due to
the deep n-well, the standard RBB
technique has been used on the bulk terminal of evaluation transistor N by
connecting the bulk terminal to GND. In order to retain the precharged
value until the correct input transitions occur, the sizing of the N transistor
has been modified to increase its strength. The transistor's
length L is increased by 2×Lmin and the width W is increased by 2×Wmin.
Furthermore, the strength of the P evaluation transistor is also increased by
increasing L (3×Lmin) and W (2×Wmin) to retain the precharged value in a
cascaded chain.
Figure 7.4: Simulation result of 32-bit carry chain 1 without FBB on nMOS
transistor N. (a) Output nodes from P type Carry gates, C_out[2,4,6,...32], and
the input signals C_in, A and B (annotations: 8.8ns, 30mV). (b) Output nodes
from N type Carry gates, C_out[1,3,5,...31] (annotation: 46.1mV). Both panels
plot Voltage (V) against Time (ns).
Figure 7.4 shows the simulation results of the 32-bit carry chain when the
RBB technique is applied at the bulk terminals of evaluation transistor N.
The propagation delay is slightly increased to 8.8ns, in exchange for
a better noise margin and probably less area for nMOS
transistor N at the layout stage compared to the implementation
of a deep n-well. The noise margin is increased because the off-current Ioff
of the N transistor is reduced by connecting its substrate to GND. The total
power consumption of the 32-bit carry chain in the worst case scenario is
705.5nW, slightly higher than in the previous solution as
the dimensions of both the N and P evaluation transistors are increased. The
delay and EDP relative to the conventional CMOS 32-bit carry chain are only
2.8% and 9.3% respectively.
The power consumption of the proposed 32-bit carry chain in the Wait
mode is 685nW, with an average deviation of 24mV from the rails at the
output nodes.
7.1.2 32-bit carry chain utilizing Multi-threshold CMOS Technique (MTCMOS)
The MTCMOS technique is employed to maintain both speed and power
performance. Low threshold (L-thr) transistors are used in speed critical
paths, while the leakage power is suppressed on the other paths by
implementing high threshold (H-thr) transistors. The threshold voltage
of the transistors in the ULV NP domino Carry 1 gate is modified by applying
the MTCMOS technique. All the evaluation transistors labeled E, N and P are
L-thr transistors in order to enhance the speed performance and produce
the correct transitions at the output nodes. All the keeper transistors
labeled K are also minimum sized L-thr transistors. The recharge
transistors RN for N type Carry gates and RP for P type Carry gates are also
minimum sized L-thr transistors, as shown in the schematic in Figure 7.5.
All the recharge transistors labeled RP in the N type ULV domino Carry 1
gate and RN in the P type ULV domino Carry 1 gate are minimum sized H-thr
transistors in order to suppress the leakage problem, as shown in Figure 7.5.
Figure 7.5: ULV domino Carry 1 Gates utilizing MTCMOS technology. (a) N type.
(b) P type. The schematic legend distinguishes H-thr and L-thr transistors.
Figure 7.6: Simulation result of 32-bit carry chain 1 utilizing MTCMOS
technique. (a) Output nodes from P type Carry gates, C_out[2,4,6,...32], and
the input signals C_in, A and B (annotations: 24.18ns, 23mV). (b) Output nodes
from N type Carry gates, C_out[1,3,5,...31] (annotation: 23mV). Both panels
plot Voltage (V) against Time (ns).
Figure 7.6 shows the simulation results for the 32-bit carry chain
1 using the MTCMOS technique in order to reduce the power
consumption. As the simulation results show, the propagation delay
is increased to 24.18ns by implementing H-thr recharge transistors. The
floating output nodes of the P and N type carry gates are pulled slightly towards
1 and 0 respectively before the carry input signal arrives, but the deviation
is smaller than in the previous simulation results. As expected, the greatest
advantage is in the power consumption. The 32-bit carry chain
only consumes 517nW in the worst case scenario, which is clearly
lower than the 667nW and 705.5nW of the previous two solutions. The delay
and EDP relative to the conventional CMOS 32-bit carry chain are 7.54% and
49% respectively.
The power consumption of the proposed 32-bit carry chain in the Wait
mode is 655nW, with an average deviation of 26mV from the rails at the
output nodes.
7.1.3 32-bit carry chain utilizing Variable-threshold CMOS Technique (VTCMOS)
Variable-threshold CMOS (VTCMOS) is another efficient method to reduce
the leakage power of the circuits. The speed critical transistors can be
biased by adopting the VTCMOS technique. As mentioned in the previous
sections, the evaluation transistors labeled N and P in the ULV NP domino
Carry 1 gate should be made stronger in order to retain the precharged
value until the correct input transitions arrive. These transistors can
be made stronger by lowering the threshold voltage by the different means
mentioned earlier. One efficient way to lower the transistor
threshold voltage is a body biasing scheme with either floating or
FBB bulks. This lowers the transistor threshold voltage at the cost of high
leakage power.
Figure 7.7: Simulation result of 32-bit carry chain implemented in Circuit
7.1 utilizing VTCMOS technique. (a) Output nodes from P type Carry gates,
C_out[2,4,6,...32], and the input signals C_in, A and B (annotations: 16ns,
30mV). (b) Output nodes from N type Carry gates, C_out[1,3,5,...31]
(annotation: 35mV). Both panels plot Voltage (V) against Time (ns).
Using VTCMOS, the body voltage of the evaluation transistors labeled
N is dynamically switched to VDD and GND in the evaluation and precharge
phase respectively. This lowers the threshold voltage of the N transistor only
in the speed critical evaluation phase; otherwise the bulk terminal is RBB.
This can be achieved by connecting the bulk terminal to the clock signal φ.
In the same way, the VTCMOS technique is applied to the P type evaluation
transistor by connecting its bulk terminal to the complementary clock signal φ̄.
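The effect of switching the bulk voltage can be estimated with the textbook body effect expression for an nMOS transistor, V_T = V_T0 + γ(√(2φ_F + V_SB) − √(2φ_F)). The sketch below assumes the source of the transistor is at GND and uses hypothetical device parameters (`vth0`, `gamma`, `two_phi_f`) that are not taken from the process used in this thesis; it only illustrates the direction and rough size of the threshold shift between the two clock phases.

```python
import math


def threshold_voltage(v_sb, vth0=0.30, gamma=0.4, two_phi_f=0.7):
    """nMOS threshold voltage from the textbook body effect expression.

    v_sb is the source-to-bulk voltage in V; it is negative under forward
    body bias. vth0, gamma and two_phi_f are assumed illustrative values.
    """
    return vth0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


VDD = 0.3  # V, supply level assumed from the simulation plots

# Precharge phase: bulk at GND, source at GND, so V_SB = 0.
vth_precharge = threshold_voltage(0.0)
# Evaluation phase: bulk driven to VDD by the clock, so V_SB = -VDD.
vth_evaluation = threshold_voltage(-VDD)

print(f"precharge:  V_T = {vth_precharge * 1000:.0f} mV")
print(f"evaluation: V_T = {vth_evaluation * 1000:.0f} mV")
```

Under these assumptions the threshold voltage is lowered only while the bulk is driven high, which is the behaviour VTCMOS relies on in the evaluation phase.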
The graphs presented in Figure 7.7 show the simulation results
of a 32-bit carry chain in the propagation mode where all the domino Carry
gates are enhanced by the VTCMOS technique. The speed performance is
improved compared to the previous simulation results using the MTCMOS
technique, at the cost of a reduced noise margin before the transition
occurs at the output nodes. The total propagation delay and
power consumption are 16ns and 718nW respectively. Relative to the
conventional 32-bit carry chain, the delay and EDP are only 4.99% and 30.24%
respectively.
The power consumption of the proposed 32-bit carry chain in the Wait
mode is 660.7nW, with an average deviation of 20.5mV from the rails at the
output nodes. The power consumption is increased in both the propagation
and the Wait mode compared to the simulation results obtained with
the MTCMOS technique.
62
7.1.4 VTCMOS and MTCMOS Technique
In this section, a hybrid solution of both MTCMOS and VTCMOS techniques
is applied. The MTCMOS technique is employed by implementing both
H-thr and L-thr transistors. All the transistors are L-thr except the
recharge transistors labeled RP in the N type ULV domino Carry 1 gate and
RN in the P type ULV domino Carry 1 gate, which are H-thr transistors.
Meanwhile, the VTCMOS technique is applied to the evaluation transistors
labeled N and P by connecting the bulk terminals to φ and φ respectively.
[Figure 7.8 (a): Output nodes from P type Carry gates, C_out[2,4,6,...32], and the input signals C_in, A and B; voltage (V) versus time (ns); annotations 19.6ns and 23mV.]
[Figure 7.8 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 27mV.]
Figure 7.8: Simulation result of 32-bit carry chain implemented in Circuit
7.1 utilizing both MTCMOS and VTCMOS techniques.
The graphs in Figure 7.8 present the simulation results for the 32-bit
ULV NP domino carry chain utilizing a combination of both MTCMOS
and VTCMOS techniques. The speed performance of the hybrid style
is better than with the MTCMOS technique alone. However, the delay is
slightly increased compared to VTCMOS technology. The figures of merit
are nevertheless much better than with either the VTCMOS or the MTCMOS
technique alone. The total propagation delay and power consumption are
19.6ns and 410nW respectively. The proposed hybrid style consumes less
power than all the previous simulation results. Relative to the
conventional 32-bit carry chain, the delay and EDP are only 6.1% and 25.9%
respectively.
The power consumption for the proposed 32-bit carry chain in the Wait
mode is 652nW with an average deviation of 26.5mV from the rails at
the output nodes. The proposed hybrid style consumes less power
than all the previous implementations in both the propagation mode and
the Wait mode.
7.2 32-bit carry chain using NP domino Carry 2
gates
[Figure 7.9 schematic: alternating N type and P type Carry 2 gates with inputs A, B and the carry signal; a P type inverter follows each N type Carry 2 gate and an N type inverter follows each P type Carry 2 gate, from stage 0 to stage n.]
Figure 7.9: NP domino n-bit carry chain 2.
The schematic in Figure 7.9 represents an n-bit chain of NP domino Carry 2
gates, which are implemented utilizing Pass Transistor Logic (PTL). The
main difference from the previous n-bit carry chain (in Figure 7.1) is that
the ⇓B and ⇓Cin signals are applied to the N type carry gate together with
the other input bits ⇑A and ⇑B. The ⇑ and ⇓ symbols represent
monotonically rising and falling transitions in the evaluation phase. This
topology reduces the total number of evaluation transistors.
The proposed topology is very efficient in terms of speed, area and
power, as concluded from the simulation results obtained in the
previous chapter. However, cascading the proposed NP domino Carry 2
gates is quite challenging, as an ULV domino inverter must be connected
in order to obtain the required carry signal for the next carry gate in
the chain. Consider the example of the N type carry gate shown in
Figure 7.9: a P type ULV domino inverter is connected at the output node
of the N type ULV domino Carry 2 gate in order to obtain a monotonically
rising carry signal for the next P type carry gate in the chain.
The simulation results shown in Figure 7.10 are for the 32-bit carry
chain shown in Figure 7.9. The carry chain is simulated for the worst
case scenario, in which B switches from 0 to 1 prior to the carry input
signal at the start of the evaluation phase, while A holds the precharged
value during the whole evaluation phase. The propagation delay is 16.1ns,
which is a degradation in speed performance compared to the first two
simulation results (shown in Figures 7.3 and 7.4) for the carry 1 chain
implemented in Figure 7.1. This is due to the addition of extra ULV NP
domino inverters at the output nodes of every ULV NP domino Carry 2 gate.
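The worst case scenario can be illustrated at the logic level. The sketch below is a purely behavioral model, assuming the usual carry function C_out = A·B + C_in·(A + B); it ignores the precharge and evaluation phases, the alternating N and P polarity, the inverters and all timing. The function names are illustrative only.

```python
def carry(a, b, c_in):
    """Carry-out of a 1-bit full adder: majority of the three inputs."""
    return (a & b) | (c_in & (a | b))

def ripple_carries(a_bits, b_bits, c_in):
    """Return the carry-out of every stage of a cascaded carry chain."""
    carries = []
    c = c_in
    for a, b in zip(a_bits, b_bits):
        c = carry(a, b, c)
        carries.append(c)
    return carries

N = 32
# Worst case: every stage only propagates (exactly one of A, B is 1),
# so the last carry-out depends on the carry input of the first stage.
a_bits = [0] * N
b_bits = [1] * N

carries_with_cin = ripple_carries(a_bits, b_bits, 1)
carries_without_cin = ripple_carries(a_bits, b_bits, 0)
print(carries_with_cin[-1], carries_without_cin[-1])  # prints: 1 0
```

Since no stage generates a carry on its own, the carry input has to ripple through all 32 stages before the last output is valid, which is why this input pattern is used to measure the propagation delay.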
[Figure 7.10 (a): Input signals C_in, A and B and the output nodes from P type Carry gates, C_out[2,4,6,...32]; voltage (V) versus time (ns); annotations 16.1ns and 32.4mV.]
[Figure 7.10 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31], and C_in; voltage (V) versus time (ns); annotation 24mV.]
Figure 7.10: Simulation result of 32-bit carry chain implemented in Circuit
7.9 when only input bits B get transitions.
On the other hand, the presented simulation results offer an advantage
in terms of robustness, as the output nodes stay very close to the rails,
i.e. VDD and GND, before the transition occurs at the output node. This
indicates that removing the serially connected evaluation transistors
suppresses the leakage problem at the output nodes. Although 32 extra NP
domino inverters are added in the proposed chain, the power consumption is
1067nW, which is still acceptable compared to the previous simulation
results obtained in Section 7.1. Relative to the conventional 32-bit carry
chain, the delay and EDP are only 5.02% and 45.4% respectively.
Unlike the simulation results in the previous section, the propagation
delay of the proposed carry chain varies enormously when input bits A get
transitions instead of input bits B in the evaluation phase, as shown in
the simulation results in Figure 7.11. This is because both evaluation
transistors, for example EN1 and EN2 in an N type ULV domino Carry 2 gate,
turn on as A switches from 0 to 1. This increases the contention current
at the output node, as B attempts to hold the output node at 1 while Cin
pulls the output node to 0. The propagation delay is 34ns and the power
consumption is 973.6nW. However, the proposed ULV NP domino carry chain
offers the best robustness performance with almost no deviation from the
rails.
The proposed carry chain offers a great advantage in robustness and a
power consumption of only 396.4nW when no input bits arrive in the
evaluation phase. There is almost no deviation from the rails in the Wait
mode.
[Figure 7.11 (a): Input signals C_in, A and B and the output nodes from P type Carry gates, C_out[2,4,6,...32]; voltage (V) versus time (ns); annotation 34ns.]
[Figure 7.11 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31], and C_in; voltage (V) versus time (ns).]
Figure 7.11: Simulation result of 32-bit carry chain implemented in Circuit
7.9 when only input bits A get transitions.
7.3 32-bit carry chain using NP domino Carry 3
gates
An n-bit carry chain is shown in Figure 7.12, where the ULV NP
domino Carry 3 gates with PTL (implemented in Figure 6.8 on page
46) are cascaded together. The main advantage of the proposed carry
chain in Figure 7.12 compared to the previous NP domino carry chain
2 (implemented in Figure 7.9) is that the ULV NP domino inverters at the
output nodes of each ULV NP domino Carry 3 gate are no longer needed. This
affects both speed and power performance. The bulks of the evaluation
transistors labeled EP in the P type ULV domino Carry 3 gates are floating
in order to increase the strength of these transistors.
[Figure 7.12 schematic: alternating N type and P type Carry 3 gates with inputs A, B and the carry signal, cascaded directly from stage 0 to stage n without inverters between the gates.]
Figure 7.12: NP domino n-bit carry chain 3.
The graphs plotted in Figure 7.13 demonstrate the simulation result
of the 32-bit carry chain 3, considering the worst case scenario when the
transitions only arrive at the input nodes B in the evaluation phase.
A comparison with the simulation results for carry chain 2 (shown
in Figure 7.10, which is simulated for the same scenario) shows that
the speed performance is almost the same, i.e. 16.6ns, while the power
consumption is reduced to 795nW with the proposed ULV domino carry
chain 3. However, the noise margin is decreased before the transition
occurs at the output nodes. Relative to the conventional 32-bit carry chain,
the delay and EDP are only 5.17% and 36% respectively.
[Figure 7.13 (a): Carry output nodes from P type Carry gates, C_out[2,4,6,...32], and the input signals C_in, A and B; voltage (V) versus time (ns); annotations 16.6ns and 53mV.]
[Figure 7.13 (b): Carry output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 57mV.]
Figure 7.13: Simulation result of 32-bit carry chain implemented in Circuit
7.12 when only input bits B get transitions.
The graphs plotted in Figure 7.14 demonstrate the simulation results
of the 32-bit carry chain 3, considering the worst case scenario where
the transitions only arrive at the input nodes A in the evaluation phase
while input nodes B remain at the precharged value. This turns on all
three evaluation transistors when transitions arrive at the carry input
nodes. Considering the N type Carry 3 gate, the contention current is
increased as B attempts to hold the output node at 1 through EN1 and EN2
while A pulls the output node to 0 through EN3. This decreases the speed
performance significantly, and the simulated propagation delay is 71.9ns
for the presented scenario. Compared with the previous simulation results,
the proposed 32-bit carry chain 3 offers the least power consumption of
only 405.5nW. The average deviation from the rails at the output nodes is
only 22mV before the correct input transitions arrive. However, the main
drawback is the significant degradation in the speed performance.
The proposed carry chain offers a great advantage in robustness with
almost no deviation from the rails in the Wait mode. It only consumes a
power of 4.85nW in the Wait mode.
[Figure 7.14 (a): Output nodes from P type Carry gates, C_out[2,4,6,...32], and the input signals C_in, A and B; voltage (V) versus time (ns); annotations 71.9ns and 21.75mV.]
[Figure 7.14 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 22mV.]
Figure 7.14: Simulation result of 32-bit carry chain implemented in Circuit
7.12 when only input bits A get transitions.
7.4 New implementations of 32-bit carry chain
exploiting PTL
[Figure 7.15 schematics: (a) N type and (b) P type transistor-level circuits with evaluation transistors EN1 to EN4 and EP1 to EP4, recharge transistors RP and RN, keeper transistors KP and KN, transistors P1 and N1, inputs A, B and Cin, output Cout and clock signal φ.]
Figure 7.15: ULV domino Carry Gates using PTL (Carry 4).
The simulation results in the previous two sections utilizing PTL show
that the propagation delay through the 32-bit carry chains 2 and 3
increases significantly in the worst case scenario when only input bits A
switch instead of input bits B. The main cause of the speed degradation is
the significant increase in the contention current; for example, B attempts
to hold the output node at 1 through EN1 in the N type Carry gates with PTL
(implemented in Figure 6.6a on page 44 and 6.8a on page 46).
A new implementation of ULV NP domino Carry 4 gates is presented in
Figure 7.15 to suppress the delay problem discussed in the above paragraph.
The circuits in Figure 7.15a and 7.15b represent the N and P type ULV
domino Carry 4 gates respectively. Two serially connected evaluation
transistors are proposed in Figure 7.15 instead of a single pass transistor
in order to reduce the contention problem that occurs in the ULV NP domino
Carry 2 gates.
[Figure 7.16 (a): Input signals A and B and the output nodes from P type Carry gates, C_out[2,4,6,...32]; voltage (V) versus time (ns); annotations 16.83ns and 31mV.]
[Figure 7.16 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 26mV.]
Figure 7.16: Simulation result of 32-bit carry chain 4 implemented in Circuit
7.15 when only input bits A get transitions.
The simulation results shown in Figure 7.16 are for the 32-bit carry
chain 4 implemented by cascading the ULV NP domino Carry 4
gates. The worst case scenario is assumed, where both input bits A and B
get positive transitions in the first N type Carry 4 gate, while only input
bits A get transitions in all other ULV NP domino Carry 4 gates in the
chain. The propagation delay is only 16.8ns, which is about 50% of the
32-bit propagation delay for the same scenario shown in the simulation
results in Figure 7.11 on page 66.
The graphs shown in Figure 7.17 demonstrate the speed performance of
the 32-bit carry chain 4. The simulations are for the worst case
scenario where both input bits A and B have positive transitions in the
first N type Carry 4 gate, while only input bits B switch in the rest of
the NP domino Carry 4 gates in the chain.
[Figure 7.17 (a): Input signals A and B and the output nodes from P type Carry gates, C_out[2,4,6,...32]; voltage (V) versus time (ns); annotations 16.89ns and 25mV.]
[Figure 7.17 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 26mV.]
Figure 7.17: Simulation result of 32-bit carry chain 4 implemented in Circuit
7.15 when only input bits B get transitions.
The simulation results presented in Figures 7.16 and 7.17 show
that the propagation delay is almost the same whether the transitions arrive
at input nodes A or B. Thus the proposed Carry 4 gates provide a much
better solution in terms of speed performance compared to the 32-bit
carry chain 2. However, the power consumption is increased to 1152nW.
Relative to the conventional 32-bit carry chain, the delay and EDP are only
5.23% and 53% respectively.
The main drawback is the degradation in power consumption and
robustness when no transitions arrive at the input nodes and the
output nodes hold the precharged value. The power consumption is 809nW
with an average deviation of 16.5mV from the rails in the Wait mode.
A new implementation of ULV NP domino Carry 5 gates is proposed
in Figure 7.18 in order to increase the speed performance by reducing the
contention current, which is increased when only input bits A get
transitions in the worst case scenario in the 32-bit ULV domino carry chain
3 (implemented in Figure 7.12 on page 66). In that scenario the propagation
delay increases to 71.9ns, as shown in the simulation results in Figure
7.14 (page 68).
Two serially connected evaluation transistors are proposed in Figure
7.18 instead of a single pass transistor in order to reduce the contention
problem that occurs in the ULV NP domino Carry 3 gates.
The graphs in Figure 7.19 demonstrate the worst case scenario of the
32-bit NP domino carry chain 5, where only input bits A switch at the input
nodes of every ULV NP domino Carry 5 gate in the chain. The propagation
delay of the 32-bit carry chain 5 is significantly decreased. The delay is
only 20% relative to the delay shown in the simulation results in Figure
7.14 (on page 68). Speed performance is improved at the cost of a slight
increase in the power consumption. The 32-bit propagation delay and power
consumption are 14.5ns and 644nW respectively. Relative to the conventional
32-bit carry chain, the delay and EDP are only 4.5% and 22.27% respectively.
The deviation from the rails is 55mV at the output nodes before the correct
input transitions arrive.
[Figure 7.18 schematics: (a) N type and (b) P type transistor-level circuits with evaluation transistors EN1 to EN4 and EP1 to EP4, recharge transistors RP and RN, keeper transistors KP and KN, transistors P1 and N1, inputs A, B and Cin, output Cout and clock signal φ.]
Figure 7.18: ULV domino Carry Gates using PTL (Carry 5).
[Figure 7.19 (a): Input bits C_in, A and B and the output nodes from P type Carry gates, C_out[2,4,6,...32]; voltage (V) versus time (ns); annotations 14.4ns and 42.8mV.]
[Figure 7.19 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 230mV.]
Figure 7.19: Simulation result of 32-bit carry chain 5 implemented in Circuit
7.18 when only input bits A get transitions.
[Figure 7.20 (a): Input signals C_in, A and B and the output nodes from P type Carry gates, C_out[2,4,6,...32]; voltage (V) versus time (ns); annotations 16.2ns and 28.6mV.]
[Figure 7.20 (b): Output nodes from N type Carry gates, C_out[1,3,5,...31]; voltage (V) versus time (ns); annotation 65mV.]
Figure 7.20: Simulation result of 32-bit carry chain 5 implemented in
Circuit 7.18 when only input bits B get transitions.
The graphs shown in Figure 7.20 demonstrate the simulation results
of the 32-bit carry chain 5. The simulations are for the worst case
scenario when the transitions only arrive at the input nodes B in all ULV
NP domino Carry 5 gates. The circuit offers an improvement in both speed
and power consumption compared to the simulation results shown in Figure
7.13 (on page 67). The 32-bit propagation delay and power consumption
are 16.2ns and 615.8nW respectively. Relative to the conventional 32-bit
carry chain, the delay and EDP are only 5.05% and 26.5% respectively. The
deviation from the rails is 46mV at the output nodes before the correct input
transitions arrive.
The simulation results show that the 32-bit carry chain 5 offers the most
balanced performance in terms of power, robustness and speed compared to
all previous 32-bit carry chain implementations, in both the propagation
and the Wait mode. The power consumption is only 313nW with an average
deviation of 11.5mV from the rails in the Wait mode.
7.5 Performance of ULV 32-bit carry chains at
different supply voltages
Two configurations of the proposed 32-bit NP domino carry chains are
simulated at different supply voltages, and the performance is compared with
the conventional 32-bit carry chain with respect to speed, power and EDP.
[Figure 7.21 (a): Delay in ns versus VDD (mV) for the ULV and ULV-PTL carry chains.]
[Figure 7.21 (b): Delay (%) relative to conventional versus VDD (mV) for the ULV and ULV-PTL carry chains.]
Figure 7.21: Delay for two ULV NP domino 32-bit carry chains compared
with conventional 32-bit carry chain for different supply voltages.
The plots in Figure 7.21 demonstrate the propagation delay for the two
proposed ULV domino 32-bit carry chains in the worst case scenario.
The dashed curve named ULV represents the speed performance of the ULV
domino 32-bit carry chain implemented in Figure 7.1 (on page 55), and the
solid curve named ULV-PTL represents the speed performance of the 32-bit
chain of ULV domino Carry 4 gates presented in Figure 7.15 (on page 68).
The subgraph in Figure 7.21a represents the propagation delay for the
proposed domino 32-bit carry chains. The simulation results show that the
ULV domino carry chain offers better speed performance than the ULV-PTL
carry chain. The propagation delay is under 65ns for the supply voltage of
200mV, and decreases to 15ns when the supply voltage rises to 400mV.
The propagation delay relative to the conventional CMOS 32-bit carry
chain is presented in Figure 7.21b. The relative delay is less than 10% for
the simulated ULV domino carry chains as the supply voltage varies from
200mV to 400mV. The ULV carry chain offers the best speed performance at the
supply voltage of 275mV, where the delay is only 1% relative to the
conventional CMOS carry chain.
[Figure 7.22 plot: Power consumption (nW, logarithmic scale) versus VDD (mV) for the ULV, ULV-PTL and conventional carry chains.]
Figure 7.22: Power consumption of ULV domino 32-bit carry chain
compared to conventional CMOS carry chain.
The graph in Figure 7.22 compares the power consumption of the proposed
ULV NP domino 32-bit carry chains and the conventional 32-bit carry chain
in the worst case scenario. The graph shows that the proposed domino
carry chains consume more power than the conventional carry chain, as
expected, in exchange for superb speed performance. The graph
also shows that ULV consumes less power than ULV-PTL at low supply
voltages; however, when the supply voltage exceeds 290mV, ULV consumes
more power than ULV-PTL. Although ULV consumes less power than
ULV-PTL at low supply voltages, it still results in better speed
performance, as mentioned earlier.
[Figure 7.23 (a): EDP (logarithmic scale, axis labeled in Js*10^-27) versus VDD (mV) for the ULV, ULV-PTL and conventional carry chains.]
[Figure 7.23 (b): EDP (%) relative to conventional (logarithmic scale) versus VDD (mV) for the ULV and ULV-PTL carry chains.]
Figure 7.23: EDP for two ULV NP domino 32-bit carry chains compared
with conventional 32-bit carry chain for different supply voltages.
The Energy Delay Product (EDP) for the proposed 32-bit carry chains is
compared with the conventional carry chain for different supply voltages.
The simulated results are presented in the graphs in Figure 7.23. Figure
7.23a uses a logarithmic y-axis and shows that the EDP for the
proposed ULV domino carry chain is far lower than for the conventional carry
chain at ultra low supply voltages. The graph in Figure 7.23b shows that the
relative EDP of the ULV carry chain is less than 5% of the conventional carry
chain at the supply voltage of 275mV.
7.6 Summary
A summary of various implementations of 32-bit carry chain together with
the strength factor parameters of different kinds of transistors is shown in
this section. In Table 7.2, the strength parameters for different kinds of
transistors are given.
Simulation results  Transistor strength parameters (sizing W/L in nm, threshold type and biasing)
Figure           RM                     E, R, K                P                      N
7.2              Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr F      Wmin/Lmin L-thr F
7.3              Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    150/300 L-thr FBB      Wmin/Lmin L-thr FBB
7.4              Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    200/300 L-thr FBB      200/200 L-thr RBB
7.6              Wmin/Lmin H-thr RBB    Wmin/Lmin L-thr RBB    150/300 L-thr FBB      120/100 L-thr FBB
7.7              Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    200/300 L-thr V        Wmin/Lmin L-thr V
7.8              Wmin/Lmin H-thr RBB    Wmin/Lmin L-thr RBB    150/200 L-thr V        Wmin/Lmin L-thr V
7.10 and 7.11    Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    120/200 L-thr FBB      Wmin/Lmin L-thr F
7.13 and 7.14    Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    120/200 L-thr FBB      120/200 L-thr RBB
7.16 and 7.17    Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    120/200 L-thr FBB      120/100 L-thr F
7.19 and 7.20    Wmin/Lmin L-thr RBB    Wmin/Lmin L-thr RBB    120/200 L-thr FBB      120/175 L-thr RBB
Table 7.2: Strength parameters for different transistors in various
configurations of 32-bit carry chains.
RM represents the recharge transistors labeled RP in the N type ULV
domino Carry gates and RN in all P type ULV domino Carry gates. K
represents the keeper transistors labeled KN in all the N type ULV domino
Carry gates and KP in all the P type ULV domino Carry gates. E represents
all the evaluation transistors labeled EN in all the N type ULV domino Carry
gates and EP in all the P type ULV domino Carry gates. R represents the
recharge transistors labeled RN in all the N type ULV domino Carry gates
and RP in all P type ULV domino Carry gates. P and N represent the
evaluation transistors labeled P and N in all the P and N type ULV domino
Carry gates.
The three most important strength tuning factors for all these different
transistors are also given in Table 7.2. The first parameter represents
the dimensions of the transistors. The second parameter represents the
threshold type: low threshold (L-thr), standard threshold (S-thr) or high
threshold (H-thr) transistors. The third parameter represents the biasing
scheme at the bulk terminals of the transistors. F represents floating bulk
terminals. RBB represents Reverse Body Biasing at the bulk terminals
by connecting the bulk terminals to VDD for pMOS and GND for nMOS.
FBB represents Forward Body Biasing at the bulk terminals by connecting
the bulk terminals to VDD for nMOS and GND for pMOS. V represents
the variable threshold voltage for the transistors by connecting the bulk
terminals of pMOS to φ and nMOS to φ.
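The biasing schemes listed above can be summarized as a small lookup. This is only an illustrative sketch: the function name and string labels are hypothetical, and for the scheme V the pMOS behaviour (bulk at GND in the evaluation phase, VDD otherwise) is an assumption made by symmetry with the nMOS case described in Section 7.1.3.

```python
VDD, GND, FLOATING = "VDD", "GND", "floating"

def bulk_connection(scheme, device, phase="evaluation"):
    """Return the bulk terminal connection for a biasing scheme.

    scheme: "F", "RBB", "FBB" or "V"
    device: "nMOS" or "pMOS"
    phase:  "evaluation" or "precharge" (only used for scheme "V")
    """
    if device not in ("nMOS", "pMOS"):
        raise ValueError(f"unknown device: {device}")
    if scheme == "F":
        return FLOATING
    if scheme == "RBB":
        return GND if device == "nMOS" else VDD
    if scheme == "FBB":
        return VDD if device == "nMOS" else GND
    if scheme == "V":
        if phase not in ("evaluation", "precharge"):
            raise ValueError(f"unknown phase: {phase}")
        # Forward bias in the evaluation phase, reverse bias otherwise.
        # The pMOS case is assumed by symmetry with the nMOS case.
        static_scheme = "FBB" if phase == "evaluation" else "RBB"
        return bulk_connection(static_scheme, device)
    raise ValueError(f"unknown scheme: {scheme}")

for phase in ("evaluation", "precharge"):
    print(phase, bulk_connection("V", "nMOS", phase))
```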
Table 7.3 summarizes the performance of different configurations of
32-bit carry chains simulated for the worst case scenario. The table lists
the propagation delay TD together with the power consumption, the deviation
from the rails (Dev) and other figures of merit (PDP and EDP). In addition,
the table lists the TD, PDP and EDP relative to a conventional 32-bit
carry chain.
Figure          TD(ns)  Power(nW)  Energy(fJ)  EDP(yJs)  Dev(mV)  Relative TD(%)  Relative PDP(%)  Relative EDP(%)
Conventional    320.7   5.91       1.89        607.8     0        100             100              100
7.2             5.2     1068       5.55        28.88     136      1.62            293.6            4.75
7.3             8.6     667        5.73        49.3      44       2.68            303.2            8.11
7.4             8.97    705.5      6.33        56.76     38       2.8             335              9.34
7.6             24.18   517        12.5        302.2     24       7.54            661              49.7
7.7             16      718        11.48       183.8     33.5     4.99            607              30.24
7.8             19.6    410.7      8.05        157.7     25       6.1             426              25.9
7.10            16.1    1067       17.1        276       27       5.02            904.7            45.4
7.11            34      973.6      33          1125      0        10.6            1746             185
7.13            16.6    795        13.2        219       55       5.17            698              36.03
7.14            71.9    405.5      29.1        2092      22       22.4            1539             344
7.16 and 7.17   16.8    1152       19.3        324       27       5.23            1021             53
7.19            14.5    644        9.38        135.4     55       4.5             496              22.27
7.20            16.2    615.8      9.97        161.6     46       5.05            527              26.5
Table 7.3: Performance of various configurations of 32-bit carry chains in
the worst case scenario.
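The figures of merit in Table 7.3 follow from the delay and power columns: the energy (PDP) is the product of delay and power, and the EDP is the product of energy and delay. The sketch below recomputes them for two rows of the table; the function names are illustrative only, and small differences from the tabulated values are due to rounding in the table.

```python
def figures_of_merit(delay_ns, power_nw):
    """Return (energy in fJ, EDP in yJs) from delay in ns and power in nW."""
    energy_fj = delay_ns * power_nw * 1e-3  # ns * nW = 1e-18 J = 1e-3 fJ
    edp_yjs = energy_fj * delay_ns          # fJ * ns = 1e-24 Js = 1 yJs
    return energy_fj, edp_yjs

def relative_percent(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference

# Delay (ns) and power (nW) taken from Table 7.3.
conv_delay, conv_power = 320.7, 5.91  # conventional 32-bit carry chain
hyb_delay, hyb_power = 19.6, 410.7    # Figure 7.8 (MTCMOS and VTCMOS)

conv_energy, conv_edp = figures_of_merit(conv_delay, conv_power)
hyb_energy, hyb_edp = figures_of_merit(hyb_delay, hyb_power)

rel_td = relative_percent(hyb_delay, conv_delay)
rel_pdp = relative_percent(hyb_energy, conv_energy)
rel_edp = relative_percent(hyb_edp, conv_edp)

print(f"Energy: {hyb_energy:.2f} fJ, EDP: {hyb_edp:.1f} yJs")
print(f"Relative TD: {rel_td:.2f} %, PDP: {rel_pdp:.1f} %, EDP: {rel_edp:.2f} %")
```

The relative EDP is well below 100% even though the relative PDP is far above 100%, because the delay enters the EDP twice.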
The simulation results show that the worst case speed scenario for a
32-bit carry chain is probably not the worst case power consumption scenario
for the proposed ULV domino logic style. The proposed ULV domino 32-bit
carry chains consume more power in the evaluation phase when the output
nodes hold the precharged value and wait for the correct transitions at the
input nodes. The worst case power scenario occurs when the output nodes
hold the precharged value during the whole evaluation phase.
Table 7.4 summarizes the power consumption in the Wait Mode I
for different configurations of 32-bit carry chains. The carry chains are
simulated assuming that both input bits at the first N type ULV domino
Carry gate remain low during the whole evaluation phase, while one of
the two input bits switches in all the following ULV domino Carry gates in
the chain. The table also lists the average deviation at the output
nodes of the N and P type ULV domino Carry gates, i.e. the deviation from 0
and 1 for P and N type ULV domino gates respectively.
Figure          Power(nW)  Dev(mV)
7.3             1161       51.5
7.4             1163       48.5
7.6             1047       61
7.7             1086       35
7.8             963        43
7.10            1136       30
7.11            539.7      0.9
7.13            1022       72
7.14            475        21
7.16 and 7.17   1172       34
7.19            1072       55
7.20            908        50
Table 7.4: Power consumption and deviation of 32-bit carry chains in the
Wait Mode I.
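The statement that the worst case speed scenario is not necessarily the worst case power scenario can be checked directly against the tabulated values. The following sketch compares the propagation mode power from Table 7.3 with the Wait Mode I power from Table 7.4; the variable names are illustrative only.

```python
# Power in nW per simulation result:
# (propagation mode from Table 7.3, Wait Mode I from Table 7.4)
power_nw = {
    "7.3": (667, 1161),
    "7.4": (705.5, 1163),
    "7.6": (517, 1047),
    "7.7": (718, 1086),
    "7.8": (410.7, 963),
    "7.10": (1067, 1136),
    "7.11": (973.6, 539.7),
    "7.13": (795, 1022),
    "7.14": (405.5, 475),
    "7.16 and 7.17": (1152, 1172),
    "7.19": (644, 1072),
    "7.20": (615.8, 908),
}

higher_in_wait = [fig for fig, (prop, wait) in power_nw.items() if wait > prop]
lower_in_wait = [fig for fig, (prop, wait) in power_nw.items() if wait <= prop]

print("More power in Wait Mode I:", ", ".join(higher_in_wait))
print("Less power in Wait Mode I:", ", ".join(lower_in_wait))
```

For the tabulated results, only the scenario of Figure 7.11 consumes less power in the Wait Mode I than in the propagation mode.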
Table 7.5 summarizes the power consumption and deviation in the
Wait Mode II for the various configurations of 32-bit ULV domino carry
chains. The carry chains are simulated assuming that all the input bits
remain at 0 and 1 during the evaluation phase for all N and P type ULV
domino Carry gates in the chain respectively. The output nodes hold the
precharged value as no transitions arrive at the input nodes.
Figure          Power(nW)  Dev(mV)
7.3             687        28
7.4             685        24
7.6             655.6      26
7.7             660.7      20.5
7.8             652.1      26.5
7.10 and 7.11   396.4      0
7.13 and 7.14   4.85       0
7.16 and 7.17   809        16.5
7.19 and 7.20   313        11.5
Table 7.5: Power consumption and deviation of 32-bit carry chains in the
Wait Mode II.
The simulation results for the different scenarios show that, for high
speed applications, the carry input signal needs only 8.2ns to
propagate through a proposed ULV NP domino 32-bit carry chain, whereas
the conventional 32-bit carry chain needs a propagation delay of 321ns
to perform the same operation. The delay for the high speed carry chain
is only 2.68% relative to the conventional 32-bit carry chain. The high
speed performance has an adverse effect on the power consumption. The high
speed carry chain consumes a power of 1068nW whereas the conventional
32-bit carry chain only consumes a power of 5.91nW. However, the EDP for
the proposed high speed 32-bit carry chain is only 4.75% relative to the
conventional carry chain due to the superb speed performance. Although
the high speed 32-bit carry chain offers a great speed improvement in the
worst case scenario, the circuit also consumes a power of 687nW when
no transitions arrive at the input nodes and the output nodes hold the
precharged value. This is due to the leakage problem at ultra low supply
voltages in the serially connected transistors.
Using the Multi-threshold CMOS technique by implementing both high
threshold and low threshold evaluation transistors, the 32-bit ULV domino
carry chain consumes the least power in the worst case
scenario. The power consumption is only 517nW, with a propagation
delay of 24ns. The relative delay and EDP are only 7.54% and 49.7%
respectively compared to the conventional 32-bit carry chain. Although the
power consumption is lowest in the worst case scenario, it increases to
655nW when no input transitions arrive at the input nodes.
The lowest power consumption is obtained by the ULV NP domino 32-bit
carry chain implemented with pass transistor logic. The average
propagation delay is 44ns in the worst case scenario for the low power 32-bit
ULV domino carry chain. The delay for this low power 32-bit carry chain is
only 13.7% relative to the conventional 32-bit chain. It consumes a power
of 600nW for the 32-bit operation. It offers the least power consumption of
only 4.85nW when no transitions arrive at the input nodes.
Chapter 8
Results - Overview of the
papers
This chapter briefly summarizes the three papers written during the
master project. Each of the sections related to the papers
gives an outline including motivation, results and conclusion.
8.1 Paper I
High Speed and Ultra Low-voltage CMOS Domino Carry gates
Outline: The 1-bit full adder is the most fundamental arithmetical
function in any kind of processor, and is the building block for many
processing operations like Arithmetic Logic Unit (ALU), Floating Point Unit
(FPU) and Application-Specific Integrated Circuit (ASIC). The overall worst
case delay is obtained when the carry signal propagates through the full
adder. This makes the carry propagation path the most critical and speed
limiting factor for many high speed applications. Three novel
configurations of ULV NP domino carry circuits (Carry 1, Carry 2 and Carry
3) are proposed in this paper in order to enhance the speed performance of
the full adder.
Results: The proposed NP domino carry circuits are designed to operate at
ultra low supply voltages. The performance of the proposed ULV NP domino
carry gates is compared with that of the conventional CMOS carry gate with
respect to speed, robustness and Energy Delay Product (EDP). The circuits
are characterized at supply voltages in the range of 200mV to 350mV. The
simulation results show that the delay of all the proposed ULV carry gates
is below 5% of the delay of the conventional CMOS carry gate for supply
voltages between 220mV and 300mV. The Carry 2 gate offers the best speed
performance at 250mV, where its relative delay is only 2%. The simulation
results also show that the EDP of all the proposed ULV domino carry gates is
less than 1% of the EDP of the conventional CMOS carry gate for supply
voltages below 325mV. Carry 2 offers an EDP of only 0.02% relative to the
conventional CMOS carry gate at a supply voltage of 250mV.
Conclusion: The proposed ULV domino carry gates offer superb speed
performance, i.e. the delay and EDP relative to the conventional CMOS carry
gate are less than 5% and 1% respectively at a supply voltage of 325mV. The
presented ULV domino carry gates are proposed for low voltage and high
speed full adders. Owing to their speed, the presented ULV domino gates are
a possible substitute for parallel designs as a way to enhance the speed
performance of full adders. This may reduce the area and complexity of the
system.
8.2 Paper II
Static NP Domino Carry gates for Ultra Low Voltage and High Speed
Full Adders
Outline: Paper I [30] resulted in an invitation to submit a journal paper.
This paper gives a more detailed description of the ULV NP domino carry
gates proposed in [30]. Furthermore, the proposed circuits are analyzed and
simulated in more detail in order to obtain the Minimum Energy Point (MEP).
Results: All the circuits are simulated for the worst case scenario where
the carry signal propagates through the circuits. Relative to the
conventional CMOS carry gate, the propagation delay is less than 20% for all
the proposed ULV domino carry gates when the supply voltage varies between
175mV and 375mV. The overall best relative delay is achieved by the ULV
domino carry gate with pass transistor logic (PTL). This is because the
input carry bit only needs to propagate through a single transistor to
reach the output node. Compared to the conventional carry gate, the least
average delay is achieved at a supply voltage of 275mV, where Carry 2 has a
relative delay of only 2.48%.
The power consumption per ULV domino carry gate is compared with that of the
conventional CMOS carry gate at different supply voltages. As expected, the
power consumption of the ULV domino carry gates exceeds that of the
conventional CMOS carry gate; this is the price paid for the fast switching
speed. Carry 2 and Carry 3 are implemented with PTL and consume less power
than Carry 1.
Further, the energy of the ULV domino carry gates is compared with that of
the conventional CMOS carry gate for different supply voltages. The PDP of
each of the proposed ULV carry gates is lower than that of the conventional
carry gate for supply voltages between 175mV and 350mV. This is mainly due
to the significant improvement in speed performance of the proposed carry
gates relative to the conventional carry gate. All the proposed ULV domino
carry gates reach their minimum relative PDP, lower than 25%, at a supply
voltage of 250mV, which makes it the Minimum Energy Point (MEP). Carry 2 is
the most efficient solution, with a PDP of only 15.65% relative to the
conventional Carry gate at the MEP.
The relative EDP of all the proposed ULV domino carry gates is less than 30%
for supply voltages between 175mV and 375mV. At the MEP (250mV), the EDP of
all the proposed ULV carry gates is lower than 1.5% relative to the
conventional carry gate. However, Carry 2 reaches its least relative EDP,
with a value close to 0.527%, at 275mV.
Conclusion: In this paper, different configurations of static ULV NP
domino carry gates are presented using precharge and pass transistor logic.
The proposed ULV domino carry gates are aimed at high speed serial adders
in ultra low-voltage applications. In terms of frequency, speed, PDP and
EDP, the ULV carry gates offer a significant improvement compared to the
conventional CMOS carry gate. Both complexity and area may be saved if the
proposed ULV domino carry gates are used instead of carry look ahead
techniques or parallel structures to enhance the speed performance of full
adders.
8.3 Paper III
Ultra-Low Voltage and High Speed NP Domino Carry Propagation chain
Outline: All the ULV domino carry gates implemented in [30] add only two
single bits. If an arithmetic operation on more than two bits is desired,
the proposed ULV domino carry gates are cascaded in a serial chain. In this
paper, we explore the performance of one of the proposed ULV NP domino
carry gates in a 32-bit chain. Further, we compare the performance of the
proposed ULV NP domino 32-bit carry chain with the conventional 32-bit carry
chain with respect to speed and Energy Delay Product (EDP).
Results: The proposed ULV domino 32-bit carry chain is simulated for the
worst case scenario, in which the carry signal ripples through the whole
chain. Both the proposed ULV domino carry chain and the conventional carry
chain are simulated at a supply voltage of 300mV. The paper shows a
transient simulation with the output waveforms of all ULV NP domino carry
gates. As expected, the propagation delay of the proposed carry chain is far
lower than that of the conventional CMOS carry chain. The conventional carry
chain has a propagation delay of approximately 321ns, whereas the proposed
ULV domino chain requires only 8.6ns to propagate from bit 1 to bit 32. The
simulated response also demonstrates that the logic levels of the output
waveforms are very close to the rails, resulting in a robust design with a
better noise margin.
Conclusion: In this paper we have presented a novel 32-bit ULV NP domino
carry chain. The proposed ULV carry gates are used to achieve a very fast
carry computation, i.e. the delay compared to the conventional CMOS carry
chain is only 2.68% for a supply voltage of 300mV with almost no degradation
in the noise margin. The EDP of the proposed ULV domino carry chain is
approximately 8% relative to the conventional CMOS carry chain at the same
supply voltage. A clock period of only 20ns is needed to execute an addition
of 32 bits. ULV NP domino carry gates may be used to save power and area
compared to parallel designed adders for ultra low voltage and high speed
applications.
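The relative delay quoted for Paper III follows directly from the two
simulated propagation delays. The short Python sketch below reproduces that
arithmetic. The 50% duty cycle used for the evaluate window is an assumption
made here for illustration; it is not stated in the paper.

```python
# Delays quoted in the Paper III summary (worst case, 300 mV supply).
T_CONVENTIONAL_NS = 321.0  # conventional 32-bit carry chain
T_ULV_NS = 8.6             # proposed ULV NP domino 32-bit carry chain
CLOCK_PERIOD_NS = 20.0     # clock period quoted in the conclusion

# Assumption (not from the paper): a 50% duty cycle, so half of each
# clock period is available for the evaluation phase.
ASSUMED_EVALUATE_FRACTION = 0.5


def relative_delay_percent(t_proposed_ns: float, t_reference_ns: float) -> float:
    """Delay of the proposed circuit as a percentage of the reference delay."""
    return 100.0 * t_proposed_ns / t_reference_ns


rel_delay = relative_delay_percent(T_ULV_NS, T_CONVENTIONAL_NS)
speedup = T_CONVENTIONAL_NS / T_ULV_NS
evaluate_window_ns = ASSUMED_EVALUATE_FRACTION * CLOCK_PERIOD_NS

print(f"relative delay : {rel_delay:.2f} %")
print(f"speed-up       : {speedup:.1f}x")
print(f"fits in assumed evaluate window: {T_ULV_NS <= evaluate_window_ns}")
```

Under the assumed duty cycle, the 8.6ns worst case carry propagation fits
within one evaluation phase of the 20ns clock.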
Chapter 9
Discussion
9.1 Power consumption in the idle mode
Although the ULV NP domino logic gates implemented in this thesis offer a
significant improvement in speed performance compared to the conventional
CMOS logic gates, the proposed gates also consume more power than the
conventional CMOS logic gates. Consider, for example, the proposed ULV NP
domino 32-bit carry chains: they consume considerable power in the idle
mode, when no transitions arrive at the input nodes and the output nodes
hold the precharged value.
9.2 Performance of ULV NP domino carry chains
with Pass Transistor Logic
The proposed ULV NP domino Carry 2 gates with PTL are the most efficient
solution in terms of speed, area and power consumption compared to the other
proposed ULV NP domino Carry gates when an addition of only two bits is
desired. The performance degrades when the proposed NP domino Carry 2 gates
are cascaded in a long chain. This is due to the ULV NP domino inverters
that must be included at the output nodes in order to obtain the correct
carry signal for the following Carry 2 gates in the chain.
Another major challenge when cascading the proposed ULV NP domino Carry
gates with PTL in a 32-bit chain is the degradation in speed performance
when the chain is simulated for the worst case scenario. For example, if the
transitions only arrive at input nodes A instead of input nodes B for all
the NP domino Carry gates in the chain, the contention current increases,
which degrades the speed performance. However, ULV NP domino carry chains
with PTL offer better robustness and consume less power in the idle state
than the other proposed ULV domino carry chains. New hybrid solutions are
proposed to increase the speed performance of the 32-bit ULV NP domino carry
chains with PTL by suppressing the contention current. However, this
degrades the robustness and increases the power consumption due to the
serially connected transistors.
9.3 Leakage at the output nodes
Another main challenge when cascading the proposed ULV NP domino logic
gates in long chains is leakage at the output nodes. If the evaluation
transistors labeled P and N are not strong enough to hold the precharged
value, the output nodes may make false transitions before the correct
transitions arrive at the input nodes. Increasing the strength of these
transistors suppresses the leakage problem, but at the cost of degraded
speed performance for the proposed ULV NP domino gates. Multi-Threshold
CMOS (MTCMOS), Variable-Threshold CMOS (VTCMOS) and carry gates with PTL are
used in the 32-bit ULV domino carry chain to suppress the leakage problem in
the worst case scenario. This offers better robustness and minimizes the
power consumption but degrades the speed performance.
The simulation results show that the proposed ULV NP domino gates are
suitable for high speed applications at ultra low supply voltages. However,
the proposed gates may not offer the best solution for low power
applications.
Chapter 10
Conclusion
10.1 Summary of the contributions
This section summarizes the results for the different NP domino ULV logic
gates proposed throughout this master project.
In Chapter 4, the operation of ULV domino logic is described in detail.
Further, novel ULV NP domino inverters are proposed and the simulation
results are compared with the conventional CMOS inverter with respect to
speed and robustness. The simulated circuits are targeted at a supply
voltage of 300mV. The simulation results show that the proposed ULV NP
domino inverters degrade the robustness, whereas the delay, Power Delay
Product (PDP) and Energy Delay Product (EDP) are improved significantly
compared to the conventional CMOS inverter. New approaches such as pseudo
ULV NP domino inverters and ULV NP domino inverters with keeper are
introduced to enhance the robustness of the proposed ULV NP domino
inverters. These new approaches improve the performance of the ULV NP domino
inverters in all aspects.
ULV NAND and NOR logic gates are presented in Chapter 5 by utilizing NP
domino floating gate transistors. Various novel configurations of ULV NP
domino logic gates are proposed using the conventional style and the pass
transistor logic (PTL) style. The proposed gates are compared with the
conventional NAND and NOR logic gates with respect to speed performance. The
simulation results show that ULV NP domino gates with PTL offer better speed
performance than the other configurations due to the reduced number of
transistors.
Various novel configurations of ULV NP domino Carry gates are implemented
in Chapter 6 in order to enhance the speed performance of the 1-bit full
adder. The performance of the proposed ULV NP domino Carry gates is compared
with that of the conventional CMOS carry gate with respect to speed, power,
robustness, PDP and EDP. The circuits are characterized as the supply
voltage varies between 100mV and 400mV. The simulation results show that the
proposed ULV NP domino Carry gates offer a significant improvement in terms
of speed, robustness, PDP and EDP relative to the conventional CMOS Carry
gate.
Monte Carlo simulations with process and mismatch variations are also
included at different supply voltages in the range between 100mV and 400mV,
and showed better robustness for the proposed ULV NP domino Carry gates. The
Monte Carlo results for the domino Carry gates are not compared with the
conventional CMOS Carry gate, as the conventional CMOS carry gate is unable
to operate at ultra low supply voltages with high operating frequencies.
Various configurations of 32-bit carry circuits are built in Chapter 7 from
the proposed ULV NP domino Carry gates. The proposed ULV NP domino 32-bit
carry circuits are compared with the conventional 32-bit carry circuit with
respect to speed, power, robustness, PDP and EDP at a supply voltage of
300mV.
The simulation results show that the fastest 32-bit carry chain has a
propagation delay of only 8.2ns for the carry input signal to propagate
through the 32-bit full adder. However, the chain also consumes a power of
687nW when no transitions occur at the input nodes and the output nodes hold
the precharged value. On the other hand, the least power consumption is
obtained by the ULV NP domino 32-bit carry chain with PTL. The average
propagation delay of this low power 32-bit ULV domino carry chain is 44ns in
the worst case scenario, but it consumes only 4.85nW when no transitions
arrive at the input nodes.
Both the high speed and the low power 32-bit ULV domino carry chains are
simulated for the worst case scenario as the supply voltage varies between
225mV and 375mV, and the performance is compared to the conventional 32-bit
carry chain with respect to speed, power and EDP. The high speed ULV NP
domino carry chain offers the least relative delay and EDP at a supply
voltage of 275mV.
10.2 Innovation throughout the project
Several innovations are presented in this thesis. All the approaches to ULV
NP domino logic gates presented in Chapters 5, 6 and 7 were devised by the
author of this thesis. The ULV NP domino Carry gates and chains presented in
Chapters 6 and 7 have been published in or submitted to various conferences
and a journal. Some of the key logic gates utilizing NP domino floating gate
transistors in this thesis are listed below:
• Various implementations of ULV NP domino NAND and NOR gates
• Various implementations of ULV NP domino Carry gates
• Various implementations of ULV NP domino 32-bit carry chains
10.3 Further work
This thesis has mostly been concerned with improving the speed of digital
logic circuits by exploiting floating gate techniques as the supply voltage
scales down. There is still considerable potential for further work, and
some of the most interesting research topics are listed below:
• Different body biasing schemes are utilized in order to scale down the
threshold voltage of the transistors. However, Forward Body Biasing
and floating bulk terminals of nMOS transistors can be a challenging
task in the layout stage. The transistors with various body biasing
schemes should be designed at the layout stage to verify the area
consumption across different processes for the proposed ULV NP
domino logic gates.
• Multi-Threshold CMOS (MTCMOS) and Variable-Threshold CMOS (VTCMOS)
techniques are applied to enhance the speed and to suppress
the leakage power of the proposed ULV NP domino circuits. Layout
should be implemented to discover the major challenges in utilizing
these techniques and how these challenges affect the performance
of the circuits.
• The fastest 32-bit carry circuit consumes an amount of power that
may be high for low power applications in the idle mode when
no transitions arrive at the input nodes. New solutions should be
found to suppress the power consumption in the idle mode without
degrading the speed performance or increasing the complexity of the
circuits.
• Other digital logic circuits, for example the summation circuit for
the full adder can be constructed by utilizing the proposed ULV NP
domino logic style. The full adders may be further used to implement
CMOS multiplier functions utilizing the same logic style.
• All the simulation results shown throughout this thesis are obtained
at the schematic level only. To obtain more realistic simulations, the
physical layout parameters, for example the parasitic effects and
the interconnect capacitance, must be taken into consideration. The
ultimate goal would be to design the layout of all the proposed ULV NP
domino circuits with various techniques and verify their performance
in different respects. A chip could be constructed for the ULV NP
domino circuit which offers the best performance with respect to
speed, power and area consumption. This was not in the scope of
the master project due to the limited project time.
Appendix A
Truth Tables
A   B   NOT A   NOT B   NAND   AND   NOR   OR
0   0     1       1      1      0     1    0
0   1     1       0      1      0     0    1
1   0     0       1      1      0     0    1
1   1     0       0      0      1     0    1
Table A.1: Truth table of main logical functions
A   B   Sum   Carry_out
0   0    0       0
0   1    1       0
1   0    1       0
1   1    0       1
Table A.2: Truth table: Half Adder
C_in   A   B   Sum   Carry_out
 0     0   0    0       0
 0     0   1    1       0
 0     1   0    1       0
 0     1   1    0       1
 1     0   0    1       0
 1     0   1    0       1
 1     1   0    0       1
 1     1   1    1       1
Table A.3: Truth table: Full Adder
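The adder tables above can be regenerated from the Boolean definitions of
the sum and carry bits. The following Python sketch is an illustration
only; the function names are chosen here and do not come from the thesis.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple:
    """Return (Sum, Carry_out) of a half adder."""
    return a ^ b, a & b


def full_adder(c_in: int, a: int, b: int) -> tuple:
    """Return (Sum, Carry_out) of a 1-bit full adder."""
    total = a ^ b ^ c_in
    carry = (a & b) | (c_in & (a | b))
    return total, carry


# Rows in the same order as Tables A.2 and A.3.
half_adder_rows = [(a, b, *half_adder(a, b)) for a, b in product((0, 1), repeat=2)]
full_adder_rows = [
    (c_in, a, b, *full_adder(c_in, a, b))
    for c_in, a, b in product((0, 1), repeat=3)
]

print("A B Sum Carry_out")
for row in half_adder_rows:
    print(*row)

print("C_in A B Sum Carry_out")
for row in full_adder_rows:
    print(*row)
```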
Appendix B
Publications
Paper I
High Speed and Ultra Low-voltage CMOS
Domino Carry gates
Published at 12th International Conference on Electronics, Hardware,
Wireless and Optical Communications (EHAC’13), pp. 52-57, Cambridge,
UK, February 20-22, 2013.
High Speed and Ultra Low-voltage CMOS Domino Carry gates
Sohail Musa Mahmood
University of Oslo
Department of Informatics
Oslo
Norway
sohailmm@ifi.uio.no
Yngvar Berg
University of Oslo
Department of Informatics
Oslo
Norway
yngvarb@ifi.uio.no
Abstract: In this paper we present ultra low-voltage and high speed CMOS domino Carry gates. For supply voltages
below 325mV the delay of the proposed ultra low-voltage Carry gates is approximately 5% relative to a
complementary CMOS Carry gate. Furthermore, the Energy Delay Product is less than 1% relative to a complementary
CMOS Carry gate at the same supply voltage. Different domino Carry gates are presented using pass
transistor logic. The proposed ultra-low-voltage domino carry gates are going to be used in low-voltage and high
speed adders.
Key–Words: Low-Voltage, High-Speed, Carry gate, Domino Logic, Precharge, CMOS.
I Introduction
In recent years, the power problem has emerged as one of the fundamental
limits facing the future of CMOS integrated circuit design. The aggressive
scaling of device dimensions to achieve greater transistor density and
circuit speed results in substantial subthreshold and gate oxide tunneling
leakage currents. Energy-efficiency is one of the most required features
for modern electronic systems designed for high-performance and/or portable
applications. On the one hand, the ever increasing market segment of
portable electronic devices demands the availability of low-power building
blocks that enable the implementation of long-lasting battery-operated
systems. On the other hand, the general trend of increasing operating
frequencies and circuit complexity, in order to cope with the throughput
needed in modern high-performance processing applications, requires the
design of very high-speed circuits.
Depending upon the application, there are numerous methods that can be used
to reduce the power consumption of VLSI circuits. These range from
low-level measures based upon fundamental physics, such as using a lower
power supply voltage or using high threshold voltage transistors, to
high-level measures such as clock-gating or power-down modes. The power
consumption in digital circuits, which mostly use complementary metal-oxide
semiconductor (CMOS) devices, is proportional to the square of the power
supply voltage; therefore, voltage scaling is one of the important methods
used to reduce power consumption. To achieve a high transistor drive current
and thereby improve the circuit performance, the transistor threshold
voltage must be scaled down in proportion to the supply voltage. However,
scaling down the transistor threshold voltage Vt results in a significant
increase in the subthreshold leakage current.
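The quadratic dependence on the supply voltage can be illustrated with the
usual switching power model P = a · C · VDD² · f. The Python sketch below
uses this model; the activity factor, load capacitance, frequency and the
1.0V nominal supply are illustrative assumptions and are not taken from the
paper.

```python
def dynamic_power(activity: float, c_load_farad: float,
                  vdd_volt: float, freq_hz: float) -> float:
    """Switching power P = activity * C * VDD^2 * f, in watts."""
    return activity * c_load_farad * vdd_volt ** 2 * freq_hz


# Illustrative values only; none of them come from the paper.
ACTIVITY = 0.1        # assumed switching activity factor
C_LOAD = 1e-15        # assumed load capacitance: 1 fF
FREQ = 50e6           # assumed clock frequency: 50 MHz
VDD_NOMINAL = 1.0     # assumed nominal supply voltage
VDD_ULV = 0.3         # ultra low supply voltage (300 mV)

p_nominal = dynamic_power(ACTIVITY, C_LOAD, VDD_NOMINAL, FREQ)
p_ulv = dynamic_power(ACTIVITY, C_LOAD, VDD_ULV, FREQ)
ratio = p_ulv / p_nominal

print(f"switching power at {VDD_ULV} V relative to {VDD_NOMINAL} V: {ratio:.2f}")
```

At equal activity, load and frequency, the ratio depends only on the square
of the voltage ratio. In practice the achievable frequency also falls with
the supply voltage, which is the motivation for the high speed gates
proposed in this paper.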
Figure 1 [12] shows a four bit full adder. Four full adders are cascaded in
a chain, each of which has its Cout connected to Cin of the following one.
The Carry signal propagates through the whole chain. The Full adder operates
in the propagation mode when the input signals X ≠ Y, which makes
Cout = Cin. The overall worst case delay is obtained when all the Full
Adders in a chain operate in the propagation mode, and the carry signal has
to propagate from the first to the last full adder in the chain. Thus the
Carry propagation path is the most critical path when an addition of more
than two bits is desired, which makes it a speed limiting factor for many
high speed applications.
[Figure 1 diagram: full adders FA0 to FA3 in a chain; inputs X[0..3], Y[0..3] and Cin; outputs S[0..3] and Cout; internal carries C1 to C3 form the critical path.]
Figure 1: Four Bits Full Adder.
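The worst case described above can be reproduced with a small behavioural
model of the chain in Figure 1. The Python sketch below is a logic-level
illustration only; it does not model delay, and the function names are
chosen here for the example.

```python
def carry_out(x: int, y: int, c_in: int) -> int:
    """Carry of a 1-bit full adder: 1 when at least two inputs are 1."""
    return 1 if (x + y + c_in) >= 2 else 0


def ripple_carries(x_bits, y_bits, c_in=0):
    """Return the carry produced by each stage, least significant bit first."""
    carries = []
    carry = c_in
    for x, y in zip(x_bits, y_bits):
        carry = carry_out(x, y, carry)
        carries.append(carry)
    return carries


def stages_in_propagation_mode(x_bits, y_bits):
    """Count the stages whose inputs differ (X != Y), i.e. Cout = Cin."""
    return sum(1 for x, y in zip(x_bits, y_bits) if x != y)


# Propagation mode: for X != Y the stage simply forwards Cin.
for x, y in ((0, 1), (1, 0)):
    for c_in in (0, 1):
        assert carry_out(x, y, c_in) == c_in

# Worst case for the four bit adder: every stage propagates.
x_bits = [1, 1, 1, 1]
y_bits = [0, 0, 0, 0]
propagating = stages_in_propagation_mode(x_bits, y_bits)
carries_with_cin = ripple_carries(x_bits, y_bits, c_in=1)
carries_without_cin = ripple_carries(x_bits, y_bits, c_in=0)

print(propagating, carries_with_cin, carries_without_cin)
```

With all four stages in propagation mode, the final carry equals the input
carry, so the last output cannot settle before the carry has rippled through
every stage.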
Floating-Gate (FG) gates have been proposed
for Ultra-Low-Voltage (ULV) and Low-Power (LP)
logic [3]. However, in modern CMOS technologies
there are significant gate leakages which undermine
non-volatile FG circuits. FG gates implemented in
a modern CMOS process require frequent initializa-
tion to avoid significant leakage. By using floating
capacitances, either poly-poly, MOS or metal-metal,
to the transistor gate terminals, the semi-floating-gate
(SFG) nodes can have a different DC level than pro-
vided by the supply voltage headroom [3]. There are
several approaches to FG CMOS logic [4, 5]. The
gates proposed in this paper are influenced by ULV
non-volatile FG circuits [6].
In this paper, we are focused on implementing
Ultra-Low-Voltage (ULV) and high speed NP Domino
carry gates. In Section II, an extended description of
the NP Domino ULV inverter [8] is given. Conven-
tional ULV carry gates are presented in Section III. In
Section IV, different implementations of ULV carry
gates are presented using pass transistor logic [10].
Simulation results are given in Section V and a con-
clusion is given in Section VI.
II High speed and Ultra-low-voltage
Domino and semi-floating-gate NP
domino Inverter
The ULV logic carry gates presented in this paper
are related to the ULV domino logic style presented
in [8], [9]. The main purpose of the ULV logic style
is to increase the current level for low supply volt-
ages without increasing the transistor widths. We may
increase the current level compared to complemen-
tary CMOS using different initialization voltages to
the gates and applying capacitive inputs. The extra
load represented by the floating capacitors is less
than the extra load given by increased transistor widths.
The capacitive inputs lower the delay through in-
creased transconductance while increased transistor
widths only reduce parasitic delay.
The High speed and ULV domino inverter presented in [8] is shown in Figure
2. The clock signals φ and φ̄ are used both as control signals for the
recharge transistors RP1 and RN1, and as reference signals for the nMOS
evaluation transistor EN1. When φ switches from 1 to 0, the circuit is in
the precharge/recharge phase. During the precharge phase, RP1 turns on and
recharges the gate of EN1 to 1. Meanwhile φ̄ switches from 0 to 1, which
turns on RN1 and recharges the gate of the pMOS transistor P1 to 0. Thus
both EN1 and P1 turn on in the precharge phase and precharge the output node
Vout to Vdd. Figure 2a illustrates the precharge mode of this circuit. The
gray shaded lines indicate the components which are not operating in the
precharge mode.
In the evaluation phase, the clock signals φ and φ̄ switch from 0 to 1 and
from 1 to 0 respectively. Both recharge transistors RP1 and RN1 switch off,
which leaves the charge on nodes Vp and Vn floating, as indicated by the
gray shaded lines in Figure 2b. The output node Vout floats as well until a
transition arrives at the input node. The input signal Vin must be
monotonically rising to ensure correct operation of the N type domino
inverter. This can only be satisfied if
• input signal Vin is low at the beginning of the
evaluation phase, and
• Vin only makes a single transition from 0 to 1 in
the evaluation phase.
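The two conditions above can be expressed as a simple check on a sampled
logic waveform. The Python sketch below is illustrative only: it assumes Vin
has already been reduced to a list of 0/1 samples covering one evaluation
phase, and it treats an input that stays low for the whole phase as valid.

```python
def is_valid_n_type_input(samples) -> bool:
    """Check the monotonic-rising condition for one evaluation phase.

    samples: logic levels (0 or 1) of Vin, ordered in time.
    Valid means: Vin starts low, never falls, and rises at most once.
    """
    if not samples or samples[0] != 0:
        return False  # Vin must be low at the start of evaluation
    rising = 0
    for previous, current in zip(samples, samples[1:]):
        if previous == 1 and current == 0:
            return False  # a falling edge breaks monotonicity
        if previous == 0 and current == 1:
            rising += 1
    return rising <= 1


print(is_valid_n_type_input([0, 0, 1, 1]))  # single rising edge
print(is_valid_n_type_input([1, 1, 1, 1]))  # high at the start of evaluation
print(is_valid_n_type_input([0, 1, 0, 1]))  # glitch: falls and rises again
```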
[Figure 2 schematics: transistors EN1, P1, RN1 and RP1; nodes Vin, Vout, VP and VN; clock signals φ and φ̄. Panels: (a) Precharge Mode, (b) Evaluate Mode.]
Figure 2: NP domino inverter in a) precharge phase and b) evaluate phase.
III Ultra-low-voltage and semi-
floating-gate NP domino Carry
circuit
Different NP domino logic gates implemented in [9] operate at a very low
supply voltage and achieve a very fast switching speed. The carry function
can therefore be implemented in the same logic style, which can increase the
propagation speed of the carry bit in the serial chain of cascaded full
adders shown in Figure 1. The Cout logic function of a Full Adder is given
in Equation 1: Cout is the AND of the input bits A and B as long as Cin is
0, and becomes the OR of A and B once Cin is 1.
Cout = A · B + Cin · (A + B)    (1)
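The AND/OR behaviour of Equation 1 can be checked exhaustively. The Python
sketch below evaluates the equation for all input combinations and also
confirms that it equals the carry of the arithmetic sum A + B + Cin; the
function name is chosen here for illustration.

```python
from itertools import product


def carry_out(a: int, b: int, c_in: int) -> int:
    """Equation 1: Cout = A·B + Cin·(A + B), with + as logical OR."""
    return (a & b) | (c_in & (a | b))


for a, b in product((0, 1), repeat=2):
    # With Cin = 0 the gate behaves as AND, with Cin = 1 as OR.
    assert carry_out(a, b, 0) == (a & b)
    assert carry_out(a, b, 1) == (a | b)

for a, b, c_in in product((0, 1), repeat=3):
    # Equation 1 equals the carry bit of the arithmetic sum.
    assert carry_out(a, b, c_in) == (a + b + c_in) // 2

print("Equation 1 verified for all 8 input combinations")
```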
[Figure 3 schematics: (a) Precharge to 1 (N type), with evaluation transistors EN1 to EN5, recharge transistors RP1 to RP5 and RN1, transistor P1 and keeper KP; (b) Precharge to 0 (P type), with evaluation transistors EP1 to EP5, recharge transistors RN1 to RN5 and RP1, transistor N1 and keeper KN. Inputs A, B and Cin; output Cout.]
Figure 3: ULV domino Carry Gates I (CARRY1).
Two ULV domino Carry gates are shown in Figure 3. The output node Cout in
Figure 3a and 3b is precharged to 1 and 0 respectively. The CARRY gate has
been implemented by combining the ULV domino Nand and Nor gates implemented
in [9] with a control signal Cin. Cin determines whether the output node
gives a Nand or a Nor functionality for the input bits A and B. A suitable
ULV domino inverter as implemented in [8] should be connected to the output
node of the implemented circuit to obtain Cout. This means that the output
node of an N type ULV domino carry gate should be connected to a P type ULV
domino inverter to obtain the desired Cout.
In order to retain the precharged value of the implemented Carry gates
until the desired input bits arrive, the evaluation transistor P1 to VDD,
or N1 to GND, should be made stronger than the other evaluation transistors.
By adding a pMOS transistor KP and an nMOS transistor KN in Figure 3a and 3b
respectively, the gates of the evaluation transistors P1 and N1 are pulled
to VDD and GND respectively when the output node Cout makes a transition in
the evaluation phase, which turns on the keeper transistors. This partially
turns off the evaluation transistors P1 and N1 and lets the output node Cout
swing fully to VDD and GND respectively. This helps to increase the noise
margin and the robustness of the dynamic output node.
VDD     CARRY 1    CMOS     Deviation   Comment
300mV   54ps       4.72ns   4.2mV       Precharge to 1
300mV   153ps      3.7ns    7.6mV       Precharge to 0
300mV   103.5ps    4.21ns   11.8mV      Average
Table I: The speed and deviation for the Carry gate I compared to static
CMOS Carry gate.
IV ULV NP domino Carry circuit us-
ing PTL
The same logic function can be obtained with fewer transistors by using
Pass Transistor Logic (PTL) instead of the conventional style, which reduces
the overall delay of the system and saves area on the chip. The circuits in
Figure 4 show ULV NP domino Carry gates implemented with PTL. Compared to
the carry gate in Figure 3, the total number of evaluation transistors
labelled E has been reduced from 5 to 3. The carry input bit only needs to
pass through a single evaluation transistor before reaching the output node.
[Figure 4 schematics: (a) Precharge to 1 (N type), with evaluation transistors EN1 to EN3, recharge transistors RP1 to RP3 and RN1, transistor P1 and keeper KP; (b) Precharge to 0 (P type), with evaluation transistors EP1 to EP3, recharge transistors RN1 to RN3 and RP1, transistor N1 and keeper KN. Inputs A, B and Cin; output Cout.]
Figure 4: ULV domino Carry Gates using PTL II (CARRY2).
In Figure 4, the evaluation transistors labelled E can be described as pass
transistors with an increased current level. Consider the circuit in Figure
4a. As long as Cin is low, the output node 1Cout only switches from 1 to 0
when the evaluation transistor EN1 acts as a pass transistor for the input
signal 1B, i.e. when 1B switches from 1 to 0 and the other input 0A switches
from 0 to 1 in the evaluation phase. When the Cin bit becomes high, only one
of the two evaluation pass transistors EN2 or EN3 needs to turn on to pull
the output node 1Cout from 1 to 0. This implies that EN2 or EN3 acts as a
pass transistor for input 1Cin when 1Cin switches from 1 to 0 and at least
one of the two other inputs 0A or 0B switches from 0 to 1.
VDD     CARRY 2    CMOS     Deviation   Comment
300mV   28.5ps     4.72ns   4.2mV       Precharge to 1
300mV   261ps      3.7ns    7.6mV       Precharge to 0
300mV   144.75ps   4.21ns   11.8mV      Average
Table II: The speed and deviation for the Carry gate II compared to static
CMOS Carry gate.
Another alternative ULV domino NP Carry gate using PTL is shown in Figure
5. The N type Carry gate in Figure 5a resembles the Carry gate in Figure 4a.
Both circuits behave identically as long as 0Cin is logically 0. When 0Cin
switches from 0 to 1 in the evaluation phase, both parallel connected
evaluation transistors EN2 and EN3 turn on and act as pass transistors for
the inputs 1A and 1B. In this case, only one of the two inputs 1A or 1B
needs to switch from 1 to 0 to pull the output node 1Cout to 0.
VDD     CARRY 3    CMOS     Deviation   Comment
300mV   99.4ps     4.72ns   10.65mV     Precharge to 1
300mV   411ps      3.7ns    30.5mV      Precharge to 0
300mV   255.2ps    4.21ns   20.5mV      Average
Table III: The speed and deviation for the Carry gate III compared to static
CMOS Carry gate.
V Simulated Response
The simulated data is based on a 90nm TSMC CMOS process, and the load
applied is an identical gate for each logic style. The simulated responses
of the ULV carry gate circuits in Figures 3, 4 and 5 are shown in Tables I,
II and III respectively. The implemented carry gates are targeted at a
supply voltage of 300mV. The presented gates are compared to a static CMOS
carry gate at the same supply voltage. The tables show both the delay and
the noise margin of the implemented ULV carry gates compared to the static
CMOS carry gate.
Table IV summarizes the average relative delay of the presented ULV carry
gates compared to a static CMOS carry gate at a supply voltage of 300mV.
CARRY1 is the fastest carry gate at a supply voltage of 300mV. Relative to
the static CMOS carry gate in [11], the delay of the implemented ULV carry
gates is between 2.45% and 6.06%. CARRY3 is the slowest and the least
preferable due to its low noise margin, as both parallel connected
evaluation transistors EN2 and EN3 turn on in the worst case scenario while
only one of the two inputs 1A or 1B switches from 1 to 0. Thus both 0 and 1
are transferred through the evaluation pass transistors, which makes the
output transition slow and gives a poor noise margin. However, it offers a
more efficient solution in terms of area.
[Figure 5 schematics: (a) Precharge to 1 (N type), with evaluation transistors EN1 to EN3, recharge transistors RP1 to RP3 and RN1, transistor P1 and keeper KP; (b) Precharge to 0 (P type), with evaluation transistors EP1 to EP3, recharge transistors RN1 to RN3 and RP1, transistor N1 and keeper KN. Inputs A, B and Cin; output Cout.]
Figure 5: ULV domino Carry Gates using PTL III (CARRY3).
                     CARRY 1   CARRY 2   CARRY 3
Relative Delay (%)   2.45%     3.4%      6.06%
Table IV: The relative delay of ULV carry gates at a supply voltage of 300mV
as compared to static CMOS carry gate.
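The relative delays in Table IV can be recomputed from the average delays
in Tables I to III. The Python sketch below does this under the assumption
that each relative delay is the ratio of the average ULV delay to the
average static CMOS delay; the variable names are chosen here for
illustration.

```python
# (precharge-to-1 delay, precharge-to-0 delay) in picoseconds,
# copied from Tables I, II and III.
ULV_DELAYS_PS = {
    "CARRY1": (54.0, 153.0),
    "CARRY2": (28.5, 261.0),
    "CARRY3": (99.4, 411.0),
}
# Static CMOS carry gate delays in picoseconds (4.72 ns and 3.7 ns).
CMOS_DELAYS_PS = (4720.0, 3700.0)


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


cmos_average_ps = mean(CMOS_DELAYS_PS)

# Assumed definition: average ULV delay divided by average CMOS delay.
relative_delay_percent = {
    name: 100.0 * mean(delays) / cmos_average_ps
    for name, delays in ULV_DELAYS_PS.items()
}

for name, value in relative_delay_percent.items():
    print(f"{name}: average {mean(ULV_DELAYS_PS[name]):.2f} ps, "
          f"relative delay {value:.2f} %")
```

Under this assumption the computed values agree with Table IV to within
rounding.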
The plots in Figures 6 and 7 show the delay and the Energy Delay Product
(EDP), respectively, of the proposed carry gates relative to the static CMOS
carry gate presented in [11]. The delay is less than 5% of that of the
static CMOS carry gate for CARRY1 and CARRY2 for supply voltages below
325mV, as shown in Figure 6. The CARRY2 gate gives the least relative delay,
about 2% at a supply voltage of 250mV.
The average EDP of the implemented ULV carry gates relative to the static
CMOS carry gate is shown in Figure 7. The simulation results show that the
EDP is less than 1% for all presented ULV carry gates for supply voltages
below 325mV relative to the static CMOS carry gate. CARRY2 gives the least
average EDP relative to the static CMOS carry gate, with a value close to
0.02% at a supply voltage of 250mV.
Figure 6: Delay of ULV carry gates relative to static CMOS carry gate
(relative delay [%] versus VDD [mV] for Carry1, Carry2 and Carry3).
Figure 7: Average EDP of ULV carry gates relative to static CMOS carry gate
(relative EDP [%] versus VDD [mV] for Carry1, Carry2 and Carry3).
VI Conclusion
Different ultra low-voltage NP domino carry gates have been presented in
this paper. The ULV domino carry gates are high speed, i.e. their delay is
less than 5% of that of a static CMOS carry gate at a supply voltage of
325mV. The energy delay product of the proposed ULV carry gates is less than
1% of that of the static CMOS carry gate when the circuits operate at a
supply voltage below the threshold voltage of the transistors. The ULV carry
gates can be used to design high speed, low voltage full adders without
resorting to parallel designs, which reduces both power and area.
References:
[1] A. P. Chandrakasan, S. Sheng and R. W. Brodersen. “Low-power CMOS digital
design”, IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473-484,
April 1992.
[2] N. Verma, J. Kwong and A. P. Chandrakasan. “Nanometer MOSFET Variation in
Minimum Energy Subthreshold Circuits”, IEEE Transactions on Electron Devices,
vol. 55, no. 1, pp. 163-174, January 2008.
[3] Y. Berg, D. T. Wisland and T. S. Lande. “Ultra Low-Voltage/Low-Power
Digital Floating-Gate Circuits”, IEEE Transactions on Circuits and Systems,
vol. 46, no. 7, pp. 930-936, July 1999.
[4] K. Kotani, T. Shibata, M. Imai and T. Ohmi. “Clocked-Neuron-MOS Logic
Circuits Employing Auto-Threshold-Adjustment”, In IEEE International
Solid-State Circuits Conference (ISSCC), pp. 320-321, 388, 1995.
[5] T. Shibata and T. Ohmi. “A Functional MOS Transistor Featuring Gate-Level
Weighted Sum and Threshold Operations”, In IEEE Transactions on Electron
Devices, vol. 39, 1992.
[6] Y. Berg, T. S. Lande and Ø. Næss. “Programming Floating-Gate Circuits
with UV-Activated Conductances”, IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, vol. 48, no. 1, pp. 12-19, 2001.
[7] Y. Berg. “Novel Ultra Low-Voltage and High Speed Domino CMOS Logic”, In
Proc. IEEE/IFIP International Conference on VLSI and System-on-Chip
(VLSI-SoC), Madrid, September 27-29, 2010.
[8] Y. Berg and O. Mirmotahari. “Ultra Low-Voltage and High Speed Dynamic and
Static Precharge Logic”, In Proc. of the 11th Edition of IEEE Faible Tension
Faible Consommation, June 6-8, 2012, Paris, France.
[9] Y. Berg and O. Mirmotahari. “Novel High-Speed and Ultra-Low-Voltage CMOS
NAND and NOR Domino Gates”, In Proc. of the 5th International Conference on
Advances in Circuits, Electronics and Micro-electronics, August 19-24, 2012,
Rome, Italy.
[10] Y. Berg and M. Azadmehr. “Novel Ultra Low-Voltage and High-Speed CMOS
Pass Transistor Logic”, In Proc. of the 11th Edition of IEEE Faible Tension
Faible Consommation, June 6-8, 2012, Paris, France.
[11] N. H. E. Weste and D. Harris. “CMOS VLSI Design: A Circuits and Systems
Perspective”, Third edition, Addison Wesley, 2005, p. 640.
[12] M. Alioto and G. Palumbo. “Very High-Speed Carry Computation based on
Mixed Dynamic/Transmission-Gate Full Adders”, In Proc. of the 18th European
Conference on Circuit Theory and Design, August 27-30, 2007, Sevilla, Spain.
[13] Y. Berg. “Ultra Low Voltage Static Carry Generate Circuit”, In Proc.
IEEE International Symposium on Circuits and Systems (ISCAS), Paris, May 2010.
[14] Y. Berg. “Static Ultra Low Voltage CMOS Logic”, In Proc. IEEE NORCHIP
Conference, Trondheim, Norway, November 2009.
Paper II
Static NP Domino Carry gates for Ultra Low
Voltage and High Speed Full Adders
Submitted to International Journal of Circuits, Systems and Signal
Processing: North Atlantic University Union (NAUN).
Static NP Domino Carry gates for Ultra Low
Voltage and High Speed Full Adders
Sohail Musa Mahmood and Yngvar Berg
Abstract— In this paper we present different configurations of static ULV NP
domino carry gates using precharge and pass transistor logic. The proposed
ULV domino carry gates are aimed at high speed serial adders in ultra
low-voltage applications. In terms of frequency, speed, PDP and EDP, the ULV
carry gates offer a significant improvement compared to the conventional CMOS
carry gate. At the Minimum Energy Point at 250mV, the proposed carry gates
have less than 5% of the delay of the conventional CMOS carry gate.
Furthermore, the Power Delay Product and the Energy Delay Product are
approximately 23% and 1% or less, respectively, relative to the conventional
CMOS carry gate at the same supply voltage. The simulated data presented is
obtained using a 90nm TSMC CMOS process.
Index Terms— Low-Voltage, High-Speed, Carry gate, NP Domino Logic,
Precharge, CMOS, Digital, Pass Transistor Logic.
I. INTRODUCTION
In recent years, the power problem has emerged as one of the fundamental
limits facing the future of CMOS integrated circuit design. The aggressive
scaling of device dimensions to achieve greater transistor density and
circuit speed results in substantial sub-threshold and gate oxide tunnelling
leakage currents. Energy efficiency is one of the most sought-after features
of modern electronic systems designed for high-performance and/or portable
applications. On the one hand, the ever increasing market segment of portable
electronic devices demands low-power building blocks that enable long-lasting
battery-operated systems. On the other hand, the general trend of increasing
operating frequencies and circuit complexity, needed to cope with the
throughput of modern high-performance processing applications, requires the
design of very high-speed circuits.
Sohail Musa Mahmood is with the Department of Informatics, University of
Oslo, Norway.
Yngvar Berg is with the Institute of MicroSystems Technology, Vestfold
University College, Horten, Norway.
Fig. 1: Four Bits Full Adder (FA0 to FA3 in cascade; the carry path from Cin
through C1, C2 and C3 to Cout is the critical path).
Depending upon the application, there are numerous methods that can be used
to reduce the power consumption of VLSI circuits. These range from low-level
measures based upon fundamental physics, such as using a lower power supply
voltage or using high threshold voltage transistors, to high-level measures
such as clock gating or power-down modes. The power consumption of digital
circuits, which mostly use complementary metal-oxide semiconductor (CMOS)
devices, is proportional to the square of the power supply voltage [1];
therefore, voltage scaling is one of the important methods used to reduce
power consumption. To achieve a high transistor drive current and thereby
improve the circuit performance, the transistor threshold voltage must be
scaled down in proportion to the supply voltage. However, scaling down the
transistor threshold voltage Vt results in a significant increase in the
sub-threshold leakage current.
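The quadratic dependence of switching power on the supply voltage can be illustrated with a short numerical sketch. It assumes the common first-order model P = a * C * Vdd^2 * f for dynamic power; the load capacitance, clock frequency and activity factor below are illustrative assumptions and are not taken from the simulations in this paper.

```python
def dynamic_power(c_load, vdd, f_clk, activity=1.0):
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * f_clk


# Assumed example values (hypothetical, for illustration only).
C_LOAD = 1e-15   # 1 fF load capacitance
F_CLK = 10e6     # 10 MHz clock, kept fixed to isolate the Vdd dependence

p_nominal = dynamic_power(C_LOAD, 1.0, F_CLK)   # 1.0 V supply
p_ulv = dynamic_power(C_LOAD, 0.25, F_CLK)      # 250 mV supply
ratio = p_ulv / p_nominal

print(f"power at 250 mV relative to 1.0 V: {ratio:.4f}")
```

At a fixed clock frequency, a fourfold reduction in supply voltage gives a sixteenfold reduction in switching power in this model. In practice the achievable clock frequency also drops with the supply voltage, which is the speed penalty the proposed gates aim to reduce.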
Figure 1 [2] shows a four-bit full adder. Four full adders are cascaded in a
chain, each with its Cout connected to the Cin of the following one, so the
carry signal propagates through the whole chain. A full adder operates in
propagation mode when its input signals satisfy X ≠ Y, which makes
Cout = Cin. The overall worst case delay is obtained when all the full adders
in the chain operate in propagation mode and the carry signal has to
propagate from the first to the last full adder. The carry propagation path
is thus the most critical path whenever more than two bits are to be added,
which makes it a speed limiting factor for many high speed applications. By
using complex carry look-ahead techniques or parallel structures, the delay
can be reduced compared to the simple serial adder shown in Figure 1, at the
cost of increased complexity, power consumption and chip area [3].
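The worst case described above can be reproduced with a small behavioural model of the serial adder. This is only a logic-level sketch (function names are illustrative, and no timing is modelled): it counts how many stages are in propagation mode, i.e. have X ≠ Y.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x | y))
    return s, c_out


def ripple_add(xs, ys, c_in=0):
    """Add two bit lists (LSB first) with a serial carry chain.

    Returns (sum_bits, carry_out, propagating_stages), where
    propagating_stages counts the stages with x != y (Cout = Cin).
    """
    sums = []
    carry = c_in
    propagating = 0
    for x, y in zip(xs, ys):
        if x != y:
            propagating += 1
        s, carry = full_adder(x, y, carry)
        sums.append(s)
    return sums, carry, propagating


# Worst case for four bits: every stage propagates the incoming carry.
sums, carry, stages = ripple_add([1, 1, 1, 1], [0, 0, 0, 0], c_in=1)
print(sums, carry, stages)
```

With X = 1111, Y = 0000 and Cin = 1, all four stages are in propagation mode, so the incoming carry has to travel through the complete chain before Cout is valid.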
Floating-Gate (FG) gates have been proposed for Ultra-Low-Voltage (ULV) and
Low-Power (LP) logic [4]. However, in modern CMOS technologies there are
significant gate leakages which undermine non-volatile FG circuits. FG gates
implemented in a modern CMOS process require frequent initialization to avoid
significant leakage. By connecting floating capacitances, either poly-poly,
MOS or metal-metal, to the transistor gate terminals, the semi-floating-gate
(SFG) nodes can have a different DC level than that provided by the supply
voltage headroom [4]. Several approaches for both analog and digital
applications using FG CMOS logic are proposed in [5], [6], [7], [8]. The
gates proposed in this paper are influenced by ULV non-volatile FG
circuits [9].
In this paper, we focus on implementing Ultra-Low-Voltage (ULV) and high
speed NP domino carry gates. In Section II, an extended description of the NP
domino ULV inverter [10] is given. Conventional ULV carry gates are presented
in Section III. In Section IV, different implementations of ULV carry gates
using pass transistor logic [13] are presented. Simulation results are given
in Section V and a conclusion is given in Section VI.
Fig. 2: NP domino inverter in a) precharge phase and b) evaluate phase.
II. HIGH SPEED AND ULTRA-LOW-VOLTAGE
FLOATING-GATE NP DOMINO INVERTER
The ULV logic carry gates presented in this paper are related to the ULV
domino logic style presented in [10], [11], [12]. The main purpose of the ULV
logic style is to increase the current level at low supply voltages without
increasing the transistor widths. We may increase the current level compared
to complementary CMOS by using different initialization voltages on the gates
and applying capacitive inputs. The extra load represented by the floating
capacitors is less than the extra load caused by increased transistor widths.
The capacitive inputs lower the delay through increased transconductance,
while increased transistor widths only reduce the parasitic delay. The
proposed logic style may be used in critical high speed and low voltage
sub-circuits together with conventional CMOS logic.
The high speed and ULV N domino inverter presented in [10] is shown in
Figure 2. The clock signals φ and φ̄ are used both as control signals for the
recharge transistors RP1 and RN1, and as reference signals for the nMOS
evaluation transistor EN1. The recharge and evaluation phases of the proposed
logic style are described below:
A. Precharge/Recharge phase
When φ switches from 1 to 0, the circuit is in the precharge/recharge phase.
During this phase, RP1 turns on and recharges the gate of EN1 to 1.
Meanwhile, φ̄ switches from 0 to 1, which turns on RN1 and recharges the gate
of the pMOS transistor P1 to 0. Thus both EN1 and P1 turn on in the precharge
phase and precharge the output node Vout to Vdd. Figure 2a shows the
precharge mode of this circuit. The gray shaded lines indicate the components
which are not operating in the precharge mode.
B. Evaluation phase
In the evaluation phase, the clock signals φ and φ̄ switch from 0 to 1 and
from 1 to 0 respectively. Both recharge transistors RP1 and RN1 switch off,
which leaves the charge on nodes VP and VN floating, as indicated by the gray
shaded lines in Figure 2b. The output node Vout floats as well until an input
transition occurs. The input signal Vin must be monotonically rising to
ensure correct operation of the N domino inverter. This is only satisfied if
• the input signal Vin is low at the beginning of the evaluation phase, and
• Vin makes only a single transition, from 0 to 1, in the evaluation phase.
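The two conditions above can be expressed as a simple check on a sampled logic-level waveform. The sketch below is purely illustrative: the function name and the representation of Vin as a list of 0/1 samples taken during one evaluation phase are assumptions, not part of the simulation setup used in this paper.

```python
def is_valid_n_domino_input(samples):
    """Check a 0/1 sample list of Vin taken during one evaluation phase.

    Valid means: Vin starts low, never falls, and rises at most once
    (an input that stays low for the whole phase is also accepted).
    """
    if not samples or samples[0] != 0:
        return False
    pairs = list(zip(samples, samples[1:]))
    rises = sum(1 for a, b in pairs if a == 0 and b == 1)
    falls = sum(1 for a, b in pairs if a == 1 and b == 0)
    return falls == 0 and rises <= 1


print(is_valid_n_domino_input([0, 0, 1, 1]))   # single rising edge
print(is_valid_n_domino_input([0, 1, 0, 1]))   # glitch on the input
```

For a P type gate the same check would apply with the logic levels inverted, i.e. the input must start high and fall at most once.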
As Vin makes a positive transition, the floating gate of EN1 is charged
through the input capacitance. The voltage at node VN can be estimated using
Equation (1). We assume that the node is initially charged to Vdd and that
Vin rises to Vdd as well. The capacitive division factor is 1/2 if Cin and
Cparasitic are assumed to be equal. This makes the voltage at the gate
terminal VN 1.5 times the supply voltage Vdd [4]. The evaluation transistor
EN1 is thus strongly biased, which increases its current level. The Pull Down
Network (PDN) thereby becomes much stronger than the Pull Up Network (PUN)
and discharges the output node Vout to 0.
$$V_N = V_{init} + \Delta V_{in} \cdot \frac{C_{in}}{C_{in} + C_{parasitic}} \qquad (1)$$
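Equation (1) can be evaluated numerically to check the factor of 1.5 quoted above. The sketch below assumes Vdd = 300mV and equal input and parasitic capacitances of 1 fF; these numbers are illustrative assumptions, and only the ratio of the two capacitances matters.

```python
def floating_gate_voltage(v_init, delta_v_in, c_in, c_parasitic):
    """Equation (1): voltage on the floating gate after an input step."""
    return v_init + delta_v_in * c_in / (c_in + c_parasitic)


VDD = 0.3            # assumed supply voltage in volts
C_IN = 1e-15         # assumed input capacitance (1 fF)
C_PARASITIC = 1e-15  # assumed equal parasitic capacitance

v_n = floating_gate_voltage(VDD, VDD, C_IN, C_PARASITIC)
boost = v_n / VDD

print(f"VN = {v_n * 1e3:.0f} mV, i.e. {boost:.2f} x Vdd")
```

A larger parasitic capacitance reduces the boost: with Cparasitic = 3 Cin, for example, the same input step only raises VN to 1.25 Vdd.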
III. ULTRA-LOW-VOLTAGE AND
SEMI-FLOATING-GATE NP DOMINO CARRY CIRCUIT
Different NP domino logic gates which operate in the sub-threshold regime
and achieve a very fast switching speed are presented in [11]. The CARRY
circuit can be implemented in the same logic style, which can increase the
propagation speed of the carry bit in a serial chain of cascaded full adders
such as the one shown in Figure 1. The Cout logic function of a full adder is
given in Equation (2): Cout is the AND of the input bits A and B as long as
Cin is 0, and becomes the OR of A and B once Cin arrives.
$$C_{out} = A \cdot B + C_{in} \cdot (A + B) \qquad (2)$$
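The AND/OR behaviour of Equation (2) can be verified exhaustively with a few lines of code. This is a logic-level sketch only; the function name is illustrative.

```python
from itertools import product


def carry_out(a, b, c_in):
    """Equation (2): Cout = A.B + Cin.(A + B)."""
    return (a & b) | (c_in & (a | b))


# Cin selects between AND (Cin = 0) and OR (Cin = 1) of the inputs A and B.
for a, b in product((0, 1), repeat=2):
    assert carry_out(a, b, 0) == (a & b)
    assert carry_out(a, b, 1) == (a | b)

truth_table = [(a, b, c, carry_out(a, b, c))
               for a, b, c in product((0, 1), repeat=3)]
for row in truth_table:
    print(*row)
```

The same function is the majority of the three inputs, which is why the gate can be built from NAND and NOR structures steered by Cin.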
Two ULV domino carry gates are shown in Figure 3. The output node Cout in
Figures 3a and 3b is precharged to 1 and 0 respectively. The CARRY gate has
been implemented by combining the ULV domino NAND and NOR gates implemented
in [11] with a control signal Cin. Cin
Fig. 3: ULV domino Carry Gates I (CARRY1): (a) precharge to 1 (N type),
(b) precharge to 0 (P type).
determines whether the output node provides a NAND or a NOR function of the
input bits A and B. A ULV domino inverter as implemented in [10] should be
connected to the output node of the circuit to obtain Cout. This means that
the output node of an N type ULV domino carry gate should be connected to a
P type ULV domino inverter to obtain the desired Cout.
In order to retain the precharged value of the implemented carry gates until
the desired input bits arrive, the evaluation transistor P1 to VDD, or N1 to
GND, should be made stronger than the other evaluation transistors. By adding
a pMOS keeper transistor KP and an nMOS keeper transistor KN in Figures 3a
and 3b respectively, the gates of the evaluation transistors P1 and N1 are
pulled to VDD and GND respectively when a transition at the output node Cout
in the evaluation phase turns on the keeper transistor. This partially turns
off the evaluation transistors P1 and N1 and lets the output node Cout
complete its full swing. This helps to reduce the static current, which
corresponds to the OFF current Ioff in a conventional CMOS inverter. The
Noise Margin NM is defined in Equation (3).
$$NM = \frac{I_{on}}{I_{off}} \qquad (3)$$
Thus, adding keepers improves both the noise margin and the power
consumption of the proposed circuits.
IV. ULV NP DOMINO CARRY CIRCUIT USING PTL
With Pass Transistor Logic (PTL), the same logic function can be obtained
with fewer transistors than in the conventional style, which reduces the
overall delay of the system and saves chip area. The circuits in Figure 4
show ULV NP domino carry gates implemented with PTL. Compared to the carry
gate in Figure 3, the total number of evaluation transistors, labelled E, is
reduced from 5 to 3. The carry input bit only needs to pass through a single
evaluation transistor before reaching the output node.
In Figure 4, the evaluation transistors labelled E can be described as pass
transistors with an increased current level. Consider the circuit in
Figure 4a. As long as Cin is low, the output node 1Cout only switches from 1
to 0 when the evaluation transistor EN1 acts as a pass transistor for the
input signal 1B, that is, when 1B switches from 1 to 0 and the other input 0A
switches from 0 to 1 in the evaluation phase. When the Cin bit becomes high,
only one of the two evaluation pass transistors EN2 or EN3 needs to turn on
to pull the output node 1Cout from 1 to 0. This implies that EN2 or EN3 acts
as a pass transistor for the input 1Cin when 1Cin switches from 1 to 0 and at
least one of the two other inputs 0A or 0B switches from 0 to 1.
An alternative ULV domino NP carry gate using PTL is shown in Figure 5. The
N type carry gate in Figure 5a resembles the carry gate in Figure 4a. Both
circuits behave identically as long as 0Cin is logically 0. When 0Cin
switches from 0 to 1 in the evaluation phase, both parallel-connected
evaluation transistors EN2 and EN3 turn on and act as pass transistors for
the inputs 1A and 1B. In this case, only one of the two inputs 1A or 1B needs
to switch from 1 to 0 to pull the output node 1Cout to 0.
V. SIMULATED RESPONSE
The simulated data is based on a 90nm TSMC CMOS process. To avoid overly
optimistic estimates for the implemented circuits and to obtain more
realistic waveforms, the clock signals are generated by inserting two
symmetric conventional CMOS inverters between the ideal voltage sources and
the clock nodes. In the same way, the input signals have been
Fig. 4: ULV domino Carry Gates using PTL II (CARRY2): (a) precharge to 1
(N type), (b) precharge to 0 (P type).
generated by inserting ULV domino inverters as implemented in [10] between
the voltage sources and the input nodes. An identical gate of the same logic
style is applied as load at the output node of each circuit. The proposed ULV
domino carry gates are simulated for the worst case scenario, where only one
of the two input bits is high and the carry signal has to propagate through
the full adder.
The performance of the proposed ULV domino carry gates in Figures 3, 4 and 5
is shown in Table I. The implemented carry gates are targeted directly at
operation in the sub-threshold regime. The presented gates are compared with
the conventional CMOS carry gate [14] at the same supply voltages. Table I
lists the speed performance together with the power consumption and other
figures of merit
Fig. 5: ULV domino Carry Gates using PTL III (CARRY3): (a) precharge to 1
(N type), (b) precharge to 0 (P type).
(PDP and EDP), in order to identify the Minimum Energy Point (MEP) of the
proposed ULV domino carry gates in comparison with the conventional CMOS
carry gate.
The power consumed by the clock drivers is not included and must be taken
into consideration for each specific application. The table also presents
the operating limits of the clock frequency, which change rapidly as the
supply voltage varies. In Table I, the styles labelled N Carry and P Carry
denote the proposed N and P type domino carry gates respectively. Avg denotes
the average delay or power of the proposed N and P type domino carry gates.
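The derived rows of Table I follow directly from the delay and power rows: PDP is power times delay, EDP is PDP times delay, and the relative values are ratios to the conventional carry gate. The sketch below recomputes the 250mV figures for CARRY 2 from the tabulated delay and power; the helper names are illustrative, and small deviations from the table are expected because the tabulated inputs are rounded.

```python
def figures_of_merit(delay_s, power_w):
    """Return (PDP in J, EDP in Js) for a given delay and average power."""
    pdp = power_w * delay_s
    edp = pdp * delay_s
    return pdp, edp


def relative_percent(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference


# Values read from Table I at a supply voltage of 250 mV.
conv_delay, conv_power = 10.4e-9, 1.12e-9   # conventional carry gate
c2_delay, c2_power = 0.35e-9, 5.21e-9       # CARRY 2, average of N and P

conv_pdp, conv_edp = figures_of_merit(conv_delay, conv_power)
c2_pdp, c2_edp = figures_of_merit(c2_delay, c2_power)

rel_delay = relative_percent(c2_delay, conv_delay)
rel_pdp = relative_percent(c2_pdp, conv_pdp)
rel_edp = relative_percent(c2_edp, conv_edp)

# The Avg rows are the mean of the N and P type values, e.g. CARRY 1 at 100 mV.
carry1_avg_delay_ns = (53 + 194) / 2

print(f"relative delay {rel_delay:.2f} %, PDP {rel_pdp:.2f} %, EDP {rel_edp:.3f} %")
```

The recomputed values agree with the Relative delay, Relative PDP and Relative EDP rows of CARRY 2 to within the rounding of the table.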
The average propagation delay between N and P type
Style               Comment               100mV     150mV     200mV     250mV     300mV     350mV     400mV
CLK                 fclk (MHz)            0.83      2.5       8.3       16.67     66.67     83.3      125
Conventional Carry  Delay (ns)            328       101       25.4      10.4      2.56      1.55      0.782
                    Power (nW)            0.008145  0.055     0.34      1.12      6.83      10.35     23
                    PDP (10^-18 J)        2.672     5.55      8.64      11.65     17.48     16.04     17.97
                    EDP (10^-27 Js)       876.4     560.5     219.5     121       44.75     24.9      14.05
N Carry 1           Delay (ns)            53        18.74     2.49      0.318     0.162     0.05      0.022
P Carry 1           Delay (ns)            194       36.83     2.75      0.38      0.19      0.109     0.1086
Carry 1             Avg. Delay (ns)       123.5     27.785    2.62      0.349     0.176     0.0795    0.0653
                    Relative delay (%)    37.65     27.52     10.31     3.36      6.88      5.11      8.36
                    Avg. Power (nW)       0.0358    0.3078    1.907     7.55      41.1      100.8     265
                    Avg. PDP (10^-18 J)   4.423     8.552     4.996     2.635     7.234     8.014     17.305
                    Relative PDP (%)      165       154       57.8      22.6      41.4      50        96.3
                    Avg. EDP (10^-27 Js)  546.4     237.6     13.1      0.919     1.273     0.637     1.13
                    Relative EDP (%)      62.35     42.4      5.97      0.76      2.84      2.56      8.04
N Carry 2           Delay (ns)            141.5     25.38     3.07      0.4127    0.183     0.0725    0.04516
P Carry 2           Delay (ns)            92.06     11.72     1.26      0.2976    0.166     0.1375    0.333
Carry 2             Avg. Delay (ns)       116.8     18.5      2.165     0.35      0.1745    0.105     0.189
                    Relative delay (%)    35.61     18.32     8.52      3.36      6.82      6.77      24.18
                    Avg. Power (nW)       0.01867   0.209     1.461     5.21      28.45     56.9      137
                    Avg. PDP (10^-18 J)   2.181     3.86      3.16      1.823     4.96      5.97      25.9
                    Relative PDP (%)      81.6      69.55     36.57     15.65     28.37     37.22     144.13
                    Avg. EDP (10^-27 Js)  254.7     71.41     6.84      0.638     0.865     0.627     4.897
                    Relative EDP (%)      29.06     12.74     3.116     0.527     1.93      2.518     34.85
N Carry 3           Delay (ns)            141.5     22.5      3.35      0.475     0.22      0.092     0.0715
P Carry 3           Delay (ns)            265.7     31.1      2.75      0.504     0.248     0.182     0.883
Carry 3             Avg. Delay (ns)       203.6     26.8      3.05      0.489     0.234     0.137     0.477
                    Relative delay (%)    62.07     26.53     12        4.7       9.14      8.84      61
                    Avg. Power (nW)       0.01948   0.245     1.572     5.47      30.75     71.15     176.5
                    Avg. PDP (10^-18 J)   3.96      6.58      4.79      2.68      7.19      9.747     84.235
                    Relative PDP (%)      148.4     117.35    55.1      23.01     41.15     60.57     468
                    Avg. EDP (10^-27 Js)  807.5     176.4     14.63     1.312     1.684     1.335     40.2
                    Relative EDP (%)      92.1      31.14     6.62      1.08      3.76      5.34      286
TABLE I: Performance of ULV domino Carry gates compared to complementary CMOS Carry gate at different supply voltages.
Fig. 6: Average Delay of ULV domino carry gates for different supply
voltages (average delay [ns] versus VDD [mV] for Carry1, Carry2 and Carry3).
Carry gates for the proposed ULV domino logic style is shown in Figure 6.
The delay is in the ns range for supply voltages under 225mV and decreases
exponentially as the supply voltage increases. Beyond 300mV, the propagation
delay is only in the range of tens of ps. CARRY 1 and CARRY 2 have almost
equal delay when the supply voltage is between 220mV and 320mV. Under 220mV,
CARRY 2 provides
Fig. 7: Delay of ULV carry gates relative to conventional CMOS carry gate
for different supply voltages (relative delay [%] versus VDD [mV] for Carry1,
Carry2 and Carry3; plot annotation: 2.48%).
the minimum propagation delay. On the other hand, CARRY 1 gives the minimum
delay when the supply voltage exceeds 320mV.
CARRY 3 is the slowest and the least preferable in high speed applications
due to its low noise margin, as both parallel-connected evaluation
transistors EN2 and EN3 turn on in the worst case scenario, while only one of
the two inputs 1A or 1B switches from 1 to 0. Thus both 1A and 1B
Fig. 8: Average power consumption per ULV carry gate compared to
conventional CMOS carry gate (average power [nW], logarithmic scale, versus
VDD [mV] for Carry1, Carry2, Carry3 and Conventional).
simultaneously contend at the output node, which makes the output transition
slow and gives a poor noise margin. However, it offers a more efficient
solution in terms of area and power consumption than CARRY 1.
The average delay of the N and P type carry gates for the proposed ULV
domino logic style is compared with that of the conventional CMOS carry gate
in Figure 7. The relative delay is less than 20% for all the proposed domino
carry gates when the supply voltage is between 175mV and 375mV. The overall
best relative delay is achieved by the ULV domino carry gate proposed in
Figure 4, which uses pass transistor logic. The reason is that the input
carry bit only needs to propagate through a single evaluation transistor to
reach the output node. Compared to the conventional carry gate, the lowest
relative delay is achieved at a supply voltage of 275mV, where the delay of
CARRY2 is only 2.48% of the conventional delay.
The average power consumption per ULV domino carry gate is compared with
that of the conventional CMOS carry gate in Figure 8. The total power
consumption per gate increases with the supply voltage. As expected, the
power consumption of the ULV domino carry gates exceeds that of the
conventional CMOS carry gate, in exchange for a much higher speed. As shown
in Figure 8, the ULV domino carry gates using pass transistor logic (PTL)
consume less power than the domino carry gate implemented in Figure 3. This
is because PTL reduces the total number of evaluation transistors from 5 to
3, which lowers the power consumed in the evaluation phase.
The average energy of the ULV domino carry gates relative to the
conventional CMOS carry gate for different supply voltages is shown in
Figure 9. The Power Delay Product (PDP) of the proposed ULV carry gates is
lower than that of the conventional carry gate for supply voltages between
175mV and 350mV. This is mainly due to the much lower delay of the proposed
carry gates relative to the conventional carry gate.
Comparing the graphs in Figures 7 and 9 shows that
Fig. 9: Average relative energy of ULV domino carry gates (relative PDP [%]
versus VDD [mV] for Carry1, Carry2 and Carry3; plot annotation: 15.65%).
Fig. 10: Average relative energy delay product of ULV domino carry gates
(relative EDP [%] versus VDD [mV] for Carry1, Carry2 and Carry3; plot
annotation: 0.527%).
the minimum relative PDP coincides with the maximum relative speed of the
proposed carry gates. All three proposed ULV domino carry gates reach their
minimum relative PDP, lower than 25%, at a supply voltage of 250mV, which
makes this the Minimum Energy Point. CARRY 2 is the most efficient solution,
with a PDP of only 15.65% relative to the conventional carry gate. As the
supply voltage drops below 175mV, the relative PDP of CARRY 1 and CARRY 3
exceeds 100%, while the PDP of CARRY 2 remains below that of the conventional
carry gate. However, the relative PDP of CARRY 2 becomes worse than that of
CARRY 1 as the supply voltage exceeds 375mV.
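The location of the Minimum Energy Point can be read directly from the Relative PDP rows of Table I. The sketch below takes those rows as data and returns, for each gate, the simulated supply voltage at which the relative PDP is lowest. The helper name is illustrative, and the search is limited to the seven supply voltages listed in the table.

```python
SUPPLY_MV = [100, 150, 200, 250, 300, 350, 400]

# Relative PDP (%) rows copied from Table I.
RELATIVE_PDP = {
    "CARRY1": [165, 154, 57.8, 22.6, 41.4, 50, 96.3],
    "CARRY2": [81.6, 69.55, 36.57, 15.65, 28.37, 37.22, 144.13],
    "CARRY3": [148.4, 117.35, 55.1, 23.01, 41.15, 60.57, 468],
}


def minimum_energy_point(voltages, values):
    """Return (voltage, value) at the minimum of the tabulated values."""
    index = min(range(len(values)), key=values.__getitem__)
    return voltages[index], values[index]


mep = {gate: minimum_energy_point(SUPPLY_MV, values)
       for gate, values in RELATIVE_PDP.items()}

for gate, (vdd_mv, pdp) in mep.items():
    print(f"{gate}: minimum relative PDP {pdp} % at {vdd_mv} mV")
```

All three gates have their tabulated minimum at 250mV, with CARRY 2 giving the lowest value.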
The relative Energy Delay Product (EDP) of the ULV domino carry gates for
different supply voltages is shown in Figure 10. The relative EDP of all the
proposed ULV domino carry gates is less than 30% for supply voltages between
175mV and 375mV, which directly corresponds to the supply voltage range where
the PDP is minimum
                      CARRY1   CARRY2   CARRY3
Relative Delay (%)    3.36     3.36     4.7
Relative PDP (%)      22.6     15.65    23.01
Relative EDP (%)      0.76     0.527    1.08
TABLE II: The delay, PDP and EDP of ULV domino carry gates at Minimum Energy
Point (250mV) relative to conventional CMOS carry gate.
as shown in Figure 9.
At the Minimum Energy Point (250mV), the EDP of all proposed ULV carry gates
is lower than 1.5% relative to a conventional carry gate. CARRY 2 has the
lowest relative EDP, with a value of about 0.527% at 250mV (Tables I and II).
The relative EDP of CARRY 2 is far better than that of the other solutions at
supply voltages under 175mV, but becomes worse than that of CARRY 1 as the
supply voltage increases beyond 375mV.
Table II summarizes the delay, PDP and EDP of the proposed ULV domino carry
gates relative to the conventional CMOS carry gate at the Minimum Energy
Point, at a supply voltage of 250mV. CARRY 1 and CARRY 2 have the same
relative delay of 3.36%, so both solutions are efficient for ultra low
voltage and high speed applications. For low power applications, however,
CARRY 2 is the most efficient solution, as it consumes less power than the
other two ULV carry gates and results in a lower PDP and EDP. CARRY 3 is the
slowest and the least preferable for high speed applications, but it offers a
more efficient solution than CARRY 1 in terms of area and power.
VI. CONCLUSION
Different ultra low-voltage NP domino carry gates have been presented in
this paper. The ULV domino carry gates are high speed, i.e. their delay is
less than 5% of that of the conventional CMOS carry gate at a supply voltage
of 250mV. The power delay product and the energy delay product of the
proposed ULV carry gates are approximately 23% and 1% or less, respectively,
relative to the conventional CMOS carry gate at the minimum energy point.
Both power and area can be saved if parallel adders can be avoided by
applying ULV domino carry gates when ultra low voltage solutions are
preferable. In this manner we may take advantage of the speed improvement
and the reduction of power and area.
REFERENCES
[1] A. P. Chandrakasan, S. Sheng and R. W. Brodersen. “Low-power CMOS digital
design”, IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473-484,
April 1992.
[2] M. Alioto and G. Palumbo. “Very High-Speed Carry Computation based on
Mixed Dynamic/Transmission-Gate Full Adders”, In Proc. of the 18th European
Conference on Circuit Theory and Design, August 27-30, 2007, Sevilla, Spain.
[3] Y. Berg and O. Mirmotahari. “Static Differential Ultra Low-Voltage Domino
CMOS logic for High Speed Applications”, North Atlantic University Union:
International Journal of Circuits, Systems and Signal Processing,
ISSN 1998-4464, 6(4), pp. 269-274.
[4] Y. Berg, D. T. Wisland and T. S. Lande. “Ultra Low-Voltage/Low-Power
Digital Floating-Gate Circuits”, IEEE Transactions on Circuits and Systems,
vol. 46, no. 7, pp. 930-936, July 1999.
[5] K. Kotani, T. Shibata, M. Imai and T. Ohmi. “Clocked-Neuron-MOS Logic
Circuits Employing Auto-Threshold-Adjustment”, In IEEE International
Solid-State Circuits Conference (ISSCC), pp. 320-321, 388, 1995.
[6] T. Shibata and T. Ohmi. “A Functional MOS Transistor Featuring Gate-Level
Weighted Sum and Threshold Operations”, In IEEE Transactions on Electron
Devices, vol. 39, 1992.
[7] Y. Berg and M. Azadmehr. “A band-tunable auto-zeroing amplifier”,
Proceedings of the WSEAS conferences, ISSN 1790-5117, 3(CSCS 12), pp. 24-28.
[8] Y. Berg and M. Azadmehr. “A bi-directional auto-zeroing floating-gate
amplifier”, Proceedings of the 10th WSEAS conference,
ISBN 978-1-61804-062-6, pp. 70-75.
[9] Y. Berg, T. S. Lande and Ø. Næss. “Programming Floating-Gate Circuits
with UV-Activated Conductances”, IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, vol. 48, no. 1, pp. 12-19, 2001.
[10] Y. Berg and O. Mirmotahari. “Ultra Low-Voltage and High Speed Dynamic
and Static Precharge Logic”, In Proc. of the 11th Edition of IEEE Faible
Tension Faible Consommation, June 6-8, 2012, Paris, France.
[11] Y. Berg and O. Mirmotahari. “Novel High-Speed and Ultra-Low-Voltage CMOS
NAND and NOR Domino Gates”, In Proc. of the 5th International Conference on
Advances in Circuits, Electronics and Micro-electronics, August 19-24, 2012,
Rome, Italy.
[12] Y. Berg and O. Mirmotahari. “Novel Static Ultra Low-Voltage and High
Speed CMOS Boolean Gates”, North Atlantic University Union: International
Journal of Circuits, Systems and Signal Processing, ISSN 1998-4464, 6(4),
pp. 249-254.
[13] Y. Berg and M. Azadmehr. “Novel Ultra Low-Voltage and High-Speed CMOS
Pass Transistor Logic”, In Proc. of the 11th Edition of IEEE Faible Tension
Faible Consommation, June 6-8, 2012, Paris, France.
[14] N. H. E. Weste and D. Harris. “CMOS VLSI Design: A Circuits and Systems
Perspective”, Third edition, Addison Wesley, 2005, p. 640.
[15] Y. Berg. “Ultra Low Voltage Static Carry Generate Circuit”, In Proc.
IEEE International Symposium on Circuits and Systems (ISCAS), Paris, May 2010.
[16] Y. Berg. “Static Ultra Low Voltage CMOS Logic”, In Proc. IEEE NORCHIP
Conference, Trondheim, Norway, November 2009.
Sohail Musa Mahmood is currently pursuing his M.S. in Microelectronics at
the Dept. of Informatics, University of Oslo. His master thesis is mainly
focused on Ultra Low Voltage/low-power digital floating-gate design.
Yngvar Berg received the M.S. and Ph.D. degrees in Microelectronics
from the Dept. of Informatics, University of Oslo in 1987 and 1992 respec-
tively. He is currently working as a professor with the same department.
His research activity is mainly focused on low-voltage/low-power digital
and analog floating-gate VLSI design with more than 170 published papers.
Paper III
Ultra-Low Voltage and High Speed NP Domino
Carry Propagation chain
Submitted to the 2013 IEEE Faible Tension Faible Consommation (FTFC)
conference, Paris, France, June 20-21, 2013.
Ultra-Low Voltage and High Speed NP Domino
Carry Propagation chain
Sohail Musa Mahmood
University of Oslo
Department of Informatics
Oslo, Norway
Email: sohailmm@ifi.uio.no
Yngvar Berg
University of Oslo
Department of Informatics
Oslo, Norway
Email: yngvarb@ifi.uio.no
Abstract—In this paper, an Ultra Low Voltage NP domino
logic style is presented and used to perform a 32-bit computation in a
carry propagation chain. The presented logic style is targeted
at supply voltages in the sub-threshold regime.
Simulation results for the proposed 32-bit carry chain, compared with
the conventional carry chain, show that the proposed approach offers
a substantial improvement in terms of speed and EDP. The proposed
carry chain has a relative delay and EDP of only 2.68% and 8%
respectively compared to the conventional carry chain. The 32-bit
ULV NP domino carry chain, implemented in a 90nm TSMC CMOS process
with a supply voltage of 300mV, can be operated at
a clock frequency of 50MHz.
I. INTRODUCTION
In recent years, low voltage digital CMOS has become more
and more attractive, due to the general advances in process
technology and the demands of low power applications. The
aggressive scaling of device dimensions and supply voltage, in
order to achieve greater transistor density and lower power
consumption, degrades the speed of logic circuits
because the effective input voltage to the transistors is reduced.
On the one hand, the ever increasing market segment of portable
electronic devices demands low-power
building blocks that enable the implementation of long-lasting
battery-operated systems. On the other hand, the general trend
of increasing operating frequencies and circuit complexity, in
order to cope with the throughput needed in modern high-
performance processing applications, requires the design of
very high-speed circuits.
Low voltage does not necessarily imply low power; the
power consumed by a gate is proportional to the active
current driving the output of the gate. Delay and power
consumption are thus both dependent on the current, whereas the
energy, or power delay product (PDP), is not significantly dependent
on the current. The energy required to toggle a bit is more
dependent on the load and configuration of the gate. Energy
delay product (EDP) is more dependent on speed than on
power and will be improved by increasing the current for a
specific supply voltage. The optimal supply voltage for CMOS
logic in terms of EDP is close to the threshold voltage of the
nMOS transistor Vtn for a specific process, assuming that the
threshold voltage of the pMOS transistor Vtp is approximately
equal to −Vtn [1]. Several approaches to high speed and low
voltage digital CMOS circuits have been presented [2].
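
As an illustration of the two figures of merit discussed above, the following Python sketch computes the PDP and EDP from a gate delay and an average power. The function names are assumptions made for this sketch and are not taken from the paper; the sample operating point is the conventional carry gate at 300mV as listed in Table I.

```python
def power_delay_product(delay_s: float, power_w: float) -> float:
    """Energy per operation in joules: PDP = P * t_d."""
    return power_w * delay_s


def energy_delay_product(delay_s: float, power_w: float) -> float:
    """EDP in joule-seconds: EDP = PDP * t_d = P * t_d**2."""
    return power_delay_product(delay_s, power_w) * delay_s


# Conventional CMOS carry gate at 300mV (delay and power taken from Table I).
delay = 2.56e-9   # seconds
power = 6.83e-9   # watts

pdp = power_delay_product(delay, power)
edp = energy_delay_product(delay, power)
print(f"PDP = {pdp / 1e-18:.2f} x 10^-18 J")
print(f"EDP = {edp / 1e-27:.2f} x 10^-27 Js")
```

Because the delay enters the EDP squared, a reduction in delay lowers the EDP even when the power rises by the same factor, which is the sense in which the EDP is more dependent on speed than on power.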
A Full Adder is one of the most critical components of
a processor, as it is used in the ALU, the floating point unit and
address generation in case of cache or memory accesses [3].
Besides the addition task, it is also the nucleus of many other
arithmetic operations such as subtraction, multiplication and
division. In a typical serial adder, the critical delay is determined
by the carry propagation. The two bit Full adder operates in
the propagation mode when only one of the input signals
is high and the carry signal has to propagate through the full
adder. This makes the carry propagation path a speed limiting
factor for many high speed applications. By using complex
carry look ahead techniques or applying parallel structures,
the delay can be reduced compared to a simple serial adder at
the cost of increased complexity, power consumption and chip
area. Thus improving the speed performance of serial adders
at Ultra low supply voltages has been of continuing interest.
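
The propagation mode described above can be sketched at the logic level in Python. This is a behavioural illustration only, not a model of the proposed circuits; the function names are assumptions made for this sketch.

```python
def carry_out(a: int, b: int, c_in: int) -> int:
    """Carry of a one-bit full adder: generate (a AND b) or propagate c_in."""
    return (a & b) | (c_in & (a ^ b))


def ripple_carries(a_bits, b_bits, c_in=0):
    """Return the carry out of every bit position, least significant first."""
    carries = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        carry = carry_out(a, b, carry)
        carries.append(carry)
    return carries


def longest_propagate_run(a_bits, b_bits):
    """Length of the longest run of positions in propagate mode (a XOR b = 1)."""
    longest = run = 0
    for a, b in zip(a_bits, b_bits):
        run = run + 1 if a ^ b else 0
        longest = max(longest, run)
    return longest


# Worst case for a 32-bit chain: exactly one input is high in every position,
# so the incoming carry has to ripple through all 32 bits.
a = [1] * 32
b = [0] * 32
print(ripple_carries(a, b, c_in=1)[-1])
print(longest_propagate_run(a, b))
```

In this worst case the final carry depends on the carry input alone, so the delay of a serial adder grows with the number of consecutive positions in propagate mode.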
Floating-Gate (FG) gates have been proposed for Ultra-Low-Voltage
(ULV) and Low-Power (LP) logic [4]. FG logic
implemented in a modern CMOS process requires frequent
initialization to avoid significant leakage. By connecting floating
capacitances to the transistor gate terminals, the
semi-floating-gate (SFG) nodes can have a different DC level than
provided by the supply voltage headroom [4]. Several
approaches for both analog and digital applications using
FG CMOS logic have been proposed in [5], [6], [7], [8]. The gates
proposed in this paper are influenced by ULV non-volatile FG
circuits [9].
A ULV and high speed serial carry chain has been presented
in [10] using a simple dynamic ULV logic. In this paper, we
exploit ULV and high speed NP Domino carry gates [11] in
a 32-bit serial chain. In Section II, an extended description
of the NP Domino ULV inverter [12] is given. A ULV NP
domino 2 bit carry gate is presented in Section III. Simulation
results are shown in Section IV and a conclusion is given in
Section V.
II. HIGH SPEED AND ULTRA-LOW-VOLTAGE NP
DOMINO INVERTER
The ULV domino carry gate presented in this paper
is related to the ULV domino logic style presented
in [12], [13], [14]. The main purpose of the ULV logic style
is to increase the current level for the transistors at low
supply voltages without increasing the transistor widths. We
may increase the current level compared to complementary
CMOS by using different initialization voltages for the gates
and applying capacitive inputs. The extra load represented
by the floating capacitors is less than the extra load given
by increased transistor widths. The capacitive inputs lower
the delay through increased transconductance while increased
transistor widths only reduce the parasitic delay. The proposed
logic style may be used in critical high speed and low voltage
full adders together with the conventional CMOS logic.
Fig. 1: ULV P type domino inverter (Precharge to 0).
The high speed and ULV P type domino inverter
presented in [12] is shown in Figure 1. The clock signal φ and
its complement φ̄ are used both as control signals for the recharge
transistors RP and RN, and as reference signals for the pMOS evaluation
transistor EP. The recharge/precharge and the evaluation phases
of the proposed logic style are characterized below:
A. Precharge/Recharge phase
When φ switches from 1 to 0, the circuit enters the
precharge/recharge phase. During this phase, RP turns on and
recharges the gate of N transistor to 0. Meanwhile φ̄ switches
from 0 to 1, which turns on RN and recharges the gate of
EP to 0. Thus both evaluation transistors N and EP turn
on and precharge the output node Vout to gnd. The keeper
transistor KN is inactive during this phase as the output node
is precharged to 0.
B. Evaluation phase
In the evaluation phase, the clock signals φ and φ̄ switch from 0
to 1 and from 1 to 0 respectively. Both recharge transistors RP and
RN switch off, which leaves the charge on the gate terminals VP and
VN floating. The output node Vout floats as well until an input
transition occurs. The input signal Vin must be monotonically
falling to ensure correct operation of the P type domino
inverter. This can only be satisfied if
• input signal Vin is high at the beginning of the evaluation
phase, and
• Vin only makes a single transition from 1 to 0 in the
evaluation phase.
Fig. 2: Delay of ULV NP domino carry gate relative to standard
CMOS carry gate (average relative delay [%] versus VDD [mV];
minimum of 2.62% marked in the plot).
KN turns on when the output node gets a positive transition
in the evaluation phase. This partially turns off the evaluation
transistor N and lets the output node swing fully to VDD. This
helps to reduce the static current, which directly affects
the noise margin and the power consumption of the proposed
logic.
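
The two clock phases and the monotonicity requirement can be summarised in a logic-level Python sketch. It is a deliberately simplified abstraction, assuming ideal digital levels and ignoring the floating gates, the keeper and all analog effects; the class and method names are hypothetical and chosen only for this illustration.

```python
class PTypeDominoInverter:
    """Logic-level abstraction of a precharge-to-0 domino inverter."""

    def __init__(self):
        self.out = 0
        self.evaluating = False
        self._last_vin = None

    def precharge(self):
        """Clock low: the output is forced to 0 regardless of the input."""
        self.out = 0
        self.evaluating = False
        self._last_vin = None

    def start_evaluation(self, vin: int):
        """Clock high: the input must be high when evaluation begins."""
        if vin != 1:
            raise ValueError("Vin must be high at the start of evaluation")
        self.evaluating = True
        self._last_vin = vin

    def apply_input(self, vin: int) -> int:
        """Apply a new input value during the evaluation phase."""
        if not self.evaluating:
            raise RuntimeError("inputs are only evaluated in the evaluation phase")
        if vin > self._last_vin:
            raise ValueError("Vin must be monotonically falling")
        if vin == 0:
            # A falling input gives a rising output, held until the next precharge.
            self.out = 1
        self._last_vin = vin
        return self.out


inv = PTypeDominoInverter()
inv.precharge()
inv.start_evaluation(1)
print(inv.out)             # output still holds the precharged value
print(inv.apply_input(0))  # output after the single falling input transition
```

A second, rising input transition in the same evaluation phase is rejected by the model, which mirrors the requirement that Vin makes only a single transition from 1 to 0.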
III. TWO BIT ULV NP DOMINO CARRY GATE
Different implementations of ULV NP domino Carry gates
are presented in [11]; they operate in the sub-threshold regime
and achieve a very high speed. The proposed
NP domino carry gates offer an average propagation delay
below 5% relative to the conventional CMOS carry gate [15] at
a supply voltage of 275mV. The two bit ULV NP domino carry
gate shown in Figure 3 is based on one of the NP domino carry
gates presented in [11]. The average propagation delay of the
proposed ULV carry gate relative to the standard CMOS carry
gate [15] for different supply voltages is shown in Figure 2.
The largest relative speed improvement is achieved at a supply voltage of
275mV, where the delay is only 2.62% relative to the conventional
CMOS carry gate.
Table I lists the speed performance, together with
power consumption and other figures of merit (PDP and
EDP), in order to locate the Minimum Energy Point (MEP)
of the proposed ULV domino carry gate in comparison with the
conventional CMOS carry gate at different supply voltages.
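
A short Python sketch, using only values copied from Table I, shows how the relative delay and the minimum energy point can be read from the tabulated data. Treating the tabulated average PDP as the energy per operation when locating the MEP is an assumption of this sketch, and the variable names are illustrative.

```python
# Values copied from Table I.
supply_mv = [200, 225, 250, 275, 300, 325, 350, 375, 400]
conv_delay_ns = [25.4, 16.1, 10.4, 9.83, 2.56, 1.97, 1.55, 1.09, 0.782]
ulv_delay_ns = [2.62, 0.868, 0.349, 0.258, 0.176, 0.102, 0.0795, 0.0707, 0.0653]
ulv_pdp = [4.996, 3.52, 2.635, 4.85, 7.234, 6.47, 8.014, 10.36, 17.305]  # 1e-18 J

# Delay of the ULV NP domino carry gate relative to the conventional gate, in %.
relative_delay = [100.0 * u / c for u, c in zip(ulv_delay_ns, conv_delay_ns)]

# Supply voltage with the smallest relative delay.
i_fast = min(range(len(supply_mv)), key=lambda i: relative_delay[i])

# Supply voltage with the smallest energy per operation (assumed MEP criterion).
i_mep = min(range(len(supply_mv)), key=lambda i: ulv_pdp[i])

print(f"smallest relative delay: {relative_delay[i_fast]:.2f}% at {supply_mv[i_fast]}mV")
print(f"smallest average PDP: {ulv_pdp[i_mep]} x 10^-18 J at {supply_mv[i_mep]}mV")
```

With the tabulated numbers, the smallest relative delay falls at 275mV and the smallest average PDP at 250mV.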
Style               Comment              200mV    225mV    250mV    275mV    300mV    325mV    350mV    375mV    400mV
CLK                 fclk (MHz)           8.3      12.5     16.67    33.33    66.67    71.4     83.3     100      125
Conventional Carry  Delay (ns)           25.4     16.1     10.4     9.83     2.56     1.97     1.55     1.09     0.782
                    Power (nW)           0.34     0.69     1.12     1.61     6.83     8.88     10.35    16.56    23
                    PDP (10^-18 J)       8.64     11.12    11.65    15.83    17.48    17.49    16.04    18.05    17.97
                    EDP (10^-27 Js)      219.5    179      121      155.6    44.75    34.4     24.9     19.7     14.05
N type Carry        Delay (ns)           2.49     0.827    0.318    0.245    0.162    0.075    0.05     0.035    0.022
P type Carry        Delay (ns)           2.75     0.909    0.38     0.271    0.19     0.129    0.109    0.106    0.1086
NP Carry            Avg. Delay (ns)      2.62     0.868    0.349    0.258    0.176    0.102    0.0795   0.0707   0.0653
                    Avg. Power (nW)      1.907    4.057    7.55     18.8     41.1     63.3     100.8    146.4    265
                    Avg. PDP (10^-18 J)  4.996    3.52     2.635    4.85     7.234    6.47     8.014    10.36    17.305
                    Avg. EDP (10^-27 Js) 13.1     3.05     0.919    1.25     1.273    0.662    0.637    0.733    1.13
Relative (%)        Delay                10.31    5.40     3.36     2.62     6.88     5.19     5.11     6.48     8.36
                    PDP                  57.8     31.9     22.6     30.59    41.4     37       50       57.3     96.3
                    EDP                  5.97     1.72     0.76     0.8      2.84     1.92     2.56     3.71     8.04
TABLE I: Performance of ULV NP domino Carry gate compared to complementary CMOS Carry gate at different supply voltages.
Fig. 3: A two bit ULV NP domino carry gate.
The left carry gate (N type) in Figure 3 resembles the
N type ULV domino inverter presented in [12]. The output
nodes Cout[0] and Cout[1] precharge to 1 and 0 respectively
in the precharge phase. In the evaluation phase, the output
nodes remain floating, holding the precharged value until
valid inputs arrive. In order to retain the precharged value,
the evaluation transistors P to VDD, or N to gnd, should be
made stronger than the other evaluation transistors. Different
techniques have been suggested in [16] to increase the strength
of MOS transistors in the subthreshold regime. The circuit
response to different input configurations in the evaluation phase is
described below:
1) If the input signals A and B get a positive transition
prior to Cin in the N type carry gate, the evaluation
transistors EN1 and EN2 turn on, as the active current
is larger due to the boost of the evaluation transistors.
Thus the output node is pulled down to 0. The P type carry
gate operates in the same way, but its input signals get a
negative transition. This is the best case scenario, as the
carry bit does not propagate in the domino chain and all
the output nodes get a transition when both input signals
arrive prior to the carry signal.
2) In the worst case scenario, only one of the
two inputs gets a transition, which turns on one of the
two parallel connected evaluation transistors EN4 or EN5
in the N type gate, and the output node retains the precharged
value until the carry input signal arrives. When the carry
input signal arrives, EN3 turns on, making the path from
VDD to gnd which pulls the output node down to 0.
Furthermore, the negative transition at Cout[0] turns on
EP3, pulling the output node to 1 if one of the two input
signals A[1] or B[1] gets a negative transition. In this way,
the carry signal propagates through the NP domino carry
gates in the worst case scenario.
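
The two cases above can be captured in a logic-level Python sketch of the alternating N type and P type stages. The sketch assumes that the N type stage takes active-high inputs and produces an active-low carry, while the P type stage takes the complemented operand bits together with the active-low carry and produces an active-high carry. This polarity convention and all function names are assumptions made for the illustration; the sketch does not model timing, floating nodes or transistor strengths.

```python
def majority(a: int, b: int, c: int) -> int:
    """Carry function of a full adder."""
    return (a & b) | (c & (a | b))


def n_type_stage(a: int, b: int, c_in: int) -> int:
    """Precharged to 1; pulled down to 0 when a carry is produced."""
    return 0 if majority(a, b, c_in) else 1


def p_type_stage(a_n: int, b_n: int, c_in_n: int) -> int:
    """Precharged to 0; active-low inputs; pulled up to 1 when a carry is produced."""
    return 1 if majority(1 - a_n, 1 - b_n, 1 - c_in_n) else 0


def two_bit_carry(a0, b0, a1, b1, c_in):
    """Return (active-low carry of bit 0, active-high carry of bit 1)."""
    c0_n = n_type_stage(a0, b0, c_in)
    c1 = p_type_stage(1 - a1, 1 - b1, c0_n)
    return c0_n, c1


def carry_chain(a_bits, b_bits, c_in):
    """Cascade two-bit gates; a_bits and b_bits must have even length."""
    carry = c_in
    for i in range(0, len(a_bits), 2):
        _, carry = two_bit_carry(a_bits[i], b_bits[i],
                                 a_bits[i + 1], b_bits[i + 1], carry)
    return carry


# Case 1: both inputs of the N type stage are high, output pulled to 0.
print(n_type_stage(1, 1, 0))
# Case 2: one input high in every stage, carry ripples through 16 two-bit gates.
print(carry_chain([1] * 32, [0] * 32, 1))
```

Under this convention the active-high carry of each P type stage can feed the next N type stage directly, which is why the two-bit gate can be cascaded into a chain.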
IV. SIMULATION RESULTS
The simulations are based on a 90nm TSMC CMOS
process. To avoid overly optimistic results for the implemented circuits
and to obtain more realistic waveforms, the clock signals have
been generated by inserting two symmetric conventional CMOS
inverters between the ideal voltage sources and the clock
signals. In the same way, the input signals for the ULV domino carry
gates have been generated by inserting ULV domino inverters
of the type presented in [12] between the voltage sources and the input
nodes.
Figure 4 shows the simulated results of the 32-bit
carry chain. Sixteen two bit NP domino carry gates of the type shown
in Figure 3 are cascaded to make a 32-bit carry chain.
The proposed ULV domino carry gates are simulated at a
supply voltage of 300mV, considering the worst case
scenario where only one of the two input bits gets a transition
in every ULV domino carry gate and the carry input signal
propagates through the whole 32-bit carry chain.
Fig. 4: Simulated response of a 32-bit carry chain implemented in
Figure 3 (voltage [V] versus time [ns]). (a) Output nodes from P Type
carry gates, C_out[2,4,6,...32]; plot annotations: 8.6ns, 34mV.
(b) Output nodes from N Type carry gates, C_out[1,3,5,...31]; plot
annotation: 54mV.
The propagation delay of the proposed carry chain is far
less than the delay of the conventional CMOS carry chain [15].
The conventional carry chain has a propagation delay of
approximately 321ns at the same supply voltage, whereas the proposed
ULV domino chain requires only 8.6ns to propagate the carry from bit 1
to bit 32. The simulated response also demonstrates that the logic
levels of the different carry signals are very close to the rails after
the transition, offering a robust design with a better noise
margin. The floating output nodes of the P and N type carry
gates are pulled slightly towards 1 and 0 respectively before
the carry input signal arrives. This problem can be eliminated
by increasing the strength of the evaluation transistors labelled
N and P, offering a low static power consumption at the cost
of a slight degradation in speed performance.
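
The headline figures can be checked with a few lines of Python. The two chain delays are the simulated values quoted above and the 20ns clock period is the value stated in the conclusion; the assumption of a 50% duty cycle, used here to estimate the length of the evaluation phase, is made only for this sketch.

```python
conventional_delay_ns = 321.0   # conventional 32-bit carry chain at 300mV
ulv_delay_ns = 8.6              # proposed ULV NP domino carry chain at 300mV
n_bits = 32
clock_period_ns = 20.0          # clock period stated in the conclusion

# Delay of the proposed chain relative to the conventional chain, in %.
relative_delay = 100.0 * ulv_delay_ns / conventional_delay_ns

# Average carry delay per bit position in the proposed chain.
delay_per_bit_ns = ulv_delay_ns / n_bits

# Clock frequency in MHz for the stated period.
f_clk_mhz = 1e3 / clock_period_ns

# Assumption: 50% duty cycle, so half of the period is available for evaluation.
evaluation_window_ns = clock_period_ns / 2
timing_margin_ns = evaluation_window_ns - ulv_delay_ns

print(f"relative delay: {relative_delay:.2f}%")
print(f"average delay per bit: {delay_per_bit_ns:.3f}ns")
print(f"clock frequency: {f_clk_mhz:.0f}MHz")
print(f"margin in the evaluation phase: {timing_margin_ns:.1f}ns")
```

Under the duty-cycle assumption, the worst-case carry propagation fits within the evaluation phase of a 20ns clock period.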
V. CONCLUSION
In this paper we have presented a 32-bit ULV NP domino
carry chain, simulated for the worst case scenario.
The proposed ULV carry gates achieve a very
fast carry computation: the delay is only 2.68% of that of the
conventional CMOS carry chain at
a supply voltage of 300mV, with almost no
degradation in robustness and area. The energy delay
product (EDP) of the proposed ULV domino carry chain is
approximately 8% relative to the conventional CMOS carry
chain at the same supply voltage. A clock period of only
20ns is needed to execute a 32-bit addition. ULV
NP domino carry gates can be used to save power and area
compared to parallel adder designs in ultra low voltage and
high speed applications.
REFERENCES
[1] Chandrakasan A.P., Sheng S., Brodersen R.W.: “Low-power CMOS digital
design”, IEEE Journal of Solid-State Circuits, Volume 27, Issue 4, April
1992, Page(s): 473-484.
[2] Verma N., Kwong J., Chandrakasan A.P.: “Nanometer MOSFET Variation
in Minimum Energy Subthreshold Circuits”, IEEE Transactions on
Electron Devices, Vol. 55, No. 1, January 2008, Page(s): 163-174.
[3] A. Wang, A.P. Chandrakasan, S.V. Kosonocky: “Optimal supply and
threshold scaling for subthreshold CMOS circuits”, Proc. of IEEE
Computer Society Annual Symposium on VLSI, 2002, pp. 5-9.
[4] Y. Berg, D. T. Wisland and T. S. Lande: “Ultra Low-Voltage/Low-
Power Digital Floating-Gate Circuits”, IEEE Transactions on Circuits
and Systems, vol. 46, no. 7, pp. 930–936, July 1999.
[5] K. Kotani, T. Shibata, M. Imai and T. Ohmi. “Clocked-Neuron-MOS
Logic Circuits Employing Auto-Threshold-Adjustment”, In IEEE
International Solid-State Circuits Conference (ISSCC), pp. 320-321, 388, 1995.
[6] T. Shibata and T. Ohmi. “A Functional MOS Transistor Featuring
Gate-Level Weighted Sum and Threshold Operations”, In IEEE Transactions
on Electron Devices, vol. 39, 1992.
[7] Y. Berg and M. Azadmehr. “A band-tunable auto-zeroing amplifier”,
Proceedings of the WSEAS conferences, ISSN 1790-5117. 3(CSCS 12),
pp. 24-28.
[8] Y. Berg and M. Azadmehr. “A bi-directional auto-zeroing floating-gate
amplifier”, Proceedings of the 10th WSEAS conference,
ISBN 978-1-61804-062-6. pp. 70-75.
[9] Y. Berg, Tor S. Lande and Ø. Næss. “Programming Floating-Gate Circuits
with UV-Activated Conductances”, IEEE Transactions on Circuits and
Systems II: Analog and Digital Signal Processing, vol. 48, no. 1,
pp. 12-19, 2001.
[10] Y. Berg. “Ultra Low Voltage Static Carry Generate Circuit”, In Proc.
IEEE International Symposium on Circuits and Systems (ISCAS), Paris,
May 2010.
[11] S.M. Mahmood and Y. Berg. “High Speed and Ultra Low-voltage CMOS
Domino Carry gates”, In proc. of the 12th WSEAS Conference on
Electronics, Hardware, Wireless and Optical Communications,
ISBN 978-1-61804-164-7. pp. 52-57.
[12] Y. Berg and O. Mirmotahari. “Ultra Low-Voltage and High Speed
Dynamic and Static Precharge Logic”, In proc. of the 11th Edition of IEEE
Faible Tension Faible Consommation, June 6-8, 2012, Paris, France.
[13] Y. Berg and O. Mirmotahari. “Novel High-Speed and Ultra-Low-Voltage
CMOS NAND and NOR Domino Gates”, In proc. of the 5th International
Conference on Advances in Circuits, Electronics and Micro-electronics,
August 19-24, 2012, Rome, Italy.
[14] Y. Berg and O. Mirmotahari. “Novel Static Ultra Low-Voltage and
High Speed CMOS Boolean Gates”, North Atlantic University Union:
International Journal of Circuits, Systems and Signal Processing. ISSN
1998-4464. 6(4), pp. 249-254.
[15] Neil H. E. Weste and David Harris. “CMOS VLSI Design: A Circuits and
Systems Perspective”, Third edition, Addison Wesley 2005, p. 640.
[16] M. Alioto. “Impact of NMOS/PMOS imbalance in Ultra-Low Voltage
CMOS standard cells”, Circuit Theory and Design (ECCTD), 2011 20th
European Conference on, pp. 536-539, 29-31 Aug. 2011.
Bibliography
[1] A.P. Chandrakasan, S. Sheng, and R.W. Brodersen. Low-power CMOS
digital design. Solid-State Circuits, IEEE Journal of, 27(4):473–484,
Apr. 1992. ISSN 0018-9200. doi: 10.1109/4.126534.
[2] N. Verma, J. Kwong, and A.P. Chandrakasan. Nanometer MOSFET
variation in Minimum Energy Subthreshold Circuits. Electron
Devices, IEEE Transactions on, 55(1):163–174, 2008. ISSN 0018-9383.
doi: 10.1109/TED.2007.911352.
[3] Kimiyoshi Usami and Mark Horowitz. Clustered voltage scaling tech-
nique for low-power design. In Proceedings of the 1995 international
symposium on Low power design, pages 3–8. ACM, 1995.
[4] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and
J. Yamada. 1-V power supply high-speed digital circuit technology
with multithreshold-voltage CMOS. Solid-State Circuits, IEEE Journal
of, 30(8):847–854, 1995. ISSN 0018-9200. doi: 10.1109/4.400426.
[5] T. Shibata and T. Ohmi. A functional MOS transistor featuring gate-
level weighted sum and threshold operations. Electron Devices, IEEE
Transactions on, 39(6):1444–1455, 1992. ISSN 0018-9383. doi:
10.1109/16.137325.
[6] Paul Hasler, Chris Diorio, Bradley A Minch, and Carver Mead.
Single transistor learning synapses. Advances in neural information
processing systems, pages 817–826, 1995.
[7] Bradley A Minch, Chris Diorio, Paul Hasler, and Carver A Mead.
Translinear circuits using subthreshold floating-gate MOS transistors.
Analog Integrated Circuits and Signal Processing, 9(2):167–179,
1996.
[8] P. Hasler, B.A. Minch, and C. Diorio. Adaptive circuits using
pFET floating-gate devices. In Advanced Research in VLSI, 1999.
Proceedings. 20th Anniversary Conference on, pages 215–229, 1999.
doi: 10.1109/ARVLSI.1999.756050.
[9] Y. Berg, D.T. Wisland, and T.S. Lande. Ultra low-voltage/low-power
digital floating-gate circuits. Circuits and Systems II: Analog and
Digital Signal Processing, IEEE Transactions on, 46(7):930–936, Jul.
1999. ISSN 1057-7130. doi: 10.1109/82.775389.
[10] Y. Berg, T.S. Lande, and O. Naess. Programming floating-gate circuits
with UV-activated conductances. Circuits and Systems II: Analog and
Digital Signal Processing, IEEE Transactions on, 48(1):12–19, Jan.
2001. ISSN 1057-7130. doi: 10.1109/82.913182.
[11] Y. Berg, S. Aunet, O. Mirmotahari, and M. Hovin. Novel recharge
semi-floating-gate CMOS logic for multiple-valued systems. In
Circuits and Systems, 2003. ISCAS ’03. Proceedings of the 2003
International Symposium on, volume 5, pages V-193–V-196,
2003. doi: 10.1109/ISCAS.2003.1206229.
[12] Y. Berg. Novel ultra low-voltage and high speed domino CMOS logic.
In VLSI System on Chip Conference (VLSI-SoC), 2010 18th IEEE/IFIP,
pages 225–228, 2010. doi: 10.1109/VLSISOC.2010.5642664.
[13] K. Venkat, Liang Chen, I. Lin, P. Mistry, P. Madhani, and K. Sato.
Timing verification of dynamic circuits. In Custom Integrated Circuits
Conference, 1995., Proceedings of the IEEE 1995, pages 271–274,
May 1995. doi: 10.1109/CICC.1995.518184.
[14] P. Meher and K.K. Mahapatra. Ultra low-power and noise tolerant
CMOS dynamic circuit technique. In TENCON 2011 - 2011 IEEE
Region 10th Conference, pages 1175–1179, Nov. 2011.
doi: 10.1109/TENCON.2011.6129297.
[15] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, and J. Yamada. 1V high-
speed digital circuit technology with 0.5µm multi-threshold CMOS. In
ASIC Conference and Exhibit, 1993. Proceedings., Sixth Annual IEEE
International, pages 186–189, 1993. doi: 10.1109/ASIC.1993.410836.
[16] T. Kuroda, T. Fujita, T. Nagamatu, S. Yoshioka, T. Sei, K. Matsuo,
Y. Hamura, T. Mori, M. Murota, M. Kakumu, and T. Sakurai. A
high-speed low-power 0.3µm CMOS gate array with variable threshold
voltage (VT) scheme. In Custom Integrated Circuits Conference,
1996., Proceedings of the IEEE 1996, pages 53–56, 1996.
doi: 10.1109/CICC.1996.510510.
[17] Neil H. E. Weste and David Harris. Integrated Circuit Design, page 389.
PEARSON, fourth edition, 2010.
[18] M. Alioto. Ultra-Low Power VLSI Circuit Design Demystified and
Explained: A Tutorial. Circuits and Systems I: Regular Papers,
IEEE Transactions on, 59(1):3–29, Jan. 2012. ISSN 1549-8328. doi:
10.1109/TCSI.2011.2177004.
[19] M. Anis, S. Areibi, and M. Elmasry. Design and optimization of
multithreshold CMOS (MTCMOS) circuits. Computer-Aided Design
of Integrated Circuits and Systems, IEEE Transactions on, 22(10):
1324–1342, 2003. ISSN 0278-0070. doi: 10.1109/TCAD.2003.818127.
[20] M. Alioto. Impact of NMOS/PMOS imbalance in Ultra-Low Voltage
CMOS standard cells. In Circuit Theory and Design (ECCTD), 2011
20th European Conference on, pages 536–539, Aug. 2011.
doi: 10.1109/ECCTD.2011.6043407.
[21] Song Ye and Jun Li. The effects of a deep n-well junction on RF circuit
performance. In Junction Technology (IWJT), 2010 International
Workshop on, pages 1–4, May 2010.
doi: 10.1109/IWJT.2010.5475012.
[22] F. Assaderaghi, S. Parke, D. Sinitsky, J. Bokor, P.-K. Ko, and
Chenming Hu. A dynamic threshold voltage MOSFET (DTMOS) for
very low voltage operation. Electron Device Letters, IEEE,
15(12):510–512, 1994. ISSN 0741-3106. doi: 10.1109/55.338420.
[23] P. Simonen, A. Heinonen, M. Kuulusa, and J. Nurmi. Comparison
of bulk and SOI CMOS technologies in a DSP processor circuit
implementation. In Microelectronics, 2001. ICM 2001 Proceedings.
The 13th International Conference on, pages 107–110, 2001. doi:
10.1109/ICM.2001.997499.
[24] P.F. Butzen and R.P. Ribas. Leakage Current in Sub-Micrometer
CMOS Gates,
http://www.inf.ufrgs.br/logics/docman/book_emicro_butzen.pdf.
[25] H.J.M. Veendrick. Short-circuit dissipation of static CMOS circuitry
and its impact on the design of buffer circuits. Solid-State Circuits,
IEEE Journal of, 19(4):468–473, Aug. 1984. ISSN 0018-9200. doi:
10.1109/JSSC.1984.1052168.
[26] X. Qi, S.C. Lo, A. Gyure, Y. Luo, M. Shahram, K. Singhal, and D.B.
MacMillen. Efficient subthreshold leakage current optimization -
Leakage current optimization and layout migration for 90- and 65- nm
ASIC libraries. Circuits and Devices Magazine, IEEE, 22(5):39–47,
Sept.-Oct. 2006. ISSN 8755-3996. doi: 10.1109/MCD.2006.272999.
[27] Neil H. E. Weste and David Harris. Integrated Circuit Design, page 154.
PEARSON, fourth edition, 2010.
[28] M. Alioto and G. Palumbo. Very high-speed carry computation based
on mixed dynamic/transmission-gate full adders. In Circuit Theory
and Design, 2007. ECCTD 2007. 18th European Conference on, pages
799–802, Aug. 2007. doi: 10.1109/ECCTD.2007.4529717.
[29] Y. Berg and M. Azadmehr. Novel ultra low-voltage and high-speed
CMOS pass transistor logic. In Faible Tension Faible Consommation
(FTFC), 2012 IEEE, pages 1–4, June 2012.
doi: 10.1109/FTFC.2012.6231719.
[30] S.M. Mahmood and Y. Berg. High Speed and Ultra Low-voltage CMOS
Domino Carry gates. In 12th WSEAS Conference on Electronics,
Hardware, Wireless and Optical Communications, pages 52–57,
2013. ISBN 978-1-61804-164-7.
[31] M. Alioto and G. Palumbo. Analysis and comparison on full adder
block in submicron technology. Very Large Scale Integration (VLSI)
Systems, IEEE Transactions on, 10(6):806–823, Dec. 2002. ISSN
1063-8210. doi: 10.1109/TVLSI.2002.808446.
