Full-Custom Sub-/Near-Threshold Cell Library in 130nm CMOS with Application to an ALU by Johnsen, Glenn André
Full-Custom Sub-/Near-Threshold Cell 
Library in 130nm CMOS with Application 
to an ALU
Glenn André Johnsen
Electronics System Design and Innovation
Supervisor: Snorre Aunet, IET
Department of Electronics and Telecommunications
Submission date: June 2014
Norwegian University of Science and Technology
 
Problem Description
Candidate name: Glenn André Johnsen
Assignment title: Full-Custom Sub-/Near-Threshold Cell Library in 130nm CMOS with
Application to an ALU
Assignment text:
Energy harvesting systems typically contain an embedded processor to collect, process, and
interpret sensory input data. The system typically includes a CPU, memories, buses, and
peripherals. In order to build, e.g., an Ultra-Low-Voltage (ULV) CPU, a set of standard cells must be
designed.
This assignment involves defining standard cells for a ULV standard cell library. These
standard cells should be defined and designed using state-of-the-art design techniques and
literature.
Assignment proposer / Co-supervisor: Jan Rune Herheim from Atmel Norway AS
Supervisor: Professor Snorre Aunet
Abstract
This thesis presents a cell library with limited functionality, designed to operate at sub-threshold
(350 mV) as well as above-threshold (1.2 V) supply voltages so that the dynamic speed requirements
of the circuit can be exploited. The sub-threshold cell library can be used to synthesize any general Finite State
Machine (FSM), since it contains logic gates and a D-FF memory element. The sub-threshold
cell library proposed in this thesis consists of: Inverter, NAND2, NOR2, XNOR2, XOR2,
AOI22, OAI22 and D flip-flop. All cells are designed in static CMOS using a 130 nm
HVT n-well process. The main motivation behind this work is the desire for longer-lasting
battery-powered IC chips.
CMOS power consumption has three components, of which the dynamic component scales as
$P_{dyn} \propto V_{DD}^{2}$. Hence, a promising method to reduce power consumption is to reduce the supply voltage
VDD to the sub-threshold region. Reducing VDD increases the delay through the circuit
(an excellent trade-off in applications with low performance requirements) and increases sensitivity
to process, voltage and temperature (PVT) variations.
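As a rough sanity check of the quadratic relation between dynamic power and supply voltage, the following sketch computes how much the dynamic component alone would shrink when VDD is lowered from 1.2 V to 350 mV. It assumes that switched capacitance, activity factor and clock frequency are unchanged between the two supply voltages, which is an idealization and not a result from this thesis. The function name `dynamic_power_ratio` is hypothetical and only illustrates the proportionality; short-circuit and leakage power are not modeled.

```python
def dynamic_power_ratio(v_high: float, v_low: float) -> float:
    """Return P_dyn(v_high) / P_dyn(v_low), assuming P_dyn is proportional to VDD^2.

    Assumption (not from the thesis): capacitance, activity factor and
    frequency are identical at both supply voltages.
    """
    if v_high <= 0 or v_low <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_high / v_low) ** 2


if __name__ == "__main__":
    # Supply voltages quoted in the abstract: 1.2 V and 350 mV.
    ratio = dynamic_power_ratio(1.2, 0.35)
    print(f"Idealized dynamic power reduction: {ratio:.2f}x")
```

Under these assumptions the quadratic term alone gives a reduction of roughly 12 times. Simulated totals can differ from this figure, since the short-circuit and leakage components also change with VDD.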
The sub-threshold cells are evaluated with an ALU synthesized into three circuits: No.1:
unrestricted, using the provided above-threshold cells; No.2: restricted to INV, NAND2, NOR2 and
D-FF, with both the sub- and above-threshold cell libraries; and No.3: restricted as No.2 plus XNOR2, XOR2,
AOI22 and OAI22, with both the sub- and above-threshold cell libraries.
The results show a power consumption reduction of ∼14 times from VDD = 1.2 V to 350 mV
for both the No.2 and No.3 ALU circuits. It is also shown that a more complex library including
XNOR2, XOR2, AOI22 and OAI22 reduces the power consumption by ∼7.7% compared to
a library with only Inverter, NAND2, NOR2 and D-FF at 350 mV. The No.3 circuit is shown to
be the best ALU using sub-threshold cells in terms of delay and power consumption. With
sub-threshold cells at 350 mV, both No.2 and No.3 fail to meet the 32 kHz frequency only in the
SS and FS corners at −40 °C, whereas with above-threshold cells they fail in
all corners except FF at −40 °C, and additionally in the SS corner at 25 °C.
Sammendrag (Summary)
This thesis presents a cell library with limited functionality designed to operate in the
sub-threshold region (350 mV) as well as in the above-threshold region, in order to exploit dynamic
speed requirements in the circuit. The sub-threshold cell library can be used to synthesize any general
finite state machine (FSM), since it contains logic gates and a D-FF memory element.
The sub-threshold cell library contains the following logic functions with minimum drive strength:
Inverter, NAND2, NOR2, XNOR2, XOR2, AOI22, OAI22 and D-FF. All cells are designed
using static CMOS in a 130 nm HVT n-well process. The main motivation behind this
work is the desire for a longer lifetime of battery-powered IC chips.
Power consumption in CMOS consists of three components, of which the dynamic component is
$P_{dyn} \propto V_{DD}^{2}$. A promising method for reducing power consumption is therefore to reduce the supply
voltage VDD to the sub-threshold region. The reduction in supply voltage increases the delay through
a circuit (an excellent trade-off in applications with low performance requirements) and increases the sensitivity to
process, voltage and temperature (PVT) variations.
The sub-threshold cells are evaluated in an ALU synthesized into three circuits: No.1:
unrestricted, using the available above-threshold cells; No.2: restricted to INV, NAND2,
NOR2 and D-FF only, using the sub-threshold and above-threshold cell libraries; and No.3: restricted
as No.2 plus XNOR2, XOR2, AOI22 and OAI22, using the sub-threshold and above-threshold cell
libraries.
The results show that power consumption is reduced by ∼14 times from VDD = 1.2 V to 350 mV
for both the No.2 and No.3 ALU circuits using the sub-threshold cells. It is also shown that
a more complex library including XNOR2, XOR2, AOI22 and OAI22 reduces
power consumption by ∼7.7% compared to a library consisting only of Inverter, NAND2,
NOR2 and D-FF, at 350 mV. No.3 is shown to be the best ALU circuit using sub-threshold
cells in terms of delay and power consumption. Using sub-threshold cells, both No.2 and No.3 fail to meet
a 32 kHz frequency only in the SS and FS process corners at −40 °C and 350 mV,
whereas using above-threshold cells they fail in all corners except FF at −40 °C, and additionally
in the SS corner at 25 °C.
Preface
This report is the result of a Master's thesis written in the second year of a two-year Master's
degree program in Electronics, following the study path Circuit and System Design with main profile
Design of Digital Systems, at the Norwegian University of Science and Technology (NTNU) in
Trondheim. The report was written during the spring of 2014 at the Department of Electronics
and Telecommunications.
Atmel Norway AS in Trondheim proposed the project topic, and in agreement
with the author the title was decided to be: “Full-Custom Sub-/Near-Threshold Cell Library in 130
nm CMOS with Application to an ALU”. Atmel provided a workplace and computer
equipment with access to design tools, for which I am grateful.
I would first like to thank my supervisor, Professor Snorre Aunet, for all his help and guidance
throughout the project. His positive spirit and goodwill have inspired me and contributed to an
increased interest in the field of low-power technology. I would also like to thank co-supervisor
Jan Rune Herheim from Atmel Norway AS and fellow student Ole S. Kjøbli for all their help and
discussions throughout the project. Last but not least, a final thanks to Susanne N. Rapp for her
love and support throughout the two-year Master's education, and for feedback on the report.
Trondheim, 2014-06-16
Glenn André Johnsen
Contents
Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Sammendrag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 Introduction 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Structure of the Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Theoretical Background 5
2.1 Semiconductor Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Electronic Analysis of CMOS Logic Gates . . . . . . . . . . . . . . . . . . . . 6
2.2.1 DC Characteristics of CMOS Inverter . . . . . . . . . . . . . . . . . . 7
2.2.2 Switching Characteristics of CMOS Inverter . . . . . . . . . . . . . . 10
2.3 The D Flip-Flop Memory Element . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 D FF Timing and Delay . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 CMOS Power Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.1 Dynamic Power Consumption . . . . . . . . . . . . . . . . . . . . . . 14
2.4.2 Short-Circuit Power Consumption . . . . . . . . . . . . . . . . . . . . 15
2.4.3 Leakage Power Consumption . . . . . . . . . . . . . . . . . . . . . . 15
2.4.4 Techniques to Reduce Power Consumption . . . . . . . . . . . . . . . 16
2.5 MOSFETs in the Sub-Threshold Region . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Operation of MOS Transistor in Sub-threshold Region . . . . . . . . . 17
2.5.2 The Threshold Voltage . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.3 nMOS / pMOS Imbalance Factor . . . . . . . . . . . . . . . . . . . . 19
2.5.4 Delay in Saturated MOSFETs . . . . . . . . . . . . . . . . . . . . . . 19
2.5.5 Delay in Sub-threshold Region of MOSFETs . . . . . . . . . . . . . . 20
2.5.6 High Fan-in Problematics in Sub-threshold Voltage . . . . . . . . . . . 20
2.5.7 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.8 Corner Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.9 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Microelectronic Design Styles . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Design Methodology and Application of the Sub-Threshold Cells 25
3.1 Library Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Design of Sub-Threshold Cells . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Choice of Transistor Type and Supply Voltage for the Sub-Threshold
Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.2 Transistor Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.3 General design methodology of logic elements . . . . . . . . . . . . . 32
3.2.4 Monte Carlo Simulation and the Number of Runs . . . . . . . . . . . . 34
3.2.5 Design of Inverter Gate . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.6 Design of NAND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2.7 Design of NOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.8 Design of XNOR2 and XOR2 Gate . . . . . . . . . . . . . . . . . . . 41
3.2.9 Design of AOI22 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.10 Design of OAI22 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.11 Design of D Flip-Flop Memory Element . . . . . . . . . . . . . . . . . 49
3.3 The ALU Test Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.1 Logic Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.2 Circuit Design Method (i.e. The ALU Module) . . . . . . . . . . . . . 56
4 Layout 59
4.1 Layout of Inverter Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Layout of NAND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Layout of NOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Layout of XNOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Layout of XOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Layout of AOI22 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.7 Layout of OAI22 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.8 Layout of D Flip-flop Memory Element . . . . . . . . . . . . . . . . . . . . . 63
5 Simulations and Test of Sub-Threshold Cells and ALU Application 65
5.1 Sub-Threshold Cell Design Simulations . . . . . . . . . . . . . . . . . . . . . 65
5.1.1 Transistor Strength and Threshold Voltage Simulation . . . . . . . . . 65
5.1.2 Cell Test bench Setup and Simulation . . . . . . . . . . . . . . . . . . 66
5.1.3 D Flip-Flop memory element Test bench Setup . . . . . . . . . . . . . 70
5.2 ALU Module Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.1 Test bench Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.2 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.3 Power Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2.4 Propagation Delay Through Critical Path . . . . . . . . . . . . . . . . 74
6 Results 75
6.1 Sub-Threshold Cell Library Results . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 Inverter Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.2 NAND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.3 NOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.1.4 XNOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.1.5 XOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.1.6 AOI22 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.1.7 OAI22 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.1.8 D Flip-Flop Memory Element . . . . . . . . . . . . . . . . . . . . . . 91
6.2 ALU Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2.1 ALU No.1: Results with use of Above-Threshold Library . . . . . . . 93
6.2.2 ALU No.2: Sub-Threshold VS Above-Threshold Library Cells . . . . . 95
6.2.3 ALU No.3: Sub-Threshold VS Above-Threshold Library Cells . . . . . 97
6.2.4 Comparison between ALU Synthesis Results . . . . . . . . . . . . . . 99
7 Discussion 101
7.1 The Accuracy of the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2 The Sub-Threshold Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2.1 The Inverter Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.2.2 The NAND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.2.3 The Other Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.2.4 The D-FF Memory Element . . . . . . . . . . . . . . . . . . . . . . . 103
7.2.5 Common Discussion Considering the Cells . . . . . . . . . . . . . . . 104
7.3 The ALU Test Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8 Conclusion 107
8.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Bibliography 111
A HDL and Synthesis Scripts 113
A.1 8-bit ALU VHDL Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
A.2 Encounter RTL TCL Script .tcl . . . . . . . . . . . . . . . . . . . . . . . . . . 114
A.3 Encounter RTL Constraint file .sdc . . . . . . . . . . . . . . . . . . . . . . . . 115
B ALU Stimuli Files 116
B.1 Dynamic Stimuli File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
B.2 Static Stimuli File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
C Additional Simulation and test bench setup for Sub-threshold Cells 119
C.1 NOR2 Gate Test bench Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
C.2 XNOR2 Gate Test bench Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 120
C.3 XOR2 Gate Test Bench Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 121
C.4 AOI22 Gate Test Bench Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 121
C.5 OAI22 Gate Test Bench Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 122
D Intermediate Results 123
D.1 Alternative Designs for all Cells . . . . . . . . . . . . . . . . . . . . . . . . . 123
D.2 Inverter Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
D.2.1 VTC Analysis Results . . . . . . . . . . . . . . . . . . . . . . . . . . 125
D.2.2 Switching Analysis Results w/ and w/o Parasitics . . . . . . . . . . . . 126
D.3 NAND2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
D.3.1 VTC Analysis Results . . . . . . . . . . . . . . . . . . . . . . . . . . 127
D.3.2 Switching Analysis Results w/ and w/o Parasitics . . . . . . . . . . . . 128
D.4 NOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
D.4.1 VTC Analysis Results . . . . . . . . . . . . . . . . . . . . . . . . . . 129
D.4.2 Switching Analysis Results w/ and w/o Parasitics . . . . . . . . . . . . 130
D.5 XNOR2 Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
D.5.1 VTC Analysis Results . . . . . . . . . . . . . . . . . . . . . . . . . . 131
D.5.2 Switching Analysis Results w/ and w/o Parasitics . . . . . . . . . . . . 132
List of Figures
2.1 Three basic bond lattices of a semiconductor: (a) intrinsic with negligible impurities;
(b) n-type with donor (arsenic); and (c) p-type with acceptor (boron) [6]. 6
2.2 Cross section of MOS transistors: (a) nMOS transistor; (b) pMOS transistor [8]. 6
2.3 Left side (LS): An inverter gate, and right side (RS): Voltage transfer curve for
the inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 LS: Ideal VTC curve, and RS: non-ideal VTC curve for an inverter. . . . . . . . 9
2.5 LS: An inverter gate, and RS: Switching waveforms for the inverter. . . . . . . 10
2.6 LS: Positive edge-triggered D-FF, and RS: Negative edge-triggered D-FF. . . 12
2.7 Master-slave configuration of D-latches. . . . . . . . . . . . . . . . . . . . . . 12
2.8 Setup-, hold time and propagation delay of D-FF. . . . . . . . . . . . . . . . . 13
2.9 The dynamic, short-circuit and leakage power component of CMOS power con-
sumption [13]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.10 The components of leakage power consumption [13]. . . . . . . . . . . . . . . 15
2.11 Arbitrary Id current versus Vgs (on a semilogarithmic scale), showing the exponential
characteristics in the sub-threshold region, marked as the weak region in
the figure. The other regions are the moderate region, from VT to
approximately 100mV, and the strong region above [12]. . . . . . . . . . . . . 18
2.12 Microelectronic design styles [9]. . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1 Both plots for an existing above-threshold inverter at 25◦C and nominal corner. 28
3.2 Normalized nMOS and pMOS strength to the case with W=Wmin (L=Lmin)
versus W (L) with VDD = 350mV , 25◦C, nominal process and mismatch. . . . 29
3.3 Normalized pMOS strength to the case with L=160nm versus W, with Vdd=350mV,
25◦C and nominal process and mismatch. . . . . . . . . . . . . . . . . . . . . 30
3.4 Normalized threshold voltage versus L or W with other dimension minimized
and VDD = 350mV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Design methodology of cells and tools used in each step where N is chosen
number of alternative designs to further explore. . . . . . . . . . . . . . . . . . 32
3.6 INV: First coarse parametric sweep. . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 INV: Second coarse parametric sweep. . . . . . . . . . . . . . . . . . . . . . . 37
3.8 NAND2 truth- and transition table where green numbers are common starting
point for each arrow column and red numbers are ending points. . . . . . . . . 38
3.9 NOR2 truth- and transition table where green numbers are common starting
point for each arrow column and red numbers are ending points. . . . . . . . . 40
3.10 XNOR2 truth- and transition table where green numbers are common starting
point for each arrow column and red numbers are ending points. . . . . . . . . 42
3.11 XOR2 truth- and transition table where green numbers are common starting
point for each arrow column and red numbers are ending points. . . . . . . . . 42
3.12 AOI22 truth- and transition table where green numbers are common starting
point for each arrow column and red numbers are ending points. . . . . . . . . 45
3.13 OAI22 truth- and transition table where green numbers are common starting
point for each arrow column and red numbers are ending points. . . . . . . . . 46
3.14 Symbol (LS) and schematic (RS) of the designed Inverter gate. . . . . . . . . . 47
3.15 Symbol (LS) and schematic (RS) of the designed NAND2 gate. . . . . . . . . . 47
3.16 Symbol (LS) and schematic (RS) of the designed NOR2 gate. . . . . . . . . . . 47
3.17 Symbol (LS) and schematic (RS) of the designed XNOR2 gate. . . . . . . . . . 48
3.18 Symbol (LS) and schematic (RS) of the designed XOR2 gate. . . . . . . . . . . 48
3.19 Symbol (LS) and schematic (RS) of the designed AOI22 gate. . . . . . . . . . 48
3.20 Symbol (LS) and schematic (RS) of the designed OAI22 gate. . . . . . . . . . 49
3.21 Symbol (LS) and schematic (RS) of the designed D Flip-Flop PowerPC 603. . . 50
3.22 Symbol (LS) and schematic (RS) of the Clocked-Inverter Gate. . . . . . . . . . 52
3.23 Feedback F1 floating in precharge phase problem at -40◦C. . . . . . . . . . . . 53
3.24 Symbol (LS) and schematic (RS) of the Transmission Gate. . . . . . . . . . . . 53
3.25 ALU block schematic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.26 Block schematic of pipelined ALU after modifications. . . . . . . . . . . . . . 54
3.27 Number of synthesized cells (circles) and critical path delay (squares) versus
number of Fan-outs allowed. Delay is simulated at −40◦C and nominal corner. 56
3.28 Design hierarchy showing methods and tools used. . . . . . . . . . . . . . . . 58
4.1 Layout of the Inverter gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Layout of the NAND2 gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Layout of the NOR2 gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Layout of the XNOR2 gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Layout of the XOR2 gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Layout of the AOI22 gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.7 Layout of the OAI22 gate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.8 Layout of the D-FF memory element. . . . . . . . . . . . . . . . . . . . . . . 63
5.1 Test bench setup to simulate the ON transistor strength. . . . . . . . . . . . . . 66
5.2 Test bench setup to simulate both VTC and switching analysis of Inverter gate. 68
5.3 Test bench setup to simulate both VTC and switching analysis of NAND2 gate
with one input sourced to VDD and the other connected in chain. . . . . . . . . 69
5.4 Test bench setup to simulate both VTC and switching analysis of NAND2 gate
with both inputs connected in chain. . . . . . . . . . . . . . . . . . . . . . . . 69
5.5 The method to determine setup and hold time presented in [31]. . . . . . . . . 70
5.6 Validity check of the master-latch propagation delay tD_P0 as the setup time simulation
method, against iteratively narrowing the data input transition towards the clock
edge. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.7 Simulation of propagation delay through master-latch setup. . . . . . . . . . . 71
5.8 Timing diagram of clock-to-output tco simulation. . . . . . . . . . . . . . . . . 72
5.9 ALU test bench in Cadence Virtuoso schematic editor. . . . . . . . . . . . . . 73
6.1 Monte Carlo Inverter layout, VDD = 350mV: Midpoint percentage with process
and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.2 Monte Carlo Inverter layout, VDD = 350mV: Delay with process and mismatch. 78
6.3 Monte Carlo NAND2 layout, VDD = 350mV: Midpoint percentage with process
and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4 Monte Carlo NAND2 layout, VDD = 350mV: Delay with process and mismatch. 80
6.5 Monte Carlo NOR2 layout, VDD = 350mV: Midpoint percentage with process
and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.6 Monte Carlo NOR2 layout, VDD = 350mV: Delay with process and mismatch. . 83
6.7 Monte Carlo XNOR2 layout, VDD = 350mV: Midpoint percentage with process
and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.8 Monte Carlo XNOR2 layout, VDD = 350mV: Delay with process and mismatch. 85
6.9 Monte Carlo XOR2 layout, VDD = 350mV: Midpoint percentage with process
and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.10 Monte Carlo XOR2 layout, VDD = 350mV: Delay with process and mismatch. . 88
6.11 Monte Carlo D-FF layout, VDD = 350mV: Clock-to-output propagation delay
with process and mismatch. Left side: rising tco, right side: falling tco. . . . . . 92
6.12 Monte Carlo D-FF layout, VDD = 350mV: Setup time with process and mis-
match. Left side: rising tsu, right side: falling tsu. . . . . . . . . . . . . . . . . 92
6.13 ALU No.1: Corner sim. results of critical path delay in semilog plot and −40◦C. 94
6.14 ALU No.1: Power consumption in the components of total, dynamic and static
with 32KHz, 25◦C and TT corner. . . . . . . . . . . . . . . . . . . . . . . . . 94
6.15 ALU No.2: Corner sim. results of critical path delay in semilog plot and −40◦C. 95
6.16 ALU No.2: Power consumption comparison between use of sub-threshold and
above-threshold cells in terms of total, dynamic and static with 32KHz, 25◦C
and TT corner. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.17 ALU No.3: Corner sim. results of critical path delay in semilog plot and −40◦C. 98
6.18 ALU No.3: Power consumption comparison between use of sub-threshold and
above-threshold cells in terms of total, dynamic and static with 32KHz, 25◦C
and TT corner. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.19 Comparison between ALU design No.1, No.2 and No.3 in power consumption
with 32KHz, 25◦C and TT corner. . . . . . . . . . . . . . . . . . . . . . . . . 99
C.1 Test bench setup to simulate both VTC and switching analysis of NOR2 gate
with one input sinked to GND and the other connected in chain. . . . . . . . . 119
C.2 Test bench setup to simulate both VTC and switching analysis of NOR2 gate
with both inputs connected in chain. . . . . . . . . . . . . . . . . . . . . . . . 120
C.3 Test bench setup to simulate both VTC and switching analysis of XNOR2 gate
with one input sinked to GND and the other connected in chain. . . . . . . . . 120
C.4 Test bench setup to simulate both VTC and switching analysis of XOR2 gate
with one input sourced to VDD and the other connected in chain. . . . . . . . . 121
C.5 Test bench setup of VTC and switching analysis of AOI22 gate with two inputs
sourced to VDD, one input sinked to GND and the other connected in chain. . . 121
C.6 Test bench setup of VTC and switching analysis of OAI22 gate with one input
sourced to VDD, two inputs sinked to GND and the other connected in chain. . . 122
D.1 Monte Carlo INVERTER schematic and layout: mean midpoint percentage and
std. dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . 125
D.2 Monte Carlo INVERTER schematic and layout: propagation mean delay and
std. dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . 126
D.3 Monte Carlo NAND schematic and layout: mean midpoint percentage and std.
dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . 127
D.4 Monte Carlo NAND schematic and layout: propagation mean delay and std.
dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . 128
D.5 Monte Carlo NOR schematic and layout: mean midpoint percentage and std.
dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . 129
D.6 Monte Carlo NOR schematic and layout: propagation mean delay and std.
dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . 130
D.7 Monte Carlo XNOR schematic and layout: mean midpoint percentage and std.
dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . 131
D.8 Monte Carlo XNOR schematic and layout: propagation mean delay and std.
dev. with process and mismatch. . . . . . . . . . . . . . . . . . . . . . . . . . 132
List of Tables
2.1 Tradeoffs between design styles [9]. . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 General sizing strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Approximate threshold voltages for the 130nm HVT technology at VDD =
350mV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Inverter: Design sizes chosen for further investigation. . . . . . . . . . . . . . 37
3.4 NAND2: Design sizes chosen for further investigation. . . . . . . . . . . . . . 39
3.5 NOR2: Design sizes chosen for further investigation. . . . . . . . . . . . . . . 41
3.6 XNOR2: Design sizes chosen for further investigation. . . . . . . . . . . . . . 43
3.7 ALU operations [26]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.8 Logic synthesis results for the ALU cases: (1): no restrictions of logic gates and
FO; (2): restricted to FO3 and INV, NAND2, NOR2 and D FF; (3): restricted as
No.2 + XNOR2, XOR2, AOI22 and OAI22. Synthesis is based on 1.2V, 25◦C,
nominal with Atmel's above-threshold cell library. . . . . . . . . . . . . . . . 57
5.1 DC analysis simulations and expressions where "/Y" is the VTC curve. VS():
nodal voltage (DC sweep), VAR(): variable. . . . . . . . . . . . . . . . . . . . 67
5.2 Transient analysis simulations and expressions where "/A" is the input and "/Y"
is the output of the gate simulated. VT(): nodal voltage (transient analysis),
VAR(): variable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Power simulations and expressions. IT(): terminal current (transient analysis),
VAR(): variable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.1 Chosen gate design dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2 Monte Carlo DC results for the inverter gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3 Monte Carlo AC results for the inverter gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.4 Monte Carlo DC results for the NAND2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.5 Monte Carlo AC results for the NAND2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.6 Monte Carlo DC results for the NOR2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.7 Monte Carlo AC results for the NOR2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.8 Monte Carlo DC results for the XNOR2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.9 Monte Carlo AC results for the XNOR2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.10 Monte Carlo DC results for the XOR2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.11 Monte Carlo AC results for the XOR2 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.12 Monte Carlo DC results for the AOI22 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.13 Monte Carlo AC results for the AOI22 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.14 Monte Carlo DC results for the OAI22 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.15 Monte Carlo AC results for the OAI22 gate with process and mismatch and
extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.16 Monte Carlo AC results for the PowerPC 603 memory element with process
and mismatch and extracted parasitics. . . . . . . . . . . . . . . . . . . . . . . 91
6.17 ALU No.1: Corner functionality results with VDD = 350mV and 400mV
where faulty=7 and pass=3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.18 ALU No.2: Corner functionality results with VDD = 350mV and 400mV
where faulty=7 and pass=3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.19 Estimated Ptotal for the ALU FO3 with VDD = 350mV , nominal corner and
25◦C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.20 ALU No.3: Corner functionality results with VDD = 350mV and 400mV
where faulty=7 and pass=3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.21 Summarized power results from ALU simulation in 32KHz, 25◦C and TT corner. . . . . . 100
D.1 Inverter: Design sizes chosen for further investigation. . . . . . . . . . . . . . 123
D.2 NAND2: Design sizes chosen for further investigation. . . . . . . . . . . . . . 123
D.3 NOR2: Design sizes chosen for further investigation. . . . . . . . . . . . . . . 123
D.4 XNOR2: Design sizes chosen for further investigation. . . . . . . . . . . . . . 124
List of Acronyms
ALU Arithmetic Logic Unit
ASIC Application-Specific Integrated Circuit
CAD Computer-Aided Design
CMOS Complementary Metal–Oxide–Semiconductor
DC Direct Current
D-FF D Flip-Flop
DIBL Drain Induced Barrier Lowering
DRC Design Rule Checking
EDP Energy-Delay Product
FO Fan-Out
FPGA Field-Programmable Gate Array
FSM Finite State Machine
GIDL Gate-Induced Drain Leakage
GND Ground
HDL Hardware Description Language
HVT High VT
IC Integrated Circuit
LS Left Side
LVT Low VT
MCU MicroController Unit
MOSFET Metal–Oxide–Semiconductor Field-Effect Transistor
MOP Multiobjective Optimization Problem
nMOS n-type MOSFET
PDN Pull-Down Network
PDP Power-Delay Product
pMOS p-type MOSFET
PUN Pull-Up Network
PVT Process-Voltage-Temperature
RDF Random Doping Fluctuations
RNCE Reverse Narrow Channel Effect
RSCE Reverse Short Channel Effect
RS Right Side
RTC Real-Time Clock
RVT Regular VT
SOS Silicon On Sapphire
SRAM Static Random Access Memory
Tcl Tool Command Language
TG Transmission Gate
ULP Ultra-Low-Power
ULV Ultra-Low-Voltage
VDD Voltage Drain Drain (in modern usage, the positive supply voltage)
VHDL VHSIC Hardware Description Language
VSS Voltage Source Source (in modern usage, ground or the negative supply voltage)
VTC Voltage Transfer Characteristic
Chapter 1
Introduction
This Master's thesis completes a Master's degree in Electronics, with the study path Circuit
and System Design and the main profile Design of Digital Systems, at the Department of
Electronics and Telecommunications at NTNU. This introductory chapter gives the reader a
brief description of the background and the motivation for the assignment. The chapter also
presents previous work, a problem formulation, objectives, main contributions, and finally
the structure of the report.
1.1 Background and Motivation
The main objective of this master thesis is to design a sub-threshold cell library in a 130nm
process such that the power consumption of various modules in e.g. MCUs is reduced to a
minimum while non-critical performance modes are used. More and more battery-powered
electronic devices are being developed and sold, and there is growing interest in longer-lasting
battery-powered applications. Until everlasting energy comes along, the energy consumption of
the device has to be reduced to achieve longer battery runtime. Lower energy consumption gives
longer-lasting batteries, but it can also allow smaller batteries and perhaps battery-less
applications that rely on energy harvesting. These benefits lead to cheaper systems, because
batteries are smaller and because less or no manpower is needed for battery replacement. In
extreme cases, systems are placed in unreachable environments such as in space, in concrete
constructions and in oil drilling heads, which makes battery charging or replacement difficult
or impossible. Electronic systems today are often designed and produced to perform specific
tasks within small Integrated Circuit (IC) chips. Hence, the power consumption of these chips
should be reduced to reach the goals mentioned above.
The power consumption in static CMOS technology consists of three contributions: switching,
short-circuit and leakage power consumption. The power consumption due to switching activity
depends on the square of the supply voltage [1]. Hence, reducing the supply voltage to
the sub-threshold region is a promising method to reduce power consumption. Reducing VDD
increases the delay through the circuit, but this is an excellent trade-off in applications
with low performance requirements. The reduction also increases sensitivity to process,
voltage and temperature (PVT) variations. Commercial cell libraries are inappropriate for
sub-/near-threshold operation since they are designed to operate at full supply voltage and
are thus not optimized for resource efficiency in this regime. Therefore, specialized
sub-threshold cells should be designed to operate in weak inversion [2].
As early as the late 1960s, it was observed that the threshold of a transistor is not a sharp
threshold after all: drain current continues to flow when the gate voltage is lowered below
the threshold voltage. Eric Vittoz then began researching how to exploit this current in analog
circuits, and ten years later the Vittoz and Fellrath paper was released [3]. This paper
describes the characteristics and models for devices operating in weak inversion. Since Vittoz
and his colleague were working at CEH (Centre Electronique Horloger, a research center of
the Swiss watch industry) at that time, the first wristwatch containing MOS circuits exploiting
the weak inversion region was released on the market as early as 1975 [4].
"Their killer application was the electronic watch, which stands to reason, given that they were
working at CEH, the Centre Electronique Horloger (the research arm of the Swiss watch indus-
try) in Switzerland. The first wristwatch containing weak inversion MOS circuits appeared on
the market in 1975." Y. Tsividis [4].
One of the popular methods to create ASICs today is to describe the architecture and
functionality of a circuit in a Hardware Description Language (HDL). Synthesis tools then map
the HDL code onto the existing building blocks of a cell library to produce the IC design.
The cell library is often designed and well tested for above-threshold voltages (e.g. 1.2V or
3.3V) and provided by companies that may also provide fabrication in the process technology
that the library is based on. Although sub-threshold operation has been known since
the idea was first presented by Eric Vittoz and Jean Fellrath in 1977 [3], Benton H. Calhoun and
David Brooks state in their paper published in 2010 that no commercial applications have yet
adopted this approach [5].
"Many research teams have demonstrated the ability to operate digital complementary metal-
oxide semiconductor (CMOS) chips in the subthreshold or near-threshold region in recent
years, but no commercial applications have yet adopted this approach." B.H. Calhoun and
D. Brooks [5].
Previous Work
Eric Vittoz and Jean Fellrath described the sub-threshold principle in a paper in 1977 as previ-
ously mentioned. As a result of their research the Swiss watch industry used the principle in a
watch in 1975.
Nevertheless, very few commercial products today exploit the concept of operating circuits at
sub-threshold voltages. An Internet search for commercial sub-threshold use turned up the
company Ambiq Micro. They claim to have developed an advanced CMOS semiconductor platform
that they call "SPOT" (Sub-threshold Power Optimized Technology). With this technology they
have developed a Real-Time Clock (RTC) product, and as of today they claim that an MCU product
based on the technology will be released very soon.1 Another company, Iridium Technologies,
also seems to have ongoing research and development on sub-threshold logic, with the addition
of being radiation-hardened.2 This suggests that they, and maybe a few others, have developed
or are developing sub-threshold circuits. Still, there is good reason to believe that a
well-tested sub-threshold cell library for use by synthesis tools may not yet have been
developed.
Many scientific papers present elements and circuits that utilize the concept of sub-threshold
voltage. Unfortunately, these are often not directly transferable to commercial production,
due to the higher demands on yield and robustness in a large-scale production line.
Problem Formulation
How should a logic cell library be designed for use in the sub-threshold voltage region so that
it functions and gives high production yield in spite of Process-Voltage-Temperature (PVT)
variations?
1.2 Objectives
The main objectives for this master thesis are:
1. Design the main basic logical cells in a cell library including: INVERTER, NAND2,
NOR2 and D Flip-Flop (D-FF), and further: XNOR2, XOR2, AOI22 and OAI22 be-
cause they might reduce the number of cells needed and thus reduce die area and power
consumption.
2. Characterize the logic cells and perform Monte Carlo analysis to determine robustness
against PVT variations.
3. Test the designed sub-threshold cell library in a synthesized 8-bit ALU circuit and analyze
performance.
4. Estimate the gains of transforming the design into sub-threshold voltage cells.
1More information about Ambiq Micro at www.ambiqmicro.com.
2More information about Iridium Technologies at www.iridiumtec.com
1.3 Main Contributions
1. Designed a minimum drive strength sub-threshold cell library including the cells needed for
a fully functioning library, i.e. Inverter, NAND2, NOR2 and D-FF, aimed to operate at
a supply voltage of 350mV at 25◦C (and probably lower), with the ability to scale to an
above-threshold voltage of 1.2V.
2. The High Threshold Voltage (HVT) transistor type is used to reduce leakage to a minimum.
3. Static CMOS technology without body biasing is used to reduce complexity and cost of
production.
4. Several more complex gates are designed: XNOR2, XOR2, AOI22 and OAI22.
5. Layout for each of the cells in the library is designed.
1.4 Structure of the Report
The rest of this report is structured as follows.
Chapter 2 presents the theory needed to understand the work of this thesis. The first
part presents theory about semiconductor technology, which is fundamental to understanding
this thesis. Important theory for designing logic gates is presented in Electronic
Analysis of CMOS Logic Gates. Then, theory about MOSFETs in the sub-threshold region is
described, which is required for designing logic gates aimed at sub-threshold supply voltages.
Finally, a brief review of microelectronic design styles gives the reader a better
understanding of what a cell library is.
Chapter 3 presents in detail the design methodology of all the sub-threshold cell elements
designed and proposed in this thesis. Then, the application of the sub-threshold library in an
ALU module versus the use of a larger above-threshold library is presented.
Chapter 4 presents layout of all the cells.
Chapter 5 describes how simulation, verification and testing have been done to obtain
the results.
Chapter 6 presents the final simulation results of this project, especially the performance
of the sub-threshold cells and the results of using the library in an ALU circuit module.
Chapter 7 includes analysis and discussion of the results gained in this thesis.
Chapter 8 concludes the thesis, and a suggestion for further work is presented.
Chapter 2
Theoretical Background
2.1 Semiconductor Technologies
Microelectronic circuits are designed to utilize the properties of semiconductor materials. The
most important and most extensively studied semiconductor materials are silicon, germanium
and, in recent years, gallium arsenide [6]. A semiconductor is a material with a
three-dimensional crystal lattice structure that can have free electrons and/or free holes.
Free electrons and free holes are often referred to as negative carriers and positive carriers
respectively. Silicon is found in high concentration in sand and has a valence of four. This
implies that each silicon atom has four electrons in its outer shell to share with neighboring
atoms, and thus forms covalent bonds with four adjacent atoms as shown in figure 2.1(a).
Intrinsic silicon (undoped silicon) is a very pure crystal lattice and has an equal number of
free electrons and holes. These free electrons have gained enough thermal energy to escape
their bonds, and each escaped electron leaves behind a free hole [7].
When silicon is doped with a pentavalent impurity (i.e. atoms having five electrons in the
outer shell and a valence of five), the impurity atoms are incorporated into the pure silicon
lattice, forming an impure lattice as shown in figure 2.1(b). For each impurity atom there will
be almost one extra free electron which can be used to conduct current. The pentavalent
impurity donates free electrons to the silicon crystal and is known as an n-type dopant.
Commonly used n-type dopants are phosphorus (P) and arsenic (As). Silicon can also be doped
with atoms with a valence of three, of which boron (B) is often used. Boron is called a p-type
acceptor because it can borrow an electron from a neighboring silicon atom, which then becomes
short by one electron. This missing electron forms a hole that can propagate about the lattice
and act as a positive carrier. Doping silicon with this acceptor gives almost one extra hole
for each boron atom [7]. Figure 2.1(c) depicts a silicon lattice doped with the p-type dopant
boron.
Figure 2.1: Three basic bond lattices of a semiconductor: (a) Intrinsic with negligible
impurities; (b) n-type with donor (Arsenic); and (c) p-type with acceptor (Boron) [6].
A Metal-Oxide-Semiconductor (MOS) structure is created by stacking several layers of conducting
and insulating materials. To exploit the properties of semiconductor materials, the circuits
are constructed by first patterning a substrate and locally modifying its properties by
introducing dopants, and then by shaping layers of interconnecting wires. This fabrication
process is often very complex and involves a series of chemical processes. Fabrication
technologies can be classified by the type of semiconductor used and by the electronic device
types being constructed. The most commonly used circuit technology families for silicon
substrates are Complementary Metal Oxide Semiconductor (CMOS), Bipolar, and a combination of
the two named BiCMOS. Within a family there may also be different technologies. The CMOS family
provides two types of transistors: an n-type (nMOS) and a p-type (pMOS), as depicted in
figure 2.2(a) and (b) respectively. These can be created using e.g. single-well (P or N),
twin-well and silicon on sapphire (SOS) technology [8] [9].
Figure 2.2: Cross section of MOS transistors: (a) nMOS transistor; (b) pMOS transistor [8].
2.2 Electronic Analysis of CMOS Logic Gates
In this section, theory for electrical analysis and characterization of logic gates is
presented. Two types of characteristics are needed to characterize logic gates: DC
characteristics (DC analysis) and switching characteristics (transient analysis). Only the
characteristics of an inverter are presented, since the other gates are similar, although
multiple VTC curves may arise due to multiple logic states.
2.2.1 DC Characteristics of CMOS Inverter
The inverter is one of the simplest logic gates and thus serves as a basis for the electrical
characteristics of logic gates. DC analysis determines the output voltage Vout for a given
input voltage Vin. It is assumed that Vin changes so slowly that Vout is allowed to
stabilize before it is sampled. A DC analysis yields a 2D plot that shows Vout as a function
of Vin; an example is shown in figure 2.3. The plot is often called the Voltage Transfer
Characteristic (VTC) curve. In a VTC, Vin is swept from 0 V to VDD, and the resulting
output voltage Vout is recorded [10].
Figure 2.3: Left side (LS): An inverter gate, and right side (RS): Voltage transfer curve for the
inverter.
An inverter gate consists of two MOS transistors (a complementary pair) connected in a network
that switches the output voltage Vout between VDD and ground (gnd). The pMOS (Mp) is
connected to VDD and the nMOS (Mn) is connected to gnd because of their good ability
to pass high and low voltages respectively. Both transistor gates are connected to the input
node Vin. If Vin = 0V, then Mn is OFF while Mp is ON, connecting the output to the power
supply and giving Vout = VDD, as shown in the upper-left region of figure 2.3. The output high
voltage (VOH) is then defined as [10]
VOH = VDD (2.1)
On the other hand, if Vin = VDD, then Mp is OFF while Mn is ON, connecting the output to
0 V (gnd). The output low voltage (VOL) is then defined as [10]
VOL = 0 V (2.2)
The output is called a full-rail output if the logic swing at the output is [10]
$$V_L = V_{OH} - V_{OL} = V_{DD} \tag{2.3}$$
The Logic Voltage Ranges
There are voltage ranges in which logic 0 and logic 1 are defined. These are determined by the
changing slope of the VTC curve. A logic 0 input is defined as the range from 0 V to the point
where the VTC curve first has a slope of -1, marked as point ’a’ in figure 2.3. The point ’a’
is defined as the input low voltage VIL, and the logic 0 input range is defined as [10]
0 ≤ Vin ≤ VIL (2.4)
The logic 1 input is defined as the range from the second point where the VTC has a slope
of -1, marked as point ’b’, to VDD, as shown in the lower-right region of figure 2.3. The
point ’b’ is defined as the input high voltage VIH, giving the logic 1 input range
[10]
VIH ≤ Vin ≤ VDD (2.5)
Noise Margins
The noise margins of a logic gate are a measure of how stable the gate is at both logic 1 and
logic 0 against electromagnetic signal interference. The noise margins for high and low
logic levels are given as [10] [11] [12]
$$V_{NMH} = V_{OH} - V_{IH}, \qquad V_{NML} = V_{IL} - V_{OL} \tag{2.6}$$
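The definitions of VIL, VIH, VM and the noise margins can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical tanh-shaped VTC chosen only for illustration (it is not simulation data from this thesis); the function name `vtc_metrics` is likewise an assumption.

```python
import numpy as np

def vtc_metrics(vin, vout):
    """Extract VIL, VIH, VM and the noise margins from a sampled inverter VTC."""
    slope = np.gradient(vout, vin)
    steep = np.where(slope <= -1.0)[0]        # samples where the gain magnitude is >= 1
    vil, vih = vin[steep[0]], vin[steep[-1]]  # first and last unity-gain points ('a', 'b')
    voh, vol = vout[0], vout[-1]              # output at Vin = 0 and at Vin = VDD
    vm = vin[np.argmin(np.abs(vout - vin))]   # sample closest to Vout = Vin
    return {"VIL": vil, "VIH": vih, "VM": vm,
            "NMH": voh - vih, "NML": vil - vol}

# Hypothetical smooth VTC (assumption, for illustration only).
VDD = 0.35
vin = np.linspace(0.0, VDD, 3501)
vout = 0.5 * VDD * (1.0 - np.tanh(40.0 * (vin - 0.5 * VDD)))
m = vtc_metrics(vin, vout)
```

Because the assumed curve is symmetric around VDD/2, the sketch returns a midpoint at VDD/2 and equal high and low noise margins; a real, asymmetric VTC would give unequal margins.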
Regenerative Property
The effect of different noise sources may accumulate and eventually force a signal level into
the undefined region. This, fortunately, does not happen if the gate possesses the regenerative
property. [12]
The VTC Midpoint
The VTC midpoint is where the input voltage is equal to the output voltage in the VTC curve,
or equivalently where the VTC curve intersects the line Vout = Vin. This point is also referred
to as the midpoint voltage VM and is depicted in figure 2.3 [12].
The ideal VTC of an Inverter
The ideal DC characteristics of an inverter are important as it gives a reference to judge the
quality of non-ideal designed inverters. The ideal inverter DC characteristic is when the VTC
curve has the following properties [12]:
• Infinite gain in the transition region (g = −∞).
• The midpoint located in the middle of the logic swing (typically VDD/2).
• Both high and low noise margins equal to half of the logic swing.
• Input and output impedances of infinity and zero respectively.
Figure 2.4: LS: Ideal VTC curve, and RS: non-ideal VTC curve for an inverter.
The left side of figure 2.4 shows an ideal VTC curve for an inverter. Infinite gain is
demonstrated by the completely vertical curve at the midpoint, which is located
at VDD/2. Points ’a’ and ’b’, where the slope is -1, are located at the midpoint, and the
output high voltage (VOH) and output low voltage (VOL) are VDD and 0 V respectively, yielding
noise margins of half the logic swing.
A non-ideal VTC is shown on the right side of figure 2.4. Although this VTC curve is
non-ideal, it could be sufficient if little noise is present, since its noise margins are
low.
When designing a logic gate, the VTC curve should be as close to the ideal case as possible.
However, the ideal VTC curve is impossible to achieve in practically all real designs.
2.2.2 Switching Characteristics of CMOS Inverter
CMOS circuits are often designed to perform calculations as fast as possible. This leads to the
requirement that the delay through each logic gate in the digital circuit be as low as possible
for a given application. The second important set of characteristics of logic gates is
therefore the switching characteristics, obtained by transient analysis. As for the DC
analysis, the inverter serves as a basis and is used below to describe the switching
characteristics of logic gates.
The difference between DC analysis and transient analysis is that in DC analysis the input
voltage Vin changes so slowly that Vout is allowed to stabilize before sampling, while in
transient analysis Vin changes quickly, so that both Vin(t) and Vout(t) are functions of
time t. Figure 2.5 depicts waveforms for the general case where a unit-step function is applied
to the input of an inverter gate, and shows how the output voltage responds to the abrupt
voltage changes at times t1 and t2. The output reacts to the unit-step input, but it cannot
change as abruptly as the step, which introduces the fall time (tf ) and rise time (tr). This
is due to the parasitic resistances and capacitances of the transistors [10].
Figure 2.5: LS: An inverter gate, and RS: Switching waveforms for the inverter.
The Fall Time
Rise time (tr) and fall time (tf ) are traditionally defined as the time intervals from V0 =
0.1 · VDD to V1 = 0.9 · VDD and from V0 = 0.9 · VDD to V1 = 0.1 · VDD respectively. These
levels are also known as the 10% and 90% voltages, referenced to the full-rail voltage swing
of VDD.
As mentioned earlier, the rise and fall times are due to parasitic resistances and capacitances
in the transistors and to external capacitive loads. Both FETs in the gate can be replaced with
switch equivalents, which results in simplified RC models of the FETs. In the falling case, the
output voltage falls from VDD to 0 V as the output capacitance discharges through the Pull-Down
Network (PDN) of the inverter gate, which is modeled by the resistance Rn of the nMOS
transistor. The discharge current leaving the capacitor satisfies the differential equation [10]
$$i = -C_{out}\,\frac{dV_{out}}{dt} = \frac{V_{out}}{R_n} \tag{2.7}$$
and solving with the initial output voltage Vout = VDD and time constant τn = RnCout gives the
well-known output voltage form
$$V_{out}(t) = V_{DD}\,e^{-t/\tau_n} \tag{2.8}$$
If this solution is rearranged to solve for the time t, and only the interval between the times
when the output voltage is 90% and 10% of VDD is considered, the fall time is
$$t_f = \tau_n \ln\frac{V_{DD}}{0.1\,V_{DD}} - \tau_n \ln\frac{V_{DD}}{0.9\,V_{DD}} = \tau_n \ln(9) \approx 2.2\,\tau_n \tag{2.9}$$
The Rise Time
When the output voltage rises from 0 V to VDD, the output capacitance charges
through the Pull-Up Network (PUN) of the inverter gate, which is modeled by the resistance
Rp of the pMOS transistor. The charging current flowing into the output capacitor is then
given by [10]
$$i = C_{out}\,\frac{dV_{out}}{dt} = \frac{V_{DD} - V_{out}}{R_p} \tag{2.10}$$
and solving with the initial output voltage Vout = 0V and time constant τp = RpCout gives the
other well-known output voltage form
$$V_{out}(t) = V_{DD}\left[1 - e^{-t/\tau_p}\right] \tag{2.11}$$
Again, if this solution is rearranged to solve for the time t, and only the interval between
the times when the output voltage is 10% and 90% of VDD is considered, the rise time is
$$t_r = \tau_p \ln(9) \approx 2.2\,\tau_p \tag{2.12}$$
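The rise and fall time expressions of the RC model can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the resistance and capacitance values are hypothetical and chosen only for illustration, and the function names are not from the thesis.

```python
import math

def fall_time(r_n, c_out):
    """90%-to-10% fall time of the RC model, Eq. (2.9): tf = ln(9) * Rn * Cout."""
    return math.log(9.0) * r_n * c_out

def rise_time(r_p, c_out):
    """10%-to-90% rise time of the RC model, Eq. (2.12): tr = ln(9) * Rp * Cout."""
    return math.log(9.0) * r_p * c_out

def crossing_time_falling(tau, fraction):
    """Time at which Vout(t) = VDD * exp(-t / tau) reaches fraction * VDD, from Eq. (2.8)."""
    return tau * math.log(1.0 / fraction)

# Hypothetical values (assumptions): Rn = 2 Mohm, Rp = 4 Mohm, Cout = 5 fF.
R_N, R_P, C_OUT = 2.0e6, 4.0e6, 5.0e-15
tf = fall_time(R_N, C_OUT)
tr = rise_time(R_P, C_OUT)

# Cross-check Eq. (2.9) against the two crossing times of the exponential discharge.
tau_n = R_N * C_OUT
tf_check = crossing_time_falling(tau_n, 0.1) - crossing_time_falling(tau_n, 0.9)
```

With these assumed values the pull-up resistance is twice the pull-down resistance, so the rise time comes out twice as long as the fall time.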
Propagation Delay
The propagation delay through a cell, or through a chain of logic elements, is the mean of the
High-to-Low (tpHL) and Low-to-High (tpLH) propagation delays, that is, Delay = (tpHL + tpLH)/2.
High-to-Low and Low-to-High refer to the direction of the resulting transition on the output of
the cell or chain. Each propagation delay is defined as the time from when the input signal
crosses 50% to when the output crosses 50%, both percentages referred to VDD [8].
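The 50%-to-50% definition can be applied directly to sampled waveforms. The sketch below assumes idealized waveforms (a short input ramp and an exponential RC output response) with hypothetical time constants; the helper names `crossing_time` and `edge_delay` are illustrative and not part of any simulator API.

```python
import numpy as np

def crossing_time(t, v, level):
    """First time at which waveform v(t) crosses `level`, by linear interpolation."""
    s = np.sign(v - level)
    idx = np.where(s[:-1] * s[1:] <= 0)[0][0]   # first sample pair that brackets the level
    frac = (level - v[idx]) / (v[idx + 1] - v[idx])
    return t[idx] + frac * (t[idx + 1] - t[idx])

def edge_delay(t, vin, vout, vdd):
    """Delay from the 50% crossing of vin to the 50% crossing of vout."""
    return crossing_time(t, vout, 0.5 * vdd) - crossing_time(t, vin, 0.5 * vdd)

# Hypothetical waveforms (assumptions): input ramp centred on t0, RC output response.
VDD, TAU_N, TAU_P = 0.35, 1.0e-6, 2.0e-6
t = np.linspace(0.0, 20e-6, 20001)
t0 = 1.0e-6
ramp = np.clip((t - t0) / 0.1e-6 + 0.5, 0.0, 1.0)
vin_rise, vin_fall = VDD * ramp, VDD * (1.0 - ramp)
decay = lambda tau: np.exp(-np.clip(t - t0, 0.0, None) / tau)
vout_fall = VDD * decay(TAU_N)            # output high-to-low transition
vout_rise = VDD * (1.0 - decay(TAU_P))    # output low-to-high transition

tphl = edge_delay(t, vin_rise, vout_fall, VDD)
tplh = edge_delay(t, vin_fall, vout_rise, VDD)
delay = 0.5 * (tphl + tplh)
```

For these idealized exponential outputs each edge delay equals ln(2) times the corresponding time constant, which gives a convenient sanity check on the measurement routine.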
2.3 The D Flip-Flop Memory Element
A D Flip-Flop (D-FF) is a storage element that holds a bit value like a latch, but with the
difference of being non-transparent. Non-transparency means that the D-FF only updates its
output to the input value on the rising or falling edge of a clock signal, while a latch
is transparent from input to output as long as the enable signal is set. Standard symbols for
positive and negative edge-triggered D-FFs are shown in figure 2.6.
Figure 2.6: LS: Positive edge-triggered D-FF, and RS: Negative edge-triggered D-FF.
The D-FF element is basically designed on the principle of cascading two D-latches in a
master-slave configuration where the latches are clocked on opposite phases. Figure 2.7 depicts
this principle. When the clock signal Clk = 0, the master D-latch propagates the input signal
to the slave D-latch. When the clock makes a transition Clk = 0 → 1, the input value of the
slave latch propagates to the output, and the master latch blocks its input while
Clk = 1 [10].
Figure 2.7: Master-slave configuration of D-latches.
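The master-slave principle can be sketched as a small behavioral model. This is a purely logical illustration with hypothetical class names; it ignores all timing and is not the transistor-level memory element designed in this thesis.

```python
class DLatch:
    """Level-sensitive D-latch: transparent while enable is 1, holds otherwise."""
    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Positive edge-triggered D-FF built from two oppositely clocked D-latches."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        m = self.master.update(d, enable=not clk)  # master transparent while Clk = 0
        return self.slave.update(m, enable=clk)    # slave transparent while Clk = 1


def simulate(d_seq, clk_seq):
    """Apply one (D, Clk) pair per step and collect the output Q."""
    ff = MasterSlaveDFF()
    return [ff.update(d, clk) for d, clk in zip(d_seq, clk_seq)]


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 1, 0, 0, 0, 0, 1]
q = simulate(d, clk)
```

In this example Q only changes at the steps where Clk goes from 0 to 1, taking the value D had just before the edge; the changes of D while Clk = 1 are blocked by the master latch.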
2.3.1 D FF Timing and Delay
All timing parameters described next are illustrated in the timing diagram of figure 2.8.
The Setup Time
The input data must be valid and stable a certain time before the clock signal makes an edge-
triggered transition. This minimum time is because the input data D has to propagate through
the master latch and be applied to the input of the slave before a clock edge-triggered transition
can be applied. Hence, the setup time tsu is the minimum time that the input D has to be stable
before the clock Clk makes the edge-triggered transition [11].
The Hold Time
Assuming that the input data D has been stable for tsu complying with the minimum setup
time described above, the input data also needs to be stable for a minimum time after the clock
transition. This minimum required time is called the hold time th [11].
The Propagation Delay
The propagation delay tco is the time from a valid clock edge transition until the input value
has propagated to the output Q. It is also known as the clock-to-output time.
Figure 2.8: Setup-, hold time and propagation delay of D-FF.
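The setup and hold requirements together define a window around the active clock edge in which D must not change. A minimal sketch of such a check follows; the function name and the numbers (in microseconds) are hypothetical and only illustrate the definitions above.

```python
def check_timing(t_clk_edge, data_change_times, t_su, t_h):
    """Return the data transitions that fall inside the setup/hold window
    (t_clk_edge - t_su, t_clk_edge + t_h) around one active clock edge."""
    lo, hi = t_clk_edge - t_su, t_clk_edge + t_h
    return [t for t in data_change_times if lo < t < hi]

# Hypothetical values (assumptions), all in microseconds.
T_SU, T_H = 2.0, 0.5
violations = check_timing(10.0, [3.0, 9.0, 10.2, 12.0], T_SU, T_H)
clean = check_timing(10.0, [3.0, 12.0], T_SU, T_H)
```

Here the transition at 9.0 violates the setup time and the one at 10.2 violates the hold time, while transitions well outside the window are accepted.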
2.4 CMOS Power Consumption
In CMOS transistor circuits, the average power consumption is equal to [1]:
$$\begin{aligned} P_{avg} &= P_{switching} + P_{\text{short-circuit}} + P_{leakage} \\ &= \alpha \cdot f_{clk} \cdot C_L \cdot V_{DD}^2 + I_{sc} \cdot V_{DD} + I_{leakage} \cdot V_{DD} \end{aligned} \tag{2.13}$$
where Pswitching is the power consumed by switching activity. The second term,
Pshort−circuit, is due to the direct-path short-circuit current Isc, which arises when both the
nMOS and the pMOS are conducting during a transition of Vgs from high to low or low to high.
The last term, Pleakage, is the power due to leakage currents arising from substrate injection
and sub-threshold effects.
Figure 2.9: The dynamic, short-circuit and leakage power component of CMOS power con-
sumption [13].
2.4.1 Dynamic Power Consumption
Power consumption due to switching activity is also referred to as dynamic power consumption.
It arises in CMOS circuits when a capacitive load CL is charged or discharged in low-to-high
or high-to-low transitions respectively. When a low-to-high transition occurs, energy is
drawn from VDD through the pMOS, charging CL. This energy is equal to $C_L V_{DD}^2$, of which
half is dissipated by the pMOS and half is stored in CL. In a high-to-low transition, the
energy held by CL, which is $\frac{1}{2} C_L V_{DD}^2$, is dissipated through the nMOS to
ground. Note that the total dynamic energy drawn from the supply in a transition pair is
therefore only $C_L V_{DD}^2$. If such transition pairs occur at a rate of fclk, the
average dynamic power consumption is $f_{clk} C_L V_{DD}^2$. In most cases, however, the
transition rate is lower than fclk and is related to the probability of a transition occurring
in a given circuit. A number α in the range [0, 1] is defined as the average number of
low-to-high transitions occurring in each clock cycle. Thus, the total average dynamic power
consumption is as shown in the second line of equation (2.13) and in equation (2.14) [13].
$$P_{switching} = \alpha \cdot f_{clk} \cdot C_L \cdot V_{DD}^2 \tag{2.14}$$
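Equations (2.13) and (2.14) translate directly into code. The sketch below evaluates them at a hypothetical operating point; the activity factor, load capacitance and leakage current are assumptions chosen only for illustration, not measured values from this thesis.

```python
def switching_power(alpha, f_clk, c_load, vdd):
    """Eq. (2.14): P_switching = alpha * f_clk * C_L * VDD^2."""
    return alpha * f_clk * c_load * vdd ** 2

def average_power(alpha, f_clk, c_load, vdd, i_sc, i_leak):
    """Eq. (2.13): switching + short-circuit + leakage power."""
    return switching_power(alpha, f_clk, c_load, vdd) + i_sc * vdd + i_leak * vdd

# Hypothetical operating point (assumptions): alpha = 0.1, f_clk = 32 kHz,
# C_L = 1 pF, VDD = 350 mV, no short-circuit current, 1 nA leakage current.
p_sw = switching_power(alpha=0.1, f_clk=32e3, c_load=1e-12, vdd=0.35)
p_avg = average_power(0.1, 32e3, 1e-12, 0.35, i_sc=0.0, i_leak=1e-9)
```

With these assumed numbers the leakage term is of the same order as the switching term, which illustrates why leakage cannot be neglected at low clock frequencies.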
2.4.2 Short-Circuit Power Consumption
The short-circuit contribution to power consumption depends on the rise and fall times of
output nodes in a logic circuit. If the rise and fall times of the transitions described in the
previous section were infinitely short, the short-circuit contribution would be zero. However,
finite rise and fall times give rise to a short-circuit path from VDD to GND, and the resulting
dissipation increases with the rise and fall times. There is only a short-circuit
path while Vtn < Vin < VDD − |Vtp|, which makes both the nMOS and the pMOS ON, where Vtn and
Vtp are the threshold voltages of the nMOS and pMOS respectively [14].
2.4.3 Leakage Power Consumption
Even when a transistor is in a stable logic state, it continues to consume power due to
undesirable leakage. The leakage component of power consumption has the following
contributions: sub-threshold leakage, reverse-biased diode leakage, Gate-Induced Drain Leakage
(GIDL), and gate oxide tunneling leakage [13]. The leakage component is often referred to as
the static power consumption.
Figure 2.10: The components of leakage power consumption [13].
2.4.4 Techniques to Reduce Power Consumption
Dynamic Power Reduction
As shown in equation (2.14), the dynamic power component is proportional to the square of the
supply voltage VDD. Thus, reducing the supply voltage significantly reduces power consumption.
For example, halving VDD reduces the dynamic power to a quarter (1/4) of the original power
consumption. Another easily adjustable term in the equation is the clock frequency fclk. A
reduction in frequency reduces power consumption proportionally. However, reducing the
frequency gives a throughput penalty, which means that a duty-cycled application (one that
computes and then sleeps) may have to be awake for a longer time, increasing
power consumption.
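The quadratic dependence on VDD can be made concrete with a short calculation. This sketch assumes that α and CL stay unchanged when the supply is scaled, which is a simplification; the function name is illustrative.

```python
def dynamic_power_ratio(vdd_new, vdd_old, f_new=1.0, f_old=1.0):
    """Ratio of switching power after scaling VDD and f_clk, from Eq. (2.14),
    with alpha and C_L assumed unchanged."""
    return (vdd_new / vdd_old) ** 2 * (f_new / f_old)

half = dynamic_power_ratio(0.6, 1.2)    # halving VDD at constant frequency
sub = dynamic_power_ratio(0.35, 1.2)    # scaling from 1.2 V down to 350 mV
```

Under this assumption, scaling from the 1.2V supply to the 350mV supply considered in this thesis reduces the switching power to roughly 8.5% of its original value at the same clock frequency, before any frequency reduction is taken into account.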
Short-circuit Power Reduction
The short-circuit component of power consumption is only present if the condition Vtn < Vin <
VDD − |Vtp| holds. If the supply voltage is lowered below the sum of the thresholds, so that
VDD < Vtn + |Vtp|, the short-circuit power dissipation is eliminated because
both transistors will not be ON simultaneously [13] [14].
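The condition above can be expressed as a small helper that returns the input-voltage window in which both transistors are ON. The threshold values used here are hypothetical (the actual HVT thresholds of the process are not stated in this section), and the check only reflects the first-order condition given in the text.

```python
def short_circuit_window(vdd, vtn, vtp_abs):
    """Input-voltage range (Vtn, VDD - |Vtp|) in which both transistors are ON,
    or None when VDD < Vtn + |Vtp| and the window is empty."""
    lo, hi = vtn, vdd - vtp_abs
    return (lo, hi) if lo < hi else None

# Hypothetical thresholds (assumptions): Vtn = |Vtp| = 0.45 V.
above = short_circuit_window(1.2, 0.45, 0.45)    # above-threshold supply
below = short_circuit_window(0.35, 0.45, 0.45)   # sub-threshold supply
```

For the assumed thresholds a 1.2V supply leaves a window from 0.45 V to 0.75 V, while at 350mV the window is empty and the first-order short-circuit path disappears.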
Leakage Power Reduction
Different methods to reduce the leakage power consumption are discussed in [13]. Without going
into details, these include:
• Multiple supply voltage.
• Multiple threshold voltage (HVT type is used whenever speed is not critical).
• Adaptive body biasing (Effective, but fabrication is complex due to need of twin or triple
well technology).
• Transistor stacking (Design methodology).
• Power gating (Particularly useful for duty-cycled applications where blocks on chip can
be turned off while in "sleep").
2.5 MOSFETs in the Sub-Threshold Region
2.5.1 Operation of MOS Transistor in Sub-threshold Region
Reducing the supply voltage VDD has been shown to be an effective method of reducing the power
consumption (see section 2.4 for more details). The supply voltage can be scaled down below
the threshold voltage of the transistors VT (a common name for Vtn and Vtp), into
the sub-threshold region. In the sub-threshold region, the gate-source voltage Vgs is smaller
than VT, which gives a negative effective voltage Veff = Vgs − VT, and the transistor is thus in
weak inversion. The transistor is, on the other hand, said to be in strong inversion if Veff is
greater than about 100mV, and the region between weak and strong inversion is called moderate
inversion [7].
In sub-threshold region, the square-law equation that relates the current to the voltage of a tran-
sistor is no longer valid. Instead the transistors obey an exponential voltage-current relationship.
In sub-threshold region, the drain current is dominated by the sub-threshold contribution IST
over gate current IG and junction current IJ , and is approximately [15] [16]
$$ I_{d(\text{sub-th})} \approx I_0 \frac{W}{L}\, e^{(V_{gs}-V_T)/(n V_{th})} \left(1 - e^{-V_{ds}/V_{th}}\right) \qquad (2.15) $$
where I0 is the technology-dependent drain current extrapolated for Vgs = VT, VT is the
transistor threshold voltage, n is the sub-threshold slope factor (n = 1 + Cd/Cox), and Vth is
the thermal voltage Vth = kT/q. The rightmost parenthesized term of equation 2.15 models the
roll-off of the current, which occurs when Vds is lower than a few times the thermal voltage Vth.
Figure 2.11 illustrates the relationship between the drain current Id and control voltage Vgs for
an arbitrary nMOS transistor with VT ≈ 0.5V . The inversion regions are marked as weak, mod-
erate and strong, and it is shown in the weak region that the drain current varies exponentially
with Vgs.
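Equation (2.15) can be evaluated directly to see the exponential dependence on Vgs. The sketch
below is illustrative only: the device parameters (VT, n, I0, W/L) are hypothetical and the
helper names are not from the thesis. It also computes the sub-threshold swing n · Vth · ln(10),
the gate voltage step that changes the current by one decade according to (2.15).

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant [J/K]
Q_ELECTRON = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_c):
    """Thermal voltage Vth = kT/q in volts for a temperature in degrees Celsius."""
    return K_BOLTZMANN * (temp_c + 273.15) / Q_ELECTRON


def subthreshold_current(vgs, vds, vt, n, i0, w_over_l, temp_c=25.0):
    """Drain current according to equation (2.15)."""
    vth = thermal_voltage(temp_c)
    gate_term = math.exp((vgs - vt) / (n * vth))
    rolloff_term = 1.0 - math.exp(-vds / vth)
    return i0 * w_over_l * gate_term * rolloff_term


def subthreshold_swing(n, temp_c=25.0):
    """Gate voltage step that changes the current by one decade [V/decade]."""
    return n * thermal_voltage(temp_c) * math.log(10.0)


# Hypothetical device: VT = 0.5 V, n = 1.3, I0 = 100 nA, W/L = 1.
params = dict(vt=0.5, n=1.3, i0=1e-7, w_over_l=1.0)
swing = subthreshold_swing(params["n"])
i_low = subthreshold_current(vgs=0.20, vds=0.35, **params)
i_high = subthreshold_current(vgs=0.20 + swing, vds=0.35, **params)
decade_ratio = i_high / i_low  # one swing step gives one decade of current
```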
Figure 2.11: Arbitrary Id current versus Vgs (on a semilogarithmic scale), showing the
exponential characteristics in the sub-threshold region, marked as the weak region in the
figure. The other regions are marked as the moderate region, from VT to approximately 100mV
above VT, and the strong region above that [12].
2.5.2 The Threshold Voltage
The threshold voltage VT of a MOSFET transistor in sub-threshold region depends on the drain-
source voltage Vds through the drain induced barrier lowering (DIBL) effect and the bulk-source
voltage Vbs through the body effect. Thereby, the threshold voltage is expressed as [16]
VT = VTH0 − λdsVds − λbsVbs (2.16)
where VTH0 is the intrinsic threshold voltage when Vds = Vbs = 0, λds > 0 is the DIBL
coefficient and λbs > 0 is the body effect coefficient. Combining equations 2.15 and 2.16
shows, as in [16], how the Id current additionally depends on Vds and Vbs through the threshold
voltage:
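$$ I_{d(\text{sub-th})} = \beta \cdot e^{V_{gs}/(n V_{th})} \cdot \left[ e^{\lambda_{ds} V_{ds}/(n V_{th})} \left(1 - e^{-V_{ds}/V_{th}}\right) \right] \qquad (2.17) $$
$$ \beta = I_0 \frac{W}{L}\, e^{-(V_{TH0} - \lambda_{bs} V_{bs})/(n V_{th})} \qquad (2.18) $$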
where all other terms are grouped in the parameter β, which represents the "transistor strength".
The transistor strength can be tuned by modifying the aspect ratio (W/L) or the bulk-source
voltage Vbs. However, an increase in W leads to an increase of the threshold voltage due to
the reverse narrow channel effect (RNCE), which has most impact on VT when
W ∼ Wmin. The expected increase in transistor strength can thus be more than offset by the
increase in threshold voltage. On the other hand, an increase in L can increase the
transistor strength thanks to the reverse short channel effect (RSCE), which decreases the
threshold voltage as L is increased. Sizing transistors for ULV is therefore very different
from sizing for above-threshold operation and is strongly technology dependent [17].
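As a consistency check, the grouped form (2.17)-(2.18) can be compared numerically with the
direct substitution of (2.16) into (2.15). The sketch below uses hypothetical parameter values
(including the DIBL and body effect coefficients) and a fixed thermal voltage near 25 °C; none
of the numbers describe the actual 130nm process.

```python
import math

V_THERMAL = 0.0257  # thermal voltage near 25 degC [V]

# Hypothetical device parameters, for illustration only.
DEVICE = dict(i0=1e-7, w_over_l=1.0, n=1.3, vth0=0.45, lam_ds=0.05, lam_bs=0.10)


def threshold_voltage(vds, vbs, p):
    """Equation (2.16): VT lowered by DIBL (Vds) and by the body effect (Vbs)."""
    return p["vth0"] - p["lam_ds"] * vds - p["lam_bs"] * vbs


def id_direct(vgs, vds, vbs, p):
    """Equation (2.15) with VT taken from equation (2.16)."""
    vt = threshold_voltage(vds, vbs, p)
    return (p["i0"] * p["w_over_l"] * math.exp((vgs - vt) / (p["n"] * V_THERMAL))
            * (1.0 - math.exp(-vds / V_THERMAL)))


def beta(vbs, p):
    """Equation (2.18): the 'transistor strength'."""
    return (p["i0"] * p["w_over_l"]
            * math.exp(-(p["vth0"] - p["lam_bs"] * vbs) / (p["n"] * V_THERMAL)))


def id_grouped(vgs, vds, vbs, p):
    """Equation (2.17)."""
    n_vth = p["n"] * V_THERMAL
    return (beta(vbs, p) * math.exp(vgs / n_vth)
            * math.exp(p["lam_ds"] * vds / n_vth) * (1.0 - math.exp(-vds / V_THERMAL)))


i_a = id_direct(vgs=0.35, vds=0.35, vbs=0.0, p=DEVICE)
i_b = id_grouped(vgs=0.35, vds=0.35, vbs=0.0, p=DEVICE)
```

Both functions return the same current, which confirms that β only collects the terms that do
not depend on Vgs or Vds.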
2.5.3 nMOS / pMOS Imbalance Factor
The strengths (β) of the pMOS and nMOS transistors are of particular interest, especially the
imbalance between them. The strengths should be close to each other to ensure sufficient noise
margins and reasonably equal rise and fall times.
At above-threshold voltages, the nMOS strength is often twice the pMOS strength for equal
dimensions. However, at ultra-low voltages and in sub-threshold the nMOS/pMOS imbalance is
usually higher. The imbalance factor is defined as whichever of the two strength ratios between
nMOS and pMOS is greater than or equal to 1, regardless of which device is the stronger one [17].
$$ IF = \max\left[ \frac{\beta_p}{\beta_n}, \frac{\beta_n}{\beta_p} \right] \geq 1 \qquad (2.19) $$
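Equation (2.19) translates into a one-line function. The sketch below is illustrative; the
function name and the strength values in the example are arbitrary.

```python
def imbalance_factor(beta_p, beta_n):
    """Equation (2.19): the larger of the two strength ratios, always >= 1."""
    if beta_p <= 0 or beta_n <= 0:
        raise ValueError("transistor strengths must be positive")
    return max(beta_p / beta_n, beta_n / beta_p)


# The result does not depend on which device is the stronger one.
if_pmos_stronger = imbalance_factor(beta_p=8.0, beta_n=2.0)
if_nmos_stronger = imbalance_factor(beta_p=2.0, beta_n=8.0)
if_balanced = imbalance_factor(beta_p=3.0, beta_n=3.0)
```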
2.5.4 Delay in Saturated MOSFETs
The delay of a CMOS circuit depends on the supply voltage VDD as shown in equation (2.20)
taken from [13] [14]:
$$ T_d = \frac{C_L \cdot V_{DD}}{I} = \frac{C_L \cdot V_{DD}}{\frac{\mu C_{ox}}{2}(W/L)(V_{DD}-V_T)^2} = k \cdot \frac{C_L \cdot V_{DD}}{(V_{DD}-V_T)^2} \qquad (2.20) $$
where, in the first term, CL is the load capacitance, VDD is the supply voltage and I is the
current through either the pMOS or the nMOS during the transition. In the second term, I is
replaced by the square-law model with Vgs = VDD, as is the case for digital circuits, and VT is
the transistor threshold voltage. In the last term, the transistor parameters are collected
into k for simplicity.
From the first term in equation (2.20), one might at first suppose that a reduction in VDD
would reduce the delay. However, the last term of equation (2.20) shows that (VDD − VT) is
squared in the denominator, so the delay increases steeply as VDD is reduced towards VT.
2.5.5 Delay in Sub-threshold Region of MOSFETs
In the near- and sub-threshold regions, a different equation is needed to model the current
passing through a MOSFET. The form of the delay expression is, however, adopted from (2.20).
The result is equation (2.21), taken from [15]:
$$ T_d = k \cdot \frac{C_L \cdot V_{DD}}{I_0 \exp\left(\frac{V_{DD}-V_T}{n V_{th}}\right)} \qquad (2.21) $$
where k is a fitting parameter as in equation (2.20) and Vth is the thermal voltage, i.e.
Vth = kT/q (in this expression k is Boltzmann's constant).
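The two delay models can be compared with a short script. This is a sketch under stated
assumptions: k, CL, I0, n and VT are hypothetical values, and the fitting parameter k is set to
1 in both models purely for illustration.

```python
import math


def delay_above_threshold(vdd, vt, c_load, k):
    """Equation (2.20), last form: Td = k * CL * VDD / (VDD - VT)^2, valid for VDD > VT."""
    if vdd <= vt:
        raise ValueError("the square-law model requires VDD > VT")
    return k * c_load * vdd / (vdd - vt) ** 2


def delay_subthreshold(vdd, vt, c_load, k, i0, n, v_thermal):
    """Equation (2.21): Td = k * CL * VDD / (I0 * exp((VDD - VT) / (n * Vth)))."""
    return k * c_load * vdd / (i0 * math.exp((vdd - vt) / (n * v_thermal)))


# Hypothetical parameters, not the values of the 130nm process.
SUB = dict(vt=0.40, c_load=1e-15, k=1.0, i0=1e-7, n=1.3, v_thermal=0.0257)

d_350 = delay_subthreshold(vdd=0.35, **SUB)
d_300 = delay_subthreshold(vdd=0.30, **SUB)
slowdown_50mv = d_300 / d_350  # delay penalty for lowering VDD by 50 mV

d_1v2 = delay_above_threshold(vdd=1.2, vt=0.40, c_load=1e-15, k=1.0)
d_0v8 = delay_above_threshold(vdd=0.8, vt=0.40, c_load=1e-15, k=1.0)
```

In the sub-threshold model the slowdown for a fixed voltage step is set by exp(ΔV / (n · Vth)),
which is why small supply changes have a large effect on timing.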
2.5.6 High Fan-in Problematics in Sub-threshold Voltage
A cell library often contains complex gates with various numbers of stacked and/or parallel
devices (transistors) connected to a node. However, when the supply voltage is lowered
to near/sub-threshold, it is discussed in [15] how cells with a large number of parallel or
stacked devices significantly raise the lowest supply voltage at which the cell can function.
One problem mentioned in [15] is that the Ion/Ioff ratio is effectively degraded when several
parallel OFF transistors increase Ioff. While the cell is in strong inversion, this
degradation does not affect functionality and there is no problem pulling an output node high.
However, when the voltage is scaled down into the sub-threshold region, the Ioff current can
dominate the Ion current of a single pull-up transistor, causing functional failures.
Another problem mentioned in [15] arises when several devices are stacked. Two effects
of stacking are discussed. The first is that when two stacked devices conduct current,
the drive current is approximately halved. The second is that the threshold voltage of a
stacked device increases as the source-to-body voltage Vsb increases.
In [15], a comparison simulation is discussed between a 2-input and a 3-input NAND gate. It
is stated that the 3-input gate required about 15mV higher minimum supply voltage than the
2-input gate.
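The combined effect of stacking and parallel leakage can be illustrated with a deliberately
crude first-order model. The assumptions are that the ON current of a stack scales as 1/depth
(generalizing the halving for two devices mentioned above), that the OFF currents of parallel
devices simply add, and that the body effect and any leakage reduction in OFF stacks are
ignored. The current values are hypothetical.

```python
import math


def effective_on_off_ratio(i_on, i_off, stacked_on, parallel_off):
    """First-order Ion/Ioff of a gate: stacked ON devices against parallel OFF devices."""
    drive = i_on / stacked_on        # assumption: drive current scales as 1/stack depth
    leakage = parallel_off * i_off   # assumption: parallel OFF currents add up
    return drive / leakage


# Hypothetical single-device currents in sub-threshold.
I_ON, I_OFF = 1e-8, 1e-11

ratio_inv = effective_on_off_ratio(I_ON, I_OFF, stacked_on=1, parallel_off=1)
ratio_2in = effective_on_off_ratio(I_ON, I_OFF, stacked_on=2, parallel_off=2)
ratio_3in = effective_on_off_ratio(I_ON, I_OFF, stacked_on=3, parallel_off=3)
```

In this model the ratio degrades with the square of the fan-in, which is one way to see why
cells with large fan-in need a higher minimum supply voltage.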
2.5.7 Robustness
Robustness may be defined as "the ability of a system to resist change without adapting its initial
stable configuration". Other definitions exist, but in the context of digital CMOS design
a more fitting definition is "the ability of a logic element or system to withstand
Process-Voltage-Temperature (PVT) variations".
Process Variations
Process variations are manufacturing variations that cause film thickness, lateral dimensions
and doping concentration to vary. The variations can be classified as inter-die or global
(influencing all transistors on a die equally) and intra-die or local (transistor mismatch
within a die due to e.g. the random placement of implanted dopant atoms) [8].
Supply Voltage Variations
The supply voltage may vary due to fluctuations in the external supply, IR drop along the
supply rails, and other effects. However, sub-threshold operation draws less current than
above-threshold operation, so the IR drop in particular can be neglected [17].
Temperature Variations
Temperature variations are due to the environment, because there is no significant source of
self-heating at sub-threshold supply voltages. The temperature nevertheless affects Ion and
Ioff, and both currents increase with temperature [17]. Standard ambient temperature
ranges are: Commercial [0, 70 ◦C]; Industrial [-40, 85 ◦C]; and Military [-55, 125
◦C] [8].
2.5.8 Corner Simulation
Corner simulation is a single simulation run performed with one of the transistor model
corners. The corners are process corners that model the effect of process variations by
placing the transistors in worst, nominal and best cases. These corners are often named typical
(also called nominal), fast and slow. Each corner can influence pMOS and nMOS independently,
which gives five corners: TT (Typical nMOS, Typical pMOS), FF (Fast nMOS, Fast pMOS), SS
(Slow nMOS, Slow pMOS), SF (Slow nMOS, Fast pMOS) and FS (Fast nMOS, Slow pMOS).
The TT corner is, however, not an actual corner but a point in between the other four, and it
models the mean expected case [8]. These five process corners can further be combined with
wire, VDD and temperature corners. In this thesis, however, the corners are only combined with
the temperatures −40◦C, 25◦C and 85◦C.
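The resulting simulation matrix is the Cartesian product of the five process corners and the
three temperatures. The sketch below only enumerates the combinations; how each run is passed
to a circuit simulator is tool specific and is not shown, and the dictionary keys are
hypothetical names.

```python
from itertools import product

PROCESS_CORNERS = ("TT", "FF", "SS", "SF", "FS")  # first letter nMOS, second letter pMOS
TEMPERATURES_C = (-40, 25, 85)


def corner_runs():
    """Return one run description per (process corner, temperature) combination."""
    return [{"corner": corner, "temp_c": temp}
            for corner, temp in product(PROCESS_CORNERS, TEMPERATURES_C)]


runs = corner_runs()
```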
2.5.9 Monte Carlo Simulation
Monte Carlo simulation is used to find the influence of random variations on a circuit. Monte
Carlo simulation repeats a simulation with different randomly selected parameters in a model.
Hence, a model file with a statistical distribution has to be available in order to do Monte Carlo
simulation. Manufacturers often provide such model files. The results most often reported from
Monte Carlo simulation are the mean, minimum, maximum and standard deviation σ [8].
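The principle can be mimicked with a toy example in Python. This is not how a circuit simulator
performs Monte Carlo analysis with foundry models; it is a minimal sketch that assumes a
Gaussian threshold voltage and the exponential current dependence of equation (2.15), with
hypothetical values for the mean, the spread and the slope factor.

```python
import math
import random
import statistics


def monte_carlo_ion(runs, vt_mean, vt_sigma, vgs, n=1.3, v_thermal=0.0257, seed=1):
    """Sample VT from a normal distribution and summarize the normalized ON current."""
    rng = random.Random(seed)  # fixed seed so that the experiment is repeatable
    samples = []
    for _ in range(runs):
        vt = rng.gauss(vt_mean, vt_sigma)
        # Current normalized to I0 * W/L, with the roll-off term neglected.
        samples.append(math.exp((vgs - vt) / (n * v_thermal)))
    return {
        "mean": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
        "std": statistics.stdev(samples),
    }


stats = monte_carlo_ion(runs=200, vt_mean=0.40, vt_sigma=0.02, vgs=0.35)
```

Because the current depends exponentially on VT, a symmetric VT distribution produces a skewed
(log-normal) current distribution, so the minimum and maximum are not equally far from the mean.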
2.6 Microelectronic Design Styles
Different design styles or methodologies have been used to design integrated circuits. They can
basically be classified into full-custom and semi-custom design styles. In the full-custom
style, the functional and physical designs are handcrafted and thus require a comprehensive
effort from the design team to achieve satisfactory performance in every detail of the circuit.
The high design effort leads to long design time and high cost, but the effort is often rewarded
with high-quality circuits. Semi-custom design, on the other hand, is based on a restricted
number of predesigned primitives, reducing design time and cost by exploiting well-designed and
well-characterized primitives. Because fine-tuning a large and complex full-custom design may be
extremely difficult, and because automated optimization with Computer-Aided Design (CAD) tools
is possible, the loss in quality in semi-custom design is often very small. Nowadays,
semi-custom designs outnumber full-custom designs.
Figure 2.12: Microelectronic design styles [9].
Semi-custom designs are partitioned into two major classes: cell-based design and
array-based design. These major classes are further partitioned into subclasses, as shown in
figure 2.12. Cell-based design makes use of library cells, which can be designed once and
stored. Cell-based design can also use cell generators that synthesize macro-cell
layouts from functional specifications.
In cell-based design using standard cells, the fundamental cells are designed once and stored
in a library. However, as the semiconductor process technology scales down, updates to the cell
library are required. Each cell needs to be parameterized in terms of area and
delay over ranges of supply voltages and temperatures, and thus the maintenance of a library is
not a trivial task.
Cell-based design using macro-cells consists of combining building blocks that have been
synthesized by programs called cell or module generators. These generator
programs vary widely in capability and have evolved over the last decades. To use a
macro-cell generator, one has to provide the functional description. The macro-cells are then
placed and wired. Although these steps have been automated through software, they are more
difficult and may be less efficient than standard-cell placement and wiring because of the
irregular size of macro-cells [9].
When choosing the design style for the development of microelectronic circuits, a trade-off
between performance and design cost and time is often considered. A full-custom design may
only be justified if a high production volume is assumed; however, it may involve higher risk
with a longer time to market due to the very long design time. On the other hand, if a reduction
in performance is acceptable, one could choose a cell-based semi-custom design style to reduce
the time to market and thus reduce the risk of manufacturing a lower volume than anticipated. A
comparison between design styles in terms of density, performance, flexibility, design time,
manufacturing time and cost is shown in table 2.1 and can be used as a guideline in choosing a
style for a given application and market.
Table 2.1: Tradeoffs between design styles [9].
| | Custom | Cell-based | Prediffused | Prewired |
|---|---|---|---|---|
| Density | Very High | High | High | Medium-Low |
| Performance | Very High | High | High | Medium-Low |
| Flexibility | Very High | High | Medium | Low |
| Design time | Very Long | Short | Short | Very Short |
| Manufacturing time | Medium | Medium | Short | Very Short |
| Cost - low volume | Very High | High | High | Low |
| Cost - high volume | Low | Low | Low | High |
Chapter 3
Design Methodology and Application of the
Sub-Threshold Cells
This chapter presents the design of cell library elements/gates targeting a sub-threshold supply
voltage of 350mV. The sub-threshold logic elements are designed with minimum drive strength
in an available 130nm HVT technology. The cells and logic elements designed
are: Inverter, NAND2, NOR2, XNOR2, XOR2, AOI22, OAI22 and D flip-flop. The sub-threshold
cells are further applied in an ALU circuit to test the performance and the power reduction
gained.
3.1 Library Specifications
Before the design phase of the sub-threshold cell library, initial specifications were established.
The requirements were:
• A circuit built with the sub-threshold cell library should be able to operate at a
frequency of at least 32kHz.
• The main focus should be robustness, not lowest voltage / lowest power consumption.
• The library should include at least the minimum basic logic functions in order to create
functioning circuits with use of synthesis tools (Inverter, NAND2, NOR2 and D-FF).
• Additionally, complex gates such as XNOR2, XOR2, AOI22 and OAI22 may be included
if time allows.
• The temperature range is set to [-40, 85◦C], a standard for industrial products.
3.2 Design of Sub-Threshold Cells
Designing a cell library for sub-threshold operation is very different from designing one for
above-threshold supply voltages. In a sub-threshold cell library, cells with a fan-in larger
than 2-3 should be avoided, particularly logic cells that stack transistors of the weaker type
(nMOS or pMOS), because they increase the VDD needed to make the cell fully functional. This is
because the effective Ion/Ioff ratio is lowered by the decreased Ion from increased stacking in
one network and by the increased Ioff from more parallel transistors in the other network
(PDN/PUN). Consequently, the increase in VDD would probably make the library infeasible for
ULP operation [15].
Synthesized circuits for ULP often require cells with small drive strengths. Hence, very few
strength versions of each cell may be needed in a ULP library. Stronger cells are often needed
in critical paths where the speed must be increased to meet the clocking requirement. However,
where such critical paths occur in ULP circuits, it may be better to connect minimum-strength
cells in parallel to increase the overall strength and thereby achieve better speed [17]. From
another perspective, the number of cells in a ULP library should be small (in the order of tens)
so that the development and porting costs from technology scaling are as low as possible.
Based on the considerations above, the proposed sub-threshold cell library is designed with a
maximum fan-in of 2 and with minimum drive strengths. The logic elements designed
and described further in this chapter are: Inverter, NAND2, NOR2 and the memory element D-FF.
These may be the minimum set of logic cells required to create area- and power-efficient
circuits. Beyond this, four cells are designed to extend the library so that synthesized
circuits may be more area, delay and power efficient. These are: XOR2, XNOR2, AOI22 and OAI22.
All logic gates are designed with use of static CMOS logic style and all transistor topologies
are obtained from [10].
3.2.1 Choice of Transistor Type and Supply Voltage for the Sub-Threshold
Cells
In order to design a ULP cell library, the components of power consumption should be
minimized as much as possible. In section 2.4, the components of power consumption are presented
as the dynamic, short-circuit and leakage/static components. As described in section 2.4.4, the
dynamic power consumption is most effectively reduced by lowering the supply voltage and/or
the frequency. However, as stated in sections 2.5.4 and 2.5.5, the delay increases with
decreasing VDD (exponentially for sub-threshold VDD), and thus the frequency
may have to be lowered in order to sustain a functioning circuit. At lower frequencies, the
static power consumption becomes more important and should be taken into account.
Transistor Type
The static power consumption is the product of the leakage current Ileakage and VDD, so a
reduction in the supply voltage obviously reduces the leakage power. However, within a MOSFET
technology, there are often two or more transistor types which have different threshold
voltages VT. These transistor types are often referred to as Low-VT (LVT), High-VT (HVT) and
Regular-VT (RVT) in between. The properties that differ between these types are the static
leakage power consumption and the speed. High-speed circuits are often designed using LVT
transistors, while non-speed-critical circuits are designed with HVT in order to reduce the
sub-threshold leakage current [13]. This means that the LVT type is the fastest, but with the
drawback of high static leakage current, while the HVT type is the slowest, but with the
advantage of low leakage. The 130nm process technology provided for this project includes all
three types.
The transistor type chosen for this project is the HVT type. One reason for the choice is that
the HVT type has the lowest leakage power consumption. Secondly, the requirement specifications
state that a circuit should be able to operate at a frequency of at least 32kHz, and that the
supply voltage can be scaled to above-threshold voltage for speed-critical modes. Hence,
choosing HVT gives the least leakage in all VDD modes. It may also be beneficial to use the
same transistor type for the sub-threshold cell library as in an above-threshold library,
because supporting multiple VT types requires additional fabrication steps, which in turn
lengthens the design time, increases fabrication complexity and may decrease yield [13].
Supply Voltage
The ultra-low-voltage (ULV) supply voltage at which the sub-threshold cell library is designed
to operate is chosen from an analysis of an existing above-threshold library, specifically
its inverter gate. The first analysis was to simulate a five-stage ring oscillator of inverters
and plot the power-delay product (PDP) versus VDD. However, the PDP curve depicted
in figure 3.1a is inconclusive, as it has no optimum point and continues to decrease as VDD
decreases. Another analysis was to simulate the propagation delay through an inverter gate
versus VDD. Figure 3.1b shows, for 25◦C, how the three propagation delays (rising, falling,
and the delay, which is the mean of the two) increase exponentially from approximately
400mV and down. This is as expected, since the provided 130nm HVT transistor technology
has an intrinsic threshold voltage of approximately 400mV and the delay is exponential
at near/sub-threshold, as described in section 2.5.5. Because it is impossible to accurately
predict the logic depth of critical paths in future synthesized designs, and given the
requirement of an operating frequency of at least 32kHz, a VDD of 350mV at 25◦C is a reasonable
target, before the delay grows prohibitively large. The following paragraphs substantiate the
choice of supply voltage by considering noise margins and the lower bound VDD,min.
(a) Power-delay product versus supply voltage. (b) Propagation delay, falling and rising versus supply
voltage.
Figure 3.1: Both plots for an existing above-threshold inverter at 25◦C and nominal corner.
In [17], the physical lower bound of the supply voltage, VDD,min, is given by the requirement of
a non-negative noise margin (NM), i.e. NM ≥ 0. Assuming nn ≈ np ≈ n, the NM is found
to be
$$ NM = \min(NM_L, NM_H) = \frac{V_{DD}}{2} - V_{th}\frac{n}{2}\ln(IF) - V_{th}\frac{n}{2}\left[\ln\left(\frac{2}{n}\right) + 1\right] \qquad (3.1) $$
where Vth = kT/q is the thermal voltage, n is the sub-threshold slope factor and IF is the
imbalance factor described in section 2.5.3. The lower bound VDD,min is then derived by
inverting the NM equation (3.1) with NM = 0 (this gives a theoretical VDD,min; in
practical circuits the NM should be higher to ensure robustness):
$$ V_{DD,min} = n V_{th}\left[\ln(IF) + \ln\left(\frac{2}{n}\right) + 1\right] \qquad (3.2) $$
In the ideal case with a perfect imbalance factor (IF=1), equation (3.2) gives a lower bound of
only about 2Vth ≈ 50mV at 25◦C. A non-ideal IF, however, raises the lower bound significantly.
The practical lower bound of the supply voltage has been estimated for the 130nm technology.
To do so, the IF value is first calculated from equation (3.3), taken from [16], for the
practical midpoint DC voltage VM, with VDD = 350mV.
$$ V_M \approx \frac{V_{DD}}{2} \pm \frac{n}{2} V_{th} \ln(IF) \qquad (3.3) $$
If the midpoint of a worst-case cell in the cell library is pessimistically assumed to deviate
20% from VDD/2 = 175mV, the IF is calculated to be IF ≈ 10 at 25◦C. The calculations, the
accuracy and the sub-threshold slope factor n are left out of the
report to protect sensitive information regarding the process technology. The lower
bound VDD,min is then calculated by (3.2) to be 4.5Vth ∼ 115mV with IF ∼ 10. This confirms the
predictions discussed in [17], and thus, by adopting the increased voltage needed to overcome
PVT and IF variations, VDD,min should be approximately 13 - 14 Vth, that is between 325 - 350 mV
at 25◦C. This also confirms the previous choice of VDD = 350mV. In terms of timing, however,
the propagation delay depends exponentially on the temperature through the thermal
voltage Vth in equation 2.21 in section 2.5.5. The supply voltage should therefore be adjusted
to compensate for the critical path delay of a specific circuit design, so that the requirement
specifications are met at all temperatures.
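Equations (3.1) to (3.3) can be collected in a few helper functions. Since the thesis withholds
the slope factor n of the process, the value n = 1.2 used below is an arbitrary illustrative
assumption and not the process value; the function names are likewise hypothetical.

```python
import math


def thermal_voltage(temp_c):
    """Thermal voltage Vth = kT/q in volts for a temperature in degrees Celsius."""
    return 1.380649e-23 * (temp_c + 273.15) / 1.602176634e-19


def noise_margin(vdd, imbalance, n, v_thermal):
    """Equation (3.1): noise margin of a sub-threshold gate."""
    half_n_vth = 0.5 * n * v_thermal
    return (0.5 * vdd - half_n_vth * math.log(imbalance)
            - half_n_vth * (math.log(2.0 / n) + 1.0))


def vdd_min(imbalance, n, v_thermal):
    """Equation (3.2): supply voltage at which the noise margin reaches zero."""
    return n * v_thermal * (math.log(imbalance) + math.log(2.0 / n) + 1.0)


def imbalance_from_midpoint(delta_vm, n, v_thermal):
    """Invert equation (3.3): |VM - VDD/2| = (n/2) * Vth * ln(IF)."""
    return math.exp(2.0 * delta_vm / (n * v_thermal))


N_ASSUMED = 1.2                 # illustrative slope factor, NOT the process value
V_TH_25C = thermal_voltage(25.0)

# 20% deviation of the midpoint from VDD/2 = 175 mV, as assumed in the text.
if_estimate = imbalance_from_midpoint(0.20 * 0.175, N_ASSUMED, V_TH_25C)
vdd_min_estimate = vdd_min(if_estimate, N_ASSUMED, V_TH_25C)
nm_at_350mv = noise_margin(0.350, if_estimate, N_ASSUMED, V_TH_25C)
```

The same functions can be evaluated at −40 °C and 85 °C by changing the argument of
thermal_voltage, which shows how the lower bound moves with temperature.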
3.2.2 Transistor Sizing
An initial analysis of the transistors was done prior to the design stage of the sub-threshold
cells. The aim was to find how the strength of the pMOS and the nMOS changes as the
geometry (i.e. the width and length) changes. The analysis is important as it gives better
knowledge of the difference between pMOS and nMOS, and of how they should be sized to improve
the imbalance factor between them.
Each curve in figure 3.2 depicts the normalized Ion strength at a supply voltage of 350mV for
nMOS or pMOS, with one geometry parameter fixed at its minimum and the other swept from its
minimum to 800nm. For instance, the nMOS Ion vs. L curve is obtained with the nMOS width
fixed at the minimum (W=160nm).
Figure 3.2: Normalized nMOS and pMOS strength to the case with W=Wmin (L=Lmin) versus
W (L) with VDD = 350mV , 25◦C, nominal process and mismatch.
At the far left of figure 3.2, it is shown that the pMOS is about 19 times stronger than the
nMOS at 350mV when all dimensions of both devices are minimum sized. This means that
the pMOS strength should be lowered and the nMOS strength increased to obtain a better
imbalance factor and better performance when designing logic gates for a 350mV supply voltage.
Focusing on the pMOS, both strength curves for the pMOS, depicted in red in figure 3.2,
decrease as one of the dimensions is increased from the minimum. Nevertheless, the solid red
curve for pMOS Ion vs. L shows a steeper strength reduction than the other. Thus,
the pMOS length is the most effective dimension to modify in order to reduce the strength. The
less effective, dotted red curve also stops decreasing at almost 300nm, after which the
strength increases instead. Although it might look as if increasing both the length and the
width of the pMOS would be effective, it was found that beyond a certain L, increasing W makes
matters worse. The point at which increasing W changes from helping to hurting is approximately
L ≥ 160nm. Setting the pMOS length to L=160nm provides a new pMOS
Ion vs. W curve, shown in figure 3.3, in which the pMOS strength only increases as the width is
increased.
Figure 3.3: Normalized pMOS strength to the case with L=160nm versus W, with Vdd=350mV,
25◦C and nominal process and mismatch.
Focusing now on the nMOS, both strength curves increase with increasing dimensions, as shown
by the lower, blue curves of figure 3.2. The solid blue curve of nMOS Ion vs. W has a strength
peak at approximately 380nm and then decreases slightly for larger widths. The dotted blue
curve of nMOS Ion vs. L, on the other hand, increases with the length all the way to the end of
the plot. Again, it might look as if increasing both the length and the width of the nMOS would
be effective, but, as was also found for the pMOS, there is a point for each dimension beyond
which the strength is worsened if both dimensions are increased from minimum size. Since the
two nMOS curves are similar, either one can be combined with an increased pMOS length, which
gives two strategies for designing gates. These strategies are listed in table 3.1.
Table 3.1: General sizing strategies.
| Strategy | Description |
|---|---|
| 1 | Increasing pMOS length and nMOS width |
| 2 | Increasing pMOS and nMOS length |
The threshold voltage of the pMOS and nMOS transistors is highly sensitive to changes in
dimensions near minimum size, due to the reverse narrow channel effect (RNCE)
and the reverse short channel effect (RSCE) [17], as discussed in section 2.5.2. This effect is
shown in figure 3.4, where the normalized threshold voltages are plotted versus width and
length, and the thresholds change rapidly for small dimensions.
In terms of variability, it is stated and shown in [15, 18, 19] that the standard deviation
of the threshold voltage VT due to Random Doping Fluctuations (RDF) is σ ∝ 1/√(WL).
Additionally, [19] states that RDF is the dominant source of variation in sub-threshold
operation. This also applies to the Ion variations, since Ion depends on the threshold voltage
as shown in section 2.5.1 and equation (2.15). Although minimum-sized transistors might yield
the best timing performance due to smaller input and output capacitances, sizing transistors
for operation in the sub-threshold region requires a larger area WL to decrease the threshold
variations.
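The area dependence can be made concrete with a small sketch. The proportionality constant
A_VT, the slope factor and the minimum length of 130nm are hypothetical values chosen only for
illustration; only the minimum width of 160nm is taken from the text.

```python
import math


def sigma_vt(a_vt, width, length):
    """Threshold voltage spread, assuming sigma(VT) = A_VT / sqrt(W * L)."""
    return a_vt / math.sqrt(width * length)


def sigma_ln_ion(sigma_vt_value, n, v_thermal):
    """Spread of ln(Ion) when Ion is proportional to exp(-VT / (n * Vth)), as in eq. (2.15)."""
    return sigma_vt_value / (n * v_thermal)


A_VT = 4e-3 * 1e-6              # hypothetical coefficient: 4 mV*um expressed in V*m
W_MIN, L_MIN = 160e-9, 130e-9   # W_MIN from the text, L_MIN assumed

sigma_min_size = sigma_vt(A_VT, W_MIN, L_MIN)
sigma_4x_area = sigma_vt(A_VT, 2 * W_MIN, 2 * L_MIN)
spread_ln_ion = sigma_ln_ion(sigma_min_size, n=1.3, v_thermal=0.0257)
```

Quadrupling the gate area halves the threshold spread, and because the current depends
exponentially on VT the benefit for the Ion spread is correspondingly large.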
Table 3.2: Approximate threshold voltages for the 130nm HVT technology at VDD = 350mV.
| Device | Threshold Voltage |
|---|---|
| pMOS | Vtp ∼ −400mV |
| nMOS | Vtn ∼ 600mV |
| Intrinsic | VTH0 ∼ 400mV |
Figure 3.4: Normalized threshold voltage versus L or W with other dimension minimized and
VDD = 350mV .
3.2.3 General design methodology of logic elements
Figure 3.5: Design methodology of cells and the tools used in each step, where N is the chosen
number of alternative designs to explore further.
A general methodology is used for designing all the cells. Figure 3.5 presents a flow diagram,
and all steps in the design methodology are described next. Initial System specifications,
including logical behavior, supply voltage, drive strength etc., are determined prior to
designing a cell. The next step is transistor-level Schematic design in the Cadence Virtuoso
Schematic Editor tool [20], where the focus is to realize the desired behavior. The next few
steps form a loop in which Transistor sizing and Pre layout simulation, including DC and
transient analysis, are repeated until satisfactory results are achieved. A number N of
alternative designs are Stored for further evaluation, so that there is higher confidence in
choosing the best design. When all alternatives are found, Layout design is done for every
alternative with the Cadence Virtuoso Layout XL tool [21]. The layout design is important as it
makes it possible to extract the capacitive and resistive parasitics that are due to
interconnecting wires and placement. 2D extraction mode (xRC) is the method used for parasitic
extraction. This gives a more realistic model of the alternatives than the schematic. The
alternatives with parasitic models are then Post layout simulated to verify correct behavior
and to observe the expected increase in delay due to larger capacitive parasitics. To gain even
more confidence, each alternative is also simulated with Monte Carlo simulation at -40, 25 and
85◦C, so that process variations and mismatch are taken into account when Choosing best
alternative. The final steps are to Optimize layout and do Characterization before the logic
cell is put in a cell library, ready to be used by synthesis tools to create ULP circuits.
The design of logic gates for sub-threshold supply voltage focuses on optimizing the DC
analysis results, so that the midpoint voltage VM moves towards VDD/2, while at the same time
minimizing transient analysis results such as propagation delay. However, in most cases
improving the midpoint voltage causes the propagation delay to increase and vice versa. Thus,
the difficult part is to balance the trade-offs when choosing the geometries of the pMOS and
nMOS transistors in the logic gate.
Minimum strength logic gates are designed and simulated by connecting an odd number of
identical gates in a ring oscillator. This provides a uniform way of measuring timing results
during transient analysis without the influence of undefined fan-in and fan-out [12]. The odd
number of stages is typically at least five [12], which is also what is chosen in this project.
The simulation setup and figures of the test benches are presented in section 5.1.2. The ring
oscillator oscillates at the maximum speed allowed by the gate under design, which enables
simulation of all important transient analysis results: rise time tr and fall time tf;
propagation delays tpHL and tpLH; and delay. Although the ring oscillator has nothing to do
with DC analysis, it is convenient to use the same test bench and do DC analysis on the first
stage of the ring. DC analysis applies a steady DC voltage, which does not allow oscillations
to begin and thus enables VTC curves to be simulated. However, most gates besides the simple
inverter have multiple inputs and even more logic operations. For example, the NAND2 gate has
two inputs and four states. To propagate a signal through a NAND2 ring oscillator there are
several options for biasing the gates: either bias input A or B to VDD while connecting the
other to the chain, or use no biasing and connect both inputs to the chain. The gates are
designed with focus on the worst case biasing in terms of DC and transient analysis. The
design of the different logic gates or cells is described in the forthcoming sections.
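For orientation, the textbook relation between the oscillation period of an N-stage ring
oscillator and the average delay per stage can be written as a short sketch. In one period
each stage switches twice, so T = 2·N·tp with tp = (tpHL + tpLH)/2. The function name and the
example numbers are illustrative assumptions; the thesis measures the timing values directly
in the simulator.

```python
def stage_delay(period, n_stages):
    """Average propagation delay per stage of a ring oscillator, tp = T / (2 N)."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages (>= 3)")
    return period / (2 * n_stages)


# Hypothetical example: a five-stage ring with a 10 us oscillation period.
tp = stage_delay(10e-6, 5)
```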
34 3.2. DESIGN OF SUB-THRESHOLD CELLS
3.2.4 Monte Carlo Simulation and the Number of Runs
Monte Carlo simulation, described in section 2.5.9, is used to determine how random variations
influence the results of a circuit or cell. A Monte Carlo simulation repeats a simulation n
times, where the number n is chosen by the designer. The question is then how many runs n
should be chosen.
The number of Monte Carlo runs n for the intermediate results is chosen by iterative testing.
The procedure was to first obtain a std. deviation for n = 30 runs, which is greater than the
minimum of 20 runs stated as a rule of thumb for the central limit theorem to give an
approximately normal distribution [22]. The std. deviation was then observed while increasing
the number of runs until the value stabilized. When a final sizing design is chosen for a logic
gate, the final results are obtained with a higher number of runs, n = 220, to increase
accuracy. How accurate the results are for a given number of runs n is described in the
following paragraphs.
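The iterative procedure can be sketched as follows: start from 30 runs, add runs in batches,
and stop once the sample std. deviation changes by less than a chosen tolerance. The 5%
tolerance, the batch size of 10 runs and the function name are hypothetical; the thesis does
not state a numeric stopping criterion.

```python
import statistics


def runs_until_stable(samples, start=30, step=10, tol=0.05):
    """Return the first run count at which the sample std. deviation has stabilized.

    `samples` holds one measured value (e.g. a delay) per Monte Carlo run.
    Returns None if the std. deviation never stabilizes within the available runs.
    """
    prev = statistics.stdev(samples[:start])
    n = start + step
    while n <= len(samples):
        cur = statistics.stdev(samples[:n])
        if abs(cur - prev) <= tol * prev:
            return n
        prev = cur
        n += step
    return None
```

In practice the list of samples would be filled from the simulator output, one value per run.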
A confidence interval gives an idea of how accurate an estimated value $\bar{X}$ of µ is. Both
the expectation value µ and the standard deviation σ are unknown prior to any simulation.
Hence, the T-interval is used to find the confidence interval. This method requires that the
measurements are either normally distributed or that the number of measurements is larger
than 30. The T-interval is defined in equation (3.4), where S is an estimator of the std.
deviation and $t_{\alpha/2}$ is the T-interval quantile [22].
\[
\left[\, \bar{X} - t_{\alpha/2} \cdot \frac{S}{\sqrt{n}},\;\; \bar{X} + t_{\alpha/2} \cdot \frac{S}{\sqrt{n}} \,\right]
\tag{3.4}
\]
Equation (3.4) is transformed into a relative percentage interval in equation (3.5). More than
30 Monte Carlo runs must be simulated to find an estimator S of the std. deviation. As an
example, the designed inverter gate yields an estimator S ≈ 0.5 (50% relative to µ) with
VDD = 350 mV after 100 runs, shown as design 3 in figure D.2a. For a 99% confidence interval,
(1 − α) = 0.99 and $t_{\alpha/2} = t_{0.005}$. With the number of runs and the degrees of
freedom equal to 100, the value $t_{0.005} = 2.626$ is obtained from a T-interval quantile
table in [22]. The 99% confidence interval is then
$\mu = \pm 2.626 \cdot \frac{0.5}{\sqrt{100}} = \pm 0.1313$, which is ±13.13% relative to µ.
The final results for each logic gate are obtained with n = 220 runs. The 99% confidence
interval is then $\mu = \pm 2.59836 \cdot \frac{0.5}{\sqrt{220}} \approx \pm 0.0876$, which is
±8.76%, with $t_{0.005} = 2.59836$ obtained for n = 220.
The std. deviation estimator S also influences the accuracy and the required number of runs.
All the intermediate delay results show that the relative std. deviation is much smaller above
threshold than in sub-threshold. For instance, the chosen inverter design 3 has an estimated
std. deviation of 4.5% relative to the mean at VDD = 1.2 V and -40 °C. This is much smaller
than in the sub-threshold case, where the estimated std. deviation is ∼50%. The 99% confidence
interval for VDD = 1.2 V and n = 220 is thus only
$\mu = \pm 2.59836 \cdot \frac{0.045}{\sqrt{220}} \approx \pm 0.008$ (±0.8%), which is much
better than ±8.76% for the sub-threshold case.
\[
\mu = \pm t_{\alpha/2} \cdot \frac{S}{\sqrt{n}}
\tag{3.5}
\]
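The interval half-widths quoted for the inverter follow directly from equation (3.5). The
sketch below recomputes them using the t-quantiles quoted in the text; the function name is an
assumption for this example, and the quantiles are passed in by hand rather than looked up.

```python
import math


def relative_half_width(t_quantile, s_rel, n):
    """Half-width of the confidence interval, equation (3.5): t * S / sqrt(n)."""
    return t_quantile * s_rel / math.sqrt(n)


# Values quoted in the text (S is given relative to the mean).
sub_vt_100 = relative_half_width(2.626, 0.5, 100)         # inverter, 350 mV, n = 100
sub_vt_220 = relative_half_width(2.59836, 0.5, 220)       # inverter, 350 mV, n = 220
above_vt_220 = relative_half_width(2.59836, 0.045, 220)   # inverter, 1.2 V, n = 220

for label, hw in [("350 mV, n=100", sub_vt_100),
                  ("350 mV, n=220", sub_vt_220),
                  ("1.2 V,  n=220", above_vt_220)]:
    print(f"{label}: +/-{100 * hw:.2f} %")
```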
To find out how many runs should be simulated to obtain a specific interval half-length ±µ,
equation (3.6) is used with the standard normal quantile $z_{\alpha/2} = 2.576$ for
α/2 = 0.005. An example in [22] uses this method. Continuing the inverter example with
S = 0.5, the number of runs required to obtain a 99% confidence interval with µ = ±0.1 (10%)
is $n \ge \left(\frac{2.576 \cdot 0.5}{0.1}\right)^2 \approx 166$. For a 99% confidence
interval with µ = ±5%, the number of runs should be
$n \ge \left(\frac{2.576 \cdot 0.5}{0.05}\right)^2 \approx 664$. Even more radically, a 99%
confidence interval with µ = ±1% requires
$n \ge \left(\frac{2.576 \cdot 0.5}{0.01}\right)^2 \approx 16590$ runs. Hence, the interval
length µ does not decrease proportionally with the number of runs n, since n appears under a
square root in the denominator of equation (3.5). In order to double the accuracy, the number
of Monte Carlo runs has to be increased by a factor of four.
\[
n \ge \left( \frac{z_{\alpha/2} \cdot S}{\mu} \right)^2
\tag{3.6}
\]
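Equation (3.6) is easy to evaluate for other target accuracies. The following sketch rounds
the bound up to the next whole run and reproduces the three cases of the inverter example; the
function name is an assumption for this example.

```python
import math


def required_runs(z_quantile, s_rel, half_width):
    """Smallest integer n satisfying equation (3.6): n >= (z * S / mu)^2."""
    return math.ceil((z_quantile * s_rel / half_width) ** 2)


# 99% confidence (z = 2.576) with S = 0.5, for half-widths of 10%, 5% and 1%.
runs = [required_runs(2.576, 0.5, target) for target in (0.10, 0.05, 0.01)]
```

Halving the target half-width multiplies the required number of runs by roughly four, in line
with the square-root dependence in equation (3.5).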
3.2.5 Design of Inverter Gate
The inverter gate is the most important and fundamental logic cell in a cell library. The
sub-threshold inverter is designed in static CMOS logic style as shown in figure 3.14, where
the symbol is shown to the left and the transistor schematic to the right [10]. The inverter
has only a single logical operation: it inverts a single bit. Hence, only one DC and one
transient operation need to be examined while designing the inverter. The test bench with
inverters connected in a ring oscillator is presented in section 5.1.2.
With VDD = 350 mV and minimum geometry sizes, the pMOS is clearly the stronger transistor
compared to the nMOS, as discussed in section 3.2.2. Thus, with both transistors minimum
sized, the midpoint deviates from VDD/2 by a non-optimal ∼35%, as seen in figure 3.6a with
n_w = 160 nm. In addition, the propagation delay is very large, as seen in figure 3.6b. To
explore the trade-offs in the two sizing strategies described in section 3.2.2, the Cadence
simulation tool enables parametric sweeps of custom parameters.
The objective of the parametric sweep is to minimize both the midpoint deviation VM [%] and
the timing by searching for the optimum transistor dimensions. The idea behind this
methodology is to fix two transistor size parameters to the minimum and then coarse sweep the
other two size parameters from minimum to a relatively large size (720 nm), with 5 steps in
each parameter. The first coarse sweep indicates in which region the optimum sizes may be
located. A new, narrower region is then swept after analyzing the previous result. This is
repeated until the optimum sizes are found with good certainty. The same sweep procedure is
applied to each cell designed in this thesis, and a brief explanation is given in the
following paragraphs. Only sizing strategy 1 is shown, although sizing strategy 2 is found to
be the best strategy for all designed gates.
First Coarse Parametric Sweep of n_w and p_l
The first coarse sweep of transistor sizes is done by setting the nMOS length n_l and the pMOS
width p_w to minimum size (120 nm and 160 nm respectively) and sweeping the nMOS width from
the minimum 160 nm to 720 nm in 5 steps (n_w = [160 nm, 720 nm]). For each step of n_w, the
pMOS length is swept from the minimum 120 nm to 720 nm in 5 steps (p_l = [120 nm, 720 nm]).
The first coarse sweep gives an indication of how the VTC midpoint and the transient
propagation delay change versus n_w and p_l, as shown in figure 3.6.
(a) Midpoint percentage vs p_l and n_w. (b) Propagation delay vs. p_l and n_w.
Figure 3.6: INV: First coarse parametric sweep.
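The coarse sweep amounts to evaluating a figure of merit on a small grid of (n_w, p_l) values
and keeping the best point. The sketch below assumes that "5 steps" means five evenly spaced
points including both end points, and it uses a made-up cost function in place of the
simulator; in the thesis the evaluation is done with parametric sweeps in the Cadence tool and
by inspecting the resulting curves.

```python
def linspace(lo, hi, points):
    """Evenly spaced values from lo to hi, both ends included."""
    step = (hi - lo) / (points - 1)
    return [lo + i * step for i in range(points)]


def coarse_sweep(cost, nw_range, pl_range, points=5):
    """Evaluate cost(n_w, p_l) on a grid and return (best_cost, n_w, p_l)."""
    best = None
    for n_w in linspace(nw_range[0], nw_range[1], points):
        for p_l in linspace(pl_range[0], pl_range[1], points):
            c = cost(n_w, p_l)
            if best is None or c < best[0]:
                best = (c, n_w, p_l)
    return best


# Hypothetical stand-in for a simulator call; a real cost would combine the
# simulated midpoint deviation and propagation delay.
def toy_cost(n_w, p_l):
    return abs(n_w - 440.0) + abs(p_l - 270.0)


# Sizes in nm: n_w from 160 to 720, p_l from 120 to 720.
best = coarse_sweep(toy_cost, (160.0, 720.0), (120.0, 720.0))
```

A narrower sweep is obtained by calling `coarse_sweep` again with ranges centered on the best
point found so far.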
Second Coarse Parametric Sweep of n_w and p_l
A second parametric sweep is done over narrower regions, chosen as n_w = [250 nm, 440 nm] and
p_l = [120 nm, 500 nm] by analyzing the results of the preceding sweep. The second sweep uses
a higher number of steps in the p_l direction to increase resolution, resulting in the
midpoint and propagation delay curves shown in figure 3.7. A size of n_w ≥ 350 nm gives the
best midpoint, while n_w ≈ 350 nm gives the best propagation delay. The p_l size, on the other
hand, gives a trade-off between midpoint and delay. This parametric sweep method is further
used to find the other alternatives listed in table 3.3.
(a) Midpoint percentage vs p_l and n_w. (b) Propagation delay vs. p_l and n_w.
Figure 3.7: INV: Second coarse parametric sweep.
The Chosen Inverter Design
Table 3.3 lists all the inverter sizing designs that are chosen for further investigation by
Monte Carlo simulation on schematic and layout. Designs 1 and 2 are found with sizing
strategy 1 (see table 3.1). All the other designs are found with sizing strategy 2. These are
chosen with different trade-offs between midpoint and delay. Layout is designed for all
alternatives and simulated with 100 Monte Carlo runs to explore the difference between the
sizing strategies in terms of mean and standard deviation. The results are called intermediate
results and are found in appendix D.2. Sizing strategy 2 is found to be the best when
considering the intermediate delay results for VDD = 350 mV. In the worst case, at a
temperature of −40 °C (figure D.2a), sizing strategy 1 has almost twice the mean delay and
standard deviation of strategy 2. There is no distinction between the strategies when it comes
to the midpoint percentage, except that strategy 1 has a slightly worse standard deviation
than the others.
By analyzing the intermediate results found in appendix D.2, design 3 is chosen as the final
inverter design. Design 3 does not have the best mean midpoint percentage and std. deviation
for VDD = 350 mV, but a value of ∼10% for all temperatures is adequate. Its midpoint
percentage is however the best one with VDD = 1.2 V at all temperatures, with a value of
∼−2%. Design 3 has the best delay at 350 mV at temperatures of 25 and 85 °C, and is one of
the three best at −40 °C.
Table 3.3: Inverter: Design sizes chosen for further investigation (W and L in nm).
Design           1          2          3 (Chosen)  4          5          6
pMOS (W / L)     160 / 389  160 / 500  160 / 240   160 / 480  160 / 600  160 / 720
nMOS (W / L)     385 / 120  350 / 120  160 / 480   160 / 480  160 / 600  160 / 720
MOS Area [fm²]   108.4      122.0      115.2       153.6      192.0      230.4
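The MOS area row of table 3.3 can be reproduced as the sum of the gate areas W·L of all
transistors, if the unit is read as 10^-15 m² (1000 nm²). This reading of the unit and the
`pairs` parameter are inferred from the tabulated numbers and are not stated in the text; the
same formula with two pMOS/nMOS pairs matches the NAND2 values in table 3.4.

```python
def mos_area(p_wl, n_wl, pairs=1):
    """Total gate area in 1e-15 m^2 for `pairs` pMOS/nMOS pairs; W and L in nm."""
    p_w, p_l = p_wl
    n_w, n_l = n_wl
    area_nm2 = pairs * (p_w * p_l + n_w * n_l)
    return area_nm2 / 1000.0   # 1e-15 m^2 equals 1000 nm^2


inverter_design_3 = mos_area((160, 240), (160, 480))         # one pair
nand2_design_4 = mos_area((160, 270), (160, 720), pairs=2)   # two pairs
```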
3.2.6 Design of NAND2 Gate
The sub-threshold NAND2 is designed in static CMOS logic style as shown in figure 3.15, where
the symbol is shown to the left and the transistor schematic to the right [10]. The NAND2 gate
has two inputs and four logical operations, as shown in the truth table of figure 3.8a. The
same figure shows the three possible transitions that can occur: two transitions with only one
input changing (either A or B) while the other is kept high, and a third transition where both
inputs change from low to high simultaneously. Hence, there exist three VTC curves in DC
analysis, shown in figure 3.8b. Two of the transitions, (1) and (2), are similar and occur at
approximately the same midpoint in the VTC diagram, whereas the third curve (0) lies to the
right of the other two. Since the pMOS has larger drive strength than the nMOS with minimum
sized dimensions, the VTC curves are positioned to the right of the optimum midpoint VDD/2.
Transition (0) therefore gives the worst case VTC curve, unless the whole VTC family is moved
to the left side of VDD/2.
The same transitions can be used to configure the ring oscillator in the transient analysis
test bench. However, since transitions (1) and (2) are similar, only one of them is configured
in the test bench. In addition, transition (0), which gives the worst case VTC, is also
analyzed in both DC and transient analysis. The test bench configurations for DC and transient
analysis are found in section 5.1.2. By simulation, transition (0) is also found to be the
worst case in terms of propagation delay in transient analysis.
(a) Truth and transition table. (b) VTC family.
Figure 3.8: NAND2 truth- and transition table where green numbers are common starting point
for each arrow column and red numbers are ending points.
The Chosen NAND2 Design
Table 3.4 lists all the NAND2 sizing designs that are chosen for further investigation by
Monte Carlo simulation on schematic and layout. The experience from designing the inverter
gate is that sizing strategy 1 is worse than strategy 2 (see table 3.1). Design 1, which uses
sizing strategy 1, is included to check that the same holds for the NAND2 gate. Layout and
Monte Carlo simulation with 100 runs are done on each design to explore the difference between
the sizing strategies in terms of mean and standard deviation. The intermediate results are
found in appendix D.3. As with the inverter, sizing strategy 2 is found to be the best in
terms of mean propagation delay and std. deviation at VDD = 350 mV. In the worst case delay,
at a temperature of -40 °C and VDD = 350 mV as shown in figure D.4a, design 1 again has almost
twice the mean delay and std. deviation of the designs with sizing strategy 2. Similar to the
inverter gate, there is no distinction between the strategies for the NAND2 gate when it comes
to the midpoint percentage, except that strategy 1 has a slightly worse standard deviation
than the others.
By analyzing the intermediate results found in appendix D.3, design 4 is chosen as the final
NAND2 design. Design 4 does not have the best mean midpoint at VDD = 350 mV, but it is in the
middle range with a sufficient mean midpoint value of ∼20%. This is higher than for the
inverter gate because the PUN has two pMOS transistors in parallel instead of one. The
midpoint may be impossible to improve without sacrificing timing such as propagation delay.
Considering the trade-off, the transient results are given higher weight than the VTC midpoint
when deciding which design to choose. Hence, design 4 is chosen because it has the lowest mean
propagation delay and std. deviation for all temperatures at VDD = 350 mV, as shown in
figure D.4(a-c).
Table 3.4: NAND2: Design sizes chosen for further investigation (W and L in nm).
Design           1          2          3          4 (Chosen)  5          6
pMOS (W / L)     160 / 389  160 / 240  160 / 480  160 / 270   160 / 520  160 / 720
nMOS (W / L)     385 / 120  160 / 480  160 / 480  160 / 720   160 / 720  160 / 720
MOS Area [fm²]   216.9      230.4      307.2      316.8       396.8      460.8
3.2.7 Design of NOR2 Gate
The sub-threshold NOR2 is designed in static CMOS logic style as shown in figure 3.16, where
the symbol is shown to the left and the transistor schematic to the right [10]. Like the NAND2
gate, the NOR2 gate has two inputs and four logic operations, but it only produces a logical
high when both inputs are low, as shown in figure 3.9a. The NOR2 gate similarly has three
transitions, as shown in figure 3.9b, where two of them, (0) and (1), occur at approximately
the same midpoint in the VTC diagram. The difference compared to the NAND2 gate is that the
third, single transition (2) occurs to the left instead of to the right of the two similar
transitions. Hence, there are two worst case VTC transitions: both (0) and (1). Nevertheless,
only transition (1), in addition to transition (2), is used in the test bench.
The test bench configuration for DC and transient analysis is found in appendix C.1. By
simulation, the transitions are found to be similar in terms of transient analysis and timing
measurements. The worst case is thus not easily determined.
(a) Truth and transition table. (b) VTC family.
Figure 3.9: NOR2 truth- and transition table where green numbers are common starting point
for each arrow column and red numbers are ending points.
The Chosen NOR2 Design
Table 3.5 lists all the NOR2 sizing designs that were chosen for further investigation by
Monte Carlo simulation on schematic and layout. Even though the experience from designing the
inverter and NAND2 gates shows that sizing strategy 1 is worse than strategy 2, two designs
with strategy 1 (designs 1 and 2) have been included in the list of further investigated
designs. Layout and Monte Carlo simulation with 100 runs are done on each design to explore
the difference between the sizing strategies in terms of mean and standard deviation. The
intermediate results are found in appendix D.4. The same conclusion regarding which sizing
strategy is the best applies to the NOR2 gate as for the previously discussed gates: the mean
delay and std. deviation with sizing strategy 1 at -40 °C and 350 mV are almost twice those of
strategy 2, as shown in figure D.5a.
Design 5 is chosen after analyzing the intermediate results found in appendix D.4. The reason
is that design 5 has one of the lowest propagation delays at all temperatures and 350 mV
compared to the others. Although design 3 has similar delay results, design 5 is chosen over
design 3 due to slightly better mean midpoint percentage and std. deviation at all
temperatures and 350 mV. Design 5 has a sufficient mean midpoint percentage of ∼10-12%.
Table 3.5: NOR2: Design sizes chosen for further investigation (W and L in nm).
Design           1          2          3          4          5 (Chosen)
pMOS (W / L)     160 / 200  160 / 200  160 / 150  160 / 240  160 / 150
nMOS (W / L)     300 / 120  350 / 120  160 / 600  160 / 480  160 / 720
MOS Area [fm²]   136.0      148.0      240.0      230.4      278.4
3.2.8 Design of XNOR2 and XOR2 Gate
The sub-threshold XNOR2 and XOR2 gates are designed in static CMOS logic style as shown in
figures 3.17 and 3.18 respectively, where the symbol is shown to the left and the transistor
schematic to the right [10]. The only structural difference between the gates is how the
inputs are organized. The complementary inputs are produced by including the sub-threshold
inverters designed earlier and described in section 3.2.5.
Both the XNOR2 and the XOR2 gate have two inputs and four logical operations. The difference
between them is that XOR2 produces a logical high only when the inputs differ from each other,
while XNOR2 produces the opposite, as shown in the truth tables in figures 3.10a and 3.11a.
Both gates have two possible transitions. For the XNOR2 gate, both transitions start with both
inputs at logical low, and either input A or input B changes to logical high. These
transitions are labeled (0) and (1) in figure 3.10a and the respective VTC curves are shown in
figure 3.10b. The XOR2 gate behaves similarly, but its transitions go from one input being
logical high to both inputs being logical high, as shown in figure 3.11a. For both gates,
transitions (0) and (1) occur at approximately the same midpoint in the VTC diagram. Hence,
there is no worst case VTC curve for these gates. Input B is arbitrarily selected as the
biased input. For the XNOR2 gate input B is tied to GND, and for the XOR2 gate input B is tied
to VDD, while input A is swept in DC analysis.
The transient test bench with a ring oscillator is configured in the same way as for the DC
analysis described in the previous paragraph. Since the VTC curves are similar, it is assumed
that the same holds approximately for the timing measurements. The test bench configurations
for DC and transient analysis are found in appendices C.2 and C.3.
(a) Truth and transition table. (b) VTC family.
Figure 3.10: XNOR2 truth- and transition table where green numbers are common starting point
for each arrow column and red numbers are ending points.
(a) Truth and transition table. (b) VTC family.
Figure 3.11: XOR2 truth- and transition table where green numbers are common starting point
for each arrow column and red numbers are ending points.
The Chosen XNOR2 and XOR2 Design
Table 3.6 lists all the XNOR2 sizing designs that were chosen for further investigation by
Monte Carlo simulation on schematic and layout. Design 1 is again included to investigate
whether sizing strategy 1 (see table 3.1) is still the worse strategy compared to the other.
Layout is made for all the alternatives, which are then Monte Carlo simulated with 100 runs to
explore differences between the designs in terms of mean and standard deviation. The
intermediate results are found in appendix D.5. The worst case propagation delay with
VDD = 350 mV and -40 °C, depicted in figure D.8a, shows that design 1 with sizing strategy 1
is still the worst, with approximately twice the mean and std. deviation of the rest.
Additionally, design 1 also has worse mean midpoint percentage and std. deviation than the
others at -40 °C and 25 °C, as shown in figure D.7(a-c).
Design 3 is chosen for the XNOR2 gate because it has the lowest mean propagation delay at the
worst case temperature of -40 °C. Additionally, it is the best at 25 °C and second best at
85 °C (but still the best among the designs with sizing strategy 2). On the other hand,
design 3 does not stand out from the rest in terms of mean midpoint percentage and std.
deviation. Nevertheless, the mean midpoint and std. deviation are sufficient, with a mean
value of ∼10-12%.
Only the XNOR2 gate has been thoroughly investigated by exploring different sizing designs.
The XNOR2 and XOR2 gates are identical in transistor structure (only the input labels differ),
and therefore the results obtained for the former are assumed applicable to the latter.
Nevertheless, the XOR2 gate is simulated with a distinct test bench to verify that the same
sizing design holds as for the XNOR2. From this, design 3 is concluded to be the best design
for the XOR2 as well.
Table 3.6: XNOR2: Design sizes chosen for further investigation (W and L in nm).
Design           1          2          3 (Chosen)  4
pMOS (W / L)     160 / 160  160 / 200  160 / 160   160 / 200
nMOS (W / L)     350 / 120  160 / 600  160 / 720   160 / 720
MOS Area [fm²]   270.4      448.0      563.2       588.8
3.2.9 Design of AOI22 Gate
The AND-OR-INVERT-22 gate is a combination of two 2-input AND gates with a NOR2 gate (which
provides the inverting feature) at the output. Conveniently, the AOI22 gate inherits the
transistor structure of XNOR2 and XOR2, but with the inverters eliminated and the four inputs
labeled A to D, as shown in figure 3.19 [10]. If the functionality of the AOI22 gate is
desired in a circuit, it can be more area and power efficient to use the 8T (8 transistors)
structure rather than combining existing logic gates (NAND2, NOR2 and inverter) to realize the
same functionality. Intuitively, two NAND2s and two inverters are needed to construct the two
2-input AND gates, and one NOR2 gate serves as the OR-invert at the output. This gives
2·4 + 2·2 + 4 = 16 transistors, which is twice as many as in the AOI22.
The AOI22 gate has four inputs and thus sixteen input combinations. By analyzing the truth
table, the AOI22 gate is found to possess thirty-nine (39) transitions, labeled with distinct
numbers as depicted in figure 3.12a. Hence, the gate has 39 possible VTC curves instead of the
1-3 VTC curves of most other gates in this thesis. In order to find the worst case VTC curve,
all the possibilities are simulated with DC analysis. It is discovered that the VTC curves
cluster in four distinct regions, labeled (i), (ii), (iii) and (iv) in figure 3.12b. The
transitions that occur in region (i) are: 0-5, 10, 15, 17 and 28. In region (ii): 6-9, 12-14,
18-20, 22, 23, 25, 26, 29-31, 33, 34, 36 and 37. In region (iii): 11, 16, 21 and 32. And
finally, region (iv) holds these transitions: 24, 27, 35 and 38. Because the pMOS transistor
is the stronger one with minimum dimensions, the VTC curves typically have their midpoint to
the right of VDD/2. Hence, (i) is the worst case VTC region. For the sake of simplicity,
transition (0) is chosen as the configuration for VTC analysis in the test bench.
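One counting rule that reproduces the transition counts quoted in this chapter (three for
NAND2 and NOR2, two for XNOR2 and XOR2, 39 for AOI22 and OAI22) is to count every input change
in which at least one input rises, no input falls, and the output switches from high to low.
The sketch below enumerates these cases. The rule is inferred from the quoted numbers and is
an assumption; the numbering of the transitions used in the figures is not reproduced.

```python
from itertools import product


def falling_output_transitions(gate, n_inputs):
    """List (start, end) input pairs where inputs only rise and the output goes 1 -> 0."""
    states = list(product((0, 1), repeat=n_inputs))
    found = []
    for start in states:
        for end in states:
            rising_only = start != end and all(s <= e for s, e in zip(start, end))
            if rising_only and gate(*start) == 1 and gate(*end) == 0:
                found.append((start, end))
    return found


def nand2(a, b): return 1 - (a & b)
def nor2(a, b): return 1 - (a | b)
def xnor2(a, b): return 1 - (a ^ b)
def xor2(a, b): return a ^ b
def aoi22(a, b, c, d): return 1 - ((a & b) | (c & d))
def oai22(a, b, c, d): return 1 - ((a | b) & (c | d))


counts = {
    "NAND2": len(falling_output_transitions(nand2, 2)),
    "NOR2": len(falling_output_transitions(nor2, 2)),
    "XNOR2": len(falling_output_transitions(xnor2, 2)),
    "XOR2": len(falling_output_transitions(xor2, 2)),
    "AOI22": len(falling_output_transitions(aoi22, 4)),
    "OAI22": len(falling_output_transitions(oai22, 4)),
}
```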
The same number of transitions applies to the transient analysis simulations. In order to find
the worst case transition with respect to timing, all combinations are simulated by connecting
a 1 fF capacitance to the output and applying a rising and a falling signal to simulate tr and
tf. The worst case is transition number (37), which is not consistent with the worst case VTC
region. Transition number (37) is thus used as the transient analysis configuration in the
ring oscillator test bench. The test bench configuration for DC and transient analysis is
found in appendix C.4.
After simulating the AOI22 gate with the worst case VTC and transient test bench
configurations, the optimal sizing is found to be the same as for the XNOR2 and XOR2 gates.
This is not unexpected since the AOI22 gate has a similar transistor structure. The chosen
transistor dimensions are repeated for convenience: pMOS W/L = 160 nm / 160 nm and nMOS
W/L = 160 nm / 720 nm.
(a) Truth and transition table. (b) VTC family.
Figure 3.12: AOI22 truth- and transition table where green numbers are common starting point
for each arrow column and red numbers are ending points.
3.2.10 Design of OAI22 Gate
The OR-AND-INVERT-22 gate is a combination of two OR gates at the input and a NAND2 at the
output. Like the AOI22 gate, the OAI22 inherits its basic structure from XNOR2 and XOR2, but
with the difference that the midpoints of the nMOS network are connected together rather than
the midpoints of the pMOS network as in the AOI22 gate, as shown in figures 3.20 and 3.19
respectively [10]. Also, the inputs are labeled differently from the AOI22 gate.
The OAI22 gate has similar features to the AOI22 gate described in section 3.2.9. It has the
same number of transitions (39) and four distinct VTC family regions, labeled (i), (ii), (iii)
and (iv) in figure 3.13b. The differences are the logic functionality and the transitions
shown in figure 3.13a. The transitions that occur in region (i) are: 0-4. In region (ii): 2
and 5-7. In region (iii): 8-12, 15-18, 21, 22, 24, 25, 27, 28, 30, 31, 33, 34, 36 and 37. And
finally, in region (iv): 13, 14, 19, 20, 23, 26, 29, 32, 35 and 38. As for the AOI22 gate, VTC
family region (i) is the worst case, and transition number (0) is used as the DC analysis test
bench configuration.
The worst case transition with respect to transient analysis and timing measurements is found
to be transition number (31). Hence, transition (31) is used as the configuration in the ring
oscillator test bench. The test bench configuration for DC and transient analysis is found in
appendix C.5.
After simulating the OAI22 with the worst case VTC and transient test bench configurations,
the same conclusion on transistor dimensions is reached as for the AOI22 gate described in
section 3.2.9. This means that the optimal transistor dimensions are found to be equal to
those of AOI22, XNOR2 and XOR2. The transistor dimensions are repeated for convenience: pMOS
W/L = 160 nm / 160 nm and nMOS W/L = 160 nm / 720 nm.
(a) Truth and transition table. (b) VTC family.
Figure 3.13: OAI22 truth- and transition table where green numbers are common starting point
for each arrow column and red numbers are ending points.
Figure 3.14: Symbol (LS) and schematic (RS) of the designed Inverter gate.
Figure 3.15: Symbol (LS) and schematic (RS) of the designed NAND2 gate.
Figure 3.16: Symbol (LS) and schematic (RS) of the designed NOR2 gate.
Figure 3.17: Symbol (LS) and schematic (RS) of the designed XNOR2 gate.
Figure 3.18: Symbol (LS) and schematic (RS) of the designed XOR2 gate.
Figure 3.19: Symbol (LS) and schematic (RS) of the designed AOI22 gate.
Figure 3.20: Symbol (LS) and schematic (RS) of the designed OAI22 gate.
3.2.11 Design of D Flip-Flop Memory Element
The PowerPC 603 Flip-Flop
The D Flip-Flop design structure chosen for this project is the PowerPC 603 D-FF. The choice
is based on the results of a former Master thesis at NTNU [23], which is a comparative study
of these D-FF design structures: PowerPC 603; C2MOS; a Classic NAND-based D Flip-Flop; and two
Minority3-based D Flip-Flops. The PowerPC 603 was concluded to have the lowest PDP, the lowest
total and static power consumption, very low propagation delay and average relative std.
deviation with respect to delay compared with the rest. However, the Minority3 D-FF was
concluded to be the best choice if yield and robustness are prioritized, but at the cost of
speed, power consumption and energy per transition. Although robustness and yield should be
prioritized in this project, the PowerPC 603 is chosen as the D-FF structure. Robustness and
yield are instead enhanced by increasing the supply voltage VDD rather than by design
strategies that are costly in speed, power, energy and area. Without dummy transistors, the
Minority3 designs have ∼3X more transistors than the PowerPC 603. Another comparative
study [24] also concludes that the PowerPC 603 D-FF is the best of the static D Flip-Flops in
terms of delay, power consumption, PDP and EDP.
The PowerPC 603 D-FF was introduced in the PowerPC 603 RISC microprocessor architecture
presented in [25]. The structure of the D-FF is based on a master-slave D-latch configuration
with Transmission Gates (TG) on the input of each latch instead of the clocked inverters used
in the C2MOS D-FF. Figure 3.21 depicts the standard symbol for a D-FF to the left and the
PowerPC 603 structure to the right.
Figure 3.21: Symbol (LS) and schematic (RS) of the designed D Flip-Flop PowerPC 603.
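A purely behavioural sketch of the master-slave principle may help when reading figure 3.21.
It assumes that the master latch is transparent while Clk is low and the slave latch while Clk
is high, i.e. a positive-edge-triggered flip-flop. This polarity, the class name and the
omission of all electrical effects (delays, clock skew, feedback strength) are assumptions for
illustration and are not taken from the schematic.

```python
class MasterSlaveDFF:
    """Behavioural model of a master-slave D flip-flop built from two latches."""

    def __init__(self):
        self.master = 0   # value stored in the master latch
        self.q = 0        # value stored in the slave latch (output Q)

    def step(self, d, clk):
        """Apply input D and clock level Clk, then return the output Q."""
        if clk == 0:
            # Master transmission gate conducts; slave holds its value.
            self.master = d
        else:
            # Slave transmission gate conducts; master holds its value.
            self.q = self.master
        return self.q


ff = MasterSlaveDFF()
trace = [ff.step(1, 0), ff.step(1, 1), ff.step(0, 1), ff.step(0, 0), ff.step(0, 1)]
```

In the trace, a change of D while Clk is high does not reach Q; the new value is only taken
over at the next rising clock edge.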
Clock Generation Circuit
The PowerPC 603 D-FF assumes a dual-phase clocking scheme with Clk and complementary Clk as
shown in figure 3.21. However, in this project the D-FF is intended to be used with a
single-phase clocking scheme. Thus, clock generation circuitry is implemented in the PowerPC
603 D-FF such that the dual-phase scheme is ported to a single-phase clocking scheme.
To sustain proper functionality of the D-FF, it is important that the dual-phase clocks seen
internally are of high quality. The rise and fall times should be as short as possible to
provide as abrupt a transition between the transmission gates as possible. If the clock rise
and fall times are slow, the D-FF might not function properly, especially for high mismatch
and low temperatures (i.e. −40 °C). One method to generate dual-phase clocks is complementary
clock generation using a D-latch [10]. This method demands relatively many gates and
consequently a large die area if used in each D-FF. Hence, such a clock generator would
usually be shared globally by a number of D-FFs, and would then be an area efficient method.
Another way to generate the complementary clock is to use an inverter gate in each D-FF, shown
as ICG in figure 3.21, at the cost of a clock skew equal to the delay through the inverter
gate. This method is used in the designed D-FF, as the flip-flop operates correctly in Monte
Carlo simulation at −40 °C, which is the worst case temperature with respect to delay.
In the first revision of the D-FF, the clock generator inverter was simply the inverter gate
described in section 3.2.5. However, simulating the D-FF with VDD = 350mV at −40°C
revealed that the clock-gen inverter was too weak when discharging the complementary clock
node, so that the fall time tf was too long compared to that of the other clock signal. The cause
was the large capacitive load that the four transistor gates in the D-FF present to the clock-gen
inverter. The solution was to design a new, stronger inverter with 2X strength, since it drives
∼2X the capacitive load that the first inverter was designed for. The 2X inverter was designed
using a ring oscillator with four inverters connected in cascade and two inverters added to the
chain in parallel, giving a 2X capacitive load to the preceding inverter. The inverters were then
sized such that the rise time tr and fall time tf are minimized and the VTC imbalance is
minimized, i.e. the switching point is brought towards VDD/2. The 2X inverter is dimensioned
with pMOS W/L = 160nm/200nm and nMOS W/L = 160nm/720nm, and new simulations
confirmed that it is sufficiently strong.
The Design of D-FF Building Blocks
The D flip-flop can be divided into two D-latches with different clock phases. Each D-latch can
be further divided into several building blocks: inverters (I0, I1 and I2); TGs (the parallel
transistors before nets F0 and F1); and clocked inverters (the upper and lower transistors), as
shown in figure 3.21. These building blocks are designed individually and are described in
the following paragraphs.
In the first revision of the D-FF, all the inverter building blocks were designed as the inverter
gate described in section 3.2.5. However, simulating the D-FF with VDD = 350mV at −40°C
revealed that inverter I0 was too weak when discharging the P0 node, and the fall time was too
long with respect to the 32 kHz clock frequency. Similarly to the clock generator, node P0 has
the capacitive load of the four transistors of inverters I1 and I2, and additionally a small
resistance through the TG before the F1 node. The solution was to use the 2X strength inverter
designed for the clock generator to increase the drive strength and provide sufficiently fast rise
and fall times. The remaining inverters (I1 and I2) are designed as the inverter described in
section 3.2.5.
A clocked-inverter building block (depicted in figure 3.22) is used as feedback in each latch of
the D-FF. When the input TG is open (i.e. blocks the input signal), the clocked inverter is
enabled and feeds back the latch output signal to retain a stable state. The feedback clocked
inverter was first sized to optimize D-FF performance in terms of minimum setup time tsu and
clock-to-Q delay tco. Simulations showed that the best sizing was obtained when all transistors
in the clocked inverters had minimum size, i.e. W/L = 160nm/120nm. Minimum sizes give the
least capacitive load on both the clock nodes and the latch outputs, and thus the best
performance. However, in Monte Carlo simulation of the D-FF at −40°C with a 32 kHz clock
and the input toggling every period, the feedback node F1 of the slave latch (marked in
figure 3.21) slowly rose towards 120mV instead of staying near zero volts (which produces a
logic "1" on the output) while the feedback is ON in the low clock phase (i.e. Clk=0). A
timing diagram of the problem is shown in figure 3.23. The D-FF did not fail with the floating
value of F1 and produced the correct output value every time. However, since the value was
drifting in the wrong direction, the flip-flop could fail and flip the stored value after a certain
time or if a sudden noise peak occurs. The cause was that all transistors in the clocked inverter
had minimum size, which gave too low an Ion current in the stacked nMOS compared to the
pMOS Ioff current, due to improper sizing and the fact that only one pMOS is OFF in the PUN
in this case. The solution was to increase all transistor lengths (both nMOS and pMOS), but as
little as possible, in line with the previously discussed principle of keeping the capacitive load
small. All transistors, both pMOS and nMOS, are sized to W/L = 160nm/190nm, which halved
the worst-case floating voltage. This reduces but does not eliminate the floating problem, and
comes at the cost of a minor increase in tco and tsu.
Figure 3.22: Symbol (LS) and schematic (RS) of the Clocked-Inverter Gate.
Figure 3.23: Feedback node F1 floating during the precharge phase at −40°C.
The transmission gate should have as small a propagation delay as possible, since the minimum
required setup time and the clock-to-output time are affected by the delay through the master
and slave latch respectively. The TG has no VTC concerns, however, since it is transparent,
i.e. it passes 1 as 1 and 0 as 0. Simulation of a TG incorporated in an inverter ring oscillator
with three inverters and one TG confirmed that transistors with as high strength as possible
gave the best transition times in terms of rise time tr and fall time tf, and the best propagation
delays in terms of tpHL and tpLH. The nMOS and pMOS strengths versus sizing are depicted in
figure 3.2. To yield the highest strengths, the pMOS is minimum sized, i.e. W/L = 160nm/120nm,
and the nMOS is sized with a long channel, i.e. W/L = 160nm/720nm.
Figure 3.24: Symbol (LS) and schematic (RS) of the Transmission Gate.
3.3 The ALU Test Circuits
An Arithmetic Logic Unit (ALU) with a word length of 8 bits is used as a test case for the
sub-threshold cell library. A simple VHDL description is taken from [26]; it contains the
common ALU operations except multiplication and division. The operations are listed in
table 3.7 and the ALU symbol is depicted in figure 3.25.
Figure 3.25: ALU block schematic.
ALU Operation      Description
Add Signed         R = A + B (signed two's complement integers)
Subtract Signed    R = A - B (signed two's complement integers)
Bitwise NOT        R(i) = NOT A(i)
Bitwise NAND       R(i) = A(i) NAND B(i)
Bitwise NOR        R(i) = A(i) NOR B(i)
Bitwise AND        R(i) = A(i) AND B(i)
Bitwise OR         R(i) = A(i) OR B(i)
Bitwise XOR        R(i) = A(i) XOR B(i)
Table 3.7: ALU operations [26].
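As a behavioural reference for table 3.7, the following Python sketch models the eight
operations on 8-bit words. The opcode assignment (000 for addition up to 111 for XOR, in table
order) and the names `alu`, `OPS` and `to_signed` are assumptions made for illustration only;
the actual encoding is defined by the VHDL code in appendix A.1.

```python
MASK = 0xFF  # 8-bit word length


def to_signed(x):
    """Interpret an 8-bit value as a signed two's complement integer."""
    x &= MASK
    return x - 256 if x & 0x80 else x


# Opcode order follows table 3.7; this mapping is an assumption.
OPS = {
    0b000: lambda a, b: a + b,     # Add
    0b001: lambda a, b: a - b,     # Subtract
    0b010: lambda a, b: ~a,        # Bitwise NOT (B is ignored)
    0b011: lambda a, b: ~(a & b),  # Bitwise NAND
    0b100: lambda a, b: ~(a | b),  # Bitwise NOR
    0b101: lambda a, b: a & b,     # Bitwise AND
    0b110: lambda a, b: a | b,     # Bitwise OR
    0b111: lambda a, b: a ^ b,     # Bitwise XOR
}


def alu(op, a, b):
    """Return the 8-bit result R for operands a and b (0..255)."""
    return OPS[op](a & MASK, b & MASK) & MASK


if __name__ == "__main__":
    a, b = 0b10000000, 0b01111111
    for op in range(8):
        r = alu(op, a, b)
        print(f"Op={op:03b}  R={r:08b}  (signed {to_signed(r)})")
```

Masking the result to 8 bits makes Python's unbounded integers wrap around in the same way as a
fixed-width two's complement adder.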
The ALU test case is intended to be a circuit between two pipeline registers, as is common in
synchronous systems. Some modifications to the VHDL code were required to simulate the
circuit realistically with respect to timing. The ALU is modified to include D-FFs at both inputs
and outputs to model the pipeline registers, since the original VHDL module only had output
registers. The VHDL is modified by including a "process()" statement sensitive to Clk, which
at each rising edge assigns the inputs A, B and Op to registers named Reg1, Reg2 and Reg4.
These registers are read by the "process()" originally present in the code, which performs
the ALU operations and assigns the result to the output registers. The pipelined test case is
depicted in figure 3.26, where the inputs and outputs have D-FFs as registers. The modified ALU
VHDL code is provided in appendix A.1.
Figure 3.26: Block schematic of pipelined ALU after modifications.
3.3.1 Logic Synthesis
The ALU HDL design is synthesized with the Cadence Encounter RTL Compiler tool [27] and
an available 130nm HVT cell library for above-threshold operation. Several characterization
steps are needed to make the sub-threshold cell library fully supported by the synthesis tool.
This characterization work is not part of this thesis. Hence, the ALU circuit is synthesized with
an above-threshold cell library, and the cells in the resulting netlist are afterwards manually
replaced with the sub-threshold cells.
Two files are needed to use the synthesis tool. The first is a Tool Command Language (Tcl)
script file that applies a number of commands to the synthesis tool. The script specifies, among
other things, the HDL language, the library and search path, the command to read the HDL file,
the command to read the constraint file (.sdc), and the command to start the synthesis. It is also
possible to exclude cells in the script such that a circuit is synthesized with only e.g.
Inverter, NAND2, NOR2 and D-FF. The second file is a constraint file (.sdc). The constraint file
is used to define output capacitances, the clock and clock frequency, and in particular the
maximum allowable fan-out for each node or for specific nodes in a design. The contents of both
files for synthesizing an ALU restricted to Inverter, NAND2, NOR2 and D-FF cells, with a
maximum fan-out of 3 for each node, are provided in appendix A.2 and A.3.
The sub-threshold cell library is designed with minimum drive strength in each logic element.
It is not characterized in a way that enables the synthesis tool to use the library and thereby
optimize the circuit for sub-threshold operation. With this in mind, it is not certain that a
circuit synthesized with the above-threshold library and without fan-out restriction would be
robust in sub-threshold operation. Six ALU circuits restricted to Inverter, NAND2, NOR2 and
D-FF cells, with node fan-out restricted to 2, 3, 4, 5, 6 and ∞ (i.e. no fan-out restriction), are
therefore synthesized to find the optimum fan-out for the circuit. The number of synthesized
cells and the corresponding critical path delay for each fan-out case are shown in figure 3.27.
For instance, when the synthesis is restricted to a fan-out of 2 (FO2) in each node of the circuit,
the number of cells increases to 512. Although this circuit has the lowest fan-out, the critical
path appears to become longer and thereby yields a longer delay through the circuit. At the
other end, with no fan-out restriction, the number of cells is lowest at 319. However, due to
large fan-outs, the critical path delay is even worse than in the former case. There is thus a
trade-off between the number of cells and the delay. A higher number of cells increases the die
area and, more importantly, the static and probably also the dynamic power consumption, since
more cells must switch to produce the same functionality. However, due to speed requirements,
especially at low ambient temperature where the delay increases rapidly, the main objective is
to minimize the worst-case delay. A minimum delay point is found at a fan-out of 3 with 393
cells, and this circuit is thus chosen as the best sub-threshold ALU circuit restricted to INV,
NAND2, NOR2 and D-FF. This ALU design is shown as No.2 in table 3.8.
Figure 3.27: Number of synthesized cells (circles) and critical path delay (squares) versus
number of fan-outs allowed. Delay is simulated at −40°C and nominal corner.
Another ALU circuit is synthesized to include all the designed sub-threshold cells (i.e.
additionally XNOR2, XOR2, AOI22 and OAI22). This ALU synthesis is also restricted to a
fan-out of 3, assuming that this fan-out is approximately optimal in this case too. The synthesis
resulted in a lower total of 272 cells, shown as design No.3 in table 3.8.
An ALU circuit without restrictions on cells and fan-out is also synthesized and yielded a total
of 118 cells, shown as design No.1 in table 3.8. This circuit is used with the above-threshold
library and is compared against the fan-out restricted sub-threshold circuits in terms of power
consumption. Note, however, that the simulation with the above-threshold library does not
include extracted layout parasitics.
The fan-out 3 circuits No.2 and No.3 are intended as test circuits for comparing the existing
above-threshold library with the sub-threshold cell library designed in this thesis. The
sub-threshold cells are applied to the ALU circuits by modifying the netlist file: for example,
all inverter cell names are replaced with the name of the sub-threshold inverter in the library,
and the pin names are changed correspondingly (the above-threshold library uses "I" for input
and "O" for output, while the sub-threshold library uses "A" and "Y" respectively). The
modified netlist is then imported into the Cadence schematic editor with reference to the
sub-threshold cell library.
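The netlist modification described above could be automated along the following lines. This is
a minimal Python sketch, not the procedure actually used in the thesis: the cell names `INV_AT`
and `INV_ST` are hypothetical placeholders (the real library names are not given here), and the
sketch assumes that each instance occupies one line of a Verilog netlist with named port
connections. Only the inverter pin mapping "I" to "A" and "O" to "Y" is taken from the text.

```python
import re

# Hypothetical cell names; the real names are library specific.
CELL_MAP = {"INV_AT": "INV_ST"}
# Pin renaming stated in the text for the inverter: I -> A, O -> Y.
PIN_MAP = {"I": "A", "O": "Y"}

# indent, cell name, " instance_name (", port list, ");"
INSTANCE = re.compile(r"^(\s*)(\w+)(\s+\w+\s*\()(.*)(\)\s*;\s*)$")
PIN = re.compile(r"\.(\w+)(\s*\()")


def convert_line(line):
    """Rename cell and pins on a single-line instance; leave other lines untouched."""
    m = INSTANCE.match(line)
    if not m or m.group(2) not in CELL_MAP:
        return line
    indent, cell, head, ports, tail = m.groups()
    ports = PIN.sub(
        lambda p: "." + PIN_MAP.get(p.group(1), p.group(1)) + p.group(2), ports
    )
    return indent + CELL_MAP[cell] + head + ports + tail


def convert_netlist(text):
    """Apply convert_line to every line of a netlist given as one string."""
    return "".join(convert_line(l) for l in text.splitlines(keepends=True))


if __name__ == "__main__":
    print(convert_line("  INV_AT g12 (.I(n5), .O(n6));"))
```

Other cells would need their own entries in `CELL_MAP` and, where the pin names differ per
cell, a per-cell pin map instead of the single global `PIN_MAP` used here.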
3.3.2 Circuit Design Method (i.e. The ALU Module)
Figure 3.28 depicts a block schematic of the methodology used in this project to design the
ALU circuit. The blue boxes represent steps that have been carried out. However, since the
circuit in this project is only a test case for the designed cell library, less time is spent on the
system specification and abstract high-level model steps. The boxes in darker colors represent
further work steps needed to realize a finished chip and are not part of this project. The
specifications for the ALU circuit were decided to be an 8-bit word length and most of the
functionality except multiplication and division. The next steps are described in the following
paragraphs.
Table 3.8: Logic synthesis results for the ALU cases: (1): no restrictions of logic gates and FO;
(2): restricted to FO3 and INV, NAND2, NOR2 and D-FF; (3): restricted as No.2 + XNOR2,
XOR2, AOI22 and OAI22. Synthesis is based on 1.2V, 25°C, nominal with Atmel's
above-threshold cell library.
Design          No.1                 No.2                        No.3
                (Above-threshold)    (Above- & sub-threshold)    (Above- & sub-threshold)
Mapped Gates    Gate Instances       Gate Instances              Gate Instances
Inverter        4                    131                         123
NAND2           12                   105                         39
NOR2            3                    130                         26
D-FF            27                   27                          27
XNOR2           0                    -                           1
XOR2            0                    -                           9
AOI22           6                    -                           34
OAI22           1                    -                           13
Misc            65                   0                           0
TOTAL           118                  393                         272
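The totals in table 3.8 can be cross-checked against the per-gate counts with a few lines of
Python. The counts below are copied from the table; cells that were excluded from synthesis run
No.2 are entered as 0, which is an assumption about the empty table entries. The names
`GATE_COUNTS` and `totals` are chosen for this sketch only.

```python
# Gate instances per design (No.1, No.2, No.3), copied from table 3.8.
# XNOR2, XOR2, AOI22 and OAI22 were not allowed in No.2 and are entered as 0.
GATE_COUNTS = {
    "Inverter": (4, 131, 123),
    "NAND2": (12, 105, 39),
    "NOR2": (3, 130, 26),
    "D-FF": (27, 27, 27),
    "XNOR2": (0, 0, 1),
    "XOR2": (0, 0, 9),
    "AOI22": (6, 0, 34),
    "OAI22": (1, 0, 13),
    "Misc": (65, 0, 0),
}


def totals(counts):
    """Column-wise sum of the gate instances for the three designs."""
    return [sum(row[i] for row in counts.values()) for i in range(3)]


if __name__ == "__main__":
    for number, total in enumerate(totals(GATE_COUNTS), start=1):
        print(f"No.{number}: {total} cells")
```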
The first step in designing the ALU module was to search for available high-level hardware
description language code. HDL code was found in [26], and the behavior of the module was
verified and modified to contain input registers. The VHDL language is used, with the
Active-HDL EDA tool [28] as editor and simulator. The VHDL code is provided in appendix A.1.
In the next step, a logic synthesizer tool named Cadence Encounter RTL Compiler [27] is used
to synthesize the VHDL code into a Verilog netlist. After synthesis, the tool reports estimated
values for area, power, timing and so on.
The design is then imported into the Cadence Virtuoso Schematic Editor tool [20] using the
synthesized netlist. The Virtuoso platform is a tool for designing full-custom integrated circuits
and comes with a schematic editor. Within Virtuoso, the Analog Design Environment [29]
(ADE-L, XL, GXL) simulators are used to simulate, analyze and verify circuit behavior. This
tool is used to obtain results in terms of power, delay and so on.
Figure 3.28: Design hierarchy showing methods and tools used.
Chapter 4
Layout
Layout design is done for each logic element so that a more realistic model with capacitive
and resistive parasitics is taken into account in DC and transient analysis. The extraction
method used is the 2D extraction mode (xRC) of the Calibre PEX tool with extraction type
R+C+CC and no inductance. The extraction format is CALIBREVIEW. The focus of this thesis
has not been to optimize the layout for sub-threshold operation using techniques available in
the literature. Such a study should be considered prior to finalizing the layout. However,
some guidelines and rules have been followed while designing the layout of every logic gate.
To make the layout realistic and portable to the existing above-threshold cell library, the
rail-to-rail pitch is set to the same value as in that library¹. All transistors are oriented in
the same direction to minimize mismatch, and transistors with their gates connected together
are placed above each other wherever possible to make the poly wire as short and vertical as
possible. Additionally, the Design Rule Checking (DRC) rules are complied with by use of the
Calibre DRC tool, so that rules such as minimum metal and poly widths and minimum spacing in
the different layers are met. The nMOS and pMOS transistors are not custom made; ready-made
layouts are used for them. However, some properties of the transistors are changed; for
example, contacts that are not needed because of a direct connection to the neighboring
transistor are removed, as shown at the bottom of the NAND2 layout in figure 4.2 and the top
of the NOR2 layout in figure 4.3. The layouts of all designed cells are presented in the
remainder of this chapter.
¹ Design rules are not revealed due to confidentiality.
4.1 Layout of Inverter Gate
Figure 4.1: Layout of the Inverter gate.
4.2 Layout of NAND2 Gate
Figure 4.2: Layout of the NAND2 gate.
4.3 Layout of NOR2 Gate
Figure 4.3: Layout of the NOR2 gate.
4.4 Layout of XNOR2 Gate
Figure 4.4: Layout of the XNOR2 gate.
4.5 Layout of XOR2 Gate
Figure 4.5: Layout of the XOR2 gate.
4.6 Layout of AOI22 Gate
Figure 4.6: Layout of the AOI22 gate.
4.7 Layout of OAI22 Gate
Figure 4.7: Layout of the OAI22 gate.
4.8 Layout of D Flip-flop Memory Element
Figure 4.8: Layout of the D-FF memory element.
Chapter 5
Simulations and Test of Sub-Threshold Cells
and ALU Application
This chapter presents the methods used to simulate, test and verify the correctness of the
proposed sub-threshold cells and the ALU applications. It first presents the test benches and
simulations for the designed sub-threshold cells, including DC, transient and power
simulations. The test bench and simulation of the ALU test-case circuits are then presented.
All test benches for logic cells, MOS transistors and ALU circuits are drawn in the Cadence
Schematic Editor tool [20] and simulated with the Cadence ADE-L and ADE-XL tools [29] and
the Spectre simulator.
5.1 Sub-Threshold Cell Design Simulations
5.1.1 Transistor Strength and Threshold Voltage Simulation
Transistor strength is simulated in terms of the ON current Ion. The ON currents are simulated
by biasing the pMOS to zero volts and the nMOS to VDD, as shown in figure 5.1, with the supply
voltage set to 350mV. The currents are simulated on the drain and source terminals of the nMOS
and pMOS respectively using DC analysis in ADE-L. The DC analysis is set up to sweep either
the width or the length of the transistors from minimum to 800nm to model the strength
variation versus either dimension.
Figure 5.1: Test bench setup to simulate the ON transistor strength.
The same test bench and simulation setup is used to simulate the change in transistor threshold
voltage versus dimensions. Threshold voltages are found in result browser as "vth" under the
"dcOpInfo" and transistor folder. Intrinsic threshold voltages are found as "vtho" under model
and transistor folder.
5.1.2 Cell Test bench Setup and Simulation
To design, simulate and analyze logic elements targeting minimum drive strength, the general
test bench connects the logic elements in a five-stage ring oscillator configuration [12]. A
test bench for the inverter gate with this configuration is depicted in figure 5.2. The test
bench enables both DC and transient analysis by means of a basic simulator switch S0 that
closes during DC analysis and opens during transient analysis. To make the ring oscillator
converge and start oscillating in transient analysis, a voltage-controlled switch (W0) is
connected to the input node of the first gate such that the node is initialized to VDD for a
period of 1ns at the start of the simulation. The voltage-controlled switch is closed while its
control voltage is grounded for 1ns, and open for the rest of the simulation time, during which
the control voltage is held at VDD by a signal period longer than twice the simulation time.
The 1ns interval is obtained by setting the delay parameter of the voltage pulse source to "-1ns".
DC Analysis Simulation
DC analysis simulation produces a VTC curve by simulating the first gate of the ring oscillator.
The input denoted "A" in the test benches is swept from zero to VDD, where each point is held
long enough to produce stable DC behavior at the output Y. Hence, a VTC curve is produced over
the whole voltage sweep range. All the expressions used to extract the DC values of interest in
the Cadence ADE-L and ADE-XL tools are listed in table 5.1. The parameters of the functions are
set through a calculator GUI, and the functions are further described in the Help menu.
Table 5.1: DC analysis simulations and expressions where "/Y" is the VTC curve. VS(): nodal
voltage (DC sweep), VAR(): variable.
Simulation    Expression
VM            cross(VS("/Y") (VAR("vdd_var")/2) 1 "either" nil nil)
VM %          100 * ( (VM / (VAR("vdd_var") /2 )) - 1)
VIL           cross(deriv(VS("/Y")) -1 1 "falling" nil nil)
VIH           cross(deriv(VS("/Y")) -1 1 "rising" nil nil)
VOH           value(VS("/Y") 0)
VOL           value(VS("/Y") VAR("vdd_var"))
NMH           VOH - VIH
NML           VIL - VOL
Gain          ymin(deriv(VS("/Y")))
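To make the definitions in table 5.1 concrete, the following Python sketch evaluates the same
quantities on a sampled VTC. It is not the Cadence calculator implementation: `deriv`, `cross`
and `dc_metrics` are simplified stand-ins written for this illustration, and the tanh-shaped
VTC with a switching point of 0.18V is synthetic data, not a simulation result from this thesis.

```python
import math


def deriv(xs, ys):
    """Numerical dy/dx using central differences (one-sided at the ends)."""
    n = len(xs)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((ys[hi] - ys[lo]) / (xs[hi] - xs[lo]))
    return out


def cross(xs, ys, level, edge="either"):
    """First x at which ys crosses level, found by linear interpolation."""
    for i in range(len(xs) - 1):
        y0, y1 = ys[i] - level, ys[i + 1] - level
        rising = y0 < 0 <= y1
        falling = y0 > 0 >= y1
        if (rising and edge in ("rising", "either")) or (
            falling and edge in ("falling", "either")
        ):
            return xs[i] + (xs[i + 1] - xs[i]) * (-y0) / (y1 - y0)
    return None


def dc_metrics(vin, vout, vdd):
    """VM, VIL, VIH, VOH, VOL, noise margins and gain as defined in table 5.1."""
    dv = deriv(vin, vout)
    vm = cross(vin, vout, vdd / 2)
    vil = cross(vin, dv, -1.0, "falling")  # slope falls through -1
    vih = cross(vin, dv, -1.0, "rising")   # slope rises back through -1
    voh, vol = vout[0], vout[-1]           # output at Vin = 0 and Vin = VDD
    return {
        "VM": vm, "VM%": 100 * (vm / (vdd / 2) - 1),
        "VIL": vil, "VIH": vih, "VOH": voh, "VOL": vol,
        "NMH": voh - vih, "NML": vil - vol, "Gain": min(dv),
    }


if __name__ == "__main__":
    vdd = 0.35
    vin = [vdd * i / 700 for i in range(701)]
    # Synthetic inverter VTC, for illustration only.
    vout = [0.5 * vdd * (1 - math.tanh(60 * (v - 0.18))) for v in vin]
    for name, value in dc_metrics(vin, vout, vdd).items():
        print(f"{name}: {value:.4f}")
```

The slope of an inverter VTC is close to zero at both ends and most negative near VM, which is
why VIL is found on the falling crossing of −1 and VIH on the rising crossing.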
Transient Analysis Simulation
Transient analysis simulations are done by first letting the ring oscillator oscillate for a
sufficient number of periods to exclude start-up effects. Hence, the transient analysis starts
at time zero, but simulation data is not saved before the outputstart time of e.g. 10µs. All
transient results such as rise time, fall time and propagation delay are taken at the first
occurrence after the outputstart time. The transient analysis could in principle stop
immediately after the timing values are found. However, this is not supported by the simulator,
so the simulation stop time has to be long enough to capture all needed transient values. All
the expressions used to extract transient values in the Cadence ADE-L and ADE-XL tools are
listed in table 5.2.
Table 5.2: Transient analysis simulations and expressions where "/A" is the input and "/Y" is
the output of the gate simulated. VT(): nodal voltage (transient analysis), VAR(): variable.
Simulation    Expression
tf            fallTime(VT("/A") VAR("vdd_var") nil 0 nil 90 10 nil "time")
tr            riseTime(VT("/A") 0 nil VAR("vdd_var") nil 10 90 nil "time")
tpLH          delay(?wf1 VT("/A") ?value1 (0.5*VAR("vdd_var")) ?edge "falling" ?nth1 1 ?td1 0.0 ?wf2
              VT("/Y") ?value2 (0.5*VAR("vdd_var")) ?edge2 "rising" ?nth2 1 ?td2 0.0 ?stop nil ?multiple nil)
tpHL          delay(?wf1 VT("/A") ?value1 (0.5*VAR("vdd_var")) ?edge "rising" ?nth1 1 ?td1 0.0 ?wf2
              VT("/Y") ?value2 (0.5*VAR("vdd_var")) ?edge2 "falling" ?nth2 1 ?td2 0.0 ?stop nil ?multiple nil)
Delay         (tpLH + tpHL) / 2
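The measurements in table 5.2 can be illustrated on sampled waveforms with the Python sketch
below. The helper names `crossings`, `rise_time`, `fall_time` and `prop_delay` are
assumptions of this sketch and only approximate the calculator functions: the thresholds are
10 %, 50 % and 90 % of VDD, and the propagation delay is taken from the first 50 % crossing of
the input to the next 50 % crossing of the output.

```python
def crossings(t, v, level, edge):
    """Times at which v crosses level in the given direction (linear interpolation)."""
    out = []
    for i in range(len(t) - 1):
        a, b = v[i] - level, v[i + 1] - level
        if (edge == "rising" and a < 0 <= b) or (edge == "falling" and a > 0 >= b):
            out.append(t[i] + (t[i + 1] - t[i]) * (-a) / (b - a))
    return out


def rise_time(t, v, vdd):
    """10 % to 90 % rise time of the first rising edge."""
    t10 = crossings(t, v, 0.1 * vdd, "rising")[0]
    t90 = next(x for x in crossings(t, v, 0.9 * vdd, "rising") if x > t10)
    return t90 - t10


def fall_time(t, v, vdd):
    """90 % to 10 % fall time of the first falling edge."""
    t90 = crossings(t, v, 0.9 * vdd, "falling")[0]
    t10 = next(x for x in crossings(t, v, 0.1 * vdd, "falling") if x > t90)
    return t10 - t90


def prop_delay(t, vin, vout, vdd, in_edge, out_edge):
    """Delay from the first 50 % crossing of vin to the next 50 % crossing of vout."""
    t_in = crossings(t, vin, 0.5 * vdd, in_edge)[0]
    t_out = next(x for x in crossings(t, vout, 0.5 * vdd, out_edge) if x >= t_in)
    return t_out - t_in


def average_delay(t, vin, vout, vdd):
    """(tpLH + tpHL) / 2 for an inverting gate."""
    tplh = prop_delay(t, vin, vout, vdd, "falling", "rising")
    tphl = prop_delay(t, vin, vout, vdd, "rising", "falling")
    return (tplh + tphl) / 2
```

For a gate driven by a linear ramp of duration T, `rise_time` returns 0.8 · T, which is a
convenient sanity check before applying the functions to exported simulation data.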
Power Consumption Simulation
Power consumption is simulated in terms of total and static power consumption by averaging
U*I. Dynamic power consumption is calculated from the relation Ptotal = Pdynamic + Pstatic.
Logic gates have multiple input states, which may consume different static power. These are
simulated by applying distinct stable logic input signals. Through its dynamic component, the
total power consumption depends on the activity factor, frequency, capacitive load and supply
voltage. Two power cases are thus simulated: the full-speed power using a five-stage ring
oscillator, and the 32 kHz power consumption using a five-stage chain. To measure the currents
drawn by a selection of gates separately, the Vdd net expressions of those gates are overridden
to separate voltage sources, and the current is then simulated from each voltage source [30].
All the expressions used to simulate power are listed in table 5.3.
Table 5.3: Power simulations and expressions. IT(): terminal current (transient analysis),
VAR(): variable.
Simulation        Expression
Ptot fullspeed    (VAR("vdd_var") * average(abs(IT("/Vfullspeed/PLUS")))) / 5
Ptot 32KHz        (VAR("vdd_var") * average(abs(IT("/V32khz/PLUS")))) / 5
Pstat 00          VAR("vdd_var") * average(abs(IT("/Vstatic00/PLUS")))
Pstat 01          VAR("vdd_var") * average(abs(IT("/Vstatic01/PLUS")))
Pstat 10          VAR("vdd_var") * average(abs(IT("/Vstatic10/PLUS")))
Pstat 11          VAR("vdd_var") * average(abs(IT("/Vstatic11/PLUS")))
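The expressions in table 5.3 amount to multiplying the supply voltage by the time-averaged
magnitude of the supply current, divided by the number of gates sharing the source. A minimal
Python sketch of this arithmetic is given below. It assumes uniformly sampled current values,
so that the arithmetic mean approximates the time average; the function names and the example
current samples are invented for illustration.

```python
def avg_power(vdd, current_samples, n_gates=1):
    """Average power per gate: VDD * mean(|I|) / n_gates, for uniformly sampled current."""
    mean_abs_i = sum(abs(i) for i in current_samples) / len(current_samples)
    return vdd * mean_abs_i / n_gates


def dynamic_power(p_total, p_static):
    """Dynamic power from the relation Ptotal = Pdynamic + Pstatic."""
    return p_total - p_static


if __name__ == "__main__":
    vdd = 0.35
    # Invented supply current samples (A) of a five-gate chain, illustration only.
    i_supply = [1e-9, -1e-9, 3e-9, -3e-9]
    p_tot = avg_power(vdd, i_supply, n_gates=5)
    print(f"Ptot per gate: {p_tot:.3e} W")
    print(f"Pdyn per gate: {dynamic_power(p_tot, 0.4e-10):.3e} W")
```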
Inverter Gate Simulation
The inverter gate has only one input and output. For this reason the inverter gate can only
be connected in a ring oscillator in one configuration in a chain of inverters as shown in fig-
ure 5.2.
Figure 5.2: Test bench setup to simulate both VTC and switching analysis of Inverter gate.
NAND2 Gate Simulation
The NAND2 and NOR2 gates can each be connected in a ring oscillator in two different
configurations. These are depicted in figures 5.3 and 5.4 for the NAND2 gate, where one input
is biased to VDD in the former case and both inputs are connected in the chain in the latter
case. For the NOR2 gate, the difference is that one input is biased to ground in the first case,
while the second case is the same. This means that logic gates with more than one input have
multiple VTC curves and transient analysis results. However, the focus is on the worst case of
VM and the transient results.
Figure 5.3: Test bench setup to simulate both VTC and switching analysis of NAND2 gate with
one input sourced to VDD and the other connected in chain.
Figure 5.4: Test bench setup to simulate both VTC and switching analysis of NAND2 gate with
both input connected in chain.
The Other Cell Test Benches
The test benches for NOR2, XNOR2, XOR2, AOI22 and OAI22 are similar to NAND2, but
with different biasing. The test benches are depicted and found in appendix C.
5.1.3 D Flip-Flop memory element Test bench Setup
The Setup and Hold time Simulation
The minimum setup time tsu is the minimum time an input data value must be stable prior to a
valid clock edge. In [31], a method to find the metastability window is presented. The method
is to iteratively narrow the distance between a data input transition and a valid clock edge,
and to simulate the clock-to-output propagation delay tco. This method can be used to find both
the setup time tsu and the hold time th. Figure 5.5 depicts the method in detail. The
metastability window is defined by the maximum tolerable clock-to-output delay tp_cq_max chosen
by the designer. This point can be defined as the minimum allowable setup and hold time. The
downside of this method is the lack of implementation in tools. Thus, comprehensive programming
skills and design time are needed to implement an automatic search for the metastability
window. It may be possible to automate the method with the bisection function in HSPICE or the
search function in SpectreMDL.
Figure 5.5: The method to determine setup and hold time presented in [31].
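The iterative narrowing described above can be expressed as a bisection over the data-to-clock
offset. The Python sketch below illustrates the idea under stated assumptions: `simulate_tco`
is a stand-in for a transient simulation, its shape and all constants are invented, and the
limit of 1.1 times the nominal tco is an arbitrary designer choice. It is not the procedure
used in this thesis, which relies on the simpler master-latch delay method.

```python
import math


def simulate_tco(t_su):
    """Stand-in for a transient simulation returning tco (s) for a data-to-clock offset.

    The model and all constants are invented for illustration: tco grows as the data
    transition approaches the clock edge, and the flip-flop fails below t_fail.
    """
    t_fail, t_nom, k = 1.0e-6, 2.0e-6, 1.0e-13
    if t_su <= t_fail:
        return math.inf  # data is not captured
    return t_nom + k / (t_su - t_fail)


def min_setup_time(tco_of, t_lo, t_hi, tco_max, tol=1e-9):
    """Smallest offset (within tol) whose tco does not exceed tco_max.

    Requires a tco that decreases monotonically with the offset, with t_lo violating
    the limit and t_hi satisfying it.
    """
    assert tco_of(t_lo) > tco_max >= tco_of(t_hi)
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if tco_of(mid) <= tco_max:
            t_hi = mid  # still passing: try a shorter setup time
        else:
            t_lo = mid  # failing: back off
    return t_hi


if __name__ == "__main__":
    tco_max = 1.1 * 2.0e-6  # arbitrary limit: 10 % above the nominal tco of the model
    t_su_min = min_setup_time(simulate_tco, 0.0, 10e-6, tco_max)
    print(f"minimum setup time: {t_su_min * 1e6:.3f} us")
```

Each bisection step costs one transient simulation, so a search interval of 10µs resolved to
1ns needs about 14 simulations per corner.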
Due to design time constraints, a simpler method is used to simulate the setup time. The
method is to simulate the propagation delay from the data input to the output of the master
latch, P0. It is manually checked against the method described in the previous paragraph by a
parametric sweep of the data input delay, in which the data input transition approaches the
clock edge, and the clock-to-output propagation delay tco, master-latch propagation delay tD_P0
and setup time tsu are simulated at each sweep step. These three simulations are presented in
figure 5.6. The D-FF fails when the data input delay is too long, or equivalently when the setup
time is too short. This is seen where tco increases and then suddenly flattens because no
transition occurs on the output. At point 'a' in the figure, the simulated tD_P0 and tsu
intersect; this is the point at which the minimum setup time is defined by tD_P0. The method is
checked for both rising and falling master-latch delays. The rising case resulted in a tco
increase of ∼30%, and the falling case in a tco increase of ∼5%. These results indicate that
the method is reliable enough, with acceptable tolerance.
Figure 5.6: Check validity of master-latch propagation delay tD_P0 as the setup time simulation
method against iteratively narrowing data input transition towards clock edge.
Figure 5.7: Simulation of propagation delay through master-latch setup.
The procedure to simulate the propagation delay through the master latch (setup time) is to
apply zero volts to the clock input, which enables and closes the master-latch TG, then apply a
pulse signal on the data input and simulate the propagation delay from the data input to the
output of the master latch, P0.
The hold time th can unfortunately not be obtained by simulating the propagation delay through
any part of the D-FF, and due to lack of time to implement a search algorithm the hold time is
not simulated in this project.
The Clock-to-output Propagation Delay Simulation
The clock-to-output propagation delay tco depends on the setup time applied before a valid
clock edge. This is shown in figures 5.5 and 5.6, where tpcq and tco respectively increase as
the setup time is shortened towards the clock edge. Hence, to obtain the worst-case tco, the
data input should be applied at the point where the minimum setup time is just met. However,
due to difficulties in applying the simulated minimum setup time, a nominal clock-to-output
delay is simulated instead by applying the data input at time Tclk/2 + ∼3µs, as shown in
figure 5.8. The extra 3µs allows the TG to fully close before the data input is applied.
The rise and fall times of the Clk and data D inputs are for all tests set to the average of the
Monte Carlo simulated mean values of tr and tf for the designed inverter gate, separately for
each temperature. For instance, the rise and fall time is set to (tr + tf)/2 = (476ns +
879ns)/2 ∼ 677ns for the −40°C simulation case. This is not perfect for all corners, but it is
a much better estimate than ideal rise and fall times of 1ns.
Figure 5.8: Timing diagram of clock-to-output tco simulation.
The D-FF Power Simulation
Two cases of total power consumption are simulated. The first case is the power consumption
with a clock frequency of 32 kHz and data toggling at a rate of 2 · 32 kHz, which makes the
output toggle between logic zero and one. The power consumption is obtained by averaging
U*I over four clock periods. The same applies to the simulation of static power consumption,
but with static signals on the data and clock inputs. To measure the current drawn by the
individual D-FF instances in the test bench, their net expressions are overridden to separate
voltage sources, and the current is then simulated from each voltage source [30].
5.2 ALU Module Simulation
5.2.1 Test bench Setup
Figure 5.9: ALU test bench in Cadence Virtuoso schematic editor.
5.2.2 Simulation Setup
The clock input frequency is set to 32KHz. The rise and fall time of Clk signal is set to 1ns for
all simulations (power, delay etc.) since these are difficult to scale for the range of temperatures
and supply voltages. Hence, it seems to be better to set the rise and fall times to 1ns, which is
much lower than realistic times for an inverter gate output. This gave insignificant difference
in critical path delay and power consumption simulation compared to higher rise and fall times
closer to realistic values, however it gave more realistic simulation results for higher supply
voltage towards 1.2V.
The A<7:0> and B<7:0> inputs are statically set to "1000 0000" and "0111 1111" respectively,
so that there is high activity on the output R. The Op<2:0> input is changed every clock period,
in the middle between two valid rising edges of the clock. The Op input is changed in ascending
order from "000" up to "111" to cycle through the available operations (addition, subtraction,
NOT, NAND, etc.). The stimulus files are shown in appendix B.1 and B.2, where the former is
used to simulate dynamic power consumption and the latter to simulate static power
consumption.
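The Op stimulus described above can be sketched as a simple schedule. This is only an illustration and not the stimulus files of appendix B: the function name `op_schedule` is hypothetical, and it assumes that rising clock edges fall on integer multiples of TClk, so that each Op change lands half a period later.

```python
F_CLK_HZ = 32_000           # clock frequency stated in the text
T_CLK_S = 1.0 / F_CLK_HZ    # clock period, 31.25 us


def op_schedule(n_periods=8, t_clk=T_CLK_S):
    """Return (time_s, op_bits) pairs with one Op change per clock period.

    Assumption: rising edges occur at k * t_clk, so the Op code changes at
    (k + 0.5) * t_clk, i.e. midway between two rising edges.
    """
    schedule = []
    for k in range(n_periods):
        t_change = (k + 0.5) * t_clk
        op_bits = format(k % 8, "03b")  # ascending "000" .. "111"
        schedule.append((t_change, op_bits))
    return schedule


schedule = op_schedule()
for t_change, op_bits in schedule:
    print(f"{t_change * 1e6:8.3f} us  Op = {op_bits}")
```

Eight periods are enough to step through all Op codes once; the power simulations in this chapter use a shorter window.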
5.2.3 Power Consumption
The power consumption is simulated by averaging U*I. The best method to simulate total power
consumption is to use a long simulation time to increase accuracy and to apply various input
signals to obtain an average over different activities. However, the method used here is
simplified to static values on inputs A and B and an Op input that is changed sequentially in
ascending order, as described in section 5.2.2. The simulation time is set to [0 : 5.0 · TClk]
and the simulation outputstart time to 1.0 · TClk, such that every logic block is in a stable
state before the measurement starts. The power is thus averaged over four clock periods TClk.
Additionally, the simulator accuracy is set to conservative to gain better accuracy.
The power consumption (total and static) is simulated in the TT corner at 25◦C over the supply
voltage range [350mV, 1.2V] with 50mV spacing, resulting in 17 parametric sweeps.
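The averaging of U*I over the window from 1.0 · TClk to 5.0 · TClk can be sketched as a trapezoidal integration of sampled waveforms. This is a minimal illustration and not the calculator expression used in Cadence: the function name `average_power` and the constant example waveform (0.35 V, 6 nA) are hypothetical, and the sketch assumes that samples exist exactly at the window boundaries.

```python
def average_power(times, voltages, currents, t_start, t_end):
    """Average of U*I over [t_start, t_end] using the trapezoidal rule.

    Segments that are not fully inside the window are skipped, so the
    window boundaries are assumed to coincide with sample times.
    """
    energy = 0.0
    for i in range(1, len(times)):
        t0, t1 = times[i - 1], times[i]
        if t0 < t_start or t1 > t_end:
            continue
        p0 = voltages[i - 1] * currents[i - 1]
        p1 = voltages[i] * currents[i]
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy / (t_end - t_start)


# Hypothetical example: constant 0.35 V supply drawing 6 nA, four samples per period.
T_CLK = 1.0 / 32_000
times = [k * T_CLK / 4 for k in range(21)]   # 0 .. 5 * T_CLK
voltages = [0.35] * len(times)
currents = [6e-9] * len(times)

# Skip the first period (outputstart = 1.0 * T_CLK) and average over the remaining four.
p_avg = average_power(times, voltages, currents, times[4], times[-1])
print(f"average power = {p_avg * 1e9:.3f} nW")
```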
The static power consumption is computationally easier for the simulation tool due to the static
input values, and is simulated for a sufficiently long time. However, there is no point in
simulating for far too long, since the static power consumption is fairly stable.
The dynamic power consumption is not simulated directly, but calculated from the relation
Ptotal = Pdynamic + Pstatic.
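As a small worked example of this relation, the sketch below derives Pdynamic = Ptotal − Pstatic from the VDD = 350mV row of table 6.21. The numbers are copied from that table; the dictionary and function names are chosen here for illustration only.

```python
# (Ptotal, Pstatic) in nW at VDD = 350 mV, 32 kHz, 25 degC, TT corner (table 6.21).
ALU_350MV_NW = {
    "No.1": (2.11, 0.228),
    "No.2 sub-T": (2.095, 0.239),
    "No.2 above-T": (2.77, 0.399),
    "No.3 sub-T": (1.934, 0.183),
    "No.3 above-T": (2.54, 0.319),
}


def dynamic_power(p_total, p_static):
    """Dynamic component obtained from Ptotal = Pdynamic + Pstatic."""
    return p_total - p_static


dynamic_nw = {
    design: round(dynamic_power(p_total, p_static), 3)
    for design, (p_total, p_static) in ALU_350MV_NW.items()
}
for design, p_dyn in dynamic_nw.items():
    print(f"{design:13s} Pdynamic = {p_dyn:.3f} nW")
```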
5.2.4 Propagation Delay Through Critical Path
The critical path delay was found by simulating the critical path of the design in the
Cadence ADE-XL simulator [29]. The critical path is only informally reported by the synthesis
tool (timing_report.rep) and thus had to be built manually in a custom netlist file prior to
import into the Cadence Schematic Editor. The timing report also states the fan-out of each node
in the critical path. Hence, the same number of dummy inverters is attached to each node
to model the fan-out capacitance. The propagation delay is simulated at a temperature of
−40◦C in every corner over the supply voltage range [350mV, 1.2V] with 25mV spacing,
resulting in 35 parametric sweeps.
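The supply voltage grid of the delay simulation can be written out explicitly. The sketch below is only a convenience illustration with a hypothetical helper name; it works in integer millivolts to avoid floating-point rounding and includes both end points of the range.

```python
def vdd_sweep_mv(start_mv=350, stop_mv=1200, step_mv=25):
    """Supply voltage points in mV, with both end points included."""
    return list(range(start_mv, stop_mv + 1, step_mv))


delay_sweep = vdd_sweep_mv()
print(len(delay_sweep), delay_sweep[0], delay_sweep[-1])
```

With 25mV spacing the grid contains 35 points from 350mV to 1.2V, which matches the number of sweeps stated above.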
Chapter 6
Results
This chapter presents the results for the sub-threshold cells and the simulation results for the
ALU test case circuits. The former are obtained from Monte Carlo simulation (220 runs, seed=47)
and the latter from corner simulations. Intermediate results regarding the
logic gates are found in appendix D.
6.1 Sub-Threshold Cell Library Results
Table 6.1: Chosen gate design dimensions
Logic gate              pMOS [nm]     nMOS [nm]     Cell area   N. Transistors
                        W      L      W      L      [µm2]
Inverter                160    240    160    480    5.248       2
NAND2                   160    270    160    720    9.088       4
NOR2                    160    150    160    720    10.24       4
XNOR2                                               27.136      12
XNOR2 structure         160    160    160    720                8
XNOR2: 2x Inverter      160    240    160    480                2x 2
XOR2                                                28.288      12
XOR2 structure          160    160    160    720                8
XOR2: 2x Inverter       160    240    160    480                2x 2
AOI22                   160    160    160    720    18.688      8
OAI22                   160    160    160    720    16.432      8
D-FF                                                43.328      20
D-FF: 2x Inverter       160    240    160    480                2x 2
D-FF: 2x Inverter       160    200    160    720                2x 2
D-FF: 2x TG             160    120    160    720                2x 2
D-FF: 2x Clk-Inverter   160    190    160    190                2x 4
6.1.1 Inverter Gate
Table 6.2: Monte Carlo DC results for the inverter gate with process and mismatch and extracted
parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 -10.1 30.33 10.93 7.16 -9.48 4.23 -2.08 2.33
25 -10.45 29.94 9.948 7.219 -9.93 3.12 -2.77 2.42
85 -12.43 29.75 9.042 7.293 -10.03 2.79 -3.24 2.45
NMH [mV] -40 100.4 172.7 135.3 12.60 528 600.6 563.9 13.4
25 103.5 175.4 138.2 12.61 516.6 600 558.3 14.0
85 102.4 175.4 138 12.71 507.9 591.1 551.3 14.4
NML [mV] -40 141.9 215.6 179.5 12.89 517.9 596.6 557.2 13.5
25 141.8 214 179.2 12.62 502.9 580.3 541.3 13.9
85 138.6 210.9 175.6 12.68 484.1 567.5 526.9 14.7
Gain -40 -36.47 -7.31 -25.92 5.36 -45.35 -23.3 -38.7 5.66
25 -37.04 -21.18 -32.92 4.31 -42.67 -21.81 -35.11 5.461
85 -34.8 -20.13 -30.76 4.26 -39.59 -21.26 -32.43 4.829
Figure 6.1: Monte Carlo Inverter layout, VDD = 350mV: Midpoint percentage with process and
mismatch.
Table 6.3: Monte Carlo AC results for the inverter gate with process and mismatch and extracted
parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 138.4 1844 476.1 271.9 144.3 194.4 167.9 8.79
25 22.71 166.6 58.1 23.74 184.1 245.9 210.9 10.63
85 7.816 41.73 17.55 5.667 218.3 291.9 250.4 12.32
tf [ns],[ps] -40 119 3805 878.6 640.1 115.9 150.7 130.6 6.18
25 19.83 243.3 82.63 42.19 151.9 196.5 170.3 8.07
85 7.457 51.67 21.88 8.563 186.7 239.9 209.9 9.68
tPHL [ns],[ps] -40 225.2 2939 812.3 459.2 106.3 144 125.1 6.79
25 30.92 201.9 82.54 32.44 127.4 171.3 149.7 8.09
85 10.11 47.06 22.42 6.843 145.4 194.2 170.4 9.15
tPLH [ns],[ps] -40 187.9 3580 778.4 438.7 119.7 161.5 137.3 8.18
25 29.58 266.6 78.73 30.94 144.9 193.6 166.3 9.81
85 10.08 59.08 21.17 6.534 166.8 220.3 191 11.11
Delay [ns],[ps] -40 266.6 2975 795.3 366.2 117.3 149.7 131.2 5.83
25 37.56 230.7 80.64 26.0 141.5 179 158 6.90
85 12.12 52.31 21.8 5.484 161.9 203.5 180.7 7.74
Pstatic Vdd [fW] -40 368 375.7 369.4 1.117 4681 8041 5530 476.9
25 420.5 1211 589 125.8 5384 11210 7061 985.3
85 1893 15120 5133 2204 11740 64970 24890 8679
Pstatic gnd [fW] -40 367.9 369.1 368.2 0.162 4344 4586 4401 33.32
25 373.8 519.2 413.5 28 4406 5304 4624 135.6
85 699.6 4228 1799 685.1 5741 20310 10270 2825
Pavg max freq -40 13.51 91.73 38.5 14.61 2.707 3.289 2.983 0.117
[pW],[uW] 25 167.3 682.7 358.9 99.43 2.315 2.8 2.544 0.098
85 712.7 2230 1329 294.6 2.082 2.512 2.285 0.086
Pavg 32KHz -40 9.15 17.27 13.61 1.57
[pW] 25 10.44 21.99 12.55 1.785
85 11.31 32.97 16.69 3.504
Figure 6.2: Monte Carlo Inverter layout, VDD = 350mV: Delay with process and mismatch.
6.1.2 NAND2 Gate
Table 6.4: Monte Carlo DC results for the NAND2 gate with process and mismatch and ex-
tracted parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 5.79 34.07 19.16 5.27 2.046 10.16 6.406 1.865
25 6.02 34.14 20.09 5.30 5.995 14.16 9.866 1.988
85 7.66 37.71 22.13 5.40 9.397 18.1 13 1.939
NMH [mV] -40 93.65 147.1 121.6 9.211 479.5 532 507.1 11.06
25 94.58 144.7 120.5 9.264 440.4 498.9 471.7 11.68
85 88.33 139.2 114.4 9.329 408.7 467 438.8 12.31
NML [mV] -40 172.2 220.1 195.1 9.275 586.2 645.9 614 10.43
25 173.3 223.5 197.3 9.269 601.2 655.8 628.3 10.53
85 174.7 224.9 199 9.304 611.7 672.1 641 11
Gain -40 -39.03 -19.56 -28.7 4.735 -44.75 -23.97 -41.08 4.418
25 -38.11 -21.55 -34.31 4.273 -41.32 -22.58 -37.62 4.259
85 -35.33 -20.14 -31.9 3.885 -38.38 -21.57 -34.05 4.307
Table 6.5: Monte Carlo AC results for the NAND2 gate with process and mismatch and ex-
tracted parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 170.3 1495 531.1 201.9 223 268.5 244.5 7.737
25 32.22 159.3 73.12 19.92 292.8 353.3 323.1 9.884
85 13.08 45.13 24.55 5.21 363.6 435.6 400.6 11.81
tf [ns],[ps] -40 417 4559 1584 782.3 536.1 665.4 603.8 24.84
25 71.47 414.4 187.7 67 732.3 899.8 819.7 31.57
85 27.58 110.3 58.2 16.1 926.6 1131 1032 37.5
tPHL [ns],[ps] -40 408.9 4370 1350 561.6 364.2 483.2 424 18.5
25 66.19 372.3 160 47.24 464.9 612.1 538.9 23.06
85 24.71 94.77 49.5 11.29 557.6 729.3 643.9 26.93
tPLH [ns],[ps] -40 395.6 2401 1071 409.3 244.7 308.7 277.1 12.61
25 61.93 215.6 123.2 33.38 296.5 378.9 337.3 15.87
85 20.79 55.6 35.5 7.46 336 435.1 383.5 18.82
Delay [ns],[ps] -40 449.6 3079 1211 425.9 314.7 382.6 350.5 12.24
25 69.48 271 141.6 35.37 394.5 476.9 438.1 15.08
85 24.46 68.78 42.46 8.15 463.8 557 513.7 17.41
Pstatic 00 [fW] -40 306.5 307.7 306.8 0.229 3658 3983 3766 62.28
25 312.8 392.3 332.8 14.26 3721 4480 3927 113.2
85 665.1 2008 1129 272.2 4897 10680 6828 1058
Pstatic 10 [fW] -40 368.2 370.4 368.9 0.444 5011 8312 6162 636.9
25 379.9 566.4 423.7 31.04 5515 10860 7399 1028
85 866 4849 1950 686.6 8979 26890 14170 3248
Pstatic 01 [fW] -40 312.7 327.6 317.6 2.716 3994 4208 4074 40.41
25 348.9 515.4 393.1 27.02 4187 4985 4415 131
85 782.8 4485 1869 613.9 5835 20050 10030 2354
Pstatic 11 [fW] -40 491.1 502.5 494.1 1.904 7082 13370 9281 1213
25 596.2 1718 909.2 190.1 8403 20260 12830 2194
85 3398 21610 9288 3245 21050 96920 45730 13000
Pavg max freq -40 15.89 87.53 41.42 13.69 2.005 2.421 2.204 0.078
[pW],[uW] 25 170.7 596 339.9 81.53 1.666 1.994 1.824 0.065
85 680 1817 1162 218.3 1.463 1.754 1.609 0.054
Pavg 32KHz -40 16.22 33.01 23.98 3.326
[pW] 25 16.98 35.44 22.01 2.435
85 18.82 51.06 29.84 6.443
Figure 6.3: Monte Carlo NAND2 layout, VDD = 350mV: Midpoint percentage with process and
mismatch.
Figure 6.4: Monte Carlo NAND2 layout, VDD = 350mV: Delay with process and mismatch.
6.1.3 NOR2 Gate
Table 6.6: Monte Carlo DC results for the NOR2 gate with process and mismatch and extracted
parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 -18.58 26.03 7.524 8.036 -12.41 -2.151 -7.426 1.961
25 -16.64 29.27 9.78 8.005 -14 -5.009 -9.074 1.859
85 -14.33 29.9 11.15 7.923 -14.26 -5.984 -10.26 1.887
NMH [mV] -40 111.6 188.6 142.5 13.74 573.1 624.4 596.9 10.27
25 106.6 184.5 139.3 13.83 571.4 626.5 599 10.94
85 102 180.7 135 14 567.5 626.9 599.4 11.5
NML [mV] -40 126.6 208.6 172.9 14.83 498.1 553.4 523.8 10.37
25 133.3 211 178.5 13.85 475.3 532.2 501.5 10.66
85 133.5 211.4 178.8 13.93 455.4 512.8 481.2 11
Gain -40 -39.24 -13.71 -27.81 5.609 -44.63 -23.29 -38.89 6.001
25 -37.91 -21.35 -33.95 4.314 -41.65 -22.07 -36.61 4.751
85 -35.26 -20.27 -31.52 3.968 -38.83 -21.11 -33.66 4.606
Figure 6.5: Monte Carlo NOR2 layout, VDD = 350mV: Midpoint percentage with process and
mismatch.
Table 6.7: Monte Carlo AC results for the NOR2 gate with process and mismatch and extracted
parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 186.9 4059 1068 668.9 317.1 460.5 391.7 27.57
25 33.77 320.8 114.5 51.98 393.6 557.9 478.5 32.49
85 12.99 75.69 32.87 11.69 463.7 647.2 558.9 36.53
tf [ns],[ps] -40 157.6 1575 495.2 202.3 187.4 225.4 203.4 6.781
25 27.16 142.8 60.18 17.44 246.3 296 267.5 8.815
85 10.14 36.49 18.74 4.168 303.8 365.8 331.2 10.71
tPHL [ns],[ps] -40 281.2 2653 913.3 443.7 187.8 241.5 211.2 9.635
25 42.34 212.6 98.07 33.44 219.3 281.1 246 10.92
85 14.6 50.2 27.73 7.183 244.7 311.9 273.5 11.91
tPLH [ns],[ps] -40 242.5 4638 1029 576.2 230.8 318.9 270.5 15.64
25 39.49 312.9 108.7 41.98 282.7 389.3 329 18.67
85 14.07 68.33 30.52 9.149 329.4 451.8 381.3 21.14
Delay [ns],[ps] -40 284.7 2986 970.9 424.7 212 271.7 240.9 10.47
25 42.5 232.8 103.4 31.79 254.6 323.5 287.5 12.18
85 14.57 54.73 29.13 6.934 291.5 368.3 327.4 13.5
Pstatic 00 [fW] -40 490.3 492.4 490.8 0.315 5801 6011 5871 39.76
25 523.3 785 598.7 45.2 6000 7166 6338 205.6
85 1683 7189 3520 1009 10460 31810 17570 3914
Pstatic 10 [fW] -40 367.8 380.4 370.2 2.17 4998 8254 6138 626.3
25 439.4 1814 742.6 260.9 5936 15410 8870 1640
85 2554 26540 8517 4642 16720 124800 43700 20600
Pstatic 01 [fW] -40 311.8 340.9 323.2 5.104 4590 7779 5715 613
25 380.9 2554 703.1 281.8 5487 15820 8388 1653
85 1647 35900 8251 4970 13010 156600 41130 21330
Pstatic 11 [fW] -40 306.4 314.5 308.3 1.335 4918 11200 7114 1211
25 328.3 793.9 457.8 85.64 6052 15970 9665 1958
85 1224 10120 3901 1575 10380 49230 24630 7058
Pavg max freq -40 18.98 108.4 50.65 17.27 2.695 3.319 2.993 0.11
[pW],[uW] 25 225.9 812.1 452.1 112 2.339 2.856 2.587 0.093
85 955.2 2626 1627 318.4 2.123 2.573 2.341 0.082
Pavg 32KHz -40 13.9 22.83 16.09 1.585
[pW] 25 14.59 27.23 19.39 3.076
85 17.41 70.58 27.54 8.006
Figure 6.6: Monte Carlo NOR2 layout, VDD = 350mV: Delay with process and mismatch.
6.1.4 XNOR2 Gate
Table 6.8: Monte Carlo DC results for the XNOR2 gate with process and mismatch and ex-
tracted parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 -14.61 29.38 7.974 7.789 -6.097 9.544 0.958 2.642
25 -14.01 30.03 8.919 7.912 -5.255 10.76 3.218 2.804
85 -13.39 32.83 10.39 7.92 -2.069 14.12 5.568 2.77
NMH [mV] -40 106.1 172.5 139.3 12.79 484.3 586 538.5 16.69
25 102.3 179.6 139.9 13.67 452.6 562.6 511 18.06
85 97.08 176 135.4 13.96 420 622.6 485.4 24.13
NML [mV] -40 127.5 212.5 172.8 14.71 530.6 621.8 575.2 15.71
25 136.7 213.9 176.7 13.62 532.7 627.4 581.1 18.19
85 136.8 215.1 177.1 13.79 534.5 635.6 589.6 19.76
Gain -40 -35.74 -8.227 -23.31 5.804 -44.12 -22.55 -36.35 6.184
25 -36.42 -21.16 -32.03 4.259 -39.63 -21.17 -34.1 5.121
85 -33.94 -19.78 -29.44 4.158 -37.49 -19.35 -30.72 4.739
Table 6.9: Monte Carlo AC results for the XNOR2 gate with process and mismatch and ex-
tracted parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 486.5 19940 1856 1741 506.4 678.3 580.8 33.78
25 81.56 1130 200.8 109.5 636.3 841.1 725.5 39.61
85 30.72 220.8 59.83 22.19 763.3 993.1 860.9 44.55
tf [ns],[ps] -40 587.3 8380 2307 1179 701.5 850 774.4 30.06
25 100.2 712.7 268.5 98.81 937.9 1128 1033 38.07
85 39.01 180.7 82.39 23.56 1169 1398 1285 45.39
tPHL [ns],[ps] -40 775.2 9498 2314 1207 449.8 580.4 514.9 23.89
25 115.5 595 244.9 84.83 554.3 706.7 630.6 28.36
85 39.19 140.7 69.71 18.03 647.3 817.8 733.2 32.04
tPLH [ns],[ps] -40 669.7 6073 2109 942.7 324 485 416.8 27.27
25 96.33 506.5 223.2 72.54 383.5 579.8 498.4 33.43
85 31.55 121 61.5 16.09 429.5 655.1 562.6 38.8
Delay [ns],[ps] -40 848.1 6395 2211 900.7 407.6 514.3 465.9 20.1
25 118.6 527.6 234.1 67.02 494.8 622 564.5 23.85
85 39.18 125.9 65.6 14.56 569 712.4 647.9 26.79
Pstatic 00 [fW] -40 663.4 688.2 673.7 4.024 9035 12430 10210 708.2
25 760.7 1171 891 79.71 10010 16000 12150 1258
85 3391 12880 6510 1849 22170 64090 35220 8085
Pstatic 10 [fW] -40 613 661.8 634.7 8.542 9762 17810 12620 1662
25 940.5 3360 1603 422.4 12690 32660 19310 3594
85 8571 50760 21470 7491 45890 222100 102000 31970
Pstatic 01 [fW] -40 723.5 768.9 729.8 4.753 10630 18790 13480 1696
25 931.3 5169 1658 485.4 13070 41970 20130 4021
85 7049 74650 21310 8290 40810 338900 103700 36410
Pstatic 11 [fW] -40 665.4 687.5 673.7 4.102 9964 17020 12450 1472
25 859 2188 1241 219.7 11690 25960 16970 2676
85 5662 28850 13190 3854 31980 127600 64450 15920
Pavg max freq -40 16.49 95.07 44.22 14.48 2.27 2.766 2.514 0.096
[pW],[uW] 25 202 698.7 395.5 95.54 1.936 2.362 2.148 0.082
85 832.7 2225 1418 271 1.753 2.143 1.947 0.074
Pavg 32KHz -40 17.89 43.31 34.21 3.204
[pW] 25 37.16 57.05 42.25 2.851
85 40.06 60.56 47.6 3.466
Figure 6.7: Monte Carlo XNOR2 layout, VDD = 350mV: Midpoint percentage with process and
mismatch.
Figure 6.8: Monte Carlo XNOR2 layout, VDD = 350mV: Delay with process and mismatch.
6.1.5 XOR2 Gate
Table 6.10: Monte Carlo DC results for the XOR2 gate with process and mismatch and extracted
parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 -13.96 29.77 7.756 7.522 -6.064 9.18 0.908 2.511
25 -13.84 30.43 8.713 7.611 -5.706 10.38 3.15 2.749
85 -12.99 33.64 10.21 7.702 -2.142 14.06 5.429 2.725
NMH [mV] -40 104.3 174.2 139.6 12.59 489.5 585.7 539.2 16.1
25 100.7 178.6 140.2 13.23 457.7 561.3 511.7 17.25
85 95.3 175.7 135.7 13.5 424.6 604.5 485.3 21.21
NML [mV] -40 131.2 212.8 172.4 13.95 531.3 624.1 574.8 15.3
25 138.1 215.7 176.4 13.21 532.6 630.4 581.3 18.03
85 137.6 217 176.8 13.38 533.8 638.9 589.7 19.83
Gain -40 -36.59 -8.642 -22.99 5.354 -43.83 -22.77 -36.75 5.903
25 -36.35 -21.05 -31.73 4.619 -40.01 -21.26 -33.96 5.11
85 -34.07 -19.88 -29.7 3.861 -37.35 -20.14 -30.87 4.794
Figure 6.9: Monte Carlo XOR2 layout, VDD = 350mV: Midpoint percentage with process and
mismatch.
Table 6.11: Monte Carlo AC results for the XOR2 gate with process and mismatch and extracted
parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 389.9 15740 1871 1603 523.8 667 584.6 30.46
25 69.05 914.4 202.2 101.9 657.2 823.5 728.2 36.3
85 26.54 184.5 60 20.84 779.9 972.1 861.9 41.31
tf [ns],[ps] -40 569.9 6621 2183 1043 688.9 841.4 762.1 28.56
25 96.05 583.8 255.8 89.22 924.8 1119 1015 36.28
85 37.72 150.9 78.76 21.55 1155 1391 1264 43.56
tPHL [ns],[ps] -40 679.4 11580 2209 1192 444.1 579 498.8 24.23
25 102.5 730.5 234.8 81.28 542.8 704.1 608.5 29.14
85 36.11 156.6 66.89 17.15 630.4 813.5 705.2 33.2
tPLH [ns],[ps] -40 612.7 4952 2075 779.8 342.5 485 421 27.05
25 85.38 424.5 220.8 62.08 403.8 579.8 502.3 33.06
85 28.11 106.1 60.97 14.07 449.9 658.8 565.9 38.33
Delay [ns],[ps] -40 682.5 8266 2142 876.6 401 517.1 459.9 20.07
25 98.78 577.5 227.8 63.79 486.7 623.8 555.4 23.94
85 32.83 128.6 63.93 13.78 559.4 713.4 635.5 27.05
Pstatic 00 [fW] -40 627.7 673.7 647.4 8.857 9358 15550 11580 1284
25 865.5 3564 1428 426.3 11220 28320 17000 3128
85 6711 53200 17890 7583 35810 235300 86610 32520
Pstatic 01 [fW] -40 682 703 689.7 3.762 9628 14840 11460 1091
25 819.8 2108 1088 170.4 11170 21210 14770 2065
85 4413 26790 9946 3073 26790 118600 50330 12970
Pstatic 10 [fW] -40 681.6 700 688.6 3.35 9588 14830 11450 1090
25 866.3 1566 1086 132.3 11520 20580 14760 1909
85 5522 17850 9981 2516 31250 85460 50460 10640
Pstatic 11 [fW] -40 738 770.2 747.6 5.736 11270 21180 14700 2073
25 1077 3770 1864 478.9 15590 36230 22820 4245
85 10060 56260 25000 8350 53780 255600 120100 36100
Pavg max freq -40 13.75 108.3 44.18 14.33 2.26 2.785 2.534 0.093
[pW],[uW] 25 174.4 768.4 397.4 94.21 1.937 2.382 2.168 0.080
85 762.4 2399 1431 267.3 1.769 2.162 1.969 0.073
Pavg 32KHz -40 23.77 45.56 35.1 3.389
[pW] 25 37.48 58.29 42.22 4.078
85 43.02 63.06 51.5 4.282
Figure 6.10: Monte Carlo XOR2 layout, VDD = 350mV: Delay with process and mismatch.
6.1.6 AOI22 Gate
Table 6.12: Monte Carlo DC results for the AOI22 gate with process and mismatch and ex-
tracted parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 6.01 34.24 20.74 6.004 2.55 13.98 8.67 2.105
25 9.65 37.89 23.45 6.084 6.56 18.08 12.95 2.165
85 13.45 41.77 27.04 6.12 10.34 22.1 16.65 2.218
NMH [mV] -40 94.1 142.9 118.1 10.29 457.7 528.1 488.4 13.24
25 88.94 139.7 113.8 10.58 414.3 486.8 447.4 14.08
85 79.76 130.4 105.2 10.65 376.6 452.6 410.5 14.64
NML [mV] -40 166.4 222.7 196.6 10.89 600.1 651.4 626.1 11.72
25 176.9 227.1 202.6 10.57 617.1 674.2 646.4 12.05
85 182.1 231.5 206.7 10.57 631.4 695.2 663.8 12.36
Gain -40 -36.88 -13.31 -25.91 4.696 -43.66 -22.61 -37.15 6.033
25 -36.76 -21.05 -32.38 4.522 -40.36 -21.22 -34.26 5.522
85 -34.17 -19.83 -30.27 3.801 -37.57 -20.07 -32.02 4.506
Table 6.13: Monte Carlo AC results for the AOI22 gate with process and mismatch and extracted
parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 304.4 6176 1240 741.6 414.7 554.8 475.8 23.5
25 52.2 440.2 138.8 55.97 532.2 698.9 601.6 28.2
85 19.98 99.65 41.91 12.43 648.3 833.6 723.7 31.9
tf [ns],[ps] -40 501.7 4925 1729 850.8 584.1 692.4 638.9 22.45
25 89.21 477.6 212.6 75.26 801.7 946.9 876 29.86
85 35.28 130 67.62 18.41 1027 1209 1119 37.35
tPHL [ns],[ps] -40 611.1 4939 1816 719.6 400.3 494.8 445.5 18.42
25 94.13 432.2 208.9 59.38 508.5 626.2 566.8 23
85 33.72 110.8 62.82 14.01 609.8 751.9 679.7 27.22
tPLH [ns],[ps] -40 657.4 4932 1826 715.7 431.7 567.2 498.2 28.24
25 97.8 446.3 205.3 58.69 514 684.9 598.7 34.97
85 33.58 114.2 60.64 13.93 576.6 778 676.4 40.74
Delay [ns],[ps] -40 772.9 4035 1821 601.2 427.3 522.9 471.9 17.64
25 110.4 388.5 207.1 49.24 527.1 646.1 582.7 21.84
85 37.86 103.7 61.73 11.44 613.4 752.7 678 25.44
Pavg max freq -40 12.42 60.07 27.58 7.767 1.726 2.019 1.875 0.062
[pW],[uW] 25 137 454.5 252.1 52.15 1.471 1.719 1.6 0.050
85 564.1 1485 923.9 152 1.326 1.546 1.442 0.045
Pavg 32KHz -40 9.822 23.92 17.75 1.436
[pW] 25 19.91 37.68 24.38 2.455
85 25.55 48.45 32.97 3.531
6.1.7 OAI22 Gate
Table 6.14: Monte Carlo DC results for the OAI22 gate with process and mismatch and ex-
tracted parasitics.
DC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
Midpoint [%] -40 5.05 37.7 19.63 6.061 2.33 15.01 8.28 2.168
25 6.88 41.58 22.44 6.101 6.34 20.88 12.54 2.263
85 10.18 45.54 26.02 6.118 10.27 24.87 16.28 2.318
NMH [mV] -40 91.42 146 120 10.33 439.4 527.5 490.7 13.66
25 84.07 141.4 115.8 10.72 394.9 486.1 449.6 14.43
85 73.89 133.6 107.1 10.75 356.2 451.9 412.7 14.99
NML [mV] -40 166.9 227.6 194.6 11.06 597.5 668.5 624 11.93
25 175.2 232.5 200.6 10.69 612.1 688.9 644.2 12.31
85 178 237.7 204.8 10.65 628.5 706.8 661.8 12.74
Gain -40 -37.63 -10.67 -26.43 5.035 -43.42 -22.93 -37.51 6.017
25 -36.74 -20.92 -32.61 4.218 -40.02 -21.67 -34.58 5.486
85 -34.03 -19.71 -29.99 3.939 -37.25 -20.36 -32.41 4.688
Table 6.15: Monte Carlo AC results for the OAI22 gate with process and mismatch and ex-
tracted parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tr [ns],[ps] -40 267.6 4763 1323 856.4 416.5 548.4 477.1 24.39
25 49.53 415.4 152.5 68.48 535.2 694.5 608.3 29.76
85 20.26 105.5 47.19 15.71 655.5 832.9 737.6 33.91
tf [ns],[ps] -40 421.5 5640 1519 800 471.2 562.9 519.2 19.25
25 73.02 500.9 185.7 70.68 650.2 773.7 714.1 25.56
85 28.61 129.5 58.95 17.35 830 988.4 912.5 31.35
tPHL [ns],[ps] -40 743.8 5722 2021 799.4 437.4 550.5 498.6 18.81
25 112.9 508.7 239.3 67.23 556.3 697.4 632.6 23.45
85 39.93 131.9 73.62 16 664.5 827 753 27.28
tPLH [ns],[ps] -40 432.9 10080 1656 1007 315.3 463.4 387.3 22.85
25 69.8 636.1 181.1 68.11 378 561.1 467.9 28.1
85 24.61 136.7 52.28 14.83 428.7 641.6 533.9 32.75
Delay [ns],[ps] -40 619.9 6079 1838 724.2 405 475.9 442.9 14.99
25 94.03 440.7 210.2 54.68 503.9 592.3 550.2 18.42
85 33.27 105.9 62.95 12.4 591.4 693.3 643.4 21.32
Pavg max freq -40 11.13 56.57 28.74 8.488 1.764 2.053 1.908 0.063
[pW],[uW] 25 133.2 435.3 262.7 56.68 1.506 1.751 1.627 0.053
85 565.7 1439 962.1 164.1 1.359 1.57 1.461 0.046
Pavg 32KHz -40 14.55 17.75 16.22 0.69
[pW] 25 19.37 29.79 23.68 2.26
85 26.43 38.57 31.29 2.57
6.1.8 D Flip-Flop Memory Element
Table 6.16: Monte Carlo AC results for the PowerPC 603 memory element with process and
mismatch and extracted parasitics.
AC Test Temp Min Max Mean Sigma Min Max Mean Sigma
[◦C] VDD = 350mV VDD = 1.2V
tco_falling [ns],[ps] -40 1129 8918 3380 1425 554.5 713.7 633.1 27.61
25 161.5 768.2 366.2 111.9 669 858.7 762.1 32.62
85 52.37 183.8 100.6 24.5 765.1 980.6 871.3 36.58
tco_rising [ns],[ps] -40 760.2 10320 2292 1174 481.7 624.6 539.6 24.92
25 119.9 788.8 277.4 96.93 598.6 768.5 667.1 29.96
85 40.81 175.7 80.22 21.34 697.4 890.4 774.3 33.62
tsu_falling [ns],[ps] -40 666.4 3525 1454 519.5 278.4 376.4 322.1 15.18
25 85.88 332.3 162.6 45.02 329 445.4 381.4 18.08
85 26.54 83.53 45.32 10.26 369.2 500.1 429.4 20.33
tsu_rising [ns],[ps] -40 475.6 3633 1063 446.7 227 288.5 254.1 11.55
25 61.94 314.7 122.2 37.99 271.1 343.6 303.7 13.52
85 19.83 76.61 35.22 8.833 310.5 393.7 348 15.07
Pstatic 00 [fW][pW] -40 874.7 2664 945.4 247.9 12.47 148800 8931 30250
25 1161 9090 1732 858.7 15.37 180600 9039 31520
85 9041 48300 18250 5001 52.39 12680 1946 2680
Pstatic 01 [fW][pW] -40 882.3 917.4 893.9 6.202 12.23 21.49 16.04 1.86
25 1300 4525 1994 471.4 16.31 35.65 23.47 4.021
85 11860 66200 24860 8147 61.48 278.5 114.7 33.5
Pstatic 10 [fW][pW] -40 884.6 2883 1190 251.2 11.98 148500 24920 46700
25 1148 14460 4304 3292 15.65 166300 12930 35400
85 8124 113700 38570 19340 42.84 16090 3589 2852
Pstatic 11 [fW][pW] -40 861.9 1124 875.3 18.41 12.33 22.82 16.67 2.12
25 1157 3172 1653 328.6 15.5 37.24 23.37 3.979
85 8256 44990 19070 6021 50.17 214.3 97.03 26.46
Pavg 32KHz -40 54.95 64.91 60.37 1.165 5.534 147.4 58.81 35.16
[pW][nW] 25 62.17 69.71 65.27 1.224 3.114 75.28 31.43 16.37
85 77.82 126.3 93.98 8.505 2.679 121.9 46.13 28.14
Figure 6.11: Monte Carlo D-FF layout, VDD = 350mV: Clock-to-output propagation delay with
process and mismatch. Left side: rising tco, right side: falling tco.
Figure 6.12: Monte Carlo D-FF layout, VDD = 350mV: Setup time with process and mismatch.
Left side: rising tsu, right side: falling tsu.
6.2 ALU Results
Results in terms of corner functionality check, critical path propagation delay and power
consumption are presented in this section. Details on the various ALU syntheses are found in
section 3.3¹. All syntheses except No.1 are simulated with both above- and sub-threshold cells.
6.2.1 ALU No.1: Results with use of Above-Threshold Library
Corner Functionality Simulation Results
Table 6.17: ALU No.1: Corner functionality results with VDD = 350mV and 400mV where
faulty=✗ and pass=✓.
Corner  350mV                 400mV
        −40◦C  25◦C  85◦C     −40◦C  25◦C  85◦C
FF      ✓      ✓     ✓        ✓      ✓     ✓
SF      ✗      ✓     ✓        ✓      ✓     ✓
TT      ✗      ✓     ✓        ✓      ✓     ✓
FS      ✗      ✓     ✓        ✗      ✓     ✓
SS      ✗      ✓     ✓        ✗      ✓     ✓
¹ No.1: all available above-threshold cells, No.2: restricted to INV, NAND2, NOR2 and D-FF with FO3, and
No.3: same as No.2 including XNOR2, XOR2, AOI22 and OAI22.
Delay Results of Critical Path
Figure 6.13: ALU No.1: Corner sim. results of critical path delay in semilog plot and −40◦C.
Power Consumption Results
Figure 6.14: ALU No.1: Power consumption in the components of total, dynamic and static
with 32KHz, 25◦C and TT corner.
6.2.2 ALU No.2: Sub-Threshold VS Above-Threshold Library Cells
Corner Functionality Simulation Results
Table 6.18: ALU No.2: Corner functionality results with VDD = 350mV and 400mV where
faulty=✗ and pass=✓.
Sub-Threshold Cells
Corner  350mV                 400mV
        −40◦C  25◦C  85◦C     −40◦C  25◦C  85◦C
FF      ✓      ✓     ✓        ✓      ✓     ✓
SF      ✓      ✓     ✓        ✓      ✓     ✓
TT      ✓      ✓     ✓        ✓      ✓     ✓
FS      ✗      ✓     ✓        ✓      ✓     ✓
SS      ✗      ✓     ✓        ✓      ✓     ✓
Above-Threshold Cells
Corner  350mV                 400mV
        −40◦C  25◦C  85◦C     −40◦C  25◦C  85◦C
FF      ✓      ✓     ✓        ✓      ✓     ✓
SF      ✗      ✓     ✓        ✓      ✓     ✓
TT      ✗      ✓     ✓        ✓      ✓     ✓
FS      ✗      ✓     ✓        ✓      ✓     ✓
SS      ✗      ✗     ✓        ✗      ✓     ✓
Delay Results of Critical Path
(a) Sub-Threshold Cells. (b) Above-Threshold Cells.
Figure 6.15: ALU No.2: Corner sim. results of critical path delay in semilog plot and −40◦C.
Power Consumption Results
Figure 6.16: ALU No.2: Power consumption comparison between use of sub-threshold and
above-threshold cells in terms of total, dynamic and static with 32KHz, 25◦C and TT corner.
Calculation of Estimated Power Consumption for the No.2 Synthesis
An estimate of the power consumption is important for evaluating the quality of the power
consumption obtained by simulation. The estimate is based on the mean 32KHz power consumption
at 25◦C for each cell. For the fan-out of 3 (FO3) circuit used as a test case for the
sub-threshold cell library, the estimated power is the sum over all cell types of the mean Pavg
times the number of cells of that type in the circuit, as shown next:
Table 6.19: Estimated Ptotal for the ALU FO3 with VDD = 350mV , nominal corner and 25◦C.
131 X INV 131 · 12.55pW ⇒ 1.644nW
105 X NAND2 105 · 22.01pW ⇒ +2.31nW
130 X NOR2 130 · 19.39pW ⇒ +2.52nW
27 X D-FF 27 · 65.27pW ⇒ +1.76nW
Pavg Total with α = 1 = 8.234nW
Pavg Total with α = 0.2 = 1.647nW
where the activity factor α is estimated from the activity on the inputs of the ALU circuit. The
activity factor is based on the applied test stimulus, where inputs A and B are set to static
values, the Op input is changed every clock period, and the clock toggles every clock period.
Hence, four of the 20 input bits (the three Op bits and the clock) are actively changing every
clock period. The activity factor is thus estimated to α = 4/20 = 0.2.
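The estimate in table 6.19 can be reproduced with a few lines of Python. The cell counts and mean Pavg values are taken from table 6.19, while the names `CELLS_FO3` and `estimated_power_nw` are chosen here for illustration. The sketch sums the unrounded products, so the total comes out a few picowatts above the tabulated 8.234nW, which adds up the already rounded rows.

```python
# cell type: (number of cells, mean Pavg at 32 kHz and 25 degC in pW), from table 6.19
CELLS_FO3 = {
    "INV": (131, 12.55),
    "NAND2": (105, 22.01),
    "NOR2": (130, 19.39),
    "D-FF": (27, 65.27),
}


def estimated_power_nw(cells, alpha=1.0):
    """Sum of count * mean Pavg over all cell types, scaled by the activity factor."""
    total_pw = sum(count * p_avg_pw for count, p_avg_pw in cells.values())
    return alpha * total_pw / 1000.0  # pW -> nW


ALPHA = 4 / 20  # four of the 20 input bits toggle every clock period

p_full = estimated_power_nw(CELLS_FO3)
p_scaled = estimated_power_nw(CELLS_FO3, alpha=ALPHA)
print(f"alpha = 1.0: {p_full:.3f} nW")
print(f"alpha = {ALPHA}: {p_scaled:.3f} nW")
```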
6.2.3 ALU No.3: Sub-Threshold VS Above-Threshold Library Cells
Corner Functionality Simulation Results
Table 6.20: ALU No.3: Corner functionality results with VDD = 350mV and 400mV where
faulty=✗ and pass=✓.
Sub-Threshold Cells
Corner  350mV                 400mV
        −40◦C  25◦C  85◦C     −40◦C  25◦C  85◦C
FF      ✓      ✓     ✓        ✓      ✓     ✓
SF      ✓      ✓     ✓        ✓      ✓     ✓
TT      ✓      ✓     ✓        ✓      ✓     ✓
FS      ✗      ✓     ✓        ✓      ✓     ✓
SS      ✗      ✓     ✓        ✓      ✓     ✓
Above-Threshold Cells
Corner  350mV                 400mV
        −40◦C  25◦C  85◦C     −40◦C  25◦C  85◦C
FF      ✓      ✓     ✓        ✓      ✓     ✓
SF      ✗      ✓     ✓        ✓      ✓     ✓
TT      ✗      ✓     ✓        ✓      ✓     ✓
FS      ✗      ✓     ✓        ✗      ✓     ✓
SS      ✗      ✗     ✓        ✗      ✓     ✓
Delay Results of Critical Path
(a) Sub-Threshold Cells. (b) Above-Threshold Cells.
Figure 6.17: ALU No.3: Corner sim. results of critical path delay in semilog plot and −40◦C.
Power Consumption Results
Figure 6.18: ALU No.3: Power consumption comparison between use of sub-threshold and
above-threshold cells in terms of total, dynamic and static with 32KHz, 25◦C and TT corner.
6.2.4 Comparison between ALU Synthesis Results
(a) Ptotal comparison.
(b) Pstatic comparison.
Figure 6.19: Comparison between ALU design No.1, No.2 and No.3 in power consumption
with 32KHz, 25◦C and TT corner.
Table 6.21: Summarized power results from ALU simulation in 32KHz, 25◦C and TT corner.
Design  No.1             No.2                                 No.3
                         Sub-T Cells       Above-T Cells      Sub-T Cells       Above-T Cells
VDD     Ptotal  Pstat    Ptotal  Pstat     Ptotal  Pstat      Ptotal  Pstat     Ptotal  Pstat
[V]     [nW]    [nW]     [nW]    [nW]      [nW]    [nW]       [nW]    [nW]      [nW]    [nW]
1.2     30.99   2.39     29.08   3.08      38.45   4.1        26.43   2.46      34.57   3.32
1.0     18.31   1.51     19.35   1.85      23.4    2.65       17.27   1.42      21.37   2.18
800m    11.25   0.95     12.2    1.12      14.44   1.67       11.2    0.86      13.2    1.37
600m    6.25    0.56     6.66    0.64      8.04    0.98       6.13    0.493     7.34    0.81
400m    2.84    0.275    2.73    0.298     3.51    0.498      2.63    0.239     3.32    0.409
350m    2.11    0.228    2.095   0.239     2.77    0.399      1.934   0.183     2.54    0.319
Chapter 7
Discussion
7.1 The Accuracy of the Results
Intermediate results for all the logic gates and their alternative sizing designs are obtained
from Monte Carlo simulation with n = 100 runs. The intermediate results are used to explore
how variations affect the different sizing strategies and sizing designs. The number of runs is
limited to n = 100 to reduce simulation time, as these results are only used to select the final
sizing design. With n = 100 the 99% confidence interval is ±13.13% for the
VDD = 350mV case, as discussed in section 3.2.4.
The final chosen sizing designs for all sub-threshold cells are further simulated with
n = 220 runs to increase the accuracy and narrow the 99% confidence interval. The confidence
interval with n = 220 is ±8.76%, which is more accurate than for the intermediate results.
To narrow the 99% confidence interval of the sub-threshold Monte Carlo simulation to ±1%,
the number of runs would have to be n = 16590, which is a huge and time-demanding
task. The final results are sufficiently accurate, but there is room for improvement by
increasing the number of runs.
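The quoted confidence intervals can be cross-checked with a short calculation. The sketch below is not the derivation of section 3.2.4: it assumes a relative standard deviation σ/µ = 0.5, a value chosen here only because it reproduces the quoted ±13.13% and ±8.76%, and it approximates the Student-t quantile with a Cornish-Fisher expansion so that only the Python standard library is needed. All function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def t_quantile(p, dof):
    """Approximate Student-t quantile (Cornish-Fisher expansion around the normal quantile)."""
    z = NormalDist().inv_cdf(p)
    return (z
            + (z**3 + z) / (4 * dof)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * dof**2))


def ci_half_width(n_runs, rel_sigma=0.5, confidence=0.99):
    """Relative half-width of the confidence interval of the mean after n_runs.

    rel_sigma = sigma/mu = 0.5 is an assumption made for this sketch.
    """
    p = 1.0 - (1.0 - confidence) / 2.0
    return t_quantile(p, n_runs - 1) * rel_sigma / sqrt(n_runs)


def runs_for_half_width(target, rel_sigma=0.5, confidence=0.99):
    """Runs needed for a given relative half-width (normal approximation)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * rel_sigma / target) ** 2)


for n in (100, 220):
    print(f"n = {n:3d}: +/- {100 * ci_half_width(n):.2f} %")
print("runs for +/- 1 %:", runs_for_half_width(0.01))
```

Under these assumptions the half-width shrinks with 1/√n, which is why going from n = 220 to a ±1% interval needs a run count within a few runs of the quoted n = 16590.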
7.2 The Sub-Threshold Cells
7.2.1 The Inverter Gate
The designed inverter gate has, in terms of DC analysis, a mean midpoint percentage of the VTC
curve of ∼ 9 − 11%¹ and a std. deviation σ ∼ 7.3% over the range of temperatures at VDD =
350mV. This is shown in section 6.1.1, table 6.2, and the distribution of the midpoint percentage
at the temperatures −40◦C, 25◦C and 85◦C is shown in figure 6.1. Although the optimal midpoint
is VDD/2 as described in section 2.2.1, the mean midpoint is considered sufficient, as it gives
a µ ± 3σ interval for the midpoint of approximately [-12.9%, 32.9%]. It is however
not just the midpoint that is of importance. The gain and noise margins (NMH and NML)
reveal the quality of a VTC curve and how resistant the gate is to noise exposure. The
inverter has a mean gain in the range ∼[-26, -33] with σ ∼ 4.3 − 5.4 over the temperature
range. The noise margins are sufficient, with a minimum mean NMH of ∼ 135mV and NML
of ∼ 175mV, both with σ ∼ 13mV. Hence, the DC analysis values are minimally affected
by temperature variations.
In terms of transient analysis, the inverter has a mean propagation delay of
∼[796 ns, 81 ns, 22 ns] at the temperatures −40°C, 25°C and 85°C respectively with VDD =
350mV. The transient analysis results are presented in table 6.3, with the distribution of
delay over the three temperatures in figure 6.2. The σ follows the same trend over
the temperature range with ∼[366 ns, 26 ns, 5.5 ns]. However, σ relative to the mean
increases with decreasing temperature. At 85°C σ/µ = 0.25 (25%), but at −40°C the ratio
is σ/µ = 0.46 (46%). This can be observed in figure 6.2 as the delay distribution widens
as the temperature decreases. The same trend is observed in the other timing results
such as rise time tr, fall time tf, tPHL and tPLH.
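The two figures of merit used for the inverter, the µ ± 3σ midpoint range and the relative spread σ/µ of the delay, can be recomputed from the values quoted in this section. The following is a minimal sketch; the function names are illustrative, and the range is assumed to be taken from the lowest mean minus 3σ to the highest mean plus 3σ.

```python
def three_sigma_range(mean_low, mean_high, sigma):
    """Range from (lowest mean - 3*sigma) to (highest mean + 3*sigma)."""
    return mean_low - 3.0 * sigma, mean_high + 3.0 * sigma


def relative_sigma(sigma, mean):
    """Coefficient of variation sigma/mu."""
    return sigma / mean


# Inverter midpoint: mean 9..11 %, sigma 7.3 % (values quoted above)
low, high = three_sigma_range(9.0, 11.0, 7.3)
print(f"midpoint range: [{low:.1f}%, {high:.1f}%]")

# Inverter delay mean and sigma in ns at -40, 25 and 85 degrees C
means = (796.0, 81.0, 22.0)
sigmas = (366.0, 26.0, 5.5)
ratios = [relative_sigma(s, m) for s, m in zip(sigmas, means)]
print([f"{r:.2f}" for r in ratios])
```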
7.2.2 The NAND2 Gate
In terms of DC analysis, the NAND2 gate has a mean midpoint percentage of ∼ 19 − 22% over
the temperature range as shown in section 6.1.2, table 6.4, and the distribution of the midpoint
percentage over the temperatures is shown in figure 6.3. The mean midpoint is higher than for
the inverter, probably due to the structure with two parallel pMOS in the PUN and two stacked
nMOS in the PDN. A contributing factor might also be that the pMOS is intrinsically
stronger than the nMOS in sub-threshold. On the other hand, the NAND2 gate has a lower midpoint
σ ∼ 5.3% than the inverter gate (∼ 7.2%). The µ ± 3σ range is thus ∼[3.1%, 37.9%], whose
upper extreme (37.9%) is only ∼ 5.3 percentage points higher than that of the inverter gate (32.6%). The
VTC curve quality is slightly better than for the inverter, with a mean gain of ∼[-29, -32] and a lower
σ ∼ 3.9 − 4.7 over the temperature range. The NMH noise margin is slightly worse than for the
inverter gate, with a mean of ∼ 115mV, due to the increased midpoint position. For the same reason,
the NML of ∼ 195mV is better than for the inverter gate. Both
noise margins have a lower σ than the inverter with σ ∼ 9.3mV. Hence, the DC analysis results
are minimally affected by temperature variations.
In terms of transient analysis, the NAND2 gate has a mean propagation delay of
∼[1.2 µs, 142 ns, 42.5 ns] at the temperatures −40°C, 25°C and 85°C respectively
with VDD = 350mV, as shown in table 6.5 and with the distribution of delay over the three
temperatures in figure 6.4. The delay is higher than for the inverter, which might be due to
increased ON resistance in the PDN (two nMOS
stacked), higher input capacitance (additional transistor gates) and higher output capacitance
(more parasitics due to internal wiring etc.). The delay σ is also higher than for the inverter gate
with σ ∼[426 ns, 36 ns, 8.2 ns]. However, the relative σ (σ/µ) is lower than for the inverter
gate with σ/µ = 0.19 (19%) at 85°C, and σ/µ = 0.35 (35%) at −40°C.
¹ Percentage relative to VDD/2, e.g. 0% means a midpoint of VDD/2, and +100% (−100%) means a midpoint of VDD (0 V).
7.2.3 The Other Gates
In addition to the gates discussed above, the following gates are designed: NOR2,
XNOR2, XOR2, AOI22 and OAI22. The NOR2 gate has a mean midpoint between
∼ 7.5 − 11% with σ ∼ 8% over the temperature range at VDD = 350mV. The quality of the
VTC curve is indicated by a mean gain between ∼[-28, -34] with σ ∼ 4 − 5.6. The
noise margins have a mean NMH of ∼ 135mV and NML of 173mV, both with σ ∼ 14mV. In
terms of transient analysis, the NOR2 has a mean propagation delay of ∼[970 ns,
104 ns, 30 ns] with σ ∼[425 ns, 32 ns, 7 ns].
The XNOR2 and XOR2 gates share the same design, differing only in input ordering and minor
layout details. Hence, it is inconclusive which one of them is better since both DC
and transient results are similar. Both have a mean midpoint between ∼ 7.8 − 10.4%,
with the highest at 85°C, and σ ∼ 8%. The XNOR2 and XOR2 VTC quality in terms of gain and
noise margins is similar to that of the NOR2 discussed above, with some small deviations. Due to
lack of time, the AOI22 and OAI22 gates are not verified as thoroughly as the other designed
gates, for which alternative designs were explored. Nevertheless, the results shown in sections 6.1.6
and 6.1.7 are of the same quality as the rest, obtained by Monte Carlo simulation with 220 runs. The AOI22 and
OAI22 gates are designed with similar transistor dimensions to the XNOR2 and XOR2 since
they have a similar transistor structure, the differences being that the inverters are excluded, more
inputs are included and the nMOS network is connected together in the OAI22 gate. The AOI22 and
OAI22 gates have several more possible transitions and four distinct VTC curve regions due to
the multiple inputs, as described in sections 3.2.9 and 3.2.10. The simulation results for the gates in
sections 6.1.6 and 6.1.7 show a DC mean midpoint in the range of 20.7 − 27% for the AOI22
and 19.6 − 26% for the OAI22, both with σ ∼ 6.1%. The noise margins are only slightly
worse than for the NAND2 gate, with a mean NMH above 100mV. The gain is also similar to that of the
other gates. On the other hand, the transient analysis results are better for the AOI22 and OAI22
than for the XNOR2 and XOR2 across all simulated temperatures. This is probably partly due to the
absence of the inverters needed in XNOR2 and XOR2, which increase the propagation
delay through those gates.
7.2.4 The D-FF Memory Element
The D-FF memory element has a mean falling clock-to-output delay tco of ∼[3.4 µs, 366 ns,
101 ns] and a mean rising tco of ∼[2.3 µs, 277 ns, 80 ns] at the temperatures −40°C, 25°C
and 85°C respectively, with an output capacitance of 1 fF and VDD = 350mV. Additionally, the
minimum mean setup time tsu is ∼[1.45 µs, 163 ns, 46 ns] falling and
∼[1.06 µs, 123 ns, 36 ns] rising at the temperatures −40°C, 25°C and 85°C respectively.
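To put these numbers in context, the standard register-to-register constraint T_clk ≥ tco + t_logic + tsu gives the time left for combinational logic between two flip-flops. The sketch below uses the mean falling values at −40°C quoted above and the 32 kHz clock from the constraint file in appendix A.3. It is only an illustration: it uses mean values rather than worst-case values and ignores clock skew and hold time, and the function name is hypothetical.

```python
def logic_budget(clock_period, t_co, t_su):
    """Time left for combinational logic between two flip-flops, in seconds."""
    return clock_period - t_co - t_su


T_CLK = 1.0 / 32e3   # 32 kHz clock from the SDC file in appendix A.3
T_CO_COLD = 3.4e-6   # mean falling clock-to-output delay at -40 degrees C
T_SU_COLD = 1.45e-6  # mean falling setup time at -40 degrees C

budget = logic_budget(T_CLK, T_CO_COLD, T_SU_COLD)
print(f"logic budget at -40 C: {budget * 1e6:.2f} us of {T_CLK * 1e6:.2f} us")
```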
7.2.5 Common Discussion Considering the Cells
Common to the logic gates is that the DC analysis results are minimally affected by temperature
variations. This means that the DC analysis values are of little concern with varying temperature
and the focus is on process variations. However, the temperature exponentially affects the timing
results and σ, with the worst case occurring at −40°C. This means that the exponentially worse
timing results should be taken into account if a circuit is designed to operate at low temperatures.
However, all the designed sub-threshold cells work in Monte Carlo simulations with 100%
yield, and thus a circuit may only fail due to insufficient timing margin. One of the simplest
methods to compensate for the increased delay at −40°C may be to increase the supply voltage.
However, if the supply voltage is statically scaled up, the achievable power
savings are reduced at all temperatures in order to compensate for the worst-case temperature
delay. Instead, it might be better to incorporate a dynamically temperature-compensated supply
voltage regulator, so that the supply voltage is scaled up when the temperature is reduced
to improve timing, giving a more efficient power reduction over the whole temperature
range. More figures could have been included in the results chapter; however, the most important
ones are included, visualizing the distribution of the DC VTC midpoint and transient delay over
temperatures.
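The size of the temperature effect on timing can be illustrated by normalizing the mean propagation delays quoted in sections 7.2.1 to 7.2.3 to their 25°C value. This is a small sketch with illustrative names, not part of the simulation flow.

```python
# Mean propagation delay in ns at (-40, 25, 85) degrees C, VDD = 350 mV,
# as quoted in sections 7.2.1 to 7.2.3.
MEAN_DELAY_NS = {
    "INV": (796.0, 81.0, 22.0),
    "NAND2": (1200.0, 142.0, 42.5),
    "NOR2": (970.0, 104.0, 30.0),
}


def relative_to_room(delays):
    """Return (cold/room, hot/room) delay ratios."""
    cold, room, hot = delays
    return cold / room, hot / room


slowdown = {gate: relative_to_room(d) for gate, d in MEAN_DELAY_NS.items()}
for gate, (cold, hot) in slowdown.items():
    print(f"{gate}: x{cold:.1f} at -40 C, x{hot:.2f} at 85 C")
```

All three gates slow down by roughly an order of magnitude between 25°C and −40°C, which is the margin a statically raised supply voltage would have to cover.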
The design and results may be improved by improving the design methodology. A question
is what corner case the logic gates should be optimized for. In this thesis the logic gates are
designed by searching for good sizing alternatives, deciding trade-offs between DC and
transient analysis values versus transistor dimensions with the use of test benches as described in
section 5, prior to layout and Monte Carlo simulation of the alternatives. However, the strategy
is applied at a temperature of 25°C and the nominal corner. This corner is not the worst case with
respect to transient analysis, and it could have been better to use the worst-case corner SS with a
temperature of −40°C to optimize the worst case. On the other hand, it may still be better to optimize
the nominal corner at 25°C as done in this thesis, since this corner lies in the middle of all
possible corners and is the most frequent case, rather than optimizing the worst case that rarely
occurs. In [2], another design methodology using Multiobjective Optimization Problem
(MOP) algorithms is shown to be necessary and beneficial when designing sub-threshold cells.
The method searches for optimal trade-offs between resources and robustness, giving an entire set
of optimal trade-offs (Pareto set). In this way, a broad set of resource-efficient
sizing solutions with varying trade-offs is achieved for a cell library. This method may yield a larger variety of cells
in much shorter design time and allow a synthesis tool to use the most suitable cell whenever needed to
meet design constraints, e.g. critical path delay versus overall power consumption.
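The notion of a Pareto set can be illustrated with a brute-force non-dominated filter over a handful of candidate sizings. This is not the optimization algorithm of [2]; it only shows what "optimal trade-off" means for two objectives that are both minimized. The candidate names and their area and delay values are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple


class Sizing(NamedTuple):
    name: str
    area: float   # lower is better (hypothetical units)
    delay: float  # lower is better (hypothetical units)


def dominates(a, b):
    """True if a is no worse than b in both objectives and better in at least one."""
    no_worse = a.area <= b.area and a.delay <= b.delay
    better = a.area < b.area or a.delay < b.delay
    return no_worse and better


def pareto_set(candidates):
    """Keep every candidate that is not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates, not taken from the thesis results
candidates = [
    Sizing("A", 100.0, 90.0),
    Sizing("B", 120.0, 70.0),
    Sizing("C", 130.0, 95.0),  # worse than A and B in both objectives
    Sizing("D", 200.0, 60.0),
]
front = pareto_set(candidates)
print([c.name for c in front])
```

Every sizing left in the set is a valid library cell candidate; a synthesis tool could then pick the small one off the critical path and the fast one on it.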
7.3 The ALU Test Circuits
The ALU circuit is used as a test case for the designed sub-threshold library. Three ALU circuits
are synthesized: No.1 is unrestricted and uses the available above-threshold cell library (total
cells synthesized=118), No.2 is synthesized with restriction to INV, NAND2, NOR2 and D-FF
and FO3 (total cells=393), and No.3 is as No.2 plus XNOR2, XOR2, AOI22 and OAI22 gates
(total cells=272). The two latter ALUs are both simulated with the above-threshold and the designed
sub-threshold cells in order to make a comparison. The 31% decrease in cell count from No.2 to
No.3 shows that a richer cell library may give a more optimal solution in terms of
area and power consumption, probably due to less total parasitic capacitance and fewer paths to
ground.
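The cell-count comparison is a plain relative reduction, which can be checked directly from the three synthesis totals. This is a minimal sketch with an illustrative function name.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


CELLS = {"No.1": 118, "No.2": 393, "No.3": 272}  # synthesized cell counts

no2_to_no3 = percent_reduction(CELLS["No.2"], CELLS["No.3"])
print(f"No.2 -> No.3: {no2_to_no3:.1f}% fewer cells")
```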
The ALU No.2 with above-threshold cells has the worst power consumption in all power
components compared to the others, as seen in figure 6.19 and table 6.21. The No.3
with above-threshold cells is the second worst. The No.1 (which is only made with
above-threshold cells) is third worst, but only in total power consumption. This might be due to the
low cell count of 118 in the No.1 synthesis and thus fewer paths to ground than the others. By
these results, the sub-threshold cell library is shown to be more power efficient than an existing
above-threshold cell library.
The ALU No.2 shows a reduction in total power of ∼ 24.3% with sub-threshold cells
compared to the same ALU with above-threshold cells for VDD = 350mV and 1.2V.
In terms of static power consumption, the sub-threshold implementation shows a
reduction of ∼ 40% at VDD = 350mV and ∼ 25% at VDD = 1.2V compared
to the above-threshold case. Table 6.21 summarizes the power consumption and figure 6.16
depicts the total, dynamic and static comparison between the No.2 above- and sub-threshold cases.
Figure 6.15 shows how the critical path delay develops versus VDD for all corners
at −40°C, and that the sub-threshold implementation is more robust than the above-threshold
case. The sub-threshold case has a lower critical path delay in all corners in the sub-threshold
region compared to the above-threshold case. In the functionality test in table 6.18 at 350mV and
−40°C, the sub-threshold case fails only in the SS and FS corners, whereas the above-threshold case
fails in all corners except FF, and also fails in the SS corner at 25°C. Increasing VDD
to 400mV makes the sub-threshold case pass in all corners, while the above-threshold case still fails
in the SS and FS corners at −40°C. Hence, the sub-threshold cells yield greater robustness in the
sub-threshold region than an existing above-threshold library.
The ALU No.3 shows a reduction in total power of ∼ 23.8% with sub-threshold cells
compared to the same ALU with above-threshold cells for VDD = 350mV and 1.2V.
In static power consumption the sub-threshold implementation shows a reduction of ∼ 43% at
VDD = 350mV and ∼ 26% at VDD = 1.2V compared to the above-threshold case.
Figure 6.18 depicts the difference in power consumption. The No.3 shows similar trends to
the No.2 in terms of critical path delay and functionality test, as shown in figure 6.17 and
table 6.20. However, the No.3 ALU with the additional complex gates shows a reduction
in total power consumption of ∼ 7.7% compared to the No.2 case at 350mV, and of ∼ 9.1%
at 1.2V, both ALUs with sub-threshold cells. Comparing the best ALU, No.3 with
sub-threshold cells, against No.1 with above-threshold cells shows that the sub-threshold case
yields a reduction in total power consumption of ∼ 8.4% at 350mV and ∼ 15%
at 1.2V. Additionally, the static power consumption is reduced by ∼ 20% at 350mV, but
increased by ∼ 3% at 1.2V. This might be because the No.1 has fewer cells in total (118
versus 272 for the No.3) and because the above-threshold library is optimized for the 1.2V
region. To overcome the worst-case propagation delay at −40°C and SS corner, an increase of
VDD seems to be the best solution. However, to fully utilize the maximum possible power savings
over the whole temperature range, a dynamically voltage-scaled power
supply that increases the voltage at low temperatures to improve the propagation delay
should be considered.
Chapter 8
Conclusion
The designed sub-threshold cell library, comprising Inverter, NAND2, NOR2, XNOR2, XOR2,
AOI22, OAI22 and D-FF, shows increased robustness in ALU No.2 (INV, NAND2,
NOR2, D-FF) and No.3 (as No.2 + XNOR2, XOR2, AOI22, OAI22) compared to an
above-threshold library, and also compared to an unrestricted, optimized No.1 ALU with above-threshold
cells. Adding XNOR2, XOR2, AOI22 and OAI22 to the sub-threshold library clearly
improves the delay by ∼ 6.5% at 350mV, −40°C and SS corner. It also reduces the
power consumption by ∼ 7.7% at 350mV and by ∼ 9.1% at 1.2V compared
to a library consisting of only INV, NAND2, NOR2 and D-FF (TT corner and 25°C). The larger
sub-threshold library also reduces the total number of synthesized cells from 393 cells
for the No.2 case to 272 cells for the No.3 case (∼ −30.8%).
The power consumption for the best case (i.e. ALU No.3 with sub-threshold cells) is reduced
by a factor of ∼ 13.7 (-92.7%) when VDD is lowered from 1.2V to 350mV. Additionally, the
power consumption of ALU No.2 with sub-threshold cells is reduced by a factor of ∼ 13.9
(-92.8%) from VDD = 1.2V to 350mV.
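A reduction factor k and a percentage reduction are two views of the same ratio: reduction = 1 − 1/k. The following short sketch (illustrative function name) confirms that the factors and percentages quoted above agree.

```python
def factor_to_percent(factor):
    """Convert an 'N times lower' factor into a percentage reduction."""
    return 100.0 * (1.0 - 1.0 / factor)


alu3 = factor_to_percent(13.7)  # ALU No.3, 1.2 V -> 350 mV
alu2 = factor_to_percent(13.9)  # ALU No.2, 1.2 V -> 350 mV
print(f"No.3: -{alu3:.1f}%, No.2: -{alu2:.1f}%")
```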
The sub-threshold cell library includes 7 logic cells and one memory element so that any general
FSM can be synthesized. The cells are designed with minimum drive strength, with trade-offs between
imbalance and timing, where timing has been the first priority in terms of absolute value and
variation σ.
8.1 Future Work
• Research and optimize layout such that the variations may be reduced.
• Implement a dynamic supply voltage regulator that compensates for the increased delay
at low temperatures.
• Realization of a prototype and measurements.
Bibliography
[1] N.H.E. Weste and K. Eshraghian. Principles of CMOS VLSI design: a systems perspective.
Addison-Wesley, 1985.
[2] M. Blesken, S. Lütkemeier, and U. Rückert. Multiobjective optimization for transistor sizing
sub-threshold CMOS logic standard cells. In Circuits and Systems (ISCAS), Proceedings
of 2010 IEEE International Symposium on, pages 1480–1483, May 2010.
[3] E. Vittoz and J. Fellrath. CMOS analog circuits based on weak inversion operation.
Solid-State Circuits, IEEE Journal of, 12(3):224–231, Jun 1977.
[4] Y. Tsividis. Eric Vittoz and the strong impact of weak inversion circuits. Solid-State
Circuits Society Newsletter, IEEE, 13(3):56–58, Summer 2008.
[5] B.H. Calhoun and D. Brooks. Can subthreshold and near-threshold circuits go mainstream?
Micro, IEEE, 30(4):80–85, July 2010.
[6] S.M. Sze. Physics of Semiconductor Devices. Wiley-Interscience, 1969.
[7] T.C. Carusone, D.A. Johns, and K.W. Martin. Analog Integrated Circuit Design 2E. Wiley,
2012.
[8] N.H.E. Weste and D.M. Harris. CMOS VLSI Design: A Circuits and Systems Perspective.
ADDISON WESLEY Publishing Company Incorporated, 2011.
[9] G. De Micheli. Synthesis and Optimization of Digital Circuits. Electrical and Computer
Engineering Series. McGraw-Hill Higher Education, 1994.
[10] J.P. Uyemura. Introduction to VLSI Circuits and Systems. J. Wiley, 2002.
[11] R.J. Baker. CMOS Circuit Design, Layout and Simulation. Wiley-IEEE Press, 3rd edition,
2010.
[12] J.M. Rabaey, A.P. Chandrakasan, and B. Nikolic. Digital Integrated Circuits: A Design
Perspective. Prentice Hall electronics and VLSI series. Pearson Education, 2003.
[13] P.R. Panda, B.V.N. Silpa, A. Shrivastava, and K. Gummidipudi. Power-efficient System
Design. Springer, 2010.
[14] H.J.M. Veendrick. Short-circuit dissipation of static CMOS circuitry and its impact on
the design of buffer circuits. Solid-State Circuits, IEEE Journal of, 19(4):468–473, Aug
1984.
[15] A. Wang, B.H. Calhoun, and A.P. Chandrakasan. Sub-threshold Design for Ultra Low-
Power Systems. Integrated Circuits and Systems. Springer, 2006.
[16] M. Alioto. Understanding DC behaviour of subthreshold CMOS logic through closed-form
analysis. Circuits and Systems I: Regular Papers, IEEE Transactions on, 57(7):1597–1607,
July 2010.
[17] M. Alioto. Ultra-low power VLSI circuit design demystified and explained: A tutorial.
Circuits and Systems I: Regular Papers, IEEE Transactions on, 59(1):3–29, Jan 2012.
[18] M.J.M. Pelgrom, A.C.J. Duinmaijer, and A.P.G. Welbers. Matching properties of MOS
transistors. Solid-State Circuits, IEEE Journal of, 24(5):1433–1439, Oct 1989.
[19] Bo Zhai, S. Hanson, D. Blaauw, and D Sylvester. Analysis and mitigation of variability in
subthreshold design. Low Power Electronics and Design, 2005. ISLPED ’05. Proceedings
of the 2005 International Symposium on, pages 20–25, Aug 2005.
[20] Cadence Design Systems Inc. Virtuoso schematic editor tool.
http://www.cadence.com/products/rf/schematic_editor, December 16, 2013.
[21] Cadence Design Systems Inc. Virtuoso layout suite tool.
http://www.cadence.com/products/rf/layout_suite/pages/default.aspx, May 25, 2014.
[22] G.G. Løvås. Statistikk for universiteter og høgskoler, 2. utgave. Universitetsforlaget, 2010.
[23] M. Værnes. Trade-offs between performance and robustness for ultra low power/low
energy subthreshold D flip-flops in 65nm CMOS. Master thesis, Norwegian University of
Science and Technology, Department of Electronics and Telecommunications, 2013.
[24] H.P. Alstad and S. Aunet. Seven subthreshold flip-flop cells. In Norchip, 2007, pages 1–4,
Nov 2007.
[25] G. Gerosa, S. Gary, C. Dietz, Dac Pham, K. Hoover, J. Alvarez, H. Sanchez, P. Ippolito,
Tai Ngo, S. Litch, J. Eno, J. Golab, N. Vanderschaaf, and J. Kahle. A 2.2 W, 80 MHz
superscalar RISC microprocessor. Solid-State Circuits, IEEE Journal of, 29(12):1440–1454,
Dec 1994.
[26] "vipin" at vhdlguru.blogspot.no. VHDL code for a simple ALU.
http://vhdlguru.blogspot.no/2011/06/vhdl-code-for-simple-alu.html, May 21, 2014.
[27] Cadence Design Systems Inc. Encounter RTL Compiler tool.
http://www.cadence.com/products/ld/rtl_compiler, December 16, 2013.
[28] Aldec Inc. Active-HDL FPGA design and simulation tool.
http://www.aldec.com/en/products/fpga_simulation/active-hdl, December 16, 2013.
[29] Cadence Design Systems Inc. Virtuoso analog design environment sim. tool.
http://www.cadence.com/products/cic/analog_design_environment, December 16, 2013.
[30] EDA wiki page. How to use inherited connections.
http://eda.engineering.wustl.edu/wiki/index.php/How_to_use_inherited_connections,
June 14, 2014.
[31] C. Foley. Characterizing metastability. In Advanced Research in Asynchronous Circuits
and Systems, 1996. Proceedings., Second International Symposium on, pages 175–184,
Mar 1996.
Appendix A
HDL and Synthesis Scripts
A.1 8-bit ALU VHDL Module
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity simple_alu is
  port( Clk : in std_logic;             --clock signal
        A,B : in signed(7 downto 0);    --input operands
        Op  : in unsigned(2 downto 0);  --Operation to be performed
        R   : out signed(7 downto 0)    --output of ALU
      );
end simple_alu;

architecture Behavioral of simple_alu is
  --temporary signal declaration.
  signal Reg1,Reg2,Reg3 : signed(7 downto 0) := (others => '0');
  signal Reg4 : unsigned(2 downto 0) := (others => '0');
begin
  R <= Reg3;

  --Register the operands and the operation code.
  process(Clk)
  begin
    if(rising_edge(Clk)) then
      Reg1 <= A;
      Reg2 <= B;
      Reg4 <= Op;
    end if;
  end process;

  process(Clk)
  begin
    if(rising_edge(Clk)) then --Do the calculation at the positive edge of clock cycle.
      case Reg4 is
        when "000" =>
          Reg3 <= Reg1 + Reg2; --addition
        when "001" =>
          Reg3 <= Reg1 - Reg2; --subtraction
        when "010" =>
          Reg3 <= not Reg1; --NOT gate
        when "011" =>
          Reg3 <= Reg1 nand Reg2; --NAND gate
        when "100" =>
          Reg3 <= Reg1 nor Reg2; --NOR gate
        when "101" =>
          Reg3 <= Reg1 and Reg2; --AND gate
        when "110" =>
          Reg3 <= Reg1 or Reg2; --OR gate
        when "111" =>
          Reg3 <= Reg1 xor Reg2; --XOR gate
        when others =>
          NULL;
      end case;
    end if;
  end process;
end Behavioral;
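When checking simulation waveforms against the module above, a small software reference model of the eight operations can be useful. The sketch below is such a model in Python; it is not part of the thesis flow, its names are illustrative, and it is purely combinational, so it ignores the two register stages (input registers and result register) of the VHDL module. Results wrap to 8 bits in two's complement, as the 8-bit `signed` arithmetic in the module does. The operand values correspond to the constant stimuli in appendix B (A = "10000000", B = "01111111").

```python
def to_signed8(value):
    """Wrap an integer to 8 bits and interpret it as two's complement."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def alu(op, a, b):
    """Combinational model of simple_alu for an opcode 0..7 and 8-bit operands."""
    ua, ub = a & 0xFF, b & 0xFF
    results = {
        0b000: ua + ub,      # addition
        0b001: ua - ub,      # subtraction
        0b010: ~ua,          # NOT
        0b011: ~(ua & ub),   # NAND
        0b100: ~(ua | ub),   # NOR
        0b101: ua & ub,      # AND
        0b110: ua | ub,      # OR
        0b111: ua ^ ub,      # XOR
    }
    return to_signed8(results[op])


A, B = -128, 127  # A = "10000000", B = "01111111" as in the stimuli files
expected = [alu(op, A, B) for op in range(8)]
print(expected)
```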
A.2 Encounter RTL TCL Script .tcl
########################################################
## Define what RTL language is used
set_attribute hdl_language vhdl
## Define the library search path
set_attribute lib_search_path /home/glenn_andre.johnsen/master_thesis/svn/digital/standalone/lib/import/STDCELLLIB/lib/
## Define the specific library and/or characterized corner
set_attribute library {STDCELLLIB.lib}
## inserted constraints: to avoid cells:
set_attribute avoid true ao*
set_attribute avoid true fa*
set_attribute avoid true ha*
set_attribute avoid true mao*
set_attribute avoid true moa*
set_attribute avoid true mux*
set_attribute avoid true mx*
set_attribute avoid true nd3*
set_attribute avoid true nd4*
set_attribute avoid true nd5*
set_attribute avoid true nd6*
set_attribute avoid true nd8*
set_attribute avoid true nr3*
set_attribute avoid true nr4*
set_attribute avoid true nr5*
set_attribute avoid true nr6*
set_attribute avoid true nr8*
set_attribute avoid true oa*
set_attribute avoid true or*
set_attribute avoid true an*
set_attribute avoid true xor*
set_attribute avoid true xnr*
set_attribute avoid true dff*
set_attribute avoid true dfz*
set_attribute avoid true qdfz*
set_attribute avoid true qdla*
set_attribute avoid true dfe*
## Define the input HDL file(s)
read_hdl ALU.vhd
## Generates a technology independent schematic
elaborate
## Read a constraint file (must be defined by the user) containing operating conditions such as clock waveforms, I/O timing, load, etc.
read_sdc ALU.sdc
## Generates a technology dependent (-to_mapped) or generic (-to_generic) schematic. Effort levels: low, medium and high
synthesize -to_mapped -effort high
## Writes a technology dependent netlist
write -mapped > ./output/ALU/ALU_output.v
## Write a constraint file to the encounter folder for place and route constraints
write_sdc > ./output/ALU/ALU_output.sdc
## Write out area and timing reports
report area > ./output/ALU/ALU_area_report.rep
report timing > ./output/ALU/ALU_timing_report.rep
report power > ./output/ALU/ALU_power_report.rep
## Writes a script if one is preferred
#write_script > script
########################################################
A.3 Encounter RTL Constraint file .sdc
########################################################
# Set parameter units
set_units -time ps
set_units -capacitance fF
# Set clock conditions f=32KHz..#16.368MHz
create_clock -name {Clk} -period 31250000.0 -waveform {0.0 15625000} [get_ports {Clk}]
# Set load capacitance to output node(s)
set_load 50 [get_ports {R}]
# Set max fanout
set_max_fanout 3 [get_designs simple_alu]
# Set wireload model (described in the library .lib file) to specify the effect of the interconnects on the timing and area
# set_wire_load_model -name "area_0Kto1K"
########################################################
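The clock period in the constraint file follows directly from the target frequency and the picosecond time unit set with `set_units`. The sketch below (illustrative function name) reproduces the period and the 50% duty-cycle edge used in the `create_clock` line above.

```python
def clock_period_ps(freq_hz):
    """Clock period in picoseconds for a frequency given in hertz."""
    return 1e12 / freq_hz


period = clock_period_ps(32_000)  # 32 kHz target clock
falling_edge = period / 2         # 50 % duty cycle
print(f"-period {period} -waveform {{0.0 {falling_edge:.0f}}}")
```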
Appendix B
ALU Stimuli Files
B.1 Dynamic Stimuli File
simulator lang=spectre
//global gnd!
//global mixvss!
//vdd (vdd! 0) vsource dc=vdd_var
//mixvdd (mixvdd! 0) vsource dc=vdd_var
//mixvss (mixvss! 0) vsource dc=0
//Gnd (gnd! 0) vsource dc=0
//K is the signal period
//parameters K = 1
//D1 is the delay of vA0
//parameters D1 = 0
//Clk is applied in simulator as stimuli with the same configuration:
//Vclk (Clk 0) vsource type=pulse val0=0 val1=vdd_var delay=0.5*t_period rise=risefalltime fall=risefalltime width=0.5*t_period period=t_period
Va0 (A\\<0\\> 0) vsource dc=0
Va1 (A\\<1\\> 0) vsource dc=0
Va2 (A\\<2\\> 0) vsource dc=0
Va3 (A\\<3\\> 0) vsource dc=0
Va4 (A\\<4\\> 0) vsource dc=0
Va5 (A\\<5\\> 0) vsource dc=0
Va6 (A\\<6\\> 0) vsource dc=0
Va7 (A\\<7\\> 0) vsource dc=vdd_var
Vb0 (B\\<0\\> 0) vsource dc=vdd_var
Vb1 (B\\<1\\> 0) vsource dc=vdd_var
Vb2 (B\\<2\\> 0) vsource dc=vdd_var
Vb3 (B\\<3\\> 0) vsource dc=vdd_var
Vb4 (B\\<4\\> 0) vsource dc=vdd_var
Vb5 (B\\<5\\> 0) vsource dc=vdd_var
Vb6 (B\\<6\\> 0) vsource dc=vdd_var
Vb7 (B\\<7\\> 0) vsource dc=0
V1 (Op\\<0\\> 0) vsource type=pwl wave=\[
+ 0u 0 (0.995*t_period) 0
+ (1*t_period) vdd_var (1*t_period+0.995*t_period) vdd_var
+ (2*t_period) 0 (2*t_period+0.995*t_period) 0
+ (3*t_period) vdd_var (3*t_period+0.995*t_period) vdd_var
+ (4*t_period) 0 (4*t_period+0.995*t_period) 0
+ (5*t_period) vdd_var (5*t_period+0.995*t_period) vdd_var
+ (6*t_period) 0 (6*t_period+0.995*t_period) 0
+ (7*t_period) vdd_var (7*t_period+0.995*t_period) vdd_var
+ ]
V2 (Op\\<1\\> 0) vsource type=pwl wave=\[
+ 0u 0 (0.995*t_period) 0
+ (1*t_period) 0 (1*t_period+0.995*t_period) 0
+ (2*t_period) vdd_var (2*t_period+0.995*t_period) vdd_var
+ (3*t_period) vdd_var (3*t_period+0.995*t_period) vdd_var
+ (4*t_period) 0 (4*t_period+0.995*t_period) 0
+ (5*t_period) 0 (5*t_period+0.995*t_period) 0
+ (6*t_period) vdd_var (6*t_period+0.995*t_period) vdd_var
+ (7*t_period) vdd_var (7*t_period+0.995*t_period) vdd_var
+ ]
V3 (Op\\<2\\> 0) vsource type=pwl wave=\[
+ 0u 0 (0.995*t_period) 0
+ (1*t_period) 0 (1*t_period+0.995*t_period) 0
+ (2*t_period) 0 (2*t_period+0.995*t_period) 0
+ (3*t_period) 0 (3*t_period+0.995*t_period) 0
+ (4*t_period) vdd_var (4*t_period+0.995*t_period) vdd_var
+ (5*t_period) vdd_var (5*t_period+0.995*t_period) vdd_var
+ (6*t_period) vdd_var (6*t_period+0.995*t_period) vdd_var
+ (7*t_period) vdd_var (7*t_period+0.995*t_period) vdd_var
+ ]
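The three opcode waveforms above step through the binary codes 000 to 111, one code per clock period, with each level held for 0.995 of a period. As a sketch of how such breakpoint lists could be generated rather than typed by hand, the Python function below emits the continuation lines for one opcode bit. The function name is illustrative, and the first line is written as `(0*t_period)` instead of `0u`, which denotes the same time.

```python
def pwl_lines(bit, cycles=8, hold=0.995):
    """Spectre PWL continuation lines for opcode bit `bit` counting 0..cycles-1."""
    lines = []
    for cycle in range(cycles):
        level = "vdd_var" if (cycle >> bit) & 1 else "0"
        lines.append(
            f"+ ({cycle}*t_period) {level} "
            f"({cycle}*t_period+{hold}*t_period) {level}"
        )
    return lines


for line in pwl_lines(bit=1):
    print(line)
```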
B.2 Static Stimuli File
simulator lang=spectre
//global gnd!
//vdd (vdd! 0) vsource dc=vdd_var
//mixvdd (mixvdd! 0) vsource dc=vdd_var
//mixvss (mixvss! 0) vsource dc=0
//Gnd (gnd! 0) vsource dc=0
//K is the signal period
//parameters K = 1
//D1 is the delay of vA0
//parameters D1 = 0
// One clock cycle before applying static value:
Vclk (Clk 0) vsource type=pwl wave=\[
+ 0u 0 (0.4*t_period) 0
+ (0.42*t_period) vdd_var (0.96*t_period) vdd_var
+ (0.98*t_period) 0 (2*t_period) 0
+ ]
Va0 (A\\<0\\> 0) vsource dc=0
Va1 (A\\<1\\> 0) vsource dc=0
Va2 (A\\<2\\> 0) vsource dc=0
Va3 (A\\<3\\> 0) vsource dc=0
Va4 (A\\<4\\> 0) vsource dc=0
Va5 (A\\<5\\> 0) vsource dc=0
Va6 (A\\<6\\> 0) vsource dc=0
Va7 (A\\<7\\> 0) vsource dc=vdd_var
Vb0 (B\\<0\\> 0) vsource dc=vdd_var
Vb1 (B\\<1\\> 0) vsource dc=vdd_var
Vb2 (B\\<2\\> 0) vsource dc=vdd_var
Vb3 (B\\<3\\> 0) vsource dc=vdd_var
Vb4 (B\\<4\\> 0) vsource dc=vdd_var
Vb5 (B\\<5\\> 0) vsource dc=vdd_var
Vb6 (B\\<6\\> 0) vsource dc=vdd_var
Vb7 (B\\<7\\> 0) vsource dc=0
V1 (Op\\<0\\> 0) vsource type=pwl wave=\[
+ 0u 0 (0.995*t_period) 0
+ ]
V2 (Op\\<1\\> 0) vsource type=pwl wave=\[
+ 0u 0 (0.995*t_period) 0
+ ]
V3 (Op\\<2\\> 0) vsource type=pwl wave=\[
+ 0u 0 (0.995*t_period) 0
+ ]
Appendix C
Additional Simulation and test bench setup
for Sub-threshold Cells
C.1 NOR2 Gate Test bench Setup
Figure C.1: Test bench setup to simulate both VTC and switching analysis of NOR2 gate with
one input tied to GND and the other connected in chain.
Figure C.2: Test bench setup to simulate both VTC and switching analysis of NOR2 gate with
both inputs connected in chain.
C.2 XNOR2 Gate Test bench Setup
Figure C.3: Test bench setup to simulate both VTC and switching analysis of XNOR2 gate with
one input tied to GND and the other connected in chain.
C.3 XOR2 Gate Test Bench Setup
Figure C.4: Test bench setup to simulate both VTC and switching analysis of XOR2 gate with
one input tied to VDD and the other connected in chain.
C.4 AOI22 Gate Test Bench Setup
(a) VTC analysis (b) Transient analysis
Figure C.5: Test bench setup of VTC and switching analysis of AOI22 gate with two inputs
tied to VDD, one input tied to GND and the other connected in chain.
C.5 OAI22 Gate Test Bench Setup
(a) VTC analysis (b) Transient analysis
Figure C.6: Test bench setup of VTC and switching analysis of OAI22 gate with one input
tied to VDD, two inputs tied to GND and the other connected in chain.
Appendix D
Intermediate Results
D.1 Alternative Designs for all Cells
Table D.1: Inverter: Design sizes chosen for further investigation.
Design 1 2 3 (Chosen) 4 5 6
pMOS (W / L) 160 / 389 160 / 500 160 / 240 160 / 480 160 / 600 160 / 720
nMOS (W / L) 385 / 120 350 / 120 160 / 480 160 / 480 160 / 600 160 / 720
MOS Area [fm2] 108.4 122.0 115.2 153.6 192.0 230.4
Table D.2: NAND2: Design sizes chosen for further investigation.
Design 1 2 3 4 (Chosen) 5 6
pMOS (W / L) 160 / 389 160 / 240 160 / 480 160 / 270 160 / 520 160 / 720
nMOS (W / L) 385 / 120 160 / 480 160 / 480 160 / 720 160 / 720 160 / 720
MOS Area [fm2] 216.9 230.4 307.2 316.8 396.8 460.8
Table D.3: NOR2: Design sizes chosen for further investigation.
Design 1 2 3 4 5 (Chosen)
pMOS (W / L) 160 / 200 160 / 200 160 / 150 160 / 240 160 / 150
nMOS (W / L) 300 / 120 350 / 120 160 / 600 160 / 480 160 / 720
MOS Area [fm2] 136.0 148.0 240.0 230.4 278.4
Table D.4: XNOR2: Design sizes chosen for further investigation.
Design 1 2 3 (Chosen) 4
pMOS (W / L) 160 / 160 160 / 200 160 / 160 160 / 200
nMOS (W / L) 350 / 120 160 / 600 160 / 720 160 / 720
MOS Area [fm2] 270.4 448.0 563.2 588.8
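The MOS area rows in tables D.1 to D.4 appear to follow from the listed dimensions as the sum of W·L over all transistors. The sketch below checks this for the chosen designs. It rests on assumptions inferred from the numbers rather than stated in the tables: W and L are taken to be in nm, "fm2" is read as 10⁻¹⁵ m², and the number of pMOS/nMOS pairs per cell (1, 2, 2 and 4) is chosen because it reproduces the tabulated values. The function name is illustrative.

```python
def mos_area_fm2(p_wl, n_wl, pairs=1):
    """Total gate area of `pairs` pMOS/nMOS pairs, W and L in nm (assumed)."""
    (wp, lp), (wn, ln) = p_wl, n_wl
    area_nm2 = pairs * (wp * lp + wn * ln)
    return area_nm2 / 1000.0  # 1000 nm^2 = 1e-15 m^2


chosen = {
    "INV (design 3)":   mos_area_fm2((160, 240), (160, 480), pairs=1),
    "NAND2 (design 4)": mos_area_fm2((160, 270), (160, 720), pairs=2),
    "NOR2 (design 5)":  mos_area_fm2((160, 150), (160, 720), pairs=2),
    "XNOR2 (design 3)": mos_area_fm2((160, 160), (160, 720), pairs=4),
}
for cell, area in chosen.items():
    print(f"{cell}: {area:.1f}")
```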
D.2 Inverter Gate
D.2.1 VTC Analysis Results
[Six plot panels, each titled "INVERTER VTC in schematic and layout" at the given temperature. x-axis: Design (1 to 6); y-axis: Percentage [%]; series: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) -40°C, 350mV. (b) 25°C, 350mV. (c) 85°C, 350mV. (d) -40°C, 1.2V. (e) 25°C, 1.2V. (f) 85°C, 1.2V.
Figure D.1: Monte Carlo INVERTER schematic and layout: mean midpoint percentage and std. dev.
with process and mismatch.
D.2.2 Switching Analysis Results w/ and w/o Parasitics
[Six plot panels, each titled "INVERTER delay in schematic and layout" at the given temperature. x-axis: Design (1 to 6); y-axis: Time [s]; series: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) -40°C, 350mV. (b) 25°C, 350mV. (c) 85°C, 350mV. (d) -40°C, 1.2V. (e) 25°C, 1.2V. (f) 85°C, 1.2V.
Figure D.2: Monte Carlo INVERTER schematic and layout: propagation mean delay and std. dev.
with process and mismatch.
D.3 NAND2 Gate
D.3.1 VTC Analysis Results
[Plot: "NAND VTC in schematic and layout", six panels. x-axis: Design (1 to 6); y-axis: Percentage [%]. Legend: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) −40°C, 350mV: y-axis 0 to 30.
(b) 25°C, 350mV: y-axis 0 to 25.
(c) 85°C, 350mV: y-axis 0 to 30.
(d) −40°C, 1.2V: y-axis −10 to 8.
(e) 25°C, 1.2V: y-axis −15 to 15.
(f) 85°C, 1.2V: y-axis −20 to 15.
Figure D.3: Monte Carlo NAND schematic and layout: mean midpoint percentage and std. dev.
with process and mismatch.
D.3.2 Switching Analysis Results w/ and w/o Parasitics
[Plot: "NAND delay in schematic and layout", six panels. x-axis: Design (1 to 6); y-axis: Time [s]. Legend: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) −40°C, 350mV: y-axis 0 to 4 × 10^−6.
(b) 25°C, 350mV: y-axis 0 to 2.5 × 10^−7.
(c) 85°C, 350mV: y-axis 0 to 7 × 10^−8.
(d) −40°C, 1.2V: y-axis 0 to 7 × 10^−10.
(e) 25°C, 1.2V: y-axis 0 to 8 × 10^−10.
(f) 85°C, 1.2V: y-axis 0 to 1 × 10^−9.
Figure D.4: Monte Carlo NAND schematic and layout: propagation mean delay and std. dev. with
process and mismatch.
D.4 NOR2 Gate
D.4.1 VTC Analysis Results
[Plot: "NOR VTC in schematic and layout", six panels. x-axis: Design (1 to 5); y-axis: Percentage [%]. Legend: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) −40°C, 350mV: y-axis 0 to 12.
(b) 25°C, 350mV: y-axis 0 to 14.
(c) 85°C, 350mV: y-axis −15 to 15.
(d) −40°C, 1.2V: y-axis −14 to 4.
(e) 25°C, 1.2V: y-axis −20 to 5.
(f) 85°C, 1.2V: y-axis −25 to 5.
Figure D.5: Monte Carlo NOR schematic and layout: mean midpoint percentage and std. dev. with
process and mismatch.
D.4.2 Switching Analysis Results w/ and w/o Parasitics
[Plot: "NOR delay in schematic and layout", six panels. x-axis: Design (1 to 5); y-axis: Time [s]. Legend: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) −40°C, 350mV: y-axis 0 to 2 × 10^−6.
(b) 25°C, 350mV: y-axis 0 to 1.4 × 10^−7.
(c) 85°C, 350mV: y-axis 0 to 4 × 10^−8.
(d) −40°C, 1.2V: y-axis 0 to 2.5 × 10^−10.
(e) 25°C, 1.2V: y-axis 0 to 3.5 × 10^−10.
(f) 85°C, 1.2V: y-axis 0 to 3.5 × 10^−10.
Figure D.6: Monte Carlo NOR schematic and layout: propagation mean delay and std. dev. with
process and mismatch.
D.5 XNOR2 Gate
D.5.1 VTC Analysis Results
[Plot: "XNOR VTC in schematic and layout", six panels. x-axis: Design (1 to 4); y-axis: Percentage [%]. Legend: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) −40°C, 350mV: y-axis 0 to 15.
(b) 25°C, 350mV: y-axis 0 to 12.
(c) 85°C, 350mV: y-axis 0 to 14.
(d) −40°C, 1.2V: y-axis −6 to 3.
(e) 25°C, 1.2V: y-axis −10 to 5.
(f) 85°C, 1.2V: y-axis −15 to 10.
Figure D.7: Monte Carlo XNOR schematic and layout: mean midpoint percentage and std. dev. with
process and mismatch.
D.5.2 Switching Analysis Results w/ and w/o Parasitics
[Plot: "XNOR delay in schematic and layout", six panels. x-axis: Design (1 to 4); y-axis: Time [s]. Legend: Sch.: Mean, Sch.: Std. dev., Lay.: Mean, Lay.: Std. dev.]
(a) −40°C, 350mV: y-axis 0 to 6 × 10^−6.
(b) 25°C, 350mV: y-axis 0 to 3.5 × 10^−7.
(c) 85°C, 350mV: y-axis 0 to 8 × 10^−8.
(d) −40°C, 1.2V: y-axis 0 to 6 × 10^−10.
(e) 25°C, 1.2V: y-axis 0 to 7 × 10^−10.
(f) 85°C, 1.2V: y-axis 0 to 8 × 10^−10.
Figure D.8: Monte Carlo XNOR schematic and layout: propagation mean delay and std. dev. with
process and mismatch.
