Reliable Integrated Circuits Design and Test at Sub-45nm Technologies by Chen, Jifeng
University of Connecticut
OpenCommons@UConn
Doctoral Dissertations University of Connecticut Graduate School
9-3-2013
Reliable Integrated Circuits Design and Test at
Sub-45nm Technologies
Jifeng Chen
chenjifeng@gmail.com
Recommended Citation
Chen, Jifeng, "Reliable Integrated Circuits Design and Test at Sub-45nm Technologies" (2013). Doctoral Dissertations. 246.
https://opencommons.uconn.edu/dissertations/246
Reliable Integrated Circuits Design and Test
at Sub-45nm Technologies
Jifeng Chen, Ph.D.
University of Connecticut, 2013
The rapid scaling of CMOS technology to the 45nm feature node and below enables the
design of higher-performance chips built from complex and powerful circuitry. Under
area and power constraints similar to those of the past, circuit performance can rise to
multi-GHz. The main driver is that current designs can provide abundant functionality
through a compact layout of millions of transistors in a small chip area. However,
shrinking transistor dimensions and highly condensed layouts inevitably raise reliability
issues that notably impact performance, especially aging effects and process variations.
The timing uncertainties introduced by these factors cause path delays to deviate
considerably from their deterministic values and to spread over a wide range. Consequently,
transistor and gate capabilities are degraded. Circuits experience aggravated performance
degradation, increased yield loss and test escapes, and a reduced operational lifetime.
Circuit reliability analysis at the pre-silicon stage has become vital for sub-45nm
designs, in particular to account for aging effects such as Negative Bias Temperature
Instability (NBTI) and Hot Carrier Injection (HCI). Meanwhile, the clock tree, the most
active component in a circuit, suffers from severe skew. Traditional load-matching and
zero-skew routing techniques are less effective against aging effects and process
variations than they are at larger technology nodes. In addition, structural tests are
required to ensure that the products delivered to customers satisfy design specifications.
Of all test methods, the path delay fault (PDF) test is the most effective for binning
circuit frequency and evaluating circuit performance. When aging effects and process
variations are considered, conventional PDF pattern generation and testing face high test
complexity and high test cost.
This thesis targets these challenging issues in reliable integrated circuit design and test
to address their performance impacts. The techniques developed here include:
(1) A comprehensive analysis of performance impacts, including those of two major aging
effects, NBTI and HCI. (2) A novel methodology, named the aging-aware path delay (APD)
analysis flow. The APD flow is built on current commercial tools to achieve high accuracy
and low CPU runtime in circuit-level aging analysis. (3) A selective gate sizing method,
which makes use of the APD flow analysis results. This method efficiently mitigates
reliability threats with minimal area overhead. (4) A representative critical reliability
path design that constructs stand-alone circuitry for accurate performance evaluation and
path delay deviation measurement. (5) A novel flow for reducing clock skew with high
efficiency. This flow takes into account the impact of NBTI and process variations.
(6) A novel test cost reduction flow. Post-silicon measurements on ring-oscillators (ROs)
facilitate the prediction of actual delay variations. Path delays are ranked accordingly,
and the number of PDF patterns is thus reduced. (7) A novel methodology for identifying
testable representative paths (TRPs) for path delay fault tests. This technique reduces
PDF patterns and saves test cost and time considering both the NBTI effect and process
variations. These methodologies address challenging issues in designing reliable
integrated circuits and provide a pathway for the successful design and testing of
unpredictable silicon at sub-45nm technologies.
Reliable Integrated Circuits Design and Test
at Sub-45nm Technologies
Jifeng Chen
M.S., University of Arizona, Tucson, USA, 2009
M.E., Xidian University, Xi’an, China, 2006
B.E., Xidian University, Xi’an China, 2002
A Dissertation
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
at the
University of Connecticut
2013
Copyright by
Jifeng Chen
2013

Dedicated to my dearest family.
ACKNOWLEDGEMENTS
I am very grateful to my advisor, Dr. Mohammad Tehranipoor, for his continuous support
and guidance during my Ph.D. research and study at the University of Connecticut. Thanks
to his patience, wisdom, and inspiring suggestions, I enjoyed my research and study in
science and engineering. Without his patient instruction, insightful criticism, and expert
guidance, I would not have obtained the necessary theoretical knowledge, analytical
thinking techniques, and writing and presentation skills. What I learned from him is not
only knowledge of my academic area, but also the valuable skills that qualify me as a
researcher and prepare me for success in my future career.
My deep gratitude and appreciation also go to Dr. Peilin Song, Dr. Franco Stellari,
Dr. Ukinder Kindereit, Dr. Alan Weger, Dr. Thomas M. Shaw, Lynne Gignac, Dr.
Dirk Pfeiffer, Dr. Griselda Bonila, and C-K. Hu for their help and beneficial discussion
during my intern work at IBM T. J. Watson Research Center.
My sincere thanks to Dr. John Chandy, Dr. Lei Wang, Dr. Mehdi Anwar, and Dr.
Omer Khan for reading my proposal and dissertation and providing constructive feed-
back.
I would like to thank LeRoy Winemberg (at Freescale), John Carulli (at Texas Instruments),
Dr. Kenneth Butler (at Texas Instruments), Dr. David Yeh, and other experts from the
Semiconductor Research Corporation (SRC) for their helpful discussions.
My thanks also go to my former labmates Junxia Ma (now with LSI), Ke Peng (now with
Freescale), Xiaoxiao Wang (now with Freescale), Shuo Wang (now with Qualcomm), Hassan
Salmani (now with Howard University), and Niranjan Kayam (now with
Synopsys) for their great help in our lab. I would also like to thank the members of our
research group, Wei Zhao, Xuehui Zhang, Fang Bao, Kan Xiao, Ujjwal Guin, Qihang
Shi, Kun Yang, Gustavo Contreras, Halit Dogan, Miao He, and Ivy Stabell for their
help. I owe very special thanks to Ivy Stabell for her time on my papers.
I would also like to thank the ECE staff, especially Mary McCarthy, Dee Stolstrom,
Paul Dufilie, Celine Goorahoo, and Catherine Dagon for all their warm-hearted help.
I am also deeply indebted to my teachers, mentors, classmates and friends in my life.
There are so many of them that I cannot list all their names.
Last, but not least, I would like to thank my parents for their spiritual care and pro-
tection, and for their amazing love and trust. I thank my brother, Jiwei Chen, and his
wife, Guangying Zhang, for taking such good care of our parents. I would like to thank
my wife, Dandan Wang for her support, encouragement, quiet patience, and unwavering
love. I also thank my father- and mother-in-law, Yongxiang Wang and Ming Ma, for
their faith in me and my ambition. They brought my angel to this world. My wonderful
family inspired my drive and ability to tackle challenges head on. It is my greatest
honor to dedicate this work to them.
TABLE OF CONTENTS
1. Introduction
1.1 Circuit Reliability Issues
1.1.1 Negative Bias Temperature Instability (NBTI)
1.1.2 Hot-Carrier Injection (HCI)
1.1.3 Process Variations
1.2 Performance Analysis and Compensation Methodologies
1.2.1 Pre-Silicon Design Methodologies
1.2.2 Post-Silicon Runtime Techniques
1.3 Research Focuses of This Thesis
1.3.1 Aging Analysis and Compensation
1.3.2 Clock Skew Considering NBTI and Process Variations
1.3.3 Post-Silicon Monitoring and Calibration for Performance Evaluation
1.3.4 Path-Delay Faults Pattern Generation Considering Process Variations
1.3.5 Path-Delay Fault Pattern Generation Considering Process Variations and NBTI
1.4 Contributions of This Thesis
1.5 Thesis Outline
2. Aging Analysis and Compensation
2.1 Introduction
2.2 Critical-Reliability Paths Analysis
2.3 Aging-Aware Models and Analysis
2.3.1 Gate Delay Models Considering NBTI and HCI
2.3.2 Library Characterization Considering Aging
2.3.3 Output Slew Rate Function Forms
2.3.4 Accuracy and Efficiency Analysis
2.4 APD Analysis Flow
2.5 Critical Reliability Path and Critical Reliability Gates Identification
2.5.1 CRP Delay Analysis and Selection
2.5.2 CRG Identification Technique
2.6 Results and Analysis
2.7 Summary
3. Representative Critical Reliability Paths
3.1 Introduction
3.2 Concept of RCRP
3.3 Problem Formulation
3.4 RCRP Synthesis
3.5 Results and Analysis
3.6 Summary
4. Clock Skew Reduction
4.1 Introduction
4.2 NBTI Effect and Process Variations Characterization
4.2.1 NBTI Effect Modeling
4.2.2 Statistical Prediction of NBTI Effect under Process Variations
4.3 Motivation for Controlling on Clock Path Degradation and Rate using Multi-Vth Clock Buffers
4.4 Clock Skew Reduction Flow
4.5 Experimental Results
4.6 Summary
5. Test-Cost Reduction Considering Silicon Variations
5.1 Introduction
5.2 Delay Analysis considering Process Variations
5.2.1 The First-Order Linear Delay Model
5.2.2 Relationship between Delays of Gates, Paths, and Ring-Oscillators
5.3 Delay Estimation
5.4 Silicon-Variation-Aware PDF Pattern Generation
5.5 Experimental Results
5.6 Summary
6. Testable Representative Paths
6.1 Introduction
6.2 Delay Analysis Considering Process Variations and NBTI Effect
6.2.1 Gate Delay Analysis and Approximation
6.2.2 Path Structure and Delay Analysis
6.3 Testable Representative Paths (TRP) Identification Flow
6.4 Experimental Results
6.5 Summary
7. Conclusions
Bibliography
LIST OF FIGURES
1.1 The empirical bathtub curve evaluating circuit reliability failure rate over operational time.
1.2 The mechanism of negative bias temperature instability (NBTI).
1.3 Vth degradation and recovery of NBTI effect [3] [4].
1.4 Vth degradation according to different cycle times considering NBTI effect [3].
1.5 The mechanism of hot-carrier injection (HCI).
1.6 NMOS Vth degradation due to HCI effect in addition to the gate oxide degradation for different stress inputs, as a function of stress time [5].
1.7 PMOS Vth degradation due to gate oxide degradation without HCI effect for different stress inputs, as a function of stress time [5].
1.8 Major fabrication steps in the conventional CMOS process flow.
1.9 Reliable integrated circuits design taxonomy.
1.10 Conservative design methodologies.
2.1 Aging analysis considering off-path SP and on-path SP (all SPs are obtained from Monte Carlo simulation using our VCS-VPI tool, discussed later in Figure 2.3), Temperature = 125°C.
2.2 Analyzing aging effects on path delays considering different WLs.
2.3 APD flow including four major steps.
2.4 CRP and CRG selection flow.
2.5 CRG selection using the concept of importance to CRGs: (a) an illustrative circuit, (b) paths in the circuit and their timing, (c) order of gate aging at time t, and (d) the gate selection and sizing to compensate aging.
2.6 Correlation analysis for different workloads.
2.7 Path delay difference and overlap analysis considering degradation due to aging.
2.8 Computational complexity comparison between HSPICE MOSRA and APD flow for different benchmark circuits.
2.9 Overall computational time comparison between HSPICE MOSRA and APD flow considering path count.
2.10 Number of CRGs for different timing margins (when T = 75°C, WL = 0.5, and t = 10 years).
2.11 Number of CRGs for different stress times (when T = 75°C, WL = 0.5).
2.12 CRP delay comparison before/after gate sizing on the CRGs with varying stress time (when Γm = 5%, WL = 0.5, and T = 75°C).
2.13 CRP delay comparison before/after gate sizing on the CRGs with varying stress time (when Γm = 10%, WL = 0.5, and T = 75°C).
2.14 CRP delay comparison before/after gate sizing on the CRGs with varying Γm (when WL = 0.5, T = 75°C, and t = 10 years).
3.1 Demonstration of the RCRP concept.
3.2 Implementation of an RCRP.
3.3 Off-path workload analysis.
3.4 Actual largest delay vs. RCRP estimation in s9234 when WL = 50% and temperature is 75°C.
3.5 Normalized MSE under different area budgets.
3.6 Range of adjusted error under different area budgets.
3.7 Critical reliability paths coverage under different area budgets.
4.1 Normalized delay of a minimum-sized clock buffer versus initial threshold voltage Vth and stress time t (α = 0.5, and temperature T = 125°C).
4.2 Path delay and degradation comparison: (a) before replacement; (b) after replacement.
4.3 Clock tree synthesis flow for skew reduction.
4.4 Clock skew reduction for ethernet benchmark circuit considering NBTI and process variation (T = 125°C).
4.5 Clock skew reduction for benchmark circuit ethernet under different temperature scenarios.
5.1 Gate upper-bound delay analysis corresponding to different input combinations (r1, f1, 1r, 1f) on AND X4 gate, where the letter indicates the transition on the on-path input (r: rise, f: fall), and the number indicates the status of the off-path input.
5.2 Circuit layout example. ROs are measured after fabrication to obtain variations.
5.3 The proposed SVA PDF pattern generation flow.
5.4 Comparing the lengths of two paths using SVA and SSTA.
5.5 Comparison of ranking accuracy on gate-dominant paths using SVA, STA, and SSTA.
5.6 Comparison of ranking accuracy on interconnect-dominant paths using SVA, STA, and SSTA.
5.7 Comparison of ranking accuracy using SVA, STA, and SSTA for 400 critical paths.
5.8 Improvement of ranking accuracy using SVA over STA and SSTA.
5.9 RSTA and RSSTA using SVA over STA and SSTA.
6.1 Mean and variance approximations corresponding to gate type INV X1, where the falling delays are measured using HSPICE MOSRA and compared with model approximation at 11 stress time points.
6.2 An example circuit with 9 basic gates.
6.3 Delay distributions of 4 paths at time 0 and time t considering process variations and NBTI (“pdf” stands for probability density function).
6.4 TRP selection and TRP-PDF pattern generation.
6.5 Case 1: Comparison of the maximum mean delays and variances measured from all 1992 critical paths and 638 TRPs from benchmark circuit s9234 over time considering process variations and NBTI (Temperature: T = 25°C).
6.6 Case 2: 100 comparisons of maximum mean delays and variances measured from all 1992 critical paths and 638 TRPs from benchmark circuit s9234 over time considering process variations and NBTI (Temperature: T = 25°C, Stress time: t ≈ 10 years).
LIST OF TABLES
2.1 Accuracy and efficiency analysis for mathematical models
2.2 Error and accuracy comparison
2.3 Accuracy analysis for APD compared to HSPICE (T = 125°C, WL = 0.25)
2.4 Accuracy analysis for APD compared to HSPICE (T = 125°C, WL = 0.50)
2.5 Accuracy analysis for APD compared to HSPICE (125°C) under different workload (WL = 0.75)
2.6 Correlation analysis for APD compared to HSPICE (125°C) with different WLs
2.7 CRP and CRG identification for three timing margins with 10 years of NBTI and HCI degradation (T = 75°C, WL = 0.5)
3.1 Area overhead and estimation accuracy of RCRPs when Area_budget = 1%.
3.2 Selective workload sampling for s9234 and s13207.
4.1 Clock skew comparison before and after buffer replacement at different stress times (T = 125°C)
4.2 Clock skew reduction and buffer replacement analysis (T = 125°C, t = 10 years)
6.1 The Structure of Paths in Figure 6.2
6.2 Comparison of the maximum mean delays and variances measured from all 1992 critical paths and 638 TRPs from benchmark circuit s9234 at four stress time points (t = 2.5, 5, 7.5, and 10 years) considering process variations and NBTI (Temperature: T = 25°C, 50°C, 75°C, 100°C, and 125°C) (for each case, 100 rounds of MC simulations were run).
6.3 TRP identification considering process variations and NBTI (T = 25°C and 125°C, 100 rounds of MC simulations)
6.4 PDF pattern reduction analysis.
Chapter 1
Introduction
Industrial practice and academic research have driven CMOS technology deep into the
sub-micron regime. Advanced technologies are being developed to shrink transistor
dimensions, following Moore's Law [1], into the nano-regime below the 45nm feature node.
This advancement makes it possible to realize more powerful circuitry by packing an
extremely large number of gates into a small area. According to Moore's Law, transistor
dimensions scale down by about 30% with every technology generation. Correspondingly,
transistor density doubles and circuit performance is expected to improve by about 1.4x
with each technology generation.
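The 30%, 2x, and 1.4x figures are linked by simple arithmetic, which the short sketch below checks. It assumes idealized scaling: a linear shrink factor of 0.7 per generation, and gate delay proportional to feature size, so that frequency scales as its inverse. Real technologies deviate from these assumptions, and `scaling_after` is an illustrative helper, not part of any tool.

```python
# Idealized per-generation scaling (assumption: linear shrink factor 0.7,
# gate delay proportional to feature size).

def scaling_after(generations: int, shrink: float = 0.7) -> dict:
    """Return relative dimension, transistor area, density, and speed."""
    dim = shrink ** generations      # linear feature size relative to start
    area = dim ** 2                  # area of one transistor
    density = 1.0 / area             # transistors per unit area
    speed = 1.0 / dim                # delay ~ dim, so frequency ~ 1/dim
    return {"dimension": dim, "area": area, "density": density, "speed": speed}

for g in range(1, 4):
    s = scaling_after(g)
    print(f"generation {g}: density x{s['density']:.2f}, speed x{s['speed']:.2f}")
```

For one generation, the area shrinks to 0.7² = 0.49 of its former value, so density grows by about 2.04x, and speed grows by 1/0.7 ≈ 1.43x. These are the "doubling" and "1.4x" figures quoted above.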
As technology scales down and frequencies increase, circuits become more susceptible to
environmental and intrinsic factors. The negative impact of these factors on performance
may, in some cases, cause functional failures. Among all these factors, process variations
and aging effects are regarded as the two most important reliability issues. Process
variations, including inter-die and intra-die variations, cause circuit performance to
deviate from nominal design specifications. Intra-die variations cause spatial differences
in the values of physical parameters within a single die. Inter-die variations shift
parameter values from die to die. In contrast, aging effects shift gate and path delays
gradually over time. As a result, circuit functionalities may fail after a period of time,
depending on the actual degrading conditions of the circuitry.
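The distinction between inter-die and intra-die variations can be illustrated with a simple additive model, assumed here only for illustration. Each die draws one offset shared by all of its gates (inter-die), and each gate adds its own independent offset (intra-die). The nominal threshold voltage and standard deviations below are hypothetical values, and `sample_die` is an illustrative helper.

```python
import random

def sample_die(nominal, sigma_inter, sigma_intra, n_gates, rng):
    """Return per-gate parameter values (e.g., Vth in volts) for one die.

    Hypothetical additive model: one inter-die offset shared by every gate
    on the die, plus an independent intra-die offset for each gate.
    """
    inter = rng.gauss(0.0, sigma_inter)  # die-to-die shift, same for all gates
    return [nominal + inter + rng.gauss(0.0, sigma_intra) for _ in range(n_gates)]

rng = random.Random(2013)  # fixed seed for reproducibility
for k in range(5):
    die = sample_die(0.40, 0.02, 0.01, n_gates=1000, rng=rng)
    mean = sum(die) / len(die)
    spread = max(die) - min(die)
    print(f"die {k}: mean = {mean:.3f} V, within-die spread = {spread:.3f} V")
```

Setting `sigma_intra = 0` makes every gate on a die identical, so only the die-to-die shift remains. Setting `sigma_inter = 0` removes the die-level bias, so dies differ only through their random within-die scatter.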
The introduction to this thesis examines reliability issues, process variations, and
aging effects. It provides an overview of the mechanisms of aging effects and process
variations as well as the related conventional performance compensation methodologies.
1.1 Circuit Reliability Issues
Reliability describes the capability of an item to perform its required functions, its
resistance to unexpected conditions, and its ability to remain operational for a long
period of time. In the integrated circuit (IC) domain, these definitions are also used to
characterize the necessary features of circuit components. Figure 1.1 shows an overview
of the circuit reliability failure rate as a function of operational lifetime. This curve
is called the empirical bathtub curve [2]. The curve starts in an infant mortality phase,
which exhibits a very high failure rate. Failures in this phase are dominated by
manufacturing defects: malfunctioning circuitry within a tiny chip area, caused by errors
in the fabrication process. During this stage, costly burn-in tests are required to ensure
that the products delivered to customers satisfy the design specifications with no
defects. The second stage, the operational time phase, has a low and constant failure
rate. Tests during this stage are qualification tests that estimate the product's
lifetime. The last stage of the curve, the wearout phase, shows a sharp increase in the
failure rate. In actual product applications, this region may not be observable, since
most products fail functionally before reaching it.

Fig. 1.1: The empirical bathtub curve evaluating circuit reliability failure rate over
operational time. (Axes: reliability failure rate versus circuit lifetime; phases: infant
mortality, operational time, wearout.)

Comparing the three phases in Figure 1.1 shows that reliable integrated circuit design
and test depend on reliability analysis, degradation estimation, and lifetime prediction
during the second phase.
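The bathtub shape in Figure 1.1 is often modeled as the sum of three hazard (failure-rate) terms. A decreasing term represents infant mortality, a constant term represents random failures, and an increasing term represents wearout. The sketch below assumes Weibull hazard terms with shape parameters below, equal to, and above 1. All parameter values are hypothetical and were chosen only to reproduce the qualitative shape; they do not come from this thesis.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def bathtub_hazard(t):
    """Illustrative bathtub curve (t in years): sum of three failure modes.

    All parameters are hypothetical and chosen only to show the shape.
    """
    infant = weibull_hazard(t, beta=0.5, eta=1.0)    # decreasing: early defects
    random_ = weibull_hazard(t, beta=1.0, eta=50.0)  # constant: random failures
    wearout = weibull_hazard(t, beta=5.0, eta=15.0)  # increasing: wearout
    return infant + random_ + wearout

for years in (0.1, 1.0, 8.0, 15.0, 20.0):
    print(f"t = {years:5.1f} years: failure rate = {bathtub_hazard(years):.3f} per year")
```

A shape parameter below 1 gives a falling rate, exactly 1 gives a constant rate, and above 1 gives a rising rate. Summing the three terms yields a high early rate, a low flat middle, and a rising tail.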
In cutting-edge very large scale integrated (VLSI) circuit design and test, a number of
fundamental reliability issues are obstacles that require careful attention and research
effort. One of these issues, the effect of on-chip process variations on CMOS
performance, has been explored for decades. Although different mathematical models and
estimation methodologies have been proposed, effective solutions applicable to all
technology nodes remain an open challenge. Aging failure mechanisms, by contrast, have
only recently attracted increasing attention. These mechanisms unavoidably shorten the
lifetime of individual transistors and can therefore threaten the lifetime of the entire
circuit. Aging issues need to be addressed from both the design and the test
perspectives, because most aging effects evolve with time and depend on actual
operational conditions. Existing aging-related reliability issues primarily include:
• Time-dependent dielectric breakdown (TDDB) of the gate oxide
• Negative-bias temperature instability (NBTI)
• Hot-carrier injection (HCI)
• Electromigration (EM)
• Electrostatic discharge (ESD)
• Latch-up
• Leakage isolation
This thesis focuses on two aging effects, NBTI and HCI, which are considered the most
important because they severely degrade performance at the circuit level.
1.1.1 Negative Bias Temperature Instability (NBTI)
NBTI is a major reliability issue at deep sub-micron feature nodes, especially those below
45nm. NBTI degrades PMOS transistor robustness, impacts gate performance, and reduces the
circuit's operational lifetime. NBTI occurs when the PMOS transistor is negatively biased
(Vgs = −Vdd) and traps accumulate at the Si–SiO2 interface. Temperature plays an
important role as an accelerating factor of NBTI. The mechanism and progression of NBTI
are closely related to dangling Si atoms at the gate dielectric interface and to
perturbations in the fabrication process.
Fig. 1.2: The mechanism of negative bias temperature instability (NBTI). (Schematic: PMOS
transistor under negative bias with Vg = 0 and Vs = Vd = Vdd; Si–H bonds at the Si–SiO2
interface beneath the gate dielectric.)
In current fabrication technology, the transistor channel is made of highly ordered
crystalline silicon, whereas the gate dielectric is traditionally made of amorphous SiO2.
The interface between these two dissimilar materials is rough, and the fabrication process
leaves dangling Si atoms at it. Trapping these Si atoms in the gate dielectric degrades
MOS transistor performance. To passivate these dangling bonds, the conventional CMOS
manufacturing process anneals the FETs after the Si–SiO2 interface is formed. During the
annealing step, hydrogen diffuses to the interface and binds to the dangling Si atoms to
form Si–H bonds, as shown in Figure 1.2. Dangling Si atoms also result from other
manufacturing imperfections. Over time, these Si–H bonds break down under the high
voltage stress across the gate oxide, and this breakdown is accelerated at high
temperatures. Meanwhile, traps are generated as the released hydrogen atoms diffuse into
the gate oxide. These interface traps can capture charge carriers in the channel,
increasing the absolute value of the threshold voltage (|Vth|) and thereby weakening
transistor performance. As physical dimensions scale down at advanced technology nodes,
the power supply voltage scales less aggressively in order to improve circuit
performance. This mismatched scaling creates a very high electric field under stress,
which exacerbates NBTI degradation. Meanwhile, highly dense transistor layouts make it
more difficult to dissipate heat from local hot spots on chips, and the increased
temperature further speeds up NBTI degradation.

Fig. 1.3: Vth degradation and recovery of NBTI effect [3] [4].
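The time and temperature dependence described above can be sketched with a widely used long-term form. In this form, the threshold-voltage shift grows as a power law t^n in stress time, with n ≈ 1/6 a common choice for reaction–diffusion-based NBTI models, and temperature enters through an Arrhenius factor. This is an assumption for illustration and is not necessarily the model used later in this thesis. The function `nbti_delta_vth` and all parameter values (`a`, `n`, `ea_ev`) are hypothetical placeholders, not values fitted to any technology.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant (eV/K)

def nbti_delta_vth(t_seconds, temp_c, a=0.02, n=1.0 / 6.0, ea_ev=0.1):
    """Hypothetical long-term NBTI shift |dVth| (volts) under continuous stress.

    |dVth| = a * exp(-Ea / (k * T)) * t**n
    a, n, and ea_ev are placeholders, not calibrated values.
    """
    temp_k = temp_c + 273.15                                 # Celsius -> Kelvin
    arrhenius = math.exp(-ea_ev / (BOLTZMANN_EV * temp_k))   # temperature acceleration
    return a * arrhenius * t_seconds ** n                    # power law in stress time

ten_years = 10 * 365 * 24 * 3600.0
for temp in (25.0, 75.0, 125.0):
    shift_mv = 1e3 * nbti_delta_vth(ten_years, temp)
    print(f"T = {temp:5.1f} C: |dVth| after 10 years ~ {shift_mv:.2f} mV (illustrative)")
```

With n = 1/6, multiplying the stress time by 64 only doubles the shift. Under this model, degradation is therefore fast early in life and slows afterward, while higher temperature scales the whole curve upward.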
In recent years, the NBTI recovery mechanism has attracted increasing attention. When the
stress is removed from the PMOS transistor (the gate is driven to Vdd, so Vgs = 0), part of
the trapped charge is released from the gate dielectric, and part of the threshold voltage
degradation is recovered. Figure 1.3 shows both the degradation and recovery phases of the
transistor threshold voltage Vth during dynamic circuit operation [3], based on the
silicon data collected in [4]. The results in the figure show that ≥ 75% of the Vth shift
caused by NBTI degradation can be recovered within a very short period during the
recovery stage. Aging degradation analysis becomes further complicated by this recovery
phase, since different implementations of the actual circuit lead to different stress
conditions for each transistor. The actual stress probabilities of transistors with the
same overall operating time may deviate from each other significantly. Figure 1.4 shows
the results from [3], which compare the Vth degradation for different cycle times. Here,
different cycle times assign different stress probabilities to the transistor, thereby
changing its overall stress time.

Fig. 1.4: Vth degradation according to different cycle times considering NBTI effect [3].
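Stress probability (SP) is the fraction of time a PMOS transistor spends under negative bias. It explains why transistors with the same operating time can accumulate different amounts of stress. The minimal sketch below assumes that the gate input has been sampled once per clock cycle (for example, from logic simulation). It also assumes that accumulated stress can be approximated as SP × total time, which ignores the fast recovery transients shown in Figure 1.3. The function names are hypothetical.

```python
def stress_probability(gate_levels):
    """Fraction of sampled cycles in which a PMOS gate input is at logic 0.

    The PMOS transistor is stressed (Vgs = -Vdd) when its gate is low and
    is in recovery when its gate is high.
    """
    if not gate_levels:
        raise ValueError("need at least one sample")
    return sum(1 for v in gate_levels if v == 0) / len(gate_levels)

def effective_stress_time(total_time, gate_levels):
    """Approximate accumulated stress time as SP * total operating time."""
    return stress_probability(gate_levels) * total_time

wave_a = [0, 0, 1, 0, 0, 1, 0, 0]  # mostly low: stressed in 6 of 8 cycles
wave_b = [1, 1, 0, 1, 1, 1, 0, 1]  # mostly high: stressed in 2 of 8 cycles
for name, wave in (("A", wave_a), ("B", wave_b)):
    print(name, stress_probability(wave), effective_stress_time(10.0, wave))
```

Running it prints `A 0.75 7.5` and `B 0.25 2.5`. Over the same 10 years, transistor A accumulates three times the stress time of transistor B.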
Fig. 1.5: The mechanism of hot-carrier injection (HCI). (Schematic: NMOS transistor with
Vs = GND; electrons accelerated along the channel toward the drain are injected into the
gate dielectric.)
1.1.2 Hot-Carrier Injection (HCI)
Unlike NBTI, HCI has a prominent negative impact on NMOS transistors. HCI degrades NMOS
performance during gate switching transitions. The mechanism of HCI is shown in
Figure 1.5. When the gate-source voltage rises, the channel becomes inverted and the
transistor partially turns on. At the same time, if the drain-source voltage is high
enough, a sufficient number of conducting electrons are generated. Accelerated by the
lateral electric field between source and drain, many electrons gain enough kinetic energy
to cross the gate oxide interface barrier and are injected into the gate dielectric. Some
electrons near the drain may cause impact ionization, a secondary effect that generates
further electrons that can be injected into the gate oxide region. The injected electrons
generate traps in the dielectric and raise the NMOS transistor's threshold voltage (Vth).
Figure 1.6 shows the Vth degradation over time of an NMOS transistor in an inverter due to the HCI effect [5]. The CMOS inverter was operated at 1.2V with a maximum frequency of ≈ 1GHz. The NMOS has an aspect ratio W/L = 3µm/0.13µm. In the experiment, four input signals were used to compare different HCI degradation conditions: DC0, a DC input at 0V; a second DC input held at 3.2V; AC 500KHz, an AC input with a frequency of 500KHz; and AC 1KHz, an AC input with a frequency of 1KHz. The same experiments were repeated to measure the PMOS threshold voltage degradation, with the results shown in Figure 1.7.
Fig. 1.6: NMOS Vth degradation due to HCI effect in addition to the gate oxide degradation for different stress inputs, as a function of stress time [5].
From the results, we can see that when an AC signal is applied at the gate input, the NMOS transistor shows an extra Vth degradation ∆Vth in addition to the gate oxide degradation; this extra shift is caused by HCI during signal transitions. However, HCI during signal transitions has no effect on the PMOS transistor. A comparison of the results in Figures 1.6 and 1.7 clearly demonstrates that HCI is an aging effect unique to NMOS transistors.
Fig. 1.7: PMOS Vth degradation due to gate oxide degradation without HCI effect for different stress inputs, as a function of stress time [5].
1.1.3 Process Variations
Figure 1.8 shows the major steps in the conventional CMOS fabrication process.
[Figure: flow of fabrication steps, including field oxidation, photoresist coating, exposed photoresist, mask-wafer alignment and exposure, photoresist development, polysilicon deposition, polysilicon mask and etch, photoresist strip, gate oxidation, oxide etch, ion implantation of active regions, nitride deposition, contact etch, and metal deposition and etch.]
Fig. 1.8: Major fabrication steps in the conventional CMOS process flow.
Process variations are introduced by imperfections in the manufacturing process. Because the tools used in each step have limited precision and accuracy, the conventional fabrication process produces devices whose parameters deviate from their designed values. Considering all the manufacturing steps, the major sources of transistor process variations can be characterized as:
• Random dopant fluctuation (RDF): RDF results from variability in the impurity concentration implanted in the transistor channel region, which significantly alters the transistor's properties, especially its threshold voltage. As technology shrinks, RDF becomes much more severe. The main reason is that the shrinking channel volume contains fewer dopant atoms, so a change of even a few impurity atoms becomes a larger relative fluctuation that considerably alters transistor properties. RDF is a severe variation source because even two adjacent transistors may have significantly different dopant concentrations.
• Line edge roughness (LER): LER is the deviation of the dimensional scale of a
feature edge from the ideal smooth shape expected from design requirements.
LER is subject to the resolution limit of the imaging tool that was used to print
the feature. LER is recognized as a major issue in device manufacturing. For
example, local poly-gate patterning fluctuations result in parameter fluctuations
in the transistors.
• Gate dielectric variations: There are multiple variability sources in the high-k gate dielectric, including gate oxide thickness variation, fixed charge variation, and the variation of interface traps.
• Implant and anneal inaccuracy: Inaccuracies in the physical implant and anneal processes are associated with several transistor variations, including threshold voltage fluctuation, channel resistance deviation, and so on.
• Channel process strain imperfection: From the 90nm feature node and below, channel strain techniques are used to enhance transistor performance. For example, a thin Si layer grown on a SiGe substrate creates biaxial tensile stress in the channel, which improves carrier mobility. Meanwhile, channel strain introduces new random and systematic variations.
• Chemical mechanical polish (CMP) imperfection: CMP is a process of using chem-
ical and mechanical forces, etching and abrasive polishing, to smooth and flatten
surfaces. CMP can cause several gate dimensional variations.
We can see that the physical parameters affected include but are not limited
to: doping concentrations, oxide thickness, gate length and width, threshold voltage,
channel length, propagation delay, and power dissipation. Considering all the variations
introduced during the manufacturing process, process variations can be categorized
as inter-die and intra-die variations at the circuit level. Inter-die variations are the
variations across the dies. These process variations impact all transistors/gates on one
die in the same way. Intra-die variations are the variations within each die, and they
impact each transistor/gate on a single die differently.
In large technology nodes, such as 90nm and larger, inter-die variations are re-
garded as dominant over intra-die variations. However, in small technology nodes,
especially those below 45nm, intra-die variations are rapidly and steadily growing and
can significantly affect circuit performance as well. Thus, both inter- and intra-die variations impact the production yield, and it is important to handle both properly for reliable integrated circuit design and test. The ITRS report [6] shows that the 3σ intra-die variations of the threshold voltage Vth and the effective gate channel length Leff can reach 42% and 12%, respectively, in the 45nm technology node.
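The inter-/intra-die distinction above can be illustrated with a small Monte Carlo sketch. The Vth of each transistor is modeled as a nominal value plus two Gaussian offsets: one shared by the whole die (inter-die) and one drawn per transistor (intra-die). The nominal Vth of 0.35 V and the 20 mV inter-die sigma are assumptions. The intra-die sigma is chosen so that its 3σ spread matches the 42% figure quoted above.

```python
import random
import statistics

# Minimal Monte Carlo sketch of inter- vs intra-die threshold-voltage variation.
# Assumptions (illustrative, not from the ITRS tables): nominal Vth of 0.35 V,
# Gaussian components, inter-die sigma of 20 mV, and an intra-die sigma of
# 0.14 * Vth0 = 49 mV, i.e. a 3-sigma intra-die spread of 42%.
VTH0 = 0.35
SIGMA_INTER = 0.020
SIGMA_INTRA = 0.14 * VTH0


def sample_die(n_transistors, rng):
    """Vth of every transistor on one die: a shared die offset plus per-device noise."""
    die_offset = rng.gauss(0.0, SIGMA_INTER)  # inter-die: identical for the whole die
    return [VTH0 + die_offset + rng.gauss(0.0, SIGMA_INTRA)  # intra-die: per transistor
            for _ in range(n_transistors)]


def three_sigma_percent(sigma, nominal):
    """Express a standard deviation as a 3-sigma percentage of the nominal value."""
    return 3 * sigma / nominal * 100


rng = random.Random(2013)
one_die = sample_die(20000, rng)
die_means = [statistics.fmean(sample_die(50, rng)) for _ in range(2000)]
print(f"intra-die 3-sigma: {three_sigma_percent(statistics.stdev(one_die), VTH0):.1f}%")
print(f"spread of die means (sigma): {statistics.stdev(die_means) * 1e3:.1f} mV")
```

The die-to-die spread of the mean Vth is dominated by the inter-die term. This is because averaging over the n transistors of a die shrinks the intra-die contribution by a factor of √n.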
1.2 Performance Analysis and Compensation Methodologies
Performance analysis and compensation methodologies are required at different design stages to create reliable integrated circuits. Circuit- and system-level reliability are the ultimate reliability goals that matter to customers. The major task of current reliable integrated circuit design and test is to develop effective methodologies that handle: (1) spatial unreliability introduced by manufacturing process variations and random defects, (2) temporal unreliability caused by aging effects, and (3) dynamic unreliability due to workload dependence and temperature variations.
Figure 1.9 shows the taxonomy of reliable integrated circuit design methods. Reliable integrated circuit design methods for compensating performance degradation can be broadly divided into two categories:
1. Pre-Silicon Design Methodologies
2. Post-Silicon Runtime Techniques
Pre-silicon design methodologies are further categorized into: (a) conservative design, (b) transistor and gate sizing, and (c) resilient design. Post-silicon runtime techniques can be further categorized into: (a) sensor-based design, (b) adaptive body bias, (c) adaptive supply voltage, and (d) adaptive frequency.
In the following subsections, details of these design methodologies are presented.
[Figure: taxonomy tree. Reliable Integrated Circuits Design divides into Pre-Silicon Design Methodologies (Conservative Design; Transistor & Gate Sizing; Resilient Design) and Post-Silicon Runtime Techniques (Sensor-Based Design; Adaptive Body-Bias; Adaptive Supply Voltage; Adaptive Frequency).]
Fig. 1.9: Reliable integrated circuits design taxonomy.
1.2.1 Pre-Silicon Design Methodologies
Conservative Design
Conservative design is a popular method that introduces a sufficient timing safety margin, known as a guardband, to ensure that circuits function correctly in the presence of potential timing violations from supply voltage fluctuation, temperature perturbation, aging degradation, and process variations. The usual options for creating a guardband are to:
1. Conservatively slow down the clock to accommodate the longest path delay
2. Boost the supply voltage to obtain faster circuit operation
3. Pessimistically fabricate the design using low-Vth manufacturing technologies
Figure 1.10 shows exemplary waveforms of the three conservative design methodologies. In the figure, Tclk is the clock cycle and tcp is the largest critical path delay. ∆t is the delay perturbation caused by process variations, aging effects, supply voltage (Vdd) perturbation, and temperature fluctuation. Tm is the amount by which the clock period is extended, and ∆Vdd is the amount by which the supply voltage is increased. Vth is the threshold voltage, and ∆Vth is the amount by which the threshold voltage is decreased. The figure demonstrates that each of these three conservative methodologies ensures that the design specification is satisfied:

$$t_{cp} + \Delta t \le T_{clk} \qquad (1.1)$$

Method 1 satisfies Equation 1.1 by extending the clock period, Tclk → Tclk + Tm. Methods 2 and 3 instead shorten tcp + ∆t by raising Vdd or lowering Vth.
[Figure: clock waveforms comparing the clock period with the perturbed critical path delay tcp + ∆t.]
(a) Slow down clock to absorb the delay perturbation (clock cycle from Tclk to Tclk + Tm)
(b) Increase supply voltage Vdd to speed up circuit (supply voltage from Vdd to Vdd + ∆Vdd)
(c) Lower threshold voltage Vth to speed up circuit (threshold voltage from Vth to Vth − ∆Vth)
Fig. 1.10: Conservative design methodologies.
By adding extra timing safety margins, circuit delay can be guardbanded within the clock cycle. However, conservative approaches come at a significant cost. Slowing down the clock frequency sacrifices performance, while raising the supply voltage and lowering Vth both increase power consumption. In addition, low-Vth technologies add complexity and difficulty to the manufacturing process. There is a tradeoff between performance and area/power overhead. However, such area and power penalties are often unacceptable in today's competitive market, where power and performance are the dominant factors.
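The guardband condition of Equation 1.1 can be checked numerically for the first two methods. The sketch below computes the clock stretch Tm required by method 1. For method 2, it uses bisection to find the Vdd boost that brings the path delay back within Tclk. The delay-versus-Vdd trend is an assumed alpha-power-law form with alpha = 1.3, and all numbers are normalized, illustrative values.

```python
# Minimal sketch of the guardband check in Eq. (1.1), t_cp + dt <= T_clk.
# Method 1 stretches the clock by Tm; method 2 raises Vdd. The alpha-power-law
# delay model and alpha = 1.3 are illustrative assumptions, not thesis data.


def clock_stretch_needed(t_cp, delta_t, t_clk):
    """Method 1: smallest Tm >= 0 with t_cp + delta_t <= t_clk + Tm."""
    return max(0.0, t_cp + delta_t - t_clk)


def alpha_power_delay(vdd, vth, alpha=1.3):
    """Assumed gate-delay trend d ~ Vdd / (Vdd - Vth)**alpha (decreasing in Vdd for alpha >= 1)."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return vdd / (vdd - vth) ** alpha


def vdd_boost_needed(t_cp, delta_t, t_clk, vdd, vth, vdd_max=2.0, tol=1e-6):
    """Method 2: smallest dVdd (found by bisection) such that the scaled delay meets Eq. (1.1).

    Assumes the whole perturbed path delay t_cp + delta_t scales with alpha_power_delay.
    """
    ref = alpha_power_delay(vdd, vth)

    def path_delay(v):
        return (t_cp + delta_t) * alpha_power_delay(v, vth) / ref

    if path_delay(vdd) <= t_clk:
        return 0.0
    if path_delay(vdd_max) > t_clk:
        raise ValueError("timing cannot be met below vdd_max")
    lo, hi = vdd, vdd_max  # invariant: path_delay(lo) > t_clk >= path_delay(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if path_delay(mid) <= t_clk:
            hi = mid
        else:
            lo = mid
    return hi - vdd


# Example in normalized units: t_cp = 0.9, dt = 0.15, T_clk = 1.0.
print(clock_stretch_needed(0.9, 0.15, 1.0))                # Tm of about 0.05
print(vdd_boost_needed(0.9, 0.15, 1.0, vdd=1.0, vth=0.3))  # dVdd in volts
```

Method 3 could be sketched the same way by searching over Vth instead of Vdd.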
Transistor and Gate Sizing
Transistor sizing is an effective design method for performance compensation. One general way is to tweak the transistor channel width-to-length ratio (W/L) to improve the gate drive current, since the transistor drive current is proportional to W/L:

$$I_{drive} \propto \frac{W}{L} \qquad (1.2)$$
From Equation 1.2, either increasing W or decreasing L increases the drive current Idrive. In conventional manufacturing processes, shrinking the channel length L is preferred. The transistor sizing method is usually used to improve circuit tolerance to process variations. Up-sizing transistor dimensions at the design stage effectively increases Idrive and thus compensates for performance loss. However, transistor sizing methods impose manufacturing burdens and are more applicable to custom designs. In addition, transistor sizing introduces area and power overhead.
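The effect of Equation 1.2 on delay can be sketched to first order. The sketch below assumes an alpha-power-law drive current I = k′·(W/L)·(Vgs − Vth)^α and a delay estimate t = C·Vdd/I. The constants k′ and α, the 45 nm channel length, and the 2 fF load are illustrative assumptions.

```python
# Minimal sketch of Eq. (1.2), I_drive proportional to W/L, and its first-order effect on delay.
# The alpha-power-law current expression, k_prime, and the delay estimate
# t = C * Vdd / I are illustrative assumptions, not values from this thesis.


def drive_current(w_um, l_um, vgs, vth, k_prime=1e-4, alpha=1.3):
    """Assumed saturation current I = k_prime * (W/L) * (Vgs - Vth)**alpha, in amperes."""
    if vgs <= vth:
        return 0.0  # transistor off in this simple model
    return k_prime * (w_um / l_um) * (vgs - vth) ** alpha


def switching_delay(c_load_f, vdd, i_drive):
    """First-order time to (dis)charge c_load_f through a constant current i_drive."""
    return c_load_f * vdd / i_drive


i_1x = drive_current(0.2, 0.045, vgs=1.0, vth=0.3)
i_2x = drive_current(0.4, 0.045, vgs=1.0, vth=0.3)  # width doubled
c_load = 2e-15  # 2 fF external load, assumed unchanged by up-sizing
print(switching_delay(c_load, 1.0, i_1x), switching_delay(c_load, 1.0, i_2x))
```

In this sketch, doubling W halves the delay only because the load is held fixed. In practice the larger device also presents more input capacitance to its driver, which is part of the area and power overhead noted above.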
Gate sizing is a standard-cell-library-based sizing method for compensating performance degradation. Instead of tweaking transistor dimensions, gates are replaced with their larger counterparts in the standard cell library. The replacement gate has the same functionality as the original but a larger drive current. Compared with transistor sizing, gate sizing is more flexible and can be performed after synthesis or after physical layout. Gate sizing can be conducted on the basis of extracted actual circuit conditions, which makes it more practical to implement. In addition, because the cell dimensions in the library are not modified, gate sizing is applicable to digital design and incurs no burden during manufacturing. There is another definition of gate sizing, which modifies the PMOS and NMOS aspect ratios W/L separately to obtain equal rising and falling delays and equal transistor drive currents. In this thesis, however, we regard such sizing as transistor sizing.
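Library-based gate sizing can be sketched as choosing, from a family of same-function cells, the smallest-area variant that meets a delay target at a given load. The cell names (NAND2_X1 and so on), their parameters, and the linear delay model (intrinsic delay plus drive resistance times load) are hypothetical. They stand in for real library data.

```python
from dataclasses import dataclass

# Minimal sketch of library-based gate sizing: swap a gate for the smallest
# same-function variant that meets a delay target. Cell names, resistances,
# intrinsic delays, and areas are hypothetical; the linear model
# delay = intrinsic + R_drive * C_load is an assumption.


@dataclass(frozen=True)
class CellVariant:
    name: str
    drive_res_kohm: float  # effective output resistance; smaller means stronger drive
    intrinsic_ps: float    # load-independent delay
    area_um2: float


def cell_delay_ps(cell, c_load_ff):
    """Linear delay model; kOhm * fF = ps."""
    return cell.intrinsic_ps + cell.drive_res_kohm * c_load_ff


def upsize(variants, c_load_ff, target_ps):
    """Return the smallest-area variant whose delay meets target_ps, or None if none does."""
    for cell in sorted(variants, key=lambda c: c.area_um2):
        if cell_delay_ps(cell, c_load_ff) <= target_ps:
            return cell
    return None


NAND2_FAMILY = [
    CellVariant("NAND2_X1", 6.0, 10.0, 1.0),
    CellVariant("NAND2_X2", 3.0, 12.0, 1.6),
    CellVariant("NAND2_X4", 1.5, 15.0, 2.8),
]
choice = upsize(NAND2_FAMILY, c_load_ff=10.0, target_ps=45.0)
print(choice.name if choice else "no variant meets the target")
```

This sketch ignores the larger input capacitance that an up-sized cell presents to its upstream driver. Practical sizing flows therefore usually iterate over the affected paths.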
Resilient Design
Resilient designs are design schemes that make VLSI circuits robust by mitigating the effects of performance perturbations on device and circuit parameters. In this introduction, we define cross-level design methodologies (spanning the hierarchy from the underlying process technology up to the system level) as resilient designs. Resilient design exploits opportunities for circuit robustness by coordinating hardware and software to compensate for pervasive performance loss.
One general method of resilient design is to use error control blocks and error detection and correction (EDC) techniques. In a typical EDC scheme, such as single-error-correction, double-error-detection (SECDED) coding, two-bit errors can be detected and one-bit errors can be corrected. EDC blocks are inserted into the circuit, and error correction coding (ECC) is implemented at the system or architecture level. When performance is perturbed, timing errors are then detected and corrected, so proper circuit functionality is maintained.
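The "detect two, correct one" property of SECDED coding can be shown with an extended Hamming (8,4) code, a standard SECDED construction. The 4-bit data width is an illustrative choice, not a detail of any particular EDC block. The decoder uses the Hamming syndrome to locate a single flipped bit and the overall parity bit to tell one flip from two.

```python
# Minimal SECDED sketch: an extended Hamming (8,4) code. Index 0 holds an
# overall parity bit; indices 1..7 form a Hamming(7,4) codeword with parity
# bits at positions 1, 2, 4 and data bits at 3, 5, 6, 7. Using 4-bit data
# words is an illustrative choice, not a detail of any particular EDC block.


def secded_encode(data):
    """Encode 4 data bits into an 8-bit codeword."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected a list of 4 bits")
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):  # parity bit p covers every position whose index has bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def secded_decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected', or 'double_error'."""
    code = list(code)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    parity_even = sum(code) % 2 == 0
    if syndrome == 0 and parity_even:
        status = "ok"
    elif not parity_even:  # odd number of flips: treat as a single-bit error
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself flipped
        status = "corrected"
    else:  # nonzero syndrome with even parity: two bits flipped, detect only
        return None, "double_error"
    return [code[3], code[5], code[6], code[7]], status


word = secded_encode([1, 0, 1, 1])
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(secded_decode(one_flip))   # ([1, 0, 1, 1], 'corrected')
print(secded_decode(two_flips))  # (None, 'double_error')
```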
Other resilient design methodologies include voltage droop, which covers supply voltage (Vdd) droop, body bias voltage (Vsb) droop, and frequency droop. These methodologies require a pre-analysis of the circuit topology to obtain circuit timing information, including gate driving capabilities, capacitive load, and fan-out/fan-in conditions. For aging estimation, the worst-case or corner-case workload conditions also need to be estimated. Using this timing information, a circuit can be divided into grids according to their susceptibility to performance impacts. The supply voltage, body bias, and even operational frequency can then be determined for each grid. The conservative designs discussed earlier in this section globally assign a single supply voltage, body bias, and operational frequency to all gates or functional blocks in the circuit. In contrast, these resilient design methodologies specify the supply voltage, body bias, and operational frequency locally for each grid.
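The grid-based assignment described above can be sketched for the supply voltage alone. Each grid receives the lowest available Vdd at which its worst path, including its estimated aging and variation margin, still meets the clock. The grid names, delays (normalized to Tclk = 1.0), discrete Vdd levels, and the alpha-power delay trend are all hypothetical.

```python
# Minimal sketch of grid-level supply-voltage assignment for resilient design.
# Each grid receives the lowest available Vdd at which its worst path still
# meets the clock. The alpha-power delay trend, Vdd levels, and grid delays
# (normalized to T_clk = 1.0) are hypothetical.


def alpha_power_delay(vdd, vth, alpha=1.3):
    """Assumed delay trend d ~ Vdd / (Vdd - Vth)**alpha."""
    return vdd / (vdd - vth) ** alpha


def assign_grid_vdd(grid_delays, t_clk, vdd_levels, vdd_nom, vth):
    """Map each grid to the lowest Vdd level meeting t_clk.

    grid_delays holds each grid's worst path delay at vdd_nom, already including
    the estimated aging and variation margin.
    """
    ref = alpha_power_delay(vdd_nom, vth)
    plan = {}
    for grid, delay in grid_delays.items():
        for v in sorted(vdd_levels):
            if delay * alpha_power_delay(v, vth) / ref <= t_clk:
                plan[grid] = v
                break
        else:  # no level was fast enough
            raise ValueError(f"grid {grid} cannot meet timing at any Vdd level")
    return plan


grids = {"grid0": 0.80, "grid1": 0.97, "grid2": 1.04}
plan = assign_grid_vdd(grids, t_clk=1.0, vdd_levels=[0.9, 1.0, 1.1], vdd_nom=1.0, vth=0.3)
print(plan)  # {'grid0': 0.9, 'grid1': 1.0, 'grid2': 1.1}
```

In this example, a conservative global assignment would have to run every grid at 1.1, the level that only the slowest grid needs.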
1.2.2 Post-Silicon Runtime Techniques
Post-silicon runtime techniques detect circuit performance deviation via on-chip sensors and dynamically calibrate the circuit. Sensors are inserted at the design stage based on an analysis of circuit topology and connectivity and an accurate analysis of workload conditions. Instead of using a pre-determined supply voltage, body bias, and operational frequency, post-silicon runtime techniques determine calibration schemes in the field. For this reason, they are also called adaptive control methodologies. According to the circuit parameter adopted for performance calibration, adaptive methodologies can be categorized into:
• Adaptive supply voltage
• Adaptive body bias
• Adaptive frequency scaling
Using accurate measurements from on-chip sensors, sophisticated tuning of the supply voltage and body bias can effectively improve circuit performance and keep critical path delay within design specifications. Tuning the frequency differs from tuning body bias and supply voltage: the circuit frequency is scaled down to ensure that the circuit can operate safely, while overall performance is sacrificed. However, in many low-power designs, scaling the operational frequency is preferred, because power consumption is a more stringent constraint than frequency. Unlike the conservative and resilient designs in Section 1.2.1, adaptive control methodologies depend essentially on sensor design and insertion. The measurement resolution and accuracy of the inserted sensors determine the effectiveness of post-silicon runtime calibration.
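A runtime calibration loop of the kind described above can be sketched as follows. A stand-in sensor reports the critical-path delay at the present Vdd. The controller raises Vdd in steps until a slack guard is restored; if the Vdd ceiling would be exceeded, it stretches the clock period instead. The sensor function, 25 mV step, 5% guard, and alpha-power delay trend are hypothetical choices.

```python
# Minimal sketch of a sensor-driven calibration loop (adaptive supply voltage,
# falling back to adaptive frequency). The sensor is a stand-in function; the
# alpha-power delay trend, 25 mV step, and 5% guard are assumptions.


def alpha_power_delay(vdd, vth=0.3, alpha=1.3):
    """Assumed delay trend d ~ Vdd / (Vdd - Vth)**alpha."""
    return vdd / (vdd - vth) ** alpha


def make_sensor(aged_delay_at_nominal, vdd_nominal=1.0):
    """Hypothetical on-chip delay sensor: critical-path delay as a function of Vdd."""
    ref = alpha_power_delay(vdd_nominal)
    return lambda vdd: aged_delay_at_nominal * alpha_power_delay(vdd) / ref


def calibrate(sensor, t_clk, vdd, vdd_max, step=0.025, guard=0.05):
    """Raise Vdd until the sensed delay leaves a `guard` fraction of slack.

    If vdd_max would be exceeded, keep Vdd and stretch the clock period instead.
    Returns (vdd, t_clk).
    """
    while sensor(vdd) > t_clk * (1 - guard):
        if vdd + step > vdd_max + 1e-9:
            return vdd, sensor(vdd) / (1 - guard)  # adaptive frequency fallback
        vdd = round(vdd + step, 6)                 # adaptive supply voltage step
    return vdd, t_clk


sensor = make_sensor(aged_delay_at_nominal=1.02)  # path has aged 2% past T_clk = 1.0
print(calibrate(sensor, t_clk=1.0, vdd=1.0, vdd_max=1.2))  # raises Vdd
print(calibrate(sensor, t_clk=1.0, vdd=1.0, vdd_max=1.0))  # stretches the clock
```

The second call corresponds to a design where the supply cannot be raised further. The loop then trades frequency for safe operation, the low-power choice mentioned above.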
1.3 Research Focuses of This Thesis
1.3.1 Aging Analysis and Compensation
Circuit reliability analysis at the pre-silicon stage has become vital for sub-45nm technology designs, in particular due to aging effects such as NBTI and HCI. To avoid potential reliability hazards in the post-silicon stage, current large-scale commercial designs analyze circuit aging over-pessimistically under an assumed worst-case workload, so as not to violate even low-probability corner cases. This introduces unnecessary margins in the design timing analysis. The major issue is the lack of an effective aging analysis method that is applicable to large designs with low CPU runtime. This is mainly because:
1. Conventional reliability tools are extremely time-consuming for circuit-level timing
analysis and thus are not practical for large designs.
2. Mathematical models developed to expedite the process are not accurate, due to
the complexity of aging effects.
Meanwhile, it is vital to efficiently identify the paths that age at a faster rate
than others in the field. Moreover, gates that have the most impact on the degradation
of these paths must be identified for compensation purposes.
1.3.2 Clock Skew Considering NBTI and Process Variations
NBTI has emerged as a major concern, not only to the functional circuits, but also
to the clock tree. Further aggravated by process variations, aging-induced reliability
issues become more challenging as technology scales further. Although NBTI impact
on functional circuits has been extensively studied and various solutions have been
proposed, its impact on the clock tree has not yet been sufficiently examined.
Little work has addressed the clock skew problem considering both process variations and aging. Clock skew optimization considering aging is usually conducted separately, based on the assumption that the timing properties of the clock tree have already been optimized for process variations. Consequently, separate margins for process variations and aging are added. This over-pessimistically guardbands the clock signal and leads to excessive performance loss. Furthermore, most research attempts to optimize clock skew through inter-node control (INC) or similar methodologies that balance the degradation rates of different clock paths. The area and power consumption introduced by the control circuitry must therefore be carefully considered to avoid large overhead. Meanwhile, it must be ensured that the INC schemes do not accelerate the degradation of slow-degrading paths so much that one of them becomes the new worst path.
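How unequal aging turns a balanced clock tree into a skewed one can be sketched on a toy tree. Clock arrival at each sink is the sum of buffer delays from the root, and skew is the largest arrival difference among sinks. Each buffer is slowed by an assumed fraction a·(SP·years)^n that depends on its stress probability. The tree shape, delays in ps, stress probabilities, and aging constants are hypothetical.

```python
# Minimal sketch of clock skew before and after NBTI aging on a small clock tree.
# The tree, buffer delays (ps), stress probabilities, and the fractional slowdown
# a * (SP * years)**n are hypothetical values chosen for illustration.


def arrival_times(tree, root, delays):
    """Clock arrival time at each node = sum of buffer delays from the root."""
    arrival = {root: delays[root]}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            arrival[child] = arrival[node] + delays[child]
            stack.append(child)
    return arrival


def clock_skew(tree, root, delays, sinks):
    """Largest difference in clock arrival time among the sinks."""
    arrival = arrival_times(tree, root, delays)
    times = [arrival[s] for s in sinks]
    return max(times) - min(times)


def aged(delays, stress_prob, years, a=0.02, n=1 / 6):
    """Each buffer slows by the assumed fraction a * (SP * years)**n."""
    return {b: d * (1 + a * (stress_prob[b] * years) ** n) for b, d in delays.items()}


tree = {"root": ["b1", "b2"], "b1": ["s1"], "b2": ["s2"]}
sinks = ["s1", "s2"]
fresh = {"root": 10.0, "b1": 20.0, "b2": 20.0, "s1": 5.0, "s2": 5.0}
# b2 drives a branch that is often clock-gated and parked in a stressing state.
sp = {"root": 0.5, "b1": 0.5, "b2": 1.0, "s1": 0.5, "s2": 0.5}
print(clock_skew(tree, "root", fresh, sinks))                  # balanced tree: 0.0
print(clock_skew(tree, "root", aged(fresh, sp, 10.0), sinks))  # aging introduces skew
```

The shared root buffer ages identically for every sink and cancels out of the skew. Only branches with different stress histories contribute. Process variation could be added by perturbing the fresh delays before aging.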
The development of effective solutions for reducing clock skew and compensating aging effects under process variations remains a challenge. This thesis focuses on clock skew degraded by NBTI and process variations. Its objective is to develop an effective methodology to reduce clock skew considering NBTI and process variations.
1.3.3 Post-Silicon Monitoring and Calibration for Performance Evaluation
Transistor aging degrades circuit performance and can potentially lead to functional
failure in the field. This has become a major reliability concern, especially as technology
scales to 45nm and below. It is thus necessary to design on-chip structures that can
provide accurate aging evaluations without performance penalties.
Many existing dynamic solutions are based on on-chip structures that monitor the aging of a stand-alone circuit (such as ring oscillators). Their major limitations are that ring oscillators do not represent the structure and performance of the functional circuits well, and that replica circuits can only monitor the selected paths, leaving all the other paths unconsidered. Other solutions monitor the aging of a functional circuit directly by measuring critical path delay. However, normal execution of the core under test has to be interrupted during the test, and the memory requirement for pre-stored patterns is not trivial.
1.3.4 Path-Delay Faults Pattern Generation Considering Process
Variations
Current ATPG techniques rely on timing analysis tools to identify critical paths for gen-
erating path-delay fault (PDF) test patterns. However, the model-based conventional
static timing analysis (STA) and statistical static timing analysis (SSTA) tools are not
capable of considering the actual silicon variations. STA cannot account for timing variability caused by process variations, since all physical parameters are evaluated at nominal values. Unlike STA, which models a gate or path delay as a deterministic value, SSTA represents the delay as a probability distribution, characterized by a mean and a variance. However, SSTA leads to an overpessimistic evaluation of the delay deviation in all corner cases. Consequently, much performance is left on the table due to overestimated margins.
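The difference between the STA and SSTA views of a path can be sketched in a few lines. STA sums nominal gate delays. SSTA, in this simplified form, carries a mean and a variance, assuming Gaussian gate delays with a single uniform pairwise correlation coefficient ρ. The gate values are hypothetical, and the 3σ corner is computed only to show how strongly it depends on the correlation assumption.

```python
import math

# Minimal sketch contrasting the STA and SSTA views of one path.
# Gate delays are (mean, sigma) pairs in ps; Gaussian delays and a single
# uniform correlation coefficient rho are simplifying assumptions.


def path_delay_sta(gates):
    """STA: deterministic path delay with every gate at its nominal (mean) value."""
    return sum(mean for mean, _ in gates)


def path_delay_ssta(gates, rho=0.0):
    """SSTA: path delay as a Gaussian described by its mean and variance."""
    mean = sum(m for m, _ in gates)
    sigmas = [s for _, s in gates]
    var = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            var += 2 * rho * sigmas[i] * sigmas[j]  # covariance of correlated gates
    return mean, var


gates = [(10.0, 1.0)] * 3
print(path_delay_sta(gates))  # 30.0
for rho in (0.0, 1.0):
    mean, var = path_delay_ssta(gates, rho)
    print(rho, mean + 3 * math.sqrt(var))  # 3-sigma corner delay
```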
Path-delay fault (PDF) tests exercise the critical paths at speed to detect whether a path is slow because of manufacturing defects or variations. Path delay modeling and PDF identification are difficult tasks under process variations, and the corresponding PDF patterns are, in turn, difficult to generate. This is because the actual silicon variations on dies are not identical to what was modeled at the pre-silicon stage.
1.3.5 Path-Delay Fault Pattern Generation Considering Process
Variations and NBTI
The performance of modern circuits is highly vulnerable to process variations and aging effects. The induced timing uncertainties require an extremely large number of critical paths to be targeted during manufacturing test, as well as in the field for high-reliability applications. Testing all of these critical paths requires a considerable number of path-delay patterns and incurs a significant test cost. In this thesis, we propose a methodology for identifying testable representative paths (TRPs) by carefully analyzing the circuit topology. TRPs are identified as a very small portion of all the critical paths, considering the gate overlap condition, delay degradation due to aging, and process variation perturbation among the critical paths. The maximum mean delay and maximum variance of the TRPs closely follow the maximum mean delay and maximum variance of all the critical paths at different time points. Testing TRPs requires substantially fewer test patterns, and thus less test time.
1.4 Contributions of This Thesis
The following contributions related to reliable integrated circuit designs and tests have
been made by this thesis research.
1. A comprehensive analysis is provided to highlight the importance of each aging
parameter. An aging-aware path delay analysis flow (namely APD flow) is devel-
oped, which is applicable to large-scale industry designs with high accuracy and
computational efficiency.
2. A novel technique is developed to quantitatively evaluate the importance of the gates on critical reliability paths based on their contribution to circuit degradation. Applying gate sizing to this minimum set of important gates can effectively compensate for circuit aging at the expense of low area and power overhead.
3. A novel methodology, named representative critical reliability paths (RCRPs), is developed to accurately evaluate aging on the chip. RCRPs are synthesized as a stand-alone circuit to represent the functional circuit aging in the field.
4. A novel silicon-variation-aware (SVA) PDF pattern generation flow is proposed.
In SVA, critical paths are ranked for their ability to detect PDFs under the impact
of actual silicon variations. SVA results in the generation of fewer critical paths,
hence fewer PDF patterns, to verify circuit performance.
5. A novel methodology is developed for identifying testable representative paths
(TRPs) for the path delay tests in order to reduce PDF patterns and save test
cost and time.
1.5 Thesis Outline
• Chapter 1: Introduction
• Chapter 2: Aging analysis and compensation
• Chapter 3: Representative critical reliability path
• Chapter 4: Clock skew reduction
• Chapter 5: Test-cost reduction considering process variations
• Chapter 6: Testable representative paths
• Chapter 7: Conclusions
Chapter 2
Aging Analysis and Compensation
2.1 Introduction
Circuit reliability and performance [38] [39] are crucial issues affected by various manufacturing variabilities and runtime perturbations. The aggressive scaling of technology into the ultra-deep-submicron regime (≤45nm) greatly worsens circuit performance degradation due to several aging effects. These effects shift transistor parameters away from design specifications and may eventually cause the circuit to fail. NBTI and HCI are widely recognized as the most significant and challenging concerns. NBTI [9] [10] [18] [25] occurs when traps accumulate at the Si−SiO2 interface, which usually happens when the PMOS transistor is negatively biased, i.e., Vgs = −Vdd. NBTI increases the absolute value of the PMOS threshold voltage |Vthp|, resulting in long-term consequences such as loss of PMOS drive current Id and gate slow-down. HCI [8] [15] [33] is another serious reliability concern. The drain-source electric field accelerates charge carriers to a high kinetic energy, which enables an avalanche of secondary carriers through impact ionization. This process creates traps at the gate dielectric/silicon substrate interface. HCI was considered less dominant in the past. However, as technology further scales down, HCI also degrades device parameters considerably, and its impact is no longer negligible [8]. Because aging-induced degradation is an urgent threat to circuit reliability and performance, accurate timing analysis must carefully account for its impact.
Currently, there are three major commercial reliability analysis tools for analyzing aging impact, namely HSPICE MOSRA from Synopsys [34], RelXpert from Cadence [35], and Eldo from Mentor Graphics [36]. Their simulation process can be divided into two phases: (i) a pre-stress simulation phase and (ii) a post-stress simulation phase. First, the transistors in the target circuit are stressed according to aging-sensitive circuit information (e.g., temperature, stress time, capacitive load, and stress signal probabilities). Second, the delay and degradation information is calculated regressively using the built-in stress models. However, these low-level simulators are extremely time-consuming when applied to circuits with more than a few thousand transistors. The major reason is that, to achieve high accuracy, the intrinsic regressive calculation is carried out with the embedded models every time an aging analysis is performed. These tools automate the analysis process and are usually regarded as golden standards for reliability analysis at the transistor and gate level. Even so, they are not computationally efficient enough for large-scale circuits, and it is infeasible to analyze large industrial designs with gate-by-gate SPICE simulation.
To facilitate the reliability analysis process, considerable research effort has been devoted to model development. For example, several transistor-level models [11] [41] [16] [26] [32] were proposed to carry out a quick timing analysis with considerable accuracy. These models are developed from the numerical solution of the standard reaction-diffusion (R-D) model [82] [84], which is generally used to interpret most reliability mechanisms, especially NBTI and HCI. Recent research suggests that a charge trapping-detrapping process [17] may instead be the cause of the NBTI effect. However, this theory has not been fully proven, and the overall behavior it predicts is similar to that of the R-D model. Thus, in our methodology, we conservatively adopt the R-D model in further developing the mathematical model. On one hand, these transistor-level models still operate at a low level and thus become very time-consuming when applied to large circuits with an excessive number of transistors. On the other hand, given the circuit complexity, approximation-induced error confines their accuracy to a short range in the aging parameter space, which is multi-dimensional and correlated. Here, we use the terms short-range and long-range to compare the Euclidean distance between two aging conditions in the aging parameter space. Additionally, it is critical to differentiate the delay and degradation corresponding to different input instances. Many mathematical models do not do so. They [11] [41] [16] [26] [32] formulate the delay and degradation as simple functions of a single threshold voltage value and other aging-related parameters, derived and approximated from the transistor level to the circuit level using different models. Better accuracy requires differentiating the input instances; however, such a comprehensive aging analysis has to be conducted via time-consuming simulations using the reliability tools.
In summary, at the circuit level, the existing models and tools are either inaccurate or extremely time-consuming for long-range aging simulation or for a comprehensive treatment of all the aging parameters. This is because NBTI and HCI are complex effects that are tightly related to several parameters, including temperature, slew rate, stress time, signal stress probability, capacitive load, and switching activity [11]-[20]. These parameters are neither spatially nor temporally uniform or uncorrelated. They vary significantly from gate to gate, design to design, and time to time owing to circuit functionalities and topologies. To obtain an accurate estimate of aging degradation and avoid pessimism, the impact of all of these parameters on the circuit must be analyzed comprehensively. Since most models are empirical, the model derivation procedure limits their validity to a small Euclidean distance around the known initial condition in the parameter space. As a result, models are inaccurate if any aging parameter is neglected, and they lose accuracy over long ranges even when all the parameters are considered. For example, the authors in [31] described the time-dependent threshold voltage Vth as a function of geometrical (e.g., channel length L), environmental (e.g., temperature T), and process-related (e.g., threshold voltage Vth0) parameters. However, they neglected other important parameters such as input slew rate and output capacitive load. In [19] [20], the pull-up/pull-down network of each gate is modeled as an equivalent single PMOS/NMOS transistor. Their model inevitably underweights off-path signal stress, which has recently attracted increasing attention because it is as important as on-path signal stress. The works in [21] [27] [28] highlighted the role of off-path signal stress and proposed a signal probability (SP)-based method to characterize the delay of each standard gate in the cell library. However, their method does not cover capacitive load information in aging analysis. Capacitive load is one of the dominant parameters for delay and aging analysis, and the cone structure of a circuit makes it differ dramatically from gate to gate owing to fan-out/fan-in and connectivity. The authors in [3] investigated the impact of capacitive load, slew rate, and some other parameters, but did not differentiate the importance of on-path and off-path signal stress.
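One common way to capture the slew- and load-dependence discussed above is a pre-characterized look-up table (LUT) indexed by input slew and output capacitive load, queried by bilinear interpolation. The axis points and the aging-induced delay-increase values below are hypothetical. The sketch only shows the lookup mechanics, not a characterized table.

```python
from bisect import bisect_right

# Minimal sketch of a pre-characterized look-up table (LUT) indexed by input
# slew and output capacitive load, queried by bilinear interpolation. The axis
# points and the aging-induced delay-increase values are hypothetical.


def _bracket(axis, x):
    """Indices (i, i + 1) of the axis interval used for x, clamped to the table edges."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def lut_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation of table[i][j] given at (slews[i], loads[j]).

    Points outside the axes are linearly extrapolated from the edge interval.
    """
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    low_slew = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    high_slew = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return low_slew * (1 - ts) + high_slew * ts


SLEWS_PS = [10.0, 50.0, 150.0]
LOADS_FF = [1.0, 5.0, 20.0]
DELAY_INCREASE_PCT = [  # hypothetical aged-vs-fresh delay increase, in percent
    [2.0, 3.0, 4.5],
    [4.0, 5.0, 6.0],
    [6.5, 7.0, 8.0],
]
print(lut_lookup(SLEWS_PS, LOADS_FF, DELAY_INCREASE_PCT, slew=30.0, load=3.0))  # 3.5
```

A table of this kind trades characterization effort up front for fast queries later. Additional aging parameters, such as stress probability or temperature, would add further table dimensions.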
Meanwhile, adaptive control schemes usually introduce excessive area and power overhead, as they have to keep monitoring a large number of paths and gates for performance evaluation. A gate sizing technique was proposed in [27] to identify critical gates, defined as the gates that age the most in the circuit. We shift our focus toward the identification and quantification of critical-reliability gates (CRGs), defined as the minimum number of “important” gates contributing to the degradation of critical-reliability paths (CRPs), i.e., paths that are sensitive to aging and could potentially become critical in the field. By identifying CRPs and CRGs, aging compensation can be achieved with minimal area and power overhead. As elaborated in later sections, CRGs can be flexibly identified for different workload scenarios, e.g., worst-case or actual workload conditions, which makes this technique easily compatible with conventional industry design practices.
This chapter first comprehensively investigates the impact of different parameters on critical path aging. Critical paths extracted from several benchmark circuits are simulated to analyze aging while considering all aging-related parameters: input workload, temperature, input slew rate, on-/off-path signal stress probability, stress time, and capacitive load. On the basis of these analyses, we propose an aging-aware path delay analysis flow (namely, the APD flow), which is applicable to large-scale industry designs with high accuracy and computational efficiency. The specific contributions include:
1. A comprehensive analysis of the two major aging effects (NBTI and HCI). Besides analyzing the well-studied parameters (temperature, on-path stress probability, and stress time), we further investigate (i) the impact of various workloads on circuit primary inputs, (ii) the degradation of output slew rate due to aging effects under different input slew rates, and (iii) the impact of gate overlap on critical-reliability path degradation.
2. A fast and accurate aging-aware path delay (APD) analysis flow. It serves as a generic platform to incorporate NBTI and HCI into the conventional industry static timing analysis (STA) flow and to predict the degradation of circuit performance under various operating conditions. In the APD flow, analytical models are developed to approximate the path/gate delay and degradation from extracted aging-sensitive information by referring to a pre-generated look-up table (LUT). The LUT is a database storing the delay and degradation information corresponding to different aging conditions, each of which incorporates all the aging-related parameters. APD facilitates both short-range and long-range aging prediction. Consequently, our flow is scalable, i.e., it is capable of handling large-scale industry circuits while ensuring high accuracy. It is also applicable to worst-case or actual workload degradation analysis.
Taking advantage of the accurate aging analysis provided by the APD flow, we further make the following contributions:
1. We develop an efficient aging-aware timing analysis flow to identify the CRPs in the circuit under different workloads. Rather than analyzing the critical path (CP) set, which is extremely large in modern designs, we focus on selecting CRPs from the CPs (the CRP set is substantially smaller than the CP set) to considerably reduce the computational complexity of the proposed CRG selection flow.
2. We develop a technique to quantitatively evaluate the importance of the CRGs based on their contribution to CRP degradation. Here, we develop a new metric to evaluate a gate's importance to the degradation of circuit CRPs, which distinguishes our technique from previous ones that attempt to identify the gates with maximum aging. Our results demonstrate that the most essential gates in the circuit are not necessarily the gates that have aged the most.
3. Lastly, we develop a novel algorithm to identify the minimum number of CRGs in the circuit for sizing, to ensure that the performance degradation of CRPs will not cause any failure in the field.
The rest of the chapter is organized as follows. Section 2.2 presents the analysis of critical-reliability paths. In Section 2.3, the aging-aware models are analyzed. Section 2.4 presents the details of the APD flow. The details of CRP and CRG identification are presented in Section 2.5. Section 2.6 presents all the results. This chapter is summarized in Section 2.7.
2.2 Critical-Reliability Paths Analysis
In this section, the impact of different parameters on circuit-level reliability is rigorously investigated in simulations to elaborate their importance to aging, namely input workload, temperature, input slew rate, on-/off-path signal stress probability, switching activity, stress time, and output capacitive load. For each simulation, the circuit under analysis is processed through several steps using commercial ASIC design tools: synthesis, DFT insertion, capacitance extraction, and critical path extraction. These steps are carefully conducted to assess the circuit aging condition based on the circuit connectivity and topology. The details of each step are presented in Section 2.4. Although our experimental results are collected from Synopsys HSPICE MOSRA simulation, other reliability analysis tools [35] [36] can also be used in the proposed flow. For each simulation, we only need to specify the input workload for RTL simulation. Once it is specified, the corresponding stress probability can be extracted using our in-house procedure. Other circuit information, such as capacitive load, can be extracted from the netlist. The extracted critical paths are first stressed under the extracted circuit conditions, considering workload, capacitive load, stress probabilities, switching activity, temperature, and stress time, to obtain the aged transistor information. Then the gates in the paths under analysis are updated with this information for delay and degradation calculation, with the slew rate propagated through the path. Note that the stress condition used to generate the transistor aging information is extracted under a specified workload condition of the paths (and each gate) in the circuit. This extraction of aging-sensitive information is important for reconstructing the actual stress condition of each gate, and is necessary for revealing the exact aging degradation.
Simulations are performed to evaluate the role of off-path stress probability in ag-
ing of transistors. Circuit switching activity information is collected using our in-house
VCS-VPI procedure. Based on the switching activity information, equivalent rectangu-
lar signals [11] are generated to reconstruct both on- and off-path stress signals. Also,
other aging-related information is collected using different tools and used to retrieve the
exact aging condition for each gate on the paths under analysis.
Simulation results on benchmark circuit s9234 are presented to demonstrate that off-path signal probabilities can change the “criticality” of a path. The top ten critical paths extracted from the benchmark circuit are simulated under six different scenarios:
• NBTI only with/without off-path signal stress;
• HCI only with/without off-path signal stress;
• NBTI + HCI with/without off-path signal stress.
The results are shown in Figure 2.1, where the numbers in black listed vertically on the right indicate the “path rank” at different stress time points, from the longest to the shortest path, with t0 = 0 s, t1 = 1.5 × 10^8 s, and t2 = 3.0 × 10^8 s. Figure 2.1 essentially indicates that different paths age at different rates during the circuit's lifetime of operation. In addition, off-path signal stress changes the rank of critical paths dramatically, i.e., a path that is initially less (more) critical may become more (less) critical than other paths when off-path signal stress is considered. As shown in Figure 2.1(d), where only NBTI is considered without off-path signal probability (SP), the paths are ranked
[Figure 2.1: six panels plotting the delay (s) of Paths 1 to 10 versus aging time (10^8 s), with the path rank at t0, t1, and t2 listed on the right of each panel: (a) NBTI+HCI with off-path SP; (b) NBTI+HCI without off-path SP; (c) NBTI with off-path SP; (d) NBTI without off-path SP; (e) HCI with off-path SP; (f) HCI without off-path SP.]
Fig. 2.1: Aging analysis considering off-path SP and on-path SP (all SPs are obtained from Monte Carlo simulation using our VCS-VPI tool, discussed later with Figure 2.3); temperature = 125 °C.
as (9, 5, 1, 8, 3, 10, 4, 6, 2, 7) at time 0. In other words, Path 9 initially has the largest delay, while Path 7 has the smallest delay. Because of the NBTI effect, at 1.5 × 10^8 s the paths are re-ranked as (9, 5, 1, 8, 3, 4, 10, 6, 2, 7). When the stress time increases to 3.0 × 10^8 s, the path ranking remains (9, 5, 1, 8, 3, 4, 10, 6, 2, 7). However, when off-path SPs are considered (Figure 2.1(c)), the ranking becomes (1, 5, 10, 3, 4, 2, 9, 8, 7, 6) at 1.5 × 10^8 s and (1, 10, 5, 3, 4, 2, 8, 9, 7, 6) at 3.0 × 10^8 s. As seen, Path 9 becomes less critical due to the impact of off-path SP. Such path crossover phenomena also occur for HCI alone (although less frequently than under NBTI and NBTI + HCI) and for combined NBTI + HCI aging. A similar argument can be made for Path 10 when considering NBTI + HCI with/without off-path SP (Figures 2.1(a) and 2.1(b)): Path 10 becomes more critical when off-path SPs are considered. Figures 2.1(a), 2.1(c), and 2.1(e) show that NBTI and HCI each cause critical path crossover individually; however, NBTI + HCI makes the crossover much more frequent. In addition, the HCI effect cannot simply be treated as a shift of the NBTI effect when both are considered together. In summary, our results for this benchmark show that neglecting off-path SPs leads to a path delay inaccuracy of up to 7%.
Second, the benchmark circuit ISCAS’89 s38417 is simulated repeatedly under three different workloads (WL = 0.25, 0.50, 0.75). The results are displayed in Figure 2.2. Here, WL = 0.25 means that a large number of random functional patterns are generated with a probability of 25% for 0s and 75% for 1s; the 0.50 and 0.75 workloads are generated similarly. As we can see:
[Figure 2.2: delay (s) versus stress time (0 to 3 × 10^8 s) of the s38417 critical paths for workload = 0.25 (red), 0.50 (blue), and 0.75 (green). Annotated extremes: min 4.0451E-09 s (Path 11, WL = 0.25); max 4.3739E-09 s (Path 4, WL = 0.75); max 4.5803E-09 s (Path 10, WL = 0.50); min 4.2356E-09 s (Path 11, WL = 0.75).]
Fig. 2.2: Analyzing aging effects on path delay considering different WLs.
1. At t = 0.6 × 10^8 s, the longest path is Path 4 with WL = 0.75, while the shortest path is Path 11 with WL = 0.25;
2. At t = 2.4 × 10^8 s, the longest path is Path 10 with WL = 0.50, while the shortest path is Path 11 with WL = 0.75.
The results clearly demonstrate that aging effects complicate timing analysis once workload conditions are considered. They also indicate that a single mathematical model is insufficient for thorough aging-aware timing analysis of large-scale industry circuits, mainly because the “criticality” of paths changes dramatically under different circuit workload conditions. In practice, power-saving mechanisms (clock gating, power switching, and standby mode) further complicate the workload condition of paths and change the degradation rate of each path.
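The workload definition used for s38417 (WL is the probability of a 0 on each primary input) can be illustrated with a short Python sketch. The function names `generate_workload_patterns` and `zero_fraction` are hypothetical, and the pattern generator actually used with the RTL simulation is not described in the text; this only shows one way to draw patterns with the stated bias.

```python
import random
from typing import List, Optional


def generate_workload_patterns(num_patterns: int, num_inputs: int,
                               workload: float,
                               seed: Optional[int] = None) -> List[List[int]]:
    """Draw random primary-input patterns for a workload WL.

    WL is interpreted as in the text: the probability that an input bit is 0,
    so WL = 0.25 gives roughly 25% zeros and 75% ones.
    """
    if not 0.0 <= workload <= 1.0:
        raise ValueError("workload must lie in [0, 1]")
    rng = random.Random(seed)
    # rng.random() is uniform in [0, 1), so a bit is 0 with probability WL.
    return [[0 if rng.random() < workload else 1 for _ in range(num_inputs)]
            for _ in range(num_patterns)]


def zero_fraction(patterns: List[List[int]]) -> float:
    """Measured fraction of 0 bits over all patterns."""
    bits = [b for pattern in patterns for b in pattern]
    return bits.count(0) / len(bits)


if __name__ == "__main__":
    for wl in (0.25, 0.50, 0.75):
        pats = generate_workload_patterns(2000, 32, wl, seed=1)
        print(f"WL = {wl:.2f}: fraction of 0s = {zero_fraction(pats):.3f}")
```

A seeded generator is used so that the same workload can be replayed when comparing aging runs.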
In summary, this study demonstrates that path delay analysis considering all parameters is a complex task. Aging effects make timing analysis more difficult, and timing analysis at time 0 alone is simply inadequate; i.e., static timing analysis based on conventional timing analysis tools is no longer accurate once workload, off-path SPs, slew rate, and all other aging-related circuit parameters are considered. Path-based aging simulation will be comprehensive only if a large number of critical paths are considered. However, existing circuit-level reliability analysis tools are extremely time-consuming. For instance, to analyze a short critical path from a small circuit, HSPICE MOSRA takes about 7 minutes to obtain 10-year degradation results. It is practically impossible to rely on conventional reliability analysis tools for large-scale industry designs, which require both high accuracy and low CPU runtime. Finally, knowing the rank of paths at different time points would help designers perform more efficient guardbanding.
2.3 Aging-Aware Models and Analysis
In this section, gate-level mathematical models are further enhanced to facilitate the prediction of degradation under complex conditions. They also enable our practical critical-reliability path analysis flow (details in Section 2.4), which is compatible with the conventional industry design flow.
2.3.1 Gate Delay Models Considering NBTI and HCI
The reaction-diffusion (R-D) model [40] is widely used to calculate the threshold voltage shift due to NBTI and HCI effects. The drift in Vth due to NBTI is formulated as [41]:
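$$\Delta V_{th,NBTI} = A_{NBTI0} \times t_{ox} \times \sqrt{C_{ox}(V_{dd} - V_{th0})} \times e^{\frac{V_{dd} - V_{th0}}{t_{ox} E_0}} \times \left(e^{-\frac{E_a}{k}}\right)^{\frac{1}{T}} \times t_{stress}^{0.25} \tag{2.1}$$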
where $t_{stress}$ is the effective stress time, $t_{ox}$ is the oxide thickness, and $C_{ox}$ is the gate capacitance per unit area. $E_0$ and $E_a$ are technology-dependent constants, and $k$ is Boltzmann's constant. $A_{NBTI0}$ is a constant that depends on the aging rate. If $t$ represents the whole functional time and $pt$ represents the effective stress time ($p$ is the probability that the transistor is under stress, i.e., $pt = t_{stress}$), the equation can be reformulated as a function of the signal probability $p$:
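$$\Delta V_{th,NBTI} = A_1 \times B_1^{\frac{1}{T}} \times (pt)^{0.25} \tag{2.2}$$
where:
$$A_1 = A_{NBTI0} \times t_{ox} \times \sqrt{C_{ox}(V_{dd} - V_{th0})} \times e^{\frac{V_{dd} - V_{th0}}{t_{ox} E_0}}, \qquad B_1 = e^{-\frac{E_a}{k}}$$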
Similarly, the R-D model formulates the threshold voltage increase due to HCI as [56]:
$$\Delta V_{th,HCI} = A_{HCI0} \times \alpha \times f \times e^{\frac{V_{dd} - V_{th0}}{t_{ox} E_1}} \times t^{0.5} \tag{2.3}$$
where $t$ is time, and $\alpha$ and $f$ are the activity factor and the frequency, respectively. $t_{ox}$ is the oxide thickness, and $E_1$ is a technology-dependent constant. $A_{HCI0}$ is a constant that depends on the aging rate. Here, $\alpha \times f$ represents the effective switching count $n$ ($= \alpha \times f$) in one clock cycle. HCI-induced aging also depends on temperature $T$, but there is no closed-form analytical relationship between $\Delta V_{th,HCI}$ and temperature. Experimental results from [42] show that $\Delta V_{th,HCI}$ has a piecewise linear relationship with $T$. Thus, we can reformulate Equation (2.3) as:
$$\Delta V_{th,HCI} = A_2 \times T \times n \times t^{0.5} \tag{2.4}$$
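where:
$$A_2 = \frac{A_{HCI0} \times e^{\frac{V_{dd} - V_{th0}}{t_{ox} E_1}}}{T}$$
(the factor $\alpha \times f$ of Equation (2.3) appears explicitly as $n$ in Equation (2.4)).

The alpha-power law [85] provides the analytical relationship between $V_{th}$ and delay at the gate level:
$$D_i = \frac{C_i V_{dd}}{\beta_i \left[V_{dd} - (V_{th0} + \Delta V_{th,i})\right]^{\gamma}} \tag{2.5}$$
where $C_i$ is the effective capacitive load of the $i$-th gate in a path, $\beta_i$ is a parameter depending on the gate size, $\gamma$ is the exponent of the alpha-power law, and $V_{th0}$ is the initial threshold voltage without aging. Writing $x = \Delta V_{th,i}/(V_{dd} - V_{th0})$ and using the approximation [86]:
$$\frac{1}{(1 - x)^{\gamma}} \approx 1 + \gamma \cdot x \quad (|x| \ll 1) \tag{2.6}$$
Equation (2.5) becomes:
$$D_i = A_0 \times (1 + B_0 \Delta V_{th,i}) \tag{2.7}$$
where:
$$A_0 = \frac{C_i V_{dd}}{\beta_i (V_{dd} - V_{th0})^{\gamma}}, \qquad B_0 = \frac{\gamma}{V_{dd} - V_{th0}}$$
Now, substituting Equation (2.2) and Equation (2.4) into Equation (2.7) individually, we obtain the models for NBTI-induced and HCI-induced gate delay:
$$D_{i,NBTI} = A_0 \times \left(1 + B_0 \cdot A_1 \cdot B_1^{\frac{1}{T}} \cdot (pt)^{0.25}\right) \tag{2.8}$$
$$D_{i,HCI} = A_0 \times \left(1 + B_0 \cdot A_2 \cdot T \cdot n \cdot t^{0.5}\right) \tag{2.9}$$
The above equations can be further simplified to:
$$D_{i,NBTI} = A_{NBTI} \times \left(1 + B_{NBTI}^{\frac{1}{T}} \cdot (pt)^{0.25}\right) \tag{2.10}$$
$$D_{i,HCI} = A_{HCI} \times \left(1 + B_{HCI} \cdot T \cdot n \cdot t^{0.5}\right) \tag{2.11}$$
where:
$$A_{NBTI} = A_{HCI} = A_0 = \frac{C_i V_{dd}}{\beta_i (V_{dd} - V_{th0})^{\gamma}}$$
$$B_{NBTI} = B_0^{T} \cdot B_1 \cdot A_1^{T} = \left(\frac{\gamma}{V_{dd} - V_{th0}}\right)^{T} \times e^{-\frac{E_a}{k}} \times \left(A_{NBTI0} \times t_{ox} \times \sqrt{C_{ox}(V_{dd} - V_{th0})} \times e^{\frac{V_{dd} - V_{th0}}{t_{ox} E_0}}\right)^{T}$$
$$B_{HCI} = B_0 \cdot A_2 = \frac{\gamma}{V_{dd} - V_{th0}} \times \frac{A_{HCI0} \times e^{\frac{V_{dd} - V_{th0}}{t_{ox} E_1}}}{T}$$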
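As a numerical check of the derivation, the Python sketch below compares the alpha-power delay of Equation (2.5) with its first-order form of Equation (2.7), and evaluates the simplified NBTI and HCI delay models of Equations (2.10) and (2.11). The device values in the example (Vdd = 1.1 V, Vth0 = 0.4 V, γ = 1.3, ΔVth = 20 mV, normalized C/β = 1) are illustrative assumptions, not characterized data. For NBTI, the sketch takes B_NBTI^(1/T) directly as one constant `b_eff` at a fixed temperature, as is done in Section 2.3.4; this avoids evaluating B_NBTI itself, whose exponent T (in kelvin) can overflow or underflow in floating point. All function names are hypothetical.

```python
def alpha_power_delay(c_load, vdd, beta, vth0, dvth, gamma):
    """Eq. (2.5): D = C*Vdd / (beta * (Vdd - (Vth0 + dVth))**gamma)."""
    return c_load * vdd / (beta * (vdd - (vth0 + dvth)) ** gamma)


def linearized_delay(c_load, vdd, beta, vth0, dvth, gamma):
    """Eq. (2.7): D ~= A0 * (1 + B0 * dVth), valid while dVth << Vdd - Vth0."""
    a0 = c_load * vdd / (beta * (vdd - vth0) ** gamma)
    b0 = gamma / (vdd - vth0)
    return a0 * (1.0 + b0 * dvth)


def nbti_delay(a_nbti, b_eff, p, t):
    """Eq. (2.10) at a fixed temperature; b_eff stands for B_NBTI**(1/T)."""
    return a_nbti * (1.0 + b_eff * (p * t) ** 0.25)


def hci_delay(a_hci, b_hci, temp, n, t):
    """Eq. (2.11): D = A_HCI * (1 + B_HCI * T * n * t**0.5)."""
    return a_hci * (1.0 + b_hci * temp * n * t ** 0.5)


if __name__ == "__main__":
    # Illustrative (assumed) device values.
    exact = alpha_power_delay(1.0, 1.1, 1.0, 0.4, 0.02, 1.3)
    approx = linearized_delay(1.0, 1.1, 1.0, 0.4, 0.02, 1.3)
    print(f"relative linearization error: {abs(approx - exact) / exact:.4%}")
    # Constants of Eqs. (2.22)-(2.23) applied to the Case 3 condition (p = 0.5, t = 1.8e8 s).
    print(nbti_delay(6.4464e-11, 6.4578e-4, 0.5, 1.8e8))
```

Because (1 - x)^(-γ) is convex, the linearized delay slightly underestimates the exact alpha-power delay; for the assumed values the gap is about 0.1%.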
The above analysis enables delay approximation with respect to several parameters under NBTI and HCI. The authors in [43] proposed a general form that approximates gate delay using two additional parameters: input slew rate $\tau$ and capacitive load $C_i$. The functional form is a two-term posynomial:
$$D_i = \sum_{k=1}^{2} S_k \tau_i^{m_k} C_i^{n_k} \tag{2.12}$$
where $S_k \ge 0$ depends on the gate size, and $m_k$ and $n_k$ are constants that do not depend on the gate size.
These gate-level delay models enable the approximation of gate delay with respect to different parameters from a known aging condition. The approximation may introduce errors, but they are virtually negligible, as seen from the experimental results in Section 2.6.
2.3.2 Library Characterization Considering Aging
The short-range models presented in Section 2.3.1 are not necessarily accurate over long Euclidean distances in the multi-dimensional parameter space. Based on these models, an unknown aging condition in the parameter space can be assessed accurately from a known condition only if the two conditions are very close. Therefore, a pre-generated look-up table (LUT) is used to facilitate long-range aging-degradation prediction under complex conditions, e.g., multiple stress probabilities. The LUT, based on a standard technology library, is generated using HSPICE MOSRA (other reliability tools can also be used) for discrete values of the aging-related parameters, including temperature, signal probability (for NBTI), switching probability (for HCI), slew rate, capacitive load, and stress time. The measured delay and output slew rate compose the LUT, which enables later approximation of gate delay and degradation with respect to any of these parameters without repeating HSPICE MOSRA simulation. The LUT, containing new aging-aware library cells, strictly differentiates the input instances. When different inputs are modeled, we make sure that their stress probabilities are consistent with the actual circuit (other aging-related parameters are treated in a similar fashion). Different input instances are simulated individually, and the results are stored in the LUT. Compared with the regressive calculation embedded in HSPICE MOSRA, we take advantage of the mathematical models discussed above and use the stored LUT entries as intermediate aging conditions, so that the aging information for an unknown condition is approximated from a close, explicit condition that is also stored in the LUT. This procedure saves significant computation time, while the accuracy remains high, as shown by our results in Section 2.6. Meanwhile, our flow (Section 2.4) offers the flexibility to trade off accuracy against computational complexity. For example, depending on the accuracy requirement, a finer granularity for the aging parameters when generating the LUT improves accuracy at the cost of more computation time; conversely, a coarser granularity greatly improves speed but reduces accuracy.
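The LUT lookup described above can be sketched in Python with two small helpers: `bracket` finds the two stored grid values that enclose a query value of one parameter (the pair later used for interpolation or extrapolation), and `nearest_condition` picks the stored aging condition closest to a query in scaled Euclidean distance. Both names, the tuple layout of a condition, and the per-parameter scaling are assumptions made for illustration; the text does not specify how distances between parameters with different units are measured.

```python
import math
from bisect import bisect_left
from typing import Sequence, Tuple


def bracket(grid: Sequence[float], value: float) -> Tuple[float, float]:
    """Return two neighbouring grid points (lo, hi) around `value`.

    `grid` must be sorted ascending with at least two points. Outside the
    grid, the two nearest end points are returned (extrapolation).
    """
    if len(grid) < 2:
        raise ValueError("grid needs at least two points")
    i = bisect_left(grid, value)
    if i == 0:
        return grid[0], grid[1]
    if i == len(grid):
        return grid[-2], grid[-1]
    if grid[i] == value:
        # Exact hit: pair it with the next point (or the previous one at the end).
        return (grid[i], grid[i + 1]) if i + 1 < len(grid) else (grid[i - 1], grid[i])
    return grid[i - 1], grid[i]


def nearest_condition(stored: Sequence[Tuple[float, ...]],
                      query: Tuple[float, ...],
                      scales: Tuple[float, ...]) -> Tuple[float, ...]:
    """Stored condition closest to `query` in scaled Euclidean distance.

    Each parameter difference is divided by its scale (e.g. the LUT grid step)
    so that parameters with different units contribute comparably.
    """
    def distance(cond: Tuple[float, ...]) -> float:
        return math.sqrt(sum(((a - b) / s) ** 2
                             for a, b, s in zip(cond, query, scales)))
    return min(stored, key=distance)
```

For example, with a hypothetical slew grid of (10, 40, 70, 100) ps, `bracket(grid, 55)` returns (40, 70), and a condition tuple could be laid out as (T in °C, p, t in s, slew in ps, load in fF).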
Equation (2.10) and Equation (2.11) describe the relationship between gate delay and temperature $T$, stress time $t$, stress probability $p$, and switching count $n$. Under NBTI and HCI, $A_{NBTI}$, $B_{NBTI}$, $A_{HCI}$, and $B_{HCI}$ are no longer constant over time $t$. However, at a specific time point where the aging condition is known, they can be regarded as constants within a close range, which enables interpolation and extrapolation for delay approximation with respect to $T$, $t$, $p$, and $n$. For HCI, the approximation with respect to $T$, $n$, and $t$ is realized similarly using the LUT and Equation (2.11). As clarified in [42], $\Delta V_{th,HCI}$ has a piecewise linear relationship with $T$; we therefore treat $B_{HCI}$ as a constant when it is updated to a new condition. This simplification does not negatively impact accuracy, as shown in Section 2.6.
To approximate gate delay with respect to input slew rate $\tau$ and capacitive load $C$, the procedure is divided into two stages.
(1) Pre-calculation of $m_k$ and $n_k$ ($k = 1, 2$): Using the LUT information, we obtain six delay values from six combinations of $(\tau, C)$ for each gate:
$$D_i = \sum_{k=1}^{2} S_k \tau_i^{m_k} C_i^{n_k} \quad (i = 1, \ldots, 6) \tag{2.13}$$
By solving this system of equations, we obtain $S_k$, $m_k$, and $n_k$ ($k = 1, 2$). We only use $m_k$ and $n_k$ to update Equation (2.12), because $S_k$ will not remain constant when aging effects are considered. This stage needs to be done only once for each gate in the library.
(2) Delay approximation based on $(\tau, C)$: Here, we use a simple example to explain the delay approximation procedure with respect to $\tau$ or $C$. Consider a two-input gate under NBTI aging whose aging condition $(T, \tau = SL_0, t, C, p_1, p_2)$ is specified, where $T$ is the temperature, $t$ is the stress time, $C$ is the capacitive load, and $p_1$ and $p_2$ are the signal probabilities of Pin1 and Pin2, respectively. Assume that $(T, \tau_1 = SL_1, t, C, p_1, p_2)$ and $(T, \tau_2 = SL_2, t, C, p_1, p_2)$ are two entries already generated and stored in the library, which are the closest conditions to $(T, SL_0, t, C, p_1, p_2)$ and satisfy $SL_1 \le SL_0 \le SL_2$. Let $D_1 = f(T, SL_1, t, C, p_1, p_2)$ and $D_2 = f(T, SL_2, t, C, p_1, p_2)$ be the corresponding delays. Because $(T, t, C, p_1, p_2)$ are fixed and $S_k$ ($k = 1, 2$) are gate-dependent, we regard $S_k$ as constants at this condition. Substituting $D_1$ and $D_2$ back into Equation (2.12) yields the values of $S_k$, because $m_k$ and $n_k$ ($k = 1, 2$) are already solved. Thus, we can calculate the delay for $\tau = SL_0$ from the known values $S_k$, $m_k$, $n_k$ ($k = 1, 2$) using Equation (2.12) under temperature $T$, probabilities $(p_1, p_2)$, stress time $t$, and capacitive load $C$. Delay approximation with respect to $C$ is done similarly.
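Stage (2) reduces to a 2 × 2 linear system: with m_k and n_k fixed from stage (1), the two bracketing LUT entries (SL1, D1) and (SL2, D2) at the same load C determine S_1 and S_2, and Equation (2.12) is then evaluated at SL0. The Python sketch below assumes m_k and n_k are already available (the nonlinear fit of stage (1) is not shown); the function names are hypothetical, and units are whatever the LUT uses (e.g. ps and fF), since S_k absorbs them.

```python
from typing import Sequence, Tuple


def posynomial_delay(s: Sequence[float], m: Sequence[float], n: Sequence[float],
                     tau: float, load: float) -> float:
    """Eq. (2.12): D = sum_k S_k * tau**m_k * C**n_k for k = 1, 2."""
    return sum(s[k] * tau ** m[k] * load ** n[k] for k in range(2))


def solve_s(m: Sequence[float], n: Sequence[float], load: float,
            tau1: float, d1: float, tau2: float, d2: float) -> Tuple[float, float]:
    """Solve for (S_1, S_2) from two LUT entries sharing the same load C."""
    a11, a12 = tau1 ** m[0] * load ** n[0], tau1 ** m[1] * load ** n[1]
    a21, a22 = tau2 ** m[0] * load ** n[0], tau2 ** m[1] * load ** n[1]
    det = a11 * a22 - a12 * a21
    if abs(det) <= 1e-12 * (abs(a11 * a22) + abs(a12 * a21)):
        raise ValueError("entries do not determine S_1, S_2 (e.g. m_1 == m_2 or SL1 == SL2)")
    # Cramer's rule for a11*S1 + a12*S2 = d1, a21*S1 + a22*S2 = d2.
    s1 = (d1 * a22 - a12 * d2) / det
    s2 = (a11 * d2 - a21 * d1) / det
    return s1, s2


def delay_at_slew(m, n, load, tau1, d1, tau2, d2, tau0):
    """Stage (2): fit S_k on the bracketing entries, then evaluate Eq. (2.12) at tau0."""
    s = solve_s(m, n, load, tau1, d1, tau2, d2)
    return posynomial_delay(s, m, n, tau0, load)
```

Delay approximation with respect to C follows the same pattern with the roles of τ and C swapped.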
2.3.3 Output Slew Rate Function Forms
The above models realize gate delay approximation without the need for HSPICE MOSRA simulation. Here, we reiterate the importance of slew rate propagation when estimating path delay: as discussed in the sections above, the slew rate must be propagated to obtain an accurate path degradation. Since no appropriate analytical model exists for output slew rate prediction, we use delay aging ratios to approximate the output slew rate. Suppose we have one parameter $q = q_0$ and use the LUT to do the approximation, where $q$ can be any one of temperature $T$, stress time $t$, probability $p$, input slew rate $\tau$, or capacitive load $C$. Assume that the LUT contains two entries, $q = q_1$ and $q = q_2$, close to the condition under analysis, with delay values $(D_1, D_2)$ and output slew rates $(S_1, S_2)$. If the approximation yields the delay value $D_0$ for $q = q_0$, the two delay aging ratios are calculated as:
$$\lambda_1 = D_0 / D_1 \tag{2.14}$$
$$\lambda_2 = D_0 / D_2 \tag{2.15}$$
The output slew rate is then calculated as:
$$S_0 = \frac{S_1 \cdot \lambda_1 + S_2 \cdot \lambda_2}{2} \tag{2.16}$$
For multiple parameters, the output slew rate is approximated again each time a new delay approximation is conducted and a new ratio can be calculated.
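Equations (2.14) to (2.16) translate directly into a small Python function. As a check, feeding it the Case 2 and Case 4 entries of Table 2.1 together with the Case 3 delay approximated in Equation (2.24) reproduces the slew rate of Equation (2.25) to within rounding; the function name is hypothetical.

```python
def approx_output_slew(d0: float, d1: float, s1: float, d2: float, s2: float) -> float:
    """Eqs. (2.14)-(2.16): average of the two LUT slews scaled by delay aging ratios."""
    lam1 = d0 / d1  # Eq. (2.14)
    lam2 = d0 / d2  # Eq. (2.15)
    return (s1 * lam1 + s2 * lam2) / 2.0  # Eq. (2.16)


if __name__ == "__main__":
    # Case 2 / Case 4 entries of Table 2.1 and the Case 3 delay of Eq. (2.24).
    print(approx_output_slew(6.8519e-11, 6.7545e-11, 98.802, 6.9286e-11, 100.0))
```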
Having obtained the gate delay and output slew rate approximations, we can treat a path with $q$ gates as a chain of gate primitives. By propagating the slew rate forward, the delay $\Gamma_j$ of the $j$-th path (which has $q$ gates connected in sequence) under a specific condition can be approximated as:
$$\Gamma_j = \sum_{i=1}^{q} D_i(T, \tau_i, t, C_i, p_{i,1}, \ldots, p_{i,m}) \tag{2.17}$$
Note that, for each gate, its input slew rate is propagated from its previous gate.
The slew rate for the first gate in the path can be dynamically controlled according to
delay measurement specification, which, once designated, will be propagated forward.
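The slew-propagating sum of Equation (2.17) can be sketched in Python as a loop in which each gate model maps an input slew rate to a (delay, output slew) pair, and each output slew becomes the next gate's input slew. The `GateModel` callable type and the linear toy gates in the example are hypothetical stand-ins for the LUT-based gate approximations described above.

```python
from typing import Callable, List, Sequence, Tuple

# A gate model maps an input slew rate to (gate delay, output slew rate).
GateModel = Callable[[float], Tuple[float, float]]


def path_delay(gates: Sequence[GateModel],
               first_input_slew: float) -> Tuple[float, List[float]]:
    """Eq. (2.17): sum gate delays along a path while propagating slew forward.

    Returns the total path delay and the per-gate delays.
    """
    slew = first_input_slew  # set by the delay measurement specification
    delays: List[float] = []
    for gate in gates:
        delay, slew = gate(slew)  # this output slew feeds the next gate
        delays.append(delay)
    return sum(delays), delays


def linear_gate(d0: float, kd: float, s0: float, ks: float) -> GateModel:
    """Toy gate: delay and output slew both linear in input slew (illustrative only)."""
    return lambda slew_in: (d0 + kd * slew_in, s0 + ks * slew_in)


if __name__ == "__main__":
    toy_path = [linear_gate(10.0, 0.1, 20.0, 0.5) for _ in range(3)]
    total, per_gate = path_delay(toy_path, 60.0)
    print(total, per_gate)
```

Keeping the gate model as a callable lets the same loop work whether a gate's delay comes from a LUT lookup, the posynomial of Equation (2.12), or an aged-delay model.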
2.3.4 Accuracy and Efficiency Analysis
This subsection analyzes the accuracy of the derivation above. Assume that only the NBTI effect is considered, that the stress probability $p$ and stress time $t$ are unknown, and that all other aging parameters are known: $T = 75$ °C, $\tau = 70$ ps, $C = 1.5$ fF. Thus, in Equation (2.10), $A_{NBTI}$ and $B_{NBTI}^{1/T}$ become two constants. From HSPICE simulations of an inverter (INV_X1) from the Nangate standard cell library, we obtain the aged delays and output slew rates corresponding to different aging conditions, listed in Table 2.1.
We now assume that only the aged delays corresponding to Case 1 and Case 5 are known explicitly, and use Equation (2.10) to derive the aged delay corresponding to Case 3. By substituting the aging conditions of Case 1 and Case 5 into Equation (2.10), we get:
$$(A_{NBTI})_1 = 6.3178 \times 10^{-11} \tag{2.18}$$
$$(B_{NBTI}^{1/T})_1 = 8.1392 \times 10^{-4} \tag{2.19}$$
Table 2.1: Accuracy and efficiency analysis for mathematical models
Rise delay sim.   Delay (s)            Output slew rate (ps)   p (%)   t (s)
Case 1            Di1 = 6.3229E−11     Si1 = 95.55             0       0
Case 2            Di2 = 6.7545E−11     Si2 = 98.802            25      1.2 × 10^8
Case 3            Di3 = 6.8442E−11     Si3 = 99.455            50      1.8 × 10^8
Case 4            Di4 = 6.9286E−11     Si4 = 100               75      2.4 × 10^8
Case 5            Di5 = 6.9945E−11     Si5 = 100.51            100     3.0 × 10^8
Other parameters: known and the same for all cases.

By substituting these two constants (Equations (2.18) and (2.19)) and the aging condition of Case 3 back into Equation (2.10), the aged delay for Case 3 is calculated as:
$$(D_{i,NBTI})_1 = 6.8186 \times 10^{-11}\ \text{s} \tag{2.20}$$
Using Equations (2.14) to (2.16), the output slew rate for Case 3 is calculated as:
$$(S_0)_1 = 100.5116\ \text{ps} \tag{2.21}$$
Next, assume instead that only the aged delays corresponding to Case 2 and Case 4 are known. Compared with Case 1 and Case 5, Case 2 and Case 4 are at a shorter Euclidean distance from Case 3 in the parameter space. By repeating the above process, the constants, aged delay, and output slew rate for Case 3 are calculated as:
$$(A_{NBTI})_2 = 6.4464 \times 10^{-11} \tag{2.22}$$
$$(B_{NBTI}^{1/T})_2 = 6.4578 \times 10^{-4} \tag{2.23}$$
$$(D_{i,NBTI})_2 = 6.8519 \times 10^{-11}\ \text{s} \tag{2.24}$$
$$(S_0)_2 = 99.5599\ \text{ps} \tag{2.25}$$
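The two-point procedure can be reproduced with a short Python sketch: at fixed T, Equation (2.10) is a straight line in x = (pt)^0.25, so two known cases give A_NBTI (the intercept) and B_NBTI^(1/T) (the slope divided by the intercept). With the Case 2 and Case 4 entries of Table 2.1 (p written as a fraction rather than a percentage), this recovers the values of Equations (2.22) to (2.24) and the roughly 0.11% delay error against the HSPICE result for Case 3. The function names are hypothetical.

```python
from typing import Tuple


def fit_nbti_constants(p_a: float, t_a: float, d_a: float,
                       p_b: float, t_b: float, d_b: float) -> Tuple[float, float]:
    """Solve Eq. (2.10) at fixed T for (A_NBTI, B_NBTI**(1/T)) from two known cases.

    D = A * (1 + B_eff * x) with x = (p*t)**0.25 is a straight line in x,
    whose intercept is A and whose slope is A * B_eff.
    """
    x_a = (p_a * t_a) ** 0.25
    x_b = (p_b * t_b) ** 0.25
    if x_a == x_b:
        raise ValueError("the two cases need different effective stress p*t")
    slope = (d_b - d_a) / (x_b - x_a)
    a = d_a - slope * x_a
    return a, slope / a


def predict_nbti_delay(a: float, b_eff: float, p: float, t: float) -> float:
    """Eq. (2.10) evaluated with the fitted constants."""
    return a * (1.0 + b_eff * (p * t) ** 0.25)


if __name__ == "__main__":
    # Case 2 and Case 4 of Table 2.1 (p as a fraction, t in s, delay in s).
    a, b_eff = fit_nbti_constants(0.25, 1.2e8, 6.7545e-11, 0.75, 2.4e8, 6.9286e-11)
    d3 = predict_nbti_delay(a, b_eff, 0.50, 1.8e8)
    hspice_d3 = 6.8442e-11  # Case 3 HSPICE MOSRA result in Table 2.1
    print(f"A = {a:.4e}, B_eff = {b_eff:.4e}, D3 = {d3:.4e} s")
    print(f"error = {abs(d3 - hspice_d3) / hspice_d3:.4%}")
```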
Comparing the aged delays and output slew rates from the two scenarios above with the HSPICE simulation result for Case 3 in Table 2.1 gives the errors summarized in Table 2.2.

Table 2.2: Error and accuracy comparison
                    Error (Scenario 1)   Error (Scenario 2)   Accuracy improvement
Delay               0.3754%              0.1121%              3.35X (↑)
Output slew rate    1.0622%              0.1055%              10.07X (↑)

From Table 2.2, the calculation error for the aged delay (output slew rate) decreases from 0.3754% (1.0622%) in Scenario 1 to 0.1121% (0.1055%) in Scenario 2, where the two known conditions of Scenario 1 are farther from Case 3 than those of Scenario 2. The accuracy improvements are 3.35X and 10.07X, respectively. Thus, we conclude that the more intermediate aging conditions are available, the smaller the calculation error when the mathematical models are used to derive the aged delay and output slew rate directly at the circuit level. The above example is based on the delay and output slew rate of a single gate. If the mathematical models are used to derive the delay of a path with multiple gates, the calculation accuracy will depend heavily on the available intermediate aging conditions. One reason is that the slew rate has to be propagated for accuracy, which also accumulates the calculation errors. In summary, path delay calculation derived solely from mathematical models will incur severe errors compared with HSPICE results. However, our experimental results (detailed in Section 2.6) confirm that HSPICE MOSRA aging analysis at the circuit level is very time-consuming. Thus, there is a trade-off between accuracy and efficiency in the two aging analysis methods, i.e., mathematical models and reliability analysis tools. For large-scale industry designs, neither method alone is sufficient for actual implementation, since both accuracy and efficiency are necessary.
2.4 APD Analysis Flow
In the above section, we theoretically demonstrated the relationship between path delay, degradation, and circuit parameters. However, compared with reliability analysis tools, these models are efficient but less accurate. Reliability tools simulate delay and degradation through a regressive process that uses their intrinsic mathematical models from the transistor level to the circuit level; this highly time-consuming process, in turn, provides very high accuracy. To trade off accuracy and CPU runtime for degradation analysis, and to enable a method applicable to large-scale industry designs, we propose the aging-aware path delay (APD) analysis flow shown in Figure 2.3. It includes four steps, each of which can be realized using conventional ASIC design tools or our in-house programs. Our APD flow is fast and accurate compared with reliability tools such as HSPICE MOSRA, making it scalable to modern industry designs.
[Figure 2.3: block diagram of the APD flow. Step 1, Critical Paths Selection: PT (20% slack) extracts critical paths, which are checked for whether they are functionally valid and testable. Step 2, Signal Probability Profile Generation: the netlist and critical paths feed an MC simulation using VCS-VPI, producing the signal probability (SP) profile. Step 3, Look-up-Table Generation: library cells are simulated in HSPICE MOSRA for each cell considering load, SP, temperature, etc. Step 4, Degradation Calculation: aging degradation (path delay analysis).]
Fig. 2.3: APD flow including four major steps.

• Step 1. Critical Paths Selection: In this step, a commercial static timing analysis (STA) tool is used to extract a large number of critical paths for aging analysis. As studied in [45] [46] [47], a critical path could degrade by as much as 20% over ten years of aging. Accordingly, the STA selects a large number of paths based on the following criterion: a path $P_i$ is selected as a potential critical path if its delay satisfies $T_{p0,i} \times 120\% \ge T_{clk} - T_m$, where $T_{p0,i}$ is the path delay at time zero, $T_{clk}$ is the clock period, and $T_m$ is a safety margin (i.e., guardband) added to the most critical path at time zero. The set of all potential critical paths is defined as the potential critical path (PCP) set; these paths may not be critical at time zero, but may become critical during their lifetime of operation in the field. A path delay pattern generation procedure is then run to identify the testable critical paths, which further reduces the total number of paths for timing analysis.
• Step 2. Signal Probability Profile Generation: A novel in-house VCS-VPI
procedure has been developed on the platform of Synopsys' Verilog simulator, VCS [34],
to efficiently monitor the switching activity of the paths in the circuit under analysis.
This process is much faster than the traditional approach of parsing a value change
dump (VCD) file. The inputs to this program are the target critical paths and the
circuit netlist, and its output is the switching activity of all the pins on the target
critical paths. Based on this switching information, equivalent stress signals [11] [29]
are generated for each pin on the critical paths. This is done by applying a large number
of patterns, representing the workload, to the circuit and then calculating the switching
probability of the target pins, as formulated in Equation (2.26):
$$
SP = 1 - \frac{t_1 + t_2 + \cdots + t_{n-1} + t_n}{N_T} \in [0, 1] \qquad (2.26)
$$
SP represents the average switching that happens on a pin in every clock cycle. For
NBTI, SP gives the equivalent stress time within one clock cycle, whereas for HCI, with
0 < SP < 1, it represents the probability of a switching event in each clock cycle.
Other information can be extracted as well using different tools or programs, such as
3D parasitic extraction to obtain the capacitive load.
• Step 3. Look-up Table Generation: A look-up table (LUT) is generated for different
conditions, namely aging effects (NBTI and HCI), capacitive load, on-path and off-path
stress probability, temperature, slew rates, and time points. This is equivalent to
characterizing a technology library with aging taken into account, and the LUT is
generated using HSPICE MOSRA. Note that other aging simulation tools [35] [36] can
also be used in our APD flow. Such an aging library is generated only once for each
technology library. We assume that the on-chip PLL that provides the clock frequency is
designed with reliable components that experience no or negligible aging; however, we
believe that our proposed flow can easily be extended to account for the impact of aging
on the clock tree. When we generate our LUT and create new aging-aware library cells
in HSPICE MOSRA, we ensure that different input instances are simulated individually,
and the results are stored in the LUT.
• Step 4. Degradation Calculation: Instead of running very time-consuming HSPICE
MOSRA simulations, the APD flow quickly and accurately evaluates a large number of
paths by taking advantage of the presented mathematical models.
The major advantages of our APD analysis flow are:
• Significantly lower CPU runtime,
• High accuracy,
• Technology/circuit independence.
Step 3 is the most time-consuming step in APD. However, as mentioned earlier,
this step is carried out only once for each technology library.
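Steps 1 and 2 reduce to two small computations: the 20% aging-slack filter of Step 1 and the signal probability of Equation (2.26). The Python sketch below illustrates both under stated assumptions. The function and parameter names are hypothetical, and because t1, ..., tn and N_T are not defined in this section, the sketch assumes they are expressed in the same time unit, with N_T the total observation window.

```python
from typing import Dict, Iterable, List

AGING_FACTOR = 1.20  # Step 1: a path may degrade by up to 20% over ten years


def select_pcps(t0_delays: Dict[str, float], t_clk: float, t_m: float) -> List[str]:
    """Step 1: keep path Pi if Tp0i * 120% >= Tclk - Tm (potential critical path)."""
    budget = t_clk - t_m
    return [path for path, d0 in t0_delays.items() if d0 * AGING_FACTOR >= budget]


def signal_probability(intervals: Iterable[float], n_t: float) -> float:
    """Eq. (2.26): SP = 1 - (t1 + ... + tn) / N_T.

    Assumption: the intervals and N_T share one time unit and N_T is the
    total observation window, so the result lies in [0, 1].
    """
    total = sum(intervals)
    if n_t <= 0 or not 0.0 <= total <= n_t:
        raise ValueError("intervals must fit inside a positive window N_T")
    return 1.0 - total / n_t


if __name__ == "__main__":
    # Hypothetical time-zero path delays (ns), clock period 4.5 ns, margin 0.3 ns
    print(select_pcps({"P1": 4.0, "P2": 3.0, "P3": 3.6}, t_clk=4.5, t_m=0.3))  # ['P1', 'P3']
    # Hypothetical intervals summing to 5 time units in a 10-unit window
    print(signal_probability([2.0, 3.0], n_t=10.0))  # 0.5
```

In this example, P2 is dropped because even a 20% increase (3.6 ns) stays below the 4.2 ns budget.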
2.5 Critical Reliability Path and Critical Reliability Gates Identification
On the basis of the APD flow, we further develop a flow for the identification and
quantification of critical-reliability gates (CRGs), defined as the minimum set of important
gates contributing to the degradation of critical-reliability paths (CRPs), i.e., paths that
are sensitive to aging and could potentially become critical in the field. By identifying
CRPs and CRGs, aging compensation can be achieved with minimal area and
power overhead. As will be elaborated further in later sections, CRGs can be flexibly
identified according to different workload scenarios, e.g., worst-case or actual workload
conditions, which makes this technique easily compatible with conventional industry
design practices. Figure 2.4 shows our improved flow for CRP and CRG identification.
The flow includes three major steps: (1) CP Selection, (2) CRP Selection, and (3) CRG
Identification.
2.5.1 CRP Delay Analysis and Selection
CRPs are paths that may violate the timing constraint at some point in the field, that
is, ΓCRPi > Γclk−Γm, where ΓCRPi is the delay of a CRP Pi in the field. For each PCP,
aging information (signal stress probability, capacitive load, and switching activity) can
be obtained using our in-house tools along with existing commercial tools. For example,
the switching activity for gates on the PCPs can be calculated by gate-level simulation
with our PLI routines. From the collected switching information, the equivalent duty
ratio for signal stress probability can be calculated for NBTI analysis [11], and the
switching activity can also be used to study the HCI effect.
Fig. 2.4: CRP and CRG selection flow. Step 1: critical paths selection, in which static timing analysis (STA) with 20% slack time extracts the critical path set from the circuit netlist; Step 2: CRP selection considering aging degradation, using the workload and circuit information, the workload profile, and the delay library, which yields the critical-reliability path set; Step 3: CRG identification through path and gate aging degradation analysis and the aging-aware LP optimization technique, which yields the critical-reliability gate set used for gate sizing.
Once all the information is obtained, timing analysis is conducted to select the
CRPs from the PCPs using the aging-aware delay library, which is generated only once
for each technology library. As in Step 3, we assume that the on-chip PLL that provides
the clock frequency is designed with reliable components robust to aging, and we believe
that our proposed flow can easily be extended to account for the impact of aging on the
clock tree as well. Algorithm 1 shows our proposed aging-aware path timing analysis
procedure, in which each path is regarded as a connection of gate primitives. The path
delay Γpi is initialized in Line 1, and Lines 2 to 4 specify the temperature, the stress
time, and the input slew rate of the first gate. For each gate, Lines 6 to 8 collect its aging
conditions and find the k closest conditions stored in the aging-aware delay library. After
the temporary delay and output slew rate are initialized in Lines 9 and 10, Lines 11 to 14
approximate the gate delay and output slew rate by interpolation and extrapolation over
these conditions, according to the delay models described in Section 2.3. Line 15
accumulates the gate delay into the path delay, and Line 16 propagates the output slew
rate forward to the next gate to ensure accurate delay calculation. This process is repeated
for every gate on the path (Lines 5 to 17) to obtain the path delay. For N paths with M
gates each on average, the complexity of the algorithm is O(N × M).
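The path-delay loop of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the APD implementation. Each cell type has a hypothetical LUT characterized at one temperature T and stress time t, keyed by (input slew, load, stress probabilities) and returning (delay, output slew). An inverse-distance-weighted average over the k closest LUT entries stands in for the data-fitting models of Section 2.3, which are not reproduced here. The distance also mixes raw units, whereas a real flow would normalize each axis. The helper `select_crps` applies the CRP condition ΓCRPi > Γclk − Γm of Section 2.5.1.

```python
import math
from typing import Dict, List, Sequence, Tuple

Condition = Tuple[float, ...]   # (input slew, load, stress probability, ...)
Entry = Tuple[float, float]     # (gate delay, output slew)
Lut = Dict[Condition, Entry]    # hypothetical per-cell aging-aware LUT


def approximate(lut: Lut, query: Condition, k: int = 4) -> Entry:
    """Lines 8-14 (stand-in): inverse-distance weighting over the k closest entries."""
    nearest = sorted(lut, key=lambda cond: math.dist(cond, query))[:k]
    w_sum = d_sum = s_sum = 0.0
    for cond in nearest:
        dist = math.dist(cond, query)
        if dist == 0.0:               # exact characterization point: use it directly
            return lut[cond]
        w = 1.0 / dist
        delay, slew = lut[cond]
        w_sum += w
        d_sum += w * delay
        s_sum += w * slew
    return d_sum / w_sum, s_sum / w_sum


def path_delay(gates: Sequence[Tuple[str, Condition]], luts: Dict[str, Lut],
               tau0: float, k: int = 4) -> float:
    """Lines 1-17: accumulate gate delays while propagating the output slew."""
    total = 0.0                       # Line 1: path delay <- 0
    tau = tau0                        # Line 4: input slew of the first gate
    for cell, aging in gates:         # Line 5: every gate on the path
        d, s = approximate(luts[cell], (tau,) + aging, k)
        total += d                    # Line 15: accumulate gate delay
        tau = s                       # Line 16: propagate slew to the next gate
    return total


def select_crps(aged: Dict[str, float], t_clk: float, t_m: float) -> List[str]:
    """Section 2.5.1: a PCP is a CRP if its aged delay exceeds Tclk - Tm."""
    return [p for p, d in aged.items() if d > t_clk - t_m]
```

Each gate is visited once per path, so analyzing N paths of M gates remains O(N × M) gate evaluations; the LUT lookup cost depends only on the LUT size.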
Algorithm 1 Improved Aging-Aware Path Timing Analysis
1: Initialize delay of the i-th path Γpi ← 0
2: Specify temperature T
3: Specify stress time t
4: Specify the 1st gate input slew rate τ0
5: for (all logic gates index j from 0 to n in i-th path) do
6:     Collect aging conditions for gate Gj: (Cj, pj,1, ..., pj,m)
7:     Load aging-aware delay library
8:     Find the k closest conditions (Γk, tk, τk, Ck, pk,1, ..., pk,m)
9:     Initialize temporary delay Dt ← 0
10:    Initialize temporary output slew rate St ← 0
11:    for each close condition (Γk, tk, τk, Ck, pk,1, ..., pk,m) do
12:        Data fitting for (Dt, St) using the mathematical models
13:    end for
14:    Finalize gate Gj approximation: (Dj, Sj) ← (Dt, St)
15:    Γpi ← Γpi + Dj
16:    τj+1 ← Sj (propagate slew rate to next gate)
17: end for
2.5.2 CRG Identification Technique
In this section, we evaluate the importance of the gates on CRPs based on their contri-
bution to the paths' performance degradation and their relative impact on the CRPs.
For illustration, Figure 2.5(a) shows a small circuit that demonstrates the difference
between selecting the gates with the largest degradation and selecting gates based on
their impact on the paths. Each gate is numbered, and its pair (d0, ∆dt) gives d0, the
gate delay at time zero, and ∆dt, the amount of gate degradation at time point t in the
field. There are 14 paths in total in the circuit, as shown in Figure 2.5(b), together with
the path delay at time zero (Γt0) and at time t (Γtt). Gate and path delays are given in
unit delays. The maximum path delay plus the margin (i.e., the circuit timing budget) is
set to 16 unit delays.
As seen, at time zero, all path delays are within the timing budget; at time t,
however, four paths (P2, P3, P11, and P12) exceed the timing budget, which will cause a
failure in the circuit. The objective is to ensure that no path delay exceeds the circuit
timing budget. Figure 2.5(c) shows the gates ordered by their amount of aging at time t.
To simplify the path delay calculation, we assume that once a gate is sized, it no longer
experiences aging in the field. When gates are selected based on the largest aging [27],
gate G6 is chosen first. As seen in Figure 2.5(d), sizing this gate affects only paths P1
and P6, which are not the most critical ones. Next, gate G1 is selected, which reduces the
delay of paths P2 and P3. Selecting gate G4 reduces the delay of paths P11 and P12, and
finally G9 further reduces the delay of paths P2, P3, P11, and P12, bringing all path
delays under the timing budget of 16 unit delays. In contrast, our technique selects only
gates G9 and G7 to compensate for aging, as shown in Figure 2.5(d). This clearly shows
that the most important gates for aging compensation are not necessarily the gates that
have aged the most.
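The toy example of Figure 2.5 can be checked numerically. The Python sketch below encodes the gate data (d0, ∆dt) and the 14 paths read from Figure 2.5(a) and (b), and compares two greedy selections under the same simplification as the text (a sized gate stops aging). The first follows the largest-aging order described for [27]. The second is a simple heuristic introduced here only for illustration: it scores each gate by its degradation times the number of still-violating paths through it. It is not the LP formulation of Equation (2.27), although on this example it happens to select the same two gates.

```python
from typing import List, Set

# Figure 2.5(a): gate -> (d0, delta_dt), in unit delays
GATES = {1: (5, 0.45), 2: (3, 0.22), 3: (3, 0.30), 4: (4, 0.40), 5: (1, 0.12),
         6: (1, 0.46), 7: (2, 0.35), 8: (2, 0.28), 9: (3, 0.38), 10: (1, 0.10),
         11: (2, 0.24), 12: (2, 0.15), 13: (3, 0.18)}
# Figure 2.5(b): gates on paths P1..P14 (output pins omitted)
PATHS = [[1, 3, 6, 8, 11], [1, 3, 7, 9, 11], [1, 3, 7, 9, 12], [1, 3, 7, 10, 12],
         [1, 3, 7, 10, 13], [2, 3, 6, 8, 11], [2, 3, 7, 9, 11], [2, 3, 7, 9, 12],
         [2, 3, 7, 10, 12], [2, 3, 7, 10, 13], [2, 4, 5, 7, 9, 11],
         [2, 4, 5, 7, 9, 12], [2, 4, 5, 7, 10, 12], [2, 4, 5, 7, 10, 13]]
BUDGET = 16.0  # timing budget in unit delays


def aged_delay(path: List[int], sized: Set[int]) -> float:
    """Path delay at time t; a sized gate is assumed to stop aging."""
    return sum(GATES[g][0] + (0.0 if g in sized else GATES[g][1]) for g in path)


def violating(sized: Set[int]) -> List[int]:
    """Indices of paths whose aged delay exceeds the budget."""
    return [i for i, p in enumerate(PATHS) if aged_delay(p, sized) > BUDGET]


def select_by_largest_aging() -> List[int]:
    """Size gates in descending delta_dt order until no path violates."""
    sized: List[int] = []
    for g in sorted(GATES, key=lambda g: GATES[g][1], reverse=True):
        if not violating(set(sized)):
            break
        sized.append(g)
    return sized


def select_by_path_impact() -> List[int]:
    """Illustrative heuristic: delta_dt x number of violating paths through the gate."""
    sized: List[int] = []
    while bad := violating(set(sized)):
        score = {g: GATES[g][1] * sum(g in PATHS[i] for i in bad)
                 for g in GATES if g not in sized}
        sized.append(max(score, key=score.get))
    return sized


if __name__ == "__main__":
    print("largest aging:", select_by_largest_aging())  # [6, 1, 4, 9]
    print("path impact:  ", select_by_path_impact())    # [9, 7]
```

G9 and G7 both lie on all four violating paths, so compensating them lowers every violating path at once, whereas G6 lies on none of them.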
Fig. 2.5: CRG selection using the concept of importance to CRGs: (a) an illustrative circuit, with gate (d0, ∆dt) pairs G1 (5, 0.45), G2 (3, 0.22), G3 (3, 0.30), G4 (4, 0.40), G5 (1, 0.12), G6 (1, 0.46), G7 (2, 0.35), G8 (2, 0.28), G9 (3, 0.38), G10 (1, 0.10), G11 (2, 0.24), G12 (2, 0.15), G13 (3, 0.18), inputs Input1 to Input4 and outputs Output1 to Output3; (b) paths P1 to P14 in the circuit and their timing Γt0 and Γtt; (c) order of gate aging at time t (G6, G1, G4, G9, G7, G3, G8, G11, G2, G13, G12, G5, G10); and (d) the gate selection and sizing to compensate aging (selection based on the largest aging: G6 → G1 → G4 → G9; selection using our algorithm: G9 → G7).
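$$
\begin{aligned}
\min\max \quad & F(X) \\
\text{subject to} \quad & \Gamma_{CRP} - G \times X \le \Gamma_{spec}
\end{aligned}
\qquad (2.27)
$$
where
$$
F(X) =
\begin{bmatrix}
f_{1,1}(x_1) & f_{1,2}(x_2) & \cdots & f_{1,m}(x_m) \\
f_{2,1}(x_1) & f_{2,2}(x_2) & \cdots & f_{2,m}(x_m) \\
\vdots & \vdots & \ddots & \vdots \\
f_{n,1}(x_1) & f_{n,2}(x_2) & \cdots & f_{n,m}(x_m)
\end{bmatrix}_{n \times m},
\qquad
f_{i,j}(x_j) =
\begin{cases}
1, & \text{if } G_j \text{ is in path } P_i \text{ and } x_j > 0 \\
0, & \text{otherwise}
\end{cases}
$$
$$
\Gamma_{spec} =
\begin{bmatrix}
\Gamma_{clk} - \Gamma_m \\ \Gamma_{clk} - \Gamma_m \\ \vdots \\ \Gamma_{clk} - \Gamma_m
\end{bmatrix}_{n \times 1},
\qquad
G =
\begin{bmatrix}
g_{1,1} & g_{1,2} & \cdots & g_{1,m} \\
g_{2,1} & g_{2,2} & \cdots & g_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
g_{n,1} & g_{n,2} & \cdots & g_{n,m}
\end{bmatrix}_{n \times m},
$$
$$
\Gamma_{CRP} =
\begin{bmatrix}
\Gamma_{CRP_1} & \Gamma_{CRP_2} & \cdots & \Gamma_{CRP_n}
\end{bmatrix}^{T}_{n \times 1},
\qquad
X =
\begin{bmatrix}
x_1 & x_2 & \cdots & x_m
\end{bmatrix}^{T}_{m \times 1}
$$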
Here, n is the number of CRPs and m is the number of unique gate instances on the
n CRPs. Each row of matrices F and G corresponds to one CRP, while each column
corresponds to one of the non-duplicated gate instances on the CRPs. ΓCRPi is the
degraded delay of the i-th CRP, and gi,j is the degradation of the j-th gate on the i-th
path for a stress time t. For different lifetime timing specifications, gi,j is obtained from
the CRP selection procedure discussed earlier. Solving the optimization problem yields
xj (j = 1, . . . , m); the larger the value of xj, the more important the j-th gate instance.
X is initialized to X0 = [0, 0, · · · , 0]′1×m so that all gates are equally important at the
beginning. After solving the LP optimization problem, the gates can be ranked according
to their x values rather than their degradation values. By ranking the x values from high
to low, each CRP Pi can find the smallest set of gates Ui whose compensation, once
applied, makes up for the path's degradation over the lifetime. Assume that Pi has M
gates, i.e., G1, G2, . . . , GM, with delay degradations gi,1, gi,2, . . . , gi,M and corresponding
x values x1 > x2 > . . . > xM, respectively. Assume also that the path delay ΓCRPi does
not satisfy the timing requirement, i.e., ΓCRPi > Γclk − Γm at some time point in the
field. Several methods [24] [21], including gate sizing, can be used to compensate for
this degradation. For simplicity, we assume that once a gate is chosen for compensation,
its degradation is fully offset, i.e., gi,j = 0. We then need to find the critical length li,
the minimum number of gates that must be compensated for Pi to meet the specification.
As a result, the gate set Ui = (G1, G2, . . . , Gli), taken from the gate set
UPi = (G1, G2, . . . , GM) of Pi, is the i-th CRG set for Pi, and the gate set
U = U1 ∪ U2 ∪ · · · ∪ Un is the CRG set for the design.
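Once the gates of a path are ordered by x, the critical-length search described above is a simple prefix scan. The Python sketch below assumes the x values have already been obtained from the LP of Equation (2.27), which is not solved here. The function names are hypothetical. In the usage example, the degradations of paths P2 and P11 are read from Figure 2.5(a), while the x values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Set


def crg_for_path(delay: float, gates: Sequence[str], g: Dict[str, float],
                 x: Dict[str, float], spec: float) -> List[str]:
    """Return U_i: the fewest gates, taken in descending x order, whose
    compensation (g_ij -> 0) brings the path delay within spec = Γclk - Γm."""
    remaining = delay
    chosen: List[str] = []
    for gate in sorted(gates, key=lambda gj: x[gj], reverse=True):
        if remaining <= spec:
            break
        remaining -= g[gate]          # compensation fully offsets g_ij
        chosen.append(gate)
    if remaining > spec:
        raise ValueError("spec cannot be met even if every gate is compensated")
    return chosen                     # critical length l_i = len(chosen)


def crg_set(paths: Dict[str, Sequence[str]], delays: Dict[str, float],
            g: Dict[str, Dict[str, float]], x: Dict[str, float],
            spec: float) -> Set[str]:
    """U = U_1 ∪ U_2 ∪ ... ∪ U_n over all CRPs."""
    u: Set[str] = set()
    for pid, gates in paths.items():
        u.update(crg_for_path(delays[pid], gates, g[pid], x, spec))
    return u


if __name__ == "__main__":
    paths = {"P2": ["G1", "G3", "G7", "G9", "G11"],
             "P11": ["G2", "G4", "G5", "G7", "G9", "G11"]}
    delays = {"P2": 16.72, "P11": 16.71}
    g = {"P2": {"G1": 0.45, "G3": 0.30, "G7": 0.35, "G9": 0.38, "G11": 0.24},
         "P11": {"G2": 0.22, "G4": 0.40, "G5": 0.12, "G7": 0.35,
                 "G9": 0.38, "G11": 0.24}}
    # Hypothetical LP solution values, for illustration only
    x = {"G9": 1.0, "G7": 0.9, "G1": 0.5, "G4": 0.45, "G3": 0.4,
         "G2": 0.3, "G11": 0.2, "G5": 0.1}
    print(sorted(crg_set(paths, delays, g, x, spec=16.0)))  # ['G7', 'G9']
```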
Note that: (1) in the LP constraint function, gi,j represents the degradation of
gate instance Gj; thus, by replacing gi,j with the worst-case degradation, our flow is
applicable to worst-case analysis; (2) our calculation of the critical length li for Pi is
based on the assumption that gi,j = 0 once gate Gj is chosen for compensation. This is
equivalent to assuming that the compensation scheme reduces the delay of gate Gj by
exactly gi,j. Thus, by substituting gi,j with the actual compensation value of a different
compensation scheme, the LP optimization method can be applied to any aging
compensation scheme for CRG identification. For example, for compensation using gate
sizing, gi,j is calculated as
$$
g_{i,j} = D_j(t) - D_{sub}(t) \qquad (2.28)
$$
where Dj(t) represents the delay of the j-th gate at time t and Dsub(t) represents that
of the substituted gate. In addition, the impact of gate sizing on the previous and next
gates has to be considered as well. For example, when a gate is replaced by a new gate,
its input capacitance changes the load of the gate in the previous stage. Increasing the
gate size also changes the output slew rate, which needs to be propagated forward.
However, this does not affect our flow, in which gates are treated as primitive units and
paths as connections of gate primitives. Thus, the delay and degradation approximations
can still be obtained from the flow, with the changed capacitive load Cj−1 in the previous
stage and the changed output slew rate Sj in the current stage.
To find the most critical gates for each CRP and obtain the entire CRG set U,
[27] sorts all the gates on the CRPs according to their degradation from high to low as
G∗1, G∗2, . . . , G∗M with g∗i,1 > g∗i,2 > . . . > g∗i,M. The minimum l∗i gates are found to
meet the criterion
$$
\Gamma_{CRP_i} - \left(g^*_{i,1} + \cdots + g^*_{i,l^*_i}\right) < \Gamma_{clk} - \Gamma_m,
$$
and the l∗i gates (G∗1, G∗2, . . . , G∗l∗i) are identified as CRGs for path Pi; the union of
all these CRGs is the CRG set U∗ for the design. In contrast, our LP technique offers
several advantages over [27]: (1) our flow is extendable to aging effects beyond NBTI and
HCI; (2) our analysis of CRGs evaluates not only their own aging degradation, but also
their contribution to CRP degradation; and (3) our aging analysis flow enables xj to
incorporate topology-related information, such as capacitive load.
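For comparison, the per-path criterion of [27], as described above, can be sketched in Python. The function name is hypothetical, and the data are the four violating paths of Figure 2.5, with aged delays from Figure 2.5(b) and gate degradations from Figure 2.5(a). Ranking by degradation alone compensates three gates (G1, G4, G9) here, whereas G9 and G7 suffice for the same paths in Figure 2.5(d).

```python
from typing import Dict, List


def crgs_by_degradation(delay: float, g: Dict[int, float], spec: float) -> List[int]:
    """[27]-style: take gates in descending g*_ij until delay - sum < spec."""
    remaining = delay
    chosen: List[int] = []
    for gate in sorted(g, key=g.get, reverse=True):
        if remaining < spec:              # strict inequality, as in the criterion above
            break
        remaining -= g[gate]
        chosen.append(gate)
    if remaining >= spec:
        raise ValueError("spec cannot be met even if every gate is compensated")
    return chosen


# Violating paths of Figure 2.5: aged delay and per-gate degradation (unit delays)
VIOLATING = {
    "P2": (16.72, {1: 0.45, 3: 0.30, 7: 0.35, 9: 0.38, 11: 0.24}),
    "P3": (16.63, {1: 0.45, 3: 0.30, 7: 0.35, 9: 0.38, 12: 0.15}),
    "P11": (16.71, {2: 0.22, 4: 0.40, 5: 0.12, 7: 0.35, 9: 0.38, 11: 0.24}),
    "P12": (16.62, {2: 0.22, 4: 0.40, 5: 0.12, 7: 0.35, 9: 0.38, 12: 0.15}),
}

if __name__ == "__main__":
    u_star = set()
    for delay, g in VIOLATING.values():
        u_star.update(crgs_by_degradation(delay, g, spec=16.0))
    print(sorted(u_star))  # [1, 4, 9]
```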
2.6 Results and Analysis
We use the open-source Nangate 45nm standard-cell library to evaluate our proposed
APD analysis flow. Benchmark circuit s38417 from the ISCAS'89 family is simulated with
three different input workload (WL) values (0.25, 0.5, 0.75). At each WL, each primary
input pin has a specific input sequence when running RTL simulation. Our in-house
VCS-VPI program is used to collect the switching information when applying the patterns.
In total, 20,000 random input patterns are generated, and over 84M switchings are
detected in each simulation run. The top 81 critical paths are extracted, and their
simulation results are provided for analysis. For the gates on each path, their intrinsic
stress probabilities, capacitive load considering circuit connection and topology, and
input slew rate are collected.
Tables 2.3–2.5 list the results on three critical paths (T = 125°C) with different
workload values (WL = 0.25, 0.50, 0.75), where the results from the APD flow are
compared with those obtained from HSPICE MOSRA simulations. The on-path and
off-path SPs are obtained from circuit simulation. For all our results, different input
instances are considered as well. For each path in the tables (Paths 1, 2, and 3), the
second column shows the HSPICE simulation results, and the third column shows the
delay obtained using our APD flow considering the aging conditions (stress probabilities,
capacitive load, temperature, slew rate, voltage, etc.). The absolute error Eabs normalizes
the path delay difference between APD and HSPICE MOSRA by the HSPICE MOSRA
path delay, while the relative error Erel normalizes the same difference by the lifetime
degradation from HSPICE MOSRA. Eabs and Erel are listed in Columns 4 and 5,
respectively. For each path, Eabs and Erel are calculated as:
$$
E_{abs} = \frac{\left|(\Gamma_{HSPICE})_{t_i} - (\Gamma_{APD})_{t_i}\right|}{(\Gamma_{HSPICE})_{t_i}} \times 100 \qquad (2.29)
$$
$$
E_{rel} = \frac{\left|(\Gamma_{HSPICE})_{t_i} - (\Gamma_{APD})_{t_i}\right|}{(\Gamma_{HSPICE})_{t_{lifetime}} - (\Gamma_{HSPICE})_{t_0}} \times 100 \qquad (2.30)
$$
where (ΓHSPICE)ti is the path delay from HSPICE simulation at time point ti and
(ΓAPD)ti is the delay from our APD flow. From Tables 2.3 to 2.5, the average Erel of our
APD flow with respect to HSPICE MOSRA is below 4.2% for all three paths (at most
4.1527%, for Path 1), and the average Eabs is below 0.54%. As discussed, for higher
accuracy, one effective solution is to increase the parameter granularity when generating
the LUT, i.e., to use smaller step sizes for temperature, time points, slew rates, etc. This
increases the size of the LUT, but the flow still remains significantly faster than HSPICE
MOSRA.
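Equations (2.29) and (2.30) can be checked against Table 2.3 with a few lines of Python. The sketch takes the last measured point (t5 = 3.0E+8 s) as the lifetime point in Equation (2.30), which is an assumption. It uses error magnitudes, matching the non-negative entries in Tables 2.3–2.5. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def apd_errors(hspice: Sequence[float], apd: Sequence[float]) -> List[Tuple[float, float]]:
    """Per-time-point (Eabs, Erel) in percent, following Eqs. (2.29)-(2.30).

    Magnitudes are used; the last HSPICE point is taken as the lifetime delay.
    """
    lifetime_shift = hspice[-1] - hspice[0]   # lifetime degradation from HSPICE
    errors = []
    for h, a in zip(hspice, apd):
        diff = abs(h - a)
        errors.append((diff / h * 100.0, diff / lifetime_shift * 100.0))
    return errors


if __name__ == "__main__":
    # Path 1 of Table 2.3 (ns), time points t0 ... t5
    hspice = [4.0179, 4.3459, 4.4363, 4.4912, 4.5299, 4.5862]
    apd = [4.0106, 4.3833, 4.4703, 4.5139, 4.5584, 4.5979]
    errs = apd_errors(hspice, apd)
    avg_abs = sum(e for e, _ in errs) / len(errs)
    avg_rel = sum(r for _, r in errs) / len(errs)
    print(f"{avg_abs:.4f} {avg_rel:.4f}")  # 0.5331 4.1527, the Avg. Err. row
```

Because Erel divides by the total lifetime shift (about 0.57 ns here) rather than the full path delay (about 4 ns), it is roughly eight times larger than Eabs for the same delay difference.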
Table 2.3: Accuracy analysis for APD compared to HSPICE (T = 125°C, WL = 0.25)
Path 1
Aging Time (s)    HSPICE (ns)   APD (ns)   Eabs (%)   Erel (%)
t0 = 0            4.0179        4.0106     0.1817     1.2845
t1 = 0.6E+8       4.3459        4.3833     0.8606     6.5810
t2 = 1.2E+8       4.4363        4.4703     0.7664     5.9828
t3 = 1.8E+8       4.4912        4.5139     0.5054     3.9944
t4 = 2.4E+8       4.5299        4.5584     0.6292     5.0150
t5 = 3.0E+8       4.5862        4.5979     0.2551     2.0588
Avg. Err.         —             —          0.5331     4.1527
Table 2.4: Accuracy analysis for APD compared to HSPICE (T = 125°C, WL = 0.50)
Path 2
Aging Time (s)    HSPICE (ns)   APD (ns)   Eabs (%)   Erel (%)
t0 = 0            4.0088        3.9991     0.2420     1.6573
t1 = 0.6E+8       4.3545        4.3763     0.5006     3.7246
t2 = 1.2E+8       4.4428        4.4676     0.5582     4.2371
t3 = 1.8E+8       4.4895        4.5177     0.6281     4.8180
t4 = 2.4E+8       4.5524        4.5636     0.2460     1.9135
t5 = 3.0E+8       4.5941        4.6022     0.1763     1.3839
Avg. Err.         —             —          0.3919     2.9557
Table 2.5: Accuracy analysis for APD compared to HSPICE (T = 125°C, WL = 0.75)
Path 3
Aging Time (s)    HSPICE (ns)   APD (ns)   Eabs (%)   Erel (%)
t0 = 0            4.0131        4.0139     0.0199     0.1384
t1 = 0.6E+8       4.3584        4.3834     0.5736     4.3260
t2 = 1.2E+8       4.4452        4.4664     0.4769     3.6685
t3 = 1.8E+8       4.4993        4.5121     0.2845     2.2149
t4 = 2.4E+8       4.5351        4.5575     0.4939     3.8761
t5 = 3.0E+8       4.5910        4.6044     0.2919     2.3187
Avg. Err.         —             —          0.3568     2.7571
To further verify the accuracy of our flow, the top 81 critical paths from s38417 are
simulated in HSPICE and estimated with the APD flow under three different workloads
(0.25, 0.50, 0.75). The measurements are done at six stress time points from t = 0 to
t = 3.0E + 08 seconds (around 10 years lifetime). For both HSPICE simulation and
our APD flow, the parasitic parameters and signal probabilities are all extracted from
the gate-level circuit and its respective physical design. The correlation between the path
delays obtained from HSPICE (x-axis) and APD (y-axis) is shown in Figure 2.6, where
each point represents a critical path. The diagonal (red) line corresponds to perfect
correlation; the closer a point lies to this line, the more accurate the result. The results
show a very high correlation between APD and HSPICE, as listed in Table 2.6. Over the
three workloads, the average correlation between HSPICE simulation and APD is as high
as 0.9960.
Fig. 2.6: Correlation analysis for different workloads (benchmark s38417, temperature 125°C): APD path delay (y-axis) versus HSPICE simulation (x-axis) at stress times 0, 0.6E+08, 1.2E+08, 1.8E+08, 2.4E+08, and 3.0E+08 s; (a) Case 1: WL = 0.25, (b) Case 2: WL = 0.5, (c) Case 3: WL = 0.75.
Table 2.6: Correlation analysis for APD compared to HSPICE (125°C) with different WLs
Workload       0.25     0.50     0.75
Correlation    0.9949   0.9966   0.9964
Avg. Corr.     0.9960
One important issue in performing timing analysis considering degradation on
large-scale industry designs is the very large number of critical or near-critical paths. In
our flow, as described in Step 1 in Section 2.4, we use a slack value of 20% to select the long
paths for further aging analysis, for which CPU runtime may still be an issue. In the
following, we present our analysis of overlapping paths. Paths that share a large number
of gates degrade similarly, which means that aging may even decrease their delay
difference over time. In contrast, paths with no or little overlap degrade very differently.
The top 20% of paths extracted by our flow can be further divided into path clusters
according to their gate-instance overlap, and degradation and delay analysis is then
conducted on several representative paths selected from each cluster. Note that grouping
the paths and selecting the representative paths are beyond the scope of this research.
In our path overlap analysis, we randomly select 100 paths from the top 20% of paths of
s38417. Their delays and degradations are obtained using HSPICE MOSRA. The delay
differences at time 0 and at 3.0E+08 seconds are calculated for each pair of paths,
together with the overlap between the two paths. We define (∆Γij)0 and (∆Γij)t as the
delay difference between Path i and Path j at time 0 and at t = 3.0E+08 seconds,
respectively. The crossover CRij between Path i and Path j can then be evaluated using
the signum function as:
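$$
CR_{ij} = \operatorname{sign}\left[(\Delta\Gamma_{ij})_0\right] \times \operatorname{sign}\left[(\Delta\Gamma_{ij})_t\right] =
\begin{cases}
1, & (\Delta\Gamma_{ij})_0 > 0 \cap (\Delta\Gamma_{ij})_t > 0 \ \text{ or } \ (\Delta\Gamma_{ij})_0 < 0 \cap (\Delta\Gamma_{ij})_t < 0 \\
0, & (\Delta\Gamma_{ij})_0 = 0 \ \text{ or } \ (\Delta\Gamma_{ij})_t = 0 \\
-1, & (\Delta\Gamma_{ij})_0 < 0 \cap (\Delta\Gamma_{ij})_t > 0 \ \text{ or } \ (\Delta\Gamma_{ij})_0 > 0 \cap (\Delta\Gamma_{ij})_t < 0
\end{cases}
\qquad (2.31)
$$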
where CRij = −1 indicates a crossover between Path i and Path j from time 0 to
3.0E+08 seconds due to aging degradation. Using Equation (2.31), 281 crossovers are
detected among the 100 selected paths over 10 years of degradation. For all the path
pairs with a crossover, we investigate their delay difference and path overlap. The
results are shown in Figure 2.7, where each point corresponds to the overlap between
two paths. The x-axis shows the path overlap (in percentage) between the two paths,
and the y-axis shows their delay difference at time 0 (blue dots) and at time 3.0E+08 s
(red rectangles). The results show that paths with more overlap (e.g., ≥ 70%) have a
smaller delay difference, while paths with less overlap (e.g., ≤ 30%) have a larger delay
difference. Meanwhile, aging degradation decreases the delay difference between paths
with large overlap.
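The crossover test of Equation (2.31) only needs the signs of the pairwise delay differences. The Python sketch below counts crossovers over all path pairs. Its input delays are hypothetical, and the overlap metric of Figure 2.7 is not computed because its exact definition is not given here.

```python
from itertools import combinations
from typing import Sequence


def sign(v: float) -> int:
    """Signum function: 1, 0, or -1."""
    return (v > 0) - (v < 0)


def crossover(d0: Sequence[float], dt: Sequence[float], i: int, j: int) -> int:
    """Eq. (2.31): CR_ij = sign[(ΔΓ_ij)_0] x sign[(ΔΓ_ij)_t]."""
    return sign(d0[i] - d0[j]) * sign(dt[i] - dt[j])


def count_crossovers(d0: Sequence[float], dt: Sequence[float]) -> int:
    """Number of path pairs whose delay order flips between time 0 and t."""
    return sum(1 for i, j in combinations(range(len(d0)), 2)
               if crossover(d0, dt, i, j) == -1)


if __name__ == "__main__":
    # Hypothetical delays (ns) of three paths at time 0 and after aging
    d0 = [4.00, 4.02, 4.05]
    dt = [4.60, 4.58, 4.70]
    print(count_crossovers(d0, dt))  # 1: paths 0 and 1 swap order
```

For 100 paths, this examines 100 × 99 / 2 = 4950 pairs.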
Fig. 2.7: Path delay difference and overlap analysis considering degradation due to aging (x-axis: overlap (%); y-axis: delay difference (×10⁻¹¹ s); markers for t = 0 and t = 3.0E+08 s).
At time 0, the low-overlap paths (e.g., ≤ 30%) have delay differences between
−23.3 ps and +30 ps, while the high-overlap paths (e.g., ≥ 70%) have delay differences
between −13.7 ps and +11.7 ps. After 10 years of degradation, however, the delay
differences of the low-overlap paths range from about −28.5 ps to +17.4 ps, while those of
the high-overlap paths range from −8.6 ps to +6.7 ps. Thus, the delay of low-overlap
paths deviates significantly over time, while high-overlap paths degrade almost
identically. This allows path clustering to further reduce the number of
critical-reliability paths for aging analysis.
Fig. 2.8: Computational complexity comparison between HSPICE MOSRA and APD flow for different benchmark circuits (speedup HSPICE/APD for s5378, s9234, s13207, s15850, s38417, and s38584).
To ensure that the APD flow is practical for large industry designs, one major objective
is to reduce the CPU runtime significantly compared with HSPICE or any other
reliability simulation tool. Our experiments are run on a 32-bit Windows desktop with
an Intel Core 2 Duo CPU at 3.00 GHz and 4 GB of memory. For one critical path from s38417,
HSPICE simulation for 10 time points takes about 7 minutes, while APD takes only
about 1.75 seconds. Similar results have been observed for all the other critical paths
among the top 81. Further, we compare the computational complexity of aging analysis
on a large number of paths extracted from several benchmark circuits. The results in
Figure 2.8 show the speedup over HSPICE MOSRA. On average, we have observed a
speedup of over 244X, which indicates that our APD flow is practical for large industry
designs.
Fig. 2.9: Overall computational time comparison between HSPICE MOSRA and APD flow considering path count (x-axis: path count n; y-axis: overall time (s); the two curves cross at n = 374).
Specifically, we also measure the analysis time of HSPICE MOSRA and our APD flow
on the same computer. On average, degradation analysis of one path takes 464.86 s with
HSPICE MOSRA and 1.91 s with the APD flow. In our APD flow, the most
time-consuming step is generating the re-characterized aging-aware delay library LUT,
which takes about 172800 s (48 hours). Thus, to compare the computational efficiency of
HSPICE MOSRA and APD for degradation analysis as a function of path count, the
overall time needed can be expressed as:
$$
(\text{Time}_{overall})_{HSPICE} = 464.86 \times n \ \text{(s)} \qquad (2.32)
$$
$$
(\text{Time}_{overall})_{APD} = 172800 + 1.91 \times n \ \text{(s)} \qquad (2.33)
$$
where n is the path count for degradation analysis. From Equations (2.32) and (2.33),
the APD flow becomes faster than HSPICE MOSRA when n ≥ 374, as is clear in
Figure 2.9.
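The break-even path count follows from requiring Equation (2.33) to be no larger than Equation (2.32), i.e., 172800 + 1.91n ≤ 464.86n, so n ≥ 172800 / 462.95 ≈ 373.3. The small Python helper below reproduces the n = 374 threshold; its names are hypothetical.

```python
import math


def overall_time(n: int, fixed: float, per_path: float) -> float:
    """Eqs. (2.32)-(2.33): one-time cost plus per-path cost, in seconds."""
    return fixed + per_path * n


def break_even_paths(lut_time: float, apd_per_path: float, spice_per_path: float) -> int:
    """Smallest integer n with lut_time + apd_per_path * n <= spice_per_path * n."""
    gain = spice_per_path - apd_per_path
    if gain <= 0:
        raise ValueError("APD must be faster per path for a break-even point to exist")
    return math.ceil(lut_time / gain)


if __name__ == "__main__":
    n = break_even_paths(172800.0, 1.91, 464.86)
    print(n)  # 374
    print(overall_time(n, 172800.0, 1.91) <= overall_time(n, 0.0, 464.86))  # True
```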
Moreover, our current aging-aware delay library LUT is generated on the same
PC (32-bit Windows desktop with an Intel Core 2 Duo CPU at 3.00 GHz and 4 GB of
memory). We expect that the generation time will be dramatically reduced if the library
is re-characterized on a high-performance multicore server with large memory using
multithreaded programming techniques. Additionally, the re-characterization of the
library is a one-time effort: once the LUT is generated, all designs based on the same
library can use it without repeating the re-characterization process. Furthermore,
degradation analysis of a large-scale industry design for actual implementation can
easily involve far more than 374 critical paths. Last but not least, our APD flow uses
conventional ASIC tools, which makes it easy to integrate into current industry design
flows. All of these factors strengthen the advantage of the APD flow over HSPICE
MOSRA and mathematical models.
In the following, we evaluate the efficiency of our improved APD flow for CRP
and CRG identification, and discuss the results of gate sizing on the CRGs. In the first
simulation, the CRGs in the benchmark s38417 are identified for different timing margins
(Γm) over ten years considering NBTI and HCI effects under three workload scenarios
(WL = 0.25, 0.5, and 0.75). The proposed flow is run to find the CRG set for the s38417
benchmark circuit; for simplicity, gi,j = 0 is still assumed when deciding the critical
length li for CRP Pi. The results, shown in Figure 2.10, indicate that a more stringent
timing margin leads to a larger CRG set. For example, at WL = 0.5, 176 gates are
identified as CRGs when the timing margin Γm is equivalent to 10% of the largest delay
at time 0, whereas 493 gates are identified as CRGs when Γm is equivalent to 5%.
The two timing margin values 5% and 10% are also used to repeat the simulations
with the stress time changing from 0.6E + 08s (≈ 2 years) to 3.0E + 08s (≈ 10 years).
Fig. 2.10: Number of CRGs for different timing margins (when T = 75°C, WL = 0.5, and t = 10 years) (x-axis: timing margin Γm (%); y-axis: number of CRGs; curves for WL = 0.25, 0.50, and 0.75).
Results in Figure 2.11 indicate that more gates are identified as CRGs when the target
lifetime is longer. Meanwhile, to satisfy the same lifetime specification, a smaller timing
margin leads to more identified CRGs. The results also demonstrate that circuit lifetime
can be extended with only a small area overhead, because the minimum gate set is
selected for aging compensation and resources are allocated to the CRGs only.
Fig. 2.11: Number of CRGs for different stress times (when T = 75°C, WL = 0.5) (x-axis: stress time (10⁸ s); y-axis: number of CRGs; curves for Γm = 10% and Γm = 5%).
Further simulations are conducted for several benchmark circuits with three timing
margin values (2%, 7%, 12%). The results are shown in Table 2.7. Columns 2 and 3 list
the total gate and flip-flop counts. Under each timing margin category, the first and
second columns list the numbers of PCPs and CRPs, respectively, the third column lists
the number of CRGs obtained using our flow, and the fourth column shows the area
overhead due to sizing the CRGs. Figures 2.12 and 2.13 demonstrate the effectiveness of
our CRP and CRG identification. Gate sizing is conducted on the CRGs identified by our
flow. In each figure, the number of CRPs and the number of CRGs identified at each
time point are presented in the form CRP/CRG. The horizontal line (in green) is the
timing specification for the design at that time point, including the timing margin. The
longer the stress time, the more CRPs and CRGs are identified, indicating that a longer
lifetime requirement leads to more CRGs. In Figure 2.12, Γm = 5%, while in Figure 2.13,
Γm = 10%. Before gate sizing, the CRP delays (in red) lie above the horizontal line (in
green); without exception, they violate the timing specification at the corresponding time
point. After sizing the CRGs, the CRP delays are effectively reduced and fall below the
horizontal line (in blue). For demonstration purposes, the CRP delays before and after
gate sizing at each time point are not displayed on the same vertical line. In Figure 2.14,
the stress time is fixed while Γm varies, which shows that a more stringent timing margin
leads to more CRGs. From the results, we can see that aging compensation on the CRGs
can effectively reduce the aging degradation so that the design meets the timing
specification over its lifetime of operation.
Fig. 2.12: CRP delay comparison before/after gate sizing on the CRGs with varying stress time (when Γm = 5%, WL = 0.5, and T = 75°C) (x-axis: stress time (10⁸ s); y-axis: CRP delay before/after gate sizing (s); CRP/CRG annotations: 546/170, 1092/228, 1749/403, 2790/486, 3139/493).
These experimental results demonstrate that circuit performance degradation can
be minimized if early prediction and optimization are conducted during the design stage,
and that circuit aging degradation can be compensated for by focusing on a minimum
set of CRGs. The experimental results also show that we can efficiently identify a small
number of CRPs and CRGs. After identifying the CRGs, other aging compensation
mechanisms can also be utilized more effectively with lower area and power overheads.
Table 2.7: CRP and CRG identification for three timing margins with 10 years of NBTI and HCI degradation (T = 75°C, WL = 0.5).

                            Γm = 2%                        Γm = 7%                        Γm = 12%
Circuit  # Gates  # FFs   PCP   CRP   CRG  Area_o(%)   PCP   CRP   CRG  Area_o(%)   PCP   CRP   CRG  Area_o(%)
s5378       1324    153    94    87   106    1.1479     83    52    41    0.5000     45    16    19    0.2592
s9234       1120    125    85    49    29    0.4934     50    26    24    0.4464     25    10    10    0.1645
s13207      1477    245    19    13    49    0.5550      7     4    21    0.1682      4     2     5    0.0168
s15850      3338    452   890   865    21    0.1936    289   273    19    0.2005     34    33    11    0.1037
s38417     10221   1523  9671  5616   493    1.1252   4927  1922   404    0.6881   1689   485   162    0.2677
PCP: potential critical path; CRP: critical-reliability path; CRG: critical-reliability gate; Area_o: area overhead; FF: flip-flop; Γm: timing margin.

Fig. 2.13: CRP delay comparison before/after gate sizing on the CRGs with varying stress time (when Γm = 10%, WL = 0.5, and T = 75°C). [Plot: CRP delay before/after gate sizing (s, ×10^-9) vs. stress time (10^8 s); CRP/CRG counts at the five time points: 64/64, 215/88, 413/136, 794/161, 881/176.]

Fig. 2.14: CRP delay comparison before/after gate sizing on the CRGs with varying Γm (when WL = 0.5, T = 75°C, and t = 10 years). [Plot: CRP delay before/after gate sizing (s, ×10^-9) vs. timing margin Γm from 1% to 18%; CRP/CRG counts for Γm = 1%, 2%, ..., 18%: 6735/572, 5616/493, 4641/491, 3809/491, 3139/493, 2477/418, 1922/404, 1485/340, 1151/314, 881/176, 657/184, 485/162, 317/126, 238/95, 172/107, 112/81, 64/29, 42/17.]

2.7 Summary
In this chapter, a comprehensive analysis of the impact of different parameters on circuit
aging was proposed. A fast and accurate aging-aware path delay (APD) analysis flow was
also proposed, which can easily take into account all the circuit parameters. Results
show that APD provides an accuracy with ≤ 5% relative error and ≤ 0.7% absolute
error. The correlation analysis shows a ≥ 0.99 correlation coefficient value on average
between the results from HSPICE MOSRA simulation and APD flow. The CPU runtime
is reduced by more than 244X. This demonstrates that our flow is applicable to modern
circuits and can effectively identify the long paths that could potentially become critical
in the field, so as to help perform better margining before design tape-out. In addition, a
new technique was proposed to quickly identify CRPs and CRGs. Aging compensation can
therefore be focused only on the minimum number of CRGs, as they contribute the most
to path delay degradation. The flow can be easily integrated into conventional industrial
IC design flows. Experimental results indicate that our technique is able to identify the
minimum number of CRGs, minimize the area overhead for design margining, and ensure
performance throughout the circuit's lifetime.
Chapter 3
Representative Critical Reliability Paths
3.1 Introduction
As CMOS feature size shrinks into the deep nanometer regime, VLSI circuits are
facing increasing reliability degradation challenges [38] [48] [49] caused by negative/positive
bias temperature instability (NBTI/PBTI), hot carrier injection (HCI), and
time-dependent dielectric breakdown (TDDB) [50], which cause parametric shifts
and eventually device failure. One-time worst-case guardbands at the design stage, e.g.,
clock frequency reduction, supply voltage increase, and gate sizing, are traditionally
introduced as a solution. However, they are usually over-pessimistic and inefficient [24].
Although there are many analytical aging models in the literature [41] [56], actual aging
can still deviate considerably from the prediction, because workload plays an important
role in circuit aging and makes performance degradation unpredictable. Thus, it is necessary
to monitor aging on the chip and react dynamically.
Many existing dynamic solutions are based on on-chip structures that monitor the
aging of a stand-alone circuit (such as ring-oscillators). In [54], a digital reliability
monitor measures the frequency of stressed ring-oscillators. The detected frequency
degradation can then be used to estimate the aging of the functional circuit. [60]
proposed the use of tunable replica circuits containing an error-detection scheme to adapt
to variations of aging, voltage, and temperature. These methods offer small area overhead
and do not introduce any performance penalty to the functional circuit. However, their
major limitations are that ring-oscillators do not represent the structure and performance
of the functional circuits well, and that replica circuits can only monitor the selected
paths while leaving all the other paths unconsidered. Moreover, workload is hard to
predict at the design stage and may largely affect the actual aging. Thus, aging
estimation, and hence the calibration made accordingly, can be rather inaccurate.
Meanwhile, other solutions try to monitor the aging of the functional circuit directly
by measuring critical path delay. In [52] [53], on-line aging sensors are designed and
integrated into flip-flops on the functional paths. In [52] [53], a time window is generated
to capture late transitions that indicate serious aging, and thus failure can be predicted.
However, compensating for aging only once the degradation is already serious may be
too late for optimal compensation. In [52] [53], aging sensors were proposed to directly
measure path delay and degradation during normal operation. However, these two sensors
require modification of the functional paths, especially the flip-flops, which designers are
often reluctant to do. In [51], off-line self-test is scheduled periodically on selected cores
in a multi-core system using test patterns pre-stored in off-chip nonvolatile memory.
However, normal execution of the core under test has to be interrupted during the test,
and the memory required for the pre-stored patterns is non-trivial.
In this chapter, we propose a novel methodology to accurately evaluate aging on
the chip. Representative Critical Reliability Paths (RCRPs) are synthesized as a stand-alone
circuit to represent the aging of the critical reliability paths, which are defined as the
paths that can potentially become critical at some time point in the field due to aging.
Both the topology and the workload of the critical reliability paths are captured by the
RCRPs. Thus, by monitoring the delay degradation of the RCRPs, the aging of the critical
reliability paths in the functional circuit, including recovery effects (e.g., in sleep mode),
can be evaluated efficiently and accurately with no impact on normal operation. The
aging evaluation results can then be exploited to guide calibration. Note that RCRP
is different from the “representative critical path” in the literature (e.g., [62] [63]) for
post-silicon delay prediction. Rather than trying to accurately estimate as many target
paths as possible, only at time zero, for manufacturing test, RCRPs aim to always track
the largest delay among the critical reliability paths throughout the lifetime of operation
under aging. This is more important, as only the largest delay determines the functional
speed and correctness, and whether performance calibration is needed. Once aging is
evaluated by the RCRPs, a dynamic reliability management (DRM) unit such as [55] can
use the result to make reliability-control decisions. Adaptive body bias (ABB) [64] and
adaptive supply voltage (ASV) [65], traditionally used to compensate for process
variations, can then be applied to calibrate the circuit performance [23]. Note that the
RCRP circuit imposes no constraint on physical design, does not affect the functional
circuit's performance, and can be placed anywhere in the circuit layout.
The rest of the chapter is organized as follows. Section 3.2 presents the basic
concept of RCRP and aging evaluation. In Sections 3.3 and 3.4, problem formulation
and RCRP synthesis are presented, respectively. Section 3.5 discusses the simulation
results and Section 3.6 summarizes this chapter.
3.2 Concept of RCRP
We use Figure 3.1 to demonstrate the basic concept of RCRP. The objective is to use
a small number of stand-alone paths to statistically estimate the delay and aging of a
large number of critical reliability paths in the chip. More importantly, the largest delay
among the critical reliability paths at any time point during the entire lifetime can be
accurately estimated from the measurements of these stand-alone paths. Suppose that,
among many functional paths, paths p1, p2, and p3 stand out as three critical reliability
paths at different time points, e.g., path p1 has the largest delay from time 0 up to
time t1, path p2 has the largest delay during [t1, t2], and path p3 becomes the most
critical after time t2. This phenomenon results from their different aging rates as they
may have different structures (gates, interconnects, load capacitance, etc.) and process
variations, and experience different workloads. As the workload is usually unknown a
priori, it is extremely difficult to predict circuit aging and criticality at the design stage.
We design RCRPs so that, unlike many existing on-chip measurement structures,
they capture both the circuit structures and the workload of the critical reliability paths.
As with functional paths, the RCRPs' delay can be accurately measured by on-chip
structures such as [57] [58] [59], to name a few. As a result, the largest delay estimated
by the RCRPs always closely tracks that of the critical reliability paths, which enables
adaptive calibration methods such as ABB and ASV to be performed in a timely manner.
This is a cost-efficient solution compared with inserting monitors all over the critical
reliability paths, as only a small number of RCRPs need to be implemented and
monitored, and the impact on the critical reliability paths is minimized.

Fig. 3.1: Demonstration of the RCRP concept.
Note that the difference between the critical reliability paths' delay and the RCRPs'
estimation at time 0 can be used to adjust the RCRP estimation at future time points,
compensating for process variations and the impact of interconnects, as these effects
remain the same at time 0 and at time t while the circuit ages. Furthermore, this
difference will also be used to update the triggering threshold for performance calibration
in the ASV/ABB controller (the implementation of the ASV/ABB controller is beyond the
scope of this chapter). This will be further discussed in Section 3.3. We acknowledge that
RCRPs cannot capture effects such as the crosstalk that critical paths experience;
however, temperature and power supply noise in the circuit can be taken into account by
the RCRPs. Analyzing the impact of these two effects on RCRPs is part of our future work.
3.3 Problem Formulation
Suppose there are m critical reliability paths, each of which consists of delay segments,
where a segment spans from an on-path gate input to its output plus the interconnect to
the next gate. If there are n unique segments in total on these critical reliability paths,
we can use an m × n matrix $P = [p_1, p_2, \ldots, p_m]^T$ to denote the paths, where each
row vector $p_i$ refers to a specific path in the group and consists of 1s and 0s indicating
whether or not each segment is on the path. For example, if $p_i = [0, 0, 1, \ldots]$,
segments 1 and 2 are not on path $p_i$ while segment 3 is. If the delay of segment j is
$x_j$, the vector $X = [x_1, x_2, \ldots, x_n]^T$ denotes the segment delays. Thus, if the
delay of each path $p_i$ is $d_i$, the delays of the m paths can be expressed as

$$d = [d_1, d_2, \ldots, d_m]^T = P X.$$
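The notation above can be made concrete with a minimal Python/NumPy sketch. It assumes each path is given as a set of 0-based segment indices (the text numbers segments from 1). The `path_matrix` helper and the toy delays are hypothetical illustrations, not data from the experiments.

```python
import numpy as np

def path_matrix(paths, n_segments):
    """Build the m x n 0/1 path matrix P: P[i, j] = 1 iff segment j lies on path i."""
    P = np.zeros((len(paths), n_segments))
    for i, segments in enumerate(paths):
        P[i, sorted(segments)] = 1.0
    return P

# Hypothetical toy example: m = 3 paths over n = 5 unique segments.
paths = [{0, 1, 2}, {0, 3}, {2, 3, 4}]
P = path_matrix(paths, n_segments=5)

X = np.array([10.0, 20.0, 15.0, 30.0, 5.0])  # hypothetical segment delays x_j (ps)
d = P @ X                                     # path delays, d = P X
print(d)  # [45. 40. 50.]
```

Because each row of P only selects segments, any path delay is the sum of the delays of the segments it shares with other paths. The estimation below relies on this shared-segment structure.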
We then build r RCRPs based on the m critical reliability paths in P (r ≪ m), i.e.,
$P_R = [p'_1, p'_2, \ldots, p'_r]^T$, where each $p'_j$ is a row of P. Similarly, the delay
measurements of the RCRPs can be expressed as $d_R = [d'_1, d'_2, \ldots, d'_r]^T = P_R X$.
Thus, we can estimate the delay d based on $d_R$. The delay estimate $\hat{d}$ and the
estimation error Error are calculated, respectively, as

$$\hat{d} = P P_R^T (P_R P_R^T)^{-1} d_R, \tag{3.1}$$

$$\mathrm{Error} = \hat{d} - d = P\left[P_R^T (P_R P_R^T)^{-1} d_R - X\right] = P\left[P_R^T (P_R P_R^T)^{-1} P_R - I\right] X, \tag{3.2}$$

where $(\cdot)^{-1}$ denotes the matrix inverse and I denotes the n × n identity matrix, which
has ones on the main diagonal and zeros elsewhere. For the purpose of accurate aging
evaluation, the objective is to minimize the estimation error at all time points t under
various work conditions (e.g., workload, temperature, etc.) for the given area budget.
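Equations (3.1) and (3.2) can be checked numerically with a short NumPy sketch. It assumes $P_R$ has full row rank, so that $P_R P_R^T$ is invertible. It calls `np.linalg.solve` instead of forming the inverse explicitly, which is numerically preferable but otherwise equivalent. The toy matrices are hypothetical.

```python
import numpy as np

def estimate_delays(P, P_R, d_R):
    """Eq. (3.1): d_hat = P P_R^T (P_R P_R^T)^{-1} d_R."""
    return P @ P_R.T @ np.linalg.solve(P_R @ P_R.T, d_R)

def estimation_error(P, P_R, X):
    """Eq. (3.2): Error = P [P_R^T (P_R P_R^T)^{-1} P_R - I] X."""
    n = P.shape[1]
    proj = P_R.T @ np.linalg.solve(P_R @ P_R.T, P_R)  # projector onto the row space of P_R
    return P @ (proj - np.eye(n)) @ X

# Hypothetical toy example: 3 paths, 5 segments; paths 0 and 2 serve as RCRPs.
P = np.array([[1, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 1, 1, 1]], dtype=float)
X = np.array([10.0, 20.0, 15.0, 30.0, 5.0])
P_R = P[[0, 2]]
d, d_R = P @ X, P_R @ X

d_hat = estimate_delays(P, P_R, d_R)
err = estimation_error(P, P_R, X)
print(d_hat - d)  # (numerically) zero for the two RCRP rows; nonzero for path 1
```

Rows of P that lie in the row space of $P_R$ are reproduced exactly. The error comes only from the part of a path's segment set that the RCRPs do not span.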
We build the RCRPs based on factorization of the path matrix P. Specifically,
after performing singular value decomposition, we obtain $P = U S V^T$, where U and
V are m × m and n × n orthogonal matrices, respectively, and S is an m × n diagonal
matrix consisting of the singular values $\{\lambda_i\}$ of P ranked in descending order,
i.e., $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_i \ge \lambda_{i+1} \ge \cdots \ge 0$. When the
r most significant singular values $\{\lambda_i\}_{i=1}^{r}$ are chosen, we can use QR
decomposition with column pivoting to select the corresponding row vectors, in other
words, the paths $p'_1, p'_2, \ldots, p'_r$, as RCRPs to represent the original paths in
matrix P. This mathematical tool is also used in the related work [61] [63] and in many
other applications in signal processing and statistics. According to [61], the estimation
error is bounded by a function of

$$\eta = 1 - \frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i} \lambda_i}.$$

The selected r RCRPs are then implemented as a stand-alone circuit and are used to
monitor circuit aging in the field.
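The selection step can be sketched with NumPy and SciPy. The sketch follows the standard subset-selection recipe: an SVD, followed by QR with column pivoting on $U_r^T$. It is an illustrative sketch, not the author's code. `select_rcrps` is a hypothetical helper. `scipy.linalg.qr(..., pivoting=True)` returns the pivot order of the m columns of $U_r^T$, i.e., of the m paths. r is assumed not to exceed the rank of P.

```python
import numpy as np
from scipy.linalg import qr

def select_rcrps(P, r):
    """Pick r representative rows (paths) of P and report eta."""
    U, s, _ = np.linalg.svd(P, full_matrices=False)  # s: singular values, descending
    U_r = U[:, :r]                                    # m x r leading left singular vectors
    _, _, piv = qr(U_r.T, pivoting=True)              # pivot order over the m paths
    eta = 1.0 - s[:r].sum() / s.sum()                 # eta = 1 - sum_{i<=r} l_i / sum_i l_i
    return np.sort(piv[:r]), eta

# Hypothetical random 0/1 path matrix: 40 paths over 25 segments.
rng = np.random.default_rng(0)
P = (rng.random((40, 25)) < 0.3).astype(float)
idx, eta = select_rcrps(P, r=4)
print(idx, round(eta, 3))
```

The pivoting picks the rows of $U_r$ that are most linearly independent, so the chosen paths jointly span as much of the dominant r-dimensional subspace as possible. η then measures the fraction of the singular-value mass that is left out.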
Note that, although RCRP synthesis is also based on matrix factorization, RCRP
is different from the representative path method proposed in [63], which targets
post-silicon path delay prediction at time 0 under process variations. In contrast,
RCRPs aim at capturing the aging effect throughout the lifetime of operation, since the
runtime workload is unknown at the design stage. More importantly, rather than trying
to accurately estimate most target paths, the RCRPs focus on the capability of
continuously tracking the largest delay among the critical reliability paths, as shown in
Figure 3.1.
Delay measurements on RCRPs are performed at time 0 and at any desired time
point t in the field. Let the estimation errors at time 0 and time t be $\mathrm{Error}|_0$ and
$\mathrm{Error}|_t$, respectively. According to (3.2),

$$\mathrm{Error}|_0 = P\left[P_R^T (P_R P_R^T)^{-1} P_R - I\right] X|_0, \tag{3.3}$$

$$\mathrm{Error}|_t = P\left[P_R^T (P_R P_R^T)^{-1} P_R - I\right] X|_t, \tag{3.4}$$

$$X|_t = A|_t\, X|_0, \tag{3.5}$$

where $A|_t$ is an n × n diagonal matrix. Each diagonal element $a_i|_t$ is greater than 1,
indicating the aging of segment i at time t compared with time 0. As discussed in
Section 3.2, the difference between the RCRPs' estimation and the actual delay of the
critical reliability paths at time 0 is used to compensate for the systematic error. This
error is the combined effect of non-ideal matrix factorization (when η > 0), process
variations, and the delay difference between the interconnects in the RCRPs and their
counterparts in the critical reliability paths. This compensation works because the
systematic error is not likely to change over time due to aging. After $\mathrm{Error}|_0$ is used
to compensate for the systematic error, the adjusted error at time t, $\mathrm{adjError}|_t$, can be
calculated as

$$\mathrm{adjError}|_t = \mathrm{Error}|_t - \mathrm{Error}|_0 = P\left[P_R^T (P_R P_R^T)^{-1} P_R - I\right]\left[A|_t - I\right] X|_0. \tag{3.6}$$

Compared with $\mathrm{Error}|_t$ in (3.4), $\mathrm{adjError}|_t$ is expected to be much smaller,
because each diagonal element $a_i|_t - 1$ of $[A|_t - I]$ is much smaller than the
corresponding element $a_i|_t$ of $A|_t$. For example, if $a_i|_t = 1.2$, which indicates a 20%
degradation, then $a_i|_t - 1 = 0.2$. Thus, even if the error at time 0 is relatively large
when only a small number r of RCRPs is allowed by the area budget, the adjusted error
at time t can be largely reduced.
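The benefit of the time-0 compensation in (3.6) can be illustrated numerically. In this hypothetical NumPy sketch, a single RCRP (path 0) represents three toy paths, and the per-segment aging factors in `A_t` are made-up values above 1. The code checks that (3.6) equals (3.4) minus (3.3) and then compares the magnitudes of the two errors.

```python
import numpy as np

def residual_operator(P, P_R):
    """Common factor P [P_R^T (P_R P_R^T)^{-1} P_R - I] in (3.3)-(3.6)."""
    n = P.shape[1]
    return P @ (P_R.T @ np.linalg.solve(P_R @ P_R.T, P_R) - np.eye(n))

P = np.array([[1, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 1, 1, 1]], dtype=float)
P_R = P[[0]]                                    # hypothetical: one RCRP built from path 0
X0 = np.array([10.0, 20.0, 15.0, 30.0, 5.0])    # hypothetical time-0 segment delays
A_t = np.diag([1.20, 1.10, 1.15, 1.05, 1.20])   # hypothetical aging factors a_i|t > 1

M = residual_operator(P, P_R)
err_0 = M @ X0                                  # Error|0, eq. (3.3)
err_t = M @ (A_t @ X0)                          # Error|t, eqs. (3.4)-(3.5)
adj_t = M @ (A_t - np.eye(5)) @ X0              # adjError|t, eq. (3.6)

assert np.allclose(adj_t, err_t - err_0)
print(np.abs(err_t).max(), np.abs(adj_t).max())  # about 37.67 vs 2.67 for these toy values
```

With only one RCRP, the raw error at time t is large. Subtracting the time-0 error leaves only the part driven by the 5 to 20% aging increments, which is an order of magnitude smaller here.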
Note that the above-discussed error compensation method is based on the assump-
tion that delay of the functional paths at time 0, and hence Error|0, can be measured
during manufacturing test. When only the largest delay is obtained during speed bin-
ning process, the adjusted error will not be the same as the ideal form in Equation (3.6).
However, the adjusted error can still be improved for estimating the largest delay among
the critical reliability paths. In this chapter, we assume that only the largest delay is
obtained at time 0 for the error compensation. Thus, no extra constraint is imposed on
the functional path delay measurement, as the major objective of RCRPs is to estimate
the largest delay on the chip.
In summary, the problem of building r RCRPs for reliability monitoring can be
modeled as finding the optimal r within the area budget that minimizes the mean square
error (MSE) of the adjusted error, especially for the largest delay, under various
workloads and temperatures and at different time points t. That is:

$$\text{Minimize: } \mathrm{MSE}(\mathrm{adjError}|_t), \quad \forall\, p_i,\ t,\ \text{workload},\ \text{temperature}, \tag{3.7}$$

$$\text{Subject to: } \mathrm{Area}_{RCRP} \le \mathrm{Area}_{budget}. \tag{3.8}$$
3.4 RCRP Synthesis
The RCRP synthesis flow shown in Algorithm 2 can be used to synthesize the
RCRPs according to (3.7) and (3.8). The complexity mainly comes from the aging-aware
delay analysis in line 12, the singular value decomposition in line 5, and the
QR decomposition in line 7.
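The loop in Algorithm 2 can be sketched as follows. This is a simplified, hypothetical rendering, not the author's implementation:

- The aging-aware delay analysis of line 12 is replaced by a toy model in which each scenario is a vector of per-segment aging factors.
- The area model (`area_of`) is a caller-supplied stand-in.
- Only the MSE is used as the improvement test of line 13.
- The budget check is performed before a candidate r is accepted.

```python
import numpy as np
from scipy.linalg import qr

def select_rows(P, r):
    """Lines 5-9: SVD, U_r = U(:, 1:r), QR with column pivoting on U_r^T, first r pivots."""
    U, _, _ = np.linalg.svd(P, full_matrices=False)
    _, _, piv = qr(U[:, :r].T, pivoting=True)
    return piv[:r]

def mse_adjusted(P, idx, X0, scenarios):
    """Toy stand-in for line 12: MSE of adjError|t (eq. 3.6) over aging scenarios."""
    P_R = P[idx]
    n = P.shape[1]
    M = P @ (P_R.T @ np.linalg.solve(P_R @ P_R.T, P_R) - np.eye(n))
    errors = [M @ ((a - 1.0) * X0) for a in scenarios]  # (A|t - I) X|0 with A|t = diag(a)
    return float(np.mean(np.square(errors)))

def synthesize_rcrps(P, X0, scenarios, area_of, area_budget, r=1):
    """Grow r while the RCRPs fit the area budget and the MSE keeps improving."""
    best = (None, None, np.inf)                      # (r, selected rows, MSE)
    while r <= np.linalg.matrix_rank(P):
        idx = select_rows(P, r)
        if area_of(P[idx]) > area_budget:            # lines 4 and 11: budget exceeded
            break
        mse = mse_adjusted(P, idx, X0, scenarios)
        if mse >= best[2]:                           # lines 13-16: no improvement
            break
        best = (r, idx, mse)
        r += 1                                       # line 14
    return best

# Hypothetical data: 40 paths, 12 segments, 6 random aging scenarios.
rng = np.random.default_rng(1)
P = (rng.random((40, 12)) < 0.35).astype(float)
X0 = rng.uniform(5.0, 30.0, size=12)
scenarios = [rng.uniform(1.0, 1.2, size=12) for _ in range(6)]
r, idx, mse = synthesize_rcrps(P, X0, scenarios,
                               area_of=lambda P_R: P_R.sum(),  # area ~ duplicated segments
                               area_budget=30.0)
print(r, idx, mse)
```

Capping r at the rank of P keeps $P_R P_R^T$ invertible, because pivoted QR on $U_r^T$ then selects linearly independent rows.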
As SPICE simulation is extremely time-consuming, we developed an in-house tool
for aging-aware path delay analysis at design time. This tool generates a lookup table
(LUT) that stores delay degradation information for each gate type of a standard-cell
library under various operating conditions. Thus, the time-consuming SPICE simulation is
run only once for a given technology library. The LUT is then reused to quickly calculate
path delay under aging. As a result, the tool provides a speedup greater than 200X
compared with HSPICE while maintaining an accuracy of over 99%.
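The LUT idea can be illustrated with a small Python sketch in which everything is hypothetical:

- The cell names follow the Nangate naming style only for readability.
- The degradation fractions are placeholder numbers, not characterized data.
- The real tool's LUT is indexed by more operating conditions (e.g., workload) than the temperature and stress time used here.

Degradation between stored stress times is linearly interpolated with `numpy.interp`.

```python
import numpy as np

# Hypothetical LUT (placeholder values, not characterized data): fractional delay
# increase of each cell type at 75 degC, sampled at a few stress times (years).
STRESS_YEARS = np.array([0.0, 1.0, 5.0, 10.0])
AGING_LUT = {
    ("INV_X1", 75): np.array([0.00, 0.02, 0.05, 0.07]),
    ("NAND2_X1", 75): np.array([0.00, 0.03, 0.06, 0.08]),
    ("NOR2_X1", 75): np.array([0.00, 0.04, 0.08, 0.10]),
}

def aged_path_delay(segments, temp_c, years):
    """Sum fresh segment delays scaled by LUT-interpolated degradation."""
    total = 0.0
    for cell, fresh_delay in segments:
        degradation = np.interp(years, STRESS_YEARS, AGING_LUT[(cell, temp_c)])
        total += fresh_delay * (1.0 + degradation)
    return float(total)

# Hypothetical path: (cell type, fresh segment delay in ps).
path = [("INV_X1", 12.0), ("NAND2_X1", 20.0), ("NOR2_X1", 25.0)]
print(aged_path_delay(path, 75, 0.0))   # 57.0
print(aged_path_delay(path, 75, 10.0))  # about 61.94
```

The expensive step (filling the table) happens once per library. Each later path evaluation is only a few lookups and multiplications, which is where the speedup over per-path SPICE simulation comes from.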
The complexity of the singular value decomposition of the matrix $P_{m\times n}$ is $O(mn^2)$
(if m > n), and the complexity of the QR decomposition on matrix $U_r$ is $O(r^3)$. In the
proposed method, we do not try to accurately estimate the delay of all the paths but
instead focus on tracking the largest delay. Thus, r can be a very small number, i.e.,
r ≪ m, n. Therefore, the singular value decomposition is the dominant factor in the
algorithm's complexity, especially for industrial designs that have a huge number of
critical reliability paths.
To reduce the computational overhead for a large pool of critical reliability paths,
we use a divide-and-conquer approach. First, we divide the m critical reliability
paths into g groups, each of which has m/g paths. Second, we apply singular value
decomposition and obtain the intermediate RCRPs $P_R^{(i)}$ for each group i. Last, we
update the matrix P with the intermediate RCRPs and again apply the above flow
with η = 0 to ensure that no additional error is introduced. Thus, the complexity of
singular value decomposition reduces from $O(mn^2)$ to $O\big(n (m/g)^2 g\big) = O(nm^2/g)$.
The larger g is, the smaller the complexity will be. It is noteworthy that no matter how
many groups are formed and what the intermediate $P_R^{(i)}$ look like, the delay estimation
is determined only by $P_R$ of the eventual RCRPs' matrix.

Algorithm 2 RCRPs synthesis flow.
1: Input: P, Area_budget
2: Output: r, P_R
3: Initialize r to a small value
4: while Area_RCRP ≤ Area_budget do
5:    Singular value decomposition: [U, S, V] = svd(P)
6:    Use the first r columns in U to form U_r = U(:, 1 : r)
7:    QR decomposition with column pivoting, i.e., [Q, R, K] = qr(U_r^T)
8:    P_n = K^T P
9:    Use the first r rows in P_n to form P_R = P_n(1 : r, :)
10:   Synthesize RCRPs according to P_R
11:   Obtain Area_RCRP
12:   Perform aging-aware delay analysis to evaluate MSE(adjError|t) and adjError|t
13:   if either MSE(adjError|t) or adjError|t gets improved then
14:      r = r + 1
15:   else
16:      Break out of the while loop
17:   end if
18: end while
19: Return the optimal r and P_R for eventual RCRPs

Fig. 3.2: Implementation of an RCRP.
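A hypothetical NumPy/SciPy sketch of the divide-and-conquer variant follows. It reads the text as two passes:

1. Select up to `r_group` intermediate RCRPs per group.
2. Run the selection once more on the stacked intermediate rows with η = 0, i.e., keep as many rows as the rank of that stacked matrix, so that its row space, and hence the estimate, is preserved.

The group split via `numpy.array_split` and the helper names are assumptions.

```python
import numpy as np
from scipy.linalg import qr

def select_rows(P, r):
    """SVD + QR with column pivoting on U_r^T; returns r row indices of P."""
    U, _, _ = np.linalg.svd(P, full_matrices=False)
    _, _, piv = qr(U[:, :r].T, pivoting=True)
    return piv[:r]

def divide_and_conquer_select(P, g, r_group):
    """Select RCRP rows of P by splitting its m rows into g groups."""
    picked = []
    for group in np.array_split(np.arange(P.shape[0]), g):
        k = min(r_group, np.linalg.matrix_rank(P[group]))
        if k > 0:                                   # skip groups containing only empty paths
            picked.append(group[select_rows(P[group], k)])
    inter = np.concatenate(picked)                  # rows of the intermediate RCRPs
    # Final pass with eta = 0: keep rank(P[inter]) rows, preserving their row space.
    r_final = np.linalg.matrix_rank(P[inter])
    return inter[select_rows(P[inter], r_final)]

rng = np.random.default_rng(2)
P = (rng.random((60, 10)) < 0.4).astype(float)     # hypothetical 60 paths, 10 segments
idx = divide_and_conquer_select(P, g=4, r_group=3)
print(sorted(idx.tolist()))
```

Each SVD here runs on an (m/g) × n block instead of the full m × n matrix. The final pass only operates on the few intermediate rows, which matches the reduced complexity stated above.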
Once the eventual PR is obtained, the RCRPs can be implemented. The basic
implementation is shown in Figure 3.2 for demonstration. Suppose RCRP p′i is built based
on critical reliability path pj. First, the segments of pj are duplicated in RCRP p′i to
represent the circuit structure of pj. Second, the workload of important segments
(importance is determined at the design stage according to their contribution to the
circuit's overall reliability) can be selectively extracted, so that the stand-alone RCRP p′i
shares the workload that path pj experiences in the field. Minimum-sized buffers are used
to minimize the capacitive loading effect on path pj. The workload of relatively less
important segments is obtained directly from the previous segment on p′i. Scan flip-flops
are used at the beginning and end of the RCRPs to ensure their testability at time 0.
To determine whether to sample the actual workload for a specific segment, we count
the number S of functional paths that share the segment and compare it with a threshold
value STH. If S ≥ STH, the segment is likely to be important, as its aging affects many
critical reliability paths at the same time. Thus, its actual workload can be extracted for
the RCRP as shown in Figure 3.2. Otherwise, the workload is not extracted, to limit
complexity and area overhead. Note that, to avoid any impact on the most critical
reliability paths, the selective workload is only sampled indirectly from non-critical
reliability paths that share the workload of the important segment. Also note that, as
supported by our results in Section 3.5, selective workload sampling is not necessary for
every circuit, especially if simulation and analysis at the design stage indicate so. In
such cases, a signal with a duty ratio of, for example, 0.5 can be applied to the on-path
input of the first segment and will propagate through the RCRPs. This gives the RCRPs a
reasonable workload that mimics the actual workload the functional circuit experiences.
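The sharing count S can be read directly off a 0/1 path matrix such as P in Section 3.3: it is the column sum. The sketch below assumes, for illustration, that sharing is counted over the rows of P only; the text counts functional paths, which may be a larger set. The helper name and toy matrix are hypothetical.

```python
import numpy as np

def important_segments(P, s_th):
    """Return indices of segments shared by at least s_th paths, plus all counts S."""
    share_count = P.sum(axis=0)          # S for each segment = number of paths using it
    return np.flatnonzero(share_count >= s_th), share_count

P = np.array([[1, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 1, 1, 1]], dtype=float)
important, S = important_segments(P, s_th=2)
print(important)  # [0 2 3]
```

Raising STH shrinks the set of sampled segments, which trades estimation accuracy for fewer workload taps and MUXes.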
Non-controlling values are applied to the off-path inputs to simplify workload control
for the RCRPs. This also provides an additional benefit in that the worst-case bias
temperature instability (BTI) stress is applied to the off-path inputs. For example, as
shown in Figure 3.3, a 0 (1) on a NOR gate (NAND gate) provides the worst-case NBTI
(PBTI) stress on the PMOS transistor M2 (NMOS transistor M′4) for the off-path pin B.
This further degrades the performance of this NOR gate (NAND gate) on top of the
contribution from the on-path pin. As a result, the estimate of the largest delay among
the critical reliability paths is relatively conservative. It is noteworthy that, although a
1 (0) on a NAND gate (NOR gate) gives the minimum NBTI (PBTI) stress on the PMOS
transistor M′2 (NMOS transistor M4) for the off-path pin, M′2 (M4) does not affect the
propagation delay from input A to output Zn.

Fig. 3.3: Off-path workload analysis.
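The off-path assignment reduces to a small lookup, sketched below for the two gate types discussed. It encodes standard CMOS facts rather than anything specific to this work:

- The non-controlling value of a NAND (NOR) input is 1 (0).
- A 0 at a gate input turns on, and stresses, its PMOS transistor (NBTI).
- A 1 at a gate input turns on its NMOS transistor (PBTI).

The function name is hypothetical.

```python
# Non-controlling input values (standard CMOS logic facts).
NON_CONTROLLING = {"NAND": 1, "NOR": 0}

def off_path_assignment(gate):
    """Return the value tied to an off-path input and the BTI mechanism it stresses."""
    value = NON_CONTROLLING[gate]
    # Logic 0 on an input turns on its PMOS (NBTI stress);
    # logic 1 turns on its NMOS (PBTI stress).
    mechanism = "NBTI on PMOS" if value == 0 else "PBTI on NMOS"
    return value, mechanism

print(off_path_assignment("NOR"))   # (0, 'NBTI on PMOS')
print(off_path_assignment("NAND"))  # (1, 'PBTI on NMOS')
```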
As shown in Figure 3.2, MUXes are inserted for the segments that use selective
workload sampling, to provide three modes of operation:
(i) Stress mode (ME=0 and CE=0), where RCRP p′i is stressed under a workload similar to that experienced by functional path pj;
(ii) Measurement mode (ME=1 and CE=0), where rise/fall transitions are applied at the input of p′i so that the delay of this RCRP can be measured; and
(iii) Calibration mode (ME=1 and CE=1), where transitions are applied and the delay of the MUXes can be obtained.
Based on the delay measurements in (ii) and (iii), the delay of RCRP p′i can then be
accurately measured, and thereby the aging of all the critical reliability paths can be
estimated according to Equation (3.1).
3.5 Results and Analysis
In this section, we evaluate the effectiveness of the proposed method. The simulation is
performed on 45-nm technology using the open-source Nangate library [66]. An aging-aware
LUT is used to avoid time-consuming HSPICE simulation [34]. Several ISCAS'89 benchmark
circuits are used in the experiments.
The optimal RCRPs are generated according to the area budget using the method
discussed in Section 3.4. We first constrain the area overhead of RCRPs to 1% of the
design and later use different area overhead constraints to study their impact. Different
workload scenarios and temperatures are applied. Note that process variations,
crosstalk, and power supply noise are not included in the simulations, in order to focus
on the aging variations. However, as discussed earlier, the time 0 measurement can
largely compensate for process variations and the impact of interconnects. Besides, a
design margin can be introduced to account for other effects that RCRPs may not
capture. We use the mean square error (MSE) and the adjusted error to evaluate the
accuracy of the RCRP-based aging evaluation method. The MSE is calculated over all
the critical reliability paths, the measurement time points in a 10-year span, and the
various temperature and workload conditions. The adjusted error is obtained by
comparing the actual largest delay among the critical reliability paths with the RCRPs'
estimation adjusted using the time 0 measurement results.
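The two accuracy metrics can be sketched as below. The offset step encodes the assumption, stated in Section 3.3, that only the largest delay is known at time 0 (e.g., from speed binning). Reporting the error as a percentage of the actual largest delay, with positive values meaning overestimation, is an illustrative convention of this sketch and may differ from the one behind Table 3.1. Function names and numbers are hypothetical.

```python
import numpy as np

def mse(d_true, d_est):
    """Mean square error over all paths, time points, and work conditions."""
    d_true, d_est = np.asarray(d_true), np.asarray(d_est)
    return float(np.mean((d_est - d_true) ** 2))

def adjusted_largest_delay_error(d_true_0, d_est_0, d_true_t, d_est_t):
    """Percentage error of the RCRP-estimated largest delay at time t after
    removing the time-0 offset of the largest delay."""
    offset = np.max(d_est_0) - np.max(d_true_0)      # systematic error seen at time 0
    est_max_t = np.max(d_est_t) - offset
    true_max_t = np.max(d_true_t)
    return float(100.0 * (est_max_t - true_max_t) / true_max_t)

# Hypothetical delays (ns) of two paths at time 0 and at time t.
print(adjusted_largest_delay_error([4.0, 4.2], [3.9, 4.0],
                                   [4.6, 4.9], [4.5, 4.8]))  # about 2.04
print(mse([[4.0, 4.2], [4.6, 4.9]], [[3.9, 4.0], [4.5, 4.8]]))
```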
We implement RCRPs for five ISCAS benchmark circuits: s5378, s9234, s13207,
s15850, and s38417. Different workload and temperature scenarios are generated for
each benchmark circuit. Specifically, workload (WL) = 25%, 50%, and 75% are generated
as three different workload scenarios, where WL is defined as the percentage of 1s at the
primary inputs of the circuits. The two temperature settings are 75°C and 125°C. The
goal is for the RCRPs to accurately represent the aging of the critical reliability paths
under any workload and at any temperature. The RCRPs, along with the functional
circuit, are aged for 10 years, and measurements are taken on the RCRPs every 2 years
(ti = 0, 2, 4, 6, 8, and 10 years). Note that any other measurement time step can also be
used. The CPU runtime for building and evaluating the RCRPs on these benchmark circuits
ranges from a couple of minutes to about 50 to 60 minutes (for s38417) on a Windows
computer equipped with a 2-GHz dual-core processor.
We first disable the selective workload sampling described in Section 3.4 and look
at the adjusted error adjError|t and the MSE at different time points for each benchmark
circuit's RCRPs. In the basic setting, the area budget for the RCRPs is limited to 1% of
the entire circuit.
Table 3.1 shows the area overhead and estimation accuracy of the RCRPs in the five
selected benchmark circuits. Column 2 (m) shows the number of critical reliability
paths in each benchmark. These are the paths that could become the path with the
largest delay if their delay degraded by 20% during the course of aging (as path delay
could degrade by as much as 20% over a period of ten years [38] [45]). The optimal r
obtained for the RCRPs using the flow in Algorithm 2 is listed in Column 3. The resulting
gate count and area overhead are shown in Columns 4 and 5, respectively. The MSE and
the percentage range of the adjusted error for the largest delay are then calculated and
listed in Columns 6 and 7.

Table 3.1: Area overhead and estimation accuracy of RCRPs when Area_budget = 1%.
Circuit     m     r   Gate Cnt.   Area O/H (%)   MSE       Range of adjError (%)
s5378      14     1      16          0.6         1.1E-19   -2.0 ∼ 0
s9234     459     1      16          0.7         1.9E-18    0 ∼ 7.0
s13207     28     1      21          0.6         2.6E-19    0 ∼ 4.0
s15850    962     2      55          0.7         1.9E-18    0 ∼ 3.4
s38417   4726     7     116          0.8         5.5E-18   -0.5 ∼ 0.4
As can be observed in Table 3.1, the estimation accuracy of the RCRPs differs across
designs. RCRPs for smaller designs tend to have larger estimation errors because of the
1% area constraint. For example, in circuits s5378, s9234, and s13207, the tight area
constraint allows a maximum r of only 1. In other words, only one RCRP can be
implemented to represent all m critical reliability paths, which largely limits the
effectiveness of the proposed method. However, even in the worst case, s9234, the
adjusted error is within 7% of the actual largest delay, and this is achieved without
selective workload sampling. As the results presented later show, selective workload
sampling can reduce the adjusted error to as low as 3.1%. Besides, as the adjusted error
in this case is always positive under all circumstances, this can be taken into account
when guardbanding and calibrating the circuit. In contrast, more RCRPs can be used to
represent the critical reliability paths in the larger circuits, thereby resulting in generally
smaller estimation errors. This can be clearly seen from s38417, which allows 7 RCRPs to
be implemented and hence leads to only −0.5% ∼ 0.4% adjusted error at any time point
and under any work condition. Therefore, RCRPs indeed accurately track the largest
delay among the critical reliability paths.

Fig. 3.4: Actual largest delay vs. RCRP estimation in s9234 when WL = 50% and temperature is 75°C.
Note that the MSE and range of adjusted error in Table 3.1 are obtained over all
time points (ti = 0, 2, 4, 6, 8, and 10 years) under various work conditions (WL = 25%,
50%, 75% and temperature = 75°C, 125°C). A specific example for s9234 is shown in
Figure 3.4, where the workload is WL = 50% and the temperature is 75°C. Two curves
are obtained at different time points in the 10-year span:
- the actual largest delay among the critical reliability paths (i.e., the circuit performance), and
- the largest delay based on the RCRP estimation (adjusted using the time 0 measurement results).
It can be seen that the RCRPs consistently track the largest delay of the critical reliability
paths at all time points. The same trend is found under different workload conditions
and temperatures, as well as in the other benchmark circuits.

Fig. 3.5: Normalized MSE under different area budgets.
Next, we further study RCRPs under various area budget requirements. Three
different area budget requirements are used: 1%, 1.5%, and 2%. In each scenario, the
optimal RCRPs are synthesized using the flow in Algorithm 2.
The normalized MSE of the RCRPs under different area budgets is shown in Figure 3.5,
where the values are normalized to the MSE obtained when the area budget is 1%. The
range of adjusted error (i.e., the difference between the maximum and the minimum
errors) is shown in Figure 3.6. The results again show the general trend for each
individual circuit that, when more RCRPs are allowed, the estimation tends to be more
accurate. It is noteworthy that, in the case of s5378, s9234, and s13207, when the budget
increases from 1.5% to 2%, neither the MSE nor the adjusted error can be improved
simply by implementing one more RCRP. Thus, the optimal RCRPs remain the same for
the two area budget requirements in these cases. In other words, even when the area
budget rises to 2%, the area overhead of the actual RCRPs is still within 1.5%.
Meanwhile, when the area budget increases from 1.5% to 2% for s15850 and s38417, new
optimal RCRPs are implemented, as they improve the MSE considerably while the range
of adjusted error increases only slightly. This indicates that the RCRPs can estimate all
the critical reliability paths more accurately while still tracking the largest delay
accurately.

Fig. 3.6: Range of adjusted error under different area budgets.
To study why the RCRPs in s9234 and s13207 behave differently from those in the
other three benchmark circuits in these scenarios, we analyze how the critical reliability
paths are covered by the RCRPs in these cases. Critical reliability path coverage here
is defined as the percentage of critical reliability paths that are directly or indirectly
represented by the RCRPs. If there is little overlap between the critical reliability
paths and the area budget is tight, there is a good chance that not all critical reliability
paths are actually covered by the RCRPs. As shown in Figure 3.7, the RCRPs for the
other three benchmark circuits easily achieve coverage close to or at 100%, which explains
why their adjusted error and MSE are relatively small. However, the critical reliability
path coverage ratios for s9234 and s13207 are below 80%, making accurate estimation
from so few RCRPs difficult. In large designs, we expect RCRPs to behave as they do in
s38417, since more RCRPs can be implemented to provide better coverage and hence
higher estimation accuracy.

Fig. 3.7: Critical reliability paths coverage under different area budgets.
Furthermore, we study how selective workload sampling may reduce the estimation error, taking s9234 and s13207 as two examples. As explained in Section 4, selective workload sampling is applied to important segments based on different threshold levels STH, and workload is extracted only from paths that do not have the largest delay at time 0. This ensures that even if the delay of an involved functional path increases slightly due to workload sampling, the largest delay among all functional paths is not affected. We set a 5% threshold for this purpose, i.e., the path delay at time 0 must satisfy $d_i < 0.95 \times Delay_{max}$. The results are shown in Table 3.2. When there is no workload sampling, i.e., the sampling count in Column 5 is 0, the area overhead of the RCRPs is within 2% of the total area of the design. When STH is 3 (Column 4), in the case of s9234 for example, actual workload from functional paths can be sampled for only 3 gates in total on the RCRPs (Column 5). This leads to a slight improvement in the adjusted error shown in Column 6 (from 7.6% to 6.9%). As STH decreases, more workload can be extracted, further reducing the adjusted error. When STH = 1, 32 out of 46 gates on the RCRPs can extract actual workload, which reduces the adjusted error to 3.1%. At the same time, because selective workload sampling takes place only in the less-critical paths, its impact on circuit performance is minimized.
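The 5% eligibility rule above can be sketched as a simple filter over time-0 path delays. This is an illustration only: the function name sampling_candidates and the delay values are hypothetical, and the STH-based segment selection is not modeled.

```python
def sampling_candidates(path_delays, margin=0.05):
    """Indices of functional paths eligible for selective workload sampling.

    A path qualifies only if its time-0 delay d_i < (1 - margin) * Delay_max,
    so a small delay increase caused by sampling cannot change the largest delay.
    """
    limit = (1.0 - margin) * max(path_delays)
    return [i for i, d in enumerate(path_delays) if d < limit]


# Hypothetical time-0 path delays (normalized); Delay_max = 100, limit = 95.
print(sampling_candidates([100.0, 96.0, 94.0, 80.0]))  # [2, 3]
```

The path at 96 is excluded even though it is not the longest path, because a small sampling-induced slowdown could push it past the current largest delay.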
In comparison, we observed that without selective workload sampling, implementing even 10 more RCRPs for this design only improves the MSE, while the adjusted error remains at about 7%. However, selective workload sampling increases the area overhead by introducing MUXes on the RCRPs, as shown in Figure 3.2. Nevertheless, this approach provides an option that specifically targets better tracking of the largest delay.
It should be noted that selective workload sampling does not always improve the adjusted error, as the results for s13207 show: the adjusted error stays the same even when 19 gates on the RCRPs receive actual workload. If this is observed during RCRP synthesis and analysis at the design stage, the designer can simply remove the unnecessary sampling from the synthesis.
Table 3.2: Selective workload sampling for s9234 and s13207.
Circuit   r   RCRPs' Gate Count   STH    Sampling Count   Range of adjError (%)
s9234     3   46                  ≥ 4    0                7.6
                                  3      3                6.9
                                  2      11               5.0
                                  1      32               3.1
s13207    3   65                  ≥ 2    0                3.8
                                  1      19               3.8
3.6 Summary
The proposed representative critical reliability paths methodology provides an efficient
in-the-field approach to evaluate aging on the chip throughout the lifetime with no
impact on the functional circuit. Simulation results demonstrate the effectiveness of
this method. Our future work will be directed towards a more comprehensive solution
so that effects such as crosstalk and power supply noise can also be taken into account
for more accurate estimation.
Chapter 4
Clock Skew Reduction
4.1 Introduction
The rapid scaling of CMOS technology has inevitably exacerbated reliability issues, e.g., aging effects. Negative bias temperature instability (NBTI) is one of the most notable aging effects impacting circuit performance and reliability. NBTI is prominent in PMOS transistors: it shifts the threshold voltage, reduces the drive current, and eventually causes the circuit to fail. Circuit-level simulations have demonstrated that NBTI can result in a 10% circuit delay degradation over 10 years of operation [11] [67]. Although the NBTI impact on functional circuits has been extensively studied and various solutions have been proposed, its impact on the clock tree has not yet been sufficiently examined. Process variation is another major concern in circuit design, as it causes the performance of ICs to deviate from the design specification. If these issues are not carefully tackled, yield can be negatively impacted. The International Technology Roadmap for Semiconductors (ITRS) estimated that the delay variability due to process variations would reach 49% in 2010 and 63% in 2015, and would continue to increase in future technology nodes [6].
Prior research has studied the impact of capacitive loads, aging, or process variations on clock skew separately. Some earlier research focused on zero-skew routing techniques, which manipulate the clock skew induced by capacitive load mismatch using load-matching approaches [68] [69] [70]. Such techniques insert multiple types of control gates into the clock tree to gate the clock paths, where a clock path is defined as the clock signal path from the clock source to a sink D flip-flop. However, the clock skew remained difficult to control because the different types of logic gates and buffers used result in capacitive load mismatch. Similarly, process variations cannot be avoided. Although power consumption may be reduced as an optimization objective, these techniques fail to completely address the skew problem. For example, the authors in [71] added extra circuitry to simplify the clock control logic in order to reduce the clock skew. However, their scheme was not applicable to all implementations, and power consumption was not optimized because unnecessary clock switching was not gated off [72]. More importantly, process variation was not taken into account in these approaches. The authors in [73] comprehensively analyzed the impact of process variations on clock skew. Their results confirmed that process variation is a significant source of clock skew in deep sub-micron technology nodes. Clock tree routing algorithms have been proposed in the literature to improve skew tolerance to process variations. The authors in [74] estimated the worst-case skew under process variations and employed it to guide decision-making during the routing procedure. Similar methodologies were proposed in [75] [76]; however, few of them paid attention to aging effects, which are another dominant source of clock skew, especially in smaller technology nodes.
The authors in [77] proposed a solution to reduce clock skew due to NBTI. By calculating the clock skew degradation induced by NBTI, they guardbanded the clock signal on the corresponding path with a safety margin. The safety margin was set to exactly half of the skew degradation, so the skew was constrained over lifetime operation. Although process variations were taken into account, this methodology was limited by its safety margin, which could not exceed half of the skew degradation. The authors in [78] tried to equalize the signal probability of all clock paths according to runtime operation, and achieved a significant reduction of the clock skew induced by NBTI. However, their scheme assumed a relatively ideal condition in which process variations were not considered. The authors in [79] proposed an optimization algorithm to select a NAND or NOR gate as the output stage of the integrated clock gating (ICG) cells. This enabled the ICG output signal to be held at “0” or “1”, thus balancing the degradation rates of clock paths and minimizing NBTI-induced clock skew. Similarly, the authors did not consider the impact of process variations. In addition, this method lacked a comprehensive analysis of the fanout condition of clock gating. As an example, the clock path with the worst degradation (which we call the fastest-degrading path or the worst path) connected to an ICG cell may have its degradation slowed by fixing the ICG cell output at “0” through the optimization procedure. However, this may accelerate the degradation of another connected path, causing it to become the worst path. The method in [80] was essentially an inter-node-control (INC) scheme, which balanced the degradation rates of the clock paths by carefully manipulating the stress probabilities of the critical PMOS transistors. However, process variations were not considered, and all control enable signals had to be determined a priori.
In summary, little work has addressed the clock skew problem considering both process variations and aging. Clock skew optimization considering aging is usually conducted separately, on the assumption that the timing properties of the clock tree are already optimized considering process variations. Consequently, the margins for process variations and aging are simply added up, which guardbands the clock signal over-pessimistically and sacrifices excessive performance. Furthermore, most research attempts to optimize the clock skew through INC or similar methodologies by balancing the degradation rates of different clock paths. Therefore, the area and power consumption of the added control circuitry must be carefully considered to avoid large overhead. Meanwhile, it must be ensured that INC schemes do not accelerate the degradation of slow-degrading paths so much that one of them becomes the worst path.
In this chapter, the delay degradation of clock buffers with different initial threshold voltages is accurately evaluated using the long-term NBTI predictive technology model (PTM), considering the impact of signal stress probabilities and the stress and recovery mechanisms [11] [67]. A sub-library containing the clock buffers of the standard cell library is characterized for multi-Vth consideration. We propose a novel clock skew reduction flow considering both NBTI and process variations. Our flow is a synthesis framework that replaces identified standard-Vth buffers with their high-Vth counterparts. Although leakage power is not our optimization objective, our flow is effective in reducing leakage power as well, since a high-Vth buffer has a smaller leakage current. We develop an extended “divide and conquer” algorithm to identify the clock buffers for replacement, using the static timing analysis information of the clock tree and the degradation information calculated with the long-term NBTI model. To take process variations into account, we use existing mathematical models to evaluate the mean and standard deviation of the delay of each high-Vth buffer. The differences between these values and the mean delay and standard deviation of the original standard-Vth buffer guide the decision-making, ensuring that swapping buffers effectively reduces the clock skew without violating the design specifications. Unlike existing clock skew reduction methods, our flow does not change the clock tree connections or topology and inserts no extra control circuitry. More importantly, our flow does not aggravate the degradation rate of any clock path. In addition, our synthesis flow works effectively without disrupting the conventional design flow.
The rest of the chapter is organized as follows. Section 4.2 presents the NBTI model and derives the statistical prediction of NBTI under process variations from the transistor level to the path level. Section 4.3 describes the motivation behind our flow. Section 4.4 presents the details of our synthesis flow for clock skew reduction. Section 4.5 shows results on several benchmark circuits, and Section 4.6 summarizes this chapter.
4.2 NBTI Effect and Process Variations Characterization
This section presents the NBTI model and its characterization under process variations. These models are used extensively to characterize clock buffers with different threshold voltages.
4.2.1 NBTI Effect Modeling
The NBTI effect is commonly acknowledged to result from the generation of interface charges at the Si−SiO2 boundary, and is analytically interpreted using the reaction-diffusion (R-D) model [81] [82]. Consequently, NBTI is highly dependent on the bias condition. A forward bias (Vgs = −Vdd, where Vdd is the supply voltage) accelerates the NBTI effect, which increases the PMOS threshold voltage. In contrast, a reverse bias (Vgs = 0) partially recovers the NBTI effect and decreases the absolute value of the threshold voltage. The PTM model [67] captures the behavior of the alternating stress and recovery phases of PMOS degradation. The corresponding long-term compact model is formulated as:
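$$|\Delta V_{th}(t)| = \left( \frac{\sqrt{K_v^2 \alpha T_{clk}}}{1 - \beta_t^{1/(2\phi)}} \right)^{2\phi} \tag{4.1}$$
$$\beta_t = 1 - \frac{2\xi_1 t_e + \sqrt{\xi_2 C (1-\alpha) T_{clk}}}{2 t_{ox} + \sqrt{C t}} \tag{4.2}$$
$$K_v = \left( \frac{q t_{ox}}{\epsilon_{ox}} \right)^3 K^2 C_{ox} (V_{gs} - V_{th}) \sqrt{C} \exp\left( \frac{2 E_{ox}}{E_o} \right) \tag{4.3}$$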
where α is the duty ratio and Tclk is the clock period; the other parameters are physical constants. In the model, the random stress signal is converted into a rectangular waveform with duty ratio α, and |∆Vth(t)| is a tight upper bound on the long-term degradation of the PMOS transistor [11] [67]. Thus, any long-term NBTI analysis can be conducted using a deterministic duty ratio α obtained from the actual random stress signal. Additionally, Equation (4.1) shows that |∆Vth(t)| depends exponentially on the initial threshold voltage Vth (through Kv in Equation (4.3)). It also shows that a PMOS transistor with a lower |Vth| has a faster degradation rate, and thus a larger increase in ∆Vth(t) [67].

[Figure 4.1 surface plot: normalized delay versus stress time t (s) and initial |Vth|.]
Fig. 4.1: Normalized delay of a minimum-sized clock buffer versus initial threshold voltage Vth and stress time t (α = 0.5, temperature T = 125°C).

The alpha-power law [85] provides an analytical model relating Vth, ∆Vth(t), and the gate-level delay Di:
$$D_i = \frac{C_i V_{dd}}{\beta_i \left[ V_{dd} - \left( V_{th} + \Delta V_{th}(t) \right) \right]^{\gamma}} \tag{4.4}$$
where Ci is the effective capacitive load driven by gate i, βi is a parameter depending on the gate size, and γ is the velocity saturation index of the alpha-power law. We simulate clock buffers from the 45nm Nangate Cell Library using the PTM model and PTM transistor parameter values [84]. Our results for a minimum-sized buffer, shown in Figure 4.1, illustrate that buffers with a lower initial Vth tend to degrade faster, while those with a higher initial Vth degrade more slowly. The same argument can be made for other gate types as well [83] [86].
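Equations (4.1) and (4.2) can be evaluated directly, as in the minimal Python sketch below. It takes Kv as an input rather than computing Equation (4.3), since Kv is where the Vth dependence enters; the default exponent φ = 1/6 is an assumption (a value commonly used with the H2-diffusion R-D model), and the toy parameter set is hypothetical and dimensionless, chosen only so that 0 < βt < 1.

```python
import math


def delta_vth(t, alpha, kv, t_clk, xi1, xi2, t_e, t_ox, c, phi=1.0 / 6.0):
    """Long-term NBTI shift |dVth(t)| following Eqs. (4.1) and (4.2).

    kv is passed in directly; it is the term that carries the Vth dependence
    (Eq. (4.3)). All inputs must share one consistent unit system.
    """
    beta_t = 1.0 - (2.0 * xi1 * t_e + math.sqrt(xi2 * c * (1.0 - alpha) * t_clk)) / (
        2.0 * t_ox + math.sqrt(c * t)
    )
    if not 0.0 < beta_t < 1.0:
        raise ValueError("parameters give beta_t outside (0, 1)")
    numerator = math.sqrt(kv ** 2 * alpha * t_clk)
    return (numerator / (1.0 - beta_t ** (1.0 / (2.0 * phi)))) ** (2.0 * phi)


# Dimensionless toy parameters (hypothetical), chosen only so that 0 < beta_t < 1.
toy = dict(alpha=0.5, kv=1.0, t_clk=1.0, xi1=0.9, xi2=0.5, t_e=1.0, t_ox=1.0, c=1.0)
for t in (1e2, 1e3, 1e4):
    print(t, round(delta_vth(t, **toy), 3))
```

With these toy inputs the shift grows with stress time t, with duty ratio α, and with Kv, which matches the qualitative behavior described above; absolute values are meaningless without calibrated PTM constants.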
4.2.2 Statistical Prediction of NBTI Effect under Process Variations
In this section, we propose the circuit-level model for path and clock tree delay analysis. We compare the delays and degradation of paths whose cells have different initial threshold voltages, model the difference, and finally formulate the clock skew.
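Using the first-order Taylor expansion $1/(1-x)^{\gamma} \approx 1 + \gamma x$ (when $|x| \ll 1$), with $x = \Delta V_{th}(t)/(V_{dd} - V_{th})$, Equation (4.4) can be reformulated as:
$$D_i = D_{i0} \left( 1 + S_{ti}\, \Delta V_{th}(t) \right) \tag{4.5}$$
$$D_{i0} = \frac{C_i V_{dd}}{\beta_i (V_{dd} - V_{th})^{\gamma}} \tag{4.6}$$
$$S_{ti} = \frac{\gamma}{V_{dd} - V_{th}} \tag{4.7}$$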
where Di0 is the gate delay without process variations and NBTI degradation, i.e., with ∆Vth(0) = 0. From Equation (4.5), Sti × ∆Vth(t) decreases as the initial threshold voltage Vth increases, demonstrating that gates with a lower initial Vth tend to degrade faster, while those with a higher initial Vth degrade more slowly.
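The quality of the first-order approximation can be checked numerically by comparing Equation (4.4) with Equation (4.5), as in the sketch below. Ci and βi cancel in the ratio Di/Di0; the operating point (Vdd = 1.1 V, γ = 1.3, ∆Vth = 20 mV) is a hypothetical choice, while the two Vth values are those used in Section 4.5. Note that for a fixed ∆Vth the sensitivity Sti grows with Vth, so in this model the slower degradation of high-Vth gates relies on the smaller ∆Vth(t) they experience (Equation (4.1)).

```python
def delay_ratio_exact(vdd, vth, dvth, gamma):
    """D_i / D_i0 from the alpha-power law, Eqs. (4.4) and (4.6); C_i and beta_i cancel."""
    return ((vdd - vth) / (vdd - vth - dvth)) ** gamma


def delay_ratio_linear(vdd, vth, dvth, gamma):
    """First-order approximation 1 + S_ti * dVth, Eqs. (4.5) and (4.7)."""
    s_ti = gamma / (vdd - vth)
    return 1.0 + s_ti * dvth


# Hypothetical operating point: Vdd = 1.1 V, gamma = 1.3, dVth = 20 mV.
for vth in (0.28, 0.45):  # standard- and high-Vth values used in Section 4.5
    exact = delay_ratio_exact(1.1, vth, 0.02, 1.3)
    approx = delay_ratio_linear(1.1, vth, 0.02, 1.3)
    print(f"Vth={vth:.2f} V  exact={exact:.4f}  linear={approx:.4f}")
```

Because (1 − x)^(−γ) is convex, the linear form slightly underestimates the exact delay; the gap stays well below 1% for shifts of a few tens of millivolts.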
To take process variations into account, the mean value µDi(t) and standard deviation σDi(t) of the gate delay at time t can be formulated as [86]:
$$\mu_{D_i}(t) = \mu_{D_i}(0) \left( 1 + A S_{ti} t^{\gamma} \right) \tag{4.8}$$
$$\sigma_{D_i}(t) = \sigma_{D_i}(0) \left( 1 - A S_v t^{\gamma} \right) \tag{4.9}$$
$$\mu_{D_i}(0) = D_{i0} \tag{4.10}$$
$$\sigma_{D_i}(0) = D_{i0} S_{ti} \sigma_{pv_i} \tag{4.11}$$
where A and Sv are positive constants whose values depend on the technology features and remain constant under a specific operating condition.
Since the standard deviation of the delay is non-negative and $A S_v t^{\gamma} \ge 0$, we have $0 \le 1 - A S_v t^{\gamma} \le 1$. Comparing Equation (4.8) with Equation (4.5), A Sti decreases as Vth increases. Equations (4.8) through (4.11) formulate the gate delay considering NBTI degradation under process variations. By treating a path as a concatenation of K gates, the mean value and standard deviation of the path delay can be formulated as:
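$$\mu(0) = \tau_0 \tag{4.12}$$
$$\sigma(0) = \tau_0 S_{ti} \sigma_{pv} \tag{4.13}$$
$$\sigma_{pv} = \sqrt{ \sum_{i=1}^{K} \sigma_{pv_i}^2 + \sum_{1 \le i < j \le K} 2 \rho_{pv_{ij}} \sigma_{pv_i} \sigma_{pv_j} } \tag{4.14}$$
$$\mu(t) = \mu(0) \left( 1 + A S_{ti} t^{\gamma} \right) \tag{4.15}$$
$$\sigma(t) = \sigma(0) \left( 1 - A S_v t^{\gamma} \right) \tag{4.16}$$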
where µ(0) (= τ0) is the mean value of the path delay at time 0 without NBTI degradation; σ(0) is the corresponding standard deviation; µ(t) and σ(t) are the mean delay and standard deviation at time t; σpvi is the standard deviation of the process-induced threshold voltage variation ∆Vth,pvi of gate i; and ρpvij is the correlation between ∆Vth,pvi and ∆Vth,pvj.
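The aggregation in Equation (4.14) is the standard variance of a sum of correlated terms, which the short Python sketch below computes. The function name path_sigma_pv and the input layout (a list of per-gate sigmas and a full correlation matrix) are hypothetical choices for illustration.

```python
import math


def path_sigma_pv(sigmas, rho):
    """Aggregate per-gate variation terms into a path-level sigma (Eq. 4.14).

    sigmas[i] is sigma_pv_i for gate i; rho[i][j] is the correlation between
    gates i and j (diagonal entries are not used; they are implicitly 1).
    """
    k = len(sigmas)
    var = sum(s * s for s in sigmas)  # variance terms, i == j
    for i in range(k):
        for j in range(i + 1, k):     # each unordered pair counted once, times 2
            var += 2.0 * rho[i][j] * sigmas[i] * sigmas[j]
    return math.sqrt(var)


# Four identical gates with sigma = 0.01: independent versus fully correlated.
print(path_sigma_pv([0.01] * 4, [[0.0] * 4 for _ in range(4)]))
print(path_sigma_pv([0.01] * 4, [[1.0] * 4 for _ in range(4)]))
```

The two limits bracket the result: independent gates give σ·√K, while fully correlated gates give σ·K, so spatial correlation on a clock path widens the path-delay spread.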
Equations (4.12) through (4.16) formulate the mean value and standard deviation of the path delay considering NBTI degradation under process variations. Assume that, at time t, the mean value and standard deviation of the delay of paths m and n are (µm(t), σm(t)) and (µn(t), σn(t)), respectively. If their correlation is ρmn, the delay difference between them follows a distribution with mean ∆µ(t) and standard deviation ∆σ(t), formulated as:
$$|\Delta\mu(t)| = |\mu_m(t) - \mu_n(t)| \tag{4.17}$$
$$\Delta\sigma(t) = \sqrt{\sigma_m^2(t) + \sigma_n^2(t) - 2\rho_{mn}\sigma_m(t)\sigma_n(t)} \tag{4.18}$$
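From Equations (4.17) and (4.18), the delay difference ∆τ between paths m and n at time t is upper-bounded as:
$$\begin{aligned}
\Delta\tau &\le |\Delta\mu(t)| + |\Delta\sigma(t)| \\
&= |\mu_m(t) - \mu_n(t)| + \left| \sqrt{\sigma_m^2(t) + \sigma_n^2(t) - 2\rho_{mn}\sigma_m(t)\sigma_n(t)} \right| \\
&= |\mu_m(t) - \mu_n(t)| + \left| \sqrt{\sigma_m^2(0) + \sigma_n^2(0) - 2\rho_{mn}\sigma_m(0)\sigma_n(0)} \times \left(1 - A S_v t^{\gamma}\right) \right| \\
&\le \left| \mu_m(0)\left(1 + A S_{tim} t^{\gamma}\right) - \mu_n(0)\left(1 + A S_{tin} t^{\gamma}\right) \right| + \left| \sqrt{\sigma_m^2(0) + \sigma_n^2(0) - 2\rho_{mn}\sigma_m(0)\sigma_n(0)} \right| \\
&\le \left| \mu_m(0) - \mu_n(0) + t^{\gamma} \left[ A S_{tim}\mu_m(0) - A S_{tin}\mu_n(0) \right] \right| + \left| \sqrt{\sigma_m^2(0) + \sigma_n^2(0) - 2\rho_{mn}\sigma_m(0)\sigma_n(0)} \right|
\end{aligned} \tag{4.19}$$
where Stim and Stin denote Sti for paths m and n, respectively.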
Thus, the delay difference between two paths can be upper-bounded mathematically considering both NBTI degradation and process variations. For a clock tree consisting of a large number of paths, $\max_{i \neq j} \Delta\tau_{i,j}$ evaluates the skew of the clock path delays with both NBTI and process variations considered.
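The skew estimate above can be sketched in Python by evaluating the first line of Equation (4.19) for every pair of paths and taking the maximum. In this sketch each path is a dictionary with hypothetical keys mu0, sigma0, and a_st (the product A·Sti for that path), a_sv is A·Sv, and t_gamma is t^γ; these names are illustrative assumptions, not part of the original flow.

```python
import math
from itertools import combinations


def path_stats(mu0, sigma0, a_st, a_sv, t_gamma):
    """Mean and std of a path delay at stress time t, Eqs. (4.15)-(4.16)."""
    return mu0 * (1.0 + a_st * t_gamma), sigma0 * (1.0 - a_sv * t_gamma)


def delay_diff_bound(p_m, p_n, rho_mn, a_sv, t_gamma):
    """First line of Eq. (4.19): |delta mu(t)| + delta sigma(t) for paths m and n."""
    mu_m, s_m = path_stats(p_m["mu0"], p_m["sigma0"], p_m["a_st"], a_sv, t_gamma)
    mu_n, s_n = path_stats(p_n["mu0"], p_n["sigma0"], p_n["a_st"], a_sv, t_gamma)
    var = s_m ** 2 + s_n ** 2 - 2.0 * rho_mn * s_m * s_n  # square of Eq. (4.18)
    return abs(mu_m - mu_n) + math.sqrt(max(var, 0.0))    # clamp round-off below 0


def clock_skew_bound(paths, rho, a_sv, t_gamma):
    """Max over i != j of the pairwise bound; it is symmetric, so i < j suffices."""
    return max(delay_diff_bound(paths[i], paths[j], rho[i][j], a_sv, t_gamma)
               for i, j in combinations(range(len(paths)), 2))


paths = [
    {"mu0": 10.0, "sigma0": 0.5, "a_st": 0.01},
    {"mu0": 12.0, "sigma0": 0.5, "a_st": 0.01},
    {"mu0": 11.0, "sigma0": 0.5, "a_st": 0.01},
]
rho = [[1.0] * 3 for _ in range(3)]
print(clock_skew_bound(paths, rho, a_sv=0.001, t_gamma=0.0))  # 2.0
```

Checking all pairs costs O(X²) for X paths, which is one reason a practical flow partitions the clock paths before analysis.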
4.3 Motivation for Controlling on Clock Path Degradation and Rate
using Multi-Vth Clock Buffers
From Equation (4.15), A Sti evaluates the degradation sensitivity of the mean path delay to the NBTI effect. Since there are multiple buffers on a clock path, A Sti depends on the threshold voltages of all the buffers on it. Clock path delay is defined as the delay from the source clock to the sink D flip-flop. Assume that clock paths m and n have the same aging condition and structure, and that some buffers on path m are replaced with their high-Vth counterparts. From Equation (4.15), we can see that:
$$\mu_m(0) > \mu_n(0) \tag{4.20}$$
$$A S_{tim} < A S_{tin} \tag{4.21}$$
If path m is sufficiently long, we can select multiple buffers to replace with their high-Vth counterparts, which can cumulatively decrease the degradation rate and eventually lead to AStim · µm(0) < AStin · µn(0), where µm(0) and µn(0) are constants. Substituting this back into Equation (4.19), we obtain:
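$$\Delta\tau \le |\mu_m(0) - \mu_n(0)| - t^{\gamma} \times \left| A S_{tim}\mu_m(0) - A S_{tin}\mu_n(0) \right| + \left| \sqrt{\sigma_m^2(0) + \sigma_n^2(0) - 2\rho_{mn}\sigma_m(0)\sigma_n(0)} \right| \tag{4.22}$$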
It can be seen from Equation (4.22) that ∆τ decreases with time once a sufficient number of buffers on a path are replaced with their high-Vth counterparts. It also indicates that, by replacing multiple buffers on a path with high-Vth buffers, we can adjust the path's degradation rate with respect to another path. Figure 4.2 shows a simple illustrative example of two paths. By replacing two gates with their high-Vth counterparts, path 1 has a higher initial delay at time 0; however, its degradation rate decreases. We can take advantage of this feature to reduce the clock skew induced by NBTI.

[Figure 4.2 sketch: delay of Path 1 and Path 2 versus time t from 0 to tlife, with paths drawn as low-Vth and high-Vth cell units; panels (a) and (b).]
Fig. 4.2: Path delay and degradation comparison: (a) before replacement; (b) after replacement.
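The mean term of Equation (4.22) can be made concrete with the short sketch below: path m (after replacement) starts slower but ages more slowly, so the mean gap shrinks as t^γ grows. The function names and numbers are hypothetical. The sketch also shows that the derivation of Equation (4.22) implicitly assumes the time-0 gap still dominates: beyond the crossover value of t^γ the gap changes sign and its magnitude grows again.

```python
def mean_gap(mu_m0, a_st_m, mu_n0, a_st_n, t_gamma):
    """mu_m(t) - mu_n(t) via Eq. (4.15): time-0 gap plus t^gamma times the rate gap."""
    return (mu_m0 - mu_n0) + t_gamma * (a_st_m * mu_m0 - a_st_n * mu_n0)


def crossover_t_gamma(mu_m0, a_st_m, mu_n0, a_st_n):
    """t^gamma at which the mean gap reaches zero, or None if it never shrinks to zero."""
    rate_gap = a_st_m * mu_m0 - a_st_n * mu_n0
    if rate_gap >= 0.0 or mu_m0 <= mu_n0:
        return None
    return (mu_m0 - mu_n0) / -rate_gap


# Hypothetical paths: m has high-Vth buffers (slower start, lower A*S_ti).
print(mean_gap(10.5, 0.02, 10.0, 0.05, 0.0))               # 0.5
print(round(crossover_t_gamma(10.5, 0.02, 10.0, 0.05), 4))  # 1.7241
```

In practice this suggests choosing the number of replaced buffers with the target lifetime in mind, which is consistent with the lifetime-specified analysis in Algorithm 3.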
However, we must also pay attention to the shift in clock path delay introduced by the buffer replacement. From Equation (4.22), gate replacement leads to a new path delay with an upper bound of:
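$$\begin{aligned}
\tau_m(t) &= \tau_n(t) + \Delta\tau \\
&\le \tau_n(t) + |\mu_m(0) - \mu_n(0)| - t^{\gamma} \times \left| A S_{tim}\mu_m(0) - A S_{tin}\mu_n(0) \right| + \left| \sqrt{\sigma_m^2(0) + \sigma_n^2(0) - 2\rho_{mn}\sigma_m(0)\sigma_n(0)} \right|
\end{aligned} \tag{4.23}$$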
If we compare the delays of the same path before and after replacement, then ρmn = 1. Thus, the upper bound of the path delay after replacement is:
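$$\begin{aligned}
\tau_m(t) &= \tau_n(t) + \Delta\tau \\
&\le \tau_n(t) + |\mu_m(t) - \mu_n(t)| + \left| \sqrt{\sigma_m^2(0) + \sigma_n^2(0) - 2 \times 1 \times \sigma_m(0)\sigma_n(0)} \right| \\
&= \tau_n(t) + |\mu_m(t) - \mu_n(t)| + |\sigma_m(0) - \sigma_n(0)|
\end{aligned} \tag{4.24}$$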
If the clock path delay has a timing constraint τn ≤ τ, the new clock path delay after replacement should satisfy:
$$\tau_m(t) \le \tau - \Delta\tau \tag{4.25}$$
Otherwise, the replacement will negatively impact manufacturing yield. Now, if we extend Equation (4.15) to include process variations, it can be rewritten as:
$$\mu_{pv}(t) = \mu_{pv}(0) \left( 1 + A S_{pv,ti} t^{\gamma} \right) \tag{4.26}$$
Similarly, A Spv,ti decreases if some buffers on the clock path are replaced with their high-Vth counterparts. Assuming that a comprehensive timing analysis indicates that the worst delay value will become Γ at time t, a safety margin ∆Γ has to be assigned to the circuit at design time. Combined with the above analysis, the following constraint must be satisfied for a new replacement so as not to violate the design specification:
$$\Re = \begin{cases} 1, & \tau_m(0) \le \Gamma - \Delta\Gamma - \Delta\tau, \\ 0, & \text{otherwise;} \end{cases} \tag{4.27}$$
where τm(0) is the path delay at time 0 under nominal conditions, without process variations and NBTI impact. Here, the value of ℜ serves as a flag: a replacement is feasible when ℜ = 1.
In summary, a new buffer replacement is optimal in reducing the clock skew only if the following requirements are satisfied:
$$\begin{cases} \tau_{after}(0) \le \Gamma - \Delta\Gamma - \Delta\tau \\ SK_{after} < SK_{init} \end{cases} \tag{4.28}$$
where SKinit (SKafter) is the clock skew before (after) buffer replacement, and τafter(0) is the clock path delay at time 0 after replacement.
4.4 Clock Skew Reduction Flow
Our proposed synthesis flow for clock skew reduction is shown in Figure 4.3. We use the PTM model to characterize the delay and degradation of the buffers in the standard cell library for different Vth values, yielding a multi-Vth clock buffer library. The clock tree structure is obtained in the physical design stage. Other NBTI-related parametric information can be collected using either commercial tools [34] or our in-house tools [87] [88]. Through an aging-aware timing analysis using the multi-Vth clock buffer library, the delay and degradation of the buffers and the clock paths are derived. Our clock buffer replacement algorithm uses this information to identify the critical clock buffers (with standard Vth) for replacement. Implementing our flow helps reduce the clock skew, considering NBTI impact and process variations, for a specified lifetime of operation. The key step in our flow, namely clock buffer replacement, is described in Algorithm 3.

[Figure 4.3 flow diagram. Blocks: RTL Netlist; Circuit Synthesis; Physical Design; Technology Library; 45nm Nangate Library; PTM 45nm Model; Process Variations; Multi-Vth Clock Buffer Library; Extracted Path Information; Timing and Aging Analysis Considering Clock Tree Paths; Clock Buffer Replacement Flow; Updated Clock Tree.]
Fig. 4.3: Clock tree synthesis flow for skew reduction.
In the algorithm, clock paths are separated into sets by analyzing the buffer overlap among the paths; paths in the same set share a large number of buffers. Aging information and the lifetime specification for analysis are obtained in lines 1 to 5. Lines 10 to 15 replace the non-overlapped buffers inside the current set to reduce the clock skew, while lines 16 to 23 select the overlapped buffers among sets. Together, lines 7 to 24 identify all the critical buffers necessary to reduce the skew of the clock tree. The computational complexity mainly comes from the optimality analysis of each clock buffer in lines 10 to 22. The overall computational complexity is $O(NMK^2)$, where N is the number of path sets, M is the average path count in a path set, and K is the average number of buffers on a clock path.
For a large-scale industrial design, there would be an extremely large number of clock paths to analyze. To reduce the CPU runtime, one can use a “divide and conquer” approach. First, we divide the X clock paths into G groups, each of which has X/G clock paths. Second, we apply Algorithm 3 to each group; accordingly, the number of paths in each set decreases in proportion to N/G. Last, we apply Algorithm 3 across the groups to ensure that any further skew optimization is obtained. Thus, the complexity reduces from $O(NMK^2)$ to $O(NMK^2/G^2)$, which is highly dependent on G. If we divide the X clock paths into a sufficiently large number of groups, the overall computational complexity is substantially reduced.

Algorithm 3 Clock Buffer Replacement
1: Extract X clock paths with L buffers from the clock tree
2: Obtain STA information of the X clock paths
3: Collect aging information of the X clock paths
4: Specify ∆τ considering NBTI effect and process variations
5: Specify stress time t for NBTI aging analysis
6: Separate all the clock paths into N sets through buffer overlap analysis
7: for all sets n from 1 to N do
8:   for all paths m from 1 to M in current set n do
9:     Select a clock buffer k from 1 to K on current path m
10:    if Buffer k is a non-overlapped buffer among the paths in set n then
11:      Evaluate the replacement using Equation (4.28) in current set
12:      if It is an optimal replacement then
13:        Replace the current buffer
14:      end if
15:      Update the timing information of the paths in current set n
16:    else if Buffer k is overlapped by all the paths in set n then
17:      Evaluate the optimality using Equation (4.28)
18:      if The replacement decreases the clock skew among buffer sets then
19:        Replace the current buffer
20:        Update the timing information of current buffer set
21:      end if
22:    end if
23:  end for
24: end for
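The replacement loop of Algorithm 3 can be sketched in Python as below. This is a simplified, deterministic illustration under stated assumptions, not the dissertation's implementation: it ignores process variations, skips the overlap-set partitioning of line 6, models each buffer's aged delay as d0·(1 + rate·t^γ) in the spirit of Equation (4.8), and uses hypothetical two-Vth buffer parameters. The parameter delay_limit0 stands in for the right-hand side Γ − ∆Γ − ∆τ of Equation (4.28), and the names BufferModel, lifetime_skew, and replace_buffers are hypothetical.

```python
from dataclasses import dataclass

EPS = 1e-9  # strict-improvement tolerance for floating-point skew comparisons


@dataclass(frozen=True)
class BufferModel:
    d0: float    # nominal delay at time 0 (normalized units)
    rate: float  # aging sensitivity, playing the role of A*S_ti in Eq. (4.8)


# Hypothetical two-Vth characterization: high-Vth is slower at t = 0 but ages more slowly.
STD_VTH = BufferModel(d0=1.00, rate=0.30)
HIGH_VTH = BufferModel(d0=1.15, rate=0.10)


def path_delay(path, vth_of, t_gamma):
    """Deterministic path delay: sum of d0 * (1 + rate * t^gamma) over the path's buffers."""
    return sum(vth_of[b].d0 * (1.0 + vth_of[b].rate * t_gamma) for b in path)


def lifetime_skew(paths, vth_of, t_gamma_life):
    """Worst skew (max minus min path delay) over t = 0 and the end of life."""
    worst = 0.0
    for tg in (0.0, t_gamma_life):
        delays = [path_delay(p, vth_of, tg) for p in paths]
        worst = max(worst, max(delays) - min(delays))
    return worst


def replace_buffers(paths, t_gamma_life, delay_limit0):
    """Greedy single pass: keep a swap only if every path meets delay_limit0 at t = 0
    (first line of Eq. (4.28)) and the lifetime skew strictly drops (second line)."""
    buffers = sorted({b for p in paths for b in p})
    vth_of = {b: STD_VTH for b in buffers}
    best = lifetime_skew(paths, vth_of, t_gamma_life)
    swapped = []
    for b in buffers:
        vth_of[b] = HIGH_VTH  # tentative replacement
        meets_timing = all(path_delay(p, vth_of, 0.0) <= delay_limit0 for p in paths)
        new_skew = lifetime_skew(paths, vth_of, t_gamma_life)
        if meets_timing and new_skew < best - EPS:
            best = new_skew
            swapped.append(b)
        else:
            vth_of[b] = STD_VTH  # undo the swap
    return swapped, best


paths = [["b0", "b1", "b2"], ["b0", "b3"]]  # b0 is shared by both paths
swapped, skew = replace_buffers(paths, t_gamma_life=2.0, delay_limit0=3.5)
print(swapped, round(skew, 2))  # ['b1', 'b2'] 1.3
```

On this toy tree the shared buffer b0 is rejected because swapping it shifts both paths equally and leaves the skew unchanged, and b3 is rejected because it widens the aged skew again, which mirrors the distinction between overlapped and non-overlapped buffers drawn above.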
4.5 Experimental Results
In this section, we evaluate the effectiveness of our proposed flow. Five circuits from the IWLS 2005 benchmark suite are considered in our simulations. All simulations are performed using the open-source Nangate standard cell library at the 45nm technology node [66]. The PTM model and transistor parameter values [84] [86] are used to characterize the delay and degradation of the clock buffers in the cell library for two Vth values: the default standard threshold voltage of 0.28V and a corresponding high threshold voltage of 0.45V. The benchmark circuits are processed through the conventional ASIC design flow using Synopsys tools [34] and our in-house tools [87] [88] to extract the NBTI-related circuit information, e.g., clock tree structure and stress probabilities.
Equation (4.28) shows that our flow reduces the clock skew by taking advantage of a previously specified safety margin value ∆Γ. To avoid specifying an unnecessary safety margin, we modify the constraints in Equation (4.28) as follows:
$$\begin{cases} \tau_{after}(0) \le \Gamma_{worst}(0) - \Delta\tau \\ SK_{after} < SK_{init} \end{cases} \tag{4.29}$$
where Γworst(0) is the largest clock path delay at time 0 without process variations and NBTI degradation. Compared with Equation (4.28), Equation (4.29) imposes a tighter constraint: only clock paths with a delay less than Γworst(0) are selected for the skew reduction analysis. The results show that our flow still works efficiently under such an extremely constrained condition.

[Figure 4.4 histograms of occurrence frequency versus normalized clock path delay. Panels: (a) clock path delay distribution when t = 0 before buffer replacement; (b) when t = 10 years before buffer replacement; (c) when t = 0 after buffer replacement; (d) when t = 10 years after buffer replacement.]
Fig. 4.4: Clock skew reduction for the ethernet benchmark circuit considering NBTI and process variation (T = 125°C).
Figure 4.4 depicts the results on the benchmark circuit ethernet. The clock path delay distributions at time 0 and at 10 years are compared before and after buffer replacement (T = 125°C). Note that all delay values in our results are normalized for comparison purposes. The clock tree extracted from ethernet is small and its clock paths are short (with an average length of K = 8 buffers). In total, X = 1748 clock paths with L = 2524 clock buffers are selected. Because of the short path length and small clock tree, our flow identifies only 11 buffers for replacement. Comparing Figure 4.4(a) with Figure 4.4(c), the clock skew at time 0 is reduced from 25.86 unit delay to 21.83 unit delay, a reduction of 15.58%. When 10 years of NBTI degradation is considered, the clock skew is reduced from 34.40 unit delay in Figure 4.4(b) to 29.04 unit delay in Figure 4.4(d), also a reduction of 15.58%. Figure 4.5 presents the results on ethernet under five temperature scenarios: 25°C, 50°C, 75°C, 100°C, and 125°C. The average clock skew reduction after buffer replacement, considering 10 years of NBTI degradation, is plotted in the figure. This demonstrates that our methodology can effectively reduce clock skew under different temperature scenarios; for ethernet, our flow achieves around 15% skew reduction.

[Figure 4.5 plot: skew reduction (%) versus temperature (°C), from 25°C to 125°C.]
Fig. 4.5: Clock skew reduction for benchmark circuit ethernet under different temperature scenarios.
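The reduction percentages quoted for ethernet follow from the relative change in skew; the short check below recomputes them from the unit-delay values reported above. The helper name skew_reduction_pct is hypothetical.

```python
def skew_reduction_pct(skew_before, skew_after):
    """Relative clock skew reduction in percent: (before - after) / before * 100."""
    return (skew_before - skew_after) / skew_before * 100.0


print(round(skew_reduction_pct(25.86, 21.83), 2))  # 15.58  (t = 0)
print(round(skew_reduction_pct(34.40, 29.04), 2))  # 15.58  (t = 10 years)
```

Because the reduction is relative, the same 15.58% at both time points means the absolute skew saving grows with aging (4.03 versus 5.36 unit delay).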
We also implement our flow on four other IWLS 2005 benchmark circuits: des_perf,
vga_lcd, netcard, and leon3mp. We compare the skew before and after buffer replacement
at four stress time instants (0, 3, 6, and 10 years) under temperature T = 125°C. The
results are shown in Table 4.1. The categories "Before Replacement" and "After
Replacement" list the clock skews at the different stress times. The category "Skew
Reduction (%)" lists the corresponding skew reductions and their average. Each row
corresponds to one benchmark circuit. The last row gives the average skew reduction over
the five benchmark circuits. Table 4.1 shows that our flow reduces the clock skew by around
20% on average under an over-constrained condition, considering four stress time points on
these five benchmark circuits. Applying our flow to large-scale industry designs with the
constraint in Equation (4.28) is expected to yield a much higher skew reduction. In that
setting, all clock paths are analyzed equally and all buffers are evaluated as replacement
candidates, which further reduces skew.
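The following minimal Python sketch checks the arithmetic behind the "Skew Reduction (%)" columns. It recomputes each per-time reduction as (before − after)/before and averages the results, using the des_perf and ethernet rows of Table 4.1. The dictionary layout and function names are illustrative choices, not part of the flow itself.

```python
# Skew values (unit delay) from Table 4.1 for two of the benchmarks.
# Each entry maps a stress time in years to (skew before, skew after) replacement.
SKEW_TABLE = {
    "des_perf": {0: (14.37, 9.20), 3: (18.26, 13.40), 6: (18.73, 14.13), 10: (19.12, 14.72)},
    "ethernet": {0: (25.86, 21.83), 3: (32.85, 27.73), 6: (33.71, 28.45), 10: (34.40, 29.04)},
}


def skew_reduction(before: float, after: float) -> float:
    """Relative skew reduction in percent: (before - after) / before * 100."""
    return (before - after) / before * 100.0


def average_reduction(per_time: dict) -> float:
    """Mean of the per-stress-time reductions (the 'Average' column)."""
    values = [skew_reduction(before, after) for before, after in per_time.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, per_time in SKEW_TABLE.items():
        cells = ", ".join(
            f"t={t} yrs: {skew_reduction(b, a):.2f}%" for t, (b, a) in per_time.items()
        )
        print(f"{name}: {cells}; average {average_reduction(per_time):.2f}%")
```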
We further analyze the relationship between the number of identified buffers and the
skew reduction in Table 4.2. Columns 2 and 3 list the number of clock paths X and clock
buffers L in each circuit. Column 4 lists the average path length K of each benchmark
circuit. Columns 6 and 7 list the number of buffers identified for replacement and their
ratio to the overall buffer count. The last column shows the skew reduction for each
circuit using our flow. This value is the average of the skew reductions at time 0 and at
10 years. We also calculate the "Overlap Frequency" (F) of each circuit, listed in
Column 5, as:

$$F = \frac{X \times K}{L} \tag{4.30}$$

where X is the total number of clock paths, K is the average path length, and L is the
total number of buffers.
In Equation (4.30), the numerator gives the number of buffers that would be needed to
construct all clock paths of a circuit if no two paths went through the same clock buffer.
Equation (4.30) therefore evaluates the average number of paths that go through the same
buffer. It also indicates the average number of overlapped buffers on each path. As
discussed in Section 4.4, only replacing a non-overlapped buffer can reduce the delay
difference between two paths. We can therefore conclude that a higher F value means fewer
buffers are available for replacement, which results in less skew reduction.

Table 4.1: Clock skew comparison before and after buffer replacement at different stress
times (T = 125°C)

Bench.      Before Replacement           After Replacement            Skew Reduction (%)
Circuit     0 yrs  3 yrs  6 yrs  10 yrs  0 yrs  3 yrs  6 yrs  10 yrs  0 yrs  3 yrs  6 yrs  10 yrs  Average
des_perf    14.37  18.26  18.73  19.12   9.20   13.40  14.13  14.72   35.98  26.62  24.56  23.01   27.54
ethernet    25.86  32.85  33.71  34.40   21.83  27.73  28.45  29.04   15.58  15.59  15.60  15.58   15.59
vga_lcd     44.60  56.67  58.14  59.34   35.42  46.75  49.72  50.96   20.58  17.50  16.21  14.12   17.10
netcard     40.33  51.23  52.57  53.65   27.61  38.60  41.71  42.89   31.54  24.65  20.66  20.06   24.23
leon3mp     43.56  55.34  56.79  57.95   37.34  47.44  48.67  49.68   14.28  14.28  14.96  14.27   14.45
Average skew reduction over the five circuits (%): ≈ 20

Table 4.2: Clock skew reduction and buffer replacement analysis (T = 125°C, t = 10 years)

Bench.      #Paths  #Buffers  Ave. Path   Overlap    Buffers Replaced  Skew
Circuit     (X)     (L)       Length (K)  Freq. (F)  #      %          Reduction (%)
des_perf    330     836       6           2.39       17     2.03       29.50
ethernet    1748    2524      8           5.54       11     0.44       15.58
vga_lcd     3068    6002      15          7.66       88     1.47       17.35
netcard     5455    6900      10          7.91       82     1.19       25.80
leon3mp     7955    9380      9           7.63       25     0.27       14.28
Average                                                     1.08       ≈ 20
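The Python sketch below reproduces Equation (4.30) and the rough share 1 − F/K of non-shared buffers per path from the counts in Table 4.2. F/K simplifies to X/L. K is reported as a rounded integer, so the recomputed F can differ slightly from the table (about 2.37 rather than 2.39 for des_perf). The class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockTreeStats:
    paths: int        # X: number of clock paths
    buffers: int      # L: number of clock buffers
    avg_length: int   # K: average clock path length (rounded, as reported)
    replaced: int     # buffers identified for replacement


# Rows taken from Table 4.2.
TABLE_4_2 = {
    "des_perf": ClockTreeStats(330, 836, 6, 17),
    "ethernet": ClockTreeStats(1748, 2524, 8, 11),
    "vga_lcd": ClockTreeStats(3068, 6002, 15, 88),
    "netcard": ClockTreeStats(5455, 6900, 10, 82),
    "leon3mp": ClockTreeStats(7955, 9380, 9, 25),
}


def overlap_frequency(s: ClockTreeStats) -> float:
    """Equation (4.30): F = X * K / L, the average number of paths sharing a buffer."""
    return s.paths * s.avg_length / s.buffers


def replaceable_share(s: ClockTreeStats) -> float:
    """1 - F/K: rough share of a path's buffers that are not shared with other paths."""
    return 1.0 - overlap_frequency(s) / s.avg_length


def replaced_percent(s: ClockTreeStats) -> float:
    """Replaced buffers as a percentage of all clock buffers."""
    return 100.0 * s.replaced / s.buffers


for name, s in TABLE_4_2.items():
    print(f"{name:9s} F={overlap_frequency(s):.2f} "
          f"1-F/K={replaceable_share(s):.2f} replaced={replaced_percent(s):.2f}%")
```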
For example, benchmark circuit des_perf has an average path length of 6 and an F of
2.39. On average, more than 50% of the buffers on each path in des_perf are available for
replacement (1 − 2.39/6 ≥ 50%). In contrast, fewer than 50% of the buffers on each path in
vga_lcd are suitable for replacement (1 − 7.66/15 ≤ 50%). Intuitively, the more buffers
are available for replacement, the more clock skew reduction we can obtain. Our results
also show that the skew reduction depends strongly on the clock tree structure and the
clock path count. For example, replacing 88 out of 6002 buffers on the 3068 clock paths of
benchmark circuit vga_lcd yields around 17.35% skew reduction considering 10 years of
degradation. In comparison, replacing 17 out of 836 buffers on 330 paths of benchmark
circuit des_perf yields 29.50% skew reduction. Furthermore, benchmark circuits netcard and
vga_lcd have relatively longer paths (K = 10 and 15, respectively) than ethernet and
leon3mp (K = 8 and 9, respectively). The skew reductions on vga_lcd and netcard are
therefore larger, since more buffers can be selected for replacement. This makes our flow
well suited to large-scale industry designs with long clock paths. Summarizing the results
in Table 4.2, our flow replaces only 1.08% of the clock buffers on average and obtains
around 20% skew reduction under our over-constrained condition. Note also that replacing an
excessive number of buffers on a clock path with high-Vth buffers may increase the clock
path delay well beyond its original value, degrading circuit speed. Our flow therefore also
constrains the clock path delay when reducing the skew, as described in Equations (4.28)
and (4.29).
The replacement procedure requires no additional transistors or gates, so our flow incurs
almost no area overhead. Another dominant concern in CMOS digital circuits is leakage
current. As formulated in [89], this component decreases exponentially as the threshold
voltage increases:

$$I_{\mathrm{Leak},k} \propto e^{\frac{q}{n \Lambda T}\left(V_{dd} - \gamma'_1 V_s - \gamma'_2 V_{DS} - V_{th}\right)} \tag{4.31}$$

where q, Λ, V_s, γ'_1, and γ'_2 are process- or physics-related constants, n is the
subthreshold slope factor, and T is the absolute temperature. The total leakage power is
P_Leak = Σ_k I_Leak,k · V_dd, where k is the gate index. The replacement approach in our
flow therefore reduces leakage power as well. The amount of leakage power reduction is
proportional to the number of replaced buffers. Our main optimization objective is to
reduce clock skew while keeping the clock path delay within the design specification, so
the number of replaced buffers is minimized. Even so, our flow still reduces leakage power.
Adding leakage power reduction to the optimization objective would allow more clock
buffers to be replaced, reducing both skew and leakage power. This is outside the scope of
this research.
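The sketch below illustrates the exponential dependence in Equation (4.31). It computes the ratio of leakage currents when only V_th is raised, e^(−qΔV_th/(nΛT)), and sums P_Leak over a small set of buffers. The following values are assumptions, not library data:

* Λ is Boltzmann's constant, so ΛT/q is the thermal voltage.
* n = 1.5.
* The high-Vth buffers have a +0.1 V threshold shift.

With identical buffers, the sketch also shows that the leakage reduction grows linearly with the number of replaced buffers.

```python
import math

Q_ELECTRON = 1.602176634e-19  # C
K_BOLTZMANN = 1.380649e-23    # J/K; assumed here to play the role of Lambda in Eq. (4.31)


def leakage_ratio(delta_vth: float, temp_k: float, n: float = 1.5) -> float:
    """I_leak(Vth + delta_vth) / I_leak(Vth) with every other term of Eq. (4.31) fixed.

    n = 1.5 is a hypothetical subthreshold slope factor.
    """
    return math.exp(-Q_ELECTRON * delta_vth / (n * K_BOLTZMANN * temp_k))


def total_leakage_power(currents, vdd: float) -> float:
    """P_Leak = sum_k I_Leak,k * Vdd."""
    return sum(currents) * vdd


# Hypothetical example: 100 buffers with unit leakage, 2 swapped to a +0.1 V Vth flavor.
T_K = 125.0 + 273.15
ratio = leakage_ratio(0.1, T_K)
before = total_leakage_power([1.0] * 100, vdd=1.0)
after = total_leakage_power([1.0] * 98 + [ratio] * 2, vdd=1.0)
print(f"per-buffer ratio {ratio:.3f}, total leakage {before:.2f} -> {after:.2f}")
```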
4.6 Summary
In this chapter, we propose an effective synthesis flow for clock skew reduction
considering NBTI and process variations. Our flow takes advantage of the original safety
margin specified in the conventional ASIC design flow. This margin is usually excessively
large, which leads to significant performance loss. Our algorithm uses this margin to
reduce the skew by identifying the critical clock buffers in the clock tree. The identified
buffers are then replaced by their high-Vth counterparts. Results show that our flow is
highly effective even under an extremely constrained condition. By introducing a "divide
and conquer" approach, our flow is applicable to large-scale industry designs.
Chapter 5
Test-Cost Reduction Considering Silicon Variations
5.1 Introduction
Aggressive scaling of CMOS technology into ≤ 32nm feature nodes increases circuit
frequency to speeds of multi-GHz and beyond [6]. This makes circuit performance more
vulnerable to manufacturing imperfections. Unavoidable process variations introduced
during manufacturing can cause performance to deviate considerably from design
specifications, increase test cost, and potentially increase yield loss and escape [90]
[91]. In current design and test flows, timing analysis is increasingly important and
requires a high level of accuracy and efficiency. Accurate timing analysis reduces design
margins, identifies delay faults, and allows an accurate ranking of paths using timing
information. This reduces test pattern count and test cost and improves test quality [92]
[93].
Static timing analysis (STA) tools are commonly used in modern design and test flows
[94]. However, STA cannot account for the timing variability caused by process
variations, since all physical parameters are evaluated at their nominal values.
Statistical static timing analysis (SSTA) has been proposed as an alternative [95]. It
improves accuracy without neglecting parametric corner cases, even those with a low
probability of occurrence. Unlike STA, which models a gate or path delay as a
deterministic value, SSTA represents the delay as a probability distribution
characterized by a mean and a variance. However, SSTA applies one set of foundry process
data to all chips, which leads to an over-pessimistic evaluation of the delay deviation in
all corner cases. Consequently, much performance is left on the table due to
over-estimated margins.
Path-delay fault (PDF) tests exercise the critical paths at speed to detect whether a
path is slow because of manufacturing defects or variations [96]. Path delay modeling
and PDF identification are difficult tasks due to process variations, and the correspond-
ing PDF patterns are, in turn, difficult to generate. This is because actual silicon
variations on dies are not identical to what was modeled at pre-silicon stage [97]. STA-
based path delay modeling and PDF pattern generation may miss some critical paths.
Thus a much larger number of paths must be targeted to guarantee high coverage of
the actual silicon critical PDFs. Moreover, SSTA-based path delay modeling and PDF
pattern generation may lead to selection of an extremely large number of critical paths
for testing and impose extra overhead to the total test cost.
In [98], a three-phase decision-diagram-based approach was proposed to identify the set
of testable critical PDFs for a given circuit. However, it is an STA-based method and is
not practical when process variations are considered. The authors in [99] proposed a path
selection method for delay testing and timing validation that assumes the variations in
gate delays to be mutually independent. This method yields low accuracy, because process
variations affect gates differently and a path delay accumulates the impacts from each
gate. In [100], the authors used deterministic timing information to evaluate testable
paths. This method does not consider the statistical nature of path delay under process
variations. The authors in [28] proposed a false-path-aware SSTA to select critical paths
for delay test. Their method adopts worst-case slacks to identify critical nodes. However,
using worst-case slacks from SSTA may lead to an extremely large number of critical paths.
In [101], the authors compared gate node criticalities to deterministically select the
longest paths traversing the most statistically critical nodes. However, this method still
relies on statistical timing information to guarantee high test quality.
In this work, we develop a novel silicon-variation-aware (SVA) PDF pattern generation
flow. In SVA, we rank critical paths for detecting PDFs under the impact of actual silicon
variations. Instead of using timing information from STA or SSTA, we use post-silicon
measurements of ring oscillators (ROs) placed on the chip to estimate the actual variation
of path delays. This makes the approach practical to implement. It is enabled by a
correlation analysis between the basic gate types in the standard cell library, the paths,
and the ROs. ROs on a selected number of dies are measured to extract variation
information. Our results on critical paths extracted from benchmark circuits verify that
SVA is more accurate than STA in the conventional PDF test flow. They also show that SVA is
faster and more accurate than SSTA. SVA selects fewer critical paths, and hence requires
fewer PDF patterns, to verify circuit performance.
The rest of this chapter is organized as follows: Section 5.2 provides the delay modeling
and analysis considering process variations. Section 5.3 presents our proposed delay
estimation approach. Section 5.4 presents the SVA PDF pattern generation flow. Section 5.5
provides our experimental results. Finally, Section 5.6 summarizes the chapter.
5.2 Delay Analysis considering Process Variations
This section presents the background and motivations of our proposed methodology.
We will extend the basic gate-delay model for further correlation analysis between basic
gate types, paths, and the measured ROs from silicon.
5.2.1 The First-Order Linear Delay Model
In [102], a first-order canonical form of the gate delay $d_{0i}$ is given by:

$$d_{0i} = \mu_{0i} + \sum_{j=1}^{k} \alpha_{0i,j} \cdot r_{0j} \tag{5.1}$$

where $\mu_{0i}$ is the mean value of $d_{0i}$, $r_{0j}$ is a normally distributed random
variable, and $\alpha_{0i,j}$ is the first-order weight of the delay with respect to
$r_{0j}$. The parameters $r_{0j}$, $j = 1, \cdots, k$, represent a set of underlying
independent sources of variation. The principal component analysis (PCA) technique can be
used to transform spatially correlated parameters into principal components, which are no
longer correlated [103]. This transformation of the parameter space allows Equation (5.1)
to model gate delay as a linear function of independent parameter variations. The k
principal components also represent the statistical distribution of the gate delay
$d_{0i}$. Here, we normalize the variation of each parameter for simplicity. Thus, for
$3\sigma \in [0, 1]$, $r_{0j} \in [1 - 3\sigma, 1 + 3\sigma]$ with a mean value of 1, and
$r_{0j}$ follows the distribution $r_{0j} \sim N(1, \sigma)$. Based on our simulations with
the Nangate 45nm standard cell library, $\alpha_{0i,j} > 0$ holds for all gate types.
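The Cauchy-Schwarz inequality states that for all vectors $X = [x_1, \cdots, x_k]$ and
$Y = [y_1, \cdots, y_k]$ of an inner product space [104]:

$$
\begin{aligned}
\sum_{j=1}^{k} x_j y_j &= \langle X, Y \rangle \le |\langle X, Y \rangle| &\qquad (5.2)\\
&\le \|X\| \times \|Y\| &\qquad (5.3)
\end{aligned}
$$

By substituting Equation (5.1) into (5.2), the gate delay $d_{0i}$ is upper-bounded by:

$$
\begin{aligned}
d_{0i} &\le \mu_{0i} + \sqrt{\sum_{j=1}^{k} \alpha_{0i,j}^2}\,\sqrt{\sum_{j=1}^{k} r_{0j}^2} &\qquad (5.4)\\
&\le \mu_{0i} + \alpha_{0i} \cdot r_i &\qquad (5.5)
\end{aligned}
$$

where:

$$\alpha_{0i} = \sqrt{\sum_{j=1}^{k} \alpha_{0i,j}^2} \tag{5.6}$$

$$r_i = \sqrt{\sum_{j=1}^{k} r_{0j}^2} \tag{5.7}$$

Thus, by finding the minimum $\alpha_{0i}$ for which Equation (5.5) is satisfied, the
gate delay $d_{0i}$ can be upper-bounded.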
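The bound in Equations (5.4) to (5.7) can be checked numerically. The following Python sketch uses hypothetical sensitivities α_0i,j and samples r_0j ~ N(1, σ), where σ is treated as a standard deviation for sampling. It then verifies that the linear delay of Equation (5.1) never exceeds μ_0i + α_0i · r_i. Equality holds when the sample vector is parallel to the sensitivity vector.

```python
import math
import random


def linear_gate_delay(mu: float, alphas, rs) -> float:
    """Equation (5.1): d = mu + sum_j alpha_j * r_j."""
    return mu + sum(a * r for a, r in zip(alphas, rs))


def cs_upper_bound(mu: float, alphas, rs) -> float:
    """Equations (5.4)-(5.7): mu + ||alpha|| * r_i with r_i = ||r||."""
    alpha_norm = math.sqrt(sum(a * a for a in alphas))   # alpha_0i, Eq. (5.6)
    r_rep = math.sqrt(sum(r * r for r in rs))            # r_i, Eq. (5.7)
    return mu + alpha_norm * r_rep


def count_violations(mu, alphas, sigma=0.05, trials=1000, seed=0) -> int:
    """Sample r_0j ~ N(1, sigma) and count samples where the bound would fail."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        rs = [rng.gauss(1.0, sigma) for _ in alphas]
        if linear_gate_delay(mu, alphas, rs) > cs_upper_bound(mu, alphas, rs) + 1e-9:
            violations += 1
    return violations


MU, ALPHAS = 100.0, [4.0, 2.5, 1.0]   # hypothetical gate: mean and sensitivities
print("violations:", count_violations(MU, ALPHAS))
```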
Equation (5.5) still incorporates the impact of process variations, but simplifies it
into a single variable $r_i$, defined as the representative variation. Equation (5.5)
also covers both intra- and inter-die variations. The constant coefficient $\alpha_{0i}$,
called the weight to process variations, varies with the input combination of each gate
type. Note that even though $r_{0j}$ ($j = 1, \cdots, k$) is a Gaussian variable, $r_i$ is
no longer Gaussian. However, we still treat it as a Gaussian variable for the conditional
probability estimation in Section 5.3. Our focus is on path ranking, in order to improve
the accuracy of timing analysis under actual silicon variations and to reduce test cost.
We are therefore much more concerned with the relative ranking of paths than with accurate
path delay values. Our flow may introduce error in delay estimation. However, the results
in Section 5.5 demonstrate that the final path ranking is still very accurate.
We conduct 5000 Monte Carlo (MC) simulations to characterize the linear models for all
gate types in the Nangate 45nm standard cell library using the PTM 45nm transistor model.
In our simulations, a variance of 15% is assigned to each of three major physical
parameters: L (channel length), W (channel width), and Vth0 (threshold voltage). Equation
(5.5) is used to model the upper bound of each gate delay for the different input
combinations of each gate type. Figure 5.1 shows the results for an AND gate (AND X4) in
the cell library. Each blue dot corresponds to one simulated delay, and the red line shows
the approximated linear upper bound. Our linear model closely tracks the upper-bound delay
as a function of the representative variation $r_i$. MC simulations of the other gate
types in the cell library give similar results.
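One way to obtain the minimum α_0i that keeps every Monte Carlo sample under the line of Equation (5.5) is to take the largest slope (d − μ_0i)/r_i over the samples. This assumes μ_0i is fixed and all r_i > 0. The Python sketch below illustrates this on synthetic data. It is a hypothetical fitting procedure, not necessarily the one used to produce Figure 5.1.

```python
import random


def min_alpha(mu: float, samples) -> float:
    """Smallest alpha such that d <= mu + alpha * r holds for every (r, d) sample.

    Requires r > 0, so each sample gives the constraint alpha >= (d - mu) / r.
    """
    if any(r <= 0.0 for r, _ in samples):
        raise ValueError("representative variation r must be positive")
    return max((d - mu) / r for r, d in samples)


# Synthetic stand-in for MC results (hypothetical numbers, not HSPICE data).
rng = random.Random(1)
MU0 = 90.0
SAMPLES = []
for _ in range(5000):
    r = max(rng.gauss(1.0, 0.05), 1e-3)        # keep r strictly positive
    d = MU0 + 20.0 * r + rng.gauss(0.0, 1.0)   # linear trend plus noise
    SAMPLES.append((r, d))

ALPHA = min_alpha(MU0, SAMPLES)
print(f"fitted alpha_0i = {ALPHA:.3f}")
```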
Fig. 5.1: Gate upper-bound delay analysis corresponding to different input combinations
(r1, f1, 1r, 1f) on the AND X4 gate. The letter indicates the transition on the on-path
input (r: rise, f: fall), and the number indicates the status of the off-path input.
(Panels (a) to (d), one per input combination r1, f1, 1r, and 1f: delay versus normalized
channel length (L), showing Monte Carlo samples and the linear upper bound.)

However, Equation (5.1) does not explicitly take into account the impact of capacitive
load. In a circuit, the capacitive load of a gate may be very different from those of
other gates due to the topology and fan-in/fan-out conditions. The alpha-power law [85]
provides an analytical model of the gate-level delay $d_i$ in terms of the threshold
voltage $V_{th}$ and the capacitive load $C_{Li}$:

$$d_i = \frac{V_{dd}}{\beta_i \cdot [V_{dd} - V_{th}]^{\gamma}} \times C_{Li} \tag{5.8}$$
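where $\beta_i$ is a parameter dependent on the gate type, $V_{dd}$ is the power supply
voltage, and $\gamma$ is the velocity-saturation index of the alpha-power law. In Equation
(5.8), the capacitive load $C_{Li}$ simply acts as a multiplier. Thus, we can modify
Equation (5.5) to incorporate $C_{Li}$ and reformulate it as:

$$
\begin{aligned}
d_i &= d_{0i} \times C_{Li} &\qquad (5.9)\\
&\le (\mu_{0i} + \alpha_{0i} \cdot r_i) \cdot C_{Li} &\qquad (5.10)\\
&= \mu_i + \alpha_i \cdot r_i &\qquad (5.11)
\end{aligned}
$$

where:

$$\mu_i = \mu_{0i} \cdot C_{Li} \tag{5.12}$$

$$\alpha_i = \alpha_{0i} \cdot C_{Li} \tag{5.13}$$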
From the derivation above, the gate's upper-bound delay can be obtained from the linear
form in Equations (5.9) to (5.11). Since a path is a combination of K gates, we can
similarly formulate the path's upper-bound delay $D_i$ as:

$$D_i = \sum_{j=1}^{K} (\mu_j + \alpha_j \cdot r_j) \tag{5.14}$$

If these K gates are physically close, the variability among them can be neglected, and it
is reasonable to assume $r_1 = \cdots = r_K$. The upper-bound delay of path i can then be
reformulated as:

$$
\begin{aligned}
D_i &= \Big(\sum_{j=1}^{K} \mu_j\Big) + \Big(\sum_{j=1}^{K} \alpha_j\Big) \cdot r_i &\qquad (5.15)\\
&= \Gamma_i + \Lambda_i \cdot r_i &\qquad (5.16)
\end{aligned}
$$

where $r_1 = r_2 = \cdots = r_K = r_i$, $\Gamma_i = \sum_{j=1}^{K} \mu_j$, and
$\Lambda_i = \sum_{j=1}^{K} \alpha_j$. This is especially applicable to an RO, where all
gates are placed physically close. Its cycle time can also be formulated using Equation
(5.15).
5.2.2 Relationship between Delays of Gates, Paths, and Ring-Oscillators
Using Equation (5.5), a path with s gates can be represented in vector form as:

$$d_s = \begin{bmatrix} \mu_{01} \\ \vdots \\ \mu_{0s} \end{bmatrix} + \begin{bmatrix} \alpha_{01} \cdot r_1 \\ \vdots \\ \alpha_{0s} \cdot r_s \end{bmatrix} \tag{5.17}$$
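If we take into account the capacitive load of each gate, a path's structure is
represented as:

$$d_s = \begin{bmatrix} \mu_{01} \cdot C_{L1} \\ \vdots \\ \mu_{0s} \cdot C_{Ls} \end{bmatrix} + \begin{bmatrix} \alpha_{01} \cdot C_{L1} \cdot r_1 \\ \vdots \\ \alpha_{0s} \cdot C_{Ls} \cdot r_s \end{bmatrix} \tag{5.18}$$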
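Meanwhile, if we define a load vector $C_s = [C_{L1}, \cdots, C_{Ls}]^T$, the path delay
can be calculated, with $d_s$ taken from Equation (5.17), as:

$$D_s = C_s^T \times d_s = [C_{L1}, \cdots, C_{Ls}] \times \begin{bmatrix} \mu_{01} \\ \vdots \\ \mu_{0s} \end{bmatrix} + [C_{L1}, \cdots, C_{Ls}] \times \begin{bmatrix} \alpha_{01} \cdot r_1 \\ \vdots \\ \alpha_{0s} \cdot r_s \end{bmatrix} \tag{5.19}$$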
Equation (5.19) characterizes the relationship between the delays of gates, paths, and
ROs. The ROs are measured after fabrication to obtain the variations from silicon.
Equation (5.19) indicates that a path delay or an RO cycle time $D_s$ can be represented
through vector manipulation of the gate delays $d_s$ and capacitive loads $C_s$. This
representation also incorporates process variations.
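The short Python sketch below compares the vector form of Equation (5.19) with the collapsed form of Equation (5.16) on a hypothetical three-gate path:

* With per-gate variations r_j, the delay is C_sᵀμ_0 + C_sᵀ(α_0 ∘ r), where ∘ is the element-wise product.
* When all r_j are equal, this reduces to Γ_i + Λ_i · r_i, with Γ_i = Σ_j μ_0j·C_Lj and Λ_i = Σ_j α_0j·C_Lj.

All numbers are illustrative.

```python
def path_upper_bound(mu0, alpha0, loads, rs) -> float:
    """Equation (5.19): D_s = C_s^T mu_0 + C_s^T (alpha_0 * r), element-wise product."""
    nominal = sum(c * m for c, m in zip(loads, mu0))
    variation = sum(c * a * r for c, a, r in zip(loads, alpha0, rs))
    return nominal + variation


def gamma_lambda(mu0, alpha0, loads):
    """Gamma_i, Lambda_i of Eq. (5.16) with mu_j = mu_0j*C_Lj and alpha_j = alpha_0j*C_Lj."""
    gamma = sum(m * c for m, c in zip(mu0, loads))
    lam = sum(a * c for a, c in zip(alpha0, loads))
    return gamma, lam


# Hypothetical three-gate path.
PATH_MU0 = [10.0, 12.0, 8.0]
PATH_ALPHA0 = [2.0, 3.0, 1.5]
PATH_LOADS = [1.0, 2.0, 1.5]

GAMMA, LAM = gamma_lambda(PATH_MU0, PATH_ALPHA0, PATH_LOADS)
R_COMMON = 1.1
print(path_upper_bound(PATH_MU0, PATH_ALPHA0, PATH_LOADS, [R_COMMON] * 3),
      GAMMA + LAM * R_COMMON)
```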
5.3 Delay Estimation
The analysis above shows that gate delays, path delays, and RO cycle times can all be
represented using linear equations to obtain their upper bounds. Equation (5.19)
characterizes the relationship among them. This enables the estimation of one from
another through conditional probability. A similar conditional probability analysis is
used in [105]. That work estimates the exact path delay by treating the delay as a linear
function of multiple parameters rather than of the representative variation. In that
method, however, each gate is assumed to drive a capacitive load of unit value, which
neglects the deviation of capacitive loads. A gate's capacitive load may be very different
from those of other gates due to complicated topology and fan-in/fan-out conditions, so
this assumption makes the method less practical. The SVA pattern generation flow instead
focuses on two things: each path's upper-bound delay relative to the upper-bound delays of
other paths, and the ranking of critical paths under actual silicon variations (more
details are provided in Section 5.4).

In the following, we present details about the process of delay estimation. Theorems 1
and 2 from [106] are used to facilitate our estimation.
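Theorem 1: Let $x_0$ be a Gaussian variable satisfying $x_0 \sim N(\mu_0, \sigma_0)$, and
let $X = [x_1, \cdots, x_k]^T$ be a Gaussian vector satisfying
$X \sim N(\mu_X, \Sigma_{XX})$ with $|\Sigma_{XX}| > 0$. Then the conditional distribution
of $x_0$ given $X = X_0$ is Gaussian with:

$$\mu = \mu_0 + \Sigma_{0X}^T \Sigma_{XX}^{-1} (X_0 - \mu_X) \tag{5.20}$$

$$\sigma = \sigma_0 - \Sigma_{0X}^T \Sigma_{XX}^{-1} \Sigma_{X0} \tag{5.21}$$

$$\Sigma_{0X} = [\sigma_{01}, \cdots, \sigma_{0k}]^T \tag{5.22}$$

where $\sigma_0$ and $\sigma$ denote variances, and $\sigma_{0i}$ is the covariance between
$x_0$ and $x_i$, $i = 1, \cdots, k$.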
Theorem 1 indicates that we can use the cycle times measured from ROs to estimate the
probability distribution of a gate's upper-bound delay. To illustrate this conditional
probability calculation, consider the circuit layout shown in Figure 5.2. Assuming that k
ROs are distributed in the circuit, we estimate the upper-bound delay of the NAND gate
based on the RO measurements. From Theorem 1, the delay of the NAND gate for a specific
input combination follows a Gaussian distribution with:

$$\mu_i = \mu_{i0} + \Sigma_{0X}^T \Sigma_{XX}^{-1} (D_X - \Gamma_0) \tag{5.23}$$

$$\sigma_i = \sigma_{i0} - \Sigma_{0X}^T \Sigma_{XX}^{-1} \Sigma_{X0} \tag{5.24}$$

$$\Sigma_{0X} = [\sigma_{01}, \cdots, \sigma_{0k}]^T \tag{5.25}$$

where:

* $\mu_{i0}$ and $\sigma_{i0}$ are the mean and variance of the gate's upper-bound delay
before conditioning;
* $\Gamma_0 = [\Gamma_1, \cdots, \Gamma_k]^T$ is the vector of mean RO cycle times;
* $D_X = [D_1, \cdots, D_k]^T$ is the vector of measurements from the ROs;
* $\Sigma_{0X}$ characterizes the correlation between the NAND gate and the ROs under
process variations;
* $\Sigma_{XX}$ characterizes the correlation among the ROs under process variations.
Fig. 5.2: Circuit layout example. ROs are measured after fabrication to obtain variations.
Hence, we can use the measurements $D_X$ on the ROs to estimate the gate's delay
distribution, determined by $\mu_i$ and $\sigma_i$ in Equations (5.23) and (5.24),
respectively.
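Equations (5.23) and (5.24) are the Gaussian conditioning formulas of Theorem 1. A minimal pure-Python sketch is shown below, with these choices:

* It solves linear systems with Σ_XX instead of forming the inverse explicitly.
* It treats σ_i0 and σ_i as variances.
* It uses hypothetical numbers for a single NAND gate and two ROs.

The helper names are illustrative.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("Sigma_XX must be non-singular (|Sigma_XX| > 0)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def condition_gate_on_ros(mu_gate, var_gate, cov_gate_ro, ro_means, cov_ro, ro_measured):
    """Eqs. (5.23)-(5.24): mean and variance of a gate delay given RO measurements D_X."""
    residual = [d - g for d, g in zip(ro_measured, ro_means)]   # D_X - Gamma_0
    weights = solve(cov_ro, residual)                           # Sigma_XX^-1 (D_X - Gamma_0)
    mean = mu_gate + sum(c * w for c, w in zip(cov_gate_ro, weights))
    gain = solve(cov_ro, cov_gate_ro)                           # Sigma_XX^-1 Sigma_X0
    var = var_gate - sum(c * g for c, g in zip(cov_gate_ro, gain))
    return mean, var


if __name__ == "__main__":
    # Hypothetical numbers: one NAND gate and two ROs with correlated cycle times.
    mean, var = condition_gate_on_ros(
        mu_gate=25.0, var_gate=4.0,
        cov_gate_ro=[1.5, 0.8],
        ro_means=[300.0, 300.0],
        cov_ro=[[36.0, 12.0], [12.0, 36.0]],
        ro_measured=[310.0, 304.0],
    )
    print(f"gate delay | RO data ~ N({mean:.3f}, {var:.3f})")
```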
Theorem 2: If $X$ is distributed as $N(\mu, \Sigma)$, then any linear combination of
variables $a^T X = a_1 x_1 + a_2 x_2 + \cdots + a_p x_p$ is distributed as
$N(a^T \mu, a^T \Sigma a)$. Here $\Sigma$ is the covariance matrix characterizing the
correlation between any two variables.
We further apply Theorem 2 to estimate the delay distribution of a path with s gates,
using the path delay model in Equation (5.19). The conditional distribution of a path's
upper-bound delay $D_s$ is then Gaussian:

$$D_s \sim N(C_s^T \mu, C_s^T \Sigma_{XX} C_s) \tag{5.26}$$

In summary, Equations (5.23), (5.24), and (5.26) enable the use of RO measurements to
estimate the target path's upper-bound delay considering process variations and capacitive
load conditions.
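Theorem 2 turns the per-gate Gaussian estimates into a path distribution. The sketch below computes the mean C_sᵀμ and variance C_sᵀΣC_s of Equation (5.26) for the covariance matrix passed in. It also computes the probability that the path delay exceeds a timing threshold, using the complementary error function. The threshold query is an illustrative addition rather than a step stated in the flow.

```python
import math


def path_distribution(loads, means, cov):
    """Theorem 2 applied as in Eq. (5.26): mean C^T mu, variance C^T Sigma C."""
    n = len(loads)
    mean = sum(loads[i] * means[i] for i in range(n))
    var = sum(loads[i] * cov[i][j] * loads[j] for i in range(n) for j in range(n))
    return mean, var


def prob_exceeds(mean: float, var: float, threshold: float) -> float:
    """P(D_s > threshold) for D_s ~ N(mean, var), via the Gaussian CDF."""
    if var <= 0.0:
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical two-gate path with correlated gate delays.
    mean, var = path_distribution([1.0, 2.0], [10.0, 20.0], [[1.0, 0.5], [0.5, 2.0]])
    print(mean, var, prob_exceeds(mean, var, 55.0))
```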
5.4 Silicon-Variation-Aware PDF Pattern Generation
Figure 5.3 shows our proposed SVA PDF pattern generation flow. As described in Section
5.2.1, we generate linear models for each gate type in the Nangate 45nm standard cell
library considering three process parameters: channel length L, channel width W, and
threshold voltage Vth0. For each parameter, we consider a 15% variance during the MC
simulations using HSPICE [34]. The correlations between gate types are also calculated
from the MC results. In addition, the linear model for a specifically structured RO (a
7-stage RO constructed with the minimum-size inverter in the library) is extracted using
MC simulations with the same parameter variations. From all these MC results, we further
extract the correlation between the different gate types and the RO.
The SVA pattern generation process can be divided into five steps:
1. As a preprocessing step, linear models for the different gate types and a specific RO
are formulated through an MC simulation procedure considering process variations. The
correlations between the delays of the different gate types and the RO are also
calculated.
2. At the design stage, STA with a relatively large timing slack is performed to select a
large number of paths, e.g., the top 20% of the paths. Meanwhile, ROs are distributed
across the design during placement and layout.
3. After fabrication, the ROs are measured to obtain their cycle times (i.e., the
variations).
4. After the measured RO cycle times are collected, the selected paths are analyzed for
their upper-bound delay distributions. A fast path ranking technique then ranks the paths
according to this distribution information.
5. Depending on the test requirements, a percentage of the paths are selected according to
our ranking for PDF test.

Fig. 5.3: The proposed SVA PDF pattern generation flow. (Flow blocks: Netlist; Static
Timing Analysis (20% paths); PV-Aware Gate Delay Linear Models; RO Measurement; Path Delay
Estimation; Path Ranking; High Rank Paths; ATPG; PDF Patterns.)
Algorithm 4 shows the pseudo-code of our path ranking technique. In the algorithm,
Lines 6 to 11 initialize the parameters for a new ranking. A recursive partition-and-sort
process in Lines 12 to 28 compares the paths' delays according to their probability
distributions. When two paths are compared (Lines 16 to 20), the cumulative distribution
function (CDF) of the difference between the two paths' delays is examined. If the CDF
shows that the delay of path i is larger than that of path j with a probability ≥ 0.5,
path i is treated as longer than path j. In this way, all paths are ranked according to
their distribution information. Line 29 returns the finalized ranking. This
divide-and-conquer sorting differs from ranking the paths by comparing each one with all
other paths. It reduces the computational complexity from O(n²) to O(n log n) on average,
where n is the number of paths.
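The ranking step can be sketched in Python as an in-place, quicksort-style partition (pivot = last path) with a probabilistic comparator, mirroring the structure of Algorithm 4. As an assumption, P(D_i > D_j) is computed for independent Gaussian delays as Φ((μ_i − μ_j)/√(σ_i + σ_j)), with σ a variance as in Algorithm 4. For jointly Gaussian delays, the difference D_i − D_j has mean μ_i − μ_j, so the 0.5 threshold amounts to comparing means. The probability is still computed here because it shows how confident each comparison is. The function and type names are hypothetical.

```python
import math
from typing import List, Tuple

# (path name, mean upper-bound delay mu_i, delay variance sigma_i)
Path = Tuple[str, float, float]


def prob_longer(a: Path, b: Path) -> float:
    """P(D_a > D_b), assuming independent Gaussian path delays."""
    _, mu_a, var_a = a
    _, mu_b, var_b = b
    spread = math.sqrt(var_a + var_b)          # std. dev. of D_a - D_b
    if spread == 0.0:
        return 1.0 if mu_a > mu_b else 0.0
    z = (mu_a - mu_b) / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # Gaussian CDF at z


def path_ranking(paths: List[Path], first: int, last: int) -> None:
    """Sort paths[first..last] in place, longest first (Lomuto-style partition)."""
    if first >= last:
        return
    pivot = paths[last]
    i = first - 1
    for j in range(first, last):
        # Move path j to the front part unless the pivot is longer with prob > 0.5.
        if prob_longer(pivot, paths[j]) <= 0.5:
            i += 1
            paths[i], paths[j] = paths[j], paths[i]
    paths[i + 1], paths[last] = paths[last], paths[i + 1]
    q = i + 1
    path_ranking(paths, first, q - 1)
    path_ranking(paths, q + 1, last)


def rank_paths(paths: List[Path]) -> List[str]:
    """Return path names from longest to shortest without modifying the input."""
    ranked = list(paths)
    path_ranking(ranked, 0, len(ranked) - 1)
    return [name for name, _, _ in ranked]


if __name__ == "__main__":
    demo = [("p1", 100.0, 4.0), ("p2", 120.0, 9.0), ("p3", 95.0, 1.0), ("p4", 110.0, 16.0)]
    print(rank_paths(demo))   # ['p2', 'p4', 'p1', 'p3']
```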
Algorithm 4 The proposed SVA ranking algorithm.
1: InputList[] – Input list: (µi, σi), i = 1, · · · , k, where µi is the mean and σi is the
variance corresponding to path i, and k is the number of paths
2: last ← length(InputList[]) (= k)
3: first ← 1
4: RankedList[] – Ranked path list
5: PathRanking(InputList[], first, last)
6: Initialization:
7: p ← 0
8: q ← 0
9: temp ← 0
10: i ← 0
11: j ← 0
12: if first < last then
13: i = first − 1
14: temp = 0
15: for j = first; j < last; j++ do
16: if CDF(D_last > D_j) <= 0.5 then
17: i = i + 1
18: temp = InputList[i]; InputList[i] = InputList[j]
19: InputList[j] = temp
20: end if
21: end for
22: temp = InputList[i + 1]
23: InputList[i + 1] = InputList[last]
24: InputList[last] = temp
25: q = i + 1
26: PathRanking(InputList[], first, q − 1)
27: PathRanking(InputList[], q + 1, last)
28: end if
29: Return the InputList[] as RankedList[]

5.5 Experimental Results
This section presents our experimental results and evaluates the efficiency of the
proposed SVA flow on critical paths extracted from benchmark circuits. We use commercial
tools [34] and our in-house programs to dump the necessary circuit information, e.g., the
capacitive load of the gates. The Nangate 45nm standard cell library is used for
synthesis, physical design, and simulation. We extract the linear models and characterize
the correlations for the gate types in the cell library. A 7-stage RO is constructed as a
block using the minimum-sized inverter. During physical design, the ROs are uniformly
distributed in the design. In Equations (5.23) and (5.24), $\Sigma_{XX}$ characterizes the
correlation between ROs, including their spatial information, and is important in
estimating path delay. We use the grid model [107] to generate the variation profile of the
physical parameters (L, W, Vth0), considering a 15% variance for each parameter. Based on
this variation profile, we use the covariance model [97] to approximate the correlation
between every two ROs according to their physical distance, which is formulated as:
$$\mathrm{Cov}(D_i, D_j) = \sigma_i \cdot \sigma_j \cdot \rho_{ij}(l) \tag{5.27}$$

$$\rho_{ij}(l) = e^{-\lambda \cdot l}, \quad \lambda > 0 \tag{5.28}$$

where $\sigma_i$ and $\sigma_j$ are the standard deviations of the delays of the ROs indexed
by i and j, and l is the physical distance between the two ROs. Using MC simulation
results at the design stage, we can approximate λ based on the technology library and
obtain λ · l for the correlation model in Equations (5.27) and (5.28). For an actual
implementation of the SVA flow on ASIC designs, a set of dies on a wafer can be measured,
and the measured RO cycle times can be used to improve the model accuracy. Once the
correlation model is ready, we obtain the physical distances between the ROs from the
placement and layout information to calculate $\Sigma_{XX}$ in Equations (5.23), (5.24),
and (5.26). We leave model improvement and verification using silicon data to future
research.
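The covariance model of Equations (5.27) and (5.28) can be turned into Σ_XX from RO placement coordinates. The sketch below makes these assumptions:

* The distance l is Euclidean.
* σ_i is the standard deviation of RO i's cycle time.
* The coordinates and λ are hypothetical.

In the flow, λ would come from MC results or silicon data as described above.

```python
import math


def ro_covariance(positions, sigmas, lam: float):
    """Build Sigma_XX with Eqs. (5.27)-(5.28): sigma_i * sigma_j * exp(-lam * l_ij)."""
    if lam <= 0.0:
        raise ValueError("lambda must be positive")
    n = len(positions)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            distance = math.dist(positions[i], positions[j])   # Euclidean (assumed)
            cov[i][j] = sigmas[i] * sigmas[j] * math.exp(-lam * distance)
    return cov


if __name__ == "__main__":
    # Hypothetical 2 x 2 grid of ROs (coordinates in micrometres) with equal spread.
    ros = [(0.0, 0.0), (500.0, 0.0), (0.0, 500.0), (500.0, 500.0)]
    sigma_xx = ro_covariance(ros, sigmas=[6.0] * 4, lam=1e-3)
    for row in sigma_xx:
        print(" ".join(f"{v:7.3f}" for v in row))
```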
To implement the SVA flow and compare its efficiency with rankings based on STA and SSTA
results, we performed the following experiments:
1. For a selected number of testable critical paths, we conducted an SSTA based on the
nominal parameter values, considering a 15% variance for each major parameter as mentioned
earlier. 5000 MC simulations were conducted, including the corner cases. The path delays
were measured in HSPICE and collected to obtain their probability distributions according
to SSTA. For every two paths, a threshold of 0.5 was applied to the CDF of their delay
difference to compare their lengths in timing. The paths were then ranked accordingly, and
the ranking was recorded as the "SSTA ranking".
2. Using the nominal and worst-case parameter values stored in the cell library, an STA
was conducted to calculate the path delays. The paths were ranked by their delay values,
and the ranking was recorded as the "STA ranking".
3. We used the spatial correlation model [97] [107] to generate the variation profile of
each parameter according to the specified nominal values and variances. Another round of
MC simulations was conducted on the paths and ROs. These simulations mimic the actual
silicon variations on a real chip after fabrication. We used HSPICE to measure the path
delays, and the paths were ranked accordingly. This ranking, based on direct HSPICE
measurements, was used as the criterion for evaluating the rankings from STA, SSTA, and
SVA.
4. The RO cycle times were measured and recorded as measurements representing silicon.
Algorithm 4 was then applied to these RO cycle times to rank the paths.
5. The ranking results from SVA were compared with the rankings obtained from STA and
SSTA.
Figure 5.4 compares the ranking of two paths (paths i and j) obtained by SVA with that
from SSTA when different numbers of RO measurements are used. When multiple measurements
are used, we first calculate their mean value and use it at the delay estimation step in
SVA. In our 2000 MC simulations mimicking the actual silicon variations, HSPICE
measurements show that path i is longer than path j with a probability of > 99%. The SSTA
ranking shows that path i is longer than j with a probability of ≈ 96.5%. The ranking using
SVA consistently provides higher accuracy (> 99%) regardless of the number of RO
measurements used.
Fig. 5.4: Comparison of two path lengths using SVA and SSTA. (Plot: probability (%),
from 96 to 100, for SVA Ranking and SSTA Ranking versus number of measurements: 1, 2, 3, 5,
10, 15, 20, 30, 50, 100, 200, 500, 1000, and 2000.)
In the following ranking comparisons, we introduce a tolerance parameter ∆L.
When ∆L is specified, we regard a rank L from SVA, STA, and SSTA as a correct
ranking if it is located within the range of [LMC −∆L/2, LMC + ∆L/2], where LMC is
the rank determined by MC simulation results. For example, MC results rank a path
and obtain its ranking as LMC = 50. When ∆L = 6, the ranking L from SVA, STA, or
SSTA is regarded as correct, if 47 ≤ L ≤ 53.
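The tolerance criterion can be expressed as a small Python function. It counts the paths whose rank L lies within [L_MC − ∆L/2, L_MC + ∆L/2] of the reference MC rank. The function name and the list-based input format (paths ordered from rank 1 downward) are hypothetical.

```python
def correct_ranking_percent(reference, candidate, delta_l: float) -> float:
    """Share (%) of paths whose candidate rank L satisfies |L - L_MC| <= delta_l / 2."""
    ref_rank = {path: k for k, path in enumerate(reference, start=1)}    # L_MC
    cand_rank = {path: k for k, path in enumerate(candidate, start=1)}   # L
    if ref_rank.keys() != cand_rank.keys():
        raise ValueError("both rankings must contain the same paths")
    half = delta_l / 2.0
    hits = sum(1 for p, l_mc in ref_rank.items() if abs(cand_rank[p] - l_mc) <= half)
    return 100.0 * hits / len(ref_rank)


if __name__ == "__main__":
    mc_order = ["a", "b", "c", "d"]       # reference ranking from MC results
    sva_order = ["b", "a", "c", "d"]      # hypothetical ranking to evaluate
    for dl in (0, 2, 6):
        print(dl, correct_ranking_percent(mc_order, sva_order, dl))
```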
In Figure 5.5, 18 gate-dominant paths, in which each gate has a relatively small
capacitive load, are ranked using SVA, STA, and SSTA. In Figure 5.6, 12
interconnect-dominant paths, in which each gate has a relatively large capacitive load,
are ranked using the same methods. Both Figures 5.5 and 5.6 show that SVA provides
better ranking results than STA and SSTA for every ∆L considered.
Figure 5.7 compares the rankings of 400 critical paths from benchmark circuit
s38417 using SVA, STA, and SSTA, respectively. The results in Figure 5.7 show that
Fig. 5.5: Comparison of ranking accuracy on gate-dominant paths using SVA, STA,
and SSTA.
Fig. 5.6: Comparison of ranking accuracy on interconnect-dominant paths using SVA,
STA, and SSTA.
[Plot: correct ranking (%) versus ∆L (0 to 50); curves: Using SVA Results, Using STA Results, Using SSTA Results]
Fig. 5.7: Comparison of ranking accuracy using SVA, STA, and SSTA for 400 critical
paths.
rankings using SVA are more accurate than those using STA and SSTA. When ∆L ≥ 24,
SVA achieves 100% correctness. Figure 5.8 shows the improvement in ranking accuracy
from SVA for different ∆L values. The results in Figure 5.8 indicate that the
improvement over STA and SSTA decreases as ∆L increases; that is, a larger ∆L makes
the SVA flow less advantageous. However, a larger ∆L also causes more paths to be
treated as having the same rank. If we select the top percentage of paths for PDF test
to achieve high fault coverage, all paths sharing the same high rank must be included,
since neglecting any of them would be inaccurate. This may lead to an oversized path
set. Consequently, a smaller ∆L is preferred for a lower test cost. When ∆L < 10, SVA
ranking is highly effective. Even when ∆L approaches 50, SVA still achieves ≥ 50%
higher accuracy than STA and SSTA.
[Plot: improvement (%) versus ∆L (0 to 50); curves: Compared w/ STA results, Compared w/ SSTA results]
Fig. 5.8: Improvement of ranking accuracy using SVA over STA and SSTA.
In the following, we evaluate the test cost reduction using SVA. The top q (in percent)
testable critical paths are selected according to MC simulation results considering
different process variations. These q paths represent the actual critical paths in silicon.
To ensure that these q paths are tested, tools usually end up selecting far more than q
paths. SVA, STA, and SSTA are each used to select the minimum number of paths that
includes these q paths. Denoting the numbers of paths required for 100% coverage of the
q paths by $P_{SVA}$, $P_{STA}$, and $P_{SSTA}$ for SVA, STA, and SSTA, respectively, we
calculate the test cost reduction as:
$$R_{STA} = \frac{P_{STA} - P_{SVA}}{P_{STA}} \times 100 \tag{5.29}$$
$$R_{SSTA} = \frac{P_{SSTA} - P_{SVA}}{P_{SSTA}} \times 100 \tag{5.30}$$
where $R_{STA}$ and $R_{SSTA}$ represent the percentage reduction in the number of paths
needed to cover the same q paths when SVA is used instead of STA and SSTA, respectively.
The results in Figure 5.9 show that using SVA for PDF pattern generation significantly
reduces the number of paths necessary for test. On average, SVA
Fig. 5.9: RSTA and RSSTA using SVA over STA and SSTA.
reduces the number of paths for testing by ≥ 52% and ≥ 15% when aiming at 100%
coverage of the q critical paths. To detect the top q paths, STA leads to a selection of
$P_{STA} > q$ paths for test; when SVA is used to rank the paths, however, only the top
$P_{SVA} = P_{STA} \times (1 - R_{STA}/100)$ paths need to be selected. Similarly, to
detect the top q paths according to the SSTA ranking, $P_{SSTA} \gg q$ paths must be
tested, whereas SVA requires only $P_{SVA} = P_{SSTA} \times (1 - R_{SSTA}/100)$ paths.
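As a worked example of Equations (5.29) and (5.30), the Python sketch below computes the reduction and the resulting SVA path count. It assumes the reduction is defined to be positive when SVA needs fewer paths and is expressed in percent; the function names and the path counts are hypothetical, not experimental data.

```python
def cost_reduction(p_sva, p_ref):
    """Percent reduction in tested paths when SVA replaces a reference flow."""
    return (p_ref - p_sva) / p_ref * 100.0


def sva_path_count(p_ref, reduction_pct):
    """Number of paths SVA needs, given the reference count and the percent reduction."""
    return p_ref * (1.0 - reduction_pct / 100.0)


# Hypothetical counts for illustration only.
p_sta, p_ssta, p_sva = 1000, 600, 480
r_sta = cost_reduction(p_sva, p_sta)     # reduction versus STA, in percent
r_ssta = cost_reduction(p_sva, p_ssta)   # reduction versus SSTA, in percent
print(round(r_sta, 2), round(r_ssta, 2))  # 52.0 20.0
```

Going back from the reduction to the path count recovers $P_{SVA}$, which is the relation used in the paragraph above.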
In SVA, we introduced a time-consuming step to rank the critical paths. Although we use
a fast binary search and sort algorithm, this step still introduces some time overhead.
Currently, we are using a desktop with a 3.0GHz dual-core CPU and 3GB memory, and the
path ranking technique detailed in Algorithm 4 is programmed in Matlab. According to
our simulations, the time overhead $T_{OH}$ caused by our ranking technique is ≈ 0.012 · n
seconds when the path count n ≥ 5000. We believe that $T_{OH}$ could be further reduced
if the algorithm were implemented more efficiently and executed on higher-performance
servers. To analyze the efficiency of SVA, we assume that one PDF corresponds to one
critical path, and hence to one test pattern. This assumption is reasonable because, for a
pattern detecting multiple PDFs, only one of them needs to be put into the ranking
procedure.
As described in [108], the test time for each die, TT, is:
$$TT = [(n + 2) \cdot n_{sff} + n + 4] \cdot T_{clk} \tag{5.31}$$
where n is the test pattern count, $n_{sff}$ is the number of flip-flops in the scan chain
($n_{sff} > 0$), and $T_{clk}$ is the clock cycle time. Using SVA, for the same test coverage
analyzed above, n · p patterns can be saved (p = $R_{STA}$ for comparison with STA and
p = $R_{SSTA}$ for comparison with SSTA, both expressed as fractions). Since removing
n · p patterns from Equation (5.31) lowers TT by $n \cdot p \cdot (n_{sff} + 1) \cdot T_{clk}$,
the test time reduced per die is $T_{RP} = n \cdot p \cdot (n_{sff} + 1) \cdot T_{clk}$. The RO
measurements on a selected number of dies on a wafer and the ATPG runtime on the
$P_{SVA}$ paths are considered negligible, since they are run only once for a chip
production test line.
For m dies, the overall test time reduced per die (TTR), considering the ranking
overhead $T_{OH}$, can be calculated as:
$$TTR = T_{RP} - \frac{T_{OH}}{m} = n \cdot p \cdot (n_{sff} + 1) \cdot T_{clk} - \frac{0.012 \cdot n}{m} \tag{5.32}$$
TTR > 0 is needed to ensure the effectiveness of SVA, which gives:
$$m > \frac{0.012}{p \cdot (n_{sff} + 1) \cdot T_{clk}} \tag{5.33}$$
Using Equation (5.33), we can calculate the minimum numbers of dies, $m_{STA}$ and
$m_{SSTA}$, that ensure SVA is advantageous over STA and SSTA with ≥ 15% and ≥ 52% test
cost reduction, respectively. Assuming that $T_{clk}$ = 20ns and $n_{sff}$ = 670, so that
$(n_{sff} + 1) \cdot T_{clk} = 1.342 \times 10^{-5}$ s, and taking p = 15% for $R_{STA}$ (used in
Equation (5.34)) and p = 52% for $R_{SSTA}$ (used in Equation (5.35)), we obtain:
$$m_{STA} > \frac{0.012}{1.342 \times 10^{-5} \times 0.15} \approx 5962 \tag{5.34}$$
$$m_{SSTA} > \frac{0.012}{1.342 \times 10^{-5} \times 0.52} \approx 1720 \tag{5.35}$$
In an actual production line, the fabricated chip volume can easily exceed 5962, and the
scan chain may be longer than 670 flip-flops, which further lowers the required die count
in Equation (5.33). These conditions ensure that SVA can significantly reduce the test cost.
5.6 Summary
In this chapter, we proposed an SVA PDF pattern generation flow. We used post-silicon
measurements to facilitate the prediction of actual delay variation. We then ranked the
paths by delay and identified PDFs accordingly. This technique was shown to be more
accurate than STA and SSTA. In large industrial designs, this method could considerably
reduce test cost by reducing the number of paths to test, and significantly improve speed
binning efficiency and yield.
Chapter 6
Testable Representative Paths
6.1 Introduction
As technology advances to smaller feature nodes, circuit performance becomes more
vulnerable to process variations and aging effects, such as negative bias temperature
instability (NBTI) [6] [90]. An immediate negative impact of process variations and
NBTI is that path delays are no longer deterministic from chip to chip [86] [109]. On
a production line, the delay of the same path on different manufactured chips may span
a wide range, which makes it very difficult to accurately identify the critical paths of
all chips. As a consequence, a large number of critical paths must be tested in order to
evaluate circuit performance.
Current automatic test pattern generation (ATPG) flows rely on timing analysis tools
to identify critical paths for pattern generation and delay tests. Thus, timing analysis
becomes increasingly important, requiring high levels of accuracy and efficiency.
Conventional static timing analysis (STA) [94] and statistical static timing analysis
(SSTA) [95] have major difficulties in identifying critical paths. STA cannot account for
the timing variability caused by process variations, since all physical parameters are
evaluated at nominal or worst-case values [110]. SSTA, on the other hand, represents
path delay as a probability distribution, determined by a mean and a variance that
consider only process variations and not the aging mechanisms. SSTA also leads to an
overpessimistic evaluation of the delay deviation because it includes all corner/worst cases.
In recent years, researchers have attempted to reduce the number of critical paths
by performing more accurate analysis of circuit timing considering process variations.
For example, the authors in [111] proposed a path selection algorithm to select a set
of critical paths at the pre-ATPG stage to achieve near-optimal fault coverage. The
authors in [112] proposed a path-pruning technique that reduces the number of paths by
discarding all paths but the longest. However, these methods cannot efficiently reduce
the number of critical paths, since they lack accuracy in timing analysis. Thus, critical
paths that could be essential for verifying the timing of a manufactured IC under
process variations may not be accurately identified.
In addition to the effects of process variations, NBTI also introduces uncertainty
into circuit timing by changing gate and path delays over time [11] [87]. Unfortunately,
STA and SSTA cannot efficiently incorporate NBTI into their timing analysis processes.
Over the years, different mathematical models have been proposed to predict the delay
change caused by NBTI. The authors in [3] proposed an analytical model to characterize
NBTI degradation considering both stress and recovery mechanisms. An improved model
was proposed in [11], which quantifies the gate-delay degradation by equalizing the stress
signals into rectangular signals. NBTI depends on operating stress time, temperature,
stress probability, and several other factors. Actual circuit degradation is not predictable
because stress conditions vary among chips used in different systems and applications.
This uncertainty makes it difficult to identify all the critical paths, which are degraded
at different time points by the NBTI effect. Furthermore, considering process variations
together with NBTI makes path delays probabilistic, with both mean and variance
evolving over time [86], which makes circuit timing analysis even more difficult.
To obtain highly accurate circuit timing, the use of sensors was proposed as a viable
option in [113] [114] [116]. These sensors are inserted into designs to monitor and
measure the delays of critical paths. The major issue is that these sensors cannot be
widely deployed over a large number of critical paths without causing excessive area
and power overhead, as well as additional design and test effort.
As circuit size increases with technology shrinking further to 22nm and below, the
number of critical paths that must be tested across different chips, considering aging
degradation and variations, increases significantly. The total number of critical paths
obtained using STA [94] [110] and SSTA [95] [111] can be extremely large. Hence, in
this thesis, we propose a novel methodology for identifying testable representative paths
(TRPs) for the path delay test in order to reduce PDF patterns and save test cost and
time. The objective of this work is to use a small number of TRPs to statistically
estimate the delay and aging of a large number of critical paths in a set of chips,
considering both process variations and the NBTI effect. TRPs are a small number of
paths selected from the critical paths according to gate overlap and circuit topology
analysis. As a subset of all the critical paths, TRPs can represent the unselected paths,
based on gate overlap and their interconnections. The TRP maximum mean delay and
variance closely follow the maximum mean delay and variance of all the critical paths at
different time points. Instead of testing a large number of critical paths, testing TRPs
alone will considerably reduce PDF patterns during production test. Moreover, TRPs
can be implemented with logic built-in self-test (LBIST) methods and in multi-core system
environments, which can repeatedly apply the stored PDF patterns over time to evaluate
circuit performance and reliability in the field. Furthermore, the identification of TRPs
enables adaptive calibration methods to be performed in a timely manner to ensure
lifetime reliability.
Testing on TRPs is a cost-effective test solution compared with: (1) Inserting monitors
over the circuit for all critical paths: as only a small number of TRPs need to be
measured and monitored, the impact on the circuit is minimized. (2) Conventional ATPG
testing of all critical paths: since the number of TRPs is significantly smaller than the
number of critical paths, a relatively small on-chip memory is needed to store all the
TRP delay patterns. Being compatible with LBIST methods and multi-core system
environments, these patterns can be repeatedly used to evaluate circuit performance.
The contributions of this chapter include:
1. Development of a novel QR decomposition-based algorithm to select a small number
of TRPs to represent all critical paths. We significantly reduce the number of critical
paths by selecting the TRPs for PDF pattern generation and delay test.
2. Development of a novel PDF pattern generation flow that reduces the number of PDF
patterns by focusing on the TRPs. Testing on TRPs can provide accurate path
delay estimation and circuit performance evaluation, since the TRP maximum
mean delay and variance closely follow the maximum mean delay and variance of
all critical paths.
3. Demonstration of the effectiveness of our technique on six benchmark circuits.
The rest of this chapter is organized as follows: Section 6.2 provides the delay
analysis under the impact of process variations and NBTI aging. Section 6.3 presents the
details of our proposed TRP identification flow. Section 6.4 provides our experimental
results and analysis. Section 6.5 summarizes this chapter.
6.2 Delay Analysis Considering Process Variations and NBTI Effect
In this section, we present the background and motivations of our proposed methodology.
6.2.1 Gate Delay Analysis and Approximation
Circuit performance is predominantly determined by critical paths, which are impacted
by process variations and the NBTI aging effect. When these effects are considered, gate
delay can no longer be treated as a deterministic value. Taking these two factors into
account, the authors in [86] formulate gate delay $d_i$ as a probability distribution
function of time t with a mean value $\mu_{d_i}(t)$ and a variance value $\sigma_{d_i}(t)$:
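$$\mu_{d_i}(t) = \mu_{d_i}(0) \times (1 + A \cdot S_{t_i} \cdot t^{\gamma}) \tag{6.1}$$
$$\sigma_{d_i}(t) = \sigma_{d_i}(0) \times (1 - A \cdot S_v \cdot t^{\gamma}) \tag{6.2}$$
$$\mu_{d_i}(0) = d_{0i} \tag{6.3}$$
$$\sigma_{d_i}(0) = d_{0i} S_{t_i} \sigma_{pv_i} \tag{6.4}$$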
where the value of A depends on both technology parameters and operating conditions,
$S_v$ describes the NBTI degradation rate with respect to the threshold voltage ($V_{th}$)
shift, $S_{t_i}$ indicates the degradation rate of gate delay with respect to the $V_{th}$ shift,
and $\mu_{d_i}(0)$ (= $d_{0i}$) and $\sigma_{d_i}(0)$ are the mean and variance values of the gate
delay at time 0 without NBTI degradation.
If we simplify Equations (6.1) and (6.2), we get:
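$$d_{\mu_i} = \mu_{0i} + a_{\mu_i} \cdot t^{\gamma_{\mu_i}} \tag{6.5}$$
$$d_{\sigma_i} = \sigma_{0i} + b_{\sigma_i} \cdot t^{\gamma_{\sigma_i}} \tag{6.6}$$
where:
$$d_{\mu_i} = \mu_{d_i}(t) \tag{6.7}$$
$$d_{\sigma_i} = \sigma_{d_i}(t) \tag{6.8}$$
$$\mu_{0i} = \mu_{d_i}(0) \tag{6.9}$$
$$\sigma_{0i} = \sigma_{d_i}(0) \tag{6.10}$$
$$a_{\mu_i} = \mu_{d_i}(0) \cdot A \cdot S_{t_i} \tag{6.11}$$
$$b_{\sigma_i} = -\sigma_{d_i}(0) \cdot A \cdot S_v \tag{6.12}$$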
Comparing Equations (6.5) and (6.6) with (6.1) and (6.2), the above modification only
changes the form of the mean and variance equations. Equations (6.5) and (6.6) reveal
that there are two unique triplets, $(\mu_{0i}, a_{\mu_i}, \gamma_{\mu_i})$ and $(\sigma_{0i}, b_{\sigma_i}, \gamma_{\sigma_i})$,
for the mean $d_{\mu_i}$ and variance $d_{\sigma_i}$ of the gate delay. Note that Equations (6.1)
and (6.2) share the same exponent γ, whereas after the transformation we use two
individual exponents $\gamma_{\mu_i}$ and $\gamma_{\sigma_i}$ in Equations (6.5) and (6.6). The main reason
is that we use pre-silicon Monte-Carlo (MC) simulation results to approximate
$(\mu_{0i}, a_{\mu_i}, \gamma_{\mu_i}, \sigma_{0i}, b_{\sigma_i}, \gamma_{\sigma_i})$ of the propagation delay for each gate type and
each input combination. Using two individual exponents $\gamma_{\mu_i}$ and $\gamma_{\sigma_i}$ yields better
approximation accuracy for the mean and variance values, which we verified using
simulation results obtained for different gate types in the Nangate 45nm standard cell
library [66]. Our TRP identification flow in Section 6.3 employs this modification.
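The fitting step that extracts the triplets of Equations (6.5) and (6.6) from MC data can be sketched as follows. This is a minimal Python illustration, not the extraction procedure used in this work: it assumes the t = 0 sample directly gives $\mu_{0i}$ (or $\sigma_{0i}$) and fits the other two parameters by least squares on log|d(t) − d(0)| versus log t. The name `fit_power_law` is hypothetical and the sample data are synthetic.

```python
import math


def fit_power_law(times, values):
    """Fit values ~ d0 + a * t**gamma, taking d0 from the sample at t = 0.

    Uses linear least squares on log|value - d0| versus log t. Requires
    times[0] == 0 and a degradation of one sign at every t > 0.
    """
    if times[0] != 0:
        raise ValueError("first sample must be at t = 0")
    d0 = values[0]
    deltas = [v - d0 for v in values[1:]]
    sign = 1.0 if deltas[0] > 0 else -1.0          # mean grows, variance shrinks
    xs = [math.log(t) for t in times[1:]]
    ys = [math.log(sign * dv) for dv in deltas]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    gamma = sxy / sxx                              # slope = exponent
    a = sign * math.exp(y_mean - gamma * x_mean)   # intercept = log|a|
    return d0, a, gamma


# 11 stress points in units of 1e8 s, with synthetic (made-up) model parameters.
times = [0.3 * k for k in range(11)]
mean = [65.0 + 8.0 * t ** 0.25 for t in times]
var = [12.0 - 1.5 * t ** 0.3 for t in times]
print(fit_power_law(times, mean))
print(fit_power_law(times, var))
```

On noise-free synthetic data the fit recovers the generating parameters; on real MC data a general nonlinear fit may be preferable.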
Figure 6.1 compares the falling delays of gate type INV_X1 from the Nangate 45nm cell
library at 11 stress time points (temperature is 25°C). Delays were measured in HSPICE
MOSRA through 5000 MC simulations to incorporate process variations and NBTI
degradation. Three physical parameters, channel length L, channel width W, and
threshold voltage $V_{th0}$, are each assumed to have 15% variation. We used the results to
extract the corresponding approximation models and then compared them with the
simulation results. In this figure, the blue line with symbol “−−” shows the simulation
results, while the red line with symbol “−♦−” represents the results from the models.
Figure 6.1 shows that the models approximate the simulation results very closely. We
expect this approximation to be similarly accurate for all gate types in the technology
library.
In Equations (6.5) and (6.6), $a_{\mu_i}$ and $b_{\sigma_i}$ respectively capture the degradation
rates of the gate mean delay and variance due to the combined effect of process variations
and NBTI degradation over time. Given the different structures of the gate types,
delays and degradation rates differ from one gate type to another. Even within the same
gate type, delays and degradation rates vary with the input combination, since NBTI is
an aging effect on PMOS transistors, and the stack condition of PMOS
[Plot panels: (a) Mean value approximation, mean delay (ps) versus stress time (10^8 s); (b) Variance value approximation, variance (ps) versus stress time (10^8 s); curves: Approximation, Monte-Carlo]
Fig. 6.1: Mean and variance approximations corresponding to gate type INV X1,
where the falling delays are measured using HSPICE MOSRA and compared
with model approximation at 11 stress time points.
transistors will change the degradation rate. When we characterize gates using the two
triplets $(\mu_{0i}, a_{\mu_i}, \gamma_{\mu_i})$ and $(\sigma_{0i}, b_{\sigma_i}, \gamma_{\sigma_i})$ for the mean $d_{\mu_i}$ and
variance $d_{\sigma_i}$, we distinguish not only gate types but also input combinations. We thus
obtain a delay pair $(a_{\mu_i}, b_{\sigma_i})$ for each input combination of each gate type. All the
delay pairs can be obtained using HSPICE MOSRA analysis and MC simulations based on
the technology library. In our simulations, a unit capacitive load is used to obtain all the
delay pairs. However, in an actual circuit, the capacitive load of each gate may differ due
to interconnects and fan-in/fan-out conditions. Note that in this chapter, we take into
account the capacitive load from interconnects, but we leave the impact of process
variations on interconnects to future research. The alpha-power law [85] provides an
analytical model that formulates the gate-level delay $d_i$ in terms of the threshold voltage
$V_{th}$ and capacitive load $C_{L_i}$ as:
$$d_i = \frac{V_{dd}}{\beta_i \cdot [V_{dd} - V_{th}]^{\gamma}} \times C_{L_i} \tag{6.13}$$
where $\beta_i$ is a parameter dependent on the gate type, and $V_{dd}$ is the power supply
voltage. In Equation (6.13), the capacitive load $C_{L_i}$ acts simply as a multiplier. Thus,
to account for $C_{L_i}$, we use $d_i \times C_{L_i}$ in place of the delay $d_i$ obtained from the
simulation results (which assume a unit load):
$$d_i \rightarrow d_i \times C_{L_i} \tag{6.14}$$
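Since $d_i$ is characterized as a probability distribution as in Equations (6.1) and (6.2),
the mean and standard deviation of the gate delay $d_i$, considering the capacitive load
$C_{L_i}$, become:
$$\mu_{d_i}(t) \rightarrow \mu_{d_i}(t) \times C_{L_i} \Rightarrow a_{\mu_i} \rightarrow a_{\mu_i} \times C_{L_i} \tag{6.15}$$
$$\sigma_{d_i}(t) \rightarrow \sigma_{d_i}(t) \times C_{L_i} \Rightarrow b_{\sigma_i} \rightarrow b_{\sigma_i} \times C_{L_i} \tag{6.16}$$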
In Equations (6.15) and (6.16), the form of the delay pair $(a_{\mu_i}, b_{\sigma_i})$ is unchanged.
However, multiplying by $C_{L_i}$ incorporates the capacitive load of the gate (from the
interconnect and the following gates). Note that this transformation does not account
for variation in the interconnect. Thus, we can still use the same form of delay pair
$(a_{\mu_i}, b_{\sigma_i})$ to describe the gate delay $d_i$, even when the capacitive load from the
interconnect and following gates is considered.
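Because $C_{L_i}$ enters Equation (6.13) only as a multiplier, folding it into the delay pair is a simple scaling. The Python sketch below illustrates Equations (6.13), (6.15), and (6.16); the function names and all numeric values are hypothetical.

```python
def alpha_power_delay(v_dd, v_th, beta, gamma, c_load):
    """Eq. (6.13): d_i = V_dd / (beta_i * (V_dd - V_th)**gamma) * C_Li."""
    if v_dd <= v_th:
        raise ValueError("V_dd must exceed V_th")
    return v_dd / (beta * (v_dd - v_th) ** gamma) * c_load


def scale_delay_pair(a_mu, b_sigma, c_load):
    """Eqs. (6.15)-(6.16): apply a gate's capacitive load to its delay pair."""
    return a_mu * c_load, b_sigma * c_load


# Doubling the load doubles the delay, so a unit-load characterization can be rescaled.
d_unit = alpha_power_delay(1.1, 0.4, 2.0, 1.3, 1.0)
d_double = alpha_power_delay(1.1, 0.4, 2.0, 1.3, 2.0)
print(d_double / d_unit)
print(scale_delay_pair(0.5, -0.2, 3.0))
```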
6.2.2 Path Structure and Delay Analysis
Considering the circuit topology, paths go through different gates, and gates may be
shared by multiple paths. We treat each unique gate as a delay segment with its
capacitive load from the interconnect and following gates. Suppose there are n critical
paths, each of which consists of delay segments from on-path gate inputs to outputs.
Assuming there are a total of m unique segments on these critical paths, we can use
an n × m matrix $P = \{p_1, p_2, \cdots, p_n\}^T$ to denote the paths, where each row vector
$p_i = [p_{i1}, p_{i2}, \cdots, p_{im}]$ (i = 1, · · · , n) refers to a specific path in the group and is a
series of 1s and 0s indicating whether each segment is on the path: if $p_{ij} = 1$, delay
segment j is on path i; otherwise, it is not. Since delay segment j has its own unique
delay pair $(a_{\mu_j}, b_{\sigma_j})$, we can use two vectors $\mu = [a_{\mu_1}, a_{\mu_2}, \cdots, a_{\mu_m}]^T$ and
$\sigma = [b_{\sigma_1}, b_{\sigma_2}, \cdots, b_{\sigma_m}]^T$ to denote the segment delays, where $[\cdot]^T$ denotes the
transpose. We define the element-wise vector multiplication as follows:
Definition 1: If A and B are two vectors of the same size,
$$A = [a_1, a_2, \cdots, a_k], \quad B = [b_1, b_2, \cdots, b_k]^T,$$
then $A \otimes B$ calculates the element-by-element product of vectors A and B as:
$$A \otimes B = [a_1, a_2, \cdots, a_k] \otimes [b_1, b_2, \cdots, b_k]^T = [a_1 \cdot b_1, a_2 \cdot b_2, \cdots, a_k \cdot b_k]$$
Thus, for path i, we can obtain two unique vectors $\Gamma_{\mu_i}$ and $\Gamma_{\sigma_i}$, calculated as:
$$\Gamma_{\mu_i} = p_i \otimes \mu \tag{6.17}$$
$$\Gamma_{\sigma_i} = p_i \otimes \sigma \tag{6.18}$$
In this way, all n paths can be represented by two matrices $\Gamma_\mu$ and $\Gamma_\sigma$,
calculated as:
$$\Gamma_\mu = P \otimes \mu \tag{6.19}$$
$$\Gamma_\sigma = P \otimes \sigma \tag{6.20}$$
Equations (6.19) and (6.20) capture the structure and connections of the n paths. They
also describe the delay degradation of the m gates on these n paths, due to process
variations and NBTI, by listing their means and variances.
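A minimal Python sketch of Definition 1 and Equations (6.19) and (6.20), using plain lists; the path matrix and the $a_\mu$ values are hypothetical. Each row of the result keeps a segment's coefficient where the path passes through that segment and 0 elsewhere.

```python
def elementwise(row, vec):
    """Definition 1: element-by-element product of two equal-length vectors."""
    if len(row) != len(vec):
        raise ValueError("vectors must have the same size")
    return [r * v for r, v in zip(row, vec)]


def gamma_matrix(paths, weights):
    """Eqs. (6.19)/(6.20): apply Definition 1 to every row p_i of the path matrix P."""
    return [elementwise(p, weights) for p in paths]


# Hypothetical 2-path, 4-segment example.
P = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
a_mu = [0.8, 1.2, 0.5, 2.0]
print(gamma_matrix(P, a_mu))  # [[0.8, 1.2, 0.0, 2.0], [0.0, 1.2, 0.5, 2.0]]
```

Passing the $b_\sigma$ vector instead of `a_mu` gives $\Gamma_\sigma$ in the same way.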
[Schematic: example circuit with gates G1 to G9 and gate delays d1 to d9]
Fig. 6.2: An example circuit with 9 basic gates.
6.3 Testable Representative Paths (TRP) Identification Flow
In this section, we present the details of TRP identification. Matrix manipulation helps
identify TRPs as a subset of the critical paths, taking the path structure and gate overlap
in the circuit into consideration. The TRPs that are finally selected represent the
unselected paths. Since the TRPs are part of the original critical paths, and our flow
selects TRPs without changing the connections of the gates, the layout of the critical
paths, or the circuitry, the correlation between the gates is preserved.
Table 6.1: The Structure of Paths in Figure 6.2
Path # Path Structure
P1 G1 → G3 → G5 → G7 → G9
P2 G1 → G3 → G5 → G6 → G8
P3 G2 → G4 → G5 → G6 → G8
P4 G2 → G4 → G5 → G7 → G9
Figure 6.2 shows an example circuit with a very simple structure, where on-path gate
inputs are indicated by solid lines and off-path gate inputs are indicated by dots.
Table 6.1 shows the corresponding structure of each path from primary input to primary
output in this example circuit. The table shows that many gates are shared by multiple
paths; e.g., gate G5 is shared by paths P1, P2, P3, and P4. As described in Section 5.2,
if we regard each gate and its corresponding capacitive load as a delay segment ($d_i$),
the path delay can be represented as the sum of the delays of its segments,
$D_j = \sum_{i} d_i$, where the sum runs over the segments on path $P_j$. Thus, the four path
delays can be represented as:
$$\begin{cases} D_1 = d_1 + d_3 + d_5 + d_7 + d_9 \\ D_2 = d_1 + d_3 + d_5 + d_6 + d_8 \\ D_3 = d_2 + d_4 + d_5 + d_6 + d_8 \\ D_4 = d_2 + d_4 + d_5 + d_7 + d_9 \end{cases} \tag{6.21}$$
where Dj indicates the delay of path Pj , and di is the delay of gate Gi.
Considering the gate overlap and the structure of each path, path delay $D_1$ can be
calculated as a linear combination of path delays $D_2$, $D_3$, and $D_4$: $D_1 = D_2 - D_3 + D_4$.
Similar relationships exist among the other paths. This representation of one path delay
by the delays of other paths indicates that we can select a subset of the paths in a circuit
to analyze the delays of the other paths. The number of selected paths then depends on
the circuit topology, e.g., gate overlap and path structure. Assuming again that there are
m unique segments on n paths, the delays of these m gates can be represented as a delay
vector $d = [d_1, d_2, \cdots, d_m]^T$. We can then use an n × m matrix $P = \{p_1, p_2, \cdots, p_n\}^T$
to represent the circuit structure of these n paths. Another n × m matrix D, calculated
from P and d, can be used to represent the delays of these n paths. D can be calculated
as:
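$$D = P \otimes d = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{bmatrix} \otimes [d_1, d_2, \cdots, d_m]^T \tag{6.22}$$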
The rank of a matrix is defined as the maximum number of linearly independent rows,
which can linearly represent all other rows [115]. That is, r linearly independent paths,
where $r = \mathrm{rank}(D_{n \times m})$, can exactly express every row of $D_{n \times m}$ as a linear
combination of the rows of $D_{r \times m}$. QR decomposition with column pivoting, which makes
such a representation possible, can be used to select the corresponding r row vectors. It
has been successfully employed in [114] [63] and in many other applications in signal
processing and statistics [61]. Our method uses QR decomposition with column pivoting
to identify the TRPs.
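The rank argument can be checked on the example of Figure 6.2 and Table 6.1. The Python sketch below builds P from Table 6.1, forms D = P ⊗ d with hypothetical gate delays, verifies $D_1 = D_2 - D_3 + D_4$, and computes rank(D) by exact Gaussian elimination over the rationals. Gaussian elimination is used here only as a simple, exact stand-in for small integer examples; it is not the QR-based selection itself.

```python
from fractions import Fraction

# Gates on each path, from Table 6.1 (G1..G9).
TABLE_6_1 = {
    "P1": [1, 3, 5, 7, 9],
    "P2": [1, 3, 5, 6, 8],
    "P3": [2, 4, 5, 6, 8],
    "P4": [2, 4, 5, 7, 9],
}
M = 9  # number of unique gates


def incidence_row(gates, m=M):
    """Row p_i of P: 1 if gate G_j is on the path, else 0."""
    return [1 if j in gates else 0 for j in range(1, m + 1)]


def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0]) if mat else 0):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(len(mat)):
            if r != rank and mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


d = [3, 5, 2, 4, 6, 1, 7, 2, 5]                       # hypothetical gate delays d1..d9
P = [incidence_row(g) for g in TABLE_6_1.values()]
D = [[p * dj for p, dj in zip(row, d)] for row in P]  # D = P (x) d
path_delay = [sum(row) for row in D]                  # D_j = sum of d_i on path j

print(path_delay[0] == path_delay[1] - path_delay[2] + path_delay[3])  # True
print(matrix_rank(D))  # 3
```

With four paths but rank 3, any three linearly independent paths suffice to express the fourth, which is the property the TRP selection exploits.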
The above analysis enables the selection of a path subset that can linearly represent
the delays of the unselected paths. However, neither process variations nor NBTI is
taken into account in this formulation and analysis. Suppose that, among the many
functional paths in the circuit shown in Figure 6.2, paths P1, P2, P3, and P4 stand out
as four critical paths, as shown in Figure 6.3. We select one path as the TRP such that
the maximum delay of the TRP always closely tracks the maximum delay of all the
critical paths. For large industrial designs with a large number of critical paths, we
expect to select multiple paths as TRPs.
In order to take process variations and NBTI into account and select a path
[Plot panels: (a) Path delay distributions at time 0 (stress time = 0); (b) Path delay distributions at time t (stress time = t); pdf versus delay for the critical paths and the TRP]
Fig. 6.3: Delay distributions of 4 paths at time 0 and time t considering process vari-
ations and NBTI (“pdf” stands for probability distribution function).
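subset (i.e., TRPs) accordingly, we use the delay pair $(a_{\mu_i}, b_{\sigma_i})$ of each unique gate
on all the critical paths to construct two matrices, as in Equations (6.19) and (6.20):
$$\Gamma_\mu = P \otimes \mu = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{bmatrix} \otimes [a_{\mu_1}, a_{\mu_2}, \cdots, a_{\mu_m}]^T \tag{6.23}$$
$$\Gamma_\sigma = P \otimes \sigma = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{bmatrix} \otimes [b_{\sigma_1}, b_{\sigma_2}, \cdots, b_{\sigma_m}]^T \tag{6.24}$$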
QR decomposition with column pivoting is applied to $\Gamma_\mu$ and $\Gamma_\sigma$ to identify two
TRP subsets from the original critical paths. Taking $\Gamma_\mu$ as an example, singular value
decomposition gives $\Gamma_\mu = U_\mu \Sigma_\mu V_\mu^T$, where $U_\mu$ and $V_\mu$ are n × n and m × m
orthogonal matrices, respectively, and $\Sigma_\mu$ is an n × m diagonal matrix whose diagonal
entries are the singular values $\lambda_{\mu_i}$ of $\Gamma_\mu$ in descending order, i.e.,
$\lambda_{\mu_1} \geq \lambda_{\mu_2} \geq \cdots \geq 0$. The number of non-zero singular values $\lambda_{\mu_i}$ ($i = 1, \cdots, r_\mu$)
equals the rank $r_\mu$ of $\Gamma_\mu$, which is also the number of row vectors (i.e., paths) needed
to represent the path matrix $\Gamma_\mu$, built from the mean values of all gates on the paths
affected by process variations and NBTI. Therefore, we can use QR decomposition with
column pivoting to select the corresponding $r_\mu$ row vectors; in other words, the paths
corresponding to $P_\mu = \{p'_1, p'_2, \ldots, p'_{r_\mu}\}^T$ form a TRP subset $S_\mu$ that represents the
original paths in $\Gamma_\mu$. However, $S_\mu$ is obtained only from the mean values. We repeat a
similar procedure on $\Gamma_\sigma$ to obtain another path subset $S_\sigma$, corresponding to
$P_\sigma = \{p''_1, p''_2, \ldots, p''_{r_\sigma}\}^T$, which is selected based on the variance values of all the
gates on the original paths. Finally, the union of $S_\mu$ and $S_\sigma$ is identified as the TRPs
representing all the paths affected by both process variations and NBTI:
$$S = S_\mu \cup S_\sigma \tag{6.25}$$
Algorithm 5 summarizes the TRP identification process. Lines 2 to 12 identify the two
path subsets $S_\mu$ and $S_\sigma$ separately, and Line 13 returns the unified path set S as the
TRP set. The major computational cost comes from the singular value decomposition in
Line 5 and the QR decomposition in Line 7, and it grows rapidly with the path and gate
counts (a full SVD of an n × m matrix requires on the order of n · m · min(n, m)
operations). Hence, a “divide-and-conquer” approach can be used to reduce the
computational overhead for industrial designs that have a huge pool of critical paths: by
dividing the paths into multiple groups, we can apply TRP identification to each group
in parallel and eventually identify TRPs for all the critical paths.
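The selection in Algorithm 5 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the Matlab implementation used in this work: instead of the SVD followed by QR on $U_r$, it applies QR with column pivoting directly to $\Gamma^T$, implemented as greedy modified Gram-Schmidt (pick the path with the largest residual norm, then orthogonalize the remaining paths against it). In exact arithmetic this selects rank(Γ) linearly independent paths. The function names and the delay-pair values are hypothetical.

```python
import math


def pivoted_row_selection(rows, tol=1e-9):
    """Indices of rows chosen by QR with column pivoting applied to rows^T."""
    residuals = [[float(x) for x in row] for row in rows]
    norms = [math.sqrt(sum(x * x for x in r)) for r in residuals]
    scale = max(norms, default=0.0)
    remaining = list(range(len(rows)))
    selected = []
    while remaining:
        best = max(remaining, key=lambda i: norms[i])   # pivot: largest residual norm
        if norms[best] <= tol * scale:                  # the rest are linearly dependent
            break
        selected.append(best)
        remaining.remove(best)
        q = [x / norms[best] for x in residuals[best]]  # new orthonormal direction
        for i in remaining:                             # modified Gram-Schmidt update
            proj = sum(a * b for a, b in zip(residuals[i], q))
            residuals[i] = [a - proj * b for a, b in zip(residuals[i], q)]
            norms[i] = math.sqrt(sum(x * x for x in residuals[i]))
    return selected


def identify_trps(paths, mu, sigma):
    """Union of the subsets selected from Gamma_mu = P (x) mu and Gamma_sigma = P (x) sigma."""
    trps = set()
    for weights in (mu, sigma):
        gamma = [[p * w for p, w in zip(row, weights)] for row in paths]
        trps |= set(pivoted_row_selection(gamma))
    return sorted(trps)


# Paths P1..P4 of Table 6.1 over gates G1..G9, with hypothetical delay pairs.
PATHS = [[1 if g in gates else 0 for g in range(1, 10)]
         for gates in ([1, 3, 5, 7, 9], [1, 3, 5, 6, 8], [2, 4, 5, 6, 8], [2, 4, 5, 7, 9])]
A_MU = [1.0, 1.1, 0.9, 1.2, 1.5, 0.8, 1.3, 0.7, 1.0]
B_SIGMA = [-0.20, -0.25, -0.15, -0.30, -0.35, -0.10, -0.20, -0.12, -0.18]
print(identify_trps(PATHS, A_MU, B_SIGMA))  # indices of the selected paths
```

Each subset has rank(Γ) = 3 members here; for a divide-and-conquer run, `identify_trps` would simply be called on each group of paths.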
Figure 6.4 shows the entire process of selecting TRPs and generating their PDF
patterns (TRP-PDF patterns):
Step 1: We performed MC simulations with NBTI aging analysis in HSPICE MOSRA [34]
Algorithm 5 TRP identification.
1: Input: P, µ, and σ
2: Output: S
3: Initialize S ← ∅
4: for each D ∈ {µ, σ} do
5: Γ = P ⊗ D
6: Singular value decomposition: [U_D, Σ_D, V_D] = svd(Γ)
7: Use the first r columns in U_D to form U_r = U_D(:, 1:r), where r = rank(Γ)
8: QR decomposition with column pivoting: [Q, R, K] = qr(U_r^T)
9: Γ_n = K^T Γ
10: Select the first r rows in Γ_n to form Γ_r = Γ_n(1:r, :)
11: Select S_D as the subset of paths corresponding to Γ_r
12: Let S = S ∪ S_D
13: end for
14: Return paths in S as the identified TRPs
[Flowchart: Step 1: Standard Cell Library → PV- and NBTI-Aware Gate Models; Step 2: Netlist → Static Timing Analysis (Top 10% or 20%) → Critical Paths → ATPG → All the Testable Critical Paths; Step 3: TRP Identification (Algorithm 5) → TRPs; Step 4: ATPG → TRP-PDF Patterns]
Fig. 6.4: TRP selection and TRP-PDF pattern generation.
for each gate in the Nangate 45nm standard cell library. We consider 15% variation in
channel length (L), channel width (W), and threshold voltage ($V_{th}$). For the NBTI
effect, we consider a total of 10 years (≈ $3 \times 10^8$ seconds) of degradation using HSPICE
MOSRA, measuring the delay every year (≈ $0.3 \times 10^8$ seconds). By fitting the models in
Equations (6.5) and (6.6) to these data, we obtain the delay pair $(a_{\mu_i}, b_{\sigma_i})$ for each gate
type, differentiating input combinations in the cell library. This step needs to be done
only once for each technology library.
Step 2: Testable critical path extraction: STA with a large threshold (top 10% or
20% of the paths in timing) is conducted on the netlist of the design to select a large
number of critical paths for further analysis. Note that for large industrial designs, a
smaller threshold, e.g., 2%, may be used, since the number of paths would otherwise be
very large. Using ATPG in this step, we select only paths that are “testable” for TRP
identification by Algorithm 5.
Step 3: TRP identification: We use Algorithm 5 to identify TRPs. For large industry
designs, a “divide-and-conquer” approach can be used to reduce the computational
complexity.
Step 4: TRP pattern generation: The identified TRPs are fed into the ATPG tool
for PDF pattern generation. Since the number of TRPs is significantly less than the
number of critical paths, PDF patterns will also be considerably lower.
6.4 Experimental Results
This section presents our experimental results and evaluates the efficiency of the proposed TRP flow on critical paths extracted from the ISCAS'89 benchmark circuits. We use commercial tools [34] and our in-house programs to extract the necessary circuit information, e.g., the gate capacitive load. The Nangate 45nm standard cell library and PTM 45nm transistor models [84] are used for synthesis, physical design, and simulations. We extract the process variation- and NBTI-aware gate models for each gate type, differentiating input combinations, through 5000 MC simulations in HSPICE MOSRA. We assume a 15% variation as the 3σ variation in channel length L, channel width W, and initial threshold voltage Vth0. We implemented Algorithm 5 in Matlab and Perl and ran the code on a desktop with a 3.0GHz dual-core CPU and 3GB of memory.
Figure 6.5 compares measurement results from the TRPs and from all the testable critical paths of benchmark circuit s9234. We extract 1992 testable critical paths (the top 20%, since the benchmark circuit is small) for timing analysis. Note that we select all the paths designated as the top 20% in order to include all the potentially timing-critical paths when both process variations and NBTI degradation are considered. The total number of unique gates on these paths is 2106; clearly, a large number of paths share the same gates. Here, we define the unique gates as all the distinct gate instances on the critical paths.
Our flow identifies 638 TRPs. In the MC simulations using HSPICE MOSRA, the temperature is set to 25°C. For simplicity and ease of comparison, we put every gate under constant stress with a stress probability of 100% (called Case 1). In Figures 6.5(a) and 6.5(c), the maximum mean delays and variances measured from the TRPs are compared with those from all the critical paths at 11 stress time points from time 0 to ≈10 years; the green bars show the results from all the paths, while the blue bars show the results from the TRPs. Figures 6.5(b) and 6.5(d) show the corresponding measurement errors. The results demonstrate that we can significantly reduce the number of paths in the PDF test (from 1992 to 638, a 67.97% reduction) while maintaining accuracy. On average, the measurement error is ≈ 0.92% for the maximum mean delay (Figure 6.5(b)) and ≈ 1.63% for the maximum variance (Figure 6.5(d)).
Fig. 6.5: Case 1: Comparison of the maximum mean delays and variances measured from all 1992 critical paths and 638 TRPs from benchmark circuit s9234 over time considering process variations and NBTI (Temperature: T = 25°C). [Panels: (a) comparison of max. mean delays (ps) of all paths and of TRPs at different stress time points; (b) errors (%) of max. mean delays; (c) comparison of max. variances of all paths and of TRPs at different stress time points; (d) errors (%) of max. variances. x-axis: stress time (10^8 s).]
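To make the comparison metric concrete, the following Python sketch computes, for one round of Monte Carlo samples, the per-path mean and variance of delay and the relative error of the maximum statistic over a TRP subset. It rests on assumptions that are ours, not the flow's: a path delay is the sum of its gate delays, gate delays are drawn as independent Gaussians around hypothetical aged nominal values, and the error is defined as |max over TRPs − max over all paths| / (max over all paths) × 100. In the actual flow, the samples would come from the 5000 HSPICE MOSRA runs per round.

```python
import numpy as np


def path_delay_stats(gate_samples, paths):
    """Per-path mean and variance of delay from MC samples of gate delays.

    gate_samples: (n_mc, n_gates) array, one row per MC sample.
    paths: list of gate-index lists, one per path.
    """
    # Path delay in each MC sample = sum of the delays of the gates on it.
    path_samples = np.stack([gate_samples[:, p].sum(axis=1) for p in paths], axis=1)
    return path_samples.mean(axis=0), path_samples.var(axis=0)


def max_stat_error(all_stat, trp_idx):
    """Relative error (%) of the maximum statistic taken over the TRP subset only."""
    max_all = all_stat.max()
    max_trp = all_stat[trp_idx].max()
    return abs(max_all - max_trp) / max_all * 100.0


rng = np.random.default_rng(0)
n_mc = 5000
nominal = np.array([10.0, 12.0, 9.0, 11.0, 8.0, 13.0])  # hypothetical aged mean gate delays (ps)
sigma = 0.05 * nominal                                   # hypothetical per-gate std. dev.
gate_samples = rng.normal(nominal, sigma, size=(n_mc, nominal.size))

paths = [[0, 1, 5], [0, 1, 2], [3, 4, 5], [2, 3, 4]]    # hypothetical critical paths
trps = [0, 2]                                            # hypothetical TRP indices

means, variances = path_delay_stats(gate_samples, paths)
print(f"max mean delay error: {max_stat_error(means, trps):.2f}%")
print(f"max variance error:   {max_stat_error(variances, trps):.2f}%")
```

Because the hypothetical TRP list here contains the path with the largest mean delay, the mean-delay error is exactly zero; passing a TRP list that omits path 0 gives a nonzero error, which illustrates how a TRP set missing the slowest path would show up in this metric.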
Figure 6.6 shows the comparison results under much more realistic aging conditions (called Case 2). For the same 1992 paths extracted from s9234, we repeated the MC simulations for 100 rounds. In each round, we performed 5000 MC simulations considering process variations and NBTI degradation, and each gate was assigned a random stress probability. The temperature was set to 25°C. This case is closer to actual circuit stress conditions. Figures 6.6(a) and 6.6(c) compare the maximum mean delays and variances measured from all the critical paths (green curves) with those from the TRPs (blue curves). The x-axis represents the simulation round, each of which includes 5000 MC simulations. Figures 6.6(b) and 6.6(d) show the measurement errors. On average over these 100 rounds of simulations, the errors for the maximum mean delay and variance are 1.10% and 1.51%, respectively.
Fig. 6.6: Case 2: 100 comparisons of maximum mean delays and variances measured from all 1992 critical paths and 638 TRPs from benchmark circuit s9234 over time considering process variations and NBTI (Temperature: T = 25°C, Stress time: t ≈ 10 years). [Panels: (a) comparison of max. mean delays (ps) of TRPs and of all paths; (b) errors (%) of max. mean delays; (c) comparison of max. variances of TRPs and of all paths; (d) errors (%) of max. variances. x-axis: simulation round index.]
To verify that our TRPs closely follow the maximum mean delay and variance of the critical paths under process variations and NBTI, we repeated the 100 rounds of MC simulations with the temperature set to 50°C, 75°C, 100°C, and 125°C. Each gate is still assigned a random stress probability in each round of MC simulation. The results are shown in Table 6.2. For each temperature in Rows 2 to 6, the mean delay errors and variance errors at four time points are compared. In the last row (Row 7), the average mean delay error and variance error are calculated; they are as low as 0.92% and 1.13%, respectively.
Table 6.2: Comparison of the maximum mean delays and variances measured from all 1992 critical paths and 638 TRPs from benchmark circuit s9234 at four stress time points (t = 2.5, 5, 7.5, and 10 years) considering process variations and NBTI (Temperature: T = 25°C, 50°C, 75°C, 100°C, and 125°C) (for each case, 100 rounds of MC simulations were run).
Measurement error analysis (%); MDE = mean delay error, VE = variance error
Temperature T (°C) | t = 2.5 years: MDE, VE | t = 5 years: MDE, VE | t = 7.5 years: MDE, VE | t = 10 years: MDE, VE
25 | 0, 0 | 0, 0.45 | 0.36, 0.84 | 1.10, 1.51
50 | 1.04, 1.02 | 1.18, 1.16 | 1.12, 1.05 | 1.22, 1.11
75 | 0.98, 0.88 | 1.10, 1.03 | 0.96, 0.88 | 1.08, 1.00
100 | 0.97, 1.40 | 1.07, 1.59 | 1.01, 1.55 | 1.01, 1.61
125 | 1.06, 1.30 | 0.94, 1.23 | 1.03, 1.41 | 1.12, 1.53
Average errors | Average mean delay error (%): 0.92 | Average variance error (%): 1.13
We summarize the TRP identification results for 6 benchmark circuits, together with the measurement error analysis, in Table 6.3. In Table 6.3, Column 2 lists the number of critical paths (N_CP) selected by STA as the top 20% of the paths; other threshold values could be used as well. Column 3 shows the number of unique gates on the critical paths. The number of TRPs (N_TRP) identified using our flow is shown in Column 4. The corresponding reduction in the number of paths is shown in Column 5 and is calculated as R = (N_CP − N_TRP)/N_CP × 100. Column 6 shows the time overhead (T_OH, i.e., CPU runtime in seconds) introduced by TRP identification from the critical paths using our flow. We evaluate the efficiency of TRP identification under two temperature scenarios, 25°C and 125°C, for these benchmark circuits. Under each temperature scenario, the first column (Column 7 for 25°C and Column 9 for 125°C) shows the average errors of the maximum mean delay measured from the TRPs relative to that of all the critical paths; the second column (Column 8 for 25°C and Column 10 for 125°C) shows the average errors of the maximum variance measured from the TRPs relative to that of the critical paths. The last row of the table shows the average values.
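As a quick arithmetic check of the reduction formula R = (N_CP − N_TRP)/N_CP × 100, the short Python snippet below recomputes Column 5 of Table 6.3 from Columns 2 and 4, and applies the same formula to the PDF pattern counts of Table 6.4. The dictionaries simply transcribe the table values; the helper name `reduction_pct` is ours.

```python
def reduction_pct(before, after):
    """Percentage reduction, e.g., R = (N_CP - N_TRP) / N_CP * 100."""
    return (before - after) / before * 100.0


# (N_CP, N_TRP) transcribed from Table 6.3
paths_table = {
    "s5378": (1959, 619), "s9234": (1992, 638), "s13207": (1610, 741),
    "s15850": (7991, 502), "s38417": (12763, 723), "s38584": (11252, 3156),
}
# (# PDF patterns for testable critical paths, # PDF patterns for TRPs) from Table 6.4
patterns_table = {
    "s5378": (988, 506), "s9234": (912, 312), "s13207": (442, 319),
    "s15850": (3408, 237), "s38417": (1630, 212), "s38584": (4125, 1527),
}

path_r = {c: reduction_pct(*v) for c, v in paths_table.items()}
pdf_r = {c: reduction_pct(*v) for c, v in patterns_table.items()}
avg_r = sum(path_r.values()) / len(path_r)
avg_pdf = sum(pdf_r.values()) / len(pdf_r)

for c in paths_table:
    print(f"{c:7s} R = {path_r[c]:6.2f}%   R_PDF = {pdf_r[c]:6.2f}%")
print(f"average R = {avg_r:.2f}%, average R_PDF = {avg_pdf:.2f}%")
```

Run as written, the averages come out at 75.06% and 64.24%, matching the last rows of Tables 6.3 and 6.4.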
Table 6.3: TRP identification considering process variations and NBTI (T = 25°C and 125°C, 100 rounds of MC simulations).
Benchmark Circuit | Number of Testable Critical Paths (N_CP) | Number of Unique Gates | Number of TRPs (N_TRP) | Reduction (R) | Time Overhead (T_OH, seconds) | T = 25°C: Mean Delay Error (%) | T = 25°C: Variance Error (%) | T = 125°C: Mean Delay Error (%) | T = 125°C: Variance Error (%)
s5378 | 1959 | 1623 | 619 | 68.40% | 15.10 | 1.18 | 2.15 | 1.00 | 1.74
s9234 | 1992 | 2106 | 638 | 67.97% | 20.89 | 1.10 | 1.51 | 1.12 | 1.53
s13207 | 1610 | 1669 | 741 | 53.98% | 32.77 | 1.70 | 1.69 | 1.14 | 1.59
s15850 | 7991 | 1660 | 502 | 93.72% | 76.87 | 2.67 | 2.90 | 2.73 | 3.37
s38417 | 12763 | 3376 | 723 | 94.34% | 356.02 | 2.87 | 2.69 | 2.65 | 2.81
s38584 | 11252 | 12737 | 3156 | 71.95% | 791.72 | 3.00 | 2.85 | 2.92 | 2.89
Average | - | - | - | 75.06% | 215.56 | 2.09 | 2.29 | 1.93 | 2.32
Table 6.3 shows that our flow identifies very few TRPs compared with the original large number of critical paths. On average over these 6 benchmark circuits, our flow reduces the path count by 75.06% while keeping the average errors of both maximum mean delay and variance estimation at or below 2.32%. Given how much smaller the TRP counts are than the critical path counts, we expect the number of patterns to decrease significantly. Since the maximum mean delay and variance from TRPs closely follow those of the critical paths, testing TRPs can accurately indicate circuit path delay conditions. Instead of generating a large number of PDF patterns for all the critical paths, a much smaller number of PDF patterns for the TRPs alone can effectively verify circuit performance. For each benchmark circuit, the critical paths and TRPs are given to the ATPG tool for PDF pattern generation. The results are shown in Table 6.4, where Rows 2 to 7 compare the number of PDF patterns for all critical paths with that for the TRPs of each benchmark circuit. The reductions in the number of PDF patterns are also calculated and shown in Column 4. As seen in the last row, the average PDF pattern reduction (R_PDF) is 64.24%.
Table 6.4: PDF pattern reduction analysis.
Benchmark Circuit | # of PDF Patterns for Testable Critical Paths | # of PDF Patterns for TRPs | PDF Patterns Reduction (%)
s5378 | 988 | 506 | 48.79
s9234 | 912 | 312 | 65.79
s13207 | 442 | 319 | 27.83
s15850 | 3408 | 237 | 93.05
s38417 | 1630 | 212 | 86.99
s38584 | 4125 | 1527 | 62.98
Average Reduction (R_PDF) | | | 64.24
6.5 Summary
In this chapter, we proposed a testable representative path (TRP) identification flow to reduce the number of paths for PDF pattern generation. The maximum mean delay and variance of TRPs closely follow the maximum mean delay and variance of all critical paths considering process variations and NBTI over time. Simulations show that we can use only a few selected TRPs to predict path delays with very high accuracy. TRPs can be used in LBIST methods or multi-core system environments to effectively evaluate circuit performance in the field, where a much smaller memory is needed to store these PDF patterns.
Chapter 7
Conclusions
As semiconductor technology advances to 45nm and beyond, circuit design faces severe reliability issues. Timing-related parametric failures grow worse as technology scales further, which increases test cost, yield loss, and test escapes. Among the major concerns, aging-related failure mechanisms, especially NBTI and HCI, now play a highly important role in threatening product lifetime and performance quality. NBTI and HCI are highly dependent on temperature, operational time, and circuit parametric information such as gate capacitive load, stress voltage, and gate input signal slew. NBTI and HCI shift circuit delay over time at a rate jointly determined by all these parameters. Another important design concern is the impact of process variations, which have long been a research topic. Process variations spread gate and path delays into probability distributions whose mean and variance depend on the silicon variations across dies.
Conventional aging analysis is performed using SPICE tools, such as Synopsys HSPICE, Cadence RelXpert, and Mentor Graphics Eldo. These SPICE-based tools provide highly accurate analysis results; however, they are not applicable to industrial designs because of their long CPU runtime on large circuits. Another popular method for circuit reliability analysis uses on-chip sensors to obtain in-the-field timing information. These sensors monitor carefully selected critical paths to extract timing information over time. The immediate difficulties include:
• The routing and layout of the original circuitry may be disturbed, since the sensors are inserted at locations chosen according to circuit timing analysis after the layout stage.
• Inaccurate selection of the critical paths for sensor insertion and monitoring strongly affects the measurement of circuit performance degradation.
Path delay test is another effective way to evaluate circuit performance. However,
due to the extremely large number of critical paths caused by process variations and
aging effects, the test process is highly time-consuming.
In this thesis, an aging-aware path delay analysis flow is proposed, which is applicable to large-scale industrial designs. Chapter 2 begins with a comprehensive analysis of NBTI and HCI effects, supported by extensive simulation results. Building on existing mathematical models and a LUT pre-generated with the reliability tool, the APD flow is proposed. The APD flow can analyze large circuits; compared with reliability tools and mathematical models, it is efficient in CPU runtime and accurate in analysis. A gate sizing method that takes advantage of the APD analysis results can effectively compensate for circuit aging degradation with minimal area and power overhead.
Chapter 3 proposes the RCRP design: stand-alone circuitry synthesized for on-line aging analysis and delay measurement. RCRP is synthesized based on the structure of the functional circuit and the degradation rates of different critical paths. Depending on the accuracy requirements, RCRP can sample the actual workload from the functional circuit without disturbing its operation. Alternatively, RCRP can be synthesized with its off-path pins connected to ground or the supply voltage to ensure that the maximum delay measured from RCRP is an upper bound on the actual largest delay in the functional circuit.
Chapter 4 focuses on clock skew reduction considering both NBTI and process variations. Taking advantage of the timing safety margin reserved for NBTI and process variations, an optimization algorithm is proposed to further reduce the skew.
Chapters 5 and 6 focus on reducing the number of PDF patterns while accounting for performance impacts. In Chapter 5, a PDF pattern reduction scheme is proposed that analyzes the actual silicon variations using measurements from distributed on-chip ROs. Instead of relying on STA and SSTA timing analysis, RO measurements provide information about the actual silicon variations. PDF patterns generated based on actual silicon variations are few in number and accurate in PDF test. In Chapter 6, a TRP selection flow is proposed. TRPs are selected based on the circuit topology and gate degradation under the impact of NBTI and process variations. Results demonstrate that the maximum mean delay and variance of TRPs closely follow the maximum mean delay and variance of all the critical paths. Compared with PDF patterns generated for an extremely large number of critical paths, the PDF patterns for TRPs are significantly fewer in number and highly accurate in evaluating circuit performance.
Bibliography
[1] G.E. Moore, Cramming More Components Onto Integrated Circuits, in Proceedings
of the IEEE, vol.86, no.1, pp.82-85, Jan. 1998.
[2] D. Kececioglu, Reliability Engineering Handbook, Part I and II, Prentice Hall, En-
glewood Cliffs, New Jersey, 1991.
[3] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu, and Y. Cao, The Impact
of NBTI Effect on Combinational Circuit: Modeling, Simulation, and Analysis, in
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol.18, no.2,
pp.173,183, Feb. 2010.
[4] A. T. Krishnan, C. Chancellor, S. Chakravarthi, P. E. Nicollian, V. Reddy, A.
Varghese, R. B. Khamankar, and S. Krishnan, Material Dependence of Hydrogen
Diffusion: Implications for NBTI Degradation, in Electron Devices Meeting, 2005.
IEDM Technical Digest. IEEE International, pp.,691, 5-5, Dec. 2005.
[5] J. Martin-Martinez, S. Gerardin, E. Amat, R. Rodriguez, M. Nafria, X. Aymerich,
A. Paccagnella, and G. Ghidini, Channel-Hot-Carrier Degradation and Bias Tem-
perature Instabilities in CMOS Inverters, in Electron Devices, IEEE Transactions
on, vol.56, no.9, pp.2155,2159, Sept. 2009.
[6] ITRS, International technology roadmap for semiconductors (itrs) roadmap, 2011.
[Online]. Available: http://www.itrs.net
[7] Y. Lee, S. Jacobs, S. Stadler, N. Mielke, and R. Nachman, The Impact of PMOST
Bias-Temperature Degradation on Logic Circuit Reliability Performance, in Micro-
electronics Reliability, vol. 45, issue 1, pp. 107-114, Jan. 2005.
[8] T. Nigam, Impact of Transistor Level Degradation on Product Reliability, in Custom
Integrated Circuits Conference, 2009. CICC ’09. IEEE, pp.431,438, 13-16, Sept.
2009.
[9] N. Kimizuka, T. Yamamoto, K. Yamaguchi, K. Imai, and T. Horiuchi, Impact of
Bias Temperature Instability for Direct Tunneling Ultrathin Gate Oxide on MOS-
FET Scaling, in VLSI Technology, pp. 73-74, 1999.
[10] G. Chen, M. Li, C. Ang, J. Zheng, and D. Kwong, Dynamic NBTI of p-MOS
Transistors and Its Impact on MOSFET Scaling, in Electron Device Letters, IEEE,
vol.23, no.12, pp.734,736, Dec. 2002.
[11] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, An Analytical Model for Negative
Bias Temperature Instability (NBTI), in Computer-Aided Design, 2006. ICCAD
’06. IEEE/ACM International Conference on, pp.493,496, 5-9, Nov. 2006.
[12] M. A. Alam, A Critical Examination of the Mechanics of Dynamic NBTI for p-
MOSFETs, in IEEE International Electronic Devices Meeting, pp. 14.4.1-14.4.4,
Dec. 2003.
[13] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula, Predictive Mod-
eling of the NBTI Effect for Reliable Design, in Custom Integrated Circuits Con-
ference, 2006. CICC ’06. IEEE, pp.189,192, 10-13, Sept. 2006.
[14] P. Fang, J. Tao, J. F. Chen, and C. Hu, Design in Hot-carrier Reliability for High
Performance Logic Applications, in Custom Integrated Circuits Conference, 1998.
Proceedings of the IEEE 1998, pp.525,531, 11-14, May 1998.
[15] L. F. Wu, J. K. Fang, H. Yonezawa, Y. Kawakami, N. Iwanishi, H. Yan, P. Chen, A.
Chen, N. Koike, Y. Okamoto, C. Yeh, and Z. Liu, GLACIER: A Hot Carrier Gate
Level Circuit Characterization and Simulation System for VLSI Design, in Quality
Electronic Design, 2000. ISQED 2000. Proceedings. IEEE 2000 First International
Symposium on, pp. 73-79, 2000.
[16] M. Dai, C. Gao, K. Yap, Y. Shan, Z. Cao, K. L, L. Wang, B. Cheng, and S.
Liu, A Model With Temperature-Dependent Exponent for Hot-Carrier Injection in
High-Voltage nMOSFETs Involving Hot-Hole Injection and Dispersion, Electron
Devices, IEEE Transactions on, vol.55, no.5, pp.1255,1258, May 2008.
[17] T. Grasser, W. Gos, V. Sverdlov, and B. Kaczer, The Universality of NBTI Relaxation and its Implications for Modeling and Characterization, in Reliability Physics Symposium, 2007. Proceedings. 45th Annual. IEEE International, pp.268,280, 15-19,
April 2007.
[18] Y. Lu, L. Shang, H. Zhou, H. Zhu, F. Yang, and X. Zeng, Statistical Reliability
Analysis Under Process Variation and Aging Effects, in Design Automation Con-
ference, 2009. DAC ’09. 46th ACM/IEEE , pp.514,519, 26-31, July 2009.
[19] B. Paul, K. Kang, H. Kufluoglu, M. Alam, and K. Roy, Impact of NBTI on the
Temporal Performance Degradation of Digital Circuits, in Electron Device Letters,
IEEE, vol.26, no.8, pp.560,562, Aug. 2005.
[20] B. Paul, K. Kang, H. Kufluoglu, M. Alam, and K. Roy, Temporal Perfor-
mance Degradation under NBTI: Estimation and Design for Improved Reliability
of Nanoscale Circuits, in Design, Automation and Test in Europe, 2006. DATE
’06, vol.1, no., pp.1,6, 6-10, March 2006.
[21] S. Kumar, C. Kim, and S. Sapatnekar, NBTI-Aware Synthesis of Digital Circuits,
in Design Automation Conference, 2007. DAC ’07. 44th ACM/IEEE, pp.370,375,
4-8, June 2007.
[22] R. Zhang, J. Velamala, V. Reddy, V. Balakrishnan, E. Mintarno, S. Mitra, S. Kr-
ishnan, and Y. Cao, Circuit Aging Prediction for Low-Power Operation, in Custom
Integrated Circuits Conference, 2009. CICC ’09. IEEE, pp.427,430, 13-16, Sept.
2009.
[23] S. Kumar, C. Kim, and S. Sapatnekar, Adaptive Techniques for Overcoming Perfor-
mance Degradation due to Aging in Digital Circuits, in Design Automation Con-
ference, 2009. ASP-DAC 2009. Asia and South Pacific, pp.284,289, 19-22, Jan.
2009.
[24] E. Mintarno, J. Skaf, R. Zhang, J. Velamala, Y. Cao, S. Boyd, R. Dutton, and S.
Mitra, Optimized Self-Tuning for Circuit Aging, in Design, Automation & Test in
Europe Conference & Exhibition (DATE), 2010, pp.586,591, 8-12, March 2010.
[25] K. Kang, S. Gangwal, S. Park, and K. Roy, NBTI Induced Performance Degrada-
tion in Logic and Memory Circuits: How Effectively Can We Approach a Reliability
Solution?, in Design Automation Conference, 2008. ASPDAC 2008. Asia and South
Pacific, pp.726,731, 21-24, March 2008.
[26] M. DeBole, R. Krishnan, V. Balakrishnan, W. Wang, H. Luo, Y. Wang, Y. Xie,
Y. Cao, and N. Vijaykrishnan, New-Age: A Negative Bias Temperature Instability-
Estimation Framework for Microarchitectural Components, in International Journal
of Parallel Programming, vol. 37, pp. 417-431, 2009.
[27] W. Wang, Z. Wei, S. Yang, and Y. Cao, An efficient Method to Identify Critical
Gates under Circuit Aging, in International Conference on Computer Aided Design,
pp. 735-740, 2007.
[28] W. Wang, S. Yang, and Y. Cao, Node Criticality Computation for Circuit Timing
Analysis and Optimization under NBTI Effect, in Quality Electronic Design, 2008.
ISQED 2008. 9th International Symposium on, pp.763,768, 17-19, March 2008.
[29] A. Stempkovsky, A. Glebov, and S. Gavrilov, Calculation of Stress Probability for
NBTI-Aware Timing Analysis, in Quality of Electronic Design, 2009. ISQED 2009.
Quality Electronic Design, pp.714,718, 16-18, March 2009.
[30] H. Konoura, Y. Mitsuyama, M. Hashimoto, and T. Onoye, Comparative Study on
Delay Degrading Estimation due to NBTI with Circuit/Instance/Transistor-level
Stress Probability Consideration, in Quality Electronic Design (ISQED), 2010 11th
International Symposium on, pp.646,651, 22-24, March 2010.
[31] D. Lorenz, G. Georgakos, and U. Schlichtmann, Aging Analysis of Circuit Timing
Considering NBTI and HCI, in On-Line Testing Symposium, 2009. IOLTS 2009.
15th IEEE International, pp.3,8, 24-26, June 2009.
[32] K. Kang et al., Efficient Transistor-Level Sizing Technique under Temporal Per-
formance Degradation due to NBTI, in Computer Design, 2006. ICCD 2006. In-
ternational Conference on, pp.216,221, 1-4, Oct. 2007.
[33] P.-C. Li, and I. N. Hajj, Computer-Aided Redesign of VLSI Circuits for Hot-Carrier
Reliability, in Computer-Aided Design of Integrated Circuits and Systems, IEEE
Transactions on, vol.15, no.5, pp.453,464, May 1996.
[34] Synopsys, http://www.synopsys.com/
[35] Cadence, http://www.cadence.com/
[36] Mentor Graphics, http://www.mentor.com/
[37] S. Park, K. Kang, and K. Roy, Reliability Implications of Bias-Temperature Insta-
bility in Digital ICs, in Design & Test of Computers, IEEE, vol.26, no.6, pp.8,17,
Nov.-Dec. 2009.
[38] S. Borkar, Designing reliable systems from unreliable components: The Challenges
of Transistor Variability and Degradation, Micro, IEEE, vol.25, no.6, pp.10,16,
Nov.-Dec. 2005.
[39] T. Austin, V. Bertacco, S. Mahlke, and Y. Cao, Reliable Systems on Unreliable
Fabrics, in Design & Test of Computers, IEEE, vol.25, no.4, pp.322,332, July-Aug.
2008.
[40] S. Ogawa, and N. Shiono, Generalized Diffusion-Reaction Model for the Low-Field
Charge Build up Instability at the Si− SiO2 Interface, in Phys. Rev. B., February
1995.
[41] R. Vattikonda, W. Wang, and Y. Cao, Modeling and Minimization of PMOS NBTI
Effect for Robust Nanometer Design, in Design Automation Conference, 2006 43rd
ACM/IEEE, July 2006.
[42] W. -K. Yeh et al., Temperature Dependence of Hot-Carrier-Induced Degradation
in 0.1 µm SOI nMOSFETs with Thin Oxide, in Electron Device Letters, IEEE,
vol.23, no.7, pp.425,427, July 2002.
[43] H. Tennakoon, and C. Sechen, Efficient and Accurate Gate Sizing with Piecewise
Convex Delay Models, in Design Automation Conference, 2005. Proceedings. 42nd,
pp.807,812, 13-17, June 2005.
[44] A. Kahng, and S. Muddu, Gate Load Delay Computation Using Analytical Models,
in Circuits and Systems, 1996., IEEE Asia Pacific Conference on, pp.433,436, 18-
21, Nov 1996.
[45] W. Wang, S. Yang, S. Bhardwaj, R. Vattikonda, S. Vrudhula, F. Liu, and Y. Cao,
The Impact of NBTI on the Performance of Combinational and Sequential Circuits,
in Design Automation Conference, 2007. DAC ’07. 44th ACM/IEEE, pp.364,369,
4-8, June 2007.
[46] S. Borkar, Electronics Beyond Nano-Scale CMOS, in Design Automation Confer-
ence, 2006 43rd ACM/IEEE, pp. 807-808, 2006.
[47] D. K. Schroder, and J. A. Babcock, Negative Bias Temperature Instability: Road
to Cross in Deep Submicron Silicon Semiconductor Manufacturing, in Journal of
Applied Physics, vol.94, no.1, pp.1,18, Jul 2003.
[48] J. G. Massey, NBTI: What We Know and What We Need to Know - A Tutorial
Addressing the Current Understanding and Challenges for the Future, in Integrated
Reliability Workshop Final Report, 2004 IEEE International, pp.199,211, 18-21
Oct. 2004.
[49] J. W. McPherson, Reliability Challenges for 45nm and Beyond, in Design Automa-
tion Conference, 2006 43rd ACM/IEEE, pp. 176–181, 2006.
[50] J. Chen, P. Song, T. M. Shaw, F. Stellari, L. Gignac, C. Breslin, D. Pfeiffer, and G.
Bonila, A Novel Integrated Reliability Test System for BEOL TDDB Study, in Istfa
2012: Conference Proceedings from the 38th International Symposium for Testing
and Failure Analysis, pp. 297, 2012.
[51] Y. Li, S. Makar, and S. Mitra, CASP: Concurrent Autonomous Chip Self-Test
Using Stored Test Patterns, in Design, Automation & Test in Europe, 2008. DATE
’08, pp.885,890, 10-14, March 2008.
[52] M. Agarwal, B. C. Paul, M. Zhang, and S. Mitra, Circuit Failure Prediction and
Its Application to Transistor Aging, in VLSI Test Symposium, 2007. 25th IEEE,
pp.277,286, 6-10, May 2007.
[53] J. C. Vazquez, V. Champac, A. M. Ziesemer, R. Reis, I. C. Teixeira, M. B. Santos,
and J. P. Teixeira, Low-Sensitivity to Process Variations Aging Sensor for Auto-
motive Safety-Critical Applications, in VLSI Test Symposium (VTS), 2010 28th,
pp.238,243, 19-22, April 2010.
[54] T.-H. Kim, R. Persaud, and C. H. Kim, Silicon Odometer: An On-Chip Reliability
Monitor for Measuring Frequency Degradation of Digital Circuits, in VLSI Circuits,
2007 IEEE Symposium on, pp.122,123, 14-16, June 2007.
[55] C. Zhuo, D. Sylvester, and D. Blaauw, Process Variation and Temperature-Aware
Reliability Management, in Design, Automation & Test in Europe Conference &
Exhibition (DATE), 2010, pp.580,585, 8-12, March 2010.
[56] E. Takeda, C. Y. Yang, and A. Miura-Hamada, Hot-Carrier Effects in MOS De-
vices, in Academic Press, 1995.
[57] E. Saneyoshi, K. Nose, and M. Mizuno, A Precise-Tracking NBTI-Degradation
Monitor Independent of NBTI Recovery Effect, in Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 2010 IEEE International, pp.192,193, 7-11,
Feb. 2010.
[58] C. Metra, M. Omana, T. M. Mak, A. Rahman, and S. Tam, Novel On-Chip Clock
Jitter Measurement Scheme for High Performance Microprocessors, in Defect and Fault Tolerance of VLSI Systems, 2008. DFTVS ’08. IEEE International Symposium on, pp.465,473, 1-3, Oct. 2008.
[59] X. Wang, M. Tehranipoor, and R. Datta, A Novel Architecture for On-Chip Path
Delay Measurement, in Test Conference, 2009. ITC 2009. International, pp.1,10,
1-6, Nov. 2009.
[60] J. Tschanz, K. Bowman, S. Walstra, M. Agostinelli, T. Karnik, and V. De, Tunable
Replica Circuits and Adaptive Voltage-Frequency Techniques for Dynamic Voltage,
Temperature, and Aging Variation Tolerance, in VLSI Circuits, 2009 Symposium
on, pp.112,113, 16-18 June 2009.
[61] D. Chua, E. Kolaczyk, and M. Crovella, Network Kriging, in IEEE Journal on
Selected Areas in Communications, vol. 24, no. 12, pp. 2263–2272, 2006.
[62] Q. Liu, and S. S. Sapatnekar, Synthesizing A Representative Critical Path for Post-
Silicon Delay Prediction, International Symposium on Physical Design, pp. 183–
190, 2009.
[63] L. Xie, and A. Davoodi, Representative path selection for post-silicon timing pre-
diction under variability, in Design Automation Conference (DAC), 2010 47th
ACM/IEEE, pp.386,391, 13-18, June 2010.
[64] J. Tschanz, J. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, and
V. De, Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die
Parameter Variations on Microprocessor Frequency and Leakage, in Solid-State
Circuits, IEEE Journal of, vol.37, no.11, pp.1396,1402, Nov 2002.
[65] J. Tschanz, S. Narendra, R. Nair, and V. De, Effectiveness of Adaptive Supply Volt-
age and Body Bias for Reducing the Impact of Parameter Variations in Low Power
and High Performance Microprocessors, in IEEE Journal of Solid-State Circuits,
vol. 38, pp. 826–829, 2003.
[66] Nangate at http://www.nangate.com/
[67] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula, Predictive Mod-
eling of the NBTI Effect of Reliable Design, in Custom Integrated Circuits Confer-
ence, 2006. CICC ’06. IEEE, pp.189,192, 10-13, Sept. 2006.
[68] A. D. Mehta, Y. P. Chen, N. Menezes, D. F. Wong, and L. T. Pileggi, Clustering and
Load Balancing for Buffered Clock Tree Synthesis in Computer Design: VLSI in
Computers and Processors, 1997. ICCD ’97. Proceedings., 1997 IEEE International
Conference on, pp.217,223, 12-15, Oct 1997.
[69] C. C. Teng, Method for Balancing a Clock Tree, in United States Patent, Patent
No. 6351840, 2002.
[70] R. S. Rodgers, and S. T. Evans, Method and Apparatus for Minimizing Clock Skew
in a Balanced Tree when Interfacing to An Unbalanced Load, in United States
Patent, Patent No. 6769104, 2004.
[71] D. Borkovic, and K.S. McElvain, Reducing Clock Skew in Clock Gating Circuits, in
United States Patent, Patent No. 7082582, 2006.
[72] C. Chang, S. Huang, Y. Ho, J. Lin, H. Wang, and Y. Lu, Type-Matching Clock
Tree for Zero Skew Clock Gating, in Design Automation Conference, 2008. DAC
2008. 45th ACM/IEEE, pp.714,719, 8-13, June 2008.
[73] A. Nardi, A. Neviani, M. Quarantelli, S. Saxena, and C. Guardiani, Analysis of
the Impact of Process Variations on Clock Skew, in Semiconductor Manufacturing,
IEEE Transactions on, vol.13, no.4, pp.401,407, Nov 2000.
[74] B. Lu, J. Hu, G. Ellis, and H. Su, Process Variation Aware Clock Tree Routing, in
Proceedings of the 2003 international symposium on Physical design (ISPD), pp.
174-181, 2003.
[75] X. Wei, Y. Cai, and X. Hong, Clock Skew Scheduling Under Process Variations,
in Quality Electronic Design, 2006. ISQED ’06. 7th International Symposium on,
pp.6 pp.,242, 27-29, March 2006.
[76] C. J. Akl, R. A. Ayoubi, and M. A. Bayoumi, Post-Silicon Clock-Invert (PSCI) for Reducing Process-Variation Induced Skew in Buffered Clock Networks, in Quality of Electronic Design, 2009. ISQED 2009, pp. 794–798, March 16–18, 2009.
[77] J. M. Cohn, Method for Reducing Design Effect of Wearout Mechanisms on Signal
Skew in Integrated Circuit Design, in United States Patent, Patent No. 6651230,
2003.
[78] A. Chakraborty, G. Ganesan, A. Rajaram, and D. Pan, Analysis and Optimization of NBTI Induced Clock Skew in Gated Clock Trees, in Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE, pp. 296–299, April 20–24, 2009.
[79] A. Chakraborty, and D. Pan, Skew Management of NBTI Impacted Gated Clock Trees, in Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 32, no. 6, pp. 918–927, June 2013.
[80] A. Chakraborty, and D. Pan, Critical-PMOS-Aware Clock Tree Design Methodology for Anti-Aging Zero Skew Clock Gating, in Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific, pp. 480–485, Jan. 18–21, 2010.
[81] V. Huard, M. Denais, and C. Parthasarathy, NBTI Degradation: From Physical Mechanisms to Modeling, in Microelectronics Reliability, pp. 1–23, Jan. 2006.
[82] M. Alam, A Critical Examination of the Mechanics of Dynamic NBTI for PMOSFETs, in Electron Devices Meeting, 2003. IEDM ’03 Technical Digest. IEEE International, pp. 346–349, 2003.
[83] Y. Chen, Y. Xie, Y. Wang, and A. Takach, Minimizing Leakage Power in Aging-Bounded High-Level Synthesis with Design Time Multi-Vth Assignment, in Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific, pp. 689–694, Jan. 18–21, 2010.
[84] Predictive Technology Model (PTM), developed by Arizona State University, http://ptm.asu.edu/
[85] T. Sakurai, Alpha-Power Law MOSFET Model and its Application to CMOS Logic, in Solid-State Circuits, IEEE Journal of, vol. 25, no. 2, pp. 584–594, Apr. 1990.
[86] W. Wang, V. Reddy, B. Yang, V. Balakrishnan, S. Krishnan, and Y. Cao, Statistical Prediction of Circuit Aging under Process Variations, in Custom Integrated Circuits Conference, 2008. CICC 2008. IEEE, pp. 13–16, Sept. 21–24, 2008.
[87] J. Chen, S. Wang, and M. Tehranipoor, Efficient Selection and Analysis of Critical-Reliability Paths and Gates, in Proceedings of the Great Lakes Symposium on VLSI, ser. GLSVLSI ’12. New York, NY, USA: ACM, 2012, pp. 45–50. [Online]. Available: http://doi.acm.org/10.1145/2206781.2206793
[88] J. Chen, S. Wang, and M. Tehranipoor, Critical-Reliability Path Identification and Delay Analysis, in the ACM Journal on Emerging Technologies in Computing Systems (JETC), 2012.
[89] K. Roy, S. Mukhopadhyay, and H. Mahmoodi Meymand, Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits, in the Proceedings of the IEEE (PIEEE), vol. 91, no. 2, pp. 305–327, Feb. 2003.
[90] V. Iyengar, J. Xiong, S. Venkatesan, V. Zolotov, D. Lackey, P. Habitz, and
C. Visweswariah, Statistical test to uncover process variations.
[91] J. Chen and M. Tehranipoor, A Novel Flow for Reducing Clock Skew Considering NBTI Effect and Process Variations, in International Symposium on Quality Electronic Design (ISQED), pp. 327–334, 2013.
[92] M. C.-T. Chao, L.-C. Wang, and K.-T. Cheng, Pattern Selection for Testing of Deep Sub-Micron Timing Defects, in Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings, vol. 2, pp. 1060–1065, Feb. 16–20, 2004.
[93] C.-J. Chang and T. Kobayashi, Test Quality Improvement with Timing-Aware ATPG: Screening Small Delay Defect Case Study, in Test Conference, 2008. ITC 2008. IEEE International, p. 1, Oct. 28–30, 2008.
[94] H. Chen, B. Lu, and D.-Z. Du, Static Timing Analysis with False Paths, in Computer Design, 2000. Proceedings. 2000 International Conference on, pp. 541–544, 2000.
[95] G. Yu, W. Dong, Z. Feng, and P. Li, Statistical Static Timing Analysis Considering Process Variation Model Uncertainty, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 27, no. 10, pp. 1880–1890, Oct. 2008.
[96] M. Sivaraman, and A. Strojwas, Delay Fault Coverage: A Realistic Metric and An Estimation Technique for Distributed Path Delay Faults, in Computer-Aided Design, 1996. ICCAD-96. Digest of Technical Papers., 1996 IEEE/ACM International Conference on, pp. 494–501, Nov. 10–14, 1996.
[97] B. Hargreaves, H. Hult, and S. Reda, Within-Die Process Variations: How Accurately Can They Be Statistically Modeled?, in Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific, pp. 524–530, March 2008.
[98] S. Padmanaban, and S. Tragoudas, A Critical Path Selection Method for Delay Testing, in Test Conference, 2004. Proceedings. ITC 2004. International, pp. 232–241, Oct. 2004.
[99] J.-J. Liou, A. Krstic, L.-C. Wang, and K.-T. Cheng, False-Path-Aware Statistical Timing Analysis and Efficient Path Selection for Delay Testing and Timing Validation, in Design Automation Conference, 2002. Proceedings. 39th, pp. 566–569, 2002.
[100] C. Visweswariah, K. Ravindran, K. Kalafala, S. Walker, S. Narayan, D. Beece, J. Piaget, N. Venkateswaran, and J. Hemmett, First-Order Incremental Block-Based Statistical Timing Analysis, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 25, no. 10, pp. 2170–2180, Oct. 2006.
[101] V. Iyengar, J. Xiong, S. Venkatesan, V. Zolotov, D. Lackey, P. Habitz, and C. Visweswariah, Variation-Aware Performance Verification Using At-Speed Structural Test and Statistical Timing, in Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on, pp. 405–412, Nov. 2007.
[102] A. Agarwal, F. Dartu, and D. Blaauw, Statistical Gate Delay Model Considering
Multiple Input Switching, in Proceedings of the 41st annual Design Automation
Conference, ser. DAC ’04. New York, NY, USA: ACM, 2004, pp. 658–663. [Online].
Available: http://doi.acm.org/10.1145/996566.996746
[103] C. Guardiani, S. Saxena, P. McNamara, P. Schumaker, and D. Coder, An Asymptotically Constant, Linearly Bounded Methodology for The Statistical Simulation of Analog Circuits Including Component Mismatch Effects, in Design Automation Conference, 2000. Proceedings 2000, pp. 15–18, 2000.
[104] J. Singh, Z.-Q. Luo, and S. Sapatnekar, A Geometric Programming-Based Worst Case Gate Sizing Method Incorporating Spatial Correlation, Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 27, no. 2, pp. 295–308, Feb. 2008.
[105] Q. Liu, and S. S. Sapatnekar, A Framework for Scalable Postsilicon Statistical Delay Prediction Under Process Variations, Trans. Comp.-Aided Des. Integ. Cir. Sys., vol. 28, no. 8, pp. 1201–1212, Aug. 2009. [Online]. Available: http://dx.doi.org/10.1109/TCAD.2009.2021732
[106] R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis (6th ed.), April 2, 2007.
[107] A. Agarwal, D. Blaauw, V. Zolotov, S. Sundareswaran, M. Zhao, K. Gala, and R. Panda, Path-Based Statistical Timing Analysis Considering Inter- and Intra-Die Correlations, in Proc. TAU, pp. 16–21, 2002.
[108] M. Bushnell, and V. Agrawal, Essentials of Electronic Testing (6th ed), 2000.
[109] A. Agarwal, D. Blaauw, V. Zolotov, S. Sundareswaran, M. Zhao, K. Gala, and R. Panda, Statistical Delay Computation Considering Spatial Correlations, in Design Automation Conference, 2003. Proceedings of the ASP-DAC 2003. Asia and South Pacific, pp. 271–276, 2003.
[110] M. Sharma, and J. Patel, Finding A Small Set of Longest Testable Paths That Cover Every Gate, in Test Conference, 2002. Proceedings. International, pp. 974–982, 2002.
[111] J. Xiong, Y. Shi, V. Zolotov, and C. Visweswariah, Pre-ATPG Path Selection for Near Optimal Post-ATPG Process Space Coverage, in Computer-Aided Design - Digest of Technical Papers, 2009. ICCAD 2009. IEEE/ACM International Conference on, pp. 89–96, 2009.
[112] X. Lu, Z. Li, W. Qiu, D. M. H. Walker, and W. Shi, Longest Path Selection for
Delay Test Under Process Variation, in Proceedings of the 2004 Asia and South
Pacific Design Automation Conference, pp. 98–103, 2004.
[113] M.-C. Tsai, C.-H. Cheng, and C.-M. Yang, An All-Digital High-Precision Built-in
Delay Time Measurement Circuit, in VLSI Test Symposium, 2008. VTS 2008. 26th
IEEE, pp. 249–254, 2008.
[114] S. Wang, J. Chen, and M. Tehranipoor, Representative Critical Reliability Paths
for Low-Cost and Accurate On-Chip Aging Evaluation, in Computer-Aided Design
(ICCAD), 2012 IEEE/ACM International Conference on, pp. 736–741, 2012.
[115] C. D. Meyer, Ed., Matrix Analysis and Applied Linear Algebra. Philadelphia,
PA, USA: Society for Industrial and Applied Mathematics, 2000.
[116] A. Drake, R. Senger, H. Deogun, G. Carpenter, S. Ghiasi, T. Nguyen, N. James, M. Floyd, and V. Pokala, A Distributed Critical-Path Timing Monitor for a 65nm High-Performance Microprocessor, in Solid-State Circuits Conference, ISSCC 2007. Digest of Technical Papers. IEEE International, pp. 398–399, 2007.
