Efficient algorithms for fundamental statistical timing analysis problems in delay test applications of VLSI circuits by Wagner, Marcus
Efficient Algorithms for Fundamental Statistical
Timing Analysis Problems in Delay Test Applications
of VLSI Circuits
Dissertation approved by the Faculty of Computer Science, Electrical Engineering and
Information Technology of the University of Stuttgart
for the conferral of the degree of
Doctor of Natural Sciences (Dr. rer. nat.)
Submitted by
Marcus Wagner
from Jena, Germany
Main referee: Prof. Dr. rer. nat. Hans-Joachim Wunderlich
Co-referee: Prof. Dr. rer. nat. Bernd Becker
Date of oral examination: 3 November 2016
Institut für Technische Informatik
University of Stuttgart
2016

Contents
Acknowledgments
Abstract
German Abstract (Zusammenfassung)
1 Introduction
1.1 Trends and Challenges in Semiconductor Manufacturing
1.2 Delay Testing of VLSI circuits
1.2.1 Scan Delay Test
1.2.2 Delay Fault Models
1.2.3 Variation-Aware Delay-Testing
1.3 Statistical Timing Analysis for Path Delay Fault Testing
1.3.1 Transition Propagation Condition
1.3.2 Invalidation of Path Delay Fault Tests
1.3.3 Probabilistic Sensitization Analysis
1.4 Statistical Timing Analysis for Small Delay Fault Testing
1.4.1 Testing for Small Delay Faults
1.4.2 Invalidation and Optimization of Small Delay Fault Tests
1.4.3 Computation of Target Paths Delay Fault Probability
1.5 Variation-Aware Delay Fault Simulation
1.5.1 Block-Based Statistical Timing Analysis
1.5.2 Efficient Computation of Statistical SUM and MAX-Operations
1.6 Organisation and Contributions of this Work
2 Fundamentals of Statistical Timing Analysis
2.1 Review of Linear Algebra
2.2 Sources of Variability
2.2.1 Process Variations
2.2.2 Environmental Variations
2.2.3 Model Inadequacy and Numerical Errors
2.2.4 Other Sources of Variability
2.2.5 Impact on Important Electrical Parameters of Transistors
2.3 Classification of Process Variations
2.3.1 Systematic Variations
2.3.2 Non-Systematic Variations
2.4 Formal Modelling of Variability
2.4.1 Probability Space
2.4.2 Random Variables
2.4.3 Canonical Delay Model
2.4.4 Random Vectors
2.5 Circuit Modelling
2.5.1 Structural Circuit Instance Modelling
2.5.2 Behavioural Modelling of Circuit Instance
3 State of the Art
3.1 Evaluation of Path Delay Fault Tests
3.1.1 Proof of Path Sensitization by Test Vector-Pair
3.1.2 Probability that Path is Sensitized by Test Vector-Pair
3.2 Evaluation of Small Delay Fault Tests
3.3 Statistical SUM and MAX-Operations
3.3.1 Normal Distribution based SUM-operation
3.3.2 Normal Distribution based MAX-operation
3.3.3 Latest Approaches to Improve Accuracy of MAX-operation
4 Probabilistic Sensitization Analysis
4.1 Probabilistic Sensitization Analysis Problem
4.2 Identification and Comparison of Sensitized Paths
4.3 Analysis of Target Path Sensitization by Test Vector-Pair
4.3.1 Tracing of Inconsistently Controlled Path
4.3.2 Analysis of Transition Propagation Condition
4.4 Construction of Representative Subcircuit
4.5 Simplified Probabilistic Sensitization Analysis
4.6 Summary
5 Computation of Target Paths Delay Fault Probability
5.1 Computation of Critical Target Paths Delay Distribution
5.1.1 Identification of Target Paths
5.1.2 Computation of Delay Distribution of a Target Path
5.1.3 Computation of Joint Delay Distribution
5.2 Non-Incremental Computation
5.2.1 Computation of Critical Target Paths Delay Distribution
5.2.2 Dimension Reduction with Statistical MAX-Operation
5.2.3 Numerical Integration
5.3 Incremental Computation
5.3.1 Approximation of Delay Distribution of New Test Vector-Pair
5.3.2 Update of Test Vector-Pairs Delay Distribution
5.3.3 Approximation of Target Paths Delay Fault Probability
5.3.4 Changing other Delay Test Parameters
5.4 Extension of Normal Distribution based MAX-operation
5.5 Conclusion
6 SUM and MAX-Operations based on Skew-Normal Distribution
6.1 The Skew-Normal Distribution
6.1.1 Definition with Azzalini-Parametrization
6.1.2 Alternative Parametrization Adopted in this Work
6.1.3 Equivalence of Parametrizations
6.2 Statistical SUM-operation
6.3 Statistical MAX-Operation
6.3.1 Computation of Mean Vector μ̂ and Covariance Matrix Σ̂
6.3.2 Computation and Properties of Third Multivariate Cumulant
6.3.3 Estimation of Shape Vector λ̂
6.3.4 A Numerical Example
6.4 Incremental Update of Inverse Cholesky Factor
6.5 Quadratic Time Algorithm for MAX-operation
6.5.1 Fast Algorithm based on Restricted Skew-Normal Distribution
6.5.2 Transformation to Restricted Skew-Normal Distribution
6.5.3 Description of Quadratic Time Algorithm
6.6 Application to the Computation of max(X1, ..., Xn)
6.7 Conclusion
7 Experimental Evaluation
7.1 Implementation
7.2 Benchmark Circuits
7.3 Probabilistic Sensitization Analysis
7.3.1 Evaluation of Representative Subcircuit
7.3.2 Simplified Probabilistic Sensitization Analysis
7.4 Computation of Target Paths Delay Fault Probability
7.4.1 Experimental Setup
7.4.2 Non-Incremental Computation
7.4.3 Incremental Computation
7.4.4 Application to Variation-Aware Pattern Selection
7.5 SUM and MAX-Operations based on Skew-Normal Distribution
7.5.1 Results for max(X1, ..., Xn) with Random μ and Σ
7.5.2 Results for max(X1, ..., Xn) for Critical Target Path Delays X1, ..., Xn
7.5.3 Analysis and Further Reduction of Approximation Error
7.5.4 Empirical Runtime Complexity of Algorithm for MAX-Operation
8 Conclusions
8.1 Contributions of this Work
8.2 Ongoing Research and Future Work
A Mathematical Details
A.1 Moments involving the Maximum max(X_{n-1}, X_n)
A.2 Proofs
A.2.1 Statistical SUM-Operation
A.2.2 Statistical MAX-Operation
A.2.3 Quadratic Time Algorithm for MAX-operation
B Additional Result Tables
Bibliography
Curriculum Vitae of the Author
Publications of the Author
List of Figures
Chapter 1
1.1 Delay testing using (enhanced) scan testing
1.2 Conversion of full scan design to combinational equivalent circuit
1.3 Circuit samples in which the same delay fault of fixed size is detected by test vector-pair ‘A’, ‘B’ and ‘C’
1.4 SPICE simulation results of a 2-input NAND gate with a falling transition at input ‘a’, a rising transition at input ‘b’ and a glitch at output ‘y’
1.5 Invalidation of a path delay fault test for the orange path a-b-c-d, which has a path delay fault [Konuk00]
1.6 Detection of small delay fault at gate output ‘e’ over the longest sensitizable path (orange) through the fault site [Goel13]
1.7 Small delay fault test optimization
1.8 Statistical SUM-operation
1.9 Statistical MAX-operation
Chapter 2
2.1 Model of a single-gate FinFET structure without random-dopant fluctuation (left) and with random-dopant fluctuation (right), adopted from [Leung12a]
2.2 Taxonomy of process variations [Blaau08]
2.3 Description of a box plot
2.4 Probability density function of the bivariate normal distribution
2.5 Structural representation of a circuit instance at gate level
2.6 Definition of propagation delay and rise and fall times [Rabae03]
Chapter 3
3.1 Box plot of the absolute error |e| of the approximation of the distribution of max(X1, ..., Xn) by using normal distribution based MAX-operation, where X1, ..., Xn are critical target path delays
Chapter 4
4.1 Circuit and representative subcircuit (red) for a given test vector-pair
4.2 Transition t violates and t̃ satisfies dynamic sensitization condition of NOR gate
4.3 Input and output waveforms of XNOR gate u7 in the circuit instance and its subcircuit after the simulation of a test vector-pair
Chapter 5
5.1 Flowchart of the non-incremental algorithm
5.2 Runtime for approximating the integral in eq. (5.15)
5.3 Computation of the maximum delay of the target paths in two steps
5.4 Flowchart of the incremental algorithm
5.5 Approximation of max(Xk,1, Xk,2, Xk,3, Xk,4) using normal distribution based MAX-operation
5.6 Extension of the mean vector μ_Y and the covariance matrix Σ_Y of the normal distribution approximation of Y, after insertion of a test vector-pair
5.7 Example for the forest data structure of the incremental algorithm, where each tree represents the computation of the delay of a test vector-pair
Chapter 6
6.1 Probability density function of univariate skew-normal distribution
6.2 Probability density function of the bivariate skew-normal distribution
6.3 Example for the application of the MAX-operation in a logic level
6.4 Probability density functions of max(X1, X2) and its approximation by a normal distribution and a skew-normal distribution
Chapter 7
7.1 CDF of the circuit delay D for several NXP benchmark circuits
7.2 Average probability of observing an inconsistent delay test result with the subcircuit
7.3 Relative size of subcircuit and relative size of joint input cone of all critical paths that exist in the subcircuit
7.4 Speedup of Monte Carlo simulation by constructing and simulating only the subcircuit, compared to Monte Carlo simulation of complete circuit
7.5 Average speedup of non-incremental algorithm compared to Monte Carlo simulation
7.6 Error d caused by approximating the delay fault detection probability with the target paths delay fault probability approximation of the non-incremental algorithm
7.7 Average speedup of incremental algorithm after insertion of test vector-pair, compared to a Monte Carlo simulation of the extended test subset
7.8 Error e caused by approximating the delay fault detection probability with the target paths delay fault probability approximation of the incremental algorithm
7.9 Average number of test vector-pairs required for the detection of a small delay fault of fixed size under the impact of delay variations
7.10 Error in test subset size when using target paths delay fault probability approximation instead of delay fault detection probability
7.11 Log-log plot of the runtime for the approximation of max(X1, ..., Xn), where (X1, ..., Xn) ∼ N_n(μ, Σ) with random μ and Σ and large correlation coefficient variations
7.12 Error e of max(X1, ..., Xn) approximation, where (X1, ..., Xn) ∼ N_n(μ, Σ) with random μ and Σ and large correlation coefficient variations
7.13 Absolute error |e| of the approximation of the maximum max(X1, ..., Xn) of critical target path delays X1, ..., Xn
7.14 Absolute error |e| of the approximation of max(X1, ..., Xn) using covariance matrix scaling, where X1, ..., Xn are critical target path delays
7.15 Log-log plot of the runtime of the proposed algorithm for the skew-normal distribution based MAX-operation for a huge number of random variables
List of Tables
Chapter 1
1.1 Variability limits set by International Technology Roadmap for Semiconductors (ITRS) [ITRS12]
Chapter 2
2.1 Trends in FinFET device variability caused by line-edge roughness (LER) and random-dopant fluctuation (RDF) [Leung12b, Leung12a]
Chapter 7
7.1 Benchmark circuit characteristics
7.2 Sensitization condition satisfied by an off-path input
Appendix B
B.1 Average results for construction and Monte-Carlo simulation of subcircuits S̄ and S, compared to Monte-Carlo simulation of complete circuit considering subcircuit size, accuracy and runtime for subcircuit construction and simulation
B.2 Average results of simplified probabilistic sensitization analysis of critical target paths over all test vector-pairs
B.3 Runtime and error of approximating the delay fault detection probability by the target paths delay fault probability approximation of the non-incremental algorithm
B.4 Runtime and absolute error of approximating the delay fault detection probability by the target paths delay fault probability approximation of the incremental algorithm, after insertion or removal of a test vector-pair
B.5 Runtime T and error e for the approximation of the distribution of max(X1, ..., Xn), where (X1, ..., Xn) ∼ N_n(μ, Σ) with random μ and random Σ
B.6 Runtime T and error e for the approximation of the distribution of the maximum delay of n critical target paths, sensitized by 1, 5, 10 and 20 test vector-pairs
B.7 Runtime T and error e for the approximation of the distribution of the maximum delay of n critical target paths, sensitized by 1, 5, 10 and 20 test vector-pairs, using covariance matrix scaling proposed in section 7.5.3
List of Abbreviations and Acronyms
ATPG Automatic Test Pattern Generator
BVND FORTRAN routine to compute the CDF of the bivariate normal distribution,
developed by Alan Genz [Genz04]
CDF Cumulative Distribution Function
CMOS Complementary Metal-Oxide-Semiconductor
CMP Chemical Mechanical Polishing
CPU Central Processing Unit
DPPM Defect Parts Per Million
FDP Fault Detection Probability
FinFET Fin Field Effect Transistor
GPGPU General Purpose Computation on Graphics Processing Unit
HDL Hardware Description Language
ITRS International Technology Roadmap for Semiconductors
LER Line Edge Roughness
LHS Left-Hand Side
MPU Microprocessor unit
MVNDST FORTRAN routine to compute the CDF of the multivariate normal
distribution, developed by Alan Genz [Genz92]
NR-test Non-Robust test
NXP NXP Semiconductors N.V., a global semiconductor manufacturer
OPC Optical Proximity Correction
PDF Probability Density Function
PPI Pseudo-Primary Input
PPO Pseudo-Primary Output
PSA Probabilistic Sensitization Analysis
RAM Random-Access Memory
RDF Random Dopant Fluctuation
RHS Right-Hand Side
SDF Standard Delay Format (IEEE standard for the representation and
interpretation of timing data in electronic design process)
TPDFP Target Paths Delay Fault Probability
TVTL FORTRAN routine to compute the CDF of the trivariate normal distribution,
developed by Alan Genz [Genz04]
VLSI Very Large Scale Integration
XNOR Digital logic gate whose function is the logical complement of the exclusive
OR
XOR Digital logic gate that implements the exclusive OR function
Units
GiB gibibyte, 2^30 bytes
KiB kibibyte, 2^10 bytes
MiB mebibyte, 2^20 bytes
List of Mathematical Symbols and Notation
Symbol Explanation
0 zero vector
0_{k,l} (k × l) zero matrix
I_n (n × n) identity matrix
A^T transpose of matrix A
A^{-1} inverse of matrix A
A^{-T} inverse of transpose of matrix A
q0 nominal circuit instance q0 ∈ Q
q randomly chosen circuit instance q ∈ Q
Q infinite sample space of circuit instances
M skewness matrix
L lower triangular Cholesky factor
Y target paths delay fault probability
X delay fault detection probability
X random variable X
φ(x) probability density function of standard normal distribution N(0, 1)
Φ(x) cumulative distribution function of standard normal distribution N(0, 1)
P(X  0) short hand notation for probability of event fq 2 Q : X(q)  0g
E(X) expected value of random variable X
Var(X) variance of random variable X
X random vector X
φ2(x1, x2; ρ) probability density function of bivariate standard normal distribution with correlation coefficient ρ
φn(x; μ, Σ) probability density function of n-dimensional normal distribution with mean vector μ and full rank covariance matrix Σ
Φ2(x1, x2; ρ) cumulative distribution function of bivariate standard normal distribution with correlation coefficient ρ
Var(X) covariance matrix of random vector X
Cov(X1, X2) covariance between random variables X1 and X2
κ3(X) third multivariate cumulant of random vector X
N_n(μ, Σ) n-dimensional normal distribution with mean vector μ and covariance matrix Σ
SN_n(μ, Σ, λ) n-dimensional skew-normal distribution with mean vector μ, covariance matrix Σ and shape vector λ

Acknowledgments
I would like to take this opportunity to thank all individuals who supported me during
my time as a PhD student. I would particularly like to thank my thesis supervisor,
Prof. Dr. rer. nat. Hans-Joachim Wunderlich, for his continuous support, ideas and
advice. I also would like to thank my co-supervisor Prof. Dr. rer. nat. Bernd Becker for
his commitment to assess my thesis in a timely manner.
A further thank you goes to my colleagues and students who at some time were
involved in my research.
I would especially like to thank my parents for their kindness and unlimited support
throughout my educational career. Without them, this work would not have been
possible.

Abstract
Tremendous advances in semiconductor process technology are creating new challenges
for the delay test of today’s digital VLSI circuits. The complexity of state-of-the-art
manufacturing processes not only leads to greater process variability but also makes
today’s integrated circuits more prone to defects such as resistive shorts and opens. As
a consequence, some of the manufactured circuits do not meet the timing requirements
set by the design specification. These circuits must be identified by delay testing and
sorted out to ensure the quality of shipped products.
Due to the increasing process variability, key transistor and interconnect parameters
must be modelled as random variables. These random variables capture the uncertainty
caused by process variability, but also the impact of modelling errors and variations in
the operating conditions of the circuits, such as the temperature or the supply voltage.
The important consequence for delay testing is that a particular delay test detects a
delay fault of fixed size in only a subset of all manufactured circuits, which inevitably
leads to the shipment of defective products. Although this problem is well
understood, today’s delay test generation methods are unable to account for the distortion
of delay test results caused by process variability. To analyse and predict the
effectiveness of delay tests in a population of circuits which are functionally identical
but have varying timing properties, statistical timing analysis is necessary. Although
the large runtime of statistical timing analysis is a well-known problem, little progress
has been made in the development of efficient statistical timing analysis algorithms for
variability-aware delay test generation and delay fault simulation.
This dissertation proposes novel and efficient statistical timing analysis algorithms
for variability-aware delay test generation and delay fault simulation in the presence
of large delay variations. For the detection of path delay faults, a novel probabilistic
sensitization analysis is presented which analyses the impact of process variations on the
sensitization of the target paths. Furthermore, an efficient method for approximating
the probability of detecting small delay faults is presented. Beyond that, efficient
statistical SUM and MAX-operations are proposed, which provide the fundamental
basis of block-based statistical timing analysis.
The experimental results demonstrate the high efficiency of the proposed algorithms.

German Abstract
—Zusammenfassung—
Die rasanten Fortschritte bei der Halbleiterprozesstechnologie führen zu immer neuen
Herausforderungen beim Test heutiger hochintegrierter digitaler Schaltungen. Die
steigende Komplexität der Herstellungsprozesse vergrößert nicht nur die Prozessvaria-
bilität, sie erhöht auch die Anfälligkeit der Schaltungen gegenüber Fabrikationsdefekten,
wie z.B. widerstandsbehafteter Kurzschlussdefekte oder widerstandsbehafteter Lei-
terunterbrechungen. In Folge dessen erfüllen einige der produzierten Schaltungen
nicht die durch die Designspezifikation vorgegebenen zeitlichen Anforderungen. Diese
Schaltungen müssen durch Verzögerungstests identifiziert und aussortiert werden, um
die Qualität der ausgelieferten Produkte zu garantieren.
Aufgrund der zunehmenden Prozessvariabilität müssen wesentliche Parameter von
Transistoren und Verbindungsleitungen durch Zufallsvariablen modelliert werden.
Durch diese Zufallsvariablen wird die Unsicherheit dieser Parameter aufgrund von
Prozessvariationen ausgedrückt, aber auch der Einfluss von Modellierungsfehlern und
Schwankungen in den Betriebsbedingungen der Schaltung, wie z.B. der Temperatur
oder der Versorgungsspannung, berücksichtigt.
Die wesentliche Konsequenz für den Verzögerungstest ist, dass ein bestimmter Verzöge-
rungstest einen Verzögerungsfehler fester Größe in nur noch einem Teil der fehlerhaften
Schaltungen erkennen kann, was unweigerlich zur Auslieferung defekter Schaltun-
gen führt. Obwohl dieses Problem inzwischen gut bekannt ist, berücksichtigen die
bestehenden Verfahren zur Testmusterzeugung die Verfälschung der Testergebnisse
aufgrund von Prozessvariationen nicht. Um die Wirksamkeit von Verzögerungstests
in einer Population von Schaltungen zu bewerten, die funktional identisch sind aber
unterschiedliches Zeitverhalten besitzen, bedarf es einer statistischen Timing-Analyse.
Obwohl der enorme Rechenaufwand einer solchen statistischen Timing-Analyse ein
wohlbekanntes Problem ist, wurden bislang nur geringe Fortschritte bei der Entwick-
lung effizienter statistischer Timing-Analyse Algorithmen für die Berücksichtigung
von Prozessvariationen bei der statistischen Verzögerungstestmustererzeugung und
Verzögerungsfehlersimulation erzielt.
Die vorliegende Dissertation stellt neue und effiziente statistische Timing Analyse-
Algorithmen für die statistische Verzögerungstestmustererzeugung und Verzögerungs-
fehlersimulation unter Berücksichtigung großer Prozessvariationen vor. Für die Erken-
nung von Pfadverzögerungsfehlern wird eine neuartige probabilistische Sensibilisie-
runganalyse präsentiert, welche den Einfluss von Prozessvariationen auf die Sensi-
bilisierung der zu testenden Pfade analysiert. Ebenso wird ein effizientes Verfahren
zur Bestimmung der Erkennungswahrscheinlichkeit kleiner Verzögerungsfehler vorge-
stellt. Überdies hinaus wurden effizientere statistische SUM und MAX-Operationen
entwickelt, welche die fundamentale Grundlage der blockbasierten statistischen Timing-
Analyse bilden. Die experimentellen Ergebnisse zeigen die hohe Effizienz der vorge-
stellten Algorithmen.

1 Introduction
The increasing complexity of manufacturing processes makes today’s integrated circuits
more prone to defects such as resistive shorts and opens and also exacerbates the
problem of accurately predicting the timing properties of the manufactured circuits.
In order to ensure the quality of shipped products, delay testing must identify the
manufactured circuits which fail the timing requirements.
The uncertainty introduced by the manufacturing process, modelling errors and vari-
ability in the circuit’s operating conditions causes some delay tests to be effective in
only a subset of all manufactured circuits, which can result in a large number of test
escapes. Today’s delay test generation tools are incapable of analysing the detrimental
impact of process variations on the delay tests, which requires sophisticated statistical
timing analysis algorithms.
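To illustrate why a delay test may be effective in only a subset of the manufactured circuits, the following sketch estimates, by plain Monte Carlo sampling, the fraction of circuit instances in which a delay fault of fixed size on a single path is detected. It is an illustration only and not one of the algorithms developed in this work: the normally distributed fault-free path delay, the function name `detection_fraction` and all numerical values are assumptions made for the example.

```python
import random


def detection_fraction(nominal_delay, sigma, fault_size, clock_period,
                       samples=100_000, seed=0):
    """Fraction of sampled circuits in which the faulty path misses the clock edge."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(samples):
        # Assumption: the fault-free path delay is normally distributed.
        path_delay = rng.gauss(nominal_delay, sigma)
        # The fault is observed only if the slowed-down path exceeds the clock period.
        if path_delay + fault_size > clock_period:
            detected += 1
    return detected / samples


# Hypothetical numbers in ns: nominal path delay 0.8, fault size 0.25, clock period 1.0.
for sigma in (0.01, 0.05, 0.10):
    print(sigma, detection_fraction(0.8, sigma, 0.25, clock_period=1.0))
```

With these invented numbers, the same fault is detected in almost every instance when the delay spread is small, and the detected fraction drops as the spread grows.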
The focus of this work is the development of efficient statistical timing analysis algorithms
for variability-aware delay fault simulation and delay test generation.
This chapter briefly introduces the reader to the challenges of semiconductor manufac-
turing and the main concepts of delay testing. The following sections highlight three
fundamental problems of statistical timing analysis, which must be solved to analyse
and compare the effectiveness of delay tests considering uncertain gate and interconnect
delays. Finally, the organisation and contributions of this work are presented.
1.1 Trends and Challenges in Semiconductor Manufacturing
Over the past decades, the complexity of the semiconductor manufacturing process
has been increasing rapidly. Whereas in 2000 a 0.25 μm CMOS manufacturing process
involved 25 mask levels with around 30 lithography steps [Frans04, p.260, 356], a
state-of-the-art 18 nm process requires 65 mask levels [ITRS12] and over 1000 manufacturing
process steps [Goel13]. Where a gate electrode was once formed by a doped
polysilicon film, now a stack of up to seven precisely formed metal films is required to
achieve the desired threshold voltage, gate resistance and device stability [Schue13].
One of the key challenges of the manufacturing process is the availability of suitable
lithography solutions. Today’s state-of-the-art 193 nm immersion lithography systems
can only resolve features of approximately 40 nm, and extreme ultraviolet lithography
(13.5 nm) is still not available due to the limited power and reliability of the available
light sources. To continue scaling, reticle enhancement techniques such as optical
proximity correction (OPC) with predistorted mask patterns are used. In addition,
conventional single lithography exposure is supplanted by multi-patterning lithography
which involves several exposure, etch and deposition steps [Bench09, Ma11]. Multi-
patterning increases the number of design rule constraints, restricts cell placement and
routing and ultimately increases the process complexity significantly.
Each additional manufacturing step bears the risk of introducing a defect by some
contamination or impurity. Two of the most frequent types of physical defects are metal
mousebites and metal slivers.
With continued scaling, transistor feature dimensions are approaching fundamental
limits given by the atomic nature of matter. This not only leads to greater variability
of the electrical transistor parameters, but also makes the transistor performance more
sensitive to variations of these parameters. For example, the saturation current Ion
becomes more sensitive to variations of the effective channel length Leff, carrier mobility
and supply voltage Udd over technology generations [Zhao06].
These so-called process variations are widely known to play an increasingly important role
in future technology scaling [Kuhn10, Leung12b, Kang13, Wang13]. The 3σ variability
goals set by the International Technology Roadmap for Semiconductors (ITRS) are shown in
table 1.1. While the physical gate length is decreasing, the variability of the threshold
voltage (Uth) and consequently of the gate delay is expected to increase significantly. This delay
variability will cause increasingly large circuit performance variations and additional
complications for the circuit design sign-off.
Table 1.1: Variability limits set by the International Technology Roadmap for Semiconductors (ITRS) [ITRS12]

                                               2015   2018   2021   2024   2026
MPU physical gate length (nm)                  17     12.8   9.7    7.4    tbd
Udd variability                                10%    10%    10%    10%    10%
Uth variability                                23%    28%    30%    35%    38%
channel length variability                     12%    12%    12%    12%    12%
circuit performance variability                45%    50%    52%    57%    60%
contribution of variability to sign-off delay  18%    20%    26%    32%    38%
1.2 Delay Testing of VLSI circuits
The purpose of delay testing is to detect timing defects in the circuit and to check that the
circuit meets the specified timing requirements. Delay testing has become an essential
part of the VLSI design testing process, due to the advances in process technology and
the aggressive timing requirements of modern high speed designs. Delay testing is
performed by loading the circuit into automatic test equipment, applying a test vector-
pair to the circuit inputs and comparing the logic values observed after the clock cycle
time at the circuit outputs with the expected fault free values. If an unexpected value is
observed at any of the outputs, the circuit does not meet the timing requirements and
is sorted out. Otherwise, the next test vector-pair is applied.
A circuit which does not fail for any applied test vector-pair is not necessarily defect free
as it might fail under different operating conditions or when different test vector-pairs
are applied. This is because modern VLSI circuits can have an extremely large number
of paths and only a small subset can be tested by each test vector-pair.
The following subsections introduce the reader to important concepts of scan testing,
important delay fault models and variation-aware delay-testing.
1.2.1 Scan Delay Test
One of the most widely used methods for VLSI circuit testing is scan test [Eiche77].
In scan test, the state elements like flip-flops and latches are joined together into shift
registers called scan chains. During the normal operation of the circuit, the scan chain
is disabled. However, in scan mode, the scan chain is used to shift in the next test
vector while shifting out the combinational output response to the previous test vector
[Wang06]. In practice, multiple scan chains are often used in parallel to reduce the
number of shift cycles, as shown in fig. 1.1.
Figure 1.1: Delay testing using (enhanced) scan testing
The test vector-pair can be applied in three different ways: launch-on-capture [Savir94],
launch-on-shift [Savir93] and enhanced-scan [Dervi91]. In launch-on-capture, the first
test vector is shifted into the scan chain at a slow speed. The second test vector is
defined by the next state values, which are computed by the circuit one clock cycle after
the application of the first test vector. The test response is captured after the functional
at-speed clock cycle time.
In launch-on-shift, the first k − 1 bits of the first k-bit test vector are shifted into the scan
chain at slow speed. The second test vector is obtained by shifting the last bit into
the scan chain, which creates the desired input transitions. Again, the test response is
captured after the functional at-speed clock cycle time.
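The relation between the two vectors in launch-on-shift can be sketched with a simple list model of a single scan chain. This is only an illustration: the function names, the bit ordering (index 0 is the cell next to the scan input) and the example values are assumptions and are not taken from this work.

```python
def launch_on_shift_pair(chain_state, last_bit):
    """Return (v1, v2) for launch-on-shift on a single scan chain.

    chain_state: scan chain contents after the slow shift phase (first vector).
    last_bit:    bit applied at the scan input during the final shift.
    Assumption: index 0 is the cell next to the scan input, and one shift
    moves every bit one position towards the end of the list.
    """
    v1 = list(chain_state)
    v2 = [last_bit] + v1[:-1]   # the second vector is a one-bit shift of the first
    return v1, v2


def transitions(v1, v2):
    """Classify each scan cell output as rising, falling or stable."""
    kinds = {(0, 1): "rise", (1, 0): "fall"}
    return [kinds.get((a, b), "stable") for a, b in zip(v1, v2)]


v1, v2 = launch_on_shift_pair([1, 0, 0, 1], last_bit=0)
print(v2)                    # [0, 1, 0, 0]
print(transitions(v1, v2))   # ['fall', 'rise', 'stable', 'fall']
```

The sketch makes the main restriction of launch-on-shift visible: the second vector cannot be chosen freely, because it is fixed by the first vector and a single additional bit.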
Finally, in enhanced-scan testing, both vectors of the test vector-pair are shifted into
the scan chain. At first, the first test vector is shifted into the scan chain and it
is immediately applied to initialize the circuit. The outputs of the scan chain are
preserved while the second test vector is shifted into the scan chain. Once the last bit
of the second vector has been shifted into the scan chain, the scan chain outputs are
changed to the second test vector and the test response is captured after the functional
at-speed clock cycle time. For enhanced-scan, the flip-flops are required to store two
bits that can be applied consecutively to the combinational logic driven by the scan
cells.
After the application of a test vector-pair, the captured test response is shifted out of the
scan chain and compared to the expected fault free test response. If the test response
differs from the expected test response, then a fault has been detected.
In a full scan design, every state element in the circuit is part of a scan chain. In
this case, delay test algorithms can ignore the state elements and instead focus on the
combinational logic block between the state elements. This idea is illustrated by an
example. Figure 1.2a shows a logic circuit, where all state elements are arranged into
a scan chain. Because the value of these state elements can be directly observed and
controlled, the inputs of these state elements can be considered as pseudo-primary
outputs and the outputs of the state elements can be considered as pseudo-primary
inputs of the combinational logic block [Bushn00, p.135], which is shown in fig. 1.2b.
(a) Full scan design (b) Combinational equivalent
Figure 1.2: Conversion of full scan design to combinational equivalent circuit
It is important to observe that this so-called combinational equivalent circuit is
independent of the number and structure of the scan chains. Hence, by assuming a full
scan design, the algorithms discussed and presented in this thesis can ignore the state
elements and operate directly on the combinational equivalent circuit without loss of
generality.
A structural path of a circuit is an ordered list of gates $g_1, \dots, g_n$ such that the output of
gate $g_i$ is connected to an input of gate $g_{i+1}$ for $1 \le i < n$. Furthermore, an input of $g_1$
is connected to an input of the circuit and the output of $g_n$ is connected to an output
of the circuit. The term logical path refers to a structural path with either a rising or
a falling transition at the beginning of the path. This work is mostly concerned with
logical paths so that in the following, "path" is used to refer to a logical path.
Definition 1.1 (sensitized path). A logical path is sensitized by a test vector-pair if the
transition at the start of the path is propagated along the path to the end of the path.
Definition 1.2 (target path). A target path of a test vector-pair is a logical path that is
sensitized by the test vector-pair in the circuit with nominal gate delays (nominal circuit
instance). In this case, it is said that the test vector-pair targets the (logical) path.
1.2.2 Delay Fault Models
In this work, it is assumed that the delay of each gate depends on whether the transition
at the gate output is rising or falling and on the gate input from which the transition is
propagated to the gate output (pin-to-pin delay). The interconnect
delay model also distinguishes between rising and falling transitions. For simplicity,
the term "gate delay" will be used to denote the sum of pin-to-pin and interconnect
delays for a rising or a falling transition at the gate output.
The defects introduced during the manufacturing process are modelled at higher levels
of abstraction to alleviate the test generation complexity [Wang06]. A fault model
represents many physical defects with a single fault at a suitable abstraction level. Two
delay fault models are considered in this work: the gate delay fault model and the path
delay fault model.
The gate delay fault model assumes that the defect affects only the delay of a single
gate in the circuit [Iyeng88a, Iyeng88b]. The additional propagation delay of the gate
introduced by the defect is called the size of the delay fault. A gate delay fault whose
size is smaller than the clock cycle time is called a small delay fault; otherwise it is
called a gross delay fault.
The path delay fault model assumes that a circuit is faulty if any of its paths exceeds a
specified delay, which in most cases is the circuit’s functional clock cycle time [Smith85].
The delay of a path is the sum of the corresponding gate delay values along the path.
The path delay fault model is more comprehensive than the gate delay fault model as it
captures the combined effects of physical defects and process variations. Thus, a path
delay fault may be caused by process variations, by a defect or by a combination of
both. However, the path delay fault model may require much more test generation and
test application time because VLSI circuits can have a huge number of paths.
Many more specialized fault models exist. For example, the resistive bridging fault model
assumes that a defect (e.g. a metal sliver) connects two or more interconnects, which
should not be connected, with some specific resistance [Ingel11]. This connection may then
cause a gate delay fault [Hopsc10].
1.2.3 Variation-Aware Delay-Testing
The increasing manufacturing process variations can no longer be ignored in delay
testing and must be considered during delay test generation to achieve a sufficiently
high defect coverage [Becke10]. Under the impact of process variations, the detectability
of delay faults can be very different in different manufacturing samples. A given test
vector-pair might detect a particular delay fault of fixed size in one circuit sample, but
fail to detect the same fault in another circuit sample. In the latter, the fault might still
be detectable by a different test vector-pair or be provably undetectable.
Classical delay test quality measures fail to capture the detrimental impact of process
variations on the delay test and new quality measures have been proposed for variation-
aware delay testing, which are based on the probability of detecting each delay fault by
a given set of test vector-pairs. The efficient computation of a large number of these
probabilities requires very sophisticated statistical timing analysis algorithms. This
is not only because multiple test vector-pairs may be required to detect a particular
delay fault of fixed size with sufficiently high probability, but also because each test
vector-pair can detect multiple delay faults and the total number of test vector-pairs
must not exceed some given upper bound to limit test cost. Hence, statistical timing
analysis must compute these probabilities for a large number of alternative choices of
test vector-pairs to guide the delay test optimization process.
The optimization problem is illustrated by an example. Figure 1.3 visualizes the circuit
samples in which test vector-pairs ’A’, ’B’ and ’C’ can detect a particular delay fault
of fixed size. If the initial set of test vector-pairs {’A’} is extended by test vector-
pair ’B’, then the probability of detecting the delay fault increases significantly. By
adding another test vector-pair ’C’, this probability increases further. However, now
the contribution of the test vector-pair ’B’ to the detection of the delay fault becomes
negligible and ’B’ can be removed to limit test cost. Hence, a good solution to the given
optimization problem is the set of test vector-pairs {’A’, ’C’}.
Figure 1.3: Circuit samples in which the same delay fault of fixed size is detected
by test vector-pairs ’A’, ’B’ and ’C’
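The example can be reproduced with a small sketch in which each test vector-pair is described by the set of circuit samples in which it detects the fault. The sample sets, the function names and the simple removal heuristic are hypothetical and serve only as an illustration; in this toy data the contribution of ’B’ is exactly zero once ’C’ has been added.

```python
def detection_probability(tests, detects, n_samples):
    """Fraction of circuit samples in which at least one test detects the fault."""
    covered = set()
    for t in tests:
        covered |= detects[t]
    return len(covered) / n_samples


def drop_redundant(tests, detects, n_samples, tolerance=0.0):
    """Remove tests whose removal lowers the detection probability by at most `tolerance`."""
    kept = list(tests)
    for t in list(kept):
        rest = [u for u in kept if u != t]
        loss = (detection_probability(kept, detects, n_samples)
                - detection_probability(rest, detects, n_samples))
        if rest and loss <= tolerance:
            kept = rest
    return kept


# Hypothetical data: 10 circuit samples, numbered 0..9.
detects = {"A": {0, 1, 2, 3}, "B": {3, 4, 5}, "C": {3, 4, 5, 6, 7}}
n = 10
print(detection_probability(["A"], detects, n))            # 0.4
print(detection_probability(["A", "B"], detects, n))       # 0.6
print(detection_probability(["A", "B", "C"], detects, n))  # 0.8
print(drop_redundant(["A", "B", "C"], detects, n))         # ['A', 'C']
```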
1.3 Statistical Timing Analysis for Path Delay Fault Testing
Path delay fault tests are typically applied to a selected set of near critical paths to
characterize the speed of a circuit after manufacturing. These tests can also be repeated
at multiple different clock frequencies for speed-binning.
1.3.1 Transition Propagation Condition
A physical gate can only propagate a subset of all transitions at the gate inputs to the
gate output. Whether or not a particular input transition is propagated depends on
many factors including the gate input and output voltages at the time of the input
transition and the capacitive load at the gate output. One important condition for an
input transition to be propagated is that the logic function of the gate must imply a
change in the gate output logic value in response to the input transition. For example,
a NAND gate with two inputs will only propagate a transition at one of its inputs if
the other input has logic value ’1’.
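The logic part of this condition can be written down compactly using the controlling value of each gate type. The following lines are a minimal sketch that covers only the logic function and none of the electrical conditions; the table and the function name are assumptions for illustration.

```python
# Controlling input value of common gate types: a single input at this
# value determines the gate output regardless of the other inputs.
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def propagates(gate_type, side_inputs):
    """True if a transition at one gate input changes the output logic value,
    given stable logic values at all other (side) inputs."""
    c = CONTROLLING[gate_type]
    return all(v != c for v in side_inputs)


print(propagates("NAND", [1]))  # True: the other input is non-controlling
print(propagates("NAND", [0]))  # False: the other input holds the output at '1'
```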
Another important condition applies to glitches, which are electrical pulses of short
duration. During the application of a test vector-pair, one or multiple glitches may
appear at a gate output before the output settles to the correct logic value. If a gate
output transition is part of a glitch, then this glitch must be sufficiently long so that the
gate is able to fully charge and fully discharge the capacitive load at the gate output.
For example, fig. 1.4 depicts the waveforms of the input and output voltages of a
NAND-gate with two inputs. It shows that if the falling transition at input ’a’ (blue)
occurs slightly earlier and the rising transition at input ’b’ (orange) is slightly delayed,
then the glitch at the gate output disappears.
(a) glitch occurs (b) glitch is filtered
[Plots of the gate input and output voltages over time, 0 to 200 ps]
Figure 1.4: SPICE simulation results of a 2-input NAND gate with a falling
transition at input ’a’, a rising transition at input ’b’ and a glitch at output ’y’
1.3.2 Invalidation of Path Delay Fault Tests
During the operation of the circuit, the transitions at the inputs of a gate can interact in
complex ways depending on the order and the time intervals between the transitions
[Konuk00]. As a consequence of the gate delay variability, the propagation of a
transition may become blocked by another transition or redirected along a different
path. In other words, a path which is sensitized by a test vector-pair in the circuit with
nominal gate delays might not be sensitized under the impact of process variations.
Definition 1.3. A path delay fault test is called invalidated, if the delay test fails to
sensitize the path it is supposed to test.
Path delay fault test invalidation causes two major problems for the delay test:
• A test vector-pair which sensitizes a particular path in the circuit with nominal
gate delays (nominal circuit instance) might not sensitize the same path under
the impact of process variations, leading to test escapes.
• A path which cannot be sensitized during the normal operation of the circuit
is not required to meet the timing specification of the circuit. However, such a
path might become sensitized by some test vector-pair due to delay variations,
resulting in yield loss [Cheng93].
Invalidation of a path delay fault test can occur if no other path delay fault exists in the
circuit and an off-path input stabilizes to the non-controlling value after the transition
at the on-path input [Konuk00]. If the inertial delay property of physical gates is also
considered, then test invalidation is possible even if the off-path transition occurs before
the transition at the on-path input [Devad92]. In the following, one of the invalidation
mechanisms of a path delay fault test is illustrated by an example.
The circuit in fig. 1.5 contains a small delay fault in the last NAND gate which is
activated by a falling transition at ’c’ that causes a rising transition at ’d’. The path
delay fault test of path a-b-c-d creates a falling transition at ’c’ and a rising transition at
’e’, resulting in a glitch at the output ’d’ of the last NAND gate. The faulty logic value
’0’ caused by the glitch can be detected during the delay test.
It is assumed that the clock cycle time is 1.75 ns. Due to the small delay fault, the
path a-b-c-d has a nominal delay of 1.85 ns and therefore a path delay fault. All other
paths have delays less than 1.75 ns. In those manufactured circuits in which the falling
transition at ’c’ occurs during the application of the test vector-pair while the off-path
input ’e’ still has the logic value ’0’, the gate output ’d’ remains constant at ’1’, so that
the path delay fault test is invalidated. The path delay fault can be detected only in those
manufactured circuits in which the off-path input has already stabilized to the
non-controlling value when the transition at ’c’ occurs.
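The fraction of circuits in which this invalidation mechanism occurs can be estimated with a simple Monte-Carlo experiment. The following sketch assumes independent, normally distributed arrival times at ’c’ and ’e’ and ignores the inertial delay of the gate; the function name and all numerical values are hypothetical.

```python
import random


def invalidation_probability(mu_on, mu_off, sigma, n_samples=20000, seed=7):
    """Estimate the fraction of circuit samples in which the off-path input 'e'
    stabilizes only after the on-path transition at 'c' has occurred."""
    rng = random.Random(seed)
    invalid = 0
    for _ in range(n_samples):
        t_on = rng.gauss(mu_on, sigma)    # arrival time of the falling transition at 'c'
        t_off = rng.gauss(mu_off, sigma)  # time at which 'e' stabilizes to '1'
        if t_off >= t_on:
            invalid += 1
    return invalid / n_samples


# Hypothetical arrival times in ns: 'c' switches nominally 50 ps after 'e'.
print(invalidation_probability(mu_on=1.25, mu_off=1.20, sigma=0.05))
```

With equal nominal arrival times the estimate is close to 0.5, and it approaches zero as the nominal gap between the two arrival times grows relative to the delay variability.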
It is important to note that even if the off-path input ’e’ has stabilized to ’1’ well before
the falling transition at the on-path input ’c’, the glitch might not appear at the gate
output ’d’, as explained in section 1.3.1. As shown in fig. 1.4, if the gate output voltage
has not dropped below 50% by the time the falling transition appears at the on-path
input ’c’, then the gate output voltage will rise immediately and no glitch will occur at
the gate output.
Figure 1.5: Invalidation of a path delay fault test for the orange path a-b-c-d, which
has a path delay fault [Konuk00]
The reader should note that the small delay fault in fig. 1.5 is not necessary for the path
a-b-c-d to have a path delay fault, because process variability alone can cause the delay
of each gate (pin-to-pin delay) along the path a-b-c-d to increase so that the path delay
exceeds the clock cycle time. This slightly modified example shows that path delay
fault test invalidation can occur regardless of any small delay fault in the circuit.
1.3.3 Probabilistic Sensitization Analysis
It is well known that many different path delay fault tests often exist for a sensitizable
path [Cheng94, Egger11]. To reduce the number of test escapes due to path delay
fault test invalidation, the tolerance of different path delay fault tests towards delay
variations must be evaluated to identify those tests with a sufficiently small probability
of being invalidated. In particular, statistical timing analysis is required to solve the
following problems:
Detection of test invalidation mechanisms: Given sufficiently detailed knowledge of
the test invalidation mechanism, such as the location of the affected target path off-
path input, the test generation process can carefully optimize the test vector-pair
to minimize the risk of test invalidation.
Computation of test invalidation probability: Under the impact of large process vari-
ations, all possible path delay fault tests of a target path may be affected by path
delay fault test invalidation. In this case, it is necessary to select those path delay
fault tests with a sufficiently low probability of test invalidation or to apply a
suitable combination of path delay fault tests to the target path.
The underlying fundamental statistical timing analysis problem is stated in section 4.1.
1.4 Statistical Timing Analysis for Small Delay Fault Testing
Recent studies show that a large fraction of delay faults in the latest technology
nodes are caused by small-delay faults [Matti09, Ahmed06], which are also becoming
increasingly difficult to detect [Ryan14]. While a small-delay fault of size 20 ps would be
unlikely to affect the circuit operation at 100 MHz (0.2% of the clock cycle time), the same
fault is much more likely to cause a timing failure at 5 GHz (10% of the clock cycle time).
Furthermore, certain types of small-delay faults can grow over time [Segur02] and
therefore pose a reliability risk. Hence, delay testing for small-delay faults is extremely
important for the quality and reliability of CMOS circuits.
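The percentages quoted above follow directly from the clock period. A minimal sketch of the arithmetic (the function name is an assumption):

```python
def fault_fraction_of_cycle(fault_size_ps, clock_hz):
    """Size of a delay fault relative to the clock cycle time."""
    cycle_ps = 1e12 / clock_hz   # clock cycle time in picoseconds
    return fault_size_ps / cycle_ps


print(f"{fault_fraction_of_cycle(20, 100e6):.1%}")  # 0.2%  (cycle time 10 ns)
print(f"{fault_fraction_of_cycle(20, 5e9):.1%}")    # 10.0% (cycle time 200 ps)
```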
1.4.1 Testing for Small Delay Faults
For a gate delay fault to be detectable, the fault size must be sufficiently large to cause
at least one path through the small delay fault site to have a path delay fault. The
smallest detectable delay fault size is equal to the slack of the longest sensitizable path
through the delay fault site. The slack of a path is defined as the difference between
the clock cycle time and the delay of the path.
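As a small worked example of these definitions, the slack and the smallest detectable fault size can be computed from the delays of the sensitizable paths through the fault site. The path delays and the clock cycle time below are hypothetical values in picoseconds, and the function names are assumptions.

```python
def slack(path_delay, t_clk):
    """Slack of a path: clock cycle time minus path delay."""
    return t_clk - path_delay


def smallest_detectable_fault(path_delays, t_clk):
    """Slack of the longest sensitizable path through the fault site.

    path_delays: delays of all sensitizable paths through the fault site.
    """
    return slack(max(path_delays), t_clk)


# Hypothetical example: a short and a long path through the same fault site.
delays_ps = [1200, 1600]
t_clk_ps = 1750
print(slack(1200, t_clk_ps))                           # 550
print(smallest_detectable_fault(delays_ps, t_clk_ps))  # 150
```

In this example, a fault of size 200 ps would escape a test along the short path (slack 550 ps) but would be detected along the long path (slack 150 ps).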
The necessity of testing for small delay faults by exercising the longest path has been
demonstrated in high volume production manufacturing test [Daasc07]. For example,
in fig. 1.6, a small delay fault at line ’e’ may be missed by testing the path a-e-f-h-m
because the delay of the path is too small [Goel13]. The small delay fault can be detected
by testing the path a-e-g-j-k-m, which has a much larger delay.
Figure 1.6: Detection of small delay fault at gate output ’e’ over the longest
sensitizable path (orange) through the fault site [Goel13]
The impact of certain types of faults depends on many circuit parameters, which are
unknown due to process variations [Erb14], but also on the operating conditions of
the circuit. Repeating the delay test under several different operating conditions can
aid the detection of small delay faults. The major advantages and disadvantages of
adjusting the supply voltage, the temperature and the clock frequency are summarized
in the following.
Testing at very low voltage was shown to significantly increase the effect of certain de-
fects that cause small delay faults, such as resistive shorts [Hao93, Chang96, Chang98].
However, this technique may increase the test time due to the reduced switching speed
of the transistors in the scan chain.
Testing at very low temperatures (e.g. 0°C) was also shown to be effective to detect small
delay faults caused by resistive salicide [Needh98, Tseng00]. However, this method
presents technological challenges and might require more expensive circuit cooling
solutions, especially for state-of-the-art high performance circuits.
Delay test at higher than operational clock frequencies (faster-than-at-speed test) reduces
the slack of all paths and therefore promotes the detection of small delay faults of
smaller size [Kruse04, Lee05a]. However, this approach may increase the risk of yield
loss due to hazard capture and it also exacerbates the already-well-known issues of
peak power and IR-drop during manufacturing test.
1.4.2 Invalidation and Optimization of Small Delay Fault Tests
Under the impact of process variations, a different path can be the longest path through
the small delay fault site in different circuits and under different operating conditions.
It is therefore widely accepted that conventional delay tests for small delay faults must
be extended to account for the detrimental impact of process variations on the delay
test [Ingel09, Becke10, Polia11].
Even if a small delay fault is undetectable under some operating conditions, the fault
could significantly reduce the timing margins and cause timing failures under different
operating conditions. Testing all sensitizable paths through the small delay fault site
under all admissible operating conditions would guarantee the detection of the small
delay fault. However, this kind of exhaustive testing is impractical since, depending on
the circuit structure, the number of paths can increase exponentially with the circuit
size. Furthermore, there may be an infinite number of admissible operating conditions.
In order to find a suitable compromise between the delay test quality and the test cost, it
is necessary to optimize the set of test vector-pairs [Becke10, Czutr12, Sauer14] and all
delay test parameters [Needh98, Fonse10]. The set of delay test parameters includes the
supply voltage [Chang98], the temperature [Tseng00], the clock cycle time [Chakr12]
and possibly the masking of the combinational network outputs for faster-than-at-speed
testing [Lee05a]. The choice of the burn-in time also has a significant impact on the
detectability of small delay faults [Sumik12].
The delay test optimization procedure is illustrated in fig. 1.7. The process starts by
adjusting the delay test parameters and the set of test vector-pairs to improve the delay
test quality and the test cost. Afterwards, the resulting change of the delay test quality
and the test cost is evaluated to obtain the new value of the objective function, which is
to be maximized. Based on the result, further adjustments to the set of test vector-pairs
and the operating conditions are made and this process continues until a suitable
compromise has been found.
Figure 1.7: Small delay fault test optimization
It is apparent that the computational cost for solving this multidimensional optimization
problem strongly depends on the objective function, which involves the probability
that at least one of the target paths has a path delay fault. The stepwise optimization
process involves a huge number of probability evaluations and therefore requires very
sophisticated statistical timing analysis methods.
1.4.3 Computation of Target Paths Delay Fault Probability
One of the fundamental problems for the optimization of small delay fault tests is
the efficient computation of the probability that at least one of the target paths (see
definition 1.2) has a path delay fault. Formally, the fundamental statistical timing
analysis problem is described by the following two definitions.
Definition 1.4. Given a set of $k$ test vector-pairs, let $X_{i,1}, \dots, X_{i,n_i}$ denote the delays
of the $n_i$ target paths of the $i$-th test vector-pair. Then the maximum delay of the target
paths $Z$ is defined as
$$Z = \max(Y_1, \dots, Y_k), \tag{1.1}$$
where $Y_i = \max(X_{i,1}, \dots, X_{i,n_i})$ for all $1 \le i \le k$.
The above definition is not only applicable to a small delay fault of fixed size, but it can
also capture the distribution of the delay fault size as part of the delay distributions of
those target paths which contain the faulty gate.
Definition 1.5 (target paths delay fault probability). Let $Z$ denote the maximum delay
of the target paths of a given set of test vector-pairs. Then the target paths delay fault
probability $Y$ is defined as
$$Y = P(Z > T_{\mathrm{clk}}), \tag{1.2}$$
where $T_{\mathrm{clk}}$ denotes the clock cycle time.
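To make the two definitions concrete, the target paths delay fault probability can be estimated by plain Monte-Carlo sampling. The following sketch is a naive baseline and not one of the algorithms developed in this work. It assumes a hypothetical delay model in which every path delay is the sum of its nominal value, a global normal variation shared by all paths of a circuit sample, and an independent local normal variation.

```python
import random


def estimate_fault_probability(tests, t_clk, sigma_global, sigma_local,
                               n_samples=20000, seed=1):
    """Monte-Carlo estimate of P(Z > T_clk).

    tests: one list of nominal target path delays per test vector-pair.
    Assumed delay model (hypothetical): X = nominal + global + local variation.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        g = rng.gauss(0.0, sigma_global)  # shared by all paths of this circuit sample
        # Y_i: maximum delay of the target paths of the i-th test vector-pair
        y = [max(mu + g + rng.gauss(0.0, sigma_local) for mu in paths)
             for paths in tests]
        z = max(y)                        # Z = max(Y_1, ..., Y_k)
        if z > t_clk:
            failures += 1
    return failures / n_samples


# Hypothetical nominal path delays in ns for k = 2 test vector-pairs.
tests = [[1.60, 1.70], [1.50]]
print(estimate_fault_probability(tests, t_clk=1.75,
                                 sigma_global=0.05, sigma_local=0.03))
```

The shared term makes the path delays correlated, which is the reason why the probabilities of the individual paths cannot simply be multiplied.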
The main difficulty for computing the target paths delay fault probability is caused
by the relationship between the random variables. To express these relationships, the
individual path delays must be grouped into a random vector, which is described by
a joint probability distribution (e.g. a multivariate normal distribution). Computing
the target paths delay fault probability requires numerical integration over the joint
probability distribution, which is computationally very expensive in general.
1.5 Variation-Aware Delay Fault Simulation
The goal of the variation-aware delay fault simulation is to evaluate the delay test
response of a circuit with any of a list of delay faults during the application of a set
of test vector-pairs [Radha11]. By comparing the responses to the specification of the
circuit, the probability of a circuit passing/failing the delay test is determined. The
fault simulation also provides vital information for the prediction of important test
metrics such as defective parts per million (DPPM) and yield loss [Qian10].
An easy way of conducting the fault simulation is by injecting a delay fault into the
circuit followed by a Monte-Carlo simulation with each test vector-pair to evaluate the
delay test response under the impact of process variations. However, since there are
many small delay faults and the fault simulation has to be repeated for different circuit
operating conditions, a Monte-Carlo simulation based approach is inefficient or even
infeasible.
A more suitable approach is based on block-based statistical timing analysis, which
is described in section 1.5.1. The problem of efficiently computing the fundamental
statistical operations is explained in section 1.5.2.
1.5.1 Block-Based Statistical Timing Analysis
In block-based statistical timing analysis, the delay values of a gate and the arrival
time of a transition are represented by random variables of some known probability
distribution. A marginally detectable small delay fault can be modelled by increasing
the mean of one of multiple delay values of a gate by a fixed fault size.
During the simulation of a test vector-pair, the transitions at the circuit inputs are
propagated along the sensitized paths to the circuit outputs [Lee05b]. At each gate, the
analysis computes the distribution of the arrival time of the last transition at the gate
output. For a gate with one input, the gate delay is added to the arrival time of the
transition at the gate input to obtain the arrival time of the transition at the gate output.
This is done using the statistical SUM-operation as shown in fig. 1.8.
For a gate with two inputs, the steady state logic value of the gate output is known from
the second test vector. If the output is switching because a gate input is switching to
the non-controlling value, then the maximum of the sum of arrival times and pin-to-pin
delays of all gate inputs that switch to the non-controlling value must be computed.
For example, let U1 and U2 denote the arrival times of two rising transitions at a
NAND gate, as shown in fig. 1.9. The arrival time of the transition at the gate output is
max(U1 + D1,U2 + D2), where D1 and D2 denote the gate delays for propagating the
transition at the first and the second gate input, respectively, to the gate output.
Figure 1.8: Statistical SUM-operation
On the other hand, if the gate output is switching because any gate input is switching
to a controlling value, then the earliest arrival time of those gate input transitions at the
gate output must be computed. For example, if both rising transitions at the gate inputs
in fig. 1.9 are replaced by falling transitions, then the arrival time of the transition at
the gate output is min(U1 + D1,U2 + D2). The problem of computing the minimum
of two random variables X1 and X2 can always be transformed into the problem of
computing the maximum using the identity
$$\min(X_1, X_2) = -\max(-X_1, -X_2). \tag{1.3}$$
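For a single circuit sample with fixed delay values, the two cases can be sketched as follows. The function name and the numbers are assumptions for illustration; in block-based statistical timing analysis the same operations are applied to random variables instead of numbers.

```python
def output_arrival(arrivals, delays, to_controlling):
    """Arrival time of the gate output transition for one circuit sample.

    arrivals: arrival times of the switching gate inputs.
    delays:   pin-to-pin gate delays of these inputs.
    to_controlling: True if the inputs switch to the controlling value.
    """
    candidates = [u + d for u, d in zip(arrivals, delays)]  # SUM-operation
    if to_controlling:
        # earliest transition wins; minimum expressed via the maximum, see (1.3)
        return -max(-c for c in candidates)
    return max(candidates)                                   # MAX-operation


# Hypothetical values in ps for a 2-input NAND gate.
print(output_arrival([100, 120], [30, 20], to_controlling=False))  # 140 (rising inputs)
print(output_arrival([100, 120], [30, 20], to_controlling=True))   # 130 (falling inputs)
```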
It is apparent that the accuracy and the computational cost of block-based statistical
timing analysis are determined by the accuracy and the efficiency of the statistical SUM
and MAX-operations.
Figure 1.9: Statistical MAX-operation
1.5.2 Efficient Computation of Statistical SUM and MAX-Operations
The main problem of block-based statistical timing analysis is the limited accuracy of
the SUM and the inaccuracy of the MAX-operation.
The SUM-operation is usually easy to compute efficiently. Many families of multivariate
distributions, such as the family of multivariate normal distributions, are closed under
affine transformations. Therefore, the exact distribution of the sum of random variables
can usually be computed using closed form formulas.
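For two jointly normally distributed operands, for instance, the sum is again normally distributed with mean μ1 + μ2 and variance σ1² + σ2² + 2ρσ1σ2, where ρ denotes the correlation coefficient. A minimal sketch (the function name and the example values are assumptions):

```python
import math


def sum_normal(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Mean and standard deviation of X1 + X2 for jointly normal X1, X2
    with correlation coefficient rho."""
    mean = mu1 + mu2
    var = sigma1**2 + sigma2**2 + 2.0 * rho * sigma1 * sigma2
    return mean, math.sqrt(max(var, 0.0))


# Hypothetical arrival time (100 ps, sigma 6 ps) plus gate delay (30 ps, sigma 8 ps).
print(sum_normal(100.0, 6.0, 30.0, 8.0))           # (130.0, 10.0)
print(sum_normal(100.0, 6.0, 30.0, 8.0, rho=1.0))  # (130.0, 14.0)
```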
The efficient computation of the MAX-operation is, on the other hand, one of the
most challenging problems of statistical timing analysis. Only very few exotic families
of distributions are closed under maximum [Balas91] and these are of little practical
value for statistical timing analysis. In other words, for the vast majority of probability
distributions, the distribution of the maximum max(X1,X2) is not a member of the
same family of distributions as the distributions of X1 and X2. For example, if X1 and
X2 have a normal distribution then it is well known that the distribution of max(X1,X2)
belongs to a much more complicated family of distributions [Nadar08]. To preserve
the efficiency of block-based statistical timing analysis, it is necessary to approximate
the distribution of the maximum max(X1,X2) with a member of the same family of
distributions to which the distributions of the operands X1 and X2 belong.
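One widely used way to perform such an approximation for normally distributed operands is moment matching with the formulas commonly attributed to Clark (1961), in which the maximum is replaced by a normal distribution with the same mean and variance. The sketch below only illustrates this idea and is not the method proposed in this work; the function names are hypothetical.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Mean and standard deviation of the normal distribution that matches
    the first two moments of max(X1, X2) for jointly normal X1, X2."""
    theta = math.sqrt(sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2)
    if theta == 0.0:
        # X1 - X2 is constant, so the maximum is the operand with the larger mean
        return (mu1, sigma1) if mu1 >= mu2 else (mu2, sigma2)
    alpha = (mu1 - mu2) / theta
    m1 = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + theta * _pdf(alpha)
    m2 = ((mu1**2 + sigma1**2) * _cdf(alpha)
          + (mu2**2 + sigma2**2) * _cdf(-alpha)
          + (mu1 + mu2) * theta * _pdf(alpha))
    return m1, math.sqrt(max(m2 - m1 * m1, 0.0))


# Two independent standard normal operands: the exact mean of the maximum
# is 1/sqrt(pi) and the exact variance is 1 - 1/pi.
mean, std = clark_max(0.0, 1.0, 0.0, 1.0)
print(mean, std)
```

The matched normal distribution reproduces the mean and the variance of the maximum, but not its skewness, which is one source of the inaccuracy of the MAX-operation mentioned above.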
Any approach which can efficiently compute the statistical SUM and the statistical MAX-
operation can immediately be applied to many other statistical timing analysis problems.
For example, the SUM-operation can be used to compute the delay distribution of the
sensitized paths and the MAX-operation can be applied to compute the distribution of
the maximum delay of the target paths, as defined in definition 1.4.
1.6 Organisation and Contributions of this Work
This work targets key statistical timing analysis problems affecting delay test applica-
tions in innovative technology nodes.
Chapter 2 introduces the reader to the sources and classification of variability as well
as to the formal foundation on which subsequent chapters build. This includes
definitions of essential terms and concepts in probability theory and circuit modelling.
Chapter 3 provides a concise summary of the state of the art in statistical timing
analysis for the evaluation of path and small delay fault tests. Recent advances in
the computation of the statistical MAX-operation are reviewed. The scalability and
limitations of existing methods are discussed in the context of variation-aware delay
fault testing.
Chapter 4 presents a novel probabilistic sensitization analysis, which detects for a
given test vector-pair the most likely path delay fault test invalidation mechanisms and
identifies the paths that are responsible for the invalidation. The information provided
by the analysis allows the delay test generation to specifically target the location and
mechanism of path delay fault test invalidation. The proposed method is applied to all
critical paths (definition 2.17) that are sensitizable by a given test vector-pair under the
impact of delay variations. The result of the analysis is a subcircuit that gives the same
delay test response to the application of the test vector-pair as the complete circuit with
almost 100% probability, while requiring only a fraction of the computational cost for a
Monte-Carlo simulation.
Chapter 5 presents novel algorithms, which compute and incrementally update an
approximation of the target paths delay fault probability after delay test parameter
modifications. In particular, the insertion/removal of a test vector-pair, the mask-
ing/unmasking of primary outputs and changes of test clock cycle time Tclk are
addressed. The accuracy of the results does not depend on the type of the modified
delay test parameters or on the order in which they are modified. Furthermore, the new algorithms
only require that the joint delay distribution of sufficiently long target paths can be
accurately approximated by a multivariate normal distribution. In other words, the
algorithms are not restricted to normally distributed gate and interconnect delays.
Chapter 6 introduces the novel skew-normal distribution based SUM and MAX-
operations, which significantly reduce the approximation error compared to the normal
distribution based SUM and MAX-operations. Several optimizations are presented,
which make the new operations applicable to a large number of random variables.
Chapter 7 presents the experimental results for several large benchmark circuits.
Chapter 8 summarizes the findings of this work and points out starting points for
further research.
Chapter 2
Fundamentals of Statistical Timing Analysis
Essential facts from linear algebra are reviewed in section 2.1. The major sources of
variations are introduced in section 2.2. In order to create a compact gate delay model
considering many sources of variability, it is beneficial to distinguish between different
classes of process variations, which are explained in section 2.3. The variations in
gate and interconnect delays are formally described in section 2.4 by random variables
with known probability distributions. This section also reviews important concepts
of probability theory, which is an integral component of all statistical timing analysis
algorithms. The circuit model, which will be used throughout this work, is defined in
section 2.5.
2.1 Review of Linear Algebra
This section briefly reviews important facts from linear algebra, which are essential
for the understanding of this thesis. Unless explicitly cited otherwise, the following
definitions and facts can be found in [Golub13].
The indicator function is
\[ \mathbf{1}_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \tag{2.1} \]
The eigenvalues of a real valued matrix $A \in \mathbb{R}^{n \times n}$ are the roots of the characteristic
polynomial
\[ p(x) = \det(A - x I_n), \tag{2.2} \]
where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix. Hence, $A$ has exactly $n$ not necessarily distinct
eigenvalues. If $A$ is symmetric, then all its eigenvalues are real.
If $H \in \mathbb{R}^{n \times n}$ is non-singular and $B = H^{-1} A H$, then $A$ and $B$ are called similar. If two
matrices are similar, they have exactly the same eigenvalues.
If $\lambda$ is an eigenvalue of $A$, then there exists a nonzero vector $x$ so that $Ax = \lambda x$. In this
case, $x$ is said to be an eigenvector of $A$ associated with the eigenvalue $\lambda$. Throughout
this work, it is assumed that the eigenvectors are normalized to unit magnitude.
The greatest eigenvalue of $A$ is called the dominant eigenvalue and the corresponding
eigenvector is called the dominant eigenvector.
Closed form formulas for the computation of the eigenvalues exist for $2 \le n \le 4$
and can be used to efficiently compute the eigenvectors [Kopp08]. However, several
numerical issues due to floating point round-off errors must be solved to obtain a
numerically stable algorithm. One of the main problems is the exact computation of
the rank of $A - \lambda I_n$. For higher dimensions, the eigenvalues and the eigenvectors can
be computed using iterative methods.
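As an illustration of such an iterative method, the following minimal sketch uses power iteration to approximate the dominant eigenvalue and a unit-length dominant eigenvector. It is not taken from this work: the function names `mat_vec` and `power_iteration`, the start vector and the 2x2 example matrix are assumptions chosen for illustration. Power iteration converges to the eigenvalue of largest magnitude, which coincides with the greatest eigenvalue for symmetric positive semidefinite matrices, and it requires that the start vector is not orthogonal to the dominant eigenvector.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_iteration(matrix, iterations=200):
    """Approximate the dominant eigenvalue and a unit-length eigenvector.

    Assumption: the matrix is symmetric positive semidefinite and the
    start vector (first unit vector) is not orthogonal to the dominant
    eigenvector.
    """
    n = len(matrix)
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        y = mat_vec(matrix, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]          # normalize to unit magnitude
    # Rayleigh quotient x^T A x for the unit vector x
    eigenvalue = sum(xi * yi for xi, yi in zip(x, mat_vec(matrix, x)))
    return eigenvalue, x


# Hypothetical example matrix with eigenvalues 3 and 1.
eigenvalue, eigenvector = power_iteration([[2.0, 1.0], [1.0, 2.0]])
print(eigenvalue, eigenvector)
```

The number of iterations is fixed here for simplicity; a practical implementation would stop once the change between successive iterates falls below a tolerance.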
Definition 2.1. A matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if $x^T A x > 0$ for all non-zero
$x \in \mathbb{R}^n$ and positive semidefinite if $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$.
For example, the matrix
\[ B = \begin{pmatrix} 1 & -2 & 3 \\ -2 & 13 & -3 \\ 3 & -3 & 19 \end{pmatrix} \tag{2.3} \]
is positive definite, because for any vector $x = (x_1, x_2, x_3)^T \in \mathbb{R}^3$
\begin{align*}
x^T B x &= x_1(x_1 - 2x_2 + 3x_3) + x_2(-2x_1 + 13x_2 - 3x_3) + x_3(3x_1 - 3x_2 + 19x_3) \\
&= x_1^2 - 4x_1x_2 + 13x_2^2 + 6x_1x_3 - 6x_2x_3 + 19x_3^2 \\
&= 6x_2^2 + 7x_3^2 + (x_1 - 2x_2 + 3x_3)^2 + 3(x_2 + x_3)^2
\end{align*}
is non-negative and zero only for $x_1 = x_2 = x_3 = 0$. Clearly, any positive definite
matrix is also positive semidefinite. It is also obvious that a positive definite matrix $A$
is non-singular. Otherwise, there would be a non-zero vector $x$ with $Ax = 0$ and therefore $x^T A x = 0$.
Lemma 2.2. If $A \in \mathbb{R}^{n \times n}$ is positive semidefinite, then all its eigenvalues are non-negative.
Proof. Let $x \in \mathbb{R}^n$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, such
that $Ax = \lambda x$. Left multiplication with $x^T$ gives $x^T A x = \lambda x^T x$, where $x^T x > 0$ and
$x^T A x \ge 0$ because $A$ is positive semidefinite. Therefore, $\lambda \ge 0$.
Lemma 2.3. If $A \in \mathbb{R}^{n \times n}$ is a positive definite matrix and $v \in \mathbb{R}^n$, then $A + vv^T$ is positive
definite.
Proof. It must be shown that $x^T A x + x^T v v^T x > 0$ for all non-zero $x \in \mathbb{R}^n$. Clearly,
$x^T v v^T x = (x^T v)(x^T v)^T \ge 0$ and $x^T A x > 0$ holds because $A$ is positive definite.
Theorem 2.4. If $A \in \mathbb{R}^{n \times n}$ is a positive definite matrix and the matrix $G \in \mathbb{R}^{n \times k}$ has rank $k$,
then $B = G^T A G \in \mathbb{R}^{k \times k}$ is also positive definite.
The proof can be found in [Golub13, Theorem 4.2.1]. One of the important implications
of this theorem is that if $A \in \mathbb{R}^{n \times n}$ is positive definite, then $A^{-T} A A^{-1} = A^{-T}$ is
positive definite. In particular, if $A$ is symmetric positive definite then $A^{-1}$ is also
symmetric positive definite.
The following corollary shows that deleting the $i$th row and column of a symmetric
positive definite matrix preserves positive definiteness, which will be used in section 6.4.
Corollary 2.5. If A is positive definite, then all its principal submatrices are positive definite.
In particular, all the diagonal entries are positive.
The proof can be found in [Golub13, Corollary 4.2.2].
Theorem 2.6 (Cholesky Factorization). If $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite, then
there exists a unique lower triangular $L \in \mathbb{R}^{n \times n}$ with positive diagonal entries such that
$A = LL^T$.
The proof can be found in [Golub13, Theorem 4.2.7]. This factorization is called Cholesky
factorization (or Cholesky decomposition) and the matrix L is called the (lower) Cholesky
factor.
Algorithm 2.1 is one of several possible algorithms for computing the lower Cholesky
factor $L$. The computation of the Cholesky factor using this algorithm requires $n^3/3$
floating point operations.
Algorithm 2.1: Gaxpy Cholesky [Golub13, Algorithm 4.2.1]
input: $n \times n$ symmetric positive definite matrix $A$
output: for all $i \ge j$, $L_{i,j}$ overwrites $A_{i,j}$
for j = 1 to n do
    if j > 1 then
        $A_{j:n,j} = A_{j:n,j} - A_{j:n,1:j-1} \cdot A_{j,1:j-1}^T$
    $A_{j:n,j} = A_{j:n,j} / \sqrt{A_{j,j}}$
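Again considering the example matrix $B$ given in eq. (2.3), the Cholesky factorization
of matrix $B$ is
\[ \begin{pmatrix} 1 & -2 & 3 \\ -2 & 13 & -3 \\ 3 & -3 & 19 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 3 & 0 \\ 3 & 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & -2 & 3 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} \tag{2.4} \]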
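The column-by-column update of Algorithm 2.1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in this work: the function name `gaxpy_cholesky` is an assumption, the input is copied instead of being overwritten, and a non-positive pivot is reported as an error. The example reproduces the factor of the matrix $B$ from eq. (2.3).

```python
import math


def gaxpy_cholesky(a):
    """Return the lower Cholesky factor of a symmetric positive definite matrix.

    The matrix is given as a list of rows. Unlike Algorithm 2.1, the input
    is copied and not overwritten (a choice made for this sketch).
    """
    n = len(a)
    low = [list(row) for row in a]
    for j in range(n):
        # Subtract the contribution of the already computed columns 0..j-1
        # from column j (rows j..n-1). For j = 0 the sum is empty.
        for i in range(j, n):
            low[i][j] -= sum(low[i][k] * low[j][k] for k in range(j))
        if low[j][j] <= 0.0:
            raise ValueError("matrix is not positive definite")
        # Scale column j by the square root of its diagonal entry.
        pivot = math.sqrt(low[j][j])
        for i in range(j, n):
            low[i][j] /= pivot
    # Clear the strictly upper triangle so that only L remains.
    for i in range(n):
        for j in range(i + 1, n):
            low[i][j] = 0.0
    return low


B = [[1, -2, 3], [-2, 13, -3], [3, -3, 19]]
L = gaxpy_cholesky(B)
print(L)  # [[1.0, 0.0, 0.0], [-2.0, 3.0, 0.0], [3.0, 1.0, 3.0]]
```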
The Sherman-Morrison-Woodbury matrix identity states that the inverse of a rank-$k$
correction of a matrix can be computed by a rank-$k$ correction of the inverse matrix
[Golub13, p.65]. More precisely, the Sherman-Morrison-Woodbury matrix identity is
\[ (A + UV^T)^{-1} = A^{-1} - A^{-1} U (I_k + V^T A^{-1} U)^{-1} V^T A^{-1}, \tag{2.5} \]
where $A \in \mathbb{R}^{n \times n}$ and $U, V \in \mathbb{R}^{n \times k}$.
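The identity can be checked numerically for the rank-1 case $k = 1$, in which $U$ and $V$ are single column vectors $u$ and $v$ and the inner inverse becomes a scalar division. The following sketch is an illustration under these assumptions; the helper names `inverse_2x2` and `sherman_morrison` and the example values are hypothetical.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sherman_morrison(a_inv, u, v):
    """Inverse of A + u v^T computed from A^{-1} (eq. (2.5) with k = 1)."""
    n = len(u)
    a_inv_u = [sum(a_inv[i][j] * u[j] for j in range(n)) for i in range(n)]
    vt_a_inv = [sum(v[i] * a_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[i] * a_inv_u[i] for i in range(n))  # 1 + v^T A^{-1} u
    return [[a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
            for i in range(n)]


# Hypothetical example values.
A = [[4.0, 1.0], [1.0, 3.0]]
u = [1.0, 2.0]
v = [1.0, 2.0]

updated = sherman_morrison(inverse_2x2(A), u, v)
corrected = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]
direct = inverse_2x2(corrected)
print(updated)
print(direct)
```

Both results agree up to floating point round-off, while the update only needs matrix-vector products with the already known inverse.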
The Kronecker product is an operation, which can be applied to two matrices of
arbitrary size and results in another matrix.
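Definition 2.7. For a matrix $A \in \mathbb{R}^{m \times n}$ and a matrix $B \in \mathbb{R}^{p \times q}$, the Kronecker product
$A \otimes B$ of $A$ and $B$ is defined as the $mp \times nq$ block matrix
\[ A \otimes B = \begin{pmatrix} a_{1,1}B & \cdots & a_{1,n}B \\ \vdots & \ddots & \vdots \\ a_{m,1}B & \cdots & a_{m,n}B \end{pmatrix} \in \mathbb{R}^{mp \times nq}. \tag{2.6} \]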
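For example, the Kronecker product of two $2 \times 2$ matrices is
\begin{align*}
\begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} \otimes \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}
&= \begin{pmatrix}
a_{1,1} \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix} & a_{1,2} \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix} \\
a_{2,1} \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix} & a_{2,2} \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
a_{1,1}b_{1,1} & a_{1,1}b_{1,2} & a_{1,2}b_{1,1} & a_{1,2}b_{1,2} \\
a_{1,1}b_{2,1} & a_{1,1}b_{2,2} & a_{1,2}b_{2,1} & a_{1,2}b_{2,2} \\
a_{2,1}b_{1,1} & a_{2,1}b_{1,2} & a_{2,2}b_{1,1} & a_{2,2}b_{1,2} \\
a_{2,1}b_{2,1} & a_{2,1}b_{2,2} & a_{2,2}b_{2,1} & a_{2,2}b_{2,2}
\end{pmatrix}. \tag{2.7}
\end{align*}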
If $u$ and $v$ are vectors then
\[ uv^T = u \otimes v^T = v^T \otimes u. \tag{2.8} \]
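Important properties of the Kronecker product are [Golub13, p.27]
\begin{align*}
(B \otimes C)^T &= B^T \otimes C^T \tag{2.9} \\
(B \otimes C)(D \otimes F) &= BD \otimes CF \tag{2.10} \\
(B \otimes C)^{-1} &= B^{-1} \otimes C^{-1} \tag{2.11} \\
B \otimes (C \otimes D) &= (B \otimes C) \otimes D, \tag{2.12}
\end{align*}
where $BD$ and $CF$ must be defined for eq. (2.10) to make sense and the matrices $B$ and
$C$ must be nonsingular in eq. (2.11).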
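The block structure of definition 2.7 and the properties in eqs. (2.9) and (2.10) can be checked with a few lines of Python. The sketch below is only an illustration; the helper names `kron`, `matmul` and `transpose` and the integer example matrices are assumptions.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    p, q = len(b), len(b[0])
    rows, cols = len(a) * p, len(a[0]) * q
    # Entry (i, j) lies in block (i // p, j // q) at position (i % p, j % q).
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(cols)]
            for i in range(rows)]


def matmul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical integer matrices, so that all comparisons are exact.
B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]
D = [[2, 0], [1, 1]]
F = [[1, 3], [0, 2]]

print(kron(B, C))  # [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]
print(transpose(kron(B, C)) == kron(transpose(B), transpose(C)))          # eq. (2.9)
print(matmul(kron(B, C), kron(D, F)) == kron(matmul(B, D), matmul(C, F)))  # eq. (2.10)
```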
2.2 Sources of Variability
This section provides a summary of the major sources of variability, which can have
a large impact on the gate and interconnect delays and ultimately on the delay test
[Kenne01, Sriva05, Blaau08].
2.2.1 Process Variations
Process variations are the deviations of the manufactured circuit structure and parame-
ters from the designed layout and expected parameters [Forza09]. In state-of-the-art
nanoscale CMOS circuits, even variations in the placement of a few atoms may signif-
icantly change the electrical properties of a transistor. Of particular importance are
variations in the threshold voltage and the saturation-current of the transistors. For
example, the average number of dopant atoms in the channel of a transistor in 32-nm
process technology has dropped below 100 [Kuhn08] so even small variations in the
number of dopants and the distribution of dopants across the channel, as shown in
fig. 2.1, can have a large impact on the threshold voltage. This so-called random-dopant
fluctuation is known to contribute around 60% of the total transistor threshold voltage
variability in technology nodes below 90nm [Kuhn08, Aseno08, Yamag13]. Variations
in the gate materials and the gate dielectric constitute another major source of threshold
voltage and saturation-current variability [Li10].
Figure 2.1: Model of a single-gate FinFET structure without random-dopant
fluctuation (left) and with random-dopant fluctuation (right), adopted from [Leung12a]
In the latest technology nodes, line-edge roughness has become a major source of
variability [Dietr12]. Line-edge roughness describes the noisy pattern in the line edges
of each gate, which is caused by variations in the number of photons and the chemical
composition of the photoresist during the lithographic exposure. In future technology
nodes, line-edge roughness is expected to supplant random dopant fluctuation as the
dominant source of variability [Kuhn09].
2.2.2 Environmental Variations
Environmental variations arise during the circuit operation due to variations in the
environment that surrounds the chip [Sriva05]. Of particular importance for the gate
delays are the temperature, the supply voltage and the switching activity. A drop in
supply voltage lowers the saturation-current of the transistors, resulting in increased
gate and interconnect delay. The actual voltage seen by a circuit component is the supply
voltage minus the IR-drop, where the IR-drop refers to the voltage drop across the
power distribution network due to the resistivity of the interconnects. High switching
activity causes large currents to flow in power and ground networks which can lead to
a significant drop in the voltage seen by a transistor. Increasing the temperature has
a similar negative impact on the circuit performance by increasing the interconnect
resistance and the transistor switching delay.
2.2.3 Model Inadequacy and Numerical Errors
Every circuit model, no matter how detailed, is only an imperfect and idealized
representation of a real physical circuit and it cannot perfectly describe the true
electrical behaviour of a real circuit during delay testing. For example, the delay of a
particular gate may be overestimated or underestimated by the model, depending on
the operating conditions of the circuit and the transitions at the inputs of the gate.
The uncertainty introduced by model inadequacy may be reduced by using more
detailed models at lower abstraction levels. But even if the model takes all important
physical laws into account, it may not be possible to determine all model parameters
with sufficient accuracy. Furthermore, the model may be too complex to evaluate
exactly and require approximations which introduce numerical errors.
2.2.4 Other Sources of Variability
Some sources of variability change the electrical properties of gates and interconnects
over time. Important examples are negative bias temperature instability and hot-
carrier injection, which change the threshold voltage of transistors over time. Another
important effect is electromigration, which increases the resistance of interconnects by
reducing the wire width and therefore increases the transition propagation delay along
the wire.
2.2.5 Impact on Important Electrical Parameters of Transistors
Physical parameter variations cause variations in the electrical parameters such as
saturation current, threshold voltage and gate capacitance. The electrical parameter
variations cause gate delay variations, which can then lead to delay test invalidation.
To study the impact of process variations on the electrical parameters of a FinFET
transistor, the structure of a transistor can be modelled as a 3-dimensional grid and
simulated with statistical technology computer-aided design tools. For example, the
simulation model of a single-gate FinFET for the analysis of the effects of random
dopant fluctuation is shown in Figure 2.1. Based on this approach, the impact of random
dopant fluctuation and line-edge roughness on state-of-the-art FinFET transistors has
been studied in [Leung12b] and [Leung12a]. The standard deviation of several electrical
parameters, expressed as percentage of their nominal value, is presented in table 2.1.
Table 2.1: Trends in FinFET device variability caused by line-edge roughness (LER)
and random-dopant fluctuation (RDF) [Leung12b, Leung12a]

                                  LER (1 nm amplitude)       RDF (10 nm fin height)
Technology node                   32 nm   21 nm   15 nm      32 nm   21 nm   15 nm
Threshold voltage variability     5.3%    7.6%    10.8%      42%     57%     69%
Saturation current variability    2.8%    3.8%    5.8%       19%     22%     23%
2.3 Classification of Process Variations
To analyse the impact of process variations on the delay test, it is beneficial to distin-
guish between different components of process variations [Sriva05, Blaau08, Forza09],
which are shown in fig. 2.2. These components have a different impact on the circuit
performance and will be explained in the following.
2.3.1 Systematic Variations
Systematic variations are presumed to be caused by a lack of knowledge or data
[Kiure09]. For example, the imperfections of a photomask or the impact of optical
proximity effects on the circuit layout might not be measurable with sufficient accuracy,
or the circuit model used for analysis might neglect these effects altogether. Their influ-
ence on a particular circuit layout could also be predicted before circuit manufacturing
based on detailed analysis of the layout and the manufacturing process. However, the
necessary data for such a detailed analysis is often kept secret by a semiconductor
foundry for commercial reasons.
2.3.2 Non-Systematic Variations
Non-systematic variations describe the truly unknown component of process variations,
of which only statistical characteristics are available during test generation. The impact
of non-systematic variations must be modelled by random variables with some known
probability distributions and relationships. Major sources of non-systematic variations
are for example random-dopant fluctuation and line-edge roughness. Depending on
the spatial scales of process variations, non-systematic variations can be classified into
inter-die and intra-die variations.
The inter-die variations are those non-systematic process variations, which have the same
values across a die but different values from die-to-die, wafer-to-wafer and lot-to-lot.
For example, variations in the lithography process like variations in the exposure time
affect the gate length of all gates within the same die similarly.
Intra-die variations are non-systematic process variations that cause deviations between
different parts of the same die. For example, random dopant fluctuation is widely
considered to be a purely random phenomenon which occurs independently for each
transistor. Therefore, each transistor in the design requires its own random variable to
model intra-die variations, which hugely increases the complexity of statistical timing
analysis. Even for variations which show a systematic trend across the wafer, these
trends might change over time in unpredictable ways. Therefore, this kind of systematic
variation is usually also considered part of intra-die variations.
Figure 2.2: Taxonomy of process variations [Blaau08]
Intra-die variations are expected to dominate in future technology nodes, due to a shift
to purely random and independent physical sources of process variations like random
dopant fluctuation and line edge roughness [Li10, Agarw07].
2.4 Formal Modelling of Variability
Due to the limitations of the manufacturing process, each circuit has slightly different
parameters. To analyse the impact of process variations on the delay test, it would be
ideal to simulate the entire manufacturing process and the delay test. However, this
kind of approach is prohibitively complex. Instead, a mathematical model, which can
be validated with silicon measurements, can be used to understand, describe, and quantify
important aspects of the circuit’s timing behaviour and to predict the effectiveness of
delay tests considering variability.
The mathematical concepts described in this section form the basis for the modelling
and the analysis of the circuit's timing behaviour in the presence of process variations,
which is called statistical timing analysis. Statistical timing analysis employs probability
theory and statistics to treat the variability in all critical circuit parameters in an explicit
manner [Sriva05, Blaau08].
Unless cited otherwise, the following definitions and theorems are wherever possible
quoted verbatim from: M. H. DeGroot and M. J. Schervish. Probability and Statistics.
Pearson Education, fourth edition, 2012. ISBN 978-0-321-50046-5
2.4.1 Probability Space
The semiconductor manufacturing process of VLSI circuits is a random experiment,
that is, an experiment that can produce different outcomes (circuits with different
physical and electrical properties) even though it is repeated in the same manner every
time [Montg13]. The process variations introduced by the manufacturing process are
described by a probability space $(\Omega, \mathcal{A}, P)$. The probability space consists of the sample
space $\Omega$, which is the set of all possible outcomes, a set of events $\mathcal{A}$, where each
event $A \in \mathcal{A}$ is a set with one or more outcomes, and a probability measure $P$.
To describe the impact of process variations on the manufacturing of a particular VLSI
circuit layout, the sample space $\Omega$ is defined as an infinite set of circuit instances. Each
circuit instance $\omega \in \Omega$ has unique deviations from the expected shape of the circuit
layout due to process variations. The circuit instance with zero deviations is called
the nominal circuit instance and is denoted by $\omega_0$. Layout deviations which are of particular
importance for the circuit timing behaviour are for example the transistor channel
length and width, the oxide thickness and the channel dopant profile.
An event $A \in \mathcal{A}$ is a subset of circuit instances which, for example, all share the same
physical defect. Other events of interest can be described by combinations of existing
events by using basic set operations such as unions, intersections and complements.
The probability measure $P$ assigns a probability to the occurrence of each event $A \in \mathcal{A}$.
Definition 2.8. A probability measure, or simply probability, on a sample space $\Omega$ is a
function $P : \mathcal{A} \to [0, 1]$, which satisfies the following three conditions
\[ P(\Omega) = 1, \tag{2.13} \]
\[ P(A) \ge 0 \quad \text{for all } A \in \mathcal{A} \tag{2.14} \]
and for every infinite sequence of disjoint events $A_1, A_2, \ldots$,
\[ P\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} P(A_i). \tag{2.15} \]
The expression P(A) represents how likely it is that the experiment’s actual outcome is
a member of A.
Definition 2.9 (Conditional Probability). Suppose that we learn that an event $B$ has
occurred and that we wish to compute the probability of another event $A$ taking into
account that we know that $B$ has occurred. The new probability of $A$ is called the
conditional probability of the event $A$ given that the event $B$ has occurred and is denoted
$P(A \mid B)$. If $P(B) > 0$, we compute this probability as
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{2.16} \]
The conditional probability $P(A \mid B)$ is undefined if $P(B) = 0$.
Definition 2.10 (partition of sample space). Let $(\Omega, \mathcal{A}, P)$ be a probability space. A
partition of $\Omega$ is a collection of events $B_1 \in \mathcal{A}, \ldots, B_k \in \mathcal{A}$ such that $B_1, \ldots, B_k$ are
disjoint and $\bigcup_{i=1}^{k} B_i = \Omega$.
Theorem 2.11 (law of total probability). Suppose that the events $B_1, \ldots, B_k$ form a partition
of the sample space $\Omega$ and $P(B_j) > 0$ for $j = 1, \ldots, k$. Then, for every event $A$ in $\mathcal{A}$,
\[ P(A) = \sum_{j=1}^{k} P(B_j)\, P(A \mid B_j). \tag{2.17} \]
The proof can be found in [DeGro12, Theorem 2.1.4].
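A small numerical example may help to read eq. (2.17). The sketch below assumes a hypothetical partition of the sample space into three groups of circuit instances (slow, typical, fast) and hypothetical conditional probabilities that a circuit fails a delay test within each group. All numbers and the function name `total_probability` are invented for illustration.

```python
def total_probability(priors, conditionals):
    """Evaluate eq. (2.17): sum over j of P(B_j) * P(A | B_j)."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the events B_j must form a partition")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, conditionals))


# Hypothetical partition: slow, typical and fast circuit instances.
p_group = [0.2, 0.6, 0.2]             # P(B_j)
p_fail_given_group = [0.30, 0.05, 0.01]  # P(A | B_j), A = "delay test fails"

p_fail = total_probability(p_group, p_fail_given_group)
print(round(p_fail, 3))  # 0.092
```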
Definition 2.12 (independence). The $k$ events $A_1 \in \mathcal{A}, \ldots, A_k \in \mathcal{A}$ are called independent
(or mutually independent) if, for every subset $A_{i_1}, \ldots, A_{i_j}$ of $j$ of these events ($j = 2, \ldots, k$),
\[ P(A_{i_1} \cap \ldots \cap A_{i_j}) = P(A_{i_1}) \cdots P(A_{i_j}). \tag{2.18} \]
2.4.2 Random Variables
A random variable can be thought of as a quantity that can be measured in a random
experiment. The quantity is determined once an experiment has been performed, in
other words, an outcome $\omega \in \Omega$ (a circuit instance) has been chosen.
Formally, a random variable $X$ is a function that assigns a real number to an outcome
of an experiment
\[ X : \Omega \to \mathbb{R}, \quad \omega \mapsto X(\omega). \tag{2.19} \]
For compactness of notation, $\omega$ is omitted and $X$ is written to denote $X(\omega)$, wherever it
is safe to do so.
Definition 2.13 (Continuous Random Variable). We say that a random variable $X$ has
a continuous distribution or that X is a continuous random variable if there exists a non-
negative function f , defined on the real line, such that for every interval of real numbers
(bounded or unbounded), the probability that X takes a value in the interval is the
integral of f over the interval.
This work is mostly concerned with continuous random variables so that in the following,
"random variable" is used to refer to a continuous random variable, wherever it is safe to
do so.
Random variables can be used to define events. For example, if $X(\omega)$ denotes the delay
of a particular path, then
\[ \{\omega \in \Omega : X(\omega) \le T_{\mathrm{clk}}\} \tag{2.20} \]
represents the event (subset of all circuit instances) in which the delay $X$ of the path is
less than or equal to the clock cycle time $T_{\mathrm{clk}}$. Then the probability of this event is
\[ P(\{\omega \in \Omega : X(\omega) \le T_{\mathrm{clk}}\}). \tag{2.21} \]
Instead of the above expression, the shorthand notation $P(X \le T_{\mathrm{clk}})$ is used throughout
this work.
The distribution of the random variable X can be described by a probability density
function, which assigns probabilities to intervals.
Definition 2.14 (Probability Density Function (PDF)). If X has a continuous distribution,
then the function f described in definition 2.13 is called the probability density function
(PDF) of X.
The distribution of the random variable X can also be described by the cumulative
distribution function.
Definition 2.15 (Cumulative Distribution Function (CDF)). The distribution function or
cumulative distribution function (CDF) $F$ of a random variable $X$ is the function
\[ F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty. \tag{2.22} \]
It can be shown that the cumulative distribution function $F$ has the following properties
• $F$ is monotonically increasing, right-continuous, $0 \le F \le 1$,
• $\lim_{x \to +\infty} F(x) = 1$ and
• $\lim_{x \to -\infty} F(x) = 0$.
Theorem 2.16. Let $X$ have a continuous distribution, and let $f(x)$ and $F(x)$ denote its PDF
and CDF, respectively. Then $F$ is continuous at every $x$,
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt, \tag{2.23} \]
and
\[ \frac{dF(x)}{dx} = f(x) \tag{2.24} \]
at all $x$ such that $f$ is continuous.
The proof can be found in [DeGro12, Theorem 3.3.5].
The term critical path is redefined in a statistical setting as a path whose probability of
exceeding a deterministic critical delay for the circuit is higher than a certain threshold
[Sriva05]. This is formalized in the following definition.
Definition 2.17. A sensitizable logical path with delay $X$ is called a critical path if and
only if
\[ P(X > T_{\mathrm{clk}}) > p_{\mathrm{cri}}, \tag{2.25} \]
where $T_{\mathrm{clk}}$ denotes the clock cycle time and $p_{\mathrm{cri}}$ is a user defined threshold.
Moments
Important properties of random variables can often be described by specific quantitative
measures, called moments, instead of the distribution function. For example, the
distribution of a random variable X is often summarized by two numbers: the mean as
a measure of the centre and the variance as a measure of the dispersion or spread.
Definition 2.18. Let $X$ denote a random variable with probability density function $f$.
Suppose that at least one of the following integrals is finite:
\[ \int_{0}^{+\infty} x f(x)\,dx, \qquad \int_{-\infty}^{0} x f(x)\,dx. \tag{2.26} \]
Then the mean, expectation, or expected value of $X$ is said to exist and is defined to be
\[ E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx. \tag{2.27} \]
If both of the integrals in eq. (2.26) are infinite, then $E(X)$ does not exist.
Definition 2.19 (Variance/Standard Deviation). Let $X$ be a random variable with finite
mean $\mu = E(X)$. The variance of $X$, denoted by $\mathrm{Var}(X)$, is defined as
\[ \mathrm{Var}(X) = E((X - \mu)^2). \tag{2.28} \]
If $X$ has infinite mean or if the mean of $X$ does not exist, it is said that $\mathrm{Var}(X)$ does not
exist. The standard deviation of $X$ is $\sqrt{\mathrm{Var}(X)}$ if the variance exists.
Higher order moments like $E(X^2)$ could be computed from definition 2.18 by computing
the PDF of the random variable $Y = X^2$. However, the expectation of $X^2$ can always
be calculated directly using the following theorem.
Theorem 2.20. Let $X$ be a random variable, and let $r$ be a real-valued function of a real variable.
If $X$ has a continuous distribution, then
\[ E(r(X)) = \int_{-\infty}^{+\infty} r(x) f(x)\,dx, \tag{2.29} \]
if the mean exists.
The proof can be found in [DeGro12, Theorem 4.1.1].
Definition 2.21. Suppose that $X$ is a random variable with $E(X) = \mu$ well defined. For
every positive integer $k$, the expectation $E((X - \mu)^k)$ is called the $k$th central moment of
$X$ or the $k$th moment of $X$ about the mean.
For example, the variance of X is the second central moment of X.
The mean and the variance do not uniquely identify a probability distribution. In fact,
random variables of very different probability distributions can have the same mean
and variance. This leads to a measure that can be used to quantify the lack of symmetry
of the PDF.
Definition 2.22 (Skewness). Let $X$ be a random variable with mean $\mu = E(X)$, standard
deviation $\sigma$ and finite third moment. The skewness of $X$, denoted by $\mathrm{Sk}(X)$, is defined
as
\[ \mathrm{Sk}(X) = E((X - \mu)^3)/\sigma^3. \tag{2.30} \]
The reason for dividing by $\sigma^3$ is to make this measure independent of the variance of
the distribution. In fact, $\mathrm{Sk}(X)$ is the third moment of the standardized random variable
$(X - \mu)/\sigma$, which has variance 1. The skewness is often used to quantify
the deviation of a distribution of a random variable from a normal distribution.
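When only measured values are available, the mean, variance and skewness can be estimated by replacing the expectations in definitions 2.18, 2.19 and 2.22 with averages over the sample. The sketch below does exactly that; it divides by the number of samples (no bias correction), and the function names and example data are assumptions made for illustration.

```python
def central_moment(values, k):
    """Average of (x - mean)^k over the sample (kth central moment estimate)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def sample_skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    variance = central_moment(values, 2)
    return central_moment(values, 3) / variance ** 1.5


symmetric = [1.0, 2.0, 3.0]             # symmetric about the mean 2
right_tail = [1.0, 2.0, 3.0, 4.0, 10.0]  # one large value stretches the right tail

print(sample_skewness(symmetric))   # 0.0
print(sample_skewness(right_tail))  # positive, about 1.14
```

A positive value indicates a longer right tail, a negative value a longer left tail, and a value close to zero a roughly symmetric sample.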
Univariate Normal Distribution
The normal distribution is a very common continuous probability distribution.
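Definition 2.23. A random variable $X$ has the univariate normal distribution with mean $\mu$
and variance $\sigma^2$ ($-\infty < \mu < \infty$ and $\sigma > 0$) if $X$ has a continuous distribution with the
following PDF:
\[ f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \quad \text{for } -\infty < x < \infty. \tag{2.31} \]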
The proof that the above defined function is indeed a PDF can be found in [DeGro12,
Theorem 5.6.1]. In the following, the convenient shorthand notation $X \sim \mathcal{N}(\mu, \sigma^2)$ is
used to describe that $X$ has a univariate normal distribution with parameters $\mu$ and $\sigma^2$.
Theorem 2.24. The mean and variance of the distribution with PDF given by eq. (2.31) are $\mu$
and $\sigma^2$, respectively.
The proof can be found in [DeGro12, Theorem 5.6.3].
The univariate standard normal distribution is a special case of the univariate normal
distribution with $\mu = 0$ and $\sigma = 1$. In this work, the function $\varphi$ with
\[ \varphi(x) := \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \tag{2.32} \]
will be used to denote the probability density function and the function $\Phi$ with
\[ \Phi(x) := \int_{-\infty}^{x} \varphi(y)\,dy \tag{2.33} \]
will be used to denote the cumulative distribution function of the standard normal
distribution.
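The integral in eq. (2.33) has no elementary closed form, but it can be evaluated with the error function from the Python standard library. The following sketch also shows how the criterion of definition 2.17 could be evaluated for a single path under the assumption that its delay is normally distributed. The function names and the numerical values (delays in picoseconds) are hypothetical.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution, eq. (2.32)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """CDF of the standard normal distribution, eq. (2.33), via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_exceeds(t_clk, mean, sigma):
    """P(X > t_clk) for a path delay X assumed to be N(mean, sigma^2)."""
    z = (t_clk - mean) / sigma
    # erfc keeps precision in the upper tail, where 1 - cdf would cancel.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def is_critical(t_clk, mean, sigma, p_cri):
    """Criterion of definition 2.17 for a normally distributed path delay."""
    return prob_exceeds(t_clk, mean, sigma) > p_cri


# Hypothetical path: mean delay 950 ps, standard deviation 25 ps, clock 1000 ps.
p = prob_exceeds(1000.0, 950.0, 25.0)
print(p)                                        # about 0.0228 (two sigma tail)
print(is_critical(1000.0, 950.0, 25.0, 0.01))   # True
```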
Whenever a random experiment is replicated many times, the random variable that
represents the average result over all replicates tends to be approximately normally
distributed. The underlying fundamental result is known as the central limit theorem, which
makes the normal distribution remarkably useful.
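Theorem 2.25 (Lindeberg-Feller Central Limit Theorem). Suppose that $\{X_i\}$, $i = 1, \ldots, n$
is a sequence of independent random variables with finite means $\mu_i$ and finite positive variances
$\sigma_i^2$ and $\bar{X}_n = (1/n) \sum_{i=1}^{n} X_i$. Let
\[ \bar{\mu}_n = \frac{1}{n}(\mu_1 + \mu_2 + \cdots + \mu_n) \quad \text{and} \quad \bar{\sigma}_n^2 = \frac{1}{n}(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2). \]
If no single term dominates the average variance $\bar{\sigma}_n^2$, say
\[ \lim_{n \to \infty} \frac{1}{n \bar{\sigma}_n} \max_i(\sigma_i) = 0 \tag{2.34} \]
and if the average variance converges to a finite constant
\[ \bar{\sigma}^2 = \lim_{n \to \infty} \bar{\sigma}_n^2, \tag{2.35} \]
then
\[ \sqrt{n}(\bar{X}_n - \bar{\mu}_n) \xrightarrow{d} \mathcal{N}(0, \bar{\sigma}^2). \tag{2.36} \]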
Theorem 2.25 is a simplified description of the central limit theorem, which is quoted
from [Green12, Theorem D.19]. The complete theorem, the Lindeberg condition and an
elaborate proof can for example be found in [Resni14, Theorem 9.8.1].
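The statement of theorem 2.25 can be made tangible with a small simulation. The sketch below is an illustration under invented assumptions: the $X_i$ are independent and uniformly distributed on $[0, w_i]$ with alternating widths 1 and 2, so that the means and variances differ between the variables, and the average variance is $5/24$. The seed, the number of variables and the number of replicates are arbitrary choices.

```python
import math
import random
import statistics


def scaled_mean_deviation(widths, rng):
    """Return sqrt(n) * (sample mean - average mean) for one replicate.

    X_i is uniform on [0, widths[i]], hence mu_i = widths[i] / 2 and
    sigma_i^2 = widths[i]^2 / 12.
    """
    n = len(widths)
    sample_mean = sum(rng.uniform(0.0, w) for w in widths) / n
    average_mean = sum(w / 2.0 for w in widths) / n
    return math.sqrt(n) * (sample_mean - average_mean)


rng = random.Random(2016)          # fixed seed for reproducibility
widths = [1.0, 2.0] * 25           # n = 50 variables with two different variances
avg_variance = sum(w * w / 12.0 for w in widths) / len(widths)   # 5/24
replicates = [scaled_mean_deviation(widths, rng) for _ in range(4000)]

sigma = math.sqrt(avg_variance)
within_one_sigma = sum(abs(r) <= sigma for r in replicates) / len(replicates)
print("sample variance:", statistics.pvariance(replicates), "limit:", avg_variance)
print("share within one sigma:", within_one_sigma,
      "normal value:", math.erf(1.0 / math.sqrt(2.0)))
```

The sample variance of the scaled deviations should be close to the average variance, and the share of values within one standard deviation should be close to the value of about 0.683 expected for a normal distribution.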
Descriptive Statistics: Quartiles and Box Plot
In some situations the distribution of a random variable is unknown and only a
sample of measurements is available, for example of the oscillation frequency of a
particular ring oscillator on a large number of circuits. These measured values must be
summarized for further analysis [Montg13].
The sample median describes the central tendency of the data by dividing the data into
two parts of equal size, half below and half above the median. If the number of
measurements is even, the median is defined to be halfway between the two central
measured values. The ordered set of measured values can be further divided into four
equal sized parts, where the division points are called quartiles. The first quartile has
approximately 25% of the measured values below and 75% of the measured values
above it. The second quartile is the median and the third quartile has approximately
75% of the measured values below its value.
A box plot describes several important features of the data, such as center, spread and
skewness, in a compact diagram. The box plot (box-and-whisker plot) shows the
minimum, the maximum and the three quartiles, with the quartiles drawn as a rectangular box. The left/lower edge
of the box is the first quartile and the right/upper edge is the third quartile. The black
line inside the box marks the second quartile, which is the median of the data. The
left/lower whisker is the line between the minimum and the first quartile. Likewise,
the right/upper whisker is the line between the third quartile and the maximum.
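The five values displayed by a box plot can be computed directly from a sample. The following Python sketch uses hypothetical ring oscillator frequencies. Note that several conventions exist for interpolating quartiles; the `inclusive` method of the standard library used here is only one of them and is an assumption of this example.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a sample."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical ring oscillator frequencies in MHz (illustrative values only).
frequencies = [101, 97, 103, 99, 105, 100, 98, 102, 104]
summary = five_number_summary(frequencies)
print(summary)
```

For this sample the whiskers would end at 97 and 105, and the box would span the first to the third quartile.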
A typical example is shown in fig. 2.3. It shows that the distribution is fairly symmetric
because the lengths of the left and right whisker are about the same and the black line
(median) roughly divides the box into two equal parts.
Figure 2.3: Description of a box plot
2.4.3 Canonical Delay Model
In [Zhong11], the authors show for a 65nm manufacturing technology that the abstrac-
tion of process variations by delay variations is possible and reasonable to limit the
complexity of the statistical timing analysis.
Following the classification of non-systematic variability given in section 2.3.2, a process
parameter Z is a random variable and can be written as [Sriva05]
\[ Z = z + \breve{Z} + \hat{Z}, \tag{2.37} \]
where $z$ denotes the nominal value of the process parameter, $\breve{Z}$ denotes a random
variable representing the inter-die variation and $\hat{Z}$ denotes a random variable representing
the intra-die variation. The random variables $\breve{Z}$ and $\hat{Z}$ are usually assumed to have a
normal distribution with zero mean.
In statistical timing analysis, each delay value of a gate is a function of random variables
and therefore itself a random variable. According to [Viswe04], a delay value U can be
expressed as a linear function
\[ U = u + \sum_{i=1}^{k} a_i Z_i + a_{k+1} \tilde{Z} \tag{2.38} \]
of $k$ process parameters $Z_1, \dots, Z_k$, where $u$ denotes the mean delay, the coefficients
$a_1, \dots, a_k$ represent the sensitivity of the delay to the process parameters $Z_1, \dots, Z_k$ and
$a_{k+1}$ is the sensitivity of the delay to the residual variability $\tilde{Z}$, which is also a random
variable with univariate normal distribution and mean zero.
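The linear delay model makes it easy to derive the variance of a delay and the covariance of two delays that depend on the same process parameters. The following Python sketch illustrates this under additional assumptions that are not stated in the text: the parameters are zero-mean and mutually independent with known variances, the residual term has unit variance, and the residual terms of different gates are independent. All function names and numbers are hypothetical.

```python
import math

def delay_mean_and_variance(u, sensitivities, residual_sensitivity, param_variances):
    """Mean and variance of U = u + sum(a_i * Z_i) + a_{k+1} * Z_res.

    Assumes zero-mean, mutually independent Z_i with the given variances and
    a zero-mean, unit-variance residual that is independent of all Z_i.
    """
    variance = sum(a * a * v for a, v in zip(sensitivities, param_variances))
    variance += residual_sensitivity ** 2
    return u, variance

def delay_covariance(sens_1, sens_2, param_variances):
    """Covariance of two delays that share the parameters Z_1..Z_k.

    Only the shared parameters contribute because the residual terms of
    different gates are assumed to be independent.
    """
    return sum(a * b * v for a, b, v in zip(sens_1, sens_2, param_variances))

# Two hypothetical gate delays depending on two shared process parameters.
param_variances = [1.0, 1.0]
mean_1, var_1 = delay_mean_and_variance(100.0, [3.0, 0.0], 4.0, param_variances)
mean_2, var_2 = delay_mean_and_variance(80.0, [2.0, 1.0], 2.0, param_variances)
cov_12 = delay_covariance([3.0, 0.0], [2.0, 1.0], param_variances)
rho_12 = cov_12 / math.sqrt(var_1 * var_2)
print(var_1, var_2, cov_12, rho_12)
```

In this example the two delays are correlated only through the first parameter, because the first gate is insensitive to the second one.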
More details about the circuit and gate modelling can be found in sections 2.5 and 7.2.
2.4.4 Random Vectors
The delays of n paths in a circuit, described by n random variables, are usually not
independent due to structural and spatial correlations. To describe the relationship
between the path delays, the random variables must be grouped into an n-dimensional
random vector $X$, which is a column vector of n random variables on the same probability
space. For any positive integers $1 \le i_1 < i_2 < \dots < i_r \le n$ with $1 \le r \le n$, the random
vector
\[ (X_{i_1}, X_{i_2}, \dots, X_{i_r})^T \tag{2.39} \]
is called an r-dimensional subvector of the random vector $X$.
The distribution of a random vector is described by extending the concept of a proba-
bility density function for a single random variable to n-dimensional spaces.
Definition 2.26 (Joint CDF). The joint cumulative distribution function (CDF) of n random
variables $X_1, \dots, X_n$ is the function $F$ whose value at every point $(x_1, \dots, x_n)^T \in \mathbb{R}^n$ is
specified by the relation
\[ F(x_1, \dots, x_n) = P(X_1 \le x_1, \dots, X_n \le x_n). \tag{2.40} \]
Definition 2.27 (Joint PDF). It is said that n random variables $X_1, \dots, X_n$ have a continuous
joint distribution if there is a nonnegative function $f$ defined on $\mathbb{R}^n$ such that for
every subset $C \subseteq \mathbb{R}^n$,
\[ P((X_1, \dots, X_n) \in C) = \int_C f(x_1, \dots, x_n) \, dx \tag{2.41} \]
with $x = (x_1, \dots, x_n)$, if the integral exists. The function $f$ is then called the joint
probability density function (PDF) of $X_1, \dots, X_n$.
For the given random vector $X = (X_1, X_2)^T$ of two path delays $X_1$ and $X_2$ with joint
PDF $f_X$, the probability that the observed path delays are within a particular region $R$
of a two-dimensional space is given by the double integral
\[ P((X_1, X_2) \in R) = \iint_R f_X(x_1, x_2) \, dx_1 \, dx_2 \tag{2.42} \]
of $f_X$ over the region $R$. The above expression can be interpreted as the volume under
$f_X$ in the region $R$. For example, the probability that none of the paths has a path delay
fault is
\[ P(X_1 \le T_{clk}, X_2 \le T_{clk}) = \int_{-\infty}^{T_{clk}} \int_{-\infty}^{T_{clk}} f_X(x_1, x_2) \, dx_1 \, dx_2, \tag{2.43} \]
where $T_{clk}$ denotes the clock cycle time.
Theorem 2.28. Let $X$ and $Y$ have a joint CDF $F$. The CDF $F_1$ of just the single random
variable $X$ can be derived from the joint CDF $F$ as $F_1(x) = \lim_{y \to \infty} F(x, y)$. Similarly, the
CDF $F_2$ of $Y$ equals $F_2(y) = \lim_{x \to \infty} F(x, y)$, for $-\infty < y < \infty$.
The proof can be found in [DeGro12, Theorem 3.4.5].
Definition 2.29 (Marginal CDF/PDF). Suppose that X and Y have a joint distribution.
The CDF of X derived by theorem 2.28 is called the marginal CDF of X. Similarly, the
PDF of X associated with the marginal CDF of X is called marginal PDF of X.
Theorem 2.30. If $X$ and $Y$ have a continuous joint distribution with joint PDF $f$, then the
marginal PDF $f_1$ of $X$ is
\[ f_1(x) = \int_{-\infty}^{+\infty} f(x, y) \, dy \quad \text{for } -\infty < x < \infty. \tag{2.44} \]
Similarly, the marginal PDF $f_2$ of $Y$ is
\[ f_2(y) = \int_{-\infty}^{+\infty} f(x, y) \, dx \quad \text{for } -\infty < y < \infty. \tag{2.45} \]
The proof can be found in [DeGro12, Theorem 3.5.2].
Definition 2.31 (Conditional PDF). Let $X$ and $Y$ have a continuous joint distribution
with joint PDF $f$ and respective marginals $f_1$ and $f_2$. Let $y$ be a value such that $f_2(y) > 0$.
Then the conditional PDF $g_1$ of $X$ given that $Y = y$ is defined as follows:
\[ g_1(x \mid y) = \frac{f(x, y)}{f_2(y)} \quad \text{for } -\infty < x < +\infty. \tag{2.46} \]
Moments
The concept of moments of a single random variable can be extended to random vectors.
The mean (expectation) of the n-dimensional random vector $X = (X_1, \dots, X_n)^T$ is the
vector of the expected values of all random variables
\[ E(X) = (E(X_1), \dots, E(X_n))^T. \tag{2.47} \]
A common measure for the relationship between two random variables is the covariance.
To formally define the covariance, it is necessary to introduce the expected value of a
function of the two random variables.
Theorem 2.32. Suppose that $X_1, \dots, X_n$ are random variables with the joint PDF $f$. Let $r$ be a
real-valued function of n real variables, and suppose that $Y = r(X_1, \dots, X_n)$. Then $E(Y)$ can
be determined directly from the relation
\[ E(Y) = \int_{\mathbb{R}^n} r(x_1, \dots, x_n) f(x_1, \dots, x_n) \, dx \tag{2.48} \]
with $x = (x_1, \dots, x_n)^T$, if the mean exists.
The proof can be found in [DeGro12, Theorem 4.1.2].
Definition 2.33 (Covariance). Let $X$ and $Y$ be random variables with finite means $\mu_X$
and $\mu_Y$, respectively. The covariance of $X$ and $Y$, which is denoted by $\mathrm{Cov}(X, Y)$, is
defined as
\[ \mathrm{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y)), \tag{2.49} \]
if the expectation in eq. (2.49) exists.
The sign of the covariance shows the tendency of the linear relationship between $X$ and
$Y$. However, a covariance of zero does not imply independence because a non-linear
relationship might exist between the random variables. A well-known example is the
relationship between $X$ and $Y = X^2$, which are clearly not independent. However, if
$E(X) = 0$ and $E(X^3) = 0$, then $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) = 0$.
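This effect can be checked exactly for a small discrete distribution. The following Python sketch assumes, purely for illustration, that X takes the values -1, 0 and 1 with equal probability. It shows that the covariance of X and its square is zero although the joint probability does not factorize.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with equal probability (an illustrative choice).
pmf = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

def expectation(g):
    """E(g(X)) for the discrete distribution above."""
    return sum(p * g(x) for x, p in pmf.items())

mean_x = expectation(lambda x: x)
mean_y = expectation(lambda x: x * x)                        # Y = X^2
cov_xy = expectation(lambda x: x * x * x) - mean_x * mean_y  # E(XY) - E(X)E(Y)

# Dependence: the event {X = 1, Y = 0} is impossible, but P(X = 1) * P(Y = 0) > 0.
p_joint = Fraction(0)
p_product = pmf[1] * pmf[0]                                  # P(Y = 0) = P(X = 0)
print(cov_xy, p_joint, p_product)
```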
Theorem 2.34. For all random variables $X$ and $Y$ such that $\mathrm{Var}(X) < \infty$ and $\mathrm{Var}(Y) < \infty$,
\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y). \tag{2.50} \]
The proof can be found in [DeGro12, Theorem 4.6.1].
Interpreting the magnitude of the covariance Cov(X,Y) is difficult because it depends
on the variances of X and Y. A normalized version of the covariance is the correlation,
which is defined as follows.
Definition 2.35 (Correlation). Let $X$ and $Y$ be random variables with finite variances
$\sigma_X^2$ and $\sigma_Y^2$, respectively. Then the correlation of $X$ and $Y$, which is denoted by $\rho(X, Y)$,
is defined as follows:
\[ \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}. \tag{2.51} \]
It can be shown that $-1 \le \rho(X, Y) \le +1$ [DeGro12, Theorem 4.6.3].
Definition 2.36 (Mean Vector/Covariance Matrix). If $Y$ is an n-dimensional random
vector, then the vector $E(Y) = (E(Y_1), \dots, E(Y_n))^T$ is called the mean vector of $Y$. The
covariance matrix $\mathrm{Var}(Y)$ of $Y$ is defined to be the $n \times n$ matrix such that, for $i = 1, \dots, n$
and $j = 1, \dots, n$, the element in the ith row and jth column is $\mathrm{Cov}(Y_i, Y_j)$.
For any affine transformation of the random vector of the form $Y = AX + b$ with
$A \in \mathbb{R}^{k \times n}$ of rank $k$ and $b \in \mathbb{R}^k$, the mean vectors and the covariance matrices of $X$
and $Y$ are related as follows.
Theorem 2.37. Let $X$ denote an n-dimensional random vector with mean vector $\mu_X = E(X)$
and covariance matrix $\Sigma_X = \mathrm{Var}(X)$. If $A \in \mathbb{R}^{k \times n}$ has rank $k$ and $b \in \mathbb{R}^k$, then the
k-dimensional random vector
\[ Y = AX + b \tag{2.52} \]
has a mean vector $\mu_Y = E(Y)$ and covariance matrix $\Sigma_Y = \mathrm{Var}(Y)$ given by
\[ \mu_Y = A \mu_X + b \tag{2.53} \]
\[ \Sigma_Y = A \Sigma_X A^T. \tag{2.54} \]
The proof can be found in [DeGro12, Theorem 11.5.2]. The above theorem implies that
$\mathrm{Var}(X - c) = \mathrm{Var}(X)$ for any real-valued vector $c \in \mathbb{R}^n$.
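As an illustration of the affine transformation rule, the following Python sketch computes the mean and the variance of the sum of two correlated delays by choosing A = (1 1) and b = 0. The helper functions and all numbers are hypothetical and only serve as an example.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def affine_moments(a, b, mean_x, cov_x):
    """Mean vector and covariance matrix of Y = A X + b."""
    mean_col = [[m] for m in mean_x]
    mean_y = [row[0] + bi for row, bi in zip(mat_mul(a, mean_col), b)]
    cov_y = mat_mul(mat_mul(a, cov_x), transpose(a))
    return mean_y, cov_y

# Two correlated segment delays (hypothetical numbers) and their sum.
mean_x = [10.0, 20.0]
cov_x = [[4.0, 1.0], [1.0, 9.0]]
mean_y, cov_y = affine_moments([[1.0, 1.0]], [0.0], mean_x, cov_x)
print(mean_y, cov_y)
```

The resulting variance equals the sum of both variances plus twice the covariance, that is 4 + 9 + 2 · 1 = 15.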
Theorem 2.38. Correlation matrices and covariance matrices are symmetric and positive
definite.
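The proof can be found in [Mille12, Theorem 6.1]. Strictly speaking, covariance matrices are
positive semidefinite in general and positive definite only if they are nonsingular.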
Using theorem 2.32, the concept of covariance can be generalized to higher order joint
moments, which will be frequently used in chapter 6.
Definition 2.39. Let $X_1$ and $X_2$ be random variables with finite means $\mu_{X_1}$ and $\mu_{X_2}$,
respectively. Then $E(X_1^m X_2^n)$ is called the (m, n)th joint moment and
\[ E((X_1 - \mu_{X_1})^m (X_2 - \mu_{X_2})^n) \tag{2.55} \]
is called the (m, n)th joint central moment of $X_1$ and $X_2$.
Let $f_X$ denote the joint PDF of $X_1$ and $X_2$, then from theorem 2.32 it follows that
\[ E(X_1^m X_2^n) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x_1^m x_2^n f_X(x_1, x_2) \, dx_1 \, dx_2 \tag{2.56} \]
and
\[ E((X_1 - \mu_{X_1})^m (X_2 - \mu_{X_2})^n) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x_1 - \mu_{X_1})^m (x_2 - \mu_{X_2})^n f_X(x_1, x_2) \, dx_1 \, dx_2. \tag{2.57} \]
Definition 2.40. The third multivariate cumulant, denoted by $\kappa_3$, of a random vector $X$ is
\[ \kappa_3(X) = E((X - E(X)) \otimes (X - E(X)) \otimes (X - E(X))^T), \tag{2.58} \]
where $\otimes$ denotes the Kronecker product.
Important properties of $\kappa_3(X)$ can be found in chapter 6 and in [Franc10, Luca15].
Definition 2.41 (Conditional Expectation). Let $X$ and $Y$ be random variables such that
the mean of $Y$ exists and is finite. If $Y$ has a continuous conditional distribution given
$X = x$ with conditional PDF $g_2(y \mid x)$, then
\[ E(Y \mid x) = \int_{-\infty}^{+\infty} y \, g_2(y \mid x) \, dy \tag{2.59} \]
is called the conditional expectation (or conditional mean) of $Y$ given $X = x$.
Definition 2.42 (Conditional Means of Random Variables). Let $h(x) = E(Y \mid x)$ denote
the conditional mean of $Y$ given $X = x$, then the conditional mean $E(Y \mid X)$ of $Y$ given
$X$ is $E(Y \mid X) = h(X)$.
Clearly, $E(Y \mid X)$ is a function of the random variable $X$ and therefore itself a random
variable whose value is $E(Y \mid x)$ when $X = x$. An important special case is given by the
following theorem.
Theorem 2.43. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $B \in \mathcal{A}$ be an event with $P(B) > 0$.
Furthermore, let $X(\omega) := 1_B(\omega)$ and $E(Y(\omega) \mid B) := E(Y(\omega) \mid X(\omega) = 1)$, then
\[ E(Y(\omega) \mid B) = \frac{E(Y(\omega) 1_B(\omega))}{P(B)}, \tag{2.60} \]
where $1$ denotes the indicator function.
The proof can be found in [Ash00, p.212].
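The relation between the conditional mean given an event and the indicator function can be verified on a finite probability space. The following Python sketch assumes, for illustration only, a fair six-sided die, the face value as random variable Y and the event B that the face value is even.

```python
from fractions import Fraction

# Finite probability space: outcomes of a fair six-sided die (illustrative).
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def y(w):
    """Random variable Y: the face value."""
    return w

def indicator_b(w):
    """Indicator function of the event B = 'the face value is even'."""
    return 1 if w % 2 == 0 else 0

p_b = sum(prob[w] * indicator_b(w) for w in omega)
e_y_indicator = sum(prob[w] * y(w) * indicator_b(w) for w in omega)
e_y_given_b = e_y_indicator / p_b      # E(Y 1_B) / P(B)
print(p_b, e_y_indicator, e_y_given_b)
```

The result agrees with the direct average of the even face values 2, 4 and 6.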
Bivariate Normal Distribution
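Theorem 2.44. Suppose that $Z_1$ and $Z_2$ are independent random variables, each of which has the
standard normal distribution. Let $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $\rho$ be constants such that $-\infty < \mu_i < \infty$,
$\sigma_i > 0$ for $i = 1, 2$ and $-1 < \rho < 1$. If the random variables $X_1$ and $X_2$ are defined as
\[ X_1 = \sigma_1 Z_1 + \mu_1 \tag{2.61} \]
\[ X_2 = \sigma_2 \left( \rho Z_1 + (1 - \rho^2)^{1/2} Z_2 \right) + \mu_2, \tag{2.62} \]
then the joint PDF of $X_1$ and $X_2$ is
\[ f(x_1, x_2) = \frac{1}{2\pi (1 - \rho^2)^{1/2} \sigma_1 \sigma_2} \exp\left( -\frac{1}{2(1 - \rho^2)} \left( \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right) \left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right) \right). \tag{2.63} \]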
The proof can be found in [DeGro12, Theorem 5.10.1].
Definition 2.45 (Bivariate Normal Distributions). If the joint PDF of two random
variables $X_1$ and $X_2$ is of the form in eq. (2.63), it is said that the random vector
$X = (X_1, X_2)^T$ has a bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$
and $\sigma_2^2$, and correlation $\rho$.
The probability density function of the bivariate normal distribution with $\mu = (0, 0)^T$
and
\[ \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \]
is shown in fig. 2.4. A more general and flexible distribution will be introduced in
section 6.1, whose PDF is shown in fig. 6.2 for the same mean vector $\mu$ and covariance
matrix $\Sigma$.
Figure 2.4: Probability density function of the bivariate normal distribution with
$\mu = (0, 0)^T$ and $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$
The bivariate standard normal distribution is a special case of the bivariate normal
distribution with $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$. In this work, the function $\varphi_2$ with
\[ \varphi_2(x_1, x_2; \rho) := \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left( x_1^2 - 2\rho x_1 x_2 + x_2^2 \right) \right) \tag{2.64} \]
will be used to denote the probability density function and the function $\Phi_2$ with
\[ \Phi_2(x_1, x_2; \rho) := \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \varphi_2(y_1, y_2; \rho) \, dy_1 \, dy_2 \tag{2.65} \]
will be used to denote the cumulative distribution function of the standard bivariate
normal distribution.
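The cumulative distribution function in eq. (2.65) has no closed form, but it can be reduced to a one-dimensional integral by using the construction in eqs. (2.61) and (2.62): conditioning on $Z_1 = z$ leaves a univariate normal probability. The following Python sketch evaluates this integral with Simpson's rule. It is a minimal illustration under stated assumptions (truncation of the lower tail at -8, a fixed number of steps), not the method used in this work, and the delay numbers in the usage example are hypothetical.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bivariate_normal_cdf(x1, x2, rho, steps=2000, lower=-8.0):
    """Approximate P(X1 <= x1, X2 <= x2) for standard normal X1, X2
    with correlation -1 < rho < 1.

    Uses the integral over z <= x1 of
    phi(z) * Phi((x2 - rho * z) / sqrt(1 - rho^2)) dz
    and Simpson's rule with an even number of steps. The lower limit
    truncates a negligible tail.
    """
    upper = min(x1, -lower)
    if upper <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lower + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * phi(z) * Phi((x2 - rho * z) / s)
    return total * h / 3.0

# Usage: probability that two correlated path delays both meet the clock
# period, after standardizing with hypothetical means and standard deviations.
t_clk = 10.0
p_no_fault = bivariate_normal_cdf((t_clk - 9.0) / 0.5, (t_clk - 9.5) / 0.4, 0.5)
print(p_no_fault)
```

For $x_1 = x_2 = 0$ the exact value is $1/4 + \arcsin(\rho)/(2\pi)$, which provides a simple sanity check of the numerical result.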
Multivariate Normal Distribution
The multivariate normal distribution is a generalization of the bivariate normal distri-
bution to random vectors of arbitrary dimension [Czado11].
Definition 2.46. An n-dimensional random vector $X$ is said to have an n-dimensional
normal distribution, if there exists a $\mu \in \mathbb{R}^n$ and an $L \in \mathbb{R}^{n \times n}$ with $\mathrm{rank}(L) = n$, such
that
\[ X = LZ + \mu, \tag{2.66} \]
where $Z = (Z_1, \dots, Z_n)^T$ is a random vector of independent random variables $Z_1, \dots, Z_n$
and $Z_i \sim \mathcal{N}(0, 1)$ for all $i \in \mathbb{N}$ with $1 \le i \le n$.
In this case, $X$ is called normal random vector and it is written $X \sim \mathcal{N}_n(\mu, \Sigma)$, where
$\Sigma = LL^T$.
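The definition also suggests a simple way to draw samples of a normal random vector: factorize the covariance matrix and transform independent standard normal samples. The following Python sketch does this for the two-dimensional case. Choosing the lower-triangular Cholesky factor as the matrix $L$ is an assumption of this example (any full-rank factor with the same product would do), and the seed and the sample size are arbitrary.

```python
import math
import random

def cholesky_2x2(cov):
    """Lower-triangular L with L L^T = cov for a 2x2 positive definite matrix."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def sample_normal_vector(mean, chol, rng):
    """Draw one sample of X = L Z + mu with independent standard normal Z_i."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][j] * z[j] for j in range(len(z)))
            for i, m in enumerate(mean)]

mean = [0.0, 0.0]
cov = [[1.0, 0.5], [0.5, 1.0]]
chol = cholesky_2x2(cov)
rng = random.Random(1)
samples = [sample_normal_vector(mean, chol, rng) for _ in range(20000)]
# Both means are zero, so the average product estimates Cov(X1, X2).
sample_cov = sum(x1 * x2 for x1, x2 in samples) / len(samples)
print(chol, sample_cov)
```

The estimated covariance should be close to 0.5, with a deviation that shrinks as the number of samples grows.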
From eq. (2.66) and theorems 2.24 and 2.37 it follows that $E(X) = \mu$, $\mathrm{Var}(X) = \Sigma$ and
that the family of multivariate normal distributions is closed under affine transforma-
tions. This is formalized in the following theorem.
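Theorem 2.47. If $X \sim \mathcal{N}_n(\mu_X, \Sigma_X)$, $b \in \mathbb{R}^k$ and $A \in \mathbb{R}^{k \times n}$ has rank $k$, then the random
vector
\[ Y = AX + b \tag{2.67} \]
has a k-dimensional normal distribution $\mathcal{N}_k(\mu_Y, \Sigma_Y)$ with parameters
\[ \mu_Y = A \mu_X + b \tag{2.68} \]
\[ \Sigma_Y = A \Sigma_X A^T. \tag{2.69} \]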
The proof can be found in [Gut09, chapter 5, Theorem 3.1].
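Theorem 2.48. Let $\det(\Sigma)$ denote the determinant of the matrix $\Sigma$. If $\mathrm{rank}(\Sigma) = n$ and the
n-dimensional random vector $X \sim \mathcal{N}_n(\mu, \Sigma)$ has a normal distribution with parameters $\mu$ and
$\Sigma$, then the probability density function of $X$ is
\[ \varphi_n(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right). \tag{2.70} \]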
The proof can be found in [Gut09, chapter 5, Theorem 5.2].
If $X \sim \mathcal{N}_n(\mu, \Sigma)$ and $\mathrm{Cov}(X_i, X_j) = 0$ for any $1 \le i < j \le n$, then $X_i$ and $X_j$ are
independent.
2.5 Circuit Modelling
The physical, structural and behavioural aspects of a VLSI circuit can be modelled
at different abstraction levels [Gajsk83]. The algorithms presented in this work are
based on the structural representation at gate level, which provides a suitable trade-off
between the accuracy and the computational cost. The following subsections describe
the structure and the behaviour of any randomly chosen circuit instance $\omega \in \Omega$ (see
section 2.4.1).
2.5.1 Structural Circuit Instance Modelling
The circuit structure at gate level is a representation of the circuit components (gates,
standard cells, ports) and their interconnections. In the following and throughout
this work, only the primitive logic gates BUF, INV, AND, OR, NAND, NOR and the
complex cells XOR, XNOR with at most two inputs are considered.
To formally define the circuit instance model, the circuit netlist is transformed into a
directed graph $G = (V, E)$ with a set of vertices $V$ and a set of edges $E \subseteq V \times V$. Each
vertex represents an input/output pin of a gate/complex cell or an input/output port
of the netlist. An edge either corresponds to a pin-to-pin delay arc within a gate or
represents an interconnection between gates/complex cells or input/output ports of
the netlist. To account for asymmetric rising/falling and conditional path delays, a
tuple of real valued delays is associated with each edge. The construction of the circuit
instance graph is illustrated by the example in fig. 2.5.
Figure 2.5: Structural representation of a circuit instance at gate level.
(a) Representation as netlist, (b) representation as graph
A structural path is a sequence of gates $g_1, \dots, g_n$ such that an input of $g_1$ is connected
to a circuit input, an output of $g_n$ is connected to an observable circuit output and the
output of $g_i$ is connected to an input of $g_{i+1}$ for all $1 \le i < n$. The input of $g_{i+1}$ that
is connected to the output of $g_i$ is called on-path input and any other input of $g_{i+1}$ is
called off-path input.
2.5.2 Behavioural Modelling of Circuit Instance
The circuit instance graph defines the structure of the circuit instance at gate level. This
subsection describes the propagation of transitions through the circuit instance during
the application of a test vector-pair. Each node of the circuit instance is associated with
a waveform, which consists of one or multiple transitions [Yalci97].
Definition 2.49 (transition). A transition is an ordered pair $\tau = (v, t)$, where $v \in \{0, 1\}$
denotes the logic value after the transition and $t \in \mathbb{R}$ denotes the time at which the
transition occurs.
Definition 2.50 (waveform). A waveform $w$ is a non-empty list of transitions
\[ w = ((v_0, t_0), \dots, (v_m, t_m)), \tag{2.71} \]
such that $t_{k-1} < t_k$ and $v_{k-1} \ne v_k$ for all $k \in \mathbb{N}$ with $0 < k \le m$ and $w$ contains at least
the special transition $(v_0, t_0)$ with $t_0 = -\infty$, which defines the initial value $v_0$ of the
waveform.
The initial value of a waveform is defined as the logic value set by the first transition in
the waveform. Likewise, the final value of a waveform is defined by the last transition
in the waveform.
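The waveform definition translates directly into a small data structure. The following Python sketch represents a waveform as a list of (value, time) pairs and checks the conditions of the definition. The function names are hypothetical, and the convention that a waveform already carries the new value at the exact time of a transition is an assumption of this example.

```python
import math
from typing import List, Tuple

Transition = Tuple[int, float]   # (logic value after the transition, time)

def is_valid_waveform(w: List[Transition]) -> bool:
    """Check the conditions of the waveform definition."""
    # Non-empty list that starts with the special transition at time -infinity.
    if not w or w[0][1] != -math.inf:
        return False
    if any(v not in (0, 1) for v, _ in w):
        return False
    # Times strictly increase and consecutive logic values differ.
    return all(w[k - 1][1] < w[k][1] and w[k - 1][0] != w[k][0]
               for k in range(1, len(w)))

def initial_value(w: List[Transition]) -> int:
    return w[0][0]

def final_value(w: List[Transition]) -> int:
    return w[-1][0]

def value_at(w: List[Transition], time: float) -> int:
    """Logic value at `time` (new value assumed from the transition time on)."""
    value = w[0][0]
    for v, t in w[1:]:
        if t <= time:
            value = v
    return value

# A pulse: initially 0, rising at time 2.0, falling at time 3.5.
w = [(0, -math.inf), (1, 2.0), (0, 3.5)]
print(is_valid_waveform(w), initial_value(w), final_value(w), value_at(w, 2.5))
```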
The above definition of a waveform is an abstraction of the voltage waveforms obtained
by SPICE simulations at electrical level. This abstraction provides a suitable trade-off
between the accuracy and the complexity of statistical timing analysis.
The definition of the gate model at this higher abstraction level requires a formal
foundation, which will be motivated by the example given in fig. 1.4a. The figure shows
that a transition at a gate input causes a transition at the gate output only after some
delay. It is also visible that the gate output voltage changes more rapidly in response
to the rising transition at input ’b’ than in response to the falling transition at ’a’.
This behaviour is described by this model at a higher abstraction level as follows. The
arrival time $t$ of a transition $(v, t)$ at a circuit node describes the time at which the
voltage crosses $U_{dd}/2$, where $U_{dd}$ denotes the supply voltage. If a transition $(v_1, t_1)$ at
the input of a gate causes a transition $(v_2, t_2)$ at the output of the gate, then the arrival
time difference $t_2 - t_1$ is called propagation delay. This is illustrated in fig. 2.6 for an
inverter (INV) with input ’i’ and output ’o’. The figure shows that the propagation
delay is measured for a falling and a rising transition at the gate output and that these
delays are different in general. It also defines the rise and fall time of a transition as the
time between neighbouring $0.1\,U_{dd}$ and $0.9\,U_{dd}$ voltage crossing times.
Figure 2.6: Definition of propagation delay and rise and fall times [Rabae03]
For primitive gates with two inputs, the gate delay also depends on which gate input is
switching. This is due to the cell design but also because the slope of the gate input
transitions depends on the circuit structure. For example, one of the gate inputs might
be connected to a fanout-gate, that is, a gate whose output is connected to the inputs of
multiple gates. As a consequence, the voltage at the gate input might change only very
slowly, leading to higher propagation delay. Hence, in this model, each primitive gate
with two inputs is associated with four propagation delay values, depending on which
gate input is switching and if the transition at the gate output is a rising or falling
transition.
For the complex cells XOR/XNOR, the propagation delay also depends on the logic
values at the other gate inputs at the arrival time of the input transition. This is again
due to the cell design and the circuit structure. Hence, each XOR/XNOR cell with
two inputs is associated with eight delay values. From among these delay values, the
appropriate value is selected for each input transition depending on the input at which
the transition occurs, the state of the other gate input and the direction of the output
transition created by the input transition.
A physical gate can only propagate a subset of the transitions at the gate inputs to the
gate output. A transition (v, t) at one of the input nodes of a gate is propagated to
the gate output node if and only if (v, t) satisfies the dynamic sensitization condition and
the inertial delay condition. The conjunction of dynamic sensitization and inertial delay
condition is called propagation condition.
The dynamic sensitization condition is always satisfied for a gate with only a single input
and for XOR/XNOR complex cells. For AND/NAND and OR/NOR gates with two
inputs, the opposite gate input must have the non-controlling value at the time t. A
gate input is said to have a controlling value if it determines the gate output logic value
irrespective of the other input values. The logic complement of the controlling value
is called non-controlling value. For example, for an OR and NOR gate the controlling
value is ’1’ and the non-controlling value is ’0’.
The inertial delay condition captures the physical limitation that a real gate cannot
produce arbitrarily short glitches at the output, as shown in section 1.3.1. Without loss
of generality, the following description assumes that the gate output has stabilized to
logic ’0’ or logic ’1’ by the time the transition $(v, t)$ occurs at one of the gate inputs.
Furthermore, it is assumed that $(v, t)$ satisfies the dynamic sensitization condition
so that the gate can propagate this transition to the gate output after a particular
propagation delay $d$. This delay represents the time the gate requires to charge or
discharge the capacitive load connected to the gate output. The transition can only
appear at the gate output if no other transition $(v', t')$, which also satisfies the dynamic
sensitization condition, occurs at the gate inputs during this time. More precisely, if
there exists an input transition $(v', t')$ that satisfies the dynamic sensitization condition
of the gate and $t < t' < t + d$, then the inertial delay condition is violated by $(v, t)$
and $(v', t')$. Otherwise, $(v, t)$ satisfies the inertial delay condition of the gate and is
propagated to the gate output after delay $d$.
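The propagation condition can be sketched for a two-input AND/NAND or OR/NOR gate as follows. This Python example is a simplified illustration and not the implementation used in this work: it assumes a single propagation delay for all input transitions of the gate (whereas the model above associates four delay values with such a gate), it treats a waveform as already carrying the new value at the exact time of a transition, and it does not handle simultaneous transitions at both inputs. All function names are hypothetical.

```python
import math

def value_at(waveform, time):
    """Logic value of a waveform [(v0, -inf), (v1, t1), ...] at `time`."""
    value = waveform[0][0]
    for v, t in waveform[1:]:
        if t <= time:
            value = v
    return value

def is_sensitized(time, other_waveform, controlling):
    """Dynamic sensitization: the opposite input is non-controlling at `time`."""
    return value_at(other_waveform, time) != controlling

def is_propagated(index, on_wave, off_wave, delay, controlling):
    """Propagation condition for the transition on_wave[index] (index >= 1)."""
    _, t = on_wave[index]
    if not is_sensitized(t, off_wave, controlling):
        return False
    # All sensitized transitions at both inputs (initial values are skipped).
    times = [tp for _, tp in on_wave[1:] if is_sensitized(tp, off_wave, controlling)]
    times += [tp for _, tp in off_wave[1:] if is_sensitized(tp, on_wave, controlling)]
    # Inertial delay: violated if another sensitized transition is closer than
    # the delay. With a single delay value, the violation affects both the
    # earlier and the later transition, hence the symmetric check.
    return not any(0 < abs(tp - t) < delay for tp in times)

# AND gate (controlling value 0), rising transition at input a at time 2.0.
a = [(0, -math.inf), (1, 2.0)]
print(is_propagated(1, a, [(1, -math.inf)], 1.0, 0))             # b stays at 1
print(is_propagated(1, a, [(1, -math.inf), (0, 2.5)], 1.0, 0))   # b falls too soon
print(is_propagated(1, a, [(1, -math.inf), (0, 1.0)], 1.0, 0))   # b already at 0
```

In the second call the falling transition at input b arrives within the propagation delay and would create a glitch shorter than the delay, so the inertial delay condition is violated. In the third call the off-path input already has the controlling value, so the dynamic sensitization condition fails.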

Chapter 3
State of the Art
This chapter discusses the state of the art in fundamental statistical timing analysis
problems that arise in delay test applications under the impact of delay variability.
Recent methods that can be used for the variation-aware evaluation of path delay
fault tests are discussed in section 3.1. The latest advances in the evaluation of
small delay fault tests and their limitations are reviewed in section 3.2. The normal
distribution based SUM and MAX-operations are presented in section 3.3.1. Finally,
recent approaches to improve the efficiency of the statistical SUM and MAX-operations
are summarized in section 3.3.3.
3.1 Evaluation of Path Delay Fault Tests
This section summarizes previous methods that can be used to analyse the sensitization
of a path by a set of test vector-pairs under the impact of delay variations.
3.1.1 Proof of Path Sensitization by Test Vector-Pair
A notable special case are methods that treat gate delays as unknown quantities
but do not explicitly consider the probability distributions of the gate delays. Although
strictly speaking not statistical timing analysis, these methods are nevertheless important
for path delay fault testing to prove that, under the timing-model assumptions, a
particular path (1) can be sensitized or (2) is always sensitized by a test vector-pair under
the impact of delay variations.
To formally describe path sensitization considering unknown gate delays, the authors of
[Ishiu89] propose time-symbolic simulation which treats time as an algebraic expression.
Each circuit node is associated with a set of waveforms, where a particular waveform
occurs depending on the conditions satisfied by the gate delays of the circuit. For
a given test vector-pair, the condition that a path is sensitized is the disjunction of
the conditions associated with those waveforms in which a transition was propagated
along the path. A solution to any of those conditions yields a subset of the set of all
circuit instances $\Omega$ in which the path is sensitized. Although this method works well
for small subcircuits, it is infeasible to evaluate and represent all possible waveforms at
each circuit node in large circuits.
An important special case, where the gate delays are unknown but bounded by some
known upper and lower bounds, has been addressed in [Ishiu90]. The authors propose
to use discrete gate delay values and to enumerate all possible delay values between
the lower and upper bound for each gate. The path sensitization by a given test
vector-pair is evaluated for all possible combinations of delay values. Although the
runtime complexity of this approach is exponential in the number of gates, several
optimizations such as representing all possible waveforms at the circuit nodes with
shared binary decision diagrams, can be applied to efficiently analyse small subcircuits.
However, this approach cannot be directly applied to large circuits due to its high
runtime complexity.
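The enumeration idea can be illustrated with a toy example. The following Python sketch enumerates all combinations of discrete delay values for three hypothetical gates and evaluates a simple, made-up sensitization condition: the on-path transition reaches an AND gate after the delay of g1, while the off-path input leaves its non-controlling value after the delays of g2 and g3. It only demonstrates the distinction between "can be sensitized" and "is always sensitized" and the exponential growth of the search space; it does not reproduce the decision diagram based method of [Ishiu90].

```python
from itertools import product

# Hypothetical discrete delay values between lower and upper bound per gate.
delay_ranges = {"g1": [2, 3], "g2": [1, 2], "g3": [1, 2]}

def on_path_transition_sensitized(d):
    """Toy condition: the on-path transition arrives before the off-path
    input switches to the controlling value."""
    return d["g1"] < d["g2"] + d["g3"]

names = sorted(delay_ranges)
results = []
for combo in product(*(delay_ranges[n] for n in names)):
    results.append(on_path_transition_sensitized(dict(zip(names, combo))))

can_be_sensitized = any(results)      # at least one delay assignment works
always_sensitized = all(results)      # every delay assignment works
fraction_sensitized = sum(results) / len(results)
print(len(results), can_be_sensitized, always_sensitized, fraction_sensitized)
```

The number of combinations is the product of the range sizes and therefore grows exponentially with the number of gates. The fraction of sensitizing combinations is not a realistic probability, since weighting all combinations equally corresponds to independent and uniformly distributed gate delays.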
3.1.2 Probability that Path is Sensitized by Test Vector-Pair
In [Bose07, Jayar13], the authors use the bounded delay model to evaluate the probabil-
ity of path delay fault test invalidation. However, the bounded delay model implicitly
assumes that the gate delays have a uniform distribution and that all gate and path
delays are independent. These assumptions can lead to incorrect results. For example,
a target path might be sensitized if and only if all gates along the target path have
their highest possible delays and the gates of another path all have their lowest possible delays.
In practice, this event is extremely unlikely or even impossible in case of reconvergence
where at least two gates must lie on both paths. The identification of such impossible
cases requires time consuming reconvergent fanout analysis [Chakr99], which must
consider a large number of gate delay value combinations.
The authors of [Jung12] represent a waveform as a sequence of random variables,
where each random variable denotes the arrival time of a rising or falling transition,
respectively. To compute the waveform at the output of the gate, the normal distribution
based SUM and MAX-operations are applied to the waveforms at the gate inputs using
the corresponding gate delay values, which are also represented by random variables.
The main advantage of this method is that it can approximate the delay distribution of
the paths that are sensitized by a given test vector-pair while considering the impact of
the delay variability on the sensitization of the paths. However, this method does not
consider the inertial delay property of the gates and a non-trivial extension would be
required to compute the probability that a particular path is sensitized by a given test
vector-pair under the impact of delay variations.
The most general and flexible approach to analyse the sensitization of a path is a Monte
Carlo simulation of the given test vector-pair considering delay variations [Xie09].
However, it is well known that this approach has a very high computational cost and
might even necessitate the use of (possibly specialized) parallel hardware architectures
for practical purposes [Schne15].
In [Tang14a], the authors propose a statistical simplified transistor model which can
be simulated by solving random differential equations. A voltage waveform is repre-
sented by a piecewise linear function of nominal voltages, where each voltage point
is associated with a sensitivity coefficient, which relates the variation of the voltage
to the variation of the process parameters. Although this method could in theory
be applied to compute the probability that a particular target path is sensitized by
a given test vector-pair, many simplifying assumptions are made in the algorithm
and the experimental setup. In particular, intra-die variations are ignored during the
experimental evaluation and only small circuits are considered.
3.2 Evaluation of Small Delay Fault Tests
This section summarizes previous methods that can be used to analyse the detection
of a given small delay fault by a given set of test vector-pairs under the impact of delay
variations.
The statistical timing analysis of large circuits can be drastically simplified by neglecting
all structural and spatial correlations [Yilma08, Wang09]. This is because the probability
that several events occur simultaneously is equal to the product of the probabilities of
the individual events under the assumption that all events are independent. However,
this assumption is unrealistic and results in large errors, except for tiny delay variations
in classical and mature technology nodes.
Some authors went even further and proposed measures for the "quality" or "effec-
tiveness" of a test vector-pair, which are misleading even under the assumption of
independent path delays. For example in [Peng10, Peng13], the authors compute the
probability distributions of the path delays by a Monte-Carlo simulation. For each
path that is sensitized by a test vector-pair, the probability that the delay of the path is
longer than a predefined threshold, is computed. But then the authors compute the
sum of these probabilities and use it as a measure of how likely a small delay fault
will be detected by the test vector-pair. It can easily be seen that this is an unrealistic
measure because it ignores structural and spatial correlations and the result is no
longer a probability. For example, a test vector-pair which sensitizes two paths that
have all but one gate in common might get a weight of e.g. 0.5 + 0.6 = 1.1, while a test
vector-pair which sensitizes ten slightly shorter but mutually independent paths might
get a weight of 5 · 0.02 + 5 · 0.01 = 0.15. Clearly, the latter test vector-pair is more likely
to detect a randomly chosen small delay fault. Even if the path delays were mutually
independent, the probability that the delay of at least one of these paths exceeds the
predefined threshold would be computed by multiplication and not addition, namely as
$1 - \prod_i (1 - p_i)$, where $p_i$ denotes the probability that the delay of path $i$ exceeds the threshold.
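The difference between the two measures can be made concrete with the numbers of the example above. The following Python sketch compares the sum of the individual probabilities with the probability that at least one path exceeds the threshold. Mutual independence of the path delays is assumed throughout, which is a hypothetical simplification for the two paths that share most of their gates.

```python
import math

def sum_of_probabilities(probs):
    """The criticised weight: a plain sum, which can exceed 1."""
    return sum(probs)

def prob_at_least_one(probs):
    """P(at least one event occurs) for mutually independent events."""
    return 1.0 - math.prod(1.0 - p for p in probs)

two_paths = [0.5, 0.6]
ten_paths = [0.02] * 5 + [0.01] * 5

print(sum_of_probabilities(two_paths), prob_at_least_one(two_paths))
print(sum_of_probabilities(ten_paths), prob_at_least_one(ten_paths))
```

For the two paths the sum is 1.1, which cannot be a probability, while the product form stays below 1. For the ten paths both values are close to each other because all individual probabilities are small.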
In the pioneering work [Ingel11], the authors introduce the "test robustness" metric
for a test vector-pair, considering resistive bridging faults with uniformly distributed
resistance. Given a resistive bridging fault of random resistance that can be detected by
at least one test vector-pair, the authors define the "test robustness" of a test vector-pair
as the probability $P(A)$ of the event $A \in \mathcal{A}$ in which the resistive bridging fault is
detected by the test vector-pair under the impact of process variations. The authors
propose to approximate this probability by computing the conditional probability
$P(A \mid \Omega_i)$ of detecting the resistive bridging fault in a subset of circuit instances $\Omega_i \subseteq \Omega$,
for a large number of pairwise disjoint subsets $\Omega_1 \subseteq \Omega, \dots, \Omega_n \subseteq \Omega$ of circuit instances.
The authors then approximate the probability of detecting the fault with the test
vector-pair ("test robustness") by
\[ P(A) \approx \frac{\sum_{i=1}^{n} P(\Omega_i) P(A \mid \Omega_i)}{\sum_{i=1}^{n} P(\Omega_i)}, \tag{3.1} \]
where $P(\Omega_i)$ denotes the probability that a randomly chosen circuit instance lies in $\Omega_i$.
The expected error of this approximation decreases with the size of the denominator
and becomes negligible if the denominator is close to 1, which can be seen as follows.
Suppose that the subsets $\Omega_1, \dots, \Omega_n$ are pairwise disjoint and $\Omega_1 \cup \dots \cup \Omega_n = \Omega$ holds,
then $\Omega_1, \dots, \Omega_n$ is a partition of $\Omega$. From eq. (2.13) it follows that the denominator in
eq. (3.1) becomes 1 and theorem 2.11 implies the equality of both sides.
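The behaviour of the approximation in eq. (3.1) can be illustrated with exact rational arithmetic. The following Python sketch uses a hypothetical partition into three subsets with made-up probabilities. It evaluates eq. (3.1) once for the complete partition and once for only two of the three subsets.

```python
from fractions import Fraction

def robustness_estimate(subsets):
    """Evaluate eq. (3.1) for pairs (P(Omega_i), P(A | Omega_i))."""
    numerator = sum(p * q for p, q in subsets)
    denominator = sum(p for p, _ in subsets)
    return numerator / denominator

# Hypothetical partition of the set of circuit instances into three subsets.
partition = [(Fraction(1, 2), Fraction(9, 10)),
             (Fraction(3, 10), Fraction(1, 2)),
             (Fraction(1, 5), Fraction(1, 10))]

exact = robustness_estimate(partition)          # denominator equals 1
partial = robustness_estimate(partition[:2])    # denominator equals 4/5
print(exact, partial)
```

With the complete partition the estimate equals the exact probability. If the third subset is omitted, the estimate deviates, because the omitted circuit instances have a much lower detection probability than the covered ones.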
Except for mentioning a SPICE simulation, the authors do not explicitly describe the
computation of the probabilities involved in eq. (3.1) and no information about the
runtime of this approach is given in the paper. Furthermore, it is questionable if the
assumption of a uniformly distributed resistive bridging fault resistance is realistic.
A Monte-Carlo simulation-based approach can naturally consider structural and
spatial correlations even in very complex circuit models. However, it is well known
that this approach is computationally very expensive and even requires the use of
parallel hardware architectures for practical applications. For example, the authors of
[Czutr12, Sauer14] compute the probability of detecting a small delay fault of fixed
size with a given set of test vector-pairs under the impact of process variations by
a Monte-Carlo simulation. The simulation time is reduced by simulating multiple
circuit instances in parallel on a GPGPU. To avoid the large overhead of repeating the
Monte-Carlo simulation of the entire test set after every insertion or removal of a test
vector-pair, the authors store the random delay values of all circuit instances together
with the delay test results for all vector-pairs. However, this requires a large amount of
memory. An alternative approach was taken in [Aftab09], where the authors reduce the
computational cost of the Monte-Carlo simulation through compiled code simulation.
However, the compile time increases rapidly with the circuit size.
After a more careful analysis of the statistical timing analysis problem, it becomes clear
that a Monte-Carlo simulation of the entire circuit is unnecessary for the computation
of the probability of detecting a given small delay fault of fixed size with a given set of
test vector-pairs. Instead, only the critical paths (see definition 2.17), which are also
likely sensitized by the test vector, can have a significant impact on the delay test result.
In [Lee05b], the authors propose to combine a sensitization analysis with a block based
statistical timing analysis approach. At first, a single event driven timing simulation
using only nominal delay values is performed with the given test vector-pair. This is
done to identify the sensitized cone of the circuit on which the subsequent statistical
timing analysis is applied. Afterwards, the distribution of the arrival time of the last
transition at each gate output is computed using block-based statistical timing analysis.
However, this approach uses the normal distribution based SUM and MAX-operations
and therefore requires that all gate delays have a normal distribution. This requirement
is not satisfied in low power applications, where the gate delays are more accurately
approximated by a log-normal distribution [Hanso05]. Furthermore, the distribution
of the size of small delay faults is widely believed to be similar to an exponential
distribution [Nigh00, Hopsc10].
3.3 Statistical SUM and MAX-Operations
The following sections 3.3.1 and 3.3.2 introduce the normal distribution based SUM
and MAX-operations. The main problem of block-based statistical timing analysis
is the limited accuracy of the SUM and the inaccuracy of the MAX-operation in the
recent technology nodes [Ul Ha11], which spurred research into a number of ways to
improve the accuracy. A summary of the most promising approaches is presented in
section 3.3.3.
3.3.1 Normal Distribution based SUM-operation
In the following, the SUM-operation is defined in a more general setting in which k
sums and their linear relationships are computed and each sum involves an arbitrary
subset of n random variables.
Definition 3.1. Let $X = (X_1, \ldots, X_n)^T$ denote an $n$-dimensional normal random vector.
Let $B \in \{0, 1\}^{k \times n}$ be a binary matrix of rank $k$ and let $b_{i,j} \in \{0, 1\}$ denote the element
in the $i$th row and $j$th column of $B$ with $1 \le i \le k$ and $1 \le j \le n$. Then the
normal distribution based SUM-operation computes a $k$-dimensional random vector
$Y = (Y_1, \ldots, Y_k)^T$, defined as
\[
Y_i = \sum_{j=1}^{n} b_{i,j} X_j \quad \text{for all } 1 \le i \le k,
\]
where $b_{i,j} = 1$ if and only if $X_j$ is part of the $i$th sum $Y_i$, otherwise $b_{i,j} = 0$.
In a notable special case, the matrix B is a row vector of zeros and ones so that the
result of the SUM-operation is a random variable Y with univariate normal distribution.
The normal distribution based SUM-operation can always be accurately computed by
Y = BX using theorem 2.47 with b = 0, because the family of multivariate normal
distributions is closed under affine transformations.
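As a minimal sketch of the computation Y = BX, the following pure-Python function returns the mean vector and covariance matrix of Y. It relies on the standard result that an affine image of a normal random vector with mean vector mu and covariance matrix Sigma is again normal, with mean B mu and covariance B Sigma B^T. The function name `sum_operation` and the numbers are hypothetical.

```python
def sum_operation(B, mu, sigma):
    """Return the mean vector and covariance matrix of Y = B X.

    B:     k x n binary matrix as a list of rows
    mu:    mean vector of X (length n)
    sigma: n x n covariance matrix of X as a list of rows
    """
    k, n = len(B), len(mu)
    mean = [sum(B[i][j] * mu[j] for j in range(n)) for i in range(k)]
    # Intermediate product B * Sigma (k x n).
    b_sigma = [[sum(B[i][l] * sigma[l][j] for l in range(n)) for j in range(n)]
               for i in range(k)]
    # Final product (B * Sigma) * B^T (k x k).
    cov = [[sum(b_sigma[i][l] * B[j][l] for l in range(n)) for j in range(k)]
           for i in range(k)]
    return mean, cov


# Hypothetical example: Y1 = X1 + X2 and Y2 = X2 + X3 share the delay X2.
B = [[1, 1, 0],
     [0, 1, 1]]
mu = [1.0, 2.0, 3.0]
sigma = [[1.0, 0.5, 0.0],
         [0.5, 4.0, 0.0],
         [0.0, 0.0, 9.0]]
mean_y, cov_y = sum_operation(B, mu, sigma)
```

The off-diagonal entry of `cov_y` captures the structural correlation between the two sums that arises from the shared variable X2.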
3.3.2 Normal Distribution based MAX-operation
Definition 3.2. Given a random vector $X = (X_1, \ldots, X_n)^T \sim N_n(\mu, \Sigma)$ of $n$-dimensional
normal distribution, where $\mu = (\mu_1, \ldots, \mu_n)^T$ denotes the mean vector and $\Sigma \in \mathbb{R}^{n \times n}$
denotes the covariance matrix. Then the accurate approximation of the $(n-1)$-dimensional
random vector
\[
Y = (Y_1, \ldots, Y_{n-1})^T = (X_1, \ldots, X_{n-2}, \max(X_{n-1}, X_n))^T \tag{3.2}
\]
with a normal distribution $N_{n-1}(\hat{\mu}, \hat{\Sigma})$ is called normal distribution based MAX-operation.
A very fast algorithm for the normal distribution based MAX-operation was presented
by Clark [Clark61]. At first, the algorithm approximates the distribution of the maximum
$Y_{n-1} = \max(X_{n-1}, X_n)$ by a normal distribution. Afterwards, the non-linear
relationship between $Y_{n-1}$ and each remaining random variable $Y_i$ with $1 \le i \le n-2$
is approximated by the covariance $\mathrm{Cov}(Y_i, Y_{n-1})$, which only describes the degree of
linear relationship between $Y_i$ and $Y_{n-1}$.
Let $\sigma_{i,j}$ and $\hat{\sigma}_{i,j}$ denote the element in the $i$th row and $j$th column of $\Sigma$ and $\hat{\Sigma}$, respectively.
Following the definitions
\[
a := \sqrt{\sigma_{n-1,n-1} - 2\sigma_{n-1,n} + \sigma_{n,n}} \tag{3.3}
\]
\[
\alpha := \frac{\mu_{n-1} - \mu_n}{a}, \tag{3.4}
\]
and using eqs. (2.32) and (2.33), the mean and the variance of $Y_{n-1} = \max(X_{n-1}, X_n)$ are
\[
\hat{\mu}_{n-1} = \mu_{n-1}\Phi(\alpha) + \mu_n\Phi(-\alpha) + a\varphi(\alpha) \tag{3.5}
\]
\[
\hat{\sigma}_{n-1,n-1} = (\sigma_{n-1,n-1} + \mu_{n-1}^2)\Phi(\alpha) + (\sigma_{n,n} + \mu_n^2)\Phi(-\alpha)
+ (\mu_{n-1} + \mu_n)\, a\, \varphi(\alpha) - \hat{\mu}_{n-1}^2. \tag{3.6}
\]
The mean vector $\hat{\mu}$ of $Y$ is obtained by removing the last and replacing the second last
component of $\mu$, such that the last component of $\hat{\mu}$ is $\hat{\mu}_{n-1}$.
Likewise, the covariance matrix $\hat{\Sigma}$ of $Y$ is obtained from $\Sigma$ by removing the last and
replacing the second last row and column. The variance $\hat{\sigma}_{n-1,n-1}$ of $Y_{n-1}$ is located
in the bottom right corner of $\hat{\Sigma}$. Each of the remaining elements of the last row and
column of $\hat{\Sigma}$ is replaced with the covariance between $\max(X_{n-1}, X_n)$ and $X_i$, which is
computed by
\[
\hat{\sigma}_{n-1,i} = \hat{\sigma}_{i,n-1} = \sigma_{i,n-1}\Phi(\alpha) + \sigma_{i,n}\Phi(-\alpha) \quad \text{for all } 1 \le i \le n-2. \tag{3.7}
\]
Although the computational cost of this algorithm is very small, it can be further
reduced by separate handling of boundary cases like $\Phi(\alpha) \approx 1$ for $\alpha > 4$ [Kuruv13].
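The steps of eqs. (3.3) to (3.7) can be sketched in a few lines of pure Python. This is an illustrative sketch and not the implementation used in the thesis: the function names are hypothetical, the cumulative distribution function and the density of the standard normal distribution are evaluated with `math.erf` and `math.exp`, and the degenerate case a = 0 is rejected instead of being handled as a boundary case.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu, sigma):
    """Replace the last two components of X by a normal approximation of their maximum.

    mu:    mean vector of X (length n >= 2)
    sigma: n x n covariance matrix of X as a list of rows
    Returns the mean vector and covariance matrix of the (n-1)-dimensional result.
    """
    n = len(mu)
    if n < 2:
        raise ValueError("at least two random variables are required")
    p, q = n - 2, n - 1                      # 0-based indices of X_{n-1} and X_n
    var_diff = sigma[p][p] - 2.0 * sigma[p][q] + sigma[q][q]
    if var_diff <= 0.0:
        raise ValueError("degenerate difference; treat this boundary case separately")
    a = math.sqrt(var_diff)                  # eq. (3.3)
    alpha = (mu[p] - mu[q]) / a              # eq. (3.4)
    cdf_pos, cdf_neg = std_normal_cdf(alpha), std_normal_cdf(-alpha)
    pdf = std_normal_pdf(alpha)

    mean_max = mu[p] * cdf_pos + mu[q] * cdf_neg + a * pdf            # eq. (3.5)
    second_moment = ((sigma[p][p] + mu[p] ** 2) * cdf_pos
                     + (sigma[q][q] + mu[q] ** 2) * cdf_neg
                     + (mu[p] + mu[q]) * a * pdf)
    var_max = second_moment - mean_max ** 2                           # eq. (3.6)

    mu_hat = list(mu[:p]) + [mean_max]
    sigma_hat = [list(row[:p]) + [0.0] for row in sigma[:p]] + [[0.0] * (p + 1)]
    for i in range(p):
        cov = sigma[i][p] * cdf_pos + sigma[i][q] * cdf_neg           # eq. (3.7)
        sigma_hat[i][p] = cov
        sigma_hat[p][i] = cov
    sigma_hat[p][p] = var_max
    return mu_hat, sigma_hat


# Hypothetical example: X2 and X3 are independent standard normal variables,
# X1 is correlated with both of them.
mu_hat, sigma_hat = clark_max(
    [0.0, 0.0, 0.0],
    [[1.0, 0.3, 0.1],
     [0.3, 1.0, 0.0],
     [0.1, 0.0, 1.0]],
)
```

For two independent standard normal variables the exact mean and variance of the maximum are 1/sqrt(pi) and 1 - 1/pi, which the sketch reproduces. Only the shape of the distribution is approximated by the normal distribution.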
Suppose $X = (X_1, \ldots, X_n)^T$ is a random vector of critical target path delays and
the normal distribution based MAX-operation is repeatedly applied to approximate
$\max(X_1, \ldots, X_n)$ with a random variable $Z$ of univariate normal distribution. A critical
target path is a target path of a test vector-pair that also satisfies eq. (2.25) of definition 2.17.
Let the absolute error $e$ of this approximation be defined as the maximum
absolute difference
\[
e = \max_{t \in \mathbb{R}} \left| P(Z \le t) - P(\max(X_1, \ldots, X_n) \le t) \right|. \tag{3.8}
\]
A box plot (see section 2.4.2) of the absolute error $|e|$ is shown in fig. 3.1 in percent for
a large number of critical target path sets of various sizes. It is apparent that using
the normal distribution based MAX-operation to compute the statistical maximum can
result in a large absolute error of more than 20%. Suppose the number n of critical target
path delays is between 64 and 127, then the absolute error caused by approximating
the distribution of max(X1, . . . ,Xn) by repeated application of the normal distribution
based MAX-operation was found to be between 5% and 10% in about 50% of the
conducted experiments.
Figure 3.1: Box plot of the absolute error $|e|$ of the approximation of the distribution
of $\max(X_1, \ldots, X_n)$ by using the normal distribution based MAX-operation, where
$X_1, \ldots, X_n$ are critical target path delays. Horizontal axis: number of critical target
path delays (bins 4-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1023). Vertical axis:
absolute error [%], from 0 to 25.
3.3.3 Latest Approaches to Improve Accuracy of MAX-operation
In [Liou01, Devga03], the authors model gate delays as discrete random variables,
described by a joint probability mass function. However, the computational cost of the
MAX-operation increases exponentially in the number of possible delay values due to
structural and spatial correlations.
In [Chopr06], the authors define the MAX-operation based on the two-piece normal
distribution, which has to be distinguished from the skew-normal distribution proposed
by Azzalini [Azzal13]. This approach is fundamentally limited by the unfortunate
choice of the bivariate two-piece normal distribution as the underlying joint distribution
of the operands of the MAX-operation, because this distribution cannot accurately
represent the linear relationship between the random variables. Also, the relationship
between $\max(X_{n-1}, X_n)$ and the remaining random variables $X_1, \ldots, X_{n-2}$ is ignored.
The authors of [Cheng08] use the identity
\[
\max(X_i, X_j) = X_i + \max(0, X_j - X_i) \tag{3.9}
\]
and estimate the coefficients of a second order polynomial to approximate the distribution
of $\max(0, X_j - X_i)$. However, this significantly complicates the approximation
of $\max(X_i, X_j)$ due to the singularity of the distribution of $\max(0, X_j - X_i)$ at 0.
Even more importantly, this identity creates a non-linear relationship between $X_i$ and
$\max(0, X_j - X_i)$, which clearly depends on the condition $X_j > X_i$. In fact, this approach
transforms the problem of accurately computing the distribution of the maximum of
correlated random variables into a new and equally hard problem of computing the sum
of $X_i$ and $\max(0, X_j - X_i)$.
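The singularity at 0 can be made explicit with a short worked step. It assumes that $(X_i, X_j)$ has a bivariate normal distribution with means $\mu_i, \mu_j$ and covariance entries $\sigma_{i,i}, \sigma_{i,j}, \sigma_{j,j}$, that $\Phi$ denotes the cumulative distribution function of the standard normal distribution, and that the difference $X_j - X_i$ is not degenerate:
\[
P\bigl(\max(0, X_j - X_i) = 0\bigr) = P(X_j \le X_i) = \Phi\!\left(\frac{\mu_i - \mu_j}{\sqrt{\sigma_{i,i} - 2\sigma_{i,j} + \sigma_{j,j}}}\right).
\]
Under these assumptions the distribution of $\max(0, X_j - X_i)$ has a point mass of this size at 0 and a continuous part on the positive half-axis. For equal means the point mass is 0.5.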
A Gaussian mixture model based approach to represent the time and slope/slew of
transitions was studied in [Tsuki11]. The computation utilizes the algorithm for the
normal distribution based MAX-operation. However, accurately fitting mixture models
is known to be very difficult due to the higher number of parameters. The benefit
of this MAX-operation in terms of accuracy could only be demonstrated for certain
extreme cases. This approach also ignores the relationship between $\max(X_{n-1}, X_n)$ and
the remaining random variables $X_1, \ldots, X_{n-2}$.
In [Vijay14], the authors investigate the use of the skew-normal distribution based
MAX-operation. However, the presented algorithm only considers the last two random
variables $X_{n-1}$ and $X_n$ and ignores the relationship between $\max(X_{n-1}, X_n)$ and the
remaining random variables $X_1, \ldots, X_{n-2}$.
Chapter 4
Probabilistic Sensitization Analysis
This chapter presents the novel probabilistic sensitization analysis, which efficiently
detects and analyses path delay fault test invalidation for a given test vector-pair. Some
of the techniques described here are used in the next chapter for the computation of
the target paths delay fault probability (see definition 1.5).
Section 4.1 formally defines the probabilistic sensitization analysis problem. A method
to detect the invalidation of a path delay fault test is introduced in section 4.2. Section 4.3
presents an efficient analysis algorithm, which identifies the paths that control the
sensitization of a target path during the application of a test vector-pair under the
impact of delay variations. To evaluate the impact of path delay fault test invalidation
on the delay test with a given test vector-pair, a representative subcircuit is constructed
in section 4.4, which enables the efficient Monte-Carlo simulation of the test vector-pair.
The experimental results are presented in section 7.3.
4.1 Probabilistic Sensitization Analysis Problem
The problem of path delay fault test invalidation is addressed by solving the underlying
fundamental statistical timing analysis problem, which is stated as follows.
Definition 4.1 (Representative Subcircuit). Let $\nu$ be a test vector-pair that can sensitize
any path from a set of critical paths P with non-negligible probability. Let $q \in Q$
denote a randomly chosen circuit instance and let S be a subcircuit of q of well-defined
fixed structure, such that S contains at least the paths P and all floating inputs of S
are set to either constant ’0’ or ’1’. The subcircuit S is called representative subcircuit for
$\nu$, if the probability that P contains a critical path that is sensitized by $\nu$ in exactly one
of S and q is below a given threshold.
It is important to emphasize that the structure of the subcircuit S is fixed but S always
"inherits" the propagation delay values from the circuit instance it is compared to.
Suppose $q_1, q_2 \in Q$ with $q_1 \neq q_2$ are two distinct circuit instances, then the propagation
delays of the fixed set of gates in S are equal to those of q1 or equal to those of q2,
depending on whether S is compared to q1 or to q2.
Definition 4.2. Finding the minimal representative subcircuit for a given test vector-pair
is called the probabilistic sensitization analysis problem.
A simple analytical method for the computation of the representative subcircuit is
explained by the example circuit shown in fig. 4.1, which has inputs ’a’,’b’,’c’ and ’d’
and output ’l’. A test vector-pair $((1, 1, 0, 1)^T, (0, 0, 1, 0)^T)$ is applied to the circuit inputs,
which is supposed to sensitize the path d-h-k-l. The sensitization of the path depends
on the waveforms at its off-path inputs ’c’ and ’j’. The transition at the off-path input ’c’ is
known to occur at time 0. If X3 denotes the delay of the inverter u3, then X3 > 0 must
hold for the rising transition at ’h’ to be propagated to the gate output ’k’, which is
always satisfied.
The analysis of the off-path input ’j’ is more complex. At first, it is determined if a glitch
can occur at ’i’. Let X2 and X5 denote the delay of u2 and u5, respectively. Furthermore,
let X6 denote the delay of u6 to propagate the falling transition from ’g’ to ’i’. Then
X5 + X6 < X2 (4.1)
is necessary for the glitch to occur at ’i’. Given the probability distribution of the
random vector $(X_2, X_5, X_6)^T$, the probability of the event defined by eq. (4.1) can be
computed. In this example, this probability is extremely small so that ’i’ can be assumed
to hold the constant logic value ’0’. If X1 denotes the delay of u1 and X7 denotes the delay of u7
to propagate the transition from ’e’ to ’j’, then a falling transition will occur at ’j’ at
time X1 + X7. Similarly, a falling transition will occur at ’k’ at time X3 + X4, where X4
denotes the delay of u4 to propagate the transition from ’h’ to ’k’. Then
X1 + X7 < X3 + X4 (4.2)
is necessary for the path d-h-k-l to be sensitized, that is, for the falling transition at
’k’ to be propagated to the circuit output ’l’. Given the probability distribution of
the random vector $(X_1, X_3, X_4, X_7)^T$, the probability of the event defined by eq. (4.2)
can be computed. Suppose $P(X_1 + X_7 < X_3 + X_4) \approx 0.5$, then the sensitization of the
path d-h-k-l clearly depends on the path a-e-j, which must therefore be part of the
representative subcircuit. In this case, the path a-e-j is said to be a controlling path of the
path d-h-k-l.
Definition 4.3 (Controlling Path). A path $p'$ in the input cone of a path $p$ is said to
be a controlling path of $p$, if in a randomly chosen circuit instance $q \in Q$ a random
variation of the delays of the gates on $p'$ changes the sensitization of $p$ (that is, $p$ becomes
sensitized/not sensitized) with non-negligible probability.
Figure 4.1: Circuit and representative subcircuit (red) for a given test vector-pair
Any test invalidation mechanism is immediately apparent from the structure of the
representative subcircuit. In fact, the delay test generation process is not only made
aware of precisely those off-path inputs of the sensitized critical paths where test
invalidation is likely to occur, but the invalidation is also explained by the responsible
controlling paths. Beyond applying more stringent sensitization conditions directly to
the identified off-path inputs, the delay test generation can also prevent the sensitization
of the controlling paths that are responsible for the test invalidation.
For any critical path p, the probability that p is sensitized by the test vector-pair in
a randomly chosen circuit instance can be efficiently computed with a Monte-Carlo
simulation of the representative subcircuit. It will be shown that the construction and
Monte-Carlo simulation of the subcircuit is much faster than a Monte-Carlo simulation
of the complete circuit.
The analytical approach shown here works well for small circuits, but becomes very
complex for larger circuits. A more efficient method, which scales well to large circuits,
will be developed in the following sections of this chapter.
4.2 Identification and Comparison of Sensitized Paths
For a given circuit instance and a test vector-pair, this section describes the efficient
identification of the sensitized paths, along which transitions propagate from the circuit
inputs to the circuit outputs.
At first, the given circuit instance is simulated with the given test vector-pair. During
the simulation, each transition at the output of a gate is assigned a reference to its
predecessor transition at one of the inputs of the gate. These references are stored such
that after the simulation, any sensitized path can be efficiently identified by following
the reference of a transition to its predecessor transition until a circuit input is reached.
Alternatively, if the trace starts at a circuit input, then the transition is traced to the
first gate on the path. To identify the output transition corresponding to the input
transition, it is checked whether a transition $\tau'$ at the gate output has a reference to the
transition $\tau$ at the input of the gate. If no such output transition $\tau'$ exists, then $\tau$
violates the propagation condition of the gate and the path terminates at the gate input.
This process is repeated until the end of the path has been reached.
To efficiently identify and compare the sensitized critical paths in different circuit
instances, a hash from the structural information of each sensitized critical path is
computed and used for identification.
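The predecessor references and the path hash can be sketched as follows. The data layout, the names `Transition`, `trace_back` and `path_hash`, and the use of SHA-256 over the node names are assumptions made for this illustration; the text only states that a hash of the structural information of the path is used.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transition:
    node: str                              # circuit node at which the transition occurs
    time: float                            # time of the transition
    value: int                             # logic value after the transition
    pred: Optional["Transition"] = None    # predecessor transition at a gate input


def trace_back(transition):
    """Follow the predecessor references until a circuit input is reached."""
    nodes = []
    while transition is not None:
        nodes.append(transition.node)
        transition = transition.pred
    nodes.reverse()
    return tuple(nodes)


def path_hash(nodes):
    """Hash that depends only on the structure of the path, not on the delays."""
    return hashlib.sha256("/".join(nodes).encode("utf-8")).hexdigest()


def simulate_example(delays):
    """Hypothetical stand-in for a timing simulation of the path d-h-k-l."""
    tau_d = Transition("d", 0.0, 0)
    tau_h = Transition("h", tau_d.time + delays[0], 1, tau_d)
    tau_k = Transition("k", tau_h.time + delays[1], 0, tau_h)
    return Transition("l", tau_k.time + delays[2], 1, tau_k)


path_a = trace_back(simulate_example([1.0, 1.1, 0.9]))
path_b = trace_back(simulate_example([1.2, 0.8, 1.0]))
```

Because the hash ignores the transition times, the same structural path receives the same identifier in circuit instances with different delays.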
4.3 Analysis of Target Path Sensitization by Test Vector-Pair
This section details the probabilistic sensitization analysis of a single target path, which
is sensitized by a given test vector-pair in the nominal circuit instance $q_0$. The analysis is
performed on $n$ randomly selected circuit instances $q_1, \ldots, q_n \in Q$. This approach
is extended in section 4.4 to all critical paths that are sensitizable by the given test
vector-pair considering delay variations. The result of the analysis is a subcircuit
consisting of at least the target path. If the target path is not sensitized in at least one
circuit instance, then this subcircuit is extended by the controlling paths of the target
path.
For any $i \in \mathbb{N}$ with $1 \le i \le n$, the basic principle of the analysis is to compare and
analyse the path sensitization in a circuit instance $q_i \in Q$ with the path sensitization in
the subcircuit S, which initially contains only the target path.
Definition 4.4. A path is said to be consistently controlled by a test vector-pair in a
circuit instance q and the subcircuit S, if the path is sensitized either in both the circuit
instance q and the subcircuit S or in neither of them. Otherwise, the path is said to be
inconsistently controlled in q and S.
If the target path is inconsistently controlled in a circuit instance qi and the subcircuit
S , then the sensitization of the target path depends on a structural path that is missing
in S . The proposed analysis explains an inconsistently controlled target path by
tracing waveform inconsistencies inside the input cones of the gate that blocks
the propagation of the transition along the path in either the simulation of qi or the
simulation of S.
Definition 4.5. If $\tau$ denotes a transition which is propagated along an inconsistently
controlled path, then the waveform at the terminal node of the path is said to have the
waveform inconsistency $\tau$.
An example of a waveform inconsistency is a transition that is part of a glitch which
exists in exactly one of the simulation of qi and the simulation of S.
Once a waveform inconsistency $\tilde{\tau}$ has been identified, the analysis is applied to the
inconsistently controlled path along which $\tilde{\tau}$ was propagated. In each step, the proposed
analysis moves closer to the circuit inputs until a sensitized path, which is missing in
S , is found. Afterwards, the subcircuit S is extended with the corresponding structural
path by adding all gates and interconnects of that path. All unspecified off-path inputs
are set to the logic values, which they had during the simulation of the circuit instance
qi at the time the transition occurred at the on-path input of the gate.
In some cases, multiple paths jointly determine the sensitization of the target path in
the given circuit instance. Furthermore, the extension of the subcircuit made by the
analysis for one circuit instance might necessitate further extensions of the subcircuit
for other circuit instances. Therefore, the simulation of the subcircuit and the proposed
analysis is repeatedly applied to all circuit instances until the target path is consistently
controlled in the subcircuit S and all circuit instances q1, . . . , qn. The number of circuit
instances n is chosen such that the probability that the target path is inconsistently
controlled in any circuit instance $q \in Q$ and the subcircuit S is sufficiently small.
The following subsections present the details of the proposed analysis. For simplicity,
it is assumed that the inconsistently controlled path under analysis is sensitized only in
the circuit instance qi but not in the subcircuit S. However, the same description also
applies to the opposite case by swapping qi with S . The tracing of an inconsistently
controlled path is explained in section 4.3.1. The analysis of the propagation condition
of a gate is detailed in section 4.3.2.
4.3.1 Tracing of Inconsistently Controlled Path
To explain an inconsistently controlled target path, the propagation of the transition
is traced backwards starting at an observable circuit output node. In each step, the
transition at the input of the gate is identified by following the reference from the
output transition. Suppose the path is sensitized by the given test vector-pair in qi but
not in S , then the backtracing continues as long as the following two conditions are
satisfied:
(i) The gate input node is part of the subcircuit S .
(ii) The transition at the gate input occurs in qi but is missing in the subcircuit S .
If the first condition (i) is violated, then at least one gate of the currently traced path
must be missing in the subcircuit S. If the second condition (ii) is violated, then the
gate input transition $\tau$ is only propagated through this gate in the simulation of qi,
but not in the simulation of S. By definition, the circuit instance qi and the subcircuit
S always share the same gate and interconnect delays. This implies that a waveform
inconsistency $\tilde{\tau}$ must exist at one of the inputs of the gate, which causes $\tau$ to violate
the propagation condition during the simulation of S. After the responsible waveform
inconsistency $\tilde{\tau}$ has been identified, the analysis continues by backwards tracing $\tilde{\tau}$
in the same manner. The identification of the responsible waveform inconsistency is
explained in the next subsection.
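The backtracing loop with the two conditions (i) and (ii) can be sketched as follows. The data layout and all names are hypothetical. In particular, the sketch assumes that a transition occurs in both simulations exactly if its node, logic value and time coincide, which relies on the circuit instance and the subcircuit sharing the same delays.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transition:
    node: str
    value: int
    time: float
    pred: Optional["Transition"] = None    # predecessor transition in the simulation of q_i

    def key(self):
        return (self.value, self.time)


def backtrace(start, sub_nodes, sub_waveforms):
    """Trace a transition that occurs in q_i but not in S towards the circuit inputs.

    start:         transition observed only in the simulation of q_i
    sub_nodes:     set of nodes that belong to the subcircuit S
    sub_waveforms: node -> set of (value, time) keys observed in the simulation of S
    Returns (reason, transition) for the gate input at which the trace stops.
    """
    current = start
    while current.pred is not None:
        gate_input = current.pred
        if gate_input.node not in sub_nodes:
            # Condition (i) is violated: a part of the path is missing in S.
            return "missing_structure", gate_input
        if gate_input.key() in sub_waveforms.get(gate_input.node, set()):
            # Condition (ii) is violated: the gate blocks the transition only in S.
            return "blocked_in_subcircuit", gate_input
        current = gate_input
    return "reached_circuit_input", current


# Hypothetical transitions along the path a-e-j in the circuit instance q_i.
tau_a = Transition("a", 0, 0.0)
tau_e = Transition("e", 0, 1.1, tau_a)
tau_j = Transition("j", 0, 2.3, tau_e)

# Case 1: S contains only the target path d-h-k-l and the off-path node j.
case_missing = backtrace(tau_j, {"d", "h", "k", "l", "j"}, {})

# Case 2: S already contains a-e-j, but the transition stops at the gate driving j.
case_blocked = backtrace(
    tau_j,
    {"a", "e", "j", "d", "h", "k", "l"},
    {"a": {(0, 0.0)}, "e": {(0, 1.1)}},
)
```

In the first case the subcircuit would be extended by the missing structural path. In the second case the analysis of the propagation condition of the blocking gate takes over.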
4.3.2 Analysis of Transition Propagation Condition
This subsection details the analysis of the propagation condition of the gate, which
blocks the propagation of a transition along the currently traced path. Let $\tau = (v, t)$ be
a transition at an input node of the gate, which was propagated along a consistently
controlled path. However, $\tau$ satisfies the propagation condition of the gate, without
loss of generality, only in the simulation of the circuit instance qi, causing a gate output
transition $\tau' = (v', t')$, but not in the simulation of the subcircuit S. Then the proposed
analysis must identify a responsible waveform inconsistency $\tilde{\tau}$ at one of the inputs of
the gate.
Analysis of Dynamic Sensitization Condition
At first, the waveform at the off-path input node of the gate is analysed to check if $\tau$
violates the dynamic sensitization condition in the simulation of the subcircuit S. If
that is the case, then at least one waveform inconsistency must exist at the off-path
input and the analysis proceeds by tracing the waveform inconsistency $\tilde{\tau}$, which occurs
closest to $\tau$ at the off-path input. Otherwise, $\tau$ must violate the inertial delay condition
of the gate in the simulation of S.
Let $\tau = (v, t)$ denote a transition at the on-path input node of the gate at time $t$. For
the dynamic sensitization condition to be satisfied, the off-path input node of the gate
must have the non-controlling value at the time $t$. More generally, to also cover
XOR/XNOR gates, the off-path input node must have the same logic value during
the simulation of qi and S at time $t$. If this condition is not satisfied, then at least
one waveform inconsistency must exist at the off-path input node and the analysis
proceeds by tracing the waveform inconsistency $\tilde{\tau}$ at the off-path input node. If multiple
waveform inconsistencies exist at the off-path input, then the one closest to time $t$ is
selected. This condition is always satisfied for logic gates with only a
single input.
A simple example is given by the last NOR gate (u8) in fig. 4.1 with a falling transition
at each gate input. Figure 4.2 shows the input and output waveforms of the gate during
the simulation of the circuit instance qi, where the transition $\tilde{\tau}$ at the off-path input ’j’ of
the target path occurs after the transition $\tau$ at the on-path input ’k’. As a consequence,
$\tau$ violates the dynamic sensitization condition of the gate so that the target path is
not sensitized. Suppose the path a-e-j does not exist in the subcircuit, then $\tilde{\tau}$ occurs
in the simulation of qi but not in the simulation of S and is therefore a waveform
inconsistency. Consequently, the analysis proceeds by backwards tracing $\tilde{\tau}$ and then
finally adds the path a-e-j to the subcircuit.
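The comparison of the off-path input in both simulations can be sketched as follows. Waveforms are represented here as an initial logic value plus a list of (time, new value) pairs, which is an assumption of this sketch, as are all names and numbers.

```python
def value_at(initial, waveform, t):
    """Logic value of a node at time t; a transition exactly at t counts as applied."""
    value = initial
    for time, new_value in sorted(waveform):
        if time > t:
            break
        value = new_value
    return value


def closest_inconsistency(wave_q, wave_s, t):
    """Transition that occurs in exactly one waveform and lies closest to time t."""
    differing = sorted(set(wave_q) ^ set(wave_s))
    if not differing:
        return None
    return min(differing, key=lambda transition: abs(transition[0] - t))


def check_off_path_input(initial_q, wave_q, initial_s, wave_s, t):
    """Return (consistent, inconsistency) for the off-path input at time t."""
    if value_at(initial_q, wave_q, t) == value_at(initial_s, wave_s, t):
        return True, None
    return False, closest_inconsistency(wave_q, wave_s, t)


# Hypothetical numbers for the NOR gate u8: the on-path transition at 'k'
# occurs at time 2.0. In q_i the off-path input 'j' falls at time 2.3, in S
# the floating input 'j' is tied to the non-controlling value 0.
consistent, inconsistency = check_off_path_input(
    initial_q=1, wave_q=[(2.3, 0)],
    initial_s=0, wave_s=[],
    t=2.0,
)
```

In this example the off-path input has different logic values in the two simulations at the time of the on-path transition, and the late falling transition at ’j’ is selected for further tracing.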
Analysis of Inertial Delay Condition
If $\tau$ satisfies the dynamic sensitization condition but $\tau$ is not propagated to the gate
output, then $\tau$ must violate the inertial delay condition. For an inverter or a buffer, this
implies that the waveform at the gate input node must contain at least one waveform
inconsistency. Then the proposed analysis identifies and starts tracing the last waveform
inconsistency $\tilde{\tau}$ before the transition $\tau' = (v', t')$ occurs at the gate output in qi.
Figure 4.2: Transition $\tau$ violates and $\tilde{\tau}$ satisfies dynamic sensitization condition of
NOR gate
The analysis of the inertial delay condition for a logic gate with two inputs is more
complex. Clearly, only those input transitions which satisfy the dynamic sensitization
condition can have an impact on the gate output waveform. From among those
transitions, only those which occur before or directly after $\tau$ but no later than $\tau'$ are
relevant for the propagation of $\tau$. Let $\Lambda_{q_i}$ and $\Lambda_S$ denote the sets of relevant transitions
at the inputs of the gate in the simulation of qi and S, respectively. Then a waveform
inconsistency must exist in $\Lambda_{q_i} \cup \Lambda_S$, which is responsible for $\tau$ satisfying the inertial
delay condition of the gate in either the simulation of qi or the simulation of S, but not
in both.
A simple example is provided by the XNOR gate u7 in fig. 4.1, under the assumption
that the subcircuit S also contains the path c-g-i-j-l with ’f’ set to logic ’0’. In other
words, the subcircuit S is obtained from the complete circuit, depicted in fig. 4.1, by
removing the path b-f and setting ’f’ to logic ’0’. The waveforms at the inputs and
output of gate u7 are shown in fig. 4.3 for random gate delays. The problem is that the
glitch at the output ’i’ of u6 does not appear in the simulation of the circuit instance qi,
so that ’i’ remains constant zero. However, in the simulation of the subcircuit S , the
falling transition at ’g’ is propagated through u6, because ’f’ is defined as constant ’0’.
As a consequence, a rising transition $\tilde{\tau}$ appears at ’i’ shortly after the rising transition $\tau$
at ’e’. In this example it is assumed that the time between $\tau$ and $\tilde{\tau}$ is too short for a
glitch to appear at the output ’j’ so that $\tau$ violates the inertial delay condition.
For the example given above, the sets of relevant transitions are
\[
\Lambda_{q_i} = \{\tau\}, \qquad \Lambda_S = \{\tau, \tilde{\tau}\}.
\]
The proposed analysis then explains the violation of the inertial delay condition by
$\tau$ with the first waveform inconsistency in $\Lambda_{q_i} \cup \Lambda_S$ after $\tau$. If no such waveform
inconsistency exists, then the last waveform inconsistency in $\Lambda_{q_i} \cup \Lambda_S$ before $\tau$ is selected
instead. In this example, the only waveform inconsistency in $\Lambda_{q_i} \cup \Lambda_S$ is $\tilde{\tau}$, which occurs
directly after $\tau$.
(a) Transition $\tau$ is propagated in circuit instance qi
(b) Transition $\tau$ violates inertial delay condition in subcircuit S
Figure 4.3: Input and output waveforms of XNOR gate u7 in the circuit instance
and its subcircuit after the simulation of a test vector-pair
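The selection rule for the inertial delay condition can be sketched as follows. Relevant transitions are represented as hypothetical (time, node, value) tuples, and a waveform inconsistency is taken to be any transition that is contained in exactly one of the two sets.

```python
def select_inconsistency(relevant_q, relevant_s, t):
    """Select the waveform inconsistency that explains the inertial delay violation.

    relevant_q, relevant_s: sets of (time, node, value) tuples with the relevant
                            transitions in the simulation of q_i and of S
    t:                      time of the on-path transition
    Returns the first inconsistency after t, else the last one not after t,
    else None if both sets are identical.
    """
    differing = sorted(set(relevant_q) ^ set(relevant_s))
    after = [transition for transition in differing if transition[0] > t]
    if after:
        return after[0]
    before = [transition for transition in differing if transition[0] <= t]
    return before[-1] if before else None


# Hypothetical times for the XNOR gate u7: the rising transition at 'e' is
# relevant in both simulations, the rising transition at 'i' only in S.
tau = (1.0, "e", 1)
tau_tilde = (1.05, "i", 1)
selected = select_inconsistency({tau}, {tau, tau_tilde}, tau[0])
```

For the sets of the example above the sketch selects the transition at ’i’, which is the only transition that occurs in exactly one of the two simulations.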
4.4 Construction of Representative Subcircuit
Under the impact of large delay variations, all possible delay tests of a target path
may be affected by path delay fault test invalidation. In this case, it is necessary to
identify those delay tests that have a sufficiently low probability of test invalidation
or to apply a suitable combination of path delay fault tests to the target path. All
necessary probabilities can be efficiently obtained from a Monte-Carlo simulation of a
small representative subcircuit, which will be constructed in this section.
The idea is to apply the proposed analysis in section 4.3 to all critical paths that can
be sensitized by the given test vector-pair with non-negligible probability. It will be
shown that the construction and Monte-Carlo simulation of the resulting subcircuit is
much faster than the Monte-Carlo simulation of the complete circuit. One important
observation is that the result of a delay test with a given test vector-pair is uniquely
determined by only the critical paths that are sensitized by the test vector-pair in the
circuit instance. A formal definition of a critical path is given by definition 2.17.
Given a circuit instance qi and a test vector-pair, the analysis starts by identifying
all sensitized critical paths in qi and in the subcircuit S . A critical path might be
sensitized in qi but not in S for two possible reasons. The first reason is that the
corresponding structural path is missing in S and must be added to S . Otherwise, the
path is inconsistently controlled and the sensitization analysis described in section 4.3
is applied to that path.
It is also possible for a critical path to be sensitized only in the subcircuit S but not in
the circuit instance qi. In that case, it can only be an inconsistently controlled path and
the sensitization analysis described in section 4.3 is applied to that path.
To repeat the simulation of the subcircuit S after adding one or more paths, the
subcircuit is reconstructed from all gates and interconnects which lie on at least one
of the paths of the subcircuit. All floating gate input nodes are set to their respective
non-controlling values. For the floating off-path input node of an XOR/XNOR gate, the
logic value the off-path input had at the time of the on-path transition is used. Finally,
all floating circuit outputs are set to the value observed at the clock cycle time.
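The reconstruction step can be sketched as follows. The netlist format, the function name and the gate types of the example are hypothetical; in particular, the gate types assumed for the nodes ’k’ and ’e’ are not taken from fig. 4.1. The handling of floating circuit outputs is not modelled.

```python
# Non-controlling input values of the basic gate types.
NON_CONTROLLING = {"AND": 1, "NAND": 1, "OR": 0, "NOR": 0}


def build_subcircuit(netlist, paths, observed):
    """Collect the gates on the given paths and tie off their floating inputs.

    netlist:  gate output node -> (gate type, tuple of input nodes)
    paths:    iterable of node sequences, one per path of the subcircuit
    observed: node -> logic value seen in the circuit instance at the time of
              the on-path transition (needed for XOR/XNOR off-path inputs)
    Returns (gates, constants), where constants maps each floating input to its value.
    """
    nodes = {node for path in paths for node in path}
    gates = {out: netlist[out] for out in nodes if out in netlist}
    constants = {}
    for kind, inputs in gates.values():
        for node in inputs:
            if node in nodes:
                continue                        # driven from inside the subcircuit
            if kind in NON_CONTROLLING:
                constants[node] = NON_CONTROLLING[kind]
            else:
                constants[node] = observed[node]   # XOR/XNOR off-path input
    return gates, constants


# Hypothetical netlist fragment loosely following the node names of fig. 4.1.
netlist = {
    "h": ("INV", ("d",)),
    "k": ("NAND", ("h", "c")),
    "l": ("NOR", ("k", "j")),
    "e": ("BUF", ("a",)),
    "j": ("XNOR", ("e", "i")),
}
observed = {"i": 0}

gates_1, constants_1 = build_subcircuit(netlist, [("d", "h", "k", "l")], observed)
gates_2, constants_2 = build_subcircuit(
    netlist, [("d", "h", "k", "l"), ("a", "e", "j")], observed)
```

After the path a-e-j has been added, the node ’j’ is no longer a floating input. Instead, the off-path input ’i’ of the XNOR gate is tied to the value that was observed in the circuit instance.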
The procedure outlined above is repeatedly applied to all circuit instances q0, . . . , qn
until every critical path, which is sensitized in at least one circuit instance, is consistently
controlled in all circuit instances q0, . . . , qn. The number of circuit instances is chosen
such that the probability that a critical path is sensitized by the test vector-pair in only
either the randomly chosen circuit instance qi or the subcircuit S , is sufficiently small.
For the given test vector-pair, the sensitization probabilities of the critical paths can now
be efficiently computed with a Monte-Carlo simulation of the obtained representative
subcircuit S . In each iteration, the sensitization of a particular critical path can be
determined by tracing the transition from the beginning to the end of the path, as
described in section 4.2.
4.5 Simplified Probabilistic Sensitization Analysis
This section presents a simplified version of the sensitization analysis, which merely
detects likely path delay fault test invalidation but doesn’t explain the invalidation by
the responsible paths. Instead, this analysis provides the location of the gate which
blocks the propagation of the transition along the target path and other useful details
for the path delay fault test generation process.
A given test vector-pair is simulated with the nominal circuit instance q0 and a randomly
chosen circuit instance $q_i \in Q$ with $1 \le i \le n$. Afterwards, the simulation results are
compared and any critical path that is sensitized in the nominal circuit instance, but
not in the randomly chosen circuit instance, is subjected to this analysis.
By tracing the propagation of the transition along the path as explained in section 4.3.1,
the gate which blocks the propagation is identified. Afterwards, it is determined if the
propagation condition for the gate has changed because of a waveform inconsistency at
one of the gate inputs. In this context, the term "waveform inconsistency" is redefined
and now describes for a given test vector-pair a transition, which is propagated along
a logical path in only either the nominal or the randomly chosen circuit instance, but
not in both. For example, due to delay variations, a glitch may have appeared or
disappeared or any of the gate input transitions may have been propagated along
different paths.
If a waveform inconsistency exists at one of the gate inputs, then it is possible that this
inconsistency directly causes the invalidation of the path delay fault test. In other words,
the gate might not cause but merely propagate the effects of a waveform inconsistency
from inside the input cones of the gate inputs. Alternatively, the gate itself might
directly cause the waveform inconsistency at the gate output due to a variation of the
arrival time of the input transitions, regardless of any waveform inconsistency at the
gate inputs.
To distinguish these two cases, the proposed analysis first corrects all waveform
inconsistencies at the gate inputs. This is done by tracing the gate input transitions in
the nominal circuit instance q0 and computing the propagation delay of these paths
in the randomly chosen circuit instance qi. Afterwards, the simulation of the gate
is repeated and it is checked if the target transition is successfully propagated. This
analysis can be extended to arbitrary modifications of the input waveforms.
4.6 Summary
The invalidation of path delay fault tests is a serious problem which can cause a large
number of test escapes. Probabilistic sensitization analysis is essential to efficiently
guide the delay test generation process into generating path delay fault tests that are
more tolerant towards delay variations.
This chapter introduced a novel probabilistic sensitization analysis, which examines
likely path delay fault test invalidation mechanisms. The analysis not only provides the
location of the gate which blocks the propagation of the transition along a target path,
but it also identifies the controlling paths that control the sensitization of the target
path.
The probability of path delay fault test invalidation with a given test vector-pair is
evaluated by constructing a representative subcircuit, which allows the efficient Monte-
Carlo simulation of the test vector-pair. The results of the analysis are essential to
efficiently compare the effectiveness of alternative test vector-pairs and combinations
of test vector-pairs in order to minimize the risk of path delay fault test invalidation
without increasing test cost.
Chapter 5
Computation of Target Paths Delay Fault
Probability
This chapter presents an incremental and a non-incremental algorithm for the computa-
tion of the target paths delay fault probability, which is defined by definition 1.5, assuming
a single small delay fault of fixed size is present in the circuit. The analysis considers
only those test vector-pairs, which are applicable for the detection of the small delay
fault.
Definition 5.1. The term test subset refers to the subset of all test vector-pairs in the test
set, which target at least one path through the site of the particular small delay fault
under consideration.
The set of target paths of a test vector-pair is defined by definition 1.2.
The computation of the joint delay distribution of the critical target paths, presented in
section 5.1, is a fundamental part of both algorithms. The non-incremental algorithm is
described in section 5.2. In section 5.3, the faster but slightly less accurate incremental
algorithm is presented. The incremental algorithm requires an extension of the normal
distribution based MAX-operation, which is introduced in section 5.4.
In section 7.4, both algorithms are applied to approximate the delay fault detection
probability of a given test vector-pair, which is defined by definition 7.1.
5.1 Computation of Critical Target Paths Delay Distribution
For a given small delay fault of fixed size and a suitable test subset, this section presents
the computation of the joint delay distribution of a subset of all target paths. This
subset contains only those target paths that have a significant impact on the target
paths delay fault probability. The precise meaning of the term "target path" is given by
definition 1.2.
5.1.1 Identification of Target Paths
For a given test vector-pair, the target paths are identified using the method described
in section 4.2. At first, the nominal circuit instance is simulated with the given test
vector-pair. Afterwards, the sensitized paths are identified by tracing the transitions
from the circuit outputs to the circuit inputs. This procedure is repeated for each test
vector-pair.
In general, a large number of paths might be targeted by the test vector-pairs in the
test subset. However, only a small subset of those target paths might have a path delay
fault with non-negligible probability. For the computation of the target paths delay
fault probability, it is sufficient to consider only those target paths that have a path
delay fault with non-vanishing probability. Therefore, the analysis in this chapter can
be restricted to the set of critical target paths, which is defined as the intersection of the
set of target paths (definition 1.2) and the set of critical paths (definition 2.17).
5.1.2 Computation of Delay Distribution of a Target Path
To distinguish between critical and non-critical target paths, the delay distribution
of every target path must be computed. Let n denote the number of gates along a
particular target path and let the random variables U1, . . . ,Un denote the corresponding
propagation delays of the gates along the path.
For independent gate delays U1, . . . ,Un, the convolution of all probability density
functions of U1, . . . ,Un is the probability density function of their sum, that is, the prob-
ability density function of the target path delay. Furthermore, the central limit theorem
implies that under certain mild conditions, the delay of a path with a sufficiently large
number of gates has approximately a normal distribution, regardless of the gate delay
distributions. This statement can be proven by theorem 2.25, which is also called the
Lindeberg-Feller central limit theorem. By using the notation of theorem 2.25 with Ui and
U¯ instead of Xi and X¯, respectively,
$$\sqrt{n}\,(\bar{U}_n - \bar{\mu}_n) = \frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} U_i - n\bar{\mu}_n \right), \tag{5.1}$$
so that after multiplication with $\sqrt{n}$ and addition of $n\bar{\mu}_n$, eq. (2.36) becomes
$$\sum_{i=1}^{n} U_i \xrightarrow{d} \mathcal{N}(n\bar{\mu}_n, n\bar{\sigma}^2). \tag{5.2}$$
In other words, under the conditions of theorem 2.25, the sum of independent random
variables $\sum_{i=1}^{n} U_i$ converges in distribution (denoted by $\xrightarrow{d}$) to a normal
distribution with mean $n\bar{\mu}_n$ and variance $n\bar{\sigma}^2$.
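As a quick numerical illustration of eq. (5.2), the following sketch sums independent, uniformly distributed gate delays and compares the sample mean and variance of the resulting path delay with $n\bar{\mu}_n$ and $n\bar{\sigma}^2$. The uniform delay model, the function name `simulate_path_delays` and all numeric values are assumptions chosen for illustration only; they are not taken from the experiments of this work.

```python
import random
import statistics

def simulate_path_delays(n_gates, low, high, n_samples, seed=0):
    """Draw path delays as sums of independent uniform gate delays."""
    rng = random.Random(seed)
    return [sum(rng.uniform(low, high) for _ in range(n_gates))
            for _ in range(n_samples)]

# Hypothetical path: 30 gates, each with a delay uniform in [8, 12] ps.
n_gates, low, high = 30, 8.0, 12.0
gate_mean = (low + high) / 2.0
gate_var = (high - low) ** 2 / 12.0

delays = simulate_path_delays(n_gates, low, high, n_samples=20000)
sample_mean = statistics.fmean(delays)
sample_var = statistics.variance(delays)

# Parameters of the normal approximation N(n*mean, n*var) from eq. (5.2).
clt_mean = n_gates * gate_mean
clt_var = n_gates * gate_var
```

Although the individual gate delays are far from normally distributed in this sketch, the sample moments of the path delay should lie close to `clt_mean` and `clt_var`.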
Generalizations of the central limit theorem exist for dependent random variables
[Louhi02], which supports the assumption that the critical target path delay can be
accurately approximated by a normal distribution even if the gate delays U1, . . . ,Un
are not normally distributed.
If the gate delays are not independent, all gate delays are grouped into a random vector
$$U = (U_1, \dots, U_n)^T. \tag{5.3}$$
The computation of the target path delay distribution simplifies if $U$ has approximately
a normal distribution $\mathcal{N}_n(\mu_U, \Sigma_U)$ with mean vector $\mu_U$ and covariance matrix $\Sigma_U$.
Then theorem 2.47 with
$$A = (1, \dots, 1) \in \mathbb{R}^{1 \times n}$$
implies that the delay $X$ of the target path also has a normal distribution with mean
$$E(X) = A\mu_U \tag{5.4}$$
and variance
$$\operatorname{Var}(X) = A\Sigma_U A^T. \tag{5.5}$$
The above equation also considers any spatial correlations between the gate delays,
which are represented by the off-diagonal elements of the covariance matrix $\Sigma_U$.
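Because $A$ is a row vector of ones, eqs. (5.4) and (5.5) reduce to the sum of the gate delay means and the sum of all entries of the gate delay covariance matrix. The following minimal sketch shows this, together with the resulting probability that the path delay exceeds the clock cycle time. The function names, the three-gate example and its numeric values are hypothetical.

```python
import math

def path_delay_distribution(gate_means, gate_cov):
    """Mean and variance of X = A U with A = (1, ..., 1), eqs. (5.4)-(5.5)."""
    mean = sum(gate_means)
    variance = sum(sum(row) for row in gate_cov)
    return mean, variance

def path_delay_fault_probability(mean, variance, t_clk):
    """P(X > t_clk) for a normally distributed path delay X."""
    z = (t_clk - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Three gates with hypothetical delay parameters (ps); the off-diagonal
# entries model spatial correlation between neighbouring gates.
gate_means = [10.0, 12.0, 9.0]
gate_cov = [[1.0, 0.3, 0.0],
            [0.3, 1.5, 0.2],
            [0.0, 0.2, 0.8]]

mean, variance = path_delay_distribution(gate_means, gate_cov)
p_fault = path_delay_fault_probability(mean, variance, t_clk=35.0)
```

In this example the positive correlations raise the path delay variance from 3.3 (independent gates) to 4.3, which increases the probability of a path delay fault.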
5.1.3 Computation of Joint Delay Distribution
Let n be redefined as the number of critical target paths sensitized by a test vector-pair.
The delays of the n critical target paths are grouped into an n-dimensional normal random
vector $X = (X_1, \dots, X_n)^T$. The approach described in this chapter requires that the
joint delay distribution of X can be accurately approximated by a normal distribution.
According to theorem 2.37, this is satisfied if all gate delays are jointly normally
distributed. In this case, the joint path delay distribution can be efficiently computed
using the normal distribution based SUM-operation, presented in section 3.3.1. In
any other case, the joint delay distribution of X can be approximated by a normal
distribution by computing the mean vector and the covariance matrix of X. An example
of such an approach is presented in the next chapter in section 6.3.1.
In the following, it is assumed that the random vector of critical target path delays
$X \sim \mathcal{N}_n(\mu, \Sigma)$ has a multivariate normal distribution with mean vector $\mu \in \mathbb{R}^n$ and
covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$. The i-th component $\mu_i$ of the mean vector $\mu$ represents
the mean of the i-th path delay. Likewise, the i-th diagonal element of the covariance
matrix is the variance of the i-th path delay.
The off-diagonal elements of the covariance matrix describe the degree of linear
relationship between the path delays, originating from structural and spatial correlation.
If only structural correlations are considered, then the covariance between two path
delays is equal to the sum of the variances of all gate delays which are shared by
both paths. If spatial correlations are also considered, then the covariance between the
delays of two target paths is computed as follows. Let $U_1 + \dots + U_{k_1}$ denote the sum
of $k_1$ gate delays along path A and let $V_1 + \dots + V_{k_2}$ denote the sum of $k_2$ gate delays
along path B. Then the covariance between the delay of path A and the delay of path B
is
$$\operatorname{Cov}(U_1 + \dots + U_{k_1}, V_1 + \dots + V_{k_2}) = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} \operatorname{Cov}(U_i, V_j), \tag{5.6}$$
which follows directly from theorem 2.37 with $X = (U_1, \dots, U_{k_1}, V_1, \dots, V_{k_2})^T$,
$$A = \begin{pmatrix} \overbrace{1 \cdots 1}^{k_1} & \overbrace{0 \cdots 0}^{k_2} \\ 0 \cdots 0 & 1 \cdots 1 \end{pmatrix} \in \mathbb{R}^{2 \times (k_1 + k_2)} \tag{5.7}$$
and $b = 0$.
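Eq. (5.6) can be sketched in a few lines once every gate is identified by an index into a common gate delay covariance matrix. The representation of a path as a list of gate indices, the function name `path_covariance` and the example values are assumptions made for this illustration.

```python
def path_covariance(path_a, path_b, gate_cov):
    """Eq. (5.6): sum of Cov(U_i, V_j) over all gate pairs of the two paths."""
    return sum(gate_cov[i][j] for i in path_a for j in path_b)

# Hypothetical covariance matrix of four gate delays (indices 0..3).
# Gates 1 and 3 are spatially correlated; all other pairs are uncorrelated.
gate_cov = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 2.0, 0.0, 0.5],
            [0.0, 0.0, 1.5, 0.0],
            [0.0, 0.5, 0.0, 1.0]]

path_a = [0, 1]   # gates along path A
path_b = [1, 2]   # gate 1 is shared with path A (structural correlation)
path_c = [2, 3]   # shares no gate with path A (spatial correlation only)

cov_ab = path_covariance(path_a, path_b, gate_cov)
cov_ac = path_covariance(path_a, path_c, gate_cov)
var_a = path_covariance(path_a, path_a, gate_cov)
```

Here `cov_ab` equals the variance of the shared gate 1, `cov_ac` stems only from the spatial correlation between gates 1 and 3, and calling the function with the same path twice yields the path delay variance of eq. (5.5).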
5.2 Non-Incremental Computation
As stated by definition 1.5, the target paths delay fault probability is the probability
that at least one of the target paths has a path delay fault, that is, that the delay of at
least one target path is larger than the clock cycle time $T_{clk}$. Instead of enumerating all
possible cases in which at least one path has a path delay fault, it is more efficient to
compute the probability of the complementary event
$$\bar{\Psi} = P(\{q \in Q : \text{"none of the target paths has a path delay fault"}\}). \tag{5.8}$$
Once $\bar{\Psi}$ has been computed, the target paths delay fault probability is
$$\Psi = 1 - \bar{\Psi}. \tag{5.9}$$
The flowchart of the proposed non-incremental algorithm for the computation of the
target paths delay fault probability is presented in fig. 5.1. The algorithm approximates
the probability in three steps, which are described below.
5.2.1 Computation of Critical Target Paths Delay Distribution
In the first step, the joint delay distribution of the critical target paths, which terminate
at an unmasked observable circuit output node, is computed as described in section 5.1.
If the probability that a particular target path has a path delay fault is very large (e.g.
above 0.98), then the target paths delay fault probability will be close to one and the
remaining steps of the algorithm can be omitted. It is important to note that a path
might be targeted by multiple test vector-pairs in the test subset. To identify and remove
any duplicates among the critical target paths, a unique hash signature is computed
from the input transition and structural information (e.g. unique gate and interconnect
names) of each critical target path.
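The duplicate removal can be sketched as follows. The use of SHA-256, the string encoding of the input transition and the element names are assumptions made for this example; the text above only requires that the signature is derived from the input transition and the structural information of the path.

```python
import hashlib

def path_signature(input_transition, element_names):
    """Hash of the input transition and the ordered element names of a path."""
    text = input_transition + "|" + ">".join(element_names)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def remove_duplicate_paths(paths):
    """Keep the first occurrence of every (transition, elements) path."""
    unique = {}
    for transition, elements in paths:
        unique.setdefault(path_signature(transition, elements),
                          (transition, elements))
    return list(unique.values())

# Hypothetical critical target paths collected from several test vector-pairs.
paths = [
    ("rising@in3", ["g12", "n47", "g31", "out2"]),
    ("falling@in3", ["g12", "n47", "g31", "out2"]),
    ("rising@in3", ["g12", "n47", "g31", "out2"]),   # duplicate of the first
]
unique_paths = remove_duplicate_paths(paths)
```

Two paths with the same structure but different input transitions receive different signatures and are therefore both kept.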
[Figure 5.1: Flowchart of the non-incremental algorithm. For each test vector-pair: read
and simulate the test vector-pair, then (1.1) identify the critical target paths. When no
more test vector-pairs remain: (1.2) computation of the critical target paths delay
distribution, (2) dimension reduction with the statistical MAX-operation, (3) numerical
integration, and output of the target paths delay fault probability.]
Let n be redefined as the number of critical target paths that are sensitized by all test
vector-pairs in a given test subset. Then let the random vector X be defined as
$$X = (X_1, \dots, X_n)^T, \tag{5.10}$$
where $X_1, \dots, X_n$ denote the delays of the critical target paths. By assumption, X has a
multivariate normal distribution.
5.2.2 Dimension Reduction with Statistical MAX-Operation
In the second step of the algorithm, the dimension of X is reduced by the repeated
application of the normal distribution based MAX-operation, which was introduced in
section 3.3.2. By the first application of the normal distribution based MAX-operation,
the distribution of the random vector
$$(X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T \tag{5.11}$$
is approximated by an $(n-1)$-dimensional normal random vector $(X_1, \dots, X_{n-2}, Y)^T$.
By the second application of the normal distribution based MAX-operation, the random
vector
$$(X_1, \dots, X_{n-3}, \max(X_{n-2}, Y))^T \tag{5.12}$$
is approximated by an $(n-2)$-dimensional normal random vector $(X_1, \dots, X_{n-3}, \tilde{Y})^T$.
This process continues until the number of random variables has dropped below a
user-defined threshold. In this approach, the normal distribution based MAX-operation is
applied as long as the number of random variables is greater than 1000.
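The reduction step can be sketched as follows, assuming that the MAX-operation of section 3.3.2 corresponds to the classical moment-matching formulas of [Clark61]. Since eqs. (3.5) to (3.7) are not repeated here, the formulas in the code are written from the commonly cited form of Clark's result and should be checked against chapter 3. The function names are hypothetical, and the sketch assumes that the two merged delays are not identical (so that `a > 0`).

```python
import math

def phi(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def max_reduce_last_two(mean, cov):
    """Replace the last two components by a normal approximation of their max."""
    i, j = len(mean) - 2, len(mean) - 1
    a = math.sqrt(cov[i][i] + cov[j][j] - 2.0 * cov[i][j])
    alpha = (mean[i] - mean[j]) / a
    p_i, p_j = Phi(alpha), Phi(-alpha)      # P(X_i > X_j), P(X_i <= X_j)
    m1 = mean[i] * p_i + mean[j] * p_j + a * phi(alpha)
    m2 = ((mean[i] ** 2 + cov[i][i]) * p_i
          + (mean[j] ** 2 + cov[j][j]) * p_j
          + (mean[i] + mean[j]) * a * phi(alpha))
    # Covariance of the maximum with every remaining component.
    cross = [cov[k][i] * p_i + cov[k][j] * p_j for k in range(i)]
    new_mean = mean[:i] + [m1]
    new_cov = [cov[k][:i] + [cross[k]] for k in range(i)]
    new_cov.append(cross + [m2 - m1 * m1])
    return new_mean, new_cov

def reduce_dimension(mean, cov, threshold):
    """Apply the MAX-operation while more than `threshold` variables remain."""
    while len(mean) > threshold:
        mean, cov = max_reduce_last_two(mean, cov)
    return mean, cov

# Three independent standard normal path delays, reduced to two variables.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reduced_mean, reduced_cov = reduce_dimension([0.0, 0.0, 0.0], identity, 2)
```

For two independent standard normal delays, the matched moments of the maximum are the known values $1/\sqrt{\pi}$ for the mean and $1 - 1/\pi$ for the variance, which gives a simple sanity check of the sketch.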
5.2.3 Numerical Integration
The target paths delay fault probability is computed in the last step with an efficient
numerical integration algorithm. Let $m \le n$ denote the number of the remaining
random variables after the previous step and let
$$(X_1, \dots, X_{m-1}, \tilde{X}_m)^T \sim \mathcal{N}_m(\mu, \Sigma) \tag{5.13}$$
denote the m-dimensional normal random vector consisting of the normal distribution
approximation of the maximum delay $\tilde{X}_m$ of a subset of the critical target paths and
the delays $X_1, \dots, X_{m-1}$ of the remaining critical target paths. Then the probability that
none of the target paths has a path delay fault is approximately
$$\bar{\Psi} = P(X_1 \le T_{clk}, \dots, X_{m-1} \le T_{clk}, \tilde{X}_m \le T_{clk}). \tag{5.14}$$
According to eq. (2.40), the right-hand side of the above equation is merely an integral
over the probability density function of the m-dimensional normal distribution, specifically
$$\bar{\Psi} = \int_{-\infty}^{T_{clk}} \cdots \int_{-\infty}^{T_{clk}} \varphi_m(x; \mu, \Sigma)\, dx_1 \dots dx_m. \tag{5.15}$$
The above integral can be efficiently approximated by specialized numerical integration
algorithms for small [Genz04] and large dimensions [Genz92], which quickly converge
to the necessary accuracy. However, care must be taken because any linear dependency
(multicollinearity) between the path delays X1, . . . ,Xn implies that Var(X) is not positive
definite and therefore not a covariance matrix. Various techniques are available to
transform such a matrix into a positive definite matrix [Wothk93]. In this approach, all
diagonal elements of Var(X) are multiplied by a constant slightly greater than one.
Given the probability $\bar{\Psi}$ of the complementary event, the target paths delay fault
probability is finally computed with eq. (5.9).
The advantage of the non-incremental method is its relatively high accuracy. However,
the generation of multivariate normal random numbers for the Monte-Carlo simulation
in [Genz92] requires the computation of the Cholesky factorization of the covariance
matrix $\Sigma \in \mathbb{R}^{m \times m}$, which has $O(m^3)$ worst-case runtime complexity [Golub13]. For
example, fig. 5.2 shows the average runtime of the algorithms [Genz04, Genz92] on a
workstation with an Intel(R) Core(TM) i7-2600 CPU at 3.40 GHz for a large number of
randomly chosen multivariate normal distributions, which describe the delays of the
critical target paths for a large number of test vector-pairs.
Figure 5.2: Runtime for approximating the integral in eq. (5.15)
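The role of the diagonal scaling and of the Cholesky factorization can be illustrated with a small sketch. It deliberately uses plain Monte-Carlo sampling instead of the specialized algorithms of [Genz92] and [Genz04], so it only shows the principle and not their efficiency. The function names, the scaling factor `1 + 1e-9` and the example matrix are assumptions.

```python
import math
import random

def inflate_diagonal(cov, factor=1.0 + 1e-9):
    """Multiply all diagonal elements by a constant slightly above one."""
    return [[v * factor if r == c else v for c, v in enumerate(row)]
            for r, row in enumerate(cov)]

def cholesky(cov):
    """Lower-triangular L with L L^T = cov; fails if cov is not positive definite."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = cov[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[r][c] = math.sqrt(s)
            else:
                low[r][c] = s / low[c][c]
    return low

def prob_all_below(mean, cov, t_clk, n_samples=20000, seed=0):
    """Plain Monte-Carlo estimate of P(X_1 <= t_clk, ..., X_m <= t_clk)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(mean)
    hits = 0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [mean[r] + sum(low[r][k] * z[k] for k in range(r + 1))
             for r in range(n)]
        hits += all(v <= t_clk for v in x)
    return hits / n_samples

# Two perfectly correlated path delays give a singular covariance matrix.
singular = [[1.0, 1.0], [1.0, 1.0]]
psi_bar = prob_all_below([0.0, 0.0], inflate_diagonal(singular), t_clk=1.0)
```

Without the call to `inflate_diagonal`, the factorization of `singular` fails; with it, the estimate is close to the univariate value $\Phi(1) \approx 0.841$, as expected for two almost identical delays.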
5.3 Incremental Computation
This section presents an efficient incremental algorithm for the computation of the
target paths delay fault probability. The idea is to describe each test vector-pair by the
maximum delay of its sensitized critical target paths.
Definition 5.2 (Delay of Test Vector-Pair). If the random vector $(X_{i,1}, \dots, X_{i,n_i})^T$ consists
of the delays of all $n_i$ critical target paths, which are sensitized by the i-th test vector-pair
in the nominal circuit instance $q_0$, then the random variable
$$Y_i = \max(X_{i,1}, \dots, X_{i,n_i}) \tag{5.16}$$
is called the delay of the i-th test vector-pair.
The delays of k test vector-pairs are then grouped into a k-dimensional random vector
$$Y = (Y_1, \dots, Y_k)^T, \tag{5.17}$$
which is of central importance in this approach. This is because the maximum delay of
the target paths (see definition 1.4) can be expressed by the maximum max(Y1, . . . ,Yk)
of the delays of the test vector-pairs, as illustrated in fig. 5.3.
For greater efficiency, the distribution of Y is approximated by a multivariate normal
distribution. The incremental computation is then based on efficient modifications
of the parameters of this multivariate normal distribution. For example, the random
vector Y can easily be extended by the delay of a new test vector-pair without the need
to recompute the distribution of $(Y_1, \dots, Y_k)^T$ itself.
The proposed approach exploits the fact that delay test parameter updates tend to imply
only small changes to the distribution of Y. For example, adding or removing a test
vector-pair will only extend or reduce the distribution of Y by one dimension. Hence, the
runtime for approximating the change of the target paths delay fault probability after a
delay test parameter update can be substantially reduced. This approach also benefits
from the typically small dimension of Y . However, the disadvantage of this approach is
the slightly reduced accuracy due to the approximation of the distribution of Y by a
multivariate normal distribution.
Figure 5.3: Computation of the maximum delay of the target paths in two steps
The following description focuses on the insertion and the removal of a test vector-pair
from the test subset. The update of other delay test parameters is described in sec-
tion 5.3.4. The flowchart in fig. 5.4 shows the four major steps of the proposed algorithm.
Following the insertion of a new test vector-pair, the joint delay distribution of the
critical target paths, which are sensitized by the new test vector-pair, is determined
in the first step as described in section 5.1. In the second step, the delay of the new
test vector-pair is approximated by a normal distribution. Afterwards, the multivariate
normal distribution, which approximates the distribution of Y , is extended by one
dimension in step 3. Finally, the target paths delay fault probability is computed in
step 4 using an efficient numerical integration method.
Following the removal of a test vector-pair from the test subset, the corresponding
entries in the mean vector and the covariance matrix of the multivariate normal
distribution approximation of Y , are deleted in step 3. Finally, the target paths delay
fault probability is computed in the last step.
[Figure 5.4: Flowchart of the incremental algorithm. Depending on the update type,
INSERT(test vector-pair) reads the test vector-pair and performs steps 1 to 4, whereas
REMOVE(test vector-pair) continues with step 3. The steps are: (1) computation of the
critical target paths delay distribution, (2) approximation of the delay distribution of the
new test vector-pair, (3) update of the normal distribution approximation of
$Y = (Y_1, \dots, Y_k)^T$, (4) approximation of the target paths delay fault probability,
followed by the output of the target paths delay fault probability.]
5.3.1 Approximation of Delay Distribution of New Test Vector-Pair
The proposed algorithm requires that the delay of any test vector-pair is described
by a normal distribution. However, the maximum of multiple normally distributed
random variables is in general not normally distributed [Nadar08]. Eq. (5.16) therefore
implies that the delay of a test vector-pair which sensitizes at least two critical
target paths is not normally distributed. To satisfy the condition of the algorithm,
this subsection explains how the delay of a test vector-pair can be approximated by a
normal distribution. It is assumed that a given test vector-pair extends the current test
subset from $k-1$ to $k$ test vector-pairs.
Let the random vector $(X_{k,1}, \dots, X_{k,n_k})^T$ consist of the delays of all critical target paths
that are sensitized by the new test vector-pair. Then according to eq. (5.16), the delay of
the new test vector-pair is
$$Y_k = \max(X_{k,1}, \dots, X_{k,n_k}). \tag{5.18}$$
The distribution of the delay Yk is approximated by a normal distribution using
the extension of the normal distribution based MAX-operation that is presented in
section 5.4. The main advantage of using this extension is the additional flexibility,
which allows the formation of balanced, binary-tree-like dataflow graphs, as shown
in fig. 5.5b. If changes to the distribution of the critical target paths are made (e.g.
by masking or unmasking of observable circuit outputs), only those MAX-operations
which lie on the path to Yk must be recomputed.
Without this extension, the normal distribution based MAX-operation forms a list
of MAX-operations, as shown in fig. 5.5a. As a consequence, on average half of the
MAX-operation results depend on the distribution of a single critical target path delay.
Figure 5.5: Approximation of $\max(X_{k,1}, X_{k,2}, X_{k,3}, X_{k,4})$ using the normal distribution
based MAX-operation: (a) classical, (b) proposed extension
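The benefit of the tree-shaped dataflow graph can be quantified by counting how many MAX-operations have to be recomputed when the distribution of a single critical target path delay changes. The following sketch does this for the list of fig. 5.5a and a balanced binary tree as in fig. 5.5b. The function names and the 0-based leaf numbering are assumptions; in the tree, an unpaired node at an odd-sized level is assumed to be passed to the next level without an operation.

```python
def chain_dependents(n_leaves, leaf):
    """MAX-operations depending on `leaf` (0-based) in a list of MAX-operations."""
    # Operation j (1-based) combines the previous result with leaf j;
    # leaves 0 and 1 both enter the first operation.
    first_op = max(leaf, 1)
    return (n_leaves - 1) - first_op + 1

def tree_dependents(n_leaves, leaf):
    """MAX-operations depending on `leaf` in a balanced binary tree."""
    count, width, pos = 0, n_leaves, leaf
    while width > 1:
        # An unpaired last node of an odd-sized level is passed on unchanged.
        if not (width % 2 == 1 and pos == width - 1):
            count += 1
        pos //= 2
        width = (width + 1) // 2
    return count

n = 1024
chain_average = sum(chain_dependents(n, leaf) for leaf in range(n)) / n
tree_average = sum(tree_dependents(n, leaf) for leaf in range(n)) / n
```

For 1024 critical target paths, a change of one path delay affects about half of the 1023 operations of the list on average, but only the 10 operations on the path to the root of the balanced tree.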
5.3.2 Update of Test Vector-Pairs Delay Distribution
By approximating the delay distribution of each test vector-pair by a normal
distribution, the distribution of the random vector $Y = (Y_1, \dots, Y_k)^T$ is approximated by a
multivariate normal distribution with mean vector $\mu_Y$ and covariance matrix $\Sigma_Y$.
Following the insertion of a new test vector-pair into the test subset, the dimension of
the random vector Y increases by one. Consequently, the length of the mean vector and
the dimension of the covariance matrix increase from $k-1$ to $k$, as shown in fig. 5.6.
The new last entry of the mean vector equals $\mu_k = E(Y_k)$ and the new diagonal element
in the bottom right corner of the covariance matrix equals $\sigma_{k,k} = \operatorname{Var}(Y_k)$, where $Y_k$
denotes the delay of the new test vector-pair that was approximated in section 5.3.1.
The remaining entries in the last row and last column of the covariance matrix are the
covariances between the delay $Y_k$ of the new test vector-pair and the delays $Y_1, \dots, Y_{k-1}$
of the remaining test vector-pairs. These entries still need to be computed.
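$$\mu_Y = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_{k-1} \\ \mu_k \end{pmatrix}, \qquad \Sigma_Y = \begin{pmatrix} \sigma_{1,1} & \cdots & \sigma_{1,k-1} & \sigma_{1,k} \\ \vdots & \ddots & \vdots & \vdots \\ \sigma_{k-1,1} & \cdots & \sigma_{k-1,k-1} & \sigma_{k-1,k} \\ \sigma_{k,1} & \cdots & \sigma_{k,k-1} & \sigma_{k,k} \end{pmatrix}$$
Figure 5.6: Extension of the mean vector $\mu_Y$ and the covariance matrix $\Sigma_Y$ of the
normal distribution approximation of Y, after insertion of a test vector-pair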
The normal distribution based MAX-operation does not provide a function to compute
the covariance $\sigma_{k,i} = \operatorname{Cov}(Y_k, Y_i)$ between two maxima of normally distributed random
variables. This function will be introduced in section 5.4 as the proposed extension of
the normal distribution based MAX-operation.
The computation of the covariances proceeds as shown in fig. 5.7. The nodes at the
leaf level represent the critical target path delays and the root nodes represent the
delays of the test vector-pairs. The internal nodes symbolize the application of the
normal distribution based MAX-operation. To compute the covariances between the
root nodes, the computation starts at the leaf level and proceeds level by level until
the root nodes have been reached. The proposed covariance computation function in
section 5.4 is applied to all pairs of nodes in one level, where one node is in the tree
corresponding to Yk and the other node is in any of the other trees at the same level.
After completing one level, the computation proceeds with the level above. It is not
necessary to compute the covariances within the new tree for Yk itself, since those have
already been computed during the approximation of the distribution of Yk.
Figure 5.7: Example for the forest data structure of the incremental algorithm,
where each tree represents the computation of the delay of a test vector-pair.
Following the removal of a test vector-pair, the dimension of the distribution of Y and
of the corresponding multivariate normal distribution approximation decreases from $k$ to
$k-1$. Consequently, the entries in the mean vector $\mu_Y$ and the covariance matrix $\Sigma_Y$
corresponding to the removed test vector-pair have to be deleted.
The computational cost for extending the multivariate normal distribution increases
linearly with the test subset size k under the assumption that the number of target
paths sensitized by any test vector-pair in the nominal circuit instance $q_0$ is bounded
by a constant.
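The two update operations on the parameters of the normal distribution approximation of Y can be sketched with plain lists. The function names are hypothetical, and the covariances between the new delay and the existing delays are simply passed in as `new_covs`; in the proposed approach they would be obtained with the covariance function of section 5.4.

```python
def insert_test(mean, cov, new_mean, new_var, new_covs):
    """Append the delay Y_k of a new test vector-pair (cf. fig. 5.6)."""
    if len(new_covs) != len(mean):
        raise ValueError("one covariance per existing test vector-pair needed")
    mean = mean + [new_mean]
    cov = [row + [c] for row, c in zip(cov, new_covs)]
    cov.append(list(new_covs) + [new_var])
    return mean, cov

def remove_test(mean, cov, index):
    """Delete the entries that belong to the removed test vector-pair."""
    mean = mean[:index] + mean[index + 1:]
    cov = [row[:index] + row[index + 1:]
           for r, row in enumerate(cov) if r != index]
    return mean, cov

# Hypothetical approximation for two test vector-pairs (delays in ps).
mean_2 = [100.0, 95.0]
cov_2 = [[4.0, 1.0],
         [1.0, 9.0]]

mean_3, cov_3 = insert_test(mean_2, cov_2, 98.0, 6.0, [0.5, 2.0])
mean_r, cov_r = remove_test(mean_3, cov_3, 0)
```

The insertion only adds one row and one column, so the existing entries of the mean vector and the covariance matrix are reused unchanged.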
5.3.3 Approximation of Target Paths Delay Fault Probability
After the test subset size has been increased or reduced to k test vector-pairs, the
probability $\bar{\Psi}$ that none of the target paths has a path delay fault is
$$\bar{\Psi} = P(Y_1 \le T_{clk}, \dots, Y_k \le T_{clk}). \tag{5.19}$$
The proposed approach approximates the joint distribution of the test vector-pair delays
$Y = (Y_1, \dots, Y_k)^T$ by a k-dimensional normal distribution $\mathcal{N}_k(\mu_Y, \Sigma_Y)$ with mean vector
$\mu_Y$ and covariance matrix $\Sigma_Y$. Then using eq. (5.9) and theorem 2.48, the approximation
$\hat{\Psi}$ of the target paths delay fault probability is
$$\hat{\Psi} = 1 - \int_{-\infty}^{T_{clk}} \cdots \int_{-\infty}^{T_{clk}} \varphi_k(y; \mu_Y, \Sigma_Y)\, dy_1 \dots dy_k, \tag{5.20}$$
where $y = (y_1, \dots, y_k)$ and $\varphi_k(y; \mu_Y, \Sigma_Y)$ denotes the probability density function of the
k-dimensional normal distribution $\mathcal{N}_k(\mu_Y, \Sigma_Y)$. This integral can be approximated by
efficient numerical integration algorithms for small [Genz04] and large dimensions
[Genz92] of the random vector Y.
5.3.4 Changing other Delay Test Parameters
The proposed incremental computation approach can be applied to other delay test
parameter updates. For example, if changes to the clock cycle time or the masking of
observable circuit outputs are made, the set of target paths can be stored and reused for
each test vector-pair. However, some of the target paths may become critical target paths
if the clock cycle time is reduced or observable circuit outputs are unmasked. Likewise,
some critical target paths may need to be removed if observable circuit outputs are
masked.
If the set of critical target paths of a test vector-pair changes, the algorithm proceeds as
if this test vector-pair had been added to the test set. However, instead of extending
the multivariate normal distribution approximation of the distribution of Y , only the
operations affected by the modification of the set of critical target paths must be
repeated.
5.4 Extension of Normal Distribution based MAX-operation
Let $(X_1, \dots, X_4)^T$ denote a 4-dimensional normal random vector. This section introduces
a function for the normal distribution based MAX-operation, which computes the
covariance between the two maxima $U = \max(X_1, X_2)$ and $V = \max(X_3, X_4)$. The
accurate computation of this covariance is of critical importance for the proposed
incremental algorithm.
The normal distribution based MAX-operation [Clark61] computes the covariances
$\operatorname{Cov}(X_1, V)$ and $\operatorname{Cov}(X_2, V)$ using eq. (3.7), which can be written as
$$\operatorname{Cov}(X_k, \max(X_i, X_j)) = P(X_i > X_j) \operatorname{Cov}(X_k, X_i) + P(X_i \le X_j) \operatorname{Cov}(X_k, X_j). \tag{5.21}$$
However, the resulting random vector $(X_1, X_2, V)^T$ does not have a multivariate normal
distribution. Therefore, the above formula cannot be applied a second time to accurately
compute the covariance $\operatorname{Cov}(\max(X_1, X_2), V)$.
The accurate covariance $\operatorname{Cov}(U, V)$ is computed from theorem 2.34, which states that
$$\operatorname{Cov}(U, V) = E(UV) - E(U)E(V), \tag{5.22}$$
where the formula for the mean values $E(U)$ and $E(V)$ of the maxima is already
provided by eq. (3.5) as part of the normal distribution based MAX-operation. The
cross-moment $E(UV)$ is expanded by applying the definition of the maximum over the
real numbers $x_1, \dots, x_4$, which gives
$$uv = \begin{cases} x_1 x_3 & \text{if } x_1 > x_2,\ x_3 > x_4 \\ x_1 x_4 & \text{if } x_1 > x_2,\ x_3 \le x_4 \\ x_2 x_3 & \text{if } x_1 \le x_2,\ x_3 > x_4 \\ x_2 x_4 & \text{if } x_1 \le x_2,\ x_3 \le x_4. \end{cases}$$
By replacing u and v with the random variables U and V, the above conditions imply a
partition of the sample space into four events
$$\begin{aligned} A_1 &= \{q \in Q : X_1(q) > X_2(q),\ X_3(q) > X_4(q)\} \\ A_2 &= \{q \in Q : X_1(q) > X_2(q),\ X_3(q) \le X_4(q)\} \\ A_3 &= \{q \in Q : X_1(q) \le X_2(q),\ X_3(q) > X_4(q)\} \\ A_4 &= \{q \in Q : X_1(q) \le X_2(q),\ X_3(q) \le X_4(q)\}, \end{aligned}$$
where for clarity, $X_i(q)$ is written instead of the shorthand notation $X_i$ for all $i = 1, \dots, 4$
to emphasize the fact that a random variable is a function that depends on the outcome
q of the random experiment (see eq. (2.19)). Using this notation and eq. (2.1), $E(UV)$
can formally be written as
$$\begin{aligned} E(U(q)V(q)) = {} & E(X_1(q) X_3(q) \mathbf{1}_{A_1}(q)) \\ & + E(X_1(q) X_4(q) \mathbf{1}_{A_2}(q)) \\ & + E(X_2(q) X_3(q) \mathbf{1}_{A_3}(q)) \\ & + E(X_2(q) X_4(q) \mathbf{1}_{A_4}(q)). \end{aligned} \tag{5.23}$$
Again using the shorthand notation, theorem 2.43 implies that
$$\begin{aligned} E(UV) = {} & E(X_1 X_3 \mid A_1) P(A_1) \\ & + E(X_1 X_4 \mid A_2) P(A_2) \\ & + E(X_2 X_3 \mid A_3) P(A_3) \\ & + E(X_2 X_4 \mid A_4) P(A_4). \end{aligned} \tag{5.24}$$
For simplicity, the following presentation will focus on the random vector W, which is
a linear transformation of X, defined as
$$(W_1, W_2, W_3, W_4)^T = (X_1, X_1 - X_2, X_3, X_3 - X_4)^T.$$
According to theorem 2.47, the transformed random vector W also has a 4-dimensional
normal distribution with mean vector
$$\bar{\mu} = (\mu_1, \mu_1 - \mu_2, \mu_3, \mu_3 - \mu_4)^T$$
and covariance matrix
$$\bar{\Sigma} = \begin{pmatrix} \sigma_{1,1} & \sigma_{1,1} - \sigma_{1,2} & \sigma_{1,3} & \sigma_{1,3} - \sigma_{1,4} \\ & \sigma_{1,1} + \sigma_{2,2} - 2\sigma_{1,2} & \sigma_{1,3} - \sigma_{2,3} & \sigma_{1,3} - \sigma_{1,4} - \sigma_{2,3} + \sigma_{2,4} \\ & & \sigma_{3,3} & \sigma_{3,3} - \sigma_{3,4} \\ & & & \sigma_{3,3} + \sigma_{4,4} - 2\sigma_{3,4} \end{pmatrix},$$
where $\mu_i$ denotes the mean of $X_i$ and $\sigma_{i,j}$ denotes the covariance $\operatorname{Cov}(X_i, X_j)$ with
$i, j \in \{1, \dots, 4\}$. Due to space limitations, only the upper triangular part of the
covariance matrix $\bar{\Sigma} = \{a_{i,j}\}$ is shown here. The lower triangular part can easily be
obtained due to the symmetry $\bar{\Sigma} = \bar{\Sigma}^T$ of the covariance matrix.
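Let $\bar{\mu}_i$ denote the mean and $a_i = \sqrt{a_{i,i}}$ denote the standard deviation of $W_i$. Then
$\rho_{i,j} = a_{i,j}/(a_i a_j)$ is called the Pearson correlation coefficient, which describes the correlation
between $W_i$ and $W_j$. After introducing the following notations
\[
\alpha_i = -\bar{\mu}_i / a_i \tag{5.25}
\]
\[
\beta_2 = (\alpha_4 - \rho_{2,4}\alpha_2) / \sqrt{1 - \rho_{2,4}^2} \tag{5.26}
\]
\[
\beta_4 = (\alpha_2 - \rho_{2,4}\alpha_4) / \sqrt{1 - \rho_{2,4}^2} \tag{5.27}
\]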
and using the relationship
\[
E(W_iW_j) = a_{i,j} + \bar{\mu}_i\bar{\mu}_j, \tag{5.28}
\]
the formulas for the truncated multivariate normal distribution, presented in [Birnb51]
and [Talli61], are simplified to
\[
\begin{aligned}
E(UV) = {} & E(W_1W_3) \\
& - E(W_1W_4)\Phi(\alpha_4) + (\bar{\mu}_1 - \bar{\mu}_2\Phi(\beta_4))\, a_4\phi(\alpha_4) \\
& - E(W_2W_3)\Phi(\alpha_2) + (\bar{\mu}_3 - \bar{\mu}_4\Phi(\beta_2))\, a_2\phi(\alpha_2) \\
& + E(W_2W_4)\Phi_2(\alpha_2, \alpha_4; \rho_{2,4}) + (1 - \rho_{2,4}^2)\, a_2a_4\,\phi_2(\alpha_2, \alpha_4; \rho_{2,4})
\end{aligned}
\tag{5.29}
\]
with
\[
E(U) = \bar{\mu}_1 - \bar{\mu}_2\Phi(\alpha_2) + a_2\phi(\alpha_2) \tag{5.30}
\]
\[
E(V) = \bar{\mu}_3 - \bar{\mu}_4\Phi(\alpha_4) + a_4\phi(\alpha_4), \tag{5.31}
\]
where $\phi$ and $\Phi$ are defined by eqs. (2.32) and (2.33) and $\phi_2$ and $\Phi_2$ are defined by
eqs. (2.64) and (2.65), respectively. The cumulative distribution function $\Phi_2$ of the
standard bivariate normal distribution can be efficiently approximated using a fast
double precision algorithm [Genz04].
In the special case $\rho_{2,4} = 0$, the computation simplifies to
\[
\mathrm{Cov}(U, V) = a_{1,3} - a_{1,4}\Phi(\alpha_4) - a_{2,3}\Phi(\alpha_2). \tag{5.32}
\]
Another notable special case is when either U or V has a normal distribution, in
which case the computation reduces to the special case considered by eq. (5.21).
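The mean formula of eq. (5.30) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expected_max` and its argument names are hypothetical, the two inputs are assumed to be jointly normal, and the degenerate case of a constant difference X1 - X2 is handled separately because the normalization of eq. (5.25) would otherwise divide by zero.

```python
import math

def std_normal_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_max(mu1, mu2, var1, var2, cov12):
    """E(max(X1, X2)) for jointly normal X1, X2, following eq. (5.30)."""
    mean_w2 = mu1 - mu2                  # mean of W2 = X1 - X2
    var_w2 = var1 + var2 - 2.0 * cov12   # variance of W2
    if var_w2 <= 0.0:
        # X1 - X2 is constant, so the larger mean always wins.
        return max(mu1, mu2)
    a2 = math.sqrt(var_w2)               # standard deviation of W2
    alpha2 = -mean_w2 / a2               # normalization of eq. (5.25)
    return mu1 - mean_w2 * std_normal_cdf(alpha2) + a2 * std_normal_pdf(alpha2)

# Two independent standard normal variables: the mean of the maximum is 1/sqrt(pi).
print(expected_max(0.0, 0.0, 1.0, 1.0, 0.0))
```

The same pattern applies to E(V) in eq. (5.31) with the indices 3 and 4.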
5.5 Conclusion
Large delay variations severely degrade the quality and reliability of small delay fault
tests, which can result in a large number of test escapes. In order to find a suitable
compromise between the delay test quality and the test cost, recent variation-aware
delay test generation methods are guided by the probability that at least one target
path has a path delay fault. This probability can change significantly even by small
modifications of the delay test parameters or the set of test vector-pairs.
This chapter presented two efficient algorithms for the computation of the target paths
delay fault probability. The non-incremental algorithm provides high accuracy but may
become inefficient if the delay test parameters are frequently modified. To address
this shortcoming, an efficient incremental algorithm has been presented to minimize
the computational cost after delay test parameter modifications. This algorithm is
well suited to evaluate the impact of small changes to the test subset and delay test
parameters on the target paths delay fault probability. The high efficiency and small
approximation error of this algorithm make it suitable for the inner loop of automatic
test pattern generation methods.
In section 7.4, both algorithms have been compared to the computation of the delay
fault detection probability (definition 7.1) using extensive Monte-Carlo simulations.
The results show a very large speedup with only a small loss of accuracy, mainly caused
by path delay fault test invalidation.
The main limitation of the path-based approach taken in this chapter is that the number
of critical target paths might get extremely large. While a Monte-Carlo simulation can
be used in those easily identifiable cases, a block-based approach may be much more
efficient. The fundamental operations of a block-based approach are the focus of the
next chapter.
Chapter 6
SUM and MAX-Operations based on Skew-Normal Distribution
This chapter introduces the skew-normal distribution based SUM and MAX-operation.
The new MAX-operation serves as a plug-in replacement for the normal distribution
based MAX-operation and significantly reduces the approximation error.
This chapter consists of six parts. The skew-normal distribution is introduced in
section 6.1. A detailed description of the SUM operation and its role in statistical
timing analysis is given in section 6.2. Section 6.3 presents the basic algorithm for the
skew-normal distribution based MAX-operation. The following two sections 6.4 and 6.5
introduce optimizations which minimize the runtime complexity of this algorithm
without sacrificing the accuracy. The last section 6.6 explains the approximation of
the distribution of max(X1, . . . ,Xn) using the skew-normal distribution based MAX-
operation. The experimental results are presented in section 7.5.
6.1 The Skew-Normal Distribution
The skew-normal distribution is a natural extension of the normal distribution. The
definition of the skew-normal distribution in the original parameterization is presented
in section 6.1.1. To simplify the presentation of the statistical SUM and MAX-operations,
an alternative parameterization is adopted in section 6.1.2. It is subsequently shown in
section 6.1.3 that the two definitions are indeed equivalent. For compactness of notation
let $\beta = \sqrt{2/\pi}$.
6.1.1 Definition with Azzalini-Parametrization
This section presents the definitions and a few key properties of the univariate and the
multivariate skew-normal distribution as defined in [Azzal13, chapter 2]. If a random
variable X has probability density function
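\[
f_1(x) = \frac{2}{\omega}\,\phi\!\left(\frac{x - \xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x - \xi}{\omega}\right) \qquad (-\infty < x < \infty) \tag{6.1}
\]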
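where $\phi$ and $\Phi$ are defined by eqs. (2.32) and (2.33), then X is a skew-normal random
variable with location parameter $\xi$, scale parameter $\omega$ and shape parameter $\alpha$. The
moment generating function of the univariate skew-normal distribution is
\[
M(t) = 2\exp\!\left(\xi t + \tfrac{1}{2}\omega^2t^2\right)\Phi(\delta\omega t). \tag{6.2}
\]
Differentiating $M(t)$ and setting $t = 0$ yields the formulas for the moments of a random variable X,
which are
\[
\begin{aligned}
E(X) &= \xi + \beta\omega\delta \\
E(X^2) &= \xi^2 + 2\beta\xi\omega\delta + \omega^2 \\
E(X^3) &= \xi^3 + 3\beta\xi^2\omega\delta + 3\xi\omega^2 + 3\beta\omega^3\delta - \beta\omega^3\delta^3
\end{aligned}
\]
with $\delta = \alpha/\sqrt{1 + \alpha^2}$.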
An n-dimensional random vector Z is said to have a multivariate skew-normal
distribution if it is continuous with density function
\[
2\,\phi_n(z; 0, \bar{\Omega})\,\Phi\!\left(\alpha^Tz\right), \tag{6.3}
\]
where $\phi_n(z; 0, \bar{\Omega})$ is the n-dimensional normal probability density function with zero
mean vector and correlation matrix $\bar{\Omega}$, $\Phi(\cdot)$ is given by eq. (2.33), and $\alpha \in \mathbb{R}^n$ is an
n-dimensional vector [Azzal13, chapter 5]. For simplicity, $\bar{\Omega}$ is assumed to be of full
rank.
In eq. (6.3), the location and scale parameters were omitted. To introduce them, let
\[
X := \xi + \omega Z, \tag{6.4}
\]
where
\[
\xi = (\xi_1, \dots, \xi_n)^T \tag{6.5}
\]
is the location parameter vector and
\[
\omega = \mathrm{diag}(\omega_1, \dots, \omega_n) \tag{6.6}
\]
is the scale parameter matrix, which is a square diagonal matrix with the positive
elements $\omega_1, \dots, \omega_n$ on the main diagonal. Then the n-dimensional random vector X
has the probability density function
\[
f_n(x) = 2\,\phi_n(x; \xi, \Omega)\,\Phi\!\left(\alpha^T\omega^{-1}(x - \xi)\right), \tag{6.7}
\]
where $\Omega = \omega\bar{\Omega}\omega$ is a covariance matrix.
Let $t \in \mathbb{R}^n$, then the moment generating function for the multivariate skew-normal
distribution is
\[
M(t) = 2\exp\!\left(t^T\xi + \tfrac{1}{2}t^T\Omega t\right)\Phi\!\left(\delta^T\omega t\right). \tag{6.8}
\]
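Differentiating $M(t)$ with respect to $t_i$, $t_j$ and $t_k$ with $1 \le i, j, k \le n$ and setting $t = 0$
yields
\[
E(X_i) = \xi_i + \beta\delta_i\omega_i \tag{6.9}
\]
\[
E(X_iX_j) = \beta\delta_i\xi_j\omega_i + \omega_{i,j} + \xi_i(\xi_j + \beta\delta_j\omega_j) \tag{6.10}
\]
\[
\begin{aligned}
E(X_i^2X_j) = {} & 2\xi_i(\beta\delta_i\xi_j\omega_i + \omega_{i,j}) + \xi_i^2(\xi_j + \beta\delta_j\omega_j) \\
& + \omega_i^2(\xi_j + \beta\delta_j\omega_j) + \beta\delta_i\omega_i(2\omega_{i,j} - \delta_i\delta_j\omega_i\omega_j)
\end{aligned}
\tag{6.11}
\]
\[
\begin{aligned}
E(X_iX_j^2) = {} & 2\xi_j\omega_{i,j} + \xi_i(\xi_j^2 + 2\beta\delta_j\xi_j\omega_j + \omega_j^2) \\
& + \beta\left(2\delta_j\omega_{i,j}\omega_j + \delta_i\omega_i(\xi_j^2 - \delta_j^2\omega_j^2 + \omega_j^2)\right)
\end{aligned}
\tag{6.12}
\]
\[
\begin{aligned}
E(X_iX_jX_k) = {} & \xi_k\omega_{i,j} + \xi_j\omega_{i,k} + \xi_i\left(\beta\delta_j\xi_k\omega_j + \omega_{j,k} + \xi_j(\xi_k + \beta\delta_k\omega_k)\right) \\
& + \beta\left(\delta_j\omega_{i,k}\omega_j + \delta_k\omega_{i,j}\omega_k + \delta_i\omega_i(\xi_j\xi_k + \omega_{j,k} - \delta_j\omega_j\delta_k\omega_k)\right)
\end{aligned}
\tag{6.13}
\]
All first and second order moments can be written as follows [Azzal13, eq. (5.31-32)]
\[
E(X) = \xi + \beta\omega\delta \tag{6.14}
\]
\[
\mathrm{Var}(X) = \Omega - \beta^2\omega\delta\delta^T\omega \tag{6.15}
\]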
6.1.2 Alternative Parametrization Adopted in this Work
To simplify the presentation of the statistical SUM and MAX-operations, an alternative
parameterization is introduced, which will be used throughout this chapter.
Univariate Skew-Normal Distribution
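Definition 6.1. A random variable X is called skew-normal random variable with mean
$\mu$, variance $\sigma^2$ and shape parameter $\lambda$ if
\[
|\lambda| < \sqrt{2/(\pi - 2)}\;\sigma \tag{6.16}
\]
and the probability density function of X is
\[
f_1(x) = \frac{2}{\sqrt{\lambda^2 + \sigma^2}}\,
\phi\!\left(\frac{x + \lambda - \mu}{\sqrt{\lambda^2 + \sigma^2}}\right)
\Phi\!\left(\frac{\lambda(x + \lambda - \mu)}{\beta(\lambda^2 + \sigma^2)\sqrt{1 - \lambda^2/(\beta^2(\lambda^2 + \sigma^2))}}\right), \tag{6.17}
\]
where $\beta = \sqrt{2/\pi}$. The distribution of X is then denoted by $\mathcal{SN}(\mu, \sigma^2, \lambda)$.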
The probability density function of the univariate skew-normal distribution for different
choices of parameters is shown in fig. 6.1.
[Figure 6.1: Probability density function of the univariate skew-normal distribution]
The moment generating function $M(t) = E(\exp(tX))$ of the skew-normal random
variable $X \sim \mathcal{SN}(\mu, \sigma^2, \lambda)$ is
\[
M(t) = 2\exp\!\left((\mu - \lambda)t + \tfrac{1}{2}(\sigma^2 + \lambda^2)t^2\right)\Phi(\lambda t/\beta). \tag{6.18}
\]
Differentiating $M(t)$ with respect to $t$ and setting $t = 0$ yields the formulas for the first
three moments of a random variable X, which are
\[
E(X) = \mu \tag{6.19}
\]
\[
E(X^2) = \mu^2 + \sigma^2 \tag{6.20}
\]
\[
E(X^3) = \mu^3 + 3\mu\sigma^2 + \tfrac{1}{2}(4 - \pi)\lambda^3 \tag{6.21}
\]
Using eqs. (2.27), (2.28) and (2.30), the variance and the skewness of X are
\[
\mathrm{Var}(X) = \sigma^2 \tag{6.22}
\]
\[
\mathrm{Sk}(X) = \frac{(4 - \pi)\lambda^3}{2\sigma^3}. \tag{6.23}
\]
If $E(X)$, $\mathrm{Var}(X)$ and $\mathrm{Sk}(X)$ of X are known, then the above equations can be solved for
$\mu$, $\sigma$ and $\lambda$, which is called the method of moments (moment matching) approach. The result
is
\[
\mu = E(X) \tag{6.24}
\]
\[
\sigma = \sqrt{\mathrm{Var}(X)} \tag{6.25}
\]
\[
\lambda = \mathrm{sign}(\mathrm{Sk}(X))\,\sqrt[3]{\frac{2|\mathrm{Sk}(X)|}{4 - \pi}}\;\sigma, \tag{6.26}
\]
where
\[
\mathrm{sign}(x) =
\begin{cases}
-1 & \text{if } x < 0, \\
0 & \text{if } x = 0, \\
1 & \text{if } x > 0
\end{cases}
\tag{6.27}
\]
denotes the sign function (signum function).
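The moment matching step of eqs. (6.24) to (6.26) can be sketched as follows. The function names are hypothetical and chosen for illustration only. The helper `is_valid_shape` checks the admissibility condition of eq. (6.16), which a target skewness that is too large in magnitude will violate.

```python
import math

def skew_normal_from_moments(mean, variance, skewness):
    """Return (mu, sigma, lam) matched to the given moments, eqs. (6.24) to (6.26)."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    sigma = math.sqrt(variance)
    magnitude = (2.0 * abs(skewness) / (4.0 - math.pi)) ** (1.0 / 3.0) * sigma
    lam = math.copysign(magnitude, skewness)  # the sign of lam follows the skewness
    return mean, sigma, lam

def is_valid_shape(sigma, lam):
    """Admissibility condition of eq. (6.16)."""
    return abs(lam) < math.sqrt(2.0 / (math.pi - 2.0)) * sigma

def skewness_of(sigma, lam):
    """Skewness implied by sigma and lam, eq. (6.23)."""
    return (4.0 - math.pi) * lam ** 3 / (2.0 * sigma ** 3)

mu, sigma, lam = skew_normal_from_moments(1.0, 4.0, 0.5)
print(mu, sigma, lam, is_valid_shape(sigma, lam))
```

Inserting the result back into `skewness_of` reproduces the requested skewness, which is a quick consistency check between eqs. (6.23) and (6.26).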
Multivariate Skew-Normal Distribution
The multivariate skew-normal distribution extends the univariate skew-normal distri-
bution to more than one dimension.
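Definition 6.2. An n-dimensional random vector X is called skew-normal random vector
with mean vector $\mu$, covariance matrix $\Sigma$ and shape vector $\lambda$, if
\[
\lambda^T\Sigma^{-1}\lambda < \frac{2}{\pi - 2} \tag{6.28}
\]
and the probability density function of X is
\[
f_n(x) = 2\,\phi_n\!\left(x;\ \mu - \lambda,\ \Sigma + \lambda\lambda^T\right)
\Phi\!\left(\frac{\lambda^T\Sigma^{-1}(x - \mu + \lambda)}{\sqrt{(1 + c)\left((2/\pi)(1 + c) - c\right)}}\right), \tag{6.29}
\]
where $\beta = \sqrt{2/\pi}$ and $c = \lambda^T\Sigma^{-1}\lambda$. The distribution of X is then denoted by
$\mathcal{SN}_n(\mu, \Sigma, \lambda)$.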
The 3D surface plot and contour plot of the probability density function of the bivariate
skew-normal distribution with mean vector $\mu = (0, 0)^T$, covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \tag{6.30}
\]
and different shape vectors are shown in fig. 6.2. The reader can compare the shape of
the PDF to the shape of the PDF of a bivariate normal distribution in fig. 2.4, which has
the same mean vector and the same covariance matrix.
[Figure 6.2: Probability density function of the bivariate skew-normal distribution; (a) shape vector $\lambda = (1.0, 1.2)^T$, (b) shape vector $\lambda = (-1.0, 0.2)^T$]
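Let $t \in \mathbb{R}^n$, then the moment generating function $M(t) = E(\exp(t^TX))$ of the
skew-normal random vector $X \sim \mathcal{SN}_n(\mu, \Sigma, \lambda)$ is [Azzal13]
\[
M(t) = 2\exp\!\left(t^T(\mu - \lambda) + \tfrac{1}{2}t^T(\Sigma + \lambda\lambda^T)t\right)\Phi\!\left(\lambda^Tt/\beta\right). \tag{6.31}
\]
Differentiating $M(t)$ with respect to $t_i$, $t_j$ and $t_k$ with $1 \le i, j, k \le n$ and setting $t = 0$
yields
\[
E(X_i) = \mu_i \tag{6.32}
\]
\[
E(X_iX_j) = \sigma_{i,j} + \mu_i\mu_j \tag{6.33}
\]
\[
E(X_i^2X_j) = 2\mu_i\sigma_{i,j} + \mu_j(\mu_i^2 + \sigma_i^2) + (4 - \pi)\lambda_i^2\lambda_j/2 \tag{6.34}
\]
\[
E(X_iX_j^2) = \mu_i(\mu_j^2 + \sigma_j^2) + 2\mu_j\sigma_{i,j} + (4 - \pi)\lambda_i\lambda_j^2/2 \tag{6.35}
\]
\[
E(X_iX_jX_k) = \mu_i(\mu_j\mu_k + \sigma_{j,k}) + \mu_j\sigma_{i,k} + \mu_k\sigma_{i,j} + (4 - \pi)\lambda_i\lambda_j\lambda_k/2. \tag{6.36}
\]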
The reader should note the simplicity introduced by the alternative parameterization
by comparing the above equations to the equivalent eqs. (6.9) to (6.13).
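The third order moments (eqs. (6.34) to (6.36)) can also be expressed using matrix
notation as
\[
E(X \otimes X \otimes X^T) = \mu \otimes \Sigma + \Sigma^V \otimes \mu^T + \Sigma \otimes \mu + \mu \otimes \mu \otimes \mu^T + \left(2 - \frac{\pi}{2}\right)\lambda \otimes \lambda^T \otimes \lambda, \tag{6.37}
\]
where $\Sigma^V$ denotes the vector obtained by stacking the columns of the matrix $\Sigma$ on top
of each other [Franc10]. Theorem 6.3 implies that $(X - E(X)) \sim \mathcal{SN}_n(0, \Sigma, \lambda)$, so that
the third multivariate cumulant $\kappa_3(X)$ of X (see definition 2.40) can be obtained from
eq. (6.37) with $\mu := 0$, which yields
\[
\kappa_3(X) = \left(2 - \frac{\pi}{2}\right)\lambda \otimes \lambda^T \otimes \lambda. \tag{6.38}
\]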
6.1.3 Equivalence of Parametrizations
This subsection presents the proof that the alternative parameterization of the skew-
normal distribution is equivalent to the classical definition by [Azzal13].
Univariate Skew-Normal Distribution
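For any valid choice of the $(\mu, \sigma, \lambda)$ parameterization used in this work, a feasible
$(\xi, \omega, \alpha)$ triple can be found using the equations
\[
\xi = \mu - \lambda \tag{6.39}
\]
\[
\omega = \sqrt{\sigma^2 + \lambda^2} \tag{6.40}
\]
\[
\alpha = \delta/\sqrt{1 - \delta^2}, \tag{6.41}
\]
where $\delta = \lambda/(\beta\omega)$. For this, it must be shown that
\[
\delta^2 = \frac{\lambda^2}{\beta^2(\sigma^2 + \lambda^2)} < 1, \tag{6.42}
\]
which is satisfied if and only if
\[
\lambda^2 < \frac{\beta^2\sigma^2}{1 - \beta^2}. \tag{6.43}
\]
Since $\beta^2/(1 - \beta^2) = 2/(\pi - 2)$, the above inequality is equivalent to eq. (6.16). Furthermore, for any
valid choice of $(\xi, \omega, \alpha)$, a feasible $(\mu, \sigma, \lambda)$ triple can be found using the equations
\[
\mu = \xi + \lambda \tag{6.44}
\]
\[
\sigma = \sqrt{\omega^2 - \lambda^2} \tag{6.45}
\]
\[
\lambda = \beta\omega\delta, \tag{6.46}
\]
where $\delta = \alpha/\sqrt{1 + \alpha^2}$ [Azzal13, eq. (2.6)]. Hence, the parametrizations and the
probability density functions eq. (6.1) and eq. (6.17) are indeed equivalent.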
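The equivalence can also be checked numerically. The following sketch, with hypothetical function names, converts between the two parameterizations using eqs. (6.39) to (6.41) and eqs. (6.44) to (6.46) and evaluates both density functions, eq. (6.1) and eq. (6.17). It is an illustration, not part of the proof.

```python
import math

BETA = math.sqrt(2.0 / math.pi)

def phi(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def to_azzalini(mu, sigma, lam):
    """(mu, sigma, lam) -> (xi, omega, alpha), eqs. (6.39) to (6.41)."""
    xi = mu - lam
    omega = math.sqrt(sigma ** 2 + lam ** 2)
    delta = lam / (BETA * omega)
    alpha = delta / math.sqrt(1.0 - delta ** 2)
    return xi, omega, alpha

def from_azzalini(xi, omega, alpha):
    """(xi, omega, alpha) -> (mu, sigma, lam), eqs. (6.44) to (6.46)."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    lam = BETA * omega * delta
    return xi + lam, math.sqrt(omega ** 2 - lam ** 2), lam

def pdf_azzalini(x, xi, omega, alpha):
    """Density in the original parameterization, eq. (6.1)."""
    z = (x - xi) / omega
    return 2.0 / omega * phi(z) * Phi(alpha * z)

def pdf_alternative(x, mu, sigma, lam):
    """Density in the alternative parameterization, eq. (6.17)."""
    s2 = lam ** 2 + sigma ** 2
    shifted = x + lam - mu
    arg = lam * shifted / (BETA * s2 * math.sqrt(1.0 - lam ** 2 / (BETA ** 2 * s2)))
    return 2.0 / math.sqrt(s2) * phi(shifted / math.sqrt(s2)) * Phi(arg)

xi, omega, alpha = to_azzalini(0.5, 1.2, 0.9)
print(pdf_azzalini(0.7, xi, omega, alpha), pdf_alternative(0.7, 0.5, 1.2, 0.9))
```

Both calls print the same density value up to rounding, and `from_azzalini(*to_azzalini(mu, sigma, lam))` returns the original parameters for any input that satisfies eq. (6.16).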
Multivariate Skew-Normal Distribution
In this section, it is shown that the $(\mu, \Sigma, \lambda)$ parametrization adopted in this work is
equivalent to the $(\xi, \Omega, \alpha)$ parametrization of [Azzal13].
Specifically, it is shown that for any valid choice of $(\xi, \Omega, \alpha)$, a feasible $(\mu, \Sigma, \lambda)$ triple
exists and can be computed as
\[
\delta = \frac{\bar{\Omega}\alpha}{\sqrt{1 + \alpha^T\bar{\Omega}\alpha}} \tag{6.47}
\]
\[
\lambda = \beta\omega\delta, \tag{6.48}
\]
where eq. (6.47) is taken from [Azzal13, eq. (5.11)], $\bar{\Omega} = \omega^{-1}\Omega\omega^{-1}$ is the correlation
matrix corresponding to $\Omega$ and $\omega$ is defined by eq. (6.6). Clearly, $\alpha^T\bar{\Omega}\alpha \ge 0$ because $\bar{\Omega}$
is a positive definite matrix according to theorem 2.4. Plugging eq. (6.48) into eqs. (6.14)
and (6.15) yields
\[
\mu = \xi + \lambda \tag{6.49}
\]
\[
\Sigma = \Omega - \lambda\lambda^T. \tag{6.50}
\]
It remains to be shown that eq. (6.50) is a symmetric positive definite matrix. By using
theorem 2.4 and eq. (6.47), it is sufficient to show that
\[
\omega^{-1}(\Omega - \lambda\lambda^T)\omega^{-1} = \bar{\Omega} - \beta^2\delta\delta^T \tag{6.51}
\]
\[
= \bar{\Omega} - \beta^2\bar{\Omega}\alpha(1 + \alpha^T\bar{\Omega}\alpha)^{-1}\alpha^T\bar{\Omega} \tag{6.52}
\]
is symmetric positive definite. This is done by using eq. (2.5) with $A^{-1} = \bar{\Omega}$ and
$U = V = \beta\alpha$ so that eq. (6.52) becomes $(\bar{\Omega}^{-1} + \beta^2\alpha\alpha^T)^{-1}$, which is positive definite
according to lemma 2.3 and theorem 2.4. It is also symmetric because the sum of
symmetric matrices is a symmetric matrix. Therefore, $\omega^{-1}(\Omega - \lambda\lambda^T)\omega^{-1}$ is symmetric
positive definite and because $\omega$ is a non-singular matrix, theorem 2.4 implies that
$\Omega - \lambda\lambda^T$ is also symmetric positive definite.
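Next, for any valid assignment to the parameters $\mu$, $\Sigma$ and $\lambda$, the corresponding values
of the $(\xi, \Omega, \alpha)$ parametrization are
\[
\xi = \mu - \lambda \tag{6.53}
\]
\[
\Omega = \Sigma + \lambda\lambda^T. \tag{6.54}
\]
The equation for $\alpha$ is derived from [Azzal13, eq. (5.12)]
\[
\alpha = \frac{\bar{\Omega}^{-1}\delta}{\sqrt{1 - \delta^T\bar{\Omega}^{-1}\delta}} \tag{6.55}
\]
by using eq. (6.48) to obtain the intermediate result
\[
\alpha = \frac{\omega\Omega^{-1}\omega\delta}{\sqrt{1 - \delta^T(\omega\Omega^{-1}\omega)\delta}}
= \frac{\omega\Omega^{-1}\lambda\beta^{-1}}{\sqrt{1 - \beta^{-2}\lambda^T\Omega^{-1}\lambda}}
= \frac{\omega\Omega^{-1}\lambda}{\sqrt{\beta^2 - \lambda^T\Omega^{-1}\lambda}}. \tag{6.56}
\]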
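The application of eq. (2.5) with $U = V = \lambda$ to eq. (6.54) gives
\[
\Omega^{-1} = \Sigma^{-1} - \frac{1}{1 + \lambda^T\Sigma^{-1}\lambda}\,\Sigma^{-1}\lambda\lambda^T\Sigma^{-1} \tag{6.57}
\]
so that
\[
\Omega^{-1}\lambda = \frac{\Sigma^{-1}\lambda(1 + \lambda^T\Sigma^{-1}\lambda)}{1 + \lambda^T\Sigma^{-1}\lambda}
- \frac{\Sigma^{-1}\lambda(\lambda^T\Sigma^{-1}\lambda)}{1 + \lambda^T\Sigma^{-1}\lambda}
= \frac{\Sigma^{-1}\lambda}{1 + \lambda^T\Sigma^{-1}\lambda} \tag{6.58}
\]
must hold. Plugging the last equation into eq. (6.56) finally yields
\[
\alpha = \frac{\omega\Sigma^{-1}\lambda}{\sqrt{\dfrac{\beta^2(1 + c) - c}{1 + c}}\,(1 + c)}
= \frac{\omega\Sigma^{-1}\lambda}{\sqrt{(1 + c)(\beta^2(1 + c) - c)}}, \tag{6.59}
\]
where $c := \lambda^T\Sigma^{-1}\lambda$ and $\omega$ is defined by eq. (6.6). Hence, it remains to be shown that
\[
(1 + c)(\beta^2(1 + c) - c) > 0 \tag{6.60}
\]
for any valid choice of $\Sigma$ and $\lambda$. It can be seen that the left-hand side of eq. (6.60) is a quadratic polynomial in $c$
with exactly two roots: one at $c = -1$ and the other at $c = 2/(\pi - 2)$. Furthermore,
the LHS is equal to $\beta^2$ for $c = 0$, which confirms that the above inequality is true for
all $-1 < c < 2/(\pi - 2)$. It is also clear that $c \ge 0$ because $\Sigma^{-1}$ is positive definite and
by definition $c < 2/(\pi - 2)$. Hence, the parametrizations and the probability density
functions eq. (6.7) and eq. (6.29) are indeed equivalent.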
6.2 Statistical SUM-operation
This section presents the algorithm for the skew-normal distribution
based SUM-operation. The operation is described for the general case that k sums are
computed given n random variables $X_1, \dots, X_n$, e.g. to compute the propagation delays
of k paths given the propagation delays of n gates. The problem of computing the arrival
time of the gate output transition given the arrival time of a gate input transition and
the propagation delay of the gate is considered as the special case k = 1 and n = 2.
Let $X = (X_1, \dots, X_n)^T$ denote an n-dimensional skew-normal random vector. Let
$B \in \{0, 1\}^{k \times n}$ be a binary matrix of rank k and let $b_{i,j} \in \{0, 1\}$ denote the element
in the ith row and jth column of B with $1 \le i \le k$ and $1 \le j \le n$. Then the
skew-normal distribution based SUM-operation computes a k-dimensional random
vector $Y = (Y_1, \dots, Y_k)^T$, defined as
\[
Y_i = \sum_{j=1}^{n} b_{i,j}X_j \quad \text{for all } 1 \le i \le k,
\]
where $b_{i,j} = 1$ if and only if $X_j$ is part of the ith sum $Y_i$, otherwise $b_{i,j} = 0$.
Because the family of multivariate skew-normal distributions is closed under affine
transformations, the skew-normal distribution based SUM-operation can be efficiently
computed by Y = BX using the following theorem.
Theorem 6.3. If $X \sim \mathcal{SN}_n(\mu, \Sigma, \lambda)$, $b \in \mathbb{R}^k$ and $A \in \mathbb{R}^{k \times n}$ has rank k, then the random
vector
\[
Y = AX + b \tag{6.61}
\]
has a k-dimensional skew-normal distribution $\mathcal{SN}_k(\mu_Y, \Sigma_Y, \lambda_Y)$ with parameters
\[
\mu_Y = A\mu + b \tag{6.62}
\]
\[
\Sigma_Y = A\Sigma A^T \tag{6.63}
\]
\[
\lambda_Y = A\lambda. \tag{6.64}
\]
The proof is presented in section A.2.1. The reader should note the similarities to
the affine transformation of a random vector with multivariate normal distribution in
theorem 2.47.
The application of the skew-normal distribution based SUM-operation is illustrated
by two examples. The first example is a NAND gate with a falling transition at each
input. Let X1 and X2 denote the transition arrival time at input A and B, respectively.
Furthermore, let X3 and X4 denote the propagation delay for the transition at input A
and B, respectively. To compute the sums Y1 = X1 + X3 and Y2 = X2 + X4, the matrix
B is defined as
\[
B = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}. \tag{6.65}
\]
Another example is the computation of the propagation delay of a path given the
propagation delays of n gates. Let
\[
A = (1, \dots, 1) \in \mathbb{R}^{1 \times n} \tag{6.66}
\]
denote an n-dimensional row vector and let $b = 0$ be the zero vector. Let $\sigma_{i,j}$ denote the
element in the ith row and jth column of $\Sigma$, then theorem 6.3 implies that the path
delay is a skew-normal random variable $Y \sim \mathcal{SN}(\mu_Y, \sigma_Y^2, \lambda_Y)$ with parameters
\[
\mu_Y = \sum_{i=1}^{n}\mu_i \tag{6.67}
\]
\[
\sigma_Y^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{i,j} \tag{6.68}
\]
\[
\lambda_Y = \sum_{i=1}^{n}\lambda_i. \tag{6.69}
\]
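The SUM-operation of theorem 6.3 amounts to three matrix products. The sketch below uses plain Python lists to stay self-contained. The function names and the numerical parameter values are hypothetical and serve only to illustrate the NAND gate example with the matrix B of eq. (6.65).

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Matrix-matrix product for matrices given as lists of rows."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def sn_sum(A, mu, Sigma, lam, b=None):
    """Parameters of Y = A X + b according to eqs. (6.62) to (6.64)."""
    if b is None:
        b = [0.0] * len(A)
    mu_y = [m + bi for m, bi in zip(mat_vec(A, mu), b)]
    Sigma_y = mat_mul(mat_mul(A, Sigma), transpose(A))
    lam_y = mat_vec(A, lam)
    return mu_y, Sigma_y, lam_y

# NAND gate example: Y1 = X1 + X3 and Y2 = X2 + X4 (illustrative values only).
B = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
mu = [1.0, 2.0, 0.5, 0.7]
Sigma = [[0.04, 0.02, 0.005, 0.0],
         [0.02, 0.09, 0.0, 0.0],
         [0.005, 0.0, 0.01, 0.0],
         [0.0, 0.0, 0.0, 0.01]]
lam = [0.1, 0.0, 0.05, -0.02]
mu_y, Sigma_y, lam_y = sn_sum(B, mu, Sigma, lam)
print(mu_y, Sigma_y, lam_y)
```

With the all-ones row vector of eq. (6.66) in place of B, the same function returns the path delay parameters of eqs. (6.67) to (6.69).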
6.3 Statistical MAX-Operation
This section presents a basic algorithm for the skew-normal distribution
based MAX-operation. The application of the MAX-operation is illustrated with an
example in fig. 6.3, which shows two gates in a logic level: a NAND gate with two
rising transitions at the gate inputs and an inverter with a falling transition at the gate
input. The arrival times of the input transitions are denoted by $X_1$, $X_2$ and $X_3$. The
corresponding propagation delays of the gates are denoted by $D_1$, $D_2$ and $D_3$. For
the inverter, the arrival time of the output transition is simply the sum of the arrival
time of the input transition and the corresponding propagation delay of the gate. For
the NAND gate, the arrival time of the last output transition is computed by adding
the corresponding propagation delays of the gate to the arrival times of the input
transitions followed by the computation of the statistical maximum of both results.
For simplicity, let $U_i := X_i + D_i$ for all $1 \le i \le 3$. To compute the arrival time of the
last transitions at the outputs of the logic level, the statistical maximum operation must
compute not only the distribution of $\max(U_1, U_2)$, but also the relationship (covariance) between
$\max(U_1, U_2)$ and $U_3$.
[Figure 6.3: Example for the application of the MAX-operation in a logic level]
To minimize the approximation error, the MAX-operation must be defined on a flexible,
general family of probability distributions which allows the accurate approximation
of max(X1,X2) with a member of the same family of distributions. Compared to the
normal distribution based MAX-operation in section 3.3.2, the maximum max(X1,X2)
can be much more accurately approximated by a skew-normal distribution, as shown
in fig. 6.4.
More generally, the skew-normal distribution based MAX-operation is described as
follows.
Let $(X_1, \dots, X_n)^T \sim \mathcal{SN}_n(\mu, \Sigma, \lambda)$ denote an n-dimensional skew-normal random
vector. The approximation of the $(n-1)$-dimensional distribution of the random
vector
\[
Y = (Y_1, \dots, Y_{n-1})^T := (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T \tag{6.70}
\]
by a skew-normal distribution $\mathcal{SN}_{n-1}(\hat{\mu}, \hat{\Sigma}, \hat{\lambda})$ with mean vector $\hat{\mu}$, covariance
matrix $\hat{\Sigma}$ and shape vector $\hat{\lambda}$ is called the skew-normal distribution based MAX-operation.
[Figure 6.4: Probability density functions of $\max(X_1, X_2)$ and its approximation by a normal distribution and a skew-normal distribution]
The computation of the parameters $\hat{\mu}$, $\hat{\Sigma}$ and $\hat{\lambda}$ is explained in the following three
subsections. At first, the mean vector and the covariance matrix of Y are computed.
Afterwards, section 6.3.2 introduces the third multivariate cumulant of Y, which will be
used to estimate the shape vector $\hat{\lambda}$ in section 6.3.3.
6.3.1 Computation of Mean Vector $\hat{\mu}$ and Covariance Matrix $\hat{\Sigma}$
Let Y be an $(n-1)$-dimensional random vector as defined in eq. (6.70). The first two
parameters of the skew-normal distribution $\mathcal{SN}_{n-1}(\hat{\mu}, \hat{\Sigma}, \hat{\lambda})$ are approximated by the
mean vector and the covariance matrix of Y. Explicit formulas for the computation of
all required moments can be found in section 6.1.2 and in appendix A.1.
According to definition 2.36, the mean vector of Y is
\[
\hat{\mu} = (E(Y_1), E(Y_2), \dots, E(Y_{n-1}))^T \tag{6.71}
\]
and the covariance matrix of Y is
\[
\hat{\Sigma} =
\begin{pmatrix}
E(\bar{Y}_1^2) & E(\bar{Y}_1\bar{Y}_2) & \cdots & E(\bar{Y}_1\bar{Y}_{n-1}) \\
E(\bar{Y}_2\bar{Y}_1) & E(\bar{Y}_2^2) & \cdots & E(\bar{Y}_2\bar{Y}_{n-1}) \\
\vdots & \vdots & \ddots & \vdots \\
E(\bar{Y}_{n-1}\bar{Y}_1) & E(\bar{Y}_{n-1}\bar{Y}_2) & \cdots & E(\bar{Y}_{n-1}^2)
\end{pmatrix}, \tag{6.72}
\]
where $\bar{Y}_1, \dots, \bar{Y}_{n-1}$ denote the components of the centred random vector, defined as
\[
\bar{Y} = Y - \hat{\mu} = (Y_1 - E(Y_1), \dots, Y_{n-1} - E(Y_{n-1}))^T. \tag{6.73}
\]
For example, suppose n = 4, then the 3-dimensional random vector Y has mean vector
\[
\hat{\mu} = E(Y) = (E(Y_1), E(Y_2), E(Y_3))^T
\]
and covariance matrix
\[
\hat{\Sigma} = \mathrm{Var}(Y) =
\begin{pmatrix}
E(\bar{Y}_1^2) & E(\bar{Y}_1\bar{Y}_2) & E(\bar{Y}_1\bar{Y}_3) \\
E(\bar{Y}_1\bar{Y}_2) & E(\bar{Y}_2^2) & E(\bar{Y}_2\bar{Y}_3) \\
E(\bar{Y}_1\bar{Y}_3) & E(\bar{Y}_2\bar{Y}_3) & E(\bar{Y}_3^2)
\end{pmatrix}.
\]
Only those moments which depend on
\[
Y_{n-1} = \max(X_{n-1}, X_n)
\]
like $E(Y_{n-1})$ or $E(\bar{Y}_1\bar{Y}_{n-1})$ have to be computed. All other moments remain unchanged
and can be copied from $\mu$ and $\Sigma$. More precisely, $\hat{\mu}$ is obtained by deleting the last and
replacing the second last component of $\mu$. Likewise, $\hat{\Sigma}$ is obtained by deleting the last
and replacing the second last row and column of $\Sigma$. The in-place computation of the
mean vector $\hat{\mu}$ and the covariance matrix $\hat{\Sigma}$ is done in O(n) time.
6.3.2 Computation and Properties of Third Multivariate Cumulant
The skewness of a single random variable X was introduced by eq. (2.30) and it depends
on the third central moment $E((X - \mu_X)^3)$. A multivariate analogue of the third central
moment of a random variable is the third multivariate cumulant $\kappa_3(Y) \in \mathbb{R}^{(n-1)^2 \times (n-1)}$ of
a random vector Y, which is defined by definition 2.40 and can be written as
\[
\kappa_3(Y) = E(\bar{Y} \otimes \bar{Y} \otimes \bar{Y}^T), \tag{6.74}
\]
where $\otimes$ denotes the Kronecker product. The reader should note that $\kappa_3(Y)$ contains all
third order central moments $E(\bar{Y}_i\bar{Y}_j\bar{Y}_k)$ with $1 \le i, j, k \le n - 1$ conveniently arranged
[Mori94, Franc10, Luca15]. For example, suppose n = 4, then the third multivariate
cumulant of the 3-dimensional random vector Y is
\[
\kappa_3(Y) =
\begin{pmatrix}
E(\bar{Y}_1^3) & E(\bar{Y}_1^2\bar{Y}_2) & E(\bar{Y}_1^2\bar{Y}_3) \\
E(\bar{Y}_1^2\bar{Y}_2) & E(\bar{Y}_1\bar{Y}_2^2) & E(\bar{Y}_1\bar{Y}_2\bar{Y}_3) \\
E(\bar{Y}_1^2\bar{Y}_3) & E(\bar{Y}_1\bar{Y}_2\bar{Y}_3) & E(\bar{Y}_1\bar{Y}_3^2) \\
E(\bar{Y}_1^2\bar{Y}_2) & E(\bar{Y}_1\bar{Y}_2^2) & E(\bar{Y}_1\bar{Y}_2\bar{Y}_3) \\
E(\bar{Y}_1\bar{Y}_2^2) & E(\bar{Y}_2^3) & E(\bar{Y}_2^2\bar{Y}_3) \\
E(\bar{Y}_1\bar{Y}_2\bar{Y}_3) & E(\bar{Y}_2^2\bar{Y}_3) & E(\bar{Y}_2\bar{Y}_3^2) \\
E(\bar{Y}_1^2\bar{Y}_3) & E(\bar{Y}_1\bar{Y}_2\bar{Y}_3) & E(\bar{Y}_1\bar{Y}_3^2) \\
E(\bar{Y}_1\bar{Y}_2\bar{Y}_3) & E(\bar{Y}_2^2\bar{Y}_3) & E(\bar{Y}_2\bar{Y}_3^2) \\
E(\bar{Y}_1\bar{Y}_3^2) & E(\bar{Y}_2\bar{Y}_3^2) & E(\bar{Y}_3^3)
\end{pmatrix}.
\]
For any non-singular matrix $A \in \mathbb{R}^{(n-1) \times (n-1)}$, theorem 2.37 has already established
the relationship between the mean vectors and between the covariance matrices of the
random vectors Y and AY. A similar relationship holds between the third multivariate
cumulant of Y and the third multivariate cumulant of AY [Kollo05, p.190], as shown
by the following theorem.
Theorem 6.4. If X is an n-dimensional random vector with finite third multivariate cumulant
$\kappa_3(X)$ and $A \in \mathbb{R}^{k \times n}$ has rank k, then the third multivariate cumulant $\kappa_3(Y) \in \mathbb{R}^{k^2 \times k}$ of the
random vector $Y = AX$ is
\[
\kappa_3(Y) = (A \otimes A)\,\kappa_3(X)\,A^T, \tag{6.75}
\]
where $\otimes$ denotes the Kronecker product.
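Theorem 6.4 can be checked numerically for a skew-normal random vector, because theorem 6.3 and eq. (6.38) give the cumulant of AX in closed form. The following sketch uses plain Python lists. All function names are hypothetical, and the matrix A and the shape vector are arbitrary illustrative values.

```python
import math

def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Matrix-matrix product for matrices given as lists of rows."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def sn_third_cumulant(lam):
    """Third cumulant of a skew-normal vector as an n^2-by-n matrix, eq. (6.38)."""
    c = 2.0 - math.pi / 2.0
    n = len(lam)
    return [[c * lam[i] * lam[j] * lam[k] for k in range(n)]
            for i in range(n) for j in range(n)]

def transform_third_cumulant(A, k3):
    """Third cumulant of AX computed with eq. (6.75)."""
    return mat_mul(mat_mul(kron(A, A), k3), transpose(A))

A = [[1.0, 0.0, 1.0],
     [0.5, 2.0, -1.0]]
lam = [0.3, -0.2, 0.1]
via_theorem = transform_third_cumulant(A, sn_third_cumulant(lam))
direct = sn_third_cumulant(mat_vec(A, lam))   # shape vector of AX is A lam
print(via_theorem[0], direct[0])
```

Both routes give the same $k^2 \times k$ matrix up to rounding. Note that `kron(A, A)` has $k^2 n^2$ entries, which is the source of the cost that the later sections avoid.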
This property is particularly useful to simplify the following computations by considering
only the standardized version Z of the random vector Y. In eq. (2.30), the random
variable X is standardized to zero mean and unit variance by subtracting its
mean and dividing by its standard deviation. Similarly, the components of the
standardized random vector Z have means of zero, standard deviations of
one and pairwise covariances of zero. The standardized random vector Z can be
computed from the random vector Y by the linear transformation
\[
Z = \hat{L}^{-1}(Y - \hat{\mu}) = \hat{L}^{-1}\bar{Y}, \tag{6.76}
\]
where $\hat{L}^{-1}$ denotes the inverse of the lower Cholesky factor of the covariance matrix $\hat{\Sigma}$
of Y, such that $\hat{L}\hat{L}^T = \hat{\Sigma}$. The covariance matrix $\hat{\Sigma}$ of Y is non-singular by assumption.
To show that Z is indeed standardized, theorem 2.37 is applied to compute the mean vector and the covariance
matrix of Z, which gives
\[
E(Z) = \hat{L}^{-1}0 = 0 \tag{6.77}
\]
\[
\mathrm{Var}(Z) = \hat{L}^{-1}\hat{\Sigma}\hat{L}^{-T} = \hat{L}^{-1}\hat{L}\hat{L}^T\hat{L}^{-T} = I_{n-1}. \tag{6.78}
\]
The efficient computation of $\hat{L}^{-1}$ is explained in section 6.4.
By definition, the third multivariate cumulant of a random vector is computed from
the corresponding centred random vector, as defined by eq. (6.73). Therefore, Y and
$\bar{Y}$ share the same third multivariate cumulant so that theorem 6.4 can be applied to
compute the third multivariate cumulant of the standardized random vector $Z = \hat{L}^{-1}\bar{Y}$,
which yields
\[
\kappa_3(Z) = (\hat{L}^{-1} \otimes \hat{L}^{-1})\,\kappa_3(Y)\,\hat{L}^{-T}. \tag{6.79}
\]
One of the interesting properties of this matrix is that the skewness $\mathrm{Sk}(Y_1)$ of the
first component $Y_1$ of Y is in the top-left corner of $\kappa_3(Z)$. This is because theorem 2.6
implies that the standard deviation $\sigma_{Y_1}$ of $Y_1$ is in the top-left corner of $\hat{L}$. Therefore,
the top-left corner of $\hat{L}^{-1}$ is $1/\sigma_{Y_1}$ (because $\hat{L}\hat{L}^{-1} = I_{n-1}$) and the above statement follows
from eqs. (2.30) and (6.79).
In the following, $\kappa_3(Z)$ is called the third standardized multivariate cumulant of Y. The
third standardized multivariate cumulant of the random vector Y is of central importance
for the estimation of the shape vector $\hat{\lambda}$ in the next section.
6.3.3 Estimation of Shape Vector $\hat{\lambda}$
For a skew-normal random variable X with known standard deviation $\sigma_X$ and
skewness $\mathrm{Sk}(X)$, the shape parameter can be computed using eq. (6.26). This idea
can be generalized to compute the shape vector of a skew-normal random vector, as
shown by the following theorem.
Theorem 6.5. Let $X \sim \mathcal{SN}_n(\mu, \Sigma, \lambda)$ denote an n-dimensional skew-normal random vector. If
$\kappa_3(V)$ is the third standardized multivariate cumulant of X, then the matrix
\[
M_X = \kappa_3(V)^T\kappa_3(V) \tag{6.80}
\]
has an eigenvector
\[
v = \pm\frac{L^{-1}\lambda}{\|L^{-1}\lambda\|} \tag{6.81}
\]
corresponding to the only non-zero eigenvalue
\[
\psi = \left(2 - \frac{\pi}{2}\right)^2(\lambda^T\Sigma^{-1}\lambda)^3 = \left(2 - \frac{\pi}{2}\right)^2\|L^{-1}\lambda\|^6, \tag{6.82}
\]
where L denotes the lower Cholesky factor of $\Sigma$ such that $\Sigma = LL^T$.
The $\pm$ in eq. (6.81) is due to the fact that the sign of an eigenvector of a matrix is not
unique, because if $M_Xv = \psi v$ holds then so does $M_X(-v) = \psi(-v)$. Hence, the above
theorem can be used to compute the shape vector of X, except for its sign.
Although the random vector Y does not have a multivariate skew-normal distribution,
the above theorem can still be applied to approximate the distribution of Y with a
multivariate skew-normal distribution. Let $\kappa_3(Z)$ denote the third standardized multivariate
cumulant of the random vector Y, then the matrix
\[
M_Y = \kappa_3(Z)^T\kappa_3(Z) \tag{6.83}
\]
is called the skewness matrix of Y [Loper13, p.3]. Let v denote the dominant eigenvector
corresponding to the dominant eigenvalue $\psi$ of the skewness matrix $M_Y$. From eq. (6.81)
it follows that
\[
\hat{\lambda} = \|\hat{L}^{-1}\hat{\lambda}\|\,\hat{L}v. \tag{6.84}
\]
Using eq. (6.82) and $\hat{\Sigma}^{-1} = \hat{L}^{-T}\hat{L}^{-1}$, the euclidean norm of the vector $\hat{L}^{-1}\hat{\lambda}$ is
\[
\|\hat{L}^{-1}\hat{\lambda}\| = \sqrt{(\hat{L}^{-1}\hat{\lambda})^T(\hat{L}^{-1}\hat{\lambda})} = \sqrt{\hat{\lambda}^T\hat{\Sigma}^{-1}\hat{\lambda}} = \sqrt[6]{\frac{4\psi}{(\pi - 4)^2}}. \tag{6.85}
\]
This norm is well defined because eq. (6.83) implies that $M_Y$ is positive semidefinite so
that $\psi \ge 0$ according to lemma 2.2. However, any valid shape vector estimate $\hat{\lambda}$ must
satisfy inequality eq. (6.28), which might be violated if the distribution of Y differs
strongly from a multivariate skew-normal distribution. From eqs. (6.28) and (6.85) it
follows that
\[
\sqrt[3]{\frac{4\psi}{(\pi - 4)^2}} < \frac{2}{\pi - 2} \tag{6.86}
\]
so that
\[
\psi < \frac{2(\pi - 4)^2}{(\pi - 2)^3} \approx 0.990566 \tag{6.87}
\]
must hold for eq. (6.85) to be valid. If eq. (6.87) is not satisfied, then $\psi$ is set to 0.99.
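The sign of $\hat{\lambda} = (\hat{\lambda}_1, \dots, \hat{\lambda}_{n-1})^T$ is determined by noticing that $\hat{\lambda}_{n-1}$ represents the
shape parameter of $\max(X_{n-1}, X_n)$. If the skewness of $\max(X_{n-1}, X_n)$, which is given
by eqs. (2.30) and (A.11) to (A.13), is positive then $\hat{\lambda}_{n-1}$ should be positive. Likewise, if
the skewness of $\max(X_{n-1}, X_n)$ is negative, then $\hat{\lambda}_{n-1}$ should also be negative.
More precisely, let
\[
x = \mathrm{sign}(\mathrm{Sk}(\max(X_{n-1}, X_n))) \tag{6.88}
\]
\[
y = \mathrm{sign}\!\left(\hat{\ell}_{n-1}^Tv\right), \tag{6.89}
\]
where $\hat{\ell}_{n-1}^T$ denotes the last row of $\hat{L}$ and $\mathrm{sign}(\cdot)$ denotes the signum function as defined
in eq. (6.27). Then the estimate of the shape vector is
\[
\hat{\lambda} =
\begin{cases}
-\|\hat{L}^{-1}\hat{\lambda}\|\,\hat{L}v & \text{if } xy < 0 \\
+\|\hat{L}^{-1}\hat{\lambda}\|\,\hat{L}v & \text{if } xy \ge 0,
\end{cases}
\tag{6.90}
\]
where the euclidean norm $\|\hat{L}^{-1}\hat{\lambda}\|$ is given by eq. (6.85).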
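The estimation procedure of this section can be sketched for the case where the input really is skew-normal, so that theorem 6.5 lets the original shape vector be recovered exactly. The sketch assumes a simple power iteration for the dominant eigenpair, which is one possible choice and not necessarily the method used in this work. All function names are hypothetical. The helper `standardized_cumulant_of_sn` relies on theorem 6.3 and eq. (6.38), which give the standardized cumulant in closed form.

```python
import math

C3 = 2.0 - math.pi / 2.0                                      # coefficient of eq. (6.38)
PSI_MAX = 2.0 * (math.pi - 4.0) ** 2 / (math.pi - 2.0) ** 3   # bound of eq. (6.87)

def cholesky(S):
    """Lower Cholesky factor of a symmetric positive definite matrix."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for lower triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def standardized_cumulant_of_sn(S, lam):
    """Third standardized cumulant of SN(mu, S, lam): C3 * u (x) u^T (x) u, u = L^-1 lam."""
    u = forward_solve(cholesky(S), lam)
    n = len(u)
    return [[C3 * u[i] * u[j] * u[k] for k in range(n)]
            for i in range(n) for j in range(n)]

def dominant_eigenpair(M, iterations=200):
    """Power iteration; assumes the start vector is not orthogonal to the eigenvector."""
    n = len(M)
    v = [1.0 / (i + 1) for i in range(n)]
    psi = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        psi = math.sqrt(sum(x * x for x in w))
        if psi == 0.0:
            break
        v = [x / psi for x in w]
    return psi, v

def estimate_shape(S, k3z, skewness_sign_last):
    """Shape vector estimate following eqs. (6.83) to (6.90)."""
    n = len(S)
    L = cholesky(S)
    M = [[sum(row[i] * row[j] for row in k3z) for j in range(n)] for i in range(n)]
    psi, v = dominant_eigenpair(M)
    if psi >= PSI_MAX:
        psi = 0.99                                   # fallback stated after eq. (6.87)
    norm = (4.0 * psi / (math.pi - 4.0) ** 2) ** (1.0 / 6.0)
    Lv = [sum(L[i][k] * v[k] for k in range(i + 1)) for i in range(n)]
    sign = -1.0 if skewness_sign_last * Lv[-1] < 0.0 else 1.0
    return [sign * norm * x for x in Lv]

S = [[1.0, 0.5], [0.5, 1.0]]
lam = [0.6, -0.3]
lam_hat = estimate_shape(S, standardized_cumulant_of_sn(S, lam), -1.0)
print(lam_hat)
```

For this exact skew-normal input the estimate coincides with the original shape vector up to rounding. For the output of a MAX-operation the cumulant is only approximately of rank one, so the result is an approximation.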
6.3.4 A Numerical Example
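Given a 4-dimensional skew-normal random vector $X \sim \mathcal{SN}_4(\mu, \Sigma, \lambda)$ with
\[
\mu = \begin{pmatrix} -0.1 \\ 0.45 \\ -0.2 \\ 0.31 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix}
0.479 & 0.528 & -0.494 & -0.428 \\
0.528 & 1.088 & -1.199 & -0.661 \\
-0.494 & -1.199 & 1.624 & 0.536 \\
-0.428 & -0.661 & 0.536 & 0.969
\end{pmatrix}, \quad
\lambda = \begin{pmatrix} 0.169 \\ 0.115 \\ 0.023 \\ 0.172 \end{pmatrix}.
\]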
At first, the mean vector $\hat{\mu}$, the covariance matrix $\hat{\Sigma}$ and the third multivariate cumulant
$\kappa_3(Y)$ of the random vector
\[
Y = (X_1, X_2, \max(X_3, X_4))^T
\]
are computed. The results are
\[
\hat{\mu} \approx \begin{pmatrix} -0.1 \\ 0.45 \\ 0.5885 \end{pmatrix}, \quad
\hat{\Sigma} \approx \begin{pmatrix}
0.479 & 0.528 & -0.4502 \\
0.528 & 1.088 & -0.8436 \\
-0.4502 & -0.8436 & 0.9722
\end{pmatrix}
\]
and
\[
\kappa_3(Y) \approx \begin{pmatrix}
0.0021 & 0.0014 & 0.0028 \\
0.0014 & 0.001 & 0.0115 \\
0.0028 & 0.0115 & -0.006 \\
0.0014 & 0.001 & 0.0115 \\
0.001 & 0.0007 & 0.0866 \\
0.0115 & 0.0866 & -0.0587 \\
0.0028 & 0.0115 & -0.006 \\
0.0115 & 0.0866 & -0.0587 \\
-0.0060 & -0.0587 & 0.0768
\end{pmatrix}.
\]
Note that the top-left $2 \times 2$ submatrix of $\Sigma$ also appears in $\hat{\Sigma}$ and that the
subvector $(-0.1, 0.45)^T$ of $\mu$ also appears in $\hat{\mu}$. The inverse of the lower Cholesky factor
of $\hat{\Sigma}$ is
\[
\hat{L}^{-1} \approx \begin{pmatrix}
1.4449 & 0 & 0 \\
-1.5496 & 1.4058 & 0 \\
0.3286 & 1.2316 & 1.7942
\end{pmatrix}
\]
so that the third standardized multivariate cumulant is
\[
\kappa_3(Z) \approx \begin{pmatrix}
0.0062 & -0.0026 & 0.0155 \\
-0.0026 & 0.0011 & 0.0287 \\
0.0155 & 0.0287 & 0.0543 \\
-0.0026 & 0.0011 & 0.0287 \\
0.0011 & -0.0004 & 0.2299 \\
0.0287 & 0.2299 & 0.2359 \\
0.0155 & 0.0287 & 0.0543 \\
0.0287 & 0.2299 & 0.2359 \\
0.0543 & 0.2359 & 0.4882
\end{pmatrix}
\]
and the skewness matrix is
\[
M_Y = \kappa_3(Z)^T\kappa_3(Z) \approx \begin{pmatrix}
0.00514 & 0.02689 & 0.04195 \\
0.02689 & 0.16302 & 0.2267 \\
0.04195 & 0.2267 & 0.41035
\end{pmatrix}.
\]
The eigenvalues of $M_Y$ are approximately 0.54948, 0.0285442 and 0.000476673. The
eigenvector corresponding to the dominant eigenvalue $\psi \approx 0.54948$ is
\[
v \approx (0.0911, 0.5086, 0.8562)^T.
\]
Using eq. (6.85), the euclidean norm of $\hat{L}^{-1}\hat{\lambda}$ is computed from $\psi$ as $\|\hat{L}^{-1}\hat{\lambda}\| \approx 1.19979$.
The lower Cholesky factor $\hat{L}$ of $\hat{\Sigma}$ is
\[
\hat{L} \approx \begin{pmatrix}
0.6921 & 0 & 0 \\
0.7629 & 0.7113 & 0 \\
-0.6505 & -0.4883 & 0.5574
\end{pmatrix},
\]
so that the estimate of the shape vector of the skew-normal approximation of the
random vector Y is given by eq. (6.84) as
\[
\hat{\lambda} \approx (0.0757, 0.5174, 0.2035)^T.
\]
The skewness of $\max(X_{n-1}, X_n)$ is positive according to the element in the bottom right
corner of $\kappa_3(Y)$. Since the last component of $\hat{L}v$ is also positive, eq. (6.90) leaves the sign
unchanged. Therefore, the final estimate of the shape vector is
\[
\hat{\lambda} \approx (0.0757, 0.5174, 0.2035)^T.
\]
The approach described in this section has $O(n^4)$ worst-case runtime and space
complexity due to the computation of the Kronecker product in eq. (6.79), which produces
an $(n-1)^2 \times (n-1)^2$ matrix. The runtime complexity can be reduced to $O(n^2)$ without
sacrificing the accuracy, as will be shown in section 6.5. An efficient method to compute
and update the inverse of the lower Cholesky factor $\hat{L}^{-1}$ of $\hat{\Sigma}$ is presented in section 6.4.
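The last steps of the numerical example can be reproduced with a short script. The sketch below takes the rounded values of the dominant eigenpair and of the covariance matrix of Y from the example as given inputs, recomputes the Cholesky factor, and applies eqs. (6.85) and (6.90). The variable and function names are hypothetical.

```python
import math

def cholesky(S):
    """Lower Cholesky factor of a symmetric positive definite matrix."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

# Rounded inputs taken from the numerical example.
psi = 0.54948
v = [0.0911, 0.5086, 0.8562]
Sigma_hat = [[0.479, 0.528, -0.4502],
             [0.528, 1.088, -0.8436],
             [-0.4502, -0.8436, 0.9722]]
skewness_sign = 1.0   # the skewness of max(X3, X4) is positive in the example

norm = (4.0 * psi / (math.pi - 4.0) ** 2) ** (1.0 / 6.0)               # eq. (6.85)
L_hat = cholesky(Sigma_hat)
Lv = [sum(L_hat[i][k] * v[k] for k in range(3)) for i in range(3)]
sign = -1.0 if skewness_sign * Lv[-1] < 0.0 else 1.0                   # eq. (6.90)
lam_hat = [sign * norm * x for x in Lv]
print(norm, lam_hat)
```

Because the inputs are rounded to four or five digits, the printed values agree with those in the text only to about three decimal places.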
6.4 Incremental Update of Inverse Cholesky Factor
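The inverse Cholesky factor $\hat{L}^{-1}$ can be computed in an efficient and numerically stable
manner by first computing the Cholesky factorization of $\hat{\Sigma}$ and then inverting the
resulting Cholesky factor $\hat{L}$. However, if the Cholesky factorization of the covariance
matrix $\Sigma$ of the random vector $X$ is available, it is more efficient to compute the inverse
Cholesky factor of $\hat{\Sigma}$ by updating the inverse Cholesky factor of $\Sigma$.
Given the Cholesky factorization
\[ \Sigma = L L^T, \tag{6.91} \]
left and right multiplication with $L^{-1}$ and $L^{-T}$, respectively, yields
\[ L^{-1} \Sigma L^{-T} = I_n, \tag{6.92} \]
where $I_n \in \mathbb{R}^{n \times n}$ denotes the identity matrix. Next, the matrices $\Sigma$ and $L^{-1}$ are
partitioned as
\[ \Sigma = \begin{pmatrix} A & a \\ a^T & e \end{pmatrix} \tag{6.93} \]
\[ L^{-1} = \begin{pmatrix} B & 0 \\ c^T & d \end{pmatrix}, \tag{6.94} \]
where $A, B \in \mathbb{R}^{(n-1) \times (n-1)}$, $a, c \in \mathbb{R}^{n-1}$ and $e, d \in \mathbb{R}$. Then eq. (6.92) can be written as
\[
\begin{pmatrix} B & 0 \\ c^T & d \end{pmatrix}
\begin{pmatrix} A & a \\ a^T & e \end{pmatrix}
\begin{pmatrix} B^T & c \\ 0^T & d \end{pmatrix}
=
\begin{pmatrix} BA & Ba \\ c^T A + d a^T & c^T a + d e \end{pmatrix}
\begin{pmatrix} B^T & c \\ 0^T & d \end{pmatrix}
= I_n, \tag{6.95}
\]
so that
\[
\begin{pmatrix} B A B^T & BAc + dBa \\ c^T A B^T + d a^T B^T & c^T A c + 2 d a^T c + d^2 e \end{pmatrix}
= I_n, \tag{6.96}
\]
which implies
\[ B A B^T = I_{n-1}. \tag{6.97} \]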
The matrix $A$ is a principal submatrix of the covariance matrix $\Sigma$, which is symmetric
positive definite by definition. Clearly, $A$ is symmetric and corollary 2.5 implies that
$A$ is also positive definite. Furthermore, $B$ and therefore also $B^{-1}$ is lower triangular.
Then from eq. (6.92) and theorem 2.6 it follows that
\[ A = B^{-1} B^{-T} \tag{6.98} \]
is the Cholesky factorization of $A$ with lower triangular inverse Cholesky factor $B$.
Since by definition, $\Sigma$ without the last row and column is equal to $A$, it follows that the
corresponding inverse Cholesky factor $B$ can be efficiently computed by removing the
last row and column from $L^{-1}$.
From section 6.3.1 it is known that $\hat{\Sigma}$ is obtained from $\Sigma$ by removing the last row and
column and replacing the second-last row and column of $\Sigma$. To efficiently compute $\hat{L}^{-1}$ by exploiting
the knowledge of $L^{-1}$, the idea is to remove the last two rows and columns of $\Sigma$ and
then add the last row and column of $\hat{\Sigma}$ to the resulting matrix to obtain $\hat{\Sigma}$. According to
the above explanations, the removal of the last two rows and columns of $\Sigma$ implies the
removal of the last two rows and columns of $L^{-1}$. Afterwards, a new row and column
must be added to the resulting lower triangular matrix to finally obtain $\hat{L}^{-1}$, which will
be explained in the following.
The incremental update of the inverse Cholesky factor after adding a new row and
column to the covariance matrix, such that the resulting matrix is symmetric positive
definite, can again be derived from eq. (6.96), which implies that
\[ BAc + dBa = 0 \tag{6.99} \]
\[ c^T A c + 2 d a^T c + d^2 e = 1. \tag{6.100} \]
By replacing $A$ using eq. (6.98), eq. (6.99) becomes
\[ c = -d B^T u, \tag{6.101} \]
where
\[ u = Ba. \tag{6.102} \]
The scalar $d$ can be computed from eq. (6.100), which can be simplified by replacing $A$
and $c$ using eqs. (6.98) and (6.101) as
\begin{align}
1 &= c^T A c + 2 d a^T c + d^2 e \tag{6.103} \\
  &= d^2 u^T u - 2 d^2 a^T B^T u + d^2 e \tag{6.104} \\
  &= d^2 (e - u^T u). \tag{6.105}
\end{align}
From theorem 2.6 it follows that $d > 0$, so that the last equation implies $e - u^T u > 0$
and
\[ d = \frac{1}{\sqrt{e - u^T u}}. \tag{6.106} \]
Therefore, the last row of $\hat{L}^{-1}$ can be efficiently computed with eqs. (6.101), (6.102)
and (6.106) in $O(n^2)$.
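The row update of eqs. (6.101), (6.102) and (6.106) can be sketched in a few lines. This is only an illustration using plain nested lists, not the C/C++ implementation used in the experiments. The function names `append_row_to_inverse_cholesky` and `inverse_cholesky` are hypothetical.

```python
import math


def append_row_to_inverse_cholesky(B, a, e):
    """Extend the inverse Cholesky factor B of A to that of [[A, a], [a^T, e]].

    B is lower triangular with B A B^T = I. The new last row (c^T, d) follows
    eqs. (6.102), (6.106) and (6.101): u = B a, d = 1/sqrt(e - u^T u), c = -d B^T u.
    """
    k = len(B)
    # u = B a; B is lower triangular, so only columns j <= i contribute.
    u = [sum(B[i][j] * a[j] for j in range(i + 1)) for i in range(k)]
    s = e - sum(x * x for x in u)
    if s <= 0.0:
        raise ValueError("extended matrix is not positive definite")
    d = 1.0 / math.sqrt(s)
    # c = -d B^T u; column j of B has non-zero entries only in rows i >= j.
    c = [-d * sum(B[i][j] * u[i] for i in range(j, k)) for j in range(k)]
    extended = [row + [0.0] for row in B]
    extended.append(c + [d])
    return extended


def inverse_cholesky(sigma):
    """Build the complete inverse Cholesky factor by repeated row appends."""
    inv = []
    for k in range(len(sigma)):
        inv = append_row_to_inverse_cholesky(inv, sigma[k][:k], sigma[k][k])
    return inv
```

Both `u` and `c` need one pass over the triangular matrix, which matches the $O(n^2)$ cost stated above for one appended row.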
6.5 Quadratic Time Algorithm for MAX-operation
This section shows how the skew-normal distribution based MAX-operation can be
computed in $O(n^2)$ time without sacrificing the accuracy of the results. The main idea
is explained for a special case in section 6.5.1. The following section 6.5.2 introduces a
transformation which makes this idea applicable in the general case. The optimized
algorithm is finally presented in section 6.5.3. All proofs are given in section A.2.3.
6.5.1 Fast Algorithm based on Restricted Skew-Normal Distribution
This subsection shows that the shape vector $\hat{\lambda}$ for the proposed MAX-operation can be
efficiently computed if $X$ has a 'restricted' variant of the skew-normal distribution.
Definition 6.6. Let $X$ denote an $n$-dimensional skew-normal random vector and let
$SN_n(\mu, \Sigma, \lambda)$ denote the distribution of $X$. If the conditions
\[ \Sigma_{n-1,i} = \Sigma_{n,i} \tag{6.107} \]
\[ \lambda_i = 0 \tag{6.108} \]
are satisfied for all $1 \le i \le n-4$, then $X$ is said to have a restricted skew-normal
distribution.
In the following, eq. (6.107) is called covariance condition and eq. (6.108) is called skewness
condition. If the random vector $X$ has a restricted skew-normal distribution then the
computation of the statistical MAX-operation simplifies because many of the joint
central moments $E(\bar{Y}_i \bar{Y}_j \bar{Y}_k)$ are zero, as shown by the following lemma.
Lemma 6.7. Let $X = (X_1, \dots, X_n)^T$ denote an $n$-dimensional random vector and let the random
vector $Y = (Y_1, \dots, Y_{n-1})^T$ be defined as
\[ Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T. \tag{6.109} \]
If $X$ has a restricted skew-normal distribution, then
\[ E(\bar{Y}_i \bar{Y}_j \bar{Y}_k) = 0 \tag{6.110} \]
for all $i, j, k \in \mathbb{N}$ with $1 \le i \le n-4$ and $1 \le j, k \le n-1$, where $\bar{Y} = (\bar{Y}_1, \dots, \bar{Y}_{n-1})^T$
denotes the centred random vector corresponding to $Y$.
In other words, if the conditions of the previous lemma are satisfied, then all elements
in the third multivariate cumulant $\kappa_3(Y)$ of the random vector $Y$, except for those
corresponding to the $3^3 = 27$ joint central moments
\[ E(\bar{Y}_i \bar{Y}_j \bar{Y}_k) \quad \text{with } n-3 \le i, j, k \le n-1, \tag{6.111} \]
are zero. This leads to the main theorem of this chapter.
Theorem 6.8. Let $X = (X_1, \dots, X_n)^T$ denote an $n$-dimensional random vector and let the
random vector $Y$ be defined as $Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T$. Furthermore, let $\hat{\Sigma}$
denote the covariance matrix of $Y$ with Cholesky factorization $\hat{\Sigma} = \hat{L}\hat{L}^T$ and let $\kappa_3(V)$ denote
the third multivariate cumulant of the random vector
\[ V = (X_{n-3}, X_{n-2}, \max(X_{n-1}, X_n))^T. \tag{6.112} \]
If $X$ has a restricted skew-normal distribution, then the skewness matrix of $Y$ is of the form
\[
M_Y = \begin{pmatrix} 0_{n-4,n-4} & 0_{n-4,3} \\ 0_{3,n-4} & G \end{pmatrix}, \tag{6.113}
\]
where $0_{k,l}$ is the $(k \times l)$ zero matrix,
\[
G = \hat{L}^{-1}_{22} \kappa_3(V)^T (\hat{L}^{-T}_{22} \otimes \hat{L}^{-T}_{22}) (\hat{L}^{-1}_{22} \otimes \hat{L}^{-1}_{22}) \kappa_3(V) \hat{L}^{-T}_{22} \tag{6.114}
\]
and $\hat{L}^{-1}_{22}$ is the bottom-right $3 \times 3$ submatrix of $\hat{L}^{-1}$.
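Clearly, $G \in \mathbb{R}^{3 \times 3}$ is symmetric because it can be written as
\[
G = \left[ (\hat{L}^{-1}_{22} \otimes \hat{L}^{-1}_{22}) \kappa_3(V) \hat{L}^{-T}_{22} \right]^T
(\hat{L}^{-1}_{22} \otimes \hat{L}^{-1}_{22}) \kappa_3(V) \hat{L}^{-T}_{22}. \tag{6.115}
\]
Let $v \in \mathbb{R}^{n-1}$ denote the dominant eigenvector of the matrix $M_Y$, such that
\[ M_Y v = y v \tag{6.116} \]
for some non-zero eigenvalue $y$. Then eq. (6.113) implies that $v$ is of the form
$v = (0, \bar{v})^T$ and the subvector $\bar{v} = (v_1, v_2, v_3)^T$ is the dominant eigenvector of $G$
corresponding to the same eigenvalue $y$, so that
\[ G \bar{v} = y \bar{v}. \tag{6.117} \]
Because $\hat{L}^{-1}$ is lower triangular, eq. (6.81) implies that the shape vector $\hat{\lambda}$ is also of the
form $\hat{\lambda} = (0, \bar{\lambda})^T$, with subvector $\bar{\lambda} \in \mathbb{R}^3$. Indeed, theorem 6.5 can be applied to the
eigendecomposition of $G$ to approximate the subvector $\bar{\lambda}$ by replacing $\hat{L}$ and $\hat{L}^{-1}$ with
$\hat{L}_{22}$ and $\hat{L}^{-1}_{22}$, respectively, where $\hat{L}_{22}$ is the inverse of $\hat{L}^{-1}_{22}$. This approach will be further
optimized as follows. By replacing $v$ using eq. (6.81) and multiplication with $\hat{L}_{22}$, the
last equation becomes
\[
\hat{L}_{22} G \frac{\hat{L}^{-1}_{22} \bar{\lambda}}{\|\hat{L}^{-1}_{22} \bar{\lambda}\|}
= y \hat{L}_{22} \frac{\hat{L}^{-1}_{22} \bar{\lambda}}{\|\hat{L}^{-1}_{22} \bar{\lambda}\|}
= y \frac{\bar{\lambda}}{\|\hat{L}^{-1}_{22} \bar{\lambda}\|}. \tag{6.118}
\]
Let $\hat{\Sigma}^{-1}_{22}$ denote the bottom-right corner $(3 \times 3)$ submatrix of the inverse covariance
matrix $\hat{\Sigma}^{-1}$. Then because $\hat{L}^{-T}$ is upper triangular,
\[ \hat{L}^{-T}_{22} \hat{L}^{-1}_{22} = \hat{\Sigma}^{-1}_{22} \tag{6.119} \]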
must hold. Using eqs. (2.10) and (6.114), eq. (6.118) becomes
\[
\left( \kappa_3(V)^T (\hat{\Sigma}^{-1}_{22} \otimes \hat{\Sigma}^{-T}_{22}) \kappa_3(V) \hat{\Sigma}^{-1}_{22} \right) \bar{\lambda} = y \bar{\lambda}, \tag{6.120}
\]
which implies that the subvector $\bar{\lambda}$ is a scalar multiple of an eigenvector of the matrix
\[
H := \hat{L}_{22} G \hat{L}^{-1}_{22} = \kappa_3(V)^T (\hat{\Sigma}^{-1}_{22} \otimes \hat{\Sigma}^{-T}_{22}) \kappa_3(V) \hat{\Sigma}^{-1}_{22} \tag{6.121}
\]
corresponding to the eigenvalue $y$. Because $H$ is a similarity transformation of $G$, $G$
and $H$ have the same eigenvalues and $y$ is the dominant eigenvalue of $G$ and $H$ that
must also satisfy eq. (6.87).
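If $\bar{\lambda}/\|\bar{\lambda}\|$ is the normalized eigenvector of $H$ associated with the dominant eigenvalue
$y$, then by defining
\[
x = \frac{\bar{\lambda}^T}{\|\bar{\lambda}\|} \hat{\Sigma}^{-1}_{22} \frac{\bar{\lambda}}{\|\bar{\lambda}\|} \tag{6.122}
\]
it follows from eq. (6.82) that
\[
y = \left( 2 - \frac{\pi}{2} \right)^2 \|\bar{\lambda}\|^6 x^3 \tag{6.123}
\]
and finally
\[
\|\bar{\lambda}\| = \sqrt[6]{\frac{4y}{x^3 (\pi - 4)^2}}. \tag{6.124}
\]
The correct sign of $\bar{\lambda}$ can again be determined from the sign of the skewness of
$\max(X_{n-1}, X_n)$, as described in section 6.3.3.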
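As a small illustration of eqs. (6.122) and (6.124), the norm of the shape subvector can be recovered from the normalized eigenvector, the $3 \times 3$ block of the inverse covariance matrix and the dominant eigenvalue. The sketch below assumes that these three inputs are already available. The function name `shape_norm` and its parameter names are hypothetical.

```python
import math


def shape_norm(unit_vec, sigma22_inv, y):
    """Norm of the shape subvector according to eqs. (6.122) and (6.124).

    unit_vec    -- normalized dominant eigenvector of H (three entries)
    sigma22_inv -- bottom-right 3x3 block of the inverse covariance matrix
    y           -- dominant eigenvalue of H
    """
    # eq. (6.122): quadratic form of the unit vector
    x = sum(unit_vec[i] * sigma22_inv[i][j] * unit_vec[j]
            for i in range(3) for j in range(3))
    # eq. (6.124): sixth root of 4 y / (x^3 (pi - 4)^2)
    return (4.0 * y / (x ** 3 * (math.pi - 4.0) ** 2)) ** (1.0 / 6.0)
```

The sign is not determined by this computation. As stated above, it has to be chosen separately from the sign of the skewness of the maximum.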
From the above it is apparent that the runtime of the algorithm in section 6.3 can be
reduced to $O(n^2)$ if $X$ has a restricted skew-normal distribution, because not only
the matrix $H$ but also its eigendecomposition can be computed very efficiently. In
particular, a very fast non-iterative algorithm can be used to compute the
eigendecomposition of the matrix $H \in \mathbb{R}^{3 \times 3}$.
6.5.2 Transformation to Restricted Skew-Normal Distribution
This subsection shows that the runtime of the algorithm for the proposed skew-normal
distribution based MAX-operation can be reduced to $O(n^2)$ in the general case, where
the random vector $X$ has an arbitrary skew-normal distribution. The main idea is to
transform the random vector $X$ to a random vector $W$ with restricted skew-normal
distribution using an invertible linear transformation that can be computed very
efficiently. Afterwards, the fast algorithm for the restricted skew-normal distribution is
applied to the random vector $W$. The shape vector $\hat{\lambda}$ is finally obtained by applying
the inverse linear transformation to the shape vector of the skew-normal distribution
approximation of $W$. In the following, $X$ is assumed to be an $n$-dimensional
skew-normal random vector with $n > 4$.
To satisfy the covariance condition eq. (6.107), the idea is to define a linear
transformation which maps the $n$-dimensional random vector $X$ to an $n$-dimensional random
vector $W$ such that $W_i = X_i + q_i X_{n-2}$ for all $1 \le i \le n-4$ and $W_i = X_i$ otherwise.
Then it follows from eq. (6.63) that the covariances between $W_i$ and the last two random
variables $W_{n-1}$ and $W_n$ are
\[ \mathrm{Cov}(W_{n-1}, W_i) = \mathrm{Cov}(X_{n-1}, X_i + q_i X_{n-2}) = \mathrm{Cov}(X_{n-1}, X_i) + q_i \mathrm{Cov}(X_{n-1}, X_{n-2}) \]
\[ \mathrm{Cov}(W_n, W_i) = \mathrm{Cov}(X_n, X_i + q_i X_{n-2}) = \mathrm{Cov}(X_n, X_i) + q_i \mathrm{Cov}(X_n, X_{n-2}). \]
Let $c_j = \mathrm{Cov}(X_{n-1}, X_j) - \mathrm{Cov}(X_n, X_j)$ for all $1 \le j \le n$. It follows that if $c_{n-2} \ne 0$ and
$q_i = -c_i / c_{n-2}$, then $\mathrm{Cov}(W_{n-1}, W_i) = \mathrm{Cov}(W_n, W_i)$ for all $1 \le i \le n-4$.
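The effect of the coefficients $q_i$ on the covariance matrix can be checked directly. The following sketch builds the transformation matrix for the covariance condition only and evaluates the covariance of the transformed vector. The function names and the 0-based indexing are choices made for this illustration.

```python
def covariance_condition_coefficients(sigma):
    """Return [q_1, ..., q_{n-4}] with q_i = -c_i / c_{n-2}."""
    n = len(sigma)
    # c_j = Cov(X_{n-1}, X_j) - Cov(X_n, X_j); 1-based index j is stored at j - 1.
    c = [sigma[n - 2][j] - sigma[n - 1][j] for j in range(n)]
    if c[n - 3] == 0.0:
        raise ValueError("c_{n-2} is zero; another variable must be used")
    return [-c[i] / c[n - 3] for i in range(n - 4)]


def transformed_covariance(sigma, q):
    """Covariance matrix of W, where W_i = X_i + q_i X_{n-2} for i <= n-4."""
    n = len(sigma)
    A = [[1.0 if r == col else 0.0 for col in range(n)] for r in range(n)]
    for i, qi in enumerate(q):
        A[i][n - 3] = qi  # column of X_{n-2} in 0-based indexing
    AS = [[sum(A[r][k] * sigma[k][col] for k in range(n)) for col in range(n)]
          for r in range(n)]
    return [[sum(AS[r][k] * A[col][k] for k in range(n)) for col in range(n)]
            for r in range(n)]
```

After the transformation, the entries in rows $n-1$ and $n$ of the covariance matrix agree in the first $n-4$ columns, which is exactly eq. (6.107).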
A similar transformation can be applied to satisfy the skewness condition eq. (6.108).
Suppose a linear transformation maps the $n$-dimensional random vector $X$ to an
$n$-dimensional random vector $W$ such that $W_i = X_i + r_i X_{n-3}$ for all $1 \le i \le n-4$ and
$W_i = X_i$ otherwise. Let $\lambda_X$ and $\lambda_W$ denote the shape vector of $X$ and $W$, respectively.
Then eq. (6.64) implies that the shape parameter $\lambda_{W,i}$ with $1 \le i \le n-4$ of the random
variable $W_i$ is
\[ \lambda_{W,i} = \lambda_{X,i} + r_i \lambda_{X,n-3}. \tag{6.125} \]
It follows that if $\lambda_{X,n-3} \ne 0$ and $r_i = -\lambda_{X,i} / \lambda_{X,n-3}$, then $\lambda_{W,i} = 0$ for all $1 \le i \le n-4$.
Both transformations can be combined into a single linear transformation, which leads
to the following lemma.
Lemma 6.9. Let $X \sim SN_n(\mu, \Sigma, \lambda)$ be an $n$-dimensional skew-normal random vector and
let $c_j := \mathrm{Cov}(X_{n-1}, X_j) - \mathrm{Cov}(X_n, X_j)$ for all $1 \le j \le n$. If $c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3} \ne 0$,
then there exists an invertible linear transformation which maps $X = (X_1, \dots, X_n)^T$ to an
$n$-dimensional random vector $W = (W_1, \dots, W_n)^T$ with
\[
W_i =
\begin{cases}
X_i + q_i X_{n-2} + r_i X_{n-3} & \text{for } 1 \le i \le n-4 \\
X_i & \text{for } n-3 \le i \le n,
\end{cases}
\tag{6.126}
\]
such that $W$ has a restricted skew-normal distribution.
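The actual coefficients $q_i$ and $r_i$ are derived in the proof in section A.2.3, which is not reproduced here. The following is only a plausible sketch of how they could be obtained. It assumes that the two single-coefficient rules above simply add up, so that $c_i + q_i c_{n-2} + r_i c_{n-3} = 0$ and $\lambda_i + q_i \lambda_{n-2} + r_i \lambda_{n-3} = 0$ must hold at the same time. Under that assumption, the coefficients are the solution of a $2 \times 2$ linear system whose determinant is, up to sign, the expression required to be non-zero in lemma 6.9. The function name and argument names are hypothetical.

```python
def combined_coefficients(c_i, lam_i, c_n2, c_n3, lam_n2, lam_n3):
    """Solve the assumed 2x2 system for (q_i, r_i) with Cramer's rule.

    c_n2, c_n3     -- c_{n-2} and c_{n-3}
    lam_n2, lam_n3 -- shape parameters lambda_{n-2} and lambda_{n-3}
    """
    det = c_n2 * lam_n3 - c_n3 * lam_n2
    if det == 0.0:
        raise ValueError("degenerate case, see the special case of lemma 6.9")
    q_i = (c_n3 * lam_i - c_i * lam_n3) / det
    r_i = (c_i * lam_n2 - c_n2 * lam_i) / det
    return q_i, r_i


# Illustrative numbers only, not taken from the thesis.
q, r = combined_coefficients(0.7, -1.3, 0.5, -0.2, 2.0, 0.9)
```

If the determinant vanishes, the system has no unique solution, which corresponds to the special case treated separately in the text.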
In the special case $c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3} = 0$, the transformation in lemma 6.9 can
be modified, e.g. by using a suitable permutation of variables, as described by the
following remark.
Remark 6.10. Let $X$ denote an $n$-dimensional skew-normal random vector which does not
satisfy the covariance condition eq. (6.107) and the skewness condition eq. (6.108) of a
restricted skew-normal distribution, and let $c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3} = 0$, where
\[ c_i := \mathrm{Cov}(X_{n-1}, X_i) - \mathrm{Cov}(X_n, X_i) \tag{6.127} \]
for $1 \le i \le n$. If there exists a pair $1 \le i, j \le n-2$ such that $c_j\lambda_i - c_i\lambda_j \ne 0$, then $X_{n-2}$ and
$X_{n-3}$ only need to be exchanged with the variables $X_i$ and $X_j$. If there is no such pair,
then $c_j\lambda_i - c_i\lambda_j = 0$ must hold for all $1 \le i, j \le n-2$ and one of the following three
cases must be satisfied.
(i) If $\exists i \in \{1, \dots, n-2\}$ such that $c_i \ne 0$ and $\lambda_i = 0$, then $\lambda_j = 0 \ \forall j \in \{1, \dots, n-2\}$.
(ii) If $\exists i \in \{1, \dots, n-2\}$ such that $c_i = 0$ and $\lambda_i \ne 0$, then $c_j = 0 \ \forall j \in \{1, \dots, n-2\}$.
(iii) If $c_i \ne 0$ and $\lambda_i \ne 0 \ \forall i \in \{1, \dots, n-2\}$, then $c_i/\lambda_i = c_j/\lambda_j$ for all $i, j \in \{1, \dots, n-2\}$.
In each case, a single variable $X_i$ with $c_i \ne 0$ or $\lambda_i \ne 0$ can be exchanged with $X_{n-2}$ or
$X_{n-3}$ and the transformations detailed at the beginning of this subsection can be used.
6.5.3 Description of Quadratic Time Algorithm
This subsection combines the results of the previous subsections to create a quadratic
time algorithm for the computation of the shape vector $\hat{\lambda}$. The mean vector $\hat{\mu}$ and
covariance matrix estimate $\hat{\Sigma}$ are computed as described in section 6.3.1 in linear time.
It is assumed that $X$ does not satisfy the conditions of the restricted skew-normal
distribution. Therefore, the linear transformation described in section 6.5.2 must be
applied to obtain the random vector $W$ of restricted skew-normal distribution. Let
$A \in \mathbb{R}^{n \times n}$ denote a full-rank matrix such that
\[ W = AX \tag{6.128} \]
realizes the linear transformation defined by eq. (6.126) in lemma 6.9. Furthermore, let
$A_{-n,-n}$ denote the matrix obtained by removing the last row and column of matrix $A$.
By replacing $X$ with $W$ in theorem 6.8, this theorem can be used to estimate the shape
vector of the random vector
\[ A_{-n,-n} Y = (W_1, \dots, W_{n-2}, \max(W_{n-1}, W_n))^T. \tag{6.129} \]
By definition, the transformation described by lemma 6.9 does not change the last four
variables of $X$, so that the random vector $V$ is not affected by the transformation. Then
theorem 6.8 can be used to estimate the shape vector of the random vector $A_{-n,-n}Y$ by
computing the eigendecomposition of a matrix $\tilde{H} \in \mathbb{R}^{3 \times 3}$. The matrix $\tilde{H}$ is obtained
from eq. (6.121) by replacing the submatrix $\hat{\Sigma}^{-1}_{22}$ with the bottom-right corner $(3 \times 3)$
submatrix of the inverse covariance matrix of $A_{-n,-n}Y$. This submatrix, which will be
denoted by $\tilde{\Sigma}^{-1}_{22}$, can be computed in $O(n^2)$ time as follows.
If $\hat{\Sigma}$ is the covariance matrix of the random vector
\[ Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T, \tag{6.130} \]
and $\tilde{\Sigma}$ denotes the covariance matrix of the random vector $A_{-n,-n}Y$, then theorem 2.37
implies that
\[ \tilde{\Sigma} = A_{-n,-n} \hat{\Sigma} (A_{-n,-n})^T \tag{6.131} \]
so that its inverse is
\[
\tilde{\Sigma}^{-1} = (A_{-n,-n})^{-T} \hat{\Sigma}^{-1} (A_{-n,-n})^{-1}
= (A_{-n,-n})^{-T} \hat{L}^{-T} \hat{L}^{-1} (A_{-n,-n})^{-1}. \tag{6.132}
\]
Equation (6.126) states that $X_{n-2} = W_{n-2}$ and $X_{n-3} = W_{n-3}$, so that the inverse linear
transformation $X = A^{-1}W$ is
\[
X_i =
\begin{cases}
W_i - q_i W_{n-2} - r_i W_{n-3} & \text{for } 1 \le i \le n-4 \\
W_i & \text{for } n-3 \le i \le n.
\end{cases}
\tag{6.133}
\]
Hence, $A^{-1}$ can be obtained from $A$ in $O(n)$ time by changing the signs of the
coefficients $q_1, \dots, q_{n-4}$ and $r_1, \dots, r_{n-4}$. Another implication is that
\[ (A_{-n,-n})^{-1} = (A^{-1})_{-n,-n}. \tag{6.134} \]
Let $B^{-1} \in \mathbb{R}^{(n-1) \times 3}$ denote the matrix obtained by removing all but the last three
columns of $(A^{-1})_{-n,-n}$. It follows that
\[
\tilde{\Sigma}^{-1}_{22} = B^{-T} \hat{L}^{-T} \hat{L}^{-1} B^{-1} = (\hat{L}^{-1} B^{-1})^T (\hat{L}^{-1} B^{-1}) \tag{6.135}
\]
and hence, the matrix $\tilde{\Sigma}^{-1}_{22} \in \mathbb{R}^{3 \times 3}$ can be computed in $O(n^2)$ time because the
computation of the matrix $\hat{L}^{-1} B^{-1} \in \mathbb{R}^{(n-1) \times 3}$ can be done in $O(n^2)$ time.
As explained above, the matrix $H$, defined in eq. (6.121), is modified accordingly by
replacing $\hat{\Sigma}^{-1}_{22}$ with $\tilde{\Sigma}^{-1}_{22}$, which gives
\[
\tilde{H} = \kappa_3(V)^T (\tilde{\Sigma}^{-1}_{22} \otimes \tilde{\Sigma}^{-T}_{22}) \kappa_3(V) \tilde{\Sigma}^{-1}_{22}. \tag{6.136}
\]
Let $\tilde{y}$ denote the dominant eigenvalue of the matrix $\tilde{H}$ corresponding to the eigenvector
$\tilde{\lambda} = (\tilde{\lambda}_1, \tilde{\lambda}_2, \tilde{\lambda}_3)^T$. The norm of $\tilde{\lambda}$ is computed using eqs. (6.122) and (6.124) by replacing
$\hat{\Sigma}^{-1}_{22}$, $\bar{\lambda}$ and $y$ with $\tilde{\Sigma}^{-1}_{22}$, $\tilde{\lambda}$ and $\tilde{y}$, respectively.
Then the $(n-1)$-dimensional shape vector estimate for the random vector $A_{-n,-n}Y$ is
$(0, \dots, 0, \tilde{\lambda}_1, \tilde{\lambda}_2, \tilde{\lambda}_3)^T$. According to eqs. (6.64) and (6.133), the estimate of the
$(n-1)$-dimensional shape vector for the distribution of the original random vector $Y$ is obtained
by the inverse linear transformation
\[ \hat{\lambda} = (A_{-n,-n})^{-1} (0, \dots, 0, \tilde{\lambda}_1, \tilde{\lambda}_2, \tilde{\lambda}_3)^T, \tag{6.137} \]
which can be written as
\[
\hat{\lambda}_i =
\begin{cases}
\tilde{\lambda}_i - q_i \tilde{\lambda}_2 - r_i \tilde{\lambda}_1 & \text{for } 1 \le i \le n-4 \\
\tilde{\lambda}_{i-(n-4)} & \text{for } n-3 \le i \le n-1
\end{cases}
\tag{6.138}
\]
for all $1 \le i \le n-1$, where $q_i$ and $r_i$ are computed as shown in the proof of lemma 6.9
in section A.2.3.
6.6 Application to the Computation of max(X1, . . . ,Xn)
As with the normal distribution based MAX-operation, the skew-normal distribution
based MAX-operation can be applied multiple times. During each application, two
random variables are replaced by a third random variable so that each application
reduces the number of random variables by one.
Let $X = (X^{(0)}_1, \dots, X^{(0)}_n)^T$ be an $n$-dimensional skew-normal random vector. The first
application of the skew-normal distribution based MAX-operation approximates the
random vector
\[ (X^{(0)}_1, \dots, X^{(0)}_{n-2}, \max(X^{(0)}_{n-1}, X^{(0)}_n))^T \]
with an $(n-1)$-dimensional skew-normal random vector $(X^{(1)}_1, \dots, X^{(1)}_{n-1})^T$. The second
application of the skew-normal distribution based MAX-operation approximates the
random vector
\[ (X^{(1)}_1, \dots, X^{(1)}_{n-3}, \max(X^{(1)}_{n-2}, X^{(1)}_{n-1}))^T \]
with an $(n-2)$-dimensional skew-normal random vector $(X^{(2)}_1, \dots, X^{(2)}_{n-2})^T$ and so on.
In the second-last application, the random vector
\[ (X^{(n-3)}_1, \max(X^{(n-3)}_2, X^{(n-3)}_3))^T \]
is approximated by the random vector $(X^{(n-2)}_1, X^{(n-2)}_2)^T$ with a bivariate skew-normal
distribution using a simplified version of the presented algorithm.
Finally, the last application uses the method of moments (moment matching) to
approximate the distribution of the random variable $\max(X^{(n-2)}_1, X^{(n-2)}_2)$ with a new random
variable $X^{(n-1)}_1$ of univariate skew-normal distribution using eqs. (6.24) to (6.26), which
gives the final result. Therefore, the distribution of the maximum $\max(X_1, \dots, X_n)$ is
obtained by $n-1$ applications of the skew-normal distribution based MAX-operation.
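The reduction scheme itself, independent of how a single MAX-operation is approximated, can be sketched as a simple loop. In the sketch below the pairwise operation is passed in as a callback. It is a strong simplification, since the actual operation re-approximates the joint distribution of the whole vector rather than only merging two entries. The name `reduce_with_max` and the use of the deterministic built-in `max` as a stand-in are assumptions made for illustration.

```python
def reduce_with_max(values, max_op):
    """Merge the last two entries with max_op until a single entry remains.

    Returns the final entry and the number of applications of max_op.
    """
    current = list(values)
    applications = 0
    while len(current) > 1:
        merged = max_op(current[-2], current[-1])
        current = current[:-2] + [merged]
        applications += 1
    return current[0], applications


# Deterministic stand-in: plain numbers instead of random variables.
result, count = reduce_with_max([3, 9, 1, 7], max)
```

For an input of length $n$ the loop runs $n-1$ times, matching the number of applications stated above.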
6.7 Conclusion
The statistical SUM and MAX-operation are the fundamental operations of block-based
statistical timing analysis. While the SUM-operation can usually be computed efficiently,
the efficient computation of the MAX-operation is one of the most
challenging problems of block-based statistical timing analysis. For a given random
vector
\[ X = (X_1, \dots, X_n)^T, \tag{6.139} \]
the MAX-operation must approximate the distribution of the random vector
\[ Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T \tag{6.140} \]
with a member of the same family of probability distributions as the random vector
$X$. So far, block-based statistical timing analysis has relied on the normal distribution
based MAX-operation [Clark61], which can cause large approximation errors because
it approximates the distribution of $Y$ with a normal distribution.
To minimize the approximation error, this chapter introduces the skew-normal
distribution based MAX-operation. Compared to the normal distribution based MAX-operation,
the proposed MAX-operation is defined on the far more flexible skew-normal
distribution, which allows the accurate approximation of the random vector $Y$ with another
skew-normal distribution. The results in section 7.5 consistently show a significant
reduction of the approximation error by up to 90%. However, the proposed algorithm
has a slightly larger runtime complexity, which might limit its application to $n < 1000$.

Chapter 7
Experimental Evaluation
7.1 Implementation
All algorithms were implemented in C/C++ and executed on Intel Core i7-2600K
processor workstations with 32 GiB RAM. For the generation of the circuit instances
with random delay values, the high-performance implementations of the Box-Muller
transform and the Mersenne Twister pseudo-random number generator from the Intel
Math Kernel Library were used [Intel15].
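For readers unfamiliar with the Box-Muller transform, the following sketch shows the principle in pure Python. It does not use or imitate the Intel Math Kernel Library interface. It relies on the fact that Python's `random.Random` is itself a Mersenne Twister generator. The seed and the sample count are arbitrary choices for this illustration.

```python
import math
import random


def box_muller(rng):
    """Turn two uniform samples into two independent standard normal samples."""
    u1 = 1.0 - rng.random()      # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(12345)       # Mersenne Twister with a fixed seed
samples = [z for _ in range(10000) for z in box_muller(rng)]
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
```

With 20000 samples, the empirical mean and variance should be close to 0 and 1, respectively.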
To accurately measure the runtime of the presented algorithms and individual steps,
the hardware performance counters of the processor are accessed using the perfmon
interface. These hardware counters are configured to count the number of elapsed CPU
clock cycles for selected regions of code. All results are stored in a MySQL database.
The runtime is obtained by dividing the total number of clock cycles by the CPU clock
frequency.
7.2 Benchmark Circuits
The proposed algorithms were evaluated on several large benchmark circuits, kindly
provided by NXP. The circuits were synthesized and optimized for speed using a
commercial synthesis tool. The NanGate 45 nm Open Cell Library [nan11] was chosen
as a target library. The benchmark circuit characteristics are shown in table 7.1, where
#PPI and #PPO denote the number of pseudo-primary inputs and pseudo-primary
outputs, respectively. The last three columns will be explained at the end of this section.
Table 7.1: Benchmark circuit characteristics
circuit   #PPI    #PPO    #gates   Q0.6 [ps]   Q0.8 [ps]   Q0.95 [ps]
p35k      2912    2229    28115    1163.6      1266.8      1412.7
p45k      3739    2550    26954    889.3       968.4       1084.8
p77k      3487    3400    41797    5805.2      6377.2      7137.5
p78k      3148    3484    57535    1213.3      1325.4      1485.2
p81k      4029    3952    91756    1018.5      1109.7      1238.6
p100k     5902    5829    61749    1400.1      1533.9      1710.6
p267k     17332   16621   138912   787.9       856.2       951.6
p330k     18010   17468   184425   1023.5      1116.9      1246.4
It was assumed that every delay value $X_i$ of a gate has a normal distribution with
mean $\mu_i$ and standard deviation $\sigma_i$. The mean $\mu_i$ is set to the nominal delay value that
was extracted from the Standard Delay Format (SDF) description of the synthesized
netlists. The standard deviation of $X_i$ was defined as
\[ \sigma_i := c_v |\mu_i|, \tag{7.1} \]
where $|\mu_i|$ denotes the absolute value of the nominal delay and $c_v$ is a variation
coefficient. To study large process variations in innovative technology nodes and other
sources of delay variations like model inadequacy and environmental variations, a
variation coefficient of $c_v = 0.25$ was chosen [Ye10].
Manufacturing data from an Intel microprocessor in 65 nm technology [Kuhn08] has
shown that inter-die variations account for roughly 3% and intra-die variations cause
approximately 2.5% variability of the clock frequency of on-chip ring oscillators.
Furthermore, intra-die variations are expected to dominate in future technology nodes, due
to a shift to purely random and independent physical variations like random dopant
fluctuation and line edge roughness [Li10, Agarw07]. To account for inter-die and
intra-die variations, every delay value $X_i$ of a gate was defined as
\[ X_i = \frac{1}{\sqrt{2}} \left( Z_{inter} + Z_{intra,i} \right) \sigma_i + \mu_i, \tag{7.2} \]
where $Z_{inter}$ and $Z_{intra,i}$ are independent random variables of standard normal
distribution. While a separate random variable $Z_{intra,i}$ exists for each $X_i$, the random variable
$Z_{inter}$ is shared by all delay values of all gates to account for inter-die variations, which
affect all delays of the gates on the same die in a similar way. For example, this captures
the components of process variation which occur predominantly from wafer to wafer
and lot to lot. From eq. (2.28) and theorem 2.24 it follows that
\begin{align}
\mathrm{Var}(X_i) &= \mathrm{Var}\left( \frac{1}{\sqrt{2}} \left( Z_{inter} + Z_{intra,i} \right) \sigma_i \right) \tag{7.3} \\
&= \frac{1}{2} \left( \mathrm{Var}(Z_{inter}) + \mathrm{Var}(Z_{intra,i}) \right) \sigma_i^2 \tag{7.4} \\
&= \sigma_i^2. \tag{7.5}
\end{align}
As can be seen from eq. (7.4), half of the variance of $X_i$ is due to inter-die and the other
half is due to intra-die variations.
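The delay model of eqs. (7.1) and (7.2) is straightforward to simulate. The sketch below draws circuit instances for two gates with hypothetical nominal delays of 20 ps and 35 ps and estimates the correlation between the two gate delays. Because the two delays share the inter-die component, which carries half of each variance, their correlation coefficient is expected to be close to 0.5. Function names, seed and sample count are assumptions for this illustration.

```python
import random


def sample_delays(nominal_delays, cv, rng):
    """Draw the gate delays of one circuit instance, eqs. (7.1) and (7.2)."""
    z_inter = rng.gauss(0.0, 1.0)             # one inter-die variable per instance
    delays = []
    for mu in nominal_delays:
        sigma = cv * abs(mu)                   # eq. (7.1)
        z_intra = rng.gauss(0.0, 1.0)          # one intra-die variable per gate
        delays.append((z_inter + z_intra) / 2.0 ** 0.5 * sigma + mu)  # eq. (7.2)
    return delays


def correlation(xs, ys):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5


rng = random.Random(2016)                      # fixed seed for reproducibility
instances = [sample_delays([20.0, 35.0], 0.25, rng) for _ in range(20000)]
gate0 = [inst[0] for inst in instances]
gate1 = [inst[1] for inst in instances]
rho = correlation(gate0, gate1)                # expected to be close to 0.5
```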
The creation of circuit layouts was omitted to avoid an unnecessarily complex
experimental setup. Consequently, spatial correlations have not been considered. The interconnect
delays have been estimated during the synthesis and are considered to be part of the
gate delay.
The clock cycle time Tclk is determined from the distribution of the circuit delay D, which
is defined as follows. In this experimental setup, the circuit delay is defined by the
maximum delay of any path that is sensitized by a given large set of test vector-pairs
under the impact of delay variations. To compute the circuit delay distribution, all
available test vector-pairs were simulated with the nominal circuit instance. The test
vector-pairs were then sorted according to the arrival time of the last transition at
the circuit outputs. Afterwards, the 250 test vector-pairs with the latest transition
arrival time at the circuit outputs were simulated with a Monte Carlo simulation of $10^4$
iterations. In each iteration, the arrival time of the last transition at the circuit outputs
across all test vector-pairs was stored in a database. The approximation of the circuit
delay distribution obtained in this way is shown in fig. 7.1 for the first four benchmark circuits.
Figure 7.1: CDF of the circuit delay D for several NXP benchmark circuits (curves for p35k, p45k, p77k and p78k; horizontal axis 0 to 8000, vertical axis 0.0 to 1.0)
The 0.6-, 0.8- and 0.95-quantiles of the circuit delay distribution are presented in the last
three columns of table 7.1. For a benchmark circuit, the $p$-quantile of the circuit delay
distribution is the time $T \in \mathbb{R}$ such that the probability that the circuit delay of a
randomly chosen circuit instance is less than or equal to $T$ is equal to $p$, that is
\[ P(D \le T) = p. \tag{7.6} \]
For example, the 0.95-quantile $Q_{0.95}$ denotes the clock cycle time at which 5% of the defect-free
manufactured chips would fail the timing requirements due to delay variations.
A path with delay $Y$ is a critical path according to definition 2.17 if
\[ P(Y > T_{clk}) > p_{cri}, \tag{7.7} \]
where $p_{cri} = 1 - \Phi(3) \approx 0.0013499$ was chosen in this experimental setup to achieve
high accuracy. If $\mu$ denotes the mean and $\sigma$ denotes the standard deviation of $Y$, then
eq. (7.7) is equivalent to
\[ \mu + 3\sigma > T_{clk} \tag{7.8} \]
because the path delays are normally distributed in this experimental setup.
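The threshold value and the equivalence of eqs. (7.7) and (7.8) can be reproduced with the standard normal CDF, expressed here through the complementary error function. This is a small sketch. The function names and the example path parameters are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


P_CRI = 1.0 - normal_cdf(3.0)    # threshold probability, about 0.0013499


def is_critical(mu, sigma, t_clk):
    """Critical-path test of eq. (7.7) for a normally distributed path delay."""
    return 1.0 - normal_cdf((t_clk - mu) / sigma) > P_CRI


def is_critical_shortcut(mu, sigma, t_clk):
    """Equivalent test of eq. (7.8)."""
    return mu + 3.0 * sigma > t_clk
```

For a path with mean 1000 ps and standard deviation 50 ps, both tests classify the path as critical for a clock cycle time of 1100 ps and as non-critical for 1200 ps.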
7.3 Probabilistic Sensitization Analysis
This section presents the experimental results for the probabilistic sensitization anal-
ysis. At first, the details of the experimental setup are explained. Afterwards, the
experimental results are discussed.
For each benchmark circuit, the 10000 longest paths were found using a commercial
static timing analysis tool. During that process, the number of paths which were
allowed to terminate at the same primary output was limited to 100. The resulting set
of paths was subsequently sensitized using a commercial ATPG tool. On average, 2536
test vector-pairs were generated for each benchmark circuit. The only notable exception
is p77k, where only 114 test vector-pairs were generated.
Section 7.3.1 evaluates the accuracy and speedup of the Monte Carlo simulation of the
subcircuit S, compared to the Monte Carlo simulation of the complete circuit. The
experimental results for the simplified probabilistic sensitization analysis are presented
in section 7.3.2.
7.3.1 Evaluation of Representative Subcircuit
The proposed representative subcircuit S was constructed as described in section 4.4
using 100 circuit instances q0, ..., q99, including the nominal circuit instance q0. Storing
all circuit instances in memory did not require more than 8 GiB of RAM per process.
To show the importance of the controlling paths for the delay test, another subcircuit
S¯, consisting only of the critical target paths, was constructed as follows. At first, the
given test vector-pair was simulated with the nominal circuit instance q0 and all critical
paths that are sensitized in the nominal circuit instance were identified. Afterwards,
the subcircuit S¯ was constructed from all gates and interconnects which lie on at least one
of these paths. All floating gate input nodes were set to their respective non-controlling
values. For the floating off-path input node of an XOR/XNOR gate, the logic value
which the off-path input had at the arrival time of the on-path transition was used.
Finally, all floating circuit outputs were set to the value observed at the clock cycle time.
To evaluate the suitability of the subcircuits for the simulation of path delay fault
tests with the given test vector-pair, a Monte Carlo simulation with $10^4$ iterations
(circuit instances) was performed. Following the simulation of a circuit instance q,
both subcircuits S and S¯ were simulated using the same random delay values as in q.
Afterwards, the logic values at the outputs of the circuit instance and the outputs of
the subcircuits were observed at the clock cycle time Tclk and compared to the expected
fault-free stable output values. An inconsistent delay test result occurs if a delay fault
is detected either only in the simulation of the circuit instance or only in the simulation
of the subcircuit. The following figs. 7.2 to 7.4 present average results for path delay
fault tests with otherwise defect-free circuits, where the clock cycle time was chosen
corresponding to the 0.95-quantile of the circuit delay distribution.
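The definition of an inconsistent delay test result amounts to an exclusive-or of the two detection outcomes per Monte Carlo iteration. A minimal sketch of the resulting estimate follows. It assumes the per-iteration outcomes are available as two equally long lists of booleans. The function name is hypothetical.

```python
def inconsistency_probability(full_detects, sub_detects):
    """Fraction of iterations in which exactly one simulation detects a fault.

    full_detects -- per-iteration outcome for the complete circuit instance
    sub_detects  -- per-iteration outcome for the subcircuit
    """
    if len(full_detects) != len(sub_detects):
        raise ValueError("both simulations must cover the same iterations")
    inconsistent = sum(1 for f, s in zip(full_detects, sub_detects) if f != s)
    return inconsistent / len(full_detects)


# Four illustrative iterations with one disagreement.
estimate = inconsistency_probability([True, False, True, False],
                                     [True, True, True, False])
```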
The average probability of an inconsistent delay test result is shown in fig. 7.2. It can
be seen that the probability of an inconsistent delay test result is quite high for the
subcircuit containing only the critical target paths. After extending the subcircuit to a
representative subcircuit, the probability of an inconsistent delay test result drops to
almost zero. This shows that the delays of the controlling paths play a key role during
delay testing by determining which critical paths are sensitized by the test vector-pair
in any circuit instance.
Figure 7.2: Average probability of observing an inconsistent delay test result with
the subcircuit (bars for p35k, p45k, p77k, p78k, p81k, p100k, p267k and p330k; series: subcircuit of critical target paths, representative subcircuit; vertical axis: probability of an inconsistent delay test response [%], 0 to 4)
The relative size of a subcircuit is defined as the number of gates in the subcircuit, di-
vided by the number of gates in the respective (complete) benchmark circuit. Figure 7.3
shows the average relative size of the subcircuits and the average relative size of the
joint input cone of all critical paths that exist in the respective subcircuit.
As shown by fig. 7.3a, the relative size of the subcircuit S¯ that consists only of the
critical target paths is on average more than 10 times smaller than the joint input cone of all
critical paths that exist in S¯. Extending the subcircuit to the representative subcircuit S
results in only a small increase in the relative subcircuit size. As shown in fig. 7.3b, the
average relative size of the representative subcircuit is about 1%. The only exception is
p81k, which is not surprising because fig. 7.2 has already shown that the controlling
paths have a strong impact on the delay test result for this benchmark circuit. Therefore,
the construction of a representative subcircuit requires a greater number of critical
and controlling paths, which increases the average relative size of the representative
subcircuit to 5%. In general, many other critical paths may be sensitized by the test
vector-pair in randomly chosen circuit instances. Therefore, the joint input cone of all
critical paths in the representative subcircuit may on average contain up to 47% of the
gates of the complete circuit.
Figure 7.3: Relative size of subcircuit and relative size of joint input cone of all
critical paths that exist in the subcircuit. (a) Subcircuit that consists only of critical target paths (series: subcircuit of critical target paths, joint input cone of critical target paths). (b) Representative subcircuit (series: representative subcircuit, joint input cone of sensitizable critical paths). Both panels show p35k to p330k; vertical axis: relative gate count [%], logarithmic from 0.01 to 100.
The small size of the representative subcircuit S and the low probability of an
inconsistent delay test result motivate the use of the representative subcircuit for more complex
statistical timing analysis problems, such as the computation of the probability
that a particular target path is sensitized by a given test vector-pair, as explained in
section 4.4. The average speedup is defined as the average runtime of the
Monte Carlo simulation of the complete circuit, divided by the average runtime for
the construction and Monte Carlo simulation of the subcircuit. The average speedup
gained by this approach is presented in fig. 7.4 for different benchmark circuits. The
results show that it is between 8 and 256 times faster to first generate the subcircuit and
then perform a Monte Carlo simulation of the subcircuit instead of directly performing
a Monte Carlo simulation of the complete circuit.
Figure 7.4 — Speedup of Monte Carlo simulation by constructing and simulating
only the subcircuit, compared to Monte Carlo simulation of complete circuit. [Bar chart
over the benchmark circuits p35k, p45k, p77k, p78k, p81k, p100k, p267k and p330k; vertical
axis: speedup by subcircuit simulation, logarithmic from 4 to 256. Series: subcircuit of
critical target paths, representative subcircuit.]
The detailed numerical results are presented in table B.1a for path delay fault tests with
otherwise defect-free circuits. Additional results for small delay fault tests in circuits
with a marginally detectable small delay fault are shown in table B.1b. Both tables
present average results over all test vector-pairs of a benchmark circuit.
The first column shows the name of the NXP benchmark circuit. The second column
"Tclk" shows the clock cycle time, corresponding to the 0.6, 0.8 and 0.95-quantile of
the circuit delay distribution. For small delay fault tests, only the clock cycle time
corresponding to the 0.95-quantile is considered.
The following five columns ("subcircuit S̄") show the experimental results for the
subcircuit S̄, which was constructed from q0 without using probabilistic sensitization
analysis. The experimental results for the representative subcircuit S, which was
constructed using the proposed probabilistic sensitization analysis, are shown in the
last six columns ("representative subcircuit S").
The average number of critical paths that are sensitized by the test vector-pair in at
least one of the considered circuit instances is shown in column "#sens.crit. paths". The
number of critical paths is much smaller for S̄ because only the nominal circuit instance
is considered and many other critical paths may be sensitized by the test vector-pair in
randomly chosen circuit instances. The column "|cone|" presents the average relative
number of gates in the joint input cone of the critical paths in the subcircuit, compared
to the number of gates in the benchmark circuit. As expected from the greater number of
sensitized critical paths in the representative subcircuit S, the number of gates in the
joint input cone of these paths is also much larger for S.
The average relative number of gates in each subcircuit, compared to the number of
gates in the benchmark circuit, is shown in the columns "|S̄|" and "|S|", respectively.
The results show that the representative subcircuit S is usually only slightly larger
than the subcircuit S̄. The column "#iter" gives the average number of iterations
during which the subcircuit was extended and simulated with the delay values of all
considered circuit instances q0, . . . , q99.
The probability of an inconsistent delay test result is given in the column "Perr" in
percent [%]. The average speedup is shown in column "SU". The results show that a large
average speedup of up to 257 can be achieved by the Monte Carlo simulation of S̄.
However, as expected, the probability of an inconsistent delay test result is quite high
and increases rapidly with the clock frequency, which shows the large impact of the
controlling paths on the delay test.
On the other hand, the simulation of the representative subcircuit S is very accurate and
the probability of an inconsistent delay test result "Perr" is almost zero. Furthermore, the
last column "SU" shows that it is still up to 32 times faster to construct the representative
subcircuit and then perform a Monte Carlo simulation of the representative subcircuit,
instead of directly performing a Monte Carlo simulation of the complete circuit.
The speedup is particularly large for the Monte Carlo simulation of circuits with a
marginally detectable small delay fault for the evaluation of small delay fault tests.
7.3.2 Simplified Probabilistic Sensitization Analysis
This subsection demonstrates the efficiency and flexibility of the proposed simplified
probabilistic sensitization analysis for guiding the delay test generation process.
First, important path sensitization conditions are reviewed in section 7.3.2.
Path Sensitization Conditions
The tolerance of a path delay fault test towards delay variability is influenced by the
sensitization conditions that are satisfied by the off-path inputs of the path. These
sensitization conditions are of limited use for delay testing under the impact of de-
lay variations [Sauer12], because they consider only very few details of the actual
waveforms, such as the initial and the final value. Nevertheless, the most common
sensitization conditions are briefly reviewed here to describe important details of the
experimental results.
Table 7.2 shows the sensitization condition satisfied by a single off-path input for
different on- and off-path input transitions [Fuchs94, Krsti95]. The logic symbols are
explained below.
• S0 (S1) represents a waveform with only one transition (v0, t0) and v0 = 0 (v0 = 1)
• X0 (X1) means that the final value of a waveform is 0 (1)
• 01 (10) means the initial value of the waveform is 0 (1) and the final value is 1 (0)
The most stringent sensitization condition considered in the experiments is the robust
sensitization condition and the least stringent is the functional sensitization condition.
The relaxations made by going from one sensitization condition to the next less stringent
conditions are emphasized with bold letters.
The classification of a path delay fault test is determined by the weakest sensitization
condition satisfied by any of its off-path inputs. For example, if some of the off-path
inputs satisfy the robust sensitization condition while the remaining off-path inputs
only satisfy the non-robust sensitization condition, then the path is tested by a non-robust
test.
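The classification rule above amounts to taking a minimum over an ordered set of conditions. The following minimal sketch illustrates this under stated assumptions: the names `Sensitization` and `classify_test` are hypothetical, each off-path input is represented by the most stringent condition it satisfies, and a path without off-path inputs is treated as robustly tested, which is a choice made here for illustration only.

```python
from enum import IntEnum


class Sensitization(IntEnum):
    """Sensitization conditions, ordered from least to most stringent."""
    FUNCTIONAL = 0
    NON_ROBUST = 1
    ROBUST = 2


def classify_test(off_path_conditions):
    """Classify a path delay fault test by its weakest off-path condition.

    Each entry is the most stringent condition satisfied by one off-path
    input. An empty list defaults to ROBUST (an assumption of this sketch).
    """
    return min(off_path_conditions, default=Sensitization.ROBUST)


# Two off-path inputs are robust, one is only non-robust.
mixed = [Sensitization.ROBUST, Sensitization.NON_ROBUST, Sensitization.ROBUST]
print(classify_test(mixed).name)  # NON_ROBUST
```

A single functionally sensitized off-path input would likewise downgrade the whole test to a functional one.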
A robust test can detect a path delay fault, regardless of the delays of other paths in
the circuit. However, a large number of paths have no robust test [Lin87, Fuchs91].
Although synthesis for 100% robust testability is generally possible, this approach has
several major disadvantages such as a significant area overhead and possibly many
additional primary inputs [Jha92, Chakr00].
Table 7.2 — Sensitization condition satisfied by an off-path input
                         Robust               Non-robust           Functional
on-path input            rising    falling    rising    falling    rising    falling
AND/NAND gate            X1        S1         X1        X1         X1        10
OR/NOR gate              S0        X0         X0        X0         01        X0
XOR/XNOR gate            S0/S1     S0/S1      X0/X1     X0/X1      X0/X1     X0/X1
(All entries give the condition required at the off-path input.)
The sensitization conditions of a non-robust test are less stringent than those of a robust
test. However, the sensitization of a target path by a non-robust test depends on the
delays of other paths in the circuit. It is therefore likely that a non-robust test sensitizes
a target path in only a subset of the manufactured circuits. For example, the orange
path a-b-c-d in fig. 1.5 is tested by a non-robust test, which can fail to sensitize the
target path even if all transitions at the off-path inputs arrive well before the respective
transition at the on-path input.
A logical path which is neither robustly nor non-robustly testable can still cause a
timing failure during normal operation of the circuit. For this reason, the category of
functionally sensitizable paths was introduced in [Krsti95].
Experimental Results
Each of the n test vector-pairs of a benchmark circuit was simulated with the nominal
circuit instance q0 and the sensitized critical paths were assumed to be the target paths
of the test vector-pair. In the following, p_1, . . . , p_{m_i} will be used to denote the m_i
target paths of the i-th test vector-pair n_i.
To evaluate the impact of delay variations on the sensitization of the target paths,
the test vector-pair was then simulated with 99 randomly chosen circuit instances
q1, . . . , q99 and it was checked which of those target paths was sensitized in q1, . . . , q99.
The simplified probabilistic sensitization analysis was then applied to all inconsistently
controlled target paths.
The results are presented in table B.2a for path delay fault tests with otherwise defect
free circuits. Table B.2b presents additional results for small delay fault tests of
marginally detectable size, which will be used in section 7.4. Column "Tclk" shows the
clock cycle time, corresponding to the 0.6, 0.8 and 0.95-quantile of the circuit delay
distribution. The following three columns show the average number of robustly ("#rob"),
non-robustly ("#nr") and functionally ("#fs") tested target paths per test vector-pair.
The following four columns present the experimental results for non-robust path delay
faults tests, considering a randomly chosen test vector-pair. Column "Pci" shows the
probability that in a randomly chosen circuit instance at least one non-robustly tested
target path is not sensitized. This probability can be formally described by introducing
the function f, which is defined as follows. If during the application of test vector-pair
n_i to the circuit instance q_j, the non-robust test of a target path p_k fails to sensitize p_k,
then f(n_i, q_j, p_k) = 1, else f(n_i, q_j, p_k) = 0. Then the probability Pci is formally defined
as
$$P_{ci} = \frac{1}{100\,n} \sum_{i=1}^{n} \sum_{j=0}^{99} \max_{k=1,\dots,m_i} f(n_i, q_j, p_k). \tag{7.9}$$
The probability that a randomly chosen non-robustly tested target path is not sensitized
in at least one circuit instance is given in column "Pid". This probability is computed as
$$P_{id} = \frac{1}{m_1 + \dots + m_n} \sum_{i=1}^{n} \sum_{k=1}^{m_i} \max_{j=0,\dots,99} f(n_i, q_j, p_k). \tag{7.10}$$
The results show that this probability is almost one, so that it is very likely that a
non-robust test is invalidated in at least one circuit instance. It may be surprising to
observe that for some benchmark circuits, Pci is quite small but Pid is almost one. This
can be understood by a quite common example, where many non-robustly tested target
paths start with the same sequence of gates, called path segment. If this common path
segment is not sensitized in one circuit instance, then the delay tests of all target paths
that run through this path segment are invalidated.
The next column "Pin" presents the probability that a randomly chosen non-robustly
tested target path is not sensitized in a randomly chosen circuit instance. This probabil-
ity has been computed as
$$P_{in} = \frac{1}{100\,(m_1 + \dots + m_n)} \sum_{i=1}^{n} \sum_{j=0}^{99} \sum_{k=1}^{m_i} f(n_i, q_j, p_k). \tag{7.11}$$
The results in column "Pin" once again show that non-robust path delay fault test
invalidation is very likely under the impact of delay variations.
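The three probabilities of eqs. (7.9) to (7.11) differ only in where the maximum is taken. The sketch below computes them from a nested list of indicator values; the function name, the data layout `f[i][j][k]` and the toy data (two test vector-pairs, four circuit instances instead of 100) are assumptions made for illustration and are not taken from the experiments.

```python
def invalidation_probabilities(f):
    """Compute (P_ci, P_id, P_in) from indicator values.

    f[i][j][k] is 1 if the non-robust test of target path k of test
    vector-pair i fails to sensitize the path in circuit instance j.
    Every test vector-pair is assumed to have at least one target path.
    """
    n = len(f)
    instances = len(f[0])                       # 100 in the experiments
    total_paths = sum(len(f_i[0]) for f_i in f)  # m_1 + ... + m_n

    # (7.9): per (vector-pair, instance), is any target path not sensitized?
    p_ci = sum(max(f_ij) for f_i in f for f_ij in f_i) / (instances * n)

    # (7.10): per target path, is it not sensitized in any instance?
    p_id = sum(
        max(f_ij[k] for f_ij in f_i)
        for f_i in f
        for k in range(len(f_i[0]))
    ) / total_paths

    # (7.11): plain average over all (vector-pair, instance, path) triples.
    p_in = sum(sum(f_ij) for f_i in f for f_ij in f_i) / (instances * total_paths)
    return p_ci, p_id, p_in


# Toy data: both target paths of the first vector-pair share a path segment
# that is not sensitized in instance 0; the second vector-pair never fails.
f_toy = [
    [[1, 1], [0, 0], [0, 0], [0, 0]],
    [[0], [0], [0], [0]],
]
p_ci, p_id, p_in = invalidation_probabilities(f_toy)
print(f"{p_ci:.3f} {p_id:.3f} {p_in:.3f}")  # 0.125 0.667 0.167
```

Even in this tiny example, a single failing instance on a shared path segment makes the per-path value much larger than the per-instance value, which mirrors the effect described for Pci and Pid.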
Each invalidated non-robust path delay fault test was further analysed to determine
if the propagation of the transition along the path was blocked by a waveform in-
consistency at one of the inputs of the gate. The probability that the invalidation
of a randomly chosen non-robust test of a target path in a randomly chosen circuit
instance is caused by a waveform inconsistency at one of the on-path/off-path inputs
of the path is shown in column "Pwi|in". The results show that non-robust path delay
fault test invalidation is most likely caused by variations in the arrival time of the
on-path/off-path transitions. However, in up to 31% of all invalidated non-robust path
delay fault tests, the invalidation is instead caused by a waveform inconsistency at one
of the on-path/off-path inputs. In this case, the invalidation of the path delay fault test
is prevented by correcting this waveform inconsistency.
Finally, the average runtime per test vector-pair is presented in the last two columns.
The runtime of the nominal circuit instance simulation and the identification of the
target paths is shown in column "Tsim". Column "Tdiag" presents the time required by
the proposed probabilistic sensitization analysis. The runtime for the fault simulation
[Fuchs91] itself is not included, since the fault simulation is merely used to separate
non-robust tests from other path delay fault tests for the purpose of this presentation.
The runtime for the random number generation to create the 99 randomly chosen
circuit instances is also excluded, since the generation is done only once and the same
circuit instances are shared by all test vector-pairs.
7.4 Computation of Target Paths Delay Fault Probability
This section compares the accuracy and runtime of the proposed non-incremental and
incremental algorithms with extensive Monte Carlo simulations for a large number of
marginally detectable small delay faults. The proposed algorithms approximate the
target paths delay fault probability while the Monte Carlo simulation computes the
delay fault detection probability.
Definition 7.1 (delay fault detection probability). The probability of detecting a delay
fault during delay testing (see section 1.2.1) of a randomly chosen circuit instance with
a given set of test vector-pairs T and delay test parameters is called delay fault detection
probability and denoted by X.
To evaluate the effectiveness of a given set of test vector-pairs for the detection of small
delay faults, definition 7.1 is applied to a modified set of circuit instances, which is
obtained by injecting a marginally detectable small delay fault of fixed size into all
circuit instances in Q.
The first section 7.4.1 presents the experimental setup. The results obtained using
the non-incremental algorithm are shown in section 7.4.2. The following section 7.4.3
presents the experimental results obtained using the incremental algorithm. The last
section 7.4.4 applies the obtained experimental results to find a suitable trade-off
between the test set size and the delay test quality.
7.4.1 Experimental Setup
A benchmark of 20000 randomly chosen single small delay faults was created for each
circuit. Afterwards, a set of test vector-pairs was generated for each small delay fault.
For each small delay fault, the 1000 longest paths through the fault site were identified
with a commercial static timing analysis tool, where at most 100 paths were allowed to
end at the same circuit node. Afterwards, the selected paths were sensitized using a
commercial ATPG tool. The following experiments were conducted only with those
small delay faults, for which at least 20 test vector-pairs had been generated. To study
marginally detectable small delay faults, the fault size was set to the slack of the longest
sensitized path p̃ in the nominal circuit instance, which passes through the fault site.
From the set of test vector-pairs that was generated for a small delay fault, four different
test subsets T1, . . . , T4 are derived as follows. The first test subset T1 consists of only a
single test vector-pair, which sensitizes p̃ in the nominal circuit instance. Afterwards,
T_{i+1} is derived from T_i by adding test vector-pairs in decreasing order of the delay of
the longest target path through the fault site. The number of test vector-pairs to be
added is chosen such that T2 contains five, T3 contains ten and T4 consists of twenty
test vector-pairs. The critical target path sensitization by the test vector-pair in T1 has
been evaluated with the simplified probabilistic sensitization analysis in table B.2b,
which shows the strong impact of delay variations on the mostly non-robustly and in
some cases only functionally sensitized critical target paths.
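The derivation of the nested test subsets can be sketched as a sort followed by taking prefixes. The sketch assumes that each test vector-pair is given as a hypothetical `(name, delay)` tuple, where `delay` is the nominal delay of its longest target path through the fault site, so that the first element after sorting is the vector-pair that sensitizes the longest sensitized path.

```python
def derive_test_subsets(vector_pairs, sizes=(1, 5, 10, 20)):
    """Return nested test subsets T1, ..., T4 as lists of vector-pairs.

    vector_pairs: iterable of (name, delay) tuples (hypothetical layout).
    Vector-pairs are added in decreasing order of the delay of their
    longest target path through the fault site.
    """
    ordered = sorted(vector_pairs, key=lambda vp: vp[1], reverse=True)
    return [ordered[:size] for size in sizes]


# Twenty made-up vector-pairs with increasing longest-path delay.
pairs = [(f"v{i}", 1.0 + 0.01 * i) for i in range(20)]
subsets = derive_test_subsets(pairs)
print([len(t) for t in subsets])  # [1, 5, 10, 20]
print(subsets[0][0][0])           # v19
```

Because every subset is a prefix of the same ordering, each T_{i+1} contains T_i, which matches the incremental construction described above.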
The clock cycle time Tclk was set to the 0.95-quantile of the circuit delay distribution. In
other words, Tclk was chosen such that 5% of the defect-free manufactured chips are
expected to fail the timing requirement due to process variations.
For each injected small delay fault, the delay fault detection probability X (see
definition 7.1) was computed for all test subsets T1, . . . , T4 by a Monte Carlo simulation
of 10^4 iterations. Assuming that the approximation error of the Monte Carlo simulation is
normally distributed, the chosen number of iterations provides a half-length of 0.0098 for
the 95% confidence interval. In other words, the true delay fault detection probability
lies within 0.98 percentage points of the Monte Carlo simulation estimate with 95%
confidence. The computational cost of the simulation is dominated by the large number of
the random delay values required for each circuit instance (Monte Carlo iteration).
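The stated half-length can be reproduced with the usual normal approximation for an estimated proportion. The sketch below assumes the worst-case variance at a probability of 0.5; the function name and its parameters are illustrative and not part of the original experiments.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_length(iterations, confidence=0.95, p=0.5):
    """Half-length of the normal-approximation confidence interval for a
    probability estimated from `iterations` Monte Carlo samples.

    p=0.5 maximizes p * (1 - p) and therefore gives the worst case.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / iterations)


print(round(ci_half_length(10**4), 4))  # 0.0098
```

Since the half-length shrinks only with the square root of the iteration count, halving it would require four times as many Monte Carlo iterations.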
7.4.2 Non-Incremental Computation
In this section, the effectiveness of the non-incremental algorithm is demonstrated. The
following figs. 7.5 and 7.6a present average results over all small delay faults in a
benchmark circuit.
The speedup achieved by the proposed non-incremental algorithm is shown in fig. 7.5
for different benchmark circuits. The average speedup is defined as the average runtime of
the Monte Carlo simulation divided by the average runtime of the proposed algorithm.
It can be seen that the average speedup of the proposed algorithm is about three
orders of magnitude. While the runtime of the critical target path identification
increases slowly with the test subset size |T|, the relative contribution of the numerical
integration to the overall runtime of the proposed algorithm increases significantly.
Compared to this, the Monte Carlo simulation becomes more efficient, because the
high computational cost for generating the large number of random delay values is
shared among a greater number of test vector-pairs. Consequently, the average speedup
decreases with the size of the test subset |T|.
Figure 7.5 — Average speedup of non-incremental algorithm compared to Monte
Carlo simulation. [Bar chart over the benchmark circuits p35k, p45k, p77k, p78k, p81k,
p100k, p267k and p330k; vertical axis: speedup, from 0 to 2000. Series: 5, 10 and 20 test
vector-pairs.]
The proposed algorithm is applied to approximate the delay fault detection probability.
For a given test subset T, the error δ of this approximation is defined as
$$\delta = X - \hat{Y}, \tag{7.12}$$
where the delay fault detection probability X is obtained by the described Monte Carlo
simulation and Ŷ denotes the approximation of the target paths delay fault probability,
obtained by the proposed non-incremental algorithm. This error is almost exclusively
caused by the impact of delay variations on the sensitization of the critical paths. On
the one hand, the delay fault detection probability may be overestimated if a critical
target path is not sensitized in a randomly chosen circuit instance. On the other hand, a
critical path which is not sensitized in the nominal circuit instance might be sensitized
in a randomly chosen circuit instance, contributing to an underestimation of the delay
fault detection probability.
The average absolute approximation error of the proposed non-incremental algorithm
is presented in fig. 7.6a. The diagram shows that the error is quite small and tends
to decrease with the test subset size. However, more than 1000 critical target paths
were found for some delay faults in p267k and p330k, so that the normal distribution
based MAX-operation was applied to reduce the number of random variables. The
inherent inaccuracy of this operation causes the average absolute error to increase with
the test subset size for p267k and p330k. A box plot (see section 2.4.2) of the error δ is
shown in fig. 7.6b for the evaluation of the test subset T2 with five test vector-pairs. It
is apparent that the error δ is almost zero for at least 50% of the experimental results
and that in rare extreme cases (extreme outliers) the delay fault detection probability is
underestimated by up to 40% or overestimated by up to 65%. These extreme outliers
are caused by the impact of delay variations on the sensitization of the critical paths
and dominate the average absolute error in fig. 7.6a. Furthermore, the lower whiskers
are slightly longer than the upper whiskers, which indicates a slight tendency towards
overestimation of the delay fault detection probability.
Figure 7.6 — Error δ caused by approximating the delay fault detection probability
with the target paths delay fault probability approximation of the non-incremental
algorithm. (a) Average absolute error |δ| for test subsets T2, T3 and T4; (b) box plot of
the error δ for test subset T2 with 5 test vector-pairs. [Charts over the benchmark circuits
p35k, p45k, p77k, p78k, p81k, p100k, p267k and p330k; vertical axis in (a): average |δ| [%],
from 0.0 to 3.5, with series for 5, 10 and 20 test vector-pairs; vertical axis in (b):
error δ [%], from -60 to 40.]
Table B.3 presents the average results over all small delay faults in a benchmark circuit
in detail. The name of the circuit is given in column "circuit" and column "|T|" shows
the number of test vector-pairs in the test subsets T1, . . . , T4. Column "|P|" presents
the average number of critical target paths, sensitized by all test vector-pairs in the
test subset. It can be observed that the number of critical target paths slowly saturates
with the increasing number of test vector-pairs in the test subsets. This is because the
additional test vector-pairs predominantly sensitize shorter paths and many critical
paths may have already been sensitized by the smaller test subsets.
For low dimensions of the multivariate normal integral (eq. (5.15)), very efficient
specialized algorithms [Genz04] are used. For larger dimensions, a very efficient Monte
Carlo numerical integration algorithm [Genz92] is used, which was published by its
author in the Fortran routine MVNDST. The absolute error of the numerical integration
was estimated to be below 5 · 10^-3 with 95% confidence with at most 10^5 Monte Carlo
iterations.
Column "δ" shows the average and column "|δ|" the average absolute value of the error
δ (see eq. (7.12)) over all small delay faults in a benchmark circuit. The average error
indicates a slight tendency towards overestimation for a single test vector-pair and a
slight tendency towards underestimation of the delay fault detection probability for
multiple test vector-pairs.
The average runtime of the proposed algorithm is shown in column "TPA". This runtime
is dominated by the numerical integration (41.8%), followed by the identification of
the critical target paths (34.6%) and the simulation of the nominal circuit instance
(23.6%). Column "TMC" presents the average runtime of the Monte Carlo simulation.
The average speedup of the proposed algorithm is given in column "SU".
7.4.3 Incremental Computation
In this section, the effectiveness of the incremental algorithm is demonstrated. An
initially empty test subset is gradually extended by inserting single test vector-pairs
in decreasing order of the longest target path delay in the nominal circuit instance.
The following figs. 7.7 and 7.8a present average results over all small delay faults in a
benchmark circuit.
When a test vector-pair has been added to the test subset, the average speedup of
the proposed incremental algorithm is defined as the average runtime of the Monte
Carlo simulation of the extended test subset divided by the average runtime of the
proposed algorithm to update the approximation of the target paths delay fault
probability accordingly. The average speedup after the insertion of a test vector-pair is
presented in fig. 7.7 for different benchmark circuits. The diagram shows a very large
average speedup of up to four orders of magnitude, which also tends to increase with
the number of test vector-pairs under consideration.
Figure 7.7 — Average speedup of incremental algorithm after insertion of test
vector-pair, compared to a Monte Carlo simulation of the extended test subset. [Bar chart
over the benchmark circuits p35k, p45k, p77k, p78k, p81k, p100k, p267k and p330k; vertical
axis: speedup, from 0 to 20 000. Series: 5, 10 and 20 test vector-pairs.]
The proposed incremental algorithm is applied to approximate the delay fault detection
probability (see definition 7.1). The total error ϵ of this approximation is defined as
$$\epsilon = X - \hat{Y}, \tag{7.13}$$
where the delay fault detection probability X is obtained by the Monte Carlo simulation
and Ŷ denotes the approximation of the target paths delay fault probability, obtained
by the proposed incremental algorithm. The average absolute total error is shown
in fig. 7.8a. It can be seen that the absolute total error is comparable to the absolute
error of the non-incremental algorithm. Although the absolute total error for p77k is
slightly larger, the incremental algorithm does not require the normal distribution based
MAX-operation for those small delay faults in p267k and p330k, where the number
of critical target paths exceeds 1000. Instead, the proposed extension of the normal
distribution based MAX-operation is used, which reduces the total error for p267k
and p330k by up to 70%. A box plot (see section 2.4.2) of the total error ϵ is shown
in fig. 7.8b for the evaluation of the test subset T2 with five test vector-pairs. It is
apparent that the error ϵ is almost zero for at least 50% of the experimental results
and that in rare extreme cases (extreme outliers) the delay fault detection probability is
underestimated by up to 41% or overestimated by up to 76%. These extreme outliers
are again caused by the impact of delay variations on the sensitization of the critical
paths and dominate the average absolute total error |ϵ| in fig. 7.8a. Furthermore, a
slight tendency towards overestimation of the delay fault detection probability is visible
because the lower whiskers are slightly longer than the upper whiskers.
The detailed average results over all small delay faults in a benchmark circuit are
presented in table B.4. The name of the benchmark circuit is shown in column "NXP
circuit". Column "|T|" presents the number of test vector-pairs in the test subset after
an insertion or removal of a single test vector-pair. Only those updates, which resulted
in a test subset of size 1, 5, 10 or 20 test vector-pairs, are presented here.
Figure 7.8 — Error ϵ caused by approximating the delay fault detection probability
with the target paths delay fault probability approximation of the incremental algorithm.
(a) Average absolute total error |ϵ| for test subsets T2, T3 and T4; (b) box plot of the
total error ϵ for test subset T2 with 5 test vector-pairs. [Charts over the benchmark
circuits p35k, p45k, p77k, p78k, p81k, p100k, p267k and p330k; vertical axis in (a):
average |ϵ| [%], from 0.0 to 2.0, with series for 5, 10 and 20 test vector-pairs; vertical
axis in (b): total error ϵ [%], from -80 to 40.]
Column "TMC" presents the average runtime of the Monte Carlo simulation. The results
for the insertion of a single test vector-pair are shown in column "INSERT(test vector-
pair)". The average number of critical target paths of all test vector-pairs in a test subset
is given in column "P". The part of the error caused by the impact of delay variations
on the sensitization of the critical paths by the test subset is defined as
$$\delta = X - \bigl(1 - P(X_1 \le T_{clk}, \dots, X_{|\mathcal{P}|} \le T_{clk})\bigr) \tag{7.14}$$
and the average absolute value of δ (eq. (7.14)) is shown in column "|δ|". The experimental
results show that this error decreases rapidly as the number of test vector-pairs in
the test subset increases. Column "TSA" shows the average runtime of the sensitization
analysis for a single test vector-pair.
If no critical target path was identified, the following steps of the algorithm are
omitted. The update of the normal distribution approximation of the random vector Y
(see eq. (5.17)) using the proposed extension of the normal distribution based MAX-
operation is evaluated in the following two columns. The numerical error due to the
(extended) normal distribution based MAX-operation is defined as
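$$\begin{aligned}
\varepsilon &= \bigl(1 - P(X_1 \le T_{clk}, \dots, X_{|\mathcal{P}|} \le T_{clk})\bigr) - \bigl(1 - P(Y_1 \le T_{clk}, \dots, Y_{|\mathcal{T}|} \le T_{clk})\bigr) \\
&= P(Y_1 \le T_{clk}, \dots, Y_{|\mathcal{T}|} \le T_{clk}) - P(X_1 \le T_{clk}, \dots, X_{|\mathcal{P}|} \le T_{clk}).
\end{aligned} \tag{7.15}$$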
Column "|ε|" shows the average absolute value of ε, which is very small. This demonstrates
the superior accuracy of the proposed covariance function for the normal
distribution based MAX-operation. However, additional experiments with other statistical
timing analysis problems have shown that this error starts to increase rapidly as
soon as the distributions of the maxima show a significant degree of skewness.
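The relation between the individual errors can be made explicit by adding eqs. (7.14) and (7.15). The following short derivation is a sketch under the assumption that Ŷ is the numerically integrated value of 1 − P(Y_1 ≤ T_clk, …, Y_|T| ≤ T_clk); the symbol η for the remaining numerical integration error is introduced here for illustration only.

$$\begin{aligned}
\delta + \varepsilon &= X - \bigl(1 - P(Y_1 \le T_{clk}, \dots, Y_{|\mathcal{T}|} \le T_{clk})\bigr), \\
\epsilon = X - \hat{Y} &= \delta + \varepsilon + \eta, \qquad \eta = \bigl(1 - P(Y_1 \le T_{clk}, \dots, Y_{|\mathcal{T}|} \le T_{clk})\bigr) - \hat{Y}.
\end{aligned}$$

Under this reading, the total error of eq. (7.13) splits into the sensitization part, the part due to the normal distribution based MAX-operation, and the numerical integration error.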
The average runtime for updating the normal distribution approximation of the random
vector Y is shown in column "TDE". The runtime is very small and it increases very
slowly with the test subset size |T|.
Some test vector-pairs in the test subsets for p330k sensitize thousands of critical target
paths, with several critical paths being targeted by multiple test vector-pairs of the
same test subset. As an optimization, it is possible to ignore all recurrences of a critical
target path without affecting the accuracy of the target paths delay fault probability.
However, care must be taken if such a critical target path is deleted by a subsequent
removal of a test vector-pair or by the masking of an observable circuit output.
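One way to ignore recurrences while still handling later removals correctly is to keep a reference count per critical target path. The following minimal sketch illustrates the idea; the class `TargetPathSet` and its methods are hypothetical and do not describe the data structure actually used in the implementation.

```python
from collections import Counter


class TargetPathSet:
    """Distinct critical target paths with per-path reference counts."""

    def __init__(self):
        self._refs = Counter()

    def insert(self, paths):
        """Register the paths of one test vector-pair.

        Returns the paths that were not present before; only these need
        to be added to the probability computation.
        """
        new_paths = []
        for path in sorted(set(paths)):
            if self._refs[path] == 0:
                new_paths.append(path)
            self._refs[path] += 1
        return new_paths

    def remove(self, paths):
        """Unregister the paths of one test vector-pair.

        Returns the paths that are no longer targeted by any remaining
        test vector-pair and must therefore be deleted.
        """
        deleted = []
        for path in sorted(set(paths)):
            if self._refs[path] == 0:
                raise KeyError(f"path {path!r} was never inserted")
            self._refs[path] -= 1
            if self._refs[path] == 0:
                del self._refs[path]
                deleted.append(path)
        return deleted

    def distinct(self):
        return set(self._refs)


paths = TargetPathSet()
print(paths.insert(["a", "b"]))  # ['a', 'b']
print(paths.insert(["b", "c"]))  # ['c']
print(paths.remove(["a", "b"]))  # ['a']
```

In the example, path "b" survives the removal of the first test vector-pair because the second one still targets it.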
The target paths delay fault probability is computed using numerical integration
with the FORTRAN routines BVND, TVTL and MVNDST, which were developed by Genz
[Genz92, Genz04]. The average runtime of this computation is shown in column "TNI".
The runtime increases rapidly for small test subsets and slowly for larger test subsets
with |T| ≥ 10. The parameters for the numerical integration were chosen such that the
estimated absolute error does not exceed 5 · 10^-3 with 95% confidence and no more than
10^5 Monte Carlo iterations. These routines were also used to calculate the errors δ and ε
of the individual steps of the proposed algorithm.
The average absolute total error is shown in column "|ϵ|" after the insertion or the
removal of a test vector-pair. The error is small and mainly caused by the impact of
delay variations on the sensitization of the critical paths.
The very large speedup of the proposed algorithm is shown in column "Speedup". The
incremental algorithm is much faster than the non-incremental algorithm but may be
slightly less accurate. The speedup for the removal of a single test vector-pair is even
larger since only a rerun of the numerical integration is required. The update of other
delay test parameters can be estimated from the presented results of the necessary
individual steps of the algorithm. For example, the runtime of the algorithm after the
reduction of the clock cycle time or the modification of the masking/unmasking of
observable circuit outputs is roughly the sum of the runtimes given in columns "TDE"
and "TNI".
7.4.4 Application to Variation-Aware Pattern Selection
In this subsection, the accuracy of the proposed approximation algorithm is demon-
strated in the context of delay variation aware pattern selection for marginally detectable
small delay faults. The delay fault detection probability (definition 7.1) is used to find a
suitable compromise between the statistical delay test quality and the test subset size.
The results of the non-incremental algorithm in section 7.4.2 were stored in a database
and are now applied to select a suitable test subset T_i ∈ {T1, . . . , T4} for each small
delay fault. For a given small delay fault and the test subset T_i with 1 ≤ i ≤ 4, let X_i
denote the delay fault detection probability and let Ŷ_i denote the approximation of the
target paths delay fault probability by the non-incremental algorithm.
Clearly, if the probability of detecting a particular small delay fault is not significantly
increased by applying additional test vector-pairs, then these additional test vector-pairs
are not required for the detection of this particular small delay fault. The distinction
between a significant and a non-significant increase can be made based on a threshold
$b$. In this application example, a threshold value of $b = 0.1$ is chosen so that the
selection never reduces the delay fault detection probability by more than 0.1
(10 percentage points) compared to the largest test set.
First, a suitable test subset is selected for each small delay fault based on the
delay fault detection probability. In this example, the delay fault detection probability
$X_4$ of the largest test set $T_4$ with 20 test vector-pairs is compared to the delay fault
detection probabilities $X_1$, $X_2$ and $X_3$ obtained with the smaller test subsets $T_1$, $T_2$ and
$T_3$, respectively. In order to minimize the number of test vector-pairs while preserving
a sufficiently high delay fault detection probability, the smallest test set $T_i$ with a delay
fault detection probability $X_i$ satisfying
$$X_i \ge X_4 - b \tag{7.16}$$
is selected. Based on this criterion, a suitable test subset is selected for each small delay
fault in a benchmark circuit. The results are shown in fig. 7.9. It can be seen that for
more than 50% of all delay faults, the test subset $T_2$ provides a sufficiently high delay fault
detection probability with only 5 test vector-pairs.
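The selection rule of eq. (7.16) can be sketched in a few lines. The following is a minimal illustration, not the thesis implementation: the function name `select_test_subset` and the example probabilities are hypothetical, and the probabilities are assumed to be ordered by increasing test subset size.

```python
def select_test_subset(detection_probs, threshold=0.1):
    """Return the index of the smallest subset that satisfies criterion (7.16).

    detection_probs: probabilities X_1..X_k ordered by increasing subset size;
    the last entry plays the role of X_4 (largest test set).
    """
    reference = detection_probs[-1]
    for index, prob in enumerate(detection_probs):
        if prob >= reference - threshold:
            return index
    # Not reached in practice: the last entry always satisfies the criterion.
    return len(detection_probs) - 1


subset_sizes = [1, 5, 10, 20]
# Hypothetical detection probabilities for one small delay fault (not measured data).
example_probs = [0.42, 0.83, 0.88, 0.90]
chosen = select_test_subset(example_probs, threshold=0.1)
print("selected subset size:", subset_sizes[chosen])
```

Because the largest test set always satisfies its own criterion, the loop is guaranteed to return; a smaller threshold simply pushes the selection towards larger subsets.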
[Figure 7.9: bar chart. Abscissa: benchmark circuits p35k, p45k, p77k, p78k, p81k, p100k,
p267k, p330k. Ordinate: number of small delay faults [%]. Series: 1, 5, 10 and 20 vector-pairs.]
Figure 7.9: Average number of test vector-pairs required for the detection of a
small delay fault of fixed size under the impact of delay variations
Afterwards, the whole process is repeated using the approximation $\hat{Y}_i$ of the target
paths delay fault probability $Y_i$ obtained by the non-incremental algorithm, instead
of the delay fault detection probability $X_i$ with $1 \le i \le 4$. The results are compared
in fig. 7.10. The difference in the test subset size $|T_X| - |T_{\hat{Y}}|$ is shown on the abscissa,
where $T_X$ and $T_{\hat{Y}}$ denote the test subset chosen based on the delay fault detection
probability ($X$) and the approximation of the target paths delay fault probability ($\hat{Y}$),
respectively. The ordinate shows the relative frequency of the test subset size difference
over all small delay faults in all benchmark circuits.
For a small minority of small delay faults, using the approximation causes a slight
underestimation or overestimation of the test subset size, as shown in fig. 7.10. It can be
seen that the test subset selection results match for more than 81% of the small delay
faults. In only a few cases, a slightly smaller or slightly larger test subset is selected when
the selection is based on the approximation of the target paths delay fault probability
$\hat{Y}$. The distribution of this error is in line with expectations given by the box plot in
fig. 7.6b and the explanation that this error is mainly caused by the impact of delay
variations on the sensitization of the critical paths.
[Figure 7.10: bar chart. Abscissa: test subset size difference, ticks -19, -15, -10, -9, -5, -4,
0, 4, 5, 9, 10, 15, 19. Ordinate: number of small delay faults [%]. Bar labels [%]: 0.0, 0.5,
1.0, 0.1, 6.0, 1.0, 81.4, 2.2, 5.6, 0.4, 1.1, 0.4, 0.2.]
Figure 7.10: Error in test subset size when using target paths delay fault probability
approximation instead of delay fault detection probability
7.5 SUM and MAX-Operations based on Skew-Normal Distribution
This section presents the experimental results for the proposed skew-normal distribution
based MAX-operation. The skew-normal distribution based SUM-operation does not
require experimental evaluation, as its high efficiency is apparent from theorem 6.3.
Let $X = (X_1, \dots, X_n)^T \sim \mathcal{N}_n(\mu, \Sigma)$ denote an $n$-dimensional normal random vector.
As explained in section 6.6, the skew-normal distribution based MAX-operation is
repeatedly applied to approximate the distribution of $\max(X_1, \dots, X_n)$ by a random
variable $Z$ with univariate skew-normal distribution. The approximation error $\epsilon$ is
defined by the Kolmogorov-Smirnov statistic, which is the maximum absolute difference
between the cumulative distribution functions of $Z$ and $\max(X_1, \dots, X_n)$ at any point
$t \in \mathbb{R}$, that is
$$\epsilon = \max_{t \in \mathbb{R}} \left| P(\max(X_1, \dots, X_n) \le t) - P(Z \le t) \right|. \tag{7.17}$$
For every considered normal distribution of $(X_1, \dots, X_n)^T$, the accurate distribution of
$\max(X_1, \dots, X_n)$ was obtained by a Monte Carlo simulation of $10^9$ iterations. The large
number of iterations is sufficient to compute the probability $P(\max(X_1, \dots, X_n) \le t)$
for any $t \in \mathbb{R}$ with a maximum absolute error below $3.16228 \times 10^{-5}$ with 95% confidence,
which is typically much more accurate than computing the same probability with
$P(Z \le t)$ when $Z$ is obtained by the repeated application of the skew-normal distribution
based MAX-operation. The results are compared to those obtained by the
repeated application of the normal distribution based MAX-operation using Clark's
algorithm, which approximates the distribution of $\max(X_1, \dots, X_n)$ by a univariate
normal distribution. The relative absolute error of the skew-normal distribution based
MAX-operation is defined as
$$\frac{|\epsilon_{sn}| - |\epsilon_n|}{|\epsilon_n|}, \tag{7.18}$$
where $\epsilon_n$ and $\epsilon_{sn}$ denote the error of the repeated application of the normal and
skew-normal distribution based MAX-operation, as defined in eq. (7.17), respectively.
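The error measure of eq. (7.17) can be illustrated for the simplest case of two variables. The sketch below is not the thesis code: it uses the well-known moment matching formulas of Clark for the maximum of a bivariate normal vector, a far smaller Monte Carlo sample than the $10^9$ iterations used in the experiments, and hypothetical parameter values. All function names are chosen for illustration only.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution: STD.pdf and STD.cdf


def clark_max(mu1, var1, mu2, var2, cov):
    """Mean and variance of max(X1, X2) for a bivariate normal (Clark's moments)."""
    a = math.sqrt(var1 + var2 - 2.0 * cov)
    alpha = (mu1 - mu2) / a
    m1 = mu1 * STD.cdf(alpha) + mu2 * STD.cdf(-alpha) + a * STD.pdf(alpha)
    m2 = ((mu1 ** 2 + var1) * STD.cdf(alpha) + (mu2 ** 2 + var2) * STD.cdf(-alpha)
          + (mu1 + mu2) * a * STD.pdf(alpha))
    return m1, m2 - m1 ** 2


def ks_error(samples, approx_cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and approx_cdf."""
    ordered = sorted(samples)
    n = len(ordered)
    worst = 0.0
    for rank, t in enumerate(ordered):
        model = approx_cdf(t)
        # The empirical CDF jumps from rank/n to (rank + 1)/n at t.
        worst = max(worst, abs(model - rank / n), abs(model - (rank + 1) / n))
    return worst


def sample_max(mu1, var1, mu2, var2, cov, count, rng):
    """Monte Carlo samples of max(X1, X2) for a bivariate normal (X1, X2)."""
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    rho = cov / (s1 * s2)
    out = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = mu1 + s1 * z1
        x2 = mu2 + s2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        out.append(max(x1, x2))
    return out


rng = random.Random(1)
params = (1.0, 1.0, 1.0, 1.0, 0.0)  # hypothetical: equal means, unit variances, uncorrelated
mean, var = clark_max(*params)
approx = NormalDist(mean, math.sqrt(var))
eps_n = ks_error(sample_max(*params, 50_000, rng), approx.cdf)
print(f"Clark mean={mean:.4f}, variance={var:.4f}, KS error={eps_n:.4f}")
```

The error $\epsilon_{sn}$ would be obtained in the same way by passing the CDF of the fitted skew-normal distribution to `ks_error`; the relative error of eq. (7.18) then follows from the two values.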
The details of the experimental results for different distributions of $(X_1, \dots, X_n)^T$ are
presented in the following subsections. The experimental results demonstrate the
superior accuracy of the proposed skew-normal distribution based MAX-operation.
Although the worst case runtime complexity is quadratic in the dimension of the
random vector $X$, the runtime of the proposed algorithm is very small even for several
hundreds of random variables.
7.5.1 Results for $\max(X_1, \dots, X_n)$ with Random $\mu$ and $\Sigma$
Let $X = (X_1, \dots, X_n)^T \sim \mathcal{N}_n(\mu, \Sigma)$ be a random vector of multivariate normal
distribution with mean vector
$$\mu = (\mu_1, \dots, \mu_n)^T \tag{7.19}$$
and covariance matrix
$$\Sigma = D R D, \tag{7.20}$$
where $R \in \mathbb{R}^{n \times n}$ is a random correlation matrix and $D \in \mathbb{R}^{n \times n}$ is a random diagonal
matrix with the (positive) standard deviations $\sigma_1, \dots, \sigma_n$ of $X_1, \dots, X_n$ on the main
diagonal. For all $i \in \mathbb{N}$ with $1 \le i \le n$, the mean $\mu_i$ and standard deviation $\sigma_i$ of $X_i$
were independently generated from a uniform distribution such that $\mu_i \in [0.9, 1.1]$
and $\sigma_i \in [0.8, 1.2]$. The proposed MAX-operation has been evaluated for 8 dimensions
$n_1 = 4$, $n_2 = 8$, $n_3 = 16$, $n_4 = 32$, $n_5 = 64$, $n_6 = 128$, $n_7 = 256$ and $n_8 = 512$ of the
random vector $X$. For each dimension, 3000 random multivariate normal distributions
with large variations of the correlation coefficients in $R$ and 1000 random multivariate
normal distributions with small variations of the correlation coefficients in $R$ have been
considered.
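The construction of eqs. (7.19) and (7.20) can be sketched as follows. The text does not state how the random correlation matrix $R$ was generated, so the Gram-matrix construction below is an assumption made purely for illustration, as are the function names.

```python
import math
import random


def random_correlation_matrix(n, rng, factors=None):
    """Random correlation matrix from a Gram matrix A A^T rescaled to unit diagonal.

    Assumption: this generator is one common choice, not the one used in the thesis.
    """
    k = factors or n
    a = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    gram = [[sum(a[i][c] * a[j][c] for c in range(k)) for j in range(n)]
            for i in range(n)]
    return [[gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(n)]
            for i in range(n)]


def random_normal_parameters(n, rng):
    """Draw mu, the standard deviations and Sigma = D R D as described in the text."""
    mu = [rng.uniform(0.9, 1.1) for _ in range(n)]
    sigma = [rng.uniform(0.8, 1.2) for _ in range(n)]
    corr = random_correlation_matrix(n, rng)
    # D R D with D = diag(sigma): entry (i, j) is sigma_i * R_ij * sigma_j.
    cov = [[sigma[i] * corr[i][j] * sigma[j] for j in range(n)] for i in range(n)]
    return mu, sigma, cov


mu, sigma, cov = random_normal_parameters(4, random.Random(7))
for row in cov:
    print(["%.3f" % value for value in row])
```

Using fewer factors than dimensions (`factors < n`) yields more strongly correlated variables, which is one hypothetical way to vary the spread of the correlation coefficients.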
A log-log plot of the runtime of the normal distribution and skew-normal distribution
based MAX-operations is presented in fig. 7.11. It can be seen that the runtime of
the proposed algorithm for the skew-normal distribution based MAX-operation is
comparable to Clark's algorithm for the normal distribution based MAX-operation
when $n$ is small. The runtime of the proposed algorithm increases faster as
$n$ becomes larger, but it remains very small even for several hundreds of random
variables.
[Figure 7.11: log-log line plot. Abscissa: dimension n = 4, 8, 16, 32, 64, 128, 256, 512.
Ordinate: runtime [ms], ticks from 0.008 to 32. Series: Normal-MAX, Skew-Normal-MAX.]
Figure 7.11: Log-log plot of the runtime for the approximation of $\max(X_1, \dots, X_n)$,
where $(X_1, \dots, X_n) \sim \mathcal{N}_n(\mu, \Sigma)$ with random $\mu$ and $\Sigma$ and large correlation coefficient
variations
The average absolute error of the normal and skew-normal distribution based
MAX-operation is presented in fig. 7.12 in percent [%] for different dimensions of $X$ and
large variations in the correlation coefficients in $R$. It can be seen that on average, the
skew-normal distribution based MAX-operation reduces the approximation error by
up to 60%. Furthermore, the box plot in fig. 7.12c shows a significant reduction of the
maximum error when the average correlation between any two random variables is
between 0.33 and 0.66. The variability of the correlation coefficients decreases with the
dimension of $X$, which causes the average absolute error to decrease with $n$.
The detailed numerical results are shown in table B.5a for small and in table B.5b for
large variations of the correlation coefficients in $R$. The average correlation between
any two random variables is given in column "$\bar{\rho}$". The columns "$|\epsilon_n|$" and "$|\epsilon_{sn}|$"
show the average absolute error of the normal distribution based MAX-operation and
the proposed skew-normal distribution based MAX-operation, respectively. The next
column "$\frac{|\epsilon_{sn}| - |\epsilon_n|}{|\epsilon_n|}$" presents the relative absolute error of the skew-normal distribution
based MAX-operation, compared to the normal distribution based MAX-operation. On
average, the skew-normal distribution based MAX-operation reduces the approximation
error by up to 90% for small and by up to 61% for large variations in the correlation
coefficients in $R$. This demonstrates the superior accuracy of the proposed skew-normal
distribution based MAX-operation. The last two columns "$T_n$" and "$T_{sn}$" show the
runtime for the repeated application of the normal distribution based MAX-operation
and the skew-normal distribution based MAX-operation, respectively.
7.5.2 Results for $\max(X_1, \dots, X_n)$ for Critical Target Path Delays $X_1, \dots, X_n$
The proposed skew-normal distribution based MAX-operation was applied to compute
the distribution of the maximum delay of the target paths, as defined in definition 1.4.
Similar to the experimental setup for the evaluation of the target paths delay fault
probability computation in section 7.4.1, different test subsets with 1, 5, 10 and 20 test
vector-pairs were considered for each small delay fault in a benchmark circuit. For
each test subset, the set of critical target paths was determined using the sensitization
analysis in section 5.1. The delays of $n$ critical target paths $X_1, \dots, X_n$ were grouped
into a random vector
$$(X_1, \dots, X_n)^T \sim \mathcal{N}_n(\mu, \Sigma) \tag{7.21}$$
of $n$-dimensional normal distribution. Afterwards, the distribution of the maximum
delay of the target paths $\max(X_1, \dots, X_n)$ was computed by the repeated application of
the normal and skew-normal distribution based MAX-operation, respectively. For each
benchmark circuit, 5000 critical target paths delay distributions of different dimensions
have been considered.
[Figure 7.12: three panels over the dimensions n = 4, 8, 16, 32, 64, 128, 256, 512.
(a) Average $|\epsilon|$ [%] using normal distribution based MAX-operation.
(b) Average $|\epsilon|$ [%] using skew-normal distribution based MAX-operation.
(c) Box plot of absolute error $|\epsilon|$ [%] for $0.33 \le \bar{\rho} < 0.66$; series: Normal-MAX, Skew-Normal-MAX.]
Figure 7.12: Error $\epsilon$ of $\max(X_1, \dots, X_n)$ approximation, where $(X_1, \dots, X_n) \sim \mathcal{N}_n(\mu, \Sigma)$
with random $\mu$ and $\Sigma$ and large correlation coefficient variations
As expected, the runtime for the normal and skew-normal distribution based
MAX-operation was almost exactly the same as the runtime observed for random multivariate
normal distributions in fig. 7.11. Even the maximum of several hundreds of path delays,
which requires several hundreds of applications of the skew-normal distribution based
MAX-operation, can be computed in a couple of milliseconds.
Hence, the proposed skew-normal distribution based MAX-operation is well suited for
block-based statistical timing analysis.
The average absolute error (eq. (7.17)) of the normal and skew-normal distribution
based MAX-operation is presented in fig. 7.13 in percent [%] for different numbers
of critical target paths. It can be seen that on average, the skew-normal distribution
based MAX-operation reduces the approximation error by up to 50%. In particular, the
maximum absolute error is significantly reduced, as shown in the box plot in fig. 7.13c
for between 64 and 127 critical target path delays. This error will be further reduced in
section 7.5.3.
[Figure 7.13: three panels over the benchmark circuits p35k, p45k, p77k, p78k, p81k, p100k,
p267k, p330k.
(a) Average $|\epsilon|$ [%] using normal distribution based MAX-operation; series: 4-15, 16-31, 32-63
and 64-127 paths.
(b) Average $|\epsilon|$ [%] using skew-normal distribution based MAX-operation; same series.
(c) Box plot of absolute error $|\epsilon|$ [%] for $64 \le n \le 127$; series: Normal-MAX, Skew-Normal-MAX.]
Figure 7.13: Absolute error $|\epsilon|$ of the approximation of the maximum
$\max(X_1, \dots, X_n)$ of critical target path delays $X_1, \dots, X_n$
The detailed numerical results are shown in table B.6. The columns "$|\epsilon_n|$" and "$|\epsilon_{sn}|$"
show the average absolute error of the normal and skew-normal distribution based
MAX-operation, respectively. The next column "$\frac{|\epsilon_{sn}| - |\epsilon_n|}{|\epsilon_n|}$" shows the relative absolute
error of the proposed skew-normal distribution based MAX-operation, which is defined
by eq. (7.18). The results show that the proposed skew-normal distribution based
MAX-operation reduces the average absolute error by up to 50%. The last two columns "$T_n$"
and "$T_{sn}$" show the runtime for the repeated application of the normal and skew-normal
distribution based MAX-operation, respectively. For very large dimensions $n > 1000$,
the runtime is dominated by the initial Cholesky factorization of the covariance matrix
$\Sigma$, which has $O(n^3)$ worst case runtime complexity.
7.5.3 Analysis and Further Reduction of Approximation Error
The small error of the skew-normal distribution based MAX-operation was found to
have three main causes. First, the exact distribution of $\max(X_1, \dots, X_n)$
might not be accurately approximated by any skew-normal distribution. This happens,
for example, if the random variables have the same mean but very different standard
deviations. However, such extreme cases rarely occur in statistical timing analysis.
The second main reason for the approximation error lies in the efficient design of
the proposed algorithm. The proposed algorithm computes the shape vector $\hat{\lambda}$ from
the dominant eigenvalue $y$ and corresponding eigenvector of the matrix $\tilde{H}$, which
is defined in eq. (6.136). If the dominant eigenvalue is not at least 10 times larger
than the next smaller eigenvalue, then the estimation of $\hat{\lambda}$ may be inaccurate. It was
frequently observed that the dominant eigenvalue was of the same magnitude as the
next smaller eigenvalue, in particular during the experiments in section 7.5.2.
The third main reason is that the shape parameter $\hat{\lambda}$ of a skew-normal distribution
cannot be chosen independently from the covariance matrix $\hat{\Sigma}$ because
$\hat{\lambda}^T \hat{\Sigma}^{-1} \hat{\lambda} < \frac{2}{\pi - 2}$
must hold according to definition 6.2. This condition is satisfied if and only if the
dominant eigenvalue $y$ satisfies eq. (6.87). The error caused by the third main reason
was particularly pronounced for the computation of the maximum delay of the target
paths in section 7.5.2 such that $E(\max(X_1, \dots, X_n))$ was significantly overestimated.
In order to improve the accuracy of the results, it is possible to ignore this constraint
because the proposed algorithm is also applicable if only the subvector $(X_{n-1}, X_n)$
of the last two random variables of $X$ has a bivariate skew-normal distribution. In
additional experiments it was found that this relaxation can significantly reduce the
approximation error, but the variance $\mathrm{Var}(\max(X_{n-1}, X_n))$ in the bottom right corner
of $\hat{\Sigma}$ may become so small that the covariance matrix $\hat{\Sigma}$ is no longer positive definite.
To avoid this, the idea is to multiply the entire covariance matrix $\Sigma$ by a positive scalar
$x \in \mathbb{R}^+$ slightly smaller than one (e.g. $x = 0.98$) every time the dominant eigenvalue $y$
is significantly reduced (e.g. by more than 20%) in order to satisfy eq. (6.136). If $\hat{\Sigma}$ is
positive definite then theorem 2.37 with $A$ being a diagonal matrix with all diagonal
elements equal to $\sqrt{x}$ implies that $A \hat{\Sigma} A^T$ and therefore $x \hat{\Sigma}$ is positive definite. After
scaling the covariance matrix, the inverse Cholesky factor $\hat{L}^{-1}$ can be efficiently updated
by noticing that $x \hat{\Sigma} = (\sqrt{x} \hat{L})(\sqrt{x} \hat{L})^T$ and $(\sqrt{x} \hat{L})^{-1} = x^{-1/2} \hat{L}^{-1}$.
If applied carefully, the above described scaling of the covariance matrix is a simple
but effective way to consider the strong non-linear relationships between the random
variables in $Y$, which cannot be accurately represented by a multivariate skew-normal
distribution. However, the scaling inevitably decreases the variance of the approximation
of $\max(X_1, \dots, X_n)$, which must be compensated during the last application of
the skew-normal distribution based MAX-operation. This compensation is achieved
by multiplying $\hat{\Sigma}$ with $x^{-k}$, where $k$ denotes the number of applications of the
skew-normal distribution based MAX-operation during which the dominant eigenvalue $y$
was significantly reduced to satisfy eq. (6.87). By using the proposed covariance matrix
scaling, the average absolute error $|\epsilon|$ is reduced by up to 70%, as shown in fig. 7.14.
[Figure 7.14: two panels over the benchmark circuits p35k, p45k, p77k, p78k, p81k, p100k,
p267k, p330k.
(a) Average $|\epsilon|$ [%] using skew-normal distribution based MAX-operation; series: 4-15, 16-31,
32-63 and 64-127 paths.
(b) Box plot of absolute error $|\epsilon|$ [%] for $64 \le n \le 127$; series: Normal-MAX, Skew-Normal-MAX.]
Figure 7.14: Absolute error $|\epsilon|$ of the approximation of $\max(X_1, \dots, X_n)$ using
covariance matrix scaling, where $X_1, \dots, X_n$ are critical target path delays
The detailed numerical results are presented in table B.7. The results show that the
average absolute error $|\epsilon|$ can be reduced by up to 70% by the proposed skew-normal
distribution based MAX-operation with covariance matrix scaling, compared to the
normal distribution based MAX-operation. For example, the average absolute error $|\epsilon|$
of computing the distribution of the maximum of between 256 and 512 critical path
delays is reduced from 9% to only 2.7% for benchmark circuit p77k.
Compared to the results in table B.6 obtained without covariance matrix scaling,
the average absolute error $|\epsilon|$ is consistently reduced. The scaling of the
covariance matrix and the inverse Cholesky factor slightly increases the runtime of the
skew-normal distribution based MAX-operation, but the increase is very
small because the covariance matrix scaling is only performed for very few applications
of the skew-normal distribution based MAX-operation.
7.5.4 Empirical Runtime Complexity of Algorithm for MAX-Operation
This subsection investigates the runtime of the proposed algorithm for the skew-normal
distribution based MAX-operation for very large dimensions of the skew-normal
random vector $X = (X_1, \dots, X_n)^T \sim \mathcal{SN}_n(\mu, \Sigma, \lambda)$. The goal is to verify that
the asymptotic behaviour of the implementation matches the theoretical worst-case
runtime complexity of the algorithm for the skew-normal distribution based
MAX-operation. For this, the skew-normal distribution based MAX-operation was applied to
approximate the distribution of the random vector
$$(X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T \tag{7.22}$$
with an $(n-1)$-dimensional skew-normal distribution. Except for the very first
approximation step, the inverse Cholesky factor $L^{-1}$ is obtained by updating the inverse
Cholesky factor of the previous application of the skew-normal distribution based
MAX-operation. To represent this typical scenario, it is assumed that the inverse
Cholesky factor $L^{-1}$ is available, and hence, the runtime of the initial computation of
$L^{-1}$ with $L L^T = \Sigma$ was ignored. As a consequence, the runtime appears to be lower
than in fig. 7.11.
The runtime of the algorithm in section 6.5.3 is measured for very large dimensions
$$n \in \{2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000\}.$$
Even the smallest size ($n = 2000$) is large enough for the high order terms to
dominate all other terms. The result of these measurements consists of 8 data points
$(n_i, t_i)$ for $1 \le i \le 8$, as shown in fig. 7.15. In order to predict the runtime $t(n)$ with the
power-law function
$$t(n) = a n^b, \tag{7.23}$$
linear regression is applied to the set of data points $(\log(n_i), \log(t_i))$. The resulting
coefficients are $a \approx 1.20176 \times 10^{-6}$ and $b \approx 2$, which confirms the theoretical quadratic
worst case runtime complexity $O(n^2)$ of the proposed algorithm for the skew-normal
distribution based MAX-operation.
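The log-log regression behind eq. (7.23) can be sketched as follows. The measured runtimes are not reproduced in the text, so the example uses synthetic runtimes generated from an exact quadratic law; these values and the function name `fit_power_law` are assumptions for illustration only.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = exponent of the power law
    a = math.exp(mean_y - b * mean_x)  # intercept = log of the prefactor
    return a, b


sizes = [2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000]
# Synthetic runtimes in ms from an exact quadratic law (not the measured data).
times = [1.2e-6 * n ** 2 for n in sizes]
a, b = fit_power_law(sizes, times)
print(f"a = {a:.5e}, b = {b:.5f}")
```

With real measurements the fitted exponent deviates slightly from 2, as the fitted curve in fig. 7.15 indicates; an exponent close to 2 supports the quadratic complexity.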
[Figure 7.15: log-log plot. Abscissa: dimension n of multivariate skew-normal distribution,
2000 to 16 000. Ordinate: runtime [ms], ticks from 4 to 256. Fitted curve:
$1.20176 \times 10^{-6}\, n^{2.00058}$.]
Figure 7.15: Log-log plot of the runtime of the proposed algorithm for the skew-normal
distribution based MAX-operation for a huge number of random variables
Chapter 8
Conclusions
Tremendous advances in semiconductor process technology have created new delay
test challenges for digital integrated circuits. The complexity of state-of-the-art
manufacturing processes does not only exacerbate the problem of process variability, it also
makes today’s integrated circuits more prone to defects such as resistive shorts and
opens. The resulting large delay variations severely degrade the quality and reliability
of all delay tests. A delay test might detect a delay fault of fixed size in only a subset of
all manufactured circuits, which can potentially result in a large number of test escapes.
Statistical timing analysis is an integral component of any variation-aware delay test
generation method and is required to analyse and predict the effectiveness of delay
tests in a population of circuits which are functionally identical but have varying timing
properties. Efficient statistical timing analysis of large circuits is a well known hard
problem. Under the impact of delay variations, a path is sensitized by a particular test
vector-pair only with some probability. Furthermore, the event that any of the sensitized
paths has a path delay fault is also described by a probability. Recent variation-aware
delay test generation methods must therefore be guided by the probability that at least
one of the sensitized paths has a path delay fault. One of the most challenging problems
of statistical timing analysis is the efficient computation of the statistical MAX-operation.
The normal distribution based MAX-operation [Clark61] approximates the skewed
distribution of the result with a normal distribution, leading to large approximation
errors.
8.1 Contributions of this Work
This work targets key statistical timing analysis problems, which arise in delay test
applications for innovative technology nodes. Novel and efficient statistical timing
analysis algorithms for path and small delay fault testing applications have been
presented. In addition, accurate skew-normal distribution based SUM and MAX-
operations are proposed, which provide the foundation for the efficient statistical delay
fault simulation.
Probabilistic sensitization analysis is proposed to guide the delay test generation process
into generating path delay fault tests, which are more tolerant towards delay variations.
The analysis not only provides the location of the gate which blocks the propagation of
the transition along the target path, but it also identifies those paths that are responsible
for the invalidation of the delay test. Further important quality characteristics of the
given test vector-pair can be efficiently computed by a Monte-Carlo simulation of a
small subcircuit, which is constructed by the proposed analysis.
For the detection of small delay faults, two efficient algorithms for the computation of
the target paths delay fault probability are proposed. The non-incremental algorithm
provides high accuracy but may become inefficient if the delay test parameters are
frequently modified. To minimize the computational cost after delay test parameter
modifications, an efficient incremental algorithm has also been presented, which is
suitable for the inner loop of automatic test pattern generation methods. Compared to
extensive Monte-Carlo simulations, the experimental results show a very large speedup
with only a small approximation error, which is mainly due to the impact of delay
variations on the sensitization of the critical paths.
To minimize the error of block-based statistical timing analysis, the more accurate skew-
normal distribution based SUM and MAX-operations are introduced. Compared to the
normal distribution based MAX-operation [Clark61], the proposed MAX-operation is
defined on the far more flexible skew-normal distribution, which allows the accurate
approximation of the result with another skew-normal distribution. Although the
worst case runtime complexity of the proposed algorithm is quadratic in the number of
random variables, the runtime remains very small even for several hundreds of random
variables. The superior accuracy and low runtime make the skew-normal distribution
based MAX-operation ideal for block-based statistical timing analysis and many other
statistical timing analysis problems.
The experimental results demonstrate the high efficiency of the proposed algorithms.
8.2 Ongoing Research and Future Work
While the probabilistic sensitization analysis presented in chapter 4 can easily consider
crosstalk and power droop effects, a non-trivial extension [Tang14b] would be required
in chapter 5 to adapt the joint delay distributions of the critical target paths to consider
these effects during the computation of the target paths delay fault probability.
In general, gate level models are designed for high speed simulation and provide only
limited accuracy. If only small process variations need to be considered during statistical
timing analysis, then the gate delays can usually be accurately described by a normal
distribution. However, with increasing process variability, the circuit timing behaviour
is largely determined by the impact of these variations on the electrical properties of
transistors and interconnects, which leads to skewed gate delay distributions and the
complex nonlinear relationships between the gate delays become more pronounced
[Zhang05]. In order to get meaningful results from statistical timing analysis, more
realistic gate and interconnect models at transistor level must be applied in an efficient
manner. In particular, these models must consider the complex relationships between
the electrical parameter variation of gates and interconnects, based on a given circuit
layout. To efficiently solve the statistical timing analysis problems, the techniques and
algorithms developed in this thesis must be adapted to large and complex circuit
models at transistor level.
Modern VLSI circuits are prone to defects that affect the electrical properties of logic
gates and interconnects. With continued shrinking of semiconductor feature size,
ever more new defect mechanisms are discovered, which require accurate analysis at
transistor level [Hapke11, Tang14c].
In [Tang14c], the authors perform mutation analysis to study the impact of various types
of defects on the electrical properties of standard cells, such as INV, NAND, NOR and
XOR. The authors evaluate these defects by generating mutants considering various
mutations such as incorrect connections of polarity devices, splitting high degree
nodes into two and keeping a transistor conducting/non-conducting. Afterwards, the
electrical properties of these mutants were analysed using SPICE simulations. The
experimental results show that a large fraction of defects in CMOS gates manifest
themselves as stuck-at-unknown (X). If Udd denotes the supply voltage, then stuck-at-
unknown means that the gate output voltage remains close to 0.5Udd and does not
stabilize to either logic one or logic zero. This is for example caused by a bridging fault
between the input and the output of an inverter. Clearly, this fault behaviour can only
be accurately represented at electrical level.
Another important future work is the application of the proposed techniques and
algorithms as part of a delay test generation process. This development will lead to
efficient delay test generation methods, which can reduce the test cost while improving
the delay test quality under the impact of large process variations.

Appendix A
Mathematical Details
A.1 Moments involving the Maximum $\max(X_{n-1}, X_n)$
This section defines all moments which are required to compute the mean vector, the
covariance matrix and the third multivariate cumulant of the random vector
$$Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T, \tag{A.1}$$
where $X = (X_1, \dots, X_n)^T$ is an $n$-dimensional skew-normal random vector. For
compactness of notation, let $b := \sqrt{2/\pi}$.
The formulas for computing the moments that involve the maximum $\max(X_{n-1}, X_n)$
have been obtained from a probabilistic representation of the multivariate skew-normal
distribution based on the multivariate normal distribution. The following proposition
is adopted from [Azzal99, Proposition 1]:
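Proposition A.1. Suppose that
$$\begin{pmatrix} Z_0 \\ Z \end{pmatrix} \sim \mathcal{N}_{n+1}(0, \Omega^{*}), \qquad
\Omega^{*} = \begin{pmatrix} 1 & \delta^T \\ \delta & \bar{\Omega} \end{pmatrix} \tag{A.2}$$
where $Z_0$ is a scalar component and $\Omega^{*}$ is a correlation matrix. Then
$$X = \begin{cases} +Z & \text{if } Z_0 > 0 \\ -Z & \text{if } Z_0 \le 0, \end{cases} \tag{A.3}$$
is an $n$-dimensional skew-normal random vector with parameters $\xi = 0$, $\Omega = \bar{\Omega}$ and
$$\alpha = \frac{\bar{\Omega}^{-1} \delta}{\sqrt{1 - \delta^T \bar{\Omega}^{-1} \delta}}. \tag{A.4}$$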
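The representation in proposition A.1 also gives a simple way to draw skew-normal samples. The following sketch covers the one-dimensional case only and checks the sample mean against the known mean $\sqrt{2/\pi}\,\delta$ of a standard skew-normal variable. The function name and the value of $\delta$ are chosen for illustration.

```python
import math
import random


def sample_skew_normal(delta, count, rng):
    """Samples of X from representation (A.3) with n = 1 and correlation delta."""
    out = []
    for _ in range(count):
        z0 = rng.gauss(0, 1)
        # Z has unit variance and correlation delta with Z0.
        z = delta * z0 + math.sqrt(1 - delta ** 2) * rng.gauss(0, 1)
        out.append(z if z0 > 0 else -z)
    return out


delta = 0.8                                      # hypothetical value with |delta| < 1
alpha = delta / math.sqrt(1 - delta ** 2)        # eq. (A.4) for n = 1
samples = sample_skew_normal(delta, 200_000, random.Random(3))
sample_mean = sum(samples) / len(samples)
expected_mean = math.sqrt(2 / math.pi) * delta   # mean of a standard skew-normal variable
print(f"alpha = {alpha:.4f}, sample mean = {sample_mean:.4f}, expected = {expected_mean:.4f}")
```

The sign flip conditional on $Z_0$ is what introduces the skewness: for $\delta > 0$ the variable $X$ is positively correlated with $|Z_0|$, which shifts probability mass to the right.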
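Let $(X_i, X_j, X_k, X_l)^T$ denote a 4-dimensional subvector of the random vector $X$ with
Azzalini-parameterization $\xi = (\xi_i, \xi_j, \xi_k, \xi_l)^T$, $\delta = (\delta_i, \delta_j, \delta_k, \delta_l)^T$ and
$$\Omega = \begin{pmatrix}
\omega_i^2 & \omega_{i,j} & \omega_{i,k} & \omega_{i,l} \\
\omega_{i,j} & \omega_j^2 & \omega_{j,k} & \omega_{j,l} \\
\omega_{i,k} & \omega_{j,k} & \omega_k^2 & \omega_{k,l} \\
\omega_{i,l} & \omega_{j,l} & \omega_{k,l} & \omega_l^2
\end{pmatrix}.$$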
The moments $E(\max(X_i, X_j)^m)$ for $m = 1, 2, \dots$ can be computed as follows. Similar
to the approach in section 5.4, eq. (A.3) and the definition of $\max(x_i, x_j)$ over the real
numbers induce a partition of the sample space $Q$:
$$\begin{aligned}
A_1 &= \{q \in Q : Z_0(q) > 0,\ Z_i(q) > Z_j(q)\} \\
A_2 &= \{q \in Q : Z_0(q) > 0,\ Z_i(q) \le Z_j(q)\} \\
A_3 &= \{q \in Q : Z_0(q) \le 0,\ Z_i(q) > Z_j(q)\} \\
A_4 &= \{q \in Q : Z_0(q) \le 0,\ Z_i(q) \le Z_j(q)\},
\end{aligned}$$
where $Z_0$ and $Z = (Z_1, \dots, Z_n)^T$ are as defined in proposition A.1. Using eq. (2.1),
$E(\max(X_i, X_j)^m)$ can formally be written as
$$\begin{aligned}
E(\max(X_i, X_j)^m) ={}& E(X_i(q)^m \mathbf{1}_{A_1}(q)) + E(X_j(q)^m \mathbf{1}_{A_2}(q)) \\
&+ E((-X_j(q))^m \mathbf{1}_{A_3}(q)) + E((-X_i(q))^m \mathbf{1}_{A_4}(q)),
\end{aligned} \tag{A.5}$$
where $Z_i$ and $Z_j$ have been replaced by $X_i$ and $X_j$, respectively, according to eq. (A.3).
Then theorem 2.43 implies that
$$\begin{aligned}
E(\max(X_i, X_j)^m) ={}& E(X_i^m \mid A_1)\, P(A_1) + E(X_j^m \mid A_2)\, P(A_2) \\
&+ E((-X_j)^m \mid A_3)\, P(A_3) + E((-X_i)^m \mid A_4)\, P(A_4),
\end{aligned} \tag{A.6}$$
where $q$ has been omitted for compactness of notation. The formulas for the computation
of the above moments have been presented in [Talli61] and [Arism13]. After
extensive simplifications and using the following definitions
$$a = \sqrt{\omega_i^2 - 2\omega_{i,j} + \omega_j^2} \tag{A.7}$$
$$s_1 = (\xi_i - \xi_j)/a \tag{A.8}$$
$$s_2 = (\delta_i \omega_i - \delta_j \omega_j)/a \tag{A.9}$$
$$s_3 = s_1 / \sqrt{1 - s_2^2}, \tag{A.10}$$
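the first three moments of $\max(X_i, X_j)$ are
$$\begin{aligned}
E(\max(X_i, X_j)^1) ={}& E(X_j^1) + 2(\xi_i - \xi_j)\,\Phi_2(s_1, 0; s_2) + b(\delta_i \omega_i - \delta_j \omega_j)\,\Phi(s_3) \\
&+ 2a\,\phi(s_1)\,\Phi(-s_2 s_3)
\end{aligned} \tag{A.11}$$
$$\begin{aligned}
E(\max(X_i, X_j)^2) ={}& E(X_j^2) + 2\left[(\xi_i^2 + \omega_i^2) - (\xi_j^2 + \omega_j^2)\right]\Phi_2(s_1, 0; s_2) \\
&+ 2a(\xi_i + \xi_j)\,\phi(s_1)\,\Phi(-s_2 s_3) + 2b(\xi_i \omega_i \delta_i - \xi_j \omega_j \delta_j)\,\Phi(s_3) \\
&+ ab\sqrt{1 - s_2^2}\,(\delta_i \omega_i + \delta_j \omega_j)\,\phi(s_3)
\end{aligned} \tag{A.12}$$
$$\begin{aligned}
E(\max(X_i, X_j)^3) ={}& E(X_j^3) + 2\left[(\xi_i^3 + 3\xi_i \omega_i^2) - (\xi_j^3 + 3\xi_j \omega_j^2)\right]\Phi_2(s_1, 0; s_2) \\
&+ \frac{2}{a}\left[a^2(\xi_i^2 + \xi_i \xi_j + \xi_j^2) + a^2 \omega_j^2 + 2\omega_i^2(\omega_i^2 - \omega_{i,j}) - \omega_{i,j}^2 + \omega_j^4\right]\phi(s_1)\,\Phi(-s_2 s_3) \\
&+ b\left[3(\xi_i^2 + \omega_i^2)\omega_i \delta_i - \delta_i^3 \omega_i^3 - 3(\xi_j^2 + \omega_j^2)\omega_j \delta_j + \delta_j^3 \omega_j^3\right]\Phi(s_3) \\
&+ ab\sqrt{1 - s_2^2}\left[(2\xi_i + \xi_j)\omega_i \delta_i + (\xi_i + 2\xi_j)\omega_j \delta_j\right]\phi(s_3),
\end{aligned} \tag{A.13}$$
where $\Phi_2$ is given by eq. (2.65) and can be efficiently computed using the algorithm
proposed in [Genz04]. The moments $E(X_j^1)$, $E(X_j^2)$ and $E(X_j^3)$ can be computed with
eqs. (6.9), (6.10) and (6.13), respectively.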
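As a plausibility check of eq. (A.11), the special case without skewness can be worked out by hand. The sketch below writes $\xi$, $\omega$, $\delta$ for the Azzalini parameters and $\phi$, $\Phi$ for the standard normal density and distribution function. It assumes that $\Phi_2(h, k; \rho)$ denotes the standard bivariate normal distribution function with correlation $\rho$ (eq. (2.65) is not reproduced here), and that the skew-normal distribution reduces to the normal distribution for $\delta = 0$, so that $E(X_j) = \xi_j$.

```latex
% Special case \delta_i = \delta_j = 0 (no skewness):
s_2 = 0, \qquad s_3 = s_1, \qquad
\Phi_2(s_1, 0; 0) = \Phi(s_1)\,\Phi(0) = \tfrac{1}{2}\,\Phi(s_1), \qquad
\Phi(-s_2 s_3) = \Phi(0) = \tfrac{1}{2}

% Inserting into eq. (A.11):
E(\max(X_i, X_j))
  = \xi_j + (\xi_i - \xi_j)\,\Phi(s_1) + a\,\phi(s_1)
  = \xi_i\,\Phi(s_1) + \xi_j\,\Phi(-s_1) + a\,\phi(s_1)
```

The last expression is the first moment of the maximum of two jointly normal random variables as given by Clark's formulas, with $a^2$ equal to the variance of $X_i - X_j$.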
The computation of the remaining moments can be reduced to the computation of $E(\max(X_i,X_j)^2)$ and $E(\max(X_i,X_j)^3)$. For example, the formula for the computation of $E(\max(X_i,X_j)X_k)$ is obtained by considering the sum
$$
\max(X_i,X_j) + sX_k = \max(X_i + sX_k,\ X_j + sX_k)
$$
with $s \in \mathbb{R}$. Using theorem 6.3, the parameters of the 2-dimensional skew-normal random vector
$$
(W_1, W_2)^T := (X_i + sX_k,\ X_j + sX_k)^T \tag{A.14}
$$
are determined as a function of $s$. Afterwards, the moment $E(\max(W_1,W_2)^2)$ is evaluated symbolically with eq. (A.12) and expanded using the linearity of the expectation operator to obtain
\begin{align}
E(\max(W_1,W_2)^2) &= E\big((\max(X_i,X_j) + sX_k)^2\big) \tag{A.15}\\
&= E\big(\max(X_i,X_j)^2 + 2s\max(X_i,X_j)X_k + s^2X_k^2\big) \tag{A.16}\\
&= E(\max(X_i,X_j)^2) + 2sE(\max(X_i,X_j)X_k) + s^2E(X_k^2). \tag{A.17}
\end{align}
Hence, the moment $E(\max(W_1,W_2)^2)$ is a second-order polynomial in $s$, and the moment $E(\max(X_i,X_j)X_k)$ is obtained as half of the coefficient of its linear term.
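The polynomial argument of eqs. (A.15) to (A.17) can be illustrated numerically. The sketch below replaces the symbolic evaluation used in the text by sample averages, which is a simplification made only for illustration: the identity also holds exactly for empirical moments, so the linear coefficient can be recovered from the values of the polynomial at $s = 1$ and $s = -1$. All function names and sample values are hypothetical.

```python
import random


def second_moment_of_shifted_max(xi, xj, xk, s):
    """Sample estimate of E((max(X_i, X_j) + s X_k)^2), cf. eq. (A.15)."""
    n = len(xi)
    return sum((max(a, b) + s * c) ** 2 for a, b, c in zip(xi, xj, xk)) / n


def cross_moment_from_polynomial(xi, xj, xk):
    """Recover E(max(X_i, X_j) X_k) from the linear coefficient in eq. (A.17)."""
    f_plus = second_moment_of_shifted_max(xi, xj, xk, 1.0)
    f_minus = second_moment_of_shifted_max(xi, xj, xk, -1.0)
    # f(1) - f(-1) = 2 * (linear coefficient) = 4 * E(max(X_i, X_j) X_k)
    linear_coefficient = (f_plus - f_minus) / 2.0
    return linear_coefficient / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    xi = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    xj = [rng.gauss(0.5, 2.0) for _ in range(1000)]
    xk = [a + rng.gauss(0.0, 1.0) for a in xi]
    direct = sum(max(a, b) * c for a, b, c in zip(xi, xj, xk)) / len(xi)
    # Both values agree up to floating-point rounding.
    print(abs(cross_moment_from_polynomial(xi, xj, xk) - direct) < 1e-9)
```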
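The moments $E(\max(X_i,X_j)^2X_k)$ and $E(\max(X_i,X_j)X_k^2)$ are obtained in the same way from the symbolic evaluation of $E(\max(W_1,W_2)^3)$. The solutions are:
\begin{align}
E(\max(X_i,X_j)X_k) ={}& E(X_jX_k) + 2(\alpha s_1\xi_k + \omega_{i,k} - \omega_{j,k})\Phi_2(s_1, 0; s_2) \notag\\
&+ 2\alpha\xi_k\phi(s_1)\Phi(-s_2 s_3) + \alpha\beta(s_2\xi_k + s_1\omega_k\delta_k)\Phi(s_3) + \alpha\beta\sqrt{1 - s_2^2}\,\omega_k\delta_k\phi(s_3) \tag{A.18}\\
E(\max(X_i,X_j)X_k^2) ={}& E(X_jX_k^2) + 2\big(2\xi_k(\omega_{i,k} - \omega_{j,k}) + \alpha s_1(\xi_k^2 + \omega_k^2)\big)\Phi_2(s_1, 0; s_2) \notag\\
&+ \frac{2}{\alpha}\big((\omega_{i,k} - \omega_{j,k})^2 + \alpha^2(\xi_k^2 + \omega_k^2)\big)\phi(s_1)\Phi(-s_2 s_3) \notag\\
&+ \beta\big(\alpha s_2\xi_k^2 + 2\omega_k\delta_k(\alpha s_1\xi_k + \omega_{i,k} - \omega_{j,k}) - \alpha s_2(\delta_k^2 - 1)\omega_k^2\big)\Phi(s_3) \notag\\
&+ 4\sqrt{1 - s_2^2}\,\xi_k\omega_k\delta_k\alpha\phi(s_3)/\sqrt{2\pi} \tag{A.19}\\
E(\max(X_i,X_j)^2X_k) ={}& E(X_j^2X_k) \notag\\
&+ 2\big(\xi_i^2\xi_k + 2\xi_i\omega_{i,k} - \xi_k(\xi_j^2 - \omega_i^2 + \omega_j^2) - 2\xi_j\omega_{j,k}\big)\Phi_2(s_1, 0; s_2) \notag\\
&+ \Big(2\alpha^2 s_1\xi_k + \frac{4}{\alpha}\big((\omega_i^2 - \omega_{i,j})\omega_{i,k} + (\omega_j^2 - \omega_{i,j})\omega_{j,k} + \xi_j\xi_k\alpha^2\big)\Big)\phi(s_1)\Phi(-s_2 s_3) \notag\\
&+ \beta\Big[2\omega_i\delta_i(\xi_i\xi_k + \omega_{i,k}) - 2\omega_j\delta_j(\xi_j\xi_k + \omega_{j,k}) - \omega_i^2\delta_i^2\omega_k\delta_k \notag\\
&\qquad + \omega_k\delta_k(\xi_i^2 - \xi_j^2 + \omega_i^2 - \omega_j^2 + \omega_j^2\delta_j^2)\Big]\Phi(s_3) \notag\\
&+ 2\alpha\sqrt{1 - s_2^2}\,\big(\xi_k(\omega_i\delta_i + \omega_j\delta_j) + \omega_k\delta_k(\xi_i + \xi_j)\big)\phi(s_3)/\sqrt{2\pi} \tag{A.20}
\end{align}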
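Similarly, the moment $E(\max(X_i,X_j)X_kX_l)$ is obtained by considering the sum
$$
\max(X_i,X_j) + sX_k + tX_l = \max(X_i + sX_k + tX_l,\ X_j + sX_k + tX_l)
$$
with $s, t \in \mathbb{R}$ and symbolic evaluation of eq. (A.13). The solution is:
\begin{align}
E(\max(X_i,X_j)X_kX_l) ={}& E(X_jX_kX_l) \notag\\
&+ 2\big(\xi_l(\omega_{i,k} - \omega_{j,k}) + \xi_k(\omega_{i,l} - \omega_{j,l}) + \alpha s_1(\xi_k\xi_l + \omega_{k,l})\big)\Phi_2(s_1, 0; s_2) \notag\\
&+ \frac{2}{\alpha}\big((\omega_{i,k} - \omega_{j,k})(\omega_{i,l} - \omega_{j,l}) + \alpha^2(\xi_k\xi_l + \omega_{k,l})\big)\phi(s_1)\Phi(-s_2 s_3) \notag\\
&+ \beta\Big[\alpha s_2(\xi_k\xi_l + \omega_{k,l} - \delta_k\delta_l\omega_k\omega_l) + \delta_k\omega_k(\xi_i\xi_l - \xi_j\xi_l + \omega_{i,l} - \omega_{j,l}) \notag\\
&\qquad + \delta_l\omega_l(\xi_i\xi_k - \xi_j\xi_k + \omega_{i,k} - \omega_{j,k})\Big]\Phi(s_3) \notag\\
&+ 2\alpha\sqrt{1 - s_2^2}\,(\delta_k\xi_l\omega_k + \delta_l\xi_k\omega_l)\phi(s_3)/\sqrt{2\pi}. \tag{A.21}
\end{align}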
A.2 Proofs
This section presents the proofs for the SUM- and MAX-operations based on the skew-normal distribution.
A.2.1 Statistical SUM-Operation
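Theorem 6.3. If $X \sim \mathcal{SN}_n(\mu,\Sigma,\lambda)$, $b \in \mathbb{R}^k$ and $A \in \mathbb{R}^{k\times n}$ has rank $k$, then the random vector
$$
Y = AX + b \tag{6.61}
$$
has a $k$-dimensional skew-normal distribution $\mathcal{SN}_k(\mu_Y,\Sigma_Y,\lambda_Y)$ with parameters
\begin{align}
\mu_Y &= A\mu + b \tag{6.62}\\
\Sigma_Y &= A\Sigma A^T \tag{6.63}\\
\lambda_Y &= A\lambda. \tag{6.64}
\end{align}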
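Proof. The moment generating function $M_X(t) = E(\exp(t^TX))$ of the skew-normal random vector $X$ is given by eq. (6.31) as
$$
M_X(t) = 2\exp\Big(t^T(\mu - \lambda) + \tfrac{1}{2}t^T(\Sigma + \lambda\lambda^T)t\Big)\,\Phi\big(\lambda^Tt/\beta\big). \tag{A.22}
$$
Then the moment generating function $M_Y(t)$ of the random vector $Y$ is
$$
\begin{aligned}
M_Y(t) &= E(\exp(t^TY)) = E(\exp(t^T(AX + b))) \\
&= \exp(t^Tb)\,E(\exp(t^TAX)) \\
&= \exp(t^Tb)\,M_X(A^Tt) \\
&= 2\exp\Big((A^Tt)^T(\mu - \lambda) + \tfrac{1}{2}(A^Tt)^T(\Sigma + \lambda\lambda^T)A^Tt + t^Tb\Big)\,\Phi\big(\lambda^TA^Tt/\beta\big).
\end{aligned}
$$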
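To prove the theorem and compute the parameters $(\mu_Y,\Sigma_Y,\lambda_Y)$, $M_Y(t)$ must be brought into the structure
$$
M_Y(t) = 2\exp\Big(t^T(\mu_Y - \lambda_Y) + \tfrac{1}{2}t^T(\Sigma_Y + \lambda_Y\lambda_Y^T)t\Big)\,\Phi\big(\lambda_Y^Tt/\beta\big). \tag{A.23}
$$
From
$$
\Phi\big(\lambda_Y^Tt/\beta\big) = \Phi\big(\lambda^TA^Tt/\beta\big) \tag{A.24}
$$
it follows that $\lambda_Y = A\lambda$. Afterwards,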
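\begin{align}
&(A^Tt)^T(\mu - \lambda) + \tfrac{1}{2}(A^Tt)^T(\Sigma + \lambda\lambda^T)A^Tt + t^Tb \tag{A.25}\\
&\quad= t^T(A\mu - A\lambda) + \tfrac{1}{2}t^T(A\Sigma A^T + A\lambda\lambda^TA^T)t + t^Tb \tag{A.26}\\
&\quad= t^T(A\mu + b - \lambda_Y) + \tfrac{1}{2}t^T(A\Sigma A^T + \lambda_Y\lambda_Y^T)t \tag{A.27}\\
&\quad= t^T(\mu_Y - \lambda_Y) + \tfrac{1}{2}t^T(\Sigma_Y + \lambda_Y\lambda_Y^T)t \tag{A.28}
\end{align}
and therefore, $\mu_Y = A\mu + b$ and $\Sigma_Y = A\Sigma A^T$. Furthermore, $\Sigma_Y$ is symmetric and also positive definite according to theorem 2.4 and
$$
\lambda_Y^T\Sigma_Y^{-1}\lambda_Y = \lambda^TA^T(A^{-T}\Sigma^{-1}A^{-1})A\lambda = \lambda^T\Sigma^{-1}\lambda < \frac{2}{\pi - 2} \tag{A.29}
$$
holds by assumption.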
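The parameter update of theorem 6.3 amounts to three matrix products. The following Python sketch applies eqs. (6.62) to (6.64) to plain nested lists; the helper names and the numerical values are assumptions chosen for illustration, and the sketch does not check the admissibility condition of eq. (A.29).

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Matrix-matrix product for matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sum_operation(A, b, mu, Sigma, lam):
    """Parameters of Y = A X + b according to eqs. (6.62)-(6.64)."""
    mu_Y = [m + c for m, c in zip(mat_vec(A, mu), b)]
    Sigma_Y = mat_mul(mat_mul(A, Sigma), transpose(A))
    lam_Y = mat_vec(A, lam)
    return mu_Y, Sigma_Y, lam_Y


if __name__ == "__main__":
    A = [[1.0, 1.0]]                  # Y = X_1 + X_2 + 0.5, so k = 1
    b = [0.5]
    mu = [1.0, 2.0]
    Sigma = [[1.0, 0.2], [0.2, 2.0]]
    lam = [0.3, -0.1]
    print(sum_operation(A, b, mu, Sigma, lam))
```

With $A = (1\ 1)$ the sketch yields the parameters of the sum of two correlated delays, which is the case needed by a block-based SUM-operation.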
A.2.2 Statistical MAX-Operation
Theorem 6.4. If $X$ is an $n$-dimensional random vector with finite third multivariate cumulant $\kappa_3(X)$ and $A \in \mathbb{R}^{k\times n}$ has rank $k$, then the third multivariate cumulant $\kappa_3(Y) \in \mathbb{R}^{k^2\times k}$ of the random vector $Y = AX$ is
$$
\kappa_3(Y) = (A \otimes A)\,\kappa_3(X)\,A^T, \tag{6.75}
$$
where $\otimes$ denotes the Kronecker product.
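Proof. Let $\mu_X$ and $\mu_Y$ denote the mean vectors of $X$ and $Y$, respectively. According to definition 2.40 and eq. (2.53), the third multivariate cumulant of $X$ is defined as
$$
\kappa_3(X) = E\big((X - \mu_X) \otimes (X - \mu_X) \otimes (X - \mu_X)^T\big) \tag{A.30}
$$
and the third multivariate cumulant of $Y = AX$ is
\begin{align}
\kappa_3(Y) &= E\big((Y - \mu_Y) \otimes (Y - \mu_Y) \otimes (Y - \mu_Y)^T\big) \tag{A.31}\\
&= E\big(A(X - \mu_X) \otimes A(X - \mu_X) \otimes (X - \mu_X)^TA^T\big). \tag{A.32}
\end{align}
The above equation can be transformed using eq. (2.10), which gives
$$
\kappa_3(Y) = E\big((A \otimes A)((X - \mu_X) \otimes (X - \mu_X)) \otimes (X - \mu_X)^TA^T\big). \tag{A.33}
$$
By using the linearity of the expectation operator it follows that
\begin{align}
\kappa_3(Y) &= (A \otimes A)\,E\big((X - \mu_X) \otimes (X - \mu_X) \otimes (X - \mu_X)^T\big)A^T \tag{A.34}\\
&= (A \otimes A)\,\kappa_3(X)\,A^T, \tag{A.35}
\end{align}
which completes the proof.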
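Theorem 6.4 can be checked numerically, because the identity also holds for sample cumulants. The sketch below is a minimal illustration under that assumption: it builds the $n^2 \times n$ sample version of eq. (A.30) from a list of samples and compares $\kappa_3(AX)$ with $(A \otimes A)\kappa_3(X)A^T$. The helper names and the sample distributions are hypothetical.

```python
import random


def kron_vec(u, v):
    """Kronecker product of two column vectors as a flat list."""
    return [a * b for a in u for b in v]


def kron_mat(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def third_cumulant(samples):
    """Sample version of eq. (A.30): an (n^2 x n) matrix of third central moments."""
    count, dim = len(samples), len(samples[0])
    mean = [sum(x[d] for x in samples) / count for d in range(dim)]
    kappa = [[0.0] * dim for _ in range(dim * dim)]
    for x in samples:
        c = [x[d] - mean[d] for d in range(dim)]
        cc = kron_vec(c, c)
        for r in range(dim * dim):
            for d in range(dim):
                kappa[r][d] += cc[r] * c[d] / count
    return kappa


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.expovariate(1.0), rng.gauss(0.0, 1.0), rng.uniform(0.0, 1.0)]
         for _ in range(200)]
    A = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
    Y = [[sum(a * v for a, v in zip(row, x)) for row in A] for x in X]
    lhs = third_cumulant(Y)
    A_t = [list(col) for col in zip(*A)]
    rhs = mat_mul(mat_mul(kron_mat(A, A), third_cumulant(X)), A_t)
    # The two (k^2 x k) matrices agree up to floating-point rounding.
    print(max(abs(l - r) for rl, rr in zip(lhs, rhs) for l, r in zip(rl, rr)) < 1e-9)
```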
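Theorem 6.5. Let $X \sim \mathcal{SN}_n(\mu,\Sigma,\lambda)$ denote an $n$-dimensional skew-normal random vector. If $\kappa_3(V)$ is the third standardized multivariate cumulant of $X$, then the matrix
$$
M_X = \kappa_3(V)^T\kappa_3(V) \tag{6.80}
$$
has an eigenvector
$$
v = \pm\frac{L^{-1}\lambda}{\lVert L^{-1}\lambda\rVert} \tag{6.81}
$$
corresponding to the only non-zero eigenvalue
$$
\psi = \Big(2 - \frac{\pi}{2}\Big)^2(\lambda^T\Sigma^{-1}\lambda)^3 = \Big(2 - \frac{\pi}{2}\Big)^2\lVert L^{-1}\lambda\rVert^6, \tag{6.82}
$$
where $L$ denotes the lower Cholesky factor of $\Sigma$ such that $\Sigma = LL^T$.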
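Proof. According to eq. (6.38) and theorem 6.4, the third standardized multivariate cumulant of a skew-normal random vector $X$ is
$$
\begin{aligned}
\kappa_3(V) &= (L^{-1} \otimes L^{-1})\,\kappa_3(X)\,L^{-T} \\
&= \Big(2 - \frac{\pi}{2}\Big)(L^{-1} \otimes L^{-1})(\lambda \otimes \lambda \otimes \lambda^T)L^{-T},
\end{aligned}
\tag{A.36}
$$
which can be simplified by using eqs. (2.8) and (2.10) to
$$
\begin{aligned}
\kappa_3(V) &= \Big(2 - \frac{\pi}{2}\Big)(L^{-1}\lambda \otimes L^{-1}\lambda \otimes \lambda^TL^{-T}) \\
&= \Big(2 - \frac{\pi}{2}\Big)(L^{-1}\lambda \otimes L^{-1}\lambda)(L^{-1}\lambda)^T.
\end{aligned}
\tag{A.37}
$$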
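Starting from the definition of the skewness matrix of $X$ given by eq. (6.80) and using eq. (A.37), the resulting expression is then simplified by the application of eqs. (2.9) and (2.10) to obtain
$$
\begin{aligned}
M_X &= \kappa_3(V)^T\kappa_3(V) \\
&= \Big(2 - \frac{\pi}{2}\Big)^2(L^{-1}\lambda)(L^{-1}\lambda \otimes L^{-1}\lambda)^T(L^{-1}\lambda \otimes L^{-1}\lambda)(L^{-1}\lambda)^T \\
&= \Big(2 - \frac{\pi}{2}\Big)^2(L^{-1}\lambda)\big((L^{-1}\lambda)^T \otimes (L^{-1}\lambda)^T\big)(L^{-1}\lambda \otimes L^{-1}\lambda)(L^{-1}\lambda)^T \\
&= \Big(2 - \frac{\pi}{2}\Big)^2(L^{-1}\lambda)(\lambda^T\Sigma^{-1}\lambda \otimes \lambda^T\Sigma^{-1}\lambda)(L^{-1}\lambda)^T \\
&= \Big(2 - \frac{\pi}{2}\Big)^2(L^{-1}\lambda)(\lambda^T\Sigma^{-1}\lambda)^2(L^{-1}\lambda)^T.
\end{aligned}
\tag{A.38}
$$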
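Hence, $M_X$ is a symmetric real-valued matrix of rank one with exactly one positive eigenvalue $\psi$. By definition there must be a non-zero $v$ such that $M_Xv = \psi v$. This means that
$$
\Big(2 - \frac{\pi}{2}\Big)^2(\lambda^T\Sigma^{-1}\lambda)^2(L^{-1}\lambda)(L^{-1}\lambda)^Tv = \psi v, \tag{A.39}
$$
where the expression $(L^{-1}\lambda)^Tv$ on the LHS is a scalar. Using $\psi \neq 0$ it follows that $v$ must be a scalar multiple of $L^{-1}\lambda$, specifically
$$
v = \frac{\big(2 - \frac{\pi}{2}\big)^2(\lambda^T\Sigma^{-1}\lambda)^2(L^{-1}\lambda)^Tv}{\psi}\,L^{-1}\lambda. \tag{A.40}
$$
Therefore, the normalized eigenvector is
$$
v = \pm\frac{L^{-1}\lambda}{\lVert L^{-1}\lambda\rVert} = \pm\frac{L^{-1}\lambda}{\sqrt{\lambda^T\Sigma^{-1}\lambda}}. \tag{A.41}
$$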
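Replacing $v$ in eq. (A.40) with the above expression gives
$$
\frac{L^{-1}\lambda}{\lVert L^{-1}\lambda\rVert} = \left(\frac{\big(2 - \frac{\pi}{2}\big)^2(\lambda^T\Sigma^{-1}\lambda)^2(L^{-1}\lambda)^TL^{-1}\lambda}{\psi}\right)\frac{L^{-1}\lambda}{\lVert L^{-1}\lambda\rVert} \tag{A.42}
$$
and therefore
$$
1 = \frac{\big(2 - \frac{\pi}{2}\big)^2(\lambda^T\Sigma^{-1}\lambda)^2(L^{-1}\lambda)^TL^{-1}\lambda}{\psi}, \tag{A.43}
$$
so that the only non-zero eigenvalue is
$$
\psi = \Big(2 - \frac{\pi}{2}\Big)^2(\lambda^T\Sigma^{-1}\lambda)^3 = \Big(2 - \frac{\pi}{2}\Big)^2\lVert L^{-1}\lambda\rVert^6, \tag{A.44}
$$
which completes the proof.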
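The eigenpair of theorem 6.5 can be verified numerically for a given standardized shape vector. In the following sketch the list `u` stands for $L^{-1}\lambda$ and is an arbitrary example value; the function names are hypothetical. The sketch builds the rank-one skewness matrix from eq. (A.37) and compares $M_Xv$ with $\psi v$.

```python
import math


def skewness_matrix(u):
    """M = kappa^T kappa with kappa = c (u (x) u) u^T and c = 2 - pi/2, cf. eq. (A.37)."""
    c = 2.0 - math.pi / 2.0
    uu = [a * b for a in u for b in u]              # u (x) u, length n^2
    kappa = [[c * r * a for a in u] for r in uu]    # (n^2 x n) matrix
    n = len(u)
    return [[sum(row[p] * row[q] for row in kappa) for q in range(n)]
            for p in range(n)]


def eigenpair(u):
    """Eigenvector (positive sign) and eigenvalue from eqs. (A.41) and (A.44)."""
    norm = math.sqrt(sum(a * a for a in u))
    v = [a / norm for a in u]
    psi = (2.0 - math.pi / 2.0) ** 2 * norm ** 6
    return v, psi


if __name__ == "__main__":
    u = [0.4, -0.2, 0.1]                            # stands for L^{-1} lambda
    M = skewness_matrix(u)
    v, psi = eigenpair(u)
    Mv = [sum(m * x for m, x in zip(row, v)) for row in M]
    # M v equals psi v up to floating-point rounding.
    print(all(abs(a - psi * b) < 1e-12 for a, b in zip(Mv, v)))
```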
A.2.3 Quadratic Time Algorithm for MAX-operation
Lemma 6.7. Let $X = (X_1, \dots, X_n)^T$ denote an $n$-dimensional random vector and let the random vector $Y = (Y_1, \dots, Y_{n-1})^T$ be defined as
$$
Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T. \tag{6.109}
$$
If $X$ has a restricted skew-normal distribution, then
$$
E(\bar Y_i\bar Y_j\bar Y_k) = 0 \tag{6.110}
$$
for all $i, j, k \in \mathbb{N}$ with $1 \le i \le n - 4$ and $1 \le j, k \le n - 1$, where $\bar Y = (\bar Y_1, \dots, \bar Y_{n-1})^T$ denotes the centred random vector corresponding to $Y$.
Proof. For any $j \in \mathbb{N}$ with $1 \le j \le n - 1$, let $\hat\mu_j := E(Y_j)$ and let $\bar Y_j := Y_j - \hat\mu_j$ denote the centred random variable corresponding to the random variable $Y_j$. In order to prove this lemma, it must be shown that
$$
E(\bar Y_i\bar Y_j\bar Y_k) = 0 \tag{A.45}
$$
holds for any choice of $i, j, k \in \mathbb{N}$ with $1 \le i \le n - 4$ and $1 \le j, k \le n - 1$. According to the properties of the restricted skew-normal distribution,
\begin{align}
\lambda_i &= \beta\omega_i\delta_i = 0 \tag{A.46}\\
\omega_{n,i} &= \omega_{n-1,i}, \tag{A.47}
\end{align}
must hold for any $1 \le i \le n - 4$ and eq. (A.46) implies $\delta_i = 0$ because $\omega_i > 0$.
For any choice of $i, j, k \in \mathbb{N}$ with $1 \le i \le n - 4$ and $1 \le j, k \le n - 2$, the correctness of eq. (A.45) can easily be shown using eq. (6.36) with $E(\bar Y_i) = E(\bar Y_j) = E(\bar Y_k) = 0$ and $\lambda_i = 0$.
To complete the proof, it must be shown that eq. (A.45) also holds if it involves $Y_{n-1} = \max(X_{n-1}, X_n)$. For this, $k$ is fixed to $k := n - 1$ and it is shown that the moments $E(\bar Y_i\bar Y_{n-1}^2)$, $E(\bar Y_i^2\bar Y_{n-1})$ and $E(\bar Y_i\bar Y_j\bar Y_{n-1})$ evaluate to zero. It follows from eq. (6.49), $E(\bar Y_i) = 0$ and $\lambda_i = 0$ that the location parameter $\bar\xi_i$ of the centred random variable $\bar Y_i$ is $\bar\xi_i = 0$ for all $1 \le i \le n - 4$.
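The formula to compute the moment $E(\bar Y_i\bar Y_{n-1}^2)$ is given by eq. (A.20), which simplifies by using $\bar\xi_i = 0$, $\delta_i = 0$ and eq. (A.47) to
$$
\begin{aligned}
E(\bar Y_i\bar Y_{n-1}^2) = 2\omega_{n-1,i}\Big[&\bar\xi_n + \beta\omega_n\delta_n + \beta(\delta_{n-1}\omega_{n-1} - \delta_n\omega_n)\Phi(s_3) \\
&+ 2\alpha\phi(s_1)\Phi(-s_2 s_3) + 2(\bar\xi_{n-1} - \bar\xi_n)\Phi_2(s_1, 0; s_2)\Big].
\end{aligned}
$$
Then $\bar\xi_{n-1} := \xi_{n-1} - \hat\mu_{n-1}$ and $\bar\xi_n := \xi_n - \hat\mu_{n-1}$ imply that $E(\bar Y_i\bar Y_{n-1}^2) = 0$, because the bracketed term has the form of eq. (A.11) evaluated for the centred maximum $\bar Y_{n-1}$, whose mean is zero.
The formula to compute the moment $E(\bar Y_i^2\bar Y_{n-1})$ is given by eq. (A.19), which simplifies by using $\bar\xi_i = 0$, $\delta_i = 0$ and eq. (A.47) to
$$
\begin{aligned}
E(\bar Y_i^2\bar Y_{n-1}) = \omega_i^2\Big[&\bar\xi_n + \beta\delta_n\omega_n + \beta(\delta_{n-1}\omega_{n-1} - \delta_n\omega_n)\Phi(s_3) \\
&+ 2\alpha\phi(s_1)\Phi(-s_2 s_3) + 2(\bar\xi_{n-1} - \bar\xi_n)\Phi_2(s_1, 0; s_2)\Big].
\end{aligned}
$$
Then $\bar\xi_{n-1} := \xi_{n-1} - \hat\mu_{n-1}$ and $\bar\xi_n := \xi_n - \hat\mu_{n-1}$ imply that $E(\bar Y_i^2\bar Y_{n-1}) = 0$ by the same argument.
The last required moment $E(\bar Y_i\bar Y_j\bar Y_{n-1})$ can be computed by using eq. (A.21). Then by using $\bar\xi_i = 0$, $\delta_i = 0$ and eq. (A.47), eq. (A.21) simplifies to
$$
\begin{aligned}
E(\bar Y_i\bar Y_j\bar Y_{n-1}) ={}& -\beta\omega_{n-1,i}\omega_j\delta_j + \bar\xi_n\omega_{j,i} + \beta(\omega_{n-1,i}\omega_j\delta_j + \omega_{j,i}\omega_n\delta_n) \\
&+ \beta(\omega_{n-1}\delta_{n-1} - \omega_n\delta_n)\omega_{j,i}\Phi(s_3) + 2\omega_{j,i}\big(\alpha\phi(s_1)\Phi(-s_2 s_3) + (\bar\xi_{n-1} - \bar\xi_n)\Phi_2(s_1, 0; s_2)\big).
\end{aligned}
$$
Then $\bar\xi_{n-1} := \xi_{n-1} - \hat\mu_{n-1}$ and $\bar\xi_n := \xi_n - \hat\mu_{n-1}$ imply that $E(\bar Y_i\bar Y_j\bar Y_{n-1}) = 0$.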
Theorem 6.8. Let $X = (X_1, \dots, X_n)^T$ denote an $n$-dimensional random vector and let the random vector $Y$ be defined as $Y = (X_1, \dots, X_{n-2}, \max(X_{n-1}, X_n))^T$. Furthermore, let $\hat\Sigma$ denote the covariance matrix of $Y$ with Cholesky factorization $\hat\Sigma = \hat L\hat L^T$ and let $\kappa_3(V)$ denote the third multivariate cumulant of the random vector
$$
V = (X_{n-3}, X_{n-2}, \max(X_{n-1}, X_n))^T. \tag{6.112}
$$
If $X$ has a restricted skew-normal distribution, then the skewness matrix of $Y$ is of the form
$$
M_Y = \begin{pmatrix} 0_{n-4,n-4} & 0_{n-4,3} \\ 0_{3,n-4} & G \end{pmatrix}, \tag{6.113}
$$
where $0_{k,l}$ is the $(k \times l)$ zero matrix,
$$
G = \hat L_{22}^{-1}\,\kappa_3(V)^T(\hat L_{22}^{-T} \otimes \hat L_{22}^{-T})(\hat L_{22}^{-1} \otimes \hat L_{22}^{-1})\,\kappa_3(V)\,\hat L_{22}^{-T} \tag{6.114}
$$
and $\hat L_{22}^{-1}$ is the bottom-right $3 \times 3$ submatrix of $\hat L^{-1}$.
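Proof. Let $\bar Y$ denote the centred random vector (see eq. (6.73)) corresponding to $Y$ and consider the partition of the centred random vector
$$
\bar Y = \begin{pmatrix} \bar U \\ \bar V \end{pmatrix}, \tag{A.48}
$$
into the $(n - 4)$-dimensional skew-normal random vector $\bar U$ and the 3-dimensional random vector $\bar V$. The third multivariate cumulant of $Y$, which is defined by definition 2.40, can be written in block matrix form as
\begin{align}
\kappa_3(Y) &= E\left(\begin{pmatrix} \bar U \\ \bar V \end{pmatrix} \otimes \bar Y \otimes \begin{pmatrix} \bar U \\ \bar V \end{pmatrix}^T\right)
= E\begin{pmatrix} \bar U \otimes \bar Y \otimes \bar U^T & \bar U \otimes \bar Y \otimes \bar V^T \\ \bar V \otimes \bar Y \otimes \bar U^T & \bar V \otimes \bar Y \otimes \bar V^T \end{pmatrix} \tag{A.49}\\
&= \begin{pmatrix} E(\bar U \otimes \bar Y \otimes \bar U^T) & E(\bar U \otimes \bar Y \otimes \bar V^T) \\ E(\bar V \otimes \bar Y \otimes \bar U^T) & E(\bar V \otimes \bar Y \otimes \bar V^T) \end{pmatrix}. \tag{A.50}
\end{align}
From lemma 6.7 it follows that
$$
\kappa_3(Y) = \begin{pmatrix} 0_{(n-4)(n-1),\,n-4} & 0_{(n-4)(n-1),\,3} \\ 0_{3(n-1),\,n-4} & E(\bar V \otimes \bar Y \otimes \bar V^T) \end{pmatrix}. \tag{A.51}
$$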
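Let
$$
\hat L^{-1} = \begin{pmatrix} \hat L_{11}^{-1} & 0_{n-4,3} \\ \hat L_{21}^{-1} & \hat L_{22}^{-1} \end{pmatrix} \tag{A.52}
$$
denote a partitioning of the inverse lower Cholesky factor with $\hat L_{11}^{-1} \in \mathbb{R}^{(n-4)\times(n-4)}$, $\hat L_{21}^{-1} \in \mathbb{R}^{3\times(n-4)}$, $\hat L_{22}^{-1} \in \mathbb{R}^{3\times 3}$ and $0_{k,l}$ is the $(k \times l)$ zero matrix. Then the third standardized multivariate cumulant can be written as
$$
\begin{aligned}
\kappa_3(Z) &= (\hat L^{-1} \otimes \hat L^{-1})\,\kappa_3(Y)\,\hat L^{-T} \\
&= \begin{pmatrix} \hat L_{11}^{-1} \otimes \hat L^{-1} & 0_{(n-4)(n-1),\,3(n-1)} \\ \hat L_{21}^{-1} \otimes \hat L^{-1} & \hat L_{22}^{-1} \otimes \hat L^{-1} \end{pmatrix} \kappa_3(Y) \begin{pmatrix} \hat L_{11}^{-1} & 0_{n-4,3} \\ \hat L_{21}^{-1} & \hat L_{22}^{-1} \end{pmatrix}^T \\
&= \begin{pmatrix} \hat L_{11}^{-1} \otimes \hat L^{-1} & 0_{(n-4)(n-1),\,3(n-1)} \\ \hat L_{21}^{-1} \otimes \hat L^{-1} & \hat L_{22}^{-1} \otimes \hat L^{-1} \end{pmatrix} \begin{pmatrix} 0_{(n-4)(n-1),\,n-4} & 0_{(n-4)(n-1),\,3} \\ 0_{3(n-1),\,n-4} & E(\bar V \otimes \bar Y \otimes \bar V^T)\hat L_{22}^{-T} \end{pmatrix} \\
&= \begin{pmatrix} 0_{(n-4)(n-1),\,n-4} & 0_{(n-4)(n-1),\,3} \\ 0_{3(n-1),\,n-4} & (\hat L_{22}^{-1} \otimes \hat L^{-1})E(\bar V \otimes \bar Y \otimes \bar V^T)\hat L_{22}^{-T} \end{pmatrix}.
\end{aligned}
\tag{A.53}
$$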
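Then according to eq. (6.83), the skewness matrix is
$$
M = \kappa_3(Z)^T\kappa_3(Z) = \begin{pmatrix} 0_{n-4,n-4} & 0_{n-4,3} \\ 0_{3,n-4} & G \end{pmatrix}, \tag{A.54}
$$
where the submatrix $G \in \mathbb{R}^{3\times 3}$ is
$$
\begin{aligned}
G &= \Big[(\hat L_{22}^{-1} \otimes \hat L^{-1})E(\bar V \otimes \bar Y \otimes \bar V^T)\hat L_{22}^{-T}\Big]^T(\hat L_{22}^{-1} \otimes \hat L^{-1})E(\bar V \otimes \bar Y \otimes \bar V^T)\hat L_{22}^{-T} \\
&= \hat L_{22}^{-1}E(\bar V \otimes \bar Y \otimes \bar V^T)^T(\hat L_{22}^{-T} \otimes \hat L^{-T})(\hat L_{22}^{-1} \otimes \hat L^{-1})E(\bar V \otimes \bar Y \otimes \bar V^T)\hat L_{22}^{-T}.
\end{aligned}
\tag{A.55}
$$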
This expression can be further simplified by noticing that the matrix $E(\bar V \otimes \bar Y \otimes \bar V^T)$ contains moments involving the components $\bar U_1, \dots, \bar U_{n-4}$ of the random vector $\bar U$. According to lemma 6.7, these moments are zero and can therefore be omitted so that
$$
G = \hat L_{22}^{-1}\,\kappa_3(V)^T(\hat L_{22}^{-T} \otimes \hat L_{22}^{-T})(\hat L_{22}^{-1} \otimes \hat L_{22}^{-1})\,\kappa_3(V)\,\hat L_{22}^{-T}, \tag{A.56}
$$
which completes the proof.
Lemma 6.9. Let $X \sim \mathcal{SN}_n(\mu,\Sigma,\lambda)$ be an $n$-dimensional skew-normal random vector and let $c_j := \mathrm{Cov}(X_{n-1}, X_j) - \mathrm{Cov}(X_n, X_j)$ for all $1 \le j \le n$. If $c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3} \neq 0$, then there exists an invertible linear transformation, which maps $X = (X_1, \dots, X_n)^T$ to an $n$-dimensional random vector $W = (W_1, \dots, W_n)^T$ with
$$
W_i = \begin{cases} X_i + q_iX_{n-2} + r_iX_{n-3} & \text{for } 1 \le i \le n - 4 \\ X_i & \text{for } n - 3 \le i \le n, \end{cases} \tag{6.126}
$$
such that $W$ has a restricted skew-normal distribution.
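Proof. In order to satisfy the conditions of the restricted skew-normal distribution, the coefficients $q_i$ and $r_i$ must be chosen such that the system of linear equations
\begin{align}
\mathrm{Cov}(X_{n-1}, X_i + q_iX_{n-2} + r_iX_{n-3}) - \mathrm{Cov}(X_n, X_i + q_iX_{n-2} + r_iX_{n-3}) &= 0 \tag{A.57}\\
q_i\lambda_{n-2} + r_i\lambda_{n-3} + \lambda_i &= 0 \tag{A.58}
\end{align}
is satisfied for all $i \in \mathbb{N}$ with $1 \le i \le n - 4$. The solution is
\begin{align}
q_i &= -\frac{c_{n-3}\lambda_i - c_i\lambda_{n-3}}{c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3}} \tag{A.59}\\
r_i &= -\frac{c_i\lambda_{n-2} - c_{n-2}\lambda_i}{c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3}}, \tag{A.60}
\end{align}
where $c_i = \mathrm{Cov}(X_{n-1}, X_i) - \mathrm{Cov}(X_n, X_i)$. A unique solution always exists under the precondition $c_{n-3}\lambda_{n-2} - c_{n-2}\lambda_{n-3} \neq 0$ of the lemma.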
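The closed-form solution of eqs. (A.59) and (A.60) can be checked against the two conditions it is derived from. By bilinearity of the covariance, eq. (A.57) reads $c_i + q_ic_{n-2} + r_ic_{n-3} = 0$. The Python sketch below computes $q_i$ and $r_i$ and evaluates both residuals; the function name and the numerical values are hypothetical.

```python
def restriction_coefficients(c_i, lam_i, c_n3, c_n2, lam_n3, lam_n2):
    """Coefficients (q_i, r_i) according to eqs. (A.59) and (A.60).

    c_n3, c_n2 stand for c_{n-3}, c_{n-2}; lam_n3, lam_n2 for
    lambda_{n-3}, lambda_{n-2}.
    """
    det = c_n3 * lam_n2 - c_n2 * lam_n3
    if det == 0.0:
        raise ValueError("precondition of lemma 6.9 is violated")
    q_i = -(c_n3 * lam_i - c_i * lam_n3) / det
    r_i = -(c_i * lam_n2 - c_n2 * lam_i) / det
    return q_i, r_i


if __name__ == "__main__":
    c_i, lam_i = 0.5, 0.3
    c_n3, c_n2 = 1.2, -0.4
    lam_n3, lam_n2 = 0.2, 0.7
    q_i, r_i = restriction_coefficients(c_i, lam_i, c_n3, c_n2, lam_n3, lam_n2)
    # Residuals of eq. (A.57), expanded by bilinearity, and of eq. (A.58).
    residual_cov = c_i + q_i * c_n2 + r_i * c_n3
    residual_lam = q_i * lam_n2 + r_i * lam_n3 + lam_i
    print(abs(residual_cov) < 1e-12, abs(residual_lam) < 1e-12)
```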
It remains to be shown that the transformation is invertible. The linear transformation in this lemma can be written as $W = AX$, where $A$ is an $n \times n$ square matrix of the form
$$
A = \begin{pmatrix} I_{n-4} & C \\ 0_{4,n-4} & I_4 \end{pmatrix}, \tag{A.61}
$$
where $0_{4,n-4}$ is the $4 \times (n - 4)$ zero matrix and
$$
C = \begin{pmatrix}
q_1 & r_1 & 0 & 0 \\
q_2 & r_2 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
q_{n-5} & r_{n-5} & 0 & 0 \\
q_{n-4} & r_{n-4} & 0 & 0
\end{pmatrix}. \tag{A.62}
$$
The inverse of any block matrix
$$
\begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} \tag{A.63}
$$
with square submatrices $B_1$ and $B_4$, $\det(B_1) \neq 0$ and $\det(S_{B_1}) \neq 0$ is
$$
\begin{pmatrix}
B_1^{-1} + B_1^{-1}B_2S_{B_1}^{-1}B_3B_1^{-1} & -B_1^{-1}B_2S_{B_1}^{-1} \\
-S_{B_1}^{-1}B_3B_1^{-1} & S_{B_1}^{-1}
\end{pmatrix} \tag{A.64}
$$
where $S_{B_1} = B_4 - B_3B_1^{-1}B_2$ denotes the Schur complement of $B_1$ [Choi09]. Setting $B_1 = I_{n-4}$, $B_2 = C$, $B_3 = 0_{4,n-4}$ and $B_4 = I_4$ implies $S_{B_1} = I_4$ and after simplifications, the inverse of $A$ is
$$
A^{-1} = \begin{pmatrix} I_{n-4} & -C \\ 0_{4,n-4} & I_4 \end{pmatrix}. \tag{A.65}
$$
As a consequence, the inverse linear transformation can be obtained by switching the signs of the coefficients $q_i$ and $r_i$ for all $i \in \mathbb{N}$ with $1 \le i \le n - 4$.
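The sign-switching rule of eq. (A.65) is easy to confirm by multiplying the two matrices. The following Python sketch builds $A$ with the column layout of eq. (A.62) for given coefficient lists and checks that the product with the sign-switched matrix is the identity. The function names and the coefficient values are hypothetical.

```python
def build_A(q, r):
    """Matrix A of eq. (A.61) for coefficient lists q and r of length n - 4."""
    m = len(q)
    n = m + 4
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(m):
        A[i][m] = q[i]        # first column of C, as laid out in eq. (A.62)
        A[i][m + 1] = r[i]    # second column of C
    return A


def build_A_inverse(q, r):
    """Inverse of A according to eq. (A.65): the block C changes its sign."""
    return build_A([-v for v in q], [-v for v in r])


def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


if __name__ == "__main__":
    q = [0.5, -1.0]
    r = [2.0, 0.25]
    product = mat_mul(build_A(q, r), build_A_inverse(q, r))
    n = len(product)
    is_identity = all(abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                      for i in range(n) for j in range(n))
    print(is_identity)
```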
Appendix B
Additional Result Tables
Table B.1 — Average results for construction and Monte-Carlo simulation of subcircuits S̄ and S, compared to Monte-Carlo simulation of the complete circuit, considering subcircuit size, accuracy and runtime for subcircuit construction and simulation.
Columns 3 to 7: subcircuit S̄. Columns 8 to 13: representative subcircuit S.
circuit name   Tclk [ps]   #sen. crit. paths   |cone| [%]   |S̄| [%]   Perr [%]   SU   #sen. crit. paths   |cone| [%]   |S| [%]   #iter   Perr [%]   SU
p35k
1163.6 11.1 40.92 0.28 7.67 127.3 127.9 49.72 1.25 23.3 0.11 25.4
1266.8 8.9 37.63 0.24 4.73 137.5 114.7 48.82 1.15 21.8 0.05 27.0
1412.7 6.0 32.00 0.19 1.51 127.3 81.4 47.30 0.98 19.4 0.01 21.6
p45k
889.3 19.2 10.23 0.81 27.39 63.1 145.4 14.50 3.62 61.3 0.02 11.4
968.4 11.7 7.97 0.50 11.63 85.7 91.8 11.49 2.96 64.3 0.01 13.0
1084.8 3.3 3.42 0.16 0.68 114.4 21.2 7.33 1.28 39.1 0.01 14.2
p77k
5805.2 2.9 2.69 0.10 2.03 202.5 198.4 3.99 1.08 29.9 0.03 17.2
6377.2 2.1 2.28 0.07 1.89 218.4 140.0 3.59 0.94 28.0 0.01 18.5
7137.5 1.2 1.45 0.05 0.92 207.6 76.0 3.58 0.73 21.8 0.00 15.8
p78k
1213.3 72.8 16.66 1.74 43.55 41.1 680.3 39.87 13.71 233.4 0.01 3.7
1325.4 23.8 6.52 0.68 9.85 92.7 218.4 17.70 5.54 159.9 0.01 8.0
1485.2 3.5 1.18 0.13 0.39 257.2 22.5 3.40 0.89 46.5 0.01 21.9
p81k
1018.5 118.6 21.72 2.17 75.13 27.3 1612.1 24.31 12.43 176.7 0.02 3.6
1109.7 60.2 18.51 1.25 61.58 45.8 869.9 22.92 11.02 250.9 0.01 3.7
1238.6 14.5 8.65 0.31 3.80 144.8 177.6 18.79 4.99 270.8 0.00 7.2
p100k
1400.1 28.7 9.10 0.71 62.44 72.4 429.8 11.73 4.71 160.9 0.01 7.6
1533.9 14.1 6.49 0.39 30.22 106.6 210.2 9.07 3.11 144.9 0.00 10.4
1710.6 3.2 2.42 0.11 1.15 178.0 33.4 4.76 0.96 63.1 0.02 22.4
p267k
787.9 293.4 11.43 1.07 52.38 47.5 1001.2 16.98 2.90 95.6 0.01 11.1
856.2 132.9 6.40 0.53 31.38 75.3 508.4 10.43 1.62 77.7 0.00 15.6
951.6 29.2 1.99 0.15 1.74 102.5 143.2 3.51 0.49 38.3 0.01 16.7
p330k
1023.5 254.5 17.26 1.59 78.49 36.0 2134.2 22.37 7.86 220.0 0.01 4.6
1116.9 107.4 12.43 0.81 57.99 62.6 1012.9 17.18 5.77 253.9 0.00 5.8
1246.4 18.3 3.75 0.15 1.32 156.0 121.5 8.58 1.60 188.9 0.01 13.9
(a) Average results over all test vector-pairs, generated for the detection of path delay faults
Columns 3 to 7: subcircuit S̄. Columns 8 to 13: representative subcircuit S.
circuit name   Tclk [ps]   #sen. crit. paths   |cone| [%]   |S̄| [%]   Perr [%]   SU   #sen. crit. paths   |cone| [%]   |S| [%]   #iter   Perr [%]   SU
p35k 1412.7 11.2 39.65 0.26 3.07 129.8 61.3 40.74 0.72 12.7 0.03 32.5
p45k 1084.8 7.2 3.61 0.22 4.22 128.0 19.7 6.21 1.02 29.1 0.02 24.3
p77k 7137.5 6.7 3.32 0.15 2.02 170.3 77.2 5.88 0.90 21.1 0.02 22.3
p78k 1485.2 9.2 1.39 0.17 8.14 231.5 30.6 3.34 0.84 40.3 0.02 23.0
p81k 1238.6 13.0 8.54 0.29 6.25 152.9 125.8 17.83 4.79 266.4 0.01 7.5
p100k 1710.6 7.2 1.64 0.11 5.87 183.8 27.0 2.69 0.51 30.9 0.02 29.4
p267k 951.6 21.7 1.25 0.09 1.70 157.3 57.6 2.37 0.27 22.9 0.01 30.0
p330k 1246.4 25.3 3.96 0.16 4.76 157.6 126.8 8.55 1.58 189.8 0.02 14.0
(b) Average results over all test vector-pairs, generated for the detection of small delay faults
Table B.2 — Average results of simplified probabilistic sensitization analysis of critical target paths over all test vector-pairs
circuit Tclk sensitization criterion NR-test invalidation Tsim Tdiag
name [ps] #rob #nr #fs Pci Pid Pin Pwi|in [ms] [ms]
p35k
1163.64 0.00 7.79 3.32 0.50 1.00 0.60 0.16 1.05 118.46
1266.85 0.00 6.12 2.78 0.45 1.00 0.63 0.16 1.04 115.88
1412.74 0.00 4.02 1.96 0.35 1.00 0.64 0.17 1.04 107.77
p45k
889.279 0.00 12.42 6.77 0.85 0.97 0.52 0.19 1.04 141.02
968.447 0.00 7.71 4.03 0.74 0.97 0.51 0.18 1.03 133.98
1084.84 0.00 2.24 1.10 0.34 0.95 0.42 0.18 1.03 124.47
p77k
5805.18 0.00 1.61 1.28 0.10 1.00 0.97 0.27 2.36 1487.69
6377.17 0.00 1.09 1.03 0.08 1.00 0.97 0.27 2.35 1167.63
7137.51 0.00 0.53 0.68 0.04 1.00 0.95 0.31 2.36 746.36
p78k
1213.31 0.00 58.56 14.24 0.99 1.00 0.59 0.21 5.45 793.89
1325.36 0.00 19.26 4.59 0.98 1.00 0.58 0.20 5.50 679.63
1485.22 0.00 2.81 0.65 0.58 1.00 0.54 0.18 5.48 622.47
p81k
1018.51 0.00 89.86 28.70 0.99 0.99 0.70 0.24 3.40 640.68
1109.69 0.00 46.11 14.09 0.99 1.00 0.67 0.23 3.40 477.42
1238.59 0.00 11.26 3.25 0.97 1.00 0.65 0.18 3.51 338.22
p100k
1400.08 0.00 24.81 3.91 0.97 1.00 0.80 0.18 2.87 414.81
1533.9 0.00 12.54 1.57 0.92 1.00 0.79 0.16 2.83 352.58
1710.59 0.00 2.99 0.22 0.64 1.00 0.74 0.13 2.84 302.82
p267k
787.908 0.06 226.29 79.82 0.99 0.84 0.33 0.10 5.16 894.77
856.186 0.00 107.03 34.99 0.95 0.89 0.39 0.10 5.14 824.83
951.598 0.00 28.68 5.99 0.67 0.94 0.53 0.11 5.13 772.77
p330k
1023.52 0.23 196.62 69.44 0.99 0.90 0.51 0.25 8.87 1581.22
1116.89 0.00 85.99 29.04 0.99 0.97 0.55 0.24 8.79 1363.01
1246.38 0.00 15.52 5.51 0.95 0.94 0.46 0.20 8.95 1229.80
(a) Path delay fault tests for longest paths in benchmark circuit
circuit Tclk sensitization criterion NR-test invalidation Tsim Tdiag
name [ps] #rob #nr #fs Pci Pid Pin Pwi|in [ms] [ms]
p35k 1412.74 0.00 4.48 1.93 0.32 0.97 0.59 0.15 0.96 101.96
p45k 1084.84 0.00 1.16 0.69 0.25 0.98 0.48 0.20 1.02 118.52
p77k 7137.51 0.00 0.73 0.56 0.06 1.00 0.93 0.31 2.33 688.15
p78k 1485.22 0.00 2.81 0.66 0.51 0.91 0.43 0.16 6.02 611.46
p81k 1238.59 0.00 6.66 1.75 0.95 1.00 0.62 0.24 3.40 318.62
p100k 1710.59 0.00 1.19 0.47 0.26 0.94 0.63 0.15 2.77 287.31
p267k 951.598 0.00 10.46 4.16 0.39 0.86 0.32 0.08 4.84 741.11
p330k 1246.38 0.00 15.28 4.62 0.94 0.97 0.47 0.19 8.76 1189.22
(b) Delay tests for marginally detectable small delay faults used in section 7.4
Table B.3 — Runtime and error of approximating the delay fault detection probability by the target paths delay fault probability approximation of the non-incremental algorithm
circuit   |T|   |P|   δ [10^-2]   |δ| [10^-2]   T_PA [s]   T_MC [s]   SU
p35k
1 9.47 -2.6 3.3 0.0182 22 1204
5 45.23 -0.4 1.4 0.0879 54 612
10 88.96 -0.1 1.2 0.1842 88 475
20 171.56 0.1 1.0 0.4134 156 377
p45k
1 5.40 -1.8 2.5 0.0125 22 1732
5 24.15 -0.3 1.4 0.0595 56 948
10 48.23 0.1 1.0 0.1244 96 776
20 90.78 0.2 0.8 0.2599 176 677
p77k
1 5.87 -0.8 1.4 0.0159 39 2472
5 24.64 -0.1 1.0 0.0779 112 1435
10 47.73 -0.0 0.9 0.1603 198 1234
20 93.71 0.0 1.0 0.3477 374 1077
p78k
1 8.31 -4.3 5.1 0.0310 86 2772
5 39.94 -0.4 1.5 0.1554 280 1803
10 75.58 0.1 1.1 0.3079 516 1676
20 139.83 0.3 1.0 0.6273 990 1579
p81k
1 11.41 -1.8 2.5 0.0289 80 2753
5 53.80 -0.1 0.9 0.1510 197 1307
10 105.40 0.1 0.7 0.3176 339 1067
20 205.73 0.1 0.7 0.7116 623 876
p100k
1 5.57 -1.9 2.6 0.0199 59 2937
5 25.63 -0.4 1.4 0.0989 160 1614
10 51.62 -0.2 1.1 0.2037 280 1376
20 108.83 0.0 1.1 0.4499 527 1171
p267k
1 18.50 -0.5 1.5 0.0708 130 1843
5 88.53 0.1 0.9 0.4071 362 889
10 176.48 -0.8 1.7 0.7743 622 803
20 333.83 -2.1 2.9 1.5168 1141 752
p330k
1 25.13 -1.6 2.4 0.0938 200 2131
5 120.61 -0.4 1.4 0.5108 580 1136
10 238.90 -1.1 2.1 0.9914 1037 1046
20 479.62 -2.3 3.4 2.0844 1958 940
Table B.4 — Runtime and absolute error of approximating the delay fault detection probability by the target paths delay fault probability approximation of the incremental algorithm, after insertion or removal of a test vector-pair
Reference (Monte-Carlo): T_MC [ms], |P|, |δ| [10^-2]. INSERT(test vector-pair): sensitization analysis (A) with T_SA [ms] and |#| [10^-2], dist. ext. (B, C) with T_DE [ms], Ŷ (D) with T_NI [ms] and |ε| [10^-2], Speedup of the complete update. REMOVE(test vector-pair): Ŷ (D) with T_NI [ms] and |ε| [10^-2], Speedup of the complete update.
NXP circuit   |T|   T_MC   |P|   |δ|   T_SA   |#|   T_DE   T_NI   |ε|   Speedup   T_NI   |ε|   Speedup
p35k
1    21936    9.47    3.3   5.71   0.0   0.24   0.00   3.3   3684    0.00   3.3   12681890
5    53785    45.23   1.4   3.11   0.2   0.91   1.33   1.4   10055   1.33   1.4   40459
10   87524    88.96   1.2   2.95   0.2   1.71   13.32  1.2   4866    13.32  1.2   6570
20   155919   171.56  1.0   2.91   0.2   3.25   26.65  1.0   4753    26.65  1.0   5850
p45k
1    21629    5.40    2.5   6.05   0.0   0.23   0.00   2.6   3444    0.00   2.6   12493342
5    56427    24.15   1.4   5.01   0.2   0.62   1.37   1.4   8058    1.37   1.4   41190
10   96496    48.23   1.0   5.09   0.3   1.38   12.09  1.0   5198    12.09  1.0   7982
20   176030   90.78   0.8   4.98   0.3   1.56   24.13  0.9   5739    24.13  0.9   7296
p77k
1    39213    5.87    1.4   9.16   0.1   0.33   0.00   1.5   4129    0.00   1.5   25492799
5    111734   24.64   1.0   8.26   0.9   1.01   1.22   1.6   10647   1.22   1.6   91468
10   197825   47.73   0.9   8.09   1.2   1.87   10.95  1.7   9460    10.95  1.7   18064
20   374394   93.71   1.0   8.15   1.5   3.66   20.01  1.9   11764   20.01  1.9   18710
p78k
1    85996    8.31    5.1   20.95  0.1   0.14   0.00   5.1   4077    0.00   5.1   45563949
5    280091   39.94   1.5   20.33  0.2   0.47   1.48   1.5   12571   1.48   1.5   188726
10   516012   75.58   1.1   20.21  0.2   0.83   13.69  1.1   14857   13.69  1.1   37680
20   990453   139.83  1.0   19.92  0.2   1.42   27.08  0.9   20457   27.08  0.9   36580
p81k
1    79622    11.41   2.5   13.64  0.1   0.24   0.00   2.5   5734    0.00   2.5   40373912
5    197363   53.80   0.9   12.73  0.1   0.70   1.32   0.9   13370   1.32   0.9   149254
10   338725   105.40  0.7   12.69  0.1   1.27   12.34  0.7   12882   12.34  0.7   27447
20   623081   205.73  0.7   12.70  0.1   2.39   24.34  0.7   15802   24.34  0.7   25595
p100k
1    58532    5.57    2.6   13.88  0.0   0.13   0.00   2.6   4176    0.00   2.6   33935442
5    159661   25.63   1.4   12.98  0.1   0.55   1.34   1.4   10738   1.34   1.4   119177
10   280357   51.62   1.1   12.90  0.2   0.95   12.12  1.2   10796   12.12  1.2   23136
20   526962   108.83  1.1   13.00  0.2   2.10   23.71  1.1   13577   23.71  1.1   22222
p267k
1    130499   18.50   1.5   39.02  0.1   1.28   0.00   1.5   3238    0.00   1.5   66870545
5    361989   88.53   0.9   39.10  0.3   5.32   1.35   1.0   7909    1.35   1.0   268324
10   621799   176.48  1.7   39.04  1.2   10.24  12.64  0.9   10041   12.64  0.9   49179
20   1141305  333.83  2.9   38.94  2.6   16.14  25.22  0.9   14212   25.22  0.9   45247
p330k
1    199812   25.13   2.4   49.43  0.2   2.13   0.00   2.5   3875    0.00   2.5   86237893
5    580496   120.61  1.4   50.03  0.6   11.26  1.40   1.2   9258    1.40   1.2   413190
10   1037266  238.90  2.1   50.36  1.4   24.17  13.64  1.0   11764   13.64  1.0   76026
20   1958461  479.62  3.4   50.38  2.8   52.12  27.63  1.0   15050   27.63  1.0   70880
Table B.5 — Runtime T and error ε for the approximation of the distribution of max(X1, . . . ,Xn), where (X1, . . . ,Xn) ∼ Nn(μ, Σ) with random μ and random Σ
n   ρ̄   |ε_n|   |ε_sn|   (|ε_sn| − |ε_n|)/|ε_n|   T_n [ms]   T_sn [ms]
4
0.00 ≤ ρ̄ < 0.33 0.0162 0.0015 -90.7% 0.007 0.031
0.33 ≤ ρ̄ < 0.66 0.0089 0.0010 -89.2% 0.007 0.030
0.66 ≤ ρ̄ < 1.00 0.0070 0.0008 -89.1% 0.007 0.030
8
0.00 ≤ ρ̄ < 0.33 0.0205 0.0021 -89.7% 0.011 0.078
0.33 ≤ ρ̄ < 0.66 0.0102 0.0013 -86.8% 0.011 0.079
0.66 ≤ ρ̄ < 1.00 0.0091 0.0015 -84.0% 0.011 0.079
16
0.00 ≤ ρ̄ < 0.33 0.0219 0.0026 -88.2% 0.022 0.157
0.33 ≤ ρ̄ < 0.66 0.0105 0.0018 -82.7% 0.022 0.157
0.66 ≤ ρ̄ < 1.00 0.0095 0.0022 -76.8% 0.022 0.158
32
0.00 ≤ ρ̄ < 0.33 0.0242 0.0029 -88.2% 0.037 0.322
0.33 ≤ ρ̄ < 0.66 0.0117 0.0023 -80.1% 0.038 0.325
0.66 ≤ ρ̄ < 1.00 0.0092 0.0030 -67.8% 0.038 0.325
64
0.00 ≤ ρ̄ < 0.33 0.0264 0.0030 -88.7% 0.074 0.759
0.33 ≤ ρ̄ < 0.66 0.0128 0.0027 -78.5% 0.075 0.757
0.66 ≤ ρ̄ < 1.00 0.0096 0.0036 -62.9% 0.075 0.757
128
0.00 ≤ ρ̄ < 0.33 0.0278 0.0031 -89.0% 0.147 2.402
0.33 ≤ ρ̄ < 0.66 0.0132 0.0031 -76.9% 0.148 2.333
0.66 ≤ ρ̄ < 1.00 0.0099 0.0041 -59.1% 0.148 2.331
256
0.00 ≤ ρ̄ < 0.33 0.0284 0.0031 -89.1% 0.347 10.601
0.33 ≤ ρ̄ < 0.66 0.0133 0.0032 -75.8% 0.348 9.945
0.66 ≤ ρ̄ < 1.00 0.0100 0.0044 -56.1% 0.350 9.938
512
0.00 ≤ ρ̄ < 0.33 0.0286 0.0031 -89.0% 0.795 64.849
0.33 ≤ ρ̄ < 0.66 0.0130 0.0033 -74.9% 0.796 59.213
0.66 ≤ ρ̄ < 1.00 0.0098 0.0046 -53.7% 0.808 59.214
(a) Small variations of random correlation coefficients ρ(Xi,Xj)
n   ρ̄   |ε_n|   |ε_sn|   (|ε_sn| − |ε_n|)/|ε_n|   T_n [ms]   T_sn [ms]
4
0.00 ≤ ρ̄ < 0.33 0.0459 0.0195 -57.4% 0.007 0.034
0.33 ≤ ρ̄ < 0.66 0.0265 0.0110 -58.3% 0.007 0.034
0.66 ≤ ρ̄ < 1.00 0.0168 0.0081 -51.8% 0.007 0.034
8
0.00 ≤ ρ̄ < 0.33 0.0605 0.0337 -44.3% 0.011 0.075
0.33 ≤ ρ̄ < 0.66 0.0339 0.0208 -38.8% 0.011 0.075
0.66 ≤ ρ̄ < 1.00 0.0208 0.0129 -37.8% 0.012 0.075
16
0.00 ≤ ρ̄ < 0.33 0.0644 0.0385 -40.2% 0.019 0.148
0.33 ≤ ρ̄ < 0.66 0.0330 0.0229 -30.6% 0.019 0.149
0.66 ≤ ρ̄ < 1.00 0.0216 0.0148 -31.5% 0.019 0.150
32
0.00 ≤ ρ̄ < 0.33 0.0580 0.0314 -45.9% 0.033 0.315
0.33 ≤ ρ̄ < 0.66 0.0285 0.0200 -29.7% 0.033 0.315
0.66 ≤ ρ̄ < 1.00 0.0207 0.0138 -33.1% 0.034 0.318
64
0.00 ≤ ρ̄ < 0.33 0.0487 0.0234 -51.9% 0.064 0.739
0.33 ≤ ρ̄ < 0.66 0.0227 0.0163 -28.1% 0.064 0.739
0.66 ≤ ρ̄ < 1.00 0.0193 0.0125 -35.5% 0.065 0.743
128
0.00 ≤ ρ̄ < 0.33 0.0399 0.0175 -56.2% 0.151 2.317
0.33 ≤ ρ̄ < 0.66 0.0173 0.0133 -23.3% 0.151 2.319
0.66 ≤ ρ̄ < 1.00 0.0183 0.0111 -39.6% 0.153 2.327
256
0.00 ≤ ρ̄ < 0.33 0.0316 0.0136 -57.1% 0.351 9.912
0.33 ≤ ρ̄ < 0.66 0.0133 0.0110 -16.9% 0.352 9.912
0.66 ≤ ρ̄ < 1.00 0.0178 0.0106 -40.2% 0.355 9.929
512
0.00 ≤ ρ̄ < 0.33 0.0302 0.0117 -61.4% 0.810 59.094
0.33 ≤ ρ̄ < 0.66 0.0134 0.0094 -29.9% 0.809 59.101
0.66 ≤ ρ̄ < 1.00 0.0175 0.0110 -37.0% 0.817 59.129
(b) Large variations of random correlation coefficients ρ(Xi,Xj)
Table B.6 — Runtime T and error ε for the approximation of the distribution of the maximum delay of n critical target paths, sensitized by 1, 5, 10 and 20 test vector-pairs.
circuit   n   |ε_n|   |ε_sn|   (|ε_sn| − |ε_n|)/|ε_n|   T_n [ms]   T_sn [ms]
p35k
4 - 15 0.0124 0.0065 -48.0% 0.013 0.086
16 - 31 0.0259 0.0152 -41.2% 0.025 0.221
32 - 63 0.0415 0.0256 -38.3% 0.049 0.525
64 - 127 0.0669 0.0427 -36.1% 0.100 1.358
128 - 255 0.0932 0.0681 -26.9% 0.182 3.770
256 - 511 0.1102 0.0883 -19.8% 0.390 12.007
p45k
4 - 15 0.0207 0.0126 -39.1% 0.011 0.073
16 - 31 0.0412 0.0283 -31.1% 0.025 0.216
32 - 63 0.0618 0.0465 -24.8% 0.048 0.508
64 - 127 0.0726 0.0551 -24.1% 0.091 1.183
128 - 255 0.0815 0.0609 -25.2% 0.197 4.632
256 - 511 0.0996 0.0805 -19.1% 0.491 23.597
512 - 1023 0.0727 0.0787 8.3% 1.339 120.152
p77k
4 - 15 0.0146 0.0100 -31.8% 0.011 0.068
16 - 31 0.0249 0.0171 -31.4% 0.025 0.211
32 - 63 0.0355 0.0244 -31.1% 0.050 0.527
64 - 127 0.0464 0.0335 -27.7% 0.101 1.373
128 - 255 0.0625 0.0457 -26.9% 0.200 4.710
256 - 511 0.0896 0.0723 -19.3% 0.432 21.162
512 - 1023 0.1202 0.1066 -11.3% 1.171 103.090
p78k
4 - 15 0.0127 0.0072 -43.4% 0.010 0.063
16 - 31 0.0361 0.0240 -33.6% 0.025 0.226
32 - 63 0.0563 0.0424 -24.6% 0.047 0.511
64 - 127 0.0758 0.0614 -19.1% 0.082 1.028
p81k
4 - 15 0.0159 0.0114 -28.4% 0.012 0.084
16 - 31 0.0352 0.0264 -25.0% 0.024 0.217
32 - 63 0.0661 0.0489 -26.1% 0.046 0.491
64 - 127 0.0624 0.0485 -22.3% 0.093 1.230
128 - 255 0.0639 0.0533 -16.6% 0.201 4.716
256 - 511 0.0954 0.0834 -12.6% 0.392 17.669
p100k
4 - 15 0.0130 0.0082 -37.0% 0.011 0.070
16 - 31 0.0290 0.0208 -28.3% 0.025 0.227
32 - 63 0.0404 0.0288 -28.6% 0.047 0.508
64 - 127 0.0588 0.0450 -23.5% 0.092 1.220
128 - 255 0.0789 0.0578 -26.8% 0.179 3.799
256 - 511 0.0800 0.0591 -26.1% 0.389 17.480
p267k
4 - 15 0.0180 0.0103 -42.6% 0.011 0.073
16 - 31 0.0380 0.0254 -33.1% 0.024 0.211
32 - 63 0.0545 0.0411 -24.6% 0.047 0.493
64 - 127 0.0719 0.0555 -22.7% 0.098 1.317
128 - 255 0.0935 0.0755 -19.2% 0.194 4.408
256 - 511 0.1036 0.0867 -16.3% 0.461 24.767
512 - 1023 0.1558 0.1307 -16.1% 1.572 157.739
1024 - 2048 0.2380 0.2237 -6.0% 5.496 1296.518
p330k
4 - 15 0.0136 0.0079 -42.3% 0.012 0.077
16 - 31 0.0319 0.0214 -32.7% 0.026 0.224
32 - 63 0.0518 0.0369 -28.8% 0.048 0.507
64 - 127 0.0774 0.0562 -27.4% 0.097 1.263
128 - 255 0.0838 0.0609 -27.3% 0.207 4.890
256 - 511 0.1157 0.0926 -20.0% 0.464 24.221
512 - 1023 0.0681 0.0605 -11.1% 1.600 156.022
1024 - 2048 0.0959 0.0929 -3.1% 6.517 1195.454
Table B.7 — Runtime T and error ε for the approximation of the distribution of the maximum delay of n critical target paths, sensitized by 1, 5, 10 and 20 test vector-pairs, using covariance matrix scaling proposed in section 7.5.3
circuit   n   |ε_n|   |ε_sn|   (|ε_sn| − |ε_n|)/|ε_n|   T_n [ms]   T_sn [ms]
p35k
4 - 15 0.0124 0.0057 -54.3% 0.012 0.084
16 - 31 0.0259 0.0117 -54.9% 0.024 0.214
32 - 63 0.0415 0.0164 -60.5% 0.048 0.513
64 - 127 0.0669 0.0232 -65.4% 0.098 1.346
128 - 255 0.0932 0.0337 -63.8% 0.179 3.811
256 - 511 0.1102 0.0415 -62.3% 0.383 12.300
p45k
4 - 15 0.0207 0.0112 -45.9% 0.011 0.072
16 - 31 0.0412 0.0202 -50.8% 0.024 0.211
32 - 63 0.0618 0.0280 -54.8% 0.047 0.500
64 - 127 0.0726 0.0311 -57.2% 0.090 1.174
128 - 255 0.0815 0.0288 -64.7% 0.196 4.701
256 - 511 0.0996 0.0368 -63.1% 0.488 23.904
512 - 1023 0.0727 0.0242 -66.7% 1.289 120.964
p77k
4 - 15 0.0146 0.0090 -38.3% 0.011 0.068
16 - 31 0.0249 0.0115 -54.1% 0.024 0.206
32 - 63 0.0355 0.0159 -55.3% 0.049 0.518
64 - 127 0.0464 0.0185 -60.1% 0.100 1.366
128 - 255 0.0625 0.0230 -63.1% 0.198 4.815
256 - 511 0.0896 0.0270 -69.9% 0.426 21.788
512 - 1023 0.1202 0.0577 -52.0% 1.136 103.990
p78k
4 - 15 0.0127 0.0061 -51.5% 0.010 0.063
16 - 31 0.0361 0.0155 -57.1% 0.024 0.220
32 - 63 0.0563 0.0227 -59.8% 0.047 0.503
64 - 127 0.0758 0.0296 -60.9% 0.081 1.022
p81k
4 - 15 0.0159 0.0099 -37.7% 0.012 0.084
16 - 31 0.0352 0.0184 -47.6% 0.024 0.213
32 - 63 0.0661 0.0287 -56.6% 0.046 0.483
64 - 127 0.0624 0.0263 -57.9% 0.092 1.222
128 - 255 0.0639 0.0302 -52.8% 0.200 4.737
256 - 511 0.0954 0.0484 -49.2% 0.390 17.841
p100k
4 - 15 0.0130 0.0073 -44.2% 0.011 0.071
16 - 31 0.0290 0.0139 -52.3% 0.025 0.223
32 - 63 0.0404 0.0157 -61.1% 0.047 0.502
64 - 127 0.0588 0.0212 -63.9% 0.092 1.213
128 - 255 0.0789 0.0279 -64.7% 0.178 3.868
256 - 511 0.0800 0.0460 -42.6% 0.410 17.934
p267k
4 - 15 0.0180 0.0091 -49.6% 0.011 0.072
16 - 31 0.0380 0.0180 -52.7% 0.024 0.205
32 - 63 0.0545 0.0259 -52.5% 0.046 0.483
64 - 127 0.0719 0.0321 -55.3% 0.096 1.304
128 - 255 0.0935 0.0436 -53.3% 0.191 4.461
256 - 511 0.1036 0.0439 -57.6% 0.455 25.141
512 - 1023 0.1558 0.0699 -55.1% 1.554 160.024
1024 - 2048 0.2380 0.1383 -41.9% 5.262 1287.566
p330k
4 - 15 0.0136 0.0070 -48.4% 0.011 0.073
16 - 31 0.0319 0.0146 -54.3% 0.025 0.216
32 - 63 0.0518 0.0201 -61.2% 0.047 0.496
64 - 127 0.0774 0.0273 -64.8% 0.094 1.252
128 - 255 0.0838 0.0312 -62.8% 0.203 4.919
256 - 511 0.1157 0.0513 -55.7% 0.449 24.558
512 - 1023 0.0681 0.0378 -44.5% 1.536 156.395
1024 - 2048 0.0959 0.0470 -50.9% 5.014 1148.219


Bibliography
[Aftab09] S.-A. Aftabjahani and L. Milor. Fast Variation-Aware Statistical Dynamic
Timing Analysis. In World Congress on Computer Science and Information
Engineering, pages 488–492. Los Angeles, CA, USA, mar 2009. [page 46]
[Agarw07] K. Agarwal and S. Nassif. Characterizing process variation in nanometer
CMOS. In Proc. Design Automation Conf. (DAC), pages 396–399. San Diego,
CA, USA, jun 2007. [pages 24 and 104]
[Ahmed06] N. Ahmed, M. Tehranipoor, and V. Jayaram. Timing-based delay test for
screening small delay defects. In Proc. Design Automation Conf. (DAC),
pages 320–325. San Francisco, CA, USA, jul 2006. [page 10]
[Arism13] J. Arismendi. Multivariate truncated moments. Journal of Multivariate
Analysis, 117:41–75, 2013. [page 134]
[Aseno08] A. Asenov, A. Cathignol, B. Cheng, K. P. McKenna, A. R. Brown, A. L.
Shluger, D. Chanemougame, K. Rochereau, and G. Ghibaudo. Origin of
the asymmetry in the magnitude of the statistical variability of n- and p-
channel poly-Si gate bulk MOSFETs. IEEE Electron Device Letters, 29(8):913–
915, 2008. [page 21]
[Ash00] R. B. Ash and C. Doleans-Dade. Probability and Measure Theory. Academic
Press, second edition, 2000. ISBN 978-0-12-065202-0. [page 36]
[Azzal99] A. Azzalini and A. Capitanio. Statistical applications of the multivariate
skew normal distribution. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 61(3):579–602, 1999. [page 133]
[Azzal13] A. Azzalini and A. Capitanio. The Skew-Normal and Related Families. Cam-
bridge University Press, 2013. ISBN 978-1-139-24889-1. [pages 49, 76, 77,
80, 81, and 82]
[Balas91] B. K. Balasubramanian, M. I. Beg, and R. B. Bapat. On Families of Distributions
Closed under Extrema. Sankhyā: The Indian Journal of Statistics, Series
A (1961-2002), 53(3):375–388, 1991. [page 15]
[Becke10] B. Becker, S. Hellebrand, I. Polian, B. Straube, W. Vermeiren, and H.-J.
Wunderlich. Massive statistical process variations: A grand challenge
for testing nanoelectronic circuits. In Int. Conf. on Dependable Systems and
Networks Workshops (DSN-W), pages 95–100. Chicago, IL, USA, jun 2010.
[pages 6 and 11]
[Bench09] C. Bencher, H. Dai, and Y. Chen. Gridded design rule scaling: taking the
CPU toward the 16nm node. In Proc. of SPIE Advanced Lithography. San
Jose, CA, USA, feb 2009. [page 2]
[Birnb51] Z. W. Birnbaum and P. L. Meyer. On the effect of truncation in some or
all coordinates of a multinormal population. Technical report, Washing-
ton University and Seattle Laboratory of Statistical Research, nov 1951.
[page 73]
[Blaau08] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer. Statistical Timing
Analysis: From Basic Principles to State of the Art. IEEE Trans. Computer-
Aided Design, 27(4):589–607, 2008. [pages vii, 20, 23, and 24]
[Bose07] S. Bose and V. D. Agrawal. Delay Test Quality Evaluation Using Bounded
Gate Delays. In Proc. VLSI Test Symp. (VTS), pages 23–28. Berkeley, CA,
USA, may 2007. [page 44]
[Bushn00] M. L. Bushnell and V. D. Agrawal. Essentials of electronic testing for digital,
memory and mixed-signal VLSI circuits. Kluwer Academic Publishers, 2000.
ISBN 978-0-306-47040-0. [page 4]
[Chakr00] S. Chakrabarti, S. Das, D. Das, and B. Bhattacharya. Synthesis of symmetric
functions for path-delay fault testability. Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD), 19(9):1076–1081, 2000.
[page 111]
[Chakr99] S. Chakraborty, D. Dill, and K. Yun. Min-max timing analysis and an
application to asynchronous circuits. Proceedings of the IEEE, 87(2):332–346,
1999. [page 44]
[Chakr12] S. Chakravarty, N. Devta-Prasanna, A. Gunda, J. Ma, F. Yang, H. Guo,
R. Lai, and D. Li. Silicon evaluation of faster than at-speed transition delay
tests. In Proc. VLSI Test Symp. (VTS), pages 80–85. Hyatt Maui, HI, USA,
apr 2012. [page 11]
[Chang96] J.-Y. Chang and E. McCluskey. Detecting delay flaws by very-low-voltage
testing. In Proc. Int. Test Conf. (ITC), pages 367–376. Washington, D.C., USA,
oct 1996. [page 11]
[Chang98] T.-Y. J. Chang. Voltage Screens Early-Life Failures In CMOS Integrated Circuits.
Ph.D. thesis, Stanford University, 1998. [page 11]
[Cheng93] K. Cheng and H. Chen. Delay testing for non-robust untestable circuits.
In Proc. Int. Test Conf. (ITC), pages 954–961. Baltimore, MD, USA, oct 1993.
[page 8]
[Cheng94] K.-T. Cheng and H.-C. Chen. Generation of high quality non-robust tests
for path delay faults. In Proc. Design Automation Conf. (DAC), pages 365–369.
San Diego, CA, USA, jun 1994. [page 9]
[Cheng08] L. Cheng. Non-Gaussian statistical timing analysis using second-order
polynomial fitting. In Proc. of Asia and South Pacific Design Automation Conf.,
pages 298–303. Seoul, South Korea, jan 2008. [page 50]
[Choi09] Y. Choi. New form of block matrix inversion. In IEEE/ASME International
Conference on Advanced Intelligent Mechatronics, pages 1952–1957. Singapore,
jul 2009. [page 144]
[Chopr06] K. Chopra, B. Zhai, D. Blaauw, and D. Sylvester. A New Statistical Max
Operation for Propagating Skewness in Statistical Timing Analysis. In Proc.
Int. Conf. Computer-Aided Design (ICCAD), pages 237–243. San Jose, CA,
USA, nov 2006. [page 49]
[Clark61] C. E. Clark. The greatest of a finite set of random variables. Operations
Research, 9(2):145–162, 1961. [pages 48, 72, 100, 129, and 130]
[Czado11] C. Czado and T. Schmidt. Mathematische Statistik. Springer Berlin Heidel-
berg, 2011. ISBN 978-3-642-17261-8. [page 37]
[Czutr12] A. Czutro, M. E. Imhof, J. Jiang, A. Mumtaz, M. Sauer, B. Becker, I. Polian,
and H.-J. Wunderlich. Variation-Aware Fault Grading. In Proc. IEEE Asian
Test Symp. (ATS), pages 344–349. Niigata, Japan, nov 2012. [pages 11 and 46]
[Daasc07] W. R. Daasch, M. Ward, and J. Van Slyke. Silicon evaluation of longest
path avoidance testing for small delay defects. In Proc. Int. Test Conf. (ITC).
Santa Clara, CA, USA, oct 2007. [page 10]
[DeGro12] M. H. DeGroot and M. J. Schervish. Probability and Statistics. Pearson
Education, fourth edition, 2012. ISBN 978-0-321-50046-5. [pages 26, 27, 28,
29, 32, 33, 34, and 36]
[Dervi91] B. I. Dervisoglu and G. E. Stong. Design for Testability Using Scanpath
Techniques for Path-Delay Test and Measurement. In Proc. Int. Test Conf.
(ITC), pages 365–374. Nashville, TN, USA, oct 1991. [page 3]
[Devad92] S. Devadas and K. Keutzer. Validatable nonrobust delay-fault testable
circuits via logic synthesis. IEEE Trans. Computer-Aided Design, 11(12):1559–
1573, 1992. [page 8]
[Devga03] A. Devgan and C. Kashyap. Block-based static timing analysis with uncer-
tainty. In Proc. Int. Conf. Computer-Aided Design (ICCAD), pages 607–614.
San Jose, CA, USA, nov 2003. [page 49]
[Dietr12] M. Dietrich and J. Haase. Process Variations and Probabilistic Integrated Circuit
Design. Springer New York, 2012. ISBN 978-1-4419-6620-9. [page 21]
[Egger11] S. Eggersglüß and R. Drechsler. As-Robust-As-Possible test generation in
the presence of small delay defects using pseudo-Boolean optimization. In
Proc. Design, Automation and Test in Europe (DATE). Grenoble, France, mar
2011. [page 9]
[Eiche77] E. B. Eichelberger and T. W. Williams. A logic design structure for LSI
testability. In Proc. Design Automation Conf. (DAC), pages 462–468. New
Orleans, LA, USA, jun 1977. [page 3]
[Erb14] D. Erb, K. Scheibler, M. Sauer, S. M. Reddy, and B. Becker. Circuit parameter
independent test pattern generation for interconnect open defects. In Proc.
IEEE Asian Test Symp. (ATS), pages 131–136. Hangzhou, China, nov 2014.
[page 10]
[Fonse10] R. A. Fonseca, L. Dilillo, A. Bosio, P. Girard, S. Pravossoudovitch, A. Virazel,
and N. Badereddine. Analysis of resistive-bridging defects in SRAM core-
cells: A comparative study from 90nm down to 40nm technology nodes.
In IEEE European Test Symp. (ETS), pages 132–137. Prague, Czech Republic,
may 2010. [page 11]
[Forza09] C. Forzan and D. Pandini. Statistical static timing analysis: A survey.
Integration, the VLSI Journal, 42(3):409–435, 2009. [pages 20 and 23]
[Franc10] C. Franceschini and N. M. R. Loperfido. A skewed GARCH-type model for
multivariate financial time series. In Mathematical and Statistical Methods for
Actuarial Sciences and Finance, pages 143–152. Springer-Verlag Italia, 2010.
ISBN 978-88-470-1481-7. [pages 35, 81, and 87]
[Frans04] S. Franssila. Introduction to microfabrication. John Wiley & Sons Ltd, 2004.
ISBN 978-0-470-85106-7. [page 1]
[Fuchs91] K. Fuchs, F. Fink, and M. Schulz. DYNAMITE: an efficient automatic test
pattern generation system for path delay faults. IEEE Trans. Computer-Aided
Design, 10(10):1323–1335, 1991. [pages 110 and 112]
[Fuchs94] K. Fuchs, M. Pabst, and T. Roessel. RESIST: a recursive test pattern
generation algorithm for path delay faults considering various test classes.
Transactions on Computer-Aided Design of Integrated Circuits and Systems
(TCAD), 13(12):1550–1562, 1994. [page 110]
[Gajsk83] D. D. Gajski and R. H. Kuhn. Guest Editors’ Introduction: New VLSI Tools.
Computer, 16(12):11–14, 1983. [page 38]
[Genz92] A. Genz. Numerical Computation of Multivariate Normal Probabilities.
Journal of Computational and Graphical Statistics, 1(2):141–149, 1992. [pages xi,
66, 71, 116, and 119]
[Genz04] A. Genz. Numerical computation of rectangular bivariate and trivariate
normal and t probabilities. Statistics and Computing, 14(3):251–260, 2004.
[pages xi, 66, 71, 74, 116, 119, and 135]
[Goel13] S. K. Goel and K. Chakrabarty. Testing for Small-Delay Defects in Nanoscale
CMOS Integrated Circuits. CRC Press, 2013. ISBN 978-1-4398-2942-4.
[pages vii, 2, and 10]
[Golub13] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins
University Press, fourth edition, 2013. ISBN 978-1-4214-0794-4. [pages 17,
19, 20, and 66]
[Green12] W. H. Greene. Econometric analysis. Pearson Education Limited, seventh
edition, 2012. ISBN 978-0-273-75356-8. [page 30]
[Gut09] A. Gut. An Intermediate Course in Probability. Springer New York, second
edition, 2009. ISBN 978-1-4419-0162-0. [page 38]
[Hanso05] S. Hanson, D. Blaauw, and D. Sylvester. Analysis and mitigation of variabil-
ity in subthreshold design. In Proc. Int. Symp. on Low Power Electronics and
Design (ISLPED), pages 20–25. San Diego, CA, USA, aug 2005. [page 47]
[Hao93] H. Hao and E. McCluskey. Very-low-voltage testing for weak CMOS logic
ICs. In Proc. Int. Test Conf. (ITC), pages 275–284. Baltimore, MD, USA, oct
1993. [page 11]
[Hapke11] F. Hapke, J. Schloeffel, W. Redemund, A. Glowatz, J. Rajski, M. Reese,
J. Rearick, and J. Rivers. Cell-aware analysis for small-delay effects and
production test results from different fault models. In Proc. Int. Test Conf.
(ITC). Anaheim, CA, USA, sep 2011. [page 131]
[Hopsc10] F. Hopsch, B. Becker, S. Hellebrand, I. Polian, B. Straube, W. Vermeiren,
and H.-J. Wunderlich. Variation-Aware Fault Modeling. In Proc. IEEE Asian
Test Symp. (ATS), pages 87–93. Shanghai, China, dec 2010. [pages 5 and 47]
[Ingel09] U. Ingelsson, B. Al-Hashimi, S. Khursheed, S. Reddy, and P. Harrod.
Process Variation-Aware Test for Resistive Bridges. IEEE Trans. Computer-
Aided Design, 28(8):1269–1274, 2009. [page 11]
[Ingel11] U. Ingelsson and B. M. Al-Hashimi. Investigation into voltage and process
variation-aware manufacturing test. In Proc. Int. Test Conf. (ITC). Anaheim,
CA, USA, sep 2011. [pages 5 and 45]
[Intel15] Intel Corporation. Intel Math Kernel Library 11.3 - Developer Reference,
2015. [page 103]
[Ishiu89] N. Ishiura, M. Takahashi, and S. Yajima. Time-symbolic simulation for
accurate timing verification of asynchronous behavior of logic circuits. In
Proc. Design Automation Conf. (DAC), pages 497–502. Las Vegas, NV, USA,
jun 1989. [page 43]
[Ishiu90] N. Ishiura, Y. Deguchi, and S. Yajima. Coded Time-Symbolic Simulation
Using Shared Binary Decision Diagrams. In Proc. Design Automation Conf.
(DAC), pages 130–135. Orlando, FL, USA, jun 1990. [page 44]
[ITRS12] ITRS. International Technology Roadmap for Semiconductors.
http://www.itrs.net/ITRS2012. [pages x, 1, and 2]
[Iyeng88a] V. Iyengar, B. Rosen, and I. Spillinger. Delay test generation. I. Concepts and
coverage metrics. In Proc. Int. Test Conf. (ITC), pages 857–866. Washington,
D.C., USA, sep 1988. [page 5]
[Iyeng88b] V. Iyengar, B. Rosen, and I. Spillinger. Delay test generation. II. Algebra
and algorithms. In Proc. Int. Test Conf. (ITC), pages 867–876. Washington,
D.C., USA, sep 1988. [page 5]
[Jayar13] D. Jayaraman and S. Tragoudas. A method to determine the sensitization
probability of a non-robustly testable path. In Proc. Int. Symp. on Quality
Electronic Design (ISQED), pages 676–681. Santa Clara, CA, USA, mar 2013.
[page 44]
[Jha92] N. Jha, I. Pomeranz, S. Reddy, and R. Miller. Synthesis of multi-level
combinational circuits for complete robust path delay fault testability. In
Int. Symp. on Fault-Tolerant Computing (FTCS), pages 280–287. Boston, MA,
USA, jul 1992. [page 111]
[Jung12] J. Jung and T. Kim. Variation-Aware False Path Analysis Based
on Statistical Dynamic Timing Analysis. Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD), 31(11):1684–1697, 2012.
[page 44]
[Kang13] C. Y. Kang, C. Sohn, R.-H. Baek, C. Hobbs, P. Kirsch, and R. Jammy. Effects
of Layout and Process Parameters on Device / Circuit Performance and
Variability for 10nm Node FinFET Technology. In IEEE Symposium on VLSI
Technology, pages T90–T91. Kyoto, Japan, jun 2013. [page 2]
[Kenne01] M. C. Kennedy and A. O’Hagan. Bayesian calibration of computer mod-
els. Journal of the Royal Statistical Society: Series B (Statistical Methodology),
63(3):425–464, 2001. [page 20]
[Kiure09] A. Der Kiureghian and O. Ditlevsen. Aleatory or epistemic? Does it matter?
Structural Safety, 31(2):105–112, 2009. [page 23]
[Kollo05] T. Kollo and D. von Rosen. Advanced Multivariate Statistics with Matrices.
Springer Netherlands, 2005. ISBN 978-1-4020-3418-3. [page 88]
[Konuk00] H. Konuk. On invalidation mechanisms for non-robust delay tests. In
Proc. Int. Test Conf. (ITC), pages 393–399. Atlantic City, NJ, USA, oct 2000.
[pages vii, 8, and 9]
[Kopp08] J. Kopp. Efficient numerical diagonalization of hermitian 3x3 matrices.
International Journal of Modern Physics C, 19(3), 2008. [page 18]
[Krsti95] A. Krstic and K.-T. Cheng. Generation of high quality tests for functional
sensitizable paths. In Proc. VLSI Test Symp. (VTS), pages 374–379. Princeton,
NJ, USA, may 1995. [pages 110 and 111]
[Kruse04] B. Kruseman, A. K. Majhi, G. Gronthoud, and S. Eichenberger. On hazard-
free patterns for fine-delay fault testing. In Proc. Int. Test Conf. (ITC), pages
213–222. Charlotte, NC, USA, oct 2004. [page 11]
[Kuhn08] K. Kuhn, C. Kenyon, A. Kornfeld, M. Liu, A. Maheshwari, W.-k. Shih,
S. Sivakumar, G. Taylor, P. VanDerVoorn, and K. Zawadzki. Managing
Process Variation in Intel’s 45nm CMOS Technology. Intel Technology
Journal, 12:93–110, 2008. [pages 21 and 104]
[Kuhn09] K. J. Kuhn. CMOS Scaling Beyond 32nm: Challenges and Opportunities.
In Proc. Design Automation Conf. (DAC), pages 310–313. San Francisco, CA,
USA, jul 2009. [page 21]
[Kuhn10] K. J. Kuhn. CMOS transistor scaling past 32nm and implications on
variation. In IEEE/SEMI Advanced Semiconductor Manufacturing Conference
(ASMC), pages 241–246. San Francisco, CA, USA, jul 2010. [page 2]
[Kuruv13] V. Kuruvilla, D. Sinha, J. Piaget, C. Visweswariah, and N. Chandrachoodan.
Speeding up computation of the max/min of a set of gaussians for statis-
tical timing analysis and optimization. In Proc. Design Automation Conf.
(DAC). Austin, TX, USA, jun 2013. [page 48]
[Lee05a] B. Lee, H. Li, L.-C. Wang, and M. S. Abadir. Hazard-aware statistical
timing simulation and its applications in screening frequency-dependent
defects. In Proc. Int. Test Conf. (ITC), pages 91–100. Austin, TX, USA, nov
2005. [page 11]
[Lee05b] B. Lee, L.-C. Wang, and M. S. Abadir. Reducing pattern delay variations
for screening frequency dependent defects. In Proc. VLSI Test Symp. (VTS),
pages 153–160. Palm Springs, CA, USA, may 2005. [pages 13 and 47]
[Leung12a] G. Leung and C. O. Chui. Variability impact of random dopant fluctuation
on nanoscale junctionless FinFETs. IEEE Electron Device Letters, 33(6):767–
769, 2012. [pages vii, x, 21, 22, and 23]
[Leung12b] G. Leung, L. Lai, P. Gupta, and C. O. Chui. Device- and circuit-level vari-
ability caused by line edge roughness for sub-32-nm FinFET technologies.
IEEE Transactions on Electron Devices, 59(8):2057–2063, 2012. [pages x, 2, 22,
and 23]
[Li10] Y. Li, C. H. Hwang, T. Y. Li, and M. H. Han. Process-variation effect,
metal-gate work-function fluctuation, and random-dopant fluctuation
in emerging CMOS technologies. IEEE Transactions on Electron Devices,
57(2):437–447, 2010. [pages 21, 24, and 104]
[Lin87] C. Lin and S. Reddy. On Delay Fault Testing in Logic Circuits. IEEE Trans.
Computer-Aided Design, 6(5):694–703, 1987. [page 110]
[Liou01] J.-J. Liou, K.-T. Cheng, S. Kundu, and A. Krstic. Fast statistical timing
analysis by probabilistic event propagation. In Proc. Design Automation
Conf. (DAC), pages 661–666. Las Vegas, NV, USA, jun 2001. [page 49]
[Loper13] N. M. R. Loperfido. Skewness and the linear discriminant function. Statis-
tics & Probability Letters, 83(1):93–99, 2013. [page 89]
[Louhi02] S. Louhichi. Rates of Convergence in the CLT for Some Weakly Dependent
Random Variables. Theory of Probability and its Applications, 46(2):297–315,
2002. [page 63]
[Luca15] G. De Luca and N. Loperfido. Modelling multivariate skewness in financial
returns: a SGARCH approach. The European Journal of Finance, 21(13-
14):1113–1131, 2015. [pages 35 and 87]
[Ma11] Y. Ma, J. Sweis, C. Bencher, Y. Deng, H. Dai, H. Yoshida, B. Gisuthan, J. Kye,
and H. J. Levinson. Double patterning compliant logic design. In Proc. of
SPIE: Design for Manufacturability through Design-Process Integration V. San
Jose, CA, USA, feb 2011. [page 2]
[Matti09] R. Mattiuzzo, D. Appello, and C. Allsup. Small delay defect testing. Test
and Measurement World, pages 37–41, jun 2009. [page 10]
[Mille12] S. L. Miller and D. G. Childers. Probability and Random Processes with
Application to Signal Processing and Communications. Elsevier Academic
Press, second edition, 2012. ISBN 978-0-12-386981-4. [page 35]
[Montg13] D. C. Montgomery and G. C. Runger. Applied Statistics and Probability for
Engineers. John Wiley & Sons Inc, sixth edition, 2013. ISBN 978-1-118-
53971-2. [pages 25 and 30]
[Mori94] T. F. Móri, V. K. Rohatgi, and G. J. Székely. On Multivariate Skewness
and Kurtosis. Theory of Probability & Its Applications, 38(3):547–551, 1994.
[page 87]
[Nadar08] S. Nadarajah and S. Kotz. Exact Distribution of the Max/Min of Two Gaus-
sian Random Variables. IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, 16(2):210–212, 2008. [pages 15 and 68]
[nan11] Nangate 45nm Open Cell Library, aug 2011. [page 103]
[Needh98] W. Needham, C. Prunty, and E. H. Yeoh. High volume microprocessor test
escapes, an analysis of defects our tests are missing. In Proc. Int. Test Conf.
(ITC), pages 25–34. Washington, D.C., USA, oct 1998. [page 11]
[Nigh00] P. Nigh and A. Gattiker. Test method evaluation experiments and data. In
Proc. Int. Test Conf. (ITC), pages 454–463. Atlantic City, NJ, USA, oct 2000.
[page 47]
[Peng10] K. Peng, M. Yilmaz, M. Tehranipoor, and K. Chakrabarty. High-quality
pattern selection for screening small-delay defects considering process
variations and crosstalk. In Proc. Design, Automation and Test in Europe
(DATE), pages 1426–1431. Dresden, Germany, mar 2010. [page 45]
[Peng13] K. Peng, M. Yilmaz, K. Chakrabarty, and M. Tehranipoor. Crosstalk- and
process variations-aware high-quality tests for small-delay defects. IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, 21(6):1129–1142,
2013. [page 45]
[Polia11] I. Polian, B. Becker, S. Hellebrand, H.-J. Wunderlich, and P. Maxwell.
Towards Variation-Aware Test Methods. In IEEE European Test Symp. (ETS),
pages 219–225. Trondheim, Norway, may 2011. [page 11]
[Qian10] X. Qian and A. D. Singh. Distinguishing Resistive Small Delay Defects
from Random Parameter Variations. In Proc. IEEE Asian Test Symp. (ATS),
pages 325–330. Shanghai, China, dec 2010. [page 13]
[Rabae03] J. M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital integrated circuits:
a design perspective. Prentice-Hall International Inc, second edition, 2003.
ISBN 978-0-13-090996-1. [pages vii and 40]
[Radha11] G. S. Radhakrishnan and S. Ozev. Adaptive Modeling of Analog/RF
Circuits for Efficient Fault Response Evaluation. Journal of Electronic Testing
(JETTA), 27(4):465–476, 2011. [page 13]
[Resni14] S. I. Resnick. A Probability Path. Birkhäuser Boston, 2014. ISBN 978-0-8176-
8408-2. [page 30]
[Ryan14] P. G. Ryan, I. Aziz, W. B. Howell, T. K. Janczak, and D. J. Lu. Process Defect
Trends and Strategic Test Gaps. In Proc. Int. Test Conf. (ITC). Seattle, WA,
USA, oct 2014. [page 10]
[Sauer12] M. Sauer, A. Czutro, I. Polian, and B. Becker. Small-Delay-Fault ATPG with
Waveform Accuracy. In Proc. Int. Conf. Computer-Aided Design (ICCAD),
pages 30–36. San Jose, CA, USA, nov 2012. [page 110]
[Sauer14] M. Sauer, I. Polian, M. E. Imhof, A. Mumtaz, E. Schneider, A. Czutro,
H.-J. Wunderlich, and B. Becker. Variation-aware deterministic ATPG. In
IEEE European Test Symp. (ETS). Paderborn, Germany, may 2014. [pages 11
and 46]
[Savir93] J. Savir and S. Patil. Scan-based transition test. Transactions on Computer-
Aided Design of Integrated Circuits and Systems (TCAD), 12(8):1232–1241,
1993. [page 3]
[Savir94] J. Savir and S. Patil. On broad-side delay test. IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, 2(3):368–372, 1994.
[page 3]
[Schne15] E. Schneider, S. Holst, M. A. Kochte, X. Wen, and H.-J. Wunderlich. GPU-
Accelerated Small Delay Fault Simulation. In Proc. Design, Automation
and Test in Europe (DATE), pages 1174–1179. Grenoble, France, mar 2015.
[page 45]
[Schue13] K. Schuegraf, M. C. Abraham, A. Brand, M. Naik, and R. Thakur. Semicon-
ductor Logic Technology Innovation to Achieve Sub-10 nm Manufacturing.
IEEE Journal of the Electron Devices Society, 1(3):66–75, 2013. [page 2]
[Segur02] J. Segura, A. Keshavarzi, J. Soden, and C. Hawkins. Parametric failures
in CMOS ICs-a defect-based analysis. In Proc. Int. Test Conf. (ITC), pages
90–99. Baltimore, MD, USA, oct 2002. [page 10]
[Smith85] G. L. Smith. Model for delay faults based upon paths. In Proc. Int. Test
Conf. (ITC), pages 342–349. Philadelphia, PA, USA, oct 1985. [page 5]
[Sriva05] A. Srivastava, D. Sylvester, and D. Blaauw. Statistical Analysis and Optimiza-
tion for VLSI : Timing and Power. Springer, 2005. ISBN 978-0-387-25738-9.
[pages 20, 21, 23, 24, 27, and 31]
[Sumik12] N. Sumikawa, L.-C. Wang, and M. S. Abadir. An experiment of burn-in
time reduction based on parametric test analysis. In Proc. Int. Test Conf.
(ITC). Anaheim, CA, USA, nov 2012. [page 11]
[Talli61] G. Tallis. The moment generating function of the truncated multi-normal
distribution. Journal of the Royal Statistical Society. Series B (Methodological),
23(1):223–229, 1961. [pages 73 and 134]
[Tang14a] Q. Tang, J. Rodriguez, A. Zjajo, M. Berkelaar, and N. Van Der Meijs.
Statistical transistor-level timing analysis using a direct random differential
equation solver. Transactions on Computer-Aided Design of Integrated Circuits
and Systems (TCAD), 33(2):210–223, 2014. [page 45]
[Tang14b] Q. Tang, A. Zjajo, M. Berkelaar, and N. van der Meijs. Considering
Crosstalk Effects in Statistical Timing Analysis. Transactions on Computer-
Aided Design of Integrated Circuits and Systems (TCAD), 33(2):318–322, 2014.
[page 130]
[Tang14c] X. Tang, A. Xu, W. Li, and Z. Yang. Fault Models of CMOS Gates: An
Empirical Study Based on Mutation Analysis. In International Conference on
Dependable, Autonomic and Secure Computing, pages 115–120. Dalian, China,
aug 2014. [page 131]
[Tseng00] C.-W. Tseng, E. J. McCluskey, X. Shao, J. Wu, and D. M. Wu. Cold delay
defect screening. In Proc. VLSI Test Symp. (VTS), pages 183–188. Montreal,
QC, Canada, apr 2000. [page 11]
[Tsuki11] S. Tsukiyama and M. Fukui. A new statistical maximum operation for
Gaussian mixture models and its evaluations. In European Conference on
Circuit Theory and Design (ECCTD), pages 45–48. Linköping, Sweden, aug
2011. [page 50]
[Ul Ha11] F. Ul-Hassan, W. Vanderbauwhede, and F. Rodriguez-Salazar. Timing yield
analysis of pipelined circuits under device variability. In Proc. of Interna-
tional Symposium on Signals, Circuits and Systems (ISSCS). Iasi, Romania, jun
2011. [page 47]
[Vijay14] M. Vijaykumar and V. Vasudevan. Statistical static timing analysis using a
skew-normal canonical delay model. In Proc. Design, Automation and Test
in Europe (DATE). Dresden, Germany, mar 2014. [page 50]
[Viswe04] C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, and S. Narayan.
First-order incremental block-based statistical timing analysis. In Proc.
Design Automation Conf. (DAC), page 331. San Diego, CA, USA, jun 2004.
[page 31]
[Wagne13] M. Wagner and H.-J. Wunderlich. Efficient Variation-Aware Statistical
Dynamic Timing Analysis for Delay Test Applications. In Proc. Design,
Automation and Test in Europe (DATE). Grenoble, France, mar 2013.
[Wagne14] M. Wagner and H.-J. Wunderlich. Incremental Computation of Delay
Fault Detection Probability for Variation-Aware Test Generation. In IEEE
European Test Symp. (ETS). Paderborn, Germany, may 2014.
[Wang06] L.-T. Wang, C.-W. Wu, and X. Wen. VLSI Test Principles and Architectures:
Design for Testability. Morgan Kaufmann Publishers Inc., 2006. ISBN
978-0-12-370597-6. [pages 3 and 5]
[Wang09] Z. Wang and D. M. Walker. Compact Delay Test Generation with a Realistic
Low Cost Fault Coverage Metric. In Proc. VLSI Test Symp. (VTS), pages
59–64. Santa Cruz, CA, USA, may 2009. [page 45]
[Wang13] S. Wang, G. Leung, A. Pan, C. O. Chui, and P. Gupta. Evaluation of
digital circuit-level variability in inversion-mode and junctionless FinFET
technologies. IEEE Transactions on Electron Devices, 60(7):2186–2193, 2013.
[page 2]
[Wothk93] W. Wothke. Nonpositive definite matrices in structural modeling. In Testing
Structural Equation Models, pages 256–293. Sage Publications, 1993. ISBN
978-0-8039-4507-4. [page 66]
[Xie09] L. Xie, A. Davoodi, K. K. Saluja, and A. Sinkar. False path aware timing
yield estimation under variability. In Proc. VLSI Test Symp. (VTS), pages
161–166. Santa Cruz, CA, USA, may 2009. [page 44]
[Yalci97] H. Yalcin and J. P. Hayes. Event propagation conditions in circuit delay
computation. ACM Transactions on Design Automation of Electronic Systems,
2(3):249–280, 1997. [page 39]
[Yamag13] T. J. Yamaguchi, J. S. Tandon, S. Komatsu, and K. Asada. A novel test
structure for measuring the threshold voltage variance in MOSFETs. In
Proc. Int. Test Conf. (ITC). Anaheim, CA, USA, sep 2013. [page 21]
[Ye10] Y. Ye, S. Gummalla, C.-C. Wang, C. Chakrabarti, and Y. Cao. Random
variability modeling and its impact on scaled CMOS circuits. Journal of
Computational Electronics, 9(3):108–113, 2010. [page 104]
[Yilma08] M. Yilmaz, K. Chakrabarty, and M. Tehranipoor. Test-Pattern Grading and
Pattern Selection for Small-Delay Defects. In Proc. VLSI Test Symp. (VTS),
pages 233–239. San Diego, CA, USA, apr 2008. [page 45]
[Zhang05] L. Zhang, W. Chen, Y. Hu, J. A. Gubner, and C. C.-P. Chen. Correlation-
preserved non-Gaussian statistical timing analysis with quadratic timing
model. In Proc. Design Automation Conf. (DAC), pages 83–88. Anaheim, CA,
USA, jun 2005. [page 131]
[Zhao06] W. Zhao and Y. Cao. New generation of predictive technology model for
sub-45 nm early design exploration. IEEE Transactions on Electron Devices,
53(11):2816–2823, 2006. [page 2]
[Zhong11] S. Zhong, S. Khursheed, B. M. Al-Hashimi, S. M. Reddy, and
K. Chakrabarty. Analysis of Resistive Bridge Defect Delay Behavior in the
Presence of Process Variation. In Proc. IEEE Asian Test Symp. (ATS), pages
389–394. New Delhi, India, nov 2011. [page 31]
Curriculum Vitae of the Author
Marcus Wagner studied computer science at the Friedrich
Schiller University (FSU) in Jena and received his diploma
degree in Computer Science (Dipl.-Inf.) in October 2007.
He gained industrial work experience at the company
MAZeT GmbH in 2008.
In April 2009 he joined the Institute of Computer Architecture
and Computer Engineering (Institut für Technische
Informatik, ITI) at the University of Stuttgart, under the
direction of Prof. Dr. rer. nat. habil. Hans-Joachim Wunderlich.
During this time, he contributed to the RealTest
project and worked as a teaching assistant.

Publications of the Author
• M. Wagner and H.-J. Wunderlich. Incremental Computation of Delay Fault
Detection Probability for Variation-Aware Test Generation. In IEEE European Test
Symp. (ETS). Paderborn, Germany, may 2014.
• M. Wagner and H.-J. Wunderlich. Efficient Variation-Aware Statistical Dynamic
Timing Analysis for Delay Test Applications. In Proc. Design, Automation and Test
in Europe (DATE). Grenoble, France, mar 2013.

Declaration
All the work contained within this thesis, except where otherwise
acknowledged, was solely the effort of the author. At no stage
was any collaboration entered into with any other party.
—————————————
Marcus Wagner
