




The Dissertation Committee for Jae Yong Chung
certifies that this is the approved version of the following dissertation:
Refactoring-based Statistical Timing Analysis and Its
Applications to Robust Design and Test Synthesis
Committee:





Refactoring-based Statistical Timing Analysis and Its
Applications to Robust Design and Test Synthesis
by
Jae Yong Chung, B.S.;M.S.E.
DISSERTATION
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY
THE UNIVERSITY OF TEXAS AT AUSTIN
May 2011
Dedicated to my father, Sukku Chung,
and my mother, Myungsuk Kim.
Acknowledgments
I would like to express my deepest gratitude to my advisor, Dr. Jacob
A. Abraham, for his excellent guidance, continuous support, patience, and
providing me with an excellent atmosphere for doing research which allowed
me to work on fundamental research problems. Without his guidance this
dissertation would not have been completed. I would like to thank Dr. Nur
Touba, Dr. Adnan Aziz, Dr. Michael Orshansky and Dr. Yaping Zhan for
their time and insightful comments while being members of my dissertation
committee.
My gratitude goes to my friends in the Computer Engineering Research
Center, Joonsung Park, Jae-Wook Lee, Jihwan Chun, Kihyuk Han, Eun Jung
Jang, Hyun Jin Kim, Junyoung Park, Joonsoo Kim, Ashutosh Chakaraborty,
Minsik Cho, Hongjung Shin, Jinkyu Lee, Wooyoung Jang, Yongchan Ban,
Kun Yuan, Rajeshwary Tayade, Sriram Sambamurthy, Sankar Gurumurthy,
and Mahesh Prabhu. I would also like to thank my ECE friends, Donghyuk Shin,
Ick-Jae Yoon, Jungho Jo, Joon-Sung Yang, Yonghyun Kim, Bong Wan Jun,
Wonsoo Kim, Taesoo Jun, Jae Hong Min, Minsoo Rhu, and Jinsuk Chung. Special
thanks go to my roommate, Seyoung Kim, who has been with me throughout the years
of my study, both in times of deep despair and in times of joy. Also, I would like
to acknowledge Melissa Campos and Debi Prather for administrative support.
I am very grateful to my colleagues at IBM T.J. Watson Research
Center, Jinjun Xiong and Vladimir Zolotov. They are world-class experts in
the VLSI CAD area, and I was fortunate to work with them on state-of-the-art
timing analysis technology.
My two internships, at the Strategic CAD Lab (SCL) at Intel and at the IBM T.J.
Watson Research Center, provided me with excellent opportunities to learn
cutting-edge technologies and industrial problems. I would like to thank Suriyaprakash
Natarajan and Eli Chiprout from Intel, and Tom Fox, Jun Sawada, Daniel A.
Prener and Bill Reohr from IBM. Also, I would like to express my gratitude
to Samsung Electronics and Intel for their financial support.
Finally, and most importantly, I am indebted to the love and support
that I have received from my parents. In particular, I am thankful to Hyunyoung
Choi for her love, encouragement, sacrifice, and support.
Refactoring-based Statistical Timing Analysis and Its
Applications to Robust Design and Test Synthesis
Publication No.
Jae Yong Chung, Ph.D.
The University of Texas at Austin, 2011
Supervisor: Jacob A. Abraham
Technology scaling in the nanometer era comes with a significant amount
of process variation, leading to lower yield and new types of defective parts.
These challenges necessitate robust design to ensure adequate yield, and smarter
testing to screen out bad chips. Statistical static timing analysis (SSTA) en-
ables this but suffers from crude approximation algorithms.
This dissertation first studies the underlying theories of timing graphs
and proposes two fundamental techniques enhancing the core statistical timing
algorithms. We first propose the refactoring technique to capture topological
correlation. Static timing analysis is based on levelized breadth-first traversal,
which is a fundamental graph traversal technique and has been used for static
timing analysis over the past decades. We show that there are numerous
alternatives to the traversal because of an algebraic property, the distributivity
of addition over maximum. This new interpretation extends the degrees of
freedom of static timing analysis, which is exploited to improve the accuracy
of SSTA. We also propose a novel operator for computing joint probabilities
in SSTA. Such computations are very common in SSTA applications but are
typically done using the max operator, which incurs considerable error due to
its linear approximation. The new operator provides significantly higher
accuracy at a small cost in run time.
Second, based on the two fundamental studies, this dissertation devel-
ops three applications. We propose a criticality computation method that is
essential to robust design and test synthesis. The proposed method, combined
with the two fundamental techniques, achieves drastic accuracy improvement
over the state-of-the-art method, demonstrating the benefits in practical ap-
plications. We formulate the statistical path selection problem for at-speed
test as a gambling problem and present an elegant solution based on the Kelly
criterion. To circumvent the coverage loss issue in statistical path selection,
we propose a testability driven approach, making it a practical solution for





List of Tables
List of Figures
Chapter 1. Introduction
1.1 Robust Design Applications
1.2 VLSI Testing Applications
1.3 Parameterized Statistical Timing
1.4 Organization of the Dissertation
Chapter 2. A Hierarchy of Subgraphs Underlying a Timing Graph
and Its Use in Capturing Topological Correlation
2.1 Introduction
2.2 Preliminaries
2.3 Problem Formulation
2.4 Refactoring Algorithms
2.4.1 Division
2.4.2 Ellipse Graphs
2.4.3 Topological-order First Search
2.4.4 Static Refactoring
2.4.5 Dynamic Refactoring
2.5 The Hierarchical Timing Graph
2.6 Notes on Implementation
2.7 Exploring More Candidate Solutions
2.8 Experimental Results
2.9 Conclusions
Chapter 3. Path Criticality Computation in Parameterized
Statistical Timing Analysis
3.1 Introduction
3.2 Path Criticality
3.3 Proposed Algorithm
3.3.1 A Simple Timing Graph
3.3.2 A General Timing Graph
3.3.3 Conditional Probability Computation
3.3.4 Summary
3.4 Approximation Error Analysis
3.5 Experimental Results
3.5.1 Conditioning Operation
3.5.2 Criticality Computation
3.5.3 Breakdown of the Accuracy Improvement
3.6 Conclusions
Chapter 4. A Concurrent Path Selection Algorithm in
Statistical Timing Analysis
4.1 Introduction
4.2 Background
4.3 Problem Formulation
4.3.1 Test Quality Metric
4.4 Proposed Method
4.4.1 Partitioning Path Sets
4.4.2 Path Selection by Betting
4.4.3 Betting Strategies
4.4.4 Summary
4.5 Guaranteeing k Paths and Handling Untestable Paths
4.6 Experimental Results
4.7 Conclusions
Chapter 5. Testability Driven Statistical Path Selection
5.1 Introduction
5.2 Background
5.2.1 Deterministic vs. Statistical Path Selection
5.3 Problem Formulation
5.3.1 Testable Path Coverage Metric
5.4 Criticality Based Testable Path Selection Algorithm
5.4.1 Properties of Criticality
5.4.2 Proposed Algorithm
5.4.3 Pruning Methods
5.5 Selection by a Threshold
5.5.1 Modification of the Pruning Method
5.5.2 Incrementality
5.6 Integration of a SAT Solver
5.7 Experimental Results
5.8 Conclusions
Chapter 6. Conclusions
Appendices





List of Tables
2.1 Numbers of literals (γ = 8%)
2.2 Comparison with existing methods for ISCAS85 benchmarks (γ = 20%)
2.3 Comparison with existing methods for ISCAS85 benchmarks (γ = 8%)
2.4 Improvement by Algorithm 3
3.1 Accuracy comparison
3.2 Edge criticality computation results
4.1 Pre-ATPG PCM and runtimes for ISCAS85 circuits
4.2 Post-ATPG PCM and runtimes for ISCAS85 circuits
4.3 Post-ATPG PCM and runtimes for ISCAS85 circuits using test margin
5.1 Our experiment for SSTA+ATPG is performed in a single tool,
and the runtime results are much better than those in the typical
industrial setting where the two tools are separated. Nevertheless,
the proposed method shows a significant speed-up over
SSTA+ATPG. Note that the proposed method also generates
test patterns.
5.2 If 100% TCM is desired, we can select paths by a threshold value.
Algorithm 9 with a low threshold value as input found all paths
that are testable and can potentially have the least slack among
all testable paths.
5.3 Our novel SAT integration method can enhance the performance
substantially.
List of Figures
2.1 In static timing analysis, a DAG is topologically sorted, and the
arrival time of each vertex is calculated in the topological order.
2.2 Ellipse graphs and the Hasse diagram representing their partial
order
2.3 The implication of division in a timing graph and recursive
refactoring example
2.4 Refactoring takes a flat timing graph as input and generates a
hierarchy of ellipse graphs.
2.5 Divisors may not be re-factored if many antipodes are the
supersink.
2.6 An example of divisor refactoring
2.7 The c.d.f. of c499 for γ = 20% shows the relative performance
of various criteria of refactoring.
2.8 The PDFs of the latest arrival time of c6288
3.1 A simple timing graph with edge delays of canonical forms. The
mean values are assumed zero for illustration purposes.
3.2 Depending on the mean values of d(p1) and d(p2) (not shown in
the equations) and the values b1, d1, c1, we consider two cases.
As in case 1, if one path delay dominates the other path
delay, the approximation is accurate. However, it incurs some
error in case 2.
3.3 A simple timing graph
3.4 Given a path p, we can partition Ω as above. The top figure
(group 0) shows the given path p.
3.5 In both the maximum operation and our proposed conditioning
operation, the normal approximation is inevitable for efficiency.
However, the approximation error is much less in the conditioning
operation.
3.6 Ξ(W )(WG) as a function of ρ and α when σY = 0.4
3.7 Ξ(W )(WG) as a function of σY and ρ when α = 1
3.8 Ξ(W )(WG) as a function of σY and α when ρ = 0.8
3.9 Each method computes a joint probability in a weakly correlated
environment.
3.10 Each method computes a joint probability in a strongly correlated
environment.
3.11 The proposed method not only shows significantly better accuracy
compared to the existing method but also is comparable to
Monte Carlo simulation.
3.12 Both refactoring and the conditioning operation are crucial to
achieve accuracy close to that of Monte Carlo simulations.
4.1 The sample space of bad chips with stuck-at faults: We suppose
that a produced bad chip corresponds to one element in the
sample space so each element is equally likely. The 4 events
are not disjoint in practice, but since the intersections are small
enough, this figure is acceptable.
4.2 Sample space of bad chips with path delay faults
4.3 A simple circuit and the depth-first search tree
4.4 The decision tree with an example of bets
4.5 The payoffs for two produced chips (ensembles)
4.6 The local view of the gambler
4.7 Merge process
4.8 Pre-ATPG PCM
4.9 Post-ATPG PCM
4.10 Post-ATPG PCM when both methods use 0.7% test margin
4.11 Post-ATPG PCM when proposed methods and BnB-SPM use
0.7% and 1% test margin, respectively
5.1 Slack distributions of 4 paths
5.2 The regions of the process parameter space in which each path
is critical are shown. The area of each region represents the
criticality. For example, λU(p1) = λV (p1) = 0.25.
5.3 A recursive depth-first traversal
5.4 A binary partition tree to compute λΨ
5.5 An incremental solver has been used to identify untestable branches.
If the conflict analysis is used, we can locate them more
efficiently even without using incremental SAT.
5.6 The optimality in pre-ATPG path selection achieved by SSTA
is destroyed during the ATPG process. Our testability driven
approach considers the testability in the first place and achieves
superior quality of results even at an extremely low testable ratio.
5.7 The pruning technique allows us to select paths efficiently, and
the runtime grows linearly with respect to the number of
selected paths when m = k.
5.8 If the results of previous runs are available, we can further




The shrinking feature sizes of integrated circuits cause chips to operate
faster and consume less power, which adds value to the chips. The added value
makes us willing to pay the price of the next technology scaling, and this
positive cycle has fueled the silicon industry over the past decades. However,
we are facing the breakdown of the cycle. One reason is that it is becoming
difficult to control device parameters precisely. In other words, the variability
in process parameters is increasing. In this situation, the traditional margining
used to deal with the variability devalues the chips, stopping the positive cycle.
Significant efforts have been devoted to coping with the variability issues in
nanometer IC technologies, which has resulted in statistical static timing analysis
(SSTA). The initial effort to deploy the statistical timing methodology, such
as modeling, measurement, and building timing libraries, may be large, but the
benefits of using SSTA come at almost every stage of IC development. It
provides much room for improvement in various CAD algorithms at many
different levels, and the efforts to harvest the fruits of statistical timing are
ongoing in industry and academia.
Current block-based SSTA is efficient and accurate enough to be used
for timing sign-off and can produce the probability distribution function (PDF)
of the circuit slack quite accurately. However, developing applications on top
of SSTA reveals that large inaccuracy is hidden behind the PDF. In particular,
there are two well-known sources of inaccuracy. First, if two arrival times come
through a common path, they are correlated due to the topology. This is called
topological correlation, and it is ignored in most SSTA tools. Second, the maximum
of two delays is a non-linear function of the process parameters, but SSTA
approximates it by a linear function, which causes some error. This error is called
the linear approximation error of the max operator. These two sources of
inaccuracy hinder the development of SSTA applications in design automation
and VLSI testing, while the significance of such applications is growing as
variability increases.
1.1 Robust Design Applications
Using SSTA, designers can validate before tape-out that chips will be
produced at a high yield even under various sources of variation. If a design is
too sensitive to some sources of variation, designers will fix the design in order
to reduce the sensitivities, making it robust. Thus, SSTA should be able not
only to predict the yield accurately, but also to assist designers in enhancing
the design, improving their productivity. Also, since a large part of today’s
designs comes from CAD tools, the sensitivities should be translated into
a form that can be used easily in high-level CAD algorithms. The sensitivity
values themselves are difficult for designers or algorithms to work with directly, so
we usually compute certain probability values from the sensitivities. The tim-
ing criticality of a path or an edge is a probability value widely used for this
purpose, and it allows us to locate paths or edges that should be optimized
to enhance the design. Designers can identify critical paths or edges to be
targeted, and we can develop statistical design automation algorithms using
the criticality. For example, a statistical discrete gate sizing algorithm sizes
up the gates with high edge criticality and sizes down those with low edge
criticality iteratively. However, current efficient computational methods add
a significant amount of error in the process of the translation from the sen-
sitivities to the probability values, and the error leads to sub-optimal robust
designs. Thus, we require computational methods with higher accuracy for
computing the probabilities.
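As a rough illustration of how criticality can drive such an algorithm, the following sketch performs a single sizing pass. The thresholds, the size range, and the gate-to-criticality map are hypothetical and are not taken from this dissertation; in a real flow the criticalities would be recomputed by SSTA after every pass.

```python
def sizing_pass(sizes, criticality, high=0.5, low=0.05, max_size=4):
    """Return new discrete gate sizes after one criticality-driven pass.

    sizes:       gate name -> current size index (0 .. max_size)
    criticality: gate name -> edge criticality in [0, 1] (assumed given)
    high, low:   hypothetical thresholds chosen only for illustration
    """
    resized = dict(sizes)
    for gate, crit in criticality.items():
        if crit >= high:
            # Highly critical gate: size up, bounded by the largest size.
            resized[gate] = min(max_size, sizes[gate] + 1)
        elif crit <= low:
            # Rarely critical gate: size down to save area and power.
            resized[gate] = max(0, sizes[gate] - 1)
    return resized


sizes = {"g1": 1, "g2": 2, "g3": 0}
criticality = {"g1": 0.80, "g2": 0.01, "g3": 0.20}
print(sizing_pass(sizes, criticality))
```

Any error in the criticality values directly changes which gates are resized, which is why the accuracy of the probability computation matters here.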
1.2 VLSI Testing Applications
Due to the ever increasing variation, yield loss caused by parametric
faults is greater than that by catastrophic faults. Catastrophic faults have
been of concern for a very long time, because they were the main yield loss mechanism
in previous technologies. To capture their behaviors in the functional
domain, various fault models such as the stuck-at fault model and the transi-
tion fault model have been proposed, and many effective testing methodologies
have been developed using these models. Therefore, defective parts have been
screened without significant increase of the test cost. However, effective testing
methodologies to deal with parametric faults do not yet exist.
In the models for catastrophic faults, fault coverage has been widely
used as a test quality metric, by which tests are evaluated. If test patterns
are evaluated as insufficient, we can add more test vectors. Test vectors come
from various sources. Sometimes they are written manually. They are of-
ten generated randomly or by an ATPG (automatic test pattern generation)
tool. Also, they can be part of existing applications. From these sources, we
select more test vectors and add them to the test flow. However, since this
will increase the test time and test cost, we are encouraged to improve test
techniques and develop better test methodologies. ATPG algorithms can be
elaborated, or better test compaction techniques can be introduced. Random
test vector generators, no matter where they exist (e.g., in ATPG tools or on
chip for built-in self test) can be biased to increase fault coverage. If all these
efforts are not effective, we may try design-for-test (DFT). For example, by
inserting scan chains, the complex sequential ATPG problem can be changed
into a simple combinational ATPG problem which allows us to achieve high
test coverage at small cost. Through these efforts, test patterns become suf-
ficient to meet a desired test quality goal. If test patterns are evaluated as
exceeding the goal, we can remove some test vectors from the test flow, which
will decrease the test time and test cost. We may be able to remove some DFT
circuits to save area and power. This testing eco-system is enabled by a simple
and effective test metric. For catastrophic faults, fault coverage serves well
as the test metric, since it is easy to calculate and well correlated with
DPPM (defective parts per million), the ultimate test quality measure used in
testing.
However, fault coverage for parametric faults is not an effective test
metric. Conventionally, the path delay fault model has dealt with parametric
delay faults. The fault coverage of the path delay fault model is the number of
tested paths as a percentage of the total number of paths. The fault coverage in
the path delay fault model is not well correlated with DPPM, because the
fault probabilities of paths are very different and the variations of path delays
have complex statistical correlation. Due to the absence of an effective test
metric, the testing eco-system does not work properly. Not only have tests not
been evaluated correctly, but the need for better test techniques and
methodologies has also been hidden.
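The mismatch between path delay fault coverage and test quality can be illustrated with a small Monte Carlo sketch. The four nominal path delays, the clock period, and the variation below are made-up numbers, and path delays are assumed independent for simplicity, although in practice they are correlated. Two test sets with identical fault coverage catch very different fractions of the failing chips.

```python
import random

NOMINAL = [0.95, 0.93, 0.70, 0.65]  # hypothetical nominal path delays (ns)
SIGMA = 0.05                        # hypothetical delay standard deviation (ns)
CLOCK = 1.0                         # hypothetical clock period (ns)


def fault_coverage(tested):
    """Path delay fault coverage: tested paths over all paths."""
    return len(tested) / len(NOMINAL)


def catch_rate(tested, n_chips=20000, seed=1):
    """Fraction of failing chips that fail on at least one tested path."""
    rng = random.Random(seed)
    bad = caught = 0
    for _ in range(n_chips):
        delays = [rng.gauss(m, SIGMA) for m in NOMINAL]
        failing = [i for i, d in enumerate(delays) if d > CLOCK]
        if failing:
            bad += 1
            if any(i in tested for i in failing):
                caught += 1
    return caught / bad


set_a, set_b = {0, 1}, {2, 3}
print(fault_coverage(set_a), catch_rate(set_a))
print(fault_coverage(set_b), catch_rate(set_b))
```

Both sets report 50% coverage, yet only the set containing the slow paths screens out the bad chips, which is the sense in which coverage is poorly correlated with DPPM.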
We believe SSTA can play a key role in the testing eco-system for parametric
faults for several reasons. First, it is efficient: its run time is linear
in the circuit size. Second, it can calculate DPPM directly, so
the correlation between the test metric and DPPM is not a concern.
Third, a lot of effort has already been made to develop SSTA methodology,
and the algorithms and modeling techniques for SSTA are already mature.
Therefore, we can develop testing methodologies on top of them.
1.3 Parameterized Statistical Timing
Most recent SSTA algorithms [60, 10, 71] employ a parameterized de-
lay model to efficiently capture correlation between delays. Under the linear
parameterized delay model, a (gate or wire) delay d is represented as




\[ d = e_0 + \sum_{i=1}^{n} e_i \Delta X_i + e_{n+1} \Delta R_d \qquad (1.1) \]
where e0 = E[d], ei = ∂d/∂∆Xi, en+1 = ∂d/∂∆Rd, ∆Xi is a normalized
random variable representing a global source of variation, and ∆Rd is a nor-
malized random variable representing the independent random variation of d.
The independent random variation is a source of variation that affects the
delay d only (e.g. random dopant fluctuation of Vth). The global source of
variation is a source of variation that affects many delays.
The sum of two timing quantities represented in the form of (1.1) can
be easily represented in the same form again. However, the maximum of
two timing quantities of the form (1.1) does not fit in the same form, so
we approximate it in that form using moment-matching algorithms [10, 71].
Therefore, all timing quantities such as delays, arrival times,
required arrival times, and slacks are represented in the same form, and we call this
form the canonical form [71]. Given two timing quantities A and B represented
in the canonical form, we can easily obtain
P (A ≥ B) = Φ(α) (1.2)
where
\[ \alpha = \frac{E[A] - E[B]}{a}, \qquad a = \sqrt{\mathrm{Var}[A] + \mathrm{Var}[B] - 2\,\mathrm{Cov}[A, B]}. \]
This is shown in [17] and this probability is called the tightness probability of
A over B.
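The canonical form and the tightness probability can be sketched in a few lines. This is a minimal sketch, not the implementation used in this work: the class and function names are hypothetical, and it assumes that the global sources ∆Xi are mutually independent standard normal variables and that the independent random parts of different quantities are uncorrelated.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Canonical:
    mean: float        # e_0 = E[d]
    sens: List[float]  # e_1 .. e_n, sensitivities to the global sources
    rand: float        # e_{n+1}, sensitivity to the independent variation

    def variance(self) -> float:
        return sum(e * e for e in self.sens) + self.rand ** 2

    def covariance(self, other: "Canonical") -> float:
        # Only the shared global sources contribute to the covariance.
        return sum(a * b for a, b in zip(self.sens, other.sens))

    def __add__(self, other: "Canonical") -> "Canonical":
        # The sum stays in canonical form; the independent parts are
        # combined by root-sum-square into a single random term.
        return Canonical(
            self.mean + other.mean,
            [a + b for a, b in zip(self.sens, other.sens)],
            math.hypot(self.rand, other.rand),
        )


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tightness_probability(a: Canonical, b: Canonical) -> float:
    """P(A >= B); assumes A - B has non-zero variance."""
    spread = math.sqrt(a.variance() + b.variance() - 2.0 * a.covariance(b))
    return phi((a.mean - b.mean) / spread)


A = Canonical(10.0, [1.0, 0.5], 0.5)
B = Canonical(9.0, [0.5, 1.0], 0.5)
print(tightness_probability(A, B))
print(A + B)
```

With these example values the spread of A − B is 1, so the result is Φ(1). The two tightness probabilities P(A ≥ B) and P(B ≥ A) sum to one, which is a convenient sanity check.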
1.4 Organization of the Dissertation
In Chapter 2 we present a technique for capturing topological corre-
lation in arbitrary block-based statistical static timing analysis (SSTA). We
interpret a timing graph as an algebraic expression made up of addition and
maximum operators. We define the division operation on the expression and
propose algorithms that modify factors in the expression without expansion.
As a result, the algorithms produce an expression to derive the latest arrival
time with better accuracy in SSTA. Existing techniques handling reconvergent
fanouts usually use dependency lists, requiring quadratic space complexity.
Instead, the proposed technique has linear space complexity by using a new
directed acyclic graph search algorithm. Our results show that it outperforms
an existing technique in speed and memory usage with comparable accuracy.
More importantly, the proposed technique is not limited to SSTA and is poten-
tially applicable to various issues due to reconvergent paths in timing-related
CAD algorithms.
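The role of factoring can be illustrated with a small Monte Carlo sketch, which is not the refactoring algorithm of Chapter 2 itself. Here a hypothetical shared stem delay c feeds two branches with delays a and b. By the distributivity of addition over maximum, max(c + a, c + b) equals c + max(a, b) sample by sample, whereas treating the two arrival times as independent (ignoring topological correlation) shifts the estimated arrival time.

```python
import random
import statistics


def compare(n_samples=20000, sigma_c=2.0, seed=0):
    """Sample the latest arrival time in three ways (all values hypothetical)."""
    rng = random.Random(seed)
    factored, expanded, independent = [], [], []
    for _ in range(n_samples):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, sigma_c)        # shared stem delay
        c_copy = rng.gauss(0.0, sigma_c)   # independent copy of the stem
        factored.append(c + max(a, b))     # factored expression
        expanded.append(max(c + a, c + b)) # expanded expression
        # Ignoring topological correlation: each branch sees its own stem.
        independent.append(max(c + a, c_copy + b))
    return factored, expanded, independent


factored, expanded, independent = compare()
print(statistics.mean(factored), statistics.mean(independent))
```

The factored and expanded samples coincide, while the independent version overestimates the mean, which is the conservatism attributed above to ignoring reconvergent fanouts.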
In Chapter 3, we present a method to compute criticality probabili-
ties of paths in parameterized statistical static timing analysis (SSTA). We
partition the set of all the paths into several groups and formulate the path
criticality into a joint probability of inequalities. Before evaluating the joint
probability directly, we simplify the inequalities through algebraic elimina-
tion, handling topological correlation. Our proposed method uses conditional
probabilities to obtain the joint probability, and statistics of random variables
representing process parameters are changed because of given conditions. To
calculate the conditional statistics of the random variables, we derive analytic
formulas by extending Clark’s work. This allows us to obtain the conditional
probability density function of a path delay, given the path is critical, as well
as to compute criticality probabilities of paths. Our experimental results show
that the proposed method provides 4.2X better accuracy on average in comparison
to the state-of-the-art method.
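For reference, the quantity being computed can be estimated by brute force. The sketch below is a Monte Carlo baseline rather than the analytic method of Chapter 3, and the edge delays and paths are hypothetical. It takes the criticality of a path to be the probability that the path has the largest delay among all paths.

```python
import random

# Hypothetical edges: (mean, sensitivity to one global source, independent sigma)
EDGES = {
    "a": (1.0, 0.10, 0.05),
    "b": (1.2, 0.10, 0.05),
    "c": (2.0, 0.20, 0.05),
    "d": (2.1, 0.05, 0.05),
}
# Hypothetical paths; p1/p2 share edge c and p2/p3 share edge b.
PATHS = {"p1": ["a", "c"], "p2": ["b", "c"], "p3": ["b", "d"]}


def path_criticalities(n_samples=20000, seed=0):
    """Estimate, for each path, the probability that it is the slowest path."""
    rng = random.Random(seed)
    wins = dict.fromkeys(PATHS, 0)
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # one global source shared by all edges
        delay = {
            e: m + s * x + r * rng.gauss(0.0, 1.0)
            for e, (m, s, r) in EDGES.items()
        }
        path_delay = {p: sum(delay[e] for e in es) for p, es in PATHS.items()}
        wins[max(path_delay, key=path_delay.get)] += 1
    return {p: w / n_samples for p, w in wins.items()}


crit = path_criticalities()
print(crit)
```

Because exactly one path is the slowest in each sample, the estimated criticalities sum to one. The cost of such sampling on large circuits is what motivates an analytic computation.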
In Chapter 4, we present a new path selection algorithm for delay fault
testing in a statistical timing framework. Existing algorithms which consider
correlation between path delays use an iterative process for each path or de-
fect and require a Monte Carlo simulation for each iteration to calculate the
conditional fault probability. The proposed algorithm does not require the
iteration process and selects a requested number of paths simultaneously once
it performs a statistical timing analysis at the beginning. If selection of k
paths is required in a set of paths, it partitions the set into two path sets
and determines how many paths should be selected in each path set out of
the k paths. It recursively continues this process and ends up with k paths.
The partitioning is easily performed during the recursive traversal of a cir-
cuit, which produces an implicit path tree, where paths are already grouped
based on their prefix or suffix. Experimental results show the proposed
algorithm can effectively use the correlation to generate high quality path sets.
In the face of large-scale process variations, statistical timing methodology
has advanced significantly over the last few years, and statistical path
selection takes advantage of it in at-speed testing. In deterministic path
selection, the separation of path selection and test generation is known to
require time-consuming iteration between the two processes. In Chapter 5, we
show that not only is this also the case in statistical path selection, but
the quality of results can be severely degraded even after the iteration. To
deal with this issue, we consider testability in the first place by
integrating a SAT solver, and this necessitates a new statistical path
selection method. We integrate the SAT solver in a novel way that leverages
the conflict analysis of modern SAT solvers, which provides more than 4X
speed-up without special optimizations of the SAT solver for this particular
application. Our proposed method is based on a generalized path criticality
metric whose properties allow efficient pruning. Our experimental results show
that the proposed method achieves 47% better quality of results on average,
and up to 361X speedup compared to statistical path selection followed by
test generation.
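The recursive allocation of k paths summarized for Chapter 4 above can be sketched as follows. The proportional split used here is only an assumption for illustration; the actual betting strategies are developed in Chapter 4. The tree, the path names, and the weights are hypothetical, and all weights are assumed positive.

```python
def leaves(node):
    """Number of paths below a node; a leaf is a (name, weight) tuple."""
    return 1 if isinstance(node, tuple) else leaves(node[0]) + leaves(node[1])


def weight(node):
    """Total weight (e.g., a fault probability estimate) below a node."""
    return node[1] if isinstance(node, tuple) else weight(node[0]) + weight(node[1])


def select_paths(node, k):
    """Split the budget of k paths between the two subtrees, recursively."""
    k = min(k, leaves(node))
    if k <= 0:
        return []
    if isinstance(node, tuple):
        return [node[0]]
    left, right = node
    share = weight(left) / (weight(left) + weight(right))
    k_left = round(k * share)  # proportional split (illustrative assumption)
    # Keep both allocations within the number of paths actually available.
    k_left = max(k - leaves(right), min(k_left, leaves(left)))
    return select_paths(left, k_left) + select_paths(right, k - k_left)


tree = [[("p1", 0.5), ("p2", 0.1)], [("p3", 0.3), ("p4", 0.1)]]
print(select_paths(tree, 2))
print(select_paths(tree, 3))
```

No iteration over individual paths is needed: a single traversal of the implicit path tree distributes the whole budget.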
Chapter 2
A Hierarchy of Subgraphs Underlying a
Timing Graph and Its Use in Capturing
Topological Correlation
2.1 Introduction
The increasing variability in the nanometer regime, combined with
shrinking technology parameters, has posed new challenges in static timing
analysis. Circuit timing is subject to many factors including process param-
eters, voltage and temperature. These are given as parameters in the timing
analysis flow. Some of them are provided as fixed constants and others are
provided as uncertain variables with ranges or probabilistic distributions. The
increasing variability requires timing analysis to treat parameters that were
considered fixed constants in previous technologies as uncertain variables.
It is too pessimistic to assume the worst case for the increased number
of uncertain parameters, and the traditional corner-based approach does not
scale well with the increased number of parameters. Thus a research direction
has been to look for an alternative timing analysis method.
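The scaling problem of the corner-based approach can be made concrete with a short sketch. It assumes that each uncertain parameter contributes a minimum and a maximum value and that every combination is a corner; the parameter names and ranges are hypothetical.

```python
from itertools import product


def corners(ranges):
    """Yield every min/max combination of the given parameter ranges."""
    names = list(ranges)
    for values in product(*(ranges[name] for name in names)):
        yield dict(zip(names, values))


def corner_count(n_parameters):
    """Number of min/max corners for n two-valued parameters."""
    return 2 ** n_parameters


ranges = {"vdd": (0.9, 1.1), "temp": (-40, 125), "leff": (0.95, 1.05)}
print(sum(1 for _ in corners(ranges)))
print(corner_count(20))
```

Three parameters already give eight corners, and twenty give more than a million, so running one timing analysis per corner quickly becomes impractical.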
Block-based statistical timing analysis is a popular method. In the
early stage of the development of the approach, each gate and wire delay was
considered as a random variable (an uncertain variable with probabilistic dis-
tribution), and they were assumed to be independent of each other. This
reduces the computational complexity, while circuit timing is estimated con-
servatively. To capture the correlation between the random variables, various
techniques have been proposed; these produce less conservative estimates at
the cost of higher computation time [40]. The canonical delay model pro-
posed in [71] directly represents the underlying parameters that affect gate
delays and wire delays as random variables, and all timing quantities as a
linear affine form of the variables. This technique became popular for its low
computational overhead for capturing the correlation between gate and wire
delays.
Despite these efforts to reduce the conservatism of the estimates, the number
of uncertain parameters, and the conservatism they induce, is expected to
increase significantly in the near future. Random dopant fluctuations are
estimated to cause variations (STD/MEAN) of about 5% in gate delays, about 11%
in setup times, and up to 25% in clock-to-output delays in 70nm devices [49].
The variation is expected to be greater at the 32nm node and beyond because a
smaller number of dopants dominates the threshold voltage. Line-edge roughness
also adds variation to each gate delay. To capture these types of variation
precisely, timing analysis must handle a number of parameters greater than the
size of the circuit. In multi-corner static timing analysis, such a large
number of random sources is not easily addressed [38, 29]. Block-based SSTA
that captures these variations efficiently usually ignores the topological
correlation between arrival times, since dealing with the complex correlation
among that many random variables has high computational complexity; this again
adds conservatism to the estimate.
Some previous research takes the topological correlation into account. In
[20], reconvergent fanouts that cause topological correlation are detected
through dependency lists. The dependency list of a gate contains all the gates
whose arrival times affect the arrival time of the gate. Each input of a gate
has a dependency list propagated from the fan-in of the gate, and the gates
common to the dependency lists (common supporting gates) are identified. The
arrival time of the gate is derived from the common gates. Since the dependency
lists can consume a large amount of memory, the method takes a user-defined
parameter that limits the size of the dependency lists, or carries forward only
the n most recent gates, where n is also user-defined. An algorithm proposed in
[4] selects a specific type of gate out of the common supporting gates of each
gate. From the selected gates, topological correlation can be taken into
account effectively using an enumeration technique based on conditional
probabilities. The worst-case computational complexity is exponential, so the
authors of [4] perform the enumeration selectively. In [6], an exact algorithm
is proposed which interprets a timing graph as a Bayesian network. While the
worst-case run time is still exponential, it grows with the largest clique size
rather than the circuit size. Recently, an approach was proposed in [87, 84]
which captures topological correlation by extending the popular canonical delay
model. Since representing random sources whose number exceeds the circuit size
in the canonical form requires large amounts of memory, the authors also
proposed a pruning technique in [88].
The disadvantages of the previous approaches can be summarized as follows. The
exact methods in [4, 6] require exponential run time in the worst case. Even
the approximation methods in [4, 20, 87] require large amounts of memory to
maintain dependency lists (the extended canonical form in [87] is similar to
the dependency lists). In fact, many techniques that handle reconvergent
fanouts in a graph, including ones that simply detect them, use dependency
lists and have large memory requirements [80, 59].
The major contributions of this chapter can be summarized as follows.
• We show that a timing graph has a hierarchy of specially defined sub-
graphs, called ellipse graphs. It provides a new interpretation of timing
graphs.
• We propose a new directed acyclic graph search algorithm, which allows
us to traverse the ellipse graphs. Using the algorithm, we can deal with
reconvergent fanouts without dependency lists in linear space complexity.
• We interpret a timing graph as an algebraic factored form and propose
an algorithm that modifies factors in the expression without expansion.
Thus it can explore numerous equivalent algebraic expressions efficiently.
This can be considered as a generalization of the techniques proposed
in [4, 6].
• We propose various criteria to determine an algebraic expression that
minimizes the topological correlation error.
2.2 Preliminaries
A literal is an atomic formula (atom). A max-plus expression (MPE) is
an algebraic expression made up of addition operations, maximum operations
and literals. The number of literals in an MPE f is denoted by ρ(f). Note
that maximum and addition have the following algebraic properties.
• Associative
max(max(x, y), z) = max(x,max(y, z))
• Commutative
max(x, y) = max(y, x)
• Distributive
x+max(y, z) = max(x+ y, x+ z)
Note that only addition is distributive over maximum. Let · denote the maximum
operation for simplicity. For example,
x + max(y, z) = x + y · z = (x + y) · (x + z).
(Footnote 1: These notations may be counterintuitive because + is not
distributive over · in elementary algebra. It is helpful to read the expression
as a Boolean expression.)
A timing graph is a directed acyclic graph (dag) G = (V, E) where each edge
e is associated with a delay variable de. Each vertex v is associated with a
value δ(v), called a level, such that δ(w) < δ(v) for all (w, v) ∈ E. A source
is a vertex with no incoming edge and a sink is a vertex with no outgoing
edge. A sequence of vertices (v1, ..., vn) is a partial path if (vi, vi+1) ∈ E for
all i = 1, ..., n − 1. A partial path (v1, ..., vn) is a path if 1) v1 is a specified
vertex or a source and 2) vn is a specified vertex or a sink. If a vertex v is
reachable from a vertex u, then u is a predecessor of v and v is a successor of
u. If there is an edge from u to v, then u is a direct predecessor of v, and v is
a direct successor of u. The sets of direct predecessors and successors of v are
denoted by N−(v) and N+(v), respectively. An arrival time is a function of
delay variables and can be represented by a max-plus expression. The PERT
form of the arrival time of a vertex v is an MPE corresponding to
at(v) = \begin{cases}
  0 & \text{if } v \text{ has no incoming edge;} \\
  \max_{u \in N^{-}(v)} \{ d_{(u,v)} + at(u) \} & \text{otherwise.}
\end{cases}
Figure 2.1 shows a simple timing graph and the arrival times repre-
sented in the PERT form. The PERT form of at(v3) has 6 variables and 8
literals.
Figure 2.1: In static timing analysis, a dag is topologically sorted, and the
arrival time of each vertex is calculated in the topological order.
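As a concrete illustration of the PERT recurrence, the following Python sketch builds the arrival time of each vertex as a nested max-plus expression and lists its literals. The edge list is an assumption: it is inferred from Eq. (2.1a) rather than read from Figure 2.1, and the source names s1, s2, s3 as well as all function names are hypothetical.

```python
from collections import defaultdict

# Assumed edge list (inferred from Eq. (2.1a)); source names s1..s3 are hypothetical.
EDGES = {
    ("s1", "v1"): "a",
    ("s2", "v1"): "b",
    ("s3", "v2"): "c",
    ("v1", "v2"): "d",
    ("v2", "v3"): "e",
    ("v1", "v3"): "f",
}
TOPO_ORDER = ["s1", "s2", "s3", "v1", "v2", "v3"]


def pert_forms(edges, order):
    """Return at[v] as a nested max-plus expression, visiting vertices in topological order."""
    preds = defaultdict(list)
    for (u, v), name in edges.items():
        preds[v].append((u, name))
    at = {}
    for v in order:
        if not preds[v]:
            at[v] = None  # stands for the constant 0 at a source
            continue
        # d_(u,v) + at(u) for every direct predecessor u
        terms = [name if at[u] is None else ("+", name, at[u]) for u, name in preds[v]]
        at[v] = terms[0] if len(terms) == 1 else ("max", *terms)
    return at


def literals(expr):
    """List the literals of an expression from left to right."""
    if expr is None:
        return []
    if isinstance(expr, str):
        return [expr]
    return [lit for sub in expr[1:] for lit in literals(sub)]


at = pert_forms(EDGES, TOPO_ORDER)
lits_v3 = literals(at["v3"])
print(lits_v3, len(lits_v3), len(set(lits_v3)))
```

Under this assumed graph, the literal list of at(v3) can be compared directly with the counts quoted in the text (number of literals versus number of distinct variables).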
2.3 Problem Formulation
In conventional static timing analysis, arrival times are obtained by
evaluating the PERT forms. Block-based SSTA also uses PERT forms, where
variables are random variables. When evaluating a PERT form, block-based
SSTA considers all literals to represent different variables, which results in
topological correlation error. We will consider an example from Figure 2.1.
Suppose that all variables in Figure 2.1 are independent of each other. The 4th
and 7th literals in at(v3) represent the same variable a, but block-based SSTA
treats them as two independent variables, so the correlation between the two
literals is ignored. If an arrival time is represented as an MPE in which the
number of literals equals the number of variables, then we can derive the
arrival time accurately. This is the case in tree-structured circuits (i.e.,
fanout-free circuits). It is also the case for at(v1) and at(v2), which
therefore have no topological correlation error. Thus, the ultimate goal of
this chapter is to find such an MPE for the latest arrival time of a general
circuit.
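The size of the topological correlation error can be seen in a small Monte Carlo experiment. The sketch below evaluates the PERT form of at(v3) twice: once with the replicated literals a and b sharing their samples, and once with the second occurrence drawn independently, which mimics what block-based SSTA assumes. The Gaussian delay distributions, their means, and the function name are made up for illustration; the expression itself is Eq. (2.1a).

```python
import random


def sample_means(n=100_000, seed=1):
    """Return (mean with shared a, b; mean with independent copies of a, b)."""
    rng = random.Random(seed)
    # Hypothetical nominal delays, chosen so that both reconvergent paths are critical.
    mu = {"a": 10, "b": 10, "c": 10, "d": 10, "e": 10, "f": 20}
    exact = 0.0
    independent = 0.0
    for _ in range(n):
        x = {k: rng.gauss(m, 1.0) for k, m in mu.items()}
        # Independent copies standing in for the 7th and 8th literals.
        a2, b2 = rng.gauss(mu["a"], 1.0), rng.gauss(mu["b"], 1.0)
        upper = x["e"] + max(x["c"], x["d"] + max(x["a"], x["b"]))
        exact += max(upper, x["f"] + max(x["a"], x["b"]))
        independent += max(upper, x["f"] + max(a2, b2))
    return exact / n, independent / n


exact_mean, independent_mean = sample_means()
print(exact_mean, independent_mean)
```

With these assumed delays the second mean should come out larger than the first, which is the conservatism the text attributes to ignoring the correlation between replicated literals.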
Due to the algebraic properties, there are a number of MPEs equivalent
to a PERT form. For example, using the distributivity, we can find MPEs
equivalent to the PERT form of the arrival time of v3 as follows.
at(v3) = (e+ (c · (d+ (a · b)))) · (f + (a · b)) (2.1a)
= (e+ c) · (e+ d+ (a · b)) · (f + (a · b)) (2.1b)
= (e+ c) · (((e+ d) · f) + (a · b)) (2.1c)
Although there are numerous equivalent MPEs, the desired MPE does not, in
general, exist for the latest arrival time of a circuit. Nevertheless, we
observe that some MPEs produce significantly less topological correlation error
than others. Thus, if we can explore equivalent MPEs and select a good one
intelligently, we can minimize the error. The exploration process is called
refactoring, and in this chapter, we propose efficient refactoring algorithms
that, for the latest arrival time of a given timing graph, find an equivalent
MPE which produces less topological correlation error.
2.4 Refactoring Algorithms
In this section, we first present a refactoring algorithm that does not
require delay information, called static refactoring. If there is only one literal
representing a variable in an MPE, the literal is original. If there are two or
more literals for a variable, one is original and the others are replicated. For
example, Eq.(2.1a) has 2 replicated literals. Assuming each replicated literal
causes the same amount of error, the static refactoring uses the number of
replicated literals as a measure of error. Thus, Eq.(2.1c), which has 1 replicated
literal, is better than Eq.(2.1a) in the static-refactoring sense. Therefore, we
try to find an MPE with the minimum number of replicated literals. Since the
number of the original literals is invariant in equivalent MPEs, we simply use
the total number of literals as the measure.
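The static measure can be checked on Eqs. (2.1a) to (2.1c). The following sketch encodes the three MPEs as nested tuples, counts their literals and replicated literals, and confirms numerically that they are equivalent. The tuple encoding and the helper names are illustrative choices, not part of the original algorithm.

```python
import random
from collections import Counter


def mx(*xs):
    return ("max", *xs)


def add(*xs):
    return ("+", *xs)


ab = mx("a", "b")
EQ_2_1A = mx(add("e", mx("c", add("d", ab))), add("f", ab))
EQ_2_1B = mx(add("e", "c"), add("e", "d", ab), add("f", ab))
EQ_2_1C = mx(add("e", "c"), add(mx(add("e", "d"), "f"), ab))


def literals(expr):
    """List the literals of an MPE from left to right."""
    if isinstance(expr, str):
        return [expr]
    return [lit for sub in expr[1:] for lit in literals(sub)]


def replicated(expr):
    """Number of literals beyond the first occurrence of each variable."""
    return sum(count - 1 for count in Counter(literals(expr)).values())


def evaluate(expr, env):
    """Evaluate an MPE for concrete delay values."""
    if isinstance(expr, str):
        return env[expr]
    values = [evaluate(sub, env) for sub in expr[1:]]
    return max(values) if expr[0] == "max" else sum(values)


rng = random.Random(0)
env = {name: rng.randint(1, 50) for name in "abcdef"}
print([evaluate(f, env) for f in (EQ_2_1A, EQ_2_1B, EQ_2_1C)])
print([(len(literals(f)), replicated(f)) for f in (EQ_2_1A, EQ_2_1B, EQ_2_1C)])
```

Integer delays are used so that the equality check is exact; the three printed values agree for any assignment because the forms differ only by applications of distributivity.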
2.4.1 Division
In multi-level synthesis, there have been a number of studies on factor-
ing algorithms, which minimize the number of literals in an algebraic expres-
sion using the distributive property [8]. They are based on an operation, called
division, which is defined as follows in our notation. Division is an operation
which, given MPEs f and p, finds MPEs q and r such that f = (p + q) · r.
We say that the division of dividend f by divisor p generates quotient q and
remainder r. For example,
f = (a · b+ c) · (a · b+ d) · (e+ h)
can be divided by a · b and be expressed as follows:
f = (a · b+ c · d) · (e+ h).
Note that only 6 literals are required rather than 8 literals. The factoring
algorithms have a typical form. Given an expression, they find a good divisor,
by which the expression is divided. This process recursively continues on
the divisor, the quotient and the remainder. In this sense, the expression is
maximally factored. For an MPE of the latest arrival time of a circuit, we may
be able to apply one of the algorithms. However, since they take a maximally
expanded form as input (they are not refactoring but factoring algorithms), it
is required to enumerate all paths in the circuit. Moreover, they can hardly
handle the number of literals and terms in the expanded form. Thus, we need
to develop a new algorithm for refactoring.
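The division example above can be reproduced with a few lines of code. The sketch below treats f as a maximum of sums, divides it by the common addend a · b, and checks that the divided form evaluates to the same value with fewer literals. It is a minimal illustration of algebraic division on an already expanded expression, not the refactoring algorithm developed in this chapter; the data layout and names are assumptions.

```python
import random


def divide(terms, divisor):
    """Split f = max(terms) so that f = (divisor + q) · r.

    Each term is a tuple of addends; q and r are returned as lists of terms.
    """
    quotient, remainder = [], []
    for term in terms:
        if divisor in term:
            rest = list(term)
            rest.remove(divisor)  # drop one occurrence of the divisor
            quotient.append(tuple(rest))
        else:
            remainder.append(term)
    return quotient, remainder


def value(addend, env):
    """An addend is a literal name or a ("max", x, y, ...) group of literals."""
    if isinstance(addend, str):
        return env[addend]
    return max(env[name] for name in addend[1:])


def max_of_sums(terms, env):
    return max(sum(value(x, env) for x in term) for term in terms)


def num_literals(terms):
    return sum(1 if isinstance(x, str) else len(x) - 1 for term in terms for x in term)


AB = ("max", "a", "b")
F = [(AB, "c"), (AB, "d"), ("e", "h")]  # (a·b + c) · (a·b + d) · (e + h)
Q, R = divide(F, AB)                     # q = c · d, r = e + h

env = {name: random.Random(3).randint(1, 30) for name in "abcdeh"}
original = max_of_sums(F, env)
divided = max(value(AB, env) + max_of_sums(Q, env), max_of_sums(R, env))
print(original, divided, num_literals(F), num_literals([(AB,)]) + num_literals(Q) + num_literals(R))
```

The sketch assumes the remainder is non-empty, as it is in this example.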
2.4.2 Ellipse Graphs
Definition 1. If all paths starting from a vertex s contain a vertex t, then t
is an antipodal vertex of s.
Definition 2. The antipode t of a vertex s is an antipodal vertex of s such
that δ(t) < δ(u) for all antipodal vertices u ≠ t of s.
Definition 3. Let G = (V, E) be a dag with a single source and a single sink
and s ∈ V be a vertex such that |N+(s)| ≥ 2. Let t be the antipode of s. Then,
the ellipse graph of s, the s-ellipse graph, is a subgraph G′ = (V′, E′) of G
such that 1) V′ and E′ are the sets of vertices and edges, respectively,
reachable from s, but not from t in G and 2) V′ contains t.
We can define an ordering relation for G1 = (V1, E1) and G2 = (V2, E2)
as follows. G1 ≤ G2 if and only if V1 ⊆ V2 and E1 ⊆ E2. Under this relation,
the set of the ellipse graphs in G becomes a partially ordered set. An ellipse
graph G1 is a parent (ellipse) graph of G2 if G2 < G1 and there is no ellipse
graph smaller than G1 in the circuit graph that also contains G2.
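Definitions 1 to 3 and the ordering relation can be checked by brute force on a small graph. The sketch below enumerates all paths from a vertex, intersects them to obtain the antipodal vertices, picks the one with the smallest level as the antipode, and then collects the ellipse graph. The graph is the one assumed for Eq. (2.1a) with a hypothetical supersource S; path enumeration is exponential in general and is used here only to mirror the definitions.

```python
# Assumed example graph; S is a hypothetical supersource feeding s1, s2, s3.
SUCC = {
    "S": ["s1", "s2", "s3"],
    "s1": ["v1"], "s2": ["v1"], "s3": ["v2"],
    "v1": ["v2", "v3"], "v2": ["v3"], "v3": [],
}


def levels(succ, source):
    """Longest-path depth from the source, so level(w) < level(v) for each edge (w, v)."""
    level = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if level.get(w, -1) < level[u] + 1:
                level[w] = level[u] + 1
                stack.append(w)
    return level


def paths_from(succ, s):
    """All paths from s to the sink."""
    if not succ[s]:
        return [[s]]
    return [[s] + rest for w in succ[s] for rest in paths_from(succ, w)]


def antipode(succ, level, s):
    """Antipodal vertex of s with the smallest level (Definitions 1 and 2)."""
    common = set.intersection(*(set(p) for p in paths_from(succ, s))) - {s}
    return min(common, key=level.get)


def reachable(succ, s):
    seen, stack = {s}, [s]
    while stack:
        for w in succ[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def ellipse_graph(succ, level, s):
    """Vertices and edges reachable from s but not from its antipode t, plus t (Definition 3)."""
    t = antipode(succ, level, s)
    inner = reachable(succ, s) - reachable(succ, t)
    return inner | {t}, {(u, w) for u in inner for w in succ[u]}


LEVEL = levels(SUCC, "S")
V_V1, E_V1 = ellipse_graph(SUCC, LEVEL, "v1")
V_ROOT, E_ROOT = ellipse_graph(SUCC, LEVEL, "S")
print(antipode(SUCC, LEVEL, "v1"), sorted(V_V1), sorted(E_V1))
print(V_V1 <= V_ROOT and E_V1 <= E_ROOT)  # ordering relation between the two ellipse graphs
```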
Theorem 1. Let G = (V, E) be a dag with a single source and a single sink
and s, v ∈ V be vertices with more than one outgoing edge such that v is
reachable from s. Also let Gs and Gv be the ellipse graphs of s and v,
respectively. Let t and w be the antipodes of s and v, respectively. If v is in
Gs, then δ(w) ≤ δ(t) and Gv < Gs.
Proof. If w = t, δ(w) = δ(t). We consider the case w ≠ t. In G, there exists
a path (v, ..., w) that does not contain t. Otherwise, w is not the antipode of
v. Suppose that δ(w) > δ(t). Then, there exists a path r = (s, ..., v, ..., w, ...)
in G such that t ∉ r. This is a contradiction. All vertices and edges in Gv are
reachable from s, but they are not reachable from t since δ(w) ≤ δ(t). Thus
Gv < Gs.
Figure 2.2: Ellipse graphs and the Hasse diagram representing their partial
order
Figure 2.2 shows an ellipse graph and its descendant ellipse graphs. The
set of all the ellipse graphs is a partially ordered set and can be represented
as a Hasse diagram.
2.4.3 Topological-order First Search
In this subsection, we propose an algorithm, topological-order first
search (TFS), to traverse an ellipse graph. TFS takes as input a vertex s in a
dag with a single source and a single sink. Before TFS starts, the graph is
topologically sorted, and the levels are used as keys in the heap Q of TFS.
Algorithm 1 TFS († TFS∗)
Input: A vertex s in a dag G with a single source and a single sink
Output: The antipode of s († the sink of G)
1: v ← s, Q ← ∅
2: while v = s or Q ≠ ∅ († v is not the sink) do
3:   for each u ∈ N+(v) do
4:     insert u into Q if discovered for the first time
5:   end for
6:   v ← EXTRACT-MIN(Q)
7: end while
8: return v
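Algorithm 1 maps directly onto a binary heap keyed by level. The following Python sketch is one possible rendering using the standard heapq module; the example graph, its levels, and the choice to also return the visit order are assumptions made for illustration.

```python
import heapq

# Hypothetical dag with a single source "src" and a single sink "snk".
SUCC = {
    "src": ["p", "z"],
    "p": ["q1", "q2"],
    "q1": ["r"],
    "q2": ["r"],
    "r": ["snk"],
    "z": ["snk"],
    "snk": [],
}
LEVEL = {"src": 0, "p": 1, "z": 1, "q1": 2, "q2": 2, "r": 3, "snk": 4}


def tfs(succ, level, s):
    """Topological-order first search from s; returns (antipode of s, visit order).

    Assumes s has at least one outgoing edge, so the heap is non-empty on the first pop.
    """
    heap, discovered, visited = [], {s}, [s]
    v = s
    while v == s or heap:
        for u in succ[v]:
            if u not in discovered:  # insert only on first discovery
                discovered.add(u)
                heapq.heappush(heap, (level[u], u))
        _, v = heapq.heappop(heap)   # EXTRACT-MIN: smallest level first
        visited.append(v)
    return v, visited


print(tfs(SUCC, LEVEL, "p"))
print(tfs(SUCC, LEVEL, "src"))
```

Only the discovered set and the heap are kept, so the search needs space linear in the size of the traversed ellipse graph and no dependency lists.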
Lemma 1. Let v be the vertex being visited and u be a vertex to be visited
after v by TFS∗. Then, δ(u) ≥ δ(v).
Proof. In order for u to be visited after v, u is in the heap Q or u is a successor
of a vertex in the heap Q. If u is in the heap Q, δ(u) ≥ δ(v) by the heap
property. If u is a successor of a vertex w in the heap Q, then δ(u) > δ(w) ≥
δ(v) by the topological sorting and the heap property.
Lemma 2. Let v be a vertex being visited by TFS∗ running on a dag G =
(V, E) from a vertex s ∈ V. Then, v is an antipodal vertex of s if and only if
v ≠ s and the heap Q is empty.
Proof. We first prove the “if” part. Let t be the sink of the given graph G.
Note that t is an antipodal vertex of s since the given graph G has one sink.
Suppose that v is not an antipodal vertex of s. Then, since v ≠ t and v ≠ s,
δ(s) < δ(v) < δ(t) and there exists a path p = (u_0 = s, ..., u_n = t) such that
v ∉ p. Let i be the largest integer such that δ(u_i) < δ(v). Note that u_i is
visited before v by Lemma 1 and u_{i+1} is discovered. If u_{i+1} is discovered
for the first time, it is inserted into the heap. Otherwise, u_{i+1} already
sits in the heap. Note that δ(u_{i+1}) ≥ δ(v). If δ(u_{i+1}) = δ(v) and u_{i+1}
is extracted first, then u_{i+2} is discovered and sits in the heap when v is
being visited. If δ(u_{i+1}) = δ(v) and v is extracted first, then u_{i+1} is in
the heap. If δ(u_{i+1}) > δ(v), v is extracted first and u_{i+1} is in the heap.
Thus the heap is not empty when v is being visited. Let us now prove the “only
if” part. Suppose the heap is non-empty and let h be an element in the heap.
Then δ(v) ≤ δ(h) by the heap property. If we add a statement π[u] ← v before
each insertion into Q, the π pointers trace a path from s to h that does not
contain v, because no π assignment by v can have been executed yet. If a path
(s, ..., h, ..., v, ...) exists, then δ(h) < δ(v) by the topological sorting.
This is a contradiction and there is no such path. Therefore, a path exists
that does not contain v.
Theorem 2. Given a dag G = (V,E) with a single source and a single sink
and a vertex s ∈ V such that |N+(s)| ≥ 2, TFS traverses the s-ellipse graph
in the topological order.
Proof. Obvious from Lemma 1 and Lemma 2.
2.4.4 Static Refactoring
In this subsection, we explain a static refactoring algorithm in detail.
The algorithm takes a timing graph as input. In the beginning, we add a
supersource and an edge with zero delay from the supersource to each source.
We also add a supersink and an edge with zero delay from each sink to the
supersink. Then all paths in the timing graph start from the supersource and
end in the supersink. We call the modified timing graph the root graph. The
root graph is topologically sorted and REFACTOR starts with the supersource.
Without loss of generality, we assume the ellipse graph of the supersource is
the root graph. If we add an edge with zero delay from the supersource to
the supersink, the assumption becomes true, but it may degrade the quality
of results. Thus, we can use the following procedure instead: if the sink of
the ellipse graph being traversed by REFACTOR is not the supersink, we call
REFACTOR again starting from that sink, repeating until the sink is the
supersink, and then add up all the results to obtain the latest arrival time.
If the variable divide is false for all vertices, REFACTOR performs
TFS, and hence traverses the root graph in the topological order, producing
the arrival times in the PERT form. Let v be a vertex with more than one
outgoing edge in an ellipse graph G and u be the antipode of v. Let fv and fu
be MPEs of the arrival times of v and u, respectively. If we divide fu by fv,
fu can be represented as follows.
fu = (fv + q) · r
We can generate q and r by using the following division algorithm.
Since fu is the maximum of the delays of paths that end in u, we can partition
the paths into the two path sets Ph and Pr, where Ph contains paths that pass
through v and Pr contains the others. Let Pq be a path set that contains all















Therefore, an MPE of the latest arrival time of the v-ellipse graph can be
q, and if v is eliminated from G, an MPE of the arrival time of u can be r.
Figure 2.3(a) and (b) illustrate the division of fu by fv in a graph.
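The identity fu = (fv + q) · r can be checked numerically on a small graph. In the sketch below, q is obtained as the latest arrival time at u when only v launches at time 0, and r as the arrival time at u after v is eliminated. The graph is the one assumed for Eq. (2.1a), with v = v1 and u = v3; the delays are random integers and all names are hypothetical.

```python
import random

# Assumed graph of Eq. (2.1a); source names s1..s3 are hypothetical.
EDGES = [("s1", "v1"), ("s2", "v1"), ("s3", "v2"), ("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
ORDER = ["s1", "s2", "s3", "v1", "v2", "v3"]
SOURCES = ["s1", "s2", "s3"]


def latest(delay, starts, removed=()):
    """Latest arrival times when only `starts` launch at time 0 and `removed` is eliminated.

    Vertices that receive no arrival are simply absent from the result (NIL).
    """
    at = {v: 0 for v in starts}
    for v in ORDER:
        if v in at or v in removed:
            continue
        arrivals = [at[u] + delay[(u, w)] for (u, w) in EDGES if w == v and u in at]
        if arrivals:
            at[v] = max(arrivals)
    return at


def divide_check(delay, v="v1", u="v3"):
    """Return (f_u from plain PERT evaluation, f_u rebuilt as max(f_v + q, r))."""
    full = latest(delay, SOURCES)
    q = latest(delay, [v])[u]                     # quotient: paths from v to u
    r = latest(delay, SOURCES, removed={v})[u]    # remainder: paths that avoid v
    return full[u], max(full[v] + q, r)


rng = random.Random(5)
delay = {e: rng.randint(1, 20) for e in EDGES}
print(divide_check(delay))
```

For this graph the divided form corresponds to Eq. (2.1c), with q = (e + d) · f and r = e + c.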
This division algorithm is implemented as follows. Suppose that REFACTOR
runs on the ellipse graph G and encounters v. If divide is true for v,
REFACTOR excludes the delays of the paths that pass through v by setting
at[v] = NIL. Then, the arrival time of u equals r. In addition, it creates a
temporary vertex whose arrival time is (fv + q) and connects it to u to per-
form the max operation on r and (fv + q), making fu into the divided form.
If REFACTOR encounters another vertex with more than one outgoing edge
before it reaches u, it can perform a division operation in the same way. Thus,
if the temporary vertex is not considered, fu becomes an optimized (refac-
tored) form of r. To generate q, it is necessary to traverse the v-ellipse graph
separately. Note that an ellipse graph is a dag with a single source and a
single sink. Besides, since it is already topologically sorted, we can recursively







Figure 2.3: The implication of division in a timing graph and recursive
refactoring example. Panels: (a) an original timing graph, (b) division,
(c) quotient and remainder, (d) quotient, (e) refactoring of quotient,
(f) refactoring of remainder.
sense, fu is maximally refactored, similar to typical factoring algorithms.
Once an ellipse graph has been traversed separately, the result is stored and
reused. In terms of the Hasse diagram that represents the partial order of all
ellipse graphs in the root graph, REFACTOR can be viewed as performing a
depth-first search.
While REFACTOR traverses the v-ellipse graph Gv, if it encounters
a vertex w with more than one outgoing edge, it can traverse the w-ellipse
graph, Gw. Then Gv is a parent graph of Gw. Consider an edge e in Gw. If in
Gv, all the paths that contain e pass through w, e is a free edge of Gw. The
literal corresponding to the edge is a free literal. Thus, given an ellipse graph,
its free literals depend on its parent graph. For example, in Figure 2.3(e), the
v-ellipse graph is the parent graph of the w-ellipse graph, which has 3 free
literals (line arrows), while it has 2 free literals when the parent graph is the
root graph as shown in Figure 2.3(f).
If REFACTOR divides fu by fv, the number of replicated literals of some
original literals in fu can increase; we call this increase the cost.
Similarly, the number of replicated literals of some original literals can
decrease; we call this decrease the revenue. Depending on the cost and the
revenue, ρ(fu) can increase or decrease with the division. Thus we need to
estimate profit(v), the decrease of ρ(fu) by the division, and REFACTOR
performs the division only if profit(v) > 0. We consider ρ(q) as the cost, but
we can subtract the number of free literals from it. The number of free
literals of an ellipse graph cannot be stored with the refactoring result of
the ellipse graph because it depends on the parent graph. Thus we conservatively
Algorithm 2 REFACTOR
Input: A vertex s in a dag G with a single source and a single sink, refactored
Output: The latest arrival time in the s-ellipse graph
1: v ← s
2: Q ← ∅
3: at[s] ← 0
4: while v = s or Q ≠ ∅ do
5:   divide ← false
6:   if v ≠ s and |N+(v)| ≥ 2 then
7:     if refactored[v] = NIL then
8:       REFACTOR(v, refactored)
9:     end if
10:    (antipode, propat) ← refactored[v]




15:   if divide = true then
16:     at[v] ← NIL
17:     at[vt] ← propat where vt is a new vertex and d(vt,antipode) = 0
18:     FIt[antipode] ← FIt[antipode] ∪ {vt}
19:     insert antipode into Q if discovered for the first time
20:   else
21:     for each u ∈ N+(v) do
22:       insert u into Q if discovered for the first time
23:     end for
24:   end if
25:   v ← EXTRACT-MIN(Q)
26:   att ← 0
27:   for each u ∈ N−(v) ∪ FIt[v] do
28:     e ← (u, v)
29:     if at[u] ≠ NIL then
30:       att ← MAX(att, de + at[u])
31:     end if
32:   end for
33:   at[v] ← att
34: end while
35: refactored[s] ← (v, at[v])
36: return at[v]
estimate the number of the free literals by assuming the parent graph is the
root graph. We estimate revenue as follows. If the division is not performed on
v but on all subsequent vertices with more than one outgoing edge, fv occurs
|N+(v)| times in fu. If the division is performed on v, fv occurs only once in
fu. Thus we consider (|N+(v)| − 1) × ρ(fv) as revenue. Therefore, we estimate
the decrease of ρ(fu) by the division of fu by fv as follows.
profit(v) = (|N+(v)| − 1)× ρ(fv)− ρ(q) + k (2.2)
where k is the number of free literals of the v-ellipse graph when the parent
graph is the root graph.
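Eq. (2.2) can be evaluated by hand on the graph assumed for Eq. (2.1a). The sketch below counts the free edges of the v1-ellipse graph with the root graph as parent and plugs the result into the profit formula. The free-edge test used here (an edge is free if its tail is v or cannot be reached from the supersource once v is removed) is one reading of the definition in the text, and the supersource S and all names are hypothetical.

```python
# Assumed root graph; S is a hypothetical supersource.
SUCC = {
    "S": ["s1", "s2", "s3"],
    "s1": ["v1"], "s2": ["v1"], "s3": ["v2"],
    "v1": ["v2", "v3"], "v2": ["v3"], "v3": [],
}


def reachable(succ, start, removed=None):
    """Vertices reachable from start without passing through `removed`."""
    seen, stack = {start}, [start]
    while stack:
        for w in succ[stack.pop()]:
            if w != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def free_edges(succ, root, v, ellipse_edges):
    """Edges of the v-ellipse graph whose every path from the root passes through v."""
    bypass = reachable(succ, root, removed=v)  # reachable while avoiding v
    return {(x, y) for (x, y) in ellipse_edges if x == v or x not in bypass}


def static_profit(fanout, rho_fv, rho_q, k):
    """Eq. (2.2): estimated decrease in the number of literals of f_u."""
    return (fanout - 1) * rho_fv - rho_q + k


ELLIPSE_V1 = {("v1", "v2"), ("v1", "v3"), ("v2", "v3")}  # edges d, f, e
K = len(free_edges(SUCC, "S", "v1", ELLIPSE_V1))
PROFIT = static_profit(fanout=len(SUCC["v1"]), rho_fv=2, rho_q=3, k=K)
print(K, PROFIT)
```

Here ρ(fv) = 2 counts the literals a and b, and ρ(q) = 3 counts d, e and f. The resulting estimate agrees with the reduction from 8 literals in Eq. (2.1a) to 7 in Eq. (2.1c).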
2.4.5 Dynamic Refactoring
In this subsection, we present a refactoring technique that requires
delay information, and is therefore called dynamic refactoring. The lower and
upper bounds of the delay of an edge e are denoted by d_e^min and d_e^max,
respectively. Similarly, the lower
and upper bounds of an MPE f are denoted by fmin and fmax, respectively.
The lower and upper bounds of the arrival time of a vertex v are denoted by
atmin(v) and atmax(v), respectively. We can obtain the observable bound b(v)
of a vertex v as follows.
b(v) =




An edge e = (w, v) is unobservable if atmax(w) + d_e^max < b(v). A vertex is
unobservable if all outgoing edges of the vertex are unobservable. A graph can
be reduced by eliminating all unobservable edges and vertices. It is important
to note that the reduction can be applied to all ellipse graphs as well as the
root graph. Therefore, we use TFS to calculate the lower and upper bounds
of the arrival times in the topological order in an ellipse graph. Also we use
another TFS to calculate reachable bounds in the reverse-topological order and
determine unobservable vertices and edges. Those two procedures are added
at the beginning of REFACTOR, and the ellipse graph is reduced. In the
reduced graph, we can use Equation(2.2) (dynamic-lits). However, in order to
use observable bounds more aggressively, we propose another profit criterion.
We define the spread Λ(f) of an MPE f as follows.
Λ(f) = fmax − fmin
We call the spread of the latest arrival time the output spread. We can obtain
the observable spread Λo(f, v) of an MPE f and a node v as follows.
Λo(f, v) = \begin{cases}
  f^{\max} - \max(f^{\min}, b(v)) & \text{if } f^{\max} > b(v); \\
  0 & \text{otherwise.}
\end{cases}
If the arrival time of a vertex v is represented by an MPE f , the observable
spread Λo(f, v) can be considered as the output spread caused by the spread
of the arrival time of v. We estimate the decrease of the output spread by the
division fu over fv as follows.
profit(v) = (|N+(v)| − 1) × Λo(fv, v) − Λo(f_v^max + q, u)
where u is the antipode of v and q is the refactored form of the latest arrival
time of the v-ellipse graph.
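The dynamic criterion only needs interval arithmetic on lower and upper bounds. The sketch below implements the spread, the observable spread, the unobservable-edge test, and the spread-based profit as plain functions. The observable bounds b(v) and b(u) are simply passed in as numbers, and the function names and the sample values are made up for illustration.

```python
def spread(f_min, f_max):
    """Spread of an MPE given its lower and upper bounds."""
    return f_max - f_min


def observable_spread(f_min, f_max, b_v):
    """Part of the spread that lies above the observable bound b(v)."""
    if f_max > b_v:
        return f_max - max(f_min, b_v)
    return 0


def is_unobservable_edge(at_max_w, d_max_e, b_v):
    """Edge e = (w, v) is unobservable if even its latest arrival stays below b(v)."""
    return at_max_w + d_max_e < b_v


def dynamic_profit(fanout, fv_bounds, q_bounds, b_v, b_u):
    """Estimated decrease of the output spread when f_u is divided by f_v."""
    fv_min, fv_max = fv_bounds
    q_min, q_max = q_bounds
    revenue = (fanout - 1) * observable_spread(fv_min, fv_max, b_v)
    # Bounds of the MPE f_v^max + q: the divisor is pinned at its upper bound.
    cost = observable_spread(fv_max + q_min, fv_max + q_max, b_u)
    return revenue - cost


# Hypothetical bounds: f_v in [8, 12], q in [4, 6], b(v) = 10, b(u) = 15.
print(observable_spread(8, 12, 10))
print(dynamic_profit(3, (8, 12), (4, 6), 10, 15))
print(is_unobservable_edge(at_max_w=5, d_max_e=3, b_v=10))
```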
Figure 2.4: Refactoring takes a flat timing graph as input and generates a
hierarchy of ellipse graphs.
2.5 The Hierarchical Timing Graph
The result of refactoring can be represented in a hierarchical timing
graph where each component is an ellipse graph. Since each MPE corresponds
to a hierarchical timing graph, there are numerous, equivalent hierarchical
timing graphs. REFACTOR explores a part of them and produces a “good”
hierarchy in the sense implied by the given profit function. For example,
REFACTOR explores 9 possible hierarchies when the timing graph in Fig-
ure 2.2 is given as input, and Figure 2.4 shows two among them. The selection
of one among the 9 possible hierarchies is determined by the profit function.
We believe that this new point of view for a timing graph can foster several
applications in which the following properties may be used:
• Each component in the hierarchy becomes tree-like as a result of the
effort to deal with the problem caused by reconvergent paths. Many
algorithms working on a graph can produce the optimal solution if
reconvergent paths do not exist (i.e., the input is a tree). Thus, refactoring
may be able to enhance the quality of results of such algorithms.
• Hierarchical analysis is a general technique for dealing with complexity.
Each component in the hierarchical timing graph is simple, and we may be
able to perform a more expensive but more accurate analysis on it.
• The more stages a signal goes through, the more uncertain the absolute
timing value of the signal is. Thus, it is very difficult to predict the
interactions of signals (e.g., MIS and crosstalk) in the middle of circuits. In
each component of the hierarchy, we can perform more accurate analysis
using the relative timing values from the common starting point. Also,
in SSTA, the arrival times become contaminated by the linear approximation
error of the max operator as they propagate. In the hierarchical
timing graph, the analysis is started over in each component, and the
accuracy can be improved.
2.6 Notes on Implementation
Static timing analysis needs to consider both rising and falling transitions.
To deal with this, we run a conventional static timing analysis algorithm
on the given timing graph, but instead of computing rising and falling arrival
times from direct predecessors and incoming edges, we create new vertices and
edges corresponding to them. These compose a new timing graph with twice as
many vertices as the original one. We can then simply run REFACTOR on the new
timing graph, so that rising and falling transitions are handled naturally.
We have presented REFACTOR as a recursive function to explain the concept.
However, it is not necessary to implement it as a recursive function. We can
enumerate vertices in reverse topological order. During the enumeration, if a
vertex v with more than one outgoing edge is encountered, the v-ellipse graph
is traversed by calling REFACTOR. If another vertex w with more than one
outgoing edge is encountered during the traversal, the w-ellipse graph has
already been refactored since w comes before v in the reverse topological
order. Thus REFACTOR is not called recursively. This implementation is usually
more efficient in both CPU time and memory than the recursive version.
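One way to picture the rise/fall expansion is sketched below: every vertex of the original timing graph is split into a rising and a falling copy, and each timing arc contributes two edges whose endpoints depend on whether the arc inverts. The arc format, the "positive"/"negative" sense labels, and the delay values are assumptions made for this illustration; the text does not prescribe a particular encoding.

```python
def expand_rise_fall(arcs):
    """Build the doubled timing graph from arcs (u, v, sense, d_rise, d_fall).

    d_rise and d_fall are the delays to a rising and a falling transition at v.
    Returns a dict mapping ((u, 'r' or 'f'), (v, 'r' or 'f')) to a delay.
    """
    edges = {}
    for u, v, sense, d_rise, d_fall in arcs:
        if sense == "positive":      # non-inverting: rise -> rise, fall -> fall
            edges[((u, "r"), (v, "r"))] = d_rise
            edges[((u, "f"), (v, "f"))] = d_fall
        else:                        # inverting: fall -> rise, rise -> fall
            edges[((u, "f"), (v, "r"))] = d_rise
            edges[((u, "r"), (v, "f"))] = d_fall
    return edges


# Hypothetical two-stage path: an inverting gate followed by a non-inverting one.
ARCS = [("a", "n1", "negative", 30, 25), ("n1", "y", "positive", 12, 10)]
EXPANDED = expand_rise_fall(ARCS)
VERTICES = {end for edge in EXPANDED for end in edge}
print(len(VERTICES), len(EXPANDED))
```

The expanded graph is again a dag, so it can be topologically sorted and handed to a graph-based procedure without any special treatment of transitions.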
2.7 Exploring More Candidate Solutions
Figure 2.5: Divisors may not be re-factored if many antipodes are the super-
sink.
In every division operation of Algorithm 2, the dividend must be the
antipode of the divisor, which is what allowed us to develop the simple
refactoring algorithm. However, this constraint reduces the solution space that
is explored by the refactoring algorithm. In theory, it is possible to perform
the division between any two vertices, which may enhance the algorithm.
However, this may complicate the algorithm and increase the runtime.
Thus, in this section, we propose a method that can enhance REFACTOR
with a slight modification.
According to our investigation of several benchmark circuits, the antipodes
of many vertices are actually the supersink of the timing graph, and
Figure 2.5 illustrates how refactoring is performed in this case. In Figure 2.5,
the antipodes of both v1 and v2 are the supersink. Suppose the following
scenario: we divide the supersink by v1, which results in the quotient q1. Then,
in order to optimize the quotient q1, we divide the supersink by v2. Then we
continue to examine the quotient q2 for possible division.
From this example, we can see that REFACTOR optimizes quotients well in a
recursive manner, but divisors, represented by dotted ellipses in Figure 2.5,
may remain unoptimized. This is particularly problematic in circuits with a
high logic depth, because the recursion of refactoring is also deep and a large
part of the graph remains as it is. Thus, whenever REFACTOR performs division,
we optimize the divisor using a dedicated algorithm. The basic concept of the
algorithm is as follows. Conceptually, we turn the subgraph corresponding to
the divisor upside down and perform refactoring in the same way as REFACTOR,
except that the dividends must be the sink of the subgraph.
Figure 2.6: An example of divisor refactoring
Figure 2.6 illustrates the proposed approach. Suppose that REFACTOR
takes the timing graph in Figure 2.6(a) as input and decides to divide
the vertex t by the vertex v. Then, the graph corresponding to the divisor
and the quotient is shown in Figure 2.6(b), and the one corresponding to the
remainder is shown in Figure 2.6(c). In Figure 2.3, we focused on the quotient
and the remainder, but our focus here is the divisor, the arrival
time of v. The graph representation of the divisor is surrounded by a box in
Figure 2.6(b). The representation of the divisor is overturned, as shown
in Figure 2.6(d). REFACTOR with this restriction, which will be denoted by
DIVREFACTOR later, is started from v, and in all divisions, the dividend
must be s. Suppose that s is divided by w. Then, the quotient and the
divisor are represented as in Figure 2.6(e), and the remainder is represented
Algorithm 3 DIVREFACTOR
Input: a dag G with a single source s and a single sink t, at
Output: The latest arrival time in the s-ellipse graph
1: v ← t
2: Q ← ∅
3: rev_at[t] ← 0
4: while v = t or Q ≠ ∅ do
5:   divide ← false
6:   if v ≠ t and |N−(v)| ≥ 2 then
7:     propat ← rev_at[v] + at[v]




12:   if divide = true then
13:     rev_at[v] ← NIL
14:     rev_at[vt] ← propat where vt is a new vertex and d(vt,s) = 0
15:     FOt[s] ← FOt[s] ∪ {vt}
16:     insert s into Q if discovered for the first time
17:   else
18:     for each u ∈ N+(v) do
19:       insert u into Q if discovered for the first time
20:     end for
21:   end if
22:   v ← EXTRACT-MIN(Q)
23:   att ← 0
24:   for each u ∈ N+(v) ∪ FOt[v] do
25:     e ← (u, v)
26:     if rev_at[u] ≠ NIL then
27:       att ← MAX(att, de + rev_at[u])
28:     end if
29:   end for
30:   rev_at[v] ← att
31: end while
32: return rev_at[v]
as in Figure 2.6(f).
The overall procedure is summarized in Algorithm 3. In a run of
REFACTOR that is started from a vertex s, if we decide to perform division at
a vertex v, we can invoke DIVREFACTOR to optimize the divisor further, and
the subgraph between the vertices s and v (i.e., the graph corresponding to the
divisor) becomes the input for DIVREFACTOR. In DIVREFACTOR, we traverse
the input graph backward (i.e., in reverse topological order). Thus, for the
heap Q, the reverse topological order is used as the key (line 22). In
REFACTOR, before the actual division is performed for each vertex, we
pre-compute the quotient and store the sum of the quotient and the
corresponding divisor into refactored. The sum is denoted by propat in
REFACTOR. In DIVREFACTOR, if division is performed at a vertex v, the quotient
becomes at[v], which is already computed in REFACTOR before DIVREFACTOR is
invoked. Thus propat is calculated as in line 7. Unlike REFACTOR, this
procedure is not performed recursively, and the dividend is always the source s
of the input graph.
2.8 Experimental Results
We have implemented the proposed algorithm in C++, and have used
ISCAS85 circuits as benchmarks. All experiments were performed on a 3.0GHz
2 Xeon X5450 Linux machine. All benchmarks are technology-mapped to the
TSMC 180nm standard cell library, and the nominal delay annotated to each
edge in the timing graph is the sum of the gate and wire delays obtained from
the Standard Delay Format (SDF) file.
(a) Discrete PDF SSTA
Figure 2.7: The c.d.f. of c499 for γ = 20% shows the relative performance of
various criteria of refactoring.
Table 2.1: Numbers of literals (γ = 8%)
         static                        dynamic-lits
ckt      PERT            refactor      PERT            refactor
c17      40              40            20              20
c432     280595          4440          112630          178
c499     808784          19208         119640          3005
c880     29508           8438          898             76
c1355    808688          16360         134312          3281
c1908    4865369         42059         1485010         4092
c2670    129076          13785         9263            308
c3540    13688338        72487         904875          4425
c5315    1582121         53288         238192          3401
c6288    7.4985×10^16    100488501     9.1431×10^15    5373024
c7552    1479776         79065         9494            227
In this experiment, global and spatially correlated variation are not
considered (i.e., the number of global sources of variation is zero in canonical
first-order forms). The delay of each edge has the independent variation follow-
ing a Gaussian distribution, and the 3σ point is γ of the nominal delay (i.e.,
the sensitivity coefficient to the independent variation is γµ/3 in canonical
first-order forms).
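The variation model above can be illustrated with a small sketch. It builds the canonical form of an edge delay that has only an independent random term, using the relation that the 3σ point equals γ times the nominal delay. The class name CanonicalDelay, the helper functions and the nominal delays of 120 ps and 90 ps are assumptions made for illustration; they are not taken from the C++ implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class CanonicalDelay:
    """First-order canonical form with no global sources of variation."""
    mean: float         # nominal delay in ps
    sens_indep: float   # sensitivity to one unit-variance independent variable


def edge_delay(nominal_ps: float, gamma: float) -> CanonicalDelay:
    # The 3-sigma point equals gamma * nominal, so sigma = gamma * nominal / 3.
    return CanonicalDelay(nominal_ps, gamma * nominal_ps / 3.0)


def add(x: CanonicalDelay, y: CanonicalDelay) -> CanonicalDelay:
    # Independent terms of different edges combine in quadrature.
    return CanonicalDelay(x.mean + y.mean, math.hypot(x.sens_indep, y.sens_indep))


e1 = edge_delay(120.0, 0.20)   # hypothetical 120 ps edge, gamma = 20%
e2 = edge_delay(90.0, 0.20)    # hypothetical 90 ps edge, gamma = 20%
path = add(e1, e2)             # delay of the two edges in series
print(e1, e2, path)
```

With γ = 20%, the two edges have standard deviations of 8 ps and 6 ps, and their series connection has a standard deviation of 10 ps, since independent terms add in quadrature.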
Table 2.1 shows the numbers of literals of the PERT (original) and
refactored forms for the latest arrival time of each benchmark circuit. The
static refactoring technique is used for the results in the static section. In the
dynamic section, the PERT forms are obtained from the reduced root graph,
and the refactored forms are optimized for the number of literals (dynamic-
lits).
Figure 2.7(a) compares various criteria of refactoring in a discrete PDF
Table 2.2: Comparison with existing methods for ISCAS85 benchmarks (γ = 20%)
        | Monte Carlo   | Canonical model [71]     | EPCTM [87]                | Refactoring + [71]       | vs. EPCTM
ckt     | µ       σ     | ∆µ    ∆σ     CPU    mem  | ∆µ    ∆σ    CPU    mem    | ∆µ    ∆σ     CPU    mem  | CPU   mem
        | (ps)    (ps)  | (ps)  (ps)   (s)    (MB) | (ps)  (ps)  (s)    (MB)   | (ps)  (ps)   (s)    (MB) | (x)   (x)
c17     | 151.3   4.8   | 0.4   -0.2   0      1.2  | 0     -0.3  0      1.3    | 0.4   -0.2   0      1.2  | 1.0   1.0
c432    | 2329.4  31.1  | 26.7  -10.1  0.002  1.6  | -0.6  -2.3  0.028  9.9    | 0.3   0      0.005  1.5  | 5.6   6.5
c499    | 1983.1  17.6  | 9.9   -7.4   0.004  2.3  | -5.9  -6.1  0.178  52.9   | 0.3   -1.2   0.147  3.0  | 1.2   17.6
c880    | 2157.9  23.8  | 0.6   -1     0.002  2.1  | -1.1  -3    0.085  39.8   | 0.2   -0.1   0.006  1.9  | 14.2  20.8
c1355   | 2151.2  21.1  | 14.2  -9.3   0.004  2.3  | -5.5  -7.3  0.178  52.4   | 0.3   -1.4   0.138  2.9  | 1.3   18.1
c1908   | 3179.1  33.6  | 25.4  -12.9  0.004  2.2  | -3.7  -0.2  0.152  47.5   | 2.3   -3.3   0.057  2.4  | 2.7   20.1
c2670   | 2293    27.4  | 23.5  -9.4   0.007  3.1  | -2.3  -2.9  0.468  150.1  | 4.3   -2     0.029  3.0  | 16.1  50.2
c3540   | 3283.8  28.9  | 13.8  -9.5   0.008  3.4  | -2.5  -3.4  0.649  197.7  | 5     -3.5   0.147  4.9  | 4.4   40.6
c5315   | 3031    30.6  | 21.2  -10.6  0.014  5.0  | -0.6  -2.2  1.38   579.1  | 5.4   -2.8   0.073  4.8  | 18.9  121.5
c6288   | 9618.6  61.8  | 92.3  -30    0.016  6.5  | -3.7  -1.3  3.42   1187.4 | 42.7  -13.6  2.07   48.3 | 1.7   24.6
c7552   | 5196.3  60.4  | 37.6  -25.6  0.012  5.5  | -6    2.6   1.6    739.2  | 4.5   -3     0.044  4.8  | 36.4  153.0
average |               | 24.1  -11.4  0.006  3.2  | -     -     0.739  277.9  | 5.9   -2.8   0.246  7.1  | 9.4   43.1
Table 2.3: Comparison with existing methods for ISCAS85 benchmarks (γ = 8%)
        | Monte Carlo   | Canonical model [71]     | EPCTM [87]                | Refactoring + [71]       | vs. EPCTM
ckt     | µ       σ     | ∆µ    ∆σ     CPU    mem  | ∆µ    ∆σ    CPU    mem    | ∆µ    ∆σ     CPU    mem  | CPU   mem
        | (ps)    (ps)  | (ps)  (ps)   (s)    (MB) | (ps)  (ps)  (s)    (MB)   | (ps)  (ps)   (s)    (MB) | (x)   (x)
c17     | 150     2.3   | 0     0      0      1.2  | 0     0     0      1.3    | 0     0      0      1.2  | 1.0   1.0
c432    | 2311.8  14.3  | 9.3   -4.4   0.001  1.6  | 0.2   -0.5  0.027  9.9    | 0.2   -0.2   0.002  1.5  | 13.5  6.6
c499    | 1949.4  8.7   | 6.3   -3.4   0.004  2.3  | -0.6  -2    0.178  52.9   | 0.6   -0.7   0.058  2.5  | 3.1   21.5
c880    | 2145.5  10.3  | 0.2   -0.2   0.003  2.1  | 0.2   -1    0.104  39.8   | 0.3   -0.2   0.003  1.9  | 34.7  21.2
c1355   | 2113.6  10.2  | 7.9   -3.9   0.004  2.3  | 0     -2.9  0.151  52.4   | 0.3   -0.6   0.042  2.4  | 3.6   21.6
c1908   | 3165.4  15    | 7.6   -3.9   0.004  2.2  | 0.2   0.1   0.152  47.5   | 0.4   -0.1   0.057  2.3  | 2.7   20.9
c2670   | 2270.7  12.7  | 8.4   -4     0.007  3.1  | 0.2   -0.8  0.469  150.1  | 1.9   -0.6   0.011  2.7  | 42.6  54.9
c3540   | 3263.6  12.5  | 4.4   -2.4   0.008  3.4  | 0.1   -0.8  0.649  197.7  | 1.5   -0.8   0.05   3.4  | 13.0  59.0
c5315   | 3018.1  14.5  | 7.9   -4.3   0.014  5.0  | 0     -0.5  1.839  579.1  | 2.7   -2.2   0.037  4.5  | 49.7  129.8
c6288   | 9602.1  25.7  | 21.3  -8.9   0.021  6.5  | -0.6  -0.4  4.259  1187.4 | 7.4   -2.1   2.361  38.1 | 1.8   31.1
c7552   | 5186.9  26.6  | 5.2   -4.8   0.015  5.5  | 0.1   0     1.96   739.2  | 0.5   -0.6   0.025  4.6  | 78.4  160.6
average |               | 7.1   -3.7   0.007  3.2  | -     -     0.890  277.9  | 1.4   -0.7   0.241  5.9  | 22.2  48.0
SSTA. For some benchmarks, static refactoring is as good as dynamic
refactoring, but Figure 2.7(a) shows their relative performance in general.
Figure 2.7(b) compares them in SSTA with the canonical delay model [71] and
shows some distortion caused by the Gaussian approximation of the results of
maximum operations. The dynamic refactoring technique significantly improves
the accuracy of an arbitrary block-based SSTA.
The refactoring technique is compared against SSTA with the extended
pseudo-canonical timing model (EPCTM) proposed in [87]; Tables 2.2 and 2.3
show the cases where γ = 20% and γ = 8%, respectively. We have implemented
two versions of [87], but since the one using a sparse matrix library is
worse than the other, simpler implementation in both CPU time and memory
consumption, we present only the latter. Ignoring topological correlation
increases the means of the latest arrival times and decreases the standard
deviations, as denoted by the minus signs. The EPCTM is optimized for
accuracy and can use less CPU time and less memory if the accuracy is
compromised. The refactoring technique preserves the rule of conservative
estimation in static timing analysis by construction, while EPCTM can
overestimate the topological correlation of arrival times, which often
results in optimistic latest arrival times, as shown for all benchmark
circuits in Table 2.2 and for c499, c1908 and c6288 in Table 2.3. Note that
the refactoring technique consumes significantly less memory than EPCTM
since it has linear space complexity.
Table 2.4 shows the enhancement by DIVREFACTOR. We perform
static refactoring for 4 benchmark circuits, and the number of literals and the
Figure 2.8: The PDFs of the latest arrival time of c6288
Table 2.4: Improvement by Algorithm 3
         refactoring               refactoring-D             reduction
ckt      #literals, A   CPU (s)    #literals, B   CPU (s)    (A − B)/A
c3540    72487          0.250      65795          1.300      9.23%
c5315    53288          0.258      48612          0.678      8.77%
c6288    100488501      1.684      58044667       13.822     42.24%
c7552    79065          0.447      72819          1.860      7.9%
runtime are shown in the second and the third columns. Then we perform
static refactoring where DIVREFACTOR is added (denoted by refactoring-
D), and the results are shown in the fourth and fifth columns. The reduction
of the number of literals by DIVREFACTOR is shown in the last column. By
additionally invoking DIVREFACTOR, we traverse divisors again and explore
more candidate solutions. As a result, we optimize the number of literals fur-
ther at a small cost of runtime. However, compared to the drastic reduction in
Table 2.1, the improvement by DIVREFACTOR is marginal for the following
reasons:
• Refactoring has eliminated numerous replicated literals already, and the
diminishing return is natural. Given that refactoring captures topolog-
ical correlation efficiently but it cannot be complete, we may be very
close to the limit we can reach.
• In the timing graph, the number of literals of arrival times at a low
level is usually smaller than that at a high level. Thus, the numbers
of replicated literals of divisors are usually small so there is not much
room for improvement. However, they are not that small in circuits with
many logic levels. Thus, the reduction in c6288 is decent compared to
the other circuits.
The second reason conversely implies that in most circuits, the original REFAC-
TOR produces good results even in the scenario that concerns us in Section 2.7.
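The last column of Table 2.4 can be reproduced from the literal counts as the relative reduction (A − B)/A. The short sketch below does this arithmetic; the dictionary and function names are illustrative only.

```python
# Literal counts copied from Table 2.4: (A = refactoring, B = refactoring-D).
LITERALS = {
    "c3540": (72487, 65795),
    "c5315": (53288, 48612),
    "c6288": (100488501, 58044667),
    "c7552": (79065, 72819),
}


def reduction_percent(a: int, b: int) -> float:
    """Relative reduction in the number of literals, (A - B) / A, in percent."""
    return 100.0 * (a - b) / a


reductions = {ckt: reduction_percent(a, b) for ckt, (a, b) in LITERALS.items()}
for ckt, r in reductions.items():
    print(f"{ckt}: {r:.2f}%")
```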
Figure 2.8 shows the probability density functions (PDFs) of the latest
arrival time of c6288. As demonstrated in Tables 2.2 and 2.3, c6288 has the
most complex topological correlation, which is very difficult to capture.
For this experiment, dynamic refactoring is employed. With a small amount of
variation, refactoring combined with DIVREFACTOR results in almost the same
PDF as Monte Carlo simulation even for c6288, as shown in Figure 2.8(a).
With a large amount of variation, the improvement by DIVREFACTOR becomes
noticeable, but obtaining the exact PDF in this case remains a challenging
problem, as shown in Figure 2.8(b).
2.9 Conclusions
We have proposed a novel refactoring algorithm, and our experiments
have shown its improvements over an existing algorithm in terms of accuracy,
speed and memory consumption. In addition to the improvements in SSTA,
the refactoring algorithms provide a new interpretation of timing graphs. We
believe that the refactoring algorithms can be applied to more timing-related
EDA applications where reconvergent paths degrade the quality of results such
as critical path selection on SSTA [76] and timing optimization (e.g. buffer
insertion). If each gate and wire delay varies as design alternatives, then the
process that estimates the change of the latest arrival time by a set of design




Path Criticality Computation in
Parameterized Statistical Timing Analysis
3.1 Introduction
The timing critical path in a chip is of great interest for almost ev-
ery step involved in producing VLSI chips, from design to testing. Since the
critical path changes from die to die in the presence of process variations, po-
tentially critical paths are of importance in a design. During the design phase,
a set of potentially critical paths enables path-based SSTA to be performed,
which allows us to estimate parametric timing yield more accurately [57, 58].
If the design does not meet a desired yield level, the paths in the set can be op-
timized. In adaptive body bias or adaptive supply voltage techniques [45], the
potentially critical paths are often replicated in order to estimate the minimum
operating voltage without modifying the design. At the end points of these
paths, we can insert double-sampling flops for dynamic voltage scaling [24]
and/or observable points for debugging as in [81]. For post-silicon validation
and manufacturing test, these paths can be targeted by an ATPG tool, which
can generate the test vectors sensitizing the paths, and the test vectors can
be used to determine the operating frequency (speed-binning) of the chip or to
check if manufactured chips can be operated at the desired frequency.
Better prediction of the critical path provides major benefits in all
these applications, but due to increasing process variability it is becoming
more difficult to predict a small set of potentially critical paths in the pre-
silicon phases. However, the advent of statistical timing analysis provides a
tool to deal with the variability [7]. Recent statistical static timing analysis
algorithms [60, 40, 10, 71] are able to capture complex correlations of path
delays, which offers a chance of efficiently narrowing down potentially critical
paths even with the increased variability.
In order to construct a set of potentially critical paths precisely, it is
necessary to accurately compute the path criticality, which is the probability
that a path becomes the critical path. Several approaches have been proposed
to calculate the path criticality. In [71], the authors consider, for each
edge in the timing graph, the event that the edge determines the arrival time
at its head (the terminal vertex). If these events occur for all the edges on
a path, the path becomes critical. Since these events are assumed to be
independent, the path criticality is calculated by multiplying the
probabilities of the events. However, since the global sources of variation
as well as topological correlation make them correlated events, the
independence assumption causes quite a large error. To handle this issue, the
authors in [83] consider the joint probability of the correlated events, and
the probability is evaluated using the max operator provided by the SSTA
framework. In [77], the path criticalities are obtained by calculating the
probability that each path delay is greater than the latest arrival time. If
the max operator and the arrival times in SSTA are accurate,
both approaches can provide accurate criticality values. However, current
block-based SSTA algorithms compromise the accuracy for reducing run times;
they usually ignore topological correlation and use a linear approximation to
the nonlinear max operation. Due to these issues, both approaches are still
far from computing precise values. Some issues in computing criticality using
the max operator are demonstrated in detail in [54].
The major contributions of this chapter can be summarized as follows.
• We propose a novel way to divide all paths in the circuit into several
disjoint sets. The partitioning method allows us to easily calculate the
complement path delay, the maximum of all the other path delays ex-
cluding the delay of a given path. The cutset method, which is widely
used for this type of partitioning, has difficulty in calculating the com-
plement path delay [77].
• Unlike [83], our delay model takes random independent variation into
account. Under this model, the path criticality becomes more difficult
to calculate because topological correlation of the random independent
component causes large error. In order to deal with this issue, we perform
algebraic elimination in our path criticality formulation, reducing such
a type of error.
• Instead of relying on the inaccurate max operator as in [77, 83] or Monte
Carlo sampling as in [54], we propose a novel, analytic conditioning
operation in parameterized SSTA. For this, we also develop a general result
on conditional statistics by extending Clark's work [17]. Unlike [71], our
computation method does not use the independence assumption.
The rest of the chapter is organized as follows. Section 1.3 introduces
current SSTA techniques. Section 3.2 defines the path criticality and presents
issues with the existing work. Section 3.3 presents our proposed method in
detail. Section 3.4 discusses the normal approximation error of the proposed
method. Section 3.5 shows experimental results, and Section 3.6 concludes the
chapter.
3.2 Path Criticality
Let Ω denote the set of all the paths in the circuit. The delay between
two vertices (including path delays) is denoted by d.
Definition 4. A path p is critical if d(p) ≥ d(s) for all s ∈ Ω.
Definition 5. The criticality of a path p is the probability that the path p is
critical.
The criticality is denoted by λ. Thus we can write
\[ \lambda(p) = P\Big(\bigcap_{s \in \Omega} d(p) \ge d(s)\Big). \tag{3.1} \]
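Definition 5 can be made concrete with a brute-force Monte Carlo estimate: sample all edge delays, find the path with the largest delay, and count how often each path wins. The sketch below does this for a three-path graph with paths p1 = (a, d), p2 = (b, d) and p3 = (c), which is the structure suggested by the example equations of this section; the function name, the unit standard deviations and the sample count are assumptions made for illustration.

```python
import random


def criticalities(paths, edge_sigma, n_samples=20000, seed=1):
    """Estimate, for every path, the probability that its delay is the largest.

    paths:      dict mapping a path name to the list of its edge names
    edge_sigma: dict mapping an edge name to the standard deviation of its
                zero-mean, independent, normally distributed delay
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in paths}
    for _ in range(n_samples):
        sample = {e: rng.gauss(0.0, s) for e, s in edge_sigma.items()}
        delays = {name: sum(sample[e] for e in edges)
                  for name, edges in paths.items()}
        wins[max(delays, key=delays.get)] += 1   # the critical path of this sample
    return {name: w / n_samples for name, w in wins.items()}


paths = {"p1": ["a", "d"], "p2": ["b", "d"], "p3": ["c"]}
sigma = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
lam = criticalities(paths, sigma)
print(lam)
```

Because exactly one path is critical in every sample, the estimated criticalities sum to one, and p1 and p2 receive nearly equal values by symmetry. Such sampling is only a reference; the rest of the chapter is about avoiding it.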
One of the computation methods proposed in [77] obtains the path criticality
by
\[ \lambda(p) = P\big(d(p) \ge \max_{s \in \Omega}\{d(s)\}\big). \tag{3.2} \]
We can obtain max_{s∈Ω}{d(s)} from SSTA. Then, Eq. (3.2) is evaluated
efficiently using Eq. (1.2), and this single comparison directly determines
the path criticality. In this method, although the operands of the maximum
include the path delay being compared, this correlation is ignored in the
computation. To demonstrate this, we consider an example from Figure 3.1.
Figure 3.1: A simple timing graph with edge delays of canonical forms. The
mean values are assumed zero for illustration purpose.
For this example, we assume that a1 = 0, b1 = 0, c1 = 0, d1 = 0 and compute
the criticality of p1. The method of [77] is intended to evaluate
λ(p1) = P (∆Ra + ∆Rd ≥ max{∆Ra + ∆Rd, ∆Rb + ∆Rd, ∆Rc}). (3.3)
In parameterized SSTA, the variables on the left hand side in the inequality
are lumped into one variable to represent it in the canonical form. Similarly,
the maximum on the right hand side is lumped into one variable. So if we let
∆Rx = ∆Ra + ∆Rd and ∆Ry = max{∆Ra + ∆Rd,∆Rb + ∆Rd,∆Rc}, then
we can write
λ(p1) = P (∆Rx ≥ ∆Ry). (3.4)
While ∆Rx and ∆Ry are correlated (due to the topology), SSTA assumes that
they are independent. To reduce this error, the authors in [77] also propose
the re-construction method, which excludes the path delay from the maximum
in constant time. If the re-construction method is used, we can compute the
path criticality of p1 by
λ(p1) = P (∆Ra + ∆Rd ≥ max{∆Rb + ∆Rd, ∆Rc}). (3.5)
In this case, the left-hand side and the right-hand side of the inequality
become less correlated, so the accuracy can be improved. However, it still
ignores the correlation induced by ∆Rd. This happens even when there is no
reconvergent path, as in this example.
In addition to ignoring the topological correlation, both methods of [77]
rely heavily on the max operator. The max operator in SSTA incurs some
error in fitting the non-linear result of the maximum operation into a single
linear form. Both methods represent the maximum of numerous path delays as a
single canonical form, and the path criticality depends excessively on that
single canonical form, incurring much error. Consider another example from
Figure 3.1. For this example, we assume that a2 = 0, b2 = 0, c2 = 0, d2 = 0.
Then, the re-construction method computes the path criticality by
λ(p1) = P ((a1 + d1)∆X1 ≥ max{(b1 + d1)∆X1, c1∆X1}). (3.6)
Figure 3.2: Depending on the mean values of d(p1) and d(p2) (not shown in the
equations) and the values b1, d1, c1, we consider two cases. As in case 1, if
one path delay dominates the other path delay, the approximation is accurate.
However, it incurs some error in case 2.
Figure 3.2 illustrates the linear approximation error of the maximum on the
right-hand side of the inequality. The process that fits the non-linear curve
into a line loses some information. We can retain all the information simply
by using two canonical forms to represent the two path delays. To keep all
the information for computing the criticality in this way, we may need as
many canonical forms as there are paths in the circuit. However, since there
are usually numerous paths in a circuit, it is practically impossible to use
that many canonical forms. Thus, we cannot avoid the use of the max operator
completely, but we introduce a new operation that allows us to compute the
path criticality from a desired number of canonical forms.
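The two cases discussed above can be reproduced numerically. The sketch below uses Clark's well-known moment-matching formulas for the maximum of two normal random variables and measures how far the matched normal is from the exact distribution of the maximum. It is a simplified setting, not the example of Figure 3.2 itself: X and Y are assumed independent so that the exact CDF is a simple product, and all numeric values and function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1, s1, mu2, s2, rho=0.0):
    """Mean and standard deviation of max(X, Y) by Clark's moment matching."""
    a = math.sqrt(s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2)
    alpha = (mu1 - mu2) / a
    m1 = mu1 * Phi(alpha) + mu2 * Phi(-alpha) + a * phi(alpha)
    m2 = ((mu1 * mu1 + s1 * s1) * Phi(alpha)
          + (mu2 * mu2 + s2 * s2) * Phi(-alpha)
          + (mu1 + mu2) * a * phi(alpha))
    return m1, math.sqrt(max(m2 - m1 * m1, 0.0))


def max_cdf_error(mu1, s1, mu2, s2, steps=2000):
    """Largest gap between the exact CDF of max(X, Y), for independent X and Y,
    and the CDF of the moment-matched normal."""
    m, s = clark_max(mu1, s1, mu2, s2)
    worst = 0.0
    for i in range(steps + 1):
        t = m - 6.0 * s + 12.0 * s * i / steps
        exact = Phi((t - mu1) / s1) * Phi((t - mu2) / s2)
        worst = max(worst, abs(exact - Phi((t - m) / s)))
    return worst


err_dominant = max_cdf_error(5.0, 1.0, 0.0, 0.1)   # case 1: one delay dominates
err_close = max_cdf_error(0.0, 1.0, 0.0, 0.1)      # case 2: comparable means
print(err_dominant, err_close)
```

When one operand dominates, the maximum is essentially that operand and the normal fit is nearly exact; when the means are comparable, the fitted normal departs visibly from the true distribution, which is the information loss described above.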
3.3 Proposed Algorithm
We partition Ω into m groups and compare the delay of the given path
to the maximum of all the path delays in each group. Then, the path criticality
can be written as
\[ \lambda(p) = P\Big(\bigcap_{i=1}^{m} d(p) \ge U_i\Big) \tag{3.7} \]
where Ui is the maximum of all the path delays in the i-th group. The way
that we partition Ω allows us to compute Ui from several arrival times very
efficiently, so we do not have to enumerate paths to calculate Ui. Instead of
computing (3.7) directly, we re-write it by performing algebraic elimination
on the m inequalities. This process improves the accuracy by handling
topological correlation.
3.3.1 A Simple Timing Graph
Figure 3.3: A simple timing graph
In this subsection, we demonstrate the first two steps of the proposed
algorithm with a simple example. Let us consider a simple timing graph shown
in Figure 3.3. In this timing graph, Ω is {p1, p2, p3, p4} which is partitioned into
3 groups. We are to compute the criticality of the path p1. The comparison
with group 1 is trivially true so we can write
λ(p1) = P (V +W ≥ max(X +W,Y +W ) ∩ V +W ≥ Z). (3.8)
We can re-write it using the distributivity of addition over maximum and
performing algebraic elimination as follows.
P (V +W ≥ max(X +W,Y +W ) ∩ V +W ≥ Z)
= P (V +W ≥ max(X, Y ) +W ∩ V +W ≥ Z)
= P (V ≥ max(X, Y ) ∩ V +W ≥ Z)
(3.9)
As a result, W on both sides of the first inequality is eliminated. We let
A1 = V, B1 = max(X, Y), A2 = V + W and B2 = Z. Then, we can write the
path criticality of p1 as
λ(p1) = P (A1 ≥ B1 ∩ A2 ≥ B2) (3.10)
Note that there is no topological correlation between A1 and B1, or between
A2 and B2. Also note that this algebraic elimination is not done at runtime
but is performed implicitly in the way we formulate the path criticality.
This will be clarified in the following subsection.
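The benefit of the elimination can be checked in closed form for a single comparison. In the sketch below, V, X and W are independent normal delays with hypothetical moments. The probability P(V + W ≥ X + W) is computed three ways: exactly (with the covariance induced by the shared W), naively (treating both sides as independent, as a block-based comparison would), and after eliminating W. The helper names and numbers are assumptions for illustration.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_ge(mu_l, var_l, mu_r, var_r, cov=0.0):
    """P(L >= R) for jointly normal L and R with the given moments."""
    a = math.sqrt(var_l + var_r - 2.0 * cov)
    return Phi((mu_l - mu_r) / a)


# Hypothetical independent delays: (mean, variance).
mu_v, var_v = 10.0, 1.0
mu_x, var_x = 9.0, 1.0
mu_w, var_w = 20.0, 4.0   # shared by both sides of the inequality

# Exact: the shared W contributes cov[V + W, X + W] = var[W].
exact = prob_ge(mu_v + mu_w, var_v + var_w, mu_x + mu_w, var_x + var_w, cov=var_w)
# Naive: the topological correlation through W is ignored.
naive = prob_ge(mu_v + mu_w, var_v + var_w, mu_x + mu_w, var_x + var_w)
# After algebraic elimination of W the two sides are truly independent.
eliminated = prob_ge(mu_v, var_v, mu_x, var_x)

print(exact, naive, eliminated)
```

The eliminated form agrees with the exact value without needing any covariance term, whereas the naive evaluation underestimates the probability here because the variance of the shared W is counted twice.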
3.3.2 A General Timing Graph
In this subsection, we show how Ω is partitioned and how the algebraic
elimination is performed in a general timing graph. Our computation algorithm
takes a timing graph as input. In the beginning, we add a source, and the
source is connected to each primary input via an edge whose delay is the
arrival time at the primary input. We also add a sink, and the sink is
connected to each primary output and each timing test vertex via an edge. The
delay of such an edge is the negative of the required arrival time of the
corresponding vertex. Then all paths in the timing graph start from the
source and end at the sink. The source and the sink are denoted by v0 and w,
respectively. This timing graph is called an augmented timing graph, and its
construction from a circuit is well illustrated in [77]. We perform
parameterized statistical static timing analysis on the augmented timing
graph, which produces the arrival time of each vertex. The arrival time is
denoted by at. We are given a path p = (v0, ..., vt = w) in the timing graph
and are to compute the path criticality λ(p). According to the path p, we
partition Ω (the set of all the paths in the timing graph) into t + 1 groups
(i.e., m = t + 1). Note that the partitioning depends on the given path p
and, similar to the algebraic elimination, it is not done at runtime but is
implicit in our path criticality formulation. The t + 1 groups are denoted by
G0, ..., Gt. The group G0 contains the path p. The group G1 contains all the
paths that pass through (v1, ..., vt), excluding the paths in G0. Similarly,
the group G2 contains all the paths that pass through (v2, ..., vt),
excluding the paths in G0 and G1. Generally, for k = 1, ..., t, the group Gk
contains all the paths that pass through (vk, ..., vt), excluding the paths
in G0, ..., Gk−1. These groups are mutually exclusive and their union is Ω
(i.e., the groups form a partition of Ω). This partitioning is illustrated in
Figure 3.4. In this example, G1 does not contain any path since v1 has only
one incoming edge. If we remove all the vertices with only one incoming edge
by merging them with their direct predecessor, this case does not happen.
Figure 3.4: Given a path p, we can partition Ω as above. The top figure (group
0) shows the given path p.
Thus, without loss of generality, we assume that every vertex has two or more
incoming edges (i.e., no group is empty), and we write the following
derivation without handling such a case specially. Let Uk be the maximum of
all the path delays in the group Gk. Then U0 = d(p), and for k = 1, ..., t,
we can easily compute Uk by
\[ U_k = \max_{u \in N^-(v_k)} \{\, at(u) + d(u, v_k) + d(v_k, \ldots, v_t) \mid u \ne v_{k-1} \,\}. \tag{3.11} \]
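where N− denotes the set of direct predecessors. Using the algebraic
elimination as in (3.9), we obtain
\[ \lambda(p) = P\Big(\bigcap_{k=1}^{t} A_k \ge B_k\Big) \tag{3.12} \]
where
\[ A_k = d(v_0, \ldots, v_k) \tag{3.13} \]
\[ B_k = \max_{u \in N^-(v_k)} \{\, at(u) + d(u, v_k) \mid u \ne v_{k-1} \,\}. \tag{3.14} \]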
Since we put the path p into a separate group (G0) and the comparison d(p) ≥
U0 is always true, we can exclude the case k = 0 in this step.
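The partitioning and the computation of Uk from arrival times can be checked on a small deterministic example. The sketch below uses a hypothetical augmented timing graph with fixed (non-statistical) edge delays, enumerates all paths only to verify the result, and compares the enumerated group maxima with the values obtained from Eq. (3.11). The graph, its delays and all function names are assumptions for illustration.

```python
import math

# Hypothetical augmented timing graph with deterministic edge delays.
EDGES = {("v0", "a"): 1.0, ("v0", "b"): 2.0, ("a", "b"): 0.5, ("a", "c"): 3.0,
         ("b", "c"): 1.5, ("b", "w"): 2.5, ("c", "w"): 1.0}
TOPO = ["v0", "a", "b", "c", "w"]   # a topological order; w is the sink


def preds(v):
    return [u for (u, x) in EDGES if x == v]


def arrival_times():
    at = {"v0": 0.0}
    for v in TOPO[1:]:
        at[v] = max(at[u] + EDGES[(u, v)] for u in preds(v))
    return at


def all_paths(prefix=("v0",)):
    """Enumerate all source-to-sink paths (used for verification only)."""
    if prefix[-1] == "w":
        yield prefix
        return
    for (u, x) in EDGES:
        if u == prefix[-1]:
            yield from all_paths(prefix + (x,))


def delay(q):
    return sum(EDGES[e] for e in zip(q, q[1:]))


def group_index(q, p):
    """Smallest k such that q ends with (v_k, ..., v_t); k = 0 only for q == p."""
    for k in range(len(p)):
        if q[-(len(p) - k):] == p[k:]:
            return k


def group_maxima(p):
    """U_k for k = 1..t computed from arrival times, as in Eq. (3.11)."""
    at = arrival_times()
    U = {}
    for k in range(1, len(p)):
        cands = [at[u] + EDGES[(u, p[k])] + delay(p[k:])
                 for u in preds(p[k]) if u != p[k - 1]]
        U[k] = max(cands, default=-math.inf)   # -inf marks an empty group
    return U


p = ("v0", "a", "c", "w")
U = group_maxima(p)
groups = {}
for q in all_paths():
    groups.setdefault(group_index(q, p), []).append(q)
print(U, groups)
```

In this graph, G1 is empty because the vertex a has a single incoming edge, which mirrors the remark about Figure 3.4; the other groups are non-empty and their enumerated maxima coincide with the values derived from the arrival times.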
In parameterized SSTA, since all timing quantities are represented in
the canonical form, Ai and Bi in (3.12) are of the canonical form, and the
key problem becomes the accurate calculation of the joint probability. This
calculation can be done by numerical integration, but the computational
complexity of multi-dimensional numerical integration is prohibitive.
Alternatively, the joint probability can be evaluated with the max operator
by re-writing it as
\[ P\Big(\bigcap_{i=1}^{t} A_i \ge B_i\Big) = P\Big(\min_{i=1,\ldots,t}\{A_i - B_i\} \ge 0\Big) = P\Big(\max_{i=1,\ldots,t}\{-A_i + B_i\} \le 0\Big). \tag{3.15} \]
However, similar to (3.2), this method also squeezes many timing quantities
into a single canonical form, which incurs large approximation error. The
problem of the linear approximation error has been raised in many statistical
timing studies [83, 54, 85], but to compute a joint probability more accurately,
all these studies employ Monte Carlo simulation in the end. In an environment
with a small number of sources of variation, Monte Carlo simulation can
improve the accuracy at a reasonable runtime overhead. However, considering
the increasing number of sources of variation in nanometer technologies, the
runtime overhead can be excessive. Therefore, we develop an analytic method
to compute the joint probability efficiently.
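3.3.3 Conditional Probability Computation
By the chain rule of conditional probability, the joint probability in (3.12)
can be expanded as
\[ \begin{aligned} P\Big(\bigcap_{k=1}^{t} A_k \ge B_k\Big) = {} & P(A_1 \ge B_1) \cdot P(A_2 \ge B_2 \mid A_1 \ge B_1) \\ & \cdot P(A_3 \ge B_3 \mid A_1 \ge B_1, A_2 \ge B_2) \\ & \cdots \\ & \cdot P(A_t \ge B_t \mid A_1 \ge B_1, A_2 \ge B_2, \cdots, A_{t-1} \ge B_{t-1}). \end{aligned} \tag{3.16} \]
We define an event Sk as
Sk ≡ A1 ≥ B1, A2 ≥ B2, ..., Ak ≥ Bk. (3.17)
Then, with S0 taken as the certain event, (3.16) can be written as
\[ P\Big(\bigcap_{k=1}^{t} A_k \ge B_k\Big) = \prod_{k=1}^{t} P(A_k \ge B_k \mid S_{k-1}). \tag{3.18} \]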
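We also define conditional random variables Ac,k and Bc,k as
\[ A_{c,k} \equiv A_k \mid S_{k-1}, \qquad B_{c,k} \equiv B_k \mid S_{k-1}. \tag{3.19} \]
Then the path criticality becomes
\[ \lambda(p) = \prod_{k=1}^{t} P(A_{c,k} \ge B_{c,k}), \tag{3.20} \]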
















Then according to the definition of Ak (3.13), we obtain





















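i . We represent the changes of ∆Xi by conditional random variables
\[ \begin{aligned} \Delta Y_i^{(1)} &= \Delta X_i \\ \Delta Y_i^{(2)} &= \Delta X_i \mid A_1 \ge B_1 \\ \Delta Y_i^{(3)} &= \Delta X_i \mid A_1 \ge B_1, A_2 \ge B_2 \\ &\ \ \vdots \\ \Delta Y_i^{(t)} &= \Delta X_i \mid A_1 \ge B_1, A_2 \ge B_2, \ldots, A_{t-1} \ge B_{t-1}. \end{aligned} \tag{3.23} \]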






























d |A1 ≥ B1, ..., At−1 ≥ Bt−1).
(3.24)
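Using the conditional random variables, Ac,k is represented as
\[ A_{c,k} = g_0 + \sum_{i=1}^{n} g_i \Delta Y_i^{(k)} + \Delta Z_d^{(k)}. \tag{3.25} \]
We obtain Bk according to (3.14) and denote it as
\[ B_k = h_0 + \sum_{i=1}^{n} h_i \Delta X_i + h_{n+1} \Delta R_b. \tag{3.26} \]
Then, Bc,k is represented as
\[ B_{c,k} = h_0 + \sum_{i=1}^{n} h_i \Delta Y_i^{(k)} + h_{n+1} \Delta R_b. \tag{3.27} \]
If we obtain ∆Y_i^{(k)} and ∆Z_d^{(k)} for k = 1, ..., t, then we can also
get Ac,k and Bc,k for k = 1, ..., t.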
From (3.23), we can obtain a recursive formula which is
\[ \begin{aligned} \Delta Y_i^{(k+1)} &= \Delta X_i \mid A_1 \ge B_1, \cdots, A_k \ge B_k \\ &= \Delta Y_i^{(k)} \mid A_{c,k} \ge B_{c,k}. \end{aligned} \tag{3.28} \]











d |A1 ≥ B1, · · · , Ak ≥ Bk)
=(∆Z
(k)




d |Ac,k ≥ Bc,k)
=(∆Z
(k)







The last equality follows from the fact that ∆R_d^{(k+1)} is independent of both Ak














when we assume that the kth random variables are normally distributed. This
assumption is common in SSTA to re-use the output of an operation as input.
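Theorem 3. Let X, Y, T and U be normally distributed with expected values
µx, µy, µt, and µu and variances σ²x, σ²y, σ²t, and σ²u. Then,
\[ E[T \mid X > Y] = \mu_t + \beta\,(\mathrm{cov}[X,T] - \mathrm{cov}[Y,T])/a \tag{3.30} \]
and
\[ \mathrm{cov}[T,U \mid X > Y] = \mathrm{cov}[T,U] - (\beta^2 + \alpha\beta)(\mathrm{cov}[X,T] - \mathrm{cov}[Y,T])(\mathrm{cov}[X,U] - \mathrm{cov}[Y,U])/a^2 \tag{3.31} \]
where a² = σ²x + σ²y − 2cov[X, Y ],
α = (µx − µy)/a and β = ϕ(α)/Φ(α).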
Proof. The proof is presented in Appendix A.
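The conditional moments of Theorem 3 are straightforward to evaluate. The sketch below implements the conditional mean (3.30) and a conditional covariance in the form implied by the moments of a truncated normal variable, and checks both on a case with a known answer: if T = U = X is standard normal and Y is the constant 0, then X given X > 0 is half-normal with mean sqrt(2/π) and variance 1 − 2/π. The function and argument names are assumptions for illustration.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_moments(mu_x, mu_y, var_x, var_y, cov_xy,
                        mu_t, cov_xt, cov_yt, cov_tu, cov_xu, cov_yu):
    """Return (E[T | X > Y], cov[T, U | X > Y]) for jointly normal inputs."""
    a = math.sqrt(var_x + var_y - 2.0 * cov_xy)   # std. deviation of X - Y
    alpha = (mu_x - mu_y) / a
    beta = phi(alpha) / Phi(alpha)
    d_t = cov_xt - cov_yt                         # cov[X - Y, T]
    d_u = cov_xu - cov_yu                         # cov[X - Y, U]
    mean_t = mu_t + beta * d_t / a
    cov_tu_c = cov_tu - (beta * beta + alpha * beta) * d_t * d_u / (a * a)
    return mean_t, cov_tu_c


# Sanity case: T = U = X ~ N(0, 1) and Y = 0, so X | X > 0 is half-normal.
mean, var = conditional_moments(0.0, 0.0, 1.0, 0.0, 0.0,
                                0.0, 1.0, 0.0, 1.0, 1.0, 0.0)
print(mean, var)
```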
Let (∆Y_1^{(k)}, ..., ∆Y_n^{(k)}, ∆Z_d^{(k)}) ∼ N(Mk, Ck). To apply this
theorem, we need the characteristics of Ac,k and Bc,k (corresponding to X and
Y). If we have Mk and Ck, we can characterize Ac,k and Bc,k by
\[ E[A_{c,k}] = g_0 + [g_1 \ldots g_n\ 1]\,M_k, \tag{3.32} \]
\[ E[B_{c,k}] = h_0 + [h_1 \ldots h_n\ 0]\,M_k, \tag{3.33} \]
\[ \mathrm{var}[A_{c,k}] = [g_1 \ldots g_n\ 1]\,C_k\,[g_1 \ldots g_n\ 1]^T, \tag{3.34} \]
\[ \mathrm{var}[B_{c,k}] = [h_1 \ldots h_n\ 0]\,C_k\,[h_1 \ldots h_n\ 0]^T + h_{n+1}^2, \tag{3.35} \]
\[ \mathrm{cov}[A_{c,k}, B_{c,k}] = [g_1 \ldots g_n\ 1]\,C_k\,[h_1 \ldots h_n\ 0]^T. \tag{3.36} \]
Hence, Mk and Ck result in Mk+1 and Ck+1 by (3.30), (3.31) and the two




we can obtain Mk and Ck for all k = 2, ..., t. Finally, substituting (3.32-3.36)
into (1.2) yields P (Ac,k ≥ Bc,k) for all k = 1, ..., t, and we obtain the path
criticality by (3.20). If we let f_{n+1}^{(t+1)} = 0, Ac,t+1 becomes the
conditional delay of p given that p is the critical path. The conditional
distribution can be obtained from (3.32) and (3.34).
3.3.4 Summary
Algorithm 4 summarizes the overall computation procedure. In Algorithm 4,
A, B and d are represented in the standard canonical form, and the
addition and the maximum are the statistical operations provided by
parameterized SSTA (lines 4 and 10). The function SensitivityVector returns
the sensitivity values to each source of variation from a timing quantity
represented in the canonical form (lines 5, 13, and 14). The random variable
Ac,i (Bc,i) is the conditional random variable of Ai (Bi) given A1 ≥ B1, ...,
Ai−1 ≥ Bi−1, and their moments are computed by (3.32-3.36) (lines 15 and 16).
Theorem 3 is applied to every source of variation, and the two equations in
line 21 and




Input: p = (v0, ..., vt)
Output: λ(p)
1: A0 ← 0, λ ← 1
2: M1 ← 0, C′1 ← diag([1 ... 1 0])
3: for i = 1, ..., t do
4:   Ai ← Ai−1 + d(vi−1, vi)
5:   [f1, ..., fn, fn+1] ← SensitivityVector(d(vi−1, vi))
6:   Ci ← C′i + diag([0 ... 0 f²n+1])
7:   Bi ← 0
8:   for each direct predecessor u of vi do
9:     if u ≠ vi−1 then
10:      Bi ← max{Bi, at(u) + d(u, vi)}
11:    end if
12:  end for
13:  [g1, ..., gn, gn+1] ← SensitivityVector(Ai)
14:  [h1, ..., hn, hn+1] ← SensitivityVector(Bi)
15:  compute E[Ac,i], var[Ac,i]
16:  compute E[Bc,i], var[Bc,i], cov[Ac,i, Bc,i]
17:  a ← sqrt(var[Ac,i] + var[Bc,i] − 2cov[Ac,i, Bc,i])
18:  α ← (E[Ac,i] − E[Bc,i])/a, β ← ϕ(α)/Φ(α)
19:  λ ← λ × Φ(α)
20:  D ← Ci([g1 ... gn 1] − [h1 ... hn 0])^T
21:  Mi+1 ← Mi + βD/a
22:  C′i+1 ← Ci − (β² + αβ)DD^T/a²
23: end for
24: return λ
While Algorithm 4 is specially optimized for the path criticality
computation, we can easily modify it for the general purpose of computing a
joint probability of normal random variables; Algorithm 4 implicitly involves
the cumulative distribution function of multivariate normal random variables
represented in the canonical form (i.e., the multidimensional Q-function).
Thus, this technique can be employed in many other applications of
statistical timing analysis.
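The general-purpose use mentioned above can be sketched as follows. The function approximates the probability that several linear inequalities over a jointly normal vector hold simultaneously, by conditioning on one inequality at a time and re-fitting a normal distribution after each step with the moment updates of Theorem 3. This is a simplified illustration rather than Algorithm 4 itself: it carries one dense covariance matrix instead of exploiting the canonical form, it assumes every constraint has non-zero variance and non-negligible probability, and the function name and interface are hypothetical.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def joint_probability(mean, cov, constraints):
    """Approximate P(w.z + c >= 0 for every (w, c)) with z ~ N(mean, cov).

    Each inequality A_k >= B_k is assumed to be rewritten as w.z + c >= 0,
    where z stacks all underlying normal variables.
    """
    n = len(mean)
    m = list(mean)
    C = [row[:] for row in cov]
    prob = 1.0
    for w, c in constraints:
        # D[i] = cov[z_i, w.z] under the current (conditional) distribution.
        D = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
        var = sum(w[i] * D[i] for i in range(n))
        a = math.sqrt(var)
        alpha = (sum(w[i] * m[i] for i in range(n)) + c) / a
        p = Phi(alpha)
        prob *= p                      # chain rule: multiply conditional terms
        beta = phi(alpha) / p
        # Moment updates in the form of (3.30) and (3.31).
        m = [m[i] + beta * D[i] / a for i in range(n)]
        k = (beta * beta + alpha * beta) / var
        C = [[C[i][j] - k * D[i] * D[j] for j in range(n)] for i in range(n)]
    return prob


identity = [[1.0, 0.0], [0.0, 1.0]]
p_indep = joint_probability([0.0, 0.0], identity, [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)])
p_single = joint_probability([0.0, 0.0], identity, [([1.0, -1.0], 0.0)])
print(p_indep, p_single)
```

For independent constraints the conditioning has no effect and the result is the plain product of the marginal probabilities; for a single constraint it reduces to Φ(α). For correlated constraints the result is approximate because each conditional distribution is replaced by a normal one.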
3.4 Approximation Error Analysis
In the previous section, we assumed that the conditional probability
density functions (PDFs) follow normal distributions, which incurs some
approximation error. The authors in [68] formally analyzed the approximation
error of the max operation, and we perform the same analysis for our
conditioning operation to compare the accuracy of the two operations. The
probability density function of a random variable is denoted by ψ. For a ran-





|ψX(t)− ψY (t)|dt. (3.37)
Let X and Y be normally distributed random variables where X ∼ N(µx, σx),
Y ∼ N(µy, σy). Their correlation coefficient is denoted by ρ. Let W = X|X >
Y and we denote its normal approximation by WG. Using the derivations
(a) The PDFs of X and Y
(b) The PDFs of X|X > Y and MAX(X,Y )
Figure 3.5: In both the maximum operation and our proposed conditioning
operation, the normal approximation is inevitable for efficiency. However, the
approximation error is much less in the conditioning operation.
























The derivative of (3.38) implies that the conditional PDF of X given X > Y
is unimodal, in contrast to the PDF of max(X, Y ). Figure 3.5(a) shows the
PDFs of X and Y when µx = 0, σx = 1, µy = −1, σy = 0.1 and ρ = 0, and
Figure 3.5(b) shows the conditional PDF of X given X > Y . The conditional
PDF has almost the same shape as the PDF of X in the region of X > 0
because the term Φ in (3.38) becomes one in that region. However, in the
region of X < 0, the value of the term Φ drops sharply from one to zero,
which determines the shape of the conditional PDF in that region. In
Figure 3.5(b), we superpose the normal approximation on the conditional PDF,
and the conditional PDF is visibly closer to its normal approximation than
the PDF of max(X, Y ) is to its own. Note that Ξ(W )(WG) = 0.2 in this
example.
The approximation error is a function of µx, σx, µy, σy and ρ, and for






\[ a^2 \equiv \sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y\rho \tag{3.39} \]
\[ \alpha \equiv (\mu_x - \mu_y)/a \tag{3.40} \]
and applying the same procedure as [68] for the conditioning operation allow



































































































Figure 3.8: Ξ(W )(WG) as a function of σY and α when ρ = 0.8
in the region of
\[ -4 \le \alpha \le 4, \qquad 0 \le \sigma_y \le 1, \qquad -1 \le \rho \le 1. \tag{3.41} \]
Figures 3.6, 3.7 and 3.8 show the approximation error as a function of two
parameters when the other parameter is fixed. Since the same fixed points
as [68] are used, these figures can be directly compared with those in [68]. The
comparison reveals that the approximation error in the conditioning operation
is significantly less than that in the max operation. In particular, the error
decreases as ρ increases and becomes very small when X and Y are positively
correlated, as shown in Figures 3.7 and 3.8. Therefore, our conditioning





We first demonstrate the accuracy and runtime of the proposed condi-
tioning operation in computing a joint probability, comparing it to the max
operation and 6 different Monte Carlo simulations with different numbers of
samples. We have implemented all these methods in C++, and a Basic Linear
Algebra Subprograms (BLAS) package has been used for vector and matrix
operations. This comparison has been performed on a 3.0GHz 2 Xeon X5570
Linux machine. For this experiment, we randomly generate 10 timing quan-
tities represented in the canonical form with 21 global sources of variation.
The mean value of each timing quantity is randomly selected within a range of
[1.0, 3.0] and the standard deviation is also randomly chosen from 10% to 20%
of the mean value. Given the 10 timing quantities, each method computes
the probability that one timing quantity selected randomly is greater than the
other timing quantities. For the max operation, this probability is computed
by using (3.15). We run Monte Carlo simulation with 100,000 samples for
golden values and obtain the absolute error for each method. We perform
this experiment in two different cases. In the first case, the sensitivity values
to each source of variation are randomly generated within a range of [-1.0,
1.0], and the average correlation coefficient between the selected timing quantity
and the other 9 timing quantities becomes 0.1. In the second case, the
sensitivity values are within a range of [0, 1.0], and the average correlation
coefficient becomes 0.82.
Figure 3.9: Each method computes a joint probability in a weakly correlated
environment. (Plot of normalized absolute error versus normalized runtime; the
labeled points are the max operation at (1, 1), the conditioning operation at
(4.75, 0.42), and Monte Carlo simulations with 50, 100, 500, 1000, 5000 and
10000 samples.)
Figures 3.9 and 3.10 show the absolute error and
the runtime of each method in the two cases, respectively. The runtime and
the absolute error of each method are normalized to those of the max operation.
In Figures 3.9 and 3.10, the results of the six Monte Carlo simulations
exhibit the tradeoff between runtime and accuracy. Clearly, the max operation
and the conditioning operation are below the tradeoff curve, and they are
Pareto improvements to some Monte Carlo simulations. In the first case, the
conditioning operation provides 2.38X accuracy improvement over the max
operation at the cost of 4.75X runtime overhead as shown in Figure 3.9. The
Monte Carlo simulation with 10^4 samples requires 4500X runtime overhead
to obtain similar accuracy to the conditioning operation. In the second case,
the conditioning operation achieves significantly better accuracy as expected
in Section 3.4. In this case, the runtime is also improved because if a given
condition is true in the whole sample space, the conditional statistics do not
need to be updated, and we encounter such a case more often.
Figure 3.10: Each method computes a joint probability in a strongly correlated
environment. (Plot of normalized absolute error versus normalized runtime; the
labeled points are the max operation at (1, 1), the conditioning operation at
(3.52, 0.27), and Monte Carlo simulations with 50, 100, 500, 1000, 5000 and
10000 samples.)
In this experiment, we have computed the criticality probability of one
timing quantity among 10 timing quantities. If we need to compute the criticality
probabilities of all the 10 timing quantities, the conditioning operation
needs to repeat the same procedure 10 times, while Monte Carlo simulations
produce them at the same time. In the max operation, this is also possible if
the binary partition tree proposed in [76] is employed. On the other hand, if the
criticality probability of only one timing quantity is needed as in this experiment,
and we need to obtain the criticality incrementally (i.e., we find the other
9 timing quantities gradually and need to maintain the criticality of the timing
quantity among the timing quantities found so far), the improvement of the
conditioning operation becomes even bigger than that in this experiment. We
often encounter this scenario in statistical path selection applications (e.g., the
selection of paths whose criticality is greater than a threshold value). Thus,
the improvements can vary depending on applications.
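The Monte Carlo reference used in this experiment can be sketched as follows. This is a simplified illustration, not the C++ implementation described above: it assumes that each canonical form has only global sensitivities (no independent term), that every source is a standard normal variable, and that the randomly drawn sensitivities are rescaled to reach the chosen standard deviation. All names are hypothetical.

```python
import random


def random_canonical_form(rng, n_sources=21, sens_range=(-1.0, 1.0)):
    """Return (mean, sensitivities) with a standard deviation of 10-20% of the mean."""
    mean = rng.uniform(1.0, 3.0)
    raw = [rng.uniform(*sens_range) for _ in range(n_sources)]
    norm = sum(s * s for s in raw) ** 0.5
    target_sigma = rng.uniform(0.10, 0.20) * mean
    # Assumption: rescale the raw sensitivities so the form has the target sigma.
    return mean, [s * target_sigma / norm for s in raw]


def criticality_by_monte_carlo(forms, n_samples, rng):
    """Estimate, for every quantity at once, the probability that it is the largest."""
    n_sources = len(forms[0][1])
    wins = [0] * len(forms)
    for _ in range(n_samples):
        # One sample of the shared global sources of variation.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]
        values = [mean + sum(s * xi for s, xi in zip(sens, x))
                  for mean, sens in forms]
        wins[values.index(max(values))] += 1
    return [w / n_samples for w in wins]
```

A single sampling pass yields the criticality of all quantities, which is the property noted above for Monte Carlo simulation; passing `sens_range=(0.0, 1.0)` mimics the strongly correlated case.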
3.5.2 Criticality Computation
We have implemented the proposed algorithm in C++, and have used
ISCAS85 circuits as benchmarks. We also have used the or1200 open-source
microprocessor including a 32-bit, 5-stage Wallace multiplier and s9234 in
ISCAS89 since it contains many near-critical paths. All benchmarks are
technology-mapped to the TSMC 180nm standard cell library, and the nom-
inal delay annotated to each edge in the timing graph is the sum of the gate
and wire delays obtained from the Standard Delay Format (SDF) file.
In this experiment, a quad tree with 3 levels is used to model spatial
correlation which results in 21 global sources of variation. This allows us to
verify the proposed algorithm under difficult conditions. The global sources of
variation associated with the first, the second and the third level can change
the delay of each edge up to ±4%, ±5% and ±6% of its nominal value in
the 3σ case, respectively. The delay of each edge also has the independent
variation, and the 3σ point is 5% of the nominal delay. Our statistical static
timing tool uses the SSTA algorithm proposed in [15]. In the timing tool,
we have implemented the second method (excluding the given path) proposed
in [77] for comparison. To demonstrate the accuracy of the conditioning oper-
ation, we also calculate the path criticality by (3.15). This is called the max
method. Note that the proposed method and the max method compute the
path criticality from the same timing quantities simplified by the algebraic
elimination.
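As an illustration of why a quad tree with 3 levels yields 21 global sources of variation (1 + 4 + 16), the sketch below maps a gate location to one source per level and derives the resulting delay correlation from the 3σ percentages given above. It assumes a unit-square die, that the first level covers the whole die, one standard normal source per quad-tree cell, and a level-major source numbering; these details and all names are assumptions rather than the tool's actual data structures.

```python
def quadtree_source_ids(x, y, levels=3):
    """Return the global-source index seen at each level by a gate at (x, y) in [0, 1]^2."""
    ids = []
    offset = 0
    for level in range(levels):
        cells_per_side = 2 ** level
        col = min(int(x * cells_per_side), cells_per_side - 1)
        row = min(int(y * cells_per_side), cells_per_side - 1)
        ids.append(offset + row * cells_per_side + col)
        offset += cells_per_side ** 2
    return ids


def total_sources(levels=3):
    """Number of quad-tree cells over all levels."""
    return sum(4 ** level for level in range(levels))


# 3-sigma variation per level and for the independent term, as fractions of nominal delay.
LEVEL_3SIGMA = (0.04, 0.05, 0.06)
INDEPENDENT_3SIGMA = 0.05


def delay_correlation(gate_a, gate_b):
    """Correlation of two edge delays that are coupled only through shared quad-tree cells."""
    ids_a = quadtree_source_ids(*gate_a)
    ids_b = quadtree_source_ids(*gate_b)
    shared = sum(s ** 2 for s, ia, ib in zip(LEVEL_3SIGMA, ids_a, ids_b) if ia == ib)
    total = sum(s ** 2 for s in LEVEL_3SIGMA) + INDEPENDENT_3SIGMA ** 2
    return shared / total
```

Under these assumptions, two gates in the same finest cell share all three sources, while gates in opposite corners share only the die-level source.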
Figure 3.11 compares the accuracy of each method in the circuits s9234
and or1200. We first select top 100 (300) paths with largest criticality prob-
ability using Monte Carlo simulation from s9234 (or1200) and then compute
the criticality probabilities of the paths by each method. Figure 3.11(a) shows
the criticality probabilities. Unlike [77], the results of our proposed method
are very close to the Monte Carlo simulation results. Figure 3.11(b) shows the
cumulative criticality probabilities computed by Monte Carlo simulation for up
to top 300 (8000) paths of s9234 (or1200) with largest criticality probability in
each method. In other words, each method ranks paths differently, and for each
list of paths, we compute the cumulative criticality by Monte Carlo simula-
tion. Thus, the values shown in Figure 3.11(b) are accurate, and Figure 3.11(b)
shows how well each list is ranked. The proposed method outperforms the two
other methods. So as not to clutter the plot, the max method is not shown in
Figure 3.11(a), but its poor accuracy is implied by Figure 3.11(b). The circuit
or1200 is highly optimized and has numerous near-critical paths which make
the max method and the method of [77] more inaccurate. In this situation,
the improvement of our method becomes more significant.
(a) criticality probability in s9234 and or1200
(b) 1-cumulative criticality probability in s9234 and or1200
Figure 3.11: The proposed method not only shows significantly better accuracy
than the existing method but also is comparable to Monte Carlo
simulation.
Table 3.1: Accuracy comparison
                    [77]              Max Method           Proposed
Circuit   #p    max.ε    avg.ε     max.ε    avg.ε     max.ε    avg.ε
c17        1    0.0003   0.0003    0.0004   0.0004    0.0002   0.0002
c432       6    0.0816   0.0380    0.0636   0.0257    0.0016   0.0006
c499      37    0.0247   0.0185    0.0696   0.0494    0.0015   0.0006
c880       4    0.1399   0.0761    0.0649   0.0277    0.0074   0.0032
c1355     23    0.0359   0.0272    0.0766   0.0654    0.0015   0.0004
c1908      2    0.2157   0.1522    0.0058   0.0042    0.0006   0.0004
c2670      8    0.1985   0.0504    0.1496   0.0374    0.0815   0.0250
c3540      4    0.2168   0.0740    0.2272   0.0665    0.0463   0.0213
c5315      4    0.1075   0.0619    0.0637   0.0463    0.0239   0.0159
c6288      3    0.5486   0.2033    0.4838   0.1697    0.3252   0.1191
c7552      1    0.0896   0.0896    0.0002   0.0002    0.0008   0.0008
avg.            0.1508   0.0720    0.1096   0.0448    0.0446   0.0170
In order to compare the accuracy of each method in ISCAS85 bench-
mark circuits, we select the paths whose criticality probabilities are greater
than 0.01 by Monte Carlo simulation, and obtain the criticality values for
them from each method. Table 3.1 shows the average absolute error and the
maximum error of each method. The column #p in the table shows the number
of the selected paths. The max method shows better accuracy on average
than the method of [77] since it uses the inequalities simplified by the algebraic
elimination. The single canonical form used in both the max method and the
method of [77] can be accurate if the design has only one potential critical
path (a very long path). In this case, the maximum of all the path delays in
the design always equals the critical path delay, which means that the maximum
is still linear and the linear approximation to the maximum does not
cause error. The circuits c17 and c7552 are close to this case, and the max
method and the method of [77] show fairly accurate results for them. On
the other hand, the circuits c499 and c1355 have several near-critical paths, and
both methods suffer from large error. In particular, the formulation used in
the max method is more susceptible to such a case. Unlike the two methods,
the proposed method provides accurate results even for c499 and c1355.
Since industrial designs have a number of near-critical paths, the benefit of
the proposed method can be much bigger in industrial designs.
Figure 3.12: Both refactoring and the conditioning operation are crucial to
achieve the accuracy close to that of Monte Carlo simulations.
Table 3.2: Edge criticality computation results
            [76]       Conditioning Only      Cond.+Refactoring
Circuit     avg.ε      avg.ε      norm.ε      avg.ε      norm.ε
c17         0.00008    0.00007    0.932       0.00007    0.932
c432        0.00510    0.00228    0.447       0.00010    0.020
c499        0.00328    0.00126    0.385       0.00052    0.159
c880        0.00150    0.00019    0.127       0.00019    0.127
c1355       0.00192    0.00171    0.890       0.00040    0.206
c1908       0.00260    0.00181    0.695       0.00004    0.014
c2670       0.00147    0.00108    0.733       0.00035    0.237
c3540       0.00151    0.00172    1.138       0.00052    0.342
c5315       0.00102    0.00047    0.460       0.00028    0.271
c6288       0.00109    0.00056    0.517       0.00025    0.232
c7552       0.00035    0.00036    1.039       0.00000    0.010
eth         0.00022    0.00007    0.322       0.00003    0.154
tv80        0.00062    0.00034    0.544       0.00021    0.332
or1200      0.00083    0.00020    0.237       0.00012    0.141
Avg.        0.00154    0.00087    0.605       0.00022    0.227
3.5.3 Breakdown of the Accuracy Improvement
The accuracy improvement of the proposed method is achieved by the
combination of the conditioning operation and the refactoring technique. To
show the contribution of each technique, we rank paths in or1200 again using
the proposed method with and without refactoring, and the results are shown
in Figure 3.12. A significant part of the improvement comes from the refactoring
technique. However, it is important to note that without the conditioning
operation, the accuracy cannot be improved at all. In Figure 3.12, the result
of the max method is obtained using refactoring, but due to the inaccuracy of
the max operator, it cannot take advantage of refactoring.
We have not presented the details of the edge criticality computation.
However, we can also take advantage of refactoring and the conditioning op-
eration there. We compare our method with the conventional cutset-based
method [76]. We compute the criticality of each edge using each method and
obtain the golden values from Monte Carlo simulations with 40,000 samples.
For each edge, the error is computed, and the average error for each circuit is
calculated. Also, we normalize the average error to that of the cutset-based
method. Table 3.2 shows the average errors and the normalized errors. If only
the conditioning operation is used, the error is reduced by about 40% on average,
although the error increases slightly for c3540 and c7552. However, if our two
techniques are combined, we can achieve a significant error reduction for all
benchmark circuits.
3.6 Conclusions
We have proposed a path criticality computation method which signif-
icantly improves the accuracy of the path criticality computation compared
to the state-of-the-art method, in particular when the circuit has many near-
critical paths. Since the conditioning operation used in this chapter is not
limited to this specific application, we believe that this operation, combined
with the total probability theorem, could replace the max operator where
higher accuracy is required. The run time of the proposed method may be
a concern for very large industrial circuits. However, the matrix operations
can be easily parallelized, and our method can be implemented very efficiently
in today’s computing environments with SIMD instructions and multi-core
multi-threaded CPUs and GPUs.
Chapter 4
A Concurrent Path Selection Algorithm in
Statistical Timing Analysis
4.1 Introduction
The increasing performance of devices, combined with increasing vari-
ability in the nanometer regime, has posed new challenges in design and testing
methodologies. Circuit timing is subject to factors including process parame-
ters, manufacturing defects, power supply noise, crosstalk and multiple input
switching (MIS). Some of these factors are difficult to control precisely dur-
ing manufacture and others change during operation or within die. Circuit
delays thus are uncertain at the design stage. While the variability continues
to increase as device technology scales, the margin to deal with the variability
remains the same or decreases due to the demand for high performance cir-
cuits [56]. Therefore, testing methodologies need to be improved in order to
maintain the reliability of the circuits.
In order to deal with various timing defects, several delay fault models
have been used such as the transition fault model and the path delay fault
model. The transition fault model well captures the behaviors of defects that
cause a localized and large change of delay, while the path delay fault model
is suitable for distributed and subtle changes of delay such as process vari-
ation delay faults. Under the path delay fault model, the number of faults
is enormous and it is common to select some paths to be tested depending
on the behaviors of timing defects being targeted. Some path selection methods
[41, 50, 63] target localized but small delay defects and others [48, 36] target
distributed and small delay defects. In fact, the transition fault model,
which aims at large and gross delay defects, can be viewed as guiding us
to select any path passing through each wire.
Certainly path selection is an important step for delay testing, and
considering this problem in the statistical domain provides a unified view of all
these approaches. In delay testing, our ultimate objective is to test at least one
faulty path of each given bad chip. The first-order factor we need to consider is
the fault probability of each path in the circuit. Clearly, paths with high fault
probability should be tested. The second-order factor is statistical correlation
of the delay variations. If two paths with high fault probability become
faulty together in every produced chip, it is wasteful to test both
paths.
Traditional path selection algorithms bear in mind the statistical as-
pects of the delay variations to improve test effectiveness. Since a simple and
effective approach is to select paths with high fault probability, it is common
to target top k longest paths. Since a spot defect causes many paths to be
faulty that pass through the defected site, we can assume strong structural
correlation if spot defects are targeted. Then, it is effective to select a longest
path passing through every wire [41, 50, 63]. If spatial correlation is con-
sidered, an effective method is to select a few long paths in each block [36].
These conventional methods use the statistical aspects of the delay variations
in an ad-hoc manner, while recent statistical path selection algorithms quan-
tify the fault probability and the correlation using statistical timing analysis.
The algorithm proposed in [72] iteratively selects a path that detects the largest
number of bad chips not covered by already selected paths. In other words, it
selects paths with high conditional fault probability. However, this conditional
fault probability is difficult to compute in SSTA, so it requires multiple
Monte Carlo simulations. Recently, the authors of [89] proposed two statistical
path selection algorithms that can work with SSTA. The first algorithm, called
BnB-SPM, selects the top k paths with the highest fault probability very efficiently
by pruning. However, this does not utilize the correlation. Thus, the authors
define a test quality metric that accounts for the correlation and propose the
second algorithm, called BnB-JPM, which greedily selects paths
that maximize the test quality metric.
Basically, these statistical algorithms do not target specific timing de-
fects, and the target defects are determined by the statistical timing model
used. Ideally, if the statistical timing model captures process variation [71],
spot defects, crosstalk [66], MIS [3], power droop [23], aging effects, or any
combination of these [65], they target the corresponding defects accordingly.
However, these extensive applications of the algorithms are limited by several
factors in practice. First, timing models are abstracted and simplified,
so they are inaccurate to some extent. In particular, second-order effects such as
crosstalk, MIS and power droop are difficult to model accurately without
input vectors. Secondly, under static timing models, the delays of all
input vectors sensitizing a single given path are the same, so path selection
matters and the path selection problem is separated from automatic test pattern
generation (ATPG). However, in order to target the defects due to the
second-order effects, path selection and ATPG cannot be dealt with separately
because the delay of a given path depends largely on input vectors under the
effects. Thirdly, analysis algorithms running on the timing models may not
be efficient enough. Monte Carlo simulations can account for various timing
defects such as spot defects [72] but are too slow to be used in path selection.
Delay tests are typically used to screen timing defects in the manufacturing
process of ASICs [30] and for speed-binning of high-performance integrated
circuits [18]. Also, they are useful in finding speed-limiting paths in the post-
silicon debug (a.k.a., post-silicon validation), which is performed to improve
the performance and yield of the next silicon iteration (or, spin) of a high-speed
microprocessor design [55, 13]. Smart path selection is beneficial in both the
manufacturing test and the post-silicon validation, but the above-mentioned
factors are more critical in the post-silicon validation where it is necessary to
find speed-limiting paths and vectors [55]. In contrast, in the manufacturing
test, the correlation between the result of delay tests and the actual operating
frequency is more important than finding timing critical paths. Also, since
design bugs are already fixed, process variation and spot defects are major
defect mechanisms although other effects may be combined. Therefore, the
above-mentioned factors are less crucial in high-volume manufacturing test.
Considering current statistical timing modeling techniques in terms of
accuracy and efficiency, this chapter primarily addresses process variation de-
lay faults in the manufacturing test of ASICs as in [89]. However, our proposed
algorithm is not limited to this specific type of faults and can extend its ap-
plications as timing models advance. The major contributions of this chapter
include the following:
• Existing statistical algorithms [72, 89] select one path at a time in a
greedy manner, whereas our proposed method selects k paths gradually
at the same time with a global view. To the best of our knowledge, this is a
new class of statistical path selection.
• The methods in [44, 72] require a Monte Carlo simulation for each path
or each defect, while our proposed algorithm needs statistical timing
analysis at the beginning of the process only once for k-path selection,
and both Monte Carlo sampling and block-based SSTA can be used.
• Existing statistical path selection studies do not consider the issue of
untestable paths and validate the quality of produced path sets assum-
ing all paths are testable. By integrating a SAT solver, our algorithm
can easily exclude untestable paths from consideration. In addition, we
present the post-ATPG quality of produced path sets tackling various
issues associated with untestable paths and false paths.
The rest of this chapter is organized as follows. Section 4.2 presents the neces-
sary background for statistical path selection. Section 4.3 discusses the path
selection problem in detail. Section 4.4 formulates the concurrent path se-
lection problem and proposes our basic approach, and Section 4.5 extends
the basic approach to handle several remaining issues. Section 4.6 presents
experimental results, and Section 4.7 concludes this chapter.
4.2 Background
In this section, we first define basic terms used throughout this chapter.
A timing graph is an edge-weighted directed acyclic graph (dag). The edge
delay (weight) is the sum of the gate delay and the interconnect delay asso-
ciated with the edge. The arrival time and the delay are denoted by at and
d, respectively. The set of all paths in the given circuit is denoted by Ω. The
set of all paths passing through a subpath π is denoted by Ωπ. The operating
clock period is denoted by T .
4.3 Problem Formulation
The objective of this chapter is to construct a high quality path set
using a statistical timing framework. In order to measure the quality of a
path test, we will define a quality metric. In the stuck-at fault model, the
fault coverage is a universally used quality metric and due to its simplicity,
the metric was adopted in the path delay fault model. However, the use of
fault coverage makes the path delay fault model intractable and ineffective. To
demonstrate why the fault coverage is a misleading metric in the path delay
fault model unlike in the stuck-at fault model, we will consider an example.
Figure 4.1: The sample space of bad chips with stuck-at faults: We suppose
that a produced bad chip corresponds to one element in the sample space so
each element is equally likely. The 4 events are not disjoint in practice, but
since the intersections are small enough, this figure is acceptable.
Suppose that a design with 4 nets is manufactured. Figure 4.1 shows
the sample space of the bad chips with stuck-at faults. Each quarter circle
represents bad chips that can be detected by a test. Clearly, our objective is to
cover the entire area by some tests; we want to detect all bad chips. However,
if the number of tests we can run is limited, our objective is to cover as large
an area as possible. Usually the probability that each net is defective is not very
different, so each test is depicted as covering a similar area. If we are allowed to
select two nets to be tested out of the four nets, any combination will detect a
similar number of bad chips. Thus, we do not need to select nets to be tested,
and the number of tested nets is proportional to the number of detected bad
chips; the stuck-at fault coverage is effective. Besides, since it is affordable to
test all nets in the circuit, we just try to test all nets.
Figure 4.2: Sample space of bad chips with path delay faults
Figure 4.2(a) and (b) show the sample space of bad chips with path delay faults. Unlike the stuck-at
fault model, the fault probability of each path is very different, and the area
taken by each test is depicted accordingly. In this case, selection of paths to
be tested can yield a substantial difference in test effectiveness. In the path
delay fault model, we can consider two different scenarios on the intersections
of the tests. Suppose that the areas of the intersections are very small as in
Figure 4.2(a). Then, we can just test top k paths with the largest area in the
sample space when k paths are affordable, and this can lead to an optimal
solution. In this example, if k = 2, it is optimal to select A and B. Thus, path
selection is an easy task in this scenario. Now suppose that the areas of the
intersections are large as shown in Figure 4.2(b). In this scenario, the simple
heuristic used in the first scenario does not result in a good solution. In this case,
the paths A and D are better than the paths A and B. To obtain an optimal
path set, we may need to inspect every combination. Thus, path selection
becomes a very difficult task. This kind of selection problem is known as the
maximum coverage problem, which is a well-known NP-hard problem.
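The difference between the two scenarios can be reproduced with a toy instance of the maximum coverage problem. In the sketch below, each path is associated with a hypothetical set of bad-chip identifiers that its test detects; the sets are invented for illustration and do not correspond to the areas drawn in Figure 4.2.

```python
from itertools import combinations


def coverage(selection, detects):
    """Number of bad chips detected by at least one selected path."""
    covered = set()
    for path in selection:
        covered |= detects[path]
    return len(covered)


def top_k_by_size(detects, k):
    """Heuristic of the first scenario: pick the k paths with the largest individual area."""
    return sorted(detects, key=lambda p: len(detects[p]), reverse=True)[:k]


def best_k_exhaustive(detects, k):
    """Inspect every combination of k paths and keep the one with the largest coverage."""
    return max(combinations(sorted(detects), k),
               key=lambda sel: coverage(sel, detects))


# Hypothetical detection sets with large intersections among A, B and C.
detects = {
    "A": set(range(0, 10)),
    "B": set(range(2, 10)),
    "C": set(range(6, 12)),
    "D": set(range(12, 17)),
}
```

With these sets, `top_k_by_size(detects, 2)` picks A and B, whose coverage is no better than A alone, whereas the exhaustive search picks A and D. The exhaustive search is only feasible here because the instance is tiny.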
Path delays in the circuit are correlated because of several different
mechanisms, which makes the path selection problem similar to the second
scenario. If two paths share one or more gates, the two path
delays are correlated even if the delays of all gates and interconnects are independent.
This is called structural correlation. If two gates are closely placed
in a chip, they are fabricated or operated under a similar environmental con-
dition, and their delays are increased or decreased together. This is called
spatial correlation. Also, a source of variation can affect several gates and
interconnect delays simultaneously, inducing global correlation. Thus, we need
to solve the difficult combinatorial optimization problem.
For the maximum coverage problem, it is known that integer linear pro-
gramming provides an exact solution. Also, there are many heuristics. How-
ever, we are required to solve the maximum coverage problem in the probability
space. In this case, to solve the problem using integer linear programming, it
is necessary to enumerate all paths explicitly and the joint fault probabilities
of their combinations should be calculated. Clearly, this is prohibitive and we
look for an alternative.
4.3.1 Test Quality Metric
In this subsection, we formally write the concepts discussed above. The
detection probability of a test Π is defined as
q(Π) = P (Test fail|Chip is bad). (4.1)
It is obvious that the detection probability is the ultimate test quality metric.
However, instead of this probability, the stuck-at fault coverage is used uni-
versally in the stuck-at fault model which can be justified from the following
derivation. Let p be the probability that a net has a stuck-at fault. Also let
N be the total number of stuck-at faults in the circuit and let c be the stuck-
at fault coverage of Π. Assuming that the occurrences of stuck-at faults are
independent events, we obtain
\[ q(\Pi) = \frac{1 - (1 - p)^{Nc}}{1 - (1 - p)^{N}}. \tag{4.2} \]
If p is sufficiently small, we can assume that multiple stuck-at faults do not
occur. Then,
q(Π) = c. (4.3)
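The step from (4.2) to (4.3) can be checked numerically. The sketch below evaluates (4.2) directly; the function name and the sample values of p, N and c are illustrative assumptions.

```python
import math


def detection_probability(p, n_faults, coverage):
    """Evaluate (4.2): q = (1 - (1 - p)^(N c)) / (1 - (1 - p)^N)."""
    log_no_fault = math.log1p(-p)  # log(1 - p), accurate for very small p
    numerator = -math.expm1(n_faults * coverage * log_no_fault)
    denominator = -math.expm1(n_faults * log_no_fault)
    return numerator / denominator
```

For p = 1e-6, N = 10000 and c = 0.9 the result is within 0.001 of c, which is the regime where fault coverage is a good proxy for the detection probability. When p is not small (for example p = 0.01 with N = 1000), q is much larger than c, because a bad chip then tends to contain many faults.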
In the path delay fault model, the above assumptions are not reason-
able, so we need to go back to the ultimate test quality metric to evaluate the
quality of delay tests accurately. Let us ignore the testability issue of paths
for a moment and assume that all paths are robustly testable. This issue will
be handled later. If a static timing model is employed, we can easily gener-
ate test patterns sensitizing given paths and conversely given test patterns, it
only matters which paths are sensitized. Thus we can say that a path set is
equivalent to a test suite, and Π denotes a path set. Thus we can write
\[ q(\Pi) = \frac{P(\max_{p \in \Pi}\{d(p)\} > T)}{P(\max_{p \in \Omega}\{d(p)\} > T)}. \tag{4.4} \]
In deterministic timing, this probability cannot be calculated, but recent SSTA
can compute it very efficiently, allowing us to use the ultimate quality met-
ric directly. Since our work mainly targets process variation delay faults by
employing an adequate timing model, the sample space of all produced chips
becomes the process (parameter) space and we will call the detection probability
the process space coverage metric (PCM) as in [89].
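The metric in (4.4) can be estimated by sampling the process space. The following sketch is a Monte Carlo illustration only, not the SSTA-based computation referred to above; it assumes each path delay is a mean plus a linear combination of standard normal sources, and the three example paths are hypothetical.

```python
import random


def sample_path_delays(paths, n_sources, rng):
    """Draw one chip: each path delay is its mean plus sensitivities times N(0, 1) sources."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]
    return {name: mean + sum(s * xi for s, xi in zip(sens, x))
            for name, (mean, sens) in paths.items()}


def process_space_coverage(paths, selected, clock_period, n_samples, rng):
    """Estimate P(max over selected paths > T) / P(max over all paths > T)."""
    n_sources = len(next(iter(paths.values()))[1])
    bad_chips = detected = 0
    for _ in range(n_samples):
        delays = sample_path_delays(paths, n_sources, rng)
        if max(delays.values()) > clock_period:
            bad_chips += 1
            if any(delays[name] > clock_period for name in selected):
                detected += 1
    return detected / bad_chips if bad_chips else float("nan")


# Hypothetical paths: p1 and p2 are perfectly correlated, p3 is independent of both.
example_paths = {
    "p1": (1.0, [0.1, 0.0]),
    "p2": (1.0, [0.1, 0.0]),
    "p3": (1.0, [0.0, 0.1]),
}
```

With a clock period of 1.1, adding p2 to a set that already contains p1 does not change the estimate, while adding p3 raises it to 1. This is the effect of correlation discussed in the introduction of this chapter.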





\[ \max_{\Pi \subseteq \Omega} \; q(\Pi) \quad \text{s.t.} \quad |\Pi| = k. \tag{4.5} \]
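Since the denominator of the detection probability is not a function of Π, we
can equivalently maximize the numerator:
\[ \max_{\Pi \subseteq \Omega} \; P(\max_{p \in \Pi}\{d(p)\} > T) \quad \text{s.t.} \quad |\Pi| = k. \]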




Using the randomness in the manufacturing process of chips, we can
consider a gamble. Let a gambler play the gamble. We partition the set of
all paths in a given design into two equally sized groups. The gambler is
supposed to bet his entire capital on this game, dividing it into two parts, one for
each group. Let us take a manufactured chip. Suppose that the manufactured
chip has only one faulty path. If there exists a faulty path in one group, the
gambler will get his bet on the group back as a payoff (i.e., the odds are one).
Otherwise, he loses his bet. The probability of winning for each group is given
to the gambler before betting from the design. If he has remaining capital,
we can partition the group with the faulty path into two groups again, and
the gambler can play another game. If there are N paths in the design, he can
play up to log N games. If he wins all these games, he actually succeeds in
predicting the faulty path from the design. In this scenario, the gambler is able
to proceed only if he wins. Let us consider another scenario. Suppose that
his initial capital is k and a bet must be an integer. He plays the first game.
Without being told the outcome, he can simulate playing the
next game for each possible outcome. This process can be continued, which is
equivalent to drawing a decision tree. At the end of this simulation process,
each group contains only one path and some paths will get the gambler’s bet.
At most k paths can get some bets and others will get nothing. A smart
gambler selects good candidate paths from the design.
4.4.1 Partitioning Path Sets
We take a design as input and construct a timing graph. For simplicity,
we assume that rising and falling delays are the same. Then, the timing graph
is an edge-weighted, directed acyclic graph (dag). However, if the technique
suggested in [15] is employed, we can construct the same type of a graph that
considers both rising and falling delays and has twice the number of paths for
falling and rising cases as that in the original timing graph. Thus, without loss
of generality, we can consider the timing graph to be a simple edge-weighted
dag.
We perform statistical static timing analysis on the timing graph and
can obtain arrival times, required arrival times and slacks for each vertex. We
add a vertex called the source. The start points (i.e., the vertices without
incoming edges) correspond to primary inputs and outputs of flops. Each
start point is connected to the source by the edge whose delay is the arrival
time of the start point. We add a vertex called the sink. The end points (i.e.,
the vertices without outgoing edges) correspond to primary outputs and inputs
of flops. Each end point is connected to the sink by the edge whose delay is
the required arrival time of the end point. This modified timing graph is called
the augmented timing graph and the construction process from a design is well
illustrated in [77]. We denote the source and the sink by s and t, respectively.
In the augmented timing graph, all paths start with s and end at t. From
the sink t in the augmented timing graph, we perform the recursive depth-first
search, which implicitly constructs a depth-first search tree.
Figure 4.3: A simple circuit and the depth-first search tree
Figure 4.3 shows a simple circuit and the corresponding depth-first search tree. Readers can
easily identify that the paths are already grouped depending on their suffixes
in the depth-first search tree. We utilize this fact for the partitioning. In this
example, we partition the set of all paths into the paths via a ← c and the
paths via a← b at the first round. At the second round, the paths via a← c
are partitioned into the paths via a ← c ← d and the paths via a ← c ← e.
Also, the paths via a ← b are partitioned into the paths via a ← b ← e and
the path a ← b ← 4 according to the depth-first search tree. In this type
of partitioning, the depth-first search tree is equivalent to the decision tree we
desire to draw for the gambling. In the decision tree, we assume that every
node has at most two children (i.e., the vertices in the augmented timing graph
have at most two incoming edges) without loss of generality. This is possible
because we can modify the augmented timing graph by adding pseudo vertices.
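The suffix-based partitioning described above can be sketched with a backward depth-first traversal from the sink. The graph below is a hypothetical example (it is not claimed to be the circuit of Figure 4.3), vertices without incoming edges stand in for start points, and all names are illustrative.

```python
def count_paths_by_suffix(fanin, suffix):
    """Count paths whose tail is `suffix`, listed from the sink backwards."""
    last = suffix[-1]
    if not fanin.get(last):  # no incoming edges: the suffix is a complete path
        return 1
    return sum(count_paths_by_suffix(fanin, suffix + (u,)) for u in fanin[last])


def partition(fanin, suffix):
    """Split the paths through `suffix` into one group per incoming edge of its last vertex."""
    return {suffix + (u,): count_paths_by_suffix(fanin, suffix + (u,))
            for u in fanin.get(suffix[-1], ())}


# Hypothetical graph: fanin[v] lists the vertices with an edge into v.
example_fanin = {
    "t": ["c", "b"],
    "c": ["d", "e"],
    "b": ["e", "i4"],
    "d": ["i0", "i1"],
    "e": ["i2", "i3"],
}
```

Calling `partition(example_fanin, ("t",))` gives the first-round groups (the paths via t ← c and the paths via t ← b), and calling it again on each resulting suffix gives the next round, so the recursion visits exactly the nodes of the decision tree.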
Figure 4.4: The decision tree with an example of bets
Figure 4.4 shows the decision tree with the bets of a gambler for the
circuit in Figure 4.3. Each node in the decision tree corresponds to a game
where the gambler bets on the two groups of the paths passing through each
child. Simply, we can say that the gambler bets on the two children in each
node. The initial capital is 3 and he divides it into 2 and 1 at the first game.
If there exists a faulty path among the paths via a ← c, he receives 2 back
and his capital also becomes 2. Since the capital and the payoff are the same,
we can annotate one value for each node, and the value represents his capital
in the case that there is a faulty path passing through the node. At the same
time, the value represents his bet on the node in the game corresponding to
the parent node. Each node in the decision tree can be uniquely identified by
a subpath ending at the sink t. When we mention a subpath, it usually means
a subpath ending at the sink t and readers can associate it with a node in the
decision tree. Now we formally link this gamble with path selection.
4.4.2 Path Selection by Betting
Definition 6. The bet on a subpath π is the number of paths to be selected
among the paths passing through π.
Definition 7. The fault event of a subpath π is the event that a faulty path
exists among the paths passing through π.
We denote the bet and the fault event by B and F , respectively.
Definition 8. The payoff µ of a subpath π is
\[
\mu(\pi) =
\begin{cases}
B(\pi) & \text{if } F_\pi \text{ occurs;} \\
0 & \text{otherwise.}
\end{cases}
\tag{4.7}
\]
The definition of the bet immediately yields
\[
B(t \leftarrow \cdots \leftarrow v) = \sum_{u \in N^-(v)} B(t \leftarrow \cdots \leftarrow v \leftarrow u). \tag{4.8}
\]
It is important to note that the payoff and the bet for a subpath can be
different as shown in the definition above, but in the decision tree, the fault
event corresponding to each node is assumed to have occurred (this is what
the decision tree represents) and the annotated value represents the bet and
the payoff at the same time. Also note that the definition of the bets implies
that the gambler selects the paths a ← c ← d ← 0, a ← c ← e ← 2 and
a← b← 3.
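As a small check of Definition 6 and (4.8), the bets can be recovered from a selected path set by counting, for each subpath, the selected paths that pass through it. The sketch below uses the three paths named above (written sink side first, with the sink t omitted); the helper names are ours and purely illustrative.

```python
from collections import Counter


def bets_from_selection(selected_paths):
    """Bet on a subpath = number of selected paths passing through it (Definition 6)."""
    bets = Counter()
    for path in selected_paths:
        for i in range(1, len(path) + 1):
            bets[path[:i]] += 1  # every prefix of the tuple is a subpath ending at the root
    return bets


def satisfies_conservation(bets):
    """Check (4.8): the bet on a subpath equals the sum of the bets on its extensions."""
    for sub, b in bets.items():
        children = [c for c in bets if len(c) == len(sub) + 1 and c[:-1] == sub]
        if children and sum(bets[c] for c in children) != b:
            return False
    return True


selected = [("a", "c", "d", "0"), ("a", "c", "e", "2"), ("a", "b", "3")]
bets = bets_from_selection(selected)
```

With this selection the capital of 3 at the root splits into 2 on a ← c and 1 on a ← b, which matches the first game described for Figure 4.4.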
Given a manufactured chip (an ensemble), we can depict the depth-first
search tree with the payoff of each node as shown in Figure 4.5(a). Only along the
Figure 4.5: The payoffs for two produced chips (ensembles)
faulty path, the bets equal the payoffs. In this case, the gambler fails to reach
a leaf of the tree which means that he fails to predict the faulty path. Note
that the payoffs of all full paths (i.e., the payoffs of the leaf nodes) are zero.
Figure 4.5(b) shows the DFS tree from another manufactured chip. In this
chip, there are two faulty paths. For simplicity, we have explained the concept
assuming there is at most one faulty path in a chip. However, multiple faulty
paths do not affect our formulation; conceptually, this case can be viewed
as several gamblers, each with his own payoff, playing games in parallel. In the case of
Figure 4.5(b), a gambler successfully reaches a leaf, predicting a faulty path.
Note that there exist some paths with a non-zero payoff in this case. Using


















The last equality follows from the fact that µ(p) = 0 for all p ∉ Π. Let Ψ
denote all subpaths ending at t. Using the new objective function, we can







s.t. B(t) = k,
Equation (4.8) for all π ∈ Ψ,
B(π) ≤ |Ωπ| for all π ∈ Ψ,
B(π) ∈ {0, 1, 2, 3, ...} for all π ∈ Ψ.
(4.10)
In the view of the gambler, a good anticipating strategy is required to solve
this problem optimally. In other words, the gambler needs to account for
future games before betting.
4.4.3 Betting Strategies
Considering the computational complexity of anticipating strategies, we
will look for a causal strategy to solve this problem, i.e., the betting strategy
of the gambler will depend only on the results of past games. Due to the
use of a non-anticipating strategy, the gambler will have a local view, which
is illustrated in Figure 4.6. Currently, his capital is B(π) which equals µ(π)
since it is a part of the decision tree. He will place bets on each side πl and πr
depending on the probability of winning. He will not prefer one side over
the other because of future games. Thus, if the probability of winning on each
side is the same, he will just place the same bet on both sides. In this situation,
it is reasonable to set his objective to maximize µ(πl)+µ(πr), which is a random
variable and its distribution changes according to B(πl) and B(πr). Figure 4.6
Figure 4.6: The local view of the gambler
shows a possible distribution of µ(πl) + µ(πr). The probability values of the
impulses are fixed, but under the constraint B(π) = B(πl)+B(πr), the gambler
can adjust the positions of the first two impulses. An adventurous gambler
may prefer to place the impulse of B(πr) at B(π), which places the first impulse
at 0. A conservative gambler may prefer to place the two impulses
at B(π)/2. To make a choice among possible distributions, we need a measure
to evaluate the distributions. One common measure for this is the expectation




s.t. B(πl) +B(πr) = B(π).
(4.11)
In our betting strategy, we will ignore the constraint on the numbers of paths
in Ωπl and Ωπr , and this will be handled in a different way later. Simple
calculus shows that if the probability of winning on πl is greater than that on πr,
then B(πl) = B(π) and B(πr) = 0. Otherwise, B(πl) = 0 and B(πr) = B(π).
Clearly, this is too aggressive and will lead to bankruptcy almost certainly
after a few games. To deal with this issue, we need to consider that the
gambler will play a sequence of games, and a gambler should guess correctly
in all the games in order to predict a faulty path. Thus, a more conservative
strategy is required. This issue also arises in real-life gambles and investments.
Compared to real-life ones, our gamble is a bit restrictive in the sense that the
payoffs from πl and πr are not put together to play the next game. However,
they share a more important common aspect. In real-life gambles, the capital
of a gambler after n rounds of games is a product of factors, which also holds in
our case when we define an indicator function of the fault event. If one factor
is zero, the final result becomes zero as well, so we need to be conservative.
Since this is the basis of portfolio theory as derived from information
theory, we adopt a strategy from portfolio theory called the log-optimal
portfolio, which maximizes the expected logarithm [19, 33]. In a series of real-life
games represented by independent and identically distributed (i.i.d.) random
variables, the log-optimal portfolio is asymptotically optimal among
causal strategies. In a non-i.i.d. case, the conditionally log-optimal portfolio
is known to provide a comparable solution to any other causal portfolios [19].
Before writing our strategy, we will relax the integer constraints on each bet,
and we define the betting fraction of a subpath as
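\[
\alpha(t \leftarrow \cdots \leftarrow v \leftarrow u) = \frac{B(t \leftarrow \cdots \leftarrow v \leftarrow u)}{B(t \leftarrow \cdots \leftarrow v)}. \tag{4.12}
\]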




s.t. α(πl) + α(πr) = 1.
(4.13)
This is a constrained optimization problem and the Kuhn-Tucker conditions
give a necessary condition to the optimal betting fraction α∗. Due to the
concavity of log, the condition also becomes sufficient. Our problem is closer
to the non-i.i.d. case, and the condition is a function of conditional probabilities
of the fault events. To compute these probabilities easily, we assume that even
if there are faulty paths in both πl and πr, this knowledge is not given to the
gambler in the future games. Let us define the fault probability, which is the
probability that the fault event occurs and is denoted by f . Then we can write
f(t← . . .← v) ≡ P (Ft←...←v). (4.14)
We also define the joint fault probability by








Then, the optimal betting fractions can be written as
\[
\alpha^*(\pi_l) = \frac{f(\pi_l) - f_j(\pi)}{f(\pi_l) + f(\pi_r) - 2 f_j(\pi)}, \qquad
\alpha^*(\pi_r) = 1 - \alpha^*(\pi_l). \tag{4.16}
\]
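The closed form (4.16) can be checked numerically. The sketch below assumes, for illustration only, an expected-log objective in which the gambler keeps the fraction α(πl) if only the left side is faulty, α(πr) if only the right side is faulty, and the whole capital if both are faulty; this objective is our reading and is chosen because its maximizer coincides with (4.16). The even split used when the denominator vanishes is also an assumption, and the probability values are hypothetical.

```python
import math


def optimal_fraction(f_l, f_r, f_j):
    """Betting fraction for the left child according to (4.16)."""
    denom = f_l + f_r - 2.0 * f_j
    if denom <= 0.0:
        return 0.5  # assumption: split evenly when (4.16) is undefined
    return (f_l - f_j) / denom


def expected_log_payoff(alpha_l, f_l, f_r, f_j):
    """Assumed objective: log-payoff weighted by the exclusive fault probabilities."""
    # If both sides are faulty the payoff fraction is 1, and log(1) = 0 adds nothing.
    return (f_l - f_j) * math.log(alpha_l) + (f_r - f_j) * math.log(1.0 - alpha_l)


f_l, f_r, f_j = 0.30, 0.20, 0.10  # hypothetical fault probabilities
alpha = optimal_fraction(f_l, f_r, f_j)

# Brute-force search over a grid as an independent check of the closed form.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda a: expected_log_payoff(a, f_l, f_r, f_j))
```

Under these assumptions the grid maximizer agrees with the closed-form value to within the grid spacing. Note that the fraction depends on the exclusive fault probabilities f(πl) − fj(π) and f(πr) − fj(π), not on f(πl) and f(πr) alone.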
The following two propositions show that the fault probability and the joint
fault probability can be computed easily using the maximum operation
provided in an SSTA tool.
Proposition 1. Let π be a path t← . . .← v. Then,
f(π) = P (d(t← . . .← v) + at(v) > T ). (4.17)
Proof.
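\[
\begin{aligned}
f(t \leftarrow \cdots \leftarrow v)
&= P\Big( \bigcup_{p \in \Omega_{t \leftarrow \cdots \leftarrow v}} \{ d(p) > T \} \Big) \\
&= P\Big( \max_{p \in \Omega_{t \leftarrow \cdots \leftarrow v}} \{ d(p) \} > T \Big) \\
&= P\big( d(t \leftarrow \cdots \leftarrow v) + at(v) > T \big).
\end{aligned}
\tag{4.18}
\]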
The last equality follows from the distributivity of the maximum operation
over the addition operation, max(A + B,A + C) = A + max(B,C), and the
fact that all paths in Ωt←...←v share the same suffix t← . . .← v.
Proposition 2. Let π be a path t← . . .← v. Then,
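\[
f_j(\pi) = P\Big( d(t \leftarrow \cdots \leftarrow v) + \min_{u \in N^-(v)} \{ d(v \leftarrow u) + at(u) \} > T \Big). \tag{4.19}
\]
Proof.
\[
\begin{aligned}
f_j(t \leftarrow \cdots \leftarrow v)
&= P\Big( \bigcap_{u \in N^-(v)} \{ d(t \leftarrow \cdots \leftarrow v \leftarrow u) + at(u) > T \} \Big) \\
&= P\Big( \min_{u \in N^-(v)} \{ d(t \leftarrow \cdots \leftarrow v \leftarrow u) + at(u) \} > T \Big) \\
&= P\Big( d(t \leftarrow \cdots \leftarrow v) + \min_{u \in N^-(v)} \{ d(v \leftarrow u) + at(u) \} > T \Big).
\end{aligned}
\tag{4.20}
\]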
Note that min(A,B) can be easily computed as −max(−A,−B) in
statistical static timing analysis. Also note that the left-hand-side quantities
of the inequalities in (4.17) and (4.19) are represented in the form of (1.1),
and (4.17) and (4.19) are reduced to (1.2).
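The two propositions can be illustrated with a small Monte Carlo experiment in place of an SSTA engine. Everything numeric below is hypothetical: a single shared Gaussian source `g`, the nominal delays, the sensitivities, and the clock period `T` are chosen only to produce non-trivial probabilities.

```python
import random

random.seed(0)
T = 10.0  # hypothetical clock period

samples = []
for _ in range(20000):
    g = random.gauss(0.0, 1.0)  # shared (global) source of variation
    d_suffix = 4.0 + 0.3 * g + random.gauss(0.0, 0.1)  # d(t <- ... <- v)
    branch_l = 5.5 + 0.4 * g + random.gauss(0.0, 0.2)  # d(v <- u_l) + at(u_l)
    branch_r = 5.3 + 0.2 * g + random.gauss(0.0, 0.2)  # d(v <- u_r) + at(u_r)
    samples.append((d_suffix, branch_l, branch_r))


def prob(event):
    """Fraction of samples for which event(d, l, r) holds."""
    return sum(1 for s in samples if event(*s)) / len(samples)


f_l = prob(lambda d, l, r: d + l > T)
f_r = prob(lambda d, l, r: d + r > T)
# (4.17): at(v) is the maximum over the incoming branches.
f = prob(lambda d, l, r: d + max(l, r) > T)
# (4.19): the minimum is obtained from the maximum as -max(-A, -B).
f_j = prob(lambda d, l, r: d - max(-l, -r) > T)
# Direct definitions, for comparison.
f_union = prob(lambda d, l, r: d + l > T or d + r > T)
f_j_direct = prob(lambda d, l, r: d + l > T and d + r > T)
```

On every sample the max-based event coincides with "at least one branch is faulty" and the min-based event with "both branches are faulty", so the estimates agree exactly and not merely within sampling error.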
4.4.4 Summary
Algorithm 5 summarizes the overall procedure. Initially, TopDownSelection
is called with π = t, v = t and m = k. Then, it will return a set Π of
k best paths. The algorithm performs a recursive depth-first search (DFS)
from the sink t. For each branch in the depth-first search tree, it computes
the optimal betting fraction by (4.16), (4.19) and (4.17) (line 7), and each
branch is traversed only if a non-zero bet is placed on the branch (line 9).
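The top-down procedure can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the betting fractions are supplied in a hypothetical table `alpha_left` instead of being computed from (4.16), the rounding is the one given later in (4.22), and the small graph is made up.

```python
import math


def top_down_selection(preds, alpha_left, path, m, selected):
    """Recursive DFS from the sink; m is the bet placed on `path`."""
    v = path[-1]
    fanin = preds.get(v, [])
    if not fanin:  # v is a start point: the full path is selected
        selected.append(path)
        return
    if len(fanin) == 1:
        bets = [m]  # a single branch receives the whole bet
    else:
        left = path + (fanin[0],)
        b_l = math.floor(m * alpha_left[left] + 0.5)  # rounding as in (4.22)
        bets = [b_l, m - b_l]  # the two bets always sum to m
    for u, b in zip(fanin, bets):
        if b > 0:  # traverse a branch only if it received a non-zero bet
            top_down_selection(preds, alpha_left, path + (u,), b, selected)


# Hypothetical graph: b, d and e are start points.
preds = {"t": ["c", "b"], "c": ["d", "e"]}
# Hypothetical optimal fractions for the left extension at each branching vertex.
alpha_left = {("t", "c"): 0.6, ("t", "c", "d"): 0.5}

picked_3, picked_5 = [], []
top_down_selection(preds, alpha_left, ("t",), 3, picked_3)
top_down_selection(preds, alpha_left, ("t",), 5, picked_5)
```

With an initial bet of 3 the sketch selects three paths. With an initial bet of 5 it still selects only three, because several start points receive a bet larger than 1.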







2: if v is a start point then
3: Add π to Π
4: else
5: for u ∈ N−(v) do
6: Extend π to u and let π′ denote the extended path
7: B(π′)← α∗(π′)×m
8: Discretize B(π′) s.t. the sum of the B(π′) over this loop equals m
9: if B(π′) > 0 then





4.5 Guaranteeing k Paths and Handling Untestable Paths
Since the available number of paths is not taken into account, some
paths receive a bet greater than 1, and Algorithm 5 may produce fewer than k paths. If
it selects fewer than k paths, we may increase k to select more paths. However,
guaranteeing k paths in this way requires time-consuming iteration until k paths
are selected. To address this problem, we observe that paths are selected in a
particular order as k increases. If the discretization is not performed, the bet







where n is the length of π and πi is the subpath from the i-th segment of π
to t. We can rank paths by the amount of bets, and this ranking does not
depend on the initial capital B(t). However, according to our experiments,
the discretization is helpful to increase the quality of results, and we develop
an algorithm that produces the same path set as in Algorithm 5. A possible
implementation of the discretization in Algorithm 5 is as follows:
\[
B(\pi_l) = \lfloor B(\pi)\,\alpha^*(\pi_l) + 0.5 \rfloor, \qquad
B(\pi_r) = \lceil B(\pi)\,\alpha^*(\pi_r) - 0.5 \rceil. \tag{4.22}
\]
We compute B(πl) and B(πr) from B(π) in Algorithm 5. Conversely, given
B(πl), we can obtain the range of B(π) by
\[
\frac{B(\pi_l) - 0.5}{\alpha^*(\pi_l)} \le B(\pi) < \frac{B(\pi_l) + 0.5}{\alpha^*(\pi_l)}. \tag{4.23}
\]
Similarly, given B(πr), we can get the range of B(π) by
\[
\frac{B(\pi_r) - 0.5}{\alpha^*(\pi_r)} < B(\pi) \le \frac{B(\pi_r) + 0.5}{\alpha^*(\pi_r)}. \tag{4.24}
\]
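The rounding rule (4.22) and the ranges (4.23) and (4.24) can be checked exhaustively on small cases. The sketch below uses exact rational arithmetic (`fractions.Fraction`) to avoid floating-point effects at the interval boundaries; the function names are ours.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def discretize(b_parent, alpha_l):
    """Split an integer bet between the two children as in (4.22)."""
    alpha_r = 1 - alpha_l
    b_l = math.floor(b_parent * alpha_l + HALF)
    b_r = math.ceil(b_parent * alpha_r - HALF)
    return b_l, b_r


def range_from_left(b_l, alpha_l):
    """Bounds (lo, hi) with lo <= B(pi) < hi, from (4.23)."""
    return (b_l - HALF) / alpha_l, (b_l + HALF) / alpha_l


def range_from_right(b_r, alpha_r):
    """Bounds (lo, hi) with lo < B(pi) <= hi, from (4.24)."""
    return (b_r - HALF) / alpha_r, (b_r + HALF) / alpha_r


def check(alpha_l, max_bet=40):
    """Verify that the children's bets sum to the parent's and that the ranges hold."""
    for b in range(max_bet + 1):
        b_l, b_r = discretize(b, alpha_l)
        lo_l, hi_l = range_from_left(b_l, alpha_l)
        lo_r, hi_r = range_from_right(b_r, 1 - alpha_l)
        if b_l + b_r != b or not (lo_l <= b < hi_l) or not (lo_r < b <= hi_r):
            return False
    return True
```

The asymmetric rounding (floor on one side, ceiling on the other) is what keeps the two integer bets summing to B(π) even when B(π)α∗(πl) falls exactly halfway between two integers.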
In the case that B(π) increases from zero, the subpaths πl and πr start receiving















respectively. This allows us to run the procedure in Algorithm 5 backward
without the initial bet B(t). Suppose that for πl (πr), we have a set of pairs
of a path and an integer which represent that the path is selected when the
Figure 4.7: Merge process
bet on πl (πr) reaches the integer. Then, we can merge the two sets into a
set, converting the integers for the bet on π. Figure 4.7 illustrates the merge
process. Suppose that as the bet on πl increases from 1 to 4, the paths p1, p3,
and p4 are selected in that order, and no additional path is selected at bet
3. This is tabulated in Figure 4.7, where the table for πr is also shown. For
each of p1, p3, and p4, B(π) can be obtained from α∗(πl) and the corresponding
B(πl) by (4.25). Similarly, B(π) can be obtained for each of p5, p7, and p6. The
two path sets are merged in the increasing order of B(π) and become a path set
for π. Algorithm 6 summarizes the new procedure for guaranteeing k paths.
This is a divide-and-conquer (DnC) algorithm, similar to merge sort. The paths
in the circuit are recursively divided into two parts based on their suffixes,
the two subproblems are solved independently (lines 11 and 14), and the
solutions are merged together (line 16). This suffix-based divide-and-conquer
approach is also useful to exclude untestable paths from consideration using
the fact that if a subpath is not testable, all the paths passing through the
subpath are not testable. In the top-down phase of our DnC algorithm, we
check if the subpath π is testable using a specialized ATPG tool such as [26]
or incremental SAT [35] (line 1). If the subpath π is not testable, all paths
passing through π are untestable, and we can prune the branch. The merge
process is detailed in Algorithm 7.
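The merge step can be sketched as a two-pointer merge over lists of (bet, path) pairs, converting each child's bet into the smallest bet on π at which the path is selected. The conversion formulas follow lines 6 and 7 of Algorithm 7; the fractions and the bets for the paths of πr are hypothetical, since the values tabulated in Figure 4.7 are not reproduced here.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def merge(alpha_l, left, alpha_r, right, k):
    """Merge two (bet, path) lists, sorted by bet, into at most k pairs for the parent."""
    # Smallest integer bet on the parent at which each child's bet is reached.
    conv_l = [(math.ceil((b - HALF) / alpha_l), p) for b, p in left]
    conv_r = [(math.floor((b - HALF) / alpha_r + 1), p) for b, p in right]
    merged, i, j = [], 0, 0
    while len(merged) < k and (i < len(conv_l) or j < len(conv_r)):
        take_left = j == len(conv_r) or (
            i < len(conv_l) and conv_l[i][0] <= conv_r[j][0]
        )
        if take_left:
            merged.append(conv_l[i])
            i += 1
        else:
            merged.append(conv_r[j])
            j += 1
    return merged


left = [(1, "p1"), (2, "p3"), (4, "p4")]   # bets on pi_l, as in the text
right = [(1, "p5"), (2, "p7"), (3, "p6")]  # hypothetical bets on pi_r
result = merge(Fraction(3, 5), left, Fraction(2, 5), right, 4)
```

As a sanity check against the top-down view: with B(π) = 4 and α∗(πl) = 3/5, rule (4.22) gives a bet of 2 to each side, which selects p1 and p3 on the left and p5 and p7 on the right, the same four paths the merge returns.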
Algorithm 6 BottomUpSelection
Input: π, k, ε
1: if π is not testable then
2:   return
3: end if
4: Let v be the last vertex of π
5: if v is a start point then
6:   Π ← Π ∪ {(1, π)}
7: else
8:   Let πl and πr be the paths extended toward the two vertices in N−(v), respectively
9:   Πl, Πr ← ∅
10:   if α(πl) < ε then
11:     Πl ← BottomUpSelection (πl, k)
12:   end if
13:   if α(πr) < ε then
14:     Πr ← BottomUpSelection (πr, k)
15:   end if





Input: πl,Πl, πr,Πr, k
Output: Π
1: I ← 0, J ← 0
2: Π← ∅
3: for m = 0 to min(k,|Πl|+ |Πr|) do
4: Let (Bl, pl) be I-th element of Πl
5: Let (Br, pr) be J-th element of Πr
6: B̂l ← ⌈(Bl − 0.5)/α∗(πl)⌉
7: B̂r ← ⌊(Br − 0.5)/α∗(πr) + 1⌋
8: if B̂l ≤ B̂r then
9: Add (B̂l, pl) to Π
10: I ← I + 1
11: else
12: Add (B̂r, pr) to Π






We implemented the proposed method in C++, and used ISCAS85 circuits
and or1200 [1] as benchmarks. The or1200 includes a 32-bit, 5-stage Wallace
multiplier. To check the testability, minisat [21] was incorporated in our
implementation. All experiments were performed on a 3.0GHz 2 Xeon X5570
Linux machine. All benchmarks are technology-mapped to a TSMC 180nm
library. The nominal delay annotated to each edge in the timing graph is the
sum of the gate and wire delays obtained from the Standard Delay Format
(SDF) file.
Our statistical static timing analysis (SSTA) tool uses the algorithm
proposed in [71]. To model spatial correlation, we use a quad tree [4] with
3 levels which results in 21 global sources of variation. The global sources of
variation associated with the first, the second and the third level can change
the delay of each edge up to ±4%, ±5% and ±6% of its nominal value in the
3σ case, respectively. The delay of each edge also has independent variation,
and the 3σ point is 5% of the nominal delay.
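The numbers in this setup can be reproduced with a short sketch. A quad tree with 3 levels has 1 + 4 + 16 = 21 cells, one global source per cell. The sketch also computes the combined 3σ variation of a single edge under the assumption, made here only for illustration, that all sources are independent and contribute linearly; the unit-square die coordinates are likewise an assumption.

```python
import math

LEVELS = 3
LEVEL_THREE_SIGMA = [0.04, 0.05, 0.06]  # per-level global variation (3-sigma, fraction of nominal)
INDEP_THREE_SIGMA = 0.05                # independent variation (3-sigma, fraction of nominal)


def num_global_sources(levels=LEVELS):
    """Level l of the quad tree has 4**l cells, one global source per cell."""
    return sum(4 ** level for level in range(levels))


def cells_covering(x, y, levels=LEVELS):
    """(level, row, col) of the cell containing point (x, y) of a unit-square die."""
    cells = []
    for level in range(levels):
        n = 2 ** level  # the die is split into an n-by-n grid at this level
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        cells.append((level, row, col))
    return cells


def total_three_sigma():
    """Combined 3-sigma of one edge, assuming independent sources."""
    variance = sum(s * s for s in LEVEL_THREE_SIGMA) + INDEP_THREE_SIGMA ** 2
    return math.sqrt(variance)
```

Two edges that share a cell at some level share that level's source, which is how the model makes nearby gates more strongly correlated than distant ones.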
In our experiments, we assume that non-robust tests can determine
the delay of the path being targeted irrespective of other path delays as in
robust tests. We compare our method to the two branch and bound methods
(BnB-JPM and BnB-SPM) proposed in [89] and top k-longest path selection
in deterministic timing analysis (Deterministic). Given a number of paths to
be selected k, each method constructs a path set Π of size k, and the PCM
value q(Π) of the path set is calculated by a Monte Carlo simulation with
Table 4.1: Pre-ATPG PCM and runtimes for ISCAS85 circuits
          Deterministic    BnB-JPM [89]     BnB-SPM [89]     Proposed
Circuit   q(Π)    CPU(s)   q(Π)    CPU(s)   q(Π)    CPU(s)   q(Π)    CPU(s)
c432      0.990   0.000    0.946   0.006    0.991   0.002    0.991   0.004
c499      0.674   0.004    0.831   0.023    0.872   0.006    0.906   0.011
c880      0.983   0.001    0.969   0.004    0.983   0.004    0.999   0.004
c1355     0.554   0.005    0.799   0.046    0.811   0.007    0.876   0.011
c1908     1.000   0.001    0.979   0.012    1.000   0.005    1.000   0.006
c2670     0.993   0.005    0.968   0.014    0.998   0.010    0.999   0.009
c3540     0.950   0.002    0.969   0.015    0.950   0.011    1.000   0.010
c5315     0.981   0.006    0.980   0.021    0.981   0.019    0.998   0.020
c6288     0.995   0.008    0.995   0.166    0.995   0.045    0.995   0.816
c7552     1.000   0.006    1.000   0.022    1.000   0.025    1.000   0.021
100,000 samples.
We first assume that all paths in the circuit are testable, as in most
path selection studies [72, 89, 16], and for each ISCAS85 circuit, each method
selects 5 paths except for c499, c1355 and c2670. Since these circuits have many
near-critical paths, 30 paths are selected instead. Table 4.1 shows the PCM
values (q(Π)) and the runtime values (CPU). The basic idea of top k longest
path selection is to use the fact that long paths are more likely to fail to
meet the timing than shorter ones. BnB-SPM explicitly computes the fault
probabilities using the statistical timing model and provides better path sets
than the top k longest path selection. Our proposed method goes further and
utilizes statistical correlation of the delay variations intelligently and produces
path sets of better quality than BnB-SPM at a small runtime overhead. The
benefit of smart path selection is more significant in highly optimized industrial

















Figure 4.8: Pre-ATPG PCM
circuits since they have numerous near-critical paths which offer many possible
choices. Figure 4.8 shows 1 − q(Π) as k increases for or1200. The value
1 − q(Π) is also called the missing probability in [72]. If a smart path selection
algorithm is employed, good paths are selected first, and it is natural to exhibit
diminishing returns as k increases. In other words, a small improvement at a
high PCM value requires testing a larger number of paths than the same
improvement at a low PCM value. To account for this behavior, we plot this
type of graph on a logarithmic scale. In or1200, the proposed method shows a significant
improvement of the coverage over BnB-SPM at the same number of paths, or a
significant reduction of the required number of paths for a given PCM value. In
the conventional flow, the paths selected in this way are fed into an automatic
test pattern generation (ATPG) tool which generates test patterns sensitizing

















Figure 4.9: Post-ATPG PCM
the target paths. During this process, some paths are determined as untestable
and the quality loss (i.e., coverage loss) occurs. The quality loss can be more
severe in the path set produced by the proposed method. This is because BnB-
SPM covers the process parameter space redundantly, while our method tries
to cover as large area as possible with a small number of paths and in order to
achieve this goal, it avoids redundancy. However, the redundancy can actually
mitigate the testability problem. Balancing this redundancy and efficiency in
a smart way is proposed in [74]. However, it still does not guarantee that no
quality loss occurs, and we circumvent this testability issue completely
by integrating the minisat solver. Our method as well as the methods in [89]
can consider only testable paths as candidate paths in this way. Figure 4.9
shows post-ATPG PCM values using this approach. The proposed method
Table 4.2: Post-ATPG PCM and runtimes for ISCAS85 circuits
          BnB-SPM [89]      Proposed
Circuit   q(Π)    CPU(s)    q(Π)    CPU(s)
c499      0.868   0.021     0.888   0.071
c880      0.983   0.004     0.999   0.005
c1355     0.851   0.027     0.878   0.085
c2670     0.998   0.027     0.999   0.019
c5315     0.996   0.030     0.998   0.045
c432      0.004   0.006     0.004   0.009
c1908     0.000   0.055     0.000   0.017
c3540     0.012   0.074     0.012   0.041
c6288     0.034   57.280    0.034   118.108
c7552     0.005   0.033     0.005   0.028
still provides better quality than BnB-SPM, but compared to the pre-ATPG
PCM values (i.e., the PCM values when assuming all paths are testable), the
improvement is marginal. The PCM values seem to approach a certain limit.
Actually, in many designs, the post-ATPG PCM value does not become one
even if all testable paths in the design are tested.
Table 4.2 shows the post-ATPG results for ISCAS85 circuits. For the
top 5 circuits, our proposed method provides better post-ATPG coverage than
BnB-SPM. BnB-SPM also shows adequate PCM values for 5 paths. However,
for the other 5 circuits, both methods result in path sets of unreasonable
quality. There are two possible explanations for this result and the limit in
or1200. First, the circuits may have false paths. In this case, the selected
paths may be actually good, but the test metric q(Π) fails to measure the
quality of the path sets correctly. This issue arises because q(Π) is defined
Table 4.3: Post-ATPG PCM and runtimes for ISCAS85 circuits using test margin
                       BnB-SPM [89]                     Proposed
Circuit   TestMargin   FalseNegative  q(Π)    CPU(s)    FalseNegative  q(Π)    CPU(s)
c499      0%           0%             0.868   0.021     0%             0.888   0.071
c880      0%           0%             0.983   0.004     0%             0.999   0.005
c1355     0%           0%             0.851   0.027     0%             0.878   0.085
c2670     0%           0%             0.998   0.027     0%             0.999   0.019
c5315     0%           0%             0.996   0.030     0%             0.998   0.045
c432      3%           2.6%           0.817   0.015     2.6%           0.817   0.032
c1908     13%          1.6%           0.992   0.125     1.6%           0.992   0.260
c3540     3%           1.9%           0.998   0.075     1.9%           0.998   0.193
c6288     2%           0.2%           0.895   81.382    0.2%           0.895   210.865
c7552     3%           1.7%           0.945   0.038     1.7%           0.945   0.038
without the consideration about false paths. To deal with this issue, we need
to eliminate false paths in timing analysis by specifying them manually or by
an automated tool. This process can immediately improve q(Π) for the same
path sets. Second, there may exist some true paths that are not testable.
Such paths are common in any circuit, and the constraints that arise from
scan types (e.g., launch on capture, launch on shift, etc.), which are not
considered in this experiment, increase the number of such paths. If such paths
cause poor quality, we can test the target paths faster-than-at-speed (i.e., we
can use a shorter clock period for testing than the operating clock period).
The difference between the operating clock period and the test clock period is
called the test margin. A systematic approach to determine the optimal test
margin is proposed in [78]. For the rest of the experiments, we will assume
that the test quality is degraded after ATPG for the second reason.
Table 4.3 shows the post-ATPG PCM values using the test margin. The
test margins are expressed as a percentage of the operating clock period. For
the top 5 circuits, we do not use the test margin because the PCM values are
satisfactory considering only 5 paths are selected. In this case, there is no false
negative (i.e., yield loss) under our model because chips are determined as bad
ones only when actual faulty paths are found. The number of false negative
chips is expressed as a percentage of the number of produced chips. For the
bottom 5 circuits, we use non-zero test margins to obtain decent PCM values,
and both methods successfully achieve it. However, the proposed method does
not provide better quality of results than BnB-SPM. Actually they select the









































Figure 4.10: Post-ATPG PCM when both methods use 0.7% test margin
same paths, and this is because there are not many near-critical and testable
paths in these circuits. In this scenario, the selection simply does not matter.
Fortunately, this scenario is uncommon in highly optimized large industrial
circuits. Figure 4.10 shows the PCM values and the ratio of false negative
chips when a 0.7% test margin is used. The improvement of the proposed method
shown in Figure 4.8 has disappeared in Figure 4.9 and now reappears in
Figure 4.10. The high quality is obtained at the cost of false negatives, but
the proposed method does not incur additional yield loss in comparison with
BnB-SPM. Readers can identify this by checking that the amount of false
negative is similar at the same PCM value. This becomes clearer when we
use a more aggressive 1% test margin for the path set from BnB-SPM to
make up for the insufficient coverage. Figure 4.11 shows this scenario. The









































Figure 4.11: Post-ATPG PCM when the proposed method and BnB-SPM use
0.7% and 1% test margins, respectively
coverage is improved slightly but in this case, our proposed method provides
a significantly better PCM value and a better yield using the same number of
paths, or the same PCM value and better yield using a significantly smaller
number of paths.
4.7 Conclusions
In order to deal with the increasing variability of circuit delays and
the growing impact of small delay defects, we have presented a new class of
statistical path selection algorithms to generate a compact, high-quality path
set. Our algorithm outperforms existing methods by utilizing the statistical
correlation of the delay variations intelligently. Most studies for path selection ignore
the testability issue with the hope that the improvements in pre-ATPG quality
would be maintained after ATPG. Our experimental results have revealed
that in some designs, this is the case, but there exist other designs where the
improvement becomes marginal after ATPG. We have also found that if this
is not the problem of the test metric due to false paths, faster-than-at-speed
test can recover the original improvement. If it is the test metric issue, false
paths should be eliminated from the timing analysis. This process is usually
done before path selection for correct timing analysis but is often incomplete
and cumbersome. Thus, we may need an alternative test quality metric which
may be our future work.
Chapter 5
Testability Driven Statistical Path Selection
5.1 Introduction
The selection of paths to be targeted in delay testing has been an issue
over the past decades, and various path selection methods have been proposed.
Some of them target localized delay faults due to timing defects [63], and others
target accumulated delay faults due to process variations [47]. To deal
with ever-increasing process variation, a lot of effort has been devoted to developing
a model for the variation and efficient algorithms for analyzing timing
behavior on top of the model [71]. Since the modeling and analysis techniques
are becoming mature, path selection algorithms that leverage the statistical
timing framework are gaining more attention [89, 16]. These algorithms take
advantage of the global and spatial correlations between delays and generate a
higher quality path set with a smaller number of paths. Previous approaches
are not aware of the correlations or use them in an ad-hoc manner, while the
recent algorithms [89, 16, 71, 77] can quantify them using the statistical timing
model.
Statistical path selection algorithms can be categorized as covering-
based algorithms [89, 16] and criticality-based algorithms [71, 77]. Covering-
based algorithms solve the well-known maximum coverage problem in the pro-
cess parameter space. In the process parameter space, the test of a path delay
covers a region, and the objective of the algorithms is to cover as large area
as possible with a limited number of paths. Criticality-based algorithms rank
paths by the criticality, which is the probability that a path becomes the critical
(longest) path. Covering-based algorithms take a test clock period as input
and try to find a specifically optimized solution for the given test clock period,
while criticality-based algorithms do not require the test clock period and
produce a path set that is good irrespective of the test clock period. Criticality-based
algorithms are thus suitable for speed-binning, where a test set is run at various frequencies.
Once the paths to be tested are determined by one of these algorithms,
they are fed into an ATPG (Automatic Test Pattern Generation) tool, which
generates the test patterns sensitizing the paths. During the ATPG process,
some paths turn out to be untestable which results in test coverage loss. Since
many physical paths can be untestable [25, 27], this coverage loss is a severe
problem. To make up for the loss, more paths may be selected and targeted
in ATPG, and this process can be continued until a desired test coverage is
achieved. However, the number of the iterations can be very large and the
iterations that involve running separate tools take a lot of time. In particular,
if the path selection algorithm used does not support incremental selection,
the path selection time gets longer as the process continues.
In the covering-based path selection, several approaches are proposed
to deal with that issue. In [74], the authors propose a method that covers the
parameter space m times, where m is provided by a user. In this method,
the test coverage loss does not happen unless the m paths that cover a region
are untestable at the same time. In [73], the authors use the paths which are
found in the path selection process but are discarded because they cover almost
the same regions as already selected paths. The proposed approach creates
a hierarchy of paths, and the children of a path are used as alternatives to
the path since the area covered by a child is contained in that of the path.
Both methods are efficient and can leverage conventional tools. However,
they require a decent ratio of testable paths to the selected paths and do not
guarantee the selection of a required number of testable paths. The authors
in [16] show that their covering-based algorithm, combined with a SAT solver,
can ensure k testable path selection, but the test quality metric used in [16] is
affected by false paths and actual post-ATPG results are not presented.
The testability issue in criticality-based path selection cannot be dealt
with merely by integrating a SAT solver and requires a new selection algorithm,
which is presented in this chapter. The major contributions of this chapter
can be summarized as follows.
• Most statistical test quality metrics in the literature as well as that in [16]
ignore false paths, which cripple the metrics. We define the (critical)
testable path coverage metric in the process parameter space, and the
test quality is measured independent of the existence of false paths.
• Our criticality-based algorithm guarantees the selection of k testable
paths, and unlike [16, 74, 73], our technique can also be used for speed-
binning.
• Our algorithm is based on the branch and bound framework proposed
in [89], and we show that the efficient pruning of the framework is still
applicable in the criticality metric. Also, the algorithm in [89] requires
O(kN) time, whereas our algorithm can run in O(N) time, where N is
the number of paths after pruning and k is the number of paths to be
selected.
• Instead of using incremental SAT for the testability check as in [35],
we propose an alternative method, reducing the number of the SAT
calls substantially. This new integration method can be applied to other
applications that require solving the ISAT problem defined in [35].
5.2 Background
In this section, we first define basic terms used throughout this chapter.
The arrival time, the required arrival time and the delay are denoted by
AT, RAT and d, respectively. The timing slack S(p) of a (sub)path p from
vertex v to w is defined as
S(p) = RAT (w)− AT (v)− d(p). (5.1)
The slack of a path set U is the minimum of the path slacks in U and is denoted
by SU . The set of all paths in the given circuit is denoted by Ω. The chip
slack is the minimum of the slack of all paths in the circuit, and is denoted
by SΩ. If a path cannot be sensitized by any vector pairs, the path is a false
path. The viability [52] and floating-mode condition [14] are commonly used
sensitization criteria. Delay faults in false paths do not affect the functionality
of the circuit. Robustly testable paths can be tested irrespective of the other
path delays, while non-robustly testable paths can be tested only if some paths
providing the initial value are not slow. Functional unsensitizability
is a sufficient condition for a path to be false, and static sensitizability,
which is equivalent to non-robust testability, is a sufficient condition for a
path to be true. The functionally sensitizable paths can affect circuit timing only if
some other paths become slow and are sensitized at the same time. Thus they
are not testable by a pair of vectors in most cases. The formal definitions of
these conditions are shown in [14]. The true chip slack is the slack of the set
of true paths, and it is less than or equal to the slack of the set of all testable
paths and greater than or equal to the chip slack.
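The slack definitions and the ordering stated above can be illustrated with a few lines of Python. The three paths and their timing values are hypothetical; the only property used is that the testable paths form a subset of the true paths, which in turn form a subset of all paths.

```python
def path_slack(rat_w, at_v, delay):
    """Slack of a (sub)path from v to w, as in (5.1)."""
    return rat_w - at_v - delay


def set_slack(paths):
    """Slack of a path set: the minimum of the path slacks."""
    return min(path_slack(p["rat"], p["at"], p["d"]) for p in paths)


# Hypothetical paths: p1 is a false path, p2 is true but untestable, p3 is testable.
paths = [
    {"name": "p1", "rat": 10.0, "at": 0.0, "d": 9.9, "true": False, "testable": False},
    {"name": "p2", "rat": 10.0, "at": 0.5, "d": 9.2, "true": True, "testable": False},
    {"name": "p3", "rat": 10.0, "at": 0.0, "d": 9.4, "true": True, "testable": True},
]

chip_slack = set_slack(paths)
true_chip_slack = set_slack([p for p in paths if p["true"]])
testable_slack = set_slack([p for p in paths if p["testable"]])
```

Because a minimum taken over a smaller set can only be larger, the three values are ordered as chip slack ≤ true chip slack ≤ slack of the testable paths.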
5.2.1 Deterministic vs. Statistical Path Selection
The selection of paths to be targeted in at-speed test has been done
through deterministic STA. This deterministic path selection has several drawbacks
compared to statistical path selection. First, path selection is done in
a particular corner, usually the worst case corner. The critical or near-critical
paths in other regions of the process space can be very different from those
in the worst case corner. Second, statistical path selection accounts for the
sensitivities of path delays to various uncertain parameters. For example, suppose
that we are to select one of paths A and B. Also suppose that path A
and path B have the same nominal delay but path A has a larger sensitivity
to the thickness of metal 1 than path B. It is clear that we should
select path A for a better test quality, but deterministic path selection does
not distinguish between paths A and B. Also, statistical path selection works on an
abstracted model and it can be generally used for various sources of variation
(e.g., metal 1, metal 2, Vdd, aging effects, etc., or any combination of those).
The sensitivities considered in statistical path selection are determined by the
statistical timing model used, and we do not have to develop a particular path
selection algorithm for each source of variation. Third, deterministic path se-
lection does not leverage statistical correlations of path delay variations. For
example, deterministic path selection may select paths sensitive to metal 1
only, whereas a statistical method can avoid that because we may be able to
obtain enough information about metal 1 from a few metal-1-sensitive paths.
Similar scenarios occur in many cases. A deterministic method may select
paths in a particular region of a die, a particular module, or a particular part
of a circuit (e.g., a chain of reconvergent paths). On the other hand, statistical
path selection ensures diversity to increase the test quality.
5.3 Problem Formulation
Our objective is to construct a set of potentially longest testable paths
in a design. In practice, the longest path means the least slack path and in this
chapter we use them interchangeably. In the deterministic timing model, the
iterative process of the top k longest path selection and ATPG can enumerate
testable paths in the order of their lengths, but in the statistical timing model,
the selection of the top k paths with highest criticality and the following ATPG
will fail to list testable paths in the order desired.
Figure 5.1: Slack distributions of 4 paths
Let us consider a circuit with 4 paths, and suppose that Figure 5.1
shows the PDFs (probability density functions) of the path slacks. Also sup-
pose that A and B are perfectly correlated and other slacks are independent
of each other. Out of the 4 paths, we will select 2 paths for at-speed test. In
this example, we will consider two different scenarios on the testability of each
path. First, we assume that all the paths are testable. If we select the two longest
paths using deterministic STA, A and B will be selected. However, since A and
B are perfectly correlated, and their slacks change together, path B will not
be the critical path in any realization (i.e., any produced chip). Thus it is
redundant to test path B and some test resources are wasted. If we select the
top two paths with the highest criticality using statistical STA, A and C will
be selected. This is because SSTA knows the correlation and the criticality of
B becomes zero. Thus we can more effectively test the circuit with the same
test budget. Now, we consider the case where path A is not testable and the
other paths are testable. The subsequent ATPG process will determine that A
is not testable, so one more path is requested from the statistical STA tool, which
returns path D since the criticality of B is zero. In the end, C and D are
tested and the test quality is degraded. Readers can easily identify B and C
as the best paths, but SSTA fails to select them.
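The four-path example can be reproduced numerically. The following Python sketch is only an illustration under stated assumptions: the mean slacks in `MEAN_SLACK` and the unit-variance Gaussian model are hypothetical (the distributions of Figure 5.1 are not given numerically), and criticality is estimated by plain Monte Carlo sampling rather than by SSTA.

```python
import random

# Hypothetical mean slacks; the actual distributions of Figure 5.1 are not given.
MEAN_SLACK = {"A": 1.0, "B": 1.2, "C": 1.6, "D": 2.2}

def draw_slacks(rng):
    """One realization: A and B share one variation source, C and D are independent."""
    shared = rng.gauss(0.0, 1.0)
    return {
        "A": MEAN_SLACK["A"] + shared,
        "B": MEAN_SLACK["B"] + shared,
        "C": MEAN_SLACK["C"] + rng.gauss(0.0, 1.0),
        "D": MEAN_SLACK["D"] + rng.gauss(0.0, 1.0),
    }

def criticality(paths, n_samples=20000, seed=0):
    """Fraction of realizations in which each path has the least slack among `paths`."""
    rng = random.Random(seed)
    wins = {p: 0 for p in paths}
    for _ in range(n_samples):
        slack = draw_slacks(rng)
        wins[min(paths, key=lambda p: slack[p])] += 1
    return {p: wins[p] / n_samples for p in paths}

def top_two(crit):
    """The two paths with the highest estimated criticality."""
    return sorted(crit, key=crit.get, reverse=True)[:2]

crit_all = criticality(["A", "B", "C", "D"])   # criticality over all paths
crit_testable = criticality(["B", "C", "D"])   # A excluded as untestable
print(top_two(crit_all), top_two(crit_testable))
```

Under these assumed numbers, B never wins while A is present, so ranking over all paths and then dropping A leaves C and D, whereas ranking over the testable paths only picks B and C.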
5.3.1 Testable Path Coverage Metric
To get around this issue, we confine our solution space to testable paths
from the path selection stage and develop a new quality metric generalizing
the path criticality concept.
Definition 9. A path p is critical in a path set U if S(p) ≤ S(s) for all s ∈ U .
Definition 10. The criticality of a path p in a path set U is the probability
that a path p is critical in U .
The criticality in a path set U is denoted by λU . Then, the conven-
tional path criticality becomes λΩ. Let Ωt denote the set of all (robustly or
non-robustly) testable paths. For a given path set Π, the (critical) testable path
coverage metric (TCM) is defined as the probability that the path set Π contains
the longest testable path and is denoted by q(Π). Then, we can write
q(Π) = P( ⋃_{p∈Π} {p is critical in Ωt} ). (5.2)
We will compare TCM with other test metrics in the existing work.
The set of tested paths is denoted by Π and we call SΠ the test slack. The
correlation coefficient of the chip slack and the test slack used in [46] can also
be a good metric to evaluate a path set for at-speed testing. However, since a
true or false path is defined based on delay values, the true chip slack cannot
be simply computed by the statistical minimum of the slacks of a certain set of
paths. The chip slack computed by SSTA is a lower bound of the true chip
slack, and we can use it for the correlation coefficient, but the sub-optimality
even under the timing model is unavoidable in practice. To compare TCM to
the other test metrics, we first consider the probability that a DUT (device
under test) is classified to a correct bin (i.e., operating frequency). Also, for
the comparison, we assume that there is only one bin, the test margin (i.e.,
guard band) is zero and all paths are testable as in the previous studies where
the metrics to be compared are defined. Then, we can write
P(Correctly classified) = P(SΠ < 0 | SΩ < 0) P(SΩ < 0) + P(SΩ > 0) (5.3)
where P (SΠ < 0|SΩ < 0) is called PCM (process space coverage metric) in [74]
(a.k.a., the detection probability in [16]) and it is a statistical counterpart to
the conventional fault coverage. P (Correctly classified) has a simple linear
relation with the PCM. Given SΩ < 0, it is a sufficient condition for SΠ < 0
that Π contains the critical path. Thus we obtain
P (Correctly classified) ≥ q(Π). (5.4)
With TCM, our objective becomes to maximize q(Π) subject to |Π| = k
where k is the number of paths to be selected. Unless the size
of the path set is constrained, the problem has a trivial solution, which is the
set of all testable paths in the circuit. Due to the mutually exclusive nature
of the events that each path is critical in Ωt, we have
q(Π) = Σ_{p∈Π} λΩt(p) (5.5)
if any two paths in the circuit are not exactly identical. This is true in most
practical applications [42]. Thus if we can compute λΩt efficiently, selecting
top k paths with highest λΩt leads to the optimal solution. However, this
is not possible unless we explicitly separate all testable paths from all paths
in the circuit. In the following section, we develop an efficient algorithm to
solve the optimization problem without the explicit enumeration. Even if we
present the algorithm in the context of the testability, it can be used for the
selection of paths with any specific property (e.g., functionally sensitizable,
robustly testable, etc.).
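As a small numerical illustration of the metric, the sketch below estimates q(Π) by Monte Carlo sampling and compares it with the sum of the criticalities of the selected paths. Everything in it is an assumption made for illustration: the four paths, their mean slacks, and the Gaussian slack model are hypothetical, and sampling replaces the canonical-form computations used in this dissertation.

```python
import random
from itertools import combinations

def draw_samples(mean_slack, n_samples=5000, seed=1):
    """Sampled slacks with one shared and one independent Gaussian term per path."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)
        samples.append({p: m + 0.5 * shared + 0.5 * rng.gauss(0.0, 1.0)
                        for p, m in mean_slack.items()})
    return samples

def critical_path(sample, paths):
    """The path with the least slack in one realization."""
    return min(paths, key=lambda p: sample[p])

def criticality(samples, paths):
    """Estimated criticality of every path within `paths`."""
    wins = {p: 0 for p in paths}
    for s in samples:
        wins[critical_path(s, paths)] += 1
    return {p: w / len(samples) for p, w in wins.items()}

def tcm(samples, selected, testable):
    """Estimated q(selected): probability that the critical testable path is selected."""
    hits = sum(1 for s in samples if critical_path(s, testable) in selected)
    return hits / len(samples)

TESTABLE = ["p1", "p2", "p3", "p4"]   # hypothetical stand-in for the testable path set
samples = draw_samples({"p1": 0.0, "p2": 0.3, "p3": 0.6, "p4": 0.9})
lam = criticality(samples, TESTABLE)
top_k = sorted(TESTABLE, key=lam.get, reverse=True)[:2]
best_pair = max(combinations(TESTABLE, 2),
                key=lambda pair: tcm(samples, pair, TESTABLE))
print(top_k, tcm(samples, top_k, TESTABLE), sum(lam[p] for p in top_k))
```

Because exactly one path is critical in each sample, the estimated coverage of a set equals the sum of its members' criticalities, and an exhaustive search over all pairs returns the same set as picking the two highest criticalities.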
5.4 Criticality Based Testable Path Selection Algorithm
5.4.1 Properties of Criticality
In this subsection, we derive some basic properties of the criticality in
a path set.
Proposition 3. Let U and V be path sets such that U ⊂ V . Then, λU(p) ≥
λV (p) for all p ∈ U .
Proposition 4. Let U and V be path sets such that U ⊂ V . Then, λV (p) ≥
λU(p)− P (SV \U < SU) for all p ∈ U .
To explain the motivation of these two propositions, we will consider
an example with U={p1,p2,p3} and V={p1,p2,p3,p4,p5}. Suppose that U was
of interest initially, so we computed λU . Later we have found p4 and p5, so
we may need to compute λV for every path. In this case, the two propositions
allow us to estimate λV efficiently without the re-computation. Figure 5.2
illustrates the change of the criticality. Since additional paths are introduced,
more conditions are required for each path to be critical, so the criticality of
each path cannot increase (Proposition 3). However, the decrement is at most the
probability that the minimum slack of new paths is less than that of existing
paths (Proposition 4). Thus, we have an upper bound and a lower bound of
the new criticality when additional paths are introduced or, conversely, some
paths are eliminated.
Figure 5.2: The regions in which each path is critical in the process parameter space
are shown. The area of each region represents the criticality. For example,
λU(p1) = λV (p1) = 0.25.
Proposition 5. Let U ′, U, V ′ and V be path sets such that U ′ ⊂ U and
V ′ ⊂ V . Then, P (SU ′ < SV ) ≤ P (SU < SV ′).
This proposition can also be proved by inclusion relation in the sample
space.
Proposition 6. In a path set U , the number of the paths whose criticality in
U is greater than τ is at most 1/τ .
Proof. Let k be the number of paths whose λU > τ . Suppose k > 1/τ .
Then the sum of the criticality of the k paths is greater than 1 which is a
contradiction.
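Propositions 3, 4 and 6 can be sanity-checked on sampled slacks. The sketch below is not a proof and not part of the proposed algorithm: the five paths, their slack model, and the threshold value are hypothetical, and probabilities are replaced by sample frequencies.

```python
import random

def draw_samples(paths, n_samples=4000, seed=7):
    """Hypothetical correlated Gaussian slacks for the given paths."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)
        samples.append({p: 0.1 * i + 0.6 * shared + 0.8 * rng.gauss(0.0, 1.0)
                        for i, p in enumerate(paths)})
    return samples

def prob(samples, event):
    """Sample frequency of an event."""
    return sum(1 for s in samples if event(s)) / len(samples)

def set_slack(sample, paths):
    """Slack of a path set: the minimum of its path slacks."""
    return min(sample[p] for p in paths)

def lam(samples, paths, p):
    """Criticality of p in `paths` (Definitions 9 and 10)."""
    return prob(samples, lambda s: s[p] <= set_slack(s, paths))

U = ["p1", "p2", "p3"]
V = U + ["p4", "p5"]
samples = draw_samples(V)
new_paths = [p for p in V if p not in U]
p_new_wins = prob(samples, lambda s: set_slack(s, new_paths) < set_slack(s, U))

for p in U:
    upper = lam(samples, U, p)        # Proposition 3: upper bound of the new criticality
    lower = upper - p_new_wins        # Proposition 4: lower bound of the new criticality
    print(p, round(lower, 3), round(lam(samples, V, p), 3), round(upper, 3))

TAU = 0.3
n_above = sum(1 for p in V if lam(samples, V, p) > TAU)   # Proposition 6: at most 1/TAU
```

Each printed row shows the lower bound, the criticality in V, and the upper bound for one path of U.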
5.4.2 Proposed Algorithm
Our algorithm takes a timing graph G and the number of paths to be
selected, k, as input and lists k testable paths that maximize TCM. In the
beginning, we perform statistical static timing analysis on the timing graph
and thus arrival times, required arrival times and slacks are available for each
vertex. These timing quantities are represented in the form of (1.1) (the canon-
ical form). Basically, we will adopt the branch and bound framework proposed
in [89]. The framework uses recursive depth-first traversal with efficient prun-
ing. From each end point (input pins of flops and primary outputs) in the
timing graph, the traversal begins toward start points (output pins of flops
and primary inputs). The framework maintains the k best paths found so far
during the traversal and whenever a new path is found, one path among the
k paths is replaced, or the new path is just discarded depending on a chosen
test quality metric.
Figure 5.3: A recursive depth-first traversal
Figure 5.3 illustrates the recursive depth-first traversal,
which is visiting the vertex h via the sub-path π and will search the solution
space that consists of the paths going through π. At the vertex h, the solution
space is naturally split into the paths via f and π and the paths via g and
π (branching). Suppose that it visits f first. Then the sub-path π becomes
(f, h, i, j, k), and the depth-first traversal will search the sub-space that con-
sists of the paths via (f, h, i, j, k). Before searching this sub-space, we can
efficiently check if it is worthwhile to search the sub-space using the following
two strategies.
• If a sub-path is not testable, all the paths going through the sub-path
are not testable. Thus we check if the sub-path π is testable using an
incremental SAT solver [35] or RESIST [27], and if it is not testable, the
branch is pruned.
• The slack of the sub-path π represents the minimum of the slacks of all
the paths via the sub-path π. Using the sub-path slack, we can see if
there exists a better path in the sub-space than current k best paths.
The detailed condition will be explained later.
The first condition allows us to consider testable paths only, and due to the
second condition, our algorithm inspects only a small portion of the testable
paths.
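The two pruning strategies can be sketched on a toy timing graph. The Python code below is a deliberately simplified, deterministic illustration: it uses nominal slacks instead of canonical forms, the graph, arrival times and required arrival time are made-up numbers, and `is_testable` is a hypothetical predicate standing in for the SAT-based check. The statistical bounding conditions are the subject of Section 5.4.3.

```python
import heapq

def select_least_slack_paths(fanin, at, rat, end_points, k, is_testable):
    """Backward DFS from end points that keeps the k least-slack testable paths."""
    best = []  # heap of (-slack, path); best[0] holds the largest slack among those kept

    def visit(v, sub_path, delay, end):
        bound = rat[end] - at[v] - delay      # least slack of any path through sub_path
        if len(best) == k and bound >= -best[0][0]:
            return                            # bounding: cannot beat the current k best
        if not is_testable(sub_path):
            return                            # every extension is untestable as well
        if not fanin[v]:                      # start point reached: a complete path
            heapq.heappush(best, (-bound, sub_path))
            if len(best) > k:
                heapq.heappop(best)           # drop the path with the largest slack
            return
        for u, d in fanin[v]:                 # branching
            visit(u, (u,) + sub_path, delay + d, end)

    for e in end_points:
        visit(e, (e,), 0.0, e)
    return sorted((-neg, path) for neg, path in best)

# Toy graph: fan-in lists of (driver, edge delay); a and b are start points, e is an end point.
FANIN = {"a": [], "b": [], "c": [("a", 2.0), ("b", 1.0)],
         "d": [("b", 3.0)], "e": [("c", 1.0), ("d", 2.0)]}
AT = {"a": 0.0, "b": 0.0, "c": 2.0, "d": 3.0, "e": 5.0}   # latest arrival times
RAT = {"e": 6.0}

def is_testable(sub_path):
    """Hypothetical check: any sub-path starting with the segment b -> d is untestable."""
    return sub_path[:2] != ("b", "d")

one_path = select_least_slack_paths(FANIN, AT, RAT, ["e"], 1, is_testable)
two_paths = select_least_slack_paths(FANIN, AT, RAT, ["e"], 2, is_testable)
print(one_path, two_paths)
```

In this toy case the nominally longest path (b, d, e) is rejected as untestable, and with k = 1 the branch through (b, c, e) is skipped by the bound without reaching its start point.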
Algorithm 8 SelectTestableCriticalPaths
Input: G, k; m
1: Π← ∅, Σ← ∅
2: for each testable path p obtained from DFS of G with pruning do
3: if |Π| < k then
4: Insert p to Π and continue
5: end if
6: Insert p to Σ
7: if |Σ| < m then
8: continue
9: end if
10: Ψ← Π ∪ Σ
11: for each path s ∈ Ψ do
12: compute the path slack S(s)
13: compute the complement slack S̄(s)
14: λΨ(s) = P (S(s) ≤ S̄(s))
15: end for
16: Π ← the k paths in Ψ with the highest λΨ
17: Σ ← ∅
18: end for
The overall algorithm is shown in Algorithm 8. Our algorithm main-
tains k best paths at any given time and the set of the k best paths is denoted
by Π (line 1). In [89], whenever a new path p is found, the path p is examined
and Π is updated if desirable. However, our algorithm collects m paths before
updating Π (line 7). Once m paths are accumulated, the algorithm consid-
ers the replacement of the paths in Π with the paths in Σ. In other words,
Ψ = Π∪Σ becomes a set of candidate paths (line 10), and we find k best paths
again out of Ψ. Note that the size of Ψ is k +m. We compute the criticality
among the candidate paths (i.e., λΨ). To compute λΨ, for each path in Ψ,
we first calculate the complement slack (line 13), which is the minimum of
the slacks of all the other paths in Ψ except the path. Then, λΨ is efficiently
computed by (1.2) (line 14). Finally, the best k paths become top k paths
with highest λΨ (line 16).
Due to Proposition 3, the criticality λΨ becomes an upper bound of λΩt
and paths with high λΨ can be considered as promising paths. Proposition 3
also implies that the larger m we use, the tighter the bound is. Thus, a large
m value can improve TCM compared to the case m = 1. In fact, if we set
m such that m + k is the total number of testable paths in the circuit,
λΨ = λΩt and the algorithm produces the optimal solution.
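The batched update of Π can be illustrated as follows. This Python sketch makes several assumptions that are not part of Algorithm 8: the stream of testable paths is given as a plain list instead of coming from the pruned DFS, criticality is estimated from hypothetical sampled slacks, and a partial batch left in Σ at the end is flushed explicitly.

```python
import random

def criticality(samples, paths):
    """Estimated criticality of each path within the candidate set `paths`."""
    wins = {p: 0 for p in paths}
    for s in samples:
        wins[min(paths, key=lambda p: s[p])] += 1
    return {p: w / len(samples) for p, w in wins.items()}

def top_k(samples, candidates, k):
    """The k candidates with the highest criticality among the candidates."""
    lam = criticality(samples, candidates)
    return sorted(candidates, key=lam.get, reverse=True)[:k]

def select_paths(path_stream, samples, k, m):
    """Keep k paths; re-rank only after m new paths have been collected."""
    pi, sigma = [], []
    for p in path_stream:
        if len(pi) < k:
            pi.append(p)
            continue
        sigma.append(p)
        if len(sigma) < m:
            continue
        pi = top_k(samples, pi + sigma, k)   # candidate set is pi plus sigma
        sigma = []
    if sigma:                                # assumption of this sketch: flush the last batch
        pi = top_k(samples, pi + sigma, k)
    return pi

# Hypothetical mean slacks; the most critical paths appear late in the stream.
MEAN = {"p0": 0.8, "p1": 0.6, "p2": 0.9, "p3": 0.0, "p4": 0.3, "p5": 0.5}
rng = random.Random(3)
samples = [{p: mu + rng.gauss(0.0, 0.5) for p, mu in MEAN.items()}
           for _ in range(5000)]
stream = list(MEAN)
exact = top_k(samples, stream, 2)
batched = select_paths(stream, samples, k=2, m=4)
greedy = select_paths(stream, samples, k=2, m=1)
print(exact, batched, greedy)
```

With k = 2 and m = 4 the single update sees all six paths at once, so the batched result coincides with ranking over the whole set; with m = 1 each update only compares three candidates.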
A straightforward algorithm to compute the complement slacks requires
O((m + k)^2) time, while the binary partition tree proposed in [76] reduces
it to O(m + k) time. Given the candidate path set Ψ, we can construct a
balanced binary tree where each leaf corresponds to a path, as in Figure 5.4.
Figure 5.4: A binary partition tree to compute λΨ
We will compute the slack and the complement slack for each node in the
tree. The slack of each leaf is set to the slack of the corresponding path, and
the complement slack of the root is initialized to infinity. Then, the slack
and complement slack of the other nodes are calculated as follows. First, we
traverse the binary tree in the bottom-up fashion, computing the minimum
of the two children for each node. This becomes the slack of the node. Once we
reach the root, we traverse the binary tree again in the top-down fashion,
computing the minimum of the parent complement slack and the sibling slack.
This becomes the complement slack of each node, and the complement slack
of each leaf becomes the complement slack of the corresponding path. In
this way, the complement slacks are computed and then the criticality values
are calculated. Finding the kth largest value in a list can be done in linear
time using the median of medians algorithm (a linear time selection algorithm)
which also gives us the top k paths with highest criticality. Thus, the overall
complexity of updating Π is O(k+m). Let N be the number of the inspected
paths (not pruned paths). Then, the update of Π is performed N/m times,
and the time complexity of our algorithm is O((k+m)N/m). Therefore, if we
set m = Θ(k) (e.g., m = k), the time complexity becomes O(N), and the space
complexity becomes O(k).
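The two tree traversals can be sketched with an array-based binary tree. The code below is an illustration only: it uses deterministic numbers and the ordinary `min`, whereas the actual computation applies the statistical minimum to canonical forms, and the function name `complement_slacks` is hypothetical.

```python
import math

def complement_slacks(slacks):
    """For each path, the minimum slack of all the other paths, in linear time."""
    n = len(slacks)
    size = 1
    while size < n:                 # number of leaves, padded to a power of two
        size *= 2
    node = [math.inf] * (2 * size)  # node[1] is the root, node[size + i] is leaf i
    node[size:size + n] = slacks
    for i in range(size - 1, 0, -1):             # bottom-up: slack of each node
        node[i] = min(node[2 * i], node[2 * i + 1])
    comp = [math.inf] * (2 * size)               # complement slack of the root is infinity
    for i in range(2, 2 * size):                 # top-down: parent complement vs. sibling
        comp[i] = min(comp[i // 2], node[i ^ 1])
    return comp[size:size + n]

slacks = [3.0, 1.0, 4.0, 1.5, 2.0]
print(complement_slacks(slacks))
```

Each node is visited once per traversal, so the number of `min` operations grows linearly with the number of leaves.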
5.4.3 Pruning Methods
During the depth first traversal, a sub-path π and its slack are given and
we find out if there exists a better path in the paths via π than the current k
best paths Π. At any given time, Ψ (previous candidate paths including Π), Π
(k best paths), Σ (newly found paths) and λΨ for each path in Ψ are available
and they can be used for pruning.
First, we formally define the condition that a candidate path is dis-
carded in the proposed algorithm when m = 1.
Definition 11. For a path p, if λΠ∪{p}(p) ≤ λΠ∪{p}(s) for all s ∈ Π, then the
path p is worthless.
Let Ωπ denote the set of all paths via π. A straightforward condition
to prune π is as follows.
Proposition 7. If P (S(π) < SΠ) ≤ λΠ∪Ωπ(s) for all s ∈ Π, then all paths via
π are worthless.
Proof. We will find an upper bound of the left hand side in the inequality of
Definition 11 and a lower bound of the right hand side, which together give a
sufficient condition. For any path p in Ωπ, we have an upper bound
λΠ∪{p}(p) = P (S(p) < SΠ) ≤ P (S(π) < SΠ) (5.6)
by Proposition 5. For every s ∈ Π, p ∈ Ωπ, we have a lower bound
λΠ∪{p}(s) ≥ λΠ∪Ωπ(s) (5.7)
by Proposition 3.
However, this condition requires computing λΠ∪Ωπ(s) for all s ∈ Π, which
means that we need to traverse the binary partition tree at every branching
point in the depth-first traversal. Clearly, this is time consuming, so we
would like to perform pruning in constant time using the already computed
criticality λΨ.
Proposition 8. If 2×P (S(π) < SΠ) ≤ mins∈ΠλΨ(s), then all paths via π are
worthless.
Proof. We can find a looser bound than the lower bound used in Proposition 7.
For every s ∈ Π, p ∈ Ωπ, we have a lower bound
λΠ∪{p}(s) ≥ λΠ∪Ωπ(s) ≥ λΨ∪Ωπ(s)
≥ λΨ(s)− P (S(π) < SΨ)
(5.8)
by Proposition 4. Thus,
P (S(π) < SΠ) ≤ λΨ(s)− P (S(π) < SΨ)
⇔ P (S(π) < SΠ) + P (S(π) < SΨ) ≤ λΨ(s)
(5.9)
is a sufficient condition for the inequality of Definition 11. Since Π ⊆ Ψ
implies SΨ ≤ SΠ and hence P (S(π) < SΨ) ≤ P (S(π) < SΠ), the condition
2 × P (S(π) < SΠ) ≤ λΨ(s) is more stringent.
These exact methods allow us to prune fruitless branches effectively
when m is a small value. However, as m increases, we update Π in a lazy
manner and this can degrade the performance of the pruning. In other words,
Σ may have some paths that are useful for pruning but they are not used
properly when m is large. Since it is important to use a sufficiently large m
value for both runtime and quality of results, our algorithm employs a heuristic
method, which is to compare the paths via π to the path with the minimum
λΨ in Π. If all paths in the branch π are worse than the worst path in Π, the
branch is highly likely to be fruitless. Let w be the path with the minimum
λΨ in Π. For all paths p via π, we have
λΨ∪Σ∪Ωπ(p) =P (S(p) < min(SΩπ\{p}, SΨ, SΣ))
≤ P (S(π) < min(SΨ, SΣ))
(5.10)
by Proposition 5. Thus, if
P (S(π) < min(SΨ, SΣ)) ≤ λΨ∪Σ∪Ωπ(w) (5.11)
is satisfied, all paths via π have a lower criticality than w, and we consider
that the paths via π are worthless.
5.5 Selection by a Threshold
The proposed algorithm in the previous section is stable in the sense
that before running the algorithm, we can estimate the amount of the result
(i.e., the number of selected paths) and the run time, which is usually
proportional to the number of paths. Thus, users are comfortable selecting a
desired number of paths. However, the goal of testing is to assure a desired
quality all the time so we often question how many paths we should select to
obtain a desired TCM value. This is a difficult problem, but in most cases,
we want to cover all potentially critical paths, so it is useful to select all paths
whose criticality is greater than a threshold value. Deterministic STA tools
also provide two similar options, and in this section we present a modification
for supporting the selection-by-a-threshold.
Our modified algorithm takes a timing graph and a threshold value τ
as input and tries to list testable paths whose criticality among testable paths
is greater than or equal to the threshold value τ . The number of paths to
be selected is automatically determined depending on the amount of variation
and the circuit. The overall algorithm is shown in Algorithm 9.
Algorithm 9 SelectTestableCriticalPathsByThreshold
Input: G, τ
1: Π← ∅, Ψ← ∅
2: for each testable path p obtained from DFS of G with pruning do
3: add p to Π
4: if a certain criterion is satisfied then
5: for each path p in Π do
6: let Ψ be the set of all paths found so far from the DFS
7: compute λΨ(p)
8: if λΨ(p) < τ then
9: remove p from Π
10: end if
11: end for
12: end if
13: end for
Basically, the algorithm accumulates found testable paths into Π (line 3), and if a certain
criterion is satisfied (line 4), we examine Π and discard unnecessary paths. We
call the criterion the examination criterion, which will be explained later. If
λΩt of a path is smaller than τ , the path is not necessary and can be discarded.
However, as mentioned earlier, λΩt is difficult to calculate so we use an upper
bound provided by Proposition 3. Let Ψ be the set of all paths found so far
from the DFS (line 6). For each path in Π, we compute λΨ instead of λΠ
because λΨ is a tighter upper bound of λΩt than λΠ. If the upper bound
is less than τ , we can safely discard the path (line 8 and 9). Note that to
compute λΨ, we do not have to maintain actual paths in Ψ, and we can just
store the minimum slack of discarded paths. In the case that the pruning is
not performed and the examination is done only once in the end, Ψ = Ωt
and Algorithm 9 produces the exact result. If the pruning is performed or
the examination is done in the middle, Ψ is a subset of Ωt and, since we use
an upper bound to discard paths, the algorithm results in a superset of the
exact solution. However, if τ is small enough, the pruned paths (Ωt − Ψ)
have negligible impact on the path criticality (i.e., λΨ ≈ λΩt). Also, in our
application, the exact set is not necessary, and the superset can be a set of
good candidate paths for at-speed test. After we perform the examination, we
hope to eliminate some paths from Π in order to keep a decent memory usage.
One may think that the number of eliminated paths is not guaranteed so it is
not possible to reduce the memory usage. However, Proposition 6 allows us
to calculate how many paths will remain after the examination. After each
examination, the size of Π is at most 1/τ .
The frequency of the examination determines the time and space com-
plexity of the algorithm. We can consider two extreme examination criteria;
one is the case that the examination is performed only once in the end, and
the other is the case that it is performed whenever a new path is found.
Both the time and space complexity in the first case are O(N) where N is the
number of the testable paths found by the DFS. In the second case, the time
complexity is O(min(N, 1/τ)N), and the space complexity is O(min(N, 1/τ)).
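The threshold-based selection, including the trick of keeping only the minimum slack of the discarded paths, can be sketched as follows. The code is a hedged illustration: slacks are hypothetical samples rather than canonical forms, the examination criterion is simply "every `examine_every` paths", no DFS pruning is modeled, and one final examination is added as an assumption of this sketch.

```python
import math
import random

def examine(pi, discarded_min, samples, tau):
    """Discard paths whose criticality among all paths found so far is below tau."""
    if not pi:
        return pi
    wins = {p: 0 for p in pi}
    for s, floor in zip(samples, discarded_min):
        best = min(pi, key=lambda p: s[p])
        if s[best] <= floor:                 # otherwise a discarded path is critical
            wins[best] += 1
    keep = [p for p in pi if wins[p] / len(samples) >= tau]
    for p in pi:
        if p not in keep:                    # fold the discarded path into the stored minimum
            for i, s in enumerate(samples):
                discarded_min[i] = min(discarded_min[i], s[p])
    return keep

def select_by_threshold(path_stream, samples, tau, examine_every):
    pi = []
    discarded_min = [math.inf] * len(samples)   # per-sample minimum slack of discarded paths
    for count, p in enumerate(path_stream, start=1):
        pi.append(p)
        if count % examine_every == 0:          # hypothetical examination criterion
            pi = examine(pi, discarded_min, samples, tau)
    return examine(pi, discarded_min, samples, tau)

# Hypothetical mean slacks of eight testable paths, in discovery order.
MEAN = {"p0": 0.9, "p1": 0.0, "p2": 1.5, "p3": 0.3,
        "p4": 2.0, "p5": 0.6, "p6": 1.2, "p7": 0.1}
rng = random.Random(11)
samples = [{p: mu + rng.gauss(0.0, 0.5) for p, mu in MEAN.items()}
           for _ in range(3000)]
STREAM = list(MEAN)
TAU = 0.1
selected = select_by_threshold(STREAM, samples, TAU, examine_every=3)
print(selected)
```

Discarded paths are never stored; only their per-sample minimum slack is kept, which is what bounds the memory held in Π after each examination.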
5.5.1 Modification of the Pruning Method
For the selection by a threshold, the pruning method should also be
modified. The heuristic pruning fits Algorithm 8, whereas an exact pruning
method can be employed in Algorithm 9. We maintain SΨ at any time, and
we are given a subpath π and its slack during the depth-first traversal. For
each branch, if
P (S(π) ≤ SΨ) ≤ τ (5.12)
then, for every path p passing through π,
λΩt(p) = P (S(p) ≤ SΩt) ≤ P (S(π) ≤ SΨ) ≤ τ (5.13)
due to Proposition 5 which means that all paths passing through π have smaller
criticality values than τ so we can prune the branch. Due to this pruning, the
number of discovered (and inspected) paths from the DFS is much smaller




For path selection tools, users often request more paths than the al-
ready selected paths. In the traditional flow, this is mainly because some
selected paths are determined as untestable in the ATPG process and the test
coverage loss occurs. In our proposed flow, this will not happen, so the
incrementality that speeds up the algorithm using the results of the previous runs
may not be as important as in the traditional flow. However, there are still some
needs for the incrementality. For example, the test budget may increase so
more paths can become affordable. If some testable paths are available from
the previous runs, we can easily speed up our algorithm.
The upper bound λΨ becomes tight as the size of Ψ increases. Also
it is more effective if Ψ contains many paths with high criticality. However,
it takes some time for the bounding to become effective by finding several
testable paths with high criticality. If the algorithm takes Π and SΨ over
from a previous run, the bounding is very effective from the beginning. This
bounding can prune the paths that are already in Π, so the algorithm should
start with Π, and only newly discovered paths should be added to Π. Since
the incremental algorithm prunes more testable paths from the beginning, the
computed criticality becomes a looser upper bound of λΩt than that when the
incrementality is disabled, and the produced path set is a superset of the result
of the non-incremental algorithm.
5.6 Integration of a SAT Solver
Boolean satisfiability is a problem of finding an assignment for making
a given boolean formula evaluate to true, or proving that such an assignment
does not exist (unsatisfiable). A literal is a variable or the negation of a
variable, and a clause is a disjunction of literals. We are typically interested in
the special case of boolean satisfiability where the formula is required to
be in conjunctive normal form (i.e., a conjunction of clauses). To check
the testability, we first take a circuit as input and assign a variable to each
net. Then we extract clauses for enforcing the relations among the variables
due to the function of each gate. The derivation process and more examples
are found in [35, 39]. As mentioned earlier, to prune untestable branches, we
add some clauses and solve the current satisfiability problem whenever the
DFS goes forward. In other words, we solve the testability problem of a path
incrementally to find untestable branches. If the SAT solver used supports
this type of usage (i.e., an incremental SAT solver is used), the approach can
be efficient [35]. However, there is also some penalty in this approach. For
example, if all paths are testable, the number of SAT calls increases
unnecessarily. In the case of [35], the SAT solver seems specially optimized for this
scenario, and each SAT call seems very lightweight. However, this is not
the case in general SAT solvers.
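To make the clause extraction concrete, the sketch below encodes a tiny circuit in conjunctive normal form and checks satisfiability under assumptions by exhaustive enumeration. It is illustrative only: the circuit is made up, the brute-force `satisfiable` function is a stand-in for a real SAT solver, and only the gate-function clauses are shown (the path testability constraints of [35, 39] are not modeled).

```python
from itertools import product

def and_gate(out, a, b):
    """Clauses enforcing out <-> (a AND b); literals are signed net numbers."""
    return [[-out, a], [-out, b], [out, -a, -b]]

def not_gate(out, a):
    """Clauses enforcing out <-> NOT a."""
    return [[-out, -a], [out, a]]

def literal_value(bits, lit):
    """Value of a literal under an assignment given as a tuple of booleans."""
    return bits[abs(lit) - 1] ^ (lit < 0)

def satisfiable(clauses, n_vars, assumptions=()):
    """Brute-force check; a stand-in for a SAT call with assumptions."""
    for bits in product([False, True], repeat=n_vars):
        if (all(literal_value(bits, l) for l in assumptions)
                and all(any(literal_value(bits, l) for l in c) for c in clauses)):
            return True
    return False

# One variable per net: 1 = a, 2 = b, 3 = NOT a, 4 = (NOT a) AND b, 5 = a AND net 4.
clauses = not_gate(3, 1) + and_gate(4, 3, 2) + and_gate(5, 1, 4)
print(satisfiable(clauses, 5, [4]), satisfiable(clauses, 5, [5]))
```

Net 4 can be driven to 1 (a = 0, b = 1), while net 5 would need a and NOT a at the same time, so asking for net 5 = 1 is unsatisfiable.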
To deal with this issue, we propose an alternative approach that leverages
the conflict analysis, with which modern SAT solvers are equipped. Without
checking the testability of subpaths, we only check the testability of a full
path. If the path is not testable, the SAT solver performs the final conflict
analysis, which indicates which segments cannot satisfy the constraints for the
testability at the same time. In our implementation, we assign one variable
for each segment of a path, and the variable becomes true if the corresponding
segment satisfies the testability constraint. Then we let the SAT solver find
an assignment in which all the variables become true. In the case of minisat [21],
it takes a set of variables called the assumptions and finds an assignment that
makes all assumptions evaluate to true. Thus we put the variables that
represent the testability of each segment into a list of assumptions. If it is
unsatisfiable under the assumptions, minisat will produce a set of conflict
assumptions, by which we can locate the untestable branch, from which we
can restart the DFS.
Figure 5.5: An incremental solver has been used to identify untestable
branches. If the conflict analysis is used, we can locate them more efficiently
even without using incremental SAT.
Figure 5.5 illustrates the conventional flow and the proposed flow, and
a part of the DFS tree is shown on the top. The variables for the testability
of each segment are denoted by x1, x2, · · · , x5. Thus if all the variables can
be set to true at the same time, the path can be testable. Let us suppose
that π1 is an untestable branch (i.e., all paths via π1 are not testable) so we
want to go into π2 by pruning π1. In the conventional flow, we incrementally
check the testability and in this case, we found π1 is an untestable branch
after 3 calls. In the proposed flow, we just check the testability of the full
path, and the SAT solver will indicate the path is untestable and will produce
the conflict assumptions. Even if π1 is the shortest untestable branch (i.e.,
the branch corresponding to x1 and x2 is testable), there are several possible
explanations. For example, the conflict assumptions can be x1 = 1 and x3 = 1
which means that the corresponding segments cannot satisfy the constraints
together. Similarly, they can be x2 = 1 and x3 = 1. However, if π1 is the
shortest untestable branch, the conflict assumptions should include x3 = 1
but neither x4 = 1 nor x5 = 1. Thus, we can locate the shortest untestable branch
easily by finding the leading conflict assumption and can go into π2 after just
one SAT call.
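The idea of locating the untestable branch from the conflict assumptions can be sketched as follows. This is a toy model, not the minisat interface: the clause set is a made-up constraint saying that segments 2 and 3 cannot both satisfy the testability constraint, satisfiability is checked by brute force, and the conflict set is obtained by a simple deletion loop, whereas a real solver derives it from its final conflict analysis and need not return a minimal set.

```python
from itertools import product

def consistent(clauses, n_vars, assumed_true):
    """Brute-force check that all assumed variables can be true together."""
    for bits in product([False, True], repeat=n_vars):
        if (all(bits[v - 1] for v in assumed_true)
                and all(any(bits[abs(l) - 1] ^ (l < 0) for l in c) for c in clauses)):
            return True
    return False

def conflict_assumptions(clauses, n_vars, assumptions):
    """A subset of the assumptions that cannot hold together (empty if none)."""
    if consistent(clauses, n_vars, assumptions):
        return []
    core = list(assumptions)
    for v in list(assumptions):              # try to drop each assumption in turn
        trial = [u for u in core if u != v]
        if not consistent(clauses, n_vars, trial):
            core = trial                     # still conflicting without v
    return core

def leading_conflict(core):
    """Index of the last segment involved in the conflict."""
    return max(core)

# Variables x1..x5, one per segment of the full path (hypothetical constraint below).
CLAUSES = [[-2, -3]]                         # segments 2 and 3 cannot both be testable
core = conflict_assumptions(CLAUSES, 5, [1, 2, 3, 4, 5])
print(core, leading_conflict(core))
```

Here a single check of the full path yields the conflict {x2, x3}; its leading member x3 marks the branch to prune, and the traversal can resume from the sub-path covering the first two segments.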
We may enhance the proposed approach further. Let us consider the
case that x2 = 1 and x3 = 1 are the conflict assumptions and let us denote the
segments corresponding to x1, · · · , x5 by s1, · · · , s5, respectively. There may
be other paths passing through s2 and s3 but not s1, and the SAT solver is called
to see if such paths are testable. However, we already know that those paths
are not testable. Thus if that information is stored, we can reduce the number
of SAT calls more. However, some SAT solvers internally store the information
by adding a clause called the learnt clause. In this case, the benefit of storing
it externally may be marginal.
5.7 Experimental Results
We implemented the proposed algorithms in C++, and used ISCAS85
circuits as benchmarks. We also used an ethernet IP core (eth), the tv80 mi-
croprocessor core (tv80), and the or1200 open-source microprocessor including
a 32-bit, 5-stage Wallace multiplier. To check the testability, minisat [21] was
incorporated in our implementation. All experiments were performed on a
3.0GHz 2 Xeon X5570 Linux machine. The tv80 and eth are technology-
mapped to the IBM 45nm library, and or1200 and ISCAS85 circuits use the
TSMC 180nm library. The nominal delay annotated to each edge in the tim-
ing graph is the sum of the gate and wire delays obtained from the Standard
Delay Format (SDF) file. In this experiment, a quad tree with 3 levels is used
to model spatial correlation which results in 21 global sources of variation.
This allows us to verify the proposed algorithms under difficult conditions.
The global sources of variation associated with the first, the second and the
third level can change the delay of each edge up to ±4%, ±5% and ±6% of its
nominal value in the 3σ case, respectively. The delay of each edge also has the
independent variation, and the 3σ point is 5% of the nominal delay.
Figure 5.6: The optimality in pre-ATPG path selection achieved by SSTA is
destroyed during the ATPG process. Our testability driven approach considers
the testability in the first place and achieves superior quality of results even
in the extremely low testable ratio. (a) eth (avg. path length: 13.1 gates,
testable ratio: 82.4%). (b) tv80 (avg. path length: 37.6 gates, testable ratio: 0.03%).
In our experiments, we assume that non-robust tests can determine the delay of the
path being targeted irrespective of other path delays as in robust tests. We first
compare Algorithm 8 with two other methods. One method (SSTA+ATPG) is
to select paths with highest criticality λΩ using SSTA without considering the
testability and then to check the testability using ATPG. The untestable paths
are discarded and the procedure is iterated until a desired number of paths
are obtained. The other method (STA+ATPG) is the same as SSTA+ATPG
but nominal slacks (nominal delay) are used instead of the path criticality. In
this way, given a desired number of paths, we can construct three different
path sets from each method, and the (critical) testable path coverage (TCM)
is measured by Monte Carlo simulation using 40000 samples. Note that only
testable paths are used in measuring TCM.
Figure 5.6 shows TCM achieved by each method as the desired number
of paths increases. From the SSTA+ATPG method, we obtain the testable ratio,
which is the ratio of testable paths to the total selected paths. According
to our experiments, the lengths of paths (in the number of gates) show a
strong correlation with the testable ratio, so we also show the average path
length of the total selected paths from the SSTA+ATPG method. The correlation
seems to come from the fact that longer paths require more constraints,
making the SAT problem for ATPG more difficult to satisfy. In eth, many
paths are testable, and SSTA+ATPG provides significantly better quality than
STA+ATPG. On the other hand, the quality is severely degraded in tv80 due
to the very low testable ratio. The proposed algorithm produces a high quality
path set regardless of the testable ratio. Figure 5.6(b) also shows the
improvement of TCM by a large m value, with which the proposed algorithm finds a
solution in a less “greedy” manner.
Figure 5.7: The pruning technique allows us to select paths efficiently, and
the runtime grows linearly with respect to the number of selected paths when
m = k.
To show the runtime scalability of the proposed method with respect
to the number of selected paths, we selected paths for the two cases m = 1
and m = k. For this experiment, we used the circuit or1200, which consists
of about 40k gates. Figure 5.7 shows that the runtime increases slowly when
m = k, compared to the case m = 1. Table 5.1 compares Algorithm 8 with
SSTA+ATPG on the ISCAS85 circuits. Both methods are iterated until 5 paths
are obtained, except for c499 and c1355, where 50 paths are selected because
they have many near-critical paths. In SSTA+ATPG, the threshold value
Table 5.1: Our experiment for SSTA+ATPG is performed in a single tool, and its runtime results are
much better than those in the typical industrial setting where the two tools are separate. Nevertheless,
the proposed method shows a significant speed-up over SSTA+ATPG. Note that the proposed method
also generates test patterns.
                                  SSTA+ATPG                                Proposed
                                              CPU(s)
Circuit  #paths        ratio (%)  #i   TCM    ATPG      PathSel  Total     #i  TCM    CPU(s)
c17      18            100        5    1.000  0.000     0.000    0.000     1   1.000  0.000
c432     119332        2.8        5    0.373  0.077     0.003    0.082     1   0.780  0.019
c499     332928        45.8       1    0.888  0.016     0.013    0.030     1   0.901  0.079
c880     14470         100        1    0.913  0.010     0.002    0.010     1   0.903  0.005
c1355    332928        49.2       1    0.863  0.017     0.011    0.028     1   0.870  0.058
c1908    1867970       19.5       322  0.0    62.1      395.8    464.9     1   0.869  0.238
c2670    54214         83.0       1    0.870  0.017     0.011    0.029     1   0.870  0.030
c3540    5892630       0.3        15   0.0    0.673     0.066    0.747     1   1.000  0.134
c5315    644060        26.7       2    1.000  0.047     0.012    0.061     1   1.000  0.056
c6288    3.7269×10^16  4.8e-7     14   0.006  17780.0   442.3    18369.2   1   0.991  33.16
c7552    611804        2.0        12   1.000  0.490     0.015    0.507     1   1.000  0.063
avg.     -             -          -    0.629  1784.3    83.8     1712.3    -   0.926  3.077
begins with 0.001 and is multiplied by 0.1 for each iteration. Table 5.1 shows
the testable ratio (ratio), the number of iterations (#i), TCM and the
runtime. The circuits c432, c1908, c3540, c6288 and c7552 have a small number
of testable paths, similar to tv80, and SSTA+ATPG suffers there in runtime,
in quality, or in both. The proposed method can easily handle even c6288, which is notorious for
its enormous number of paths. For a given number of paths, the proposed method produces
a better path set than that from SSTA+ATPG, but it is difficult to determine
an adequate number of paths for achieving high quality. For example, if we
desire to achieve more than 80% TCM, 5 paths are inadequate in c432.
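The SSTA+ATPG baseline loop can be sketched as below. This is an illustrative reading of the description, not the actual tool flow: it assumes that each iteration enumerates the not-yet-checked paths whose criticality is at least the current threshold, that `is_testable` stands in for an ATPG call, and that `max_iters` is a safety bound added for the sketch.

```python
def ssta_atpg_select(criticality, is_testable, k,
                     threshold=1e-3, factor=0.1, max_iters=20):
    """Select up to k testable paths, relaxing the criticality threshold.

    criticality: dict mapping path name -> criticality from SSTA.
    is_testable: callable standing in for the ATPG testability check.
    Returns (selected paths, number of iterations used).
    """
    selected, checked = [], set()
    iterations = 0
    while len(selected) < k and iterations < max_iters:
        iterations += 1
        # Candidates above the threshold, most critical first.
        candidates = sorted(
            (p for p, c in criticality.items()
             if c >= threshold and p not in checked),
            key=lambda p: criticality[p], reverse=True)
        for path in candidates:
            checked.add(path)
            if is_testable(path):      # untestable paths are discarded
                selected.append(path)
                if len(selected) == k:
                    break
        threshold *= factor            # 0.001, 0.0001, ... as in the text
    return selected, iterations


# Hypothetical toy data: only "b" and "d" are testable.
crit = {"a": 0.5, "b": 0.3, "c": 5e-4, "d": 2e-5, "e": 1e-9}
paths, iters = ssta_atpg_select(crit, lambda p: p in {"b", "d"}, k=2)
```

In this toy run the most critical path is untestable, so the loop has to lower the threshold twice before it finds a second testable path, which is the behavior that inflates #i when the testable ratio is low.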
Alternatively, we can select paths by a threshold τ. For the ISCAS85 circuits,
Algorithm 9 selected paths using two threshold values, $\tau = 10^{-4}$ and
$\tau = 10^{-5}$. Then, we ran Monte Carlo simulations with 100,000 samples to
obtain TCM. During the Monte Carlo simulations, we also counted the number
of testable paths that became critical among testable paths at least once.
For convenience, we will just call such paths critical paths. Table 5.2 shows
the number of critical paths (# critical), the number of selected paths (# sel.
paths), TCM and the runtime. The proposed algorithm chose the number of
paths to be selected based on the circuit and the amount of variation. Since
c499 and c1355 have many near-critical paths, it selected adequate numbers of
paths from them. There is a good correlation between the number
of critical paths and the number of selected paths. Since we inspected
$10^{6}$ samples in this experiment, the threshold $10^{-4}$ is not sufficient to capture
all the critical paths, and the TCM in c499 is not 100%. However, with the
Table 5.2: If 100% TCM is desired, we can select paths by a threshold value. Algorithm 9 with a low
threshold value as input found all paths that are testable and can potentially have the least slack among
all testable paths.
                      τ = 10^{-4}                      τ = 10^{-5}
Circuit  # critical   # sel. paths  TCM    CPU(s)      # sel. paths  TCM    CPU(s)
c17      2            4             1.000  0.002       4             1.000  0.005
c432     30           36            1.000  0.022       39            1.000  0.024
c499     94           131           0.997  0.046       207           1.000  0.061
c880     7            10            1.000  0.006       11            1.000  0.005
c1355    102          148           1.000  0.042       262           1.000  0.051
c1908    10           20            1.000  0.224       29            1.000  0.218
c2670    31           58            1.000  0.038       65            1.000  0.041
c3540    3            4             1.000  0.147       4             1.000  0.171
c5315    4            19            1.000  0.048       44            1.000  0.056
c6288    16           86            1.000  76.017      159           1.000  103.717
c7552    2            7             1.000  0.057       8             1.000  0.054
threshold $10^{-5}$, we successfully captured all the critical paths, although more
paths are used.
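Why a lower threshold is needed can be seen from a simple probability estimate. The sketch below assumes independent Monte Carlo samples and treats a path's criticality p as its per-sample probability of being the critical testable path; the function name is illustrative.

```python
import math


def prob_seen_at_least_once(p, n_samples):
    """Probability that a path with per-sample criticality p is critical
    in at least one of n_samples independent samples: 1 - (1 - p)^n."""
    # expm1/log1p keep the result accurate for very small p.
    return -math.expm1(n_samples * math.log1p(-p))


# With 100,000 samples, a path just below the 1e-4 threshold is almost
# certain to show up as critical, and even p = 1e-5 shows up often.
near_1e4 = prob_seen_at_least_once(1e-4, 100_000)
near_1e5 = prob_seen_at_least_once(1e-5, 100_000)
```

Under these assumptions, paths with criticality between the two thresholds are likely to appear as critical during the simulation, so a selection that stops at the higher threshold can miss some of them.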




































Figure 5.8: If the results of previous runs are available, we can further speed
up the algorithm. (Legend: left bar, without incrementality; right bar, with
incrementality.)
Incrementality is supported in selection by a threshold. To show
its benefit, we selected paths over several iterations while
decreasing the threshold from 0.01 in steps of 0.001. Figure 5.8
shows the runtime with incrementality enabled and disabled.
With incrementality enabled, the runtime is lower,
but the resulting path set is larger, for the reason explained in the
incrementality section.
Table 5.3 compares the proposed SAT integration method with the one
using incremental SAT. As discussed earlier, the incremental method suffers
Table 5.3: Our novel SAT Integration method can enhance the performance
substantially.
          Incremental Method        Proposed
Circuit   # SAT calls  CPU(s)       # SAT calls  CPU(s)    Improv.
c17       28           0.002        8            0.000     5x
c432      2768         0.050        975          0.019     2.63x
c499      6037         0.326        1415         0.079     4.12x
c880      176          0.007        36           0.005     1.4x
c1355     4681         0.242        896          0.058     4.17x
c1908     13501        0.539        4884         0.238     2.26x
c2670     1923         0.064        636          0.030     2.13x
c3540     6498         0.753        1627         0.134     5.62x
c5315     3153         0.302        357          0.056     5.39x
c6288     396567       352.394      32347        33.16     10.63x
c7552     4925         0.338        719          0.063     5.37x
avg.      40023        32.274       3991         3.077     4.43x
from a large number of SAT calls, while the proposed method reduces the
number of SAT calls significantly, which in turn decreases the runtime
substantially. The average speed-up is about 4.43x, which is achieved without
any compromise.
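Recomputing from the numbers in Table 5.3 suggests that the 4.43x figure is the arithmetic mean of the per-circuit improvements, which differs from the ratio of the total runtimes. The short check below only uses values copied from the table; the variable names are illustrative.

```python
from statistics import mean

# CPU times (s) and per-circuit improvements copied from Table 5.3,
# in the order c17, c432, c499, c880, c1355, c1908, c2670, c3540,
# c5315, c6288, c7552.
incremental_cpu = [0.002, 0.050, 0.326, 0.007, 0.242, 0.539,
                   0.064, 0.753, 0.302, 352.394, 0.338]
proposed_cpu = [0.000, 0.019, 0.079, 0.005, 0.058, 0.238,
                0.030, 0.134, 0.056, 33.16, 0.063]
improvement = [5, 2.63, 4.12, 1.4, 4.17, 2.26, 2.13, 5.62, 5.39, 10.63, 5.37]

# Mean of the per-circuit ratios: every circuit counts equally.
mean_of_ratios = mean(improvement)

# Ratio of total runtimes: dominated by the largest circuit, c6288.
ratio_of_totals = sum(incremental_cpu) / sum(proposed_cpu)
```

The mean of ratios comes out near 4.43, while the ratio of totals is about 10.5 because c6288 accounts for almost all of the runtime in both columns.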
5.8 Conclusions
We have presented a testability-driven approach to criticality-based
path selection, with post-ATPG results evaluated by Monte Carlo simulation.
In deterministic path selection, this approach is considered an option for
speed-up, but our results have shown that it is necessary to ensure high test
quality in statistical path selection. Since the testability check using a SAT
solver actually involves test generation, our algorithm not only produces k
testable paths but also generates the test patterns sensitizing them in a single
run. If the test patterns are run on a tester, the benefits of statistical methodology




This dissertation aims to develop variation-aware design automation
and testing techniques. Our efforts toward this goal range from fundamental
theories and computational methods to high-level applications. Our
target applications leverage block-based SSTA, which is very fast but suffers
from poor accuracy. The combination of our two fundamental techniques,
refactoring and the conditioning operation, raises the accuracy of block-based
SSTA to the level of Monte Carlo simulation using only analytical and algorithmic
methods. As demonstrated in criticality computation, it can enhance various
existing SSTA applications immediately. Such accuracy, at a scalability far
beyond that of Monte Carlo simulation, used to be uncharted territory, and we
strongly believe that it can foster new SSTA applications in electronic design
automation and VLSI testing. Some future directions may include:
• Refactoring is a very general technique that relies only on one algebraic
property and works in a highly abstracted domain. The issues being
dealt with in refactoring are closely related to the reason why many
efficient algorithms in graphs cannot produce the optimal solutions. We
believe that further studies on refactoring are beneficial to many other
areas even beyond EDA.
• In the conditioning operation, the random variables representing the
parameters become correlated due to given conditions. Computing the
conditional covariance takes up a large part of the runtime. However, the
benefit of explicitly computing it is not fully justified. If assuming their
independence does not degrade the accuracy severely, we may achieve a
significant speed-up at the cost of marginal accuracy loss.
• Our proposed criticality computation methods can accurately locate
where optimization is needed in the circuit. Thus, using these methods,
various yield optimization techniques such as gate sizing, buffer insertion
and threshold voltage assignment can be developed. However, it
should be noted that sensitivities are subtle, so coarse-grained knobs such
as discrete gate sizing may not be very effective. Currently, research on
transistor sizing for custom blocks is ongoing, and it seems appropriate
because the knob is fine enough and the benefits of such optimization are
already proven in analog circuit designs. In order to guarantee adequate
yield in nanoscale technologies, it is necessary to find such fine knobs for
synthesized digital blocks.
• Our proposed test synthesis techniques are quite mature, and they are
ready to be proved on silicon. Thus, we believe that future research
should be performed in design-for-test (DFT) rather than pre-silicon test
synthesis. It is clear that measurement on natural critical paths provides
better correlation to the circuit performance than that of process
ring oscillators (PSRO). However, measurement on natural critical
paths is still noisy with the DFT circuits currently employed in industry.





Proof of Theorem 3
We continue to use the notation introduced in Theorem 3. However,









\[ \int_{m}^{\infty} x^{2}\phi(x)\,dx = m\phi(m) + \Phi(-m) \]
Proof. This lemma is proved in [17].
We define a random variable Z as
Z = X − Y. (A.1)
Then, we can easily identify that $a = \sigma_z$ and $\alpha = \mu_z/\sigma_z$. We will first
prove (3.30) in Theorem 3. The probability density function is denoted by
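ψ. Using (A.1), we get
\[
E[T \mid X > Y] = E[T \mid Z > 0]
= \frac{1}{\Phi(\alpha)} \int_{0}^{\infty} \psi_Z(z) \int_{-\infty}^{\infty} t\,\psi_T(t \mid Z = z)\,dt\,dz.
\tag{A.2}
\]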
Note that $\Phi(\alpha) = P(X > Y)$. Using the fact that $E[T \mid Z = z] = \mathrm{cov}[T, Z](z - \mu_z)/\sigma_z^2$


































Applying Lemma 3 to (A.3) yields








which becomes (3.30) in Theorem 3. Next, to obtain (3.31) in Theorem 3, we
write
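\[
E[TU \mid X > Y]
= \frac{1}{\Phi(\alpha)} \int_{0}^{\infty} \psi_Z(z) \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} t\,u\,\psi_{T,U}(t, u \mid Z = z)\,dt\,du\,dz.
\tag{A.5}
\]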
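Using the fact that $(T, U \mid Z = z) \sim N(\mu, \Sigma)$ where
\[
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
= \frac{z - \mu_z}{\sigma_z^2} \begin{bmatrix} \mathrm{cov}[T, Z] \\ \mathrm{cov}[U, Z] \end{bmatrix},
\qquad
\Sigma = \begin{bmatrix} \sigma_t^2 & \mathrm{cov}[T, U] \\ \mathrm{cov}[T, U] & \sigma_u^2 \end{bmatrix}
- \frac{1}{\sigma_z^2} \begin{bmatrix} \mathrm{cov}[T, Z] \\ \mathrm{cov}[U, Z] \end{bmatrix}
\begin{bmatrix} \mathrm{cov}[T, Z] & \mathrm{cov}[U, Z] \end{bmatrix},
\]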
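(A.5) is reduced to
\[
E[TU \mid X > Y]
= \frac{1}{\Phi(\alpha)} \int_{0}^{\infty} \left( \mathrm{cov}[T, U \mid Z = z] + \mu_1\mu_2 \right) \psi_Z(z)\,dz.
\tag{A.6}
\]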
Since the conditional covariance is independent of z, (A.6) can be re-written
as
































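Substituting (A.8) into (A.7) results in
\[
\begin{aligned}
E[TU \mid X > Y]
&= \mathrm{cov}[T, U] - \frac{\alpha\phi(\alpha)}{\Phi(\alpha)} \cdot \frac{\mathrm{cov}[T, Z]\,\mathrm{cov}[U, Z]}{\sigma_z^2} \\
&= \mathrm{cov}[T, U] - \alpha\beta\,\frac{\mathrm{cov}[T, Z]\,\mathrm{cov}[U, Z]}{a^2}.
\end{aligned}
\tag{A.9}
\]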
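As a cross-check on (A.9), the following sketch re-derives it from (A.6). It assumes that β denotes φ(α)/Φ(α), as the two lines of (A.9) imply, that T and U have zero mean, and it uses the standard conditional covariance of jointly Gaussian variables together with the identity $\int_{m}^{\infty} x^{2}\phi(x)\,dx = m\phi(m) + \Phi(-m)$ evaluated at m = −α. It is one possible route and not necessarily the author's exact sequence of steps.

```latex
% Assumptions: beta := phi(alpha)/Phi(alpha), x := (z - mu_z)/sigma_z, a = sigma_z,
% and cov[T,U | Z=z] = cov[T,U] - cov[T,Z] cov[U,Z] / sigma_z^2 (independent of z).
\begin{align*}
E[(Z-\mu_z)^2 \mid Z > 0]
  &= \frac{\sigma_z^2}{\Phi(\alpha)} \int_{-\alpha}^{\infty} x^2 \phi(x)\,dx
   = \sigma_z^2\,\frac{\Phi(\alpha) - \alpha\phi(\alpha)}{\Phi(\alpha)}
   = a^2 (1 - \alpha\beta), \\
E[TU \mid Z > 0]
  &= \mathrm{cov}[T,U \mid Z = z]
   + \frac{\mathrm{cov}[T,Z]\,\mathrm{cov}[U,Z]}{\sigma_z^4}\,
     E[(Z-\mu_z)^2 \mid Z > 0] \\
  &= \mathrm{cov}[T,U] - \frac{\mathrm{cov}[T,Z]\,\mathrm{cov}[U,Z]}{a^2}
   + \frac{\mathrm{cov}[T,Z]\,\mathrm{cov}[U,Z]}{a^2}\,(1 - \alpha\beta) \\
  &= \mathrm{cov}[T,U] - \alpha\beta\,\frac{\mathrm{cov}[T,Z]\,\mathrm{cov}[U,Z]}{a^2}.
\end{align*}
```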
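Finally, from (A.9) and (A.4), we obtain
\[
\begin{aligned}
\mathrm{cov}[T, U \mid X > Y]
&= E[TU \mid X > Y] - E[T \mid X > Y]\,E[U \mid X > Y] \\
&= \mathrm{cov}[T, U] - (\beta^2 + \alpha\beta)\,\frac{\mathrm{cov}[T, Z]\,\mathrm{cov}[U, Z]}{a^2}
\end{aligned}
\tag{A.10}
\]
which proves Theorem 3.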
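The closed form in (A.10) can be sanity-checked numerically. The sketch below is not part of the proof: it builds a hypothetical toy model of jointly Gaussian X, Y, T and U from three independent standard normals, assumes β = φ(α)/Φ(α), and compares the formula against a sample covariance taken only over samples with X > Y.

```python
import math
import random


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def closed_form_cov(cov_tu, cov_tz, cov_uz, mu_z, sigma_z):
    """Right-hand side of (A.10), with beta assumed to be phi(alpha)/Phi(alpha)."""
    alpha = mu_z / sigma_z
    beta = std_normal_pdf(alpha) / std_normal_cdf(alpha)
    return cov_tu - (beta ** 2 + alpha * beta) * cov_tz * cov_uz / sigma_z ** 2


def monte_carlo_cov(n_samples=200_000, seed=1):
    """Sample covariance of (T, U) over the samples where X > Y (toy model)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        g1, g2, g3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = 0.3 + g1 + 0.5 * g2
        y = 0.5 * g2 + g3      # Z = X - Y = 0.3 + g1 - g3, so var[Z] = 2
        t = g1 + 0.5 * g2      # cov[T, Z] = 1
        u = g2 - g3            # cov[U, Z] = 1 and cov[T, U] = 0.5
        if x > y:
            kept.append((t, u))
    n = len(kept)
    mean_t = sum(t for t, _ in kept) / n
    mean_u = sum(u for _, u in kept) / n
    return sum((t - mean_t) * (u - mean_u) for t, u in kept) / (n - 1)


analytic = closed_form_cov(cov_tu=0.5, cov_tz=1.0, cov_uz=1.0,
                           mu_z=0.3, sigma_z=math.sqrt(2.0))
empirical = monte_carlo_cov()
```

The two values should agree up to Monte Carlo noise. Conditioning on X > Y lowers the covariance here because both T and U are positively correlated with Z.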
Bibliography
[1] OR1200 RISC processor. http://www.opencores.org.
[2] A. Agarwal, D. Blaauw, and V. Zolotov. Statistical timing analysis for
intra-die process variations with spatial correlations. In Proceedings of
the 2003 IEEE/ACM international conference on Computer-aided design.
IEEE Computer Society Washington, DC, USA, 2003.
[3] A. Agarwal, F. Dartu, and D. Blaauw. Statistical gate delay model
considering multiple input switching. In Proc. Design Automation Conf.,
pages 658–663, 2005.
[4] A. Agarwal, V. Zolotov, and D.T. Blaauw. Statistical timing analysis
using bounds and selective enumeration. IEEE Trans. on Computer-
Aided Design of Integrated Circuits and Systems, (9):1243–1260, 2003.
[5] J. Benkoski, E. Vanden Meesch, L. Claesen, and H. De Man. Efficient
algorithms for solving the false path problem in timing verification. In
Proc. Int. Conf. on Computer Aided Design, pages 44–47, 1987.
[6] S. Bhardwaj, S. Vrudhula, and D. Blaauw. τau: Timing analysis under
uncertainty. In Proc. Int. Conf. on Computer Aided Design, pages
615–620, 2003.
[7] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer. Statistical timing
analysis: From basic principles to state of the art. IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, 27(4):589–
607, 2008.
[8] R. K. Brayton, G.D. Hachtel, and A.L. Sangiovanni-Vincentelli. Multi-
level logic synthesis. Proc. of the IEEE, (2):264–300, 1990.
[9] S. Browne and W. Whitt. Portfolio choice and the Bayesian Kelly criterion.
Advances in Applied Probability, 28(4):1145–1176, 1996.
[10] H. Chang and S.S. Sapatnekar. Statistical timing analysis considering
spatial correlations using a single PERT-like traversal. In Proc. Int.
Conf. on Computer Aided Design, 2003.
[11] H. Chang, V. Zolotov, S. Narayan, and C. Visweswariah. Parameter-
ized block-based statistical timing analysis with non-gaussian parame-
ters, nonlinear delay functions. In Proceedings of the 42nd annual Design
Automation Conference, page 76. ACM, 2005.
[12] H.C. Chen and DHC Du. Path sensitization in critical path problem. In
1991 IEEE International Conference on Computer-Aided Design, 1991.
ICCAD-91. Digest of Technical Papers., pages 208–211, 1991.
[13] L.C. Chen, P. Dickinson, P. Dahlgren, S. Davidson, O. Caty, and K. Wu.
Using transition test to understand timing behavior of logic circuits on
UltraSPARC T2 family. In Proc. Int. Test Conf., pages 1–10, 2009.
[14] K.T. Cheng and H.C. Chen. Delay testing for non-robust untestable
circuits. In Proc. Int. Test Conf., pages 954–961, 2002.
[15] J. Chung and J. A. Abraham. A hierarchy of subgraphs underlying a
timing graph and its use in capturing topological correlation in SSTA. In
Proc. Int. Conf. on Computer Aided Design, pages 321–327, 2009.
[16] Jaeyong Chung and Jacob A. Abraham. Recursive Path Selection for
Delay Fault Testing. In Proc. VLSI Test Sympo., pages 65–70, 2009.
[17] C. E. Clark. The greatest of a finite set of random variables. In Opera-
tions Research, pages 85–91, 1961.
[18] B.D. Cory, R. Kapur, and B. Underwood. Speed binning with path
delay test in 150-nm technology. IEEE Design & Test of Computers,
20(5):41–45, 2003.
[19] T.M. Cover and J. A. Thomas. Elements of Information Theory. New
York: Wiley, 1991.
[20] A. Devgan and C. Kashyap. Block-based static timing analysis with
uncertainty. In Proc. Int. Conf. on Computer Aided Design, pages
607–614, 2003.
[21] N. Eén and N. Sorensson. An extensible SAT-solver. In Theory and
Applications of Satisfiability Testing, pages 333–336, 2004.
[22] T. Enami, S. Ninomiya, and M. Hashimoto. Statistical timing analysis
considering spatially and temporally correlated dynamic power supply
noise. In Proc. Int. Symp. on Physical Design, pages 160–167, 2008.
[23] T. Enami, S. Ninomiya, and M. Hashimoto. Statistical timing analysis
considering spatially and temporally correlated dynamic power supply
noise. IEEE Trans. on Computer-Aided Design of Integrated Circuits
and Systems, 28(4):541–553, 2009.
[24] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,
D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: a low power
pipeline based on circuit level timing speculation. In Proc. Int. Symp.
Microarchitecture (MICRO-36), pages 7–18, 2003.
[25] K. Fuchs, F. Fink, and M.H. Schulz. Dynamite: an efficient automatic
test pattern generation system for path delay faults. IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, 10(10):1323–
1335, 1991.
[26] K. Fuchs, M. Pabst, and T Rossel. Resist: a recursive test pattern gen-
eration algorithm for path delay faults considering various test classes.
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Sys-
tems, 13(12):1550–1562, 1994.
[27] K. Fuchs, M. Pabst, and T Rossel. Resist: a recursive test pattern gen-
eration algorithm for path delay faults considering various test classes.
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Sys-
tems, 13(12):1550–1562, 1994.
[28] R. Gandikota, D. Blaauw, and D. Sylvester. Modeling crosstalk in sta-
tistical static timing analysis. In Proc. Design Automation Conf., pages
974–979, 2009.
[29] K.R. Heloue, S. Onaissi, and F.N. Najm. Efficient block-based param-
eterized timing analysis covering all potentially critical paths. In Proc.
Int. Conf. on Computer Aided Design, pages 173–180, 2008.
[30] V. Iyengar, T. Yokota, K. Yamada, T. Anemikos, B. Bassett, M. De-
gregorio, R. Farmer, G. Grise, M. Johnson, D. Milton, et al. At-speed
structural test for high-performance ASICs. In Proc. Int. Test Conf.,
pages 1–10, 2007.
[31] D. Josephson and B. Gottlieb. The crazy mixed up world of silicon debug.
pages 665–670, 2004.
[32] C. Kashyap, P. Bastani, K. Killpack, and C. Amin. Silicon feedback
to improve frequency of high-performance microprocessors: an overview.
In Proc. Int. Conf. on Computer Aided Design, pages 778–782. IEEE
Press, 2008.
[33] J.L. Kelly. A new interpretation of information rate. IEEE Trans. on
Information Theory, 2(3):185–189, 1956.
[34] H. S. Kim and D. M. H. Walker. Statistical static timing analysis
considering the impact of power supply noise in VLSI circuits. In Proc. Int.
Workshop on Microprocessor Test and Verification, pages 76–82, 2006.
[35] J. Kim, J. Whittemore, JP Marques-Silva, and K Sakallah. On apply-
ing incremental satisfiability to delay fault testing. In Proc. Design,
Automation and Test in Europe, pages 380–384, 2000.
[36] K.S. Kim, S. Mitra, and P.G. Ryan. Delay defect characteristics and testing
strategies. IEEE Trans. on Design and Test of Computers, 20(5):8–
16, 2003.
[37] P.M. Kogge and H.S. Stone. A parallel algorithm for the efficient solution
of a general class of recurrence equations. IEEE Trans. on Computers,
22(8):786–793, 1973.
[38] S.V. Kumar, C.V. Kashyap, and S.S. Sapatnekar. A framework for block-
based timing sensitivity analysis. In Proc. Design Automation Conf.,
pages 688–693, 2008.
[39] T. Larrabee. Test pattern generation using Boolean satisfiability. IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems,
11(1):4–15, 2002.
[40] Jiayong Le, Xin Li, and L.T. Pileggi. Stac: statistical timing analysis
with correlation. In Proc. Design Automation Conf., pages 343–348,
2004.
[41] W. Li, S. M. Reddy, and S. K. Sahni. On path selection in combinational
logic circuits. IEEE Trans. on Computer-Aided Design of Integrated
Circuits and Systems, 8(1):56–63, 1989.
[42] X. Li, J. Le, M. Celik, and L.T. Pileggi. Defining statistical timing sen-
sitivity for logic circuits with large-scale process and environmental vari-
ations. IEEE Trans. on Computer-Aided Design of Integrated Circuits
and Systems, 27(6):1041–1054, 2008.
[43] J.J. Liou, K.T. Cheng, S. Kundu, and A. Krstic. Fast statistical timing
analysis by probabilistic event propagation. In Proc. Design Automation
Conf., pages 661–666. ACM, 2001.
[44] J.J. Liou, K.T. Cheng, and D. A. Mukherjee. Path selection for de-
lay testing of deep-submicrometer devices using statistical performance
sensitivity analysis. In Proc. VLSI Test Sympo., pages 97–104, 2000.
[45] Q. Liu and S.S. Sapatnekar. Synthesizing a representative critical path
for post-silicon delay prediction. In Proc. Int. Symp. on Physical Design,
pages 183–190, 2009.
[46] Q. Liu and S.S. Sapatnekar. Synthesizing a representative critical path
for post-silicon delay prediction. In Proc. Int. Symp. on Physical Design,
pages 183–190, 2009.
[47] GM Luong and DMH Walker. Test generation for global delay faults. In
Proc. Int. Test Conf., 1996.
[48] GM Luong and DMH Walker. Test generation for global delay faults. In
Proc. Int. Test Conf., pages 433–442, 2002.
[49] H. Mahmoodi, S. Mukhopadhyay, and K. Roy. Estimation of delay
variations due to random-dopant fluctuations in nanoscale CMOS circuits.
IEEE J. Solid-State Circuits, (9):1787–1796, 2005.
[50] A.K. Majhi, V.D. Agrawal, J. Jacob, and L.M. Patnaik. Line coverage of
path delay faults. IEEE Trans. on Very Large Scale Integration (VLSI)
Systems, 8(1):610–613, 2000.
[51] PC McGeer and RK Brayton. Efficient algorithms for computing the
longest viable path in a combinational network. In Proc. Design Au-
tomation Conf., pages 561–567, 1989.
[52] P.C. McGeer and R.K. Brayton. Efficient algorithms for computing the
longest viable path in a combinational network. In Proc. Design Au-
tomation Conf., pages 561–567, 2006.
[53] H.D. Mogal, H. Qian, S.S. Sapatnekar, and K. Bazargan. Clustering
based pruning for statistical criticality computation under process varia-
tions. In Proc. Int. Conf. on Computer Aided Design, pages 340–343.
IEEE, 2007.
[54] H.D. Mogal, H. Qian, S.S. Sapatnekar, and K. Bazargan. Fast and ac-
curate statistical criticality computation under process variations. IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems,
28(3):350–363, 2009.
[55] S Natarajan, A. Krishnamachary, E Chiprout, and R Galivanche. Path
coverage based functional test generation for processor marginality vali-
dation. In Proc. Int. Test Conf., pages 1–9, 2010.
[56] P. Nigh and A. Gattiker. Test method evaluation experiments & data.
In Proc. Int. Test Conf., pages 454–463, 2000.
[57] M. Orshansky and K. Keutzer. A general probabilistic framework for
worst case timing analysis. In Proc. Design Automation Conf., pages
556–561, 2002.
[58] A. Ramalingam, G.J. Nam, A.K. Singh, M. Orshansky, S.R. Nassif, and
D.Z. Pan. An accurate sparse matrix based framework for statistical
static timing analysis. In Proc. Int. Conf. on Computer Aided Design,
pages 231–236, 2006.
[59] M. W. Robert and P. K. Lala. Algorithm to detect reconvergent fanout
in logic circuits. IEE Proc. Part E Computers and Digital Techniques,
pages 105–111, 1987.
[60] L. Scheffer. Explicit computation of performance as a function of process
variation. In IEEE/ACM Int. Workshop on Timing Issues in the
Specification and Synthesis of Digital Systems (TAU), pages 1–8, 2002.
[61] L. Scheffer. Why are timing estimates so uncertain? what could we do
about this? In IEEE/ACM Int. Workshop on Timing Issues in the
Specification and Synthesis of Digital Systems (TAU), 2002.
[62] M. Sharma and J.H. Patel. Bounding circuit delay by testing a very
small subset of paths. In Proc. VLSI Test Sympo., pages 333–341, 2000.
[63] M. Sharma and J.H. Patel. Finding a small set of longest testable paths
that cover every gate. In Proc. Int. Test Conf., pages 974–982, 2002.
[64] M. Sharma and J.H. Patel. What does robust testing a subset of paths,
tell us about the untested paths in the circuit? In Proc. VLSI Test
Sympo., pages 31–36, 2004.
[65] D. Sinha and H. Zhou. A unified framework for statistical timing analysis
with coupling and multiple input switching. In Proc. Int. Conf. on
Computer Aided Design, pages 837–843, 2005.
[66] D. Sinha and H. Zhou. Statistical timing analysis with coupling. Proc.
Int. Conf. on Computer Aided Design, 25(12):2965–2975, 2006.
[67] D. Sinha, H. Zhou, and N.V. Shenoy. Advances in computation of the
maximum of a set of random variables. In Proceedings of the 7th Inter-
national Symposium on Quality Electronic Design, pages 306–311. IEEE
Computer Society, 2006.
[68] D. Sinha, H. Zhou, and N.V. Shenoy. Advances in computation of the
maximum of a set of Gaussian random variables. IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, 26(8):1522–
1533, 2007.
[69] S. Tani, M. Teramoto, T. Fukazawa, and K. Matsuhiro. Efficient path
selection for delay testing based on partial path evaluation. In Proc.
VLSI Test Sympo., pages 188–193, 1998.
[70] S. Tsukiyama, M. Tanaka, and M. Fukui. A statistical static timing
analysis considering correlations between delays. In Proceedings of the
2001 Asia and South Pacific Design Automation Conference, page 358.
ACM, 2001.
[71] C. Visweswariah, K. Ravindran, K. Kalafala, S.G. Walker, S. Narayan,
D.K. Beece, J. Piaget, N. Venkateswaran, and J.G. Hemmett. First-
order incremental block-based statistical timing analysis. IEEE Trans.
on Computer-Aided Design of Integrated Circuits and Systems, (10):2170–
2180, 2006.
[72] L.C. Wang, J. J. Liou, and K.T. Cheng. Critical path selection for delay
fault testing based upon a statistical timing model. IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Systems, 23(11):1550–
1565, 2004.
[73] J. Xiong, Y. Shi, V. Zolotov, and C. Visweswariah. Pre-ATPG path
selection for near optimal post-ATPG process space coverage. In Proc.
Int. Conf. on Computer Aided Design, pages 89–96, 2009.
[74] J. Xiong, Y. Shi, V. Zolotov, and C. Visweswariah. Statistical multilayer
process space coverage for at-speed test. In Proc. Design Automation
Conf., pages 340–345, 2009.
[75] J. Xiong, C. Visweswariah, and V. Zolotov. Statistical ordering of corre-
lated timing quantities and its application for path ranking. In Proceed-
ings of the 46th Annual Design Automation Conference, pages 122–125.
ACM, 2009.
[76] J. Xiong, V. Zolotov, N. Venkateswaran, and C. Visweswariah. Criti-
cality computation in parameterized statistical timing. In Proc. Design
Automation Conf., pages 63–68, 2006.
[77] J. Xiong, V. Zolotov, and C. Visweswariah. Incremental criticality and
yield gradients. In Proc. Design, Automation and Test in Europe, pages
1130–1135, 2008.
[78] J. Xiong, V. Zolotov, C. Visweswariah, and P.A. Habitz. Optimal margin
computation for at-speed test. In Proc. Design, Automation and Test in
Europe, pages 622–627, 2008.
[79] J. Xiong, V. Zolotov, C. Visweswariah, and N. Venkateswaran. Criti-
cality computation in parameterized statistical timing. In Proc. Design
Automation Conf., pages 63–68, 2006.
[80] Shiy Xu and E. Edi. A new way of detecting reconvergent fanout branch
pairs in logic circuits. In Proc. Asian Test Sympo., pages 354–357, 2004.
[81] J.S. Yang and N.A. Touba. Automated Selection of Signals to Observe
for Efficient Silicon Debug. In Proc. of VLSI Test Symposium, pages
79–84.
[82] Y. Zhan, A.J. Strojwas, X. Li, L.T. Pileggi, D. Newmark, and M. Sharma.
Correlation-aware statistical timing analysis with non-gaussian delay dis-
tributions. In Proceedings of the 42nd annual Design Automation Con-
ference, page 82. ACM, 2005.
[83] Y. Zhan, AJ Strojwas, M. Sharma, and D. Newmark. Statistical crit-
ical path analysis considering correlations. In Proceedings of the 2005
IEEE/ACM International conference on Computer-aided design, page
704. IEEE Computer Society, 2005.
[84] L. Zhang, W. Chen, Y. Hu, and C. Chen. Statistical timing analysis with
extended pseudo-canonical timing model. In Proc. Design, Automation
and Test in Europe, pages 952–957, 2005.
[85] L. Zhang, W. Chen, Y. Hu, and C.C. Chen. Statistical static timing
analysis with conditional linear MAX/MIN approximation and extended
canonical timing model. IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, 25(6):1183–1191, 2006.
[86] L. Zhang, W. Chen, Y. Hu, J.A. Gubner, and C.C.P. Chen. Correlation-
preserved non-gaussian statistical timing analysis with quadratic timing
model. In Proceedings of the 42nd annual Design Automation Conference,
page 88. ACM, 2005.
[87] L. Zhang, Y. Hu, and C. Chen. Block based statistical timing analysis
with extended canonical timing model. In Proc. Asia and South Pacific
Design Automation Conf., pages 250–253, 2005.
[88] L. Zhang, Y. Hu, and C. Chen. Statistical timing analysis with path
reconvergence and spatial correlations. In Proc. Design, Automation
and Test in Europe, pages 528–532, 2006.
[89] V. Zolotov, J. Xiong, H. Fatemi, and C. Visweswariah. Statistical path
selection for at-speed test. In Proc. Int. Conf. on Computer Aided
Design, pages 624–631, 2008.
Vita
Jaeyong Chung was born in Seoul, Republic of Korea in 1981. He
received the Bachelor of Science degree from Yonsei University in 2006, where
his major was Electrical Engineering and his minor was Computer Science.
He joined the University of Texas at Austin in 2006, where he received the
Master of Science degree in Electrical and Computer Engineering in 2008 and
continued to pursue his Ph.D. degree. During the program at UT, he worked
as a summer intern at Strategic CAD Lab (SCL), Intel and IBM T.J. Watson
Research Center in 2008 and 2010, respectively. His research interests include
statistical static timing analysis, robust design, and VLSI testing. The impacts
of his research were recognized with Best Paper Nominations in ICCAD 2009
and ASPDAC 2010.
Permanent address: jwings@gmail.com
This dissertation was typeset with LaTeX† by the author.
†LaTeX is a document preparation system developed by Leslie Lamport as a special
version of Donald Knuth’s TeX Program.
