Abstract-The large-scale process and environmental variations for today's nanoscale ICs require statistical approaches for timing analysis and optimization. In this paper, we demonstrate why the traditional concept of slack and critical path becomes ineffective under large-scale variations and propose a novel sensitivity framework to assess the "criticality" of every path, arc, and node in a statistical timing graph. We theoretically prove that the path sensitivity is exactly equal to the probability that a path is critical and that the arc (or node) sensitivity is exactly equal to the probability that an arc (or a node) sits on the critical path. An efficient algorithm with incremental analysis capability is developed for fast sensitivity computation that has linear runtime complexity in circuit size. The efficacy of the proposed sensitivity analysis is demonstrated on both standard benchmark circuits and large industrial examples.
I. INTRODUCTION
A S IC technologies are scaled to finer feature sizes, the increasing fluctuations in manufacturing processes introduce various uncertainties in circuit behavior, thereby significantly impacting product yield. Further exacerbating the problem is the increasing impact of environmental fluctuations, such as those due to temperature and power supply variations. Addressing the nanoscale manufacturing and design realities requires a paradigm shift in the current design methodology such that large-scale variations are considered at all levels of design hierarchy.
Toward this goal, various algorithms have been proposed for statistical timing analysis with the consideration of both process and environmental variations [3] - [20] . Most of the proposed solutions fall into one of two broad categories: path-based approaches [3] - [9] and block-based approaches [10] - [20] . The path-based approaches can take into account the correlations from both path sharing and global process parameters; however, a set of critical paths must be preselected based on their nominal delay values. In contrast, the block-based techniques are more general yet are limited by various delay modeling assumptions. For example, many block-based statistical timing analysis algorithms [13] - [17] assume that delay variations can be approximated as normal distributions in order to efficiently handle both spatial correlations and reconvergent fan-outs.
Whereas statistical timing analysis has been intensively studied, how to interpret and utilize its results remains an open question. Most importantly, a new methodology of using timing-analysis results to guide timing optimization and explore the tradeoffs between performance, yield, and cost is required in the statistical domain. In nominal timing analysis, critical path and slack are two important metrics that have been widely applied to timing optimization, but the inclusion of large-scale variations renders these traditional methodologies obsolete.
First, the delay of each path is a random variable, instead of a deterministic value, in statistical timing analysis. As such, every path can be critical (i.e., have the maximal delay) with certain probability. Second, the slacks at all nodes are random variables that are statistically correlated. The parametric timing yield is determined by the probability distributions of all these slacks as well as their correlations. It, in turn, implies that an individual slack at a single node is not a sufficiently good metric that can be utilized to guide timing optimization. For these reasons, a new timing performance criterion must be proposed to accommodate the special properties of statistical timing analysis/optimization.
In this paper, we propose a new concept of statistical timing sensitivity to guide the timing optimization of logic circuits with large-scale parameter variations. We define statistical timing sensitivities for paths, arcs, and nodes. The novelty of this paper is the creation of a link between probability and sensitivity. We prove that the path sensitivity is exactly equal to the probability that a path is critical and that the arc (or node) sensitivity is exactly equal to the probability that an arc (or a node) sits on the critical path.
An important contribution of this paper is to propose a novel algorithm for fast sensitivity computation and demonstrate how one can evaluate the timing sensitivities by a single breadth-first graph traversal. The computational complexity of the proposed sensitivity analysis is linear in circuit size. In addition, an incremental analysis capability is provided to quickly update the statistical timing and sensitivity information after changes to a circuit are made.
Our proposed path and arc sensitivities are theoretically equivalent to the path and edge criticalities defined in [16] and [21] , respectively. Namely, the path (or arc) sensitivity value is identical to the path (or arc) criticality value except for numerical errors. However, the proposed sensitivity framework is ready to handle several extensions, including (but not limited to) the high-order sensitivities that will be discussed in Section IV-E. This paper is organized as follows. In Section II, we review the background of statistical static timing analysis and then discuss the statistical properties of slack and critical path in Section III. We define various statistical timing sensitivities in Section IV and develop the algorithm for sensitivity computation in Section V. The efficacy of the proposed sensitivity analysis is demonstrated by several numerical examples in Section VI. Finally, we conclude in Section VII.
II. BACKGROUND

A. Nominal Static Timing Analysis
Given a gate-level netlist, static timing analysis translates the netlist into a timing graph, i.e., a weighted directed graph G = (V, E) where each node V i ∈ V denotes a primary input, output, or internal net, each edge E i = V m , V n ∈ E denotes a timing arc, and the weight D(V m , V n ) of E i stands for the delay value from the nodes V m to V n . In addition, a source/sink node is conceptually added before/after the primary inputs/outputs so that the timing graph can be analyzed as a single-input singleoutput network. Fig. 1 shows a simple timing graph example.
There are several key concepts in nominal static timing analysis, which are briefly summarized as follows. More details on static timing analysis can be found in [31] .
1) The arrival time (AT) at a node V i is the latest time that the signal becomes stable at V i . It is determined by the longest path from the source node to V i . 2) The required time (RT) at a node V i is the latest time that the signal is allowed to become stable at V i . It is determined by the longest path from V i to the sink node. 3) Slack is the difference between the required time and the arrival time, i.e., RT − AT. Therefore, a positive (or negative) slack means that the timing constraint is satisfied (or failed). 4) Critical path is the longest path between the source and sink nodes. In nominal timing analysis, all nodes along the critical path have the same (i.e., smallest) slack. The purpose of static timing analysis is to compute the arrival time, the required time, and the slack at each node and then identify the critical path. Taking the arrival time as an example, static timing analysis starts from the source node, propagates the arrival times through all timing arcs by a breadth-first graph traversal and eventually reaches the sink node. Two atomic operations, i.e., SUM(•) and MAX(•), as shown in Fig. 2 , are repeatedly applied during such a traversal.
After the static timing analysis is complete, the critical path and the slack provide the necessary information that is required for timing optimization. Roughly speaking, the gates and interconnects along the critical path (where the slacks are small) should be upsized to reduce delay, whereas those along the noncritical paths (where the slacks are large) should be downsized to save area and/or power.
B. Process Variation Modeling
According to the geometrical scale of their occurrence, process variations can be classified into two broad categories: interdie and intradie variations. Interdie variations model the common/average variations across the die, whereas intradie variations model the individual, but spatially correlated, local variations within the same die.
In most practical applications, both interdie and intradie variations are modeled as the random variables that are jointly normal. In such cases, principal component analysis (PCA) can be applied to find a set of independent factors to represent the original correlated random variables [29] .
Given N process parameters η = [η 1 , η 2 , . . . , η N ] T , the process variations ∆η = η − η 0 , where η 0 contains the mean values of η, are modeled as zero-mean random variables. The correlation of ∆η can be represented by a symmetric, positivesemidefinite covariance matrix R [29] . PCA decomposes R as follows:
where
contains the corresponding eigenvectors that are orthonormal, i.e., V T V = I (I is an identity matrix). Based on Σ and V , PCA defines a set of new random variables
These new random variables ∆ε = [∆ε 1 , ∆ε 2 , . . . , ∆ε N ] T are called the principal components or factors. It is easy to verify that {∆ε i ; i = 1, 2, . . . , N} are independent and standard normal (i.e., zero mean and unit variance) [29] .
The essence of PCA can be interpreted as a coordinate rotation of the space defined by the original random variables. In addition, if the magnitude of the eigenvalues {λ i } decreases quickly, it is possible to use a small number of principal components to approximate the original N -dimensional space. More details on PCA can be found in [29] .
C. Statistical Static Timing Analysis
Unlike nominal timing analysis, the gate/interconnect delays in statistical timing analysis are all modeled as random variables to account for large-scale process variations. It means that the weight D(V m , V n ) associated with each timing arc is a random variable instead of a deterministic value. Therefore, the two atomic operations, SUM(•) and MAX(•), must handle statistical distributions.
Many statistical timing analysis algorithms [13] - [17] approximate the gate/interconnect delays and the arrival times as linear models
where x and y denote two gate/interconnect delays or arrival times, C x , C y ∈ R are the constant terms, B x , B y ∈ R N contain the linear coefficients, {∆ε i ; i = 1, 2, . . . , N} is a set of random variables to model process variations, and N is the total number of these random variables. We assume that {∆ε i ; i = 1, 2, . . . , N} are independent standard normal distributions and that they are extracted by the PCA in Section II-B. The random variables x and y in (3) and (4) are the linear combinations of multiple normal distributions and, therefore, are also normal [30] .
Given the linear models in (3) and (4), the SUM(•) operation can be easily handled by the following:
The MAX(•) operation, however, is nonlinear. In other words, the maximum of two normal distributions is not necessarily normal. However, it is possible to approximate MAX(•) by a linear model [13] - [17] MAX where the constant term γ is determined by matching the mean value
and the linear coefficients α and β are determined by either matching the moments [13] , [14] , [17] or calculating the tightness probabilities [16] 
In (7)- (9), E(•) stands for the expected value and P (•) represents the probability.
III. STATISTICS OF SLACK AND CRITICAL PATH
In this section, we highlight the reasons why the traditional concept of slack and critical path becomes ineffective under process variations.
A. Slack
In nominal timing analysis, slack is utilized as a metric to measure how tightly the timing constraint is satisfied. A negative slack means that the timing constraint is not met, whereas a (small) positive slack means that the timing constraint is (marginally) satisfied. In the statistical case, however, it is difficult to make such a straightforward judgment, because all slacks are random variables instead of deterministic values. For instance, Fig. 3 shows two slack distributions computed from statistical timing analysis. The node V 1 presents a larger probability that the slack is positive than the node V 2 . However, the worst case (smallest) slack at V 1 is more negative than that at V 2 . Hence, it is hard to conclude which slack distribution is better using a simple criterion.
More importantly, the slacks throughout a timing graph are statistically correlated in statistical timing analysis and must be concurrently considered to determine the parametric timing yield. In nominal timing analysis, it is well known that the timing constraint is satisfied if and only if all slacks in the timing graph are positive. In the statistical case, this condition can be stated as follows: The probability that the timing constraint is satisfied (i.e., the parametric timing yield) is equal to the probability that all slacks are positive.
Studying (10), we notice that such a probability depends on all slack distributions as well as their correlations. Unlike nominal timing analysis where slacks are deterministic values without correlations, knowing individual slack distributions in statistical timing analysis is not sufficient to determine the parametric timing yield. The probability in (10) cannot be accurately evaluated if the correlations are ignored. The aforementioned analysis implies an important fact that an individual slack distribution at a single node may not be meaningful in statistical timing analysis.
However, there exist some "important" nodes for which the slacks have special meanings. Given a timing graph, we define the node V IN as an important node if all paths in the timing graph pass V IN . Based on this definition, the source and sink nodes are two important nodes, because all paths start from the source node and terminate at the sink node. In some special timing graphs, it is possible to find other important nodes. For example, the node e in Fig. 1 is an important node by this definition. The importance of the node is that, if V IN is an important node, the parametric timing yield in (10) can be uniquely determined by the slack at
The physical meaning of (11) can be intuitively explained by the concept of Monte Carlo simulation. When a timing graph is simulated by Monte Carlo analysis, a delay sample (i.e., a set of deterministic delay values for all timing arcs) is drawn from the random variable space in each Monte Carlo run. The parametric timing yield is equal to Num 1 (the number of samples for which the timing constraint is satisfied) divided by Num (the total number of Monte Carlo runs). Similarly, the probability Slack VIN ≥ 0 is equal to Num 2 (the number of samples for which the slack at V IN is positive) divided by Num. In each Monte Carlo run, the timing constraint is failed if and only if there is a path P whose delay is larger than the given specification. In this case, the slack at V IN must be negative because all paths pass the important node V IN , and therefore, V IN must be on the path P . The aforementioned analysis proves that Num 1 is equal to Num 2 , yielding (11) .
Equations (10) and (11) indicate another difference between nominal and statistical timing analyses. In nominal timing analysis, the slack at any node along the critical path uniquely determines the timing performance. In statistical timing analysis, however, only the slack at an important node uniquely determines the timing performance. Compared with those nodes on the critical path, important nodes belong to a much smaller subset, because they must be shared by all paths in the timing graph.
The aforementioned concept of important node can be extended to a node set. If we remove a set of nodes and cut the entire timing graph into two disconnected subgraphs, where one subgraph contains the source node and the other subgraph contains the sink node, we refer to the set of the removed nodes as a separating node set. It is easy to verify that all paths from the source node to the sink node pass through the separating node set. Therefore, following the same reasoning of the Monte Carlo simulation, it can be proven that the parametric timing yield is uniquely determined by the slacks of all nodes in a separating node set
where Slack W i represents the slack at the ith node of the separating node set.
B. Critical Path
Similar to slack, there are several major differences between nominal and statistical timing analyses on critical path. First, given a timing graph, the maximal delay from the source node to the sink node can be expressed as
where D P i is the delay of the ith path. In nominal timing analysis, D = D P i if and only if the path P i is critical. In statistical timing analysis, however, every path can be critical (i.e., have the maximal delay) with certain probability. Although it is possible to define the most critical path as the path P i that has the largest probability to be critical, the maximal circuit delay in (13) must be determined by all paths instead of the most critical path only. Second, but more importantly, the critical path concept is not so helpful for statistical timing optimization. In the nominal case, the gates and interconnects along the critical (or noncritical) path are repeatedly selected for up (or down) sizing. This strategy becomes ineffective under process variations. One important reason is that many paths may have similar probabilities to be critical and all of them must be considered for timing optimization. Even in the nominal case, many paths in a timing graph can be equally critical, which is the socalled "slack wall" [23] . This multiple-critical-path problem is more pronounced in statistical timing analysis, because more paths can have overlapped delay distributions due to large-scale process variations. In addition to this multiple-critical-path problem, we will demonstrate in Section IV-B that selecting the gates and interconnects along the most critical (or least critical) path for up (or down) sizing is not the best choice under a statistical modeling assumption.
IV. CONCEPT OF STATISTICAL TIMING SENSITIVITY
In this section, we mathematically define various statistical timing sensitivities and theoretically prove the equivalence between sensitivity and probability.
A. Path Sensitivity
In nominal timing analysis, the critical path is of great interest because it uniquely determines the maximal circuit delay. If the delay of the critical path is increased (or decreased) by a small perturbation δ → 0, the maximal circuit delay is correspondingly increased (or decreased) by δ. Therefore, given the maximal circuit delay D in (13) , the relation between D and the individual path delay D P i can be mathematically represented as the path sensitivity
From the sensitivity point of view, a critical path is important because it has nonzero sensitivity and all other noncritical 
paths have zero sensitivity. The maximal circuit delay can be changed if and only if the critical path delay is changed. This is the underlying reason why the critical path is important for timing optimization. It is the sensitivity, instead of the critical path itself, that provides an important criterion to guide timing optimization. A path is more (or less) important if it has a larger (or smaller) path sensitivity.
In statistical timing analysis, all path delays are random variables and, therefore, can be characterized by their moments [28] . The relation between the maximal circuit delay D and the individual path delay D P i can be represented by a multidimensional multioutput function that maps the moments of
In general, it is possible to define the sensitivity between any mth-order moment of D and nth-order moment of D P i , where m, n ∈ {1, 2, . . .}. In this paper, we define the path sensitivity by the first-order moments as (16) , shown at the bottom of the page.
There are two important clarifications that must be made for the path sensitivity in (16) . First, the function in (15) depends on multiple variables: (16), we should keep all other input variables, i.e., all high-order central moments {E{[
In other words, such a perturbation only shifts the probability distribution by a small amount δ, whereas the shape of the distribution (determined by all high-order central moments) is not changed, as shown in Fig. 4 . Second, the perturbation of δ in (16) is defined mathematically. It only changes D P i and does not impact {D P j ; j = i}. This is different from a perturbation that is physically applied to an arc delay or a process parameter. Such a physical perturbation can concurrently change the delays of multiple paths. These two clarifications are also applicable to other statistical timing sensitivities defined in this section.
The path sensitivity in (16) has several important properties that are summarized by the following theorems.
Theorem 1: The path sensitivity in (16) satisfies
where D P i is the delay of the ith path, if the probability P [D P i = MAX(D P j ; j = i)] is equal to 0, then the path sensitivity in (14) is equal to the probability that the path P i is critical, i.e.,
S Path
The detailed proofs of Theorems 1 and 2 can be found in the Appendix. Theorem 2 relies on the assumption P [
This assumption is valid if any two paths in the circuit are not exactly identical. This conclusion can be summarized by the following Theorem 3 that is formally proven in the Appendix.
Theorem 3: Let D P i be the delay of the ith path. The proba-
If D P i and D P j are two continuous random variables and they are not fully correlated, the probability P (D P i = D P j ) is equal to 0. In most practical applications, path delays are impacted by both interdie and intradie variations. Even if two path delays have the same mean and variance, they often depend on different intradie variations due to the location difference and, therefore, are not fully correlated.
B. Arc Sensitivity
In nominal timing optimization, the gates and interconnects along the critical path are important, because the maximal circuit delay is sensitive to these gate/interconnect delays. Following this observation, the importance of a given gate or interconnect can be assessed by the following arc sensitivity:
where D is the maximal circuit delay defined in (13) , D Ai denotes the gate/interconnect delay associated with the ith arc, and D P k represents the delay of the kth path. In (19) , the path sensitivity S Path P k is nonzero (i.e., equal to 1) if and only if the kth path P k is critical. In addition, the derivative ∂D P k /∂D Ai is nonzero (i.e., equal to 1) if and only if the ith arc A i sits on the kth path P k , because the path delay D P k is equal to the sum of all arc delays D Ai 's that belong to P k . These observations yield the conclusion that the arc sensitivity S
Arc
Ai is nonzero if and only if A i is on the critical path. The arc sensitivity explains the reason why the gates and interconnects along the critical path are important for timing optimization. A gate/interconnect is more (or less) important if it has a larger (or smaller) arc sensitivity.
The aforementioned sensitivity concept can be extended to statistical timing analysis. In the statistical case, we define the arc sensitivity using the first-order moments
The arc sensitivity in (20) has the following property. 
The detailed proof of Theorem 4 can be found in the Appendix. Remember that S Path P k is equal to the probability that the kth path P k is critical (see Theorem 2). Therefore, Theorem 4 implies the important fact that the arc sensitivity defined in (20) is exactly equal to the probability that an arc sits on the critical path.
The arc sensitivity in (20) provides an effective criterion to select the most important gates and interconnects for up-/downsizing. Roughly speaking, for statistical timing optimization, the gates and interconnects with large arc sensitivities are critical to the maximal circuit delay and, in general, should be upsized to reduce delay, whereas the others with small arc sensitivities should be downsized to save area and/or power. Next, using the concept of arc sensitivity, we explain the reason why repeatedly selecting the gates and interconnects along the most critical (or least critical) path for up (or down) sizing can be ineffective in the statistical case.
Consider a simple timing graph including three paths, as shown in Fig. 5 . Assume that the path sensitivity S circuit delay and should be selected for upsizing, although it does not sit on the most critical path. In this example, using the traditional concept of critical path selects the wrong arc, because it does not consider the nonzero path sensitivities of other less critical paths. These nonzero sensitivities make it possible that changing an arc delay can change the maximal circuit delay through multiple paths. In Fig. 5 , the arc A 1 can change the maximal circuit delay through two paths P 1 and P 2 , whereas the arc A 2 can change the maximal circuit delay only through one path P 3 . Therefore, the arc A 1 eventually becomes more critical than A 2 , although neither P 1 nor P 2 is the most critical path.
C. Node Sensitivity
The nominal and statistical node sensitivities can be, respectively, defined as
where D is the maximal circuit delay given in (13), AT V i denotes the arrival time at the ith node, and D P k represents the delay of the kth path. The node sensitivities in (22) and (23) are similar to the arc sensitivities defined in (19) and (20) . Following the same reasoning of Theorem 4, we can prove that the statistical node sensitivity in (23) is exactly equal to the probability that a node sits on the critical path. This conclusion can be formally stated as the following Theorem 5. The detailed proof of Theorem 5 can be found in the Appendix.
Theorem 5: Let D P i be the delay of the ith path. If the probability P [D P i = MAX(D P j ; j = i)] = 0 for any {i = 1, 2, . . .}, then the node sensitivity in (23) is equal to the following:
D. Yield Sensitivity
We define the yield sensitivities for paths, arcs, and nodes
SY
The yield sensitivities in (25)- (27) (25)- (27) .
E. High-Order Sensitivity
The aforementioned sensitivity concept can be extended to high order. One important application of high-order sensitivity is the quadratic MAX(•) approximation proposed in [24] .
For statistical timing analysis, the nonlinear MAX(•) operator can be approximated as a linear function in (6) , where the linear coefficients α and β are determined by the tightness probabilities in (8) and (9) . Given the equivalence between probability and sensitivity proven by Theorem 2, the tightness probabilities in (8) and (9) are equal to the first-order sensitivities
Although the MAX(•) operator is not analytical (i.e., does not have continuous derivatives), it can be statistically approximated as the form of (6), (28) and (29) that is similar to the traditional Taylor expansion. Therefore, such a linear approximation is referred to as the first-order statistical Taylor expansion in [24] . The aforementioned statistical Taylor expansion can be extended to the second order to achieve higher approximation accuracy. Consider the simple example MAX(0, z) where z is a zero-mean random variable. The second-order statistical expansion can be expressed as follows:
The details of the quadratic MAX(•) approximation is beyond the scope of this paper and can be found in [24] .
F. Summary
The proposed sensitivity framework has three unique properties.
1) Distribution-independent. Our discussions do not rely on any specific probability distribution to model the gate/interconnect delays and the arrival times.
2) Correlation-aware. Our proposed sensitivity framework is not restricted to any assumption of statistical independence, and it can handle correlated arrival times. 3) Computation-efficient. The proposed statistical timing sensitivities can be efficiently computed by a single breadth-first graph traversal, as will be demonstrated in Section V.
V. COMPUTATION OF STATISTICAL TIMING SENSITIVITY
In this section, we first develop the sensitivity equations for two atomic operations: SUM(•) and MAX(•). Then, we show how to propagate the sensitivities throughout a timing graph by using a single breadth-first graph traversal. Finally, we discuss the incremental analysis algorithm to quickly update the statistical timing and sensitivity information after changes to a circuit are made.
We assume that all gate/interconnect delays and arrival times are approximated as normal distributions. Such an assumption facilitates an efficient sensitivity computation, even though our proposed sensitivity framework is distribution-independent.
A. Atomic Operations
Because multivariable operations can be broken down into multiple two-variable cases, the remainder of this section focuses on the sensitivity computation for the SUM(•) and MAX(•) of two random variables, i.e., z = x + y and z = MAX(x, y) where x and y are approximated as the linear models in (3) and (4) and z is similarly approximated as a linear function
Given the operation z = x + y or z = MAX(x, y), we define the sensitivity matrix Q z←x as follows:
The sensitivity matrix Q z←y can be similarly defined. The sensitivity matrix in (32) provides the quantitative information on how much the coefficients C z or {B zi ; i = 1, 2, . . . , N} will be changed if there is a small perturbation on C x or {B xi ; i = 1, 2, . . . , N}. Next, we derive the mathematical formulas of the sensitivity matrices for both SUM(•) and MAX(•) operations.
For the SUM(•) operation z = x + y, it is easy to verify that
Therefore, the sensitivity matrix Q z←x is an identity matrix.
For the MAX(•) operation z = MAX(x, y), it can be proven that
where ϕ(•) and Φ(•) are the probability density function (PDF) and the cumulative distribution function (CDF) of the standard normal distribution N (0, 1), respectively, and the coefficients α and β are defined by the following:
Equations (35)- (40) can be derived by directly following the mathematical equations in [26] . The sensitivity matrix Q z←y can be similarly calculated because both SUM(•) and MAX(•) are symmetric. Finally, it is worth mentioning that the sensitivity matrix defined by (35)- (40) is an approximation for the MAX(•) operation, because a simple linear function is used in (31) to approximate the nonlinear operation z = MAX(x, y). It can further be shown that, when a multivariable MAX(•) is broken down into multiple two-variable operations, the approximation error depends on the ordering of these two-variable operations [25] . More details on this ordering issue are beyond the scope of this paper and will be considered in our future research.
B. Sensitivity Propagation
Once the atomic operations are available, they can be applied to propagate the sensitivity matrices throughout a timing graph. Next, we use the simple timing graph in Fig. 1 as an example to illustrate the key idea of sensitivity propagation. 
and
are two identity matrices due to the SUM(•) operation. (g,sink) ] to the node g through the arc g, sink . Determine Q D←AT(g) and Q D←D(g,sink) . 4) Propagate Q D←AT(f ) and Q D←AT(g) to the node e, yielding
3) Similarly, propagate Q D←[AT(g)+D
Note that the outdegree of the node e is equal to two. Therefore, the sensitivity matrices Q D←AT(f ) and Q D←AT(g) should be added together at the node e to compute Q D←AT(e) , based on the chain rule of derivatives. Its physical meaning is that a small perturbation on AT(e) can change the maximal circuit delay D through two different paths {e → f → sink} and {e → g → sink}. 5) Continue propagating the sensitivity matrices until the source node is reached. After the sensitivity propagation is complete, the sensitivity matrix Q D←D(i,j) (or Q D←AT(V i) ) between the maximal circuit delay D and any arc delay D(i, j) (or node arrival time AT(V i )) is determined. The statistical timing sensitivities can be easily computed by a quick postprocessing. For example, the arc sensitivity defined in (20) and the node sensitivity defined in (23) are the (1, 1)th element of Q D←D(i,j) and Q D←AT(V i) , respectively
Calculating the yield sensitivities in (26) and (27) is more comprehensive because the parametric timing yield is determined by not only the mean value of the maximal circuit delay D but also its variance.
After the statistical timing analysis is complete, D is approximated as the following linear model that is similar to (3) and (4) and (31):
Because D is the linear combination of multiple normal distributions, it is also normal and its mean and standard deviations are, respectively, determined by the following [30] :
Therefore, the CDF of D is equal to the following: where Φ(•) stands for the CDF of the standard normal distribution N (0, 1).
Assume that the timing constraint is specified by the following:
and therefore, the parametric timing yield is equal to the following:
We further assume that x denotes the arc delay or the arrival time of interest and that it is approximated as the linear model in (3). Hence, the yield sensitivity can be calculated as follows:
Based on the chain rule of derivatives, we have
where ϕ(•) represents the PDF of the standard normal distribution N (0, 1) and the derivatives {∂B Di /∂C x ; i = 1, 2, . . . , N} and ∂C D /∂C x are the elements of the sensitivity matrix Q D←x that is extracted from the sensitivity propagation.
C. Incremental Analysis
The complete statistical timing and sensitivity analysis consists of one forward arrival time propagation from the source node to the sink node and one backward sensitivity propagation from the sink node to the source node. It would be quite expensive, if not impossible, to run such a complete analysis for multiple times within an optimization loop. Therefore, an incremental analysis technique is required to quickly update the statistical timing and sensitivity information after local changes to a circuit are made.
Once a logic cell is modified for timing optimization, the arrival time and the timing sensitivity of a number of nodes are changed. Taking Fig. 6 as an example, if we size logic cell A, the input capacitance, delay, and output slew of cell A are all changed. Due to the input-capacitance change of cell A, the delay and output slew of its fan-in cell (i.e., cell B in Fig. 6 ) are also changed. Therefore, the arrival time of the fan-out cone of cell B (i.e., cone I in Fig. 6 ) must be updated, and the timing sensitivity of the fan-in cone of all affected nodes (i.e., cone II in Fig. 6 ) must also be updated.
VI. NUMERICAL EXAMPLES
We demonstrate the efficacy of the proposed statistical timing sensitivity analysis using several circuit examples. All circuits are implemented in either 0.13-µm or 90-nm commercial CMOS technologies. Both interdie and intradie variations on V TH , T OX , W , and L are considered. The probability distribution and the correlation information of these variations are specified in the process design kit from the foundry. All numerical simulations are run on a 2.6-GHz computer with 1-GB memory. Fig. 7 is a simple digital circuit that consists of nine gates and two D-flip-flops. Such a simple example allows us to intuitively illustrate several key concepts of the proposed sensitivity analysis. Table I shows the arc sensitivity values computed by the proposed sensitivity analysis and a Monte Carlo simulation with 10 4 samples. The Monte Carlo simulation repeatedly draws random samples and counts the probability that an arc sits on the critical path. Note that the largest arc sensitivity error in Table I is only 1.6%. Such a high accuracy demonstrates that the normal distribution assumption applied to our sensitivity analysis does not incur significant error in this example.
A. Simple Example
Shown in
As shown in Table I , I 3 , N 2 is the arc that has the largest sensitivity value. This is because I 3 , N 2 sits on the three longest paths: perturbation on the delay of I 3 , N 2 can significantly change the maximal circuit delay through these three paths. Note that, although such a multiple-path effect cannot be easily identified by a nominal timing analysis, it is successfully captured by the proposed statistical sensitivity analysis. In addition, it is worth mentioning that the arc I 2 , N 2 in Fig. 7 has zero sensitivity, because the NAND gate is asymmetric and the arc delay D(I 3 , N 2 ) is larger than D(I 2 , N 2 ). Even with process variations, D(I 3 , N 2 ) still dominates, because D(I 2 , N 2 ) and D(I 3 , N 2 ) are from the same gate and they are strongly correlated.
B. ISCAS'85 Benchmark Circuits 1) Accuracy and Speed:
We conducted statistical timing and sensitivity analysis for the ISCAS'85 benchmark circuits. Table II shows the minimal, average, and maximal sensitivity errors of all timing arcs. These errors are compared against a Monte Carlo simulation with 10 4 samples. Note that the maximal sensitivity error in Table II is less than 3.5% for all circuits. In addition, the proposed sensitivity analysis achieves about 4000 times speedup over the Monte Carlo simulation, as shown in Table III . To fully understand the computational complexity, Table III also lists the number of independent random variables (after PCA) to model both interdie and intradie variations. It is important to note that the proposed statistical sensitivity analysis is slightly cheaper than the statistical timing analysis in this example. The reason is that the proposed sensitivity analysis only involves simple matrix operations, whereas the statistical timing analysis spends substantial computational time on delay calculation (e.g., computing the effective capacitance C eff for interconnects via a number of numerical iterations [27] ).
2) Slack and Sensitivity Wall: One important problem in nominal timing optimization is the steep slack wall discussed in [23] . After the nominal timing optimization is complete, many paths have similar delays and are equally critical. We nominally optimized the circuit C7552 and plotted the optimized slacks in Fig. 8(a) . (Note that Fig. 8(a) is plotted for "-Slack".) The steep slack wall in Fig. 8(a) implies that a great number of nodes have close-to-zero slacks and, therefore, are equally important in nominal timing optimization.
Next, we ran a statistical sensitivity analysis for the same circuit and plotted the arc sensitivities in Fig. 8(b) . Note that the sensitivity wall in Fig. 8(b) is flat. In other words, after process variations are considered, only a small number of arcs dominate the overall timing performance. Although these arcs cannot be identified by nominal timing analysis, they are captured by the proposed statistical sensitivity analysis.
3) Statistical Timing Optimization: We further incorporated the proposed sensitivity analysis into an optimization engine for statistical gate sizing. Because timing optimization is not the major focus of this paper, we only implemented a simple selectand-conquer approach. Namely, we select a small number of the most and least critical cells based on yield sensitivities. The most critical cells are upsized to reduce delay, and the least critical cells are downsized to reduce area and/or power.
For testing and comparison, we applied both corner-based optimization and statistical optimization to all ISCAS'85 benchmark circuits. In both optimizations, the objective is to minimize the total gate area with a given timing constraint. Table IV shows the total gate area after the optimizations are complete. In this example, the statistical timing optimization achieves up to 27.8% area reduction compared with the cornerbased method.
C. Industrial Design Examples 1) Statistical Sensitivity Analysis:
As a final example, we tested the proposed sensitivity analysis on three large industrial examples. Table V shows the circuit size (i.e., the number of cells, the number of pins, and the number of independent random variables to model both interdie and intradie variations) and the computational cost for these examples. The Monte Carlo simulation is too expensive for these large-size examples and, therefore, is not computationally feasible. As shown in Table V , the computational cost of the proposed sensitivity analysis linearly scales as the circuit size increases (up to 1.3M pins).
2) Statistical Timing Optimization:
We further ran a statistical timing optimization for design A that contains 16K cells. The proposed yield sensitivity is utilized as a criterion to select the most critical cells for upsizing and the least critical cells for downsizing. The statistical timing optimization is formulated to minimize the total gate area with a given timing constraint. For the initial design, the normalized gate area is 803 758. The gate area is reduced to 712 221 (11.39% difference) by the statistical timing optimization, whereas the parametric timing yield is guaranteed to be 99%. Fig. 9 shows the histogram of the mean value of all node slacks before and after the statistical timing optimization is applied. It is apparent that our timing optimization pushes the slack values toward zero to reduce area. However, these slack changes all happen at noncritical nodes, and therefore, no parametric timing yield is surrendered. It is also interesting to note that a number of slack values in Fig. 9 (c) are increased after optimization. We believe that it is caused by the load dependence of the delay. Namely, when a cell is downsized to save area, its input capacitance is reduced, which can speed up the driving cell and reduce the total delay.
VII. CONCLUSION
In this paper, we define the statistical timing sensitivities for paths, arcs, and nodes. Our theoretical analysis proves a direct link between probability and sensitivity. An efficient algorithm is developed for fast sensitivity computation. The proposed sensitivity analysis has a linear complexity in circuit size and offers an incremental analysis capability. Our numerical examples demonstrate that the proposed sensitivity analysis yields accurate results and achieves 4000 times speedup over the Monte Carlo simulation with 10 4 samples. The proposed sensitivity framework is further incorporated into an optimization engine for statistical gate sizing. Our optimization examples demonstrate that the proposed timing sensitivity can be used to guide statistical gate sizing. Even if a simple sizing algorithm is utilized, the proposed sensitivitybased optimization yields promising results.
APPENDIX
Proof of Theorem 1
Given a small perturbation δ on the mean values of all paths, the mean value of the maximal circuit delay is equal to the following:
According to the path sensitivity definition in (16) , the mean value of the maximal circuit delay can also be represented as follows:
where O(δ 2 ) is a high-order (order ≥ 2) polynomial of δ. Comparing (51) and (52) yields
Equation (53) is valid for any sufficiently small δ. Therefore, the first-order coefficient of δ at the left-hand side must equal the first-order coefficient of δ at the right-hand side, yielding
Equation (54) 
Proof of Theorem 2
Let A P i = MAX(D P j ; j = i) and we have
The operation MAX(D P i , A P i ) can be rewritten as follows:
Substituting (57) into (55) yields
The second term in (58) is independent of E(D P i ), and therefore, its derivative to E(D P i ) equals zero
Substituting (59) into (58) yields
Given a small perturbation δ → 0 on the mean value of D P i , (60) yields (61), shown at the bottom of the page. Assume that pdf(D P i , A P i ) is the joint PDF for D P i and A P i , yielding
Therefore, given the assumption that the probability P (D P i = A P i ) is zero, the following integration is equal to zero:
Substituting (63) and (64) into (61) yields
In (65), P (D P i ≥ A P i ) = P (D P i > A P i ) because P (D P i = A P i ) = 0. Substituting (65) into (56) proves the result in (18) . 
Proof of Theorem 3
Based on probability theorem [30] , we have
Equation (66) proves Theorem 3.
Proof of Theorem 4
Assume that pdf(D P 1 , D P 2 , . . .) is the joint PDF of all path delays, yielding (67), shown at the top of the page. Theoretically, the MAX(•) function is not differentiable at the locations where D P i = MAX(D P j ; j = i). However, as shown in (64), the integration in (67) is equal to zero at these singular points, given the assumption that P [D P i = MAX(D P j ; j = i)] = 0. Therefore, these singular points have no impact on the final value of S Arc Ai and can be completely ignored
In (68), the derivative ∂D P k /∂E(D Ai ) is nonzero (equal to 1) if and only if the ith arc A i sits on the kth path P k . Therefore, we have (69), shown at the top of the page. Substituting (13) and (16) into (69) yields the result in (21).
Proof of Theorem 5
Theorems 4 and 5 are similar. Because we already gave the detailed proof for Theorem 4, we only show the major steps to prove Theorem 5 in this section.
Assume that pdf(D 
Substituting (13) and (16) into (70) yields the result in (24) .
