Starting from a model of the within-die systematic variations using principal components analysis, a model is proposed for estimation of the parametric yield, and is then applied to estimation of the timing yield. Key features of these models are that they are easy to compute, they include a powerful model of withindie correlation, and they are "full-chip" models in the sense that they can be applied with ease to circuits with millions of components. As such, these models provide a way to do statistical timing analysis without the need for detailed statistical analysis of every path in the design.
INTRODUCTION
The yield of an integrated circuit (IC) is a complex function of a number of factors related to both design and manufacturing. Beyond issues of design centering [1, 2] , which focuses mainly on tuning the manufacturing process, yield is also affected by circuit design. As part of circuit timing verification, one has to leave enough margin so that circuit delay variations do not affect yield too adversely. We will focus on this part of the overall yield problem, referred to as timing yield or circuit-limited yield [3, 4] . Traditionally, this has been taken care of by using the right worstcase file [5] as part of timing or performance verification (typically, during static timing analysis). The worst-case files specify the values of transistor parameters for various process corners, including the nominal and various extremes of device behavior. A circuit is deemed to have passed the timing test if it meets the performance constraints for all worst-case files belonging to that process. However, this approach is becoming less feasible today, especially for high-performance chips. For one thing, it can be too conservative and does not provide the user with any quantitative feedback on the robustness of the design [5] ; it is a pass/fail approach. In addition, this traditional approach cannot handle within-die statistical variations [4] (mismatch between devices on Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. the same chip) which have become important in deep sub-micron processes [3] .
Statistical techniques offer a better alternative approach; statistical transistor modeling techniques [6, 5] have been used for quite some time. Recently, due to the increased importance of within-die variations, there has been an increased interest in tackling the timing yield problem by employing statistical techniques as part of the circuit timing analysis step [4, 3, 7, 8, 9] . The aim is to extend traditional static timing analysis so that it takes into account statistical delay variations.
Within-die variations are of two types: systematic and statistical. The systematic type are due to spatial location of a certain feature on the die and due to the context of that feature in terms of the layout patterns around and above it. Techniques have been proposed [10] for taking into account this type of variations. However, statistical within-die variations have not been adequately addressed. A key contribution of this paper is to take within-die statistical correlations into account.
To be sure, some prior work has been done on this. In a number of cases [4, 9, 11, 12] , it has been assumed that within-die variations are totally uncorrelated (so that path yields are multiplied to give the chip yield), an assumption which is not true in practice. In order to avoid making this assumption, one needs to express the correlations between within-die parameter variations with a model that can be easily built from process data. This point is key, and is hard to do -there are no published models, for instance, for how exactly the variations are correlated across the die as a function, say, of the distance between components. A model of correlations in terms of distance is mentioned and used in [13] , but no details or data are given; it is not clear what shape the model should take nor how one would build it from process data. In [8] , even though statistical within-die variations are not taken into account, a suggestion is made at the end as to how one may include them and take care of correlation by enforcing correlation between features that are in the same region of the layout. This theme was further developed recently where use was made of principal components analysis [14] or a quad-tree partitioning [15] to express a region-wise spatial correlation among within-die variations. Here too, it is not clear how one would identify these regions and how the model would be built from process data. Finally, since these methods depend on placement information, these types of timing analysis become final sign-off tools and are unusable during circuit design.
We propose an approach to take within-die statistical correlations into account with a model that can be easily be built from process data, and which can be applied pre-placement.
PROPOSED APPROACH
In our approach, we capture the within-die correlations using principal components analysis (PCA). This is not, by itself, necessarily an improvement over prior art because, as with previous methods, the coefficients of the PCA would have to be evaluated somehow from process data, and would depend on position in the layout. However, with this model in hand, we then develop an approach to estimate a lower bound on the timing yield which which does not require knowledge of the individual PCA coefficients, but requires only the order of the PCA (the number of terms). While estimating detailed correlation functions is hard to do from process data, estimating only the order of the PCA would seem to be much easier, and we will suggest ways in which this may be done. Since layout information is not required, this ap-
460

28.4
proach becomes applicable to the pre-layout phase, during circuit design and optimization.
Previously proposed techniques for statistical static timing analysis change the static timing flow so that one is propagating distributions of delay, instead of simply delay. Even though there may be ways of doing this efficiently, this does represent a significant change of methodology! For one thing, statistical cell models [16] would need to be built if one is to use a cell-level or block-level flow. In contrast, our approach does not propagate distributions. Instead, the result of our approach is a selection of a "device file" setting with which to run traditional static timing analysis, which is somewhere within the extremes of device behavior. For example, while the "nominal" device file may call for a setting of ∆L = 0 (for channel length variations) and the "worst-case" file may call for a setting of ∆L = +3σ L , our approach aims to predict the value "k" such that if the setting of ∆L = kσ L was used for all devices, and if the circuit timing is verified using traditional static timing analysis, then the circuit would give the desired timing yield. As a result, our approach preserves existing static timing methodology and only assumes the existence of statistical transistor models, which have been standard for some time.
We will do this by working with a "generic critical path" concept, in the style of [13] , and examining the statistical properties of large ensembles of such paths. Due to the typically huge number of critical paths on chip, the law of large numbers [17] will come into play, and we will show that the actual number of paths "drops out" of the yield equation. We will be left with a yield lower bound expression that depends only on the various components of the variances (these will be explained below) and on the "device file setting". For a minimum desired yield, we will work backwards to find the required settings of device parameters.
PARAMETER MODEL
For a given circuit element or layout feature i, let its coordinates on the die be (x i , y i ) and let X(i), be a zero-mean Gaussian random variable (RV) that denotes the variation of a certain parameter of this element from its nominal (mean) value. Thus, for example, X(i) may represent channel length variations of transistor i. Correlation between values of X(i) at different locations on the die may be expressed by means of an autocorrelation function, but this is not a practical approach. Instead, it is standard practice [18] to express the correlation by first breaking up the variations into die-to-die and within-die components, as follows:
The die-to-die component X dd is an independent zero-mean Gaussian RV that takes the same value for all instances of this element on a given die, irrespective of location. The within-die component X wd (i) is a zero-mean Gaussian which can take different values for different instances of that element on the same die. This leads to the following relationship between the variances:
Then, the within-die component is further broken down into two components, a systematic component and a "random" component:
where, for each i, the random component X wdr (i) is an independent zero-mean Gaussian. Notice that the use of the term systematic here is somewhat different than the mention that was made of it in the introduction. In the introduction, we distinguished between systematic and statistical variations. In this case, the systematic component of the variations is statistical. Unfortunately, this same term is used in the literature to denote two different things. Throughout the rest of the paper, the term "systematic" will denote statistical variations such as X wds (x i , y i ). The systematic component X wds (x i , y i ) contains an explicit dependence on location because it is usually taken to represent the extent of correlation across the die, and correlation is usually dependent on relative location. A similar relationship follows for the variances:
One way to express the systematic component of the within-die variations is to use a principal components analysis (PCA) [19] and write:
where Z j are independent standard normal RVs (Gaussians with zero mean and unity variance) and where the coefficients a ij are such that:
The RVs Z j correspond to underlying independent unobservable factors. The value of p and the coefficients a ij represent the extent of correlation across the die. For example, if p = 1, then the within-die spatial correlation coefficient is 1, there is perfect correlation; a single underlying RV Z 1 determines the value of the systematic component of X wd all over the die. A p > 1 allows for less than perfect correlation. We will adopt the PCA expansion (5) as our "correlation model" for the within-die component. At first glance, this model appears hard to use because it seems to depend on knowledge of the values of all the a ij parameters. These coefficients depend on layout and, in any case, it is not clear how one would compute them from process data. A brute-force PCA expansion of the millions of instances of a parameter on a chip would be impractical. However, it will be shown in the following sections that one can estimate a lower bound on the yield without having to know the values of the a ij parameters. Instead, it will be sufficient to know: 1) the "order" (p) of the PCA expansion, and 2) the ranges (max and min) of the variance terms given above. In the paper, the crucial step in the analysis is in the transition from (27) to (28), where an expression for yield that depends on all the a ij terms is transformed (using Cauchy's inequality) to one that depends only on the sum of all the a 2 ij terms, which is easily available as the variance (6) .
An important question, however, is how p is to be estimated. We can name three ways in which this may be done. First, based on knowledge of the process, it may be possible to simply identify a number of underlying independent factors that are responsible for the systematic variations, such as specific equipment or process steps. Second, one may associate each Z j with a certain spatial location on the die, such as was done recently in [14] . Thus, if the chip area is partitioned into, say, four quadrants, and if one has some sense about distances over which the autocorrelation functions die down, one may be able to make an estimation of p. Third, and this may be the most practical approach, we can measure yield for a certain parameter, from process data, and then use the formulas to be derived below for parametric yield to work backwards to compute a value of p.
PARAMETRIC YIELD MODEL
With the random parameter model given above, we now define the parametric yield for parameter X as:
where n is the number of instances of this parameter on chip. Here, X(i) is a generic parameter that may represent transistor channel length variations, threshold voltage variations, etc. In fact, X(i) is any statistical quantity on chip that may be characterized by the parameter model introduced in section 3. When X(i) is a simple parameter, such as channel length, then parametric yield is the probability that all device lengths on the die are less than some threshold x. We will later on below show how path delay can itself be viewed as a parameter with its own triplet of variances (σ 2 dd , σ 2 wds , σ 2 wdr ) that we will relate to the underlying transistor parameter variances. This will allows us to express timing yield based on a parametric yield model. Thus, the material in this section, although focused on parametric yield, will actually be directly useful for computing timing yield. Note also that, although we are expressing yield as a function of an upperbound constraint on the value of the parameter, our work can be extended to cover lower bound and/or interval constraints.
Since X dd is an independent zero-mean Gaussian with variance σ 2 dd , then Z 0 = X dd /σ dd is an independent standard normal RV (mean 0, variance 1), and the expression for Y (x) can be expanded as:
We now recall a result from basic probability theory that will be used repeatedly in the paper. Let A be an arbitrary event, and X be an RV with a probability density function (pdf) f (x). Then (see [17] , pg. 85) we have:
This result is simply an extension to the continuous case of the simple fact that P{A} = P{A | B} · P{B} + P{A | B} · P{B}, where B is another event. Applying (9) to (8) , and denoting by φ(·) the pdf of the standard normal distribution, gives:
) and denote its cumulative distribution function (cdf) by Y wd (a) = P {X wd ≤ a}, then:
which means that:
where E[·] is the mean or expected value operator. The bulk of the effort will now be directed at computing the cdf Y wd (a). We first consider the special case p = 1 separately, before covering the general case.
Special Case Ô = 1
In this case, X wds (x i , y i ) = σ wds (x i , y i )Z 1 , where Z 1 is an independent standard normal. Therefore:
where we have again made use of (9) and of the fact that X wdr (i) are independent. Since X wdr (i) is a zero-mean Gaussian with variance σ 2 wdr (i), then:
where Φ(·) is the cdf of the standard normal. Let σ wds0 = min ∀i (σ wds (x i , y i )) and σ wds1 = max
is monotonically increasing, then (14) leads to the lower bound:
Let σ wdr0 = min (17) is broken into two integrals, one of which is minimized at σ wdr1 and the other at σ wdr0 , so that:
and if a ≤ 0 then a similar analysis leads to:
If these lower bounds on Y wd (a) are used in (12) then the result would be a lower bound Y 0 (x) on the yield: Y (x) ≥ Y 0 (x). These bounds are expected to be tight if the differences (σ wds1 − σ wds0 ) and (σ wdr1 −σ wdr0 ) are small. With a simple change of variables, as will be illustrated below, Y 0 (x) can be computed by numerical integration. If a yield of, say, better than 90% is desired, then one can set Y 0 (x) = 0.9 and work backwards to get the value of the threshold x.
Illustration
For illustration purposes, it is instructive to consider the simplified special case when σ dd = σ wds (x i , y i ) = σ wdr (i) = σ, ∀i. In this case, it becomes possible to get exact solutions to (12) , instead of lower bounds, as follows. Starting from (15), we get:
and, combining this with (12) (to see why it is valid to combine the two in this way, see pp. 164-165 of [17] ) leads to:
In order to compute this, we use the definition of the expected value operator (as an integral) and with a change of variables of u = Φ(z 0 ) and v = Φ(z 1 ), we arrive at:
Plots of this parametric yield for different values of n are shown in Fig. 1 . Notice that yield decreases for larger n, as expected.
The General Case
so that:
where we have made use of (9) a number of times (p) and of the fact that X wdr (i) are independent, and where:
We know from basic probability theory that, if X is an RV and a 1 ≤ a 2 are two real numbers, then P{X ≥ a 1 } ≥ P{X ≥ a 2 }.
In the problem at hand, we have:
where the 2nd inequality follows from Cauchy's inequality [17] . Therefore:
which, using
wds (x i , y i ), leads to: The transition from (27) to (28) is a key step and is fundamental to our contribution. A yield expression based on (27) would be very hard to use in practice because it requires knowledge of the individual a ij coefficients. As previously mentioned, it is not clear how one would obtain these coefficients, which anyway are functions of layout, from the process data. However, estimation of the lower bound based on (28) would be quite feasible because it depends only on knowledge of the variance. It is precisely this step in the analysis that allows us to make yield estimation based on a generic critical path and without requiring layout information.
Plugging (29) into (24) gives:
where the 2nd inequality is true because Õ È Z 2 j ≥ 0, and where σ wds1 , as before, is the largest σ wds (x i , y i ). Let Qp ≥ 0 be an independent positive RV such that Q 2 p = È p j=1 Z 2 j , then Q 2 p has the chi-square (χ 2 ) distribution with p degrees of freedom [17] . Therefore, we can replace the above right-hand-side by:
where f χ 2 p (·) is the pdf of the χ 2 distribution with p degrees of freedom. As was done in the p = 1 case, a case-analysis based on the sign of a gives our final result. If a ≥ 0 then:
where, as before, σ wdr1 and σ wdr0 are the largest and smallest σ wdr (i), respectively. If a ≤ 0, then similarly:
Illustration
Consider again the special case where σ dd = σ wds (x i , y i ) = σ wdr (i) = σ, ∀i. Then (31) leads to:
and, combining this with (12), leads to:
Plots of this parametric yield are given in Fig. 1 , for various values of p and n. Notice that a larger p has the same effect as a larger n; more things can go wrong and the yield is lower.
Bounded Variations
Notice that, in the above expressions for yield, the yield decreases for larger n. One would somewhat expect this, but it is surprising to note that the yield approaches zero as n goes to infinity, for any combination of values of the three variances. This may also be seen in the above plots in Fig. 1 . Even if the threshold is set at 10σ, there is still some parametric yield loss. This is somewhat non-physical, and arises due to the fact that we have assumed that the distribution of X wdr (i) is normal; recall that the normal distribution extends to ±∞ in both directions. In reality, one would expect process variations to be bounded by some upper and lower bounds. If a device somewhere deviates by large amounts, like 9σ or 10σ, then chances are there is a serious problem with that die, and that it would be lost due to other reasons, other than timing yield that is. Therefore, it is a good idea to limit the spread the cdf of X wdr (i) to some multiple of σ in order to avoid these non-physical effects at large n. In this section, therefore, we will use a truncated normal distribution for X wdr (i). For clarity of presentation, we will restrict the analysis to the illustrative special case introduced above wherein σ dd = σ wds (x i , y i ) = σ wdr (i), ∀i. The analysis can be extended to the general case. Suppose, therefore, that X wdr (i) is bounded by ±kσ, and let Φt(x) represent the cdf of the truncated standard normal, which is 0 for x ≤ −k and 1 for x > k.
We can plug Φt(·) instead of Φ(·) (for X wdr (i)) into the above equations and plot the resulting yield integrals, as shown in Fig. 2 . In this case the yield loss at higher n values is limited so that the 1e6 and 1e8 plots in each group are indistinguishable. This is to be expected, because the "tail" of the distribution has been cut off, and it is primarily the tail that causes the yield loss at very large n.
When working with a truncated normal, it is noteworthy that we can derive lower bounds on the yield that are independent of n. The derivations are not shown, for brevity, but lead to the following results. For the special case when p = 1, we have:
A plot of this yield lower bound is shown in Fig. 2 , in the p = 1 group. The lower bound is very tight, and is indistinguishable on the plot from the 1e6 and 1e8 curves in that group. In the general case of p ≥ 1, we can show:
where U (x) is 1 for x ≥ 0 and 0 for x < 0. Some plots of this yield lower bound are shown in Fig. 2 . As before, the lower bound is very good, and is indistinguishable on the plot from the 1e6 and 1e8 curves in each group.
TIMING YIELD MODEL
In deep sub-micron CMOS, process-induced delay variations are due mainly to variations in the MOSFET threshold voltage (Vt) and effective channel length (Le). Due to short-channel effects, Vt may depend on Le (the so-called Vt roll-off effect), so that Vt and Le are not independent variables. We capture this by assuming that Vt can be expressed as the sum of a term that depends on Le and another independent term, so that, as RVs, we can express Vt and Le of transistor i as follows:
is the mean (expected value) operator. We are interested in the RVs L(i) and V (i), which are assumed to be independent of each other. We assume these RVs to be zero-mean Gaussians and break them up in the usual manner as: 
Gate Delay
We assume that, for a given chip design in a given technology, one can define a "nominal" representative logic gate, with appropriate output loading and input slope. For reasons that will become clear below, this gate should be typical of gates on critical paths in this technology. Due to the nonlinearity of the relationship between gate delay and transistor parameters, the mean value of gate delay does not necessarily coincide with its nominal value (the value corresponding to the case when L(i) = 0 and V (i) = 0). Furthermore, the distribution of gate delay would not necessarily be Gaussian. Simple experiments with HSPICE, however, reveal that this nonlinearity is not strong, at least not in 0.13µm CMOS. Therefore, we will ignore these complications for now, and simply assume that gate delay is a Gaussian with mean equal to its nominal value.
For all the transistors within a logic gate, we assume that their channel length variations are captured with a single RV L(i) and their threshold voltage variations are captured with a single RV V (i). Having ignored the nonlinearity between gate delay variations and the Vt and Le variations, then if D(i) is the deviation of the delay of logic gate i from it's mean (nominal) delay, we have:
where α and β are sensitivity parameters, with suitable units, that one can easily obtain from circuit simulation of a representative logic gate. Notice that, in general, α > 0 and β > 0. (For a specific industrial 0.13µm process, we have found that for a minimum-sized inverter, α ≈ 0.857ps/nm and β ≈ 17.3ps/V.) As a result, we can express the statistical variations in delay of gate i as:
wdr,V (i). These equations provide a way in which the statistical model of gate delay (i.e., its three variances) can be computed from the underlying statistical model of transistor parameters.
Path Delay
Consider a path of N logic stages (gates). Variations in path delay are due to variations in the delays of both the gates and the interconnect. For simplicity of presentation, we will focus on the gate delay variations only. Interconnect delays can be handled in a similar way. Having assumed that nominal gate delay coincides with mean gate delay, the same follows for paths.
Let D N (j) denote the deviation of the delay of path j from its mean (nominal) value. Since
The gates on a path exist at various different locations. We will make the simplifying assumption that as far as physical location on the die, for purposes of computing the within-die-systematic component, all gates on path j share the same "nominal" coordinates (x j , y j ), so that:
This is motivated by the expectation that gates on a critical path should be nearby on the die, and differences between their position-dependent within-die-systematic variations should be minor. Based on the independence relations between the terms in the above, we have σ 2 
wdr,D (j). As forσ 2 wdr,D (j), we may approximate it using the average value of σ 2 wdr,D (i) over the whole die, which we denote byσ 2 wdr,D , so that:
(46) With this, we have a full statistical model of path delay, so that we can treat it as a "parameter" and we can talk about its yield, as was done for the generic parameter X(i) in sections 4 and 4.3.
Timing Yield
The timing success of an integrated circuit depends on a number of factors, including max delay violations, min delay violations, clock skew violations, etc. In this work, we focus on max delay constraints and consider a circuit to "pass" the timing test if its longest (critical) path delays are below some threshold (our work can be extended to cover other timing concerns). We let N be the number of stages (gates) on a path that would be representative of these critical paths, and we consider that the chip contains a (typically large) number of disjoint (non-intersecting) critical paths of N stages (gates) each, so that our expression for the chip timing yield becomes: 0 (·) depends on the variances of D N , which can be computed using the expressions for path and gate variances given above, from underlying transistor level variances. This τ is the timing margin of an N -gate path, for the desired specified yield Y. Therefore, in order to get the desired yield, the circuit should be designed to "pass" the timing constraints when D N (j) = τ , for all j. Therefore, we set D(i) = τ /N, and based on (42) we require:
This gives the range of possible settings of L(i) and V (i) required to achieve a timing deviation of τ on N -gate paths. If the circuit "passes" under these conditions, then the desired yield would be achieved. To simplify matters, suppose we want the L(i) and V (i) settings to be the same multiple of their individual σ's, i.e., let:
Notice that this is feasible (and δ > 0) because both α and β have the same sign (positive). This leads to:
This δ effectively defines the "worst-case file" for which the circuit should be tested (simulated, or checked) for timing constraint violations, so as to guarantee that the timing yield is at least Y. 
APPLICATION TO STATIC TIMING
We will now illustrate how the above timing yield model allows us to choose a setting for the transistor parameters so that a desired yield is achieved if the circuit passes traditional static timing analysis. 
and in the general case we have:
Plots of Y 0 (τ ) for a few values of p are shown in Fig. 3 , for k = 3 and N = 9. One can use this type of figure as follows. If p = 6 and we want 90% yield (i.e., Y = 0.9), then it is clear from the figure that we need τ ≈ 5σ, which leads to:
Let r = (ασ L )/(βσ V ) then:
If, for example, r = 1, then:
so that the circuit would need to be simulated (and its timing checked) with all its transistors' Le and Vt set at their +2σ points. Notice that, since α and β depend on transistor sizing, then (55) provides a way in which δ can be controlled by circuit optimization and/or process tuning.
CONCLUSION
A method for statistical timing analysis has been developed, based on a timing yield model. The model is a "full-chip" model in that it can be applied with ease to large chips, before layout. This is achieved by using a measure of yield based on use of a generic critical path concept and capturing the statistics of a large collection of such paths using a model of within-die correlations that uses principal components analysis. This results in a methodology whereby one can select the right setting of the transistor parameters to be used in simulation or in traditional timing analysis in order to verify performance while guaranteeing a certain desired yield.
