Abstract: Manufacturing process variations lead to variability in circuit delay and, if not accounted for, can cause excessive timing yield loss. The familiar traditional approaches to timing verification, such as the use of process corners and predefined timing margins, cannot readily handle within-die variations. Recently, statistical static timing analysis (SSTA) has been proposed as a way to deal with variability. Although many powerful techniques have been proposed, the fact that SSTA requires a significant change of methodology has delayed its wide adoption. In this paper, we propose a framework whereby the familiar concepts of corners and margins, which are generally meaningful at the transistor or cell level, are elevated to the chip level in order to handle within-die variations. This is achieved by using high-level models, such as the generic path model or the generic circuit model with different classes of paths, to represent the behavior of typical designs. These models allow us to determine "yield-specific" margins (setup and hold margins) and virtual corners, which, if applied during standard (deterministic) timing analysis, would guarantee the desired yield. Our framework can be used at an early stage of circuit design and is consistent with traditional timing verification methodology.
I. INTRODUCTION

Process variations have an impact on circuit delay and can cause timing yield loss if circuit delay fails to meet the performance constraints. Traditionally, process variations have been taken care of in various ways. In microprocessors, it is typical to check circuit timing with nominal transistor files, and to specify some predefined timing margin which should be left as slack between the nominal delays and the timing thresholds, in order to account for process variations. In ASICs, the practice is to typically design circuits by making sure that the chip passes the timing requirements at all process corners, including nominal, worst, and best cases of device behavior. If these settings are too pessimistic, then designers are forced to waste time and effort optimizing a circuit using design conditions that are too stringent. There has been considerable discussion in the literature that the traditional methods of using process corners or using a timing margin are breaking down. For one, in microprocessors, where nominal process files are used and a timing margin needs to be left as slack, there are no easy ways to decide what margin should be used, particularly when one wants to account for within-die variations, which have become important in modern technology. In addition, if a given margin is used, there is no indication as to what the resulting timing yield would be. On the other hand, for ASICs, the number of corners is increasing, making it very expensive to explore all corners, and anyway, this traditional corner analysis approach cannot handle within-die statistical variations [1].
Statistical techniques offer one alternative approach to dealing with variability. Due to the increased importance of within-die variations, there has been an increased interest in tackling the timing yield problem by employing statistical techniques as part of the circuit timing analysis step [2]-[7]. The aim is to include statistical delay variations as an extension to traditional STA leading to statistical static timing analysis (SSTA). In early works [1], [3], [8], [9], within-die variations were assumed to be totally uncorrelated. This assumption is not true in practice; however, it is usually hard to express the correlations between within-die parameter variations with a model built from process data. Different attempts to model correlations have been proposed. In [4], principal component analysis (PCA) was used to decorrelate variations on a set of independent random variables (RVs). In [10], a quad-tree partitioning is used to express a regionwise spatial correlation among within-die variations. In [6], correlation is taken care of using a canonical model, where each variation is expressed in terms of global sources of variations. In most of the published works, variations are assumed to have a Gaussian (normal) distribution, and linear (first-order) delay models are used. Recently, however, several authors have investigated the case of non-Gaussian distributions and/or nonlinear delay models [11], [12]. Overall, SSTA has significant computational cost, and some have suggested [7] that some SSTA techniques unnecessarily complicate the design flow, and the point was made that it may be useful to explore early and simple SSTA techniques that would maintain the current timing verification methodology.
With regard to the design flow, most proposed SSTA techniques are applicable to fully specified and placed designs and require advanced correlation models built from extensive process data. Thus, these methods are "final sign-off" tools and are not usable during the circuit design phase. Also, requiring detailed process data can be problematic. For one, it may not be available during the early design phase of a new chip, which is often being manufactured in a new process. In addition, there is no agreement yet on a simple and universal model of within-die correlation. One possibility is to assume that the correlation dies down with on-die distance according to some polynomial function. However, it is shown in [13] that distance is not the key determinant of correlation but that correlation depends on whether the distance in question is along a vertical or horizontal direction. There is also evidence that correlation depends on location within the reticle field, and not purely on distance. Finally, SSTA techniques require a change in the traditional timing verification methodology, which explains why these techniques are not yet widely adopted. Given all these reasons, it seems worthwhile to explore other options.
0278-0070/$25.00 © 2008 IEEE
In this paper, we address the aforementioned challenges by concentrating on the following aspects: 1) focusing on early analysis, possibly in the absence of a complete circuit instance, which allows one to take corrective actions early in the design phase; 2) operating in the absence of within-die correlation information, for the reasons stated earlier, namely that correlation information can either be unknown for a new process or inaccurate; 3) keeping the current timing verification methodology which uses corners and timing margins in place, and extending it to handle within-die variations. This requires elevating the concepts of corners and margins to the circuit (i.e., chip) level whereby virtual corners and timing margins are deduced from circuit performance.
In order to enable early analysis, and given the absence of complete details of the circuit implementation, it is necessary to construct high-level models that capture circuit characteristics. A model was used in [14] in which a generic critical path model was employed as a way to capture the dependence of circuit worst-case delays in a given technology, in the absence of a specific circuit instance. In this model, a circuit is assumed to consist of a large number of identical paths, whose delay is representative of critical path delays in the target design. Each of these generic paths consists of a specific number of identical stages (a stage is a logic gate and the interconnect at its output). This model has been successfully employed in the industry, and was recently extended in [15] , where it is shown that the statistics of a large number of such generic paths match very well the statistics of critical path delays in the actual circuit. In this paper, we make use of the generic path model, and extend it by proposing a flexible generic circuit model based on multiple classes of generic paths, with different depth. Such a representation can model virtually any circuit by simple discretization of its path delays into specified classes. We use these models to represent and analyze typical designs in a target technology.
As stated earlier, we will assume that within-die variations have some unknown correlation on the die. This does not mean that we ignore correlation. On the contrary, we prove that, by taking correlation to its extremes, we get useful (conservative) bounds on the timing yield. Assuming that variations are positively correlated (this is further discussed hereafter), we show that the conservative case is achieved when the systematic¹ within-die variations of gate delay are assumed to be totally correlated (ρ = 1) along a path, and the systematic within-die variations of path delay are assumed to be uncorrelated (ρ = 0) across a block (i.e., a collection of paths).
The analysis hereafter will lead to closed-form expressions for the timing yield. Furthermore, the results will be two sided, i.e., will take into account yield loss due to both setup and hold time violations. The yield expressions allow one to extract "yield-specific" timing margins that should be used on top of the nominal maximum and minimum circuit delay, during deterministic timing analysis, in order to guarantee the desired yield. We refer to these margins as setup and hold timing margins. We also show how to extract virtual corners at the chip level, as opposed to traditional process corners which are extracted at the transistor or gate level. Since these virtual corners are based on circuit performance, they do take within-die statistical variations into account. Finally, we propose a method that allows "controlled" budgeting of yield loss between setup and hold violations. Our framework is therefore consistent with traditional timing verification methodology. It can be applied at an early stage of the design flow to predict which corners and margins to use in order to achieve a target yield. Early versions of this work have appeared in [16] .
The rest of this paper is organized as follows. Sections II and III describe the generic path model and the mathematical framework pertaining to the formulation of the timing yield expressions. In Section IV, we show how we determine the yield-specific margins and virtual corners that should be used to guarantee a desired target yield. Sections V and VI describe the generic circuit model and the mathematical framework leading to the selection of setup and hold margins. Margin budgeting, as well as the control over yield loss, is presented in Section VII. We conclude in Section VIII.
II. BACKGROUND: THE GENERIC PATH MODEL
In this section, we first present the parameter model which captures process variations, and then show how we construct the generic path model. The purpose is to provide some background based on the material from [15] and [16] .
A. Parameter Model
For a given circuit element or layout feature i, let X(i) be a zero-mean Gaussian RV that denotes the variation of a certain parameter of this element from its nominal (mean) value. Thus, for example, X(i) may represent channel length variations of transistor i. It is standard practice [17] to break up the variations into die-to-die and within-die components as follows:
$$X(i) = X_{dd} + X_{wd}(i)$$
The die-to-die component X dd is an independent² zero-mean Gaussian RV that takes the same value for all instances of this element on a given die, irrespective of location. The within-die component X wd (i) is a zero-mean Gaussian which can take different values for different instances of that element on the same die. This leads to the following relationship between the variances:
$$\sigma^2(i) = \sigma_{dd}^2 + \sigma_{wd}^2(i)$$
Then, the within-die component is further broken down into two components, namely, a systematic or somewhat correlated component and a "random" component
$$X_{wd}(i) = X_{wds}(i) + X_{wdr}(i)$$
where, for each i, the random component X wdr (i) is an independent zero-mean Gaussian. A similar relationship follows for the variances
$$\sigma_{wd}^2(i) = \sigma_{wds}^2(i) + \sigma_{wdr}^2(i)$$
The within-die systematic component is expressed in the following way:
$$X_{wds}(i) = \sigma_{wds}(i)\, Z_{wds}(i)$$
where Z wds (i)'s are the correlated standard normal RVs (mean zero and variance one). Hence, our model for parameter variation X(i) consists of an independent zero-mean die-to-die component X dd with variance σ 2 dd , a correlated systematic within-die component X wds (i) with variance σ 2 wds (i), and an independent random within-die component X wdr (i) with variance σ 2 wdr (i). In previous work, PCA is used to model within-die correlation by expressing variations in terms of global variables. In this paper, however, we do not require the use of any specific correlation model. Instead, we express the within-die systematic components as positively correlated RVs and derive bounds on the timing yield which hold irrespective of the correlation model used. We show next how gate delay variations and path delay variations can be modeled using the same parameter model defined previously.
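As a hedged illustration of the parameter model described in this subsection, the following Python sketch draws X(i) for the n instances on one die. The function name `sample_die` and the single-parameter equicorrelation model for Z wds (i) (pairwise correlation `rho`) are assumptions made only for this example; the analysis in this paper does not rely on any particular correlation model.

```python
import math
import random


def sample_die(n, sigma_dd, sigma_wds, sigma_wdr, rho, rng):
    """Draw X(i), i = 0..n-1, for one die.

    X(i) = X_dd + X_wds(i) + X_wdr(i), where X_wds(i) = sigma_wds * Z_wds(i).
    ASSUMPTION (illustration only): the Z_wds(i) are equicorrelated standard
    normals with pairwise correlation rho in [0, 1].
    """
    x_dd = rng.gauss(0.0, sigma_dd)      # same value for every instance on the die
    shared = rng.gauss(0.0, 1.0)         # common part of the systematic component
    values = []
    for _ in range(n):
        own = rng.gauss(0.0, 1.0)        # instance-specific part
        # Unit variance, correlation rho between any two instances.
        z_wds = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
        x_wdr = rng.gauss(0.0, sigma_wdr)  # independent random component
        values.append(x_dd + sigma_wds * z_wds + x_wdr)
    return values


if __name__ == "__main__":
    print(sample_die(4, 1.0, 0.5, 0.2, 0.5, random.Random(0)))
```

Setting `rho` to 0 or 1 reproduces the two extreme cases of systematic correlation (independence and total correlation) that are used later to bound the yield.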
B. Gate Delay
In general, there is a nonlinear relationship between gate delay and transistor parameters. HSPICE simulations in an industrial 90-nm CMOS library reveal, however, that this nonlinearity is not strong. Therefore, we will simply assume that gate delay is linearly dependent on the process and, hence, is Gaussian with mean that is equal to its nominal value.
For all the transistors within one logic gate, we assume that their channel length variations are captured with a single RV L(i), and their threshold voltage variations are captured with a single RV V (i). We also assume that L(i) and V (i) have been "decorrelated" using PCA [18] , or any other transform, so that they are independent. For each gate, we can extract sensitivities to the different varying process parameters using circuit simulation. Hence, if D(i) is the deviation of the delay of logic gate i from its mean (nominal) delay, we have
$$D(i) = \alpha L(i) + \beta V(i)$$
where α and β are sensitivity parameters, with suitable units, that one can easily obtain from circuit simulation of a representative logic gate. As a result, we can express the statistical variations in delay of gate i as
$$D(i) = D_{dd} + D_{wds}(i) + D_{wdr}(i)$$
so that
These equations provide a way in which the statistical model of gate delay (i.e., its three variances) can be computed from the underlying statistical model of transistor parameters.
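The mapping from transistor-level variances to gate-delay variances can be sketched as follows. This is a minimal example, assuming that the linear model D(i) = αL(i) + βV(i) applies to each of the three components separately and that the L and V components are independent, so that each gate-delay variance is α² times the L variance plus β² times the V variance. The function name and the dictionary keys are hypothetical.

```python
def gate_delay_variances(alpha, beta, var_l, var_v):
    """Return the die-to-die, within-die systematic, and within-die random
    variances of gate delay.

    var_l and var_v map the component names 'dd', 'wds', 'wdr' to the
    corresponding variances of channel length L and threshold voltage V.
    ASSUMPTION: L and V are independent (e.g., after decorrelation), so
    variances add with squared sensitivities.
    """
    components = ("dd", "wds", "wdr")
    return {c: alpha ** 2 * var_l[c] + beta ** 2 * var_v[c] for c in components}


if __name__ == "__main__":
    var_l = {"dd": 1.0, "wds": 0.5, "wdr": 0.25}
    var_v = {"dd": 1.0, "wds": 2.0, "wdr": 0.0}
    print(gate_delay_variances(2.0, 3.0, var_l, var_v))
```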
C. Generic Path Delay
Consider a generic path of N logic stages (a stage is a logic gate and the interconnect at its output). We will focus only on gate delays and on transistor L and V t variations. The methodology can be easily applied when more device parameters are of interest or when interconnect parameter variations are to be included as well. Let D N (j) denote the deviation of the delay of path j from its mean (nominal) value
$$D_N(j) = \sum_{i=1}^{N} D(i) = N D_{dd} + \sum_{i=1}^{N} D_{wds}(i) + \sum_{i=1}^{N} D_{wdr}(i)$$
Recall that we are working under the assumption that within-die systematic correlation is unknown, and thus, the systematic path delay component $\sum_{i=1}^{N} D_{wds}(i)$ cannot be explicitly resolved as is. However, because we will be using upper and lower bounds in the yield analysis section where we study the statistics of a large collection of generic paths, we also use bounds here when deriving the delay model for these paths. In Appendix A, we prove that a lower (upper) bound on the distribution of a sum of Gaussian RVs with unknown correlation is achieved when these RVs are assumed to be totally correlated (independent). Applying this result to the systematic within-die component of path delay leads to the following equation for the variances:
where 
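The effect of the two correlation extremes on the path delay variances can be sketched as follows. This is an illustrative example, assuming N identical stages with per-gate variances `var_dd`, `var_wds`, and `var_wdr`; the function name and the `systematic` switch are hypothetical. The die-to-die term is common to all gates and scales as N², the random term is independent across gates and scales as N, and the systematic term scales as N² when totally correlated (the lower-bound case) or as N when independent (the upper-bound case).

```python
def path_variances(n_stages, var_dd, var_wds, var_wdr, systematic="correlated"):
    """Variances of the three path-delay components for a generic path.

    systematic = "correlated":  gate-level systematic components totally
                                correlated along the path (rho = 1).
    systematic = "independent": gate-level systematic components independent
                                along the path (rho = 0).
    """
    if systematic == "correlated":
        path_wds = n_stages ** 2 * var_wds   # sum of identical RVs
    elif systematic == "independent":
        path_wds = n_stages * var_wds        # variances add
    else:
        raise ValueError("systematic must be 'correlated' or 'independent'")
    return {
        "dd": n_stages ** 2 * var_dd,        # same X_dd for every gate on the die
        "wds": path_wds,
        "wdr": n_stages * var_wdr,           # independent random components
    }


if __name__ == "__main__":
    print(path_variances(9, 1.0, 1.0, 1.0, "correlated"))
    print(path_variances(9, 1.0, 1.0, 1.0, "independent"))
```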
III. YIELD ANALYSIS
With the random parameter model given previously, we now define the parametric Max yield and the parametric Min yield for parameter X as
$$Y_{\max}(x) = P\{X(1) \le x, \ldots, X(n) \le x\}, \qquad Y_{\min}(x) = P\{X(1) > -x, \ldots, X(n) > -x\}$$
where n is the number of instances of this parameter on chip.
Here, X(i) is a generic parameter that may represent process variations (channel length or threshold voltage), or path delay variations as we have shown in Section II-C, in which case the parametric yield becomes a timing yield. In other words, when X(i) represents path delay variation, the parametric Max yield is the probability that all path delay variations are less than x, and the parametric Min yield is the probability that all path delay variations are greater than −x. Note that x is generally positive since, in the Max yield case, we are interested in x values that are greater than the mean of X (i.e., zero), while in the Min yield case, we are interested in values that are smaller. Thus, the material in this section, although focused on parametric yield, will actually be directly useful for computing timing yield. It is noteworthy that both Max and Min yields are of equal importance. Chips do fail because the maximum (minimum) circuit delay is greater (smaller) than a performance constraint determined by the setup (hold) constraints.
A. Slepian's Inequality
In this section, we will find bounds on the two parametric yields that were defined earlier. Using the following theorem from multivariate normal probability [19] , we will prove that these bounds are a direct consequence of extreme cases of correlation between the within-die systematic components across the die.
1) Slepian's Inequalities: Let V = ({V (i)}, i = 1, . . . , n) and W = ({W (i)}, i = 1, . . . , n) be two random vectors of size n, with both of them being multinormally distributed with zero mean and covariance matrices Σ = {σ ij } and Γ = {γ ij },
respectively. If σ ii = γ ii for all i and σ ij ≥ γ ij for all i ≠ j, then, for any real numbers a 1 , . . . , a n ,
$$P\{V(1) \le a_1, \ldots, V(n) \le a_n\} \ge P\{W(1) \le a_1, \ldots, W(n) \le a_n\} \tag{14}$$
$$P\{V(1) > a_1, \ldots, V(n) > a_n\} \ge P\{W(1) > a_1, \ldots, W(n) > a_n\} \tag{15}$$
Note that σ ii = σ 2 i is the variance of V (i) and σ ij is the covariance of V (i) and V (j).
The aforesaid result is quite interesting. It states that, given two random vectors V and W having the same variances (i.e., σ ii = γ ii ), if one of them is more correlated than the other (i.e., σ ij ≥ γ ij for all i ≠ j), then (14) and (15) hold.
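A small numerical check may help build intuition for this result. For two zero-mean normal RVs with equal variances and correlation ρ, the probability that both are below zero (and, by symmetry, that both are above zero) has the well-known closed form 1/4 + arcsin(ρ)/(2π). The sketch below only evaluates this bivariate special case at threshold zero; it is not a proof of the general inequality, and the function name is hypothetical.

```python
import math


def orthant_probability(rho):
    """P{V1 <= 0, V2 <= 0} for a zero-mean bivariate normal with correlation rho.

    The same value holds for P{V1 > 0, V2 > 0} by symmetry.
    """
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        # The probability grows with rho, as Slepian's inequalities predict.
        print(rho, round(orthant_probability(rho), 4))
```

The value rises monotonically from 1/4 at ρ = 0 to 1/2 at ρ = 1: the more correlated pair is more likely to fall jointly below (or jointly above) the threshold.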
A direct consequence of Slepian's inequalities is to be able to find upper and lower bounds on both Max and Min parametric yields. As presented in Section II-A, the parameter vector X is the following:
$$X(i) = X_{dd} + X_{wds}(i) + X_{wdr}(i), \qquad i = 1, \ldots, n$$
where X wds (i)'s are arbitrarily correlated across the die. To overcome the lack of information about within-die systematic correlation, which is typically hard to get, we will introduce two RVs that represent extreme cases of within-die systematic correlation: the case of independence and the case of total correlation. Let X (0) be a random vector whose elements have the same marginal distributions as those of X but with the property that its within-die systematic components are independent. Also, let X (1) be a random vector whose elements have the same marginal distributions as those of X but with the property that its within-die systematic components are positively totally correlated. In other words, X, X (0) , and X (1) have the same individual variances but only differ in the extent of correlation of the within-die systematic components, i.e., they differ in their covariances. Now, suppose that Σ, Σ (0) , and Σ (1) are the covariance matrices of parameter vectors X, X (0) , and X (1) , respectively. These matrices have the same diagonal elements (representing variances) but differ by their off-diagonal elements (representing covariances). Remember that X is a zero-mean random vector, and its covariance matrix Σ = {σ ij } will be simplified to the following, for i ≠ j:
$$\sigma_{ij} = E[X(i)X(j)] = \sigma_{dd}^2 + E[X_{wds}(i)X_{wds}(j)]$$
where σ ij is the off-diagonal element of the covariance matrix Σ. The second part of the aforementioned equation is the covariance Σ wds = {σ wds ij } of the within-die systematic component. Therefore, we can write
$$\sigma_{ij} = \sigma_{dd}^2 + \sigma^{wds}_{ij}, \qquad i \ne j$$
Notice here that the covariance of X (0) and X (1) will have the same form as the aforementioned equation and only differ by the value of σ wds ij : For X (0) , σ wds ij = 0 since the within-die systematic components are independent (correlation coefficient equals zero). For X (1) , σ wds ij is maximum, since maximum covariance occurs when the systematic RVs are positively totally correlated (correlation coefficient equals +1).
If we assume that the within-die systematic components are positively correlated (i.e., their covariances are always positive σ wds ij ≥ 0), then the correlation coefficients are between the two extremes of zero and one, so that
$$\sigma^{(0)}_{ij} \le \sigma_{ij} \le \sigma^{(1)}_{ij}, \qquad i \ne j \tag{19}$$
Some comments are in order with respect to this positivity assumption. The assumption of positive correlation is practical for many sources of variability. A physical variation that slows down a transistor, a gate, or a path is likely to have the same effect on another that lies nearby. If the other device/gate/path is far away, then it would probably be independent anyway. In this paper, the RVs that will be assumed to be positively correlated are the gate delays in one case and the path delays in another case. Thus, we are not assuming that all process variables are positively correlated. By saying that two gates (or two paths) are positively correlated, we mean that when process variations cause the delay of one of them to increase, then the delay of the other does not decrease. Notice, however, that both gate and path delays are typically functions of several process variables. Therefore, this assumption is not as strong as requiring that corresponding sensitivities to the same process parameters be always of the same sign. To be sure, most commonly used process parameters, such as L and V th , have similar effects on gate and path delay, e.g., an increase in channel length will almost always slow down a logic gate; such process variables lead to positive correlation. However, the fact that gate and path delay variations are an aggregate effect of dependence on a number of process variables is the best justification for our assumption; even if one or two pathological process variables, in advanced technology for instance, are such that they cause opposite effects, they are unlikely to dominate to such an extent so as to lead to overall negative correlation. Finally, in the case of path-to-path correlation, the fact that many paths share common subpaths would also contribute to overall positive correlation between them. In any case, given this assumption, then combining (19) with Slepian's inequalities, we can write
$$\begin{aligned} P\{X^{(0)}(i) \le x,\ \forall i\} &\le Y_{\max}(x) \le P\{X^{(1)}(i) \le x,\ \forall i\} \\ P\{X^{(0)}(i) > -x,\ \forall i\} &\le Y_{\min}(x) \le P\{X^{(1)}(i) > -x,\ \forall i\} \end{aligned} \tag{20}$$
where Y max (x) and Y min (x) are the parametric Max and Min yields defined earlier. The aforesaid equations are fundamental, because they capture the lower and upper bounds on the Max and Min yields when the within-die correlation is unknown or uncertain. In the following sections, we will derive expressions for the lower and upper bounds given in (20) .
B. Lower Bounds
Consistent with the previous section, we will now find the expressions for the Max and Min yield lower bounds by starting with an assumption that X wds (i)'s are independent. Let Y (lb) max (x) and Y (lb) min (x) be the Max and Min yield lower bounds, respectively. For the lower bound analysis, the within-die systematic and random components X wds (i) and X wdr (i) are both independent RVs. Therefore, we can replace them both by an independent within-die component X wd (i) = X wds (i) + X wdr (i). The following are the yield lower bound expressions:
$$Y^{(lb)}_{\max}(x) = P\{X_{dd} + X_{wd}(i) \le x,\ \forall i\}, \qquad Y^{(lb)}_{\min}(x) = P\{X_{dd} + X_{wd}(i) > -x,\ \forall i\}$$
Since X dd is an independent zero-mean Gaussian with variance σ 2 dd , Z 0 = X dd /σ dd is an independent standard normal RV (mean zero and variance one), and the expressions for the yield lower bounds can be expanded as
We now recall a result from basic probability theory that will be used repeatedly in this paper. Let A be an arbitrary event and X be an RV with a probability density function (pdf) f (x). Then (see [20, p . 85]), we have
$$P\{A\} = \int_{-\infty}^{\infty} P\{A \mid X = x\}\, f(x)\, dx \tag{25}$$
This result is an extension to the continuous case of the simple fact that P{A} = P{A|B} · P{B} + P{A|B̄} · P{B̄}, where B is another event and B̄ is its complement. Applying (25) to (23) and (24) and denoting by φ(·) the pdf of the standard normal distribution give
Since X wd (i)'s are independent, then we can express their joint probability as a product
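Now, replacing the integrals by the mean or expected value operator E[·] and using the fact that, for a normal RV W with zero mean, P{W > −w} = P{W ≤ w}, we get
$$Y^{(lb)}_{\max}(x) = E\left[\prod_{i=1}^{n} \Phi\!\left(\frac{x - \sigma_{dd} Z_0}{\sigma_{wd}(i)}\right)\right] \tag{28}$$
$$Y^{(lb)}_{\min}(x) = E\left[\prod_{i=1}^{n} \Phi\!\left(\frac{x + \sigma_{dd} Z_0}{\sigma_{wd}(i)}\right)\right] \tag{29}$$
where Φ(·) is the cdf of the standard normal distribution. Equations (28) and (29) represent the parametric Max and Min yield lower bounds. Note the dependence on the number of parameter instances n. Later in this paper, we will show that as n goes to infinity, the aforementioned equations will take other forms which are independent of n.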
C. Upper Bound
Similarly, we will now find expressions for upper bounds on the parametric Max and Min yields. Recall from Section III-A that when X wds (i)'s are positively totally correlated, one gets an upper bound on the parametric yields. The following equations represent the upper bounds on the parametric yield, where X wds (i)'s are totally correlated:
Total correlation of the within-die systematic components means that there is a unique RV that models all the systematic variations across the die. Hence, the model for X wds (i) will be the following:
$$X_{wds}(i) = \sigma_{wds}(i)\, Z_1$$
where Z 1 is an independent standard normal. Making use of (25) twice on variables Z 0 and Z 1 , and noting that X wdr (i)'s are independent with variance σ 2 wdr (i), we get the following expressions:
The same analysis used in the lower bound case is applied here to get the aforesaid equations. For the sake of conciseness, several steps were omitted. As a result of Sections III-B and III-C, we have an upper and a lower bound on each of the parametric Max and Min yields. With a simple change of variables, as will be illustrated in the next section, these bounds can be computed by numerical integration. If a Max or Min yield of, for example, better than 90% is desired, then one can set the lower bound Y 
D. Illustration
As an illustration, we will assume all variances to be the same across the die. This means that σ 
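where σ 2 wd was defined earlier to be σ 2 wds + σ 2 wdr . In order to compute the aforementioned equations, we use the definition of the expected value operator (as an integral), and with a change of variables of u = Φ(z 0 ) and v = Φ(z 1 ), we get, for the lower and upper bounds on the Max yield,
$$Y^{(lb)}_{\max}(x) = \int_0^1 \Phi^n\!\left(\frac{x - \sigma_{dd}\,\Phi^{-1}(u)}{\sigma_{wd}}\right) du$$
$$Y^{(ub)}_{\max}(x) = \int_0^1\!\!\int_0^1 \Phi^n\!\left(\frac{x - \sigma_{dd}\,\Phi^{-1}(u) - \sigma_{wds}\,\Phi^{-1}(v)}{\sigma_{wdr}}\right) du\, dv$$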
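The numerical integration described above can be sketched in a few lines of Python. This is a minimal illustration for the equal-variance case, assuming a simple midpoint rule on the unit interval after the change of variables u = Φ(z 0 ) and v = Φ(z 1 ); the function names, the grid sizes, and the choice of quadrature rule are assumptions of this sketch and are not taken from the paper.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf() is Phi, inv_cdf() is its inverse


def max_yield_lower(x, n, sigma_dd, sigma_wd, m=2000):
    """Lower bound on the Max yield (systematic components independent)."""
    total = 0.0
    for k in range(m):
        u = (k + 0.5) / m                 # midpoint of the k-th subinterval
        z0 = _STD.inv_cdf(u)              # die-to-die quantile
        total += _STD.cdf((x - sigma_dd * z0) / sigma_wd) ** n
    return total / m


def max_yield_upper(x, n, sigma_dd, sigma_wds, sigma_wdr, m=200):
    """Upper bound on the Max yield (systematic components totally correlated)."""
    zs = [_STD.inv_cdf((k + 0.5) / m) for k in range(m)]
    total = 0.0
    for z0 in zs:                          # die-to-die quantile
        for z1 in zs:                      # shared systematic quantile
            arg = (x - sigma_dd * z0 - sigma_wds * z1) / sigma_wdr
            total += _STD.cdf(arg) ** n
    return total / (m * m)


if __name__ == "__main__":
    sigma_wd = (1.0 ** 2 + 1.0 ** 2) ** 0.5   # sigma_wds = sigma_wdr = 1
    for n in (10, 100, 1000):
        lb = max_yield_lower(3.0, n, 1.0, sigma_wd)
        ub = max_yield_upper(3.0, n, 1.0, 1.0, 1.0)
        print(n, lb, ub)
```

Because the midpoint rule never evaluates the inverse cdf at exactly 0 or 1, the infinite tails of Z 0 and Z 1 are handled without special treatment, at the cost of some accuracy in the tails for coarse grids.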
The plots of the bounds on the parametric Max yield for different values of n are shown in Fig. 1 , where we have assumed that σ σ 2 being the total variance. Notice that the yield decreases for larger n, as expected. Also note that as x increases, it is more likely that the max of X(i) is less than x.
E. Bounded Variations
Notice that, in the previous expressions for the yield, the yield decreases for larger n. One would expect this to some extent, but it is surprising that the yield approaches zero as n goes to infinity for any combination of values of the three variances. This may also be seen in the plots of Fig. 1. This is somewhat nonphysical and arises because we have assumed that the distribution of X wdr (i) is normal; recall that the normal distribution extends to ±∞. In reality, one would expect process variations (and, consequently, delay variations) to be bounded. If a device somewhere deviates by a large amount, such as 6σ or 7σ, then chances are that there is a serious problem with that die and that it would be lost for reasons other than timing yield. Therefore, it is a good idea to limit the spread of the cdf of X wdr (i) to some multiple of σ in order to avoid these nonphysical effects at large n. In this section, therefore, we will use a truncated normal distribution for X wdr (i). For clarity of presentation, we will restrict the analysis to the illustrative special case introduced in Section III-D where all variances are the same across the die. The analysis can be extended to the general case. Suppose, therefore, that X wdr (i) is bounded by ±kσ, and let Φ t (x) represent the cdf of the truncated standard normal, which is zero for x ≤ −k and one for x ≥ k.
We can plug Φ t (·) instead of Φ(·) (for X wdr (i)) into the aforementioned equations and plot the resulting Max and Min yield integrals, as shown in Fig. 2 for the Max yield. In this case, the yield loss at higher n values is limited so that the 1e6 and 1e8 plots in each group are indistinguishable. This is to be expected, because the "tail" of the distribution has been cut off, and it is primarily the tail that causes the yield loss at very large n.
When working with a truncated normal, it is noteworthy that we can derive asymptotes on the parametric yield curves that are independent of n. This means that as n tends to infinity, the parametric yield curves will be equal to these asymptotes. The 
where, as before, Z 0 = X dd /σ dd is an independent standard normal RV. Equations (40) and (41), particularly when applied to timing yield as we will do later in this paper, lead to observations similar to those made in [21]. Namely, the within-die variations determine the mean of the yield, while the die-to-die variations determine the spread of the yield. Fig. 2 shows the plot of these asymptotes for the Max yield case. These asymptotes are very tight and indistinguishable on the plot from the 1e6 and 1e8 curves. In practice, one is quite interested in the yield lower bound expressions. Table I summarizes the conditions required to get a lower bound on the Max (Setup time) yield and the Min (Hold time) yield. It summarizes the results of this section, along with Section II-C, to show that, in the case of path delay, where gate delays are the "parameters," in order to produce a lower bound on the yield, one must set the within-die systematic variations of gate delays to be totally correlated (ρ wds = +1). Looking at block delay, where delay is the maximum/minimum of a large number of paths, with path delays as the "parameters," and in order to produce a lower bound on both Setup (Max) and Hold (Min) yields, one must set the within-die systematic variations of path delays to be independent (ρ wds = 0).
IV. APPLICATION: TIMING MARGINS AND VIRTUAL CORNERS
In the previous section, we have presented the mathematical framework leading to closed-form expressions for the parametric yield. We have also noted that when X(i)'s represent path delay variations, the parametric yield is essentially a timing yield, which is of more interest. As stated in Section I, two dual concepts are important for designers, namely, the concept of a timing margin to account for variations and/or the concept of a corner or device file setting, with which to run timing analysis. In this section, we show how we use the timing yield expressions in order to select, for a desired target yield, the corresponding timing margin and virtual corner that guarantee that yield. Note that the corner is labeled as virtual because it is extracted from a design's performance, unlike traditional corners which are extracted by looking at transistors' or gates' performances.
Let N 1 be the number of stages (gates) on a path that would be representative of long critical paths in a given design. We also consider that the design contains a large number of such disjoint (nonintersecting) critical paths of N 1 gates. If one is interested in the maximum circuit delay, then the Max timing yield expression can be used
where D N 1 (j) is the path delay variation described in Section II-C and Y 0 (lb) max (·) is the lower bound expression for the yield found in Section III-E, with path delay being considered as the "parameter." Since the paths being considered are disjoint, then any correlations between their delays are only due to correlations in the process variations, and not to the sharing of circuit components. If Y is the desired target Max yield, then the techniques of Section III effectively provide the inverse function to compute the margin τ 1 for any desired Y
This τ 1 is the timing margin of an N 1 -gate path for the desired specified Max yield Y. Therefore, in order to get the desired yield, the circuit should be designed to "pass" the timing constraints when D N 1 (j) = τ 1 for all j. Now, to get the virtual corner, we need to move from the margin to the process (transistors). We use the same technique of Najm and Menezes [5] , where the margin τ 1 is equally distributed first among the gates along the path, i.e., D(i) = τ 1 /N 1 , and then, using the gate delay model, we relate the transistor corner with the gate margin
This δ effectively defines the device setting for which the circuit should be tested for timing constraint violations so as to guarantee that the timing Max yield is at least Y. Similar analysis can be used for the Min yield case to verify the minimum circuit delay. We now illustrate this with a numerical example. Assume that the variances of channel length and threshold voltage follow the following ratio: 
The last two equations are notable for the absence of the square factor in N 1 , since they result from averaging of variations due to independence. Let σ
be the total path variance. If we want a Max yield that is greater than 95%, then we use the expression in (45) with Y = 0.95 and truncation factor k = 3 to get the timing margin which is, in this case,
Then, using (46), we get
Note that σ
resulting from adding the path variances in the lower bound case, and assuming that N 1 = 9, we get
If, for example, r = 1, then δ ≈ 1.93, so that the circuit would need to be simulated (and its timing checked) with all its transistors' channel lengths and threshold voltages set at their +1.93σ points. Notice that, since α and β depend on transistor sizing, (56) provides a way in which δ can be controlled by circuit optimization and/or process tuning. Note that δ ≈ 1.93 is obtained from the lower bound, which corresponds to the worst possible setting of correlation as discussed earlier. We have also used the upper bound on the yield (corresponding to the best setting of correlation) to compute an optimistic virtual corner at a 95% yield, and it was found to be δ ≈ 1.43.
V. GENERIC CIRCUIT MODEL
The generic path model for circuits is useful to study setup margins (Max yield) or hold margins (Min yield) separately. However, in order to study the interactions among long and short paths, and the tradeoffs between setup and hold margins, a more detailed model is required. In this section, we provide such a model, which we refer to as the generic circuit model, which involves the specification of different classes of paths, as an extension of the generic path model. According to this model, a generic circuit is a collection of a certain number M of classes of paths. Each class j = 1, 2, . . . , M consists of n j identical generic paths of depth N j . The depth is the number of stages along a path as defined earlier; a stage is a logic gate or cell along with its fanout interconnect network. In a logic circuit, if a logic path is well optimized, then every stage is loaded by a total output capacitance which is three to four times as big as its own input capacitance, hence the standard FO4 configuration, so that the stages on a path have approximately the same delay.
We use the same assumption as before that all stages have the same nominal delay and the same sensitivity to process variables (α and β). Thus, all stages are identical, and all paths within the same class are identical; paths in different classes have different depth.
The classes can be viewed as the result of discretization of path delays into "bins" of paths with equal or similar delay. We further assume the existence of a "circuit depth histogram" which provides the fraction of the total number of paths that each class constitutes. We express these fractions as probabilities
$$P\{N = N_j\} = \gamma_j, \qquad j = 1, 2, \ldots, M$$
where N is a discrete RV that represents path depth, N j is the depth of paths in class j, and γ j is the fraction of total paths that have a depth of N j . If n is the total number of paths in a circuit, then the number of paths n j in class j is
$$n_j = \gamma_j \, n$$
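As a small illustration of the depth histogram, the sketch below converts class fractions γ j into path counts by taking n j as the product of γ j and the total path count n, which is what the definitions above imply. The depths and fractions are hypothetical placeholders chosen for illustration; they are not the values of Table II.

```python
# Hypothetical depth histogram: class depth N_j -> fraction gamma_j of all paths.
# These numbers are illustrative only; they are not taken from Table II.
depth_histogram = {6: 0.10, 8: 0.20, 10: 0.40, 12: 0.20, 14: 0.10}
total_paths = 10_000  # assumed total path count n

# The fractions act as probabilities P{N = N_j}, so they must sum to one.
assert abs(sum(depth_histogram.values()) - 1.0) < 1e-12

# Path count per class: n_j = gamma_j * n, rounded to an integer count.
class_counts = {depth: round(gamma * total_paths)
                for depth, gamma in depth_histogram.items()}

for depth, count in class_counts.items():
    print(f"depth N_j = {depth:2d}: n_j = {count}")
```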
An example is given in Table II, which, for a test circuit A with five classes, lists the values of N j and γ j for every class. Also shown is the corresponding path count n j in every class based on an assumed total path count of 10 000. The table also shows the nominal path delay D (nom) N j in every class. Based on such a circuit model, we will provide the mathematical framework that leads to predicting timing margins that need to be left to guarantee a desired timing yield. We will also validate our margins with Monte Carlo simulations. We consider the generic circuit to be an intuitive and useful model for logic circuits in the same way as the generic path has been found to be useful [14], [15].
Recall that timing yield is the probability that circuit delay is within the specified timing constraints. These constraints can be upper limits on the maximum circuit delay, lower limits on the minimum circuit delay, or both in what is known as interval or two-sided constraints. Yield loss is incurred if circuit timing falls outside the constraints. Note that constraints on the maximum circuit delay are also known as setup time constraints, and constraints on the minimum delay are known as hold time constraints. In the next section, we define and express the setup yield and the hold yield in closed form. Then, we combine the two analyses and present a margin budgeting scheme which allows one to make a tradeoff between yield loss due to setup and hold violations.
Recall that, in our generic circuit representation with multiple classes, each class consists of identical generic paths whose delay variation is modeled using the generic parameter model. Let D N j (i) be the path delay variation of path i in class j. Then, we can express path delay variation using the parameter model in the following way:
where j = 1, . . . , M is the class index, i = 1, . . . , n j is the path index in class j, and the right-hand side is simply the three components of variability. Note that, in the second line, we have combined the systematic and random components into one within-die component of variability whose variance is equal to the sum of the variances of the systematic and random components. Now, combining the aforesaid model for path delay with (59) gives the formal expression for setup yield
where the intersection operator ∩ is used to indicate that we are interested in the probability of all these joint events. In other words, the setup yield is the probability that, over all M classes, all path delay variations in each class are less than a timing margin τ j (specific to each class j). This margin is the amount of "padding" or slack which should be kept between the maximum delay constraint and the nominal path delay in class j to account for delay variations. Using the previous expression for the setup yield, we will later show how we can predict a unique setup margin τ s that should be left as slack on the maximum nominal path delay in order to guarantee a desired setup yield for the circuit under consideration.
Starting from (62), and noting that the second intersection defines a statistical max operation, we can write the following:
In Section III-A, it was shown, using Slepian's inequality, that if the systematic within-die delay variations are assumed to be uncorrelated, then this would lead to a lower bound on the statistical max and min operations. We again use this result here to write the following lower bound on the setup yield Y S :
where D (wd) N j (i)'s are assumed to be uncorrelated for different i's (hence independent, because they are Gaussian).
A. Asymptotic Convergence
Let Z j be an RV that is equal to the maximum delay variation for class j. This is shown in the following:
where
Note that, because the die-to-die component is the same for all i, we can take it outside the max operation. We will now prove that, as n j → ∞, the variance of W j goes to zero; therefore, we can approximate W j by a constant μ j that is equal to its 50th percentile W j,50% .
Let Y be the maximum of n independent identically distributed standard normal RVs. In studying the asymptotic properties of such an RV, it has been shown in [22] that V = l n (Y − l n ) has a unique limiting distribution that is independent of n, where l n is a coefficient (not an RV) that is proportional to √(2 log n). Since V has a distribution that is independent of n, its variance s 2 is independent of n. Therefore
where Var[·] denotes variance and
which is true because $l_n^2$ is proportional to log n. This result can be applied to W j , which is the maximum of n j independent zero-mean Gaussian RVs D (wd) N j (i), after scaling each by its standard deviation to transform them to standard normal RVs. Since the variance of W j goes to zero for large n j , we can approximate it by its 50th percentile W j,50% , which we denote by μ j . Now, let Ẑ j be an RV such that
Then, Z j converges to Ẑ j for large n j . Note that μ j is given by
where σ
Fig. 3 shows the plots of the distributions of Z j (solid) and Ẑ j (dashed) for increasing values of n j = 10, 100, 1e3, 1e6, 1e7, and 1e9. The approximation is very tight for n j greater than 100, and the two become indistinguishable for larger n j . Since we anticipate each class of paths to contain hundreds, if not thousands, of paths, the approximation is very good, and Ẑ j will be used in lieu of Z j in the analysis hereafter. Also note that the use of truncated normals accelerates the convergence and guarantees that μ j in (71) remains bounded.
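The concentration argument above can be checked numerically with a minimal sketch. It assumes untruncated, independent standard normal RVs, for which the maximum of n of them satisfies P{max ≤ w} = Φ(w)^n exactly, so any quantile of the maximum has a closed form. The helper name `max_quantile` is hypothetical, and the truncation used in the paper is not modeled here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def max_quantile(n, q):
    """q-quantile of the max of n iid standard normals (solves Phi(w)**n = q)."""
    return STD_NORMAL.inv_cdf(q ** (1.0 / n))


rows = []
for n in (10, 100, 10**3, 10**6, 10**9):
    median = max_quantile(n, 0.50)  # plays the role of W_{j,50%} for unit sigma
    spread = max_quantile(n, 0.95) - max_quantile(n, 0.05)  # 5%-95% width
    rows.append((n, median, spread))
    print(f"n = {n:>10d}  median of max = {median:.3f}  5%-95% width = {spread:.3f}")
```

Under these assumptions the median of the maximum grows with n while the width of its distribution shrinks, which is the behavior that motivates replacing W j by the constant μ j . The shrinkage is slow (it follows 1/log n), so the quality of the approximation for a given n j also depends on how large the within-die spread is relative to the die-to-die spread.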
B. Setup Yield Expression
Starting from (64), and replacing the expression inside the brackets by Ẑ j , we get the following expression for the setup yield:
Recall that the die-to-die component of variation affects all the circuits in the same way, i.e., the variation is governed by a single RV
is the standard deviation of the die-to-die component and Z 0 is a standard normal RV. Note that σ 
where Y o is the desired lower bound on the setup yield Y S .
C. Setup Margin τ s
The aforementioned equation for the setup yield Y S is very important because it will allow us to budget each margin τ j and to finally assign a unique setup margin τ s for the circuit, which guarantees a desired setup yield as follows. Let a min be such that
Recall that μ j and σ (dd) N j are determined and fixed for each class, but τ j is to be determined. If a setup yield of, for example, Y S = Y is required, then we can set the lower bound Y o in (77) to Y, and get a value for a min as follows
We can then use this value of a min to budget the yield between the different classes. In order to reduce the design effort for class j, one would like the margin τ j to be small, and so we require
where, effectively, we require the different class contributions to the yield to be equal. In this way, we have "relaxed" the yield requirements for each class while ensuring that the overall desired yield is met. As mentioned, this is important, since it has the benefit of reducing the design effort required to meet the timing and yield constraints. This gives us the following expression for each class margin:
Recall that the margin τ j represents how much designers should leave as slack between the nominal path delay of class j and the maximum delay constraint (setup constraint). Let D (nom) N j be the nominal path delay of the generic paths of class j. Then, we can define a unique setup margin τ s to be padded on top of the maximum nominal path delay for all classes in order to guarantee that the desired setup yield is attained. This is expressed in the following way:
which may be explained graphically. Starting from the nominal path delays of the classes (hashed bars), we get the class margins τ j (solid bars) using (81). We then apply (82), subtracting the maximum nominal delay from the maximum delay with the added margin τ j to get the setup margin τ s . Therefore, we are providing a unique margin to be allowed on the nominal maximum circuit delay in order to guarantee the required yield. Note that no other path delays exceed the point defined by the maximum nominal delay with the added setup margin τ s .
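The margin assignment described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact expressions: it assumes untruncated Gaussians, takes a min as the standard normal quantile of the target yield, assumes each class margin has the form τ j = μ j + a min σ (dd) N j , and computes μ j as the median of the maximum of n j independent within-die variations. The class data, field names, and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def median_of_max(n_paths, sigma_wd):
    """Median of the max of n_paths iid N(0, sigma_wd^2) variations (untruncated)."""
    return sigma_wd * STD_NORMAL.inv_cdf(0.5 ** (1.0 / n_paths))


def setup_margins(classes, target_yield):
    """Return (a_min, per-class margins tau_j, single setup margin tau_s)."""
    a_min = STD_NORMAL.inv_cdf(target_yield)  # assumed: Phi(a_min) = target yield
    taus = [median_of_max(c["n"], c["sigma_wd"]) + a_min * c["sigma_dd"]
            for c in classes]
    max_nominal = max(c["d_nom"] for c in classes)
    # Largest "nominal delay plus class margin" over all classes.
    max_padded = max(c["d_nom"] + t for c, t in zip(classes, taus))
    tau_s = max_padded - max_nominal
    return a_min, taus, tau_s


# Hypothetical classes (delays and sigmas in ns); not the data of Table II.
classes = [
    {"d_nom": 8.0, "n": 1000, "sigma_dd": 0.45, "sigma_wd": 0.30},
    {"d_nom": 11.0, "n": 5000, "sigma_dd": 0.60, "sigma_wd": 0.40},
    {"d_nom": 13.5, "n": 4000, "sigma_dd": 0.75, "sigma_wd": 0.50},
]
a_min, taus, tau_s = setup_margins(classes, target_yield=0.9973)
print(f"a_min = {a_min:.3f}, tau_s = {tau_s:.3f} ns")
```

By construction, no class exceeds the maximum nominal delay plus τ s once its own margin τ j is added, which is the property stated in the text.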
D. Monte Carlo Validation
In this section, we validate our approach with Monte Carlo analysis based on an industrial 90-nm CMOS library. For this purpose, we use circuit A, which was defined in Table II. This circuit has five classes of paths with different depths, nominal delay values, and numbers of paths. We assume that two process parameters are varying (channel length and threshold voltage), for which we determine gate delay sensitivities for every class of path. In our experiment, it was observed that the variation in process parameters resulted in 20%-27% 3σ gate delay variation. Also, we have assumed a (50%, 25%, 25%) breakdown of total parameter variance into die-to-die, systematic within-die, and random within-die variances, respectively, as was done earlier. Applying the analysis described in Section V, we can get the path delay model for each class of paths.
We assume that the desired setup yield is Y = 99.73%. Applying our analysis on circuit A, without truncation of parameter variations, we get a setup margin τ s = 4.85 ns that corresponds to 36% of the maximum nominal path delay of 13.5 ns found in Table II. Recall that this margin is valid for arbitrary within-die correlation since it was derived from the yield lower bound. It is therefore a conservative timing margin; this means that, for a given correlation setting, the true setup margin is expected to be less than the margin that we have predicted. We verify this statement through Monte Carlo simulations for different within-die correlations. To this end, we have considered a k × k grid on which we place all the 10 000 paths of circuit A. We also use a number (p) of RVs to model the within-die correlations between paths. If p is large, within-die correlation between paths is weak. If p is small, correlation is strong. We have run Monte Carlo analysis with 10 000 samples and generated the distribution of maximum path delay. From this distribution, we determined the 99.73% delay percentile and subtracted from it the maximum nominal path delay to get the setup margin. We have run this experiment for different values of p to emulate situations ranging from high correlation to almost independence. Monte Carlo analysis predicted setup margins ranging from 3.54 ns (for strong correlations) to 4.54 ns (for weak correlations), as opposed to our 4.85-ns predicted margin. Fig. 5 shows the predicted margins for the experiments that we performed. Note that the leftmost Monte Carlo margin corresponds to the case of strong correlations, and the rightmost Monte Carlo margin corresponds to the case of weak correlations.
E. Hold Yield
In the previous section, we presented in detail the analysis that led to the final expression for the setup yield Y S and the corresponding setup margin τ s . In this section, we will briefly repeat the same analysis for the hold yield. Recall that the hold yield can be informally defined in the following way:
$$Y_H = P\{\text{All path delays are greater than min constraint}\} \tag{83}$$
Using the same analysis as before, we can rewrite the hold yield to be
where τ j is the margin (positive) that should be left between the minimum constraint and the nominal path delay for class j. The aforementioned equation can be transformed in the same way as previously done
Again, using Slepian's inequality, the assumption that D (wd) N j (i)'s are independent will give us a lower bound on the hold yield. This again allows us to approximate the minimum of n j independent Gaussian RVs by a constant μ j that is equal to U j,50% , the 50th percentile of the distribution of U j , the minimum over i of D (wd) N j (i).
The final expression for the hold yield is
In the same way as before, we will choose a b max to guarantee a desired hold yield of, for example,
Then, we can budget τ j in such a way that
This gives us the margins that should be left between the minimum delay constraint (hold constraint) and the nominal path delay of every class of paths
Note here that τ j is positive because b max and μ j are both negative. Finally, we combine the aforesaid margins with the values of nominal path delay D (nom) N j to get a unique hold margin τ h for the circuit. This margin should be left as slack between the minimum constraint and the minimum nominal path delay over all classes
Our hold analysis was validated using the same strategy presented in Section VI-D, and we found that Monte Carlo simulations predicted hold margins ranging from 1.56 ns to 2.05 ns (for various correlations), as opposed to the 2.22 ns predicted by our approach.
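The hold-side computation mirrors the setup case, and a minimal sketch is given below. As assumptions that are not confirmed by the text, it uses untruncated Gaussians, takes b max as the standard normal quantile of one minus the target hold yield, and assumes the class margin has the form τ j = −(μ j + b max σ (dd) N j ), where μ j is the (negative) median of the minimum of n j independent within-die variations. All names and numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def median_of_min(n_paths, sigma_wd):
    """Median of the min of n_paths iid N(0, sigma_wd^2) variations.

    By symmetry this is the negative of the median of the max.
    """
    return -sigma_wd * STD_NORMAL.inv_cdf(0.5 ** (1.0 / n_paths))


def hold_margins(classes, target_yield):
    """Return (b_max, per-class hold margins, single hold margin tau_h)."""
    b_max = STD_NORMAL.inv_cdf(1.0 - target_yield)  # negative for yield > 50%
    taus = [-(median_of_min(c["n"], c["sigma_wd"]) + b_max * c["sigma_dd"])
            for c in classes]
    min_nominal = min(c["d_nom"] for c in classes)
    # Smallest "nominal delay minus class margin" over all classes.
    min_padded = min(c["d_nom"] - t for c, t in zip(classes, taus))
    tau_h = min_nominal - min_padded
    return b_max, taus, tau_h


# Hypothetical classes (delays and sigmas in ns); not the data of Table II.
hold_classes = [
    {"d_nom": 8.0, "n": 1000, "sigma_dd": 0.45, "sigma_wd": 0.30},
    {"d_nom": 11.0, "n": 5000, "sigma_dd": 0.60, "sigma_wd": 0.40},
    {"d_nom": 13.5, "n": 4000, "sigma_dd": 0.75, "sigma_wd": 0.50},
]
b_max, hold_taus, tau_h = hold_margins(hold_classes, target_yield=0.9973)
print(f"b_max = {b_max:.3f}, tau_h = {tau_h:.3f} ns")
```

Each class margin comes out positive because both b max and μ j are negative, in line with the remark in the text.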
VII. APPLICATION: MARGIN BUDGETING
In the previous section, we presented a way to verify that a desired setup yield or hold yield is met. The method is simple, in that it provides a setup margin τ s , and a hold margin τ h , which should be allowed on the nominal maximum and minimum path delays, respectively, to guarantee the desired yields. We now combine the two analyses together, and give an expression for the total yield Y T , which captures both setup and hold yield loss. This provides control over the amount of yield loss from either setup or hold violations. This means that if hold yield is dear, one can pick the margins in such a way that all timing yield losses are due to setup yield loss and vice versa. The total yield is
Again, a lower bound is guaranteed when the within-die variations are assumed to be uncorrelated. Then, applying the approximations that were presented earlier, mainly replacing the minimum and maximum operations by their constant estimators, gives the final expression for the total yield
where the constants μ j (for both the setup and hold cases), Z 0 , b max , and a min are as defined before. Note that we need to assign values for a min and b max in order to guarantee a desired total yield Y T . Once we have chosen their values, the same previous analysis applies, i.e., find the setup and hold class margins from a min and b max using (81) and (92), respectively, and then determine the setup margin τ s and the hold margin τ h using (82) and (93), respectively. Looking at (97), it is easy to see that if a total yield of, for instance, Y T ≥ 95% is desired, then different solutions (a min , b max ) exist. For each solution, the amount of yield loss from setup and hold violations differs, but the total yield loss is the same. Let L T , L S , and L H be the total, setup, and hold yield losses, respectively. Then, we can write
Note that the total yield loss equals the sum of yield loss from setup and hold. We now define the "yield loss budget" ξ in [0, 1] to be the fraction of total yield loss induced by hold violations. This means that
Combining the aforementioned four equations enables us to get a unique (a min , b max ) and, hence, a unique (τ h , τ s ) solution for a specific yield loss budget ξ
We have tested the previous analysis on circuit A shown in Table II. For a total yield of Y T = 95% and a yield loss budget of ξ = 0.8, our results show that, without truncating the parameter distributions, the resulting margins are τ s = 4.44 ns (33% of the maximum nominal delay of 13.5 ns) and τ h = 1.81 ns (23% of the minimum nominal delay of 8 ns). We have performed additional tests for different test circuits with varying number of classes of paths; the margins were derived at two yield targets, namely, 95% and 98%, and for two cases, namely, without truncation (k = ∞) and with truncation (k = 3), and a yield loss budget of ξ = 0.8. Our results are recorded in Table III, where setup and hold margins are normalized to the nominal maximum and minimum circuit delay, respectively. Fig. 6 shows a plot of τ h versus τ s , for circuit A at a desired yield of 95%, which was obtained by sweeping the yield loss budget ξ between 0⁺ and 1⁻. As expected, when the setup margin is increased (more slack), causing setup yield loss to decrease, we can reduce the hold margin and incur more hold yield loss while maintaining the same total yield. This gives designers more flexibility in their choice of margins.
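The budgeting step can be sketched as follows. The sketch assumes untruncated Gaussians, so that the total yield bound is Φ(a min ) − Φ(b max ), the setup loss is 1 − Φ(a min ), and the hold loss is Φ(b max ); these closed forms are an assumption consistent with the description above rather than a quotation of the paper's equations. The function name `budget_corners` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def budget_corners(total_yield, xi):
    """Split the total yield loss between hold (fraction xi) and setup.

    Returns (a_min, b_max) such that Phi(a_min) - Phi(b_max) = total_yield.
    """
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie strictly between 0 and 1")
    loss_total = 1.0 - total_yield       # L_T
    loss_hold = xi * loss_total          # L_H = xi * L_T
    loss_setup = loss_total - loss_hold  # L_S = (1 - xi) * L_T
    a_min = STD_NORMAL.inv_cdf(1.0 - loss_setup)
    b_max = STD_NORMAL.inv_cdf(loss_hold)
    return a_min, b_max


a_min_b, b_max_b = budget_corners(total_yield=0.95, xi=0.8)
achieved = STD_NORMAL.cdf(a_min_b) - STD_NORMAL.cdf(b_max_b)
print(f"a_min = {a_min_b:.3f}, b_max = {b_max_b:.3f}, total yield = {achieved:.4f}")
```

Sweeping `xi` over the open interval (0, 1) and converting each (a min , b max ) pair into margins traces out a setup-versus-hold tradeoff curve of the kind discussed in the text.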
VIII. CONCLUSION
In this paper, an early technique for timing analysis was proposed, one that maintains the traditional methodology of using timing margins and corners. Our technique is based on a timing yield model which allows one to determine and budget the setup and hold margins required for a proposed chip design, as well as to elevate the concept of corners to the chip level by carefully selecting a "virtual corner" based on the desired circuit yield metric. The technique is valid irrespective of (and for any possible setting of) the within-die correlations, which allows one to handle situations of unknown or uncertain correlations. The process requires up-front specification of the target process technology and the circuit style, through a generic circuit description in terms of one or more classes of generic paths. As a result, the proposed technique provides setup and hold timing margins which, if observed for the longest and shortest nominal path delays during subsequent circuit design, would allow the chip to meet the timing yield target. These margins are not unique, but may be traded off against each other, so that one may allow more setup yield loss than hold yield loss, or vice versa, by simply changing a single parameter.
APPENDIX BOUNDS ON SUM OF GAUSSIAN RVS
In this section, we find bounds on the distribution of the sum of N normally distributed RVs X(i) with unknown correlation. These bounds prove useful when we build the generic path variational delay model. Essentially, we prove that if the unknown correlation ranges from zero (independence) to one (total correlation), then a lower bound on the distribution of the sum is achieved when X(i)'s are assumed to be totally correlated, and an upper bound is achieved when X(i)'s are assumed to be independent.
Let $S = \sum_{i=1}^{N} X(i)$, where S is a zero-mean normal RV with some variance; call it $\sigma_s^2$. The distribution of S is as follows:
$$P\{S \le x\} = \Phi\left(\frac{x}{\sigma_s}\right)$$
where Φ(·) is the standard normal cdf. Since Φ(·) is increasing, then P{S ≤ x} is decreasing in σ s (for x ≥ 0, which is the range of interest if one is looking to achieve high yield). The variance of S is a function of the variances of X(i)'s and the unknown correlation coefficient ρ ij
Therefore, it is easy to see that a lower (or upper) bound on the distribution of S is achieved when σ s is maximized (or minimized), i.e., when ρ ij are set to one (or zero).
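The argument can be written out as a short worked derivation. The symbol σ i for the standard deviation of X(i) is an assumed notation introduced here for illustration, and the correlation coefficients are taken to lie in [0, 1] as stated above.

```latex
% Assumed notation: \sigma_i denotes the standard deviation of X(i).
\begin{align}
  \sigma_s^2 &= \sum_{i=1}^{N} \sigma_i^2
              + \sum_{i=1}^{N} \sum_{j \neq i} \rho_{ij}\, \sigma_i \sigma_j \\
  % Independence (\rho_{ij} = 0) gives the smallest variance:
  \rho_{ij} = 0 \;&\Rightarrow\; \sigma_s^2 = \sum_{i=1}^{N} \sigma_i^2 \\
  % Total correlation (\rho_{ij} = 1) gives the largest variance:
  \rho_{ij} = 1 \;&\Rightarrow\; \sigma_s = \sum_{i=1}^{N} \sigma_i \\
  % Since \Phi(x / \sigma_s) is decreasing in \sigma_s for x \ge 0:
  \Phi\!\left(\frac{x}{\sum_{i} \sigma_i}\right)
    \;\le\; P\{S \le x\} \;&\le\;
  \Phi\!\left(\frac{x}{\sqrt{\sum_{i} \sigma_i^2}}\right), \qquad x \ge 0
\end{align}
```

For N identical terms with common standard deviation σ, the two extremes are σ s = Nσ (total correlation) and σ s = √N σ (independence), which shows how independence averages out the variations.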
