Abstract -Statistical Timing Analysis (SSTA) is a method that calculates circuit delay statistically with process parameter variations, die-to-die (D2D) and within-die (WID) variations. In this paper, we model that WID parameter variations are independent for each cell and line in a chip and D2D variations are governed by one variation on a chip. We propose a new method of computing a full chip delay distribution considering both D2D and WID parameter variations. Experimental results show that the proposed method is more accurate than previous methods on actual chip designs.
I Introduction
With the advent of deep sub-micron technologies, the delay variations caused by process, temperature, voltage drop and cross talk are increasing. Static timing analysis (STA) uses the worst-case corner variations in delay calculation, which means that it calculates the delay assuming that all of the process variations cause each gate to take the largest delay during set-up checking. However, since the random parts of the variations of each gate cannot be worst simultaneously on the same chip statistically, STA is usually optimistic by a large margin, which causes over-estimation in delay calculation, requires excessive delay margin in design phases, and makes the timing closure difficult.
Statistical static time analysis (SSTA) handles the random parts of the process variations as probability distributions to calculate the delay statistically. It has been studied over the years [1, 2] . Several approaches to model the delay distribution have been proposed, and can be categorized in the following three classes: 1) Gaussian Approach [3, 4, 5, 6] Parameter variations are expressed either as a single normal random variable or a linear sum of normal variables. The delay distribution will be Gaussian. Statistical operations can be executed with less computational penalty, but this approach has a limitation in expressing the non-Gaussian on statistical maximum operations.
2) Non-Linear Gaussian Approach [7, 8, 9] Parameter variations are expressed as a non-linear function of normal random variables. It can express any distributions other than Gaussian. However, the statistical maximum operation contains theoretical errors because it is not closed with respect to statistical operations, and it is time-consuming compared with the Gaussian approach.
3) Piecewise Linear Approach [10, 11, 12] The distributions of parameter variations are expressed in a piecewise linear approximation. It can express any form of distributions, and provide accurate statistical operations if fine intervals are allowed. However, the computational time of the operations increases with the number of intervals.
On the other hand, in modeling the delay distribution, it is important to model die-to-die (D2D) parameter variations and within-die (WID) parameter variations separately [1, 2] . D2D parameter variations result from lot-to-lot, wafer-to-wafer, and global variations within wafer, and give the same parameter variations to each gate on a chip. WID parameter variations result from random noises during fabrication, and can be divided into correlated or systematic parts and independent parts. D2D parameter variations are required to be considered in order to predict the accurate timing yield for the mass production processor designs which use bin sorting by speed.
In this paper, we adopt the delay model of [11] for D2D and WID parameter variations as follows:
(1) Where D norm is nominal delay, D inter is delay variation due to D2D parameter variation and D intra is delay variation due to WID parameter variation. D2D and WID parameter variations are independent. We call D as total delay and distribution of D as total delay distribution. Total delay distribution can be obtained as statistical ADD (see Section 2.3) of D2D and WID parameter distributions. We adopt the piecewise linear approach to model the delay distribution of WID parameter variations, which can handle the non-normality of the delay distributions caused by statistical operations and the non-normality of gates and wires [9] .
An approximate calculation method for statistical maximum operations to consider the D2D and WID parameter variations based on piecewise linear approach was proposed in [11] . It works fine for a small number of paths, but the calculation errors become large as the number of paths increases.
To calculate the statistical maximum operations accurately, we introduce new non-Gaussian model for D2D parameter variations. The idea is as follows:
For given two paths, the statistical maximum of their WID parameter variations can be obtained with the multiplication of CDFs (Cumulative Distribution Function) of their WID parameter variations. Then, we calculate the statistical maximum of the two paths considering the WID parameter variations and the correlations of their D2D parameter variations. This method is theoretically accurate, which will be discussed in detail in Section 3.1. Finally, the D2D parameter variations of the two paths will be obtained by moment matching applied to their WID variations and their statistical maximum. To handle more than two paths, we apply this iteratively. The obtained D2D and WID parameter variations are used to calculate the statistical maximum for the next path.
The main contributions of this paper are as follows:
We provide a new method to extract the statistical maximum of D2D parameter variations of the two paths from the statistical maximum of WID parameter variations and their statistical maximum considering both D2D and WID variations. A path pruning technique is proposed to speed up the method. We demonstrate the accuracy and efficiency of the proposed methods on three typical actual designs. The rest of the paper is organized as follows. In Section II, we review the path analysis and the statistical delay calculation. In Section III, a statistical maximum operation method and a speed-up technique are proposed. Section IV shows the experimental results. Section V concludes this paper.
II. Preliminary

Check Path, Check Path Delay and Chip Delay
For given a pair of flip-flops, the whole set of the circuit elements and paths between the pair, ones on the shared clock path from the clock source and the pair is defined as a check path. Fig. 1 shows an example of a check path. S is the shared clock source of FF1 and FF2. The delay of the path from S to D of FF2 through FF1 is defined as D path , and the delay of the path from S to CLK pin of FF2 as C path .
We define a check path delay as follows: Check Path Delay = D path -C path -C.
(2) where C is the circuit dependent constant time including the setup time of FF2. We can obtain the frequency of the check path from inverse of the check path delay. We define the maximum of all check path delays in a chip as the chip delay. We can obtain the frequency of the chip from inverse of the chip delay. 
Check Path delay Distribution and Chip Delay Distribution
When the delays of elements in a chip are distributed due to variations, the check path delay will be distributed, and is called check path delay distribution. Then, the chip delay will be distributed in the same way, and is called chip delay distribution. Timing yield can be calculated from the chip delay distribution.
For given D2D and WID parameter variations to each element of a chip, one method to calculate the chip delay distribution was proposed by [11] . It uses the following two steps:
Step 1: Calculation of D2D and WID check path delay distributions For each check path in the chip, calculate the D2D and WID delay distribution of D path separately. Likewise, calculate the D2D and WID of C path . Then, calculate the check path delay distribution from equation (2) .
Step 2: Calculation of chip delay distributions Apply the statistical maximum operation over all D2D and WID of D path and C path to obtain the chip delay distribution.
In
Step 2, Monte Carlo simulation would yield most accurate results. However, it is time-consuming, because for actual chip designs more than hundred thousands of check paths should be handled at a time for each simulation event. Also piecewise linear approach we adopted requires huge amount of memory to save the whole check path delay distribution.
One way to avoid this problem is to calculate the chip delay distribution iteratively. First, choose any two check paths from the set of all the check paths, and calculate their check path delay distribution. Then, choose another check path, and calculate its check path delay distribution and the check path delay distribution obtained from the previous calculation. Continue this process until all of the check paths are processed.
Two approximations are proposed by [11] to execute Step2. They are discussed in Section 2.5.
Statistical ADD and MAX
A directed acyclic graph G=(V,E) called Timing Graph is used to express signal flows of a circuit. Each node v i represents input or output of a gate and the direction of an edge represents the direction of the signal in the circuit. Each edge has a weight, which is a random variable of gate or line delay variations. We show an example of timing graph in Fig.2 . SSTA executes statistical operations from v 0 , the source, to v 5 , the target. Each node saves the result of the statistical operations as a delay random variable from v 0 to the node. Statistical ADD will be applied to a node with a single input edge. For example, the random variable of v 2 can be calculated as statistical ADD of the random variable of v 1 and that of the edge e 1 . Let X be the random variable of v 1 , Y be the random variable of e 1 and Z be the random variable of v 2 . Then Z = X+Y. If X and Y are independent, statistical ADD can be calculated as convolution of two random variables as follows.
F z (t) = F x (t-s)f y (s)ds (3) Where F z is CDF of the delay random variable of v 2 , F x is CDF of v 1 and f y is PDF (Probability Density Function) of the random variable of e 1 .
Statistical MAX will be applied to a node with multiple inputs. For example, v 5 has two inputs, e 2 and e 3 ; we calculate statistical MAX of the delay random variable of e 2 and e 3 . Let X be the delay random variable of e 2 from the source, Y be that of e 3 from it, and Z be the delay random variable of v 5 . Then Z = max(X,Y). In this case, it is difficult to calculate CDF or PDF of Z because the delay random variables, X and Y, which have the common edge e1 in the paths to the source, are not independent. If X and Y are independent, CDF of statistical MAX, Smax(X,Y), is calculated as follows.
Smax(x,y)(t) = F x (t)F y (t) (4) Where F x is CDF of X and F y is CDF of Y. The CDF calculated by (4) gives an upper bound of Smax(X,Y) [10] , even if X and Y are not independent. 4A-2
Statistical Subtraction
As shown in equation (2), the delay random variable of C path should be subtracted from that of D path to obtain that of check path delay statistically. Statistical subtraction can be calculated in the same manner as statistical ADD. Let X and Y be the delay random variables of D path and C path , respectively. If X and Y are independent, the delay random variable Z = X-Y is calculated as follows.
F z (t) = F x (t+s)f y (s)ds (5) Where F z is CDF of Z, F x is CDF of X and f y is PDF of Y.
D2D and WID parameter Variations
Equation (1) is the delay variation model that we adopt for our proposed method. Since we also adopt the piecewise linear approach to model the delay distribution of WID parameter variations, any shape of the distribution of WID parameter variations can be expressed. We assume WID parameter variations are independent for each gate or line. In general, WID parameter variations have correlation for each gate or line. Practically, since equation (1) can contain one correlated random variable, D inter , we can simply express the WID correlation with putting the correlated parts into the D2D parameter variations.
D2D parameter variations give variations uniformly to all elements of a chip. They are modeled by one delay random variable for a chip. To cope with non-Gaussian variations and to be closed with respect to statistical operations, we model D2D parameter variations as follows.
D inter = ap*z (z>0), an*z (z<0) (6) Here "ap" and "an" are positive coefficient and negative coefficient, respectively, those are given by cell libraries and technology parameters, and z is a random variable that is the same for all gates and lines. The mean of z is zero and its standard deviation is one. D2D parameter variations express the correlated parts of the total delay variations for all gates and lines in a chip.
According to the definition of statistical MAX, for given two parameter variations, d 1 = a 1 *z d 2 = a 2 *z, it should be: max(d 1 ,d 2 )=max(a 1 ,a 2 )*z (z>0),min(a 1 ,a 2 )*z (z<0). This means that the simple variation model of D inter = a*z for all z cannot express this result of statistcal MAX any more. Generally, approximation methods, such as moment match, will be applied to obtain the resulatant parameter variation with errors. However, equation (6) is closed with respect to statistical MAX operations, which means that statistical MAX of the two parameter variations in equation (6) can be calculated rigorously in the same expression of equation (6) .
For given two D2D parameter variations, d 1 = ap 1 *z(z>0),an 1 *z(z<0) d 2 =ap 2 *z(z>0),an 2 *z(z<0), statistical ADD can be expressed as d 1 +d 2 = (ap 1 +ap 2 )*z (z>0), (an 1 +an 2 )*z (z<0) (7) and statistical MAX can be done as max(d 1 ,d 2 )=max(ap 1 ,ap 2 )*z (z>0), min(an 1 ,an 2 )*z(z<0). (8) Equation (7) and (8) show that equation (6) is closed with respect to statistical operations, ADD and MAX. This delay variation model is simple. However, practically, we could predict timing yield for microprocessor designs successfully with fitting the random variables to the delay results of ring oscillators [13] . Figure 3 shows the iterative steps of chip delay distribution calculation as discussed in Section 2.2:
Previous Methods
It is difficult to calculate statistical MAX distribution (Smax) accurately considering D2D and WID parameter variations. In case of our D2D model, CDF of statistical MAX of two check path delays F1 and F2 (Smax (F1,F2) ) is calculated as follows.
Where p 1 (p 2 ) is WID PDF of F 1 (F 2 ), p is standard Gaussian PDF, f 1 (f 2 ) is D2D model function of F 1 (F 2 ) as (6) . It is hard to calculate this integral by numerical methods or Monte Carlo simulations.
In [11] , following approximate methods of statistical MAX calculation are proposed. We can apply these two methods to iterative chip delay distribution calculation. But these methods become inaccurate when the number of check paths becomes larger.
The previous method 1 obtains D2D and WID parameter distributions independently. When the number of check paths becomes larger, this method chooses the smallest negative coefficient (an) of D2D parameter variations from all check paths according to equation (8) . However, a check path that has smaller path delay distributions, a small D2D coefficient, tends to have a smaller path delay, which, generally speaking, seldom affects chip delay distributions. Therefore, the negative D2D coefficient of chip delay will be estimated to be pessimistic.
A problem of the previous method 2 is to handle D2D parameter variations, which are correlated, as independent random variables, which causes pessimistic estimation as the number of check paths becomes larger.
III. Proposed Method
In this section we propose a new iterative chip delay distribution calculation method considering D2D and WID variations accurately. In Section 3.1, we propose a new 
4A-2 statistical MAX operation for D2D and WID parameter variations, which will be used in our statistical MAX calculation iteratively. In Section 3.2, we propose a path pruning technique to speed up the statistical MAX operation. A check path which has the delay small enough to be ignored in terms of statistical MAX will be skipped in the operation, because it does not affect chip delay distribution. We introduce the idea of dominance to decide which check path should be pruned.
Statistical MAX operation
Fig . 5 shows the basic idea of our proposed statistical MAX operation. For given two paths, the statistical maximum of WID parameter variations and their statistical maximum assuming both D2D and WID parameter variations are calculated. Then, moment matching is used to extract the D2D parameter variations of the two paths from the WID and the statistical maximum. Following are the detailed steps of our proposed statistical MAX operation. Step1. Apply operation (4) ( )
where symbols in right hand sides are same as integration (9) . Monte Carlo simulation or numerical integration, which requires less computational time than integration (9), can be applied to compute the above integrations, (10) and (11) .
Details of Step3 are explained in Appendix 1. Although the proposed method is an approximate calculation, it can obtain more accurate chip delay distributions than previous methods, because not only WID but also D2D parameter variations can be calculated with considering the correlation of D2D parameter variations statistically in Step2.
Path Pruning technique with dominance
In Fig. 3 , we define dominance of distribution F i to chip delay distribution D as follows. (Fig.6) . If dominance of Fi is small enough that we may regard distributions Smax(F 1 ,…,F i-1 ,F i ) and Smax(F 1 ,…,F i-1 ) are the same, F i can be skipped during Step 3 in Fig. 3 .
Dominance of F i can be estimated from statistics of distributions Smax(F 1 ,…,F i-1 ) and F n . We set mc, c : mean and standard deviation of WID distribution of F i , apc, anc : D2D coefficients of F i , md, d : mean and standard deviation of WID distribution of Smax(F 1 ,…,F i-1 ), apd, and : D2D coefficients of Smax (F 1 ,…,F i-1 ) , and assume that WID distribution is Gaussian. Then we get the following estimation.
where h=max(|apc-apd|,|and-anc|), k is any positive constant, is standard Gaussian CDF and A is constant that is derived from the 4 th moment and mean of the check path delay distribution (see Appendix2). In practice, 4 th moment is obtained from the worst check path delay distribution, and mean from the best check path delay distribution in order to save the computational time of A and to make A pessimistic. We also mention that dominance can be computed analytically from tightness probability [5] if D2D distributions are normal (ap=an).
Our proposed estimation is not mathematically rigorous, because we assume that the check path delay is positive. We believe, however, that this assumption is adequate for the actual circuit data. The details are discussed in Appendix 2.
To prune check paths, a threshold, TH, of dominance is required. Since TH has an impact on the accuracy and CPU time, it should be determined according to the chip designs or applications. TH can be controlled by constant k, and once constant k is fixed, inequality (13) can be used for pruning. We note that the CPU time and the accuracy are trade-off relation. The smaller k is, the larger TH and the number of pruned paths become. Then, the CPU time become smaller and the accuracy become lower.
To be precise, dominance estimation (13) assumes normality of WID distributions. We, however, apply this estimation to non-Gaussian WID as criteria of path pruning heuristics. 
IV. Experimental Results
We applied the proposed chip delay calculation to three typical actual chip designs. We selected these chips from a set of chip designs, 130nm to 65nm, based on the path density around critical paths. The number of check paths on each chip is shown in Table 1 . The check path sets are selected based on nominal delay criteria. We calculate chip delay distributions of these chips with different methods: our proposed method, Monte Carlo simulation method, and previous methods. Monte Carlo results will be used as the reference. The number of intervals for piecewise linear expression for WID distribution is 100. We applied Monte Carlo with 10000 iterations to calculate mean and standard deviation of statistical MAX distribution (See Section 3.1 Step2). The number of iterations for Monte Carlo simulation method was also 10000.
In our experiments, we set k=5 in (13) to obtain the following dominance estimation: It can be seen from these figures that the CDF from the previous method 2 has large errors because of the reason discussed in Section 2.6. The CDF of the previous method 1 has errors at low probability region (less than 0.5), because the negative coefficient (an) of D2D variation of chip delay tends to be estimated smaller as discussed in Section 2.6. Therefore CDF tends to be sharper at low probability region. On the other hand, the proposed method is accurate even in low probability region, because D2D variations of chip delay are calculated accurately with statistical MAX operation proposed in Section 3.1. The proposed method is much faster than Monte Carlo because of the pruning technique. Table 3 shows the number of check path selected (i.e., not pruned away) for statistical MAX operation. The number of selected check paths also shows the characteristics of the path density around critical paths. V. Summary and Conclusions In this paper, we proposed a new iterative method of chip delay distribution calculation considering both D2D and WID parameter variations. The proposed method includes a new statistical MAX operation and a path pruning technique to save the CPU time. We applied the proposed method on three actual chip designs. We showed that the proposed method is more accurate than the previous methods and is faster than Monte Carlo.
Appendix 1. D2D Coefficient of Statistical MAX
We explain calculation of D2D coefficient of statistical MAX of two check path delay distributions in Step 3 of Section 3.1. We set • mt, t: mean and standard deviation of statistical MAX, • mc, c: mean and standard deviation of WID distribution of statistical MAX.
• ap, an : D2D coefficients of statistical MAX Then, mean (m) and standard deviation (sd) of D2D distribution of statistical MAX can be calculated as:
The total delay distribution is given by the convolution of D2D delay distribution and WID delay distribution. Then the following equations are derived by moment matching, and give the coefficients ap and an In case1-1, we choose the coefficient pair which has the same sign as the D2D coefficients of check path delay distribution. In case1-2, we choose the coefficient pair which has larger covariance with D2D distribution of check path delay. The covariance of two D2D distributions whose coefficients are (ap 1 ,an 1 ) and (ap 2 ,an 2 ) is given by cov = (ap 1 *ap 2 +an 1 *an 2 )/2 + (ap 1 -an 1 )*(ap 2 -an 2 )/2 .
In Case2, we choose an and ap by least squares method such that S(ap,an) = (ap-an-A) 2 + (ap 2 +an 2 -B) 2 could be minimum. Stationary condition of S(ap,an) is as follows.
ap + an = 0, 4*ap 3 + 2*(1-B)*ap -A = 0 These equations have a real solution such that ap>0 and an<0.
Appendix 2. Estimation of Dominance
We derive the estimation of dominance (16) 
Where p is standard Gaussian PDF. Let integral domain into x+f 1 (z)>=y+f 2 (z) and x+f 1 (z)<y+f 2 (z) divided, then we give ( ) ( )
Each integrand of right hand side can be regarded as positive because integrant means delay of a circuit. Therefore, we give following inequality.
( )
And by Hoelder inequality ( [14] , p15), we give following inequality.
Then, we can estimate dominance as follows. 
Dominance
The integral in second bracket on right hand side means the probability that check path delay of F i is larger than delay of Smax(F 1 ,..,F i-1 ). Next, we estimate the integral. 
Integrate the right hand side with respect to x with dividing the integral domain into sub-domains that are more or less than the -k-sigma point of p1. Then, we get the following estimate. The first term represents the probability that y + (ap 2 -ap 1 )*z > m 1 -k* 1 . The second term represents the probability that y + (an 2 -an 1 )*z > m 1 -k* 1 . By assumption, random variable y+(ap 2 -ap 1 )*z and y+(an 2 ) where h=max(|ap 1 -ap 2 |,|an 1 -an 2 |), then the first and second term are less than (-k) which is the probability that normal random variable is more than the k-sigma point. We give the following estimation. 4A-2
