Process variations have become a critical issue in performance verification of high-performance designs. We present a new statistical timing analysis method that accounts for inter- and intra-die process variations and their spatial correlations. Since statistical timing analysis has an exponential run time complexity, we propose a method whereby a statistical bound on the probability distribution function of the exact circuit delay is computed with linear run time. First, we develop a model for representing inter- and intra-die variations and their spatial correlations. Using this model, we then show how gate delays and arrival times can be represented as a sum of components, such that the correlation information between arrival times and gate delays is preserved. We then show how arrival times are propagated and merged in the circuit to obtain an arrival time distribution that is an upper bound on the distribution of the exact circuit delay. We prove the correctness of the bound and also show how the bound can be improved by propagating multiple arrival times. The proposed algorithms were implemented and tested on a set of benchmark circuits under several process variation scenarios. The results were compared with Monte Carlo simulation and show an accuracy of 3.32% on average over all test cases.
Introduction
Static timing analysis has become an indispensable part of performance verification. Static timing analysis has the advantage that it does not require input vectors and has a run time that is linear with the size of the circuit. A number of methods have been proposed to increase the accuracy of static timing analysis through improved delay models and analysis techniques. In recent technologies, the variability of circuit delay due to process variations has become a significant concern. As process geometries continue to shrink, the ability to control critical device parameters is becoming increasingly difficult, and significant variations in device length, doping concentrations, and oxide thicknesses have resulted.
Traditionally, process variations have been modeled in static timing analysis (STA) using so-called case analysis. In this methodology, best-case, nominal and worst-case SPICE parameter sets are constructed and the timing analysis is performed several times, each time using one case file. Each execution of static timing analysis is therefore deterministic, meaning that the analysis uses deterministic delays for the gates and any statistical variation in the underlying silicon is hidden. While this approach has been successfully used in the past to model die-to-die variations, it is not able to accurately model variations within a single die. With the continual scaling of feature sizes, the ability to control critical device parameters on a single die has become increasingly difficult. Using a worst-case analysis for these so-called intra-die variations therefore leads to very pessimistic analysis results, since it assumes that all devices on a die have worst-case characteristics, ignoring their inherent statistical variation. The emerging dominance of intra-die variations therefore poses a major obstacle for deterministic STA, giving rise to the need for statistical timing analysis approaches.
In general, process variations can be divided into inter-die variations and intra-die variations. Inter-die variations are variations that occur from one die to the next, meaning that the same device on a chip has different features among different die of a wafer, from wafer to wafer, and from wafer lot to wafer lot. Intra-die variations are variations in device features that are present within a single chip, meaning that a device feature varies between different locations on the same die. Intra-die variations result from equipment limitations or statistical effects in the fabrication process, such as statistical variations in the doping concentrations.
Intra-die variations often exhibit spatial correlations, where devices that are close to each other have a higher probability of being alike than devices that are placed far apart. This has been reported especially for gate length variations [1]. Intra-die variations can also have a deterministic component due to topological dependencies of device processing, such as CMP effects and optical proximity effects [2]. In some cases, such topological dependencies can be directly accounted for in the analysis [3][4], whereas in other cases, such variations are treated as random.
Statistical timing analysis is similar to deterministic timing analysis in that arrival times are propagated through the circuit from primary inputs to primary outputs. In statistical timing analysis, however, the gate delays and arrival times are represented with random variables. The difficulty of statistical timing analysis results from the correlations that arise among the arrival times in the circuit and between the arrival times and gate delays. These correlations must be taken into account when arrival times are propagated in the circuit, leading to an exponential run time complexity and making statistical timing analysis a challenging problem.
A number of statistical timing analysis approaches have been proposed in recent years [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18]. In [13], the correspondence between deterministic timing analysis and statistical timing analysis was first shown. However, the proposed method does not address the correlation between the arrival times. In [14], a novel method using discretized probability distributions is proposed. However, the run time of the method is exponential and the proposed approaches to reduce the run time have an unclear impact on the accuracy. In [15], a novel method using statistical bounds is proposed with gate delays restricted to Gaussian distributions. However, to obtain a high quality bound, it is necessary to enumerate all paths in the circuit, leading to exponential run time complexity. In [16], a path-based statistical delay computation is presented using an accurate delay model. However, the analysis is performed one path at a time and the number of critical and near-critical paths in a circuit can be very large. In [17], a new circuit optimization method was therefore proposed that reduces the number of near-critical paths in a circuit, thereby improving the statistical delay of the circuit. Finally, in [18] a method using statistical bounds is presented that addresses the arrival time correlations due to path reconvergence. However, the method does not address arrival time correlations due to spatial correlations between the gate delays.
In this paper, we therefore propose a new statistical timing analysis approach to model the impact of process variations on circuit delay. We model both inter- and intra-die process variations and account for spatial correlations of the gate delays. In our analysis, we focus on gate length variability since it has been shown to have a dominant impact on gate delay [1]. However, our analysis can be easily extended to other process variations as well. We first present a model for inter- and intra-die gate length variation and their spatial correlations. Gate delays and arrival times are represented as a sum of random variables that preserves the spatial correlation information.
The correlation between the arrival times complicates the computation of the maximum arrival time, as required during arrival time propagation. Since the exact computation of the maximum arrival time requires exponential run time, we propose a method that produces an upper bound on the exact arrival time in linear run time.
We prove the correctness of the proposed bound in the presence of spatially correlated gate delays. The obtained bound is itself a random variable with a probability distribution function, allowing for the computation of useful statistical quantities such as confidence points. In order to improve the proposed bound, we propose a method whereby multiple arrival times are propagated in the circuit at the expense of additional run time. We implemented the proposed methods and tested them on benchmark circuits. We demonstrate that using the proposed methods, the statistical delay of a circuit can be computed with high accuracy.
The remainder of this paper is organized as follows. In Section 2, we present our model of process variations and our modeling assumptions. In Section 3, we present our approach for statistical timing analysis. In Section 4, we present the heuristic method for improving the quality of the bound by propagating multiple arrival times. In Section 5, we present our results and in Section 6 we draw our conclusions.
Process Variation Model
In this section, we present our model for process variations. We consider two basic types of process variations in our analysis: inter-die variations and intra-die variations. Intra-die variation can be further divided into random variations and spatially correlated variations. Random intra-die variations have no dependence on the location of the devices, while intra-die variations that are spatially correlated produce an increased likelihood of similar gate lengths for devices that are closely spaced versus those that are placed further apart. We first discuss our model for inter- and intra-die variations, which is based on the model in [19], and then discuss how this model is extended to account for spatial correlations.
We propose the following model, where the device length $L_{total,k}$ of device $k$ is the algebraic sum of the nominal gate length $L_{nom}$, the inter-die device length variation $\Delta L_{inter}$ and the intra-die device length variation $\Delta L_{intra,k}$:
$$L_{total,k} = L_{nom} + \Delta L_{inter} + \Delta L_{intra,k} \qquad \text{(EQ 1)}$$
where $\Delta L_{inter}$ and $\Delta L_{intra,k}$ are random variables. $L_{nom}$ represents the mean of the gate length across all possible die. All devices on a die share one variable $\Delta L_{inter}$ for the inter-die component of their total device length variation, which represents a variation of the chip mean of the gates of a particular die. $\Delta L_{intra,k}$ represents the variation of an individual gate from this chip mean. For the moment, we ignore the spatial correlation of intra-die variations, and hence each device is represented with a separate independent random variable $\Delta L_{intra,k}$, where all random variables $\Delta L_{intra,k}$ have identical probability distributions. For the purpose of our discussion, we assume that both random variables $\Delta L_{inter}$ and $\Delta L_{intra,k}$ have a truncated normal distribution. This reflects the fact that the gate length in an operational chip cannot be less than some finite minimum value or more than some finite maximum value. However, any suitable distribution can be used, and our proposed approach is not restricted to normal distributions. After defining a model for the gate length variation, the delay $d_k$ of gate $k$ is now defined as follows:
$$d_k = D_k(L_{total,k}) \qquad \text{(EQ 2)}$$
Since function $D_k$ is in general a non-linear function, finding the distribution of $d_k$ can be difficult. However, we take advantage of the fact that the gate length variations $\Delta L_{inter}$ and $\Delta L_{intra,k}$ are typically small, with typical 3-sigma values of less than 15% of $L_{nom}$. Hence, we make the simplifying assumption that, for small variations, the change in gate delay is linear with the change in gate length. Hence, we can write EQ2 as follows:
$$d_k = D_k(L_{nom}) + \Delta D_k(\Delta L_{inter}) + \Delta D_k(\Delta L_{intra,k}) \qquad \text{(EQ 3)}$$
where $\Delta D_k(\Delta L_{inter})$ and $\Delta D_k(\Delta L_{intra,k})$ are the change of gate delay due to inter- and intra-die gate length variation. For convenience, we define $\Delta D_k()$ as follows:
$$\Delta D_k(\Delta L) = \frac{\partial D_k}{\partial L}\,\Delta L \qquad \text{(EQ 4)}$$
where the sensitivity of the delay with respect to device length, $\partial D_k / \partial L$, is computed at the nominal device length. We can now express the delay of a gate with the following simple expression:
$$d_k = D_k(L_{nom}) + \alpha\,(\Delta L_{inter} + \Delta L_{intra,k}) \qquad \text{(EQ 5)}$$
where $\alpha = \partial D_k / \partial L$. Note that instead of using EQ4, any linear fitting function could be used as well. Although EQ5 uses a simple linear approximation, such an approximation was found to give very good accuracy for current process variabilities.
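The linearized delay model can be illustrated with a short Python sketch. The numeric values (nominal length, nominal delay, sensitivity, sigma) and the function names are hypothetical and chosen only for illustration; the truncated normal is sampled by simple rejection at 3 sigma.

```python
import random

def linear_gate_delay(d_nom, alpha, dl_inter, dl_intra):
    """First-order model: d = D_nom + alpha * (dL_inter + dL_intra)."""
    return d_nom + alpha * (dl_inter + dl_intra)

def truncated_normal(rng, sigma, n_sigma=3.0):
    """Zero-mean normal sample, rejecting draws beyond n_sigma * sigma."""
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= n_sigma * sigma:
            return x

rng = random.Random(0)
L_NOM = 180.0   # nm, hypothetical nominal gate length
D_NOM = 50.0    # ps, hypothetical nominal gate delay
ALPHA = 0.4     # ps/nm, hypothetical sensitivity dD/dL at L_NOM
SIGMA = 0.03 * L_NOM

# One inter-die sample is shared by every gate on the die;
# each gate draws its own intra-die sample.
dl_inter = truncated_normal(rng, SIGMA)
delays = [linear_gate_delay(D_NOM, ALPHA, dl_inter, truncated_normal(rng, SIGMA))
          for _ in range(5)]
```

All five sampled delays share the same inter-die shift, which is what makes gate delays on one die correlated even before spatial correlation is introduced.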
Spatial Correlation Model
In EQ1, the intra-die variation of gate length is modeled by assigning an independent random variable to each gate. However, in the presence of spatial correlation, these random variables become dependent, which greatly complicates the analysis. We therefore propose the following method for modeling spatial correlation of intra-die process variation.
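The quad-tree scheme developed in this section can be illustrated with a small Python sketch. It assumes a unit-square die, identifies regions by hypothetical `(level, col, row)` tuples rather than the region numbers of Figure 1, and assumes every region variable at a given level has the same variance. Under those assumptions the correlation between two gates is the variance of the components they share divided by the total variance.

```python
def quadtree_regions(x, y, levels):
    """Regions containing point (x, y) on the unit die, one per level.

    Level l is a 2**l-by-2**l grid; a region is named (level, col, row).
    """
    regions = []
    for level in range(levels):
        n = 2 ** level
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        regions.append((level, col, row))
    return regions

def correlation(pos_a, pos_b, level_var, random_var):
    """Gate length correlation of two distinct gates under the model.

    level_var[l] is the variance of each region variable at level l;
    random_var is the variance of the per-gate uncorrelated component.
    """
    regions_a = quadtree_regions(pos_a[0], pos_a[1], len(level_var))
    regions_b = quadtree_regions(pos_b[0], pos_b[1], len(level_var))
    shared = sum(level_var[r[0]] for r in regions_a if r in regions_b)
    total = sum(level_var) + random_var
    return shared / total

level_var = [1.0, 1.0, 1.0]          # hypothetical variance per level
near = correlation((0.1, 0.1), (0.3, 0.3), level_var, 1.0)  # share levels 0 and 1
far = correlation((0.1, 0.1), (0.9, 0.9), level_var, 1.0)   # share level 0 only
```

Shifting variance from the lower levels to the top level raises `far` toward `near`, which is the tuning knob discussed in this section.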
We first divide the area of the die into regions using a multi-level quad-tree partitioning, as shown in Figure 1. For each level $l$, the die area is partitioned into $2^l$-by-$2^l$ squares, where the first or top level 0 has a single region for the entire die and the last or bottom level $m$ has $4^m$ regions. We then associate an independent random variable $\Delta L_{l,r}$ with each region $(l, r)$ to represent a component of the total intra-die device length variation. The variation of a gate $k$ is then composed as the sum of intra-die device length components $\Delta L_{l,r}$, where level $l$ ranges from 0 to $m$ and the region $r$ at any particular level is the region that intersects with the position of gate $k$. For the gate in region (2,1) in Figure 1, the components of intra-die device length variation are therefore $\Delta L_{0,1}$, $\Delta L_{1,1}$ and $\Delta L_{2,1}$. The intra-die device length of gate $k$ is now defined as the sum of all random variables $\Delta L_{l,r}$ associated with a gate:
$$\Delta L_{intra,k} = \sum_{0 \le l \le m,\ r \text{ intersects } k} \Delta L_{l,r} + \Delta L_{random,k} \qquad \text{(EQ 6)}$$
where $\Delta L_{l,r}$ are the random variables associated with the quad-tree and $\Delta L_{random,k}$ is an independent random variable, assigned to each gate to model uncorrelated delay variation.
It must be ensured that the sum of all random variables $\Delta L_{l,r}$ associated with a gate always adds up to the total intra-die gate length variation. This can be accomplished by assigning all random variables associated with a particular level the same probability distribution and by dividing the total intra-die variability among the different levels.
Using the described model, gates that lie within close proximity of each other will have many common intra-die device length components, resulting in a strong intra-die length correlation. Gates that lie far apart on a die share few common components and therefore have weak correlation. For the three gates shown in Figure 1 in regions (2,1), (2,4) and (2,15), the intra-die device length variation is expressed as follows:
$$\Delta L_{intra,1} = \Delta L_{0,1} + \Delta L_{1,1} + \Delta L_{2,1} + \Delta L_{random,1} \qquad \text{(EQ 7)}$$
$$\Delta L_{intra,2} = \Delta L_{0,1} + \Delta L_{1,1} + \Delta L_{2,4} + \Delta L_{random,2} \qquad \text{(EQ 8)}$$
$$\Delta L_{intra,3} = \Delta L_{0,1} + \Delta L_{1,4} + \Delta L_{2,15} + \Delta L_{random,3} \qquad \text{(EQ 9)}$$
We can observe from the above equations that gates 1 and 2 are strongly correlated, as they share the common variables $\Delta L_{1,1}$ and $\Delta L_{0,1}$. On the other hand, gates 1 and 3 are more weakly correlated, as they share only the common variable $\Delta L_{0,1}$.
Note that devices that are closely spaced, but fall in different squares, will have less correlation than those that are equally spaced, but fall within the same square. However, this issue can be addressed by using an additional quad-tree which is offset by half the size of the smallest square. Figure 1 shows an example of a die with 3 levels of partitioning, resulting in 16 regions at the bottom level. Since the number of regions at the bottom level grows as $4^m$, it is possible to obtain a fine partitioning of the die with only a moderate number of levels. Note also that the length $\Delta L_{0,1}$ associated with the region at the top level of the hierarchy is equivalent to the inter-die device length $\Delta L_{inter}$, since it is shared by all gates on the die.
We can control how quickly the spatial correlation diminishes as the separation between two gates increases by controlling the allocation of total intra-die device length variation among the different levels. If the total intra-die variance is largely allocated to the bottom levels, and the regions at top levels have only a small variance, there is less sharing of device length variation between gates that are far apart and the spatial correlation will diminish quickly. On the other hand, if the total intra-die variance is predominantly allocated to the regions at the top levels of the hierarchy, then even gates that are widely spaced apart will still have significant correlation and spatial correlation will diminish more slowly as spacing increases. The proposed model is therefore flexible and can be easily fit to measured device length data.
Based on the above model for intra-die spatial correlation, we can combine EQ5 and EQ6 to obtain the following expression for the delay of a gate, where the top-level term $\Delta L_{0,1}$ carries the inter-die variation $\Delta L_{inter}$:
$$d_k = D_{nom,k} + \alpha \left[ \sum_{0 \le l \le m,\ r \text{ intersects } k} \Delta L_{l,r} + \Delta L_{random,k} \right] \qquad \text{(EQ 10)}$$
Note that all random variables in EQ10 are independent random variables. This has the advantage that spatial correlations can be processed using only independent random variables, which simplifies the analysis. Note also that some of the random variables in EQ10 will occur in the expressions of multiple gate delays.
Finally, to simplify the notation, we rewrite EQ10 using a more general form as follows:
$$d_k = D_{nom,k} + \sum_i a_i L_i + \Delta D_{random,k} \qquad \text{(EQ 11)}$$
where $L_i$ and $\Delta D_{random,k}$ are random variables, $a_i$ are constants and $D_{nom,k}$ is the delay of gate $k$ at nominal gate length. $\Delta D_{random,k}$ is the random delay due to uncorrelated intra-die gate length variation. The variables $L_i$ correspond to one of the random variables in the proposed model, such as $\Delta L_{inter}$ and $\Delta L_{l,r}$. The sum is taken over all random variables present in the model, and $a_i = \alpha$ for the random variable $\Delta L_{inter}$ and for the random variables $\Delta L_{l,r}$ associated with the gate, based on its position in the die. For all other $i$, $a_i = 0$. Note that EQ11 is simply a more general and convenient form of EQ10, where the delay of a gate is expressed in terms of all random variables in the model, instead of just those associated with that particular gate. Using EQ11, the delay of a gate is expressed as a sum of independent random variables, some of which may be shared in the delay expression of one or more gates.
In the following section, we show how to perform timing analysis based on the proposed model for process variation.
Statistical Timing Analysis Method
Static timing analysis is performed by propagating arrival times from the primary inputs to the primary outputs using repeated application of two operations:
1. Propagation. Arrival times are propagated from the input of a gate to the output of that gate. In the process, the delay of the gate is added to the arrival time.
2. Merging. The arrival times at the inputs of a gate are merged by taking their maximum.
Statistical timing analysis can be performed in the same manner using propagation and merging, except that both the gate delays and the arrival times are now random variables. In this case, the arrival time is specified either with a cumulative distribution function (CDF) or a probability density function (PDF). To simplify the implementation of statistical STA, it is often more convenient to approximate continuous PDFs and CDFs with discrete functions. For computational efficiency, we use discrete PDFs and CDFs in the implementation of our proposed method. However, for generality, we will formulate the statistical timing analysis task using continuous functions.
The difficulty in statistical timing analysis arises from the correlations between the random variables, which arise from one of two sources. First, reconvergence of circuit paths results in arrival times that are dependent, since they share a common portion of their path delay. However, in [18] it was shown that ignoring the correlation resulting from reconvergent fanout produces an upper bound on the statistical delay and results in a conservative analysis.
The second source of dependence results from spatial correlations between gate delays. It is clear that if the delays of two gates are correlated, the arrival times at their outputs will be correlated as well, thereby complicating the merging operation of these two arrival times. Furthermore, spatial correlation also results in dependence between an arrival time and the gate delays themselves. This complicates the propagation operation, where the delay of a gate is added to the arrival time at its input node.
It is easy to show that, unlike correlations resulting from reconvergent paths in the circuit, ignoring spatial correlations may not result in an upper bound on the statistical delay. This is intuitively obvious from the fact that spatial correlation makes the intra-die variability more similar to that of inter-die variability, which increases the delay of circuit paths. The correlation between the arrival times and between arrival times and the gate delays must therefore be accounted for during the propagation and merging operations. Note that if we express the delay of a gate using a single random variable, by convolving its independent components in EQ11, it will be very difficult to recover the correlation information between this gate delay and another. In the proposed approach, we therefore maintain the representation of the delay of a gate using its sum of components, as shown in the right hand side of EQ11. Similarly, we need to preserve the correlation information of arrival times. Hence, we also represent the arrival times in the timing analysis using a sum of components. Similar to that of the gate delay in EQ11, an arrival time $a$ is therefore expressed as follows:
$$a = A_{nom} + \sum_i \beta_i L_i + \Delta A_{random} \qquad \text{(EQ 12)}$$
where $A_{nom}$ is the arrival time at nominal process conditions, $L_i$ are the random variables of gate length, $\beta_i$ are constant coefficients and $\Delta A_{random}$ is the uncorrelated component of arrival time variation. We will show that by expressing the arrival times in the same form as that of gate delay, their correlations can be determined and correctly addressed.
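To illustrate why the sum-of-components form keeps correlation information available, the following Python sketch computes the covariance between an arrival time and a gate delay from their coefficients. The variable names (`L_inter`, `L_1_1`, `L_1_2`) and the numbers are hypothetical, and the sketch assumes that the $L_i$ are mutually independent and that the two uncorrelated parts are independent of each other.

```python
def covariance(coeffs_a, coeffs_b, var_l):
    """Covariance of two sums of components over shared independent L_i.

    coeffs_a, coeffs_b map a variable name to its coefficient;
    var_l maps a variable name to the variance of that L_i.
    Only variables present in both expressions contribute.
    """
    return sum(a * coeffs_b.get(name, 0.0) * var_l[name]
               for name, a in coeffs_a.items())

# Hypothetical arrival time and gate delay coefficients.
arrival_coeffs = {"L_inter": 2.0, "L_1_1": 2.0}
delay_coeffs = {"L_inter": 1.0, "L_1_2": 1.0}
var_l = {"L_inter": 4.0, "L_1_1": 4.0, "L_1_2": 4.0}

cov = covariance(arrival_coeffs, delay_coeffs, var_l)  # only L_inter is shared
```

Had both quantities been collapsed into single distributions first, the shared `L_inter` term would no longer be identifiable and this covariance could not be recovered.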
Using the proposed representations for gate delays and arrival times, we now perform arrival time propagation and merging, such that the form of the arrival times is maintained. Below, we will show that propagation can be performed exactly in this form. Computing the exact maximum of two arrival times during merging, however, requires enumeration of the random variables $L_i$. This is computationally complex and also destroys the required form of arrival time. We therefore propose an alternate method for merging two arrival times, and prove that this method results in an arrival time whose CDF is an upper bound on the CDF of the exact arrival time, while preserving the form of the arrival time expression. The method is simple and has linear run time with the number of random variables $L_i$. Using this approach, it is therefore possible to perform statistical timing analysis with linear run time in terms of circuit size, while guaranteeing a conservative analysis.
Below, we first define a statistical bound on the CDF of a random variable. We then discuss the methods for arrival time propagation and merging. Finally, in Section 4, we present a method whereby multiple arrival times can be propagated, improving the obtained bound at the cost of additional run time.
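Statistical bounds
We define an upper bound on the CDF of an arrival time random variable as follows:
Definition 1. The CDF $Q(t)$ is a statistical upper bound of the CDF $P(t)$ if and only if $Q(t) \le P(t)$ for all $t$.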
Figure 2 shows two arrival time CDFs $P(t)$ and $Q(t)$, where $Q(t)$ is an upper bound on $P(t)$. Note that the upper bound $Q(t)$ is itself a valid CDF and that every confidence point of $P(t)$ is bounded by the corresponding confidence point of $Q(t)$. By using CDF $Q(t)$ instead of $P(t)$, we will overestimate the delay corresponding to a performance yield, resulting in a conservative analysis for late arrival times, as shown in Figure 2. Similarly, for a particular required delay, the probability that a die will meet this delay constraint will be underestimated.
We now introduce the following useful lemma for arrival time CDFs:
Lemma A. If two random variables $a$ and $x$ have arbitrary CDFs $P_a(t)$ and $P_x(t)$, and $a \le x$ holds for every outcome, then the probability distribution of $x$ is a statistical upper bound on the probability distribution of $a$.
Proof: Consider an arbitrary fixed value of $t$. We then separate the cases $x \le t$ and $x > t$ and, using the fact that $a \le x$ by assumption, we can write:
$$P(a \le t) = P(x \le t,\ a \le t) + P(x > t,\ a \le t) = P(x \le t) + P(x > t,\ a \le t) \qquad \text{(EQ 13)}$$
From which it follows that
$$P_a(t) = P(x \le t) + P(x > t,\ a \le t) = P_x(t) + P(x > t,\ a \le t) \ge P_x(t) \qquad \text{(EQ 14)}$$
$\square$
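Lemma A can be checked numerically with a short Python sketch. The distributions are hypothetical; the only property that matters is that each sample of $x$ is at least the matching sample of $a$, and $x$ is deliberately constructed to depend on $a$.

```python
import random

def empirical_cdf(samples, t):
    """Fraction of samples that are <= t."""
    return sum(s <= t for s in samples) / len(samples)

rng = random.Random(1)
a_samples, x_samples = [], []
for _ in range(2000):
    a = rng.gauss(10.0, 1.0)
    x = a + rng.uniform(0.0, 2.0)   # x depends on a and satisfies x >= a
    a_samples.append(a)
    x_samples.append(x)

# The CDF of x must lie on or below the CDF of a at every point t.
points = [8.0 + 0.5 * k for k in range(13)]
bound_holds = all(empirical_cdf(x_samples, t) <= empirical_cdf(a_samples, t)
                  for t in points)
```

Because every sample pair satisfies $a \le x$, any sample with $x \le t$ also has $a \le t$, so the inequality holds for the empirical CDFs exactly and not only in expectation.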
It should be noted that in Lemma A, random variables $a$ and $x$ need not be statistically independent. We now show how arrival times can be computed using propagation and merging. While the propagation operation is exact, the merging operation results in an upper bound on the CDF of the exact arrival time.
Arrival time propagation
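During the propagation operation, the delay of a gate is added to an arrival time. We perform this operation using the following procedure:
Procedure 1: Given an arrival time $a_1 = A_{nom,1} + \sum_i \beta_{i,1} L_i + \Delta A_{random,1}$ at the input of a gate and the gate delay $d = D_{nom} + \sum_i a_i L_i + \Delta D_{random}$, the arrival time $a_2$ at the output of the gate is $a_2 = a_1 + d$, from which follows EQ15:
$$a_2 = (A_{nom,1} + D_{nom}) + \sum_i (\beta_{i,1} + a_i) L_i + \Delta A_{random,2}, \qquad \Delta A_{random,2} = \Delta A_{random,1} + \Delta D_{random} \qquad \text{(EQ 15)}$$
Note that the computation of $a_2$ using EQ15 is exact and therefore correctly accounts for the spatial correlation of the arrival time $a_1$ and the gate delay $d$. Also, propagation using EQ15 is efficient, as a simple summation of the coefficients of $a_1$ and $d$ is performed. Since random variables $\Delta D_{random}$ and $\Delta A_{random,1}$ are independent, computation of $\Delta A_{random,2}$ is performed by simple numerical convolution.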
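The propagation step can be sketched in Python as follows. The dictionary layout (`nom`, `coeffs`, `random`) and the variable names are assumptions made for this illustration; the uncorrelated part is stored as a discrete probability mass function on an integer grid, and the two uncorrelated parts are assumed independent.

```python
def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent discrete random variables."""
    out = {}
    for va, pa in pmf_a.items():
        for vb, pb in pmf_b.items():
            out[va + vb] = out.get(va + vb, 0.0) + pa * pb
    return out

def propagate(arrival, delay):
    """Add a gate delay to an arrival time, keeping the sum-of-components form."""
    names = set(arrival["coeffs"]) | set(delay["coeffs"])
    return {
        # Nominal parts add.
        "nom": arrival["nom"] + delay["nom"],
        # Coefficients of each shared variable L_i add, so correlation is kept.
        "coeffs": {n: arrival["coeffs"].get(n, 0.0) + delay["coeffs"].get(n, 0.0)
                   for n in names},
        # Independent uncorrelated parts are combined by convolution.
        "random": convolve(arrival["random"], delay["random"]),
    }

a1 = {"nom": 100.0, "coeffs": {"L_inter": 2.0}, "random": {0: 0.5, 1: 0.5}}
d = {"nom": 20.0, "coeffs": {"L_inter": 0.5, "L_1_1": 0.5}, "random": {0: 0.5, 2: 0.5}}
a2 = propagate(a1, d)
```

The cost of the coefficient update is linear in the number of variables, and the result has the same structure as its inputs, so it can be fed directly into the next propagation or merge.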
Maximum operation
As mentioned earlier, computing an exact maximum of two arrival times $a_1$ and $a_2$, where each is expressed as a sum of components, requires enumeration of the random variables $L_i$, which is expensive. Also, the resulting arrival time would not be in the required form and spatial information would not be available for further propagation and merging operations. We therefore propose a merging operation which is efficient, and which generates an arrival time whose CDF is an upper bound on the exact arrival time. The proposed procedure is based on the following theorems.
[...]ative values of $L_i$. Also, since $\Delta A_{random,1}$ and $\Delta A_{random,2}$ are correlated only through path reconvergence, ignoring their correlation during their maximum computation will result in an upper bound [18], and hence the maximum of $\Delta A_{random,1}$ and $\Delta A_{random,2}$ can be efficiently computed numerically.
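As a rough illustration of a merge that keeps the sum-of-components form, the Python sketch below takes the maximum of each component separately. This rule is an assumption made for illustration, suggested by the term-by-term comparison discussed in the next section, and is not a restatement of Procedure 2. It further assumes that all $L_i$ take non-negative values and that the uncorrelated parts are treated as independent; the dictionary layout and names are hypothetical.

```python
def max_independent(pmf_a, pmf_b):
    """PMF of max(A, B) when A and B are treated as independent."""
    out = {}
    for va, pa in pmf_a.items():
        for vb, pb in pmf_b.items():
            v = max(va, vb)
            out[v] = out.get(v, 0.0) + pa * pb
    return out

def merge(a1, a2):
    """Component-wise maximum; conservative when every L_i >= 0 (assumed)."""
    names = set(a1["coeffs"]) | set(a2["coeffs"])
    return {
        "nom": max(a1["nom"], a2["nom"]),
        "coeffs": {n: max(a1["coeffs"].get(n, 0.0), a2["coeffs"].get(n, 0.0))
                   for n in names},
        "random": max_independent(a1["random"], a2["random"]),
    }

def evaluate(a, l_values, rand_value):
    """Arrival time for one concrete outcome of the L_i and the random part."""
    return a["nom"] + sum(c * l_values[n] for n, c in a["coeffs"].items()) + rand_value

a1 = {"nom": 100.0, "coeffs": {"L1": 2.0, "L2": 0.5}, "random": {0: 0.5, 1: 0.5}}
a2 = {"nom": 98.0, "coeffs": {"L1": 1.0, "L2": 1.5}, "random": {0: 0.5, 2: 0.5}}
merged = merge(a1, a2)
```

For any non-negative outcome of the $L_i$, each term of `merged` is at least the matching term of `a1` and of `a2`, so the merged value is at least the true maximum for that outcome; by Lemma A its CDF is then an upper bound.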

Multiple Arrival Time Propagation
While the maximum operation in Procedure 2 has the desired features that it is conservative and preserves the required form of arrival times, it nevertheless introduces error in the analysis. The degree to which error is introduced by Procedure 2 is dependent on the relative magnitude of the different components of $a_1$ and $a_2$. If, for instance, $\beta_{i,1} > \beta_{i,2}$ for all $i$, and also $A_{nom,1} > A_{nom,2}$ and the minimum value of $\Delta A_{random,1}$ with non-zero probability is greater than the maximum value of $\Delta A_{random,2}$ with non-zero probability (i.e. $\Delta A_{random,1} > \Delta A_{random,2}$ for all possible values), it is easy to show that the arrival time computed by Procedure 2 is exact. However, if some terms of arrival time $a_1$ are greater than those of $a_2$ and some terms of arrival time $a_2$ are greater than those of $a_1$, it is clear that a (conservative) error is introduced in the analysis.
To improve the analysis, we therefore extend the proposed approach by propagating multiple arrival times. In this case, only those arrival times are merged that result in a small error, while those arrival times whose merger would result in a high error are propagated separately. If the correct arrival times are selected, it is clear that the analysis accuracy will improve. Given a set $k$ of $K$ arrival times incident at a node, we must select a subset $m$ of $M$ arrival times to propagate, while all other arrival times are merged with other arrival times. It is clear that the optimal set of arrival times to propagate depends on many factors, including the arrival times that will combine with the set $m$ later in the circuit. Determining the optimal set is an intractable problem. We therefore propose the following heuristic to select the set of arrival times $m$ given a set of incident arrival times $k$.
First, we compute for each pair of arrival times $m_i$ and $m_j$ the maximum arrival time $a_{ij}$ using Procedure 2. Then, we determine the mean of each arrival time $a_{ij}$, which is computed by summing the means of each component. Finally, we select the arrival time $a_{ij}$ with the minimum mean and replace the original two arrival times $m_i$ and $m_j$ with $a_{ij}$ in the set $m$. This procedure reduces the size of the set $m$ by one arrival time. The procedure is then repeated until the number of arrival times in $m$ is reduced to a set of $M$ arrival times that can be propagated.
The above selective merging procedure effectively merges those arrival times incident on a node that result in an "early" arrival time that will have less impact on the overall delay of the circuit. These arrival times are therefore good candidates for merging, while arrival times whose merger would result in a late arrival time are propagated. The selective merging procedure is repeated at each node.
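The selective merging heuristic can be sketched in Python as below. The `merge` helper is an assumed component-wise maximum standing in for Procedure 2, and the data layout, names and numbers are hypothetical. At each step the pair whose merged arrival time has the smallest mean is merged, until `m_keep` arrival times remain.

```python
from itertools import combinations

def max_independent(pmf_a, pmf_b):
    """PMF of max(A, B) when A and B are treated as independent."""
    out = {}
    for va, pa in pmf_a.items():
        for vb, pb in pmf_b.items():
            out[max(va, vb)] = out.get(max(va, vb), 0.0) + pa * pb
    return out

def merge(a1, a2):
    """Assumed stand-in for Procedure 2: component-wise maximum."""
    names = set(a1["coeffs"]) | set(a2["coeffs"])
    return {"nom": max(a1["nom"], a2["nom"]),
            "coeffs": {n: max(a1["coeffs"].get(n, 0.0), a2["coeffs"].get(n, 0.0))
                       for n in names},
            "random": max_independent(a1["random"], a2["random"])}

def mean(a, l_mean):
    """Mean of an arrival time: sum of the means of its components."""
    return (a["nom"] + sum(c * l_mean[n] for n, c in a["coeffs"].items())
            + sum(v * p for v, p in a["random"].items()))

def select(arrivals, m_keep, l_mean):
    """Merge the pair with the smallest merged mean until m_keep remain."""
    pool = list(arrivals)
    while len(pool) > m_keep:
        i, j = min(combinations(range(len(pool)), 2),
                   key=lambda ij: mean(merge(pool[ij[0]], pool[ij[1]]), l_mean))
        merged = merge(pool[i], pool[j])
        pool = [a for k, a in enumerate(pool) if k not in (i, j)] + [merged]
    return pool

def arrival(nom):
    return {"nom": nom, "coeffs": {"L": 1.0}, "random": {0: 1.0}}

kept = select([arrival(10.0), arrival(50.0), arrival(12.0)], 2, {"L": 1.0})
```

In this example the two early arrival times (10 and 12) are merged and the late one (50) is kept separate. Each reduction step examines every pair, so the cost per node is quadratic in the number of incident arrival times for each merge performed.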
Finally, at the output node of the circuit, the set of $M$ propagated arrival times must be merged to obtain the final arrival time of the circuit as a whole. Since these $M$ arrival times do not need to be propagated further in the circuit, their particular form, in terms of a sum of independent random components, need not be preserved. Hence, we can convolve the components $A_{nom,i}$, $(\beta_{i,j} L_j)$ and $\Delta A_{random,i}$ into a single random variable before taking their maximum. This has the advantage that the error introduced by Procedure 2 is not incurred in the final merger of the arrival times at the output node. However, the arrival times are correlated, and to compute their exact maximum would require high computational complexity. We will therefore show that, due to the particular form of the arrival times, their correlation can be ignored and the computed maximum will bound the exact maximum. Hence, the maximum of the convolved arrival times can be efficiently computed using simple numerical techniques.
In [18], it was shown that the CDF of $\max(x_1+y,\ x_2+z)$, where $x_1$, $x_2$, $y$ and $z$ are independent random variables, is an upper bound on the CDF of $\max(x+y,\ x+z)$ when $x_1$ and $x_2$ have an identical probability distribution as $x$. However, the form of our particular problem is more general in that we require the computation of $\max(x+y,\ \alpha x+z)$, where $x$, $y$ and $z$ are independent random variables and $\alpha$ is a positive constant. We will now show that, similar to the previous case, the CDF of $\max(x_1+y,\ \alpha x_2+z)$ is an upper bound on the CDF of $\max(x+y,\ \alpha x+z)$. This means that ignoring the correlation between the two arrival times $(x+y)$ and $(\alpha x+z)$ during the maximum operation will result in an upper bound of the CDF of the exact maximum. We prove the correctness of this bound with the following theorem.
Theorem 1: Let $x$, $x_1$, $x_2$, $y$ and $z$ be positive, independent random variables with probability density functions $p(x)$, $p(x_1)$, $p(x_2)$, $q(y)$ and $r(z)$, noting that $x_1$ and $x_2$ have the same probability density function as random variable $x$. For any positive constant $\alpha > 0$, the CDF of random variable $\max(x+y,\ \alpha x+z)$ is upper bounded by the CDF of random variable $\max(x_1+y,\ \alpha x_2+z)$.
Proof: The CDF of random variable $\max(x+y,\ \alpha x+z)$ is: We now rewrite $P(t)$ as follows, by rearranging the terms:
Converting this integral over the 4-dimensional volume into an iterated integral, we obtain the following expression for $P(t)$:
Thus, for any $y$ and $z$, $R(y, z) \ge S(y, z)$, from which, using EQ21 and EQ26, we obtain $Q(t) \le P(t)$. Therefore, according to Definition 1, CDF $Q(t)$ of random variable $\max(x_1+y,\ \alpha x_2+z)$ is an upper bound of the CDF $P(t)$ of random variable $\max(x+y,\ \alpha x+z)$. $\square$
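The statement of Theorem 1 can be checked on a small discrete example with the Python sketch below. The distributions and the value of `alpha` are hypothetical. Both CDFs are computed exactly by enumeration, once with the shared variable $x$ and once with independent copies $x_1$ and $x_2$.

```python
from itertools import product

def cdf_exact(px, py, pz, alpha, t):
    """P(max(x + y, alpha*x + z) <= t), the same x appearing in both terms."""
    return sum(p1 * p2 * p3
               for (x, p1), (y, p2), (z, p3)
               in product(px.items(), py.items(), pz.items())
               if max(x + y, alpha * x + z) <= t)

def cdf_bound(px, py, pz, alpha, t):
    """P(max(x1 + y, alpha*x2 + z) <= t), x1 and x2 independent copies of x."""
    return sum(p1 * p2 * p3 * p4
               for (x1, p1), (x2, p2), (y, p3), (z, p4)
               in product(px.items(), px.items(), py.items(), pz.items())
               if max(x1 + y, alpha * x2 + z) <= t)

px = {1: 0.5, 3: 0.5}
py = {1: 0.5, 2: 0.5}
pz = {0: 0.5, 2: 0.5}
alpha = 2

# The CDF that ignores the correlation must lie on or below the exact CDF.
holds = all(cdf_bound(px, py, pz, alpha, t) <= cdf_exact(px, py, pz, alpha, t) + 1e-12
            for t in range(0, 12))
```

For fixed $y$ and $z$, the exact CDF is the probability that $x$ lies below the smaller of two thresholds, while the bound is the product of the probabilities for each threshold taken separately, which can never exceed it.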
Results
The statistical bound computation, as well as the proposed refinement method, were implemented and tested on the synthesized version of ISCAS85 [20] benchmark circuits. Delay sensitivities were calculated for the standard cell library, which used a 180 nm nominal device length. We used 3 levels of intra-die variation to model spatial correlation, as shown in Figure 1. Accordingly, each gate $k$ was randomly allocated a location on a 4x4 grid, which determined the random variables associated with that gate along the hierarchy. Process variability information was used for different scenarios having a total standard deviation of 10%, 14% and 15% from $L_{nom}$. The computed bounds were compared with Monte Carlo simulation and worst case analysis. Monte Carlo simulation was performed for 10,000 samples. The worst case analysis assumes the total variation to be inter-die variation and computes the 99% confidence point for the circuit delay CDF by setting $\Delta L_{inter}$ at its 99% point. For each gate length random variable, a Gaussian delay distribution truncated at the 3 sigma point was used. Table 1 shows the results for the bound computation and refinement using multiple arrival time propagation. A total standard deviation of 14% was divided among inter-die (5.7%), intra-die with spatial correlation (8.06%) and random intra-die variation (10%). For each circuit, the total number of nodes/edges (column 2) is shown. The 99% confidence points for worst case analysis (column 3), for single and multiple arrival times (columns 4 and 5) and for Monte Carlo (column 6) are shown. The % error between the Monte Carlo results and our approach (column 7) was 2.98% on average. Although we only report the 99% points in Table 1, the computed bounds are CDFs and allow the computation of other confidence points. Column 8 shows the run time of our algorithm for 100 arrival times. For most circuits, the run time is very small, with the maximum being 300 seconds.
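The split of the 14% total standard deviation can be checked with a short Python calculation, assuming the three components are independent so that their variances add. The dictionary keys are labels chosen for this illustration.

```python
import math

# Standard deviations of the three components, in percent of L_nom.
components = {
    "inter_die": 5.7,
    "intra_die_spatial": 8.06,
    "intra_die_random": 10.0,
}

# For independent components the variances add, so sigma_total is the
# square root of the sum of squares.
total_sigma = math.sqrt(sum(s ** 2 for s in components.values()))
```

The result is approximately 14.05, consistent with the stated total of 14%.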
Table 2 shows comparisons between 99% confidence points obtained by our algorithm using 100 arrival times and Monte Carlo simulation for two different variation scenarios. In columns 2, 3 and 4, a total standard deviation of 10% was equally divided among inter-die, intra-die and random variations. The average error for all the circuits was 2.35%. The run times were small, not exceeding 300 seconds. In columns 5, 6 and 7, a total standard deviation of 15% was again equally divided among the three components. Average error was 4.63% for all circuits and maximum run time was 280 seconds. Figure 3 shows the CDFs for the proposed upper bounds with
Conclusions
In this paper, we have proposed a new statistical timing analysis algorithm. The method has a linear run time and computes an upper bound on the distribution of the exact circuit delay. We first proposed a model for inter- and intra-die process variations that accounts for spatial correlations. We then presented an efficient method for propagating arrival times in the circuit, which is linear in run time and computes an upper bound on the distribution function of the exact circuit delay. We proved the correctness of the bound and showed how the bound is improved by propagating multiple arrival times at each node, using a heuristic method for selecting propagated arrival times. We tested the proposed methods on a number of synthesized benchmark circuits and demonstrated the accuracy and efficiency of the approach.
