We present an efficient statistical timing analysis algorithm that predicts the probability distribution of the circuit delay while incorporating the effects of spatial correlations of intra-die parameter variations, using a method based on principal component analysis. The method uses a PERT-like circuit graph traversal, and has a run-time that is linear in the number of gates and interconnects, as well as the number of grid partitions used to model spatial correlations. On average, the mean and standard deviation values computed by our method have errors of 0.2% and 0.9%, respectively, in comparison with a Monte Carlo simulation.
INTRODUCTION
Device and interconnect parameters in new technology generations show a significant amount of variability due to process variations, and the prediction of circuit performance is becoming a challenging task. Conventional static timing analysis (STA) handles variability by analyzing a circuit at multiple process corners, and it is generally accepted that such an approach is inadequate. An alternative approach that overcomes these problems is statistical STA, which treats delays not as fixed numbers, but as probability density functions (PDFs), taking the statistical distribution of parametric variations into consideration while analyzing the circuit.
Process variations can be classified as follows: inter-die variations are the variations from die to die, while intra-die variations correspond to variability within a single chip. Inter-die variations affect all the devices on the same chip similarly, while intra-die variations affect different devices differently on the same chip. It used to be the case that the inter-die variations dominated intra-die variations, so that the latter could be safely neglected. However, in modern technologies, intra-die variations are rapidly and steadily growing and their effects significantly affect the variability of performance parameters on a chip. A number of publications on statistical timing analysis have focused on circuit performance prediction considering intra-die variation [1-11]. However, most prior work has ignored intra-chip spatial correlations by simply assuming zero correlations among devices on the chip. The difficulty in considering spatial correlations between parameters is that it always results in complicated path correlation structures that are hard to deal with. The authors of [5] consider correlation between delays among the transistors inside a single gate, but not correlations between gates. The work in [6] uses a Monte Carlo sampling-based framework to analyze circuit timing on a set of selected sensitizable true paths. Another method in [7] computes path correlations on the basis of pair-wise gate delay covariances and uses an analytic method to derive lower and upper bounds of circuit delay.
The statistical timing analyzer in [8] takes into account capacitive coupling and intra-die process variation to estimate the worst-case delay of a critical path. The approach in [9] proposes a model for spatial correlation and a method of statistical timing analysis to compute the delay distribution of a specific critical path. However, the PDF for a critical path may not be a good predictor of the distribution of the circuit delay (which is the maximum of all path delays), as explained in Section 2. Moreover, any strictly path-based method will eventually be faced with an explosion in the number of critical paths.
We propose an algorithm for statistical STA that computes the distribution of circuit delay while considering correlations due to path reconvergence as well as spatial correlations. We model the circuit delay as a correlated multivariate normal distribution, considering both gate and wire delay variations. The complexity of the algorithm is $O(n \times (N_g + N_I))$, which is linear in the number of gates $N_g$ and interconnects $N_I$, and also linear in the number of grids $n$ that are used to model the variational regions. In other words, the cost is, at worst, $n$ times the cost of a deterministic STA.
PROBLEM FORMULATION
Under process variations, parameter values such as the transistor gate length, width, the metal line width and the metal line height are random variables. Some of these variations are deterministic, while others are random: this work will focus on the effects of random variations, and will model these parameters as random variables. The gate and interconnect delays, as functions of these parameters, also become random variables. The task of statistical STA is to find the PDF of the circuit delay under these variations.
For interconnect, at each metal layer, we consider the following parameters: metal width $W_{int_l}$, metal thickness $T_{int_l}$ and ILD thickness $H_{ILD_l}$, where the subscript $l$ indicates that the random variable belongs to layer $l$, with $l = 1, \ldots, n_{layers}$. We believe that this framework is general enough that it can be applied to handle variations in other parameters as well.
Spatial Correlations
To model the intra-die spatial correlations of parameters, we partition the die into $n_{row} \times n_{col} = n$ grids. Since devices [wires] close to each other are more likely to have similar characteristics than those placed far away, we assume perfect correlation among the devices [wires] in the same grid. Under this model, a parameter variation in a single grid at location $(x, y)$ can be modeled using a single random variable $p(x, y)$.
For each type of parameter, n random variables are needed, each representing the value of a parameter in one of the n grids. In ad- 
STATISTICAL TIMING ANALYSIS ALGORITHM

Modeling Gate/Interconnect Delay PDFs
In this section, we will show how the variations in the process parameters are translated into probability density functions (PDFs) that describe the variations in the gate and interconnect delays that correspond to the weights on the nodes and edges, respectively, of the statistical timing graph. We will use first-order Taylor expansions to approximate the distributions of the gate or interconnect delays.
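As a hedged illustration of the first-order approximation, suppose a delay $d$ depends on process parameters $p_1, \ldots, p_k$. The symbols $d^0$ (the delay at nominal parameter values), $\Delta p_i$ (the deviation of $p_i$ from its nominal value) and the count $k$ are notation assumed here for the sketch, not taken from the text:

```latex
d \;\approx\; d^0 + \sum_{i=1}^{k} \left[ \frac{\partial d}{\partial p_i} \right]_{0} \Delta p_i
```

Under this approximation, if the deviations $\Delta p_i$ are jointly Gaussian with zero mean, then $d$ is Gaussian with mean $d^0$, because it is a linear combination of Gaussian random variables.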
In this work, we use the Elmore delay model for simplicity to calculate the interconnect delays. (However, it should be emphasized that any delay model may be used, and all that is needed is the sensitivity of the delay to the process parameters, say, through a full circuit simulation and adjoint network analysis.) The interconnect delay can be expressed as a function of the process parameters of the interconnect and the receivers, such as $W_{int_l}$, $T_{int_l}$, $H_{ILD_l}$, $W_g$, $L_g$ and $T_{ox}$. Recall that under our model, we divide the chip area into grids so that the parameter variations within a grid are identical,
Orthogonal Transformation of Correlated Variables
Correlation due to reconvergent paths is known to be a problem for statistical timing analysis, even when spatial correlations are ignored. When these relationships among process parameters are taken into consideration, the correlation structure becomes even more complicated. To make the problem tractable, we use the Prin- 
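The orthogonal transformation can be illustrated with the smallest possible case. The sketch below assumes one parameter sampled in just two grids, each with standard deviation `sigma` and correlation `rho` between the grids; for this 2 x 2 covariance matrix the eigenvalues and eigenvectors are known in closed form, so no numerical library is needed. The function name and the two-grid setup are hypothetical; a real n-grid covariance matrix would need a general symmetric eigen-decomposition routine.

```python
import math

def pca_two_grids(sigma, rho):
    """Express two correlated grid variables p1, p2 in terms of two
    uncorrelated, unit-variance principal components p'_1, p'_2.

    Assumed covariance matrix: sigma^2 * [[1, rho], [rho, 1]].
    Returns rows of coefficients, so that
        p_i = row_i[0] * p'_1 + row_i[1] * p'_2   (zero-mean parts only).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    lam1 = sigma ** 2 * (1.0 + rho)   # eigenvalue for eigenvector (1,  1)/sqrt(2)
    lam2 = sigma ** 2 * (1.0 - rho)   # eigenvalue for eigenvector (1, -1)/sqrt(2)
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    # Each coefficient is (eigenvector entry) * sqrt(eigenvalue).
    return [
        [inv_sqrt2 * math.sqrt(lam1),  inv_sqrt2 * math.sqrt(lam2)],
        [inv_sqrt2 * math.sqrt(lam1), -inv_sqrt2 * math.sqrt(lam2)],
    ]

rows = pca_two_grids(sigma=2.0, rho=0.6)
# Rebuild the second-order statistics from the coefficients alone.
var_p1 = sum(c * c for c in rows[0])
cov_p12 = sum(a * b for a, b in zip(rows[0], rows[1]))
```

The rebuilt variance and covariance equal `sigma**2` and `rho * sigma**2`, so the correlated variables are fully described by coefficients on independent components.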
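The covariance of $d_i$ and $d_j$, $cov(d_i, d_j)$, can be computed by $\sum_{r=1}^{m} k_{ir} k_{jr}$, where $k_{ir}$ is the coefficient of the $r$-th principal component in $d_i$ and $m$ is the number of principal components.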
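A minimal sketch of this covariance computation follows. It assumes each delay is stored as a list of its principal-component coefficients and that the principal components are uncorrelated with unit variance; the function names and sample coefficients are hypothetical.

```python
import math

def covariance(k_i, k_j):
    """cov(d_i, d_j) as the dot product of the two coefficient vectors."""
    if len(k_i) != len(k_j):
        raise ValueError("coefficient vectors must have the same length")
    return sum(a * b for a, b in zip(k_i, k_j))

def correlation(k_i, k_j):
    """Correlation coefficient derived from the same coefficients."""
    return covariance(k_i, k_j) / math.sqrt(covariance(k_i, k_i) * covariance(k_j, k_j))

# Two delays that share only the second principal component.
k_1 = [3.0, 4.0, 0.0]
k_2 = [0.0, 4.0, 3.0]
cov_12 = covariance(k_1, k_2)    # 16.0
rho_12 = correlation(k_1, k_2)   # 16 / 25
```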
PERT-like Traversal of Statistical STA
Using the techniques discussed up to this point, all nodes and edges of the statistical timing graph may be modeled as normally distributed random variables. In this section, we will describe a procedure for finding the distribution of the statistical longest path in the graph.
In conventional deterministic STA, the PERT algorithm can be used to find the longest path in a graph by traversing it in topological order using two types of functions: the sum function and the max function. To apply the PERT algorithm in our statistical timing analysis, we must find the probability distributions of the sum and max functions of a set of correlated Gaussian random variables:
1) $d_{sum} = \sum_{i=1}^{n} d_i$,
2) $d_{max} = \max(d_1, \ldots, d_n)$,
where $d_i$ is a random variable representing either a gate delay or a wire delay, expressed in the form of equation (6).
Computing the distribution of the sum function
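The computation of the distribution of the sum function is simple. Since $d_{sum}$ is a linear combination of normally distributed random variables, $d_{sum}$ is normally distributed, with mean $\mu_{d_{sum}} = \sum_{i=1}^{n} d_i^0$ and variance $\sigma^2_{d_{sum}} = \sum_{r=1}^{m} \left( \sum_{i=1}^{n} k_{ir} \right)^2$, where $d_i^0$ is the mean of $d_i$ and $k_{ir}$ is its coefficient for the $r$-th principal component.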
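The sum computation can be sketched as follows, assuming each delay is held as a mean plus a list of coefficients on uncorrelated, unit-variance principal components. The `Delay` class and `stat_sum` function are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Delay:
    mean: float          # nominal (mean) delay
    coeffs: List[float]  # coefficients on the principal components

    def variance(self) -> float:
        # Components are assumed uncorrelated with unit variance.
        return sum(k * k for k in self.coeffs)

def stat_sum(delays: List[Delay]) -> Delay:
    """Sum of delays: means add, and coefficients add component by component."""
    m = len(delays[0].coeffs)
    if any(len(d.coeffs) != m for d in delays):
        raise ValueError("all delays must use the same principal components")
    mean = sum(d.mean for d in delays)
    coeffs = [sum(d.coeffs[r] for d in delays) for r in range(m)]
    return Delay(mean, coeffs)

gate = Delay(10.0, [1.0, 2.0])
wire = Delay(5.0, [2.0, -1.0])
total = stat_sum([gate, wire])   # mean 15.0, coeffs [3.0, 1.0], variance 10.0
```

Because the result is again a mean plus coefficients, it can be fed directly into the next sum or max operation during the traversal.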
Computing the distribution of the max function
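As a hedged illustration of how the max of two correlated Gaussian delays can be approximated by another Gaussian, the sketch below uses Clark's moment-matching formulas, a standard technique for this problem. It is an assumption that this corresponds to the computation intended in this section. The function names, the rescaling of the coefficients to match the variance, and the tolerance `1e-12` are choices made for the example.

```python
import math

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stat_max(mu1, k1, mu2, k2):
    """Gaussian approximation of max(d1, d2), where
    d_j = mu_j + sum_r k_j[r] * p'_r and the p'_r are uncorrelated, unit variance.
    Returns (mean, coeffs) in the same linear form."""
    var1 = sum(k * k for k in k1)
    var2 = sum(k * k for k in k2)
    cov = sum(a * b for a, b in zip(k1, k2))
    a2 = var1 + var2 - 2.0 * cov          # variance of d1 - d2
    if a2 <= 1e-12:
        # Random parts are (nearly) identical: the larger mean always wins.
        return (mu1, list(k1)) if mu1 >= mu2 else (mu2, list(k2))
    a = math.sqrt(a2)
    alpha = (mu1 - mu2) / a
    t1, t2, dens = _cdf(alpha), _cdf(-alpha), _pdf(alpha)
    # Clark's first and second moments of the max.
    mean = mu1 * t1 + mu2 * t2 + a * dens
    second = (mu1 ** 2 + var1) * t1 + (mu2 ** 2 + var2) * t2 + (mu1 + mu2) * a * dens
    var = max(second - mean ** 2, 0.0)
    # Covariance of the max with each component, then rescale to match var.
    raw = [t1 * x + t2 * y for x, y in zip(k1, k2)]
    raw_var = sum(k * k for k in raw)
    scale = math.sqrt(var / raw_var) if raw_var > 0.0 else 0.0
    return mean, [scale * k for k in raw]

# Two independent standard normal delays.
mean_max, k_max = stat_max(0.0, [1.0, 0.0], 0.0, [0.0, 1.0])
```

For two independent standard normal variables, the exact mean of the max is $1/\sqrt{\pi}$ and the exact variance is $1 - 1/\pi$, which the sketch reproduces. A max of more than two delays can be handled by applying the function repeatedly.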
We approximate the max function as normal even though it is not really normal, and proceed with further recursive calculations. We will show in Section 6 that such inaccuracies are not significant and that the results match very well with the simulation results from a Monte Carlo analysis.
At this point, not only the nodes and edges, but also the results of the sum and max functions are expressed as linear functions of the principal components, and thus all path delays in the statistical timing graph also become linear functions of the principal components. Therefore, we can find the distribution of the statistical longest path using a PERT traversal, incorporating the computation of the sum and max functions described above. To further speed up the process, several techniques may be used:
1. Before we run the statistical timing analyzer, we perform one run of deterministic timing analysis to determine a loose bound on the best-case and worst-case delays for all paths. Any path whose worst-case delay is less than the best-case delay of the longest path will never be critical, so we can safely remove the edges that lie only on such paths.
Therefore, the run time complexity of the algorithm is $O(n \times (N_g + N_I))$, which is $n$ times that of deterministic STA.
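The pruning step can be sketched as a pair of deterministic longest-path passes. The sketch assumes a single-source, single-sink timing graph given as a dictionary of edges with (best-case, worst-case) delays, and it interprets "the best-case delay of the longest path" as the longest path length computed with best-case delays. The function name, data layout and example graph are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def prune_edges(edges, source, sink):
    """edges maps (u, v) -> (best_delay, worst_delay).
    Keeps only edges lying on some path whose worst-case delay reaches the
    best-case length of the longest path."""
    preds = {}
    for (u, v) in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    order = list(TopologicalSorter(preds).static_order())
    neg_inf = float("-inf")

    # Forward pass: longest arrival times under best and worst delays.
    arr_best = {n: neg_inf for n in order}
    arr_worst = {n: neg_inf for n in order}
    arr_best[source] = arr_worst[source] = 0.0
    for v in order:
        for u in preds[v]:
            best, worst = edges[(u, v)]
            arr_best[v] = max(arr_best[v], arr_best[u] + best)
            arr_worst[v] = max(arr_worst[v], arr_worst[u] + worst)

    # Backward pass: longest worst-case delay from each node to the sink.
    to_sink = {n: neg_inf for n in order}
    to_sink[sink] = 0.0
    for v in reversed(order):
        for u in preds[v]:
            to_sink[u] = max(to_sink[u], edges[(u, v)][1] + to_sink[v])

    threshold = arr_best[sink]
    return {
        (u, v): d for (u, v), d in edges.items()
        if arr_worst[u] + d[1] + to_sink[v] >= threshold
    }

example = {
    ("s", "a"): (1.0, 2.0), ("a", "t"): (1.0, 2.0),   # short branch
    ("s", "b"): (5.0, 6.0), ("b", "t"): (5.0, 6.0),   # long branch
}
kept = prune_edges(example, "s", "t")
```

In the example, the short branch has a worst-case delay of 4, below the best-case delay of 10 on the long branch, so both of its edges are removed before the statistical analysis.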
During the
EXPERIMENTAL RESULTS
The proposed algorithm was implemented in C++ as the software package "MinnSSTA", and tested on the edge-triggered ISCAS89 benchmark circuits. In Figure 2, for the largest test case, s38417, we show the plots of the PDF and CDF of the circuit delay for both the MinnSSTA and MC methods. It is observed that the curves almost perfectly match each other. In Table 2, we also provide the CPU times for both methods. To show that the PCA steps require very little run time, the run time for this part is also listed. We can see that MinnSSTA is very fast on all test cases. The circuit with the longest run time, s35932, was analyzed in only 182 seconds, while the MC simulation required over 13 hours.
To show the importance of considering spatial correlations, we consider the difference between performing statistical timing analysis while considering spatial correlation and while ignoring it.
Since this is a comparison to determine why spatial correlations are important, the CPU time is not a consideration. Therefore, we run another set of Monte Carlo simulations (MCNoCorr) on the same set of benchmarks, this time assuming zero correlations among the devices and wires on the chip. The comparison between MC and MCNoCorr is shown in Table 3. It can be observed that although the mean values are close, the variances of the uncorrelated cases (MCNoCorr) are much smaller than those of the correlated cases (MC). On average, the standard deviation of the correlated case increases by 115.7%. Again, we plot the PDF and CDF curves of both simulations for circuit s38417 in Figure 3. It is seen that the CDF and PDF curves of MCNoCorr deviate significantly from those of MC. In other words, statistical timing analysis without considering correlation may incorrectly predict the real performance of the circuit and could even overestimate the performance of the circuit. This underlines the importance of developing efficient statistical STA methods that can incorporate spatial correlations.
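To see why ignoring correlation shrinks the predicted spread, consider a deliberately simplified case that is not one of the benchmark experiments: a single path of N gates, each with delay standard deviation sigma and a common pairwise correlation rho. The variance of the path delay is then N sigma^2 (1 + (N - 1) rho). The function name and the numbers below are assumptions chosen only for illustration.

```python
import math

def path_sigma(n_gates, sigma, rho):
    """Standard deviation of the sum of n_gates equally correlated delays."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch assumes 0 <= rho <= 1")
    variance = n_gates * sigma ** 2 * (1.0 + (n_gates - 1) * rho)
    return math.sqrt(variance)

sigma_uncorrelated = path_sigma(16, 1.0, 0.0)   # 4.0: grows like sqrt(N)
sigma_correlated = path_sigma(16, 1.0, 1.0)     # 16.0: grows like N
```

With zero correlation the individual variations partly cancel, so the spread grows only as the square root of the path length. With positive correlation they reinforce each other, which is consistent with the larger variances reported for the correlated case.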
As an alternative, we consider the option of using multiple process corners (MPC) for these experiments, and these results are also displayed in Table 3. On average, such an approach overes- These results also emphasize the importance of considering spatial correlations during statistical STA, as is done by our algorithm.
