I. INTRODUCTION
Statistical static timing analysis [1] [2] [3] [4] [5] is becoming increasingly important in designing high-density, high-speed, and low-power VLSIs in the deep sub-micron era. Designers often set excessive margins derived from worst-case analysis in order to absorb the effect of delay uncertainty, and such margins usually lead to overdesigned circuits. If designers can estimate the distribution of the critical path delay caused by all local uncertainties, overdesign may be eliminated, so that high-density and high-performance VLSIs can be produced with high yield [6] [7] [8].
Although several studies have addressed this topic [1] [2] [3] [4] [5] [6] [7], all of them except [5] assume that the distributions of all signal delays are independent of each other. However, such an assumption is not realistic.

In this paper, we present a new algorithm for the statistical static timing analysis of a CMOS combinatorial circuit, which can treat not only the correlations of the signal delays shown in Fig. 1 but also the correlations between the delays of transistors contained in a logic gate. This algorithm assumes that the delay of each logic gate is modeled by a normal distribution (Gaussian distribution), as is usual in previous work. In order to compute the distribution of the output delay of a logic gate, a maximum operation on the stochastic variables is necessary, and the proposed algorithm uses a normal distribution of two stochastic variables with a coefficient of correlation [9] for the maximum operation.
If the delays of all signals are independent, the distribution of the maximum delay can be computed in O(n+m) time [3], where n and m are the numbers of vertices and edges of the graph representing a given combinatorial circuit. Since the proposed algorithm takes the correlation into account, its time complexity becomes O(n•m) in the worst case. However, for real combinatorial circuits, we can expect the time complexity to be much less than O(n•m).
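The linear-time bound for the independent case can be illustrated by a single pass over the graph in topological order. The sketch below is not the algorithm of [3]. It assumes a hypothetical edge-list format `(u, v, mean, var)`, one delay per edge (no distinction between 0 and 1 transitions), a single source from which every vertex is reachable, and it approximates each maximum by a normal distribution using the standard moment formulas for the maximum of two independent normal variables.

```python
import math
from collections import defaultdict, deque

def _pdf(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_independent(m1, v1, m2, v2):
    """Mean and variance of max(X1, X2) for independent normal X1, X2."""
    a = math.sqrt(v1 + v2)
    if a == 0.0:
        # Both inputs are constants.
        return max(m1, m2), 0.0
    alpha = (m1 - m2) / a
    mean = m1 * _cdf(alpha) + m2 * _cdf(-alpha) + a * _pdf(alpha)
    second = ((m1 * m1 + v1) * _cdf(alpha) + (m2 * m2 + v2) * _cdf(-alpha)
              + (m1 + m2) * a * _pdf(alpha))
    return mean, max(second - mean * mean, 0.0)

def propagate(edges, source):
    """Visit every vertex and edge once (Kahn's topological order).

    Returns {vertex: (mean, variance)} of the arrival time, treating all
    arrival times as independent (the assumption criticized in the text).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for u, v, mean, var in edges:
        succ[u].append((v, mean, var))
        indeg[v] += 1
    arrival = {source: (0.0, 0.0)}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        mu_u, var_u = arrival[u]
        for v, mean, var in succ[u]:
            cand = (mu_u + mean, var_u + var)
            if v in arrival:
                arrival[v] = max_independent(*arrival[v], *cand)
            else:
                arrival[v] = cand
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return arrival
```

Each edge triggers one addition and at most one two-input maximum, which is where the O(n+m) behavior comes from. On re-convergent paths this sketch silently ignores the shared history of the two arrival times.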
II. PRELIMINARIES
In order to find the distribution of the maximum delay of a CMOS combinatorial circuit, we represent the circuit by an acyclic graph G=(V,E), as shown in Fig. 2. In the figure, each box denotes a logic gate, which is drawn to show the correspondence between the circuit and the graph. Each vertex contained in a box represents a terminal of the corresponding logic gate, and a vertex on the left or right side of a box corresponds to an input or an output terminal, respectively. Each sink, from which no edge goes out, corresponds to a primary output, and T denotes the set of sinks. Vertex v_0 is the unique source, into which no edge comes, and each edge going out from v_0 comes into a vertex corresponding to a primary input. These edges are used to introduce the differences and distributions of the arrival times of primary inputs. Each edge in a box goes out from the vertex representing an input of the corresponding logic gate and comes into the vertex representing the output of the gate. Edges other than those going out from v_0 represent interconnects if they are not contained in a box.

Each edge e=(v,w) contained in a box has delays t_0(e) and t_1(e), which denote the switching times of the pMOS and nMOS transistors, respectively. Henceforth, we use "b" to indicate 0 or 1, and t_b(e) denotes t_0(e) or t_1(e). Delay t_b(e) has a certain uncertainty and is a stochastic variable. It is determined by the saturation current I_dsat, the load capacitance C_load of the transistor, and the slew rate t_slew of the gate voltage. The uncertainty of the delay comes from the distribution of I_dsat, which mostly depends on the distribution of the gate length L_g of the transistor.
Since the distribution of L_g can be modeled by a normal distribution, like the distribution of threshold voltage V_t [10], we model the distribution of delay t_b(e) of edge e in a box by a normal distribution N(µ_b(e), σ_b^2(e)).

The distribution of L_g depends on the distributions of the space S_poly between adjacent polysilicon gates, and on the gate width W_g and the length L_diff of the diffusion area of the transistor. Therefore, by extracting these quantities from the mask pattern, the distribution of L_g can be estimated, and hence so can the distribution of the switching delay t_b(e) of a transistor. With the distribution data and the proposed algorithm, we can execute post-layout timing analysis, gate size optimization [6, 7], and layout optimization of a macrocell [8].
Since the distribution of L_g depends on the distribution of the space S_poly between adjacent gates, delay t_b(e) is correlated with the delay of the edge corresponding to the adjacent transistor. Therefore, we introduce a correlation between delays t_b(e') and t_b(e") of edges e' and e" which correspond to transistors of the same type in a logic gate, and denote the coefficient of correlation between these delays by ρ_b(e',e"). Such correlations are introduced only for edges contained in a single logic gate.
For an edge e representing an interconnect, delay t_b(e) designates the interconnect delay, which is assumed to be constant, that is, σ_b(e)=0. This delay is evaluated by a method such as PRIMO [11]. If the interconnect delay also follows a normal distribution, then we may introduce σ_b(e)≠0. In such a case, however, the proposed algorithm must be modified, because the distributions of t_0(e) and t_1(e) may be correlated.

Now, let us show that the maximum delay to a vertex w in G can be estimated by using the edge-delays introduced above.
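Let d(w,b) denote the maximum delay to vertex w when the signal at w becomes b, and let r_bb'(v,w) denote the coefficient of correlation between d(v,b) and d(w,b').

Firstly, we consider the case where vertex w corresponds to a sink or an input node of a gate. In this case, only one edge e = (v,w) corresponding to an interconnect comes into w, and hence d(w,0) and d(w,1) can be calculated by the following equations:

$$d(w,0) = d(v,0) + t_0(e), \qquad d(w,1) = d(v,1) + t_1(e).$$

Since t_0(e) and t_1(e) are constants, d(w,b) has the same variance as d(v,b) and the same correlation with every other delay. Obviously, we have r_00(w,w) = r_11(w,w) = 1, r_01(w,w) = r_10(w,w) = r_01(v,v) (= r_10(v,v)).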
Next, let us consider the case where w corresponds to the output node of a logic gate with k inputs. Let v_i (i=1,2,...,k) be a vertex corresponding to an input node of the gate, and let e_i = (v_i,w) be the incoming edges of w. (See Fig. 3.)
If the logic gate is a NAND gate, then d(w,0) is calculated by the following equation:

$$d(w,0) = \max_{1 \le j \le k} \{\, d(v_j,1) + t_1(e_j) \,\}$$

However, when all inputs other than the j-th input are 1, w becomes 1 after the delay d(v_j,0) + t_0(e_j). Since d(w,1) is the maximum among these delays d(v_j,0) + t_0(e_j), we can calculate d(w,1) by the following equation:

$$d(w,1) = \max_{1 \le j \le k} \{\, d(v_j,0) + t_0(e_j) \,\}$$
Similarly, if the logic gate is a NOR gate, we can calculate d(w,0) and d(w,1) by the same maximum operations. If the logic gate is an inverter, we set k = 1 in the above equations and need not take the maximum.
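For one realization of the delays (fixed sample values rather than distributions), the NAND/NOR rule above reduces to two plain maxima over the incoming edges. The following sketch illustrates this. The function name and argument layout are hypothetical, and it assumes that the falling-output delay d(w,0) mirrors the rising-output rule stated in the text, that is, it takes the maximum of d(v_j,1) + t_1(e_j).

```python
def gate_output_delays(d_in0, d_in1, t0, t1):
    """Output delays of a k-input NAND or NOR gate for one delay sample.

    d_in0[j], d_in1[j]: delays d(v_j,0) and d(v_j,1) of input j.
    t0[j], t1[j]:       pMOS and nMOS switching times t_0(e_j), t_1(e_j).
    Returns (d(w,0), d(w,1)); with k = 1 this is the inverter case.
    """
    # Output falls through the nMOS side, driven by inputs becoming 1.
    d_w0 = max(d + t for d, t in zip(d_in1, t1))
    # Output rises through the pMOS side, driven by inputs becoming 0.
    d_w1 = max(d + t for d, t in zip(d_in0, t0))
    return d_w0, d_w1
```

In the statistical setting, each `max` here becomes a maximum of correlated normal variables, which is the operation the proposed algorithm approximates.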
Through a similar discussion, we see that d(w,0) is computed by
Thus, in the case of an XOR gate, we need new edge-delays different from t_b(e) and a different maximum operation. In order to simplify the description, we assume in the following that a given combinatorial circuit does not contain XOR gates. Therefore, the maximum operation is conducted only among the delays d(v_j,b) + t_b(e_j) of the incoming edges of w.
We use these equations in procedure DISTMAX(In(w), U) as follows.

We first reduce a given circuit graph G=(V,E) by repeatedly merging two series edges e'=(u,v) and e"=(v,w) into one edge e*=(u,w). Since the delays of two series edges are independent, the mean and variance of t_b(e*) are given by

$$\mu_b(e^*) = \mu_b(e') + \mu_b(e''), \qquad \sigma_b^2(e^*) = \sigma_b^2(e') + \sigma_b^2(e''),$$

respectively, and the correlation coefficient ρ_b(e*,e) of t_b(e*) with t_b(e) of an edge e coming into w is given by

$$\rho_b(e^*, e) = \rho_b(e'', e)\,\frac{\sigma_b(e'')}{\sigma_b(e^*)}.$$

Initially, set Front as the set of vertices corresponding to primary inputs, and we see that conditions (A) and (B) can be satisfied for this Front.
Then, repeat the following procedure until Front becomes T.

... and Var[M] by the equations in [9].
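The mean and variance of the maximum M of two correlated normal variables can be computed in closed form. The sketch below uses the widely known moment formulas for the maximum of two jointly normal variables; whether these coincide exactly with the equations of [9] is an assumption, and the function names are hypothetical. It also returns the correlation of M with a third variable, which is the kind of quantity needed to keep correlations consistent as the maximum is applied repeatedly.

```python
import math

def _pdf(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_moments(mu1, sd1, mu2, sd2, rho):
    """Return (E[M], Var[M], w1) for M = max(X1, X2), jointly normal.

    rho is the correlation coefficient of X1 and X2, and
    w1 = P(X1 >= X2) is the weight of X1 in the maximum.
    """
    a2 = sd1 * sd1 + sd2 * sd2 - 2.0 * rho * sd1 * sd2
    if a2 <= 1e-24:
        # X1 - X2 is (numerically) constant, so the larger mean always wins.
        if mu1 >= mu2:
            return mu1, sd1 * sd1, 1.0
        return mu2, sd2 * sd2, 0.0
    a = math.sqrt(a2)
    alpha = (mu1 - mu2) / a
    w1, w2 = _cdf(alpha), _cdf(-alpha)
    mean = mu1 * w1 + mu2 * w2 + a * _pdf(alpha)
    second = ((mu1 * mu1 + sd1 * sd1) * w1 + (mu2 * mu2 + sd2 * sd2) * w2
              + (mu1 + mu2) * a * _pdf(alpha))
    return mean, max(second - mean * mean, 0.0), w1

def corr_with_third(sd1, sd2, rho13, rho23, w1, var_m):
    """Correlation coefficient between M and a third normal variable X3.

    rho13, rho23 are the correlations of X3 with X1 and X2.
    """
    if var_m == 0.0:
        return 0.0
    return (sd1 * rho13 * w1 + sd2 * rho23 * (1.0 - w1)) / math.sqrt(var_m)
```

For example, `max_moments(0.0, 1.0, 0.0, 1.0, 0.5)` gives a smaller mean than the same call with `rho = 0.0`, which shows why ignoring a positive correlation overestimates the maximum delay.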
The time complexity of the proposed algorithm can be analyzed as follows.
Let k_w be the number of incoming edges of a vertex w∈V-{v_0}, and h_max be the maximum number of vertices contained in Front simultaneously. Then, the time required for adding a vertex w to Front is O(k_w•h_max), and hence the total time is O(h_max•m). In the worst case, the maximum size h_max of Front is O(n), where n = |V| is the number of vertices. However, in real circuits, h_max is smaller than n, so that the time complexity of the algorithm is less than O(n•m).
V. EXPERIMENTAL RESULTS
In order to evaluate the performance of the proposed algorithm, we first applied it to the graph in Fig. 4, which corresponds to the circuit in Fig. 1. The mean µ and standard deviation σ of the delays of edges (a,x) and (c,y) are 20 and 1.4, respectively, and those of edges (v,x), (v,y), (x,z), and (y,z) are 10 and 0.7, respectively. We varied the delay of edge (u,v) and computed the maximum delay to vertex z, as shown in Table 1. In the experiments, we assumed that all edge-delays are independent.
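A Monte Carlo reference for this small example can be sketched as follows. The topology is an assumption inferred from the edge names above (u→v, v→x, v→y, a→x, c→y, x→z, y→z), arrival times at u, a, and c are assumed to be zero, a single delay per edge is used, and the mean and standard deviation of edge (u,v) are free parameters because their values are not given in the text.

```python
import math
import random

def simulate(mu_uv, sd_uv, trials=20000, seed=0):
    """Sample d(z) on the assumed Fig. 4 graph with independent edge delays.

    Returns (mean of d(z), std of d(z), sample correlation of d(x), d(y)).
    """
    rng = random.Random(seed)
    dx, dy, dz = [], [], []
    for _ in range(trials):
        t_uv = rng.gauss(mu_uv, sd_uv)  # shared by both branches
        x = max(rng.gauss(20.0, 1.4), t_uv + rng.gauss(10.0, 0.7))
        y = max(rng.gauss(20.0, 1.4), t_uv + rng.gauss(10.0, 0.7))
        z = max(x + rng.gauss(10.0, 0.7), y + rng.gauss(10.0, 0.7))
        dx.append(x)
        dy.append(y)
        dz.append(z)

    def mean(values):
        return sum(values) / len(values)

    mx, my, mz = mean(dx), mean(dy), mean(dz)
    sx = math.sqrt(mean([(v - mx) ** 2 for v in dx]))
    sy = math.sqrt(mean([(v - my) ** 2 for v in dy]))
    sz = math.sqrt(mean([(v - mz) ** 2 for v in dz]))
    cov_xy = mean([(p - mx) * (q - my) for p, q in zip(dx, dy)])
    return mz, sz, cov_xy / (sx * sy)
```

Even though all edge delays are sampled independently, d(x) and d(y) come out positively correlated because both depend on the delay of edge (u,v). This is the re-convergence effect that the "No Correlation" column ignores.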
In the table, "Monte Carlo," "Ours," and "No Correlation" show the results obtained by Monte Carlo simulation (20,000 iterations), by our algorithm, and by ignoring the correlation of delays d(x,b) and d(y,b), respectively, and "error" is the error relative to the result of the Monte Carlo simulation. The table also shows the correlation coefficient r(x,y) and the CPU time spent on an ULTRA-SPARC IIi 400MHz workstation. Fig. 5 shows the distributions of the maximum delay d(z) for the case of the top row in the table. Although the difference between "Ours" and "No Correlation" may not be large, it increases as σ becomes large.

Table 2 shows the results for a few circuits in the ISCAS 85 benchmark, where n and m are the numbers of vertices and edges of the graph representing each circuit, respectively, and error is the percentage difference between "Ours" and "No Correlation." In the experiments, the circuits were laid out by a macrocell layout system [8], and the mean values of the delays were extracted from the layout. The standard deviation σ is set as σ=0.15µ, and the edge delays are again assumed to be independent. The CPU time depends mainly on the number of numerical computations for the normal distribution, and hence it can be reduced by using a table look-up technique. Further investigation of the correlation and reduction of the CPU time are left as future work.
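The table look-up idea mentioned above for the normal-distribution computations can be sketched as a precomputed table of the standard normal cumulative distribution function with linear interpolation. The class name, table range, and step size are illustrative assumptions, not values taken from the experiments.

```python
import math

class NormalCdfTable:
    """Precomputed standard normal CDF with linear interpolation."""

    def __init__(self, x_max=6.0, step=0.01):
        self.x_max = x_max
        self.step = step
        n = int(round(2.0 * x_max / step))
        # Exact values on a uniform grid over [-x_max, x_max].
        self.values = [
            0.5 * (1.0 + math.erf((-x_max + i * step) / math.sqrt(2.0)))
            for i in range(n + 1)
        ]

    def __call__(self, x):
        # Outside the table the CDF is taken as 0 or 1.
        if x <= -self.x_max:
            return 0.0
        if x >= self.x_max:
            return 1.0
        pos = (x + self.x_max) / self.step
        i = min(int(pos), len(self.values) - 2)
        frac = pos - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])
```

The table is built once and then shared by all maximum operations. A finer step or a higher-order interpolation trades memory for accuracy.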
VI. CONCLUSION
In this paper, we proposed a new algorithm for the statistical static timing analysis which can treat correlations of delays of re-convergent paths and correlations of delays of adjacent transistors.
Since the algorithm takes the correlations into account, the worst-case time complexity is O(n•m), where n and m are the numbers of vertices and edges of a given acyclic graph representing a CMOS combinatorial circuit, respectively.
In order to see the effects of correlation, we applied our algorithm to some benchmark circuits, and found that the difference between the results obtained by our algorithm and those obtained by ignoring correlations is not large relative to the increase in computational effort. However, since the difference increases as the spread of the distribution becomes large, further research is necessary.
A few problems still remain in statistical static timing analysis, among which one of the most important is the removal of false paths [12]. We are tackling this problem by using the proposed technique.
