Under modem VLSl technology. process variations greatly affect circuit performance, especially clock skew which is very timing sensitive. Unwanted skew due to process variation forms a bottleneck preventing further improvement on clock frequency. Impact from intra-chip interconnect variation is becoming remarkable and is difficult to he modeled efficiently due to its distributive nature. Through wire shaping analysis, we establish an analytical bound for the unwanted skew due to wire width variation which is the dominating factor among interconnect variations. Experimental results on benchmark circuits show that this bound is safer, tighter and computationally faster than similar existing approach.
INTRODUCTION
When the VLSl feature size becomes progressively smaller, previously negligible process variations start to affect circuit performance significantly. These process variations result from many non-ideal conditions such as etching error, mask mis-alignment and defects. For clock networks, these variations cause clock skew variations or unwanted skews. Since clock skew is a lower bound for clock period. the unwanted skew due to process variations form a bottleneck preventing improvement on clock frequency. For modem high performance circuit designs, process variation induced clock skew is taking a greater and greater portion of clock period time. Therefore, it is very important to model the impact of process variations on clock network performance so that it can be considered during circuit/clock design.
Realizing the great importance of process variations, many works have been carried out to model their effect [IO, 7, 2, 9, 15, 1, 17, 141 . One major objective of these modeling works is to find a reliable estimation on the worst case timing performance induced by process variations, either the worst delay along timing critical paths in timing analysis or the worst skew in a clock network. The estimation results can he applied as a feedback to guide further design iterations.
One common approach to find the worst case performance is to run Monte Carlo simulations for a certain number of iterations and pick the worst case performance among the results. In order to obtain a reliable estimation, the number of iterations is generally very large and consequently the high computational cost makes this method impractical. Another straightforward technique is to estimate the performance only at comer points of process variations. Even though this technique is computationally fast, it is overly pessimistic in estimating the worst case delay due to gate process vaxiations as correlations among the gates are neglected. Therefore, the comer-point technique is useful only when a loose bound needs to The effect of interconnect process variations is becoming more and more pronounced and it is found in 1111 that interconnect variations may cause 25% clock skew variations. The impact of interconnect variation is intrinsically hard to be modeled efficiently. since the worst case interconnect delay does not always occur at the process variation comer points. This fact makes the comer-point technique not applicable for interconnect variations. Furthermore, interconnect variations is distributive in nature in contrast to the localized variations for transistors. Since an interconnect wire may span a long distance, using a single variable to model each process parameter such as wire width or thickness is inadequate to capture the different process variation levels in different regions. This is especially true when the intra-chip variations start to dominate the inter-chip variations [l2] . A naive approach to solve this problem is to segment a long wire into smaller pieces, and consider the variation of each piece individually. However, this approach may increase the number of variables considerably and thereby slow down the estimation speed.
Our main objective in this paper is to obtain a fast skew bound estimation, that can be used in applications such as process variation driven clock network design and chip level design planning. Note that, in the above applications, the speed and fidelity of the estimation is more important that the numerical accuracy, so that it can be used in the inner loop of optimization. This requirement is quite different from the requirements of the post optimization estimation where probabilistic approach is better. In this work, we concentrate on the effect of wire width variation which is the dominating factor compared to other wire parameters such as wire h c kness, since wire width shrinks faster under the non-uniform technology scaling. We have derived an analytical bound for unwanted clock skew between any two clock sinks due to wire width variation. Our proposed technique is based on the observation that estimating the worst case skew due to wire width variation is closely related to the non-uniform wire sizing problem in physical optimizations. The minimum delay non-uniform wire sizing problem for a 2-pin wire(sing1e load path) has been solved in [0, 31. We derive the maximum wire shaping function for both single load path and multi-pin trees. The minimum delay wire shaping formula for multi-pin trees is also obtained in our work. Previously, a more general version of this problem was solved through an iterative algorithm [41. Similar to the works of [0, 3, 41, we employ Elmore delay model for our derivations because of its high fidelity and ease of computation. Besides the hound, we discovered that the hound for skew between two clock sinks depends on their common upstream path, even though the skew between them is independent of the common path. This dependence is analyzed and found to be monotone in practice. These results establish an analytical bound for the unwanted skews due to wire width variation. Since the analytical bound can he computed very quickly, it can he applied to process variation driven clock network design as well as chip level design planning. It is to he noted that, for the above applications, speed and fidelity of the the estimation is more important than numerical accuracy. This is because the estimation will he queried frequently in the inner loop of optimization and design planning. Thus, the analytical hound fits the requirement precisely.
PROBLEM ANALYSIS
Given a pair of sinks SI and s2 in a clock routing tree, our objective is to find a hound for the worst case skew due to wire width variation. The skew between the two sinks can he expressed as 412 = 1, -12 where rt and r2 are the delay from clock driver to s1 and s2, respectively. When there is wire width variation for the paths from dnver to SI and s2, the delay rl and r2 vary in certain ranges of [ I I , , ,~~,~, , , , ,~] and [f2,min,r2,mar]. respectively. Evidently, the worst case skew occurs at 4 1 2 ,~~ = rtFN -h,,,in or q12,,,in = rl,min -rzFar. In previous work, sometimes the skew is defined as max(\qlzm,\, \ql+inl) or the maximum absolute value of skews among all clock sink pairs. The latter definition of global skew is usually for traditional zero-skew clock network. For modem aggressive VLSl designs, useful skews 1161 are applied more frequently, thus, we use the concept of local pair-wise skew instead of the single global skew. In handling process variations and other delay uncertainties, target skews are usually specified as a set of permissible ranges [I31 instead of a set of single values. Since there might be skew violation on both the upper-bound side and the lower-hound side of a permissible range, we consider the maximum and the minimum skew separately. It can he seen that a hound for the worst case skew can he obtained by estimating the maximum and the minimum delays under process variations.
In order to estimate the maximum and the minimum delays due to wire width variation, we reduce the routing tree into a simplified model demonstrated in Figure 1 . In Figure I (a), we wish to estimate the worst case skew hetween sink st and sa. Since the common upsrream path SO Y v, for st and sq does not contribute to the skew between them, we lump its wire resistance together with the driver resistance into a virtual driving resistor R at the nearesr commn parent node v7 for st and sq in Figure I (b) . We such that the delay from the virtual driver R to the sink is maximized or minimized. The minimum delay wire shaping function for a path without branch loads is solved in [8,3]. An iterative wire shaping algorithm is provided in [4] to minimize a weighted sum of sink delays in a routing tree. Though this algorithm can guarantee the optimal wire shaping solution and can he adopted directly in our case, the convergence rate of t h s algorithm has not been proved. Since we wish to minimize the delay to only one particular sink instead of a weighted sum of sink delays, we are able to derive an analytical formula of wire shaping in Section 3. The formula for the maximum delay wire shaping is introduced in Section 4. The wire sheet resistance is denoted as r and the wire capacitance per unit length and unit width is represented as c. If there are k branch loads as indicated in Figure 2 , we define the lumped downstream branch load capacitance as:
: O < _ X < / , Ca +Ci The Elmore delay of the path in Figure 2 can be expressed as:
We can define the upstream resistance at position x as R(x) = R + Jk &dx. Fixing the wire shaping function w(x) except at an infinitesimal strip of width S at z and let w(z) to be a variable y , we may obtain the first order derivative as: 
THE MINIMUM DELAY WIRE SHAP-ING FOR PATH WITH BRANCH LOAD
The minimum delay wire shaping for a path with single load w Case 5: The width is wh, when x is greater than a value h, and the wire shaping follows exponential function from I; to h.
Case 6: The width is wj0 when x is smaller than a value g. whj when x is greater than a value h, and the wire shaping follows exponential function from g to h. We call the position of x = g and x = h as switching points. The method to decide the switching points is very complicated as shown in 131. Moreover, it is possible that all six cases need to be evaluated to find the exact minimum delay wire shaping. In practice, one can take the wire shaping according to Equation (4) without considering the wire width bound, and round the width to either wlo or whi at where the width from Equation (4) exceeds the bound. The switching points can be found in the rounding process and the wire shaping in the exponential segment between x = 8 and x = h can be recomputed according to the updated upstream resistance and downstream capacitance. Note that this is slightly different from the rounding-alone mentioned in 131. Even though this heuristic may result in suboptimal solutions, the computation becomes much easier and we observed that the error due to this approximation is negligible in practice.
THE MAXIMUM DELAY WIRE SHAP-ING
In this section, we will derive the maximum delay wire shaping for both both single load path and path with multiple branch loads. For the ease of presentation, we stait with the single load situation where k = 0. It has been shown in Section 2 that the delay is a convex function with respect to w (~) . Figure 4 . Therefore, w ( x ) has to be either w,,, or wh; to maximize the delay. Because of Equation (3). delay function with respect to w(x) is monotonously decreasing when x is large or the position is closer to the driver. Similarly, the delay function is monotonously increasing when the position is closer to the sink side. Tlus fact can be translated to the effect that w ( x ) will be wjo when x is large and whi when x is small. When the value o f x increases, the delay function with respect to wire width at x changes in the direction from (a) to (bj to (c) in Figure 4 .
This is illustrated in
Therefore, the maximum delay wire shaping looks like the exanple in Figure 5 . The next problem is to find the partitioning point x = p where the wire width is switched from wj0 to wh;. When k = 0. the Elmore delay for Figure 5 can be written as:
Figure 6: Case 6 for the minimum delay wire shaping, the wire width is wl0 in [O,g] and Whi in [h,L] . Betweeng and h, the wire shape fool 10ws function. In order to find the p that maximize the delay, we first find the derivative:
for path with branch loads is the same and can be extended from the single load case easily. The skew upper hound between them dr dp 2RP + P _ = -Since a is always negative, reached when < 0 and the maximum delay is canbe expressedas 9 1 4~~ = 1 1 , --1~~~~. From Section 3 and4, we know that both 11,-and f+,in depend on the value of R. We will analyze
By plugging Equation (7) and the expression of fi into fip, we have Pp = -apZ. Then the maximum delay Equation (5) can he
&J-
by evaluating % and *.
1 RWIO
by letting f = 0
P
In other words, p satisfies the following equation:
n If we transform the driver into a piece of wire with width wlo whose resistance is same as the driver resistance and treat the load CO as a wire of width whi with same value of capacitance CO, then the above equation shows that p makes length of the fat segment the same as the length of the thin segment.
When we consider the situation with branch loads, i.e., k > 0, the properties on $ for Equation (3) do not change and the wire shape is same as in Figure 5 and the value of p is determined by:
Even though the wire shapc in Figure 5 looks strange, it may happen in reality. If the path is routed in an L-shape with a horizontal and a vertical segment, the two segments are usually on different metal layers Therefore, it is likely that the width variation has a abNpt change at layer switching.
THE SKEW BOUND DEPENDS ON THE COMMON PATH
It is easy to see that the variation along the upstream common path for a pair of sinks does not contribute to the skew between them. For the example in Figure I , the variation along path SO -y has nothing to do with the skew value between sink st and sink sq. However, when the wire width at the common path changes, the resistance R in Figure I changes and the maximudminimum wire shaping from v7 to SI and SA will change as well. Therefore, the skew bound is affected by the variation of R or their common path.
Theorem 2: Givenfuredswitching points. the skew upper(1ower) h a n d berween two sinks is a convex(concave) funerion wirh respect to their common driving resistance.
Proof: Let us consider the case where skew upper hound between two sinks SI and sq need to be computed. Let the distance from SI and s4 to their nearest parent node v7 he L1 and Lq. respectively. The load capacitance at SI and sq are CI and Cq, respectively. The branch loads are temporarily ignored, as the conclusion transformed to:
Then we can obtain the derivative: dp dy
The derivative of f4,,in depends on the six different cases described in Section 3. We will show the derivations for the most basic case 1 and the most complex case 6. Derivations for other cases are similar, and all these cases lead to the same conclusion.
For case 1, the equation for the minimum delay (for single load case) is given as [SI
where W(i) is the Lambert's W function a n d i = 9 6 . Since the Lambert's W function satisfies W(i)eW(*) =P, we can obtain its derivative as
Then we have the derivative of f4p,n with respect to R
Combining Equation (10) and (141, we can get the second order derivative of q14,-:
Evidently, Now we consider case 6 which is more complex and general.
The wire shaping for case 6 is depicted in Figure 6 . We can divide > 0 and 9,qW(R) is a convex function. this wire into three segments: (i) the thin segment from x = 0 to x = g, (ii) the exponential segment between x = g and x = h and (iii) the fat segment from x = h to x = 4 . We use Cth,n = cwc,g, c, , = J," cw(x)dx and c,, = C W h j ( 4 -h) to represent the wire capacitance for each segment. Similarly, the wire resistance for eacb segment can be defined as RI,,;" = 2, R,, = &dx and We canynd the wire delay for the exponential segment itself as:
For the exponential segment, we can treat the fat segment as part of its driving resistance and the thin segment as part of its load capacitance. Thus we can express the delay fromx = h l o x = g as:
(R + R f o l )(Cup + C h i n +c4) + i + R e x p ( cc hi"
The value of rerp can be obtained through Equation (12) The total path delay from x = 4 to x = 0 can be written as: is still a convex function for case 6. Other cases can be proved in the same way.
For a path with multiple branch loads, the difference is that the constant load C 4 is replaced by the branch load function C,,(x). Since the branch load function is piecewise constant. it does not change the property of f& > 0 either. Since qt+,in = l l F i n -1 4 ,~~ is symmetric to 9 1 4~~ =llflmnr -r4,mm. wecan conclude that q14pin is a concave function with respect to R. Q.E.D.
According to Theorem 2, we need to compute the wire shaping forqlr,mru twice, one with the minimumRby setting the wire width along the common path 50 -v7 to w h i , the other with the maximum R by letting the wire width along the common path be wl0. The maximum of the two skew results is finally selected as the worst case bound.
EXPERIMENTAL RESULTS
In order to validate the bound we defived, we implemented our formulas, Monte Carlo and the interval analysis [IO] for comparisons. Even though Monte Carlo method is not efficient, it can The test cases are the r l -r5 which are applied in the bounded skew clock routing (BST) work [SI. We downloaded the bounded skew clock routing code from the GSRC hookshelf (hrrp ; //vlsicad.ucsd.edu/GSRC/bao~hel f /Slors/BST/j and generated clock routing trees for rl -rS by running the BST code. The global skew hound is set to be 1 0 0~s .
We assume *30% variations on wire width. We implemented OUT formulas, Monte
Carlo and the interval analysis method in C language. The experiments are performed on a PC with Pentium 111,655 MHz pmcessor and 512 MB memory
For each clock tree, we calculated the skew bound of one of the sinks with respect to every other sink in the clock tree. Thus for a clock tree with n sinks, we will haven -1 pain. We evaluated the skew bound due to wire width variations for the 6725 pair of sinks in the five test cases of rl -r5 by all of the three methods. In order to obtain a meaningful estimation, we segment long wires into small pieces of about 5Opn for the Monte Carlo and interval anal-ysis. Therefore, the wire width for each piece is an individual variable. For each sink pair, we run Monte Carlo simulation for 50,000 trials when the width for each wire piece is selected randomly in the range of [wl0,whi]. When estimating the minimum delay wire shaping we applied the heuristic described in Section 3 to decide the switching points and the optimal wire shaping for them. This heuristic brings great implementation convenience with negligible .~ quality penalty. ~ We take the result from Monte Carlo simulation as baseline. The result from our b u n d is evaluated hy taking the difference between them. In other words, we evaluate the maximudminimum skew from our bound minus the maximumlminimum skew from Monte Carlo simulation for each pair of sinks. The bound result from the interval analysis is evaluated in the same way. Figures 7 and 8 show the histograms of the difference for the maximum skew and the minimum skew respectively. The horizontal axis represents the skew difference in ns and the vertical axis represents the number of sink pairs with a panicular value of skew difference. In Figure 7 , the difference from our bound is always greater than zero.
Meanwhile, the difference from our bound for the minimum skew in Figure 8 is always non-positive. This fact tells that our method provides a bound for the worst case skew in practice, even though we applied heuristic in the implementation. Some sink pairs have similar path lengths to their nearest common parent node and the dnver, thus they have similar skew variation behaviors. This fact results in serval peaks in the histograms. Figure 7 and 8 show that the peaks from our method are closer to zero difference compared to the peaks from the interval analysis. Thus, our method provides a tighter bound than the interval analysis. The computation time for each method is shown in Table 1 . In Table 1 , column 2 and column 3 show the number of sinks and the number of sink pairs whose skew are evaluated. We can see that the runtime from the Monte Carlo simulation is impractical. Compared with interval analysis, the runtime of our method is always shorter.
CONCLUSION AND FUTURE WORK
Through wire shaping analysis, an analytical bound for the unwanted skew due to wire width variation is established. Since this bound can be obtained very quickly, it can be applied to interconnect variation driven design and design planning. Experimental results show that our method is safer, faster and more accurate than the interval analysis. This result can be extended to consider the effect of buffer variations on clock skew and can be used to enhance the toolkit to hamess the progressively severe process variation effect.
