An interconnect joining a source and a sink is divided into fixed-length uniform-width wire segments, and some adjacent segments have buffers in between. The problem we consider is to simultaneously size the buffers and the segments so that the Elmore delay from the source to the sink is minimized. Previously, no polynomial time algorithm for the problem had been reported in the literature. In this paper, we present a polynomial time algorithm SBWS for the simultaneous buffer and wire sizing problem. SBWS is an iterative algorithm with guaranteed convergence to the optimal solution. It runs in quadratic time and uses constant memory for computation. Also, experimental results show that SBWS is extremely efficient in practice. For example, for an interconnect of 10000 segments and buffers, the CPU time is only 0.127 second.
Introduction
In the past, gate delay was the dominating factor in circuit design. However, as the feature size of VLSI devices continues to decrease, interconnect delay becomes increasingly important. Nowadays, feature size is down to 0.25 µm in advanced technologies, and interconnect delay has become the dominating factor in determining system performance. In many systems designed today, as much as 50% to 70% of the clock cycle is consumed by interconnect delay [8]. It is predicted in [11] that the feature size will be reduced to 0.18 µm by 1999 and 0.13 µm by 2002. So we expect that the significance of interconnect delay will further increase in the near future.
Both buffer sizing and wire sizing have been shown to be effective techniques for reducing interconnect delay, and much work has been done on them during the past few years. For example, [2, 3, 4, 10, 14] present various results on wire sizing alone. [16] applies the sequential quadratic programming approach to simultaneous gate and wire sizing. This algorithm is comparatively slow, as it has to solve a sequence of quadratic programming subproblems. Also, no bound on the run time of the algorithm is reported. [15] gives an algorithm for simultaneous buffer insertion, buffer sizing and wire sizing based on dynamic programming. However, the algorithm runs in pseudo-polynomial time and requires a substantial amount of memory. [1, 7, 9] give greedy algorithms for simultaneous transistor/buffer and wire sizing. These algorithms are shown to be very efficient in practice. However, no bounds on their run time are known. [5] considers buffer insertion, buffer sizing and wire sizing simultaneously and obtains a closed form optimal solution. However, in that paper, only wire area capacitance is considered. Wire fringing capacitance, which will become more and more significant as feature size decreases, is ignored. Taking wire fringing capacitance into account significantly complicates the problem, and [5] can only give an approximate solution. [6] shows that the simultaneous buffer insertion and wire sizing problem can be formulated as a convex quadratic program. The convex quadratic program has a small size and some special structure, and so can be solved very efficiently. However, if buffer sizing is also considered, only a brute-force enumeration of the buffer sizes is proposed. See [8] for a comprehensive survey of previous work.
In this paper, we consider the problem of minimizing interconnect delay by simultaneously sizing buffers and wire segments. Basically, an interconnect joining a source and a sink is divided into some fixed-length uniform-width wire segments. Some of the adjacent segments have buffers in between. The problem is to determine the buffer sizes and segment widths so that the Elmore delay from the source to the sink is minimized. In particular, both wire area capacitance and wire fringing capacitance are taken into account, and an approach completely different from that in [5] is required here. The details of the problem formulation are discussed in Section 2.
We make the following contributions in this paper:
We present an iterative algorithm SBWS for the simultaneous buffer and wire sizing problem. We prove that SBWS always converges to the optimal solution.
We prove that for an interconnect wire consisting of $n$ buffers and segments, SBWS runs in $O(n^2 + n \log(1/\epsilon))$ time, where $\epsilon$ specifies the precision of computation (see Theorem 1). Since $\log(1/\epsilon)$ is bounded by the number of bits in the input, the total run time is quadratic in the input size. This is the first polynomial time algorithm for the simultaneous buffer and wire sizing problem considered in this paper. SBWS requires only constant memory for computation.
We demonstrate experimentally that SBWS is also extremely efficient in practice. For example, for an interconnect of 10000 segments and buffers, the CPU time is only 0.127 second. Besides, we observe that SBWS runs in linear time in practice.
The rest of the paper is organized as follows. In Section 2, we present the formulation of the simultaneous buffer and wire sizing problem. In Section 3, we present the algorithm SBWS, its optimality proof and its run time analysis. In Section 4, we present experimental results that show the efficiency of SBWS. In Section 5, we discuss some extensions of our results.
Problem Formulation
In this paper, a component means either a buffer or a wire segment. Given a source with driver resistance $R_D$ and a sink with load capacitance $C_L$, the source and the sink are linked by an interconnect consisting of $n$ components. The $i$-th component is either a buffer of size $x_i$ or a wire segment of width $x_i$. The simultaneous buffer and wire sizing problem is to minimize the delay from the source to the sink with respect to $x_1, \ldots, x_n$. See Figure 1 for an illustration.
Given the driver resistance $R_D$, the load capacitance $C_L$, the number of components $n$, the set $B$ of component indexes of buffers, and the set $W$ of component indexes of wire segments, the objective is to find the optimal wire widths and buffer sizes $x_1, \ldots, x_n$ such that the delay from the source to the sink is minimized.
In general, the source and the sink can be anything. However, in order to simplify the notation, we will treat them as buffers of fixed size in this paper. Let the source ...
We use the widely used Elmore delay model [13] for delay calculation. Basically, the Elmore delay from the source to the sink is the sum of the delays associated with the components, where the delay associated with a component is equal to its resistance times its downstream capacitance. In other words, the Elmore delay from the source to the sink is given by
The problem is to minimize $D$ with respect to $x_1, \ldots, x_n$.
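As a rough illustration of the resistance-times-downstream-capacitance rule, the following Python sketch evaluates the Elmore delay of a driver followed by a chain of wire segments only. The wire model is an assumption made for this sketch and is not taken from the text: a segment of width x has resistance r_unit / x and capacitance c_area * x + c_fringe, and half of a segment's own capacitance is counted as downstream of its resistance. Buffers are left out because their model is not reproduced here, and the function and parameter names are hypothetical.

```python
def elmore_delay(widths, r_d, c_l, r_unit=1.0, c_area=1.0, c_fringe=0.5):
    """Elmore delay of a driver, a chain of wire segments, and a load.

    Assumed model (illustrative only): a segment of width x has
    resistance r_unit / x and capacitance c_area * x + c_fringe.
    """
    res = [r_unit / x for x in widths]
    cap = [c_area * x + c_fringe for x in widths]

    downstream = c_l  # capacitance strictly after the current segment
    delay = 0.0
    # Walk from the sink back to the source, accumulating capacitance.
    for r, c in zip(reversed(res), reversed(cap)):
        # Assumption: half of the segment's own capacitance is downstream.
        delay += r * (c / 2.0 + downstream)
        downstream += c
    # The driver resistance sees every segment and the load.
    delay += r_d * downstream
    return delay


if __name__ == "__main__":
    # Widening the segment near the driver changes both its resistance
    # and the capacitance the driver has to charge.
    print(elmore_delay([1.0, 1.0], r_d=1.0, c_l=1.0))
    print(elmore_delay([2.0, 1.0], r_d=1.0, c_l=1.0))
```

Under this assumed model, widening a segment lowers its resistance but raises the capacitance seen by everything upstream of it, which is the trade-off the sizing problem has to balance.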
The Algorithm SBWS
As a result, finding the optimal solution to the problem is equivalent to solving (8) and (9) for $x_1, \ldots, x_n$, where $R_0, \ldots, R_n$ and $C_0, \ldots, C_n$ satisfy (4), (5), (6) and (7).
Instead of solving the system of equations (4), (5), (6), (7), (8) and (9) directly, we consider a modified system ... the modified system will also be a solution of the original system, and hence the optimal solution of the simultaneous buffer and wire sizing problem.
We will show in the following how to solve the modified system of equations in linear time. First of all, we have to prove the lemma below, which relates $x_i$, $R_i$ and $C_i$ for any wire segment $i$.
In SOLVE, step 1 follows from (6) with $i = n$ and the fact that $C_L = \hat{c}_{n+1} x_{n+1}$, step 4 follows from (4), step 5 follows from (8), step 6 follows from (6) with $i$ replaced by $i - 1$, step 9 follows from Lemma 1, step 10 follows from (9), and step 11 follows from (7) with $i$ replaced by $i - 1$.
As mentioned above, in order that the solution of the modified system is also a solution of the original system, the value of $R_0$ computed by SOLVE($R_n$) must equal $R_D$.
Hence the lemma follows. □
To find the value of $R_n$ such that $R_0(R_n) = R_D$, Lemma 6 implies that binary search can be used. Lemma 7 gives us a condition to terminate the binary search such that the precision of the solution is within $\epsilon$. So what is left now is a range to start the binary search. We can find it by first making an initial guess $R$ of $R_n$. Next, $R$ is repeatedly divided or multiplied by 2 until SOLVE($R$) $\le R_D \le$ SOLVE($2R$). Then the range $[R, 2R]$ will contain the optimal $R_n$ and hence can be used to start the binary search. The algorithm is summarized below.
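The bracketing and binary search just described can be sketched in Python as follows. This is an illustration rather than the paper's listing of SBWS: the argument solve is a hypothetical stand-in for the procedure SOLVE, assumed only to be positive and increasing in its argument, and tol is an assumed relative tolerance on the bracket that plays the role of the termination condition on the ratio of the computed to the optimal $R_n$.

```python
def find_rn(solve, r_d, guess=1.0, tol=1e-6):
    """Return R with solve(R) <= r_d <= solve((1 + tol) * R).

    `solve` is a hypothetical stand-in for SOLVE: any positive,
    increasing function of R whose range contains r_d.
    """
    r = guess
    # Halve or double the guess until solve(r) <= r_d <= solve(2r).
    while solve(r) > r_d:
        r /= 2.0
    while solve(2.0 * r) < r_d:
        r *= 2.0

    lo, hi = r, 2.0 * r
    # Bisect, keeping solve(lo) <= r_d <= solve(hi), until the
    # bracket is relatively tight.
    while hi / lo > 1.0 + tol:
        mid = (lo + hi) / 2.0
        if solve(mid) <= r_d:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Toy increasing function, used only to exercise the search.
    print(find_rn(lambda r: 3.0 * r ** 0.5, r_d=6.0))
```

Because the starting bracket is always of the form $[R, 2R]$, the number of bisection steps depends only on the tolerance and not on how far the initial guess was from the answer.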
A good initial guess for the value of $R$ in step 1 can be obtained from the result of [5]. When there is no fringing capacitance, we can use [5] to find the exact value of the optimal $x_n$. With fringing capacitance, as in our case, we can use it to obtain a good approximation to $x_n$, and hence a good approximation to $R_n$. The formula to ...
... for $1 \le i \le n$. As the quantity with index $n$ is at most $1 + \epsilon/3^n$, the quantity with index $n-1$ is at most $(1 + \epsilon/3^n)^2 = 1 + 2\epsilon/3^n + (\epsilon/3^n)^2 \le 1 + 3\epsilon/3^n = 1 + \epsilon/3^{n-1}$.
We can apply the idea inductively to show that the quantity with index 0 is at most $1 + \epsilon$.
Therefore, together with Lemma 6, $1 \le R'_0/R_0 \le 1 + \epsilon$. If $R'_n \le R_n$, using $1/(1 + \epsilon/3^n) \le R'_n/R_n$, we can prove similarly that $1/(1 + \epsilon) \le R'_0/R_0 \le 1$. □
So by Lemma 7 and Lemma 8, if $1/(1 + \epsilon/3^n) \le R'_n/R_n \le 1 + \epsilon/3^n$, then $|x'_i - x_i|/x_i \le \epsilon$ for $1 \le i \le n$.
The number of iterations for the binary search to guarantee $1/(1 + \epsilon/3^n) \le R'_n/R_n \le 1 + \epsilon/3^n$ is at most $O(\log(3^n/\epsilon)) = O(n + \log(1/\epsilon))$. Since each iteration takes $O(n)$ time, we have the following theorem.
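Spelling out the arithmetic behind this count may help. The derivation below assumes the search starts from a bracket of the form $[R, 2R]$, so that each bisection step halves the relative width, and it ignores the cost of finding that bracket. The product of the two factors matches the run time stated for SBWS in the Introduction.

```latex
\begin{align*}
\log\frac{3^n}{\epsilon}
  &= n \log 3 + \log\frac{1}{\epsilon}
   = O\!\left(n + \log\frac{1}{\epsilon}\right), \\
\underbrace{O\!\left(n + \log\frac{1}{\epsilon}\right)}_{\text{iterations}}
  \times \underbrace{O(n)}_{\text{time per iteration}}
  &= O\!\left(n^2 + n \log\frac{1}{\epsilon}\right).
\end{align*}
```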
Experimental Results
In this section, we will show that the algorithm SBWS is extremely efficient in practice. We have implemented SBWS in C and run it on an IBM PC with a 200 MHz Pentium Pro processor. The precision parameter $\epsilon$ is set to 0.1. Different values for the number of components $n$, ranging from 1000 to 10000, are used. For each value of $n$, 100 problem instances are generated randomly. The average CPU time and the average number of calls to the procedure SOLVE are reported in ... Moreover, we observe that the number of calls to the procedure SOLVE is around 12 for all cases. So the run time is linear in practice. The CPU time is plotted as a function of $n$ in Figure 5 below.
Discussion
Our result can be extended in several ways:
Wire area and power consideration: Our algorithm SBWS can be extended easily to minimize a weighted sum of total wire area, power and delay. As the objective in (3) is changed, the optimality conditions (8) and (9) will also be different. However, it is not difficult to see that the problem can still be solved by the ideas of this paper without much modification. For other objectives like minimizing delay subject to an area bound or minimizing area subject to a delay bound, we can apply the Lagrangian relaxation technique as in [4] to reduce the problems to a problem of minimizing a weighted sum.
Interconnect with tree topology: SBWS is designed for interconnects with a line topology. As this is the case for most interconnects in a circuit, SBWS can be applied to them directly. However, there are some interconnects with tree topology. For the weighted sink delay objective, those interconnects can be handled by SBWS using a technique similar to that in [4]. That is, we use an iterative algorithm to optimize the tree edges one at a time. Each time we manipulate an edge, we keep all the other edges fixed and apply SBWS to that edge. For other objectives like minimizing maximum delay or minimizing area with delay bounds, we can apply the Lagrangian relaxation technique as in [4] to reduce the problems to a problem of minimizing weighted sink delay.
Better theoretical bound on run time: A quadratic run time is proved in Theorem 1. However, the experimental results in Section 4 suggest that the actual run time of SBWS is close to linear. In fact, for Lemma 3, we can argue for sharper relations between the quantities at indices $i-1$ and $i$. This implies that for Lemma 8, if $1/(1 + \epsilon/O(m)) \le R'_n/R_n \le 1 + \epsilon/O(m)$, then $1/(1 + \epsilon) \le R'_0/R_0 \le 1 + \epsilon$. So we conjecture that with a tighter analysis, one can prove that SBWS runs in $O(n \log m)$ time.
