To improve the performance of critical nets where both timing and wire resources are stringent, we integrate buffer insertion and driver sizing, in separate formulations, with non-Hanan optimization and propose two algorithms: BINO (simultaneous Buffer Insertion and Non-Hanan Optimization) and FAR-DS (Full-plane AWE Routing with Driver Sizing). For BINO, we consider the realistic situation in which buffer locations are restricted to a limited set of available spaces after cell placement. The objective of BINO is to minimize a weighted sum of wire and buffer costs subject to timing constraints. To achieve this objective, we suggest a greedy algorithm that applies two procedures independently: iterative buffer insertion and iterative buffer deletion. Each is conducted simultaneously with non-Hanan optimization until the improvement is exhausted. For FAR-DS, we investigate the curvature property of the sink delay as a function of both connection location and driver stage ratio in a two-dimensional space. The objective of FAR-DS is to minimize a weighted sum of wire and driver costs while ensuring that the timing constraints are satisfied. Based on the curvature property, we search for the optimal solution in the continuous two-dimensional space. In both BINO and FAR-DS, a fourth order AWE delay model is employed to assure the quality of optimization. Experiments with BINO and FAR-DS on both IC and MCM technologies show significant cost reductions compared with SERT and MVERT, in addition to enabling the interconnect to satisfy the timing constraints.
1 Introduction
As VLSI technology moves into the deep sub-micron era, interconnect resistance is no longer negligible and interconnect performance plays a critical role in the performance of the whole circuit. As a result, many efforts [1-18] have been made in recent years to improve interconnect performance. According to the classification in [1], these works have evolved along three major aspects: the delay model, the objective formulation and the solution space. Progress on each of these aspects is briefly reviewed below.
When the interconnect resistance was not significant, the interconnect could be modeled simply as a lumped capacitance proportional to the wire length. Therefore, in early research, the interconnect performance criterion was purely geometric and focused on wire length based objectives such as reducing the routing radius and the total wire length. As wires have become longer and thinner, this geometric evaluation no longer suffices to reduce interconnect delay, because resistive effects have become significant. A more elaborate delay model is necessary to augment wire length considerations in performance evaluation. The Elmore delay model [19] has been widely used due to its simplicity and high fidelity [2]. Its simplicity not only removes the need for a large amount of computation, but also provides a platform on which many theoretical properties can be derived and exploited. One major Elmore delay based routing method, SERT [2], grows the routing tree in a greedy fashion to minimize the source-sink delay. Another Elmore delay application, the P-tree algorithm [3], first searches for a good permutation of the sinks and then limits the solution space to the topologies induced by this permutation. In later work, the drawbacks of the Elmore model have been addressed, and second order [5] and third order [6] models have been applied. Most recently, the work in [1] suggested a table lookup method to remedy the deficiencies of the Elmore delay model.
With regard to the objective formulation, the total wire length (area) and the delay are usually the major targets. Minimizing total wire length reduces cost and power consumption and improves routability. These advantages lead to the use of area minimization as a common baseline for objective formulations. For delay reduction, the objective may be stated in many forms, including minimizing a weighted sum of sink delays, the maximum delay, or the critical sink delay. As a more appropriate formulation, the research in [5, 9, 10] focuses on satisfying the timing specification, in an effort to trade off unnecessary delay reduction for area minimization.
The solution space of nodes in the routing tree has long been restricted to the Hanan grid, since this simplifies the problem and it can be proven that optimal solutions lie only on Hanan grid points if the unconstrained objective is to minimize the wire length or a weighted sum of sink delays. However, if we formulate the objective so as to satisfy the timing constraints, the optimal Steiner points are very likely to lie at non-Hanan grid points, as indicated in [10]. The work of [10] developed the MVERT algorithm, which exploits the piecewise concavity of delay violation functions to search for the optimal Steiner points. Its experimental results showed that expanding the solution space to non-Hanan points can significantly reduce the wire cost.
In this paper, we continue the effort on non-Hanan optimization to deal with the condition where both timing and wire resources are stringent. We integrate buffer insertion and driver sizing with non-Hanan optimization, in separate formulations, to further improve the interconnect performance. These two approaches resemble each other in terms of their algorithmic skeleton, although the nature of the problems is different.
Buffer insertion is a promising technique [12-17] that is essential for large nets. Most of the methods in [12-17] are implemented through dynamic programming in a bottom-up fashion. However, all of these methods have been restricted to Hanan grid routing. Moreover, each of these approaches neglects the effects of restrictions on the buffer locations, i.e., it is assumed that buffers can be inserted at any arbitrary position as long as they improve the interconnect performance. In real situations, this is not always permissible, because the optimal buffer location may already be occupied by other cells and it is undesirable to disturb the placement. Most recently, the work of [18] takes the restrictions on buffer locations into consideration and suggests an exact algorithm for two-pin nets. The problem environment we consider here is a limited set of buffer spaces where buffers are to be inserted into the interconnect after the placement stage. The concept of a soft edge is employed to increase the possibility that a buffer space is exploited. We guide each move in the optimization in a greedy fashion and conduct buffer insertion and non-Hanan optimization (BINO) simultaneously and iteratively until no further improvements are possible.
Another effort in our work is simultaneous driver sizing and non-Hanan optimization (FAR-DS). We have investigated the curvature properties of the delay as a function of the connection location and the driver stage ratio in a two-dimensional space under the Elmore delay model. Though the Elmore model may be poor for specific points, it still provides a valid prediction of qualitative properties [2]. According to the solution region properties, we suggest two search schemes to find the optimal solution for an objective that minimizes a weighted sum of the wire cost and the driver cost while satisfying the timing constraints. In both FAR-DS and BINO, we use a fourth order AWE delay model [21] to assure the integrity of the optimization.

This paper is organized as follows. In section 2, we introduce the concept of a soft edge, review some background on non-Hanan optimization and give the motivation for using a fourth order AWE model. Section 3 describes the problem environment and formulation of BINO (simultaneous Buffer Insertion and Non-Hanan Optimization). The problem properties and formulation of FAR-DS (Full-plane AWE Routing with Driver Sizing) are investigated in section 4. Section 5 presents the algorithms for BINO and FAR-DS, followed by a complexity analysis in section 6. Finally, the experimental results are discussed in section 7.
The notation used in this work is listed below:
n: number of sinks.
: weighting factor for total wire capacitance.
CC: the closest connection point between a node and an edge.
$l_{ij}$: Manhattan distance between two nodes $v_i$ and $v_j$.
$T_k$: a subtree rooted at node $v_k$.
2 Preliminaries
A routing tree T is described by a set of nodes $V = \{v_0, v_1, v_2, \ldots\}$ and a set of edges $E = \{e_1, e_2, \ldots\}$. Generally, we refer to $v_0$ as the source. The location of a node $v_i$ is specified by its coordinates $x_i$ and $y_i$, and an edge in E is uniquely identified by the node pair $(v_i, v_j)$. The edge length is given by the Manhattan distance between the two nodes, which is $|x_i - x_j| + |y_i - y_j|$.
Since a routing tree is built in rectilinear space, each edge must be either horizontal or vertical. We introduce another type of edge, a soft edge, whose orientation is not specified until the tree construction and optimization are completed. The concept of a soft edge is explained through a simple example in Figure 1. In this example, a source $v_0$ and two sinks $v_1$ and $v_2$ are given, and a minimum Steiner tree is to be constructed on this node set by adding one node to the tree at a time. We may begin by connecting $v_1$ to $v_0$. In rectilinear space, there are two L-shaped connection options, shown by the dashed lines in Figure 1(a); one bend is required for each connection. These two options give different results when $v_2$ is connected to the tree, and a lower-L connection for $v_1$ is obviously better in this example, since it provides a shorter wire length. However, when the number of sinks is large, it may be very hard to see which option is better. As suggested in [2], the decision can be deferred until all the sinks are joined into the tree, and we can connect $v_0$ and $v_1$ by a soft edge, which is formally defined as follows.
Definition 1: A soft edge is an edge connecting two nodes $v_i, v_j \in V$, such that:
1. $x_i \neq x_j$ and $y_i \neq y_j$,
2. its edge length is $l_{ij} = |x_i - x_j| + |y_i - y_j|$,
3. the edge route between $v_i$ and $v_j$ is not determined.
The soft edge connection between $v_0$ and $v_1$ is shown in Figure 1(b). We will refer to the traditional edges with fixed orientations in a rectilinear tree as solid edges. The sink $v_2$ is connected to the routing tree at the closest connection (CC) point, defined below, between $v_2$ and edge $(v_0, v_1)$.
Definition 2: The closest connection (CC) point between a node $v_k$ and an edge $(v_i, v_j)$ is defined by its coordinates $x_{CC}$ and $y_{CC}$ such that $x_{CC} = \mathrm{median}(x_i, x_j, x_k)$ and $y_{CC} = \mathrm{median}(y_i, y_j, y_k)$.
Note that in Definition 2, the edge $(v_i, v_j)$ can be either a soft edge or a solid edge. If the CC point does not coincide with any of $v_i$, $v_j$ and $v_k$, a Steiner node is introduced at the CC point. In the example of Figure 1, Steiner node $v_3$ is introduced. After this connection has been made, edges $(v_3, v_1)$ and $(v_3, v_2)$ are solid edges. Since all of the sinks have been joined to the routing tree, we can convert the soft edge $(v_0, v_3)$ in Figure 1(c) into an L-shaped connection, as shown in Figure 1(d).
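Definition 2 and the Steiner node rule can be illustrated with a few lines of Python. This is only a minimal sketch: the function names and the coordinates in the usage note are hypothetical and are not taken from Figure 1.

```python
from statistics import median
from typing import Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """Edge length between two nodes in rectilinear space."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_connection(vi: Point, vj: Point, vk: Point) -> Point:
    """CC point between node vk and edge (vi, vj): coordinate-wise median."""
    return (median([vi[0], vj[0], vk[0]]), median([vi[1], vj[1], vk[1]]))


def needs_steiner_node(vi: Point, vj: Point, vk: Point) -> bool:
    """A Steiner node is introduced only if CC differs from all three nodes."""
    return closest_connection(vi, vj, vk) not in (vi, vj, vk)
```

For instance, with a source at (0, 0), a first sink at (4, 3) and a second sink at (2, 5), the CC point is (2, 3) and a Steiner node is needed. The CC point lies inside the bounding box of the soft edge, so routing the edge through it does not lengthen the edge.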
The advantage of using soft edges is that they provide a set of flexible connection choices for subsequent routing steps and avoid premature decisions. In section 3, we will show that the use of soft edges also aids our buffer insertion algorithm. The concavity property exploited in non-Hanan optimization [10] depends on the concept of a maximal segment [2], which requires the assignment of a definite orientation (horizontal or vertical) to each edge. Although, by definition, the orientation of a soft edge is not fixed, the concavity property continues to hold for a soft edge, and we can extend the philosophy of non-Hanan optimization to general edges, including both solid and soft edges.
2.2 Background on non-Hanan optimization
For a general form of a routing tree, shown in Figure 2,

Corollary 1: Under the Elmore delay model, for any interval $[z_l, z_r] \subseteq [0, CC]$, if the delay violation at a sink is positive when the connection point is at $z_l$ and also when it is at $z_r$, then the delay violation at this sink is positive when the connection point is at any location in this interval.

In Figure 3, the delay violation functions of a two-sink net are depicted. If the objective is to minimize wire cost subject to timing constraints, the optimal connection (Steiner) point here is a point with a non-positive maximum delay violation, lying as close to CC as possible; for this particular example, this corresponds to z. As in this example, the optimal connection point is, in general, likely to be a non-Hanan point.
The work of [10] showed this advantage of using non-Hanan points and proposed the MVERT (Maximum delay Violation Elmore Routing Tree) algorithm to perform non-Hanan optimization globally for an interconnect routing tree. Based on properties similar to Corollary 1, MVERT finds the optimal connection point through a quasi-binary-search and obtains significant wire cost reductions. In fact, besides achieving wire cost reductions, non-Hanan optimization can also help the interconnect to meet timing constraints. In the example of Figure 3, it is probable that only non-Hanan points can satisfy the timing constraints for both sinks.
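The pruning idea behind Corollary 1 can be illustrated with a short sketch. The code below is not the MVERT implementation from [10]; it is a minimal recursive search written under the assumption that each sink's delay violation is available as a concave Python callable of z. The names `violations`, `z_max` and `resolution` are hypothetical. An interval is discarded when some sink has a positive violation at both of its ends, and the half nearer to CC is explored first.

```python
from typing import Callable, Optional, Sequence

Violation = Callable[[float], float]  # assumed concave in z


def largest_feasible_z(violations: Sequence[Violation], z_max: float,
                       resolution: float = 1e-3) -> Optional[float]:
    """Largest z in [0, z_max], up to `resolution`, at which every sink's
    delay violation is non-positive. Returns None if no such z is found."""

    def feasible(z: float) -> bool:
        return all(u(z) <= 0.0 for u in violations)

    def search(lo: float, hi: float) -> Optional[float]:
        if feasible(hi):
            return hi  # hi is the point of this interval closest to CC
        if hi - lo <= resolution:
            return lo if feasible(lo) else None
        # Corollary 1: positive at both ends means positive in between.
        if any(u(lo) > 0.0 and u(hi) > 0.0 for u in violations):
            return None
        mid = 0.5 * (lo + hi)
        right = search(mid, hi)  # prefer the half closer to CC
        return right if right is not None else search(lo, mid)

    return search(0.0, z_max)
```

As an illustration with made-up functions, take two sinks with violations `1 - (z - 2)**2` and `0.5 * z - 2` on the range 0 to 6. The feasible set is then z in [0, 1] or [3, 4], and the search returns a value within the chosen resolution of 4.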
Non-Hanan optimization on soft edges requires additional specifications, since the search on delay violation functions only provides an optimal Manhattan distance z, and this does not, in general, yield a unique point. For simplicity, we choose $x'$ and $y'$ both to be proportional to z. For example, if $z = 0.3\,CC$, we will choose $x' = 0.3\,x_{CC}$ and $y' = 0.3\,y_{CC}$.
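The proportional choice described above can be sketched as follows. This is an illustration under an assumption that the text does not state explicitly: the distance z and the offsets in x and y are measured from the edge endpoint $v_i$ toward the CC point. The function name is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def point_on_soft_edge(vi: Point, cc: Point, z: float) -> Point:
    """Point at Manhattan distance z from vi on the way to cc, with the
    x and y offsets scaled by the same fraction (assumed convention)."""
    z_cc = abs(cc[0] - vi[0]) + abs(cc[1] - vi[1])  # Manhattan distance to CC
    if z_cc == 0.0:
        return vi
    frac = z / z_cc
    return (vi[0] + frac * (cc[0] - vi[0]), vi[1] + frac * (cc[1] - vi[1]))
```

With `vi = (0, 0)`, `cc = (4, 6)` and `z = 3`, the fraction is 0.3 and the selected point is approximately (1.2, 1.8), whose Manhattan distance from `vi` is again 3.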
2.3 The motivation for using fourth order AWE
As interconnect wires become increasingly thinner and longer, the interconnect resistance may overshadow the driver resistance. Consequently, the downstream capacitance is shielded from the driver resistance by the interconnect resistance. This effect is called resistive shielding [23]. The Elmore delay does not correctly take the resistive shielding effect into account and tends to overestimate the delay. This error can be remarkably large, especially for the stub situation (i.e., when a sink that is close to the source co-exists with a much longer wire), where the Elmore delay can be several times larger than the actual delay. Table 1 shows an example of a net with five sinks to illustrate the inaccuracy of the Elmore delay. The routing topology of this net is illustrated in Figure 4. The load capacitance is the same for each sink. The delays at all sinks are computed using the Elmore formula, fourth order AWE and a SPICE transmission line model, and the percentage errors relative to SPICE are calculated. The Manhattan distance from each sink to the source is also listed for reference. We can see that the error of the Elmore delay can be over 300% and that the delay from fourth order AWE is clearly superior. In fact, as the minimum feature size shrinks, this trend will become more and more severe.

To see how this affects non-Hanan routing, consider the graph in Figure 5. The graph plots the delay violation function against the location of the connection point, z, as pictured in Figure 3. The dotted curve indicates the Elmore delay, while the solid curve represents the fourth order AWE result.
The solution corresponds to the point closest to CC where the delay violation function is negative or zero. For the Elmore delay, which overestimates the delay near the source, no solution is found, whereas an actual solution exists and corresponds to z.
On the other hand, we have observed that the Elmore model tends to underestimate the delay at sinks far from the source [1]. This may lead to the opposite error, as can be seen in the last row of Table 1. This underestimation may result in over-reduction of cost while the timing constraints have not yet been satisfied. On the whole, a higher order model is greatly superior to the Elmore model in handling non-Hanan points.
We choose a fourth order model instead of a second or third order model because a second order model is less accurate and, for many examples that we tried, the third order model induces positive poles more often. The additional computation cost of fourth order AWE as compared to a second order model is minor.
In the computation of the fourth order AWE delay, we first use the RICE algorithm [22] to obtain the moments. We solve the denominator of the Padé approximation result, which is a fourth order polynomial, using a closed-form formula to obtain the poles. After an inverse Laplace transformation, the time-domain exponential functions are expanded about the Elmore delay into fourth order Taylor series polynomials. A closed-form solution to a fourth order polynomial exists and may be used to calculate the delay value. Since the Elmore delay may be far off from the correct value, the expansion about the Elmore delay may sometimes still cause significant error, though this error is much smaller than the error of the Elmore delay itself. We restrict such error by another iteration, with the expansion taken about the result of the first iteration. This process is iterated until convergence, and we found that it always converged within three iterations. This method is related to the Newton-Raphson root-finding method: the Newton-Raphson method uses a first order Taylor series in each iteration, whereas our method uses a fourth order expansion.
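To make the relation to Newton-Raphson concrete, the sketch below finds a threshold-crossing delay using the plain first order Newton-Raphson iteration, not the fourth order expansion described above. It assumes, for illustration only, a unit step response of the form $v(t) = 1 + \sum_i k_i e^{p_i t}$ with real negative poles; the function names, the 50% threshold and the numerical values in the usage note are hypothetical.

```python
import math
from typing import Sequence


def step_response(t: float, residues: Sequence[float],
                  poles: Sequence[float]) -> float:
    """Assumed response form: v(t) = 1 + sum_i k_i * exp(p_i * t)."""
    return 1.0 + sum(k * math.exp(p * t) for k, p in zip(residues, poles))


def threshold_delay(residues: Sequence[float], poles: Sequence[float],
                    t0: float, threshold: float = 0.5,
                    tol: float = 1e-12, max_iter: int = 50) -> float:
    """First order Newton-Raphson for v(t) = threshold, started at t0
    (for example, the Elmore delay)."""
    t = t0
    for _ in range(max_iter):
        f = step_response(t, residues, poles) - threshold
        df = sum(k * p * math.exp(p * t) for k, p in zip(residues, poles))
        if df == 0.0:
            raise ZeroDivisionError("flat response; cannot take a Newton step")
        t_next = t - f / df
        if abs(t_next - t) <= tol * max(1.0, abs(t_next)):
            return t_next
        t = t_next
    raise RuntimeError("Newton-Raphson did not converge")
```

For a single pole with time constant 2, `threshold_delay([-1.0], [-0.5], t0=2.0)` starts from the Elmore delay of 2 and converges to 2 ln 2, roughly 1.386. A higher order expansion per iteration, as used in the text, would be expected to need fewer iterations from the same starting point.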
3 The problem environment and problem formulation for BINO
The BINO algorithm is applied in a post-placement scenario where buffer insertion is possible, but it is preferable to insert buffers in regions that are left unoccupied by any cells, so as not to disturb the placement. For MCM technology, a buffer location should be within a chip and close to its bond pads, because it is not cost effective to insert a buffer either on the substrate between chips or within a chip but far from any bond pad. The input to BINO therefore includes a set of pre-defined available buffer spaces scattered in the routing region. These buffer spaces are represented by small squares, as demonstrated by the dark grey areas $b_1$ and $b_2$ in Figure 6(a). It is assumed that only one buffer can be inserted in each space and that the center of the buffer must lie within the square. Larger buffer spaces can easily be expressed as a union of small spaces.
Intuitively, a buffer space is considered for buffer insertion only when a routing path passes through it, since no extra wire cost is incurred under this condition. However, even if no path passes through a buffer space, it may be worthwhile for the wire to make a small detour to increase the possibility of exploiting a buffer space. Based on this idea, we define a territory box for an edge as follows:
Definition 3: For an edge $(v_i, v_j)$, its territory box is a rectangle specified by the lower-left corner point $(x_{\min}, y_{\min})$ and the upper-right corner point $(x_{\max}, y_{\max})$, such that:
$x_{\min} = \min(x_i, x_j) - \text{offset}$, $y_{\min} = \min(y_i, y_j) - \text{offset}$, $x_{\max} = \max(x_i, x_j) + \text{offset}$, $y_{\max} = \max(y_i, y_j) + \text{offset}$, where the offset is a small distance.
The idea of a territory box is demonstrated by the light grey regions in Figure 6(b). Note that the territory box for the soft edge $(v_0, v_3)$ is larger than that for any solid edge between $v_0$ and $v_3$. The rule that we follow is: a buffer space is considered for buffer insertion in an edge only when this buffer space overlaps the territory box of this edge. In the example of Figure 6, buffer space $b_1$ overlaps the territory box of edge $(v_0, v_3)$ and $b_2$ overlaps the territory box of $(v_3, v_2)$; therefore, we can insert buffers $v_4$ and $v_5$ as in Figure 6(c). After the non-Hanan optimization following the buffer insertion, the wire slack in Figure 6(c) may be removed and the tree shown in Figure 6(d) may be obtained. This example shows that the use of soft edges can greatly increase the possibility of overlap as compared to using a predetermined L-shaped connection composed of two solid edges.

We consider both inverting and non-inverting buffers in our work. The inverting buffer is simply an inverter, and the non-inverting buffer is composed of a pair of cascaded inverters. The inverter model is the same as the driver model in section 4.1 and has a medium driver size.

The motivation for combining buffer insertion with non-Hanan optimization can be illustrated by the example in Figure 7. As discussed in section 2.2, in order to reduce wire cost, it is desirable to move the connection point as close to CC as possible, i.e., to maximize z. However, the value of z may be capped by the constraint of non-positive delay violation, as illustrated in Figure 7(a). The utility of buffer insertion is to relax this timing constraint, if possible, so as to achieve further wire cost reduction, as in Figure 7(b).
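The territory box of Definition 3 and the overlap rule can be sketched in Python as follows. The `Rect` type, the function names and the coordinates in the usage note are hypothetical; treating rectangles that merely touch as overlapping is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def territory_box(vi: Point, vj: Point, offset: float) -> Rect:
    """Bounding box of edge (vi, vj), grown by `offset` on every side."""
    return Rect(min(vi[0], vj[0]) - offset, min(vi[1], vj[1]) - offset,
                max(vi[0], vj[0]) + offset, max(vi[1], vj[1]) + offset)


def overlaps(a: Rect, b: Rect) -> bool:
    """Axis-aligned rectangle overlap (touching counts, by assumption)."""
    return (a.x_min <= b.x_max and b.x_min <= a.x_max and
            a.y_min <= b.y_max and b.y_min <= a.y_max)


def candidate_spaces(vi: Point, vj: Point, offset: float,
                     spaces: List[Rect]) -> List[Rect]:
    """Buffer spaces that may be considered for insertion on edge (vi, vj)."""
    box = territory_box(vi, vj, offset)
    return [s for s in spaces if overlaps(box, s)]
```

With a buffer space `Rect(4, 4, 6, 6)` and an offset of 1, the soft edge from (0, 0) to (10, 10) yields that space as a candidate, whereas the solid horizontal edge from (0, 0) to (10, 0) does not. This mirrors the observation that soft edges increase the chance of overlap.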
We state the problem formulation as follows:
Given a source $v_0$, a set of sinks $V = \{v_1, v_2, \ldots, v_n\}$, timing specifications $Q = \{q_1, q_2, \ldots, q_n\}$ for all sinks and a set of available buffer spaces $P = \{p_1, p_2, \ldots, p_m\}$, construct a Steiner routing tree and choose subsets $B_{iv} \subseteq P$ and $B_{ni} \subseteq P$ on which inverting and non-inverting buffers are inserted, respectively, such that the following problem is solved:

Figure 8a. The driver $D_0$ is minimum sized and is not changed in driver sizing. The driver model that we use is shown in Figure 8(b). The interconnect delays among these drivers are neglected. The driver sizing problem is to choose the optimal number of driver stages h and the proper size for each driver. It is known that, for an optimal solution, the ratio of the driver size at one stage to that at its previous stage is uniform and no less than 1. We refer to this uniform ratio as the stage ratio.

The objective of FAR-DS is to minimize the cost of the routing tree, subject to a timing constraint at each sink. In contrast with [10], we extend the cost here to include both wire cost and driver cost, i.e., we perform topology optimization and driver sizing simultaneously. The rationale behind this is to permit the driver, through sizing, to share the task of delay optimization with the interconnect, thereby obtaining a better result than optimizing the driver size and the interconnect topology separately.
We formally state the problem formulation as follows:

The second term in the objective function comes from the total driver capacitance. The objective function can be interpreted as minimizing a weighted sum of the total wire length and the total driver capacitance. The weighting factor is user-specified.
Properties of solution regions for FAR-DS
For a general connection of a node and its downstream subtree to a partial tree, as illustrated in Figure 2, where a node $v_k$ is to be connected to an edge $(v_i, v_j)$, we investigate the properties of the delay violation function with respect to z and the stage ratio in a two-dimensional space. The delay from the cascaded drivers is given by:

with $a_0$ and $a_1$ being constants. The parameter $C_t$ is the total load capacitance seen by the driver in the last stage when $v_k$ is connected to $v_i$ directly. A derivation of the above results is given in the appendix.
When the stage ratio is fixed, $u_i = f(z)$ is a quadratic function of z and the coefficient of the second order term is non-positive. Therefore we obtain the following result:
Property 1: The delay violation $u_i$, viewed as a function f of z and the stage ratio, is a concave function of z for a constant value of the stage ratio.
If we keep z constant, there are also properties that help the search for the optimal solution. These properties can be found by investigating the partial derivatives of $u_i$ with respect to the stage ratio, as follows:

This property is especially useful in the solution search, since it predicts the bottom of the valley-shaped delay violation function surface in the two-dimensional space of z and the stage ratio. One observation is that the curve in equation (9) is independent of which sink is considered, i.e., equation (9) defines the bottom of the valley for the delay violation functions of all the sinks. We call the curve defined by equation (9) the valley curve for delay violations.
In equation (9), when z is at CC, the numerator reaches its minimum and becomes the total load capacitance seen by the driver in the last stage when $v_k$ is connected to CC. Obviously, this total load capacitance is always greater than the minimum gate capacitance, $C_g$, of a driver. This fact provides the following property:

This valley curve also sets a border between different monotone properties with respect to the stage ratio, as follows:
Property 6: For a specific z, the delay violation f is a monotone increasing function of the stage ratio when the stage ratio is above $\sqrt[h+1]{(C_t - cz)/C_g}$.
Property 7: For a specific z, the delay violation f is a monotone decreasing function of the stage ratio when the stage ratio is below $\sqrt[h+1]{(C_t - cz)/C_g}$.
These properties are derived from Elmore delays. Though the Elmore delay may have large errors for specific points, its qualitative fidelity still holds [2] and can serve as a good strategic guide. Our experimental results also support this assertion.

Both BINO and FAR-DS consist of two phases. Phase I is the routing tree construction process, which is the same for BINO and FAR-DS. This procedure is called SART (Steiner AWE Routing Tree), and is similar to SERT except that the Elmore model is replaced by a fourth order AWE model.
In SART, starting with a single source, a partial routing tree is grown in a greedy fashion. In each growing step, a previously unconnected sink is selected and connected to an edge in the partial tree such that the maximum delay is minimized.
Phase II differs between BINO and FAR-DS, but both share the same non-Hanan optimization framework. For reference, the outline of the non-Hanan optimization algorithm is shown in Figure 9 [10]. In this framework, all of the sinks are sorted in descending order of Manhattan distance from the source. Then each node $v_k$ and its downstream subtree $T_k$ are disconnected and reconnected back to the routing tree. The routing tree without $T_k$ is represented by $T \setminus T_k$. In the search for the best reconnection point, $v_k$ and $T_k$ are tentatively connected to each edge in $T \setminus T_k$ at the local optimal point.
The connection point is selected to be the choice that gives the largest improvement according to the objective.
For two routing trees $T_1$ and $T_2$ on the same signal net, if $T_1$ cannot meet the timing constraints and $T_2$ results in a smaller maximum delay violation among all sinks, then $T_2$ is an improvement in spite of any cost increase. If $T_1$ satisfies all the timing constraints, $T_2$ provides an improvement only when it reduces the cost and also satisfies the timing constraints.

[Figure 10: BINO, iterative buffer insertion algorithm. Recoverable steps: perform non-Hanan optimization for T (Figure 9); 8. insert a buffer at $p_{best}$, which gives the largest improvement; 9. $P \leftarrow P - \{p_{best}\}$.]
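The comparison rule between two routing trees stated above can be written as a small predicate. This is a minimal sketch; the `TreeEval` record and its field names are hypothetical stand-ins for whatever summary of a routing tree an implementation keeps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeEval:
    cost: float           # weighted cost of the routing tree
    max_violation: float  # maximum delay violation among all sinks


def is_improvement(t1: TreeEval, t2: TreeEval) -> bool:
    """True if tree t2 improves on tree t1 under the rule in the text."""
    if t1.max_violation > 0.0:
        # t1 misses timing: any smaller maximum violation is an
        # improvement, in spite of any cost increase.
        return t2.max_violation < t1.max_violation
    # t1 meets timing: t2 must also meet timing and reduce the cost.
    return t2.max_violation <= 0.0 and t2.cost < t1.cost
```

For example, a tree with cost 12 and maximum violation 1 improves on a tree with cost 10 and maximum violation 3, while a cheaper tree that violates timing never improves on a tree that already meets timing.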
In BINO, the non-Hanan optimization framework is embedded in a greedy buffer insertion scheme illustrated by Figure 10. On each buffer space, we tentatively insert a buffer and conduct non-Hanan optimization. After all of the buffer spaces have been tested, the solution that provides the largest improvement is chosen as the final decision. This process is repeated iteratively until there is no improvement or no buffer space left. The optimal assignment of the inverting or non-inverting type to each buffer (line 6 in Figure 10) can be achieved through dynamic programming.

Since we insert only one buffer in each iteration, the ability to obtain an optimal buffer insertion solution is hindered, as shown by the single-sink example in Figure 11. It is well known that optimal buffer locations are often distributed evenly along an interconnect path [15]. Therefore, for the net in Figure 11, the optimal solution may be as shown in Figure 11(d). If we insert only one buffer in an iteration, the first iteration is likely to result in the scenario shown in Figure 11(b), and the optimal solution cannot be reached.
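The greedy insertion loop described above can be sketched as follows. This is a simplified illustration, not the procedure of Figure 10: the `evaluate` callback is a hypothetical stand-in for "insert buffers at the given spaces, run non-Hanan optimization, and report the result", and buffer type assignment is omitted. The `better` helper restates the improvement rule from the text on (maximum violation, cost) pairs.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Score = Tuple[float, float]  # (maximum delay violation, cost)


def better(new: Score, old: Score) -> bool:
    """Improvement rule: fix timing first, then reduce cost."""
    if old[0] > 0:
        return new[0] < old[0]
    return new[0] <= 0 and new[1] < old[1]


def greedy_insertion(spaces: Iterable[str],
                     evaluate: Callable[[FrozenSet[str]], Score]
                     ) -> Tuple[FrozenSet[str], Score]:
    """Insert one buffer per iteration until nothing improves."""
    chosen: FrozenSet[str] = frozenset()
    best = evaluate(chosen)
    remaining = list(spaces)
    while remaining:
        pick: Optional[Tuple[str, Score]] = None
        for p in remaining:  # tentatively try every remaining space
            score = evaluate(chosen | {p})
            if better(score, best) and (pick is None or better(score, pick[1])):
                pick = (p, score)
        if pick is None:  # no tentative insertion improves the tree
            break
        chosen, best = chosen | {pick[0]}, pick[1]
        remaining.remove(pick[0])
    return chosen, best
```

A toy `evaluate` that lowers the violation by a fixed amount per chosen space and adds a unit cost per buffer is enough to exercise the loop. Iterative deletion could be sketched in the same way by starting from all overlapping spaces and removing one buffer per iteration.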
In order to alleviate the above difficulty, we supplement the method with an iterative buffer deletion procedure, using a method similar to [7], that is described in Figure 12. In this scheme, we first insert buffers at all spaces that overlap with any edge. Then we delete one buffer in each iteration, in a greedy fashion similar to iterative buffer insertion. Since this proceeds in the opposite direction to iterative buffer insertion, it plays a complementary role. For the example in Figure 11, the iterative buffer deletion starts with (c) and can naturally result in the optimal solution in (d). On the other hand, if the optimal solution is (b), iterative buffer deletion is worse than iterative buffer insertion.
In our work, we perform both iterative buffer insertion and iterative buffer deletion independently for a net and choose the better of the two results.

In Phase II of FAR-DS, a two-dimensional search replaces the quasi-binary-search of MVERT (line 6 in Figure 9) in order to find an optimal connection point and driver size simultaneously.
When we reconnect a node $v_k$ to an edge $(v_i, v_j)$, we look for a 3-tuple consisting of the connection location z, the stage ratio and the number of stages h such that the objective function of problem (3) is minimized while the maximum delay violations for all sinks are non-positive. We vary h between 1 and $h_{max}$ and, for a fixed h value, search for an optimal pair of z and stage ratio in a two-dimensional plane.
For this case, $cW = C_t - cz$ and objective (3) can be translated to:
minimize g = , c z+ 1 , C g + C d
For a specific value of g, the objective function above corresponds to a curve in the plane of z and the stage ratio, like the objective curves shown in Figure 13. Objective (10) can be interpreted as finding a point in this plane such that the constraints in (10) are satisfied at this point and the point lies on an objective curve that is as low as possible.
We illustrate the optimal solution search scheme through Figure 13. The solution search can be restricted to the rectangle bounded by $0 \le z \le CC$ and by stage ratio values between 1 and the maximum stage ratio. Consider the valley curve defined by equation (9). According to Property 4, this curve always lies above a stage ratio of 1 in the interval $0 \le z \le CC$. One common scenario is that this valley curve intersects the upper border of the rectangle at a point a and the right border at b, as in Figure 13(a). From Property 3 and Property 7, we can say that, in the rectangle defined above, $u_i$ reaches its minimum on the segment where the stage ratio equals its maximum to the left of a, and on the valley curve specified by equation (9) to the right of a. These two segments can be integrated into a single function:
$\text{stage ratio} = \min\left(\text{maximum stage ratio},\ \sqrt[h+1]{\dfrac{C_t - cz}{C_g}}\right)$
which is the thickened line in Figure 13(a).

namely, $S_1$ at $z = z_1$ with its corresponding stage ratio; we refer to this as the first order solution. After the first order solution has been found, the previously described solution refinement technique can be used to obtain two new smaller sectors, shown by the shaded regions in Figure 13(c), in which the optimal solution is searched. Even if there is no solution on this segment, the search region can be refined to two sectors as in Figure 13(d).
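The curve that combines the upper border and the valley curve can be evaluated directly. The sketch below assumes the valley curve has the $(h+1)$-th root form that appears in Properties 6 and 7; the parameter names and the numerical values in the usage note are hypothetical and carry no physical units.

```python
def valley_ratio(z: float, c_t: float, c: float, c_g: float, h: int) -> float:
    """Stage ratio on the valley curve: ((C_t - c*z) / C_g) ** (1 / (h + 1))."""
    return ((c_t - c * z) / c_g) ** (1.0 / (h + 1))


def min_violation_ratio(z: float, c_t: float, c: float, c_g: float,
                        h: int, ratio_max: float) -> float:
    """Stage ratio at which the delay violation is smallest for a given z,
    restricted to stage ratios no larger than ratio_max."""
    return min(ratio_max, valley_ratio(z, c_t, c, c_g, h))
```

With `c_t = 810`, `c = 1`, `c_g = 10` and `h = 3`, the valley curve gives a stage ratio of about 3 at `z = 0` and about 2 at `z = 650`. For a maximum stage ratio of 2.5, the combined curve therefore stays at 2.5 near the source and follows the valley curve further along the edge.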
We call this solution search scheme valley-guided search (V-search), and describe it in Figure 14.
The above is the method to search for the optimal pair of z and stage ratio for a specific h. The optimal h is found by a sweep from h = 1 to $h = h_{max}$, and the above search is carried out for each h value. The value of $h_{max}$ is given

Since the use of the valley curve increases the dependency of the solution on the Elmore delay model, while we use a higher order AWE model to evaluate the delays for every sink in our algorithm, it is possible that the discrepancy between the Elmore model prediction and the actual AWE evaluation gives rise to a suboptimal solution.
We suggest an alternative search method, called the iterative search (I-search) scheme, that does not depend on the Elmore model quantitatively, and illustrate it in Figure 15. In this method, we begin with an initial stage ratio and perform non-Hanan optimization to obtain an optimal z for this value of the stage ratio. Next, this z is fixed and an optimal stage ratio is searched for, and so on. This process is repeated until there is no further improvement.

From Property 6, we know that the delay violation function $u_i$ is a convex function along the stage ratio direction; thus, we cannot apply the quasi-binary-search suggested by [10] along this direction. We perform the search in a manner between binary search and linear search. If the maximum delay violation is non-positive for a specific value of the stage ratio, we continue to search for a better solution at a smaller value; otherwise, we must search at both larger and smaller values. The computation factor from searching along the stage ratio direction is bounded by the width of the stage ratio range, from 1 to its maximum, divided by the resolution on the stage ratio. Since the h value is swept from 1 to $h_{max}$, the complexity of FAR-DS is $O(h_{max} n^4 L_{max})$.
The above results only provide a loose bound, because the worst case for the quasi-binary-search along the z direction is almost impossible in practice. Therefore, the computation cost in the average case is one order lower than the above theoretical bound.
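The alternating structure of the I-search scheme can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `pick` and `i_search`, the grid-based one-dimensional searches (which stand in for the non-Hanan optimization along z and for the stage-ratio search), the toy violation and cost functions with made-up coefficients, and the name `a` for the stage ratio. In particular, the stage-ratio step here is a plain linear scan rather than the search between binary and linear search described above, and the violation is a closed-form stand-in rather than an AWE evaluation.

```python
H = 1  # number of driver stages, held fixed here (assumed; the text sweeps h)

def violation(z, a):
    """Toy maximum delay violation; coefficients are made up for illustration."""
    return -0.5 * z * z + 4.0 * (3.0 - z) / a**H + 1.0 * z + 0.5 * H * a - 4.0

def cost(z, a):
    """Toy weighted cost: wire length shrinks with z, driver cost grows with a."""
    return 0.7 * (10.0 - z) + 0.3 * a**H

def pick(points):
    """Return the cheapest feasible point, or the least violating one if none is feasible."""
    def key(p):
        u = violation(*p)
        return (0, cost(*p)) if u <= 0 else (1, u)
    return min(points, key=key)

def i_search(z_grid, a_grid, a_init, max_rounds=50):
    """Alternate one-dimensional searches over z and a until the point stops changing."""
    z, a = pick([(zi, a_init) for zi in z_grid])         # optimize z for the initial a
    for _ in range(max_rounds):
        _, a_new = pick([(z, ai) for ai in a_grid])       # fix z, search the stage ratio
        z_new, _ = pick([(zi, a_new) for zi in z_grid])   # fix a, search z again
        if (z_new, a_new) == (z, a):
            break  # no further improvement
        z, a = z_new, a_new
    return z, a

z_grid = [0.25 * i for i in range(9)]         # z in [0, 2]
a_grid = [1.0 + 0.25 * i for i in range(13)]  # stage ratio in [1, 4], resolution 0.25
z_opt, a_opt = i_search(z_grid, a_grid, a_init=2.5)
print(z_opt, a_opt, violation(z_opt, a_opt) <= 0)
```

As long as `a_init` lies on the stage-ratio grid, each one-dimensional step keeps the current point among its candidates, so the selection key cannot get worse from one step to the next; `max_rounds` is only a safeguard against ties.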
Experimental results
The experiments are designed to test the improvement from our algorithms in terms of both the timing and cost objectives defined in the problem formulation. Each signal net is randomly generated and is tested with BINO and the two FAR-DS algorithms, as well as with SERT and MVERT for comparison. To obtain more general conclusions, we include both IC and MCM technologies in the experiments, and the number of sinks ranges from 5 to 20. To form a common basis for comparison with FAR-DS, we also use cascaded drivers in SERT, MVERT and BINO, and choose h = 1 and a stage ratio of 2.5, which provides a middle level of driving ability. For all of the timing results, driver and wire delays are calculated from an RC model and a fourth order AWE model, respectively.
The experimental results are shown in Table 2 and Table 3 for IC and MCM technology, respectively.
The parameters for MCM are from [2]. The IC parameters correspond to 0.18 μm technology and are scaled from the data in [2]. The buffer space locations for BINO are generated randomly. The area of each buffer space is chosen to be 100 μm × 100 μm for IC and 200 μm × 200 μm for MCM. According to our experiments, the variation in delay caused by changing a buffer position within a buffer space is small and can be neglected. The offset for the territory box of an edge is set to be half of the buffer space size. In the experiments for FAR-DS, the maximum stage ratio was chosen as 4 for both IC and MCM technology. Since we consider the situation where the interconnect resources are more stringent, the weighting factor for wire cost is chosen to be 0.7 for both BINO and FAR-DS.
In Table 2 and Table 3, the left-most column is the number of sinks for each test. The parameter W is the total wire length and $u_{max}$ is the maximum delay violation. The column labeled m corresponds to the number of input buffer spaces, and to its left is the number of buffers, |B|, finally inserted. The CPU times in seconds for BINO and FAR-DS are also listed. The last row provides the percentage change in total wire length compared to the result of SERT.
Since the timing constraints are quite stringent, most of the maximum delay violations, $u_{max}$, from the results of SERT are positive. Sometimes MVERT even results in a worse delay violation than SERT, due to errors from the Elmore delay model. In other cases, the improvements from MVERT on both delay and wire cost are limited and the timing constraints are often unsatisfied, because the specification is unachievable without driver sizing or buffer insertion. This hinders the ability of pure non-Hanan optimization to reduce the cost further, and BINO or FAR-DS becomes a necessary step. Both BINO and FAR-DS can also satisfy the timing constraints that are impossible for SERT and MVERT.
Besides the timing improvement, we can see that BINO and FAR-DS can reduce significantly more cost than MVERT under these somewhat harsh conditions. Sometimes MVERT may even increase the wire length to meet the timing constraint, which can be seen from the result of the first net in IC technology. The results from the two different variants of FAR-DS show no significant difference.
Comparing the experimental results from BINO and FAR-DS, we can see that BINO provides more wire cost reduction than FAR-DS in most cases, and the larger timing slacks from BINO also indicate its potential for dealing with even more stringent timing constraints. Although FAR-DS is not as powerful as BINO, it shows an adaptive nature that can often trade timing slack for lower driver cost. This is especially true for the V-search scheme of FAR-DS, most of whose timing slacks are close to zero.
These experiments were carried out on a SUN Ultra-10 station. The computation time of FAR-DS mostly depends on the size of the signal nets, while the CPU time of BINO is more irregular because it also depends on the number of buffer spaces overlapping with the routing tree. In most cases, the CPU time is within one minute. In the worst case for a net of 20 sinks, the run time is less than four minutes for both FAR-DS and BINO. On the whole, the computational cost of our algorithms is reasonable, since these optimizations are carried out only for global timing-critical nets.
We also perform experiments to check the effect of using soft edges on the same set of nets, and the result is shown in Table 4. This result confirms the conclusion that using soft edges can greatly increase the possibility that buffer spaces overlap with routing edges.

Conclusion
When we extend non-Hanan optimization to improve the performance of critical nets where both timing and wire resources are stringent, buffer insertion is shown to be a strong augmentation to the timing optimization toolkit, even with location restrictions. A combination of driver sizing and non-Hanan optimization provides a continuous two-dimensional solution space. A search for the optimum in this space may be guided by properties derived from the Elmore delay model, which may have large quantitative errors but good qualitative fidelity. These properties are used to direct heuristics that use a fourth order AWE model for wire delay calculation. For drivers, a more accurate model can be applied in place of an RC switch model in a similar fashion. Experimental results show that both BINO and FAR-DS can significantly improve both timing and wire cost. As compromises to computation time, a uniform driver stage ratio and an RC driver model are adopted in this work. In future work, we would like to replace them with a non-uniform stage ratio and a more elaborate driver model, respectively. An efficient integration of BINO and FAR-DS will also be considered.
Acknowledgment
The authors would like to thank the anonymous reviewers for their constructive comments.

(18)
Both 2 and 3 are constants. If a sink is in $T_j$, its Elmore delay is the sum of $f_1$, $f_0$ and $f_2$. When a sink is in $T_k$, its Elmore delay is the sum of $f_1$, $f_0$ and $f_3$. If a sink is not downstream of $n_i$, its Elmore delay is simply $f_1$. In all these cases, the delay is either a linear or a quadratic function of z with a non-positive coefficient for the second order term. Therefore, the delay, and hence the delay violation function, of any sink is a concave function with respect to z, which is the same as Theorem 1. For cascaded drivers with stage ratio $\alpha$, $R_d = R_0/\alpha^h$, and we need to take into account the driver delay $T_D = hR_0(C_d + \alpha C_g)$.
After we combine the interconnect delay with the driver delay, we obtain a general form of the delay violation function in terms of z and the stage ratio $\alpha$ as follows:

$$f(z, \alpha) = -a_2 rcz^2 + \frac{R_0 (C_t - cz)}{\alpha^h} + a_1 z + R_0 C_g h \alpha + a_0 \tag{19}$$
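The curvature claims (concave along z, convex along the stage ratio) can be checked numerically on the functional form of the delay violation. The sketch below is only an illustration under stated assumptions: it takes the form f(z, alpha) = -a2*r*c*z^2 + R0*(Ct - c*z)/alpha^h + a1*z + R0*Cg*h*alpha + a0, writes the stage ratio as `alpha` (the symbol is an assumption), and uses arbitrary positive coefficient values that are not taken from the experiments. The helper name `second_difference` is likewise hypothetical.

```python
def delay_violation(z, alpha, h=2, a2=1.0, r=0.1, c=0.2, R0=5.0,
                    Ct=10.0, a1=0.3, Cg=0.05, a0=-2.0):
    """Delay violation versus connection location z and stage ratio alpha.

    All default coefficient values are made up for illustration.
    """
    return (-a2 * r * c * z**2               # wire term, concave in z
            + R0 * (Ct - c * z) / alpha**h   # driver resistance times load
            + a1 * z
            + R0 * Cg * h * alpha            # driver delay term, linear in alpha
            + a0)

def second_difference(g, x, step=1e-3):
    """Central finite-difference estimate of the second derivative of g at x."""
    return (g(x + step) - 2.0 * g(x) + g(x - step)) / step**2

z0, alpha0 = 1.0, 2.0
d2_z = second_difference(lambda z: delay_violation(z, alpha0), z0)
d2_alpha = second_difference(lambda a: delay_violation(z0, a), alpha0)

print("curvature along z:", d2_z)          # non-positive if f is concave in z
print("curvature along alpha:", d2_alpha)  # non-negative if f is convex in alpha
```

Under this assumed form the analytic second derivatives are -2*a2*r*c along z and h*(h+1)*R0*(Ct - c*z)/alpha^(h+2) along the stage ratio, so the signs hold whenever a2, r and c are non-negative and the load Ct - c*z stays non-negative.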
