We provide a new theoretical framework for constructing Steiner routing trees with minimum Elmore delay. Earlier work [3, 13] has established Elmore delay as a high-fidelity estimate of "physical", i.e., SPICE-computed, signal delay. Previously, however, it was not known how to construct an Elmore delay-optimal Steiner tree. Our main theoretical result generalizes Hanan's theorem [11] to limit the possible locations of Steiner nodes in an optimal-delay rectilinear Steiner tree. Another theoretical result establishes a new decomposition theorem for constructing optimal-delay Steiner trees. We develop a branch-and-bound method, called BB-SORT-C, which exactly minimizes any linear combination of Elmore sink delays. BB-SORT-C is practical for routing small nets and for delimiting the space of achievable routing solutions with respect to Elmore delay.
Introduction
Due to the scaling of VLSI technology, interconnection delay dominates the design of high-performance systems [8, 17]. Performance-driven routing has therefore received considerable attention. A typical goal is to minimize average or maximum source-sink delay in a given signal net. Early work, e.g., [9], implicitly equated optimal routing with minimum-cost Steiner routing. More recent works recognize that delay minimization and wire length minimization can be far from synonymous. Cohoon and Randall [5] consider both the cost (total edge length) and the radius (longest source-sink path length) of the heuristic routing tree. Cong et al. [6] use a parameter to guide the tradeoff between cost and radius minimization. Alpert et al. [1] achieve a more direct cost-radius tradeoff between minimum spanning tree and shortest path tree constructions, and Cong et al. [7] propose the use of rectilinear Steiner arborescences [15]. These previous routing methods have essentially "geometric" objectives, which are difficult to tune to specific technology parameters. Boese et al. [2] have addressed this flaw with a construction that greedily optimizes Elmore delay directly. Supporting investigations in [3] demonstrate that Elmore delay has high fidelity to physical (SPICE-computed) delay, i.e., near-optimal Elmore delay implies near-optimal SPICE delay. This confirms earlier studies by Kim et al. [13] and Vlach et al. [19].
A natural question at this point is: how much better is possible? What is the performance envelope for routing tree constructions? Boese et al. [3] used branch-and-bound to construct optimal Elmore delay spanning trees. They found that the Elmore Routing Tree (ERT) construction of [2] is on average only 2.3% above optimal for 7-pin nets. The more significant open question concerns the near-optimality of Steiner tree heuristics. The essential difficulty has been a potentially unbounded number of candidate Steiner node locations, which makes even branch-and-bound impossible.
In this paper, we present new theoretical results that allow construction of Elmore delay-optimal Steiner trees. Our key result restricts the Steiner nodes in an optimal Elmore delay rectilinear Steiner tree to the "Hanan grid," generalizing a theorem of Hanan for minimum-cost Steiner trees [11]. We combine this restriction with a new decomposition theorem, which also applies to minimum-cost Steiner trees. Together they show how branch-and-bound can construct a Steiner Optimal Routing Tree (SORT). Our results also give new restrictions on the structure of a SORT. Our experimental results establish that the SERT-C and SERT constructions of [2] are on average within only 5% of optimal for 5-pin nets and within 16% of optimal for 9-pin nets, depending on the technology parameters.
Preliminaries
Previous performance-driven routing constructions generally address net-specific objectives (cost, radius, cost-radius tradeoffs, etc.). They do not address sink-specific objectives, which exploit the critical-path information typically available from iterated placement and routing phases of performance-driven layout. Boese et al. [2] showed that minimizing delay to a single critical sink achieves a significant timing improvement. The tree cost penalty is only small compared to the 1-Steiner algorithm of [12]. We therefore use the critical-sink problem formulation of [2].
31st ACM/IEEE Design Automation Conference. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying it is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1994 ACM 0-89791-653-0/94/0006 $3.50
A signal net N consists of a set of pin locations $\{n_0, n_1, \ldots, n_k\}$ in the Manhattan plane, which are to be connected by a routing tree T(N). Location $n_0$ is designated to be the source, and the locations $n_i$ ($1 \le i \le k$) denote sinks. The cost of an edge in T(N) is the Manhattan distance between its endpoints. The cost of a routing tree T(N) is the sum of its edge costs. Elmore delay in T(N) between $n_0$ and sink $n_j$ is denoted by $t(n_j)$. Finally, each sink $n_i$ is given an associated level of criticality, $\alpha_i \ge 0$. Our goal is to solve the Critical-Sink Routing Tree (CSRT) Problem:
Given signal net N, construct T(N) which minimizes $\sum_{i=1}^{k} \alpha_i \, t(n_i)$.
Elmore Delay
Elmore delay [10, 16, 18] is a distributed RC delay approximation defined over a routing tree T(N) rooted at the source $n_0$. Here $e_v$ denotes the edge from node v to its parent in T(N). The greedy Elmore routing tree (ERT) approach of [2] minimizes Elmore delay directly during the construction of a routing tree. The ERT algorithm for spanning trees is analogous to Prim's minimum spanning tree construction [14]. It starts with a trivial tree containing only the source. ERT then iteratively finds a pin $n_i$ in the tree and a sink $n_j$ outside the tree such that adding edge $(n_i, n_j)$ minimizes delay in the resulting tree.
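The Elmore formula itself did not survive in this copy. The sketch below uses the standard form of the formula. It assumes a driver resistance $r_d$, per-unit wire resistance r and capacitance c, and lumped sink loads; the parameter and function names are ours, for illustration only. The delay is $t(n_i) = r_d C_{n_0} + \sum_{e_v \in \mathrm{path}(n_0, n_i)} r_{e_v} (c_{e_v}/2 + C_v)$, where $C_v$ is the total capacitance of the subtree rooted at v. The code evaluates this in two linear passes over the tree.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """Rectilinear (Manhattan) length of an edge between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def elmore_delays(pos: List[Point], parent: List[int], sink_cap: Dict[int, float],
                  r_d: float = 1.0, r: float = 1.0, c: float = 1.0) -> List[float]:
    """Elmore delay from the source (node 0) to every node of a rooted tree.

    pos[v] is the location of node v, parent[v] its parent (parent[0] is ignored),
    and sink_cap[v] the lumped load at v (Steiner nodes simply have no entry).
    """
    n = len(pos)
    children: List[List[int]] = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)

    # Preorder from the source: every parent appears before its children.
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Pass 1 (bottom-up): C[v] = sink load at v plus wire and load capacitance below v.
    C = [0.0] * n
    for v in reversed(order):
        C[v] = sink_cap.get(v, 0.0) + sum(
            c * manhattan(pos[v], pos[w]) + C[w] for w in children[v])

    # Pass 2 (top-down): the driver charges the whole tree, and each edge e_w
    # on the path adds r*len * (c*len/2 + C[w]).
    t = [0.0] * n
    t[0] = r_d * C[0]
    for v in order:
        for w in children[v]:
            length = manhattan(pos[v], pos[w])
            t[w] = t[v] + r * length * (c * length / 2 + C[w])
    return t


# Source at (0, 0), a Steiner node at (2, 0), and sinks at (2, 2) and (4, 0).
print(elmore_delays([(0, 0), (2, 0), (2, 2), (4, 0)], [0, 0, 1, 1], {2: 1.0, 3: 1.0}))
# [8.0, 22.0, 26.0, 26.0]
```

Each pass touches every edge once, so this computation is consistent with the linear-time evaluation of delay to all sinks noted for ERT.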
ERT extends to Steiner routing by allowing each new pin to connect to an edge rather than to a pin in the existing tree. Connections to an edge are always made so that the induced Steiner node is located at the point on the edge closest to the new pin. (The exact placement/embedding of an edge is allowed to vary within its bounding box.) Very substantial delay savings for all of the ERT variants are reported in [2]. Moreover, the ERT approach is efficient because Elmore delay to all sinks can be evaluated in linear time.
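In the rectilinear plane, the point of an edge's bounding box closest to a new pin is found by clamping each pin coordinate into the box. This is the same as taking the coordinate-wise median of the pin and the edge's two endpoints. A small sketch follows; the function names are ours, for illustration.

```python
from typing import Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Rectilinear distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_connection(p: Point, a: Point, b: Point) -> Point:
    """Point of the bounding box of edge (a, b) nearest to pin p.

    Clamping each coordinate of p into the box is the same as taking the
    coordinate-wise median of p, a and b.
    """
    xs = sorted((p[0], a[0], b[0]))
    ys = sorted((p[1], a[1], b[1]))
    return (xs[1], ys[1])


a, b, p = (0, 0), (4, 3), (1, 5)
s = closest_connection(p, a, b)
# The Steiner node s splits edge (a, b) without lengthening it, so existing
# source-sink path lengths are unchanged; only the new branch (p, s) is added.
print(s, manhattan(a, s) + manhattan(s, b) == manhattan(a, b), manhattan(p, s))
# (1, 3) True 2
```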
Summary of Algorithms
The rest of this paper will concentrate on the following high-performance Steiner routing methods.
Theoretical Results
We use the following shorthand conventions. "Delay" always means Elmore delay, and "max delay" is the maximum source-sink delay in the net. Finally, a Steiner node is "on the Hanan grid" if it lies at an intersection of a horizontal line and a vertical line, each passing through a pin in the net.
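The Hanan grid is finite: a net with k+1 pins has at most (k+1)^2 grid points. This bounds the set of candidate Steiner node locations. A minimal sketch of enumerating the grid (names are ours, for illustration):

```python
from itertools import product
from typing import List, Set, Tuple

Point = Tuple[int, int]


def hanan_grid(pins: List[Point]) -> Set[Point]:
    """All intersections of horizontal and vertical lines through the pins."""
    xs = sorted({x for x, _ in pins})
    ys = sorted({y for _, y in pins})
    return set(product(xs, ys))


pins = [(0, 0), (2, 3), (5, 1)]
grid = hanan_grid(pins)
candidates = grid - set(pins)  # grid points that are not pins: possible Steiner nodes
print(len(grid), len(candidates))  # 9 6
```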
CSRT is NP-Hard
For any given set of circuit parameters, the minimum-cost Steiner tree problem can be reduced to the CSRT problem for a single critical sink, as shown in Figure 1. The "generic" variant of CSRT seeks to minimize maximum sink delay. It can use the same reduction by setting $n_c$ far away from $n_0$, so that the maximum delay occurs at $n_c$.
Figure 1: A minimum-cost Steiner tree instance (N) reduces to a CSRT instance (N') with critical sink $n_c$ directly left of the pin $n_0$ in N with smallest x-coordinate. $t(n_c)$ is minimized by a tree with edge $(n_0, n_c)$ plus the min-cost Steiner tree over N.
Branch-and-Bound for Optimal Delay Steiner Trees
The branch-and-bound method of [3] for optimal spanning trees starts with a tree containing the source. It incrementally adds sinks to a growing tree while evaluating delay at each step. When the delay exceeds that of the best complete tree seen so far, the search is pruned and the algorithm backtracks. The algorithm avoids redundant testing of topologies by adding sinks in breadth-first order, with sinks that share a parent connected in increasing index order. In this way, any tree topology corresponds to a unique ordering of the sinks and is tested by the algorithm at most once.
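The canonical sink order described above can be sketched as follows. The sketch assumes a topology stored as a child-to-parent map with the source as node 0; this representation is chosen here for illustration and is not taken from [3]. The function returns the single order in which the search would add the sinks of a given topology.

```python
from collections import deque
from typing import Dict, List


def canonical_order(parent: Dict[int, int]) -> List[int]:
    """Sink insertion order for a spanning-tree topology: breadth-first from
    the source (node 0), with sinks sharing a parent taken in increasing index."""
    children: Dict[int, List[int]] = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)
    order: List[int] = []
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in sorted(children.get(v, [])):
            order.append(w)
            queue.append(w)
    return order


# Sinks 1 and 3 hang off the source, 4 off sink 1, and 2 off sink 3.
print(canonical_order({1: 0, 2: 3, 3: 0, 4: 1}))  # [1, 3, 4, 2]
```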
We modify the method of [3] to find the optimal-delay Steiner tree. We assume that an optimal tree can always be built iteratively by connecting each sink either directly to the source by a new edge or by a closest connection to some edge in the current tree (see footnote 4). The modification is simply that connections are considered to each edge in the current tree (plus the source), rather than to each pin. Branch-and-bound pruning is again used to reduce the complexity of the search. Restricting the order in which sinks can be added to construct any particular topology greatly (although not completely) avoids redundant testing of topologies. Figure 2 gives details of our Branch-and-Bound method for Steiner Optimal Routing Trees with a single Critical sink (BB-SORT-C). A simple modification to Step 11 can minimize a linear combination of delays or, for BB-SORT, minimize the maximum sink delay.
BB-SORT-C Algorithm
Input: signal net N with critical sink n_1
Output: Steiner tree T over N having optimal t(n_1)
...
    call Add_Sink(i, n_0)
...
    if (num_pins(T) == |N|)
11.     best = t(n_1); T* = T
12. else
13.     for j = 1 to i - 1
14.         if (n_j not in T) call Add_Sink(j, Next(e))
15.     for j = i + 1 to |N| - 1
16.         if (n_j not in T) call Add_Sink(j, e)
17. T = Delete_Connection(i, e, T)
Figure 2: Pseudo-code for BB-SORT-C. Note that $n_0$ is treated like an edge in Step 3 because connections are considered to the source and to all edges in the current tree. Procedures not defined in the template are as follows. Next(e) returns the edge after e in a list of the edges in T, ordered by when they were added to T. Make_Connection(i, e, T) connects $n_i$ to T by a closest connection to e. Delete_Connection(i, e, T) reverses the call to Make_Connection in Step 8.
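The listing above survives only in fragments, so what follows is a self-contained Python sketch of the same search idea, not a transcription of BB-SORT-C. Each sink is attached either by a new edge to the source or by a closest connection to an existing edge. A branch is pruned once its partial weighted delay reaches that of the best complete tree found so far. The pruning is sound because a Steiner node placed at a closest connection keeps existing path lengths and only adds capacitance, so no previously computed delay can decrease.

The sketch makes several assumptions that are not taken from the paper:

- wire resistance and capacitance are both set to one, matching the normalization in the Definitions;
- every sink has the same load, sink_cap;
- the driver resistance is r_d;
- alpha[0] is ignored;
- redundant orderings are not eliminated, so the search is exponential and suited only to a few pins.

All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]


def dist(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest(p: Point, a: Point, b: Point) -> Point:
    """Closest connection of pin p to edge (a, b): coordinate-wise median."""
    return (sorted((p[0], a[0], b[0]))[1], sorted((p[1], a[1], b[1]))[1])


def delays(pos: List[Point], parent: List[int], load: Dict[int, float],
           r_d: float) -> List[float]:
    """Elmore delays from node 0 with unit wire resistance and capacitance."""
    n = len(pos)
    kids: List[List[int]] = [[] for _ in range(n)]
    for v in range(1, n):
        kids[parent[v]].append(v)
    order, stack = [], [0]
    while stack:  # preorder: every parent precedes its children
        v = stack.pop()
        order.append(v)
        stack.extend(kids[v])
    cap = [0.0] * n
    for v in reversed(order):  # subtree capacitance, bottom-up
        cap[v] = load.get(v, 0.0) + sum(dist(pos[v], pos[w]) + cap[w] for w in kids[v])
    t = [r_d * cap[0]] + [0.0] * (n - 1)
    for v in order:
        for w in kids[v]:
            length = dist(pos[v], pos[w])
            t[w] = t[v] + length * (length / 2 + cap[w])
    return t


def bb_sort_c(pins: List[Point], alpha: List[float], sink_cap: float = 1.0,
              r_d: float = 1.0) -> Tuple[float, Optional[tuple]]:
    """Minimize sum(alpha[i] * t(pins[i])) over trees built by closest connections.

    pins[0] is the source. Every sink order is tried (no redundancy
    elimination), so this is only practical for a handful of pins.
    """
    best: List = [float("inf"), None]

    def objective(pos, parent, pin_of) -> float:
        t = delays(pos, parent, {v: sink_cap for v in pin_of if v != 0}, r_d)
        return sum(alpha[i] * t[v] for v, i in pin_of.items() if i != 0)

    def search(pos, parent, pin_of) -> None:
        f = objective(pos, parent, pin_of)
        if f >= best[0]:
            return  # adding sinks never decreases a delay, so this branch is dead
        if len(pin_of) == len(pins):
            best[0], best[1] = f, (pos, parent)
            return
        placed = set(pin_of.values())
        for i in range(1, len(pins)):
            if i in placed:
                continue
            # None = new edge to the source; v >= 1 = edge (v, parent[v]).
            for v in [None] + list(range(1, len(pos))):
                pos2, par2, pin2 = list(pos), list(parent), dict(pin_of)
                attach = 0
                if v is not None:
                    u = parent[v]
                    m = closest(pins[i], pos[v], pos[u])
                    if m == pos[v]:
                        attach = v
                    elif m == pos[u]:
                        attach = u
                    else:  # split edge (v, u) with a Steiner node at m
                        attach = len(pos2)
                        pos2.append(m)
                        par2.append(u)
                        par2[v] = attach
                pin2[len(pos2)] = i
                pos2.append(pins[i])
                par2.append(attach)
                search(pos2, par2, pin2)

    search([pins[0]], [0], {0: 0})
    return best[0], best[1]


pins = [(0, 0), (2, 0), (4, 0)]
print(bb_sort_c(pins, [0, 1, 1], r_d=1.0)[0])   # 32.0 (both sinks wired to the source)
print(bb_sort_c(pins, [0, 1, 1], r_d=10.0)[0])  # 144.0 (sinks chained)
```

In this small example, raising the driver resistance changes the optimal topology from a star to a chain. A larger r_d makes total capacitance weigh more heavily than path resistance.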
3.3 Sub-Optimality of BB-SORT
Figure 3 gives an example in which BB-SORT constructs a sub-optimal tree in terms of maximum sink delay. Figure 3(a) shows what appears to be the optimal tree, with maximum delay $t(n_1) = t(n_2) = 28.625$. All Steiner nodes in this tree are on the vertical line $x = 1.5$, which is outside the Hanan grid. Part (b) shows the tree returned by BB-SORT, with maximum delay $t(n_1) = 28.641$.
Footnote 4: A closest connection to a given edge is made by creating a Steiner node at the point on the edge closest to the new sink.
Given that the example in Figure 3 was constructed carefully by hand, we believe that other counterexamples are rare and that BB-SORT almost always gives the optimal "generic" Steiner tree.
Optimality of BB-SORT-C
For any linear combination of sink delays, our branch-and-bound method constructs the optimal tree. In this section we state the lemmas and theorem used to obtain this result, along with proof sketches. Complete proofs are contained in [4].
Definitions
Let T be a Steiner tree over net N minimizing $f = \sum_{i=1}^{k} \alpha_i \, t(n_i)$, with each $\alpha_i > 0$. For convenience, we normalize time and distance so that unit wire resistance and unit wire capacitance both equal one. We consider any tree T as a set of nodes and edges, so $v \in T$ for node v and $e \in T$ for edge e are well defined. A completely vertical or horizontal edge is called a straight edge; other edges are L-shaped.
The closest connection between three nodes is the location of the Steiner node in a minimum-cost Steiner tree over the nodes. The closest connection between node v and edge e is the closest connection between v and the endpoints of e. Assume that a tree T is rooted at $n_0$. We define $T \setminus v$ to be the tree obtained by removing node v and its descendants from T and then removing all degree-2 Steiner nodes. We say that node $v \in T$ is connected to edge $e \in T \setminus v$ if its parent node in T is located on edge e (possibly at an endpoint of e). If parent(v) is located at the closest connection between v and edge $e \in T \setminus v$, then v makes a closest connection to edge e.
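The operation $T \setminus v$ can be sketched on a child-to-parent map. The representation and the function name are assumptions for illustration. Node v and all of its descendants are dropped first. Each Steiner node left with exactly one child, and hence degree 2, is then spliced out by linking that child to the Steiner node's parent.

```python
from typing import Dict, List, Set


def remove_subtree(parent: Dict[int, int], steiner: Set[int], v: int) -> Dict[int, int]:
    """Parent map of T \\ v: drop v and its descendants, then splice out
    Steiner nodes left with degree 2 (one parent, one child)."""
    children: Dict[int, List[int]] = {}
    for u, p in parent.items():
        children.setdefault(p, []).append(u)
    # Collect v together with all of its descendants.
    doomed, stack = set(), [v]
    while stack:
        u = stack.pop()
        doomed.add(u)
        stack.extend(children.get(u, []))
    par = {u: p for u, p in parent.items() if u not in doomed}
    # Repeatedly splice out a Steiner node that now has exactly one child.
    changed = True
    while changed:
        changed = False
        kids: Dict[int, List[int]] = {}
        for u, p in par.items():
            kids.setdefault(p, []).append(u)
        for s in steiner:
            if s in par and len(kids.get(s, [])) == 1:
                child = kids[s][0]
                par[child] = par.pop(s)  # child now hangs off s's parent
                changed = True
                break
    return par


# Source 0; Steiner node 5 under the source with sinks 1 and 2; sink 3 under the source.
print(remove_subtree({5: 0, 1: 5, 2: 5, 3: 0}, {5}, 2))  # {1: 0, 3: 0}
```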
Proof of Closest Connections in T
Lemma 1: Suppose node $a \in T$, $a \ne n_0$, is connected to edge $e \in T \setminus a$. Then either $\mathrm{parent}(a) = n_0$ or a makes a closest connection to e in T.
Proof Sketch: Let x = parent(a) and let c be the closest connection between node a and edge e = (p, b), as in Figure 4. For convenience we overload x, a, b, c, and p to also represent the edge lengths from p to these respective nodes or locations. It is easy to see that $x \le c$. Otherwise, moving x to c would reduce tree cost and reduce or leave unchanged all path lengths. For $p \le x \le c$, application of the Elmore formula shows that delay f is a concave function of x (see footnote 8). Consequently, f can only be minimized at the boundaries x = p or x = c. Further application of the Elmore formula shows that the capacitances of edge (p, d) and subtree $T_d$ do not affect the concavity of f for x between q and c. Hence $x \ne p$ (unless $p = n_0$), and so either x = c or $x = p = n_0$.
By itself, Lemma 1 is not sufficient to prove optimality of BB-SORT-C. In the tree in Figure 5, every node $v \ne n_0$ is either connected to $n_0$ or makes a closest connection to an edge in $T \setminus v$. However, this tree cannot be constructed by BB-SORT-C.
Hanan Grid Proof for T
Define a segment to be a contiguous set of straight edges in tree T which are either all horizontal or all vertical; a ...
Proof Sketch: Figure 7 shows that $|\mathrm{Far}(M)| \ge |\mathrm{Near}(M)|$. Let S be the smallest subsegment of M that has M's entry point $q_0$ as an endpoint and satisfies $|\mathrm{Far}(S)| < |\mathrm{Near}(S)|$. Then S can be shifted to S' as in the figure, reducing delay at some sinks while leaving delay at the others unchanged. Now suppose that $|\mathrm{Far}(M)| = |\mathrm{Near}(M)|$ and that no subsegment S of M containing $q_0$ has $|\mathrm{Far}(S)| < |\mathrm{Near}(S)|$. Then Figure 8 shows how M can be shifted to M' so as to reduce delay at some sinks without increasing delay at any others.
Footnote 8: We apply the Elmore formula for $t(n_j)$ to three cases of $n_j$: ...
Lemma 3: Any maximal segment in T must contain either a sink or the source.
Proof Sketch: (See Figure 9.) Let M be a maximal segment in T that contains no pin, chosen so that every MS topologically below M does contain a sink. Without loss of generality, assume that M is a vertical segment. Coordinates $x_1$ and $x_2$ in Figure 9 are the positions at which M would intersect nodes topologically below it in the tree (i.e., $p_1$ and $p_2$), and $x_0$ is the x-coordinate of M. Application of the Elmore formula shows that the delay function f is concave in $x_0$ between $x_1$ and $x_2$. Hence $x_0 = x_1$ or $x_0 = x_2$ in T. If $x_0 = x_1$, then either $p_1$ is a sink or there is another vertical MS through ...
An immediate corollary is a generalization of the classic result of Hanan [11] to the Elmore delay objective (see footnote 9).
Corollary 1: Any Steiner node in T is located on the Hanan grid.
Decomposition Theorem for T
To prove the optimality of BB-SORT-C, we must show that an optimal tree T can be constructed iteratively from the tree $T_0 = \{n_0\}$. The construction successively adds some ordering of the sinks $n_1, n_2, \ldots, n_k$ to create trees $T_1, T_2, \ldots, T_k = T$. Each $n_i$ makes a closest connection to some edge in tree $T_{i-1}$. We start with $T = T_k$ and successively "peel off" sinks. At each step, we find an interior node $q \in T_i$ whose children are all leaves and peel off one of q's children. Any of q's children may be peeled off except Pin(q). Pin(q) is defined so that pins peeled off later will still make a closest connection to some edge in the current tree (see [4]). Figure 10 gives an example of a possible pin ordering that could be used by the decomposition procedure.
Footnote 9: Hanan's original theorem may be viewed as a special case of this Corollary with the driver on-resistance $r_d \gg 1$.
In [4] we use this peeling decomposition to prove the following:
Theorem 1: There exists a sequence of subtrees $T_0 = \{n_0\}, T_1, T_2, \ldots, T_k = T$ such that for each i, $1 \le i \le k$, (i) there is a sink $n_i \in T_i$ such that $T_{i-1} = T_i \setminus n_i$, and (ii) either edge $(n_0, n_i) \in T_i$ or $n_i$ makes a closest connection to some edge in $T_{i-1}$.
Corollary 2: BB-SORT-C is optimal for any positive linear combination of sink delays.
4 Implications: Steiner ERTs Are Near-Optimal
We have implemented BB-SORT and BB-SORT-C in C on a Sun SPARC I ELC workstation. We compared them to the SERT and SERT-C heuristics of [2] and to the 1-Steiner algorithm of [12]. Our results use four typical IC and MCM technologies (...).
4.1 Near-Optimality of SERT-C
Table 2 compares the Elmore delay of trees constructed by the SERT-C algorithm with that of optimal Elmore delay trees found by BB-SORT-C, for each of the four technologies. Net sizes range from five to nine pins, limited by the exponential running time of BB-SORT-C. The table indicates how much room remains for future Steiner tree heuristics. Any further Elmore delay improvement will be limited to between 0.0% and 4.9% for 5-pin nets and between 0.1% and 15.8% for 9-pin nets.
Elmore-Optimality of "Generic" SERT Algorithm
The counterexample in Section 3.3, which shows that BB-SORT is not always optimal, was carefully constructed by hand. Even then, BB-SORT was only 0.06% above optimal. We therefore believe that BB-SORT is within one percent of optimal in essentially all cases. In Table 3 we compare SERT and 1-Steiner with the "SORT" trees of BB-SORT. The SERT constructions appear to be very nearly optimal. The worst case occurs for IC2 and IC3 with |N| = 9, where SERT delays are only 3.9% above those of BB-SORT.
Conclusions
Two main theoretical results show that the BB-SORT-C branch-and-bound method can be used to find Steiner trees that are optimal for any linear combination of sink Elmore delays. Our first result is a generalization of Hanan's theorem [11] to Elmore delay. We then establish a new decomposition theorem for optimal Elmore-delay trees. When the objective is to minimize the maximum Elmore delay in a net, we give a counterexample for which BB-SORT does not return the optimal tree. Nevertheless, we believe that BB-SORT will almost always return a tree well within one percent of optimal.
BB-SORT-C and BB-SORT may be used for routing small nets. A more far-reaching implication of our results lies in delineating the achievable space of performance-driven routing solutions. Our simulations for the SERT-C heuristic of [2] indicate that it is within 5% of optimal on average for 5-pin nets and within 16% on average for 9-pin nets. The "generic" SERT constructions appear to be even closer to optimal (within 1.5% for |N| = 5 and 4% for |N| = 9).
