Existing routing problems for delay minimization consider the connection of a single source node to a number of sink nodes, with the objective of minimizing the delay from the source to all sinks, or to a set of critical sinks. In this paper, we study the problem of routing nets with multiple sources, such as those found in signal busses. This new model assumes that each node in a net may be a source, a sink, or both. The objective is to optimize the routing topology to minimize the total weighted delay between all node pairs (or a subset of critical node pairs). We present a heuristic algorithm for the multiple-source performance-driven routing tree problem based on efficient construction of minimum-diameter minimum-cost Steiner trees. Experimental results on random nets with submicron CMOS IC and MCM technologies show average reductions of 12.6% and 21%, respectively, in the maximum interconnect delay, when compared with conventional minimum Steiner tree based topologies. Experimental results on multisource nets extracted from an Intel processor show as much as a 16.1% reduction in the maximum interconnect delay relative to the same conventional topologies.
Introduction
The competitive nature of the VLSI industry has created a strong demand for techniques to improve the performance of integrated circuits. Methods to increase speed, and to reduce area or power consumption, are of great interest.
Scaling of device dimensions has resulted in changes to many fundamental design goals: where previously the bulk of system delay was generated by the switching times of devices, it is now common for the interconnecting wires between devices to account for the dominant portion of the delay. These changes have created new areas in need of optimization, and new measures by which we gauge solution quality.
With smaller minimum feature size comes a reduction in transistor channel width and length, resulting in relatively constant transistor on-resistance; the reduction in wire width, on the other hand, results in higher unit wire resistance [2]. As a result, the resistance ratio [8], defined to be the driver resistance divided by the unit-length wire resistance, is reduced significantly. This shift produces a situation where the path between a driver and a sink can have resistance comparable to that of the transistor channel. Thus, changes to the interconnect length and topology can have a significant impact on delay. The result in [11] showed convincingly that interconnect topology optimization has a considerable effect on interconnect delay reduction when the resistance ratio is small.
This work is partially supported by DARPA/ITO under Contract J-FBI-93-112, NSF Young Investigator Award MIP9357582, and a grant from Intel Corporation.
A number of optimized interconnect topologies have been proposed, including bounded-radius bounded-cost trees [9], AHHK trees [1], LAST trees [21], maximum performance trees [7], A-trees [11], low-delay trees [5], and IDW/CFD trees [18]. These methods consider both the traditional concern of low total wire length and the path length or Elmore delay between the source node and the timing-critical sink nodes.
Although many of these methods effectively reduce the interconnect delay, all of them assume that there is a single source node driving one or more sink nodes, and they minimize the delay from the unique source to all sinks, or to a set of critical sinks.
In practice, many timing-critical nets may have multiple sources, each controlled by a tri-state gate and driving the net at a different time. Signal busses are instances of such nets. In these cases, the existing performance-driven routing algorithms for single-source nets may perform poorly, as a topology optimized for one source may result in high interconnect delay when some other source becomes active. Figure 1 presents a pair of four-node routing trees with the same wire length. The first routing tree, optimized for node $p_1$, has relatively high delay when node $p_2$ drives the net. The second routing tree provides a lower overall maximum delay when all four nodes might be sources or sinks. Delay times with respect to the driving nodes are shown in Table 1.
Note that the second routing tree, which minimizes the maximum linear delay, does not fall entirely on the Hanan grid [16]. For the single-source model under Elmore delay, [4] showed that an optimal tree which minimizes the maximum delay to any sink may not be contained in the Hanan grid, but also observed that such cases were rare. For problems with multiple sources, a solution restricted to the Hanan grid may be far from optimal, as shown in Figure 1. Therefore, we cannot restrict our search for solutions to this grid.
Table 1: Topology effects on delay. For a net with multiple sources, the delay to a given sink depends on which node drives the net.
In this paper, we study the problem of routing nets with multiple sources. This new model assumes that each node in a net may be a source, a sink, or both. The objective is to optimize the routing topology to minimize the total weighted delay between all node pairs, where the weight between a node pair indicates the priority of delay minimization between this pair of nodes. We present an algorithm for the performance-driven multiple source routing tree problem based on the construction of minimum diameter A-trees. Some preliminary results of our work were presented at ISCAS '95 [12].
Problem Formulation
Given a set of points $P = \{p_1, p_2, \ldots, p_n\}$ on the Manhattan plane, and a non-negative weight $W(p_i, p_j)$ between each source $p_i$ and sink $p_j$ indicating the timing criticality of that pair, the performance-driven multiple source routing tree (PD-MSRT) problem is defined as finding a Steiner tree $T$ which connects all points in $P$ and minimizes the following two objectives:
Total weighted delay $WD(T)$ between pairs of nodes $p_i$ and $p_j$, i.e., $WD(T) = \sum_{p_i, p_j} W(p_i, p_j) \cdot delay(p_i, p_j)$.
Total tree length $L(T)$, defined as the sum of the lengths of all tree edges.
We assume that the first objective has higher priority than the second one. For simplicity, one may assume that $W(p_i, p_j) \in [0, 1]$, where non-critical pairs of points have weight zero and critical pairs have weight one. Values between 0 and 1 provide a greater degree of freedom in "tuning" for performance optimization, although the heuristic presented here can make only limited use of this. The delay between a pair of points, $delay(p_i, p_j)$, may be estimated using an appropriate model, such as the linear delay model (where delay is proportional to path length) or the Elmore delay model [15], or calculated using SPICE.
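The Elmore delay model [15] is the option above that makes delay depend on which node drives the net. The sketch below is a minimal illustration, not the simulation setup used in this paper: it assumes a π-model per wire, the same driver resistance `Rd` for whichever source is active, and hypothetical unit parameters `r` and `c`; the adjacency format and node names are also illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical adjacency format: node -> [(neighbor, wire length)].
Adj = Dict[str, List[Tuple[str, float]]]


def elmore_delay(adj: Adj, src: str, sink: str, Rd: float, r: float,
                 c: float, load: Dict[str, float]) -> float:
    """Elmore delay from the driver at `src` to `sink` in an RC tree.

    Each wire of length L is a pi-segment with resistance r*L and
    capacitance c*L split equally between its two ends; Rd is the active
    driver's resistance and load[n] is the pin capacitance at node n.
    """
    # Root the tree at the active source (BFS lists parents before children).
    parent: Dict[str, Optional[str]] = {src: None}
    wire: Dict[str, float] = {src: 0.0}
    order: List[str] = []
    queue = deque([src])
    while queue:
        n = queue.popleft()
        order.append(n)
        for m, length in adj[n]:
            if m not in parent:
                parent[m] = n
                wire[m] = length
                queue.append(m)
    # Total capacitance below each node, accumulated children-first.
    sub = {n: load.get(n, 0.0) for n in order}
    for n in reversed(order):
        p = parent[n]
        if p is not None:
            sub[p] += c * wire[n] + sub[n]
    # The driver charges everything; each wire on the path charges what lies below it.
    delay = Rd * sub[src]
    n = sink
    while parent[n] is not None:
        L = wire[n]
        delay += r * L * (c * L / 2 + sub[n])
        n = parent[n]
    return delay


# Toy tree: a Steiner node s joins a and b (1 unit away) and c (4 units away).
adj: Adj = {"a": [("s", 1.0)], "b": [("s", 1.0)], "c": [("s", 4.0)],
            "s": [("a", 1.0), ("b", 1.0), ("c", 4.0)]}
print(elmore_delay(adj, "a", "c", Rd=1.0, r=1.0, c=1.0, load={}))  # 19.5
print(elmore_delay(adj, "c", "a", Rd=1.0, r=1.0, c=1.0, load={}))  # 22.5
```

With unit parameters, driving from `a` to `c` gives 19.5 while driving from `c` to `a` gives 22.5, so the two directions of a pair generally need separate entries in the weighted sum.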
Given a point $p_i \in P$, we use $(x_i, y_i)$ to denote the x and y coordinates of $p_i$. We will utilize an additional point $q$ in some proofs, and denote its location by $(x_q, y_q)$. For any two points $p_i$ and $p_j$, we define the distance $d(p_i, p_j)$ between them as their Manhattan distance, $|x_i - x_j| + |y_i - y_j|$. Given a tree $T$, we define the distance between nodes $p_i$ and $p_j$ in $T$, $d_T(p_i, p_j)$, as the sum of edge lengths along the unique path between them. The diameter of tree $T$ over a set of points $P$, $D_T(P)$, is defined as the maximum $d_T(p_i, p_j)$ over all pairs $p_i$, $p_j$. Given a point set $P$, we define the diameter $D(P)$ of the set to be the maximum distance between any pair of points in the set. Clearly, we always have $D_T(P) \ge D(P)$.
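The objectives and distance notation above can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not code from this work: a tree is given as a dictionary of node coordinates plus an edge list, every edge is assumed to be routed as a shortest rectilinear path, and delay follows the linear model, so $delay(p_i, p_j) = d_T(p_i, p_j)$. All function and node names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Tuple

Point = Tuple[float, float]


def d(p: Point, q: Point) -> float:
    """Manhattan distance |x_i - x_j| + |y_i - y_j|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def tree_dists(coords: Dict[str, Point], edges, src: str) -> Dict[str, float]:
    """d_T(src, n) for every node n; each edge is assumed to be routed as a
    shortest rectilinear path, so its length is d() of its endpoints."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {src: 0.0}
    stack = [src]
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in dist:
                dist[m] = dist[n] + d(coords[n], coords[m])
                stack.append(m)
    return dist


def tree_length(coords, edges) -> float:
    """L(T): the sum of edge lengths."""
    return sum(d(coords[a], coords[b]) for a, b in edges)


def weighted_delay(coords, edges, W: Dict[Tuple[str, str], float]) -> float:
    """WD(T) under the linear delay model, taking delay = d_T."""
    cache: Dict[str, Dict[str, float]] = {}
    total = 0.0
    for (s, t), w in W.items():
        if s not in cache:
            cache[s] = tree_dists(coords, edges, s)
        total += w * cache[s][t]
    return total


def tree_diameter(coords, edges, terminals) -> float:
    """D_T(P): the largest tree distance between two terminals."""
    return max(tree_dists(coords, edges, a)[b] for a, b in permutations(terminals, 2))


def set_diameter(coords, terminals) -> float:
    """D(P): the largest Manhattan distance between two terminals."""
    return max(d(coords[a], coords[b]) for a, b in permutations(terminals, 2))


coords = {"p1": (0, 0), "p2": (4, 0), "p3": (2, 3), "s": (2, 0)}  # s: Steiner point
star = [("p1", "s"), ("p2", "s"), ("p3", "s")]
chain = [("p1", "p3"), ("p3", "p2")]
P = ["p1", "p2", "p3"]
print(tree_length(coords, star), weighted_delay(coords, star, {("p1", "p2"): 1, ("p2", "p3"): 1}))
print(set_diameter(coords, P), tree_diameter(coords, star, P), tree_diameter(coords, chain, P))
```

For the Steiner star, $L(T) = 7$ and $WD(T) = 4 + 5 = 9$. The chain topology shows why $D_T(P) \ge D(P)$ can be strict: the three points have $D(P) = 5$, but routing $p_1$ to $p_2$ through $p_3$ gives a tree diameter of 10.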
Note that if the weights of all pairs are zero, the PD-MSRT problem as we have formulated it becomes the classical minimum Steiner tree problem, which is NP-hard. Therefore, the PD-MSRT problem is also NP-hard.
However, if we do not minimize the total wire length, and only wish to minimize the total weighted Elmore delay, the complexity of the problem is not known.
When the delay bound for each pair of timing-critical nodes is given, one can also formulate the constrained multiple source routing tree problem as finding a Steiner routing tree which satisfies the delay constraint between every timing-critical pair and minimizes the total tree length.
In the following, we consider a simplified version of the general problem. We treat the sources and sinks of the routing problem as nodes in a graph, or points on a plane, and restrict path weights to $\{0, 1\}$. Our approach to this problem is through the construction of minimum diameter trees with minimized total wire length. The analysis in [11] indicated that total wirelength minimization under a shortest path constraint is an appropriate objective for single-source routing problems in submicron design. We use that result as the motivation for our diameter-based objectives.
Minimum Diameter Tree Construction
For our algorithm, we minimize the maximum path length between any critical source-sink pair, in order to minimize the maximum linear delay between such pairs.
In the case of a single driver, such minimization can be obtained by radius minimization, with direct (shortest) paths between the driver and all sink nodes. Shortest path trees rooted at the source achieve this goal. A number of works address radius objectives, both for general path length minimization and for skew minimization in clock nets [13, 3, 6, 20]. A minimum-radius construction with a suitable root point may also be a minimum diameter construction.
When there are multiple sources and sinks, path length minimization can be achieved by minimizing the maximum distance between any pair of nodes, which leads to diameter minimization. Our goal is to construct a minimum diameter routing tree with minimum total tree cost, as measured by a combination of maximum path length, average path length, and total tree length.
A number of results for minimum-diameter trees in the Euclidean plane were presented in [17]. In particular, it was shown that the diameter of the smallest enclosing circle for a set of points also gives the minimum diameter for a tree connecting those points. After determination of this minimum diameter circle, a star topology connecting the center of the circle to each point in the set was shown to have the minimum diameter possible of any Steiner tree over the points. We follow their general approach, but address the Manhattan plane and also pursue tree length minimization.
The works in [13, 3, 6, 20] can be used to construct minimum diameter trees, but they are concerned mainly with skew minimization rather than total tree length minimization. The Manhattan minimum diameter Steiner tree problem has not been explicitly studied in the literature. Our work studies the construction of minimum diameter Steiner trees in the Manhattan plane with minimized tree length.
In the next two subsections, we discuss general constraints related to Manhattan minimum diameter trees, present a pair of facts which give the lower bound for tree diameter, and then present a simple method, the Minimum Diameter A-Tree (MD A-Tree) algorithm, to construct trees that obtain this lower bound. A third subsection presents the Minimum Cost Minimum Diameter A-Tree (MC MD A-Tree) algorithm, which provides a method to optimize the tree construction.
Manhattan Tree Diameter Minimization
We define a tilted rectangle (TR) as a region defined by a rectangle with sides at 45-degree angles with respect to the X and Y axes. Such a region may be defined by a set of four equations, named boundary equations. In the Manhattan plane, the analog of the Euclidean circle is the tilted square (TS): the set of points $p$ such that $d(p, c) \le D/2$ for some center point $c$ and diameter $D$. A TS is a special case of a TR, where the distances between opposite sides are equal. Obviously, the maximum distance between any pair of points contained in a TS of diameter $D$ is less than or equal to $D$. The distance between points on opposite sides of a TS will be $D$.
Given a point set P, STR(P) and STS(P) are the smallest tilted rectangle and a smallest tilted square enclosing P, respectively. Clearly, STS(P) contains STR(P). Figure 2 shows an STR and two STSs for a point set. Note that STR(P) is unique, but there may be a set of STSs for a given point set.
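Tilted rectangles and squares become ordinary axis-aligned boxes after the 45-degree change of coordinates $u = x + y$, $v = x - y$, because the Manhattan distance equals $\max(|\Delta u|, |\Delta v|)$. The sketch below uses that identity (a standard trick, not something spelled out in the text) to compute STR(P), D(P), and the segment of STS centers in linear time; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rotate(p: Point) -> Tuple[float, float]:
    """(x, y) -> (u, v) = (x + y, x - y). Manhattan distance in (x, y) equals
    max(|du|, |dv|) in (u, v), so tilted shapes become axis-aligned boxes."""
    return p[0] + p[1], p[0] - p[1]


def unrotate(u: float, v: float) -> Point:
    return (u + v) / 2, (u - v) / 2


def str_bounds(points: List[Point]) -> Tuple[float, float, float, float]:
    """STR(P) as the bounding box [umin, umax] x [vmin, vmax] in (u, v)."""
    us, vs = zip(*(rotate(p) for p in points))
    return min(us), max(us), min(vs), max(vs)


def diameter(points: List[Point]) -> float:
    """D(P) in O(n): the longer side of STR(P), i.e. the side of any STS(P)."""
    umin, umax, vmin, vmax = str_bounds(points)
    return max(umax - umin, vmax - vmin)


def sts_center_segment(points: List[Point]) -> Tuple[Point, Point]:
    """Endpoints of the diagonal segment of STS centers (equal if unique)."""
    umin, umax, vmin, vmax = str_bounds(points)
    h = diameter(points) / 2
    if umax - umin >= vmax - vmin:      # u extent fixed, square may slide along v
        uc = (umin + umax) / 2
        ends = ((uc, vmax - h), (uc, vmin + h))
    else:                               # v extent fixed, square may slide along u
        vc = (vmin + vmax) / 2
        ends = ((umax - h, vc), (umin + h, vc))
    return unrotate(*ends[0]), unrotate(*ends[1])


pts = [(0, 0), (4, 0), (0, 2)]
print(diameter(pts))            # 6
print(sts_center_segment(pts))  # ((1.0, 0.0), (2.0, 1.0))
```

For these three points, $D(P) = 6$, and every point on the diagonal segment from (1, 0) to (2, 1) is the center of a smallest tilted square, which illustrates why an STS need not be unique.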
Fact 1 For a point set P, the diameter of the point set D(P) is equal to the diameter D of an STS containing the points.
Fact 2 A shortest path tree T rooted at the center c of an STS for a point set P is a minimum diameter tree.
Both of these facts can be derived from earlier work on zero-skew clock routing [13, 14, 3].
Minimum Diameter A-Tree Algorithm
The algorithm presented in this paper for Manhattan minimum diameter tree construction consists of two basic steps (as shown in Figure 3). The first step is to identify the root point $r$ of the tree, which could be the center of an STS, or some other point obtained by the methods we describe in Section 3.3.4. The second step is to construct a shortest path tree rooted at $r$ with low tree length. A rectilinear shortest path Steiner tree is also called a rectilinear arborescence [22], or an A-tree for short. Since the A-Tree algorithm [11] has proven to be very effective in generating a shortest path Steiner tree on the Manhattan plane with near-minimum total wirelength, it is used in the second step of our algorithm.
Alternatively, for large problems where run time is a consideration, a simpler heuristic by Rao et al. [22] (which was the basis for the A-Tree algorithm) may be used instead of the A-Tree algorithm.
The most basic variation of our algorithm is called the Minimum Diameter A-Tree (MD A-Tree) algorithm. It simply computes an STS for the point set $P$ and then constructs an A-tree rooted at the center of that STS. In an A-tree $T$ rooted at point $r$, $d_T(p_i, r) = d(p_i, r)$ for every point $p_i \in P$, as the routing within the tree is guaranteed to follow a shortest path. As Fact 2 indicates, the resulting routing tree will have minimum diameter.
1. Identify a root point $r$ for the point set $P$.
2. Construct a shortest path tree connecting the root point $r$ with each point in $P$.
As was noted in Figure 2, the STS of a set of points is not necessarily unique, and so there may be a number of acceptable "root" points for the center of a minimum diameter tree. The set of centers of STSs forms a diagonal line, as has been observed previously [20]. It is not always necessary to place the root of the A-tree at the center of an STS, however. This freedom leads to an optimization approach that is discussed in the next subsection.
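The two steps of the MD A-Tree algorithm can be sketched as follows. This is a hedged illustration: the real second step is the A-Tree algorithm [11], which is not reproduced here; a star of independent shortest paths stands in for it, which keeps the shortest-path property (and therefore the minimum diameter of Fact 2) but not the low wire length. The chosen root is the center of STR(P) in the rotated coordinates $u = x + y$, $v = x - y$, which is always one of the STS centers; all names are hypothetical.

```python
import heapq
from typing import List, Tuple

Point = Tuple[float, float]


def d(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def sts_center(points: List[Point]) -> Point:
    """Center of STR(P) in (u, v) = (x + y, x - y); this point is always the
    center of some smallest tilted square."""
    us = [x + y for x, y in points]
    vs = [x - y for x, y in points]
    uc = (min(us) + max(us)) / 2
    vc = (min(vs) + max(vs)) / 2
    return (uc + vc) / 2, (uc - vc) / 2


def md_star_tree(points: List[Point]):
    """Step 1: choose the root. Step 2: connect each point to the root by its
    own shortest rectilinear path. This star is a stand-in for the A-Tree
    algorithm [11]: still a shortest-path tree, but it shares no wire."""
    r = sts_center(points)
    return r, [(r, p) for p in points]


def star_diameter(r: Point, points: List[Point]) -> float:
    """Tree diameter of the star: the two longest root-to-point paths."""
    a, b = heapq.nlargest(2, (d(p, r) for p in points))
    return a + b


pts = [(0, 0), (4, 0), (0, 2), (3, 5)]
root, edges = md_star_tree(pts)
print(root, star_diameter(root, pts))  # (2.5, 1.5) 8.0
```

Every point lies within $D/2$ of the root, so the star's diameter can never exceed $D(P) = 8$, and the triangle inequality keeps it from dropping below that value.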
Minimum Cost Minimum Diameter A-Tree Algorithm
By using an A-tree construction, we ensure that the tree distance between any point and the root is equal to the Manhattan distance. It is then easy to see that as long as the root point $r$ satisfies the constraint $d(p_i, r) + d(r, p_j) \le D$ for all distinct $p_i$ and $p_j$, the tree will have minimum diameter. This freedom allows for further optimization, with the possibility of reductions in tree length and weighted path length. An example of such an instance for the Manhattan plane is given in Figure 4. By shifting the "center" point slightly, a reduction in tree length is obtained without an increase in the maximum diameter of the tree.
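The root condition is cheap to test for a single candidate point. The sketch below (hypothetical helper names, with $D$ supplied by the caller) relies on the observation that the worst distinct pair is formed by the two points farthest from $r$, so one linear pass replaces the pairwise check.

```python
import heapq
from typing import List, Tuple

Point = Tuple[float, float]


def d(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def is_feasible_root(r: Point, points: List[Point], D: float, eps: float = 1e-9) -> bool:
    """True if d(p_i, r) + d(r, p_j) <= D for all distinct p_i, p_j.

    The largest such sum over distinct pairs is the sum of the two largest
    distances to r, so one O(n) pass replaces the O(n^2) pairwise check.
    """
    if len(points) < 2:
        return True
    a, b = heapq.nlargest(2, (d(p, r) for p in points))
    return a + b <= D + eps


pts = [(0, 0), (4, 0), (0, 2)]   # D(P) = 6
for r in [(1, 0), (2, 1), (3, 0)]:
    print(r, is_feasible_root(r, pts, 6))
```

Here (1, 0) and the STS center (2, 1) pass, while (3, 0) fails because $d((3,0),(0,2)) + d((3,0),(0,0)) = 5 + 3 = 8 > 6$.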
Feasible Region
As more than one location can serve as the root of a minimum diameter tree, we would like to compute this region precisely.
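We define the feasible region (FR) of a set of points $P$ as
$FR(P) = \{\, r : d(p_i, r) + d(r, p_j) \le D(P) \text{ for all distinct } p_i, p_j \in P \,\}$.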
If only a subset of point pairs are critical, we can define $P_c \subseteq P$ as the subset of points which are part of a non-zero weighting, i.e., $P_c = \{p_i \in P \mid W(p_i, p_j) \ne 0 \lor W(p_j, p_i) \ne 0 \text{ for some } p_j\}$. $D_c$ is defined to be the diameter of point set $P_c$. We then define the critical feasible region ($FR_c$) of $P_c$ as
$FR_c(P) = \{\, r : d(p_i, r) + d(r, p_j) \le D_c \text{ for all distinct } p_i, p_j \in P_c \,\}$.
The diameter bound $D_c$ for $FR_c$ is less than or equal to the diameter bound $D$ for the FR. Note that $FR_c$ does not place constraints on path lengths between non-critical pairs, so the path length between a non-critical pair may be greater than $D_c$. When all node pairs are critical, the FR and $FR_c$ are equivalent, but in general the two regions are not equivalent and may even be disjoint.
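The critical set and $D_c$ follow directly from the weight table. The sketch assumes weights stored in a dictionary keyed by (source, sink) name pairs and applies the FR condition restricted to $P_c$ with bound $D_c$, as in the definition above; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def d(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def critical_points(W: Dict[Tuple[str, str], float]) -> Set[str]:
    """P_c: every point that appears in at least one pair with non-zero weight."""
    return {p for (a, b), w in W.items() if w != 0 for p in (a, b)}


def critical_diameter(coords: Dict[str, Point], crit: Set[str]) -> float:
    """D_c: the diameter of P_c (0 if P_c has fewer than two points)."""
    return max((d(coords[a], coords[b]) for a, b in combinations(sorted(crit), 2)), default=0)


def in_fr_c(r: Point, coords: Dict[str, Point], crit: Set[str], eps: float = 1e-9) -> bool:
    """Membership in FR_c: the two farthest critical points must fit within D_c."""
    dists = sorted((d(coords[p], r) for p in crit), reverse=True)
    return len(dists) < 2 or dists[0] + dists[1] <= critical_diameter(coords, crit) + eps


coords = {"a": (0, 0), "b": (4, 0), "c": (0, 2), "e": (10, 10)}
W = {("a", "b"): 1, ("b", "a"): 1, ("c", "e"): 0}
crit = critical_points(W)
print(sorted(crit), critical_diameter(coords, crit))                  # ['a', 'b'] 4
print(in_fr_c((2, 0), coords, crit), in_fr_c((2, 1), coords, crit))  # True False
```

The distant point `e` carries only zero weights, so it is excluded from $P_c$: $D_c = 4$ while $D(P) = 20$, and the root (2, 0) lies in $FR_c$.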
In the Euclidean plane, the constraint $d(p_i, r) + d(p_j, r) \le D$ defines an ellipse, with points $p_i$ and $p_j$ as foci. The feasible regions can be formed simply by intersecting a set of ellipses. A similar property holds for the Manhattan plane.
We define an octilinear segment to be a segment that is either horizontal, vertical, or has slope $\pm 1$. An example is shown in Figure 5. An octilinear region (OR) is defined to be a convex region that is bounded by no more than eight octilinear segments; it can also be represented by the intersection of no more than eight octilinear bound equations. For the following, we will utilize this property to find the intersection of a number of ORs.
Figure 4: … is at the center of the STS; point $r_2$ is within the feasible region for the set of points. Both trees have a maximum diameter of 12, but the tree rooted at $r_2$, a point which is not the center of an STS, has lower tree length.
In order to determine the set of points which may serve as the center of a minimum diameter tree, we find the intersection of all OEs for the point pairs. As the OEs are convex, their intersection (and hence the feasible region) will also be convex. There are $O(n^2)$ point pairs, resulting in the same number of OEs. It can be shown that the intersection of any number of OEs can be represented by a single OR. The shaded OR in Figure 6 shows the feasible region for the root of a shortest path tree that will result in a minimum diameter tree.
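The pairwise regions can be built explicitly. The sketch relies on an identity that is easy to verify though not stated in the text: $d(p_i, r) + d(r, p_j)$ equals $d(p_i, p_j)$ plus twice the L1 distance from $r$ to the bounding box of $p_i$ and $p_j$, so each pairwise region is that box grown by a diamond of radius $(D - d(p_i, p_j))/2$, whose sides are horizontal, vertical, or diagonal. Intersecting the regions then amounts to keeping the tightest of eight bounds (on $x$, $y$, $x + y$ and $x - y$). This is the straightforward $O(n^2)$ construction; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Region = Dict[str, Tuple[float, float]]   # (lower, upper) bounds on x, y, x+y and x-y


def d(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def pair_region(p: Point, q: Point, D: float) -> Region:
    """{r : d(p, r) + d(r, q) <= D} as eight octilinear bounds.

    d(p, r) + d(r, q) = d(p, q) + 2 * (L1 distance from r to the bounding
    box of p and q), so the region is that box grown by a diamond of radius
    s = (D - d(p, q)) / 2.
    """
    s = (D - d(p, q)) / 2
    xl, xh = min(p[0], q[0]), max(p[0], q[0])
    yl, yh = min(p[1], q[1]), max(p[1], q[1])
    return {"x": (xl - s, xh + s), "y": (yl - s, yh + s),
            "x+y": (xl + yl - s, xh + yh + s), "x-y": (xl - yh - s, xh - yl + s)}


def feasible_region(points: List[Point]) -> Region:
    """FR(P) by intersecting all O(n^2) pairwise regions: keep the tightest bounds."""
    D = max(d(p, q) for p, q in combinations(points, 2))
    inf = float("inf")
    fr: Region = {k: (-inf, inf) for k in ("x", "y", "x+y", "x-y")}
    for p, q in combinations(points, 2):
        for k, (lo, hi) in pair_region(p, q, D).items():
            fr[k] = (max(fr[k][0], lo), min(fr[k][1], hi))
    return fr


def contains(fr: Region, r: Point, eps: float = 1e-9) -> bool:
    vals = {"x": r[0], "y": r[1], "x+y": r[0] + r[1], "x-y": r[0] - r[1]}
    return all(lo - eps <= vals[k] <= hi + eps for k, (lo, hi) in fr.items())


pts = [(0, 0), (4, 0), (0, 2)]
fr = feasible_region(pts)
print(fr)  # bounds: x [0, 2], y [0, 1], x+y [0, 4], x-y [-1, 2]
```

Because every pairwise region uses the same eight directions, their intersection is again one octilinear region, which matches the statement that any number of OEs can be represented by a single OR.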
Theorem 1 The $FR(P)$ and $FR_c(P)$ for any set of points $P$ and any set of critical pairs are non-empty.
Proof: Fact 1 showed that the minimum diameter of a tree connecting the points is equal to the diameter of an STS; let $c$ be the center of such an STS. Fact 2 showed that a shortest path tree rooted at $c$ has minimum diameter $D$, so $d(p_i, c) + d(c, p_j) \le D$ for every pair $p_i$, $p_j$ (for $FR_c(P)$, the same argument applies to $P_c$ and $D_c$). As this point is contained in each OE, it is also contained in their intersection, and therefore $FR(P)$ and $FR_c(P)$ are non-empty.
□
Theorem 2 Any shortest-path tree rooted at a point $r \in FR(P)$ is a minimum diameter tree. Any shortest-path tree rooted at a point $r \in FR_c(P)$ is a minimum diameter tree over the critical points.
Proof: This arises directly from the definitions of $FR(P)$ and $FR_c(P)$. For any point $r$ in the feasible region, $d(p_i, r) + d(r, p_j) \le D$ for all pairs $p_i$ and $p_j$; a shortest path tree $T$ rooted at $r$ ensures that $d_T(p_i, r) = d(p_i, r)$, so $d_T(p_i, p_j) \le d_T(p_i, r) + d_T(r, p_j) \le D$. □
As $FR(P)$ and $FR_c(P)$ are non-empty, and points within these sets allow for the construction of minimum diameter trees, we will use them to guide our search for root points of low-cost trees (in terms of path length or tree length).
Clearly, $FR(P)$ and $FR_c(P)$ can be constructed in $O(n^2)$ time by simply intersecting the $O(n^2)$ OEs formed by all point pairs. In cases where $n$ is large, it may be desirable to use a low-complexity method to construct the shortest path tree (e.g., [22]). In the next subsection, we present a method for linear-time computation of these regions, which prevents feasible region construction from dominating the run time.
Note that in general, the feasible region defines an area. The final "merging segment" obtained by the planar zero-skew clock routing algorithm of Kahng and Tsao [20] may be a subset of the feasible region.
Linear Time Computation of Feasible Region
In this section we show how to compute $FR(P)$ in linear time; $FR_c(P)$ can be computed similarly by considering only critical points. We approach the problem by determining which pairs of points generate the most constrictive bounds for each of the eight octilinear bound equations that may define the feasible region. We consider two cases: one for the uppermost horizontal bound of $FR(P)$, $EQ_N$, and one for the upper right diagonal bound, $EQ_{NE}$. Other bounds may be obtained by similar methods. Pseudocode for the algorithms to compute these bounds is given in Figure 7. Proofs that these algorithms are correct are given in the next two lemmas.
Figure 7: Pseudocode for algorithms which find the boundaries of the feasible region in linear time. Other bounds may be obtained by simply replacing the smallest or largest value criteria with those specified in Tables 3 and 4.
Lemma 1 For the two points $p_i, p_k \in P$, $x_i \le x_k$, which form the most constrictive upper horizontal bound $EQ_N$ of $FR(P)$, the point $p_i$ will have the minimal $x_i + y_i$ value, and the point $p_k$ will have the maximal $x_k - y_k$ value.
Proof: Let $p_i$ and $p_k$ be the two points which generate the most constrictive upper horizontal bound $EQ_N$, with $x_i \le x_k$. Consider Figure 8.
… By a similar set of arguments, it is clear that the $p_k$ element of the pair must have the largest $x_k - y_k$ value. In some instances, the point with the smallest $X + Y$ and the point with the largest $X - Y$ may be the same; in this case, that point will be either a "$p_i$" or a "$p_k$". We therefore also consider the points with the second smallest $x_j + y_j$ and the second largest $x_l - y_l$, with the most constrictive bound being formed from the point that both criteria select and one of the secondary points.
Lemma 2 The two points $p_i, p_j \in P$ which form the most constrictive upper right diagonal bound $EQ_{NE}$ of $FR(P)$ will have the two smallest $x_i + y_i$ and $x_j + y_j$ values.
Proof: Assume $p_i$ and $p_j$ are the point pair which forms the most constrictive upper right diagonal bound $EQ_{NE}$ of $FR(P)$.
Consider Figure 9. Assume that we shift the input point set so that it is contained in the first quadrant (this will not change the shape of the feasible region, or the points which provide the bounds). We label the intersection of $EQ_{NE}$ and the X axis as point $q$; the bound $EQ_{NE}$ which generates the minimal $q$ will also generate the most constrictive bound on the FR.
Without loss of generality, we assume that point $p_i$ is to the left of $p_j$. For this proof, we will assume that $y_i \le y_j$, the other case being solved similarly. We now have the following set of equations.
Figure 9: Computation of the upper diagonal bound of the feasible region. The most constrictive bound on $FR(P)$ will be generated by the linear inequality of $OE(p_i, p_j)$ that intersects the X axis at the minimum $q$.
Theorem 3 $FR(P)$ and $FR_c(P)$ can be computed in $O(n)$ time, where $n$ is the number of points in set $P$.
Proof: Fact 1 gave a linear-time method to obtain the diameter $D(P)$, while Lemmas 1 and 2 gave linear-time methods to identify the pairs of points which generate the most constrictive bounds for $EQ_N$ and $EQ_{NE}$.
Other bounds can be obtained in a similar manner, using the criteria shown in Tables 3 and 4. □
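The closed forms below are one way to realize the linear-time computation; they are derived from the pairwise regions (each the bounding box of $p_i, p_j$ grown by a diamond of radius $(D - d(p_i, p_j))/2$) rather than taken from Figure 7, whose pseudocode is not reproduced here. Expanding each pairwise bound shows, for example, that the bound on $x + y$ is $(x_i + y_i + x_j + y_j + D)/2$, so it is tightest for the two smallest $x + y$ values (Lemma 2), while the upper bound on $y$ is $D/2 + ((x_i + y_i) - (x_k - y_k))/2$ for $x_i \le x_k$ (Lemma 1). Every bound therefore needs only the two smallest or two largest values of $u = x + y$ and $v = x - y$; helper names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _min_distinct_sum(a: List[float], b: List[float]) -> float:
    """min over i != j of a[i] + b[j] in O(n); needs at least two entries."""
    (i1, a1), (_, a2) = heapq.nsmallest(2, enumerate(a), key=lambda t: t[1])
    (j1, b1), (_, b2) = heapq.nsmallest(2, enumerate(b), key=lambda t: t[1])
    if i1 != j1:
        return a1 + b1
    return min(a1 + b2, a2 + b1)  # both minima sit on the same point


def feasible_region_linear(points: List[Point]) -> Dict[str, Tuple[float, float]]:
    """All eight bounds of FR(P) from extreme values of u = x + y and v = x - y."""
    u = [x + y for x, y in points]
    v = [x - y for x, y in points]
    D = max(max(u) - min(u), max(v) - min(v))        # D(P), as in Fact 1
    nu, nv = [-t for t in u], [-t for t in v]
    return {
        # Diagonal bounds: the two smallest (or largest) u or v values (Lemma 2).
        "x+y": ((sum(heapq.nlargest(2, u)) - D) / 2, (sum(heapq.nsmallest(2, u)) + D) / 2),
        "x-y": ((sum(heapq.nlargest(2, v)) - D) / 2, (sum(heapq.nsmallest(2, v)) + D) / 2),
        # Horizontal/vertical bounds: one point's u with another point's v (Lemma 1).
        "y": ((-_min_distinct_sum(nu, v) - D) / 2, (D + _min_distinct_sum(u, nv)) / 2),
        "x": ((-_min_distinct_sum(nu, nv) - D) / 2, (D + _min_distinct_sum(u, v)) / 2),
    }


print(feasible_region_linear([(0, 0), (4, 0), (0, 2)]))
```

On the three-point example this returns the same region as the pairwise construction: $x \in [0, 2]$, $y \in [0, 1]$, $x + y \in [0, 4]$, $x - y \in [-1, 2]$.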
Necessity of Feasible Region
The lemmas given above for the linear time construction of the feasible region are useful in proving another property of the feasible region: any minimum diameter tree must intersect FR(P). To prove this, we will need an additional lemma.
Lemma 3 The upper horizontal bound $EQ_N$ of $FR(P)$ will intersect $FR(P)$.
Proof: Consider Figure 10. Let $p_i$ and $p_j$ be the two points with the smallest and second smallest $X + Y$ values, respectively; Lemma 2 showed that these points generate the most constrictive upper right diagonal bound $EQ_{NE}$. Similarly, the points which define the upper left diagonal bound $EQ_{NW}$ are labeled $p_k$ and $p_l$. We identify the intersection of these two bounding lines as the point $q$, resulting in … □
Similarly, the lower horizontal bound $EQ_S$ and the two vertical bounds $EQ_E$ and $EQ_W$ also intersect the feasible region (if only at a single point). The diagonal bounds cannot restrict the feasible region away from a horizontal or vertical bound (leaving a "gap" between the feasible region and one of these bounds).
Theorem 4 Any minimum diameter tree connecting a set of points P must intersect FR(P).
Proof: We will prove this by creating a contradiction: if a tree has minimum diameter over the points and does not intersect FR(P), it contains a cycle.
Assume we have a minimum diameter tree, and that it does not intersect $FR(P)$. Consider Figure 11 … $p_i$, $p_j$, $p_k$, and $p_l$, respectively. Lemma 1 showed that these points are the ones which generate the horizontal and vertical bounds of $FR(P)$. In a minimum diameter tree, there must be a path from $p_i$ to $p_j$ of length no greater than $D$. This path cannot go above the upper horizontal bound $EQ_N$, as this bound is generated by $p_i$ and $p_j$. As Lemma 3 showed, $EQ_N$ must intersect the feasible region, if only at a single point, so if the path is to avoid the feasible region, not only must it go beneath the $EQ_N$ boundary, but beneath the entire feasible region, passing through some point $q_{ij}$. Similar constraints are placed on the $p_i$ to $p_k$ path, the $p_k$ to $p_l$ path, and the $p_j$ to $p_l$ path.
Thus, we have a set of paths $p_i \to q_{ij} \to p_j$, $p_j \to q_{jl} \to p_l$, $p_l \to q_{lk} \to p_k$, and $p_k \to q_{ki} \to p_i$. Clearly, there is a cycle $q_{ij} \to q_{jl} \to q_{lk} \to q_{ki} \to q_{ij}$, and the construction cannot be a tree.
□
Thus, we have shown not only that selecting a root point from $FR(P)$ allows for the construction of a minimum diameter tree, but also that any minimum diameter tree must pass through this region.
Summary of MC MD A-Tree Algorithm
As was shown in Figure 4, some locations of shortest path tree root points will lead to lower cost trees (in terms of path length or tree length) than other points. The MC MD A-Tree algorithm follows the basic outline given in Figure 3: we first determine a set of candidate tree root locations, then select one root point from the set of candidates, and finally construct an A-Tree rooted at that point.
A number of variations on tree root restriction and selection are possible for the algorithm, and we summarize them here.
First, there is a choice of which feasible region to use, either $FR(P)$ or $FR_c(P)$. The "appropriate" selection depends on both the locations of the critical sources and sinks and their number. Table 8 in Section 4 compares the two approaches.
Within the feasible region, there may be a number of points that are acceptable roots for a minimum diameter shortest path tree; in fact, if there is no underlying grid, there may be an infinite number of acceptable points. To make root selection tractable, we restrict candidate root points to Hanan grid points within the feasible region, points at the intersection of the Hanan grid lines with the boundary of the feasible region, and corner points of the feasible region. It should be noted that the feasible region does not always contain points on the Hanan grid; this is the case for Figure 1. In this figure, the feasible region consists of the single point at the center of the STS for the point set, and the only candidate root point is a "corner point" of the feasible region.
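The candidate set can be enumerated directly from the bounds of the region. The sketch assumes the region is stored as lower/upper bounds on $x$, $y$, $x + y$ and $x - y$ (one possible representation of an octilinear region, chosen here for illustration) and that it is bounded; it merges duplicates by floating-point rounding and ignores any underlying routing grid. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Region = Dict[str, Tuple[float, float]]          # bounds on x, y, x+y, x-y
COEF = {"x": (1, 0), "y": (0, 1), "x+y": (1, 1), "x-y": (1, -1)}


def inside(fr: Region, x: float, y: float, eps: float = 1e-9) -> bool:
    for key, (lo, hi) in fr.items():
        a, b = COEF[key]
        if not lo - eps <= a * x + b * y <= hi + eps:
            return False
    return True


def candidate_roots(fr: Region, points: List[Point]) -> Set[Point]:
    """Hanan points inside FR, Hanan lines meeting FR's boundary, and FR corners.
    Assumes every bound is finite (e.g. FR built from at least two points)."""
    (xlo, xhi), (ylo, yhi) = fr["x"], fr["y"]
    (slo, shi), (dlo, dhi) = fr["x+y"], fr["x-y"]
    xs, ys = sorted({p[0] for p in points}), sorted({p[1] for p in points})
    found: Set[Point] = set()

    def add(x: float, y: float) -> None:
        found.add((round(x, 9), round(y, 9)))

    # 1. Hanan grid points inside the region.
    for x in xs:
        for y in ys:
            if inside(fr, x, y):
                add(x, y)
    # 2. Ends of each Hanan line's chord through the region.
    for x in xs:                                     # vertical line at x
        lo, hi = max(ylo, slo - x, x - dhi), min(yhi, shi - x, x - dlo)
        if xlo <= x <= xhi and lo <= hi:
            add(x, lo)
            add(x, hi)
    for y in ys:                                     # horizontal line at y
        lo, hi = max(xlo, slo - y, dlo + y), min(xhi, shi - y, dhi + y)
        if ylo <= y <= yhi and lo <= hi:
            add(lo, y)
            add(hi, y)
    # 3. Corners: feasible intersections of two non-parallel bound lines.
    lines = [(COEF[k], c) for k, bounds in fr.items() for c in bounds]
    for ((a1, b1), c1), ((a2, b2), c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:
            x, y = (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det
            if inside(fr, x, y):
                add(x, y)
    return found


example = {"x": (0, 2), "y": (0, 2), "x+y": (0, 3), "x-y": (-2, 2)}
print(sorted(candidate_roots(example, [(1, 1), (5, 5)])))
```

A region that shrinks to a single point, as in Figure 1, yields only its corner: with every bound passing through (1, 1) and Hanan lines at 0 and 3, the function returns {(1, 1)}.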
We can easily show that points which are in the feasible region but are not candidate root points do not need to be considered as possible roots (for linear delay and tree length objectives). It was shown in [22] that an optimal arborescence can be found on the Hanan grid lines, and we assume that all trees considered are so constrained. We define a Hanan …
Proof: Without loss of generality, assume that the tree contains a vertical segment passing through the root at $(x_r, y_r)$, and that $a$ horizontal segments branch off towards the left and $b$ branch off towards the right. A shift in the root position by $\Delta x_r$, while maintaining the same topology, results in a change in tree length of $(a - b)\,\Delta x_r$. Thus, for a given topology, and within a Hanan cell, tree length is a linear function of $x_r$.
□
Within a Hanan cell, and with a fixed topology, tree cost is a linear function of the X and Y coordinates of the root. Given this, we have the following theorem.
Theorem 5 There exists a length optimal minimum diameter A-Tree, rooted at a candidate root point.
Proof: Assume we are given a length-optimal minimum diameter A-Tree $T$, with a root point $(x_r, y_r)$ within a Hanan cell, but not at a candidate root point. By the previous lemma, we can shift the root location of this optimal tree, and its cost will change linearly.
This results in a simple linear programming problem, with the root location constrained by the intersection of the Hanan cell $C$ and the feasible region $FR$. Since both $C$ and $FR$ are convex, $C \cap FR$ is also convex. The minimum cost of $T$ will be achieved at a corner point of $C \cap FR$, which is in the candidate root set. □
Similar properties hold if linear delay is our cost objective. From the set of candidate root points, one must be selected to serve as the root of the A-Tree. We have performed experiments on root selection using two objective functions. One computes $\sum_i d(p_i, r)$ or $\sum_{p_i, p_j} W(p_i, p_j)\,(d(p_i, r) + d(p_j, r))$ as an estimate. The other performs actual A-Tree construction to obtain accurate distance measurements.
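The first (estimate-based) root-selection objective can be sketched in a few lines. This is a hedged illustration: candidate points are assumed to come from the candidate set described earlier, the weight dictionary format is hypothetical, and the second objective (building an actual A-Tree per candidate) is not reproduced because it requires the A-Tree algorithm [11].

```python
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float]


def d(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def select_root(candidates: Iterable[Point], coords: Dict[str, Point],
                W: Optional[Dict[Tuple[str, str], float]] = None) -> Point:
    """Pick the candidate with the smallest path-length estimate.

    Without weights: sum_i d(p_i, r). With weights:
    sum W(p_i, p_j) * (d(p_i, r) + d(p_j, r)). Ties keep the earlier candidate.
    """
    def score(r: Point) -> float:
        if W is None:
            return sum(d(p, r) for p in coords.values())
        return sum(w * (d(coords[a], r) + d(coords[b], r)) for (a, b), w in W.items())

    return min(candidates, key=score)


coords = {"a": (0, 0), "b": (4, 0), "c": (0, 2)}
cands = [(1, 0), (2, 1), (0, 0)]
print(select_root(cands, coords))  # (0, 0)
```

The unweighted scores are 7, 9 and 6 for the three candidates, so (0, 0) is chosen; the estimate is cheap but ignores wire sharing, which is why the paper also evaluates actual A-Tree constructions.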
Experimentally, we have found the best solution performance to be that of the minimum-diameter tree rooted within $FR_c(P)$, with the root minimizing the total tree length of an actual A-Tree construction. Detailed results are given in the next section.
The time complexity of our algorithm comprises two components. One is feasible region construction, which is $O(n)$. The other is the complexity of the shortest path tree construction. If we utilize the A-Tree algorithm, the complexity of constructing each A-Tree is $O(n^3)$, and the total complexity of this step is $O(kn^3)$, where $k$ is the number of candidate root points (at worst $O(n^2)$).
If we use a faster A-Tree heuristic, such as a variation of that by Rao et al. [22], the complexity of each A-tree construction is reduced to $O(n \log n)$, and the total complexity of this step is $O(kn \log n)$. In this case, the overall complexity of MC MD A-Tree is $O(kn \log n)$, and the reduction of complexity for feasible region construction from $O(n^2)$ to $O(n)$ is significant.
Experimental Results
To evaluate the performance of the routing topologies, we used HSPICE to simulate a sized inverter driving a minimum-width wire network. The transistor model used for both the 0.5 µm CMOS IC and MCM technologies was the 0.5 µm CMOS IC technology "nominal" model supplied by MCNC. Interconnect resistance and capacitance parameters are shown in Table 5; these values are the same as those used in [11] and [4]. For each technology, we generated 100 test sets for each set size of 4, 8, and 16 randomly placed nodes.
Table 5: Technology parameters based on advanced MCM designs.
For the 0.5 µm IC technology experiments, the surface area spanned was 1 cm square, with a grid size of 10 microns (resulting in a 1000 by 1000 grid). Net segments were modeled as π circuits, with the capacitance of each wire segment divided between its endpoints. Segments longer than 1000 microns were broken into smaller segments (for example, a segment 1500 microns long is modeled as two π-model segments, one of 1000 microns and the other of 500 microns). Experimentally, we found that shorter segment sizes did not improve accuracy, but did increase simulation time. Transistor sizes for the inverters were 20.0 µm × 0.5 µm and 19.0 µm × 0.5 µm for NMOS and PMOS, respectively; these sizes were selected to provide roughly equal rise and fall times, and to provide delay values comparable to that of a 270 Ω resistor (as was used in [11, 5]).
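The π-model segmentation described above is easy to automate. The sketch below is a hypothetical helper, not the netlist generator used for these experiments: it splits a wire at a maximum piece length and emits plain SPICE resistor and capacitor element lines. The per-micron values `r` and `c` are placeholders, not the Table 5 parameters.

```python
from typing import List


def split_segment(length: float, max_len: float = 1000.0) -> List[float]:
    """Break a wire into pieces no longer than max_len (e.g. 1500 -> [1000, 500])."""
    pieces = []
    while length > max_len:
        pieces.append(max_len)
        length -= max_len
    if length > 0:
        pieces.append(length)
    return pieces


def pi_netlist(n1: str, n2: str, length: float, r: float, c: float,
               max_len: float = 1000.0, tag: str = "w") -> List[str]:
    """SPICE R/C element lines for a wire from n1 to n2 as a chain of pi-segments.

    r and c are per-unit-length values (placeholders here); each piece gets
    resistance r*L and two grounded capacitors of c*L/2 at its ends.
    """
    pieces = split_segment(length, max_len)
    nodes = [n1] + [f"{tag}_{k}" for k in range(1, len(pieces))] + [n2]
    lines = []
    for k, seg in enumerate(pieces):
        a, b = nodes[k], nodes[k + 1]
        lines.append(f"R{tag}{k} {a} {b} {r * seg:g}")
        lines.append(f"C{tag}{k}a {a} 0 {c * seg / 2:g}")
        lines.append(f"C{tag}{k}b {b} 0 {c * seg / 2:g}")
    return lines


print(split_segment(1500))   # [1000.0, 500.0]
print("\n".join(pi_netlist("n1", "n2", 1500, r=0.01, c=1e-16)))
```

Capacitors from adjacent pieces land on the same internal node, which SPICE simply treats as parallel capacitance, so the chain reproduces the per-segment endpoint split described in the text.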
For the MCM experiments, the surface area spanned was 10 cm square, with a grid size of 100 microns (resulting in a 1000 by 1000 grid). Segments longer than 10000 microns were broken into smaller segments. Transistor sizes for the inverters were 200.0 µm × 0.5 µm and 170.0 µm × 0.5 µm for NMOS and PMOS, respectively; these sizes were selected to provide roughly equal rise and fall times, and to provide delay values comparable to that of a 25 Ω resistor (as was used in [11, 5]).
Delay for all cases was measured as the time between the input of the driver reaching 50% of the target value and the sink reaching 50% of its final value. Rise and fall times for the inputs to the drivers were 0.1 ns for all tests.
For each tree, we compute three types of delay. The first, maximum delay (MD), is the maximum delay from any source node to any sink node in a tree. The second, average maximum delay (AMD), is the average of the maximum source-sink delays for each source. The third, average delay (AD), is the average of all source-sink delays. Delay results are in nanoseconds, while tree and path lengths are in centimeters.
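The three delay measures can be computed directly from a table of simulated source-to-sink delays. The sketch below assumes the delays arrive as a dictionary keyed by (source, sink) pairs, in nanoseconds; that data layout and the function name are ours, not the paper's.

```python
from statistics import mean


def delay_metrics(delays: dict[tuple[str, str], float]) -> tuple[float, float, float]:
    """Return (MD, AMD, AD) from a {(source, sink): delay_ns} table.

    MD  = largest delay over all source-sink pairs.
    AMD = mean over sources of each source's largest delay to any sink.
    AD  = mean of all source-sink delays.
    """
    md = max(delays.values())
    worst_per_source: dict[str, float] = {}
    for (src, _sink), d in delays.items():
        worst_per_source[src] = max(d, worst_per_source.get(src, d))
    amd = mean(worst_per_source.values())
    ad = mean(delays.values())
    return md, amd, ad


# Hypothetical two-source example: MD = 2.0, AMD = (2.0 + 1.5) / 2, AD = 5.0 / 4.
example = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "a"): 1.5, ("b", "c"): 0.5}
md, amd, ad = delay_metrics(example)
```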
The topologies compared are as follows.
Minimum Diameter A-Trees. The STS for the weighted points is obtained, and an A-tree spanning all points is constructed.
Minimum Cost Minimum Diameter A-Trees. The FR(P) or FR_c(P) is constructed, and then possible root points from this region are evaluated by constructing a tree at each point. The minimum-length tree is selected as the final topology.
1-Steiner trees. To obtain a tree with low total length, the 1-Steiner algorithm of Kahng and Robins [19] is used. These trees place no bounds on path length, but have very good performance for tree length minimization.
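The Minimum Cost Minimum Diameter A-Tree variant evaluates candidate roots and keeps the shortest resulting tree. The sketch below covers only that selection step. The candidate list would come from FR(P) or FR_c(P), whose construction is not shown in this section, and the tree builder is a deliberately simple, hypothetical stand-in (each node joined to the root by its own shortest rectilinear path, with no wire sharing), not the A-Tree algorithm; any function returning a tree length for a given root could be passed instead.

```python
from typing import Callable

Point = tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    """Rectilinear distance between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def star_tree_length(root: Point, points: list[Point]) -> int:
    """Hypothetical stand-in for A-tree construction: a star of shortest
    rectilinear paths from the root to every node, with no shared wire."""
    return sum(manhattan(root, p) for p in points)


def select_root(
    candidates: list[Point],
    points: list[Point],
    tree_length: Callable[[Point, list[Point]], int] = star_tree_length,
) -> tuple[Point, int]:
    """Build a tree at every candidate root and keep the shortest one."""
    lengths = {r: tree_length(r, points) for r in candidates}
    best = min(lengths, key=lengths.get)
    return best, lengths[best]


nodes = [(0, 0), (4, 0), (2, 3)]
best_root, best_len = select_root([(0, 0), (2, 0)], nodes)
```

Because every candidate requires a full tree construction, the cost of this loop grows with the number of candidate roots, so a larger feasible region means a longer run time.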
Note that existing performance-driven interconnect optimization algorithms, listed in Section 1, assume a fixed single driver, making comparisons with these methods inappropriate. Because the optimal multisource routing solution does not, in general, lie entirely on the Hanan grid, the branch-and-bound method (as used in [4]) is difficult to apply. We are unaware of any existing multisource routing algorithm that could be used for a fair comparison. Also, there is no known method to compute an optimal multisource routing solution against which to evaluate the optimality of our algorithm.
Simulation results for 0.5 µm CMOS IC parameters are given in Table 6. While the 1-Steiner algorithm produces wire lengths that are 1% to 8% lower, the average diameter of trees produced by the Minimum Diameter A-Tree and Minimum Cost Minimum Diameter A-Tree algorithms was 6.6% to 24.2% lower. The MCM technology examples, shown in Table 7, produced identical length results (the MCM test sets are scaled versions of the CMOS IC test sets). When given the freedom to select a root point, the Minimum Cost Minimum Diameter A-Tree algorithm obtained tree lengths 1% to 2% lower than those of the Minimum Diameter A-Tree algorithm.
The MD A-Tree and MC MD A-Tree algorithms produced similar delay results. Using 0.5 µm CMOS IC parameters, the two algorithms produced as much as a 12.6% maximum delay reduction and a 6.3% average maximum delay reduction compared to the 1-Steiner algorithm. Using MCM parameters, the two algorithms produced as much as a 21% maximum delay reduction and a 15.2% average maximum delay reduction.
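The percentage reductions quoted here are measured against the 1-Steiner baseline. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and averaging per-net reductions over paired test sets (as `mean_reduction` does) is only one possible convention; this section does not state how per-net results were aggregated.

```python
from statistics import mean


def percent_reduction(baseline: float, candidate: float) -> float:
    """Reduction of `candidate` relative to `baseline`, in percent.

    Positive values mean the candidate (e.g. an MD A-Tree delay) is lower
    than the baseline (e.g. the 1-Steiner delay for the same net).
    """
    return 100.0 * (baseline - candidate) / baseline


def mean_reduction(baselines: list[float], candidates: list[float]) -> float:
    """Average of per-net reductions over paired test sets (assumed convention)."""
    if len(baselines) != len(candidates):
        raise ValueError("baselines and candidates must be paired per net")
    return mean(percent_reduction(b, c) for b, c in zip(baselines, candidates))


# Hypothetical delays in ns for two nets: 25% and 0% reductions, 12.5% on average.
r = mean_reduction([2.0, 4.0], [1.5, 4.0])
```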
For these experiments, it is assumed that the weight between all pairs is 1, indicating that every path is critical.
Surprisingly, the MD A-Tree algorithm produced slightly better maximum delay values for the MCM examples than the MC MD A-Tree algorithm. In these cases, the "shifted" center location of the tree root results in some branches having disproportionately high capacitance, and the tree is "unbalanced." As has been observed in other high-performance routing studies, the location of branching has an effect on the total delay. An example is shown in Figure 12; the first topology is a simple MD A-Tree construction which, despite having higher tree length, has lower maximum delay. While the source involved in the maximum delay path is the same for both topologies, the sink with maximum delay changes.
Figure 12: An example in which an MD A-Tree topology has slightly lower maximum delay than an MC MD A-Tree, despite slightly higher tree length.
In Table 8, the required path length is $\sum W(p_i, p_j)\, d(p_i, p_j)$, where the sum is taken over node pairs $(p_i, p_j)$. The required path length provides a lower bound on the weighted path length (which in some instances cannot be achieved with a tree topology).
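The required path length, $\sum W(p_i, p_j)\, d(p_i, p_j)$, and the weighted path length it bounds can be computed as follows. This sketch assumes $d$ is the rectilinear (Manhattan) distance and that weights are listed only for pairs of interest (absent pairs count as weight 0). The tree path lengths passed to `weighted_path_length` are supplied by hand for two small example trees rather than measured from a constructed tree, and all names are hypothetical.

```python
Point = tuple[int, int]
Pair = tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    """Rectilinear distance d(p, q)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def required_path_length(points: list[Point], weights: dict[Pair, float]) -> float:
    """Sum of W(p_i, p_j) * d(p_i, p_j) over the weighted pairs."""
    return sum(w * manhattan(points[i], points[j]) for (i, j), w in weights.items())


def weighted_path_length(tree_dist: dict[Pair, float], weights: dict[Pair, float]) -> float:
    """Sum of W(p_i, p_j) times the p_i-to-p_j path length inside a tree."""
    return sum(w * tree_dist[(i, j)] for (i, j), w in weights.items())


points = [(0, 0), (3, 4), (6, 0)]
all_pairs = {(0, 1): 1, (0, 2): 1, (1, 2): 1}
rpl = required_path_length(points, all_pairs)  # 7 + 6 + 7 = 20

# Tree with a Steiner point at (3, 0): every tree path equals the distance.
steiner_tree = {(0, 1): 7, (0, 2): 6, (1, 2): 7}
# Chain 1-0-2 with no Steiner point: the 1-2 path detours through node 0.
chain_tree = {(0, 1): 7, (0, 2): 6, (1, 2): 13}
```

Since no tree path can be shorter than the rectilinear distance between its endpoints, every tree's weighted path length is at least the required path length; here the Steiner tree attains the bound (20) while the chain does not (26).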
When only a subset of paths is critical, the MC MD A-Tree algorithm produced maximum delay improvements ranging from 6% to 14.2%. Trees rooted within the critical feasible region FR_c(P) had slightly lower delay than those rooted in the feasible region FR(P). Tree lengths were comparable.
Note that performance improvements for multisource routing problems are not as dramatic as those observed for single-source problems [11]. An optimization that benefits source p_i may result in poor performance for a second source p_j; thus, the multisource problem has a greater number of constraints, resulting in less improvement in general. When only a small number of pin pairs are critical, the minimum diameter constraint may be overly restrictive.
Finally, we present an example of the application of these algorithms to nets extracted from an industrial circuit. Intel Corporation provided pin locations for six multisource nets from one of their processors; for five of these nets the pin counts were very low. For nets with few pins, the topologies and delay characteristics of the minimum cost minimum diameter A-Tree topologies were nearly identical to those of the 1-Steiner topologies. The sixth and largest net contained twelve nodes, consisting of three inputs, five outputs, and five bidirectional nodes. Some nodes of this net were extremely close together, resulting in eight discrete groups. The optimized MC MD A-Tree topology had a significant impact on this net. We evaluated the topologies using HSPICE and the 0.5 µm CMOS IC technology parameters. The 1-Steiner and MC MD A-Tree topologies for this net, along with their delays and lengths, are shown in Figure 13. For this largest net, MC MD A-Tree provided a 16.1% reduction in the maximum delay, while the average reduction over the six nets was 2.7%. For all nets, the MC MD A-Tree topology had a maximum delay less than or equal to that of the 1-Steiner topology.
In all cases, the time required for topology construction was much smaller than the time required to perform the HSPICE simulations, and the bulk of topology construction time was consumed by the A-Tree algorithm. The most complex variation, MC MD A-Tree, had run times ranging from 0.6 seconds to 6 seconds for examples with 16 nodes and all pairs critical. The run time of the algorithm was strongly influenced by the size of the feasible region, as this impacts the number of candidate root locations; a larger feasible region (which can occur when fewer points are critical) resulted in a larger run time.
Conclusions and Future Work
We have formulated a new performance driven routing problem which considers interconnect topology optimization when there are multiple sources in a single net, and have presented a heuristic solution for the problem through the construction of Minimum Cost Minimum Diameter A-Trees.
We have also given a linear-time algorithm to determine the set of possible root locations for a minimum diameter shortest path tree, and shown that any minimum diameter tree must pass through this region. For most problems, the run time of the A-Tree algorithm dominates total run time; if a lower-complexity tree construction approach is adopted, linear-time construction of the feasible region becomes beneficial.
When compared with the 1-Steiner algorithm, minimum diameter A-Trees produced lower maximum and average delays, with slightly higher tree length. Consideration of the feasible region resulted in reductions in tree length, and also in delay for most cases.
Table 8: Comparison of 1-Steiner, MD A-Tree, and MC MD A-Tree algorithms for maximum delay (MD), average maximum delay (AMD), and average delay (AD) through HSPICE simulation with 0.5 µm CMOS IC technology parameters. All test sets consisted of 8 nodes, with only a given number of node pairs being critical. Also included in this table are the average weighted path lengths (WPL), weighted diameter (WD), and required path length (RPL). The required path length provides a lower bound for the weighted path length, and in some cases cannot be obtained. Experiments considering both the feasible region (FR(P)) and the critical feasible region (FR_c(P)) were performed.
We are currently investigating a number of related performance-driven multiple-source routing problems. We are interested in the use of path length bounds for critical pairs while minimizing total wire length, possibly through the extension of previous bounded-radius bounded-cost algorithms such as BRBC [9] and AHHK [1]. Driver and wire sizing has proven effective for single-source routing problems, and we are extending these techniques to the multiple-source domain. For practical VLSI routing applications, we are also considering routing in the presence of obstacles and non-tree topologies with low total tree length. All of these problems will need extension to support more accurate delay models, so that optimization can be done for delay constraints rather than geometric constraints.
