Abstract-In this paper, we study the interconnect layout optimization problem under a higher order resistance-inductance-capacitance model to optimize not only delay, but also waveform for interconnects with nonmonotone signal response in the context of multichip-module global routing. We propose a unified approach that considers topology optimization and waveform optimization simultaneously. Using a new incremental moment-computation algorithm, we interleave topology construction with moment computation to facilitate accurate delay calculation and evaluation of waveform quality. Our algorithm considers a large class of routing topologies, ranging from shortest path Steiner trees to bounded-radius Steiner trees and Steiner routings. We construct a set of required arrival-time Steiner (RATS) trees, providing smooth tradeoffs among signal delay, waveform, and routing area. When combined with the MINOTAUR MCM global router (Cong and Madden, 1998), (Madden, 1998) that we have developed, the RATS-tree solutions prove to be effective in reducing overall routing congestion.
I. INTRODUCTION
As very large scale integration (VLSI) circuitry reaches deep-submicrometer device dimension, operates at gigahertz clock frequencies, and is packaged in highly integrated multichip modules, the performance of interconnect structures becomes a dominant factor in determining system performance. Recent studies showed that interconnect performance could be optimized with several techniques, which included topology optimization, wire-sizing optimization, and/or repeater optimization; [9] gave a comprehensive survey of these techniques. However, most of these techniques were applied independently and were based on resistance-capacitance (RC) models that might not be appropriate for multichip-module (MCM) designs in which the inductance effect dominated. Furthermore, because they assumed a single "optimal" layout structure for each timing-critical net, these layout techniques stringently restricted the global routing solution space. Such a restriction significantly reduced the flexibility that a global router had in minimizing global congestion. In this paper, we study the problem of interconnect layout optimization for performance and signal integrity under a higher order resistance-inductance-capacitance (RLC) model and its application to global routing for MCM designs. Our approach has the following advantages.
1) It considers a higher order moment-based RLC model, which is more suitable for MCM designs. Most of the previous results on interconnect optimization were achieved under RC interconnect models for integrated circuit designs only. Studies on topology optimization such as A-trees [7], low-delay trees [1], iterative Dreyfus-Wagner and constructive force-directed Steiner trees [16], non-Hanan routing [17], wire-sizing optimization [3], [6], [7], [11], [12], [27], [31], as well as the more recent studies that combined topology construction with repeater insertion and/or wire sizing [18], [28], [29], [34] assumed RC models for the interconnects. With the exceptions of [12], [18], and [29], which considered higher order RC models, all other studies used the Elmore delay model [10], which accounted for the first order RC effect only. These studies did not consider the inductance effect, a dominant factor in high-speed MCM designs. Our proposed approach overcomes these shortcomings by considering an RLC interconnect model during the optimization process. By employing a higher order RLC model in our study, we not only improve the accuracy of delay estimation but also better approximate the time domain waveform.
2) Our method is capable of constructing a large class of routing topologies, ranging from shortest path Steiner trees to bounded-radius Steiner trees and Steiner routings. All of the previous topology construction algorithms were limited to a single class of routing topology. For example, the A-tree algorithm considered only the shortest path Steiner trees [7], and the set of topologies generated by the P-tree algorithm [28] was restricted to a permutation-induced abstract topology predetermined by a traveling salesman heuristic. By considering a large class of routing topologies, our algorithm is able to produce a set of topologies, providing smooth tradeoffs among signal delay, waveform, and routing area.
3) It generates, for each timing-critical net, several timing-correct topologies from which a global router can select one that is compatible with the routing topologies of other nets. Such flexibility results in significant improvements in the overall congestion levels of global routing solutions.
Section II of this paper formulates the interconnect layout optimization problem. In Section III, we present the proposed interconnect layout optimization algorithm. In Section IV, we present an incremental approach for computing moments in a bottom-up fashion. We present the experimental results in Section V and conclude the paper in Section VI.
II. PROBLEM FORMULATION
Given a net of pins or terminals $\{s_0, s_1, s_2, \ldots, s_n\}$ to be electrically connected, we assume that $s_0$ denotes the source (or driver) of the net and that the rest of the pins are sinks (or receivers). We use $P_T(u, v)$ to denote the unique path from u to v in an interconnect tree T, $d_T(u, v)$ the path length of $P_T(u, v)$, and $d(u, v)$ the Manhattan distance between u and v. We denote the signal delay from u to v by $t_T(u, v)$. The source node $s_0$ will generally be referred to as the root of an interconnect tree and each node v in the tree is connected to its all sinks, an optimal RATS tree is an optimal Steiner arborescence [26]. If we relax the requirement such that all sinks have the same required arrival time, then an optimal RATS tree is an optimal bounded-radius Steiner tree [4]. Lastly, an optimal RATS tree with unbounded $q_i$'s is an optimal Steiner tree [22]. The path-length formulation captures the delay for unloaded lossless transmission lines in MCM printed circuit board designs perfectly. In this formulation, the output response at the end of an interconnect e is a replica of the input signal delayed by the time-of-flight (or propagation delay) $\sqrt{LC}\,|e|$, where L and C are the unit-length interconnect inductance and capacitance, respectively, and $|e|$ is the length of the interconnect. In general, it is assumed that $\sqrt{LC}$ is a constant. Therefore, the time-of-flight (or propagation delay) on the path from the source to sink $s_i$ is $t_f(s_0, s_i) = \sum_{e \in P_T(s_0, s_i)} \sqrt{LC}\,|e|$, which is proportional to the path length $d_T(s_0, s_i)$.
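As a quick numerical illustration of the path-length formulation, the following sketch evaluates the time-of-flight for a given path length. The function name `time_of_flight` is hypothetical, and the per-unit-length values are the process parameters quoted in Section V; this is a sketch under those assumptions rather than part of the algorithm itself.

```python
import math

# Unit-length parasitics quoted in Section V (assumed here for illustration).
L_PER_M = 301.49e-9   # inductance, H/m
C_PER_M = 128.99e-12  # capacitance, F/m


def time_of_flight(path_length_m, l_per_m=L_PER_M, c_per_m=C_PER_M):
    """Return sqrt(L*C) times the path length, in seconds."""
    return math.sqrt(l_per_m * c_per_m) * path_length_m


# A 10 cm source-to-sink path: roughly 0.62 ns of pure propagation delay.
tof = time_of_flight(0.10)
print(f"{tof * 1e9:.3f} ns")
```

Because $\sqrt{LC}$ is treated as a constant, doubling the path length doubles the time-of-flight, which is why path length can stand in for delay under this formulation.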
A more general formulation is to model MCM interconnect structures as lossy transmission lines. Under this formulation, the delay at each sink is the sum of the propagation delay $t_f$ and the rising/falling (or transition) delay $t$ of the signal response waveform [32]:
$$t_T(s_0, s_i) = t_f(s_0, s_i) + t(s_0, s_i). \tag{1}$$
In this paper, we model the transition delay with a higher order moment-based delay model. More specifically, we use the two-pole-based analytical delay model proposed in [21] 
where $m_i^j$ is the jth order moment of the voltage transfer function of node i. Moments of an RLC interconnect can be computed by the methods proposed in [20] and [33]. In this paper, we present a new approach to moment computation in Section IV.
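To make the role of the first two moments concrete, the sketch below fits a two-pole transfer function to them and reports whether the poles are real or complex. It assumes the common convention $H(s) = 1 + m^1 s + m^2 s^2 + \cdots$ and a denominator of the form $1 + b_1 s + b_2 s^2$; the function name `two_pole_from_moments` is hypothetical, and this is a generic moment-matching sketch, not a transcription of the delay formula in [21].

```python
import cmath


def two_pole_from_moments(m1, m2):
    """Fit 1 / (1 + b1*s + b2*s^2) to H(s) = 1 + m1*s + m2*s^2 + ...

    Matching the first two Taylor coefficients gives
    b1 = -m1 and b2 = m1^2 - m2 (assumes b2 != 0).
    """
    b1 = -m1
    b2 = m1 * m1 - m2
    disc = b1 * b1 - 4.0 * b2          # algebraically 4*m2 - 3*m1^2
    root = cmath.sqrt(disc)
    poles = ((-b1 + root) / (2.0 * b2), (-b1 - root) / (2.0 * b2))
    if disc < 0:
        kind = "underdamped"           # complex poles, ringing possible
    elif disc > 0:
        kind = "overdamped"            # distinct real poles, monotone
    else:
        kind = "critically damped"     # repeated real pole
    return poles, disc, kind


# Series R-L with a load C: H(s) = 1 / (1 + R*C*s + L*C*s^2),
# so m1 = -R*C and m2 = (R*C)^2 - L*C.  Here L = C = 1.
for r in (1.0, 2.0, 3.0):
    m1, m2 = -r, r * r - 1.0
    print(r, two_pole_from_moments(m1, m2)[2])
```

For the single-segment example, the sign of the discriminant reduces to the familiar comparison of $(RC)^2$ against $4LC$.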
Signal response waveform is another important factor in interconnect design. Under ideal situations, one would prefer the transmission of the input signal to the output not to be distorted. However, due to impedance mismatch and reflection, ringing may occur at the output node, resulting in excessive settling time and voltage overshoot or undershoot, which may adversely affect the circuit performance. In this paper, we define the signal settling time to be the time taken for the signal to settle above 90% of $V_{dd}$. Voltage overshoot (undershoot) is the maximum deviation over (under) the final voltage.
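These definitions can be applied directly to a sampled rising waveform. The sketch below is a minimal illustration under two assumptions: the waveform is given as parallel lists of sample times and voltages, and the final voltage equals $V_{dd}$. The function name `waveform_metrics` is hypothetical.

```python
def waveform_metrics(times, volts, vdd, threshold=0.9):
    """Return (delay, settling_time, normalized_overshoot) for a rising signal.

    delay: first sample time at which the signal reaches threshold*vdd.
    settling_time: first sample time after which it never drops below it
                   (None if the last sample is still below the threshold).
    normalized_overshoot: max excursion above vdd, divided by vdd.
    """
    level = threshold * vdd
    delay = None
    settle = None
    for t, v in zip(times, volts):
        if v >= level:
            if delay is None:
                delay = t          # first crossing of the threshold
            if settle is None:
                settle = t         # candidate settling instant
        else:
            settle = None          # dipped below: not settled yet
    overshoot = max(0.0, max(volts) - vdd) / vdd
    return delay, settle, overshoot


# A ringing response: crosses 90% at t=1, dips at t=3, settles from t=4.
t = [0, 1, 2, 3, 4, 5]
v = [0.0, 0.95, 1.2, 0.85, 0.92, 1.0]
print(waveform_metrics(t, v, vdd=1.0))
```

For a monotone response the delay and the settling time coincide; ringing is what pulls them apart.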
Similar to [13], [14], and [25], we use moments as an indirect metric to measure signal quality. For example, if we use the two-pole model in (2) to model the interconnect, then ringing can be attributed to the existence of complex poles in the transfer function. The condition for the poles to be complex (i.e., for the output response to be nonmonotonic, or underdamped) is for $\Delta_i = 4m_i^2 - 3(m_i^1)^2$ to be negative. When $\Delta_i$ is strictly positive, we have an overdamped and monotone response. When $\Delta_i$ is exactly zero, the signal is said to be critically damped. An underdamped signal generally has a faster transition delay when compared to a damped signal. With other attributes (such as the signal delay and the wiring length) being equal, we prefer a RATS tree with $\Delta_i$'s as close to zero as possible. To summarize, we propose to solve the following problem.

In general, the RATS-tree algorithm follows a branch-and-bound (B&B) paradigm by considering the merging and the skipping (of merging) of subtrees at a Steiner merging point as in [26]. There are three significant differences. First, we consider the rerooting of a subtree (a concept introduced in [19]) at Hanan grid points. Second, [26] was based purely on a path-length delay model; we consider a higher order RLC model. Third, we generate a set of topologies.
1) RATS
Each node in the B&B search tree is associated with a peer topology set $\mathcal{T}$ that contains a forest of subtrees constructed so far and a scan level $K = \mathrm{order}(m)$, where m is the Steiner merging point last considered in the subtree merging process. The B&B search starts with a peer topology set with all single-terminal trees and a scan level $K = \infty$.
Not to be confused with a node in a constructed topology, we refer to a node in the B&B search tree as a B&B node.
We expand a B&B node characterized by $(\mathcal{T}, K)$ by considering a new Steiner merging point m among all Steiner merging points of tree roots in $\mathcal{T}$ such that m is the highest ordered node with $\mathrm{order}(m) < K$. Let $P(m) = \{p \mid p \text{ dominates } m,\ T_p \in \mathcal{T}\}$ denote the set of tree roots in $\mathcal{T}$ that dominate m. For each node $p \in P(m)$, a merging operation connects m and p by a shortest path and eliminates $T_p$ from $\mathcal{T}$.
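The selection of the roots that dominate a merging point can be sketched as a simple filter. The sketch assumes the usual arborescence setting in which the source is at the origin, all sinks lie in the first quadrant, and p dominates m when both coordinates of p are at least those of m; the helper names are hypothetical.

```python
def dominates(p, m):
    """True if point p dominates point m (first-quadrant convention, assumed)."""
    return p[0] >= m[0] and p[1] >= m[1]


def dominating_roots(m, roots):
    """Return the subtree roots that dominate merging point m, in input order."""
    return [p for p in roots if dominates(p, m)]


roots = [(3, 1), (2, 4), (1, 5), (5, 0)]
print(dominating_roots((2, 1), roots))   # roots reachable by a shortest path through m
```

Only dominating roots can be connected to m without lengthening the source-to-root path, which is why merging is restricted to them.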
We make a shortest path connection between m and p by growing $T_p$ along the Hanan grid points from p to m in a bottom-up fashion. A new subtree $T_m$ is added to $\mathcal{T}$ and K is updated to order(m). If the new Steiner merging point is a terminal, then such a merging is called a terminal merging. Otherwise, it is called a Steiner merging. Note that when $x_m \neq x_p$ and $y_m \neq y_p$, there are several shortest Manhattan paths from m to p. We consider the two single-bend routings between m and p; in contrast, [26] considered only a single routing. As in the B&B-based minimum rectilinear Steiner arborescence algorithms in [26], we consider the skipping of a Steiner merging operation. In this case, while K is updated to order(m), we keep $\mathcal{T}$ unchanged. Therefore, skipping a Steiner merging operation generates an additional child B&B node in the B&B search tree.
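The two single-bend routings between a merging point and a subtree root can be enumerated as follows. This is a minimal sketch with a hypothetical function name; it returns each routing as a list of corner points and degenerates to a single straight segment when the two points share a coordinate.

```python
def single_bend_routes(m, p):
    """Return the shortest rectilinear routes from m to p with at most one bend."""
    (xm, ym), (xp, yp) = m, p
    if xm == xp or ym == yp:
        return [[m, p]]                      # straight segment, no bend needed
    return [
        [m, (xp, ym), p],                    # horizontal first, then vertical
        [m, (xm, yp), p],                    # vertical first, then horizontal
    ]


def route_length(route):
    """Manhattan length of a route given as a list of corner points."""
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
               for a, b in zip(route, route[1:]))


for route in single_bend_routes((0, 0), (2, 3)):
    print(route, route_length(route))
```

Both routings have the same wire length but pass through different Hanan grid points, so they expose different candidate points for later merging and rerooting.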
Both terminal merging and Steiner merging operations consider merging at only the roots of subtrees, thereby producing only shortest path trees. In order to consider a large class of routing solutions, we allow merging at nonroot nodes of the subtrees. We achieve nonroot merging by rerooting. After each terminal merging or Steiner merging, the resultant topology T is rerooted at various Hanan grid points in T, creating several topologies that consist of the same nodes and edges in T, but rooted at different nodes. These rerooted topologies have identical routing structure, but different sink delays and signal waveforms. Therefore, rerooting creates several sibling B&B nodes for the B&B node newly generated by the terminal merging or Steiner merging operations. Fig. 1 shows a partial B&B search tree corresponding to the application of the RATS-tree algorithm to a five-terminal net. In this example, we assume that the source is at the origin and the sinks are in the first quadrant. In the B&B search tree, root nodes of peer topology sets are depicted by empty circles and nonroot nodes by filled circles.
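Rerooting keeps the edge set of a topology and changes only which node acts as the root. The sketch below illustrates this on a tree stored as a child-to-parent map, which is an assumed representation; the function names are hypothetical, and the node labels stand in for Hanan grid points.

```python
def reroot(parent, new_root):
    """Return a new child->parent map for the same tree rooted at new_root.

    Only the parent pointers on the path from new_root up to the old root
    are reversed; every other pointer is left unchanged.
    """
    new_parent = dict(parent)
    prev, node = None, new_root
    while node is not None:
        nxt = parent[node]        # old parent, read before overwriting
        new_parent[node] = prev   # reverse the pointer
        prev, node = node, nxt
    return new_parent


def edge_set(parent):
    """Undirected edges of the tree described by a child->parent map."""
    return {frozenset((c, p)) for c, p in parent.items() if p is not None}


tree = {"r": None, "a": "r", "b": "a", "c": "a"}
rerooted = reroot(tree, "b")
print(rerooted)
print(edge_set(tree) == edge_set(rerooted))
```

The routing structure is identical before and after, while the driver now attaches at a different node, which is why delays and waveforms must be re-evaluated for each rerooted topology.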
For illustrative purposes, the scan level K = order(m) is shown as a dashed line of distance $|m|$ from the origin. The topmost starting B&B node is expanded into the center row of B&B nodes. The leftmost B&B node in the center row is the result of Steiner merging and the rightmost node corresponds to the skipping of the Steiner merging operation. The leftmost B&B node is further expanded by rerooting its newly created subtree at other Hanan grid points. The middle two B&B nodes in the center row show two out of the three rerooted topologies. The leftmost B&B node in the bottom row is due to the application of terminal merging to the leftmost B&B node in the center row. While we do not skip terminal merging, rerooting still applies.
A brief outline of our RATS-tree algorithm is given in Fig. 2. We use a queue to implement a breadth-first traversal of the B&B search tree. At the beginning of the algorithm, the queue contains a B&B node with a peer topology set $\mathcal{T}$ containing all single-terminal subtrees and a scan level $K = \infty$. We use the pair $(\mathcal{T}, K)$ to denote a B&B node.
Until the queue is empty, the algorithm iterates the expansion of a B&B node removed from the head of the queue by either performing a terminal/Steiner merging operation or skipping a Steiner merging operation. In the case of subtree merging, the resultant topology is rerooted to generate sibling B&B nodes. All newly generated B&B nodes are appended to the end of the queue.
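The queue-driven traversal can be sketched generically. The skeleton below is an illustration only: `expand` and `is_complete` are hypothetical callbacks standing in for the merge, skip, and reroot operations and for the test that a B&B node holds a finished topology; it does not reproduce the outline in Fig. 2.

```python
from collections import deque


def breadth_first_expand(start, expand, is_complete):
    """Expand B&B nodes in first-in, first-out order and collect complete ones."""
    queue = deque([start])
    complete = []
    while queue:
        node = queue.popleft()          # remove from the head of the queue
        if is_complete(node):
            complete.append(node)
            continue
        queue.extend(expand(node))      # append children to the end of the queue
    return complete


# Toy search: each step either "merges" (0) or "skips" (1), two levels deep.
result = breadth_first_expand(
    "",
    expand=lambda s: [s + "0", s + "1"],
    is_complete=lambda s: len(s) == 2,
)
print(result)
```

Processing nodes level by level means that all partial topologies at one scan level exist before any is expanded further, which is what makes comparisons among them possible.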
We explore the B&B search tree in a breadth-first traversal order. The reason for the breadth-first expansion order is to facilitate as much pruning as possible. We prune the solution space as follows: each topology T is associated with a triple $(\mathrm{Cap}(T), \mathrm{Slack}(T), \mathrm{SQ}(T))$, where Cap(T) is the total capacitance, $\mathrm{Slack}(T) = \min_{s_i \in T} (q_i - t_T(s_i))$ with $t_T(s_i)$ being the two-pole sink delay of $s_i$ in T (Section II), and SQ(T) is the signal quality of the topology. Recall that we use the metric $\Delta_i = 4m_i^2 - 3(m_i^1)^2$ as a measure of the degree of damping for sink $s_i$. If $\Delta_i > 0$, $s_i$ is overdamped. If $\Delta_i < 0$, $s_i$ is underdamped. Otherwise, it is critically damped. Therefore, we measure the signal quality of a tree T by $\mathrm{SQ}(T) = \min_{s_i \in T} \Delta_i$; the signal quality is measured by the worst signal response waveform among all sinks. If two RATS trees T and T' are rooted at the same node and cover the same set of sinks, then they share the same alias. Considering two topologies T and T' that have the same alias, we say that T' is redundant if $\mathrm{Cap}(T) \le \mathrm{Cap}(T')$, $\mathrm{Slack}(T) \ge \mathrm{Slack}(T')$, $\mathrm{SQ}(T) \ge \mathrm{SQ}(T')$, and at least one of the three inequalities is a strict inequality. The RATS-tree algorithm given in Fig. 2 assumes no pruning of the B&B search tree. To consider pruning, a B&B node with $(\mathcal{T}, K)$ is expanded only if all partial RATS trees in $\mathcal{T}$ are irredundant. After a new subtopology is constructed, the RATS-tree algorithm prunes the set of topologies that share the same alias as the newly created topology.
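The redundancy test amounts to a three-way dominance check among topologies that share an alias. The sketch below assumes that smaller capacitance, larger slack, and a larger signal-quality value are preferred; the direction of each comparison, the tuple layout, and the function names are assumptions of this sketch.

```python
def is_redundant(candidate, other):
    """True if `other` is no worse than `candidate` in all three metrics
    and strictly better in at least one.

    Each argument is a (cap, slack, sq) triple.
    """
    cap_c, slack_c, sq_c = candidate
    cap_o, slack_o, sq_o = other
    no_worse = cap_o <= cap_c and slack_o >= slack_c and sq_o >= sq_c
    strictly = cap_o < cap_c or slack_o > slack_c or sq_o > sq_c
    return no_worse and strictly


def prune(triples):
    """Keep only the irredundant triples among topologies with the same alias."""
    return [t for t in triples
            if not any(is_redundant(t, o) for o in triples)]


same_alias = [(10.0, 1.0, 0.5), (12.0, 0.8, 0.4), (9.0, 0.5, 0.6)]
print(prune(same_alias))
```

In the example, the second triple is worse than the first in every metric and is dropped, while the third survives because it trades slack for lower capacitance.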
Although the pruning technique is very effective in keeping the solution space small, it should be noted that it is a heuristic, because the signal delays in different edges under a higher order RLC model are not independent, unlike under the Elmore delay model. The Elmore delay contributed by each edge, which is due to the edge resistance and the total downstream capacitance, can be computed independently of topology construction upstream. This is due to the additive property of the Elmore delay. Under the higher order delay model, however, adding a new edge upstream affects the signal delays contributed by the downstream edges, since the (p + 1)th order moments depend on the pth order moments (see Section IV). The fact that the addition of an upstream edge also has different levels of effect on the signal integrity of downstream sinks further complicates the matter. In Section IV, we present a bottom-up algorithm that computes the moments of sinks in a topology in an incremental fashion, exploiting the fact that the interconnect structure changes only slightly from one iteration to the next in the RATS-tree algorithm.
IV. INCREMENTAL BOTTOM-UP MOMENT COMPUTATION
Moments can be computed by the polynomial-time algorithms in [20] and [33]. However, these works compute moments by traversing the entire tree iteratively and do not allow incremental computation of moments as the tree topology changes. As a result, when the topology changes during routing tree construction, another round of iterative tree traversals is needed to recompute the moments. Even when we restrict the topology change to a simple addition of an RLC segment to the root of the original tree, which is the basis of our bottom-up topology construction algorithm, moments cannot be incrementally updated easily using the previous methods. Therefore, these previous approaches are not suitable for our RATS-tree algorithm. An independent study by [29] recently presented a method to incrementally compute moments in a bottom-up fashion for RC interconnects. However, it did not consider the inductance effect. In the following, we present our algorithm to handle moment computation of an RLC tree.
Consider an RLC tree $T_v$ rooted at node v. For any node w in $T_v$, let $C^p_{T_w} = \sum_{j \in T_w} m_j^p C_j$ be the total pth order moment-weighted capacitance of $T_w$ [33], where $C_j$ is the capacitance connected to node j. The moment-weighted capacitance is also called the subtree admittance in [20]. Now, we add a new edge uv at the root of $T_v$ to obtain a new tree $T_u$ rooted at u (see Fig. 3). Let $\hat{C}^p_{T_w}$ be the new total pth order moment-weighted capacitance of $T_w$ for w in $T_u$. Similarly, let $\hat{m}^p_w$ be the new pth moment of node w in $T_u$. Let $R_v$, $L_v$, and $C_v$ be the total resistance, inductance, and capacitance of the edge uv, respectively. In [33], moments were derived recursively as follows: 
Proof: Equations (4)-(6) follow from (3) directly. For p = 0, we can verify that (7) is trivially true. For p = 1, (7)-(9) are trivially true since the first moment is equivalent to the Elmore delay and one can easily verify that (7)-(9) compute the Elmore delay. We will prove that (7)-(9) hold for p > 1 by induction.
Assuming that (7)-(9) Therefore, (7) is true. Therefore, (9) holds for all child nodes of v, i.e., it is true for all descendant nodes one hop from v. Assuming that (9) holds for all descendant nodes h hops from v, we shall prove that it also holds for all descendant nodes h + 1 hops from v. Let w be a descendant node h hops from v and z be a child node of w. Therefore, by induction on the number of hops a descendant node is from v, (9) holds for all nodes in $T_v$. Then, by induction on the order p, (7)-(9) (7) and (8). Then, we can update the moments of all the sinks in the topology by (9). From the above theorem, we can state the following corollary, which allows us to incrementally update sink moments during the merging operation and compute the total pth order moment weighted capacitance at the new root u. 
From Theorem 1 and Corollary 1, the time complexity to update the moments of n sinks in a tree is $O(n \cdot p^2)$. The auxiliary space requirement is $O(p)$. On the other hand, the time complexity of the method proposed by [33] is $O(g \cdot p)$, where g is the total number of grid nodes in the tree, and the auxiliary space requirement is $O(g)$. Since our RATS-tree algorithm is based on the Hanan grid, g could be on the order of $O(n^2)$.
We can integrate the incremental bottom-up moment computation algorithm with our RATS-tree algorithm easily. For each irredundant topology constructed by our algorithm, we keep its corresponding sink moments (up to a prespecified pth order). As we grow a topology along the path of Hanan grid points toward the new root in the merge operation, we compute the length of each new edge and derive the interconnect resistance, inductance, and capacitance. These RLC parasitics are used to update the moments of the sinks using Theorem 1. We then use Corollary 1 to compute the weighted capacitances at the new root. The RATS-tree algorithm then prunes the set of topologies that share the same alias as the newly created topology.
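One way to realize an incremental update of this kind is sketched below. It is not a transcription of Theorem 1 or Corollary 1. It assumes a lumped edge model (series resistance and inductance, with the edge capacitance already folded into the capacitance at the old root) and the moment convention $H(s) = \sum_p m^p s^p$. Under these assumptions, the transfer function from the new root to any sink is the product of the new-root-to-old-root transfer function and the unchanged old-root-to-sink transfer function. The function name and data layout are hypothetical.

```python
def add_root_edge(sink_moments, cap_moments, r_edge, l_edge):
    """Update moments after adding a lumped R-L edge above the current root.

    sink_moments: {sink: [m^0, ..., m^P]} measured from the current root.
    cap_moments:  [C^0, ..., C^P], moment-weighted capacitance of the whole
                  subtree seen at the current root (edge capacitance included).
    Returns (new_sink_moments, new_cap_moments) measured from the new root.
    """
    order = len(cap_moments) - 1
    # Moments of the old root as seen from the new root.
    root_m = [1.0] + [0.0] * order
    for p in range(1, order + 1):
        acc = 0.0
        for q in range(p):           # resistive term: one extra power of s
            acc += r_edge * cap_moments[q] * root_m[p - 1 - q]
        for q in range(p - 1):       # inductive term: two extra powers of s
            acc += l_edge * cap_moments[q] * root_m[p - 2 - q]
        root_m[p] = -acc

    def shift(old):
        # Coefficients of the product of the two power series, truncated.
        return [sum(root_m[q] * old[p - q] for q in range(p + 1))
                for p in range(order + 1)]

    new_sinks = {name: shift(m) for name, m in sink_moments.items()}
    return new_sinks, shift(cap_moments)


# Single sink with a 1 F load, then an edge with R = 2, L = 1 on top:
# H(s) = 1 / (1 + 2s + s^2), whose moments are 1, -2, 3.
sinks, caps = add_root_edge({"s1": [1.0, 0.0, 0.0]}, [1.0, 0.0, 0.0], 2.0, 1.0)
print(sinks["s1"], caps)
```

In this sketch each sink costs $O(p^2)$ operations per added edge and only the $O(p)$ root moments are kept as scratch data, which is at least consistent with the bounds quoted above. Any capacitance attached at the new root itself would still have to be added to the returned moment-weighted capacitances.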
V. EXPERIMENTAL RESULTS
We implement the RATS-tree algorithm in C++ and evaluate the algorithm for MCM designs in two groups of experiments. The first experiment demonstrates the ability of the RATS-tree algorithm to construct a set of topologies that provide a smooth tradeoff among signal delay, waveform, and routing area. The second experiment demonstrates how RATS-tree solutions could provide a performance-driven global router new opportunities in reducing overall routing congestion level.
A. Results on Random Nets
In the first experiment, we use randomly generated netlists with six to 12 terminals on a 10 × 10 cm MCM substrate. In the first set of experiments, we run the RATS-tree algorithm under the path-length formulation to investigate the tradeoff between the path-length and the routing cost of the topologies generated. Ten random n-pin nets are generated for each n ranging from six to 12. We apply the RATS-tree algorithm under the two-pole model. The purpose of this experiment is to investigate the impact of routing topology on signal delay and integrity. The interconnect parameters that are used by our algorithms for moment, delay, and signal quality computations are obtained from the Micro Module System (MMS) D500 process on aluminum offered through MIDAS. Assuming a nominal width of 19 μm, the interconnect resistance, inductance, and capacitance are 236.84 Ω/m, 301.49 nH/m, and 128.99 pF/m, respectively. The load capacitance of each sink is assumed to be 1 pF, and the driver resistances range from 10 to 30 Ω, depending on the size of the net and the proximity of the terminals.
For each net, we set the required arrival time of sink $s_i$ to be $k \times \sqrt{LC}\, d(s_0, s_i)$, where k > 1; k is larger than one in order to account for the transition delay. For each net, our algorithm constructs a large class of topologies satisfying the delay requirements. We then run SPICE simulations using the transmission line model to evaluate the sink delay and measure signal integrity in terms of the signal settling time and voltage overshoot of these constructed topologies. The signal delay is the time taken for the output response waveform to reach 90% of $V_{dd}$ (assuming a rising signal) for the first time. The signal settling time is the time taken for the signal to settle above 90% of $V_{dd}$. In general, a sufficiently large set of topologies is generated by our algorithm, which provides tradeoffs among maximum sink delay, signal settling time, voltage overshoot, and routing cost. Table I shows the maximum delays, signal settling times, voltage overshoots, and routing costs for the topologies generated by our RATS-tree algorithm for one of the randomly generated 9-pin nets for k = 2, 3, and 6. We also include the topology generated by the Borah-Owens-Irwin (BOI) Steiner algorithm proposed by [2]. Both the delay and settling time of each topology are in nanoseconds and the total wire capacitance is in picofarads. The voltage overshoot is normalized with respect to $V_{dd}$.
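The required arrival times used in this experiment scale the Manhattan-distance time-of-flight by the factor k. A minimal sketch follows; the function names and the coordinate representation (meters) are assumptions, and the parasitics in the example are the quoted process values.

```python
import math


def manhattan(a, b):
    """Manhattan distance between two points given as (x, y) in meters."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def required_arrival_times(source, sinks, k, l_per_m, c_per_m):
    """Return k * sqrt(L*C) * d(source, sink) for each sink, in seconds."""
    unit_delay = math.sqrt(l_per_m * c_per_m)
    return [k * unit_delay * manhattan(source, s) for s in sinks]


# One sink 7 cm (Manhattan) from the source, with k = 2.
q = required_arrival_times((0.0, 0.0), [(0.03, 0.04)], 2, 301.49e-9, 128.99e-12)
print([f"{x * 1e9:.3f} ns" for x in q])
```

A small k leaves little room beyond the shortest-path time-of-flight, while a large k admits topologies with detours and shared wiring, which matches the role k plays in the experiments.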
When k = 2, most of the RATS trees generated are shortest path Steiner trees. As we can see from Fig. 5(a)-(e), except for RATS5, the topologies RATS1-4 are all shortest path Steiner trees. In fact, RATS1 is an optimal Steiner arborescence [26]. While it is the best among RATS1-5 in terms of maximum delay, RATS1's total wire capacitance, settling time, and voltage overshoot are not necessarily the smallest. The topology generated by [2] shown in Fig. 5(j) has the longest signal delay, but the least total wire capacitance, which is 20% smaller than that of RATS1. Fig. 5(f)-(g) and (h)-(i) show topologies generated for k = 3 and k = 6, respectively. Note that the total wire capacitance of RATS8 is 20% smaller than that of RATS1. In fact, the total wire capacitance of RATS8 is identical to that of the BOI Steiner topology generated by [2]. However, RATS8 has a better signal delay because the RATS-tree algorithm keeps the solution with a larger delay slack during the pruning. In total, 16 and 20 topologies are generated for k = 3 and 6, respectively. Note that if we do not consider pruning, the numbers could be much larger. There are also overlaps among the three sets of topologies for k = 2, 3, and 6.
The runtime required by the RATS-tree algorithm increases as the net cardinality increases. For example, the worst case CPU time for 12-pin nets is 152 s, whereas the worst case CPU time for 6-pin nets is 0.3 s. In general, the average CPU time is much lower; the average CPU times for 12-pin nets and 6-pin nets are 22 and 0.2 s, respectively. It is interesting to note that for the larger examples (n = 10, 11, and 12), the runtimes increase when we increase k to a certain extent. Beyond that, the runtimes decrease. We believe that the runtime is reflective of the number of (sub)topologies generated by the RATS-tree algorithm.
When k is either very small or very large, we observe that the pruning is effective in reducing the number of (sub)topologies generated, thereby cutting down the runtimes.
B. Integration With MINOTAUR Global Router
The ability of the RATS-tree algorithm to generate a set of routing topologies for a timing-critical net complements a performance-driven MCM global router called MINOTAUR [8], [30]. Presented with a set of candidate routing topologies for each timing-critical net, the MINOTAUR global router has the flexibility to choose a topology from the set of candidate topologies of each net (not necessarily the one of least cost) to minimize the overall congestion and possibly optimize other objective functions of the global routing solution. Essentially, the global router considers routing of multiple nets simultaneously; it performs a topology selection heuristic called iterative deletion to remove candidate topologies of all nets one at a time until each net is left with only one candidate topology. The objective is to select a set of compatible topologies for congestion reduction.
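The flavor of iterative deletion can be conveyed with a small sketch. Everything specific here is an assumption for illustration: each candidate topology is reduced to the set of region borders it crosses, border usage counts every remaining candidate, and the candidate deleted at each step is the one crossing the most heavily used border. The actual cost function and tie-breaking used by MINOTAUR are not given in this text.

```python
from collections import Counter


def iterative_deletion(candidates):
    """Delete candidates one at a time until each net keeps exactly one.

    candidates: {net: [set_of_borders, ...]} with at least one entry per net.
    Returns {net: chosen_set_of_borders}.
    """
    remaining = {net: list(tops) for net, tops in candidates.items()}
    while any(len(tops) > 1 for tops in remaining.values()):
        # Usage of each border, counting all candidates still alive.
        usage = Counter(b for tops in remaining.values()
                        for top in tops for b in top)
        worst = None
        for net, tops in remaining.items():
            if len(tops) < 2:
                continue                      # never delete a net's last candidate
            for idx, top in enumerate(tops):
                cost = max((usage[b] for b in top), default=0)
                if worst is None or cost > worst[0]:
                    worst = (cost, net, idx)
        _, net, idx = worst
        del remaining[net][idx]
    return {net: tops[0] for net, tops in remaining.items()}


nets = {"n1": [{"A"}, {"B"}], "n2": [{"A"}], "n3": [{"A"}, {"C"}]}
print(iterative_deletion(nets))
```

In the example, border A is contested by all three nets; the two nets that have alternatives are steered away from it, leaving A to the net with a single candidate.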
To evaluate the effectiveness of the RATS-tree algorithm in providing a set of compatible topologies for different nets, we perform experiments on two Microelectronics and Computer Technology Corporation (MCC) multichip module benchmark circuits mcc1 and mcc2 that have been used in [23] . The distributions of net sizes in mcc1 and mcc2 are shown in Table II [30] .
We first construct approximate minimum Steiner tree topologies [2] for nets in mcc1 and mcc2. We sort the pins from these nets in descending order of their delays and designate the first T% of the sorted pins as critical pins for optimization. For each of these critical pins, we set its required arrival time to be X% smaller than its delay in the corresponding approximate minimum Steiner tree topology. For each of the remaining (100 − T)% noncritical pins, we use its delay in the corresponding approximate minimum Steiner tree topology as its required arrival time. Subsequently, for every net that has critical pins with tighter timing constraints, we apply the RATS-tree algorithm to generate candidate topologies that meet the specified timing constraints for all pins in the net. In our experiments, we evaluate the maximum border congestion, which measures the number of crossings from one routing region to another, for the benchmarks using T = {5%, 10%} and X = {5%, 10%, 15%}. The number of nets optimized and the minimum, maximum, and average number of candidate RATS trees for the nets are shown in Table III [30]. Only a small portion of the nets are large enough to benefit from interconnect topology optimization. These candidate RATS trees achieve the specified X% delay reduction for all the T% critical pins without compromising the performance of noncritical pins. Note that most timing-critical nets in mcc2 do not have multiple candidate topologies because most of them are 2-pin nets and the RATS-tree algorithm considers routing on a Hanan grid only.
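The constraint-setting procedure above can be sketched as follows. The function name, the dictionary layout, and the choice to round the number of critical pins up are assumptions of this sketch.

```python
import math


def assign_required_times(delays, t_percent, x_percent):
    """Tighten the required arrival time of the slowest T% of pins by X%.

    delays: {pin: delay in the baseline Steiner topology}.
    Returns {pin: required arrival time}.
    """
    order = sorted(delays, key=delays.get, reverse=True)   # slowest pins first
    n_critical = math.ceil(len(order) * t_percent / 100.0)
    critical = set(order[:n_critical])
    return {
        pin: d * (1.0 - x_percent / 100.0) if pin in critical else d
        for pin, d in delays.items()
    }


baseline = {f"p{i}": float(i + 1) for i in range(10)}   # delays 1.0 .. 10.0
required = assign_required_times(baseline, t_percent=10, x_percent=10)
print(required["p9"], required["p8"])
```

Noncritical pins keep their baseline delay as the requirement, so any candidate topology that helps a critical pin must do so without slowing the others.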
We perform experiments using two general approaches in the topology selection problem to illustrate the benefits of considering multiple candidate topologies. In the first approach, we select a single high-performance RATS tree (the one with minimum area) for each performance-critical net; this is the "traditional" approach taken by a performance-driven global router. In the second approach, the RATS-tree algorithm generates (possibly) many candidate topologies for each timing-critical net and the MINOTAUR global router applies the iterative deletion heuristic to select a set of compatible topologies for several timing-critical nets. Results of these experiments are shown in Table IV [30]. Note that without considering performance optimization (T = 0%), the maximum congestion level, denoted $C_{max}$, and the average congestion level, denoted $C_{avg}$, for mcc1 are 33 and 23, respectively. The maximum and average congestion levels for mcc2 are 167 and 93.1, respectively.
The results in Table IV(a) show that when the global router is restricted to a single high-performance topology for each timing-critical net, the performance-driven global routing will result in a considerably higher congestion level. Note that in this experiment, the RATS-tree algorithm returns only a single high-performance (minimum area) RATS tree, possibly among many high-performance RATS trees constructed, to the MINOTAUR global router. We believe that similar results will hold for any topology optimization algorithm that returns a single topology for each net to the MINOTAUR global router. In fact, most of the existing topology optimization algorithms construct only a single high-performance topology.
Table IV(b) shows the results when the MINOTAUR global router considers several candidate topologies for each timing-critical net, selecting one that is compatible with those of other timing-critical nets. It is interesting to observe that for mcc1, the maximum and average global congestion can be maintained at levels similar to those when no performance optimization is required. On the other hand, the congestion levels increase significantly for mcc2 whether we consider a single high-performance interconnect structure or multiple candidate topologies. A possible explanation for the above behaviors is that there are several candidate topologies available for each timing-critical net in mcc1 as shown in Table III. However, most timing-critical nets in mcc2 do not have multiple candidate topologies due to the fact that most of them are 2-pin nets and the RATS-tree algorithm considers routing on a Hanan grid only. As the routings for these nets are restricted, the overall congestion levels increase as in Table IV(a).
VI. CONCLUSION
In this paper, we describe a RATS-tree construction algorithm under a higher order RLC interconnect model. The RATS-tree algorithm optimizes signal waveform, not just delay; it considers the impact of routing on signal delays and response waveforms by incrementally computing sink moments in a bottom-up manner during topology construction. Due to its capability to handle a large class of topologies, the RATS-tree algorithm returns not one, but a set of routing topologies that provide tradeoffs among routing cost, signal delay, and signal integrity. We also show that performance optimization needs to be carried out in a more global context; optimization that produces a single solution for each net is unlikely to work well. Coupling the RATS-tree algorithm with the MINOTAUR global router using the iterative deletion heuristic illustrates the potential of routing multiple nets simultaneously.
