In this paper we present a new global router appropriate for Multichip Module (MCM) and dense Printed Circuit Board (PCB) design, which utilizes a hybrid of the classical rip-up and reroute approach and the more recent iterative deletion [9] method. The global router addresses performance issues by utilizing recent results in high performance interconnect design, while still effectively minimizing global congestion.
Introduction
Advances in VLSI fabrication technology have resulted in an increasing interest in interconnection and packaging technologies. For many systems, the delay due to circuit interconnections can dominate the total system delay [1, 5]. Interconnect optimization must be considered to achieve high performance at the system level.
One approach that provides many benefits to packaging level interconnection is the Multichip Module (MCM) technology. This technology increases practical packing densities, and removes a level of interconnection. In MCM designs, a number of bare chips are placed directly onto a substrate, with the substrate itself providing the circuit interconnections. Routing problems for MCM designs have many similarities to routing for dense Printed Circuit Boards (PCBs).
Routing within MCM substrates or PCBs, however, can be an extremely difficult problem with a potentially large number of interconnect layers. A Fujitsu supercomputer design, for example, has over 50 interconnect layers, each with a large routing capacity [10]. The MCM/PCB routing problem in general may be considered to be an extremely large general area routing problem, and cannot be decomposed into small channel or switchbox routing problems easily. Additionally, high performance designs must consider signal crosstalk due to coupling capacitance between long parallel lines, and variable width wiring to address impedance mismatches and wiring termination requirements.
The problem of MCM/PCB routing has been considered by a number of authors. One set of approaches performs global and detail routing in a single step.
An early MCM router was presented in [10], and performs a combination of 2D and 3D routing to obtain a solution. Computation times for this method were relatively high.
In [13] the SLICE router was presented; it constructs detailed routes through a mixture of planar routing and two-layer maze routing. While this router was superior to a 3D maze router, it requires relatively high computation times. The V4R router [14] was subsequently presented, and efficiently routes layer-pairs one at a time using routes with no more than four vias each.
The M²R [4] router addresses performance concerns by routing critical nets on a single layer. The remaining nets are routed using x-y layer-pair routing.
In [17], a linear programming approach was presented. Randomized rounding is used to obtain integer solutions. A small number of possible topologies are considered, and routes are restricted to a channel intersection graph.
The MCG [2] router considers a small number of possible routes for each net, and attempts to maximize the routing density on any layer-pair by selecting a set of compatible routes from among the unrouted nets. The results of this router are quite good, requiring fewer routing layers than many of the other MCM routers. Performance concerns are addressed by removing nets which exceed signal crosstalk thresholds, and routing them on other layers.
In [20], the MLR router was presented; it first performs layer assignment of nets, followed by Steiner optimal area routing using a hierarchical approach to find low-cost paths between pins.
The SEGRA [3] MCM global router has been described, and also approaches the routing problem by completing pairs of layers in each pass. A simple greedy heuristic is used to select which nets are to be extended or completed during each pass.
While the methods mentioned above consider global and detail routing in a single step, there have also been approaches which consider the problem in two steps.
In [21], the MCM substrate is decomposed into a set of "towers," rectangular regions containing portions of the routing surface through all layers. They first distribute routing density across a three dimensional surface, using hierarchical decomposition, then determine locations of nets on the boundaries of the towers, and finally obtain a routing for each tower using a detail router.
We approach the problem in a manner similar to [21], and consider the global routing problem in this paper. In contrast to these previous approaches, our MINOTAUR global router offers the following features.
We utilize current high-performance variable-width interconnect optimization results, obtaining structures which address delay and signal integrity requirements [6]. We can in fact consider a set of high performance interconnect structures for each signal net, and perform congestion optimization from a global perspective. We optimize global congestion by not restricting the layers a route may use; some approaches route a single layer-pair at a time, and cannot obtain routing solutions which span multiple or non-paired layers. We combine the freedom and flexibility of maze routing solutions with the global optimization abilities of the iterative deletion method. While we consider MCM and PCB routing in this paper, the basic techniques used in our router can be easily applied to multilayer VLSI IC routing problems as well.
The remainder of this paper is organized as follows. In Section 2, we present our problem formulation. We also describe our approach to congestion estimation in this section. In Section 3, we describe the MINOTAUR global router. We detail our methods for applying rip-up and reroute in Section 3.1, the iterative deletion approach in Section 3.2, our support for high-performance interconnect structures in Section 3.3, and the integration of algorithms in Section 3.4. Experimental results are presented in Section 4, and we conclude our paper with Section 5.
Problem Formulation
Our routing model is similar to that of [21]; we divide the 3-dimensional routing surface into a set of "tiles," with each tile consisting of a number of layers. We use "tile-layer" to indicate a single layer within a tile. The routing substrate is modeled as a graph, with tile-layers as nodes, and both vias and the "borders" between tile-layers as edges. Preferred-direction routing is obtained by including or excluding border edges. Figure 1 shows an example of our routing model. The routing problem consists of determining the interconnect structure for a set of nets N. In addition to basic feasibility concerns based on physical routing capacity constraints, there are a number of performance issues as well; many of these were detailed in [4]. In this paper, we are interested in meeting delay, signal integrity, crosstalk, and routing capacity constraints, while minimizing total wire length, maximum congestion, and routing resource costs.
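A minimal sketch of the graph model described above may help fix ideas. The function name build_routing_graph, the tuple encoding of tile-layers, and the rule that even layers route horizontally and odd layers vertically are illustrative assumptions, not details given in this paper.

```python
from itertools import product

def build_routing_graph(nx, ny, nlayers):
    """Return (nodes, border_edges, via_edges) for an nx-by-ny grid of tiles.

    A node is a tile-layer (x, y, layer). Assumption for illustration only:
    even layers prefer horizontal routing, odd layers prefer vertical routing.
    """
    nodes = list(product(range(nx), range(ny), range(nlayers)))
    border_edges = []
    via_edges = []
    for x, y, layer in nodes:
        # Preferred direction is modeled by including only some border edges.
        if layer % 2 == 0 and x + 1 < nx:
            border_edges.append(((x, y, layer), (x + 1, y, layer)))
        if layer % 2 == 1 and y + 1 < ny:
            border_edges.append(((x, y, layer), (x, y + 1, layer)))
        # A via edge joins the same tile on adjacent layers.
        if layer + 1 < nlayers:
            via_edges.append(((x, y, layer), (x, y, layer + 1)))
    return nodes, border_edges, via_edges

nodes, borders, vias = build_routing_graph(3, 2, 2)
print(len(nodes), len(borders), len(vias))
```

Excluding a border edge, as the vertical edges are excluded on layer 0 here, is how the sketch forbids routing across that border on that layer.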
We assume that a detail router will be used to complete the routings of the individual tiles, and determine solution feasibility through the maximum congestion of any tile-layer node, border edge, or via edge. We estimate solution congestion as the number of topology edges using a particular routing resource. Note that the border congestion and tile-layer congestion metrics are in general not comparable.
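The congestion estimate above amounts to a usage count per resource. The sketch below illustrates it; the names congestion and max_and_avg are hypothetical, and averaging over used resources only is an assumption. Because the tile-layer and border metrics are not comparable, the functions would be applied to one resource type at a time.

```python
from collections import Counter

def congestion(paths):
    """Count how many topology edges use each routing resource.

    `paths` maps a topology-edge id to the list of resources on its path.
    A resource is any hashable key (tile-layer node, border edge, via edge).
    """
    usage = Counter()
    for resources in paths.values():
        # A topology edge contributes at most once to each resource.
        usage.update(set(resources))
    return usage

def max_and_avg(usage):
    """Maximum and average congestion over the resources that are used.

    Assumption: unused resources are left out of the average.
    """
    values = list(usage.values())
    return max(values), sum(values) / len(values)

paths = {"e1": ["t0", "t1"], "e2": ["t1", "t2"], "e3": ["t1"]}
usage = congestion(paths)
c_max, c_avg = max_and_avg(usage)
```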
The MINOTAUR Global Router
Our approach is to first decompose each non-critical signal net into a set of edges. In most of our experiments, we use a simple minimum spanning tree algorithm, although we can also utilize Steiner tree or high performance interconnect algorithms. Steiner topologies had little impact on solution quality in our experiments. Nets which require high-performance interconnect structures are not decomposed into 2-pin edges, and are handled as described in Section 3.3.
For each edge of the non-critical nets, we wish to find a path through the modeled MCM substrate; we model a path as a set of tile-layer nodes, border edges, and via edges.
Our global router contains two basic algorithmic approaches: a traditional rip-up and reroute scheme, and a method based on iterative deletion. We first discuss the details of each, and then show how they are integrated.
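The decomposition of a net into 2-pin edges by a minimum spanning tree can be sketched with Prim's algorithm. The function name mst_edges and the use of Manhattan distance between pins are assumptions made for illustration; the paper does not state the distance metric.

```python
def mst_edges(pins):
    """Decompose a net into 2-pin edges using Prim's algorithm.

    `pins` is a list of (x, y) coordinates. Manhattan distance is an
    assumption here, chosen because routing is rectilinear.
    """
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    if len(pins) < 2:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < len(pins):
        # Pick the cheapest edge that leaves the current tree.
        _, i, j = min(
            (dist(pins[i], pins[j]), i, j)
            for i in in_tree
            for j in range(len(pins))
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((pins[i], pins[j]))
    return edges

edges = mst_edges([(0, 0), (4, 0), (0, 3), (5, 1)])
total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in edges)
```

Each returned pair is one topology edge to be routed independently by the global router.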
Rip-Up and Reroute
Rip-up and reroute is one of the oldest approaches to the global routing problem; it generally requires long computation times and provides few guarantees as to solution quality. Despite these drawbacks, rip-up and reroute continues to be used in many routing applications. The rip-up and reroute portion of our global router is similar to that of [19], in that we iteratively change our congestion bounds to encourage convergence to a low-congestion solution.
Given the complexity of MCM and PCB designs, we believe that rip-up and reroute may be necessary for aggressive design. We implement rip-up and reroute in our router as follows. Each edge is first routed using a maze router based on Lee's classic algorithm [15]. We then iteratively rip-up and reroute each edge, continuing for a user-specified number of iterations, or until solution quality stabilizes. Details of our maze routing cost functions, and edge orderings used, are given below.
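The loop described above can be sketched on a single-layer grid of tiles. This is an illustration under stated assumptions, not the MINOTAUR implementation: the names maze_route and rip_up_and_reroute are hypothetical, only tile usage is costed, the wave expansion is cost-weighted (Dijkstra's algorithm, which behaves like Lee's breadth-first expansion when all costs are equal), and "solution quality stabilizes" is approximated by the maximum congestion no longer improving.

```python
import heapq
from collections import Counter

def maze_route(src, dst, width, height, usage, cost):
    """Return the cheapest path of tiles from src to dst."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > best[node]:
            continue  # stale heap entry
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            nd = d + cost(usage[nxt])  # entering a tile costs cost(its usage)
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def rip_up_and_reroute(edges, width, height, cost, max_iters=10):
    """Route every (src, dst) edge, then repeatedly rip up and reroute each."""
    usage = Counter()
    paths = []
    if not edges:
        return paths, usage
    for src, dst in edges:  # initial routing
        path = maze_route(src, dst, width, height, usage, cost)
        usage.update(path)
        paths.append(path)
    for _ in range(max_iters):
        before = max(usage.values())
        for i, (src, dst) in enumerate(edges):
            usage.subtract(paths[i])  # rip up
            paths[i] = maze_route(src, dst, width, height, usage, cost)
            usage.update(paths[i])    # reroute
        if max(usage.values()) >= before:
            break  # maximum congestion no longer improves
    return paths, usage

edges = [((0, 0), (3, 0)), ((0, 0), (3, 0)), ((0, 1), (3, 1))]
paths, usage = rip_up_and_reroute(edges, 4, 3, lambda u: 1 + 4 * u)
```

The cost callable passed in the last line (1 plus 4 times the current usage) is a placeholder; any positive cost function of congestion can be substituted.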
If high-performance interconnect structures are required, we select a single high-performance interconnect structure with the minimum area for the performance critical net, and do not modify it. The rip-up and reroute process operates only on the edges of non-critical nets, and attempts to find routes "around" the congestion of the critical nets. Congestion optimization of the high-performance nets by the iterative deletion method is more robust, and is detailed below.
Maze Routing Cost Functions
As is done in a number of other routers, we assign a cost to each tile-layer node, border edge, and via edge. Our objective is to assign costs to each resource such that we find a relatively short path in terms of wiring distance, while avoiding heavily congested areas. We are therefore interested in knowing what cost functions result in the best performance, a question that has received relatively little attention in the literature.
In [19], the cost of routing in the most congested region is infinite, while the remaining regions have low cost. In [12], the cost of a routing path includes terms for the number of other routes that a path "touches" or "crosses," as well as two terms to capture the route length in preferred and non-preferred routing directions. In [18], steps in the preferred routing direction were assigned a cost of 1, while non-preferred steps were assigned a cost of 30. In [16], costs were piecewise-linear functions depending on a measure of the utilization and the capacity of each edge. Clearly, there are many possible ways to calculate the cost of using a particular resource.
In general, we expect resources which have high usage or large numbers of obstacles to have relatively high cost. We have explored three cost functions in depth, and describe them here.
We first consider a step function, similar to those of [19], [18], and [12], which provides a low cost if congestion falls below a specific threshold value. A high cost results if congestion is at or above the threshold value. Second, we consider a piecewise-linear slope function, similar to that of [16], which provides low cost when congestion is below a threshold, high cost if congestion is above a cap, and a linearly interpolated cost when congestion is between the threshold and cap levels. For each iteration of rip-up and reroute, we determine the maximum and minimum congestion for any resource. We modify the step and slope functions to reflect current congestion levels by adjusting the threshold and cap parameters. We refer to this as Adaptive Congestion Estimation (ACE), and perform experiments using ACE-step and ACE-slope cost functions. A third function, which we refer to as a "two-tier" function, is similar to the slope function when routing congestion is below a fixed physical capacity. If congestion is above physical capacity, routing costs are increased substantially. These cost functions are illustrated in Figure 2.
Figure 2: ACE-step, ACE-slope, and "two-tier" cost functions. These functions are used to determine resource cost during maze routing.
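The three cost functions can be written down compactly. The sketch below follows the verbal definitions above; the function names, the default low and high costs of 1 and 16 (taken from the typical configuration in our experiments), the choice of the current maximum congestion as the ACE cap, and the overflow cost of 1000 are assumptions made for illustration.

```python
def step_cost(c, threshold, low=1.0, high=16.0):
    """Low cost below the threshold, high cost at or above it."""
    return low if c < threshold else high

def slope_cost(c, threshold, cap, low=1.0, high=16.0):
    """Low below threshold, high at or above cap, linear in between."""
    if c < threshold:
        return low
    if c >= cap:
        return high
    return low + (high - low) * (c - threshold) / (cap - threshold)

def ace_parameters(usage_values, fraction=0.5):
    """Adapt threshold and cap to the current congestion range.

    The threshold sits at `fraction` of the range between minimum and
    maximum congestion; using the maximum as the cap is an assumption.
    """
    c_min, c_max = min(usage_values), max(usage_values)
    return c_min + fraction * (c_max - c_min), c_max

def two_tier_cost(c, threshold, capacity, low=1.0, high=16.0, overflow=1000.0):
    """Slope-like up to the physical capacity, much higher beyond it."""
    if c > capacity:
        return overflow  # hypothetical penalty for exceeding capacity
    return slope_cost(c, threshold, capacity, low, high)

threshold, cap = ace_parameters([2, 4, 10], fraction=0.5)
print(step_cost(7, threshold), slope_cost(7, threshold, cap))
```

Under ACE, ace_parameters would be re-evaluated at the start of each rip-up and reroute iteration, so the same congestion value may be cheap in an early iteration and expensive in a later one.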
We find that the ACE-slope cost function is consistently superior to the ACE-step function. Detailed results are given in Section 4, Table 2. Many routing applications which utilize step-like functions ([19], [12], [18], for example) may obtain substantial improvements in results by adopting a slope-like cost function.
Our global router supports separate cost functions for each type of resource (tile-layer nodes, border edges, and via edges), and can optimize these independently.
The Net Ordering Problem
A perennial question in global routing problems has been "which net do we route or reroute first?" [9, 2]. To evaluate the importance of net ordering in our implementation, we compared five different edge orderings within the rip-up and reroute framework. Our orderings were low length first, low cost first, the reverse of these, and the default order of our design database.
Surprisingly, there was negligible difference in the results produced by the various orderings. While examples can be constructed in which certain orderings are clearly superior to others, we found little evidence that this occurs in our experiments. In our model, we have no hard upper limit on routing capacity, and perform rip-up and reroute until solution quality converges. Our study concludes that solution quality measured in our routing model is far more sensitive to the choice of routing cost functions than to the edge ordering.
The Iterative Deletion Method
Rip-up and reroute is computationally expensive. Through iterative deletion, we can obtain high quality results while using rip-up and reroute sparingly. In this section, we describe iterative deletion; Section 3.4 describes integration with rip-up and reroute. The use of iterative deletion was first proposed in [9], and was recently improved in [7].
In many iterative improvement methods, we repeatedly modify a solution, changing a small portion at each step. Rather than representing the problem with a single instance, and changing one aspect of it at each step, the iterative deletion method represents a set of possible states. Consider the example in Figure 3, which is a relatively simple planar routing problem with three nets. Figure 3a has high congestion, while Figure 3b has high coupling capacitance due to parallel wire segments. Figure 3c has both low congestion and low coupling capacitance, and is a subset of Figure 3d. The iterative deletion process begins with a redundant set of paths such as Figure 3d, and iteratively removes a single high-cost redundant path while attempting to minimize congestion and crosstalk among the remaining connections. Interactions between paths such as coupling capacitance can also be modeled within this framework.
The set of possible connections for each edge is called the path set. In Figure 3d, the path sets for each edge consist of the two single-bend routes. By having several possible routes for each edge available at the start of the iterative deletion process, we can easily determine which paths are high quality. In the example, the lower-left routing for net C does not interfere with any other path, allowing the alternative to be removed without degrading solution quality. This approach has a number of significant features. First, we obtain a global estimate of congestion, and can determine not only where congestion is likely, but also where it is unavoidable. Second, crosstalk between paths can be modeled and considered during optimization. By selecting a set of routes that are compatible at a global level, we can obtain high quality solutions quickly. Iterative deletion was shown to be an effective method for selecting a set of channel segments in standard cell IC routing problems [7], and motivates our use of the method here. Our congestion estimation here is similar to the channel density estimation of [7]. We recalculate congestion levels after the removal of each redundant path.
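The deletion loop described above can be sketched as follows. The function name iterative_deletion, the resource labels in the example, and the path cost (maximum congestion over a path's resources, with total congestion as a tie-breaker) are assumptions for illustration; crosstalk terms are omitted. Every candidate path is assumed to be non-empty.

```python
from collections import Counter

def iterative_deletion(path_sets):
    """Reduce each path set to one path by deleting high-cost redundant paths.

    `path_sets` maps an edge id to a list of candidate paths; each path is a
    non-empty list of routing resources.
    """
    remaining = {e: [list(p) for p in ps] for e, ps in path_sets.items()}
    while any(len(ps) > 1 for ps in remaining.values()):
        # Congestion is recomputed over all surviving candidates.
        usage = Counter(r for ps in remaining.values() for p in ps for r in p)
        worst = None
        for e, ps in remaining.items():
            if len(ps) < 2:
                continue  # the last path of an edge is not redundant
            for i, p in enumerate(ps):
                key = (max(usage[r] for r in p), sum(usage[r] for r in p))
                if worst is None or key > worst[0]:
                    worst = (key, e, i)
        _, e, i = worst
        del remaining[e][i]  # remove one high-cost redundant path
    return {e: ps[0] for e, ps in remaining.items()}

# Three edges with two candidates each; resource "t2" is shared by one
# candidate of every edge, so those candidates are deleted first.
path_sets = {
    "A": [["t1", "t2"], ["t3", "t4"]],
    "B": [["t2", "t5"], ["t6", "t7"]],
    "C": [["t8"], ["t2", "t9"]],
}
chosen = iterative_deletion(path_sets)
# chosen == {"A": ["t3", "t4"], "B": ["t6", "t7"], "C": ["t8"]}
```

Because congestion is counted over all surviving candidates rather than over one committed solution, a resource that stays heavily used after many deletions marks congestion that no choice of the remaining paths can avoid.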
Performance Optimization
Many approaches for interconnect optimization in high performance design have been proposed. [11] and [5] provide an overview of current interconnect optimization approaches. Recently, Required-Arrival-Time Steiner trees (RATS-trees) have been proposed [6] to provide a set of solutions allowing trade-offs among signal delay, waveform, and routing area. This algorithm uses a bottom-up dynamic programming approach, and can synthesize a set of high performance structures; Figure 4 shows two possible trees for a 9-pin net.
While our basic iterative deletion approach considers alternate routings for each topology edge, we consider alternate trees for each critical net. Rather than removing a single path between a pair of pins during deletion, we instead remove one of the candidate interconnect trees generated by the RATS-tree algorithm. The RATS-tree algorithm and our global router support variable-width routing.
The Hybrid Approach
We have three approaches that form a hybrid between the rip-up and reroute approach and the iterative deletion method. The first applies iterated rip-up and reroute, and then adds the single-bend paths to the path set for each edge. We then perform iterative deletion on the path sets.
Our second approach begins by generating single-bend paths for each edge, and then applies iterative deletion to these path sets. After iterative deletion, we use the maze router to find a new candidate path for each edge, and add it to the path set for the edge. Multiple iterations of this process can produce high quality results quickly.
A third approach applies iterative deletion as above, followed by rip-up and reroute for non-critical nets.
Experimental Results
Experiments were performed with the MCC multichip module benchmark circuits mcc1 and mcc2, and also on the three test cases from [14], summarized in Table 1. We report maximum and average congestion levels (Cmax and Cavg) for either the tile-layer metric or the border metric these two are
Comparison of Cost Functions
To compare cost function performance, we route each example as a single layer problem, using a 16 by 16 routing grid. We use a low routing cost of 1, and tested combinations with high routing costs of 2, 4, 8, and 16, ACE cost function threshold points at 0.2, 0.4, 0.6, and 0.8 of the range between the minimum and maximum congestion level, and also a threshold point at the average congestion level. A summary of these results is shown in Table 2, in which we show the best, worst, and "typical" performance of the rip-up and reroute method for either the step or slope cost functions, and the result of a single pass of iterative deletion considering the rip-up and reroute paths and the single-bend candidate paths. In the "typical" cost function configuration, we use a high routing cost of 16, and the cost function threshold point at the average congestion level.
While the congestion values during the rip-up and reroute process converged for all experiments, they converged to vastly different values depending on the cost function, and the high, low, and threshold values used to configure the cost function; maximum congestion levels differed by as much as a factor of 2.29. We find that the results using the slope cost function are superior to those using the step-based cost function. When the high, low, and threshold values for tile costs are not well suited to the routing problem, a single pass of iterative deletion can improve the results considerably.
Comparison with V4R
In Table 3, we compare maximum tile-layer congestion values of our approach with the results of the MCM global router V4R [14]. Congestion estimation favors the V4R router, as the MINOTAUR result considers distinct tree edges, while the V4R result reported considers distinct nets. For the single layer comparisons, we merge layer assignments of the V4R routing into a single layer. Our global router obtains congestion reductions ranging from 35% to 48% for the single layer model, and reductions of 14% to 38% for the multi-layer model.
Table 2: Experiments on MCM benchmarks considering a variety of cost functions. We report tile-layer maximum and average congestion (Cmax and Cavg), and show the best and worst performance using a classic rip-up and reroute approach, and the results of applying a single pass of iterative deletion to the rip-up and reroute result combined with the two single-bend candidate paths. We also show results using a "typical" cost function configuration.
For the single-layer comparison, the V4R layers were merged into a single layer.
Experiments with Physical Capacity Constraints
Table 4 shows experimental results and routings using a "two-tier" cost function and differing physical routing capacity constraints. Assuming a wire pitch of 75 μm, we have a physical capacity per border of 36 for mcc1 and 125 for mcc2. We also consider cases with increased wire pitch, resulting in physical capacities of 24, 20, 16, and 10 for mcc1, and 100, 80, and 60 for mcc2, respectively. We apply either an ACE-slope cost function, the "two-tier" function, or an ACE-slope approach followed by a post-processing step using the "two-tier" cost function. When physical constraints are relatively loose, using a two-tier cost function or ACE-slope followed by post-processing can further improve the average congestion and expected wire length, as shown in the case with a routing capacity of 125 for mcc2.
When an unrealistically tight capacity constraint is given (as is the case for the capacity of 60 for mcc2), the two-tier cost function may give a solution that is inferior to the one obtained using ACE-slope.
Performance Optimization
To evaluate the impact of performance optimization on global solution quality, we applied the RATS-tree algorithm to a number of nets from the benchmarks mcc1 and mcc2. The number of nets optimized, minimum and maximum number of RATS-trees, and average number of RATS-trees per net are shown in Table 5. For these experiments, we constructed Steiner trees for each net, and evaluated the delay for each using SPICE. T% of the pins with highest delay were selected, and we attempt to obtain an X% reduction in signal delay for these pins (no delay constraints are placed on the other nets or pins). Router performance is shown in Table 6.
Table 4: Four-layer routing experiments on mcc1 and mcc2 with the border congestion metric, using physical routing capacity constraints shown in parentheses. We use either the ACE-slope cost function, a "two-tier" function, or the ACE-slope cost function followed by improvement with a "two-tier" function as a post-processing step. The number of monotonic and non-monotonic paths are also shown.
Table 6: Performance optimization comparison of rip-up and reroute using a single topology for critical nets with iterative deletion followed by rip-up and reroute. In these experiments, we select T% of the high-delay pins, and attempt to obtain at least an X% delay reduction for each. In experiments with no performance optimization, rip-up and reroute obtains maximum and average congestion levels of 33 and 23.0 for mcc1, while congestion levels are 168 and 93.4 for mcc2. These results assume the single-layer routing model.
We consider two routing approaches for this problem. The first approach selects a single minimum-area candidate RATS-tree for each constrained net, and then applies rip-up and reroute to the remaining nets. The second approach is a hybrid, and applies iterative deletion to select a single RATS-tree for each constrained net, and then applies rip-up and reroute. When a number of candidate topologies are available, the hybrid approach obtains a 19.5% reduction in maximum congestion. In this set of experiments, the minimum-area RATS-trees were quite similar under the different performance constraints, resulting in similar performance for the rip-up and reroute approach.
Conclusions
We have presented an MCM/PCB global router which integrates the classic rip-up and reroute approach with the modern iterative deletion method.
We draw a number of conclusions from our experiments. We find that slope-based cost functions are superior to step-based cost functions, rip-up and reroute performance is heavily dependent on cost function parameters, and net ordering has little impact for our routing model. Iterative deletion is effective for congestion minimization when we have signal delay and integrity performance constraints, or when rip-up and reroute cost function parameters are not well chosen.
We are currently integrating our MCM global router with a performance driven variable width multi-layer detail router, and will report results of these experiments when they are complete. We are also considering the use of a small set of candidate routes such as those suggested in [2], allowing us to eliminate maze routing entirely. Our methods may also extend to handle general area routing for performance driven multi-layer VLSI IC design problems.
