This paper studies a natural formulation of the timing-driven maze routing problem.
INTRODUCTION
With the growing influence of interconnect on overall system performance, a great deal of research has been done in recent years to ameliorate the problem via CAD-based techniques. Such techniques include automatic wire sizing and tapering and buffer insertion. While such work is of undeniably fundamental interest, important issues relating to how such optimizations may be performed in a particular context have often been overlooked. For instance, the global routing phase of the design flow is primarily concerned with carefully managing scarce routing resources in designing global routing topologies; clearly, optimizations such as wire sizing and spacing cannot be considered in isolation from such resource allocation issues. This paper addresses this kind of situation via timing-driven maze routing.
In the timing-driven maze routing problem, we are given a routing (multi-)graph in which edges are annotated with resistance and capacitance values; the task is to find "good" paths connecting given source and destination vertices.
*This work was supported by the UIC Campus Research Board and the Design Automation Conference Scholarship Program
ISPD '99 Monterey CA USA Copyright ACM 1999 1-58113-089-9/99/04...$5.00
There are several natural interpretations of "good"; for instance, a minimum delay solution whose cost (e.g., total capacitance¹) does not exceed some given budget. Thus, in contrast to traditional maze routing [1, 2], where any wiring connection is sufficient, our problem is fundamentally multidimensional since both cost and delay are considered. We present a straightforward labeling algorithm that solves such problems optimally.
The key points of the paper are summarized as follows. While the algorithm is presented using the Elmore delay model, the framework is flexible enough to enable use of different delay estimators (e.g., estimators approximating the effect of wire-to-wire coupling or incorporating improved load modeling as in [3]).
While the algorithms may find use in a number of different scenarios, an obvious application is in global routing. We expect the techniques to be particularly useful in modern global routing schemes which incorporate some notion of signal ordering or track assignment via a technique such as pseudo-pin assignment (see, e.g., [4, 5, 6]). The basic algorithm is a straightforward labeling algorithm which can be viewed as a generalization of Dijkstra's algorithm [7]. The resulting algorithm has pseudo-polynomial running time, which limits its practical application if implemented naively. An important contribution of the paper is a set of speedup techniques based on calculating lower bounds during the computation. These techniques yield speedups of up to 300X versus the naive implementation.
The remainder of the paper is organized as follows. In the next section, we give the problem formulation and explain the necessary terminology. In section 3, we present a generic timing-driven maze routing algorithm and improvements over the generic algorithm. Section 4 presents results on the test grids, followed by the conclusion in section 5.
¹Total capacitance is a natural cost measure because of its correlation with both routing area and dynamic power consumption in CMOS technologies.
PRELIMINARIES
Multi-Graph Model
We assume that the routing graph is given as a multi-graph (i.e., there may be multiple edges between vertices) and that each edge e in the multi-graph has two labels: capacitance ($c_e$) and resistance ($r_e$). This multi-graph model allows arbitrary resistances and capacitances; i.e., they can be derived by any means. For instance, it is conceivable that the effect of capacitive coupling could be approximated in this model. Varying parasitics from one routing layer to the next are also easily accommodated. Further, this model naturally reflects already-used paths or routing blockages by the absence of edges between vertices.
Elmore Delay
We use the Elmore delay model [8] for interconnection delay. The propagation delay along a wire segment $e = (u, v)$ is approximated by $r_e(c_e/2 + c_v)$, where $c_e$ and $r_e$ are the capacitance and resistance of the wire e and $c_v$ is the downstream capacitive load at vertex v.
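As a worked illustration of the per-segment formula, the following minimal Python sketch accumulates the Elmore delay of a single path, walking from the sink back toward the source. The function name `path_elmore_delay`, the representation of a path as a list of (resistance, capacitance) pairs, and the driver term (driving resistance times the total wire capacitance plus load) are assumptions made for this example rather than details taken from the paper.

```python
def path_elmore_delay(edges, r_d, c_t):
    """Elmore delay of one source-to-sink path (illustrative sketch).

    edges: list of (r_e, c_e) pairs ordered from source to sink (assumed layout).
    r_d:   driving resistance at the source.
    c_t:   load capacitance at the sink.
    """
    downstream = c_t  # capacitive load seen at the current downstream vertex
    delay = 0.0
    # Walk from the sink toward the source, one wire segment at a time.
    for r_e, c_e in reversed(edges):
        delay += r_e * (c_e / 2 + downstream)  # segment delay r_e(c_e/2 + c_v)
        downstream += c_e                      # this segment now loads upstream wires
    # Assumed driver term: r_d drives all wire capacitance plus the sink load.
    delay += r_d * downstream
    return delay
```

For a single segment with resistance 1 and capacitance 4, a driver resistance of 1, and a load of 2, the segment contributes 1(4/2 + 2) = 4 and the driver contributes 1(4 + 2) = 6, for a total of 10.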
Problem Formulation
We formulate the timing-driven maze routing problem as follows:
Given:
A partial multi-graph $G = (V, E)$ in which each edge $e \in E$ is annotated with a resistance $r_e$ and a capacitance $c_e$, a source terminal s with driving impedance $r_d$, a sink terminal t with load capacitance $c_t$, and a capacitance constraint $c_{spec}$.
Objective:
Find a path in G connecting s and t such that the Elmore delay of the path is minimized subject to the total wire capacitance not exceeding $c_{spec}$.
We note that there are several equally natural alternative formulations. For instance, we may wish to minimize total capacitance subject to a maximum delay specification; alternatively, we may wish to minimize a weighted sum of capacitance and delay. Our algorithms require only minor modifications for solving such variants.
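To make the three formulations concrete, the sketch below selects a solution from a list of (capacitance, delay) pairs describing candidate source-to-sink paths. It assumes such a list is already available; the function names and the idea of choosing among precomputed candidates are illustrative assumptions, not the modifications the paper has in mind.

```python
def min_delay_under_cap(solutions, c_spec):
    """Minimum-delay pair among those with capacitance at most c_spec."""
    feasible = [(c, d) for c, d in solutions if c <= c_spec]
    return min(feasible, key=lambda p: p[1]) if feasible else None


def min_cap_under_delay(solutions, d_spec):
    """Minimum-capacitance pair among those with delay at most d_spec."""
    feasible = [(c, d) for c, d in solutions if d <= d_spec]
    return min(feasible, key=lambda p: p[0]) if feasible else None


def min_weighted_sum(solutions, w_c, w_d):
    """Pair minimizing w_c * capacitance + w_d * delay."""
    if not solutions:
        return None
    return min(solutions, key=lambda p: w_c * p[0] + w_d * p[1])
```

With non-negative weights, the weighted-sum optimum always lies among the non-dominated pairs, so restricting attention to a list of non-dominated candidates would lose nothing for any of the three objectives.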
Dominance Relation
Our algorithm for solving the problem is bottom-up and based on labeling schemes. "Labels" of a vertex u take the form of lists of candidate (sub-)solutions representing paths from the vertex u to the sink t. These paths will be characterized by two parameters:
• c: total capacitance of the path $u \leadsto t$.
• d: delay of the path $u \leadsto t$.
For notational convenience we use the dominance relation in the context of sets of vectors; let P be a set of (c, d)-pairs:
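A minimal Python sketch of dominance filtering follows. It assumes the usual componentwise definition, under which a pair $(c_1, d_1)$ dominates $(c_2, d_2)$ when $c_1 \le c_2$ and $d_1 \le d_2$; this definition and the function names `dominates` and `minimal` are assumptions made for illustration.

```python
def dominates(p, q):
    """True if pair p = (c, d) is no worse than q in both coordinates (assumed definition)."""
    return p[0] <= q[0] and p[1] <= q[1]


def minimal(pairs):
    """Return the non-dominated subset of a collection of (c, d) pairs, sorted by c."""
    result = []
    # Visit pairs in increasing c, ties broken by d; duplicates are dropped first.
    for c, d in sorted(set(pairs)):
        # Everything kept so far has capacitance <= c, so the pair survives
        # only if its delay is strictly smaller than the last kept delay.
        if not result or d < result[-1][1]:
            result.append((c, d))
    return result
```

For example, `minimal([(3, 10), (1, 20), (2, 25), (3, 12), (4, 10)])` keeps only `(1, 20)` and `(3, 10)`: the pair `(2, 25)` is dominated by `(1, 20)`, while `(3, 12)` and `(4, 10)` are dominated by `(3, 10)`.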
In this section, we first present a generic timing-driven maze routing algorithm and then the strategies to accelerate it. For any vertex u in the routing grid, let P(u) denote a list of candidate paths from u to the sink t. Each such path is a candidate "tail" of a path $s \leadsto u \leadsto t$. Since, in step m8, the dequeued tuple is checked for dominance and discarded if it is dominated by P(u), at each stage of the algorithm P(u) is a list of minimal paths.
Figure 1. Generic Timing-Driven Maze Routing Algorithm. The algorithm is bottom-up: labels are computed from the sink toward the source.
In practice, P(u) at vertex u is maintained in a linked list. Because P(u) is a list of minimal paths and the dequeued information (a candidate subpath from u to t) in step m5 arrives in non-decreasing order of c (since Q is a priority queue), P(u) has the following useful property (assume P(u) has k elements): $c_1 < c_2 < \cdots < c_k$ and $d_1 > d_2 > \cdots > d_k$. While the last entry $(c_k, d_k)$ of P(s) is the minimum delay path satisfying the capacitance constraint (which is the solution of the problem), the members of P(s) can be considered as a set of candidate solutions with different tradeoffs between capacitance and delay. The algorithm has a pseudo-polynomial bound on the running time as described in section 3.1.
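The labeling scheme can be sketched in Python as below. This is a reconstruction under stated assumptions rather than the algorithm of Figure 1: the edge-list input format, the function name `timing_driven_maze_route`, the way the driver delay is added when a label reaches the source, and the use of Python's `heapq` as the priority queue Q are all choices made for this example.

```python
import heapq


def timing_driven_maze_route(edges, s, t, r_d, c_t, c_spec=float("inf")):
    """Return P(s): non-dominated (capacitance, delay) pairs of s-to-t paths.

    edges: list of (u, v, r_e, c_e) undirected edges; parallel edges between
           the same pair of vertices are allowed (multi-graph). Vertex names
           are assumed to be mutually comparable (e.g., all strings).
    """
    adj = {}
    for u, v, r_e, c_e in edges:
        adj.setdefault(u, []).append((v, r_e, c_e))
        adj.setdefault(v, []).append((u, r_e, c_e))

    P = {}                 # P[u]: minimal (c, d) labels for paths from u to t
    queue = [(0, 0.0, t)]  # labels ordered by c, ties broken by d
    while queue:
        c, d, u = heapq.heappop(queue)
        labels = P.setdefault(u, [])
        # Labels arrive in non-decreasing c, so the last stored label has the
        # smallest delay so far; a label that does not improve on it is dominated.
        if labels and labels[-1][1] <= d:
            continue
        labels.append((c, d))
        if u == s:
            continue       # a complete path: nothing further to extend
        for w, r_e, c_e in adj.get(u, []):
            c_new = c + c_e
            if c_new > c_spec:
                continue   # violates the capacitance constraint
            d_new = d + r_e * (c_e / 2 + c + c_t)
            if w == s:
                d_new += r_d * (c_new + c_t)  # assumed driver term at the source
            heapq.heappush(queue, (c_new, d_new, w))
    return P.get(s, [])
```

On a three-vertex example with one edge between t and a (resistance 1, capacitance 1) and two parallel edges between a and s (a wide wire with resistance 1 and capacitance 4, and a narrow wire with resistance 6 and capacitance 2), with driver resistance 1 and load 2, the sketch returns two trade-off points: the narrow wire gives the lower capacitance and the wide wire gives the lower delay. The last entry of the returned list is the minimum delay solution within the capacitance constraint.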
Complexity Analysis
A worst-case analysis of the algorithm can be derived under the following assumptions:
Assumption 1 The capacitance values $c_e$ are given as integers. This is not a limitation since discretization is always possible.
Assumption 2 The node degree in the target routing graph G is bounded by a constant (as is typical in VLSI applications).
The analysis gives a pseudo-polynomial bound on the running time of the algorithm; i.e., the running time is a function not only of the size of the routing graph but also of the values with which the graph is annotated (namely the $c_e$ values).
Let U = &z~c.
The following observation follows from the minimality of P(u).
Two non-dominated source-to-sink paths are discovered by the generic timing-driven maze routing algorithm (Figure 3). Each solution gives a different capacitance vs. delay tradeoff.
In Figure 4 we show an example of all the final solutions in P(s) for test grid "Grid-4", whose dimension is 100 x 100 (for the detailed characteristics of the grid, refer to section 4.2).
Figure 4. All the solutions generated for test grid "Grid-4", whose dimension is 100 x 100. Source and sink are selected at random such that the distance between them is 5000 µm in the $L_1$ metric. Solutions are obtained under the assumption that $r_d = 150\,\Omega$ and $c_t = 50$ fF.
Speedup Strategies
In this section, we present techniques to accelerate the generic algorithm. The generic algorithm tends to generate a large number of intermediate solutions before generating final solutions. By maintaining some useful information, we are able to suppress the extension of suboptimal subpaths in the early stages.
Strategy by Lower Bounds on Delay and Capacitance
Suppose we have already found the first k (> 0) s-to-t paths in P(s), i.e., P(s) = $(c_1, d_1), \ldots, (c_k, d_k)$, and we are considering a candidate label (c, d) at vertex v. If we know a lower bound on the delay of the upstream paths from the source to v, and the sum of d and this lower bound is greater than $d_k$, we know that all the s-to-t paths induced by (c, d) will be dominated by $(c_k, d_k)$. This observation enables us to discard (c, d) so that no s-to-t path induced by (c, d) is generated. Similarly, if we know a lower bound on the capacitance of the upstream paths from the source to v, and the sum of c and this lower bound is greater than $c_{spec}$, we can eliminate (c, d). Note that for each vertex v, $l_{min}(v)$ and $c_{min}(v)$ (the minimum wire length and the minimum capacitance of a path from the source to v) can be found by pre-computation using two runs of Dijkstra's algorithm [7].
Given technology parameters, the minimum wire length of a path $p: s \leadsto v$, the driving resistance $r_d$, and the load capacitance at v, [9, 10] give methods to determine the minimum delay of the path using optimal wire-sizing techniques. While expanding a subpath at vertex u to v along the edge $e = (u, v)$ (in step c6 in routine Candidates), we know $l_{min}(v)$ by pre-computation, the downstream capacitance at v by the construction of the algorithm, and the driving resistance $r_d$, which is given with the routing graph. We can then determine a lower bound $d_{min}(v)$ on the delay of the upstream path $p: s \leadsto v$ at v using the methods in [9, 10], where (c, d) is the label of the extended subpath. If the lower delay bound $(d_{min}(v) + d)$ is greater than $d_k$ of P(s) (recall that $(c_k, d_k)$ represents the last solution of P(s) and that entries in P(s) are discovered in increasing order of c), this subpath need not be extended any further, since all the source-to-sink paths induced by it will be dominated by $(c_k, d_k)$, and we may discard it. Similarly, if $(c_{min}(v) + c)$ is greater than $c_{spec}$, the subpath need not be extended either, since all the source-to-sink paths induced by it will violate the capacitance constraint.
In this way, we can discard many intermediate solutions.
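The two bounding tests can be sketched as follows. The first function is a standard Dijkstra run that computes the minimum upstream capacitance from the source to every vertex; the second applies the capacitance and delay bounds to one label. The names `min_capacitance_from` and `can_discard` are hypothetical, and the delay lower bound is taken as an input here because the wire-sizing methods of [9, 10] that produce it are outside the scope of this sketch.

```python
import heapq


def min_capacitance_from(source, edges):
    """Dijkstra: least total edge capacitance of any path from source to each vertex.

    edges: list of (u, v, r_e, c_e) undirected multi-graph edges (assumed format).
    """
    adj = {}
    for u, v, _r_e, c_e in edges:
        adj.setdefault(u, []).append((v, c_e))
        adj.setdefault(v, []).append((u, c_e))
    c_min = {source: 0}
    queue = [(0, source)]
    while queue:
        c, u = heapq.heappop(queue)
        if c > c_min.get(u, float("inf")):
            continue  # stale queue entry; a cheaper path to u was already found
        for v, c_e in adj.get(u, []):
            if c + c_e < c_min.get(v, float("inf")):
                c_min[v] = c + c_e
                heapq.heappush(queue, (c + c_e, v))
    return c_min


def can_discard(c, d, c_min_v, d_min_v, d_k, c_spec):
    """Bounding test for a label (c, d) at vertex v.

    c_min_v: lower bound on upstream capacitance from the source to v.
    d_min_v: lower bound on upstream delay (supplied by the caller).
    d_k:     delay of the best s-to-t solution found so far (inf if none).
    """
    return c_min_v + c > c_spec or d_min_v + d > d_k
```

Passing `float("inf")` for `d_k` disables the delay test until a first source-to-sink solution is known, which matches the remark that delay bounding is of no use before P(s) is non-empty.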
If the given $c_{spec}$ is so large that no candidate is discarded by the capacitance bound, the effectiveness of our speedup strategy depends on the contents of P(s), since the minimum delay solution in P(s) affects the number of discarded subsolutions; i.e., the lower the delay of the best solution in P(s), the more candidates are likely to be discarded.
Further, the speedup strategy is of no use until we find at least one solution in P(s). Note that finding a first solution in P(s) with the generic algorithm takes a long time, since it may generate thousands of intermediate solutions before generating that solution. This problem can be overcome by finding a source-to-sink path in a pre-processing step. While finding $c_{min}(v)$ for each vertex v, we can also determine a minimum capacitance source-to-sink path and the total delay d of that path. The $(c_{min}(t), d)$-pair of the path is clearly the first solution of P(s) (ties in c are broken by d), and it can be used to discard subsolutions. A strategy for incrementally extending P(s) and the resulting delay bounding information is given at the end of this section.
The number of subsolutions discarded by $d_k$ in P(s) (recall that P(s) has solutions $(c_1, d_1), (c_2, d_2), \ldots, (c_k, d_k)$) is affected by the size of P(s); i.e., the more solutions P(s) has, the more useful (smaller delay) solutions will have been found, and eventually the minimum delay path in P(s) allows more subsolutions to be discarded in the early stages. This observation leads us to find more useful solutions as early as possible and to use an artificial intermediate capacitance bound, $c_{bnd}$ ($c_0 < c_{bnd} \le c_{spec}$).
In Figure 5 we present an improved timing-driven algorithm based on the strategy explained above. Two pre-processing steps are added to the main routine to find $l_{min}(v)$ and $c_{min}(v)$, and additional computations are added to the subroutine Candidates to discard subsolutions.
Function DelayLB in step c5.4 gives the lower bound on the delay of the upstream path at v. To avoid the computational overhead of calculating this lower bound, DelayLB uses a table lookup technique. Given the wire length l of the upstream path $p: s \leadsto v$ and the load capacitance c at v, it returns an interpolated delay value using a pre-computed table.
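A table lookup of this kind might look like the following Python sketch, which interpolates bilinearly between the four table entries surrounding a query. The text does not say which interpolation scheme DelayLB uses, so bilinear interpolation, the clamping of out-of-range queries, and the names `make_delay_lb`, `lengths`, `loads`, and `table` are all assumptions for illustration.

```python
from bisect import bisect_right


def make_delay_lb(lengths, loads, table):
    """Build a lookup function from a pre-computed table (illustrative sketch).

    lengths, loads: strictly increasing grid coordinates (at least two each).
    table[i][j]:    pre-computed delay bound for lengths[i] and loads[j].
    """
    def cell(grid, x):
        # Index of the grid interval containing x, clamped to the table range,
        # and the fractional position of x inside that interval.
        i = bisect_right(grid, x) - 1
        i = max(0, min(i, len(grid) - 2))
        frac = (x - grid[i]) / (grid[i + 1] - grid[i])
        return i, max(0.0, min(frac, 1.0))

    def delay_lb(length, load):
        i, t = cell(lengths, length)
        j, u = cell(loads, load)
        # Bilinear interpolation between the four surrounding table entries.
        low = (1 - u) * table[i][j] + u * table[i][j + 1]
        high = (1 - u) * table[i + 1][j] + u * table[i + 1][j + 1]
        return (1 - t) * low + t * high

    return delay_lb
```

One caveat of this sketch: an interpolated value is not guaranteed to stay below the true minimum delay between sample points, so a conservative variant would return the smallest of the four surrounding entries instead.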
²This bound must be computed for the most "optimistic" scenario with respect to unit resistance and capacitance in order to derive a legitimate lower bound.
Strategy for Controlling Capacitance Bound
The number of subsolutions discarded by $c_{spec}$ cannot be controlled since $c_{spec}$ is given as input. It is not hard to see that many subsolutions would be discarded by $c_{spec}$ and a few s-to-t solutions would be found very quickly if $c_{spec}$ is very close to $c_0$ (in step m0.2 in Figure 5), where $c_0$ is the total capacitance of the minimum capacitance s-to-t path obtained in step m0.1. Suppose we set $c_{bnd}$ to some value between $c_0$ and $c_{spec}$. Then many subsolutions are discarded by $c_{bnd}$ and we find a few s-to-t paths in the early stages. After finding all the solutions whose capacitance is within $c_{bnd}$, we increase $c_{bnd}$ using the control variable $\alpha$, which is itself increased by multiplying by $(1 + \epsilon)$. (See steps m3.4 and m16.1 in Figure 6.) Using the information saved for the intermediate solutions, we discover further s-to-t paths in the next phase. This iteration continues until $c_{bnd}$ becomes greater than $c_{spec}$ or there is no intermediate solution.
Figure 7 illustrates the sequence of phases with different $c_{bnd}$. If the lower bound of delay or the lower bound of capacitance of an $s \leadsto v \leadsto t$ path falls in the shaded region, the subpath $v \leadsto t$ is discarded so that no s-to-t path induced by it is generated.
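The sequence of intermediate bounds can be sketched as a small Python generator. The exact update rule is not recoverable from the text, so this sketch assumes the bound is the control variable times the minimum path capacitance, with the control variable multiplied by one plus epsilon after each phase; that formula, the default parameter values, and the name `bound_schedule` are assumptions.

```python
def bound_schedule(c0, c_spec, alpha=1.25, eps=0.5):
    """Yield successive intermediate capacitance bounds, ending at c_spec.

    c0:     capacitance of the minimum-capacitance s-to-t path.
    c_spec: the capacitance constraint given as input.
    alpha:  initial control variable (assumed to scale c0).
    eps:    growth rate; alpha is multiplied by (1 + eps) after each phase.
    """
    while True:
        c_bnd = min(alpha * c0, c_spec)  # never exceed the real constraint
        yield c_bnd
        if c_bnd >= c_spec:
            return                       # final phase used the full constraint
        alpha *= 1 + eps
```

With an unbounded constraint the generator never terminates on its own; in that case the caller would stop once no intermediate solutions remain, mirroring the second stopping condition described above.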
In Figure 6 we show the fastest timing-driven maze routing algorithm.
Table 1. Each data point is obtained from 10 runs with different sources and sinks; the minimum, average, and maximum CPU times in seconds over the 10 runs are shown. Source and sink are selected at random such that the distance between them is 5000 µm in the $L_1$ metric. We assume $c_{spec} = \infty$ to generate all the solutions.
We select source and sink at random such that $|s_x - t_x| + |s_y - t_y| \approx 5000\,\mu m$, where $s_x$ and $s_y$ ($t_x$ and $t_y$) are the x- and y-coordinates of the source (sink) on the grid, respectively. This wire length was chosen to approximate a length of wire which is conceivably unbuffered and would also benefit from wire sizing optimization. For each source, sink, and test grid we ran all the algorithms to compare their running times. To generate all the final solutions, we did not impose the capacitance bound $c_{spec}$ in any of the test cases.
For each data point we made 10 runs with different sources and sinks, and we show the minimum, average, and maximum CPU time in Table 1. The data clearly show that a naive implementation of the algorithm is likely to be of little use in practice, but that by applying our speedup techniques runtimes become much more practical. The greatest speedup was achieved via capacitance and delay bounding (up to 300X in some cases), with a more modest improvement due to the iterative capacitance bounding technique.
The memory required by the generic algorithm is so large that it is another factor making it impractical. For instance, the average memory usage over 10 runs for Grid-1 is 330MB for the generic algorithm, while the improved algorithms consume only 28MB for the same test instance.
CONCLUSION
We have presented an algorithm and speedup techniques for the timing-driven maze routing problem using a multi-graph model. The adopted multi-graph model is quite general and naturally captures optimization techniques such as wire sizing via alternative edges.
The basic algorithm is a straightforward labeling algorithm which has pseudo-polynomial running time and has been found impractical if implemented naively. Fortunately, the speedup techniques presented in this paper, which are based on lower bounds, make the algorithm more practical. These techniques result in speedups of up to 300X versus the naive implementation, demonstrating their effectiveness.
The algorithms can be adapted with slight modification to other problems with different objectives, such as minimization of total capacitance subject to the total delay being less than a given $d_{spec}$.
There are some natural directions this work may take in the future. One possibility is addressing multi-pin nets. A blending of the techniques in this paper and those in [12] for timing driven Steiner tree construction is a step in that direction. 
