Interconnect planning and synthesis in general, and global routing in particular, are becoming critical to meeting chip performance targets in deep-submicron technologies. In addition to handling traditional objectives such as congestion, wirelength and timing, a new and very important requirement for current global routers is the integration with other interconnect optimizations, most importantly with buffer insertion and sizing.
Introduction
Due to delay scaling effects in deep-submicron technologies, interconnect planning and synthesis are becoming critical to meeting chip performance targets with reduced design turnaround time [1]. In particular, the global routing phase of the design cycle is receiving renewed interest, as it must efficiently handle increasingly complex constraints for increasingly large designs (see [14] for a recent survey).
In addition to handling traditional objectives such as congestion, wirelength and timing, a critical requirement for current global routers is the integration with other interconnect optimizations, most importantly with buffer insertion and sizing. Indeed, it is estimated that top-level on-chip interconnect will require up to $10^6$ repeaters when we reach the 50nm technology node. Since these repeaters are large and have a significant impact on global routing congestion, buffer insertion and sizing can no longer be done after global routing completes.
In this paper, we review and enhance a powerful integrated approach introduced in [4] for congestion- and timing-driven global routing, buffer insertion, pin assignment, and buffer/wire sizing. Our approach is based on a multicommodity flow formulation for the buffered global routing problem. Multicommodity flow based global routing has been an active research area since the seminal work of Raghavan and Thompson [16]. Although the global routing problem is NP-hard (even highly restricted versions of it are; see [19]), [16] showed that the optimum solution can be approximated arbitrarily closely in time polynomial in the number of nets and the inverse of the accuracy. To date, predictability of solution quality continues to be a distinct advantage of multicommodity flow based methods over all other approaches to global routing, including popular rip-up-and-reroute approaches [14].
The original method of Raghavan and Thompson relies on randomized rounding of an optimum fractional multicommodity flow. Subsequent works [8, 15] improved runtime scalability by using the approximation algorithm for multicommodity flows of [17]. Yet, only the recent breakthrough improvements due to Garg and Könemann [13] and Fleischer [12] have rendered multicommodity flow based global routing practical for full chip designs [3]. As in [3], our algorithm is built upon the efficient multicommodity flow approximation scheme of [13, 12].
In Sections 2 and 3 we review the multicommodity flow approach as applied to buffered global routing in [4], highlighting the gadget graph construction used to capture valid buffered routes in the context of a (set-capacitated) multicommodity flow formulation. The main contribution of this paper is a set of simpler and more efficient gadget constructions for capturing polarity constraints induced by inverter insertion, buffer and wire sizing, and delay constraints (Section 4). Unlike the constructions in [4], none of the simplified constructions requires changes to the algorithm for multicommodity flow approximation. We conclude the paper with experimental results detailing the scalability and limitations of the proposed methods, and with directions for future research.
Problem Formulation
In this section we give an integrated formulation for the global routing and the bounded wireload buffer insertion problems. Polarity constraints induced by the use of inverting buffers, buffer and wire sizing, and timing constraints are individually discussed in Section 4.
As in [3], we capture wire and buffer congestion using a tile graph $G = (V, E)$ which has an edge between any two adjacent tiles (see Figure 1), together with buffer and wire capacity functions $b : V \to \mathbb{N}$ and $w : E \to \mathbb{N}$, respectively. For each tile $v \in V$, the buffer capacity $b(v)$ is the number of buffer sites located in $v$. Similarly, for each edge $e = (u, v) \in E$, the wire capacity $w(e)$ is the number of routing channels available between tiles $u$ and $v$. Let $N_1, N_2, \dots, N_k$ be the given nets, where each net $N_i$ is specified by a source $s_i$ and a sink $t_i$. For each net $N_i$, we seek an $s_i$-$t_i$ path $P_i$ in $G$, buffered using only available buffer sites, such that the source vertex and each buffer drive at most $U$ units of wire, where $U$ is a given upper bound (the example in Figure 1 has $U = 5$). Formally, a feasible buffered routing for net $N_i$ is a path $P_i = (v_0, v_1, \dots, v_{l_i})$ in $G$ together with a set of buffers $B_i \subseteq \{v_0, \dots, v_{l_i}\}$ such that:
• $v_0 = s_i$ and $v_{l_i} = t_i$;
• $w(v_{j-1}, v_j) \ge 1$ for every $j = 1, \dots, l_i$;
• $b(v) \ge 1$ for every $v \in B_i$; and
• the length along $P_i$ between $v_0$ and the first buffer in $B_i$, between consecutive buffers, and between the last buffer and $v_{l_i}$, is in each case at most $U$.
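The four conditions can be checked mechanically for a single net. The Python sketch below is illustrative only. It assumes tiles are hashable identifiers, the path is simple (no tile repeats, so buffers can be given as a set of tiles), and wire capacities are keyed by the unordered tile pair. The function name `is_feasible_buffered_routing` is hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def is_feasible_buffered_routing(path: List[Hashable], buffers: Set[Hashable],
                                 source: Hashable, sink: Hashable,
                                 buffer_cap: Dict[Hashable, int],
                                 wire_cap: Dict[FrozenSet[Hashable], int],
                                 U: int) -> bool:
    """Check the feasibility conditions for one net (assumes a simple path)."""
    # Condition 1: the path runs from the source tile to the sink tile.
    if not path or path[0] != source or path[-1] != sink:
        return False
    # Condition 2: consecutive tiles are joined by an edge with wire capacity >= 1.
    for a, b in zip(path, path[1:]):
        if wire_cap.get(frozenset((a, b)), 0) < 1:
            return False
    # Condition 3: buffers lie on the path, in tiles with at least one buffer site.
    if not buffers <= set(path) or any(buffer_cap.get(v, 0) < 1 for v in buffers):
        return False
    # Condition 4: the source and every buffer drive at most U units of wire.
    driven = 0  # units of wire driven so far by the most recent driver
    for tile in path[1:]:
        driven += 1
        if driven > U:
            return False
        if tile in buffers:
            driven = 0  # a buffer in this tile starts a new wire segment
    return True
```

For a row of six tiles with buffer sites in tiles 2 and 4 and $U = 2$, buffering at both sites is feasible. Buffering only at tile 2 leaves a final segment of length 3, which is not.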
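Given feasible buffered routings $(P_i, B_i)$ for each net $N_i$, the relative buffer and wire congestion are defined by

$$\mu = \max_{v \in V} \frac{|\{i : v \in B_i\}|}{b(v)}, \qquad \nu = \max_{(u, v) \in E} \frac{|\{i : (u, v) \in P_i\}|}{w(u, v)}.$$

Global Routing and Bounded Wireload Buffer Insertion Problem¹
Given:
• Grid-graph $G = (V, E)$, with buffer and wire capacities $b : V \to \mathbb{N}$, respectively $w : E \to \mathbb{N}$;
• 2-pin nets $N_1, \dots, N_k$, each net $N_i$ with a source pin $s_i$ and a sink pin $t_i$; and
• a wireload upper bound $U$.
Find: feasible buffered routings $(P_i, B_i)$ for each net $N_i$ with relative buffer congestion $\mu \le 1$ and relative wire congestion $\nu \le 1$, minimizing the total buffer and wire area $\alpha \sum_i |B_i| + \beta \sum_i |P_i|$, where $\alpha, \beta \ge 0$ are given constants.

¹ The problem is called the Floorplan Evaluation Problem in [4], but the formulation is useful in post-placement scenarios as well.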
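The following minimal sketch evaluates μ, ν and the objective for a given set of routings. Each routing is encoded as a tile path plus a set of buffer tiles, which is a hypothetical representation. The sketch takes $|P_i|$ to be the number of tile-to-tile edges on $P_i$; this is an assumption about the intended wirelength measure.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Routing = Tuple[List[Hashable], Set[Hashable]]  # (tile path P_i, buffer tiles B_i)


def _ratio(used: int, cap: int) -> float:
    # Using a resource that has zero capacity gives infinite relative congestion.
    return used / cap if cap > 0 else float("inf")


def congestion_and_area(routings: List[Routing],
                        buffer_cap: Dict[Hashable, int],
                        wire_cap: Dict[FrozenSet[Hashable], int],
                        alpha: float, beta: float) -> Tuple[float, float, float]:
    """Return (mu, nu, area) for the given buffered routings."""
    buf_use: Counter = Counter()
    wire_use: Counter = Counter()
    for path, buffers in routings:
        buf_use.update(buffers)  # one buffer site per buffer tile
        wire_use.update(frozenset(e) for e in zip(path, path[1:]))
    mu = max((_ratio(n, buffer_cap.get(v, 0)) for v, n in buf_use.items()), default=0.0)
    nu = max((_ratio(n, wire_cap.get(e, 0)) for e, n in wire_use.items()), default=0.0)
    # Assumption: |P_i| is the number of tile-to-tile edges on P_i.
    area = (alpha * sum(len(b) for _, b in routings)
            + beta * sum(len(p) - 1 for p, _ in routings))
    return mu, nu, area
```

In this sketch, a routing is acceptable for the problem above exactly when both returned congestion values are at most 1.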
Solution Based on Multicommodity Flow Approximation
The high-level steps in the multicommodity flow based approach in [4] are the following:
1. Build an auxiliary graph in which every directed path from a net source to the net's sink captures a feasible wire route between them together with locations for the buffers to be inserted on this route such that buffer load constraints are satisfied. The auxiliary graph is obtained automatically from the tile graph using a gadget construction.
2. Use the auxiliary graph to formulate the floorplan evaluation problem as an integer linear program (ILP). To formally express the ILP, we use a 0/1 variable for each source-sink path, and require that exactly one path be chosen for each source-sink pair. The objective is to minimize the wire and buffer congestion subject to a given upper bound on the total wirelength.
3. Find a near-optimal solution to the fractional relaxation of the above integer program using the general framework for multicommodity flow approximation of [13, 12]. Although the integer program has exponential size (there are exponentially many variables corresponding to source-sink paths in the auxiliary graph), the algorithm still runs in polynomial time because it explicitly represents only the non-zero variables.
4. Finally, use randomized rounding [16] to convert the fractional multicommodity flow to an integer one.
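Step 4 can be illustrated with a small sketch of path-based randomized rounding. For each net, one path is drawn with probability proportional to its fractional flow value $x_p$. The input layout (a dictionary mapping each net to a list of (path, $x_p$) pairs) is a hypothetical interface, not the representation used in [4].

```python
import random
from typing import Dict, Hashable, Sequence, Tuple


def randomized_rounding(frac_flow: Dict[Hashable, Sequence[Tuple[Hashable, float]]],
                        rng: random.Random) -> Dict[Hashable, Hashable]:
    """Choose one path per net; path p is picked with probability x_p / sum(x)."""
    chosen: Dict[Hashable, Hashable] = {}
    for net, paths in frac_flow.items():
        total = sum(x for _, x in paths)
        # Normalise, since an approximate flow need not sum to exactly 1.
        r = rng.random() * total
        acc = 0.0
        for path, x in paths:
            acc += x
            if r < acc:
                chosen[net] = path
                break
        else:
            chosen[net] = paths[-1][0]  # guard against floating-point round-off
    return chosen
```

Passing an explicitly seeded `random.Random` makes a rounding run reproducible. Repeating the rounding and keeping the least congested outcome is a common practical variant.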
To facilitate understanding of the constructions given in Section 4, we discuss here in detail the construction of the auxiliary directed graph $H$ which captures feasible buffered routings (see Figure 2). Recall that, for every feasible buffered routing in the tile graph $G = (V(G), E(G), b, w)$, the wireload of the source and of each buffer must be at most $U$. The graph $H$ has $U + 1$ copies $v^0, v^1, \dots, v^U$ of each vertex $v \in V(G)$.
The index of each copy corresponds to the remaining wireload budget, i.e., the number of units of wire that can still be driven by the last inserted buffer (or by the net's source). Buffer insertions are represented in the gadget graph by directed arcs of the form $(v^j, v^U)$: following such an arc resets the remaining wireload budget to the maximum value of $U$. Each undirected edge $(u, v)$ in the tile graph gives rise to directed arcs $(u^j, v^{j-1})$ and $(v^j, u^{j-1})$, $j = 1, \dots, U$, in the gadget graph. Notice that the copy number decreases by 1 along each of these arcs, corresponding to a decrease of 1 unit in the remaining wireload budget. In addition, we add to $H$ individual vertices to represent net sources and sinks. Each source vertex is connected by a directed arc into the $U$-th copy of the node representing the enclosing tile. Furthermore, all copies of the nodes representing enclosing tiles are connected by directed arcs into the respective sink vertices. Formally, writing $T(x)$ for the tile enclosing pin $x$, the graph $H$ has vertex set

$$V(H) = \{v^j : v \in V(G),\ 0 \le j \le U\} \cup \{s_i, t_i : 1 \le i \le k\}$$

and arc set

$$E(H) = \{(u^j, v^{j-1}), (v^j, u^{j-1}) : (u, v) \in E(G),\ 1 \le j \le U\} \cup \{(v^j, v^U) : v \in V(G),\ 0 \le j < U\} \cup \{(s_i, T(s_i)^U) : 1 \le i \le k\} \cup \{(T(t_i)^j, t_i) : 1 \le i \le k,\ 0 \le j \le U\}.$$
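The construction of $H$ can be sketched as an adjacency list. This sketch rests on several assumptions. The node labels `("tile", v, j)`, `("src", i)` and `("snk", i)` are a hypothetical encoding. Nets are given as (source tile, sink tile) pairs. Buffer arcs are added in every tile, so buffer availability is enforced only through the capacity constraints. The function name `build_gadget_graph` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Node = Tuple  # hypothetical encoding: ("tile", v, j), ("src", i) or ("snk", i)


def build_gadget_graph(tiles: Iterable[Hashable],
                       edges: Iterable[Tuple[Hashable, Hashable]],
                       nets: List[Tuple[Hashable, Hashable]],
                       U: int) -> Dict[Node, List[Node]]:
    """Build the directed gadget graph H (successor lists)."""
    adj: Dict[Node, List[Node]] = {}

    def add_arc(a: Node, b: Node) -> None:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, [])

    for v in tiles:
        for j in range(U + 1):
            adj.setdefault(("tile", v, j), [])  # copies v^0 .. v^U
        for j in range(U):
            add_arc(("tile", v, j), ("tile", v, U))  # buffer arc resets the budget to U
    for u, v in edges:
        for j in range(1, U + 1):
            # one unit of wire, in either direction, uses one unit of budget
            add_arc(("tile", u, j), ("tile", v, j - 1))
            add_arc(("tile", v, j), ("tile", u, j - 1))
    for i, (s_tile, t_tile) in enumerate(nets):
        add_arc(("src", i), ("tile", s_tile, U))  # the source starts with a full budget
        for j in range(U + 1):
            add_arc(("tile", t_tile, j), ("snk", i))  # any copy may enter the sink
    return adj
```

For three tiles in a row, $U = 2$ and one net, this yields 9 tile copies plus one source and one sink vertex. The arcs are 6 buffer arcs, 8 tile-to-tile arcs, 1 source arc and 3 sink arcs.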
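Each directed path in the gadget graph $H$ corresponds to a buffered routing in the tile graph, obtained by ignoring copy indices for tile vertices and replacing each "buffer" arc $(v^j, v^U)$ with a buffer inserted in tile $v$. Clearly, the construction ensures that the wireload of each buffer is at most $U$. Every tile-to-tile arc decreases the copy index by one, so after the source arc or a buffer arc (both of which enter copy $U$), a directed path in $H$ can follow at most $U$ tile-to-tile arcs before reaching copy 0. From copy 0 it can continue only through a buffer arc or into a sink.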
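The decoding rule, and the way each arc of $H$ consumes a tile-graph resource, can be sketched as follows. The sketch assumes the hypothetical node encoding `("tile", v, j)` / `("src", i)` / `("snk", i)`. It relies on the fact that, in the basic gadget, the only arcs between two copies of the same tile are buffer arcs. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Node = Tuple  # hypothetical encoding: ("tile", v, j), ("src", i) or ("snk", i)


def decode_gadget_path(h_path: List[Node]) -> Tuple[List[Hashable], List[Hashable]]:
    """Turn an s_i-t_i path in H into a tile route P_i and a list of buffer tiles B_i."""
    route: List[Hashable] = []
    buffers: List[Hashable] = []
    prev = None
    for node in h_path:
        if node[0] != "tile":
            continue  # source and sink vertices carry no tile information
        v = node[1]
        if prev is not None and prev[1] == v:
            buffers.append(v)  # arc (v^j, v^U) inside one tile: a buffer in v
        else:
            route.append(v)  # arc between adjacent tiles: one unit of wire
        prev = node
    return route, buffers


def resource_usage(h_path: List[Node]) -> Counter:
    """Count how often the path uses each buffer site and each tile-graph edge."""
    route, buffers = decode_gadget_path(h_path)
    usage: Counter = Counter()
    usage.update(("buffer", v) for v in buffers)
    usage.update(("wire", frozenset(e)) for e in zip(route, route[1:]))
    return usage
```

Summing `resource_usage` over the chosen paths and comparing the totals with $b(v)$ and $w(u, v)$ is exactly the set-capacity check that the ILP below imposes on sets of arcs of $H$.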
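Let $\mathcal{P}_i$ denote the set of all simple $s_i$-$t_i$ paths in $H$. To get an ILP formulation, we introduce a 0/1 variable $x_p$ for every path $p \in \mathcal{P} := \bigcup_{i=1}^{k} \mathcal{P}_i$. The variable $x_p$ is set to 1 if the buffered routing corresponding to $p \in \mathcal{P}_i$ is used to connect net $N_i$, and to 0 otherwise. For each tile $v$, let $B(v) = \{(v^j, v^U) : 0 \le j < U\}$ denote the set of buffer arcs of $v$. For each tile-graph edge $(u, v)$, let $W(u, v) = \{(u^j, v^{j-1}), (v^j, u^{j-1}) : 1 \le j \le U\}$. Let $B$ and $W$ be the unions of all sets $B(v)$ and $W(u, v)$, respectively. With this notation, the integrated global routing and bounded wireload buffer insertion problem can be formulated as follows:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{p \in \mathcal{P}} \bigl(\alpha\, |p \cap B| + \beta\, |p \cap W|\bigr)\, x_p \\
\text{subject to}\quad & \sum_{p \in \mathcal{P}_i} x_p = 1, && i = 1, \dots, k \\
& \sum_{p \in \mathcal{P}} |p \cap B(v)|\, x_p \le b(v), && v \in V(G) \\
& \sum_{p \in \mathcal{P}} |p \cap W(u, v)|\, x_p \le w(u, v), && (u, v) \in E(G) \\
& x_p \in \{0, 1\}, && p \in \mathcal{P}
\end{aligned}
\tag{1}
$$

ILP (1) is similar to the "path" formulation of the classical minimum cost integer multicommodity flow problem [2]. The only difference is that capacity constraints on the edges and vertices of the tile graph $G$ become capacity constraints for sets of edges of the gadget graph $H$ (see Figure 2). We note that the floorplan evaluation problem can be represented more compactly by using a polynomial number of edge-flow variables instead of the exponential number of path-flow variables $x_p$. However, we use formulation (1) since it leads to stronger fractional relaxations [9]. The exponential number of variables does not impede the efficiency of the approximation algorithm, which, during its execution, explicitly represents only a polynomial number of paths with non-zero flow.

Instead of solving the relaxation of ILP (1) directly, [4] introduces an upper bound $D$ on the wire and buffer area and considers the following related linear program (LP):

$$
\begin{aligned}
\text{minimize}\quad & \lambda \\
\text{subject to}\quad & \sum_{p \in \mathcal{P}_i} x_p = 1, && i = 1, \dots, k \\
& \sum_{p \in \mathcal{P}} |p \cap B(v)|\, x_p \le \lambda\, b(v), && v \in V(G) \\
& \sum_{p \in \mathcal{P}} |p \cap W(u, v)|\, x_p \le \lambda\, w(u, v), && (u, v) \in E(G) \\
& \sum_{p \in \mathcal{P}} \bigl(\alpha\, |p \cap B| + \beta\, |p \cap W|\bigr)\, x_p \le \lambda\, D \\
& x_p \ge 0, && p \in \mathcal{P}
\end{aligned}
\tag{2}
$$

Let $\lambda^*$ be the optimum objective value for LP (2). Solving the fractional relaxation of ILP (1) is equivalent to finding the minimum $D$ for which $\lambda^* \le 1$. This can be done by a binary search, which requires solving LP (2) for each probed value of $D$. A lower bound on the optimal value of $D$ can be derived by ignoring all buffer and wire capacity constraints, i.e., by computing for each net $N_i$ a buffered path $p \in \mathcal{P}_i$ minimizing $\alpha |p \cap B| + \beta |p \cap W|$:

$$D_{\min} = \sum_{i=1}^{k} \min_{p \in \mathcal{P}_i} \bigl(\alpha\, |p \cap B| + \beta\, |p \cap W|\bigr).$$

A trivial upper bound is the total routing area available, i.e.,

$$D_{\max} = \alpha \sum_{v \in V(G)} b(v) + \beta \sum_{(u, v) \in E(G)} w(u, v).$$

In particular, infeasibility of the fractional relaxation of ILP (1) is equivalent to $\lambda^*$ being greater than 1 when $D = D_{\max}$, and can therefore be detected by an approximation algorithm for LP (2).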
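The binary search over $D$ can be sketched as follows. `lambda_star` is a hypothetical callable that stands in for the approximation algorithm for LP (2) and returns (an estimate of) $\lambda^*$ for a given $D$. The sketch assumes $\lambda^*$ is non-increasing in $D$; this holds because enlarging $D$ only relaxes the area constraint. The function names are illustrative.

```python
from typing import Callable, Dict, Hashable, Optional


def area_upper_bound(buffer_cap: Dict[Hashable, int], wire_cap: Dict[Hashable, int],
                     alpha: float, beta: float) -> float:
    """D_max: the total buffer and wire area available in the tile graph."""
    return alpha * sum(buffer_cap.values()) + beta * sum(wire_cap.values())


def min_area_budget(lambda_star: Callable[[float], float], d_min: float, d_max: float,
                    tol: float = 1e-6) -> Optional[float]:
    """Smallest D (within tol) with lambda_star(D) <= 1, or None if infeasible."""
    if lambda_star(d_max) > 1.0:
        return None  # the fractional relaxation of ILP (1) is infeasible
    if lambda_star(d_min) <= 1.0:
        return d_min
    lo, hi = d_min, d_max  # invariant: lambda_star(lo) > 1 >= lambda_star(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if lambda_star(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi
```

Because an approximation algorithm only estimates $\lambda^*$, in practice each probe returns a value within the chosen accuracy. The search therefore locates $D$ only up to that accuracy.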
In the interest of space we omit the details of the algorithm for approximating the optimum solution to LP (2), and direct the interested reader to [4] .
Extensions
In this section we show how to extend the multicommodity flow approach to handle polarity constraints imposed by the use of inverting buffers, buffer and wire sizing, and prescribed delay upper bounds (for the extension to pin assignment see [4, 5]). Polarity constraints were not considered in [4], while the constructions presented here for buffer/wire sizing and for enforcing delay upper bounds are simpler and more efficient than the original constructions in [4]. In particular, unlike the constructions in [4], the constructions given here involve only changes to the gadget graph, leaving the approximation algorithm used to solve LP (2) unchanged.
Polarity Constraints
The basic problem formulation in Section 2 considers only a non-inverting buffer type. In practice, inverting buffers are often preferred since they occupy a smaller area for the same driving strength. Although the use of inverting buffers introduces additional polarity constraints, which may require a larger number of buffers to be inserted, inverting buffers are likely to lead to better overall resource utilization. Algorithms for bounded capacitive load inverting (and non-inverting) buffer insertion have been recently discussed in [7] . The focus of [7] is on single net buffering, with arbitrary positions for the buffers. Here, our goal is to minimize the overall number of buffers required by the nets, and to ensure that buffers are inserted only in the available sites.
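Polarity constraints are handled by modifying the basic gadget graph given in Section 3 as follows (see Figure 3). Each node of the basic gadget is replaced by an "even" and an "odd" copy, i.e., $v^i$ is replaced by $v^i_{\mathrm{even}}$ and $v^i_{\mathrm{odd}}$. Tile-to-tile arcs connect copies of the same parity, e.g., $(u^i_{\mathrm{even}}, v^{i-1}_{\mathrm{even}})$ and $(u^i_{\mathrm{odd}}, v^{i-1}_{\mathrm{odd}})$. Buffer arcs, which now represent inverters, switch parity: $(v^i_{\mathrm{even}}, v^U_{\mathrm{odd}})$ and $(v^i_{\mathrm{odd}}, v^U_{\mathrm{even}})$. The gadget also allows two inverting buffers to be inserted in the same tile for the purpose of meeting polarity constraints. This is achieved by providing bidirectional arcs connecting the $U$-th even and odd copies of a tile $v$, i.e., $(v^U_{\mathrm{even}}, v^U_{\mathrm{odd}})$ and $(v^U_{\mathrm{odd}}, v^U_{\mathrm{even}})$. Finally, source vertices $s_i$ are connected by directed arcs into the even $U$-th copy of the enclosing tiles, and only copies of the desired polarity are connected by arcs to sink vertices $t_i$.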
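The even/odd gadget can be sketched in Python, together with a breadth-first search that finds the fewest arcs needed to reach the sink tile with a requested polarity. The node encoding `(tile, j, parity)` with parity 0 for even and 1 for odd is a hypothetical choice, and the function names are illustrative. Capacities are not modelled here.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

PNode = Tuple[Hashable, int, int]  # hypothetical encoding: (tile, copy index j, parity)


def build_polarity_gadget(tiles: Iterable[Hashable],
                          edges: Iterable[Tuple[Hashable, Hashable]],
                          U: int) -> Dict[PNode, List[PNode]]:
    adj: Dict[PNode, List[PNode]] = {}

    def add(a: PNode, b: PNode) -> None:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, [])

    for v in tiles:
        for p in (0, 1):
            for j in range(U + 1):
                adj.setdefault((v, j, p), [])
            for j in range(U):
                add((v, j, p), (v, U, 1 - p))  # inverter: reset budget, flip parity
        add((v, U, 0), (v, U, 1))  # bidirectional arcs between the U-th copies
        add((v, U, 1), (v, U, 0))  # allow a second inverter in the same tile
    for u, v in edges:
        for p in (0, 1):
            for j in range(1, U + 1):
                add((u, j, p), (v, j - 1, p))  # wire keeps the parity
                add((v, j, p), (u, j - 1, p))
    return adj


def arcs_to_polarity(adj: Dict[PNode, List[PNode]], src_tile: Hashable,
                     sink_tile: Hashable, U: int, parity: int) -> Optional[int]:
    """BFS distance from the even U-th copy of src_tile to any copy of
    sink_tile with the requested parity, or None if unreachable."""
    start = (src_tile, U, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node[0] == sink_tile and node[2] == parity:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None
```

With three tiles in a row and $U = 1$, the sink is reached in odd polarity after one inverter (3 arcs). Even polarity needs a second inverter (4 arcs), which illustrates the extra buffers that polarity constraints may require.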
Buffer and Wire Sizing
Buffer and wire sizing are well-known techniques for timing optimization in the final stages of the design cycle [10]. However, buffer and wire sizing can be equally effective in reducing congestion and wiring resource usage. In this section we show how to incorporate buffer and wire sizing in the multicommodity flow framework. The key enablers of these extensions are, again, appropriate modifications of the gadget graph.
The gadget for buffer sizing is illustrated in Figure 4(a) for two available buffer sizes, one with wireload upper bound $U = 4$ and one with wireload upper bound $U' = 2$. The general construction uses $U + 1$ copies $v^0, \dots, v^U$ of each tile vertex, where $U$ is now the maximum wireload upper bound over all available buffer sizes. For every buffer with wireload upper bound $U' \le U$, we insert buffer arcs $(v^i, v^{U'})$ for every $0 \le i < U'$. Thus, the copy number of each vertex continues to capture the remaining wireload budget, which ensures the correctness of the construction.
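The buffer arcs of one tile under buffer sizing can be generated as follows. This is a sketch. The node encoding `(tile, copy index)` and the third arc field, which records which buffer size an arc represents, are hypothetical choices. The example sizes mirror Figure 4(a).

```python
from typing import Hashable, List, Sequence, Tuple

Arc = Tuple[Tuple[Hashable, int], Tuple[Hashable, int], int]  # (tail, head, size bound U')


def buffer_sizing_arcs(v: Hashable, wireload_bounds: Sequence[int]) -> Tuple[int, List[Arc]]:
    """Return (U, arcs) for tile v, where U is the largest wireload bound."""
    U = max(wireload_bounds)
    arcs: List[Arc] = []
    for u_prime in sorted(set(wireload_bounds)):
        for i in range(u_prime):
            # a buffer of this size, inserted with budget i left, resets it to u_prime
            arcs.append(((v, i), (v, u_prime), u_prime))
    return U, arcs
```

For the sizes of Figure 4(a), this gives $U = 4$ (so five copies $v^0, \dots, v^4$), two arcs into $v^2$ for the smaller buffer and four arcs into $v^4$ for the larger one.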
Wire sizing can be handled by a different modification of the gadget graph (see Figure 4(b)). Assuming that per-unit capacitances of the thinner wire widths are rounded to integer multiples of the "standard" per-unit capacitance, the gadget models the use of thinner segments of wire by providing tile-to-tile arcs which decrease the tile copy index (i.e., the remaining wireload budget) by more than one unit. For example, in Figure 4(b), solid arcs $(u^i, v^{i-1})$ and $(v^i, u^{i-1})$ correspond to standard-width connections between tiles $u$ and $v$, while dashed arcs $(u^i, v^{i-2})$ and $(v^i, u^{i-2})$ correspond to "half-width" connections, i.e., connections using wire with double capacitive load per unit length.
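The tile-to-tile arcs between two adjacent tiles under wire sizing can be generated as follows. This sketch assumes each wire width is described by an integer capacitance multiple $c$ (1 for standard width and 2 for the "half-width" wire of Figure 4(b)). The width names and the `(tile, copy index)` encoding are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

WArc = Tuple[Tuple[Hashable, int], Tuple[Hashable, int], str]  # (tail, head, width name)


def wire_sizing_arcs(u: Hashable, v: Hashable, U: int,
                     cap_multiples: Dict[str, int]) -> List[WArc]:
    """Arcs between adjacent tiles u and v; width w uses c = cap_multiples[w] budget units."""
    arcs: List[WArc] = []
    for width, c in cap_multiples.items():
        for j in range(c, U + 1):  # only where at least c units of budget remain
            arcs.append(((u, j), (v, j - c), width))
            arcs.append(((v, j), (u, j - c), width))
    return arcs
```

With $U = 4$, standard-width arcs start from copies 1 to 4 and half-width arcs from copies 2 to 4, giving 14 directed arcs in total.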
Delay Constraints
The authors of [4] proposed a method for enforcing given sink delay constraints based on charging wire segment delays to buffer arcs in the gadget graph, and using a routine for computing minimum-weight delay-constrained paths in the algorithm for approximating the fractional solution to ILP (1). Here we give a different method for handling sink delay constraints. The new method is similar in spirit to the constructions in previous sections, relying exclusively on a modification of the gadget graph.
In general, our construction applies to any delay model, such as the Elmore delay model, for which (1) the delay of a buffered path is the sum of the delays of the path segments separated by the buffers, and (2) the delay of each segment depends only on segment length and buffer parameters. However, for the sake of efficiency, segment delays would have to be rounded to relatively coarse units. Figure 5 shows the gadget construction for the case when delay is measured simply as the number of inserted buffers. The idea is again to replicate the basic gadget construction, this time a number of times equal to the maximum allowed net delay. Within each replica, tile-to-tile arcs decrease the remaining wireload budget by one unit. In order to keep track of path delays, buffer arcs advance over a number of gadget replicas equal to the delay of the wire segment ended by the respective buffer. This delay can be easily determined for each buffer arc, since the tail $v^j$ of the arc fully identifies the length $U - j$ of the wire segment. The construction is completed by connecting net sources to the vertices with maximum remaining wireload budget in the "0 delay" replica of the gadget graph, and adding arcs into the sinks from all vertices in replicas corresponding to delays smaller than the given delay upper bounds.

Figure 5: Gadget for enforcing delay constraints when the delay is measured by the number of buffers inserted between source and sink. The basic gadget is replicated a number of times equal to the maximum allowed net delay (3 in this example). Tile-to-tile arcs decrease the remaining wireload budget within a gadget replica, while buffer arcs advance from one replica to the next.
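The following Python sketch covers only the buffer-count delay model of Figure 5, in which each buffer arc advances exactly one replica. For a general rounded delay model, a buffer arc leaving $v^j$ would instead advance by the rounded delay of a segment of length $U - j$; that case is not implemented here. The node encoding `(tile, j, replica)` and the function names are hypothetical. The search reports the smallest replica index, i.e., the fewest buffers, at which the sink tile is reachable below the delay bound.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

DNode = Tuple[Hashable, int, int]  # hypothetical encoding: (tile, copy index j, replica d)


def build_delay_gadget(tiles: Iterable[Hashable],
                       edges: Iterable[Tuple[Hashable, Hashable]],
                       U: int, replicas: int) -> Dict[DNode, List[DNode]]:
    adj: Dict[DNode, List[DNode]] = {}

    def add(a: DNode, b: DNode) -> None:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, [])

    tile_list, edge_list = list(tiles), list(edges)
    for d in range(replicas):
        for v in tile_list:
            for j in range(U + 1):
                adj.setdefault((v, j, d), [])
            if d + 1 < replicas:
                for j in range(U):
                    # a buffer adds one unit of delay: move to the next replica
                    add((v, j, d), (v, U, d + 1))
        for u, v in edge_list:
            for j in range(1, U + 1):
                # wire stays inside the replica and uses one unit of budget
                add((u, j, d), (v, j - 1, d))
                add((v, j, d), (u, j - 1, d))
    return adj


def min_delay_to_sink(adj: Dict[DNode, List[DNode]], src: Hashable, snk: Hashable,
                      U: int, bound: int) -> Optional[int]:
    """Fewest buffers (replica index) with which snk is reachable, using only
    replicas below `bound`; None if no such buffered route exists."""
    start = (src, U, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    delays = [d for (v, _, d) in seen if v == snk and d < bound]
    return min(delays) if delays else None
```

On a row of five tiles with $U = 2$, one buffer suffices. With $U = 1$, three buffers are needed, so three replicas (delay bound 3) admit no route, while four replicas do.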
Experimental Results
In this section we report results for a C implementation of our 2-pin net multicommodity flow based algorithm. All experiments were conducted on a 360 MHz SUN Ultra 60 workstation with 2 GB of memory, running SunOS 5.7. We present results for the 10 circuits in [6, 4], which are derived from testcases first used by [10]. For a more comprehensive set of results (in particular, results on integrated pin assignment and a comparison of our method to the RABID algorithm of [6]) we direct the reader to [5]. Circuit parameters are summarized in Table 1. As in [6] and [10], we decomposed multipin nets into 2-pin nets by making direct connections from the source of a net to each of the net's sinks.

Table 2 gives results for the extension of the multicommodity flow algorithm to inverting buffer insertion, which is about twice as slow as non-inverting buffer insertion due to the doubling in size of the gadget graph. Inverter insertion leads to a very small increase in the number of buffers (due to the need to satisfy polarity constraints), which is easily compensated by the smaller size of inverters. At the same time, inverter insertion requires virtually the same wirelength (and often gives improved congestion; see [5]). Table 3 gives runtime scaling results for the extension of the multicommodity flow algorithm to delay constraints.

Table 2: Wirelength minimization results for non-inverting vs. inverting buffer insertion. The number of buffer sites was assumed to be the same in both experiments.
