Abstract-This work presents a method for global routing (GR) that minimizes interconnect power. We consider designs with multi-supply voltage (MSV), where level converters are added to nets that connect driver cells to sink cells of higher supply voltage. The level converters are modeled as additional terminals during GR. Given an initial GR solution obtained with the objective of minimizing wirelength, we propose a GR method that detours nets to further reduce the interconnect power. When routes are detoured via this procedure, overflow is not increased, and the increase in wirelength is bounded. The power saving opportunities include: 1) reducing the area capacitance of the routes by detouring from the higher metal layers to the lower ones, 2) reducing the coupling capacitance between adjacent routes by distributing the congestion, and 3) considering different power-weights for each segment of a routed net with level converters (to capture its corresponding supply voltage and activity factor). We present a mathematical formulation to capture these power saving opportunities and solve it using integer programming techniques. In our simulations, we show considerable savings in an interconnect power metric for GR, without any wirelength degradation.
I. INTRODUCTION
Power consumption is a primary design objective in many application domains. Dynamic power still remains the dominant portion of the overall power spectrum. Design with Multi-Supply Voltage (MSV) allows significant reduction in dynamic power by taking advantage of its quadratic dependence on the supply voltage.
Dynamic power is dissipated in combinational and sequential logic cells, clock network, and the (remaining) local and global interconnects. We refer to the latter as interconnect power. Interconnect power can take a significant portion of the dynamic power spectrum. For example, the contribution of the interconnect power is reported to be around 30% of dynamic power for a 45nm high performance microprocessor synthesized using a Structured Data Paths design style and about 18% of the overall power spectrum [1] .
The interconnects are complex structures in nanometer technologies that span over many metal layers. The power of a route segment depends on its width, metal layer, and spacing relative to its adjacent parallel-running routes. These factors determine the area, fringe, and coupling capacitances which impact power. Furthermore, in MSV designs, the power of a routed net depends on its corresponding supply voltage. For example, a route will have lower power if all its terminal-cells have the (same) lower supply voltage. If a net connects a driver cell of lower voltage to a sink cell of higher voltage, its route includes a level converter (LC) and is decomposed into two segments of low and high supply voltages, corresponding to before and after the LC.
We propose a global routing (GR) method that optimizes the interconnect power in MSV designs. Figure 1 shows a generic design flow for an MSV-based GR. After placement and voltage assignment, the location and supply voltage of each cell are known. The supply voltage is determined either through voltage island generation [2], [3], or through a row-based assignment in a standard cell methodology. Furthermore, LC(s) are added to any net that connects a driver cell to a set of sink cells of higher supply voltage. Next, GR is applied to minimize the overall wirelength (WL), where the LCs are also included as terminals of a net.
For a given WL-optimized GR solution, we propose to further detour the nets in order to optimize the interconnect power. The interconnect power can be approximated during GR since at this stage the metal layers of each route segment are known. Furthermore, the spacing of parallel routes can be estimated from the routing congestion. Given a WL-optimized solution, the nets can be rerouted to trade off WL with power. For example, nets on higher metal layers can be rerouted to lower ones, which have smaller wire widths and thus less area capacitance. Nets can also be rerouted to spread the congestion, thereby increasing their spacing and reducing coupling capacitance. Activity factor and supply voltage can be incorporated as a power-weight for each route segment.
We present a mathematical formulation for MSV-based GR to minimize power, and present integer programming-based techniques to solve the formulation. As part of power saving, our methods spread the routing congestion and ensure no additional overflow (of routing resources) and a bounded degradation in WL compared to the initial solution.
To the best of our knowledge, this is the first work on power-driven global routing in MSV designs. The recent work [4] discusses power-driven GR; however, it does not consider the MSV case. It also relies on the availability of power-efficient candidate routes for each net but generates such candidate routes heuristically. As part of the contributions of this work, we show a formal procedure to generate power-efficient candidate routes from the initial WL-optimized solution while taking into account the overall WL degradation.
The remainder of the paper is divided into 4 sections. Section II describes our MSV-based interconnect model. Section III discusses our formulation and solution procedure for power minimization. Simulation results are presented in Section IV, and conclusions are offered in Section V. 
II. INTERCONNECT MODELING
In this section, we discuss an MSV-based GR model. We assume the level converters (LCs) are placed for some of the nets and the supply voltage of each cell is known.
A. Interconnect Modeling in MSV Designs
We are given a grid-graph $G = (V, E)$ model of the GR problem, where each vertex $v \in V$ corresponds to a global bin containing a number of cells. Each edge $e \in E$ represents the boundary of two adjacent bins. A capacity $r_e$ is associated with each edge $e$, reflecting the maximum number of routes that can pass between two adjacent bins. A net $i \in \{1, \ldots, N\}$ is identified by its terminal cells, which are a subset of the vertices $V$. In MSV-based GR, the terminals of a net may also be the LCs. During GR, a Steiner tree $t_i$ in $G$ is found for each net $i$ to connect its terminals. The length of $t_i$ is taken to be its wirelength (WL). Figure 2 shows an example. The chip is divided into regions. Each region has either a low ($V_L$) or high ($V_H$) supply voltage. A routed net is specified in the figure. The net has one driver terminal with $V_L$ voltage and three sink terminals of $V_H$ voltage. The route includes two LCs which are also considered as additional terminals of the net.
For power-driven MSV-based GR, we first decompose a net which contains an LC into a set of sub-nets. We reroute each sub-net as an individual net during power optimization. Consequently, we have $N_d > N$ nets after decomposition. For example, in Figure 2, the initial route is shown with its LCs. The net is decomposed into three sub-nets, each of which will be rerouted. The first sub-net connects the driver terminal in $V_L$ to the two LCs. The second one connects one LC to one $V_H$ terminal. The third one connects the other LC to the other two $V_H$ terminals.
The decomposition of each net is done using its initial route and the location(s) of its LC(s). For a net containing an LC, starting from its driver terminal, a sub-net corresponding to a low supply voltage is formed that connects the driver terminal to a set of LCs and/or a set of sink terminals of the same supply voltage. Next, one or more sub-nets are formed that connect the LCs to the sink terminals of the same (and higher) voltage level. Our net decomposition procedure finds a minimum number of sub-nets for each net that contains an LC such that each sub-net has only one corresponding supply voltage. Space constraints prohibit us from providing complete details of the decomposition method. The general decomposition procedure is similar to existing works [5], [6].
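The decomposition idea can be illustrated with a small sketch. It is not the procedure of [5], [6], whose details are omitted above; it simply assumes the routed tree is given as a list of edges and cuts it at every LC while walking away from the driver. The function name `decompose_net` and the vertex labels are hypothetical, and the example mirrors the net described for Figure 2.

```python
from collections import defaultdict

def decompose_net(tree_edges, driver, level_converters):
    """Split a routed tree into sub-nets at its level converters (LCs).

    tree_edges: list of (u, v) pairs forming the routed tree.
    driver: the driver terminal; level_converters: set of LC vertices.
    Returns a dict mapping each sub-net root (driver or LC) to the
    list of tree edges that belong to that sub-net.
    """
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    subnets = defaultdict(list)
    # Each stack entry: (vertex, parent, root of the current sub-net).
    stack = [(driver, None, driver)]
    while stack:
        node, parent, root = stack.pop()
        for nxt in adj[node]:
            if nxt == parent:
                continue
            subnets[root].append((node, nxt))
            # Crossing an LC starts a new sub-net rooted at that LC.
            next_root = nxt if nxt in level_converters else root
            stack.append((nxt, node, next_root))
    return dict(subnets)

# One low-voltage driver "d", two LCs, three high-voltage sinks.
edges = [("d", "a"), ("a", "lc1"), ("a", "lc2"),
         ("lc1", "s1"), ("lc2", "b"), ("b", "s2"), ("b", "s3")]
subnets = decompose_net(edges, "d", {"lc1", "lc2"})
```

Each resulting sub-net carries a single supply voltage: the one rooted at the driver is low-voltage, and the ones rooted at LCs are high-voltage.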
B. Power Modeling
Each decomposed net $i \in \{1, \ldots, N_d\}$ has a corresponding supply voltage $V_i$ and switching activity $\alpha_i$. The required interconnect power for a GR solution is estimated as

$$P = f_{clk} \sum_{i=1}^{N_d} \alpha_i V_i^2 \left( C_i^{sink} + C_i^{route} \right), \qquad (1)$$

where $f_{clk}$ is the frequency. As seen in Equation 1, the capacitance of routed net $i$ is the sum of the capacitances of its sink cells (denoted by $C_i^{sink}$) and of its route (denoted by $C_i^{route}$). Here $C_i^{sink}$ is a constant that does not depend on the re-routing, so it is excluded from the optimization. Note that the power of the LCs is considered fixed and is thus also not considered as part of the interconnect power optimization. The capacitance $C_i^{route}$ for a routed net $i$ is the sum of the capacitances of its unit-length edges that are contained in route $t_i$ (given by notation $e \in t_i$):

$$C_i^{route} = \sum_{e \in t_i} C_e^u. \qquad (2)$$
The parameter $C_e^u$ is the capacitance of one routed edge $e \in E$. This capacitance is a function of the metal layer $l_e$, wire width $w_e$, and wire spacing $s_e$ of the edge $e$. Specifically,

$$C_e^u = C_a(l_e, w_e) + 2C_f(l_e, w_e, s_e) + 2C_c(l_e, w_e, s_e), \qquad (3)$$

where $C_a$ and $C_f$ are the area and fringe capacitances with respect to substrate, and $C_c$ is the coupling capacitance. As indicated, these capacitances are functions of metal layer, wire width, and spacing, and are provided by the technology library through a lookup table.
In this work, we assume that only one (and a different) wire width is associated with each metal layer, so we exclude the parameter $w_e$, and for each edge $e \in E$, its metal layer $l_e$ is known. The spacing for edge $e$ is estimated from the edge utilization $u_e$ in a GR solution. Given the utilization $u_e$ and the length of edge $e$ (computed from the chip dimension and the routing grid granularity), the spacing $s_e$ is calculated to allow maximum spacing between its corresponding routes. Figure 3 shows an example for $u_e = 3$. This simple averaging strategy may be adjusted if more information is available at the GR stage (e.g., the adjustment may be due to the fixed short nets which fall inside a single global routing bin). With this approximation, we can express the capacitance of a unit-length route-edge in terms of the edge's metal layer and its utilization. The total capacitance of edge $e$ is given by the product of the per-unit capacitance $C_e^u$ and the utilization $u_e$: $C_e = C_e^u \times u_e$. Figure 4 (left) shows the curves representing area, fringe, and coupling capacitances for metal layer 1 with respect to edge utilization for a 45nm library [7], assuming each GR edge is 2 μm long. The summation of the 3 capacitances ($C_e^u$) is shown on the right.
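The chain from utilization to spacing to capacitance to power can be sketched as follows. Everything numeric here is an assumption for illustration: the even-spacing rule, the wire width, and the capacitance coefficients are hypothetical stand-ins for the library lookup table, and the sink capacitances and $f_{clk}$ are dropped because they are constant.

```python
def spacing(edge_length, wire_width, utilization):
    """Even spacing when `utilization` wires share one edge (assumed model)."""
    if utilization <= 0:
        return float("inf")
    return (edge_length - utilization * wire_width) / utilization

def unit_capacitance(s, c_area=0.02, c_fringe=0.01, k_couple=0.005):
    """Hypothetical stand-in for the library lookup: C_a + 2*C_f + 2*C_c(s).

    Coupling is modeled as inversely proportional to spacing.
    """
    return c_area + 2 * c_fringe + 2 * (k_couple / s)

def route_power_metric(nets, edge_length=2.0, wire_width=0.07):
    """nets: list of dicts with keys alpha, v, edges (a list of edge ids)."""
    util = {}
    for net in nets:
        for e in net["edges"]:
            util[e] = util.get(e, 0) + 1
    cu = {e: unit_capacitance(spacing(edge_length, wire_width, u))
          for e, u in util.items()}
    return sum(n["alpha"] * n["v"] ** 2 * sum(cu[e] for e in n["edges"])
               for n in nets)

# Two nets sharing edge "e2" versus the same nets with the overlap removed.
shared = [{"alpha": 0.5, "v": 0.9, "edges": ["e1", "e2"]},
          {"alpha": 0.3, "v": 1.1, "edges": ["e2", "e3"]}]
separate = [{"alpha": 0.5, "v": 0.9, "edges": ["e1", "e2"]},
            {"alpha": 0.3, "v": 1.1, "edges": ["e4", "e3"]}]
```

Under this toy model, spreading the two nets over different edges lowers the metric at equal wirelength, which is the congestion-spreading opportunity described in the introduction.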
III. POWER-DRIVEN MSV-BASED GR
In this section, we first present a mathematical formulation of power-driven MSV-based GR. We then discuss integer programming-based techniques to obtain high-quality solutions to the formulation.
A. Mathematical Formulation
As described in Section II-B, the per-unit capacitance of an edge $e$ ($C_e^u$) is a function of its metal layer and the edge utilization. Typically, this function is a convex increasing function, as depicted in Figure 4. We represent the function $C_e^u$ by a set of line segments denoted by $Q_e^u$. For example, the set $Q_e^u$ is composed of 7 line segments in the library used in this work [7]. Each line segment $q \in Q_e^u$ is described by a slope $m_q$ and an intercept $r_q$. Since the per-unit capacitance is convex, its value may be expressed in our mathematical optimization problem for GR with the following set of linear inequalities:

$$C_e^u \ge m_q u_e + r_q \quad \forall q \in Q_e^u. \qquad (4)$$
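For a given edge utilization $u_e$, the corresponding $C_e^u$ is the smallest value that satisfies all of these inequalities, i.e., the maximum of $m_q u_e + r_q$ over the segments $q \in Q_e^u$.

To model GR we are given a routing grid graph $G = (V, E)$, a set of $N_d$ decomposed multi-terminal nets, and edge capacities $r_e$. Let $T_i$ be the collection of all Steiner trees that can route net $i$. We later discuss how to approximate $T_i$ by generating a set of power-efficient candidate trees with consideration of WL degradation. Each tree $t \in T_i$ is associated with a binary decision variable $x_{it}$ which is equal to 1 if and only if it is selected to route net $i$. Let the parameter $a_{te}$ be equal to 1 if tree $t$ contains edge $e$ (i.e., if $e \in t$). The GR problem for power minimization is given by:

$$\begin{aligned}
\text{(PGR)} \quad \min \quad & \sum_{i=1}^{N_d} \alpha_i V_i^2 \sum_{t \in T_i} \Big( \sum_{e \in t} C_e^u \Big) x_{it} + M \sum_{i=1}^{N_d} s_i \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i, \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le r_e && \forall e \in E, \\
& C_e^u \ge m_q \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} + r_q && \forall e \in E, \ \forall q \in Q_e^u, \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le (1 + \beta) W_0, \\
& x_{it} \in \{0, 1\}, \quad s_i \ge 0.
\end{aligned}$$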
The first term in the expression of the objective function is the interconnect power as explained in Section II-B. It includes activity $\alpha_i$ and voltage $V_i$ of net $i$. The capacitance of a route $t$ of net $i$ is obtained by adding the unit edge capacitances $C_e^u$ for all the edges $e \in t$. Here the route $t \in T_i$ will be selected for net $i$ only if $x_{it} = 1$.
The first set of constraints selects at most one route for each net. The slack variable $s_i$ is equal to 1 if net $i$ cannot be routed, and the variable is penalized in the objective function by a large parameter $M$ to maximize the number of routed nets. The term $\sum_{i} \sum_{t \in T_i} a_{te} x_{it}$ represents the edge utilization $u_e$. The second set of constraints ensures that the edge utilizations are within the given edge capacities. The third set of constraints determines the per-unit edge capacitance $C_e^u$ for each edge $e$ from its utilization, using the discussed piecewise linear model. The fourth constraint ensures the new wirelength is within a factor $\beta$ of the initially-provided wirelength $W_0$. Here $w_{it}$ denotes the wirelength of route $t$ of net $i$.
The constraints of formulation (PGR) are all linear. However, the objective expression is nonlinear (due to the multiplication of variables $x_{it}$ and $C_e^u$). We handle the nonlinearity in a heuristic manner using a two-phase approach. First, we choose a re-routing that attempts to minimize the total capacitance of all edges. Next, per-unit capacitances are estimated (and fixed) based on the solution of the first phase, and a re-routing is sought that minimizes the total estimated power. Each of these two phases becomes an integer linear program (IP); both are discussed in the next subsections.
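To make the structure of (PGR) concrete, the sketch below solves a toy instance by brute-force enumeration rather than by the integer programming techniques used in this work. It assumes every net has a small explicit list of candidate routes, treats each route as a tuple of unit-length edges, omits the slack variables $s_i$, and reads the wirelength bound as $(1 + \beta) W_0$; the function name and the capacitance function are hypothetical.

```python
from itertools import product

def solve_toy_pgr(nets, capacity, cap_fn, w0, beta=0.0):
    """Pick one candidate route per net with the lowest total power.

    nets: list of dicts {alpha, v, routes}; each route is a tuple of edge ids.
    capacity: dict edge -> r_e.
    cap_fn(u): per-unit capacitance at utilization u (convex, increasing).
    Returns (power, chosen routes), or None if no selection is feasible.
    """
    best = None
    for choice in product(*(n["routes"] for n in nets)):
        util = {}
        for route in choice:
            for e in route:
                util[e] = util.get(e, 0) + 1
        if any(u > capacity[e] for e, u in util.items()):
            continue  # edge capacity violated
        if sum(len(r) for r in choice) > (1 + beta) * w0:
            continue  # wirelength bound violated
        power = sum(n["alpha"] * n["v"] ** 2 * sum(cap_fn(util[e]) for e in r)
                    for n, r in zip(nets, choice))
        if best is None or power < best[0]:
            best = (power, choice)
    return best

nets = [
    {"alpha": 0.5, "v": 1.0, "routes": [("e1", "e2"), ("e3", "e4")]},
    {"alpha": 0.2, "v": 1.1, "routes": [("e1", "e5"), ("e6", "e5")]},
]
capacity = {e: 2 for e in ("e1", "e2", "e3", "e4", "e5", "e6")}
best = solve_toy_pgr(nets, capacity, lambda u: 1.0 + 0.5 * (u - 1), w0=4)
```

The per-unit capacitance depends on the utilization produced by the whole selection, which is exactly the product of variables that makes the objective nonlinear; enumeration sidesteps this but only scales to tiny instances.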
B. Phase 1: Minimizing Total Capacitance
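As described in Section II-B, the total capacitance of an edge $e$ is $C_e = C_e^u \times u_e$. This (convex) nonlinear expression may be re-linearized, resulting in another piecewise linear expression for the total edge capacitance that may be used in our linear integer program for minimizing the total capacitance:

$$C_e \ge m_q u_e + r_q \quad \forall q \in Q_e. \qquad (5)$$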
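As an illustration of such a piecewise linear model, the sketch below builds line segments as chords of a convex function between consecutive breakpoints and evaluates the model as the maximum over all segments, which is the smallest value satisfying every inequality of the form above. The capacitance function `c_total` and the integer breakpoints are hypothetical; the library data of [7] is not reproduced here.

```python
def fit_segments(f, breakpoints):
    """Return (slope, intercept) pairs for the chords of a convex function f
    between consecutive breakpoints."""
    segs = []
    for u0, u1 in zip(breakpoints, breakpoints[1:]):
        m = (f(u1) - f(u0)) / (u1 - u0)
        segs.append((m, f(u0) - m * u0))
    return segs

def pwl_value(segs, u):
    """Smallest C satisfying C >= m*u + r for every segment (m, r)."""
    return max(m * u + r for m, r in segs)

def c_total(u):
    """Hypothetical convex total capacitance C_e = C_e^u(u) * u."""
    return u * (0.04 + 0.01 * u)

segs = fit_segments(c_total, list(range(8)))  # 7 segments
```

Because the function is convex, the model is exact at the breakpoints and slightly overestimates in between; for example, `pwl_value(segs, 2.5)` is no smaller than `c_total(2.5)`.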
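1) Formulation: The formulation of phase 1 is given by the following IP:

$$\begin{aligned}
\text{(PGR-P1)} \quad \min \quad & \sum_{e \in E} C_e + M \sum_{i=1}^{N_d} s_i \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i, \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le r_e && \forall e \in E, \\
& C_e \ge m_q \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} + r_q && \forall e \in E, \ \forall q \in Q_e, \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le (1 + \beta) W_0, \\
& x_{it} \in \{0, 1\}, \quad s_i \ge 0.
\end{aligned}$$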
The objective expression is similar to formulation (PGR) but the first term is replaced by $\sum_{e \in E} C_e$, which represents an estimate of the total interconnect capacitance. The third set of constraints is also updated; the variable $C_e$ replaces $C_e^u$ in the previous formulation, and the coefficients in the piecewise linear model are updated to use Equation 5.
2) A Price-and-Branch Solution Procedure: We approximately solve (PGR-P1) using a two-step heuristic. First, a pricing procedure is used to generate a set of candidate routes for each net that are power-efficient while considering the WL degradation. The pricing step approximates $T_i$ in the formulation by a small set of power-efficient candidate routes, instead of all the potential routes of net $i$. Second, branch-and-bound is applied to solve (PGR-P1), selecting one route for each net from the set of generated candidate routes. The standard branch-and-bound algorithm can be carried out using a commercial solver. This two-step procedure of generating candidate routes and then running branch-and-bound is commonly known as price-and-branch [8], [9]. The price-and-branch procedure was recently applied to solve the GR problem for WL improvement [10]. We apply the same procedure for power improvement. The major technical difference in our procedure is in the pricing step to find power-efficient candidate routes, which we next discuss in detail.
3) Overview of Pricing for Route Generation: We solve a linear-programming relaxation of (PGR-P1) by replacing the binary requirements on the variables $x_{it}$ with the constraints $0 \le x_{it} \le 1$, $\forall i, \forall t$. The linear program is solved by an iterative procedure known as column generation [11]. In column generation, we start by replacing $T_i$ (the set of all possible routes of net $i$) in formulation (PGR-P1) by a subset $S_i \subset T_i$, initially containing one candidate route per net. We then gradually expand $S_i$, adding new routes that may decrease the objective function. New candidate routes are added via a power-aware pricing condition for each net.
Before explaining the procedure in more detail, we first give the following notations:
1. We refer to the LP relaxation of (PGR-P1) in which $T_i$ is replaced by $S_i$ and $0 \le x_{it} \le 1$ as the "restricted master problem", denoted by (RMLP-P1); the solution of (RMLP-P1) for a given $S_i$ is denoted by $(\hat{x}, \hat{s}, \hat{C})$.
2. We refer to the dual of the restricted master problem as (D-RMLP-P1). The solution of (D-RMLP-P1) consists of $(\hat{\lambda} \le M, \hat{\pi} \le 0, \hat{\mu} \ge 0, \hat{\theta} \le 0)$, corresponding to the dual variables for the first, second, third, and fourth sets of constraints in the relaxed (PGR-P1), respectively.
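The iterative column generation procedure, including the pricing condition, is enumerated below:
1. Solve (RMLP-P1) to obtain $(\hat{x}, \hat{s}, \hat{C})$.
2. Solve (D-RMLP-P1) to obtain $(\hat{\lambda}, \hat{\pi}, \hat{\mu}, \hat{\theta})$.
3. For each net $i$, generate a new route $t^* \in T_i \setminus S_i$. If $\hat{\lambda}_i > \sum_{e \in t^*} \sum_{q \in Q_e} m_q \hat{\mu}_{eq} - \sum_{e \in t^*} (\hat{\pi}_e + \hat{\theta})$, then $S_i = S_i \cup \{t^*\}$.
4. If an improving route for some net $i$ was found in step 3, return to step 1. Otherwise, stop; the solution $(\hat{x}, \hat{s}, \hat{C})$ is an optimal solution to (RMLP-P1).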
Step 3 gives the pricing condition in terms of the solution of the dual problem (D-RMLP-P1) obtained at the current iteration. For a given new route $t^*$, this step determines whether it should be added to the set $S_i$ to reduce the objective of (RMLP-P1). However, it does not specify how a new route should be found such that the pricing condition is satisfied. We next discuss a convenient graph-based procedure to generate new routes $t^*$ that satisfy the pricing condition.
4) Route Generation for One Net: To find improving routes for net $i$, we associate a weight $w_e$ with each edge $e$ in the GR grid as:

$$w_e = \sum_{q \in Q_e} m_q \hat{\mu}_{eq} - \hat{\pi}_e - \hat{\theta}.$$

By the theory of linear programming, for each edge $e$, at most one dual variable $\hat{\mu}_{eq}$, $q \in Q_e$, will be positive in an optimal solution to (D-RMLP-P1). Thus, considering route $t^*$, we can compute the pricing condition as $\hat{\lambda}_i > \sum_{e \in t^*} w_e$. We take advantage of this interpretation to identify a promising route $t^*$ which satisfies the pricing condition. Given a route $t \in S_i$ obtained from previous iterations, we obtain $t^*$ by rerouting branches of $t$ with the updated edge weights so that the overall weights of rerouted branches are reduced.
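The pricing test reduces to a sum of edge weights along a candidate route, which the sketch below spells out. The dual values are made-up numbers chosen only to respect the sign conventions stated above ($\hat{\mu} \ge 0$, $\hat{\pi} \le 0$, $\hat{\theta} \le 0$); in practice they would come from an LP solver, and the function names are hypothetical.

```python
def edge_weights(mu, slopes, pi, theta):
    """w_e = sum_q m_q * mu_eq - pi_e - theta for every edge e.

    mu[e] and slopes[e] are lists indexed by segment q.
    """
    return {e: sum(m * x for m, x in zip(slopes[e], mu[e])) - pi[e] - theta
            for e in pi}

def improves(route, lam_i, w):
    """Pricing condition: the route qualifies if lambda_i exceeds its weight."""
    return lam_i > sum(w[e] for e in route)

# Illustrative dual values for three edges with two segments each.
slopes = {"e1": [0.1, 0.3], "e2": [0.1, 0.3], "e3": [0.1, 0.3]}
mu = {"e1": [0.0, 0.5], "e2": [0.4, 0.0], "e3": [0.0, 0.0]}
pi = {"e1": -0.2, "e2": 0.0, "e3": 0.0}
w = edge_weights(mu, slopes, pi, theta=-0.1)
```

With $\hat{\lambda}_i = 0.5$, the route through `e2` and `e3` passes the test while the route through the congested edge `e1` does not, so only the former would be added to $S_i$. Since every term of $w_e$ is non-negative under these sign conventions (given increasing segments), the weights are suitable for shortest-path search.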
We explain the procedure with the example of Figure 5. Considering two nets $a$ and $b$, suppose we are initially given the routes $t_a$ and $t_b$ for these two nets. After step 2 at the first iteration of column generation, we obtain edge weights which are given in the figure on the left. To obtain a new route $t_a^*$ for net $a$, we reroute different branches of $t_a$. For each terminal, we identify a branch as the segment connecting it to the first Steiner point on $t_a$. We then reroute this branch by running Dijkstra's single-source shortest path algorithm [12] on the weighted graph with the weights of the first iteration, similar to [13], [14]. The route $t_a^*$ is shown in the right figure. After adding $t_a^*$ to $S_a$ we proceed to the second iteration and obtain new edge weights which are shown in the right figure.
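Rerouting a single branch amounts to a shortest-path query between a terminal and a Steiner point. The following is a minimal sketch of Dijkstra's algorithm on a small weighted grid; the grid, the weights, and the function name are illustrative assumptions and do not correspond to Figure 5. It assumes non-negative weights and a connected graph.

```python
import heapq

def shortest_path(weights, source, target):
    """Dijkstra on an undirected graph given as {(u, v): weight}.

    Returns (list of vertices from source to target, total weight).
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[target]

# 2x2 grid of global bins; the bottom and right edges are heavily weighted.
weights = {((0, 0), (1, 0)): 0.9, ((1, 0), (1, 1)): 0.9,
           ((0, 0), (0, 1)): 0.2, ((0, 1), (1, 1)): 0.3}
path, cost = shortest_path(weights, (0, 0), (1, 1))
```

Both paths between the two corners have the same wirelength, so the lighter one replaces the branch without any WL penalty.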
The discussed pricing procedure is similar to [10] . However, it differs in the pricing condition and the way edge weights are set up. For solving (RMLP-P1) and its dual at each iteration we use the solver CPLEX 12.0. After obtaining the final set Si, again we use CPLEX 12.0 for the branch and bound step to get the final solution. We further accelerate the process by applying a simple problem decomposition that we will discuss in Section III-D.
C. Phase 2: Considering Activity and Voltage
At phase 2, we approximate the per-unit edge capacitances using the solution from phase 1, and re-route the nets to minimize an approximation of the total power. Since the utilization (and hence capacitance) corresponding to the routing solution of phase 2 may be different from phase 1, we heavily penalize any mismatch in our optimization.
1) Formulation:
We compute the following quantities after phase 1:
1. We define a new "effective" capacity for each edge $e$ as $\tilde{r}_e = \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} \tilde{x}_{it}$, where $\tilde{x}_{it}$ is the value of the routing solution from phase 1.
2. We define the new per-unit capacitance as $\tilde{C}_e^u = \tilde{C}_e / \tilde{r}_e$, where $\tilde{C}_e$ is the value of the edge capacitance from the solution found in phase 1.
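With these definitions, the formulation of phase 2 is the following integer linear program:

$$\begin{aligned}
\text{(PGR-P2)} \quad \min \quad & \sum_{i=1}^{N_d} \alpha_i V_i^2 \sum_{t \in T_i} \Big( \sum_{e \in t} \tilde{C}_e^u \Big) x_{it} + M_1 \sum_{i=1}^{N_d} s_i + M_2 \sum_{e \in E} \epsilon_e \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i, \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le \tilde{r}_e + \epsilon_e && \forall e \in E, \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le (1 + \beta) W_0, \\
& \epsilon_e + \tilde{r}_e \le r_e && \forall e \in E, \\
& x_{it} \in \{0, 1\}, \quad s_i \ge 0, \quad \epsilon_e \ge 0.
\end{aligned}$$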
The first term in the objective expression is an estimate of the total power of the nets, where $\sum_{e \in t} \tilde{C}_e^u$ is the fixed approximate capacitance of route $t$, obtained by adding the per-unit capacitances of its edges computed from the solution of phase 1 as discussed before. The first set of constraints ensures at most one route is selected per net; a heavy penalty $M_1$ is incurred if $s_i \neq 0$, and this is reflected in the second term of the objective function. The second set of constraints enforces the new utilization of each edge to be $\tilde{r}_e + \epsilon_e$, where $\epsilon_e$ is a new variable which is heavily penalized by a large factor $M_2$ in the objective function if $\epsilon_e \neq 0$. In other words, we heavily penalize any rerouting that causes a larger edge utilization compared to phase 1. This in effect forces the routing process to keep the mismatch in the edge utilizations as small as possible, so that the capacitance (which is a function of utilization) remains close to that of phase 1. We also enforce $\epsilon_e + \tilde{r}_e \le r_e$ in the fourth set of constraints to ensure the edge utilization does not exceed its actual capacity $r_e$. Finally, the third set of constraints ensures the increase in wirelength is bounded by the factor $\beta$.
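The quantities fixed after phase 1 are simple to compute from the selected routes, as the sketch below shows under the assumption that each net's phase-1 route is a tuple of unit-length edge ids. The function names are hypothetical, and edges unused in phase 1 (zero effective capacity) are left out because the text does not say how their per-unit capacitance is set.

```python
def phase2_inputs(selected_routes, edge_capacitance):
    """Compute the effective capacity and per-unit capacitance per used edge.

    selected_routes: one route (iterable of edge ids) per net from phase 1.
    edge_capacitance: dict edge -> total capacitance C_e from phase 1.
    """
    eff_cap = {}
    for route in selected_routes:
        for e in route:
            eff_cap[e] = eff_cap.get(e, 0) + 1
    unit_cap = {e: edge_capacitance[e] / u for e, u in eff_cap.items()}
    return eff_cap, unit_cap

def net_power(alpha, v, route, unit_cap):
    """Fixed-capacitance power estimate of one route: alpha * V^2 * sum of C^u."""
    return alpha * v ** 2 * sum(unit_cap[e] for e in route)

routes = [("e1", "e2"), ("e2", "e3")]
c_e = {"e1": 0.1, "e2": 0.3, "e3": 0.1}
eff_cap, unit_cap = phase2_inputs(routes, c_e)
p = net_power(0.5, 1.0, ("e1", "e2"), unit_cap)
```

With the capacitances frozen this way, the cost of each candidate route is a constant, which is what makes the phase-2 objective linear in $x_{it}$.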
2) Solving using Price-and-Branch: The solution procedure is quite similar to the one explained in Section III-B for phase 1. Here, we just note the differences. We denote the restricted master problem by (RMLP-P2) and its solution by $(\hat{x}, \hat{s}, \hat{\epsilon})$. The dual of the restricted master is denoted by (D-RMLP-P2) and its solution is $(\hat{\lambda}, \hat{\pi}, \hat{\theta})$, corresponding to the first, second, and third sets of constraints in the relaxed (PGR-P2), respectively.
The initial set Si is set to all the candidate routes generated from phase 1. This helps to quickly generate a high quality solution for phase 2. It also ensures that the solution of phase 1 is included as a feasible solution in phase 2.
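The pricing condition is given by the inequality $\hat{\lambda}_i > \alpha_i V_i^2 \left( \sum_{e \in t} \tilde{C}_e^u \right) - \sum_{e \in t} (\hat{\pi}_e + \hat{\theta})$ and is used to define the edge weights $w_e = \alpha_i V_i^2 \tilde{C}_e^u - \hat{\pi}_e - \hat{\theta}$, $\forall e \in E$.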
D. Decomposition
To accelerate solving the two-phase formulation, we apply a simple problem decomposition. We recursively divide the chip into a set of rectangular subregions while balancing the total number of nets that fall inside each subregion. We use the initial WL-optimized solution of [5] to guide this process. We stop when the number of nets at each subregion is at most 3000, which we empirically determined for our experimented benchmarks from the ISPD2008 suite [15] .
Each subproblem is then defined as one rectangular subregion with the set of nets assigned to it. If a net passes through multiple subregions, we fix its terminal locations on the subregion boundaries to those of the initial WL-optimized solution. This allows each subproblem to be solved independently without the hassle of later connecting the segments of a route in adjacent subregions. The subproblems are then solved in parallel (in a single pass) to get the final solution.
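A recursive, balanced subdivision of this kind can be sketched as below. This is a simplified illustration, not the exact procedure: it assumes each net is represented by a single point (for instance the center of its bounding box), cuts at the median while alternating the cut direction, and does not split nets at subregion boundaries. The function name is hypothetical; the limit of 3000 nets is the value stated above.

```python
def partition(nets, max_nets=3000, vertical=True):
    """Recursively bisect a list of (x, y) net locations.

    Cuts at the median along alternating axes until every subregion
    holds at most max_nets nets. Returns a list of sub-lists.
    """
    if len(nets) <= max_nets:
        return [nets]
    axis = 0 if vertical else 1
    ordered = sorted(nets, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (partition(ordered[:mid], max_nets, not vertical)
            + partition(ordered[mid:], max_nets, not vertical))

# 10,000 nets spread over a 100 x 100 grid of bins.
points = [(i % 100, i // 100) for i in range(10000)]
parts = partition(points)
```

Median cuts keep the number of nets per subproblem balanced, so the parallel subproblems take comparable effort.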
Please note, the main difference between our decomposition procedure and [10] is the use of the initial WL-optimized solution to fix the terminal locations on the subregion boundaries and thus avoid later connecting adjacent subproblems.
IV. SIMULATION RESULTS

A. Benchmark Instances
In order to test our solution procedure and determine whether significant power savings were possible without increasing wirelength, we modified known benchmarks to include multi-supply voltages. Modifying the benchmarks required us to generate timing data and power data, and to place level converters. We implemented the procedure of [2] to generate voltage islands for two voltage levels of VL = 0.9V and VH = 1.1V. The procedure required a sequential netlist with gate-level delay and power models.
Timing Modeling:
We assumed the locations of the sequential elements in the ISPD 2008 benchmarks using the following procedure. First, we obtained a Directed Acyclic Graph (DAG) representation of the benchmarks from the variation provided by the ISPD 2006 placement benchmarks [16] . Using the placement benchmarks, we obtained a DAG by starting from the designated Primary Inputs and traversing in forward direction until reaching the Primary Outputs. We also assumed the nets with more than 50 terminals to be clock trees to identify sequential elements.
We then assumed the delay of each cell (or node in the DAG) is proportional to its size (for unit load) where the unit delay was assumed to be of the inverter of the 45nm library [7] used in this work. We considered loading in our cell delay modeling to be proportional to the cell size which was also given in the placement benchmarks.
Power Modeling: We randomly and uniformly generated the activity factors of each net to be between 0.1 and 0.9. The 45nm library used in this work contained information about the total capacitance (area, fringe, coupling) for each of the 8 metal layers. We used the method described in Section III to extract piece-wise linear model for Ce and C u e for each of the 8 metal layers. For each metal layer, we considered the minimum wire size given in the library. To map edge utilization to spacing, we assumed the length of each edge of the GR grid to be 2μ; for a given utilization we assumed the maximum spacing between the routes mapped to the same GR edge.
Level Converter Placement:
After voltage island generation, we needed to decide the locations of the level converters (LCs). (The procedure in [2] did not specify these locations.) For simplicity, we inserted the LCs on the initial WL-optimized solution that was taken from [5]. The LCs were inserted for any net that had a source terminal driving one or more sink terminals of higher supply voltage. The procedure minimized the number of LCs and placed them as close as possible to the sink terminals, subject to the available whitespace. The whitespace inside each global bin was derived by evaluating both the placement and GR variations of the ISPD benchmarks.
B. Results
Using the initial WL-optimized solution of [5], and after fixing the locations of LCs, we applied net decomposition (as described in Section II-A). Table I reports the number of nets, decomposed nets, and LCs in columns 2, 3, and 4, respectively. We then applied our power-driven GR procedure using a wirelength degradation factor of β = 0, so no wirelength degradation was allowed. We used CPLEX 12.0 [17] to solve our two-phase formulation, and processed the subproblems in parallel by submitting the jobs to a grid of CPUs with 2GB of memory. The number of subproblems (same as the number of processors) is given in column 5 (#SP) in Table I.
We then compared three routing solutions.
• The initial WL-optimized solution of [5];
• The solution after applying phase 1, obtained by solving the formulation (PGR-P1);
• The solution after further applying phase 2, obtained by solving (PGR-P1) followed by (PGR-P2).
For each case, we report the wirelength (WL), the total capacitance C ($\sum_{i} C_i^{route}$, where $C_i^{route}$ is defined in (2)), given in units of fF, and the GR power metric P from (1), excluding the constant portions of the expression. The results are reported in Table I in columns 6 to 14. For the initial solution, we report the wirelength ($W_0$) of the NTHU-R2.0 routes that have been augmented with the extra via-only segment(s) to connect the LC(s) to the original routes. (As a result, there is a slight increase in wirelength compared to the numbers reported in [5].) For the solutions of phase 1 and phase 2, we report only the percentage improvement in WL, C, and P, all with respect to the initial solution.
As can be seen, applying phase 1 of the power-reduction heuristic results in significant saving of 8.77% in P. Recall, the savings are solely due to capacitance reduction (as can be seen from the higher improvement rate in C compared to P). By further applying phase 2, we see additional improvement in P (on average 16.70%). The improvement in C is slightly larger than phase 1, even though phase 1 solely focuses on optimizing C. This is because we start phase 2 by including all the candidate routes generated from phase 1. Notice that in both phase 1 and phase 2 there is improvement (reduction) in WL compared to W0. It is important to note that no extra overflow was introduced in the power-optimized solutions.
In our simulations, we explicitly bounded the runtime for phase 1 and phase 2. The wall-clock runtime limits for phase 1 and phase 2 were set to 30 min and 40 min, respectively, for all benchmarks. The number of processors (same as the number of subproblems) is given in column 5.
V. CONCLUSIONS
We proposed a formulation for minimizing an interconnect power metric for global routing in designs with multi-supply voltage. Power minimization is applied after an initial wirelength-optimized solution is obtained. We presented a mathematical formulation which considered power saving opportunities by reducing the area, fringe, and congestion-dependent coupling capacitances at each metal layer, while accounting for the activity and supply voltage of each route segment. We showed significant savings in the power metric for global routing without any degradation in wirelength or overflow.
