Abstract: Circuit and processor designs will continue to increase in complexity for the foreseeable future. With these increasing sizes comes the use of wide buses to move large amounts of data from one place to another. Bus routing has therefore become increasingly important. In this paper, we present a new bus routing algorithm that globally optimizes both the floorplan and the bus routes themselves. Our algorithm is based on creating a range of feasible bus positions and then using Linear Programming to optimally solve for bus locations. We present this algorithm for use in microarchitectures and explore several different optimization objectives, including performance, floorplan area, and power consumption. Our results demonstrate that this algorithm is effective for efficiently generating feasible routes for complex modern designs and provides better results than previous approaches.
I. INTRODUCTION
As systems become larger and more complicated, global routing of signal wires becomes increasingly important for objectives such as performance and power consumption. These objectives, however, are strongly related to various constraints, such as routability and temperature. For example, a minimum bus area objective increases performance and decreases the power consumption needed to drive buses, but it requires the use of the center location of the floorplan for routing. This necessarily entails decreased routability and increased temperature. Because all of these factors are interrelated, they should be optimized simultaneously.
Recent trends towards large numbers of signal wires have generated interest in bus routing rather than single-wire routing [1]. As a simple approximation, routing an N-bit bus wire by wire takes N times as long as routing it as a single bus. In microprocessors and DSPs that have many cores, interconnects are major performance bottlenecks. The ever-increasing number of bits used in modern designs makes the need for bus routing clear. Yet bus routing differs fundamentally from single-wire routing because of the significant widths of buses. This makes previous global routing algorithms ([2], [3], [4], [5], [6]) unusable for this application.
Previous works [1], [7], [8], [9] addressed various global bus routing algorithms. However, they are either incomplete in terms of mixed objectives, such as routability, execution time, and performance, or not applicable to complicated routing problems, such as in microprocessors where many functional modules are connected with wide buses. [10] tries to solve global bus routing problems with a straightforward method. However, it routes buses one by one, so this approach is inferior to concurrent methods.
This material is based upon work supported by the National Science Foundation under CAREER Grant No. CCF-0546382 and MARCO C2S2.
In this paper, we present a new bus routing algorithm (H-LP) based on Hanan grids [11] and linear programming (LP). We first describe our algorithm, compare it with other algorithms, and show its application to a 64-bit microarchitecture. Our experiments show the efficiency of H-LP with respect to various objectives, such as runtime, area, and performance. The contributions of this paper are as follows:
• We develop a new bus routing algorithm (H-LP) based on linear programming. It routes all nets simultaneously and overcomes the limitation of the sequential method.
• Our algorithm consistently outperforms several well-known algorithms in the literature in terms of routability, performance, and area.
II. PROBLEM FORMULATION
The bus routing problem is formulated as follows. A virtual segment is a range for the feasible placement of a real segment. A bus consists of real segments, as in Figure 1.
III. BUS ROUTING ALGORITHM
Our bus routing algorithm (H-LP) is divided into five steps. Figure 2 shows the overall algorithm of H-LP. Each of the steps is explained in the following subsections.
A. Floorplanning
We use the Sequence Pair [12] floorplan representation and simulated annealing for floorplanning. We employ three moves: swapping two modules in one of the sequences, swapping two modules in both sequences, and rotating a module. The cost function is as follows.
In Equation (1), α, β and γ are weighting factors, A is floorplan area, and R is a routability constant. R is either R 1 (for successful routing) or R 2 (for failed routing). We explain this in detail in Section IV-D. AF i and BA i are the access frequency and bus area of bus H i , respectively. The access frequency is computed based on how many times the bus is used during architectural simulations. It is shown in [13] that the minimization of this factor improves the performance of the underlying architecture measured in terms of IPC (instructions per cycle). The bus area is related to the power consumption objective.
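The three annealing moves can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the state layout (two module sequences plus a rotation flag per module) and the function name `perturb` are assumptions made for the example, and the cost evaluation of Equation (1) is left out.

```python
import random

def perturb(seq_pos, seq_neg, rotated, rng):
    """Apply one randomly chosen move to a copy of the sequence-pair state.

    seq_pos, seq_neg: the two module sequences of the sequence pair.
    rotated: dict mapping module name -> bool (hypothetical rotation flag).
    Returns the new (seq_pos, seq_neg, rotated, move_name).
    """
    pos, neg, rot = list(seq_pos), list(seq_neg), dict(rotated)
    a, b = rng.sample(pos, 2)  # two distinct modules
    move = rng.choice(("swap_one", "swap_both", "rotate"))
    if move == "swap_one":
        # Swap the two modules in only one of the sequences.
        seq = rng.choice((pos, neg))
        i, j = seq.index(a), seq.index(b)
        seq[i], seq[j] = seq[j], seq[i]
    elif move == "swap_both":
        # Swap the two modules in both sequences.
        for seq in (pos, neg):
            i, j = seq.index(a), seq.index(b)
            seq[i], seq[j] = seq[j], seq[i]
    else:
        # Rotate a single module by 90 degrees.
        rot[a] = not rot[a]
    return pos, neg, rot, move
```

Working on copies keeps the previous state available, so a rejected move in the annealing loop can simply be discarded.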
B. Bus Topology Generation
In this step, we generate a bus topology for each bus one by one for a floorplan. Notice that this step generates virtual segments and we find real segments with Linear Programming.
We start bus topology generation by making a bounding box and extending the borders of all the blocks in a bus to make the Hanan grids within it. An example is shown in Figure 3 . The reason we do not use uniform grids is that uniform grids make alignment of segments harder. We restrict the routing range to the inside of the bounding box in order to prevent long detours.
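The grid construction can be illustrated with a short sketch. It assumes each block is an axis-aligned rectangle given as `(x1, y1, x2, y2)`; the function name `hanan_grids` and this representation are illustrative choices, not taken from the original implementation.

```python
def hanan_grids(blocks):
    """Build Hanan grid cells from the borders of the blocks on one bus.

    blocks: list of rectangles (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Every block border is extended across the bounding box, so the cut
    lines are the distinct x and y border coordinates. The outermost
    coordinates are the bounding box itself, so no cell lies outside it.
    Returns a list of cells (x1, y1, x2, y2), row by row from the bottom.
    """
    xs = sorted({x for (x1, _, x2, _) in blocks for x in (x1, x2)})
    ys = sorted({y for (_, y1, _, y2) in blocks for y in (y1, y2)})
    return [(xl, yb, xr, yt)
            for yb, yt in zip(ys, ys[1:])
            for xl, xr in zip(xs, xs[1:])]
```

Because the cut lines coincide with block borders, each cell is either fully inside or fully outside a block, which is what makes segment alignment easier than on a uniform grid.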
Next, each grid is marked with V , H, or O as in Figure 4 . V or H mark corresponding to B n means the grid can be connected to the block B n by a vertical or a horizontal segment respectively. O means the grid overlaps the block B n so that it can be used for either a horizontal segment or a vertical segment. We define the degree of a grid as the number of marks on it.
The next step is to heuristically choose grids to route all the blocks connected by the bus. We first compute weights of all grids as follows.
where W (G i ) is a weight of grid G i , and CX(P ) and CY (P ) are respectively X and Y coordinates of the center of a block or a grid P . Then, we sort all grids in decreasing order of degree. If there is a tie, a grid having smaller weight is chosen first. Then we select a seed grid that has at least degree 2 because grids of degree 1 have no connection point, and we iterate grid selection until we cover all blocks to be connected as in Figure 5 . We show an example in Figure 6 . Two routings for the same bus are compared in Figure 6 (b) and Figure 7 . We assume that B 1 is a source and B 2 and B 3 are sinks. Figure 6 (b) has minimum bus area and minimum SSL (Source-to-Sink Length) for all sinks. On the other hand, Figure 7 has bigger bus area and bigger SSL for the sink B 3 . However, we can benefit from the routing in Figure 7 as the bus does not occupy the center region thereby decreasing congestion and increasing the chance of generating a feasible LP. We compare this algorithm with [14] for Bus Topology Generation in Section V-B.
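The sorting and seed selection can be sketched as below. Note the assumption: the weight used here is the sum of Manhattan distances between the center of a grid and the centers of the blocks on the bus, which is only one plausible reading of the description of Equation (2). The helper names and the `degree` dictionary are hypothetical.

```python
def center(rect):
    """Center (CX, CY) of a rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def weight(grid, blocks):
    # ASSUMPTION: Manhattan distance from the grid center to every block
    # center, summed over the blocks. The original Equation (2) may differ.
    gx, gy = center(grid)
    return sum(abs(gx - cx) + abs(gy - cy) for cx, cy in map(center, blocks))

def order_grids(grids, degree, blocks):
    """Sort by decreasing degree; ties go to the grid with smaller weight."""
    return sorted(grids, key=lambda g: (-degree[g], weight(g, blocks)))

def pick_seed(ordered, degree):
    """First grid in the order with degree >= 2, or None if there is none."""
    for g in ordered:
        if degree[g] >= 2:
            return g
    return None
```

Changing `weight` (or the seed choice) is one way to obtain a different topology for the same bus, as in the comparison between Figure 6 (b) and Figure 7.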
C. Bus Ordering
The segments generated from the previous step are virtual. We find real segments by converting the virtual segments into an LP and solving it. However, we need to satisfy non-overlap conditions between horizontal (or vertical) segments from different buses. Without a bus ordering step, we need Integer Linear Programming (ILP) to formulate non-overlap constraints. The ILP formulation for two vertical segments S i and S j is as follows.
In the above equation, Z i,j is a binary variable which is zero or one, X p is the x-coordinate of the bottom-left corner of the segment p, and W p is the width of the segment p. M max is a large value. The ILP formulation for y-coordinates is similar.
Solving an ILP, however, takes a very long time if there are many integer variables. We compare ILP with LP in Section V. In summary, bus ordering becomes necessary as the number of buses increases.
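For reference, a standard big-M disjunctive form that matches these symbol definitions is given below. It is a sketch under that assumption; the arrangement of terms in the original equation may differ.

```latex
\begin{align}
  X_i + W_i &\le X_j + M_{max}\,(1 - Z_{i,j}) \\
  X_j + W_j &\le X_i + M_{max}\,Z_{i,j} \\
  Z_{i,j} &\in \{0, 1\}
\end{align}
```

With Z i,j = 1 the first inequality forces S i to lie to the left of S j and the second is relaxed by M max; with Z i,j = 0 the roles are reversed. One binary variable is needed per pair of segments that could overlap.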
In the bus ordering step for LP formulation, we compare the relative locations of two virtual segments having the same direction. For example, if two segments S i and S j are vertical, and the left side of S i is to the left of the left side of S j , placing the real segment of S i to the left of the real segment of S j is better because it gives us a higher degree of freedom. We also compare the right sides of S i and S j to refine the ordering, as in Figure 8. This increases the chances of creating a feasible LP formulation.
Fig. 7. Different routing obtained by choosing grids G 3 and G 4 .
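A minimal sketch of this ordering rule for vertical segments follows. It assumes each virtual segment is described by the x-range `(left, right)` it spans and that the right sides are used only to break ties between equal left sides; the exact refinement of Figure 8 is not reproduced here, and the function name is hypothetical.

```python
def order_vertical(virtual):
    """Return segment names in left-to-right order for the LP.

    virtual: dict mapping segment name -> (left, right) x-range of the
    virtual segment. Segments are ordered by left side first; the right
    side refines the order when left sides coincide (an assumption).
    """
    return sorted(virtual, key=lambda s: (virtual[s][0], virtual[s][1]))
```

Horizontal segments can be ordered in the same way using the bottom and top sides of their virtual segments. Fixing the order in advance is what removes the binary variables from the formulation.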
D. LP Generation
After bus ordering, we convert the constraints into an LP. The objective of the LP can be varied according to our goal; the constraints, however, cannot be changed. LP generation is explained in detail in Section IV. We use version 5.5.0.12 of lp_solve [15] to solve the LP.
E. Rip up and Re-route
Routing failure comes from an infeasible LP caused by congestion. When the routing fails, we first revisit the Bus Topology Generation step and try to make different bus topologies to make the LP feasible. If this does not provide feasible solutions after several iterations, we go back to the Floorplanning step and perturb the current floorplan as in Figure 2 . The Bus Ordering step also has room to make the LP feasible but its effect is small.
There are various techniques for making different bus topologies, such as changing the weights in Equation (2) or selecting a different seed grid. Our choice is to include a virtual segment congestion map. For this, we create a uniform grid for the entire floorplan to see where congestion occurs. When a virtual segment is placed, we add an expectation value to the grids covered by the virtual segment. This expectation value is computed using the following equation:
where BW is bus width, SH is the virtual segment height (for a horizontal segment) and GH is the grid height. This can be refined if a more accurate model is needed.
Fig. 9. Coordinates of the segments.
Fig. 10. 18 connection cases between two segments.
Using the congestion map, we rip up virtual segments in congested regions and choose other grids for those buses. The LP is then reformulated and solved. If we fail to formulate a feasible LP within a pre-determined number of iterations then we go back to the Floorplanning step.
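The retry policy of this subsection can be summarized with the following sketch. It is a simplification under stated assumptions: the three callables (`make_topology`, `solve_lp`, `perturb`) and the iteration limits are hypothetical placeholders, and in the actual flow of Figure 2 the floorplan perturbation happens inside simulated annealing rather than in a plain loop.

```python
def route_with_retries(floorplan, make_topology, solve_lp, perturb,
                       max_topo_iters=3, max_floorplans=10):
    """Try new bus topologies first; fall back to a new floorplan.

    make_topology(floorplan, attempt) -> topology (e.g. after ripping up
        virtual segments in congested regions on attempts > 0)
    solve_lp(floorplan, topology) -> solution, or None if the LP is infeasible
    perturb(floorplan) -> a new floorplan
    Returns (floorplan, solution) on success, or None if every try fails.
    """
    for _ in range(max_floorplans):
        for attempt in range(max_topo_iters):
            topology = make_topology(floorplan, attempt)
            solution = solve_lp(floorplan, topology)
            if solution is not None:
                return floorplan, solution
        # No feasible LP within the iteration limit: change the floorplan.
        floorplan = perturb(floorplan)
    return None
```

Topology regeneration is tried first because it is much cheaper than re-floorplanning.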
IV. LP FORMULATION
The inputs needed for LP formulation are the set of virtual segments and the orders of the segments. We show LP formulation with an example in Figure 9 .
A. Constraints
We have two types of constraints in LP formulation. The first constraint is the allowed range of segments. For example, segment S 1 in Figure 9 should connect the grid B 1 (which is also a block) to the grid G 3 . Therefore the bottom and the top of S 1 should be inside B 1 and G 3 respectively.
Moreover, S 1 should be connected to S 2 , so dependencies exist among connected segments. There are 18 connection cases as in Figure 10 . Here we show a formulation only for case (1) because of limited space. Let F (·) be the coordinate of a fixed segment and N (·) be the coordinate of a new segment connected to the fixed segment. Then the coordinates of the new segment of case (1) are:
If there is no segment connected to the top of the new segment, N (y 2 ) becomes a constant. If it is connected to another segment, the range of N (y 2 ) is affected by the segment. 
B. Constraints - Non-overlap
If a vertical segment V 1 is supposed to be to the left of a vertical segment V 2 , the constraint is formulated as follows:
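A form consistent with the coordinate notation of this section is sketched below, assuming that x 1 and x 2 denote the left and right x-coordinates of a segment (as in the S(x 1 ), S(x 2 ) notation). The typesetting of the original equation may differ.

```latex
\begin{equation}
  V_1(x_2) \le V_2(x_1)
\end{equation}
```

In the notation of Section III-C this is X p + W p for V 1 bounded above by X p for V 2 . Because the order is fixed beforehand, no binary variable or large constant is needed.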
y-coordinates of horizontal segments are formulated similarly.
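To illustrate how the range and non-overlap constraints interact, the sketch below checks feasibility for a single chain of ordered vertical segments. It is not the LP itself: for a fixed left-to-right order with only these two constraint types, packing each segment as far left as possible decides feasibility, so no solver is needed in this special case. Connection constraints between segments are ignored, and all names are illustrative.

```python
def pack_left(order, width, lo, hi):
    """Place ordered segments as far left as their ranges allow.

    order: segment names from left to right (result of bus ordering)
    width: dict name -> segment width (the bus width)
    lo, hi: dicts name -> left and right limits of the allowed x-range,
            i.e. the real segment must satisfy lo <= x and x + width <= hi
    Returns dict name -> x of the left edge, or None if infeasible.
    """
    xs = {}
    prev_right = float("-inf")
    for s in order:
        x = max(lo[s], prev_right)   # range constraint and non-overlap
        if x + width[s] > hi[s]:     # segment would leave its virtual range
            return None
        xs[s] = x
        prev_right = x + width[s]
    return xs
```

A `None` result corresponds to the congestion-induced infeasibility that triggers rip-up and re-route. The full problem still needs the LP because connected segments share coordinates and the objective is not simply leftmost placement.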
C. Objective Function
We pass an objective function and constraint inequalities to an LP solver. We show two examples in the following:
Notice that the equation (7) is linear because S(y 2 ) − S(y 1 ) and S(x 2 ) − S(x 1 ) are BW when the segment is horizontal and vertical, respectively.
SSL :
SSL has a strong relation with performance. In Equation (8), F (f i ) is a constant obtained from access frequency [13] of bus H i and SSL i is the SSL of bus H i .
D. Routability
Here we explain the R value in Equation (1). Since real segments are determined by LP, we have coordinates for all buses if the LP is feasible. Otherwise, we have no coordinates. Therefore the routability (= No. of routed buses / No. of buses) of H-LP is either 0% or 100%. Thus we need only two constants R 1 and R 2 for R.
V. EXPERIMENTAL RESULTS
We implemented H-LP in C++. We use two benchmarks to compare bus routers. The first is the MCNC benchmark and the second is μarch, used in [10]. The widest and narrowest buses of μarch are 512 bits and 7 bits wide, respectively. The average is 107.6 bits. SimpleScalar [16] and the SPEC 2000 benchmarks [17] are used for IPC measurement.
A. LP vs ILP for Bus Ordering
We first compare the results of bus ordering by LP and ILP in Table I. #OVR is the average number of non-overlap conditions, so it equals the number of binary variables in the ILP. As Table I shows, the runtimes of LP and ILP are similar when the number of buses is small, and the bus area of ILP is 5-10% smaller. However, solving the ILP takes a very long time when there are many buses. In our experiment, solving apte (#blk: 9, #bus: 36) did not finish within 7 days. This shows why we need the bus ordering step along with LP relaxation.
B. Bus Topology Generation Algorithm Comparison
The Steiner Min-Max Tree (SMMT) [14] is a well-known Steiner tree construction algorithm that avoids congestion in multiple-net routing. We compare SMMT with our algorithm for Bus Topology Generation. We used the same bus ordering step for both algorithms for a fair comparison. Routability for each algorithm is computed as:
Routability = (# of successful trials) / (# of total trials)
These values are shown in Table II. We observe that bus areas are comparable, but the routability of [14] is lower than ours. This shows that our algorithm is simple but well suited for Bus Topology Generation.
C. Comparison with Previous Works
We next compare H-LP with [7] and [10] in Table III. We observe that H-LP consistently outperforms [7] and [10] in terms of routability and floorplan area. Note that [7] considers timing constraints and performs buffer/channel insertion, which explains routing failures in most cases.
In Table IV, we compare H-LP with [9] and [10]. We observe the following:
• The scalability and efficiency of H-LP become evident for large circuits such as ami49 and μarch. Minimizing bus area is an important goal because it affects not only the power consumed in driving gates, but also the thermal profile, which is caused by Joule heating [18], [19]. H-LP always has smaller bus area while keeping the floorplan area similar to that of [10].
• H-LP runs slower than [10] for a few small benchmarks, such as apte, xerox, and hp, but is still faster than [9]. In fact, the runtime of H-LP for small benchmarks can be significantly decreased if we do floorplanning without routing at the beginning and do routing only for the last few floorplans.
• [9] does not work well for complex benchmarks, such as μarch, as it uses restricted shapes of buses. [10] and H-LP, however, have higher routability even for those benchmarks because of the flexibility of bus shapes.
D. Applications to Microarchitecture
Figures 11 through 13 show comparisons of IPC (Instructions Per Cycle) for three objectives. We observe that our LP-based concurrent approach consistently outperforms [10], which is based on a sequential approach. In addition, Figure 14 shows that our performance-objective function works well to boost performance. However, there is a trade-off between performance, area, and power, as shown in Table V. High performance comes at the cost of slightly larger floorplan area, bus area, and power consumption.
VI. CONCLUSION
As the complexity of modern designs continues to grow, bus routing has become extremely important for sustaining performance gains. In this paper, we developed a new bus routing algorithm (H-LP) based on Hanan grids and Linear Programming. Our algorithm globally optimizes the placement of all buses to ensure performance. It is also capable of optimizing multiple objectives for different applications. We presented results and applications for various microarchitectural goals. The H-LP algorithm is superior to previous bus routing algorithms in terms of execution time, floorplan area, bus area, and routability. Experimental results show that this algorithm is scalable and efficient for today's highly complex designs. 100% routability is achieved for all benchmark circuits, and the better optimization results show the superiority and applicability of H-LP.
