I. INTRODUCTION
As technology advances, the number of interconnections between different modules on a chip increases rapidly, and bus routing has become increasingly important. Bus driven floorplanning considers bus placement during floorplanning. The objective of the problem is to obtain a bus-routable floorplan such that the area of the chip and the total area of the buses are minimized. In [4], the authors proposed a unified method to handle different kinds of placement constraints simultaneously, including alignment and abutment. This approach is not suitable for bus driven floorplanning either, because the order in which the blocks are passed by a bus is not fixed.
In [3], the authors made use of the idea from [5] and designed a complete algorithm to solve the bus driven floorplanning problem based on a simulated annealing framework. Each candidate floorplanning solution is checked by an evaluation step to see if the buses are feasible, i.e., whether the required set of blocks can be passed through by a 0-bend bus. Chen and Chang [2] also addressed this bus driven floorplanning problem based on the B*-tree representation. One major drawback of these two approaches is that only horizontal and vertical buses are considered, so the solution quality deteriorates when the number of blocks involved in each bus is large. Another previous work [1] improved over them by allowing 0-bend, 1-bend, and 2-bend buses.
Allowing only 0-bend, 1-bend and 2-bend buses is still very restrictive. There is no reason to impose such a restriction except to reduce the number of vias used, since vias have adverse effects on delay, area and circuit reliability. However, if all bends occur at the blocks on the bus net, no extra via is required, since a via exists anyway to connect the bus to each block on the net. If a bend occurs anywhere other than at a module on the net, an extra via is required to connect the horizontal and vertical bus components. Therefore, in this paper, we address the bus driven floorplanning problem under the constraint that all bends must occur at the blocks on the bus net. There is no limitation on the bus shape or the number of bends as long as this requirement on the bend positions is satisfied.
We solve this problem in a floorplanner based on the TCG representation. In our approach, we first compute the shape of each bus so that it satisfies the above constraint and minimizes the total bus length. This is done by applying a modified minimum spanning tree algorithm on a combined constraint graph (called the common graph). After this step, each bus is decomposed into a set of horizontal and vertical bus components. We then compute the positions of the blocks in the floorplan realization step by properly adjusting the block positions such that all the bus components can pass through their respective blocks. Experimental results show that we improve over [1] in terms of both run time and quality, by having more flexibility in the shapes of the buses and by replacing the complex shape validation steps with simpler methods. For data sets with buses connecting a large number of blocks, our approach gives satisfactory results efficiently, while the approach of [1], which restricts buses to at most two bends, often cannot give any feasible solution.
The rest of this paper is organized as follows. A formal definition of the bus driven floorplanning problem is given in section II. We discuss the general placement constraints for buses and the bus ordering issue in sections III and IV. Details of our algorithm are described in section V. The experimental results are presented in section VI.
II. PROBLEM FORMULATION
We assume that there are two metal layers reserved for bus routing, one for horizontal buses and the other for vertical buses. In the bus driven floorplanning (BDF) problem, we are given the following:
1. A set of $n$ rectangular modules $M = \{m_1, m_2, \ldots, m_n\}$, where each module $m_i$ is associated with an area $a_i$ and an aspect ratio bound $[r_i, s_i]$ with $r_i, s_i \in \mathbb{R}^+$, and
2. A set of $k$ buses $B = \{b_1, b_2, \ldots, b_k\}$, where each bus $b_j$ has a width $t_j$ and a bus net $N_j$, with $t_j \in \mathbb{R}^+$ and $N_j \subseteq M$.
Our goal is to decide the position of each block and the route of each bus, such that no overlapping occurs between any two blocks or between any two horizontal (vertical) components of the buses. In addition, all bends of the buses must occur at the modules on the corresponding bus nets in order to minimize the number of vias used. The objective is to minimize the chip area and the total bus area.
In this paper, we propose a novel algorithm to solve this problem without fixing the bus shapes or limiting the number of bends, as long as the bends occur at the modules on the bus nets. With more flexibility in the shapes of the buses, the solution space is larger and a better BDF solution can be obtained. In addition, the overall efficiency is improved since complex bus shape validation steps are not needed.
III. PLACEMENT CONSTRAINTS FOR BUS
In this section, we discuss how blocks can be aligned in a packing in order to allow buses with zero or more bends to pass through. These basic techniques will be used in our floorplanner.
A. Zero-Bend Bus
There are only two types of zero-bend buses, horizontal and vertical. In the following, we only discuss the placement constraints for horizontal buses. The vertical buses can be handled similarly.
Consider a set of $k$ modules $\{m_1, m_2, \ldots, m_k\}$, where module $m_i$ has a width $w_i$ and a height $h_i$ with $w_i, h_i \in \mathbb{R}^+$. If all the $k$ modules are aligned horizontally, the corresponding horizontal closure graph $G_h$ is as shown in Fig. 1 (assuming that they align in the order of $m_1, m_2, \ldots$). Due to the transitive closure property, each node is connected to all the "downstream" nodes by a horizontal edge with a weight equal to its width. On the other hand, the vertical closure graph $G_v$ will contain only $k$ isolated nodes without any edges between them.
Suppose that we need to generate a horizontal bus $b$ with width $t$ and a bus net $N \subseteq M$. In order to allow the bus to pass through all its blocks in the final floorplan, we need to maintain a relative relationship between the modules in the vertical direction, i.e., the vertical overlap of the modules has to be at least the bus width $t$. This can be done by adding constraint edges to $G_v$. We first add to $G_v$ a dummy module $m_d$ of height $t$ and zero width to represent bus $b$. Then, we add constraint edges between $m_d$ and each $m_i$ in $N$. In this case, the distance of $m_i$'s lower right corner relative to $m_d$'s lower left corner, measured in the vertical direction, must be in the range $[-h_i + t, 0]$, so a pair of constraint edges is added to $G_v$:
1. An edge from $m_d$ to $m_i$ with weight $t - h_i$
2. An edge from $m_i$ to $m_d$ with weight $0$
Similarly, if we want to generate a vertical bus, a dummy module $m_d$ of zero height and width $t$ will be added to $G_h$. Then, a pair of constraint edges will be added to $G_h$ between each $m_i$ in $N$ and $m_d$ as follows.
1. An edge from $m_d$ to $m_i$ with weight $t - w_i$
2. An edge from $m_i$ to $m_d$ with weight $0$
Notice that, instead of adding pairs of edges between every pair of modules in a bus net, adding a zero-area dummy node to represent the bus reduces the number of additional constraint edges from quadratic to linear in the net size, and hence improves the efficiency of the floorplanning algorithm. Fig. 2 shows the vertical closure graph $G_v$ after inserting the constraint edges.
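As a minimal sketch of the construction above, the following Python fragment adds the dummy node and the pair of constraint edges for a horizontal bus. The adjacency-list layout, the function name `add_horizontal_bus`, the module names and the edge convention (an edge `(u, v, w)` means `y[v] >= y[u] + w`) are illustrative assumptions and are not taken from the original implementation.

```python
from collections import defaultdict


def add_horizontal_bus(gv, bus_name, bus_width, net, heights):
    """Add a dummy node for a horizontal bus to the vertical graph gv.

    gv maps a node to a list of (successor, weight) pairs; an edge
    (u, v, w) encodes the constraint y[v] >= y[u] + w.
    """
    dummy = f"d_{bus_name}"  # hypothetical naming scheme for dummy modules
    for m in net:
        # Edge m_d -> m_i with weight t - h_i: the bus top stays inside m_i.
        gv[dummy].append((m, bus_width - heights[m]))
        # Edge m_i -> m_d with weight 0: the bus bottom stays inside m_i.
        gv[m].append((dummy, 0.0))
    return dummy


# Hypothetical three-module net crossed by a bus of width 2.
gv = defaultdict(list)
heights = {"m1": 4.0, "m2": 6.0, "m3": 5.0}
d = add_horizontal_bus(gv, "b1", 2.0, ["m1", "m2", "m3"], heights)
edge_count = sum(len(succ) for succ in gv.values())
```

For a net of three modules the sketch adds six edges (two per module), which illustrates the linear growth mentioned above; pairwise alignment edges would instead grow quadratically with the net size.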
B. Multi-Bend Bus
A multi-bend bus is formed by one or more zero-bend bus components. After decomposing a multi-bend bus into a set of 0-bend bus components, the corresponding sets of additional constraint edges for each component can be inserted into the constraint graphs as discussed in the previous section to align the blocks for the bus component to pass through. Fig. 3 shows a placement of four blocks and an L-shaped bus with two bus components, one horizontal and one vertical. The TCGs with the additional constraint edges are shown on the right. Now we are left with the problem of how to decompose a bus into a set of horizontal and vertical bus components such that all bendings will occur at the modules of its bus net. In our approach, we will first build a graph called common graph for each bus. By finding a suitable spanning tree on this graph, we will be able to determine the bus components. More details will be given in section V.
IV. BUS ORDERING
In a feasible BDF solution, no buses should overlap with one another on each metal layer. This means that no horizontal component should overlap with another horizontal component, and similarly for the vertical components. This non-overlapping requirement can be enforced by imposing a bus ordering between the buses, e.g., bus $i$ must be put on top of or on the right hand side of bus $j$. Given a floorplan of $n$ modules $\{m_1, m_2, \ldots, m_n\}$ with constraint graphs $G_h = (V, E_h)$ and $G_v = (V, E_v)$, the edges in $E_h$ and $E_v$, representing the relative positions between the modules, may give a natural ordering between two buses $b_1$ and $b_2$ with bus nets $N_1$ and $N_2$ respectively.
For those bus pairs which do not have such natural orderings, we need to assign their orderings explicitly if they may overlap. There are only two cases in which two bus components $b_1$ and $b_2$ may overlap:
Case 1: $N_1 \cap N_2 \neq \emptyset$, i.e., $b_1$ and $b_2$ share at least one module.
Case 2: $N_1 \cap N_2 = \emptyset$ and $\exists m_i \in N_1$ and $m_j, m_k \in N_2$, or $\exists m_i \in N_2$ and $m_j, m_k \in N_1$, such that $e_{ji}$ and $e_{ik} \in E_h$ (or $e_{ji}$ and $e_{ik} \in E_v$), i.e., the modules of $b_1$ and $b_2$ interleave with each other in the x-direction (or y-direction).
In these two cases, we will impose an explicit bus ordering to prevent overlapping. Suppose $t_1$ and $t_2$ are the widths of $b_1$ and $b_2$ respectively, and $m_{d1}$ and $m_{d2}$ are their corresponding dummy modules in the constraint graphs. An explicit bus ordering can be enforced as follows:
1. When $b_1$ and $b_2$ are both horizontal, we add to $G_v$ either an edge from $m_{d1}$ to $m_{d2}$ with weight $t_1$ or an edge from $m_{d2}$ to $m_{d1}$ with weight $t_2$.
2. When $b_1$ and $b_2$ are both vertical, we add to $G_h$ either an edge from $m_{d1}$ to $m_{d2}$ with weight $t_1$ or an edge from $m_{d2}$ to $m_{d1}$ with weight $t_2$.
Fig. 5 shows an example of how bus overlapping can be prevented by imposing an explicit bus ordering. In this example, the overlapping between the two horizontal bus components is removed by adding an edge of weight $t_2$ from dummy node $m_{d2}$ to node $m_{d1}$ in $G_v$.
Fig. 5. Prevention of bus overlap by imposing explicit bus ordering. In this example, $b_1$ is connecting $m_1$ and $m_6$, and $b_2$ is connecting $m_4$ and $m_5$.
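The two overlap cases and the ordering edge can be sketched in Python as follows. The set-of-pairs representation of the closure edges, the function names and the example layout are assumptions made for illustration only; the example placement is hypothetical and is not claimed to match the layout in Fig. 5.

```python
def may_overlap(net1, net2, closure_edges):
    """Return True if two same-direction bus components may overlap.

    closure_edges is a set of (u, v) pairs taken from E_h (or E_v).
    """
    n1, n2 = set(net1), set(net2)
    if n1 & n2:  # Case 1: the two nets share at least one module
        return True

    def interleaves(a, b):
        # Case 2: some module of a lies between two modules of b.
        return any((j, i) in closure_edges and (i, k) in closure_edges
                   for i in a for j in b for k in b)

    return interleaves(n1, n2) or interleaves(n2, n1)


def add_ordering_edge(graph, first, second, first_width):
    """Add an edge first -> second so that second starts past first."""
    graph.setdefault(first, []).append((second, first_width))


# Hypothetical row of modules, left to right: m4, m1, m5, m6.
e_h = {("m4", "m1"), ("m4", "m5"), ("m4", "m6"),
       ("m1", "m5"), ("m1", "m6"), ("m5", "m6")}
needs_order = may_overlap({"m1", "m6"}, {"m4", "m5"}, e_h)

gv = {}
if needs_order:
    # Order the buses by adding an edge from m_d2 to m_d1 with weight t_2.
    add_ordering_edge(gv, "d2", "d1", 1.5)
```

The choice of which bus goes first is left to the caller here, in line with the two alternatives listed in each rule above.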
V. METHODOLOGY
Simulated annealing (SA) is used as the basic search engine in our floorplanner. In each iteration of the annealing process, a floorplan, represented by a pair of transitive closure graphs ($G_v$ and $G_h$), is generated. A pair of reduced constraint graphs ($G'_v$ and $G'_h$, whose structures will be described later) is then constructed to improve the efficiency of the later steps. For each bus, we create a graph called the common graph from the two reduced constraint graphs, on which we apply a modified minimum spanning tree algorithm to determine the set of bus components. Then, we decompose the bus into a number of horizontal and vertical components. A set of constraint edges is added to $G_v$ and $G_h$ to align the blocks for the bus components to pass through, according to the method in section III. Meanwhile, we check whether the bus is feasible. If the bus is infeasible, its constraint edges are removed and a penalty term is added to the cost of the annealing process. After processing all the buses, some more constraint edges are inserted to prevent bus overlapping. Finally, we perform a single source longest path algorithm to determine the positions of the modules and the buses. At the end, we compute the cost of the BDF solution according to the total chip area, the total bus area and the number of infeasible buses.
A. Construction of Reduced Graphs
Given the constraint graphs $G_h = (V, E_h)$ and $G_v = (V, E_v)$ of a candidate floorplan solution, we will construct a pair of reduced graphs $G'_h = (V', E'_h)$ and $G'_v = (V', E'_v)$, where
1. $V' \subseteq V$ is the set of constraint modules, i.e., the modules that lie on at least one bus net,
2. $E'_h \subseteq E_h$ and $e_{ij} \in E'_h$ iff $e_{ij} \in E_h$ and $m_i, m_j \in V'$,
3. $E'_v \subseteq E_v$ and $e_{ij} \in E'_v$ iff $e_{ij} \in E_v$ and $m_i, m_j \in V'$,
4. the weight of $e_{ij} \in E'_h$ ($E'_v$) is the longest path distance between $m_i$ and $m_j$ in $G_h$ ($G_v$).
The reduced graphs ($G'_h$ and $G'_v$) contain the constraint modules as nodes, and the weights on the edges represent the distances between the modules in $G_h$ and $G_v$. The weights of the edges in $E'_h$ and $E'_v$ can be found by performing an all-pair longest path algorithm on $G_h$ and $G_v$.
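One way to obtain such a reduced graph is sketched below in Python, using a max-plus variant of the Floyd-Warshall recurrence for the all-pair longest path step. The dictionary-based edge representation, the function name `reduced_graph` and the example widths are assumptions for illustration; the sketch also assumes that the input constraint graph is acyclic, as is the case before any bus constraint edges are inserted.

```python
def reduced_graph(nodes, edges, constrained):
    """Keep edges between constrained modules, weighted by longest path.

    edges maps (u, v) to a weight; the graph is assumed to be acyclic.
    """
    neg_inf = float("-inf")
    dist = {u: {v: neg_inf for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        dist[u][v] = max(dist[u][v], w)
    # Max-plus Floyd-Warshall: valid for longest paths on a DAG.
    for k in nodes:
        for i in nodes:
            if dist[i][k] == neg_inf:
                continue
            for j in nodes:
                cand = dist[i][k] + dist[k][j]
                if cand > dist[i][j]:
                    dist[i][j] = cand
    return {(u, v): dist[u][v] for (u, v) in edges
            if u in constrained and v in constrained}


# Hypothetical row m1, m2, m3 with widths 3, 4, 2; in a closure graph the
# weight of each edge is the width of its source module.
nodes = ["m1", "m2", "m3"]
e_h = {("m1", "m2"): 3, ("m2", "m3"): 4, ("m1", "m3"): 3}
reduced = reduced_graph(nodes, e_h, constrained={"m1", "m3"})
```

In this example the direct edge from `m1` to `m3` has weight 3, but the longest path through `m2` has length 7, so the reduced edge carries weight 7 even though `m2` itself is dropped.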
B. Construction of Common Graph
For each bus $b_j$ with width $t_j$ and bus net $N_j$, we further construct a common graph denoted by $G_{cj} = (V_j, E_j)$, where
1. $V_j = N_j$,
2. $E_j = \{e_{ik} \mid e_{ik} \in E'_h \cup E'_v \text{ and } m_i, m_k \in N_j\}$, and
3. the weight of an edge in $E_j$ is the same as that of the corresponding edge in $E'_h$ or $E'_v$, depending on where it comes from.
The common graph for bus $b_j$ contains all the modules on the bus net $N_j$. Its edge set includes all the edges in $G'_v$ or $G'_h$ connecting any two modules in $N_j$. Due to the transitive closure properties of $G_v$ and $G_h$, the resulting common graph $G_{cj}$ is a complete graph with $|N_j|$ nodes.
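The definition above amounts to filtering the two reduced edge sets by the bus net and remembering which graph each edge came from. A minimal Python sketch follows; the tuple layout `(u, v, weight, kind)` with `kind` equal to `"h"` or `"v"`, and the function name `common_graph`, are assumed representations chosen for illustration.

```python
def common_graph(net, reduced_h, reduced_v):
    """Collect the edges of both reduced graphs that join two net modules.

    reduced_h and reduced_v map (u, v) to a weight. Each returned edge is
    tagged with "h" or "v" according to the graph it comes from.
    """
    members = set(net)
    edges = []
    for kind, graph in (("h", reduced_h), ("v", reduced_v)):
        for (u, v), w in graph.items():
            if u in members and v in members:
                edges.append((u, v, w, kind))
    return edges


# Hypothetical reduced graphs; m4 is not on the bus net.
reduced_h = {("m1", "m2"): 3, ("m1", "m4"): 5}
reduced_v = {("m2", "m3"): 2, ("m1", "m3"): 6}
g_c = common_graph(["m1", "m2", "m3"], reduced_h, reduced_v)
```

Every pair of net modules is related in exactly one of the two graphs, so the three modules in this example yield three tagged edges.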
C. Spanning Tree for Bus Assignment
A bus is required to pass through all the modules on its bus net. No matter what its shape is, the routing of the bus must span all the nodes in the common graph. Therefore, our aim is to find a good spanning tree, denoted by $T_j = (V_j, E_{T_j})$, of the common graph $G_{cj} = (V_j, E_j)$.
In order to reduce the total bus area, our goal is to find a minimum spanning tree. However, the minimum spanning tree on $G_{cj}$ does not always lead to a feasible bus. The first reason is that the number of bus components passing through a module may exceed the maximum number allowed (we call this the capacity of the module), e.g., a connected module can allow at most one horizontal and one vertical bus component of the same bus to pass through. The second reason is that adding the corresponding set of constraint edges for a particular bus component (as described in section III) may create positive cycles in the constraint graphs, because its alignment requirements on the modules may contradict those of some other selected bus components.
To solve the first problem, we modify Kruskal's algorithm as follows. When constructing the spanning tree, we keep track of the number of vertical edges (edges from $G_v$) and horizontal edges (edges from $G_h$) connected to $m_i$ for all $m_i \in N_j$. Whenever a new edge $(m_i, m_k)$ is considered for inclusion in $T_j$, we check not only whether $T_j$ would become cyclic (as in the traditional Kruskal's algorithm) but also whether the capacities of $m_i$ and $m_k$ would be violated. We simply skip the edge if either condition holds. If no spanning tree can be constructed at the end, the bus is regarded as infeasible. To solve the second problem, we incorporate the bus feasibility check described in section E.
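The capacity-aware variant of Kruskal's algorithm can be sketched as follows. Edges are assumed to be `(u, v, weight, kind)` tuples with `kind` in `{"h", "v"}`; the per-module, per-direction edge limit `cap` is a hypothetical parameter, since the text does not state how the capacity translates into an edge count.

```python
def bus_spanning_tree(net, edges, cap=2):
    """Kruskal's algorithm with a per-module, per-direction edge limit.

    Returns the list of chosen edges, or None if the bus is infeasible.
    """
    parent = {m: m for m in net}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = {m: {"h": 0, "v": 0} for m in net}
    tree = []
    for u, v, w, kind in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge would close a cycle
        if degree[u][kind] >= cap or degree[v][kind] >= cap:
            continue  # the edge would violate a module capacity
        parent[ru] = rv
        degree[u][kind] += 1
        degree[v][kind] += 1
        tree.append((u, v, w, kind))
    return tree if len(tree) == len(net) - 1 else None


# Hypothetical common graph in which module a is a bottleneck.
edges = [("a", "b", 1, "h"), ("a", "c", 2, "h"), ("a", "c", 3, "v")]
tight = bus_spanning_tree(["a", "b", "c"], edges, cap=1)
loose = bus_spanning_tree(["a", "b", "c"], edges, cap=2)
```

With `cap=1` the cheaper horizontal edge between `a` and `c` is skipped in favour of the vertical one, which shows how the capacity check can change the tree that plain Kruskal's algorithm would return.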
D. Formation of Bus Components
There are two possible types of edges in the spanning tree $T_j$. Those coming from $G_v$ are vertical edges, while those coming from $G_h$ are horizontal edges. We group adjacent tree edges of the same kind to form one bus component, and repeat this grouping until every tree edge is contained in exactly one component. As a result, each maximal group of adjacent vertical edges forms a vertical bus component, and each maximal group of adjacent horizontal edges forms a horizontal bus component.
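The grouping step can be sketched with a small union-find over the tree edges: two edges are merged when they have the same kind and share a module. The `(u, v, weight, kind)` tuple layout and the function name `bus_components` are assumptions for illustration.

```python
def bus_components(tree):
    """Group adjacent tree edges of the same kind into bus components.

    Returns a list of (kind, sorted module names), one per component.
    """
    parent = list(range(len(tree)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_edge = {}  # (module, kind) -> index of the first edge seen there
    for idx, (u, v, _w, kind) in enumerate(tree):
        for m in (u, v):
            key = (m, kind)
            if key in first_edge:
                parent[find(idx)] = find(first_edge[key])
            else:
                first_edge[key] = idx

    groups = {}
    for idx, (u, v, _w, kind) in enumerate(tree):
        _kind, modules = groups.setdefault(find(idx), (kind, set()))
        modules.update((u, v))
    return [(kind, sorted(modules)) for kind, modules in groups.values()]


# Hypothetical L-shaped bus: a-b-c horizontally, then c-d vertically.
tree = [("a", "b", 1, "h"), ("b", "c", 1, "h"), ("c", "d", 2, "v")]
components = bus_components(tree)
```

The two components in this example meet at module `c`, so the bend lies on a module of the bus net, as required by the problem formulation.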
E. Bus Feasibility Check
In the modified Kruskal's algorithm discussed in section C, the positive cycle detection could in fact be performed for each edge found during the spanning tree construction. However, the run time would be very high in that case. Therefore, in practice, we perform the detection only after the whole spanning tree and all bus components of a bus are found.
For each bus with all its bus components found, a set of constraint edges is added as discussed in section III. These edges, together with the dummy modules (one for each component), are added to the reduced graphs $G'_h$ and $G'_v$. By using the Bellman-Ford algorithm, positive cycles in either $G'_h$ or $G'_v$ can be detected. The bus is regarded as infeasible if positive cycles exist. Otherwise, we keep the constraint edges and the dummy modules in $G'_h$ and $G'_v$ and also copy them to $G_h$ and $G_v$. The total number of dummy modules in the constraint graphs is equal to the total number of bus components among all the feasible buses.
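A positive cycle check in the Bellman-Ford style can be sketched as below. The edge-list representation, the function name `has_positive_cycle` and the example dimensions are illustrative assumptions. Every node starts at distance 0, which is equivalent to adding a virtual source connected to all nodes with zero-weight edges.

```python
def has_positive_cycle(nodes, edges):
    """Detect a positive-weight cycle with Bellman-Ford style relaxation.

    edges is a list of (u, v, w) meaning pos[v] >= pos[u] + w.
    """
    dist = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # converged, so the constraints are consistent
    return True  # still relaxing after |V| passes: a positive cycle exists


# A bus of width 2 through a module of height 4: cycle weight is -2.
feasible = has_positive_cycle(["d", "m"], [("d", "m", 2 - 4), ("m", "d", 0)])
# A bus of width 2 through a module of height 1: cycle weight is +1.
infeasible = has_positive_cycle(["d", "m"], [("d", "m", 2 - 1), ("m", "d", 0)])
```

The second example shows the simplest contradiction: a module that is shorter than the bus width makes the pair of constraint edges from section III form a positive cycle.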
F. Overlap Removal
To prevent overlapping between two bus components which do not have a natural ordering, an explicit ordering (and thus additional constraint edges) is needed if they may overlap, as discussed in section IV. In fact, there may be more than one feasible ordering for a set of bus components, and we could enumerate all cases to find the best one. However, as the width of a bus is relatively small compared with those of the modules, the ordering has little effect on the bus feasibility. Therefore, we simply choose one ordering arbitrarily in our floorplanner.
G. Floorplan Realization
After adding all the constraint edges for the buses, the resultant floorplan can be obtained by performing a single source longest path algorithm on the constraint graphs. Besides finding the coordinates of the modules, the y-coordinate of a horizontal bus component and the x-coordinate of a vertical bus component can be obtained from the longest path distances of the dummy nodes in $G_v$ and $G_h$ respectively.
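As a small worked example of this step, the sketch below computes y-coordinates from a vertical constraint graph that already contains a dummy node for one horizontal bus. The relaxation loop stands in for the single source longest path computation (with an implicit source at coordinate 0); the function name, the module names and the dimensions are hypothetical.

```python
def longest_path_coords(nodes, edges):
    """Longest path distances from an implicit source at coordinate 0.

    edges is a list of (u, v, w) meaning coord[v] >= coord[u] + w.
    """
    coord = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if coord[u] + w > coord[v]:
                coord[v] = coord[u] + w
                changed = True
        if not changed:
            return coord
    raise ValueError("positive cycle: the constraints are infeasible")


# m3 (height 3) lies below m1 (height 4); m2 has height 6.
# A horizontal bus of width 2, with dummy node d, passes through m1 and m2.
edges = [
    ("m3", "m1", 3),      # m1 sits on top of m3
    ("d", "m1", 2 - 4),   # bus constraint pair for m1
    ("m1", "d", 0),
    ("d", "m2", 2 - 6),   # bus constraint pair for m2
    ("m2", "d", 0),
]
y = longest_path_coords(["m1", "m2", "m3", "d"], edges)
```

Here the bus ends up at y = 3 and occupies the interval [3, 5], which lies inside both `m1` (spanning [3, 7]) and `m2` (spanning [0, 6]).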
H. Simulated Annealing
Simulated annealing (SA) is used as the basic searching engine in our floorplanner.
H.1 Set of Moves
There are four kinds of operations to perturb a TCG: (1) swap two nodes in both $G_h$ and $G_v$, (2) exchange a module's height and width, (3) reverse a reduction edge in $G_h$ or $G_v$, and (4) move a reduction edge from one TCG ($G_h$ or $G_v$) to the other.
H.2 Cost Function
The objective of the BDF problem is (1) to minimize the area of the floorplan, (2) to minimize the total bus area, and (3) to accommodate all the buses, so the cost function is defined as follows:
where A is the chip area, B is the total bus area and I is the number of infeasible buses, and α, β, γ and δ are parameters that can be specified by the users. The parameter δ is a threshold for the bus cost which allows the floorplanner to give solutions with smaller dead space percentage. If the bus cost is smaller than this threshold, it is not added to the total cost.
H.3 Speedup of the Annealing Process
Bus assignment is the most time-consuming step in our floorplanner. In order to reduce the run time, we estimate the cost in each annealing iteration before invoking the bus assignment step. To estimate the cost, we first compute the chip area $A$ and compare it with the cost $C$ of the previous BDF solution. If $A < C$, we continue with the bus assignment. Otherwise, we continue with a probability $e^{-(A-C)/T}$, where $T$ is the current temperature. With this simple computation, many poor BDF solutions can be pruned at an early stage. This improvement reduces the run time of our floorplanner by over 70%, and its effectiveness can be seen from the experimental results.
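The early pruning test can be sketched as follows. The sketch assumes that the continuation probability takes the usual Metropolis form exp(-(A - C)/T); this form, the function name `should_run_bus_assignment` and the injectable random source are assumptions made for illustration and should be checked against the original formulation.

```python
import math
import random


def should_run_bus_assignment(area, prev_cost, temperature, rng=random):
    """Decide whether the expensive bus assignment step is worth running.

    Assumes temperature > 0 and a Metropolis-style continuation
    probability exp(-(area - prev_cost) / temperature).
    """
    if area < prev_cost:
        return True
    return rng.random() < math.exp(-(area - prev_cost) / temperature)


class FixedRandom:
    """Deterministic stand-in for the random module, used in the example."""

    def __init__(self, value):
        self.value = value

    def random(self):
        return self.value


better = should_run_bus_assignment(90.0, 100.0, 1.0)
worse_cold = should_run_bus_assignment(110.0, 100.0, 1.0, rng=FixedRandom(0.5))
worse_hot = should_run_bus_assignment(110.0, 100.0, 1000.0, rng=FixedRandom(0.5))
```

Under this assumed form, a candidate whose area already exceeds the previous cost is almost always pruned at low temperature but is still likely to be examined at high temperature.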
I. Soft Module Adjustment
We adjust the dimensions of the soft modules in a post-processing step. This soft module adjustment step is also done by simulated annealing with the same cost function. In each iteration of the annealing process, a module lying on a critical path is selected, and either its width or its height is changed slightly. However, if some originally feasible buses become invalid after this adjustment, the candidate solution is rejected.
VI. EXPERIMENTAL RESULTS
We compare our results on a data set previously used in [1] (details can be found in [1]). Since no experimental results for hard modules are given in [1], we only compare the results after the soft module adjustment. Experimental results show that our approach reduces the dead space by 22.62% on average (Table VI).
In order to have a better comparison, including run time, with the approach presented in [1], and to demonstrate that our algorithm favors test cases with large bus nets, we have created another set of test cases based on the ami33 benchmark. Test cases ami33-a to ami33-e are explicitly created to have a gradual increase in the average net size. Details are shown in Table II. Our proposed algorithm was implemented in C. All test cases are run by both our floorplanner and that in [1] on the same machine, a Dell Optiplex 280 with an Intel P4 (3.2GHz) and 2GB memory. We run each test case ten times and record the average. The results are shown in Table III. When the bus net size increases, experiments show that both the run time and the dead space percentage of our floorplanner increase. However, compared with [1], our algorithm still performs better in run time by 32.07%, and in dead space percentage with and without soft module adjustment by 11.17% and 21.34% respectively. More importantly, note that the approach in [1] is not able to generate any feasible solutions when the bus net size increases further. For ami33-e, only one out of the ten annealing processes generates a feasible final floorplan. For ami33-f, none of the ten resulting floorplans is feasible.
More details of the experiments for the data set are reported in Table VI, which may give more insight into our approach. The increase in bus flexibility (without restricting the number of bends) has increased the percentage of feasible candidate BDF solutions in the annealing process, and this is one major reason why our algorithm can generate solutions of higher quality. For run time, there are three factors contributing to the reduction. Firstly, complex shape validation steps are no longer needed. Secondly, our search can find feasible solutions in fewer iterations because of the relaxed restriction on bus shapes. Lastly, the speedup step discussed in section H.3 significantly reduces the run time by skipping 88% of the iterations on average.
Finally, we also derive a new set of test cases from ami49 with bus net sizes ranging from 10 to 49. To the best of our knowledge, no previous approach can handle buses with such a large net size. The results are shown in Table V. Our approach generates floorplan solutions quickly: even for the test case with the largest net size (ami49-e), the total run time is still less than one minute. The resulting floorplans have a small dead space percentage both before and after the soft module adjustment step.
