We present in this paper a multilevel floorplanning/placement framework based on the B*-tree representation, called MB*-tree, to handle the floorplanning and packing for large-scale building modules. The MB*-tree adopts a two-stage technique, clustering followed by declustering. The clustering stage iteratively groups a set of modules based on a cost metric guided by area utilization and module connectivity, and at the same time establishes the geometric relations for the newly clustered modules by constructing a corresponding B*-tree for them. The declustering stage iteratively ungroups a set of the previously clustered modules (i.e., performs tree expansion) and then refines the floorplanning/placement solution by using a simulated annealing scheme. In particular, the MB*-tree preserves the geometric relations among modules during declustering, which makes the MB*-tree an ideal data structure for the multilevel floorplanning/placement framework. Experimental results show that the MB*-tree obtains significantly better silicon area and wirelength than previous works. Further, unlike previous works, the MB*-tree scales very well as the circuit size increases.
Introduction
Design complexities are growing at a breathtaking speed with the continued improvement of nanometer IC technologies. On one hand, designs with hundreds of millions of transistors are already in production, IP modules are widely reused, and a large number of buffer blocks are used for delay optimization as well as noise reduction in very deep-submicron interconnect-driven floorplanning [1, 7, 11, 13, 21], all of which drive the need for tools that handle large-scale building modules. On the other hand, the highly competitive IC market requires faster design convergence, faster incremental design turnaround, and better silicon area utilization. Efficient and effective design methodology and tools capable of placing and optimizing large-scale modules are essential for such large designs.
Many floorplan representations have been proposed [5, 9, 14, 15, 16, 18, 19, 20, 22] in the literature. However, traditional floorplanning/placement algorithms do not scale well as the design size, complexity, and constraints increase, mainly due to their inflexibility in handling non-slicing floorplans and/or their intrinsically non-hierarchical data structures (representations). The B*-tree, in contrast, has been shown to be an efficient, effective, and flexible data structure for non-slicing floorplans [5]. It is particularly suitable for representing a non-slicing floorplan with large-scale modules and for creating or incrementally updating a floorplan. More importantly, its binary-tree based structure directly corresponds to the framework of a hierarchical, divide-and-conquer scheme, and thus the properties inherited from the structure can substantially facilitate the operations for multilevel large-scale building module floorplanning/placement.
Based on the B*-tree representation, we present in this paper a multilevel floorplanning/placement framework, called MB*-tree, to handle the floorplanning and packing for large-scale building modules with high efficiency and quality. MB*-tree is inspired by the success of the multilevel framework in graph/circuit partitioning such as Chaco [10], hMetis [12], and ML [2], placement such as MPL [4], and routing such as MRS [6], MR [17], and MARS [8].
* This work was supported in part by the National Science Council of Taiwan by Grant No. NSC-91-2215-E-002-038. E-mails: hclee@synopsys.com; ywchang@cc.ee.ntu.edu.tw; gis89530@cis.nctu.edu.tw; hyang@ichips.intel.com.
Unlike multilevel partitioners and placers, however, multilevel floorplanning poses unique difficulties as the shapes of modules to be clustered together can significantly affect the area utilization of a floorplan, and a floorplan design within a cluster needs to be explored along with the global floorplan optimization. The clustering approach also helps to directly address floorplan congestion and timing issues, since different clustering algorithms can be developed to localize inter-module communication and reduce critical path length.
The MB*-tree algorithm adopts a two-stage technique, clustering followed by declustering. The clustering stage iteratively groups a set of modules (basic modules and/or previously clustered modules) based on a cost metric guided by area utilization and module connectivity, and at the same time establishes the geometric relations for the newly clustered modules by constructing a corresponding B*-tree. The clustering procedure repeats until a single cluster containing all modules is formed, denoted by a one-node B*-tree that records the entire multilevel clustering information. For soft modules, we apply Lagrangian relaxation during clustering to determine the module shapes. Then, the declustering stage iteratively ungroups a set of the previously clustered modules (i.e., expands a node into a subtree according to the B*-tree topology constructed at the clustering stage), and then applies simulated annealing to refine the floorplanning/placement solution based on a cost metric defined by area utilization and wirelength. The refinement should lead to a "better" B*-tree structure that guides the declustering at the next level. It is important to note that we always keep only one B*-tree for processing at each iteration, and the MB*-tree preserves the geometric relations among modules during declustering (i.e., the tree expansion), which makes the MB*-tree an ideal data structure for the multilevel floorplanning/placement framework.
Experimental results show that the MB*-tree scales very well as the circuit size increases while the well-known previous works, sequence pair, O-tree, and B*-tree alone, do not. For circuit sizes ranging from 49 to 9,800 modules and from 408 to 81,600 nets, the MB*-tree consistently obtains high-quality floorplans with dead spaces of less than 3.7% in empirically linear runtime, while sequence pair, O-tree, and B*-tree can handle only up to 196, 196, and 1,960 modules in the same amount of runtime and result in dead spaces of as large as 13.00% (@ 196 modules), 9.86% (@ 196 modules), and 27.33% (@ 1960 modules), respectively. We also performed experiments based on a large industrial design with 189 modules and 9777 nets. The results show that our MB*-tree algorithm obtained significantly better silicon area and wirelength than previous works.
The remainder of this paper is organized as follows. Section 2 formulates the module floorplanning/placement problem. Section 3 gives a brief overview of the B*-tree representation. Section 4 presents our two-stage algorithm, clustering followed by declustering. Section 5 presents our approach for handling soft modules. Section 6 gives the experimental results.
Problem Formulation
Let $M = \{m_1, m_2, \ldots, m_n\}$ be a set of $n$ rectangular modules. Each module $m_i \in M$ is associated with a three-tuple $(h_i, w_i, a_i)$, where $h_i$, $w_i$, and $a_i$ denote the height, width, and aspect ratio of $m_i$, respectively. The area of $m_i$ is given by $h_i w_i$, and the aspect ratio of $m_i$ is given by $h_i / w_i$. Let $r_{i,\min}$ and $r_{i,\max}$ be the minimum and maximum aspect ratios, i.e., $h_i / w_i \in [r_{i,\min}, r_{i,\max}]$.
Review of the B*-tree Representation
(See Figure 1(a).) A B*-tree is an ordered binary tree (a restriction of O-tree with faster and more flexible operations) whose root corresponds to the module on the bottom-left corner. Using the depth-first search (DFS) procedure, the B*-tree $T$ for an admissible placement $P$ can be constructed in a recursive fashion.
Starting from the root, we first recursively construct the left subtree and then the right subtree. Let $R_i$ denote the set of modules located on the right-hand side and adjacent to $m_i$. The left child of the node $n_i$ corresponds to the lowest module in $R_i$ that is unvisited. The right child of $n_i$ represents the lowest module located above $m_i$, with its $x$-coordinate equal to that of $m_i$. Inheriting the nice properties of ordered binary trees, the B*-tree is simple, efficient, effective, and flexible for handling non-slicing floorplans. It is particularly suitable for representing a non-slicing floorplan with various types of modules and for creating or incrementally updating a floorplan. More importantly, its binary-tree based structure directly corresponds to the framework of a hierarchical scheme, which makes it a superior data structure for multilevel large-scale building module floorplanning/placement.
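The correspondence between a B*-tree and a placement can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class and the `pack` function are hypothetical names, and the contour is kept as a plain list that is scanned for every module (quadratic time), whereas a real packer would maintain a more efficient contour structure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One B*-tree node; the class and its fields are illustrative assumptions."""
    name: str
    w: float
    h: float
    left: Optional["Node"] = None   # lowest unvisited module adjacent on the right
    right: Optional["Node"] = None  # lowest module above, with the same x-coordinate
    x: float = 0.0
    y: float = 0.0


def pack(root: Node) -> Tuple[float, float]:
    """Assign bottom-left coordinates by DFS; return the bounding (width, height)."""
    placed: List[Tuple[float, float, float]] = []  # (x_start, x_end, top)

    def top_over(x0: float, x1: float) -> float:
        # Highest top edge among placed modules overlapping the span [x0, x1).
        return max((t for a, b, t in placed if a < x1 and x0 < b), default=0.0)

    def visit(node: Node, x: float) -> None:
        node.x = x
        node.y = top_over(x, x + node.w)
        placed.append((x, x + node.w, node.y + node.h))
        if node.left:                 # left child sits immediately to the right
            visit(node.left, x + node.w)
        if node.right:                # right child sits above at the same x
            visit(node.right, x)

    visit(root, 0.0)
    return max(b for _, b, _ in placed), max(t for _, _, t in placed)


# Three modules: b is to the right of a, c is on top of a.
a = Node("a", 4, 2)
a.left = Node("b", 2, 3)
a.right = Node("c", 3, 1)
width, height = pack(a)
```

Visiting the left subtree before the right subtree mirrors the construction order described above, so each module is dropped onto whatever has already been placed beneath it.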
The MB*-tree Algorithm
In this section, we shall present our MB*-tree algorithm for multilevel large-scale building module floorplanning/placement. As mentioned earlier, the algorithm adopts a two-stage approach, clustering followed by declustering, by using the B*-tree representation.
The clustering operation results in two types of modules, namely primitive modules and cluster modules. A primitive module $m_i$ is a module given as an input (i.e., $m_i \in M$), while a cluster module is created by grouping two or more primitive modules. Each cluster module is created by a clustering scheme $\{m_i, m_j\}$, where $m_i$ ($m_j$) denotes a primitive or a cluster module.
In the following subsections, we detail the clustering and declustering algorithms for hard modules.
Clustering
In this stage, we iteratively group a set of (primitive or cluster) modules until a single cluster is formed (or until the number of cluster modules is smaller than a threshold) based on a cost metric of area and connectivity. We shall first consider the clustering metric.
The clustering metric is defined by two criteria: area utilization (dead space) and the connectivity density among modules.
Dead space: The area utilization for clustering two modules $m_i$ and $m_j$ can be measured by the resulting dead space $s_{ij}$, representing the unused area after clustering $m_i$ and $m_j$. Let $s_{tot}$ denote the dead space in the final floorplan $P$. We have $s_{tot} = A_{tot} - \sum_{m_i \in M} h_i w_i$, where $A_{tot}$ is the area of the final enclosing rectangle of $P$. Since the total module area is a constant, minimizing $A_{tot}$ is equivalent to minimizing the dead space $s_{tot}$.
Connectivity density: Let the connectivity of two modules $m_i$ and $m_j$ be the number of nets between them. The connectivity density $K_{ij}$ between two (primitive or cluster) modules $m_i$ and $m_j$ is given by
where $n_i$ ($n_j$) denotes the number of primitive modules in $m_i$ ($m_j$).
Often a bigger cluster implies a larger number of connections. The connectivity density considers not only the connectivity but also the sizes of the clusters, to avoid possible biases. The cost function of dead space is for area optimization while that of connectivity density is for timing and wiring area optimization. Therefore, the metric for clustering two (primitive or cluster) modules $m_i$ and $m_j$, $\phi : \{m_i, m_j\} \to \Re^{+} \cup \{0\}$, is then given by
where $\hat{s}_{ij}$ and $\hat{K}_{ij}$ are the respective normalized costs for $s_{ij}$ and $K_{ij}$, and $\alpha$, $\beta$, and $K$ are user-specified parameters/constants.
Based on $\phi$, we cluster a set of modules into one at each iteration by applying the aforementioned methods until a single cluster containing all primitive modules is formed or the number of modules is smaller than a given threshold (and thus can be easily handled by a classical floorplanning program). During clustering, we record how two modules $m_i$ and $m_j$ are clustered into a new cluster module $m_k$. If $m_i$ is placed left to (below) $m_j$, then $m_i$ is horizontally (vertically) related to $m_j$, denoted by $m_i \rightarrow (\uparrow)\, m_j$. If $m_i \rightarrow (\uparrow)\, m_j$, then $n_j$ is the left (right) child of $n_i$ in its corresponding B*-tree. The relation for each pair of modules in a cluster is established and recorded in the corresponding B*-subtree during clustering. It will be used for determining how to expand a node into a corresponding B*-subtree during declustering.
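One clustering step can be sketched as follows. This is only an illustration under simplifying assumptions: it scores a pair by dead space alone (the connectivity-density term is left out because its formula is not reproduced in the text above), it does not rotate modules, and the names `merge` and `best_pair` are hypothetical. The relation label "H" means the second module is placed to the right of the first, and "V" means it is placed above.

```python
from itertools import combinations
from typing import Dict, Tuple

Rect = Tuple[float, float]  # (width, height)


def merge(a: Rect, b: Rect) -> Tuple[Rect, str, float]:
    """Cluster a and b; return (bounding rectangle, relation, dead space)."""
    (wa, ha), (wb, hb) = a, b
    used = wa * ha + wb * hb
    options = [
        ((wa + wb, max(ha, hb)), "H"),  # b to the right of a
        ((max(wa, wb), ha + hb), "V"),  # b above a
    ]
    rect, rel = min(options, key=lambda o: o[0][0] * o[0][1])
    return rect, rel, rect[0] * rect[1] - used


def best_pair(mods: Dict[str, Rect]) -> Tuple[str, str, str, float]:
    """Pick the pair whose clustering wastes the least area."""
    best = None
    for i, j in combinations(sorted(mods), 2):
        _, rel, dead = merge(mods[i], mods[j])
        if best is None or dead < best[3]:
            best = (i, j, rel, dead)
    return best


modules = {"m1": (4, 2), "m2": (4, 1), "m3": (1, 5)}
choice = best_pair(modules)  # stacking m2 on m1 leaves no dead space
```

The recorded relation is what the declustering stage later uses to decide whether the second module becomes the left or the right child of the first.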
Declustering
We shall first introduce the metric used in simulated annealing for refining floorplan/placement solutions. The declustering metric is defined by two criteria: area utilization (dead space) and the wirelength among modules.
Dead space: Same as that defined in Section 4.1.
Wirelength: The wirelength of a net is measured by half the perimeter of the bounding box of all the pins of the net, or by the length of the center-to-center interconnections between the modules if no pin positions are specified.
The wirelength for clustering two modules $m_i$ and $m_j$, $w_{ij}$, is measured by the total wirelength interconnecting the two modules. The total wirelength in the final floorplan $P$, $w_{tot}$, is the summation of the length of the wires interconnecting all modules.
The cost function of dead space is for area optimization while that of wirelength is for timing and wiring area optimization. Therefore, the metric for refining a floorplan solution during declustering is the weighted sum $\gamma \hat{s} + \delta \hat{w}$,
where $\hat{s}$ and $\hat{w}$ are the respective normalized costs for $s$ and $w$, and $\gamma$ and $\delta$ are user-specified parameters.
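The wirelength measure and the refinement cost can be sketched as follows. The half-perimeter computation follows the definition above; the weighted-sum form of `refine_cost` and the way the normalization constants are passed in are assumptions made for illustration, and all function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hpwl(points: Iterable[Point]) -> float:
    """Half the perimeter of the bounding box of a net's pins (or module centers)."""
    xs, ys = zip(*points)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def center(x: float, y: float, w: float, h: float) -> Point:
    """Center of a module whose bottom-left corner is at (x, y)."""
    return (x + w / 2, y + h / 2)


def refine_cost(dead: float, wire: float, dead_norm: float, wire_norm: float,
                gamma: float, delta: float) -> float:
    """Weighted sum of normalized dead space and wirelength (assumed form)."""
    return gamma * dead / dead_norm + delta * wire / wire_norm


# A three-module net with no pin positions: fall back to module centers.
net = [center(0, 0, 4, 2), center(4, 0, 2, 3), center(0, 2, 3, 1)]
length = hpwl(net)
```

Setting `gamma` to 1 and `delta` to 0 reduces the cost to area optimization alone, and the opposite setting gives wirelength optimization alone.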
Algorithm: Declustering($m_k$, $m_i$, $m_j$)
Input: $m_k$: the cluster module; $m_i$, $m_j$: two modules with $m_j$ right to or below $m_i$.
The declustering stage iteratively ungroups a set of previously clustered modules (i.e., expands a node into a subtree according to the B*-tree constructed at the clustering stage) and then refines the floorplan solution based on simulated annealing.
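The node expansion can be sketched as below. The body of the algorithm is not reproduced in the text, so the re-attachment rule used here is an assumption: the neighbor that was to the right of the cluster stays to the right of its rightmost member, and the neighbor that was above the cluster stays above its topmost member. The `Node` fields `first`, `second`, and `relation` are hypothetical bookkeeping recorded at clustering time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None    # module adjacent on the right
    right: Optional["Node"] = None   # module directly above
    first: Optional["Node"] = None   # members recorded at clustering time
    second: Optional["Node"] = None
    relation: str = ""               # "H": second right of first, "V": second above first


def decluster(nk: Node) -> Node:
    """Expand cluster node nk; return the node that takes nk's place in the tree."""
    ni, nj = nk.first, nk.second
    if nk.relation == "H":
        ni.left = nj           # nj sits to the right of ni
        nj.left = nk.left      # old right-hand neighbor now follows nj
        ni.right = nk.right    # old upper neighbor stays above ni
    else:
        ni.right = nj          # nj sits above ni
        nj.right = nk.right    # old upper neighbor now sits above nj
        ni.left = nk.left      # old right-hand neighbor stays right of ni
    return ni


m1, m2, m3 = Node("m1"), Node("m2"), Node("m3")
m4 = Node("m4", first=m1, second=m2, relation="H")
m4.right = m3                  # m3 was placed above the cluster m4
root = decluster(m4)           # the caller re-links root under m4's parent
```

Because only pointers are rewired, the relations recorded during clustering carry over unchanged, which is the property the text attributes to the MB*-tree.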
Simulated Annealing
We propose a simulated annealing based algorithm to refine the solution at each level of declustering. We apply the following three operations to perturb a multilevel B*-tree (a feasible solution) into another.
Op1: Rotate a module.
Op2: Move a module to another place.
Op3: Swap two modules.
Op1 exchanges the width and height of a module. Op2 deletes a node of a B*-tree and inserts it into another position. Op3 deletes two nodes and inserts them into the corresponding positions in the B*-tree. Op2 and Op3 need to perform deletion and insertion operations on a B*-tree, which takes $O(h)$ time, where $h$ is the height of the B*-tree.
The simulated annealing algorithm starts with a B*-tree produced during declustering. Then it perturbs a B*-tree (a feasible solution) into another B*-tree by Op1, Op2, and/or Op3 until a predefined "frozen" state is reached. Finally, we transform the resulting B*-tree to the corresponding final admissible placement.
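The refinement loop can be sketched generically. The skeleton below is a standard simulated annealing loop with Metropolis acceptance and geometric cooling, not the authors' schedule: the parameter values, the `anneal` signature, and the toy state (modules placed side by side in one row, perturbed only by Op1) are all assumptions chosen to keep the example short. Op2 and Op3 would plug in as further `perturb` moves operating on a B*-tree.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def anneal(state: S, cost: Callable[[S], float],
           perturb: Callable[[S, random.Random], S],
           t0: float = 1.0, t_min: float = 1e-3, cooling: float = 0.9,
           moves_per_t: int = 50, seed: int = 0) -> S:
    """Return the best state seen; states must be immutable values."""
    rng = random.Random(seed)
    cur, cur_cost = state, cost(state)
    best, best_cost = cur, cur_cost
    t = t0
    while t > t_min:                      # "frozen" once t drops below t_min
        for _ in range(moves_per_t):
            cand = perturb(cur, rng)
            cand_cost = cost(cand)
            delta = cand_cost - cur_cost
            # Always accept improvements; accept uphill moves with prob. e^(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur, cur_cost = cand, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = cur, cur_cost
        t *= cooling
    return best


Mods = Tuple[Tuple[int, int], ...]


def rotate_one(mods: Mods, rng: random.Random) -> Mods:
    """Op1: exchange the width and height of one randomly chosen module."""
    i = rng.randrange(len(mods))
    w, h = mods[i]
    return mods[:i] + ((h, w),) + mods[i + 1:]


def row_area(mods: Mods) -> int:
    """Bounding-box area when the modules are placed side by side."""
    return sum(w for w, _ in mods) * max(h for _, h in mods)


start: Mods = ((1, 4), (2, 4), (4, 1))
result = anneal(start, row_area, rotate_one)
```

Tracking the best state separately from the current one means the returned solution is never worse than the starting point, even when the walk ends on an uphill move.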
The Overall MB*-tree Algorithm
The MB*-tree algorithm integrates the aforementioned three algorithms. We first perform clustering to reduce the problem size level by level and then enter the declustering stage. In the declustering stage, we perform floorplanning for the modules at each level using the simulated annealing based algorithm B*-tree SA. Figure 4 illustrates an execution of the MB*-tree algorithm. For explanation, we cluster three modules each time in Figure 4. Figure 4(a) lists seven modules to be packed, $m_i$'s, $1 \le i \le 7$. Figures 4(b)-(d) illustrate the execution of the clustering algorithm. Figure 4(b) shows the resulting configuration after clustering three of the primitive modules into a new cluster module $m_8$. Similarly, we cluster $m_1$, $m_2$, and one more primitive module into $m_9$. Finally, we cluster $m_3$, $m_8$, and $m_9$ into $m_{10}$. The clustering stage is thus done, and the declustering stage begins, in which simulated annealing is applied to do the floorplanning. In Figure 4(e), we first decluster $m_{10}$ into $m_3$, $m_8$, and $m_9$ (i.e., expand the node $n_{10}$ into the B*-subtree illustrated in Figure 4(e)). We then move one module to the top of another (Op2) during simulated annealing (see Figure 4(f)). As shown in Figure 4(g), we further decluster $m_9$ into $m_1$, $m_2$, and its third module, and then rotate $m_2$ and move $m_3$ on top of $m_2$ (Op1 on $m_2$ and Op2 on $m_3$), resulting in the configuration shown in Figure 4(h). Finally, we decluster $m_8$, shown in Figure 4(i), into its three primitive modules and move one of them to the right of $m_3$ (Op2), which results in the optimum placement shown in Figure 4(j).
Handling Soft Modules
In this section, we present our approach for handling soft modules. We first apply Lagrangian relaxation [23] to cluster soft modules at the clustering stage while keeping declustering the same as before. We then propose a network-flow based algorithm for projecting Lagrange multipliers to satisfy their optimality conditions. 
Formulation

Lagrangian Relaxation
Then, the Lagrangian relaxation subproblem associated with the multiplier $P = (\ldots, r, s)$, denoted by $LRS(P)$, can be defined as follows:
Since CS can be transformed into a convex problem, we can apply Theorem 6.2.4 of [3]. This implies that if $P$ is an optimal solution to $LDP$, the optimal solution of $LRS(P)$ will also optimize CS.
Consider the Lagrangian of CS
Solving $LRS(P)$ and $LDP$
Let $\Omega$ denote the set of multipliers $P$ satisfying Equations (6) and (7). We now consider solving the Lagrangian relaxation subproblem $LRS(P)$ for a given $P \in \Omega$, i.e., computing the dimension and coordinate of each module.
First, we partially differentiate the Lagrangian with respect to $w_i$ to get an optimal value of $w_i$ such that the Lagrangian is minimized. Since the dimensions of each primitive module ($w_p$ and $h_p$) have been determined, the dimensions of each cluster module (its width and height) can be computed by applying a longest path algorithm in the horizontal and vertical constraint graphs $G_h$ and $G_v$. Then, we consider partial differentiation of the Lagrangian with respect to $x_i$ and $y_i$, giving the optimality conditions
where denotes the number of the pins of .
In Equation (8), there are $n$ equations with $n$ variables. Thus, we can apply Gaussian elimination to solve these $n$ equations to get the optimal value of each $x_i$. In these $n$ equations, all coefficients of the variables depend only on the net information. Since the net information is the same through the entire process, each variable can be solved by the same process. Hence, we can record the process of solving each variable during the first iteration (which takes cubic time), and then each subsequent computation will take only quadratic time by applying the same process. Similarly, we can compute the optimal value of each $y_i$.
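Recording the elimination once and replaying it for each new right-hand side amounts to an LU factorization followed by forward and back substitution. The sketch below shows that idea in plain Python. The names `lu_factor` and `lu_solve` are hypothetical, and the small matrix is a stand-in, since the coefficients of Equation (8) are not reproduced in the text.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """Gaussian elimination with partial pivoting; cubic time, done once."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0:
            raise ValueError("singular matrix")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]              # store the multiplier in place
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Replay the recorded elimination on b; quadratic time per right-hand side."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):                        # forward substitution (unit lower part)
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):              # back substitution (upper part)
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


coeffs = [[2.0, 1.0], [1.0, 3.0]]             # fixed across iterations
factors, order = lu_factor(coeffs)            # cubic work, first iteration only
x_first = lu_solve(factors, order, [3.0, 5.0])
x_second = lu_solve(factors, order, [4.0, 7.0])
```

Only the right-hand side changes between iterations, so every call after the first costs a pair of triangular solves.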
Next, we use a subgradient optimization method to search for the optimal $P$. Let $P$ be the multiplier at step $k$. We move $P$ to a new multiplier $P'$ based on the subgradient direction:
Projecting Lagrange Multipliers
We present a network flow based algorithm to check whether $P$ belongs to $\Omega$ and to project $P$ to $P^{*} \in \Omega$ if $P \notin \Omega$. Further, an incremental update technique is employed to make the maximum flow computation more efficient.
Theorem 3 $P^{*} \in \Omega$.
The projection process greatly affects the efficiency of the entire optimization, since there may be $O(n^2)$ edges in the worst case. Thus, we employ an incremental flow update technique to speed up the max-flow computation after updating $P$ and its corresponding capacity. Note that, for efficiency, we may perform Lagrangian relaxation only at the higher levels of the multilevel framework (when the number of modules becomes small enough for Lagrangian relaxation). To do so, however, we still need to pass the aspect ratio information for each soft module level by level.
Algorithm: IncrementalUpdate($N$, $s$, $t$)
Input: $N$: the flow network; $s$: the source of $N$; $t$: the sink of $N$.
Experimental Results
We implemented the MB*-tree algorithm for hard modules in the C++ language on a 450 MHz SUN Ultra 60 workstation with 2 GB memory. The package is available at http://cc.ee.ntu.edu.tw/~ywchang/research.html.
Columns 1, 2, 3, and 4 of Table 1 list the names of the benchmark circuits, the number of modules, the number of nets, and the total area of modules in the circuits, respectively. ami49 is the largest MCNC benchmark circuit used in the previous works [5, 9] for comparative study. To test the capability of existing methods, we created ten synthetic circuits, named ex ami49 x, by duplicating the modules and nets of ami49 $x$ times. The largest circuit ex ami49 200 contains 9,800 modules and 81,600 nets. Table 1 also shows the results for ex ami49 x by optimizing area alone ($\gamma = 1.0$ and $\delta = 0.0$). Columns 5, 6, and 7 give the resulting area, the dead space, and the runtime for our MB*-tree, respectively. The remaining columns list the results for the well-known previous works, sequence pair [18], O-tree [9], and B*-tree [5]. Note that the B*-tree package we used here is the September 2000 version, B*-tree-v1.0, also available at http://cc.ee.ntu.edu.tw/~ywchang/research.html. It runs 50X-100X faster and achieves better area utilization than the B*-tree package reported in [5]. As shown in the table, our MB*-tree algorithm obtained a dead space of only 2.78% for ami49 in only 0.4 min runtime while B*-tree-v1.0 reported a dead space of 3.53% using 0.25 min runtime. Further, the experimental results for larger circuits show that the MB*-tree scales very well as the circuit size increases while the previous works, sequence pair, O-tree, and B*-tree, do not. For circuit sizes ranging from 49 to 9,800 modules and from 408 to 81,600 nets, the MB*-tree consistently obtains high-quality floorplans with dead spaces of less than 3.72% in empirically linear runtime, while sequence pair, O-tree, and B*-tree can handle only up to 196, 98, and 1,960 modules in the same amount of time and result in dead spaces of as large as 13.00% (@ 196 modules), 12.29% (@ 98 modules), and 27.33% (@ 1960 modules), respectively.
As shown in Table 1, the resulting dead spaces for the MB*-tree are almost independent of the circuit sizes, which demonstrates the high scalability of the MB*-tree. In contrast, the dead spaces for the non-hierarchical previous works all grow dramatically as the circuit size increases. In particular, the empirical runtime of the MB*-tree approaches linear in the circuit size while the other previous works cannot handle large-scale designs. Figure 7 shows the layout for the largest circuit ex ami49 200 obtained by MB*-tree in 256 min CPU time. It has a dead space of only 3.44%. Note that this circuit is infeasible for the previous works [5, 9, 18]. Table 2 shows the comparisons for area optimization alone ($\gamma = 1.0$, $\delta = 0.0$), wirelength optimization alone ($\gamma = 0.0$, $\delta = 1.0$), and simultaneous area and wirelength optimization (both $\gamma$ and $\delta$ nonzero) among sequence pair (SP), B*-tree, and MB*-tree based on the circuit industry (whose total area = 658.04 mm$^2$). The circuit industry is a 0.18 µm, 1 GHz industrial design with 189 modules, 20 million gates, and 9,777 center-to-center interconnections. It is a large chip design and contains three "tough" modules with aspect ratios greater than 19 (and as large as 36). (Note that we do not have the results for O-tree for this experiment because the data industry cannot be fed into the O-tree package.) In each entry of the table, we list the best/average values obtained in ten runs of simulated annealing, using a random seed for each run. For the column "Time," we report the runtime for obtaining the best value and the average runtime of the ten runs. As shown in the table, our MB*-tree algorithm obtained significantly better silicon area and wirelength than sequence pair and B*-tree in all tests. For area optimization, MB*-tree can obtain a dead space of only 2.11% while sequence pair (B*-tree) results in a dead space of at least 28.1% (12.9%).
For wirelength optimization, MB*-tree can obtain a total wirelength of only 56631 mm while sequence pair (B*-tree) requires a total wirelength of at least 81344 mm (113216 mm). For simultaneous area and wirelength optimization, MB*-tree also obtains the best area and wirelength. The results show the effectiveness of our MB*-tree algorithm. As for the runtimes, MB*-tree takes longer than B*-tree and SP for wirelength optimization. (For area optimization, MB*-tree runs faster than SP.) This is reasonable because it took much longer to obtain significantly better results and the multilevel process incurred some overhead. Nevertheless, as shown in Table 1, neither SP nor B*-tree scales well to instances with a large number of modules (and thus their runtimes increase dramatically when the number of modules grows into the hundreds). The resulting layout of industry for simultaneous area and wirelength optimization using MB*-tree is shown in the accompanying figure.
Table 2: Comparisons for area optimization alone, wirelength optimization alone, and simultaneous area and wirelength optimization among sequence pair (SP), B*-tree, and MB*-tree based on the circuit industry. In each entry, both the best/average values obtained in ten runs of simulated annealing are reported. The last two rows give the ratios of the results (SP to MB*-tree and B*-tree to MB*-tree).
Concluding Remarks
We have presented the MB*-tree based multilevel framework to handle the floorplanning and packing for large-scale modules. Experimental results have shown that the MB*-tree scales very well as the circuit size increases. The capability of the MB*-tree shows its promise in handling large-scale designs with complex constraints. In the future, we plan to explore the floorplanning/placement problem with large-scale rectilinear and mixed-sized modules/cells as well as buffer-block planning for interconnect-driven floorplanning.
