We present in this paper a multilevel floorplanning/placement framework based on the B*-tree representation, called MB*-tree, to handle the floorplanning and packing for large-scale building modules. The MB*-tree adopts a two-stage technique, clustering followed by declustering. The clustering stage iteratively groups a set of modules based on a cost metric guided by area utilization and module connectivity, and at the same time establishes the geometric relations for the newly clustered modules by constructing a corresponding B*-tree for them. The declustering stage iteratively ungroups a set of the previously clustered modules (i.e., performs tree expansion) and then refines the floorplanning/placement solution by using a simulated annealing scheme. In particular, the MB*-tree preserves the geometric relations among modules during declustering, which makes the MB*-tree an ideal data structure for the multilevel floorplanning/placement framework. Experimental results show that the MB*-tree obtains significantly better silicon area and wirelength than previous works. Further, unlike previous works, the MB*-tree scales very well as the circuit size increases.
Introduction
Design complexities are growing at a breathtaking speed with the continued improvement of nanometer IC technologies. On one hand, designs with hundreds of millions of transistors are already in production, IP modules are widely reused, and a large number of buffer blocks are used for delay optimization as well as noise reduction in very deep-submicron interconnect-driven floorplanning [1, 7, 11, 13, 21], all of which drive the need for a tool that handles large-scale building modules. On the other hand, the highly competitive IC market requires faster design convergence, faster incremental design turnaround, and better silicon area utilization. Efficient and effective design methodology and tools capable of placing and optimizing large-scale modules are essential for such large designs.
Many floorplan representations have been proposed in the literature [5, 9, 14, 15, 16, 18, 19, 20, 22]. However, traditional floorplanning/placement algorithms do not scale well as the design size, complexity, and constraints increase, mainly due to their inflexibility in handling non-slicing floorplans and/or their intrinsically non-hierarchical data structures (representations). The B*-tree, in contrast, has been shown to be an efficient, effective, and flexible data structure for non-slicing floorplans [5]. It is particularly suitable for representing a non-slicing floorplan with large-scale modules and for creating or incrementally updating a floorplan. More importantly, its binary-tree based structure directly corresponds to the framework of a hierarchical, divide-and-conquer scheme, and thus the properties inherited from the structure can substantially facilitate the operations for multilevel large-scale building module floorplanning/placement.
Based on the B*-tree representation, we present in this paper a multilevel floorplanning/placement framework, called MB*-tree, to handle the floorplanning and packing for large-scale building modules with high efficiency and quality. MB*-tree is inspired by the success of the multilevel framework in graph/circuit partitioning such as Chaco [10], hMetis [12], and ML [2], placement such as MPL [4], and routing such as MRS [6], MR [17], and MARS [8].
*This work was supported in part by the National Science Council of Taiwan under Grant No. NSC-91-2215-E-002-038. E-mails: hclee@synopsys.com; ywchang@cc.ee.ntu.edu.tw; gis89530@cis.nctu.edu.tw; hyang@ichips.intel.com.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2003, June 2-6, 2003, Anaheim, California, USA.
Copyright 2003 ACM 1-58113-688-9/03/0006 ... $5.00
Unlike multilevel partitioners and placers, however, multilevel floorplanning poses unique difficulties as the shapes of modules to be clustered together can significantly affect the area utilization of a floorplan, and a floorplan design within a cluster needs to be explored along with the global floorplan optimization. The clustering approach also helps to directly address floorplan congestion and timing issues, since different clustering algorithms can be developed to localize inter-module communication and reduce critical path length.
The MB*-tree algorithm adopts a two-stage technique, clustering followed by declustering. The clustering stage iteratively groups a set of modules (basic modules and/or previously clustered modules) based on a cost metric guided by area utilization and module connectivity, and at the same time establishes the geometric relations for the newly clustered modules by constructing a corresponding B*-tree. The clustering procedure repeats until a single cluster containing all modules is formed, denoted by a one-node B*-tree that records the entire multilevel clustering information. For soft modules, we apply Lagrangian relaxation during clustering to determine the module shapes. Then, the declustering stage iteratively ungroups a set of the previously clustered modules (i.e., expands a node into a subtree according to the B*-tree topology constructed at the clustering stage), and then applies simulated annealing to refine the floorplanning/placement solution based on a cost metric defined by area utilization and wirelength. The refinement leads to a "better" B*-tree structure that guides the declustering at the next level. It is important to note that we always keep only one B*-tree for processing at each iteration, and the MB*-tree preserves the geometric relations among modules during declustering (i.e., the tree expansion), which makes the MB*-tree an ideal data structure for the multilevel floorplanning/placement framework.
Experimental results show that the MB*-tree scales very well as the circuit size increases, while the well-known previous works, sequence pair, O-tree, and B*-tree alone, do not. For circuit sizes ranging from 49 to 9,800 modules and from 408 to 81,600 nets, the MB*-tree consistently obtains high-quality floorplans with dead spaces of less than 3.7% in empirically linear runtime, while sequence pair, O-tree, and B*-tree can handle only up to 196, 196, and 1,960 modules in the same amount of runtime and result in dead spaces as large as 13.00% (@ 196 modules), 9.86% (@ 196 modules), and 27.33% (@ 1,960 modules), respectively. We also performed experiments based on a large industrial design with 189 modules and 9,777 nets. The results show that our MB*-tree algorithm obtained significantly better silicon area and wirelength than previous works.
The remainder of this paper is organized as follows. Section 2 formulates the module floorplanning/placement problem. Section 3 gives a brief overview of the B*-tree representation. Section 4 presents our two-stage algorithm, clustering followed by declustering. Section 5 presents our approach for handling soft modules. Section 6 gives the experimental results.
Problem Formulation
Let $M = \{m_1, m_2, \ldots, m_n\}$ be a set of $n$ rectangular modules. Each module $m_i \in M$ is associated with a 3-tuple $(h_i, w_i, a_i)$, where $h_i$, $w_i$, and $a_i$ denote the height, width, and aspect ratio of $m_i$, respectively. The area $A_i$ of $m_i$ is given by $h_i w_i$, and the aspect ratio of $m_i$ is given by $h_i / w_i$. Let $r_{i,\min}$ and $r_{i,\max}$ be the minimum and maximum aspect ratios, i.e., $h_i / w_i \in [r_{i,\min}, r_{i,\max}]$. A placement (floorplan) $P = \{(x_i, y_i) \mid m_i \in M\}$ is an assignment of the rectangular modules $m_i$ with the coordinates of their bottom-left corners being assigned to $(x_i, y_i)$ so that no two modules overlap (and $h_i / w_i \in [r_{i,\min}, r_{i,\max}]$, $\forall i$).
We consider in this paper both hard and soft modules. A hard module is not flexible in its shape but free to rotate. A soft module is free to rotate and change its shape within the range $[r_{i,\min}, r_{i,\max}]$. The objective of placement/floorplanning is to minimize a specified cost metric such as a combination of the area $A_{tot}$ and wirelength $W_{tot}$ induced by the assignment of the $m_i$, where $A_{tot}$ is measured by the final enclosing rectangle of $P$ and $W_{tot}$ is the summation, over all nets, of half the perimeter of the bounding box of the pins of each net.
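The formulation above can be made concrete with a minimal sketch, assuming pins sit at module centers and the placement is anchored at the origin. The names `Module`, `is_legal`, `total_area`, `total_wirelength`, and the weights `alpha` and `beta` are hypothetical and only illustrate one possible combined cost.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Module:
    x: float  # x-coordinate of the bottom-left corner
    y: float  # y-coordinate of the bottom-left corner
    w: float  # width
    h: float  # height


def overlaps(a, b):
    # Two rectangles overlap only if their interiors intersect on both axes.
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def is_legal(modules):
    # A placement is legal when no two modules overlap.
    return not any(overlaps(a, b) for a, b in combinations(modules, 2))


def total_area(modules):
    # A_tot: area of the enclosing rectangle, assuming the placement
    # starts at the origin (0, 0).
    width = max(m.x + m.w for m in modules)
    height = max(m.y + m.h for m in modules)
    return width * height


def total_wirelength(modules, nets):
    # W_tot: sum over nets of the half-perimeter of the bounding box of
    # the pins; pins are assumed to sit at module centers.
    total = 0.0
    for net in nets:
        xs = [modules[i].x + modules[i].w / 2 for i in net]
        ys = [modules[i].y + modules[i].h / 2 for i in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def cost(modules, nets, alpha=0.5, beta=0.5):
    # Hypothetical weighted combination of area and wirelength.
    return alpha * total_area(modules) + beta * total_wirelength(modules, nets)
```

For two modules of sizes 2 x 1 and 1 x 3 placed side by side and joined by one net, `total_area` is the 3 x 3 enclosing rectangle and `total_wirelength` is the half-perimeter spanned by the two centers.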
Review of the B*-tree Representation
Given a compacted placement $P$ in which no module can move down or left (called an admissible placement; see Figure 1(a)), we can represent it by a B*-tree. A B*-tree is an ordered binary tree (a restriction of the O-tree with faster and more flexible operations) whose root corresponds to the module on the bottom-left corner. Using the depth-first search (DFS) procedure, the B*-tree $T$ for an admissible placement $P$ can be constructed in a recursive fashion.
Starting from the root, we first recursively construct the left subtree and then the right subtree. Let $R_i$ denote the set of modules located on the right-hand side of and adjacent to $m_i$. The left child of node $n_i$ corresponds to the lowest unvisited module in $R_i$. The right child of $n_i$ represents the lowest module located above $m_i$ with its x-coordinate equal to that of $m_i$.
The B*-tree keeps the geometric relation between two modules as follows. If node $n_j$ is the left child of node $n_i$, module $m_j$ must be located on the right-hand side of and adjacent to module $m_i$; i.e., $x_j = x_i + w_i$. If node $n_j$ is the right child of $n_i$, module $m_j$ must be located above module $m_i$, with the x-coordinate of $m_j$ equal to that of $m_i$; i.e., $x_j = x_i$. Also, since the root of $T$ represents the bottom-left module, the coordinate of the module is $(x_{root}, y_{root}) = (0, 0)$.
Inheriting the nice properties of ordered binary trees, the B*-tree is simple, efficient, effective, and flexible for handling non-slicing floorplans. It is particularly suitable for representing a non-slicing floorplan with various types of modules and for creating or incrementally updating a floorplan. More importantly, its binary-tree based structure directly corresponds to the framework of a hierarchical scheme, which makes it a superior data structure for multilevel large-scale building module floorplanning/placement.
The MB*-tree Algorithm
In this section, we present our MB*-tree algorithm for multilevel large-scale building module floorplanning/placement. As mentioned earlier, the algorithm adopts a two-stage approach, clustering followed by declustering, by using the B*-tree representation.
The clustering operation results in two types of modules, namely primitive modules and cluster modules. A primitive module is a module given as an input (i.e., $m \in M$), while a cluster module is created by grouping two or more primitive modules. Each cluster module is created by a clustering scheme $\{m_i, m_j\}$, where $m_i$ ($m_j$) denotes a primitive or a cluster module.
In the following subsections, we detail the clustering and declustering algorithms for hard modules.
Clustering
In this stage, we iteratively group a set of (primitive or cluster) modules until a single cluster is formed (or until the number of cluster modules is smaller than a threshold) based on a cost metric of area and connectivity. We shall first consider the clustering metric.
The clustering metric is defined by two criteria: area utilization (dead space) and the connectivity density among modules.
Dead space: The area utilization for clustering two modules $m_i$ and $m_j$ can be measured by the resulting dead space $s_{ij}$, representing the unused area after clustering $m_i$ and $m_j$. Let $s_{tot}$ denote the dead space in the final floorplan $P$. We have $s_{tot} = A_{tot} - \sum_{m_i \in M} A_i$, where $A_i$ denotes the area of module $m_i$ and $A_{tot}$ the area of the final enclosing rectangle of $P$. Since $\sum_{m_i \in M} A_i$ is a constant, minimizing $A_{tot}$ is equivalent to minimizing the dead space $s_{tot}$.
Connectivity density: The connectivity density between two (primitive or cluster) modules $m_i$ and $m_j$ is computed from their connectivity and from $n_i$ and $n_j$, where $n_i$ ($n_j$) denotes the number of primitive modules in $m_i$ ($m_j$). Often a bigger cluster implies a larger number of connections. The connectivity density therefore considers not only the connectivity between two modules but also the sizes of the clusters, to avoid possible biases.
Obviously, the cost function of dead space is for area optimization while that of connectivity density is for timing and wiring area optimization. Therefore, the metric for clustering two (primitive or cluster) modules $m_i$ and $m_j$, $\phi : \{m_i, m_j\} \to \mathbb{R}^+ \cup \{0\}$, combines the dead-space cost and the connectivity density.
Two modules $m_i$ and $m_j$ in a cluster are either horizontally or vertically related. If $m_i$ is horizontally (vertically) related to $m_j$, then $n_j$ is the left (right) child of $n_i$ in its corresponding B*-tree. The relation for each pair of modules in a cluster is established and recorded in the corresponding B*-subtree during clustering. It will be used for determining how to expand a node into a corresponding B*-subtree during declustering.
Declustering
We shall first introduce the metric used in simulated annealing for refining floorplan/placement solutions. The declustering metric is defined by two criteria: area utilization (dead space) and the wirelength among modules.
Wirelength: The wirelength for clustering two modules $m_i$ and $m_j$, $w_{ij}$, is measured by the total wirelength interconnecting the two modules. The total wirelength in the final floorplan $P$, $w_{tot}$, is the summation of the length of the wires interconnecting all modules.
Obviously, the cost function of dead space is for area optimization while that of wirelength is for timing and wiring area optimization. Therefore, the metric for refining a floorplan solution during declustering, $\psi : \{m_i, m_j\} \to \mathbb{R}^+ \cup \{0\}$, is then given by
$\psi(\{m_i, m_j\}) = \gamma \hat{s}_{ij} + \delta \hat{w}_{ij}$,
where $\hat{s}_{ij}$ and $\hat{w}_{ij}$ are the respective normalized costs for $s_{ij}$ and $w_{ij}$, and $\gamma$ and $\delta$ are user-specified parameters.
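Evaluating the dead space of a candidate cluster requires packing its B*-subtree. The following is a minimal sketch, assuming the usual B*-tree convention (left child placed to the right of and adjacent to its parent, right child placed above its parent at the same x-coordinate) and a simple quadratic-time search for the y-coordinate rather than a contour structure. The names `Node`, `pack`, `dead_space`, and `declustering_cost`, as well as the normalization constants, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    w: float
    h: float
    left: Optional["Node"] = None   # module to the right of, and adjacent to, this one
    right: Optional["Node"] = None  # module above this one, with the same x
    x: float = 0.0
    y: float = 0.0


def pack(root):
    """Assign coordinates by a DFS traversal (left subtree first)."""
    placed = []

    def lowest_free_y(x, w):
        # Highest top edge among placed modules whose x-span overlaps [x, x + w).
        tops = [p.y + p.h for p in placed if p.x < x + w and x < p.x + p.w]
        return max(tops, default=0.0)

    def visit(node, x):
        node.x = x
        node.y = lowest_free_y(x, node.w)
        placed.append(node)
        if node.left is not None:
            visit(node.left, x + node.w)   # x_j = x_i + w_i
        if node.right is not None:
            visit(node.right, x)           # x_j = x_i

    visit(root, 0.0)
    return placed


def dead_space(placed):
    # Enclosing-rectangle area minus the total module area.
    width = max(p.x + p.w for p in placed)
    height = max(p.y + p.h for p in placed)
    return width * height - sum(p.w * p.h for p in placed)


def declustering_cost(s, w, s_norm, w_norm, gamma, delta):
    # Weighted sum of normalized dead space and wirelength; how the paper
    # normalizes the two costs is not stated here, so s_norm and w_norm
    # are assumed to be supplied by the caller.
    return gamma * (s / s_norm) + delta * (w / w_norm)
```

In this sketch, a root of size 2 x 1 with a 1 x 3 left child and a 2 x 2 right child packs into a 3 x 3 rectangle with zero dead space.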
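Algorithm: Declustering($m_k$, $m_i$, $m_j$)
Input: $m_k$: the cluster module; $m_i$, $m_j$: two modules with $m_i$ to the left of or below $m_j$;
1   parent($n_i$) ← parent($n_k$);
2-5 if $n_k$ is a left (right) child, make $n_i$ the left (right) child of parent($n_k$);
6   if $m_i$ is horizontally related to $m_j$
7       left($n_i$) ← $n_j$; parent($n_j$) ← $n_i$; right($n_j$) ← NIL;
8       right($n_i$) ← right($n_k$);
9       if right($n_k$) ≠ NIL
10          parent(right($n_k$)) ← $n_i$;
11      left($n_j$) ← left($n_k$);
12      if left($n_k$) ≠ NIL
13          parent(left($n_k$)) ← $n_j$;
The remaining lines of the algorithm handle the vertically related case with the corresponding link updates (for example, parent(left($n_k$)) ← $n_i$).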
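The node expansion performed by Algorithm Declustering can be sketched as follows. The horizontal branch follows the link updates legible in the pseudocode; the vertical branch is an assumption obtained by symmetry, since that part of the listing is not recoverable here. The class `TreeNode` and the function `decluster` are hypothetical names.

```python
class TreeNode:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.left = None
        self.right = None


def decluster(nk, ni, nj, horizontal):
    """Replace cluster node nk by ni, with nj attached as a child of ni."""
    # Line 1: ni takes over the parent of nk.
    ni.parent = nk.parent
    # Lines 2-5: ni becomes a left (right) child if nk was one.
    if nk.parent is not None:
        if nk.parent.left is nk:
            nk.parent.left = ni
        else:
            nk.parent.right = ni
    if horizontal:
        # Line 7: nj is the left child of ni.
        ni.left, nj.parent, nj.right = nj, ni, None
        # Lines 8-10: the right child of nk moves to ni.
        ni.right = nk.right
        if nk.right is not None:
            nk.right.parent = ni
        # Lines 11-13: the left child of nk moves to nj.
        nj.left = nk.left
        if nk.left is not None:
            nk.left.parent = nj
    else:
        # Assumed symmetric handling of the vertically related case.
        ni.right, nj.parent, nj.left = nj, ni, None
        ni.left = nk.left
        if nk.left is not None:
            nk.left.parent = ni
        nj.right = nk.right
        if nk.right is not None:
            nk.right.parent = nj
    return ni
```

Each call touches a constant number of links, so expanding one cluster node takes constant time in this sketch.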
Figure 3 shows the algorithm for declustering a cluster module $m_k$ into two modules $m_i$ and $m_j$. Without loss of generality, we make $m_i$ to the left of or below $m_j$. In Algorithm Declustering (see Figure 3), parent($n_i$), right($n_i$), and left($n_i$) denote the parent, the right child, and the left child of node $n_i$ in a B*-tree, respectively. Line 1 updates the parent of $n_i$ to be that of $n_k$. Lines 2-5 make $n_i$ a left (right) child if $n_k$ is a left (right) child. Lines 6-13 deal with the case where $m_i$ is horizontally related to $m_j$. If $m_i \to m_j$, then $n_j$ is the left child of $n_i$, and thus we update the corresponding links in Line 7. Lines 8-10 (11-13) update the links so that the right (left) child of $n_k$ becomes the right (left) child of $n_i$ ($n_j$). The remaining lines build the corresponding B*-subtree for the case where $m_i$ is vertically related to $m_j$.
Simulated Annealing
We propose a simulated annealing based algorithm, B*-tree-SA, to refine the solution at each level. We apply three operations to perturb a B*-tree. Op1 exchanges the width and height of a module. Op2 deletes a node of a B*-tree and inserts it into another position. Op3 deletes two nodes and inserts them into the corresponding positions in the B*-tree. Obviously, Op2 and Op3 need to perform the deletion and insertion operations on a B*-tree, which take $O(h)$ time, where $h$ is the height of the B*-tree.
The simulated annealing algorithm starts with a B*-tree produced during declustering. It then perturbs a B*-tree (a feasible solution) into another B*-tree by using these operations.
The Overall MB*-tree Algorithm
The MB*-tree algorithm integrates the aforementioned three algorithms. We first perform clustering to reduce the problem size level by level and then enter the declustering stage. In the declustering stage, we perform floorplanning for the modules at each level using the simulated annealing based algorithm B*-tree-SA. Figure 4 illustrates an execution of the MB*-tree algorithm. For explanation, we cluster three modules each time in Figure 4. Figure 4(b) shows the resulting configuration after clustering $m_5$, $m_6$, and $m_7$ into a new cluster module $m_8$ (i.e., the clustering scheme of $m_8$ is $\{\{m_5, m_6\}, m_7\}$). Similarly, we cluster $m_1$, $m_2$, and $m_4$ into $m_9$ by using the clustering scheme $\{\{m_2, m_4\}, m_1\}$. Finally, we cluster $m_3$, $m_8$, and $m_9$ into $m_{10}$ by using a clustering scheme over these three modules. The clustering stage is thus done, and the declustering stage begins, in which simulated annealing is applied to do the floorplanning. In Figure 4(e), we first decluster $m_{10}$ into $m_3$, $m_8$, and $m_9$ (i.e., expand the node $n_{10}$ into the B*-subtree illustrated in Figure 4(e)). We then move $m_8$ to the top of $m_9$ (perform Op2 for $m_8$) during simulated annealing (see Figure 4(f)). As shown in Figure 4(g), we further decluster $m_9$ into $m_1$, $m_2$, and $m_4$, and then rotate $m_2$ and move $m_8$ on top of $m_2$ (perform Op1 on $m_2$ and Op2 on $m_8$), resulting in the configuration shown in Figure 4(h). Finally, we decluster $m_8$, shown in Figure 4(i), into $m_5$, $m_6$, and $m_7$, and move $m_4$ to the right of $m_3$ (perform Op2 for $m_4$), which results in the optimum placement shown in Figure 4(j).
Handling Soft Modules
In this section, we present our approach for handling soft modules. We first apply Lagrangian relaxation [23] to cluster soft modules at the clustering stage while keeping declustering the same as before. We then propose a network-flow based algorithm for projecting Lagrange multipliers to satisfy their optimality conditions.
Formulation
Let $M = \{m_1, m_2, \ldots, m_n\}$ be a set of $n$ primitive soft modules. Each primitive soft module $m_i \in M$ is associated with a 3-tuple $(h_i, w_i, a_i)$, where $h_i$, $w_i$, and $a_i$ denote the height, width, and aspect ratio of $m_i$, respectively. The area $A_i$ of $m_i$ is given by $h_i w_i$, and the aspect ratio $a_i$ of $m_i$ is given by $h_i / w_i \in [r_{i,\min}, r_{i,\max}]$. Let $L_i = \sqrt{A_i / r_{i,\max}}$ and $U_i = \sqrt{A_i / r_{i,\min}}$ denote the minimum and maximum widths of $m_i$, so that $L_i \le w_i \le U_i$.
The relations among the modules are recorded in a horizontal constraint graph $G_{hc}$ and a vertical constraint graph $G_{vc}$. For each $m_p$ which is placed at the left boundary (bottom boundary), create an edge $e(v_s, v_p)$ from $v_s$ to $v_p$ in $G_{hc}$ ($G_{vc}$). For each $m_p$ which is placed at the right boundary (top boundary), create an edge $e(v_p, v_t)$ from $v_p$ to $v_t$ in $G_{hc}$ ($G_{vc}$). For each $m_p$ that is horizontally (vertically) related to $m_q$, create an edge $e(p, q)$ from $v_p$ to $v_q$ in $G_{hc}$ ($G_{vc}$). If $x_p + w_p \le x_q$, $\forall e(p, q) \in G_{hc}$, and $y_p + h_p \le y_q$, $\forall e(p, q) \in G_{vc}$, are satisfied, the relations of the modules in $M_i$ will not be violated.
Let $M_i$ denote the set of $n_i$ modules at level $i$. For each $m_j \in M_i$, $(x_j, y_j)$ denotes the coordinate of its bottom-left corner. For convenience, we additionally create two variables, $x_{n_i+1}$ and $y_{n_i+1}$, which denote the estimated width and height of the chip at level $i$, respectively. Thus, the estimated area of the chip at level $i$ equals $x_{n_i+1} y_{n_i+1}$. To estimate wirelength, we adopt the quadratic length of the complete graph of the pins in a net, and take the center of a module as the location of a pin if the pins are not assigned during floorplanning. Let $E_i$ denote the set of nets at level $i$. A net $e_k \in E_i$ can be represented as the set of modules $\{m_j \mid e_k \text{ has a pin connecting to } m_j\}$. Thus, the estimated wirelength $Q_k$ of a net $e_k \in E_i$ is defined from the squared center-to-center distances between the pairs of modules in $e_k$.
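Two ingredients of the soft-module formulation lend themselves to a short illustration: the width range implied by the aspect-ratio bounds, and a quadratic wirelength estimate over the complete graph of module centers in a net. The sketch below is an assumption-laden reading of the text; in particular, any weighting or normalization the paper applies to the quadratic estimate is not reproduced, and the names `width_bounds` and `quadratic_wirelength` are hypothetical.

```python
import math
from itertools import combinations


def width_bounds(area, r_min, r_max):
    # With aspect ratio r = h / w and area = h * w, we get w = sqrt(area / r),
    # so the width is smallest at r_max and largest at r_min.
    return math.sqrt(area / r_max), math.sqrt(area / r_min)


def quadratic_wirelength(net, centers):
    # Sum of squared Euclidean distances over all pairs of modules in the
    # net (the complete graph on the net's pins), with pins at module centers.
    total = 0.0
    for a, b in combinations(net, 2):
        (xa, ya), (xb, yb) = centers[a], centers[b]
        total += (xa - xb) ** 2 + (ya - yb) ** 2
    return total
```

A module of area 4 with aspect ratio in [0.25, 4] can take any width between 1 and 4 under this reading.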
We use a cost function based on the estimated chip area and wirelength to guide the clustering of soft modules; minimizing it subject to the constraints above gives the problem CS.
Lagrangian Relaxation
Since CS can be transformed into a convex problem, we can apply a theorem of [3]. This implies that if $\vec{\lambda}$ is an optimal solution to LDP, the optimal solution of LRS/($\vec{\lambda}$) will also optimize CS.
Consider the Lagrangian $\zeta$ of CS, which augments the objective of CS with its constraints weighted by the multipliers $\vec{\lambda}$.
The Kuhn-Tucker conditions imply that the optimal solution of CS must satisfy $\partial\zeta/\partial x_p = 0$, $\partial\zeta/\partial y_p = 0$, $\partial\zeta/\partial x_{n_i+1} = 0$, and $\partial\zeta/\partial y_{n_i+1} = 0$. Thus, we only need to consider the multipliers $\vec{\lambda}$ which satisfy these conditions, stated as Equations (6) and (7) for $1 \le p \le n$. In particular, these conditions express $y_{n_i+1}$ and $x_{n_i+1}$ in terms of the multipliers.
Solving LRS/($\vec{\lambda}$) and LDP
Let $\Omega$ denote the set of multipliers $\vec{\lambda}$ satisfying Equations (6) and (7). We now consider solving the Lagrangian relaxation subproblem LRS/($\vec{\lambda}$) for a given $\vec{\lambda} \in \Omega$, i.e., computing the dimension and coordinate of each module.
First, we partially differentiate $\zeta$ with respect to $w_p$ to get an optimal value of $w_p$ such that $\zeta$ is minimized. The resulting expression involves the outgoing edges of each vertex, where $out_G(v) = \{u \mid e(v, u) \in E(G)\}$. Recall that $L_p \le w_p \le U_p$, $1 \le p \le n$. Thus, the optimal $w_p^*$ is obtained by restricting the unconstrained minimizer to the interval $[L_p, U_p]$.
Since the dimension of each primitive module ($w_p$ and $h_p$) has been determined, the dimension of each cluster module ($w_j$ and $h_j$) can be computed by applying a longest path algorithm in $G_{hj}$ and $G_{vj}$. Then, we consider partial differentiation of $\zeta$ with respect to $x_j$ and $y_j$, giving the optimality conditions of CS for $1 \le j \le n_i$ (Equation (8)), where $|e_k|$ denotes the number of the pins of $e_k$.
In Equation (8), there are $n_i$ equations with $n_i$ variables. Thus, we can apply Gaussian elimination to solve these $n_i$ equations to get the optimal value of $x_j$. In these $n_i$ equations, all coefficients of the variables depend only on the net information (i.e., $|e_k|$). Since the net information is the same throughout the entire process, each variable can be solved by the same process. Hence, we can record the process of solving each variable during the first iteration (which takes cubic time), and then each subsequent computation will take only quadratic time by applying the same process. Similarly, we can compute the optimal value of $y_j$.
Next, we use a subgradient optimization method to search for the optimal $\vec{\lambda}$. Let $\vec{\lambda}$ be a multiplier at step $k$. We move $\vec{\lambda}$ to a new multiplier $\vec{\lambda}'$ based on the subgradient direction, with each updated component passed through $[\cdot]^+$, where $[x]^+ = \max\{x, 0\}$ and $\rho_k$ is a step size such that $\lim_{k \to \infty} \rho_k = 0$ and $\sum_{k=1}^{\infty} \rho_k = \infty$.
After updating $\vec{\lambda}$, we need to project $\vec{\lambda}$ to $\vec{\lambda}^* \in \Omega$, and then solve the Lagrangian relaxation subproblem LRS/($\vec{\lambda}^*$) by the above algorithm until the solution converges.
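The multiplier update described above follows the general pattern of a projected subgradient step. The sketch below shows only that generic pattern, assuming the step size $\rho_k = 1/k$ (which tends to zero and has a divergent sum) and using the violation of a constraint $x_p + w_p \le x_q$ as the subgradient component. It does not include the projection onto the set of multipliers satisfying the optimality conditions, and the function names are hypothetical.

```python
def constraint_violation(x_p, w_p, x_q):
    # Positive when the separation constraint x_p + w_p <= x_q is violated.
    return x_p + w_p - x_q


def subgradient_step(multipliers, subgradient, k):
    # One update at step k >= 1 with the assumed step size rho_k = 1 / k;
    # [x]^+ = max(x, 0) keeps every multiplier nonnegative.
    rho = 1.0 / k
    return [max(lam + rho * g, 0.0) for lam, g in zip(multipliers, subgradient)]
```

A multiplier grows when its constraint is violated and shrinks toward zero when the constraint is slack, which is the behavior the $[\cdot]^+$ operator enforces.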
Projecting Lagrange Multipliers
We present a network-flow based algorithm to check whether $\vec{\lambda}$ belongs to $\Omega$ and to project $\vec{\lambda}$ to $\vec{\lambda}^* \in \Omega$ if $\vec{\lambda} \notin \Omega$. Further, an incremental update technique is employed to make the maximum flow computation more efficient.
For each cluster module $m$, we first create two networks, $N_{hc}$ (for $G_{hc}$) and $N_{vc}$ (for $G_{vc}$).
Theorem 3: $\vec{\lambda}^* \in \Omega$.
The projection process greatly affects the efficiency of the entire optimization, since there may be $O(n^2)$ edges in the worst case. Thus, we employ an incremental flow update technique to speed up the max-flow computation after updating $\vec{\lambda}$ and its corresponding capacity. Figure 6 gives an algorithm for the incremental network update. Lines 1-2 check whether each edge $e(p', q')$ violates the capacity constraint (i.e., $0 \le flow(p', q') \le cap(p', q')$). Lines 3-9 fix the overflow on $e(p', q')$ if the edge violates its capacity constraint. Finally, Line 10 computes a maximum flow again.
Note that, for efficiency considerations, we may perform Lagrangian relaxation only at the higher levels of the multilevel framework (when the number of modules becomes small enough for Lagrangian relaxation). To do so, however, we still need to pass the aspect ratio information for each soft module level by level.
Experimental Results
We implemented the MB*-tree algorithm for hard modules in the C++ language on a 450 MHz SUN Ultra 60 workstation with 2 GB memory. The package is available at http://cc.ee.ntu.edu.tw/wywchang/research.html.
Columns 1, 2, 3, and 4 of Table 1 list the names of the benchmark circuits, the number of modules, the number of nets, and the total area of modules in the circuits, respectively. ami49 is the largest MCNC benchmark circuit used in the previous works [5, 9] for comparative study. To test the capability of existing methods, we created ten synthetic circuits, named ex-ami49-x, by duplicating the modules and nets of ami49 x times. The largest circuit, ex-ami49-200, contains 9,800 modules and 81,600 nets. Table 1 shows the experimental results. As shown in the table, our MB*-tree algorithm obtained a dead space of only 2.78% for ami49 in only 0.4 min runtime, while B*-tree-v1.0 reported a dead space of 3.53% using 0.25 min runtime. Further, the experimental results for larger circuits show that the MB*-tree scales very well as the circuit size increases, while the previous works, sequence pair, O-tree, and B*-tree, do not.
For circuit sizes ranging from 49 to 9,800 modules and from 408 to 81,600 nets, the MB*-tree consistently obtains high-quality floorplans with dead spaces of less than 3.72% in empirically linear runtime, while sequence pair, O-tree, and B*-tree can handle only up to 196, 98, and 1,960 modules in the same amount of time and result in dead spaces as large as 13.00% (@ 196 modules), 12.29% (@ 98 modules), and 27.33% (@ 1,960 modules), respectively. As shown in Table 1, the resulting dead spaces for the MB*-tree are almost independent of the circuit sizes, which demonstrates the high scalability of the MB*-tree. In contrast, the dead spaces for the non-hierarchical previous works all grow dramatically as the circuit size increases. In particular, the empirical runtime of the MB*-tree approaches linear in the circuit size, while the other previous works cannot handle large-scale designs. Figure 7 shows the layout for the largest circuit, ex-ami49-200, obtained by MB*-tree in 256 min CPU time. It has a dead space of only 3.44%. Note that this circuit cannot be handled by the previous works [5, 9, 18].
Table 2 shows the comparisons for area optimization alone ($\gamma = 1.0$, $\delta = 0.0$), wirelength optimization alone ($\gamma = 0.0$, $\delta = 1.0$), and simultaneous area and wirelength optimization ($\gamma = 0.5$, $\delta = 0.5$) among sequence pair (SP), B*-tree, and MB*-tree based on the circuit industry (whose total area is 658.04 mm²). The circuit industry is a 0.18 µm, 1 GHz industrial design with 189 modules, 20 million gates, and 9,777 center-to-center interconnections. It is a large chip design and consists of three "tough" modules with aspect ratios greater than 19 (and as large as 36). (Note that we do not have the results for O-tree for this experiment because the data of industry cannot be fed into the O-tree package.) In each entry of the table, we list the best/average values obtained in ten runs of simulated annealing, using a random seed for each run.
For the column "Time," we report the runtime for obtaining the best value and the average runtime of the ten runs. As shown in the table, our MB*-tree algorithm obtained significantly better silicon area and wirelength than sequence pair and B*-tree in all tests. For area optimization, MB*-tree can obtain a dead space of only 2.11%, while sequence pair (B*-tree) results in a dead space of at least 28.1% (12.9%). For wirelength optimization, MB*-tree can obtain a total wirelength of only 56631 mm, while sequence pair (B*-tree) requires a total wirelength of at least 81344 mm (113216 mm). For simultaneous area and wirelength optimization, MB*-tree also obtains the best area and wirelength. The results show the effectiveness of our MB*-tree algorithm. As for the runtimes, MB*-tree takes longer than B*-tree and SP for wirelength optimization. (For area optimization, MB*-tree runs faster than SP.) This is reasonable because it took much longer to obtain significantly better results and the multilevel process incurred some overhead. Nevertheless, as shown in Table 1, both SP and B*-tree do not scale well to instances with a large number of modules (and thus their runtimes increase dramatically when the number of modules grows into the hundreds). The resulting layout of industry for simultaneous area and wirelength optimization using MB*-tree is shown in the accompanying figure.
Table 2: Comparisons for area optimization alone ($\gamma = 1.0$, $\delta = 0.0$), wirelength optimization alone ($\gamma = 0.0$, $\delta = 1.0$), and simultaneous area and wirelength optimization ($\gamma = 0.5$, $\delta = 0.5$) among sequence pair (SP), B*-tree, and MB*-tree based on the circuit industry. In each entry, both the best/average values obtained in ten runs of simulated annealing are reported. The last two rows give the ratios of the results (SP to MB*-tree and B*-tree to MB*-tree).
Concluding Remarks
We have presented the MB*-tree based multilevel framework to handle the floorplanning and packing for large-scale modules. Experimental results have shown that the MB*-tree scales very well as the circuit size increases. The capability of the MB*-tree shows its promise in handling large-scale designs with complex constraints. In the future, we propose to explore the floorplanning/placement problem with large-scale rectilinear and mixed-sized modules/cells as well as buffer-block planning for interconnect-driven floorplanning.
