Abstract: In this paper, we study the large-scale mixed-size placement problem, where there is a significant size variation between big and small placeable objects (the ratio can be as large as 10,000). We develop a multi-level optimization algorithm, MPG-MS, for this problem, which can efficiently handle both large-scale designs and large size variations. Compared with the recently published work [1] on large-scale mixed macro and standard cell placement benchmarks for wirelength minimization, our method achieves a 13% wirelength reduction on average with comparable runtime.
I. INTRODUCTION
Circuit placement is an important step in VLSI physical design, as it defines the interconnects, which determine the overall performance of the final layout in deep submicron designs. This problem has been studied extensively over the past several decades. Historically, due to its complexity, the placement problem has been divided into two separate problems. One is the gate/cell placement problem, where designs consist of a large number of small placeable objects of similar sizes, as in standard cell designs. The other is the module/block placement (floorplanning) problem, which involves a small number of large blocks (typically around a hundred), possibly with flexible aspect ratios. For the gate/cell placement problem, the optimization objectives include wirelength, delay, and routability. Heuristic methods, such as min-cut-based algorithms and quadratic placement algorithms, were developed for cell placement with an emphasis on handling its high complexity. For the floorplanning problem, handling non-overlapping constraints and soft modules has been the major challenge. Several abstract floorplan representations, such as sequence-pair [2] and TCG [3], were developed to represent overlap-free legal solutions, including an optimal one. Search algorithms such as simulated annealing can then be used to find the optimal solution based on these representations.
With the advance of IC technology, especially the reuse of IP blocks in multi-million-gate ASIC and SOC designs, placement instances now contain objects of widely varying sizes. Figure 1 shows the area distribution of the placeable objects of circuit ibm18¹ released in [1]. It shows that the number of placeable objects is very large (easily over 100,000) and that the number of big objects² can be as large as several hundred. Moreover, the size ratio between big and small objects can be as large as 10,000.
Traditional standard-cell placement techniques usually assume that the placeable objects have identical or similar sizes, while floorplanning techniques usually cannot handle large-scale placement problems. Therefore, neither is capable of solving the large-scale mixed-size placement problem alone. A common approach to this problem is to use a hierarchical design flow [5, 6, 7], where the standard cells are first partitioned into blocks using either the logical hierarchy or min-cut-based partitioning algorithms. Floorplanning is then performed on the partitioned blocks, together with the macros, for area and wirelength minimization. Finally, the cells in each block are placed separately. Although this method can reduce the problem size to the point where floorplanning techniques can be applied, the quality of the final placement may not be good. As pointed out in [8], pre-partitioning standard cells to form rectangular blocks may prevent such a hierarchical method from finding an optimal or near-optimal solution in terms of wirelength and delay minimization.

¹ Circuit ibm18 was derived from the ISPD-98 (IBM) circuit benchmarks [4] for mixed macro and standard cell placement.
² We categorize placeable objects into big and small objects based on the assumption that big and small objects differ in size by a factor greater than 20 or 30.
Therefore, a new methodology was proposed in [8], which first flattens the logical hierarchy to the extent that the circuit elements in each module of the flattened hierarchy are certain to stay physically together. A physical hierarchy is then generated, which defines the global, semi-global, and local interconnects (based on their levels in the physical hierarchy). The physical hierarchy generation process determines the rough locations of the placeable objects in the flattened hierarchy, which can include standard cells, small functional blocks (such as 32-bit adders and shifters), hard IP blocks, and soft IP blocks. The core of physical hierarchy generation is solving the large-scale mixed-size placement problem for the flattened design.
In this paper, we propose a multi-level placement algorithm, called MPG-MS, for large-scale mixed-size IC designs. The inputs include a set of placeable objects, the netlist, I/O pads, the target chip width and height, and delay information for each placeable object. Pin locations are provided for objects with fixed dimensions.³ For soft objects with flexible aspect ratios under a fixed area, pin locations are assumed to be at the centers of the objects.
The outputs include a valid placement of all the placeable objects, the orientations of big objects, and the aspect ratios of soft objects. The optimization objective can be wirelength minimization, delay minimization, routability optimization, or a combination of them. We adopt the multi-level optimization method to handle this problem: mixed-size placeable objects are placed simultaneously, the locations of larger objects are gradually fixed, and overlaps between larger objects are gradually removed, while the locations of smaller objects are further refined during the multi-level optimization process. In this paper, we focus on wirelength minimization; however, we believe that the proposed method can be applied to other objectives as well.
The remainder of this paper is organized as follows. Section II reviews previous work. Section III describes our placement algorithm, MPG-MS. Experimental results are presented in Section IV, followed by conclusions and future work in Section V.
II. PREVIOUS WORK
Early approaches to the mixed-size placement problem used iterative improvement methods, i.e., simulated annealing-based placement techniques [9, 10], to place big and small objects simultaneously. Although they give good results for small to medium-size designs, such methods have difficulty scaling to very large designs due to their high complexity.
In [11], a quadratic placer was extended and combined with a two-level clustering scheme to handle the mixed macro and standard cell placement problem. However, the test cases in [11] were not large enough to demonstrate the effectiveness of this method on large-scale designs.
³ Pin assignment can be performed during the placement phase; in this paper, we do not consider pin assignment during placement.
Recently, a placement-floorplanning-placement flow [1] was presented to place designs with macros and a large number of small standard cells. This flow is similar to the hierarchical design flow, as both use floorplanning techniques to generate an overlap-free floorplan followed by standard cell placement. Rather than using pure partitioning algorithms to generate blocks for standard cells, this flow uses an initial placement result to guide block generation. As pointed out in Section I, such a hierarchical approach may lead to sub-optimal solutions, which is confirmed by our experimental results.
III. MULTI-LEVEL PLACEMENT FOR LARGE-SCALE MIXED-SIZE DESIGNS: MPG-MS
The major challenge in placing big and small objects together is handling the interaction between them. Without a good initial placement for big objects, the final placement may not be good, as the placement of small objects largely depends on the locations of the big objects. On the other hand, placing big objects without considering small objects will not yield a favorable result, as the interconnections between small objects cannot be ignored and play an important role in determining the quality of the final layout. Therefore, we place them simultaneously. However, moving a big object can greatly affect wirelength, delay, and other objectives, and overlap between big objects is harder to remove than overlap between small objects.
We believe this problem can be solved effectively using the multi-level optimization method, which has been successfully applied to several VLSI CAD areas, such as partitioning, cell placement, and routing. The multi-level optimization method is very good at efficiently handling high-complexity design problems. It consists of a coarsening phase and a refinement phase. In the multi-level optimization approach to placement, placeable objects are clustered in the coarsening phase and gradually declustered and refined in the refinement phase by performing placement. The coarsening phase reduces not only the problem size but also the size variation between placeable objects at each level, so that placement techniques can be applied more efficiently in the refinement phase. At each level of the refinement phase, the placer sees a different level of abstraction of the flat design. This abstraction provides enough detail about the small objects for the placer to place big and small objects simultaneously. Once a good initial placement for the big objects is obtained, we fix their locations so that the placement of the small objects can be further optimized. When we fix the locations of the big objects, we generate an overlap-free placement for them based on the initial placement, keeping their locations as close to the initial placement as possible in order to obtain a consistent placement solution. We call this process big objects legalization. Figure 2 illustrates the proposed multi-level mixed-size placement flow.
Our algorithm, MPG-MS, follows the simulated annealing-based multi-level optimization framework MPG proposed in [12].
A. Review of MPG
In MPG, coarsening is performed by recursively clustering the placeable objects using the First-Choice (FC) clustering algorithm [13] to build a hierarchy of netlist and placement instances from level $L_0$ to $L_1, \ldots, L_k$. Level $L_0$ represents the input netlist and placement instance, and level $L_k$ represents the coarsest level, where the number of clusters is no less than a user-specified number, say 500. Refinement is performed by placing the clusters at each level into a bin structure using SA techniques. Two key techniques enable MPG to handle large-scale designs: a hierarchical area density control mechanism and the simulated annealing technique for multi-level optimization. Due to page limitations, we do not explain them in detail here.⁴ In general, other placement techniques can also be used at each level of the refinement phase.
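As a rough illustration of one FC-style coarsening pass, the sketch below lets each unclustered object join the neighbor it is most strongly connected to. The per-net weight 1/(d-1) for a net with d pins, the area-normalized rating, and the `max_cluster_area` cap are assumptions of this sketch (common choices in FC-style clustering), not details taken from MPG or [13]. FC typically visits objects in random order, whereas this sketch uses index order so that the result is deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def first_choice_pass(areas: Sequence[float], nets: Sequence[Sequence[int]],
                      max_cluster_area: float) -> List[int]:
    """One FC-style coarsening pass; returns a cluster id for every object."""
    n = len(areas)
    # Pairwise connectivity: a net with d pins adds 1/(d-1) to each pin pair.
    conn: List[Dict[int, float]] = [defaultdict(float) for _ in range(n)]
    for net in nets:
        d = len(net)
        if d < 2:
            continue
        w = 1.0 / (d - 1)
        for a in net:
            for b in net:
                if a != b:
                    conn[a][b] += w

    cluster_of = [-1] * n          # -1 means "not yet clustered"
    cluster_area: List[float] = []
    for v in range(n):             # FC usually visits objects in random order
        if cluster_of[v] != -1:
            continue
        best_u, best_score = None, 0.0
        for u, c in conn[v].items():
            # A neighbor may already sit in a cluster; v would then join it.
            target = cluster_area[cluster_of[u]] if cluster_of[u] != -1 else areas[u]
            if areas[v] + target > max_cluster_area:
                continue           # keep cluster sizes bounded
            score = c / (areas[v] + target)  # area-normalized rating (assumption)
            if score > best_score:
                best_u, best_score = u, score
        if best_u is None:         # no admissible neighbor: singleton cluster
            cluster_of[v] = len(cluster_area)
            cluster_area.append(areas[v])
        elif cluster_of[best_u] != -1:
            cid = cluster_of[best_u]
            cluster_of[v] = cid
            cluster_area[cid] += areas[v]
        else:                      # open a new cluster holding v and best_u
            cid = len(cluster_area)
            cluster_of[v] = cluster_of[best_u] = cid
            cluster_area.append(areas[v] + areas[best_u])
    return cluster_of
```

For example, `first_choice_pass([1, 1, 1, 1], [[0, 1], [2, 3], [1, 2]], 10)` groups objects 0 and 1 and objects 2 and 3. Applying such passes repeatedly, each time on the clustered netlist, is one way to build the level hierarchy described above.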
B. Our Flow: MPG-MS
We first place the big and small objects into a bin structure⁵ using the multi-level optimization method. After a coarse placement result is generated (which is also an overlap-free placement for the big objects), detailed placement is performed to remove the overlap between small objects. The following sections mainly focus on how to handle big objects in the SA-based multi-level placement framework.

⁴ Please refer to [12] for how these two techniques efficiently place a large number of objects with different sizes.
⁵ The bin structure can be specified by users or set automatically according to the design size. In general, the placement bin structure should be fine enough that the wirelength estimated in the coarse placement stage is close to the wirelength estimated in the detailed placement stage.
B.1 Coarsening Phase
Big objects are not clustered at the beginning of the coarsening phase but are gradually clustered at coarser levels, because we need to fix the locations of the big objects before we reach level $L_0$ in the refinement phase. To do that, we first classify the big objects into several groups according to their sizes; big objects of similar sizes become eligible for clustering from the same level. In this manner, big objects are gradually clustered with other clusters or with small objects. Not allowing big objects to be clustered at all would degrade the quality of the clustering result, as measured by connectivity, and thus reduce the efficiency of refinement.
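The size-grouping idea above can be sketched as a rule that assigns each object the first coarsening level at which it may be merged. The grouping by order of magnitude, the `small_max_area` threshold, and the direction of the mapping (larger groups become clusterable at coarser levels, so they stay unclustered longer and re-emerge earlier during refinement) are all hypothetical choices for illustration; the text only states that similar-sized big objects start clustering at the same level.

```python
import math
from typing import List, Sequence


def clustering_start_levels(areas: Sequence[float], small_max_area: float) -> List[int]:
    """Hypothetical rule: first coarsening level at which each object may be clustered.

    Small objects may be clustered from level 1 (building L1 from L0). Big objects
    are grouped by the order of magnitude of area / small_max_area, and each larger
    group starts one level later.
    """
    levels = []
    for a in areas:
        if a <= small_max_area:
            levels.append(1)
        else:
            group = int(math.log10(a / small_max_area))  # 0, 1, 2, ...
            levels.append(2 + group)
    return levels


def may_cluster(start_levels: Sequence[int], obj: int, level: int) -> bool:
    """True if obj may be merged while building coarsening level `level`."""
    return level >= start_levels[obj]
```

An FC pass building level $L_\ell$ would then simply skip any object for which `may_cluster(..., ℓ)` is false.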
B.2 Refinement Phase
After we decluster the clusters at level $L_{i+1}$, big objects appear at level $L_i$ if they were part of a cluster at level $L_{i+1}$. Big objects that have been legalized are called fixed big objects; big objects that need to be legalized at the current level are called floating big objects. The hierarchical area density control mechanism, combined with SA-based moves, can efficiently place objects of different sizes. In addition to moving placeable objects from bin to bin, the SA moves include changing the orientations of hard objects and the aspect ratios of soft objects. After the SA process, we obtain a placement for the current level, which may not be overlap-free for the big objects. If the placement of the big objects is valid, i.e., overlap-free, we do nothing and move to the next level of refinement; otherwise, we need to generate an overlap-free placement for the big objects that is as close to the original placement as possible. Given an invalid placement of big objects, the problem is how to move them to obtain an overlap-free placement under the chip dimension constraints while minimizing the placement change (movement). We call this the big objects legalization problem. It is non-trivial, since rectangle packing under chip dimension constraints is already NP-complete [14], let alone with the added goal of minimizing movement. Therefore, we propose a heuristic flow to handle it.
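Deciding whether legalization is needed at all amounts to checking the big objects for pairwise overlap and for containment in the chip. The following is a minimal sketch, assuming each big object is an axis-aligned rectangle given by its lower-left corner `(x, y)` and size `(w, h)`, and treating objects that merely touch along an edge as non-overlapping. The `Rect` type and function names are illustrative only.

```python
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    x: float  # lower-left corner
    y: float
    w: float  # width
    h: float  # height


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they only touch)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def overlapping_pairs(rects: Sequence[Rect]) -> List[Tuple[int, int]]:
    """Index pairs of big objects that overlap (O(n^2); fine for hundreds of macros)."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(rects), 2)
            if overlap_area(a, b) > 0]


def is_legal(rects: Sequence[Rect], chip_w: float, chip_h: float) -> bool:
    """Legal = every object inside the chip and no pair overlapping."""
    inside = all(r.x >= 0 and r.y >= 0 and r.x + r.w <= chip_w and r.y + r.h <= chip_h
                 for r in rects)
    return inside and not overlapping_pairs(rects)
```

The quadratic pairwise scan is chosen for clarity; with only a few hundred big objects per design it is inexpensive, though a sweep-line method would scale better.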
First, given the initial placement, we check whether it is possible to obtain a valid placement for the big objects under the target chip dimension constraints. This is called feasibility checking (Section B.3), and a placement that passes it is called feasible. Depending on the result, we use one of two schemes to legalize the big objects: the complete legalization scheme if the initial placement is feasible, or the partial legalization scheme if it is not. The complete legalization scheme (Section B.4) generates an overlap-free placement from a feasible initial placement. The partial legalization scheme (Section B.5) tries to remove part of the overlap and to fix some of the big objects. Given the locations of the newly fixed big objects, we then perform SA-based placement at the current level again: to place the remaining clusters if the complete legalization scheme was used, or to place the floating big objects together with the clusters if the partial legalization scheme was used, in the hope that the SA process can remove the overlap and produce a better initial placement for the floating big objects. Fixed big objects cannot be moved in the SA process. The wirelength-driven SA process, combined with the hierarchical area density control mechanism, can push overlapping movable clusters or floating big objects away from fixed big objects while minimizing the total wirelength. After SA placement is completed, we perform legalization again if any floating big objects remain. If, after several iterations, the big objects still cannot all be legalized, we give up at this level and proceed to refinement at the next level. If it still fails at level $L_0$, we report failure. Figure 3 illustrates how big-object legalization and the SA process are integrated.
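The control flow just described can be sketched as a loop over refinement levels. All placement work is delegated to caller-supplied functions (`run_sa`, `has_floating`, `sp_feasible`, `legalize_complete`, `legalize_partial`), which are hypothetical hooks standing in for the SA placer and the legalization schemes; the iteration limit `max_iters` is likewise an assumed parameter. The returned history records "C" or "P" per legalization step at each level.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


class LegalizationFailed(RuntimeError):
    """Raised when floating big objects remain after refining level L0."""


def refine_levels(
    levels: Sequence[Hashable],
    run_sa: Callable[[Hashable], None],
    has_floating: Callable[[Hashable], bool],
    sp_feasible: Callable[[Hashable], bool],
    legalize_complete: Callable[[Hashable], None],
    legalize_partial: Callable[[Hashable], None],
    max_iters: int = 3,
) -> List[Tuple[Hashable, str]]:
    """Interleave SA placement and big-object legalization, coarsest level first."""
    history: List[Tuple[Hashable, str]] = []
    for level in levels:  # the last entry is L0
        run_sa(level)
        for _ in range(max_iters):
            if not has_floating(level):
                break  # every big object at this level is fixed
            if sp_feasible(level):
                legalize_complete(level)  # fixes all floating big objects
                history.append((level, "C"))
            else:
                legalize_partial(level)   # fixes only some big objects
                history.append((level, "P"))
            run_sa(level)  # fixed big objects are not moved by SA
        # Objects still floating here are carried over to the next (finer) level.
    if levels and has_floating(levels[-1]):
        raise LegalizationFailed("big objects could not be legalized at level L0")
    return history
```

Passing the placer steps in as functions keeps the sketch independent of any particular data structure for levels, clusters, or bins.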
B.3 Feasibility Checking
In order to preserve the relative locations between the big objects, sequence-pair (SP) representation [2] is used to capture the relationship between the locations of the big objects in the invalid placement.
First, we generate an SP from the invalid placement of the big objects by modeling each big object as a point (without dimensions) located at the center of the object. We then rotate the coordinate system clockwise by 45 degrees and sort the big objects in non-decreasing order of their coordinates in the rotated system: $\Gamma_+$ is the list ordered by x coordinates, and $\Gamma_-$ is the list ordered by y coordinates. An example is shown in Figure 4. We then build the horizontal and vertical constraint graphs $G_H$ and $G_V$ from the SP. If the longest path in $G_H$ ($G_V$) is not longer than the width (height) of the chip, $W$ ($H$), the SP is considered feasible, and the complete legalization scheme is used to generate an overlap-free placement; otherwise, it is infeasible, and the partial legalization scheme is used.⁶
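The SP construction and feasibility test can be sketched as follows. Rotating the axes 45 degrees clockwise maps a center $(x, y)$ to coordinates proportional to $x - y$ and $x + y$, so sorting by these two keys yields $\Gamma_+$ and $\Gamma_-$. The sketch assumes the usual sequence-pair semantics ($a$ before $b$ in both sequences means $a$ is left of $b$; $a$ after $b$ in $\Gamma_+$ but before $b$ in $\Gamma_-$ means $a$ is below $b$) and computes longest paths with object widths and heights as vertex weights. Ties in the sort keys are broken by object index, an arbitrary choice.

```python
from typing import Dict, List, Sequence, Tuple


def sequence_pair_from_centers(centers: Sequence[Tuple[float, float]]) -> Tuple[List[int], List[int]]:
    """Gamma+ sorted by x - y, Gamma- sorted by x + y (45-degree clockwise rotation)."""
    n = len(centers)
    gamma_plus = sorted(range(n), key=lambda i: centers[i][0] - centers[i][1])
    gamma_minus = sorted(range(n), key=lambda i: centers[i][0] + centers[i][1])
    return gamma_plus, gamma_minus


def packing_extent(gp: List[int], gm: List[int], widths: Sequence[float],
                   heights: Sequence[float]) -> Tuple[float, float]:
    """Longest-path width of G_H and height of G_V for the packing of the SP."""
    if not gp:
        return 0.0, 0.0
    pos_p = {b: k for k, b in enumerate(gp)}
    pos_m = {b: k for k, b in enumerate(gm)}
    x: Dict[int, float] = {}
    for b in gp:  # Gamma+ order is a topological order of G_H
        # a is left of b: a precedes b in both sequences
        x[b] = max((x[a] + widths[a] for a in x if pos_m[a] < pos_m[b]), default=0.0)
    y: Dict[int, float] = {}
    for b in gm:  # Gamma- order is a topological order of G_V
        # a is below b: a follows b in Gamma+ but precedes it in Gamma-
        y[b] = max((y[a] + heights[a] for a in y if pos_p[a] > pos_p[b]), default=0.0)
    width = max(x[b] + widths[b] for b in gp)
    height = max(y[b] + heights[b] for b in gm)
    return width, height


def is_feasible(gp, gm, widths, heights, chip_w: float, chip_h: float) -> bool:
    """The SP is feasible if its packing fits within the chip outline."""
    w, h = packing_extent(gp, gm, widths, heights)
    return w <= chip_w and h <= chip_h
```

For two 2×2 objects centered at (1, 5) and (1, 1), the first lies above the second, giving $\Gamma_+ = (0, 1)$, $\Gamma_- = (1, 0)$ and a packing extent of 2 × 4. This $O(n^2)$ evaluation is the simplest correct approach; faster SP evaluation algorithms exist.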
B.4 Complete Legalization Scheme
Once we have determined that an SP is feasible, we know that at least its packing solution guarantees an overlap-free placement. However, we also need to move the big objects as little as possible from their original locations, in order to minimize the extra movement caused by overlap removal and thus keep the big-object placement as consistent as possible with the initial placement of the current level. We use a heuristic method, called longest path compaction, for this problem. It adjusts the distances between non-overlapping big objects in order to push the overlapping big objects apart. First, we assign weights to the edges of $G_H$ ($G_V$) as follows: for an edge $e_{i,j}$ connecting objects (vertices) $i$ and $j$, if there is no overlap between them, the edge weight $w(e_{i,j})$ is set to the horizontal (vertical) distance between objects $i$ and $j$ in the initial placement; otherwise, it is set to zero. We then compute the longest path in the modified $G_H$ ($G_V$). If its length exceeds the chip dimension, we compact the path using a factor $k$ with $0 \le k < 1$, reducing each edge weight $w(e)$ on the path to $k \cdot w(e)$, i.e., proportionally scaling down the distances between the non-overlapping objects. After one path is compacted, the lengths of other paths may change. Therefore, we re-compute the longest path and compact it, repeating until the length of the longest path does not exceed the chip dimension.

⁶ Note that the SP derived by our algorithm may not be the best SP that leads to a feasible placement. We can swap adjacent elements in the SP to reduce the longest path while trying to maintain the relative relationships between the objects' locations. A couple of heuristic methods that try to transform an infeasible SP into a feasible one were introduced in [15], and we can use them. However, such heuristics still cannot guarantee success.
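The longest path compaction step described in this subsection can be sketched for the horizontal direction (the vertical direction is identical with heights and $G_V$). The sketch assumes that edges are the pairs $(i, j)$ with $i$ left of $j$ in the SP, that the gap weights have already been computed from the initial placement as described, and that a path's length counts object widths plus edge gaps. How $k$ is chosen is an assumption: here $k = (W - \sum \text{widths}) / \sum \text{gaps}$ on the current longest path, which lies in $[0, 1)$ whenever the path is too long and makes that path fit exactly.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


def longest_path(order: Sequence[int], widths: Sequence[float],
                 gaps: Dict[Edge, float]) -> Tuple[float, List[int], Dict[int, float]]:
    """Longest path in the DAG; returns (length, path, start coordinate of each object)."""
    preds: Dict[int, List[int]] = {v: [] for v in order}
    for (a, b) in gaps:
        preds[b].append(a)
    start: Dict[int, float] = {}
    parent: Dict[int, Optional[int]] = {}
    for b in order:  # `order` must be a topological order (e.g. Gamma+ for G_H)
        start[b], parent[b] = 0.0, None
        for a in preds[b]:
            cand = start[a] + widths[a] + gaps[(a, b)]
            if cand > start[b]:
                start[b], parent[b] = cand, a
    end = max(order, key=lambda v: start[v] + widths[v])
    path = [end]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return start[end] + widths[end], path[::-1], start


def compact(order: Sequence[int], widths: Sequence[float], gaps: Dict[Edge, float],
            chip_dim: float, eps: float = 1e-9, max_iters: int = 1000) -> Optional[Dict[int, float]]:
    """Scale down gaps on over-long paths until the longest path fits in chip_dim."""
    gaps = dict(gaps)  # do not mutate the caller's weights
    for _ in range(max_iters):
        length, path, start = longest_path(order, widths, gaps)
        if length <= chip_dim + eps:
            return start  # left coordinates respecting every edge constraint
        edges = list(zip(path, path[1:]))
        gap_sum = sum(gaps[e] for e in edges)
        width_sum = sum(widths[v] for v in path)
        if width_sum > chip_dim + eps or gap_sum <= 0:
            return None  # widths alone do not fit: the SP was not feasible
        k = (chip_dim - width_sum) / gap_sum  # 0 <= k < 1 since length > chip_dim
        for e in edges:
            gaps[e] *= k
    return None
```

With three 2-wide objects initially at x = 0, 4, 8 and a chip width of 8, the sketch first halves the two small gaps on path 0→1→2, then shrinks the direct 0→2 gap, ending with the objects at x = 0, 3, 6.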
Since each compaction decreases the positive edge weights on the compacted path, the lengths of other paths do not increase. The process therefore converges after at most m iterations, where m is the number of paths whose lengths exceed the chip dimension before longest path compaction starts. After the compaction process, we obtain an overlap-free placement for the big objects.

B.5 Partial Legalization Scheme

When the SP is infeasible, we have to partially fix the locations of some non-overlapping big objects using heuristic methods. For a fixed big object that overlaps with floating big objects, we first identify those floating objects and then push them aside to remove the overlap. For a floating big object, we identify the group of floating big objects that overlap with it and pack them to remove the overlap. We then fix the big objects that do not overlap with others and re-run the SA-based placement at the current level. Big objects that have been fixed are not moved in the SA process, so the second-round SA process tends to move the overlapping objects away from the fixed objects. We then perform legalization again on the SA placement result. If, after several iterations, the big objects still cannot all be legalized, we give up at this level and proceed to refinement at the next level. If it still fails at level $L_0$, we report failure.

IV. EXPERIMENTAL RESULTS

The characteristics of the benchmark circuits are listed in Table I, which gives the number of standard cells (#cells), the number of macros (#MAs), the number of I/O pads (#pads), the number of nets (#nets), the total macro area as a percentage of the total area of standard cells and macros (tot. $A_M$), the area of the biggest macro as a percentage of the total area of standard cells and macros, and the ratio among the areas of the biggest macro, the smallest macro, and the smallest cell.

We compare our wirelength (WL) and runtime (CPU) results with those reported in [1] in Table II. We also show the total number of levels in the refinement phase for each circuit in the column titled "#LVs", and the legalization schemes used for big-object legalization in the form $C_i/P_j$, where $C_i$ refers to performing the complete legalization scheme at level $L_i$, and $P_j$ stands for running $j$ iterations of the partial legalization scheme followed by a complete legalization scheme at level $L_i$. Figure 5 shows the final placement generated by our method for circuit ibm02.
These results show that our method, MPG-MS, consistently outperforms the flow proposed in [1], with an average wirelength reduction of 13%, which demonstrates the effectiveness of our method in handling such large-scale mixed-size placement problems.
V. CONCLUSIONS AND FUTURE WORK
In this paper, we present a method to handle the large-scale mixed-size placement problem for fixed-die IC designs. It is based on a multi-level optimization approach. Mixed-size placeable objects are placed simultaneously to obtain a good initial placement for the big objects; then the big objects are gradually fixed and any overlap between them is gradually removed, while the placement of small objects is further refined during the multi-level optimization process. By integrating big-object placement and small-object placement into a single flow with consistent objectives, we can optimize designs better than the hierarchical design flow, in which floorplanning is performed for the partitioned blocks followed by standard cell placement. Experimental results on large-scale mixed-size placement benchmarks show that our method outperforms the hierarchical flow by 13% on average in terms of total wirelength. Although we show results only for wirelength minimization, we believe that other objectives can be optimized in a similar way. We therefore plan to incorporate other objectives, such as delay, into the optimization process for large-scale mixed-size IC designs.

⁸ The runtimes reported in [1] were measured on a 1 GHz PIII/Intel system running Linux.
