Abstract-In this paper we present a new placement method for cell-based layout styles. It is composed of alternating and interacting global optimization and partitioning steps that are followed by an optimization of the area utilization. Methods using the divide-and-conquer paradigm usually lose the global view by generating smaller and smaller subproblems. In contrast, GORDIAN maintains the simultaneous treatment of all cells over all global optimization steps, thereby considering constraints that reflect the current dissection of the circuit. The global optimizations are performed by solving quadratic programming problems that possess unique global minima. Improved partitioning schemes for the stepwise refinement of the placement are introduced. The area utilization is optimized by an exhaustive slicing procedure. The placement method has been applied to real-world problems, and excellent results in terms of both placement quality and computation time have been obtained.
I. INTRODUCTION
THE ACCEPTANCE of cell-based design styles is considerably influenced by the quality and the speed of the available design tools. In this paper we present strategies and algorithms of a new placement tool named GORDIAN, which has been successfully applied to all cell-based layout styles and particularly to large circuits.
Cell-based design is performed with predefined or adaptable functional units (cells) that are taken from a well-tested cell library. The most common layout styles are row-oriented standard cells and gate arrays. Standard cell circuits may either be complete chips or may form macros (building blocks) of hierarchical macrocell designs. The new sea-of-gates layout style exhibits features of the traditional gate array and macrocell concepts. As with macrocell designs, sea-of-gates cells can also vary considerably in size and aspect ratio. However, a sea-of-gates circuit can consist of thousands or tens of thousands of cells. The large circuit size and the variability of the cells, combined with the fixed area and routing resources of the master, make the layout synthesis of sea-of-gates circuits very difficult.
The task of placement, the first step in the physical design process, is to calculate the positions of the cells. Since the quality of the placement determines the minimal achievable area and wiring length of a circuit, it has a large impact on production yield and circuit performance. Good placement tools, therefore, have to meet high requirements: they have to enable the successful completion of routing within minimal or given area and must be able to deal with large designs.

The difficulty of the placement problem increases as the cell count grows. Therefore, the classical approach to VLSI placement is based on the divide-and-conquer paradigm. Important representatives of this approach are based on min-cut graph partitioning (e.g., [1]-[4]). However, min-cut algorithms like those of Kernighan and Lin [5] and Fiduccia and Mattheyses [6] are iterative improvement heuristics that depend on an initial partition. Ng et al. [7] pointed out that it might be necessary to select one partition computed from many randomly generated starting partitions to obtain a good solution. They proposed a clustering algorithm that constructs a contracted network to be partitioned by the min-cut algorithm and in this way obtained improved results. Suaris and Kedem [4] extended the Fiduccia-Mattheyses bisection algorithm to quadrisection and reported improved results when applied to standard cell placement.

Manuscript received November 9, 1989. This paper was recommended by Associate Editor R. H. J. M. Otten.
J. M. Kleinhans is with the Siemens
Recently, alternative algorithms that model the placement problem as a linear or nonlinear continuous optimization problem have been studied. In contrast to the min-cut approach, geometric information about cell and chip dimensions and pin locations can be used directly. Usually no starting solution is needed and all modules (cells) are treated simultaneously. Among these approaches are methods using physical (force or electrical network) analogies [8]-[12] and eigenvector methods [13]-[15]. Some of these methods apply partitioning to recursively create smaller subproblems. However, they restrict the simultaneous optimization to the initial step.
Getting stuck at local optima is a major drawback of partitioning-based methods. Efforts have been made to deal with this problem, especially to improve the widely used min-cut procedure. For example, terminal propagation has been introduced by Lauther [1] and Dunlop and Kernighan [2] to consider the nets that connect cells in different regions. This global connectivity problem also arises with continuous optimization-based methods when applied to smaller and smaller subproblems.
The placement method GORDIAN [16] presented here has the unique feature of maintaining simultaneity over all optimization steps. The acronym GORDIAN stands for the two main parts of the method: global optimization and rectangle dissection, which is based on improved partitioning schemes.
With GORDIAN, the placement problem is formulated as a sequence of quadratic programming problems derived from the entire connectivity information of the circuit. An increasing number of constraints restricting the freedom of movement of the modules is imposed, reflecting the results of successively refined partitionings. In this way, on each level of refinement, a global placement of the modules is obtained simultaneously for all subproblems, avoiding any dependence on a processing sequence. The application of GORDIAN to standard cell and macrocell benchmarks from [17] has been discussed in [18].
The extension of the procedure to the sea-of-gates layout style was presented in [19].

0278-0070/91/0300-0356$01.00 © 1991 IEEE
In the following sections, a detailed description of the components of the method is given and further results are presented. In Section II, the procedure is outlined. The quadratic programming approach to global placement is described in Section III. Section IV discusses fundamental and improved partitioning schemes. Section V explains how the final placement is obtained in accordance with the specific layout style. Space and time complexity are treated in Section VI. Results for standard cell and sea-of-gates circuits with more than 6000 cells are discussed in Section VII.
II. OUTLINE OF THE PROCEDURE
The placement procedure GORDIAN is composed of alternating and interacting global optimization and partitioning steps that are followed by a final placement step that adapts the global placement to style-dependent constraints. The data flow between these main steps is illustrated in Fig. 1.
The input to GORDIAN consists of a net list, an extract of the cell library, and a description of the geometry of the chip.
The net list can be written as a binary relation $\mathcal{I} \subseteq \mathcal{N} \times \mathcal{M}$, where $\mathcal{N}$ and $\mathcal{M}$ are the index sets of the nets and the modules, respectively. A connection of net $\nu$ to module $\mu$ is represented by $(\nu, \mu) \in \mathcal{I}$; the set of modules connected by net $\nu$ is $\mathcal{M}_\nu = \{ \mu \in \mathcal{M} \mid (\nu, \mu) \in \mathcal{I} \}$. The dimensions (width and height) of each rectangular module as well as the locations of its pins are taken from the cell library. For sea-of-gates circuits, the description of the chip geometry includes the basic cell array dimensions defining the possible module locations on the master. With standard cell designs the number of rows to be used must be given. For macrocell circuits an estimated placement area has to be described. The positions of the pad cells are needed independently of the layout style.
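The relation view of the net list can be made concrete with a few lines of code. The following is only a minimal sketch: the net and module names are hypothetical toy data, and `modules_of_nets` is an illustrative helper, not part of GORDIAN.

```python
from collections import defaultdict

# Hypothetical toy net list: each pair (net, module) is one element of the
# binary relation between the net index set and the module index set.
incidence = {
    ("n1", "a"), ("n1", "b"),
    ("n2", "b"), ("n2", "c"), ("n2", "pad0"),
}


def modules_of_nets(relation):
    """Return, for every net, the set of modules connected by that net."""
    modules_by_net = defaultdict(set)
    for net, module in relation:
        modules_by_net[net].add(module)
    return dict(modules_by_net)


M = modules_of_nets(incidence)
```

Here `M["n2"]` plays the role of the module set of net `n2`; pad cells appear in it like any other module.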
The main loop of GORDIAN is formed by an iteration of global optimization and partitioning steps. They aim at minimal wirelength and at a uniform distribution of the modules over the available placement area.
The global optimization starts with an initial (root) region that comprises the whole core area of the chip and contains all modules to be placed. One constraint fixes the center of gravity of all these modules to the center of this region. In each partitioning step the module set is further divided and the placement regions are dissected into son regions accordingly, thereby establishing new constraints for the next global optimization step. The partitioning generates a slicing tree [20], [21] whose nodes correspond to the regions containing subsets of the modules.
This loop of global optimization and partitioning steps (Fig. 1) is repeated until each region contains at most $k$ modules, where $k$ is a predefined constant. For standard cell circuits the modules are finally gathered into rows. For macrocell and sea-of-gates circuits, the possible slicing dissections are enumerated. The allocation of the modules to the leaf regions is derived from their global placement, thereby avoiding a costly permutation. This allows the method to be applied to regions containing $k = 30$ or more modules even with large sea-of-gates designs. The result of this exhaustive slicing optimization is a shape function for each of these regions. It consists of area-minimal rectangles circumscribing all enumerated module allocations with different aspect ratios. Finally, these shape functions are simultaneously evaluated to produce a placement of the modules that globally optimizes the area utilization.
III. GLOBAL PLACEMENT BY QUADRATIC PROGRAMMING
In each global optimization step, a quadratic programming problem is derived from the circuit connectivity (the net list) and from the dissection of the placement area on the respective level of partitioning. The solution of this quadratic programming problem is a global placement of the modules.
3.1. Problem Formulation
The objective function of the global optimization step is based on the rubber band lengths of the nets. The length $L_\nu$ of a net $\nu$ is measured by the sum of the squared distances from its pins to the net's center coordinates $(x_\nu, y_\nu)$:

$$L_\nu = \sum_{\mu \in \mathcal{M}_\nu} \left[ (x_\mu + \xi_{\mu\nu} - x_\nu)^2 + (y_\mu + \eta_{\mu\nu} - y_\nu)^2 \right] \qquad (1)$$

where $(\xi_{\mu\nu}, \eta_{\mu\nu})$ are the coordinates of a pin connected to net $\nu$ relative to the center coordinates $(x_\mu, y_\mu)$ of its module $\mu$ (in Fig. 2 the sum of the squared lengths $l_{\mu\nu}$ of the dashed lines is the length $L_\nu$ of net $\nu$).
To each net $\nu$ an individual weight $w_\nu \ge 1$ is assigned. A high net weight groups modules that are connected by this possibly critical net closer together. Thus the objective function is the weighted sum of the squared rubber band lengths of the nets:

$$\Phi = \frac{1}{2} \sum_{\nu \in \mathcal{N}} L_\nu w_\nu \qquad (2)$$

In order to reduce the number of variables we substitute the coordinates of the nets by the mean values of the coordinates of their pins. This way, the net variables are eliminated. Due to the net model chosen (Fig. 2) this is equivalent to replacing each net by all two-point connections of its pins (a clique). The objective function, which now depends only on the module coordinates, can be written in matrix form where the constant terms are deleted:

$$\Phi(x, y) = x^T C x + d_x^T x + y^T C y + d_y^T y \qquad (3)$$
The vectors $x$ and $y$ denote the coordinates of the $m$ movable modules $\mu \in \mathcal{M}_m \subset \mathcal{M}$ in the $m$-dimensional vector space $\mathbb{R}^m$.
The system matrix $C$ and the vectors $d_x$ and $d_y$ are set up according to the procedure set-up-objective-function shown in Fig. 3. For each net $\nu$, the edges of the clique that replaces the net are weighted by the value $e$. It is set to $e = 2/p$ with $p = |\mathcal{M}_\nu|$ (disregarding the net weight $w_\nu$ for the moment), since this way the total edge weight of the clique amounts to $(2/p) \cdot (p (p - 1)/2) = p - 1$, which is the number of edges in a spanning tree that connects all pins of net $\nu$.
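The clique model can be illustrated by a small sketch that assembles a system matrix and a linear term for the x-direction. This is not the procedure of Fig. 3, which is not reproduced here; it assumes that all pins sit at the module centers, that each clique edge of a net with $p$ pins carries the weight $e = 2/p$ scaled by the net weight, and that the objective is the sum of $e (x_a - x_b)^2$ over all clique edges. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def set_up_objective(nets, movable, fixed_x):
    """Build C and d such that x^T C x + d^T x equals, up to a constant,
    the sum of e * (x_a - x_b)^2 over all clique edges of all nets.

    nets:     list of (pins, weight), pins being module names
    movable:  ordered list of movable module names
    fixed_x:  x-coordinates of the fixed modules (e.g. pad cells)
    """
    index = {name: i for i, name in enumerate(movable)}
    m = len(movable)
    C = [[0.0] * m for _ in range(m)]
    d = [0.0] * m
    for pins, weight in nets:
        e = weight * 2.0 / len(pins)      # clique edge weight e = 2/p
        for a, b in combinations(pins, 2):
            if a in index and b in index:
                i, j = index[a], index[b]
                C[i][i] += e
                C[j][j] += e
                C[i][j] -= e
                C[j][i] -= e
            elif a in index or b in index:
                mov, fix = (a, b) if a in index else (b, a)
                i = index[mov]
                C[i][i] += e
                d[i] -= 2.0 * e * fixed_x[fix]
            # an edge between two fixed modules only adds a constant
    return C, d


def total_clique_weight(p):
    """Total edge weight of a clique with p pins and e = 2/p."""
    return (2.0 / p) * (p * (p - 1) / 2)


# Toy example: one 3-pin net and one 2-pin net, pads at x = 0 and x = 10.
nets = [(["a", "b", "pad0"], 1.0), (["b", "pad1"], 1.0)]
C, d = set_up_objective(nets, ["a", "b"], {"pad0": 0.0, "pad1": 10.0})
```

Because every movable module in the toy example is connected to a pad directly or through another module, the resulting matrix is positive definite; without the pads it would only be a singular graph Laplacian.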
The matrix $C$ is positive definite if all movable modules are connected to fixed modules (e.g., pad cells) either directly or indirectly. This condition holds for all useful net lists. Since the x- and y-parts of (3) are independent of each other, the following discussion is restricted to the x-part:

$$\Phi(x) = x^T C x + d^T x \qquad (4)$$

At the top optimization level ($l = 0$), all $m$ modules to be placed belong to the root region which covers the whole placement area available to the modules. At the $l$th level of optimization, the placement area is divided into $q \le 2^l$ regions $\rho \in \mathcal{R}^{(l)}$, each containing a subset $\mathcal{M}_\rho \subseteq \mathcal{M}_m$ of modules, where $\mathcal{R}^{(l)}$ is the index set of the regions on level $l$. The centers $(u_\rho, v_\rho)$ of these regions impose constraints on the global placement of the modules:

$$A^{(l)} x = u^{(l)} \qquad (5)$$

such that the area-weighted mean value of the coordinates of modules $\mu \in \mathcal{M}_\rho$, i.e., the center of gravity, corresponds to the center of region $\rho$. The entries $a_{\rho\mu}$ of the $(q \times m)$-matrix $A^{(l)}$ depend on which module $\mu$ (occupying $F_\mu$ units of area) belongs to which region $\rho$:

$$a_{\rho\mu} = \begin{cases} F_\mu \big/ \sum_{\mu' \in \mathcal{M}_\rho} F_{\mu'} & \text{if } \mu \in \mathcal{M}_\rho \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

Combining the objective function (4) and the constraints (5), the following linearly constrained quadratic programming problem (LQP) is obtained:

$$\min_{x \in \mathbb{R}^m} \left\{ \Phi(x) = x^T C x + d^T x \right\} \quad \text{s.t.} \quad A^{(l)} x = u^{(l)} \qquad (7)$$

Since $\Phi(x)$ is a convex function ($C$ is positive definite) and the linear equality constraints (5) define a convex subspace of $\mathbb{R}^m$, (7) has a unique global minimum $\Phi(x^*)$. This particular modeling results in an LQP which is based on the entire circuit connectivity information at each level of optimization. The model at level $l$ is derived from the model at level $l - 1$ by refining the constraints. Thus the placement problem is mapped to a sequence of optimally solvable problems LQP.
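The center-of-gravity constraints can be sketched in code as follows. The sketch assumes that each row of the constraint matrix holds the module areas of one region normalized by the total area of that region, so that the row applied to the coordinate vector gives the area-weighted mean; the module names, areas, and coordinates are hypothetical.

```python
def center_of_gravity_constraints(regions, area, order):
    """Return the rows of A: one row per region, one column per module.

    regions: list of sets of module names (one set per region)
    area:    area F of each module
    order:   module names in the order of the coordinate vector x
    """
    rows = []
    for members in regions:
        total = sum(area[mu] for mu in members)
        rows.append([area[mu] / total if mu in members else 0.0
                     for mu in order])
    return rows


def apply_rows(rows, x):
    """Compute A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in rows]


order = ["a", "b", "c", "d"]
area = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 3.0}
A = center_of_gravity_constraints([{"a", "b"}, {"c", "d"}], area, order)

# A placement whose area-weighted means hit the region centers 2.5 and 7.5.
x = [2.0, 3.0, 6.0, 8.0]
centers = apply_rows(A, x)
```

Each column of `A` has exactly one nonzero entry, since every module belongs to exactly one region on a given level.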
Other placement methods that iteratively alternate global optimization and partitioning steps like [10] and [12] differ considerably from GORDIAN in the way they treat the regions on a level of refinement: for each region they solve a separate optimization problem regarding modules that belong to other regions as fixed. Thus their solution depends on the sequence in which the regions are considered.
3.2. Solution Method
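The $q$ linear equality constraints restrict the freedom of movement of the modules to an $(m - q)$-dimensional subspace of $\mathbb{R}^m$. Visually, one module of each subset $\mathcal{M}_\rho$ has to be moved such that the center-of-gravity constraint imposed on the modules in this region is satisfied, while all other modules are free to move anywhere. This means that the $m$-dimensional coordinate vector $x$ can be partitioned into $m - q$ independent variables $x_i$ and $q$ dependent variables $x_d$:

$$x = \begin{pmatrix} x_d \\ x_i \end{pmatrix} \qquad (8)$$

$$A x = \begin{pmatrix} D & B \end{pmatrix} \begin{pmatrix} x_d \\ x_i \end{pmatrix} = u \qquad (9)$$

In the matrix $A$, there is exactly one nonzero entry in each column, i.e., for each module (cf. (6) and Fig. 4). Therefore, $D$ can be chosen to be a diagonal matrix made of nonzero entries of $A$ taking, for numerical reasons, the biggest entry of each row of $A$. The dependent variables $x_d$ and the vector $x$ now can be expressed as a function of the independent variables $x_i$:

$$x_d = -D^{-1} B x_i + D^{-1} u \qquad (10)$$

$$x = Z x_i + x_0 \qquad (11)$$

with

$$Z = \begin{pmatrix} -D^{-1} B \\ I \end{pmatrix}, \qquad x_0 = \begin{pmatrix} D^{-1} u \\ 0 \end{pmatrix} \qquad (12)$$

More generally, for the vector $x_0$ any choice is appropriate that satisfies $A x_0 = u$, e.g., the modules can initially be put onto the centers of their regions; $x_i$ then describes the displacement of the independent modules relative to $x_0$.

Substituting (11) into (4) yields, up to a constant term, an unconstrained problem in the independent variables:

$$\min_{x_i \in \mathbb{R}^{m-q}} \left\{ \psi(x_i) = x_i^T Z^T C Z x_i + c^T x_i \right\}, \qquad c = Z^T (2 C x_0 + d) \qquad (14)$$

While $C$ is a sparse matrix, $Z^T C Z$ is usually dense, so it is essential not to require $Z^T C Z$ explicitly when solving (14).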
Therefore, direct solvers and iterative methods which need $Z^T C Z$ explicitly are impracticable. However, a well-suited iterative solution method for this class of problems is the conjugate-gradient method [22]-[24]. This method computes the solution using only products of the matrix $Z^T C Z$ with a vector, and does not explicitly require the matrix elements. Using appropriate data structures for $C$ and $Z$, only sparse matrix-vector products have to be performed, resulting in an efficient solution procedure.
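The matrix-free idea can be sketched as follows. The sketch assumes the objective $x^T C x + d^T x$, one dependent module per region, and a feasible start vector with all modules at their region centers; it then solves $Z^T C Z\, z = -Z^T (C x_0 + d/2)$ by a textbook conjugate-gradient iteration that only ever calls products with $Z$, $C$, and $Z^T$. All function names and the toy chain are hypothetical, and no preconditioning is used.

```python
def make_reduction(region_of, a, dependent):
    """Matrix-free products with Z and Z^T.

    region_of[j]: region of module j; a[j]: its nonzero entry in A;
    dependent[r]: module chosen as the dependent variable of region r.
    """
    m = len(region_of)
    chosen = set(dependent)
    free = [j for j in range(m) if j not in chosen]

    def apply_Z(z):
        x = [0.0] * m
        weighted = [0.0] * len(dependent)
        for zj, j in zip(z, free):
            x[j] = zj
            weighted[region_of[j]] += a[j] * zj
        for r, k in enumerate(dependent):
            x[k] = -weighted[r] / a[k]    # rows of -D^{-1} B
        return x

    def apply_Zt(g):
        out = []
        for j in free:
            k = dependent[region_of[j]]
            out.append(g[j] - a[j] / a[k] * g[k])
        return out

    return apply_Z, apply_Zt


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(apply_H, b, tol=1e-20, max_iter=1000):
    """Solve H z = b for symmetric positive definite H, given only z -> H z."""
    z = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol:
            break
        Hp = apply_H(p)
        step = rr / dot(p, Hp)
        z = [zi + step * pi for zi, pi in zip(z, p)]
        r = [ri - step * hi for ri, hi in zip(r, Hp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return z


# Toy chain pad(0) - m0 - m1 - m2 - pad(10) with unit edge weights.
# C is stored row-wise as sparse dictionaries {column: value}.
C_rows = [{0: 2.0, 1: -1.0}, {0: -1.0, 1: 2.0, 2: -1.0}, {1: -1.0, 2: 2.0}]
d = [0.0, 0.0, -20.0]


def apply_C(x):
    return [sum(v * x[j] for j, v in row.items()) for row in C_rows]


# One region with center u = 6 and equal module areas; module 0 is dependent.
apply_Z, apply_Zt = make_reduction([0, 0, 0], [1 / 3, 1 / 3, 1 / 3], [0])
x0 = [6.0, 6.0, 6.0]
rhs = apply_Zt([-(ci + di / 2) for ci, di in zip(apply_C(x0), d)])
z = conjugate_gradient(lambda v: apply_Zt(apply_C(apply_Z(v))), rhs)
x = [x0j + dx for x0j, dx in zip(x0, apply_Z(z))]
```

For this toy chain the constrained optimum is approximately `[3.4, 6.2, 8.4]`: the unconstrained solution `[2.5, 5.0, 7.5]` is shifted so that the mean coordinate equals the region center 6.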
IV. IMPROVED PARTITIONING SCHEMES
During partitioning the module set and the placement area are recursively divided. Constraints are imposed on module subsets to get a better distribution of the modules over the whole placement area. GORDIAN does not use the partitioning principle to reduce the problem size, but to restrict the freedom of movement of the modules. Since these restrictions influence the following global optimizations and eventually fix approximate positions of the modules very close to their final placement, the decisions in the partitioning step are crucial. The partitioning decision is derived mainly from global placement. However, it should also be based on the number of nets crossing the new cut line. This number can be minimized by variation of the cut position, as described in Section IV-4.1, or by exchanging modules between the new subsets created, as explained in Section IV-4.2. Furthermore, the partitioning decisions can be improved by verifying them as often and as early as possible and correcting them if necessary. This can be achieved by repartitioning, which is described in Section IV-4.3.
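The module set $\mathcal{M}_\rho$ of a region $\rho$ is divided into the subsets $\mathcal{M}_{\rho'}$ and $\mathcal{M}_{\rho''}$ of its son regions $\rho'$ and $\rho''$ according to the global placement. For a vertical cut, the modules are sorted by their x-coordinates and split such that

$$x_{\mu'} \le x_{\mu''} \quad \text{for all } \mu' \in \mathcal{M}_{\rho'},\ \mu'' \in \mathcal{M}_{\rho''} \qquad (15)$$

and such that a fraction $\alpha$ of the module area is assigned to $\rho'$:

$$\sum_{\mu' \in \mathcal{M}_{\rho'}} F_{\mu'} \approx \alpha \sum_{\mu \in \mathcal{M}_\rho} F_\mu, \qquad \sum_{\mu'' \in \mathcal{M}_{\rho''}} F_{\mu''} \approx (1 - \alpha) \sum_{\mu \in \mathcal{M}_\rho} F_\mu \qquad (16)$$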
The most obvious way of partitioning is to predefine $\alpha = 0.5$ and to alternate the direction of the cut on each level. This leads to regions with approximately the same area and aspect ratio.
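The quality of a partition can be measured by the cut value $c_\rho(\alpha)$, which is the sum of the weights of the nets that cross the cut line:

$$c_\rho(\alpha) = \sum_{\nu \in \mathcal{N}_c} w_\nu \quad \text{with} \quad \mathcal{N}_c = \{ \nu \in \mathcal{N} \mid \mathcal{M}_\nu \cap \mathcal{M}_{\rho'} \ne \emptyset \wedge \mathcal{M}_\nu \cap \mathcal{M}_{\rho''} \ne \emptyset \} \qquad (17)$$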
If the partitioning of a region is determined according to (15) and (16), the cut values have not yet been taken into account. Therefore, GORDIAN applies improved partitioning schemes to cut the Gordian knot. These try to minimize the influence of the partitioning step on the final layout by taking advantage of the global placement. In the following sections, three different methods for improving the partitioning are presented.
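The basic partitioning and its cut value can be illustrated by a short sketch. It assumes a vertical cut, sorts the modules of one region by their global placement coordinate, chooses the split whose area fraction is closest to the requested value, and then sums the weights of the nets that have modules on both sides. The helper names and the toy data are hypothetical.

```python
def split_by_area(modules, coord, area, alpha=0.5):
    """Split the modules, sorted by coordinate, so that the first subset
    holds roughly the fraction alpha of the total module area.
    Assumes at least two modules."""
    ordered = sorted(modules, key=lambda mu: coord[mu])
    total = sum(area[mu] for mu in ordered)
    best_k, best_err = 1, float("inf")
    prefix = 0.0
    for k in range(1, len(ordered)):
        prefix += area[ordered[k - 1]]
        err = abs(prefix - alpha * total)
        if err < best_err:
            best_k, best_err = k, err
    return ordered[:best_k], ordered[best_k:]


def cut_value(nets, first, second):
    """Sum of the weights of all nets with modules in both subsets."""
    first, second = set(first), set(second)
    return sum(w for pins, w in nets
               if any(p in first for p in pins)
               and any(p in second for p in pins))


coord = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
area = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
nets = [(["a", "b"], 1.0), (["b", "c"], 2.0),
        (["c", "d"], 1.0), (["a", "d"], 1.0)]

left, right = split_by_area(coord, coord, area, alpha=0.5)
c = cut_value(nets, left, right)
```

In this example the split is `["a", "b"]` against `["c", "d"]`, and the nets `b-c` and `a-d` cross the cut, giving a cut value of 3.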
4.1. Improved Partitioning by Variation of Cut Direction and Position
To avoid large cut values the position of the cut can be varied. Going from left to right through the list of modules $\mathcal{M}_\rho$, sorted by their global placement coordinates, and drawing a vertical cut line after each module, $c_\rho(\alpha)$ may be determined for all values of $\alpha$. Fig. 5 illustrates this analysis for the example of Fig. 6. The value of $\alpha$ should usually be around 0.5. Thus it is selected within the range $|1 - 2\alpha| \le \gamma \le 1$ where $c_\rho(\alpha)$ is minimum. Experiments indicate that the parameter $\gamma$ should not exceed the value of 0.3, since for larger values the dimensions of regions may differ too much, resulting in wasted area.

For each region $\rho$ the cut values $c_\rho(\alpha)$ are calculated for both vertical and horizontal cuts. The lower value in the specified range of $\alpha$ suggests the cut direction. Should this choice create son regions with extreme aspect ratios, the cut is then made in the other direction using the proper value of $\alpha$.
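The scan over cut positions can be sketched as follows, for one cut direction only. The sketch assumes that the admissible positions are those with $|1 - 2\alpha| \le \gamma$, recomputes the cut value from scratch at every position for clarity (an incremental update would be used for large regions), and returns the position with the smallest cut value. The function name and the toy data are hypothetical.

```python
def best_cut(modules, coord, area, nets, gamma=0.3):
    """Return (cut value, alpha, split index) of the best admissible cut,
    or None if no cut position satisfies |1 - 2*alpha| <= gamma."""
    ordered = sorted(modules, key=lambda mu: coord[mu])
    total = sum(area[mu] for mu in ordered)
    best = None
    prefix = 0.0
    for k in range(1, len(ordered)):
        prefix += area[ordered[k - 1]]
        alpha = prefix / total
        if abs(1 - 2 * alpha) > gamma:
            continue
        first, second = set(ordered[:k]), set(ordered[k:])
        c = sum(w for pins, w in nets
                if any(p in first for p in pins)
                and any(p in second for p in pins))
        if best is None or c < best[0]:
            best = (c, alpha, k)
    return best


names = ["a", "b", "c", "d", "e", "f"]
coord = {name: float(i + 1) for i, name in enumerate(names)}
area = {name: 1.0 for name in names}
nets = [(["a", "b"], 1.0), (["b", "c"], 3.0), (["c", "d"], 2.0),
        (["d", "e"], 1.0), (["e", "f"], 1.0), (["a", "c"], 1.0)]

wide = best_cut(names, coord, area, nets, gamma=0.4)
narrow = best_cut(names, coord, area, nets, gamma=0.3)
```

With `gamma=0.4` the cut after the fourth module is admissible and has the lowest cut value; with `gamma=0.3` only the central position remains, which shows how the parameter trades cut value against balanced region sizes.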
4.2. Improved Partitioning by Module Interchange
Another method to reduce cut values is to interchange modules between the subsets $\mathcal{M}_{\rho'}$ and $\mathcal{M}_{\rho''}$ of the initial partition of a region $\rho$ derived from global placement according to (15) and (16). For this purpose, min-cut is used. Since a global placement is at hand, more detailed information about the positions of the modules can be used than with usual min-cut, where terminal propagation has to be derived from the center coordinates of the regions the modules belong to.

Based on the global placement, min-cut is supplied with four subsets of modules for each region $\rho$. $\mathcal{M}_\rho$ is divided into $\mathcal{A}_L$, $\mathcal{A}_F$, $\mathcal{B}_F$, and $\mathcal{B}_L$, such that for a vertical cut $x_{\mu \in \mathcal{A}_L} \le x_{\mu \in \mathcal{A}_F} \le x_{\mu \in \mathcal{B}_F} \le x_{\mu \in \mathcal{B}_L}$. Fig. 6(a) shows this initial partition of the module set of a macrocell circuit, where min-cut is applied after the first global optimization step. The modules belonging to the sets $\mathcal{A}_F$ and $\mathcal{B}_F$ are highlighted by darker shading. Only the modules belonging to these two sets are free to move, whereas the modules $\mu \in \mathcal{A}_L \cup \mathcal{B}_L$ are locked. The size of the subsets can be controlled by the parameter $\gamma$. A value of $\gamma = 1$ means that all modules of $\mathcal{M}_\rho$ are treated by min-cut; a smaller value restricts the min-cut algorithm to modules close to the cut line. In Fig. 6(a), $\gamma = 0.5$ is chosen.

Min-cut converts the sets $\mathcal{A}_F$ and $\mathcal{B}_F$ into the sets $\mathcal{A}'_F$ and $\mathcal{B}'_F$, minimizing the cut value. New son regions $\rho'$ and $\rho''$ are created with the modified module sets $\mathcal{A}'_F$ and $\mathcal{B}'_F$ such that $\mathcal{M}_{\rho'} = \mathcal{A}_L \cup \mathcal{A}'_F$ and $\mathcal{M}_{\rho''} = \mathcal{B}_L \cup \mathcal{B}'_F$. Fig. 6(b) indicates that only the few highlighted modules will actually be interchanged. Obviously the initial partition derived from global placement will be modified only near the cut depicted by the dashed line.
4.3. Repartitioning
During the first global optimization steps, modules may be clustered around the centers of their regions. If these regions are cut close to the center, the assignment of a module to one of the son regions may be fairly arbitrary, since many modules have approximately the same coordinates. The quality of the partitioning of a region $\rho \in \mathcal{R}^{(l-1)}$ can be evaluated after the global optimization on level $l$. A large overlap of the global placement of the module subsets belonging to the son regions $\rho', \rho'' \in \mathcal{R}^{(l)}$ indicates a bad partitioning, since many modules of region $\rho'$ tend to migrate to region $\rho''$ and vice versa.
V. FINAL PLACEMENT
The result of the alternating global optimization and partitioning steps is a global placement and a slicing structure with regions containing $k$ or fewer modules. Since this placement contains overlapping modules and has to be adapted to a specific design style, a final placement step has to follow. In a standard cell design the modules are collected in rows; for macrocell and sea-of-gates circuits an optimization of the area utilization is performed, packing the modules in a compact slicing structure.
5.1. Standard Cell Final Placement
In standard cell designs the modules are of approximately the same height but sometimes of fairly differing widths. The chip area is determined by the widths of the channels between the cell rows and by the lengths of the rows including feedthroughs for nets crossing the rows. The goal is to obtain narrow channels with equally distributed low wiring density and rows with equal length.
In GORDIAN, the final placement for standard cells proceeds similarly to the method proposed by Dunlop and Kernighan [2]. To collect the modules into $r$ horizontal rows, they are sorted by their y-coordinates and divided into $r$ subsets by $r - 1$ horizontal cuts, such that $y_{\mu_1} \le y_{\mu_2} \le \dots \le y_{\mu_r}$ for any modules $\mu_1, \dots, \mu_r$ allocated to rows $1, \dots, r$, respectively. With this allocation procedure, which tries to change the global placement as little as possible, narrow channels and low wirelength can be achieved. To ensure equal row lengths, the number of feedthroughs is estimated. Rows with a large number of feedthroughs are made shorter than the average row length and vice versa. The row lengths are varied within a 1-5% deviation from the average row length. To achieve the desired row length as exactly as possible, modules with y-coordinates close to the cut line are exchanged between neighboring rows if necessary.
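The row allocation can be sketched in a few lines. The sketch only covers the basic step of sorting by y-coordinate and cutting the sorted list into rows of roughly equal total width; the feedthrough estimation and the exchange of modules between neighboring rows are omitted. The function name and the toy data are hypothetical.

```python
def assign_rows(modules, y, width, r):
    """Sort the modules by y-coordinate and distribute them over r rows so
    that each row receives about 1/r of the total cell width."""
    ordered = sorted(modules, key=lambda mu: y[mu])
    target = sum(width[mu] for mu in ordered) / r
    rows = [[] for _ in range(r)]
    placed = 0.0
    for mu in ordered:
        # A module goes to the row in which the center of its slot in the
        # concatenated width sequence falls.
        center = placed + width[mu] / 2
        row = min(int(center / target), r - 1)
        rows[row].append(mu)
        placed += width[mu]
    return rows


y = {"a": 0.1, "b": 0.2, "c": 1.1, "d": 0.9, "e": 2.0, "f": 2.2}
width = {"a": 4.0, "b": 2.0, "c": 3.0, "d": 3.0, "e": 5.0, "f": 1.0}
rows = assign_rows(list(y), y, width, r=3)
row_lengths = [sum(width[mu] for mu in row) for row in rows]
```

Because the modules are processed in sorted order, every module in a lower row has a y-coordinate no larger than any module in a higher row, which matches the ordering condition stated above.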
5.2. Macrocell and Sea-of-Gates Final Placement
When the alternating global optimization and partitioning steps are completed, regions have been created that contain $k$ or fewer modules. For these regions, an exhaustive slicing optimization (ESO) is performed which generates an optimal slicing structure in accordance with the global placement of the modules. Otten [13] published a heuristic to determine an optimized slicing structure for overlapping modules. Recently, van Ginneken [25] presented a polynomial algorithm to derive all possible slicing dissections of small sets of modules from a global placement.
The number $s(k)$ of different slicing dissections of a rectangle into $k$ subrectangles is shown in Table I. Min-cut or clustering algorithms, which have no neighboring information from module coordinates, have to choose one assignment of $k$ modules to $k$ regions from $k!$ permutations. Since there are as many as $p(k) = k! \cdot s(k)$ possible placements, the enumeration of slicing structures is usually limited to $k = 5$.
However, in GORDIAN, module coordinates which are available from wirelength minimization in the global optimization step provide a criterion for module allocation. Therefore, the algorithm of van Ginneken [25] is applied to all ESO regions. It allows the enumeration of all possible slicing dissections for module subsets with up to $k = 35$ modules even for large sea-of-gates circuits.

Fig. 8 shows the ESO procedure of GORDIAN. The procedure starts with the enumeration of all area-minimal placements for each region that contains $k$ or fewer modules. These placements are represented by a shape function [3], [26]-[28] for each ESO region. All area-minimal placements of the whole circuit are obtained by recursively computing the shape function of the root of the slicing tree. After the selection of an appropriate root shape, a top-down traversal of the slicing tree that fixes the final placement is performed. Additionally, it chooses the shape of each module. A module may possess more than one shape if it can be rotated or if there are different cell templates available from the library. By this ESO procedure, GORDIAN performs a global area optimization since all enumerations are evaluated simultaneously.

Fig. 9 shows a typical result of this exhaustive slicing optimization process. It depicts the root shape functions of a sea-of-gates circuit with over 6000 modules (see Section VII) for different values of the enumeration parameter $k$. Each point corresponds to the upper right corner of the circumscribing rectangle of an overlap-free placement. The fixed dimensions of the sea-of-gates master and the boundary curve $h = \left( \sum_\mu F_\mu \right) / w$ restrict the region of feasible placements to the shaded area in Fig. 9. The influence of the parameter $k$ on the shown shape functions is obvious. A higher value of $k$ results in lower area and more shapes within the feasible placement area. With a value of $k \le 3$ no feasible placement can be achieved because of the poor area utilization.
With growing k the area utilization increases and the shape function gets closer and closer to the boundary hyperbola.
However, it is not the best idea to make k as large as possible. Experiments with GORDIAN, when applied to designs with a large number of modules, indicate that k should be just as low as needed for a good area utilization, since for higher values of k the quality of the placement in terms of wirelength usually becomes worse due to the earlier termination of the global optimization and partitioning loop.
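The role of shape functions in the slicing optimization can be illustrated with a small sketch. It is not the enumeration algorithm of [25]; it only shows, under simplifying assumptions, how the shape functions of two son regions are combined for a vertical or a horizontal cut and how dominated rectangles are discarded. A shape function is represented here as a list of (width, height) pairs, and all names are hypothetical.

```python
def prune(shapes):
    """Keep only the non-dominated (width, height) pairs, sorted by width."""
    result = []
    for w, h in sorted(set(shapes)):
        if not result or h < result[-1][1]:
            result.append((w, h))
    return result


def module_shapes(w, h, rotatable=True):
    """Shape function of a single module, optionally with rotation."""
    return prune([(w, h), (h, w)] if rotatable else [(w, h)])


def combine(first, second, cut):
    """Shape function of a region composed of two son regions."""
    if cut == "vertical":      # sons side by side
        candidates = [(w1 + w2, max(h1, h2))
                      for w1, h1 in first for w2, h2 in second]
    else:                      # horizontal cut: sons stacked
        candidates = [(max(w1, w2), h1 + h2)
                      for w1, h1 in first for w2, h2 in second]
    return prune(candidates)


a = module_shapes(2, 1)
b = module_shapes(3, 1)
side_by_side = combine(a, b, "vertical")
stacked = combine(a, b, "horizontal")
both = prune(side_by_side + stacked)       # enumerate both cut directions
best = min(both, key=lambda s: s[0] * s[1])
```

For the two toy modules the combined shape function contains rectangles of area 5, which equals the total module area, so these shapes lie directly on the boundary hyperbola.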
VI. COMPLEXITY OF THE METHOD
Space complexity: There is one system matrix $C$ for both the x- and y-coordinates and for all global optimization steps.

Time complexity: The number $n$ of iterations needed to solve each of the quadratic placement problems (7) depends on how tight the bounds on the accuracy of the solution are set. A practical limit for $n$ is a value proportional to $m^{0.5}$. The partitioning of $q$ regions based on sorting the modules takes time proportional to $q$.
VII. EXPERIMENTAL RESULTS
GORDIAN has been implemented in the C language and is running on workstations and mainframes. To investigate the efficiency of the GORDIAN placement procedure, it was applied to standard cell blocks of hierarchical designs, as well as to whole standard cell and sea-of-gates circuits. The different partitioning schemes presented in Section IV were compared for sea-of-gates circuits [19] and standard cell circuits. In most cases the synergy of global optimization and min-cut (Section IV-4.2) worked best.

7.1. Standard Cell Circuits

For the standard cell blocks scb1 to scb9, the pads column depicts the number of connectors (boundary pins). Table III compares the results yielded by GORDIAN to those obtained from other tools, one based on min-cut, the other on simulated annealing [30]. The results are compared in terms of block area after final routing. In Table III bold numbers indicate the best results. The blocks were routed by the global and final routers of the VENUS CAD system [31]. For small circuits the simulated annealing tool gives the best results. However, for blocks with more than 1000 cells and nets, GORDIAN performs better. The gap between annealing and GORDIAN becomes larger with increasing circuit size. The annealing performance is worse for the large circuits because CPU time becomes too expensive and faster cooling schedules lead to suboptimal results. The CPU times given in Table III are measured on a 15 MIPS mainframe computer.

Fig. 10 shows a plot of a design where the area of the standard cell blocks dominates the chip size. The chip represents a processing unit out of a series of chips for a mainframe computer [32]. It contains 33 600 equivalent gate functions in the two large standard cell blocks, 4.5-kb memory in 8 RAM cells, and one block with hard macros. The chip area has been remarkably reduced by putting all standard cells into just two blocks and by applying the placement procedure GORDIAN to these standard cell blocks (cf. the entries scb8 and scb9 in Table III). Several other circuits with comparable complexity have been successfully designed with the VENUS system including GORDIAN.

Tables IV-VII compare GORDIAN against the min-cut based placement method of the VENUS CAD system for the benchmark circuits from [29]. The comparisons are performed in terms of circuit area after completed wiring, the wiring length in layers 1 and 2, and CPU time needed by the placement methods measured on an Apollo DN4500 workstation running DOMAIN/IX. Both global and detailed routing were carried out by the same tools. Additionally, in the GORDIAN column, the ratio of its result compared to min-cut is shown in parentheses. GORDIAN outperforms the min-cut based procedure more and more as the circuits become larger, needing much less CPU time.
7.2. Sea-of-Gates Circuits
GORDIAN has also been compared to a min-cut placement procedure for sea-of-gates circuits [33]. Table VIII confirms the results obtained with standard cell circuits. GORDIAN performed better in all cases. The results are given in terms of weighted estimated wiring length, measured as Manhattan-metric minimum spanning trees. The improved placement leads to lower wiring densities and lengths, which results in a drastically reduced number of unrouted connections and reduced CPU times due to less rip-up and reroute.
The placement of a 6112 cell sea-of-gates circuit sog6 was obtained within 10 min on a 15 MIPS computer. This circuit consists of 28K random logic with 63% basic cell utilization. Two metal layers were used for routing. Fig. 11 shows the final placement of this circuit obtained after exhaustive slicing optimization with $k = 7$. In contrast to most of the available sea-of-gates design systems, which place the cells in rows to create wiring channels, the approach taken here is to place the cells as for a huge macrocell circuit. The underlying slicing structure can hardly be detected in Fig. 11. This good area utilization is a result of the ESO procedure described in Section V-5.2. The final routing on the second metal layer is shown in Fig. 12. It shows a uniform distribution of the wiring density.
VIII. CONCLUSIONS
GORDIAN, a new placement method based on simultaneous quadratic programming combined with improved partitioning schemes and exhaustive slicing optimization, has been presented. Results obtained for large industrial designs substantiate distinct improvements in placement quality and computation time compared with state-of-the-art placement tools. The global view of GORDIAN particularly pays off with increasing circuit size. Our experiments indicate that GORDIAN will be able to obtain high-quality results with low computation times for circuits with tens of thousands of modules. To satisfy the high wiring requirements of such designs, our future work will concentrate on combining global placement with global routing. With an improved net modeling derived from the global routing, it will be possible to incorporate timing constraints during global placement.
