Placement is an important constrained optimization problem in the design of very large scale integrated (VLSI) circuits [1] [2] [3] [4]. Simulated annealing [5] and min-cut placement [6] are two of the most successful approaches to the placement problem. Min-cut methods yield less congested and more routable placements at the expense of greater wire-length, while simulated annealing methods tend to optimize the total wire-length with little emphasis on minimizing congestion. It is also well known that min-cut algorithms are substantially faster than simulated-annealing-based methods. In this paper, a fast min-cut algorithm (ROW-PLACE) for row-based placement is presented and is empirically shown to achieve simulated-annealing-quality wire-length on a number of benchmark circuits. In comparison with Timberwolf 6 [7], ROW-PLACE is at least 12 times faster in its normal mode and at least 25 times faster in its fast mode. The good results of ROW-PLACE are achieved using a very effective clustering-based partitioning algorithm in combination with constructive methods that reduce the wire-length of nets involved in terminal propagation.
INTRODUCTION
The design of VLSI circuits is a complex process that transforms a design specification into a physical circuit through several inter-dependent steps. The layout problem is an important stage of the overall design process; it involves the assignment of geometric locations to the elements of the circuit and to the electric wire connections among them. Due to an enormous combinatorial complexity, the [...] [8] [9] [10] [11], but the ever increasing size of electronic circuits is rendering placement followed by routing the more practical approach to the layout problem. Routing is highly dependent on the placement stage. A good placement simplifies the subsequent routing step, while a bad placement may render routing an impossible task. Therefore, routability is a major goal of placement, among other goals such as minimizing timing delays on critical nets, maximizing circuit performance, and minimizing layout area.
Successful placement algorithms have been obtained using general optimization paradigms such as simulated annealing [5, 7] and genetic algorithms [12] [13] [14] . Other existing placement techniques are:
Incremental construction: The strategy used here is to place the nodes of the circuit successively until all of them have been placed. Seed nodes are placed first. Subsequently, based on a selection rule, an unplaced node is chosen and is placed in the next best vacant position. This process is repeated until all the circuit elements have been placed. Approaches of this kind are reported in [15].
Node exchange: This is an iterative improvement approach in which an initial placement is improved by exchanging the positions of some of the nodes. A widely used strategy is to exchange the positions of two nodes until no further improvements can be made [16, 17].
Combinatorial methods: Branch-and-bound is a strategy that systematically explores the solution space of a combinatorial problem in search of the optimal solution [3]. The strategy is used to prune or reduce the number of placements explored in what would otherwise be a complete exhaustive search of all possible placements. Branch-and-bound algorithms tend to be computationally expensive and are not used except for problems of small size [1]. Branch-and-bound techniques have been reported in [15].
Analytical methods: It is possible to formulate the placement problem as a non-linear mathematical problem, where non-linear programming techniques are applicable [18, 19].
Min-cut placement: Methods in this class place the nodes of the circuit by a recursive use of a partitioning method. Basically, the circuit is partitioned into two parts. The available layout area is then cut by a straight line into two parts, one on each side of the cutting line. Each of the two parts of the circuit is then assigned to one of the two parts of the layout area. This gives rise to two smaller placement problems, which are then recursively solved by the same method until each sub-circuit consists of one cell [6, 11, 20].
Force-directed placement: Methods in this class view the connections among circuit nodes as binding forces that try to keep the connected nodes in close proximity. Therefore, a node is in a good location if the total force exerted on it is zero. A common denominator of these methods is the determination of the best location for a node. Usually, nodes are successively moved in some order to their ideal locations in an effort to optimize the placement. Such methods are reported in [21].
Spectral methods: These methods use a mathematical formulation of the placement problem, where placement properties are usually related to the eigenvalues of an associated matrix. These relationships are exploited to design approximate placement algorithms. Several spectral approaches have been reported in the literature [22, 23].
Resistive network optimization: This approach has been proposed by Cheng and Kuh [24] . Basically, the placement problem is formulated as the problem of minimizing the power dissipation in a resistive network. A solution of the resistive network is then translated into a placement of the circuit.
Among all placement methods, simulated annealing is currently the most popular and is the best algorithm available in terms of placement quality, but it is too time consuming. One of the best available simulated annealing placement packages is Timberwolf 6 [7]. Min-cut algorithms rank second to simulated annealing in terms of placement quality but are substantially faster [1, 4]. The contribution of this paper is the design of a min-cut algorithm (ROW-PLACE) with results that are competitive with Timberwolf 6 in terms of quality. In terms of speed, ROW-PLACE in its normal mode is at least 12 times faster than Timberwolf 6, and, in its fast mode, ROW-PLACE is at least 25 times faster than Timberwolf 6. ROW-PLACE is distinguished from previous min-cut placement methods by an effective clustering-based partitioning algorithm in combination with constructive methods that reduce the wire-length of nets involved in terminal propagation.
ROW-BASED PLACEMENT
For placement purposes, an electrical circuit consists of a hypergraph along with geometric descriptions of its components. A hypergraph G(V, E) consists of a set of nodes V and a set of nets E. Each net $e \in E$ is a subset of 2 or more nodes in V. In the hypergraph model of an electrical circuit, each node corresponds to a component of the circuit, and each net represents a common electrical signal among its constituent nodes. A pin is a point of contact of a net with one of its constituent nodes. A net may touch a node in more than one pin. The locations of the pins of a node are specified by coordinates relative to the center of that node. The nodes are usually rectangular in shape and are placed so that their sides are parallel to the reference coordinate axes in the plane. Therefore, the location of a node is completely specified by the coordinates of its center if only one orientation of the node is allowed.
Row-based placement is an approach applicable to design styles such as standard cells, gate arrays, and field programmable gate arrays. In this approach, the nodes of the circuit have a common height but differ in length, and they can be placed in horizontal rows, where each row has the same common height of the nodes. The space between rows is reserved for routing. The number of rows is a user-chosen parameter and is usually chosen so that the layout space used is approximately a square. The length of a row is the sum of lengths of nodes assigned to it. Therefore, to avoid wasted space at the end of short rows, the placement algorithm must balance the lengths of rows.
OUTLINE OF ROW-PLACE
Roughly, the min-cut approach used is the same as in [20]. We alternate the partitioning of the circuit nodes by vertical and horizontal lines until the nodes are localized in small areas where they can be assigned to specific locations in specific rows. To ease the description of the process, let us define a rectangulation of the layout surface to consist of: [...] $r_1, \ldots, r_k$ that are ordered from left to right. Let $x_i$ be the x-length of rectangle $r_i$. The x-coordinate of rectangle $r_i$ is $x_i/2 + \sum_{j=1}^{i-1} x_j$. Thus, $r_1$ has $x_1/2$ as its x-coordinate, $r_2$ has $x_2/2 + x_1$ as its x-coordinate, and so on. Each row in the placement is at some y level. Let $h$ be the common height of all the nodes of the circuit. Then row $i$ has the y-coordinate $h/2 + 2(i-1)h$. Thus, it is assumed, as in Timberwolf [7], that adjacent rows are separated by a distance $h$. This assumption is only used to estimate the wire-length; the actual separation between adjacent rows can only be determined after routing. The y-coordinate of a slab is the average y-coordinate of its rows. Each rectangle of a slab has the same y-coordinate as the slab it belongs to. The x and y coordinates of a node in a placement are those of its enclosing rectangle. The length of a net in a placement is the half-perimeter of the rectangle that encloses all its pins. The wire-length of the placement is the sum of the lengths of all nets.
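The coordinate and wire-length estimates above can be illustrated with a minimal Python sketch. The function names and the representation of a net as a list of (x, y) pin positions are assumptions made for illustration only; they are not taken from the ROW-PLACE implementation.

```python
from itertools import accumulate


def rectangle_x_coords(lengths):
    """x-coordinate of each rectangle r_i in a slab, ordered left to right:
    half of its own length plus the lengths of all rectangles to its left."""
    offsets = [0.0] + list(accumulate(lengths))[:-1]
    return [off + x / 2 for off, x in zip(offsets, lengths)]


def row_y_coord(i, h):
    """y-coordinate of row i (1-based) for node height h, with adjacent
    rows assumed to be separated by a distance h."""
    return h / 2 + 2 * (i - 1) * h


def net_length(pins):
    """Half-perimeter of the rectangle enclosing the pins of one net.
    `pins` is a list of (x, y) positions (hypothetical representation)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def wirelength(nets):
    """Sum of the half-perimeter lengths of all nets."""
    return sum(net_length(pins) for pins in nets)
```

For example, a slab with rectangles of lengths 4, 6 and 2 gives x-coordinates 2, 7 and 11, and with h = 10 the first two rows lie at y = 5 and y = 25.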
A placement into a rectangulation can be refined by applying either a y-refinement or an x-refinement. When a rectangle is cut into two rectangles, the nodes assigned to it are partitioned between the two resulting rectangles so that the two new rectangles are about equal in length.
The initial rectangulation is one that consists of one slab that spans all the rows and that consists of one rectangle. By repeated applications of x-refinements and y-refinements, we reach a rectangulation where each rectangle contains one node of the circuit and each slab spans one row. At this stage, the coordinates of each node specify the location of that node in the final placement. Many different sequences of x-refinements and y-refinements have been tried. The best approach was the one that alternates between the two refinements as in [20]. Figure 1 shows a placement of a small circuit obtained using ROW-PLACE.
[...] in V and T is a desired target size for U. In ROW-PLACE, T is chosen so that a rectangle is cut into two rectangles of about equal length. The input hypergraph may contain one or both of the two bias nodes u and d of zero length, which are designated to stay locked in U and D, respectively, during the partitioning process.
Compaction is an operation in which subsets of nodes are coalesced (compacted or clustered) into a single node each. When a subset of nodes $X \subset V$ of a hypergraph G(V, E) is compacted into a single node x, the nets incident with node x in the new hypergraph are of the form $(e \cap (V - X)) \cup \{x\}$, where $e \in E$ is a net originally incident with some node in X, with the provision that nets that are reduced to one node are discarded. All other nets and nodes in $V - X$ remain the same.
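The compaction of a single subset X can be sketched in Python as follows. The representation of nets as frozensets of node identifiers and the function name `compact` are assumptions for illustration. The sketch keeps duplicate nets that may arise after compaction, which is also an assumption, since the text does not say how they are handled.

```python
def compact(nodes, nets, cluster, new_node):
    """Coalesce the nodes in `cluster` (the subset X) into `new_node` (x).

    nodes: iterable of node ids; nets: list of frozensets of node ids.
    Returns the node set and net list of the compacted hypergraph.
    """
    cluster = set(cluster)
    new_nodes = (set(nodes) - cluster) | {new_node}
    new_nets = []
    for net in nets:
        if net & cluster:
            # Net touched X: it becomes (e intersect (V - X)) union {x}.
            reduced = frozenset(net - cluster) | {new_node}
            if len(reduced) < 2:
                continue  # nets reduced to one node are discarded
            new_nets.append(reduced)
        else:
            new_nets.append(net)  # nets not touching X are unchanged
    return new_nodes, new_nets
```

For instance, compacting {a, b} into x in a hypergraph with nets {a, b}, {b, c}, {c, d} and {a, b, c} discards the first net and turns the second and fourth into {c, x}.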
The algorithm BISECT, briefly presented here, is described in [25] and in more detail in [26]. BISECT uses information collected during iterative improvement to incorporate compactions of nodes in a dynamic way. In [25, 26], it is empirically shown that BISECT results can be up to 73 times better than those of the Fiduccia-Mattheyses algorithm (FM). The good empirical performance of ROW-PLACE is mainly due to the highly effective partitioning algorithm BISECT.
Let BISECT_AND_COMPACT(G, P1, P2, G', P'1, P'2) be a function that takes as input a hypergraph G(V, E) along with an initial feasible partition (P1, P2) of G, and outputs a compacted hypergraph G'(V', E') along with a feasible partition (P'1, P'2) of G'. [...]
2) Forward move: Of the subsets P1 and P2, select the one that has excess size, call it F, and call the other subset T. Move a sequence of nodes $f_1, \ldots, f_k$ from F to T using a highest-gain-first scheme until either F is out of free nodes or a stopping criterion is satisfied. Lock $f_1$ and $f_k$ in T, set $c \leftarrow c + 1$, and let $L = \{f_1, \ldots, f_k\}$. [...]
BISECT is the main feature of our min-cut approach. However, in order not to make this paper unnecessarily long, the reader is referred to [25, 26], in which the specific details of the implementation of BISECT are discussed at length. This leaves enough space to discuss additional relevant details of our implementation of ROW-PLACE without having to duplicate material available elsewhere.
In the remainder of this section, the methods used to generate the initial feasible partition are presented. In this initial partitioning scheme, the gain of a node is the number of nets connecting it to U minus the number of nets connecting it to D. The next node to be added to U has the highest current gain among all remaining unassigned nodes. The next node to be added to D has the least current gain among all remaining unassigned nodes.
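The gain rule for the initial partition can be illustrated with the following Python sketch. Only the gain definition and the highest-gain and least-gain selection rules come from the text. The alternation between U and D, the stopping rule based on a target size, the tie-breaking by input order, and all names are assumptions made for illustration; this is not the construction used in ROW-PLACE.

```python
def gain(node, nets, U, D):
    """Number of nets connecting `node` to U minus the number connecting it to D."""
    to_u = sum(1 for net in nets
               if node in net and any(v in U for v in net if v != node))
    to_d = sum(1 for net in nets
               if node in net and any(v in D for v in net if v != node))
    return to_u - to_d


def greedy_initial_partition(nodes, nets, length, target, u_seed, d_seed):
    """Hypothetical two-sided construction: alternately grow U and D.

    nodes: list of node ids; nets: list of sets of node ids;
    length: dict node -> length; target: desired total length of U;
    u_seed, d_seed: nodes initially in U and D (e.g. the bias nodes).
    """
    U, D = set(u_seed), set(d_seed)
    free = [v for v in nodes if v not in U and v not in D]
    size_u = sum(length[v] for v in U)
    add_to_u = True
    while free:
        if size_u >= target:       # assumed stopping rule: U is full
            D.update(free)
            break
        if add_to_u:
            pick = max(free, key=lambda v: gain(v, nets, U, D))
            U.add(pick)
            size_u += length[pick]
        else:
            pick = min(free, key=lambda v: gain(v, nets, U, D))
            D.add(pick)
        free.remove(pick)
        add_to_u = not add_to_u
    return U, D
```

On a chain u-a-b-c-d-dn with unit-length nodes, zero-length bias nodes u and dn, and a target of 2, the sketch assigns a and b to U and c and d to D, so only the net {b, c} is cut.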
One-sided and two-sided constructions of the initial partition are used to reduce the wire-length of nets involved in terminal propagation. For [...] where low and high are the minimum and the maximum node length. This is done so that in subsequent y-refinements, the number of nodes per rectangle is large enough to permit the partitioning algorithm to achieve the desired size ratio of the two parts. For example, suppose the first rectangle in a slab has two nodes of lengths 10 and 200, respectively. During y-refinement, such a rectangle is problematic because it cannot be cleanly divided into two rectangles of about equal length, and thus it leads to a large size imbalance that decreases the placement quality. The above restriction on x-refinements is meant to avoid the appearance of such problematic rectangles.
Local improvements: Two-interchange of nodes is used as a last step in improving the wire-length. However, nodes are only allowed to move locally. More precisely, the nodes are stored in a 2-dimensional table T. A node in T(i, j) can only be interchanged with a node in T(m, n), where $|i - m| \le$ ROW_RANGE and $|j - n| \le$ RANGE. ROW_RANGE and RANGE are two user-specified parameters; they were set to 1 and 5, respectively, in our experimentation.
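The locality restriction can be sketched in Python as below, assuming the window condition is |i - m| <= ROW_RANGE and |j - n| <= RANGE and that the table T is stored as a list of rows of possibly different lengths. The function name and the table representation are hypothetical.

```python
def interchange_candidates(table, i, j, row_range=1, col_range=5):
    """Yield the positions (m, n) whose node may be interchanged with the
    node at table[i][j], i.e. positions inside the local window."""
    first_row = max(0, i - row_range)
    last_row = min(len(table), i + row_range + 1)
    for m in range(first_row, last_row):
        first_col = max(0, j - col_range)
        last_col = min(len(table[m]), j + col_range + 1)
        for n in range(first_col, last_col):
            if (m, n) != (i, j):
                yield m, n
```

With the settings reported above (ROW_RANGE = 1, RANGE = 5), each node has at most 32 interchange candidates, which keeps this final improvement step inexpensive.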
After one two-interchange step, a node in a row may overlap with other nodes in the same row. Node overlaps are removed by adjusting node positions by a sweep of each row from left to right.
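A left-to-right sweep that removes overlaps in one row might look like the following Python sketch. It assumes that each node is described by the x-position of its left edge and its length, and that an overlapping node is pushed to the right until it abuts its left neighbor. The direction of the adjustment and the names are assumptions, since the text does not specify them.

```python
def remove_overlaps(row):
    """Sweep one row from left to right and shift overlapping nodes right.

    row: list of (left_x, length) pairs, in any order.
    Returns a list in the same order with adjusted left_x values.
    """
    order = sorted(range(len(row)), key=lambda k: row[k][0])
    result = list(row)
    cursor = float("-inf")  # right edge of the nodes swept so far
    for k in order:
        left, length = row[k]
        left = max(left, cursor)  # push right only if it overlaps
        result[k] = (left, length)
        cursor = left + length
    return result
```

For example, the row [(0, 4), (3, 2), (10, 1)] becomes [(0, 4), (4, 2), (10, 1)]: only the second node overlaps its left neighbor and is moved.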
EXPERIMENTAL RESULTS
All our experiments were performed on a DEC 5000-240 workstation with 96 Megabytes of RAM. The results of ROW-PLACE were compared with those of Timberwolf 6. Due to the use of randomness, both ROW-PLACE and Timberwolf 6 produce different results in different runs. For this reason, all results reported are averages over 5 different runs in each case.
The circuits used are listed in Table I in increasing [...]
Iteration of refinements: Due to the use of terminal propagation, the partitioning of the nodes of one rectangle is influenced by the coordinates of nodes in other rectangles. Thus, it is possible to get better performance by iterating x-refinements and y-refinements as long as the wire-length of the current placement can be improved. Normally, each refinement is iterated 3 to 4 times.
Iteration of the partitioning algorithm: When the input subcircuit to the partitioning algorithm does not contain either of the two bias nodes, the initial partition is randomly generated. To improve performance in this case, the best partition generated by LIMIT runs of the partitioning algorithm is used, where LIMIT is a user-chosen parameter. In our experimentation, LIMIT was set equal to 5.
1) The improvement due to local node interchange ranges from 1.5% for the circuit industry3 to 7.6% for the circuit good. The overall improvement for all circuits is 2.2%.
2) The overall improvement for all circuits due to the use of one-sided and two-sided constructions of the initial partition, over the use of a random initial partition in BISECT, is 2.4%. However, ROW-PLACE using a random initial partition in BISECT generated better results for some individual circuits, as indicated by the negative numbers in the second column of Table III. Nevertheless, one-sided and two-sided constructions of the initial partition lead to better results overall and for most individual circuits.
The third column in Table III shows the advantage of using BISECT rather than FM as the partitioning algorithm. Except for the circuit l000g, ROW-PLACE performed better using BISECT. The improvement can be as much as 17% for the circuit ckta7 and is 9.4% overall.
The 4th column in Table III [25, 26] .
The third set of experiments is intended to show the good performance of ROW-PLACE in comparison with Timberwolf 6 [7], a simulated-annealing-based algorithm widely recognized as the champion of row-based placement algorithms. The default parameter settings were used to run Timberwolf 6. Timberwolf 6 was also used with the parameter TWSCfast set to 10 (tw_fast (10)). [...] Table IV shows percentages over the results of Timberwolf 6 in terms of the sum of the wire-length over all circuits.
The timing results are shown in Table V and they are expressed as multiples of the run-time of ROW-PLACE using the third setting (fast), which is shown in CPU seconds in the first column of this table.
The results of tw_fast (10) 
