I. Introduction
With the rapid progress of semiconductor technology, standard-cell placement has become a difficult task, since it must handle circuits containing more than $10^6$ cells under strict timing constraints. A well-known approach to this problem is to cluster cells so that the problem size is effectively reduced. In particular, when a simulated annealing based placement algorithm is used, clustering is indispensable for obtaining a good placement in a practical computation time. Many clustering methods have been proposed so far. Any placement method combining clustering and simulated annealing typically consists of three phases: clustering cells, global placement, and detailed placement. Global placement determines a placement of clusters, and detailed placement produces the final standard-cell placement. In the global placement phase, the shape of any cluster is generally assumed to be a square or, sometimes, a circle. However, restricting the shape of a cluster to a square or a circle may not be appropriate, since the cells in a cluster would not form a square or a circle in the final cell placement. Thus, if the shape of a cluster can be flexibly changed in global placement, a better final placement would be obtained. This is the basic idea of the proposed method.
In this paper, we propose a clustering based standard-cell placement method, which consists of the three phases mentioned above. In the clustering phase, we apply a connectivity-based clustering algorithm to form clusters of cells for the given netlist. In the global placement phase, we determine and improve the cluster placement by simulated annealing. We propose a new cluster placement model, called the amoeba model, in which the shape of any cluster can be flexibly changed under some constraints. Since the flexibility of cluster placement is increased, a high quality placement satisfying timing constraints is expected to be obtained more easily. Finally, in the detailed placement phase, we assign cells to cell rows under the nonoverlapping constraint of cells, using a constructive approach that minimizes the estimated total wire length of all nets, to obtain a final legal cell placement. Experimental results show that the proposed placement method produces better placements than a placement method in which the shape of any cluster is restricted to a square. The remainder of this paper is organized as follows. In Section 2, we explain the interconnect delay model and formulate the timing-driven standard-cell placement problem. In Section 3, we propose a timing-driven standard-cell placement method based on a new cluster placement model. Finally, in Section 5, we give some conclusions and discuss future research.
II. Preliminaries

Layout Model
The layout model we assume in this paper is the row-based standard-cell model. A standard-cell layout consists of rectangular modules of the same height, called cells. The functionality and the electrical characteristics of each predefined cell are tested, analyzed, and specified in advance. Cells are placed in rows called cell rows, and no two cells may overlap. I/O pads are placed around the chip area. The space between two cell rows is called a channel, which is used to realize the interconnections between cells.
Delay Model
In the proposed algorithm, timing constraints are taken into account, and thus an appropriate wire delay model is required. In this paper, the Elmore delay model is adopted as the interconnect delay model. For a wire $e$, let $l_e$, $w_e$, $c_e$, and $r_e$ denote its length, width, capacitance, and resistance, respectively. Further, let $e_v$ denote the wire entering node $v$ from its parent, as shown in Figure 1. We use the following model for the interconnect delay of $e_v$, denoted $D_{wire}(e_v)$.
$$c_e = (c_a \cdot w_e + c_f) \cdot l_e$$
$$r_e = r_0 \cdot l_e / w_e$$
$$D_{wire}(e_v) = r_{e_v} \cdot \left( c_{e_v}/2 + c(T_v) \right)$$
where $c_a$, $c_f$, and $r_0$ are the area capacitance, fringing capacitance, and resistance of a unit-width, unit-length wire, respectively. $T_v$ is the subtree rooted at $v$, and $c(T_v)$ is the capacitance of the directed connected subtree in $T_v$ from its root $v$. The total Elmore delay $D_{Elmore}$ from source $s_0$ to sink $t_i$ is given as follows.
$$D_{Elmore}(s_0, t_i) = \sum_{e_v \in path(s_0, t_i)} D_{wire}(e_v)$$
where $path(s_0, t_i)$ is the set of wires on the path from $s_0$ to $t_i$.
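As a minimal sketch of the delay model above, the following Python code computes the delay of each wire in a routing tree and accumulates it along the path from the source to every node. The tree encoding (a parent map with per-wire length and width), the function names, and the per-node load capacitance `load` are illustrative assumptions and are not taken from this paper.

```python
def wire_rc(length, width, c_a, c_f, r_0):
    """Capacitance and resistance of one wire of the given length and width."""
    c_e = (c_a * width + c_f) * length
    r_e = r_0 * length / width
    return c_e, r_e


def elmore_delays(parent, length, width, load, c_a, c_f, r_0):
    """Return the Elmore delay from the source to every node of a routing tree.

    parent[v] is the parent of node v (None for the source); length[v] and
    width[v] describe the wire e_v entering v; load[v] is an assumed pin
    capacitance at v (missing entries count as 0).
    """
    children = {}
    for v, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(v)
    source = next(v for v, p in parent.items() if p is None)

    # Per-wire capacitance c_{e_v} and resistance r_{e_v}.
    rc = {v: wire_rc(length[v], width[v], c_a, c_f, r_0)
          for v, p in parent.items() if p is not None}

    # c(T_v): capacitance of the subtree rooted at v (the wire e_v is excluded).
    subtree_cap = {}

    def c_tree(v):
        total = load.get(v, 0.0)
        for u in children.get(v, []):
            total += rc[u][0] + c_tree(u)
        subtree_cap[v] = total
        return total

    c_tree(source)

    # Accumulate D_wire(e_v) along each source-to-node path.
    delay = {source: 0.0}
    stack = [source]
    while stack:
        v = stack.pop()
        for u in children.get(v, []):
            c_e, r_e = rc[u]
            delay[u] = delay[v] + r_e * (c_e / 2 + subtree_cap[u])
            stack.append(u)
    return delay
```

Computing all subtree capacitances first keeps the evaluation linear in the number of tree nodes, which matters when the delay is evaluated repeatedly inside an annealing loop.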
Problem Formulation
Let $L = (M, N)$ be a logic circuit, where $M = \{m_1, m_2, \ldots, m_M\}$ is a set of cells and $N = \{n_1, n_2, \ldots, n_N\}$ is a set of nets. Any net $n_i$ can be represented as a subset of $M$. Assume that a net $n_j$ is a set of cells, $n_j = \{m_{n_j,0}, m_{n_j,1}, \ldots, m_{n_j,k}\} \subset M$, $k = |n_j| - 1$, where $m_{n_j,0}$ is the source cell and the others are sink cells. For each net $n_j$, the required arrival time $T_{req_j}$ is given. Let $d_j$ denote the largest delay from the source to a sink of net $n_j$. Then, the timing constraint violation $V_j$ of a net $n_j$ is defined as follows. This equation means that, for each net $n_j$, if the largest sink delay $d_j$ is smaller than the required arrival time $T_{req_j}$, the timing constraint is satisfied; otherwise, the timing constraint is violated. The placement problem discussed in this paper is to find a standard-cell placement minimizing the total wire length of all nets under the condition that all timing constraints are satisfied.
III. The Proposed Algorithm
Overview
The proposed method is divided into three phases, i.e., clustering of cells, global placement and detailed placement. After clustering cells, in the global placement phase, cluster placement is determined and improved with a simulated annealing based algorithm. Then, in the detailed placement phase, cells are assigned to cell rows under the nonoverlapping constraint of cells. In the following, we give the overview of each phase.
First, cells are clustered to reduce the problem size and thereby shorten the computation time of the simulated annealing based method used in the global placement phase (Fig. 2(b)). In most previous clustering-based placement methods, the shape of each cluster is assumed to be a square in the global placement phase, and each cluster is placed in the chip area with overlaps among clusters allowed. However, it may be difficult to obtain a high quality cell placement in this way, since the cells in a cluster will not necessarily form a square in the final cell placement. Making clusters smaller would lead to a better placement, but it would also greatly increase the computation time of a placement method based on simulated annealing. To resolve this difficulty, we propose a new cluster placement method. In the global placement phase of the proposed method, the chip area is divided into a number of global bins (Fig. 2(c)). Then, all clusters are placed on global bins by a simulated annealing (SA) based method (Fig. 2(d)), allowing overlaps among clusters. The area of each cluster is $k$ times as large as the area of a global bin, where $k$ is an integer with $5 \le k < 20$. When determining the cluster placement in the chip area, the shape of a cluster is not restricted to a square but can be any collection of connected global bins (Fig. 2(d)). We call this cluster placement model the amoeba model.
Figure 3: Global placement based on the amoeba model.
In the detailed placement phase, each cluster is decomposed into its original cells, which are contained in the global bins on which the cluster including them is placed. Finally, the cell assignment is performed with a constructive method minimizing the half perimeter length of all nets to obtain a final legal cell placement.
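The half perimeter length used as the objective above can be sketched as follows. This is the usual bounding-box estimate; the data layout (a net as a collection of cell names and a `position` map from cell name to an (x, y) pair) and the function names are assumptions made for illustration.

```python
def hpwl(net, position):
    """Half perimeter of the bounding box enclosing all cells of one net."""
    xs = [position[cell][0] for cell in net]
    ys = [position[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, position):
    """Sum of the half perimeter lengths of all nets."""
    return sum(hpwl(net, position) for net in nets)
```

For example, a net whose cells sit at (0, 0), (3, 4), and (1, 1) has a bounding box of width 3 and height 4, so its half perimeter length is 7.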
Phase 1: Clustering
A clustering of a standard-cell netlist groups cells into disjoint clusters. The objective of clustering in the proposed method is to reduce the problem size so as to improve the performance of the global placement phase, which is based on simulated annealing. Let $K$ be the number of clusters after clustering. In the current implementation, we set $K$ between 100 and 500, and a simple connectivity-based greedy clustering algorithm is used to form clusters. In this algorithm, $K$ cells are first selected at random as initial clusters. Then, a pair consisting of a cell $c_i$ and a cluster $C_j$ is selected so that $c_i$ is not an element of any current cluster and the fraction of nets absorbed by the clusters is maximized if $c_i$ is merged into $C_j$. We repeat this procedure to form clusters. We also introduce upper and lower bounds on the cluster size, and the clustering result must satisfy those bounds. For lack of space, we omit the details of cell clustering. For details, refer to [6].
The Amoeba Model
We introduce some terminology to define the amoeba model. The chip area is a rectangle on which all cells are placed. A global bin is a grid cell obtained by dividing the chip area equally in the x- and y-directions. We divide the chip area into a set of global bins so that the number of global bins is about $k$ times the number of clusters, where $k$ is a positive integer less than 20 (in the current implementation, we set $k = 10$). Let $C$ be the set of all clusters, $C = \{c_1, c_2, \ldots, c_C\}$, and let $B$ be the set of all global bins, $B = \{b_1, b_2, \ldots, b_B\}$. Each cluster is placed on more than one global bin (Figure 3). Let $A_i$ be the size of a cluster $c_i$ and let $A_{bin}$ be the size of a global bin. Then a cluster $c_i$ is placed on $N_i$ global bins, where $N_i = \lceil A_i / A_{bin} \rceil$. We call the $N_i$ global bins composing a cluster $c_i$ the cluster bins of $c_i$. For a set of global bins $B$, two global bins $b_s \in B$ and $b_t \in B$ are said to be 8-adjacent if they share a common vertex. Moreover, if the reflexive transitive closure of the 8-adjacent relation on $B$ is equal to the universal relation on $B$, we say that $B$ is 8-connected. We formally define the amoeba cluster placement model as follows.
Amoeba Model: For any cluster $c \in C$, $c$ is placed on a set of 8-connected global bins in the chip area.
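The definitions above can be checked mechanically. The sketch below computes the number of cluster bins and tests whether a set of global bins is 8-connected by a breadth-first search. Representing a global bin by its integer (column, row) grid coordinates, rounding the bin count up, and the function names are assumptions made for illustration.

```python
import math
from collections import deque


def num_cluster_bins(cluster_area, bin_area):
    """Number of global bins N_i needed to hold a cluster (rounded up, assumed)."""
    return math.ceil(cluster_area / bin_area)


def is_8_adjacent(a, b):
    """Two distinct bins are 8-adjacent if they share at least one vertex."""
    return a != b and abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1


def is_8_connected(bins):
    """True if every bin can be reached from any other via 8-adjacent steps."""
    bins = set(bins)
    if not bins:
        return True  # an empty set is vacuously connected
    start = next(iter(bins))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nb = (x + dx, y + dy)
                if nb in bins and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return seen == bins
```

Under this encoding, a diagonal chain such as (0, 0), (1, 1), (2, 1) is a legal amoeba shape, whereas (0, 0) and (2, 0) alone are not, because they share no vertex.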
Phase 2: Global Placement
As described in Section 3.1, in the global placement phase of the proposed method, all clusters in $C$ are placed in the chip area by a simulated annealing based placement method using the amoeba model. The outline of the global placement phase is shown in Figure 4. The initial placement of clusters is generated randomly under the condition that the shape of every cluster is a square and overlaps among clusters are permitted. If a cluster cannot be placed in a square shape, we define its shape as a rectilinear polygon obtained by eliminating bins from the top-right corner of a square. We introduce the following notation. When a cluster $c_i \in C$ is placed with the amoeba model, the set of global bins on which the cluster bins of $c_i$ have been placed is denoted $P(c_i) \subset B$. Moreover, the set of clusters placed on a global bin $b_j$ is denoted $Q(b_j)$, that is, $Q(b_j) = \{c_i \mid b_j \in P(c_i), c_i \in C\}$. In the following, we explain the details of the proposed algorithm.
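The notation above maps naturally onto two dictionaries. The following sketch builds a near-square initial shape for a cluster and derives $Q$ from $P$. Filling the square row by row from the bottom-left, so that the missing bins end up at the top-right corner, is one possible reading of the description and is an assumption, as are the function names and the (column, row) encoding of bins.

```python
import math


def initial_shape(n_bins, origin=(0, 0)):
    """Bins of a near-square shape whose missing bins lie at the top-right corner."""
    side = math.isqrt(n_bins)
    if side * side < n_bins:
        side += 1
    ox, oy = origin
    # Row-major fill from the bottom-left; the last row may be left incomplete.
    return {(ox + i % side, oy + i // side) for i in range(n_bins)}


def build_bin_index(P):
    """Derive Q(b) = {c | b in P(c)} from the placement P(c)."""
    Q = {}
    for cluster, bins in P.items():
        for b in bins:
            Q.setdefault(b, set()).add(cluster)
    return Q
```

Keeping $Q$ alongside $P$ makes it cheap to look up which clusters overlap on a given global bin when a move is evaluated.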
Moving clusters
In this section, we describe how the cluster placement is changed when it is improved with simulated annealing. There are two cases: one is to move a single cluster, and the other is to interchange the locations of two clusters. However, the latter can be realized as an extension of the former. Therefore, we mainly explain how to move a cluster, using the example shown in Figure 5.
Determining the shape of a cluster
We explain how to determine the shape of a cluster when moving it. As described in Section 3.4.1, for the cluster $c_i$ to be moved, one global bin is first selected at random as the destination of one cluster bin of $c_i$, and then the remaining cluster bins are moved one by one so that the shape of $c_i$ is determined. The procedure to determine the shape of a cluster is as follows.
Step 1 Let $P(c_i) = \{b_s\}$, where $b_s$ is the global bin selected as the destination.
Step 2 Let $B_8(c_i) \subset B$ be the set of global bins that are 8-adjacent to $P(c_i)$.
Step 3 Evaluate the cost of each $b \in B_8(c_i)$.
Step 4 Add the bin $b_{min} \in B_8(c_i)$ with the minimum cost to $P(c_i)$, that is, $P(c_i) = P(c_i) \cup \{b_{min}\}$.
Step 5 Repeat Steps 2 to 4 until $|P(c_i)| = N_i$.
The objective of determining the shape of $c_i$ is to minimize the total delay while considering the number of overlaps among clusters. Therefore, we should use the sum of the delays of all interconnections connecting to $b$ as the cost in Step 3. However, we can hardly estimate the delay accurately when evaluating the shape of a cluster, because precisely estimating the delay of interconnections might require an unacceptably long time in this phase. Instead, we use the quadratic sum of the total estimated wire length, since the delay of a wire is quadratically proportional to its length. We define the estimated wire length of a net as follows. The set of nets connecting to $c_i \in C$ is denoted $Net(c_i)$. Note that a net $n \in Net(c_i)$ is a set of clusters. Moreover, the distance between $c_i \in C$ and $c_j \in C$, denoted $l(c_i, c_j)$, is defined as follows.
where $l_M(b, b')$ is the Manhattan distance between the centers of $b$ and $b'$. Then, the estimated wire length of $n \in Net(c)$, denoted $L(c, n)$, is defined as follows.
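Steps 1 to 5 of the shape determination procedure can be sketched as a greedy region growth. In the code below, the cost of a candidate bin is supplied by the caller as a function `cost(b)`; it stands in for the quadratic wire length cost of Step 3 and is an assumption, as are the function name, the (column, row) encoding of bins, and the tie-breaking by coordinates.

```python
def grow_cluster_shape(start_bin, n_bins, grid_w, grid_h, cost):
    """Grow an 8-connected set of n_bins global bins starting from start_bin."""
    placed = {start_bin}                                  # Step 1
    while len(placed) < n_bins:                           # Step 5
        # Step 2: bins inside the grid that are 8-adjacent to the current shape.
        frontier = set()
        for x, y in placed:
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    inside = 0 <= nb[0] < grid_w and 0 <= nb[1] < grid_h
                    if inside and nb not in placed:
                        frontier.add(nb)
        if not frontier:
            raise ValueError("grid is too small for the requested number of bins")
        # Steps 3 and 4: evaluate each candidate and add the cheapest one.
        best = min(frontier, key=lambda b: (cost(b), b))
        placed.add(best)
    return placed
```

Because every added bin is 8-adjacent to a bin already in the shape, the result is 8-connected by construction, so no separate connectivity check is needed after growth.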
Cost function
A current solution in simulated annealing (SA) is perturbed by the methods presented in Sections 3.4.1 and 3.4.2, and a neighborhood solution is thereby generated. We define the cost function to evaluate a neighborhood solution as follows.
$$f_{cost} = L_{wire} + \beta N_{overlap} + \gamma T_{vio}$$
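A minimal sketch of how this cost could be evaluated and used inside an annealing loop is given below. Two points are assumptions and not taken from this paper: the overlap term is measured as the number of extra clusters on each global bin, summed over all bins, and the acceptance test is the standard Metropolis rule. The function names are likewise illustrative.

```python
import math
import random
from collections import Counter


def overlap_count(P):
    """Assumed overlap measure: extra clusters per global bin, summed over bins."""
    usage = Counter(b for bins in P.values() for b in bins)
    return sum(n - 1 for n in usage.values() if n > 1)


def placement_cost(l_wire, n_overlap, t_vio, beta, gamma):
    """f_cost = L_wire + beta * N_overlap + gamma * T_vio."""
    return l_wire + beta * n_overlap + gamma * t_vio


def accept(delta, temperature, rng=random):
    """Metropolis rule (assumed): accept improvements, else with prob. exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

The weights `beta` and `gamma` trade wire length against overlaps and timing violations; their values are left to the caller here because the text above does not specify them.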
Routing Estimation
To evaluate interconnections among clusters by Equation (6), we need to assign the pins of a net on each cluster and generate routing patterns of the interconnections. We determine them by an extension of the SERT algorithm [3]. In the proposed method, we assign the pin positions for each cluster and determine the global routes among clusters simultaneously. First, we assign the pin position of the source to the source cluster, choosing the cluster bin that is nearest to the center of the net. The center of a net is determined by calculating the arithmetic mean of the locations of all clusters connecting to the net. Next, instead of seeking terminals in each step of adding a new edge, we seek cluster bins that have not yet been added to the Steiner tree. Then, the pin of each cluster connecting to the net is assigned to a cluster bin included in that cluster. For lack of space, the details are omitted. For details, refer to [6].
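The source pin assignment described above can be sketched as follows. The location of a cluster is taken to be the centroid of its cluster bins and the distance to the net center is measured in the Manhattan metric; both choices, together with the function names and the (column, row) encoding of bins, are assumptions made for illustration.

```python
def cluster_location(bins):
    """Centroid of a cluster's bins (assumed definition of a cluster's location)."""
    n = len(bins)
    return (sum(x for x, _ in bins) / n, sum(y for _, y in bins) / n)


def net_center(net, P):
    """Arithmetic mean of the locations of all clusters connected to the net."""
    locs = [cluster_location(P[c]) for c in net]
    return (sum(x for x, _ in locs) / len(locs),
            sum(y for _, y in locs) / len(locs))


def source_pin_bin(source, net, P):
    """Cluster bin of the source cluster that is nearest to the net center."""
    cx, cy = net_center(net, P)
    return min(P[source], key=lambda b: (abs(b[0] - cx) + abs(b[1] - cy), b))
```

For a source cluster on bins (0, 0) and (1, 0) and a sink cluster on bins (4, 0) and (5, 0), the net center is (2.5, 0), so the source pin is assigned to bin (1, 0).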
Phase 3: Detailed Placement
In the detailed placement phase, a final standard-cell placement is determined from the global placement of clusters. The cell placement is obtained by a step-by-step refinement procedure consisting of three steps. In the first step, the cluster bins of each cluster are assigned to global bins in the chip area without any overlaps among clusters, considering the minimization of the total wire length. For each cluster, the global bins determined in this step are called its target global bins. In the second step, each cluster is decomposed into its set of original cells, and each cell is assigned to one of the target global bins determined in the first step, again considering the minimization of the total wire length. Finally, in the third step, each cell in a global bin is assigned to a cell row, also considering the minimization of the total wire length. For lack of space, the details are omitted. For details, refer to [6].
IV. Conclusion
In this paper, we have proposed a clustering based, timing-driven standard-cell placement method with a new cluster placement model, called the amoeba model. In the proposed method, the shape of any cluster, which was restricted to a square in previous methods, can be flexibly changed. Since the flexibility of cluster placement is increased, we can obtain a high quality placement satisfying timing constraints. Experimental results were quite promising. Several directions remain for future work. First, in the clustering phase, a timing-influenced clustering method is sought, since no timing constraints are considered in cell clustering in the current implementation. Second, in the global placement phase, an effective mechanism to remove overlaps among clusters is required, since a considerable amount of overlap generally exists in the current implementation. Also, an efficient algorithm to determine global routes among clusters is needed to reduce the computation time. Finally, in the detailed placement phase, a timing-driven method for assigning cells to cell rows is sought.
