An efficient analytical placement algorithm using cell shifting, iterative local refinement and a hybrid net model by Viswanathan, Natarajan
Retrospective Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 
1-1-2003 
Recommended Citation 
Viswanathan, Natarajan, "An efficient analytical placement algorithm using cell shifting, iterative local 
refinement and a hybrid net model" (2003). Retrospective Theses and Dissertations. 20075. 
https://lib.dr.iastate.edu/rtd/20075 
An efficient analytical placement algorithm using cell shifting, iterative 
local refinement and a hybrid net model 
by 
Natarajan Viswanathan 
A thesis submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
MASTER OF SCIENCE 
Major: Computer Engineering 
Program of Study Committee: 
Chris Chong-Nuen Chu, Major Professor 
Akhilesh Tyagi 
Soma Chaudhuri 
Iowa State University 
Ames, Iowa 
2003 
Copyright © Natarajan Viswanathan, 2003. All rights reserved. 
Graduate College 
Iowa State University 
This is to certify that the master's thesis of 
Natarajan Viswanathan 
has met the thesis requirements of Iowa State University 
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 PLACEMENT ALGORITHMS
  2.1 Existing Placement Algorithms
  2.2 Proposed Placement Algorithm
CHAPTER 3 OVERVIEW OF THE ALGORITHM
CHAPTER 4 GLOBAL OPTIMIZATION
CHAPTER 5 HYBRID NET MODEL
  5.1 Clique, Star and Hybrid Net Models
  5.2 Equivalence of the Hybrid Net Model to the Clique and Star Net Models
CHAPTER 6 CELL SHIFTING
  6.1 Calculation of Bin Utilization
  6.2 Shifting of Cells
  6.3 Addition of Spreading Forces
CHAPTER 7 ITERATIVE LOCAL REFINEMENT
  7.1 Bin Structure
  7.2 Description of the Technique
CHAPTER 8 DETAILED PLACEMENT
  8.1 Bin Structure and Wirelength Reduction
CHAPTER 9 OTHER ATTEMPTED APPROACHES
  9.1 Objective Function
  9.2 Cell Shifting
CHAPTER 10 EXPERIMENTAL RESULTS
  10.1 Placement Benchmarks
  10.2 Comparison Between Clique and Hybrid Net Models
  10.3 Comparison Between Placement Algorithms: FastPlace, Capo 8.6 and Dragon 2.2.3
  10.4 Scalability Analysis and Placement Figures
CHAPTER 11 CONCLUSIONS
APPENDIX PLACEMENT FIGURES
BIBLIOGRAPHY
ACKNOWLEDGEMENTS
LIST OF TABLES
Table 10.1 Placement benchmark statistics
Table 10.2 Clique net model versus Hybrid net model
Table 10.3 Run-times for different steps of FastPlace (Clique net model)
Table 10.4 Run-times for different steps of FastPlace (Hybrid net model)
Table 10.5 Total run-time results for the two approaches
Table 10.6 Comparison of placement results with Capo 8.6 (Hybrid net model)
Table 10.7 Comparison of placement results with Dragon 2.2.3 (Hybrid net model)
Table 10.8 Comparison of placement results with Capo 8.6 (Clique net model)
Table 10.9 Comparison of placement results with Dragon 2.2.3 (Clique net model)
Table 10.10 Comparison of placement results with Capo 8.6 for circuits with over 100k cells (Hybrid net model)
Table 10.11 Comparison of placement results with Dragon 2.2.3 for circuits with over 100k cells (Hybrid net model)
LIST OF FIGURES
Figure 1.1 VLSI design flow
Figure 1.2 VLSI physical design flow
Figure 1.3 Standard cell layout
Figure 3.1 The FASTPLACE algorithm
Figure 5.1 Netlist models
Figure 6.1 Regular bin structure
Figure 6.2 (a) Regular bin structure (b) Unequal bin structure and utilization after shifting
Figure 6.3 Cell distribution before and after shifting
Figure 6.4 Pseudo pin and Pseudo net addition
Figure 10.1 Run-time of FastPlace versus number of pins in logarithmic scale
Figure A.1 Initial placement (Circuit: ibm01)
Figure A.2 After 5 iterations during stage 1 of global placement (Circuit: ibm01)
Figure A.3 After stage 1 of global placement (Circuit: ibm01)
Figure A.4 After stage 2 of global placement (Circuit: ibm01)
Figure A.5 Final placement solution (Circuit: ibm01)
ABSTRACT 
In this thesis, we present FastPlace - a fast, iterative, flat placement algorithm for 
large scale standard cell designs in the fixed-die context. FastPlace is based on the 
quadratic placement approach. The quadratic approach formulates the wirelength min-
imization problem as a convex quadratic program, which can be solved analytically by 
some efficient techniques. However, the quadratic approach in general suffers from some 
drawbacks. First, the resulting placement has a lot of overlap among cells. Second, the 
resulting total wirelength may be long as the quadratic wirelength objective is only an 
indirect measure of the total linear wirelength. Third, existing net models tend to create 
a lot of non-zero entries in the connectivity matrix while modeling the netlist and this 
slows down the quadratic program solver. 
These problems are handled as follows: (1) A Cell Shifting technique is proposed to 
generate an evenly distributed global placement from the quadratic program solution. 
This technique is very efficient and produces a high-quality global placement with even 
cell distribution. (2) An Iterative Local Refinement technique is proposed to reduce the 
wirelength according to the half-perimeter bounding rectangle measure. This technique 
is very effective as it makes use of the wirelength and cell distribution information
provided by a coarse global placement. (3) A Hybrid Net Model is proposed which is a
combination of the traditional clique and star models. This net model significantly re-
duces the number of non-zero entries in the connectivity matrix. It results in a significant 
speed-up of the solver as compared to using it with the traditional clique model. 
Experimental results show that the run-time of FastPlace is of the order $O(n^{1.412})$,
where n is the circuit size given by the number of pins. Also, the current implementation 
when tested on 18 Standard Cell benchmark circuits is on average 11.0 and 82.7 times 
faster than existing academic placers Capo and Dragon respectively. 
CHAPTER 1 INTRODUCTION 
The semiconductor industry has evolved from the first Integrated Circuits of the
early 1970s and matured rapidly since then. Early small-scale integration (SSI) ICs
contained about 1 to 10 logic gates, amounting to a few tens of transistors. Medium-scale
integration (MSI) increased the range of integrated logic available to counters
and similar, larger scale, logic functions. The era of large-scale integration (LSI)
packed even larger logic functions, such as the first microprocessors, into a single chip.
Very large-scale integration (VLSI) now offers circuits with millions of transistors on
a single piece of silicon. As CMOS process technology improves, transistors continue to
get smaller and ICs hold more and more transistors [20].
As the number of components to be handled is growing at such a rapid rate, it
is necessary to have tools that shorten the time-to-market for such extremely large
designs. This has been made possible by computer-aided design tools which automate
the entire layout process (also known as the physical design phase) that follows the 
circuit design phase. VLSI Physical Design Automation is the research and development 
of algorithms and data structures related to the physical design phase in the VLSI 
design flow (Figure 1.1). The objective of physical design is to come up with optimal 
Figure 1.1 VLSI design flow (stages in order: System Specification, Architectural Design, Functional Design, Logic Design, Circuit Design, Physical Design, Fabrication, Packaging and Testing)
arrangements of devices on a chip and efficient interconnection schemes between these 
devices to obtain the desired functionality and performance [18]. Due to the dramatic 
increase in the number of devices to be placed on the chip, physical design algorithms 
must use the available chip area in a very efficient manner to lower costs and improve 
yield. Also, the algorithms have to be efficient to handle the extremely large circuit 
sizes. Efficient algorithms lead to a fast turn-around time and more importantly, permit 
designers to make iterative improvements to layouts to achieve the desired performance. 
The input to the physical design phase is a circuit diagram containing the various cells, 
macro blocks and transistors forming the circuit and the interconnection among them. 
The output of this phase is a placed and routed layout of the circuit. This is accomplished 
in several steps like partitioning, floorplanning, placement, routing and compaction. A 
typical physical design flow is given in Figure 1.2. 
The work presented in this thesis belongs to the placement step of the physical design 
flow. The goal of placement is to find a good arrangement of the components that allows 
for completion of interconnection between them, while meeting the desired performance 
constraints. Placement is a key step in the physical design flow. A poor placement consumes
a larger area and results in poor circuit performance. It generally leads to a difficult
or sometimes impossible routing task. The placement problem can be defined as follows:
Given an electrical circuit consisting of modules with pre-defined input and output ter-
minals and a netlist providing the interconnection between the terminals, construct a 
layout indicating the positions of each module so that the estimated wirelength and 
Figure 1.2 VLSI physical design flow (Circuit Design; Physical Design, comprising Partitioning, Floorplanning and Placement, Routing, Compaction, and Extraction and Verification; Fabrication)
layout area are minimized [17].
Placement can be classified into four main categories depending upon the different 
design styles: 
1. Full Custom or Macro Block: Macro block placement consists of placing a 
number of blocks of different sizes and shapes within the placement region. There 
is generally no restriction on how the blocks are placed within the rectangular 
region except that no two blocks may overlap. The irregularity of the block shapes 
is the main cause of unused areas, which have to be minimized along with the 
objective of minimizing the total wirelength. 
2. Standard Cell: In standard cell placement all the cells have the same height 
but differ in their widths. Cells are placed in pre-defined rows in the placement 
region. With the advent of over-the-cell routing due to the development of 5-6 
metal layer processes, there is no spacing between rows in current designs giving 
rise to channel-less standard cell designs. 
3. Gate Array: In gate arrays, the layout consists of primitive logic gates, which are 
pre-designed and pre-fabricated as a rectangular array, with horizontal and vertical 
routing channels between gates reserved for interconnects. Gate array placement 
consists of mapping the circuit onto the gates and designing the interconnect be-
tween these gates. 
4. Mixed Mode: Mixed mode placement consists of placing both macro blocks and 
standard cells in the same rectangular region. This calls for algorithms which are 
efficient in handling both standard cell and macro block placement so that the 
wirelength objective is minimized. 
In this thesis, we present a placement algorithm for standard cell designs. In standard 
cell designs the placement region is divided into pre-defined rows within which the cells 
have to be placed. All cells in a standard cell design have the same height, while
their widths differ. It is also important to consider the orientation of
standard cells during placement. This is because standard cells are designed so that the
power and ground nets run horizontally through the top and bottom of the cells. Hence,
during standard cell placement, alternate rows of cells are usually flipped to share the
power and ground nets so as to minimize the area. A typical standard cell layout and
associated terminology is shown in Figure 1.3. 
Figure 1.3 Standard cell layout (labels: chip boundary, placement region, standard cell row, wasted space)
CHAPTER 2 PLACEMENT ALGORITHMS 
Placement happens to be one of the most persistent challenges in present day In-
tegrated Circuit design. Designs in current deep sub-micron technology often contain 
over a million placeable components, and are getting larger by the day. Moreover, due 
to the dominance of interconnect delay, placement has become a major contributor to 
timing closure results [21]. Hence, it becomes imperative to have an ultra-fast placement 
algorithm to handle the ever increasing placement problem size. 
2.1 Existing Placement Algorithms 
The VLSI cell placement problem is known to be NP-complete. The most commonly
used objective for cell placement is wirelength minimization. A wide repertoire
of heuristic algorithms exists in the literature for efficiently arranging the cells on a
VLSI chip. These algorithms apply various approaches, including Analytical Placement
[6, 7, 10, 13, 19, 22], Simulated Annealing [16, 23], and Partitioning/Clustering [3, 4, 24].
Among these, the analytical placement approach is quite promising for designing fast 
placement algorithms. Analytical placement algorithms typically utilize a quadratic 
wirelength objective function. Although the quadratic objective function is only an 
indirect measure of the wirelength, its main advantage is that it can be minimized quite 
efficiently. As a result, analytical placement algorithms are relatively efficient in handling 
large problems. They typically employ a flat placement methodology so that a global 
view of the placement problem can be maintained [6, 7, 10, 13, 19, 22]. For simulated 
annealing and partitioning/clustering based approaches, a hierarchical methodology is 
almost always employed to reduce the problem size so as to speed up the resulting 
algorithm [3, 4, 16, 23, 24]. Note that when the placement problem is so large that a 
flat analytical approach cannot handle it effectively, a hierarchical analytical approach 
will be beneficial. Recently, Hu and Marek-Sadowska [11] proposed a fine granularity 
clustering technique, which essentially introduces a two-level hierarchy, to reduce the 
size of large scale placement problems. Analytical placement approaches following a flat 
placement methodology can be easily converted into hierarchical by incorporating the 
above or any other clustering algorithms. 
In this thesis, we deal with Analytical Placement using the quadratic objective function.
The quadratic placement approach has been used in many placers. The
main advantage of the quadratic objective function is that it determines the proper 
ordering of the cells while inherently minimizing the wirelength. Also, the resulting 
formulation is convex and the nature of the problem is such that there exist fast and 
accurate techniques to minimize the objective function. A major concern with this ap-
proach is that it results in a placement with a large amount of overlap between the cells. 
Secondly, the quadratic objective by itself does not give the best possible wirelength. 
Placement algorithms using the quadratic objective fundamentally differ in the way they 
handle the above problems. 
Kleinhans et al. [13] (GORDIAN) use a placement-based bisection technique to re-
cursively divide the circuit and add linear constraints to pull the cells in each partition to 
the center of the corresponding region. The min-cut algorithm of Fiduccia and Matthey-
ses [8] is used to improve the bisection and hence the wirelength. Vygen [22] applies 
a position-based quadrisection technique instead. A splitting-up technique to modify 
the netlist is also proposed to ensure that the cells will stay in the assigned region. 
The splitting-up technique also breaks down long nets and hence makes the objective 
behave like a linear function to some extent. Eisenmann et al. [6] propose constant 
density-induced forces to maintain a force equilibrium state and to distribute the cells 
over the placement region. Their approach involves solving a complex double integral 
formulation to determine the constant forces, which are then used in the quadratic opti-
mization stage to minimize the objective function. Etawil et al. [7] add repelling forces 
to cells sharing a net so as to maintain a target distance between them and attractive 
forces by using fixed dummy cells to pull cells from dense regions to sparse regions in the 
placement area. Hu et al. [10] introduce the idea of a fixed-point as a more general way 
to add forces for cell spreading. The last three papers [6, 7, 10] focus on cell spreading
and do not discuss how to improve the wirelength obtained with a quadratic objective. Sigl
et al. [19] employ a linear objective function so as to have a more direct measure of 
the wirelength. They report a 20% decrease in wirelength over GORDIAN, but their 
technique also incurs a 5X increase in run-time as compared to GORDIAN. Hence,
such an approach may not be scalable to current generation circuits. 
2.2 Proposed Placement Algorithm 
In this thesis we present a fast, iterative, flat placement algorithm called FastPlace 
for large-scale standard cell designs in the fixed-die context. FastPlace is based on the
analytical placement approach. The main contributions of our work are:
• An efficient Cell Shifting technique is proposed to remove cell overlap. The cell 
shifting technique maintains a rough order of the cells in both the horizontal and 
vertical directions as we believe that the quadratic objective function can determine 
a proper cell ordering. Hence, a high-quality global placement with even cell 
distribution can be produced in a short time. 
• An Iterative Local Refinement technique is proposed to reduce the wirelength ac-
cording to the half-perimeter bounding-rectangle measure. This technique is in-
terleaved with Global Optimization and Cell Shifting during the final iterations of 
global placement. This technique is very effective as it makes use of the wirelength 
and cell distribution information provided by a coarse global placement. 
• A Hybrid Net Model is proposed which is a combination of the traditional clique 
and star [14] net models. We prove the equivalence of the hybrid net model to the 
traditional clique and star models. On average, the Hybrid Net Model results in a 
2.95 times reduction in the number of non-zero entries in the connectivity matrix 
as compared to the traditional clique model. This results in a significant speed-up 
of the quadratic program solver. 
We show experimentally that FastPlace is significantly faster than state-of-the-art aca-
demic placers Capo and Dragon, while producing comparable wirelength. 
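The half-perimeter bounding-rectangle measure referred to in the contributions above can be illustrated with a minimal sketch. The data layout (a list of nets, each a list of cell names, plus a dictionary of cell centers) is an assumption made for this example and is not taken from the FastPlace implementation.

```python
def hpwl(pins):
    """Half-perimeter of the bounding rectangle of one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, positions):
    """Sum of half-perimeters over all nets.

    nets      : list of nets, each a list of cell names (hypothetical layout)
    positions : dict mapping cell name -> (x, y) of the cell center
    """
    return sum(hpwl([positions[c] for c in net]) for net in nets)


positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
nets = [["a", "b", "c"], ["a", "c"]]
print(total_hpwl(nets, positions))  # 7.0 + 2.0 = 9.0
```

Unlike the quadratic objective, this measure grows linearly with the extent of a net, which is why it is treated as the more direct estimate of wirelength.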
CHAPTER 3 OVERVIEW OF THE ALGORITHM 
FastPlace essentially consists of three stages. The aim of the first stage is to minimize 
the wirelength and spread the cells over the placement region to obtain a coarse global 
placement. It is composed of an iterative procedure in which we alternate between 
Global Optimization and Cell Shifting. Global Optimization involves minimizing the 
quadratic objective function. During Cell Shifting, the entire placement region is divided 
into equal sized bins. For each bin, we determine its current utilization based on the 
placement obtained from the Global Optimization step. This gives a measure of the 
current placement distribution. The standard cells are then shifted around the placement 
region based upon the bin in which they lie and its current utilization. Shifting is done 
so as to decrease the overlap among the cells and spread them over the placement region. 
Finally, we add a spreading force to all the cells to account for their movement during 
shifting. This is done so that the cells do not collapse back to their original positions 
during the next Global Optimization step. 
The second stage is to refine the global placement by interleaving an Iterative Local 
Refinement technique with Global Optimization and Cell Shifting during the final stages 
of global placement. The Iterative Local Refinement technique is employed to reduce 
the wirelength based on the half-perimeter bounding-rectangle measure and to speed up 
the convergence of the algorithm. This stage of global placement yields an extremely 
well distributed placement solution with a very good value for the total wirelength. The 
wirelength result obtained from this stage is quite close to the final wirelength obtained 
after the detailed placement stage. 
The third stage is that of the Detailed Placement. This consists of legalizing the 
current placement by assigning cells to pre-defined rows in the placement region and 
removing any overlap among them. It also consists of further reducing the wirelength 
by a greedy heuristic. 
The algorithm FASTPLACE is summarized in Figure 3.1 and the individual com-
ponents of the flow are discussed in more detail in Chapters 4-8. 
Algorithm FASTPLACE 
Initialization: Divide the placement region into equal sized 
bins and place the cells in the center. 
Stage 1: Coarse Global Placement (CGP) 
1. Repeat 
2. Perform Global Optimization. 
3. Perform Cell Shifting and Add Spreading Forces. 
4. Until the placement is roughly even. 
Stage 2: Wirelength Improved Global Placement (WIGP) 
1. Repeat 
2. Perform Global Optimization. 
3. Perform Iterative Local Refinement. 
4. Perform Cell Shifting and Add Spreading Forces. 
5. Until the placement is very even. 
Stage 3: Detailed Placement (DP) 
1. Repeat 
2. Further reduce the wirelength using a greedy heuristic. 
3. Legalize the current placement solution. 
4. Until there is no significant improvement in wirelength. 
Figure 3.1 The FASTPLACE algorithm. 
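The control flow of Figure 3.1 can be sketched in Python as below. This is only a skeleton under stated assumptions: every step is passed in as a caller-supplied callable, and the stopping tests and the `min_gain` threshold are hypothetical placeholders, since the figure only says "roughly even", "very even" and "no significant improvement".

```python
def fastplace_flow(global_opt, cell_shift, local_refine, detailed_step,
                   roughly_even, very_even, wirelength, min_gain=0.01):
    """Three-stage loop structure of Figure 3.1 (all arguments are callables).

    min_gain is an assumed relative-improvement threshold for stage 3.
    """
    # Stage 1: coarse global placement
    while True:
        global_opt()
        cell_shift()              # also adds the spreading forces
        if roughly_even():
            break
    # Stage 2: wirelength-improved global placement
    while True:
        global_opt()
        local_refine()
        cell_shift()
        if very_even():
            break
    # Stage 3: detailed placement
    while True:
        before = wirelength()
        detailed_step()           # greedy wirelength reduction, then legalization
        after = wirelength()
        if before - after < min_gain * before:
            break


# Dummy steps that only record the order in which they are called.
trace = []
evenness = iter([False, True, False, True])   # each of stages 1 and 2 runs twice
lengths = iter([100.0, 90.0, 90.0, 89.9])     # stage 3 runs twice

fastplace_flow(
    global_opt=lambda: trace.append("GO"),
    cell_shift=lambda: trace.append("CS"),
    local_refine=lambda: trace.append("ILR"),
    detailed_step=lambda: trace.append("DP"),
    roughly_even=lambda: next(evenness),
    very_even=lambda: next(evenness),
    wirelength=lambda: next(lengths),
)
print(trace)
```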
CHAPTER 4 GLOBAL OPTIMIZATION 
This chapter describes the Global Optimization step, which involves formulating and 
solving the traditional quadratic objective function. The quadratic placement approach 
uses springs to model the connectivity of the circuit. The total potential energy of the 
springs, which is a quadratic function of the length of the springs, is minimized to
produce a placement solution (equivalently, a force-equilibrium state of the spring
system is found). In order to model the circuit by a spring system, each
multi-pin net needs to be transformed into a set of two-pin nets by using a suitable
net model. In other words, the circuit is represented by a weighted hypergraph of
vertices, corresponding to the cells to be placed, and hyperedges, corresponding to the
nets connecting the cells. The hypergraph needs to be transformed to a graph by using a
suitable net model so that it can be worked upon by a quadratic program solver. In the
following discussion we assume that this transformation has been applied to the circuit.
The net models used are discussed in Chapter 5.
Let n be the number of movable cells in the circuit and $(x_i, y_i)$ the coordinates of the
center of cell i. All pins corresponding to a particular cell are assumed to be at the center
of the cell, hence cell and pin coordinates for a particular cell are essentially the same. A
placement of the circuit is given by the two n-dimensional vectors $x = (x_1, x_2, \ldots, x_n)$
and $y = (y_1, y_2, \ldots, y_n)$. Consider the net between two movable cells i and j in the
circuit. Let $W_{ij}$ be the weight of the net connecting them. Then the cost of the edge
between the two movable cells is given by:
$$\frac{1}{2} W_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] \tag{4.1}$$
If a cell i is connected to a fixed cell f with coordinates $(x_f, y_f)$, the cost of the net is
given by:
$$\frac{1}{2} W_{if} \left[ (x_i - x_f)^2 + (y_i - y_f)^2 \right] \tag{4.2}$$
Consequently, the objective function summing up the cost of all the nets can be written
in matrix notation as [9]:
$$\Phi(x, y) = \frac{1}{2} x^T Q x + d_x^T x + \frac{1}{2} y^T Q y + d_y^T y + \text{constant} \tag{4.3}$$
where Q is an n x n symmetric positive definite matrix and $d_x$, $d_y$ are n-dimensional
vectors. Since equation (4.3) is separable into $\Phi(x, y) = \Phi(x) + \Phi(y)$, we consider only
the x-dimension for subsequent discussion. This is given as:
$$\Phi(x) = \frac{1}{2} x^T Q x + d_x^T x + \text{constant} \tag{4.4}$$
Let $q_{ij}$ be the entry in row i and column j of matrix Q. From expression (4.1), the cost in
the x-direction between two movable cells i and j is $\frac{1}{2} W_{ij} (x_i^2 + x_j^2 - 2 x_i x_j)$. The first and
second terms contribute $W_{ij}$ to $q_{ii}$ and $q_{jj}$ respectively. The third term contributes $-W_{ij}$
to $q_{ij}$ and $q_{ji}$. From expression (4.2), the cost in the x-direction between a movable cell
i and a fixed cell f is $\frac{1}{2} W_{if} (x_i^2 + x_f^2 - 2 x_i x_f)$. The first term contributes $W_{if}$ to $q_{ii}$. The
third term contributes $-W_{if} x_f$ to the vector $d_x$ at row i and the second term contributes
to the constant part of equation (4.4). The objective function (4.4) is minimized by solving
the system of linear equations represented by:
$$Q x + d_x = 0 \tag{4.5}$$
Equation (4.5) gives the solution to the unconstrained problem of minimizing the
quadratic function in (4.4). In FastPlace, we solve such an unconstrained minimization
problem throughout the placement process. We do not add any constraint to the placement
process. This is because the spreading forces added during the Cell Shifting
step are produced by pseudo nets connecting the cells to the chip boundary. These only
introduce terms of the form of expression (4.2) and hence only modify the Q matrix
and the $d_x$ vector as described above.
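The assembly rules for Q and the $d_x$ vector described in this chapter can be made concrete with a small sketch. It assumes the two-pin nets are already available as tuples, uses dense Python lists, and solves the system by Gaussian elimination purely for illustration. These are all choices made for this toy example rather than details of FastPlace, which uses a sparse iterative solver.

```python
def build_system(n, movable_nets, fixed_nets):
    """Assemble Q and d_x for the x-dimension.

    n            : number of movable cells, indexed 0..n-1
    movable_nets : list of (i, j, w) two-pin nets between movable cells
    fixed_nets   : list of (i, x_f, w) two-pin nets from cell i to a fixed point
    """
    Q = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i, j, w in movable_nets:
        Q[i][i] += w              # x_i^2 term
        Q[j][j] += w              # x_j^2 term
        Q[i][j] -= w              # -2 x_i x_j term, split over q_ij and q_ji
        Q[j][i] -= w
    for i, x_f, w in fixed_nets:
        Q[i][i] += w              # x_i^2 term
        d[i] -= w * x_f           # -2 x_i x_f term
    return Q, d


def solve(Q, d):
    """Solve Q x = -d by Gaussian elimination (adequate for a toy example only)."""
    n = len(d)
    A = [row[:] + [-d[i]] for i, row in enumerate(Q)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x


# Two movable cells chained between fixed pads at x = 0 and x = 3, unit weights.
Q, d = build_system(2, [(0, 1, 1.0)], [(0, 0.0, 1.0), (1, 3.0, 1.0)])
x = solve(Q, d)
print(Q, d, x)
```

In this example the two cells settle at x = 1 and x = 2, the evenly spaced force-equilibrium positions of three identical springs in series.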
CHAPTER 5 HYBRID NET MODEL 
To handle the large placement problem size, a fast and accurate technique is needed
to solve equation (4.5). Since the connectivity matrix Q is sparse, symmetric and positive
definite, we use the preconditioned conjugate gradient method [2] to solve equation (4.5).
The Incomplete Cholesky Factorization [2, 12] of matrix Q is used as the preconditioner.
The run-times for computing the Incomplete Cholesky Factorization and the Conjugate
Gradient solver are directly proportional to the number of non-zero entries in the matrix
Q. This, in turn, is determined by the number of two-pin nets in the circuit. Hence, it becomes
imperative to choose a good net model so as to have minimal non-zero entries in the
matrix Q.
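To show why the solver cost follows the number of non-zero entries, here is a minimal sketch of a preconditioned conjugate gradient iteration on a sparse matrix. Note the substitution: for brevity it uses a Jacobi (diagonal) preconditioner instead of the Incomplete Cholesky preconditioner named above, and the row-dictionary storage format is an assumption of this example.

```python
def pcg(rows, b, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for Q x = b, Q symmetric positive definite.

    rows : sparse Q as a list of {column: value} dicts, one per row
    NOTE: a Jacobi (diagonal) preconditioner stands in for Incomplete Cholesky.
    """
    n = len(b)

    def matvec(v):
        # Cost is one multiply-add per stored non-zero entry of Q.
        return [sum(val * v[j] for j, val in rows[i].items()) for i in range(n)]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    inv_diag = [1.0 / rows[i][i] for i in range(n)]
    x = [0.0] * n
    r = b[:]                                   # residual b - Q x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Qp = matvec(p)
        alpha = rz / dot(p, Qp)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Qp[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


# Chain of 4 movable cells between fixed pads at x = 0 and x = 5, unit weights.
rows = [{0: 2.0, 1: -1.0},
        {0: -1.0, 1: 2.0, 2: -1.0},
        {1: -1.0, 2: 2.0, 3: -1.0},
        {2: -1.0, 3: 2.0}]
b = [0.0, 0.0, 0.0, 5.0]                       # b = -d_x
x = pcg(rows, b)
print([round(v, 6) for v in x])                # close to [1, 2, 3, 4]
```

Each iteration performs one matrix-vector product, so fewer non-zero entries in Q translate directly into less work per iteration.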
To speed up the solver, we propose a Hybrid Net Model which is a combination of the 
clique and star net models. Experimental results in Chapter 10 show that the Hybrid 
Net Model can reduce the number of non-zero entries in the matrix Q by 2.95 times over 
the traditional clique model. In the subsequent discussion, we give a brief overview of 
the clique and star net models and introduce the hybrid net model. Then, we prove the 
equivalence of the clique and star models and hence the consistency of the hybrid net 
model. 
5.1 Clique, Star and Hybrid Net Models 
The clique model is the traditional model used in analytical placement algorithms. 
In the clique model, a k-pin net is replaced by k(k - 1)/2 two-pin nets forming a clique. 
Let W be the weight of the k-pin net. Some commonly used values for the weight of the 
two-pin nets are W/(k - 1) (e.g., [22]) and 2W/k (e.g., [6, 13]). The clique model for a 
5-pin net is illustrated in Figure 5.1(a).
Figure 5.1 Netlist models for a k (= 5) pin net: (a) Clique Model (b) Star Model
Mo et al. [14] recently utilized the star net model in a macro-cell placer. In the star
model, each net has a star node to which all the pins of the net are connected. Hence a
k-pin net will yield k two-pin nets. The star model for a 5-pin net is illustrated in Figure
5.1(b). Mo et al. [14] point out that the clique model generates on average 30% more
two-pin nets than the star model for the MCNC92 macro block benchmarks, even though
their approach creates a star node even for two-pin nets. Vygen [22] also switches to a
star model for very large nets to reduce the number of terms in the objective function, but
does not show the validity of mixing the clique and star models in quadratic placement.
In addition, neither paper discusses how the weights of the nets introduced
by the star model should be set.
We prove in the following section that for a k-pin net of weight W, if we set the
weights of the two-pin nets introduced to $\gamma W$ in the clique model and $k \gamma W$ in the star
model for any $\gamma$, the clique model is equivalent to the star model. Therefore, the two
net models can be used interchangeably. We propose a Hybrid Net Model which uses
a clique model for two-pin and three-pin nets and a star model for nets with four or
more pins. We set $\gamma$ to 1/(k - 1) in FastPlace as it works better experimentally. For
nets with four or more pins, the proposed model greatly reduces the number of two-pin 
nets in the circuit. Consequently, it reduces the number of non-zero entries in the Q 
matrix. By using the clique model for two-pin nets, we will not introduce one extra net 
and two extra variables per two-pin net as in [14]. We choose to use the clique model for 
three-pin nets because it is better than the star model for the following reasons: First, if 
two cells are connected by more than one two-pin or three-pin net in the original netlist, 
the two-pin nets generated by the clique model between the two cells can be combined 
and will introduce only a single non-zero entry in the matrix Q. Second, it will not 
introduce an extra pair of variables. The Hybrid Net Model greatly reduces the number 
of non-zero entries in the matrix Q. This significantly speeds up the computation of the
Incomplete Cholesky Factorization and the Conjugate Gradient solver, as compared to 
using the traditional clique model. 
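The net-splitting rule of the Hybrid Net Model can be sketched as follows. The function names, the tuple representation of two-pin nets and the `("star", net_id)` naming of the star node are hypothetical. The weights follow the text: $\gamma W$ for clique edges and $k \gamma W$ for star edges, with $\gamma = 1/(k - 1)$.

```python
from itertools import combinations


def expand_net(net_id, pins, weight=1.0, model="hybrid"):
    """Return the two-pin nets (u, v, w) that replace one k-pin net (k >= 2).

    model is "clique", "star" or "hybrid" (clique for k <= 3, star otherwise).
    """
    k = len(pins)
    gamma = 1.0 / (k - 1)
    use_clique = model == "clique" or (model == "hybrid" and k <= 3)
    if use_clique:
        return [(u, v, gamma * weight) for u, v in combinations(pins, 2)]
    star = ("star", net_id)       # extra movable node, one per net
    return [(p, star, k * gamma * weight) for p in pins]


def count_two_pin_nets(netlist, model):
    """Total number of two-pin nets produced for a list of multi-pin nets."""
    return sum(len(expand_net(i, pins, model=model))
               for i, pins in enumerate(netlist))


netlist = [["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d", "e"]]
print(count_two_pin_nets(netlist, "clique"))  # 1 + 3 + 10 = 14
print(count_two_pin_nets(netlist, "star"))    # 2 + 3 + 5 = 10
print(count_two_pin_nets(netlist, "hybrid"))  # 1 + 3 + 5 = 9
```

The count of two-pin nets is an upper bound on the off-diagonal pairs in Q, because two-pin nets between the same pair of cells merge into a single entry, as noted above for the clique model.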
5.2 Equivalence of the Hybrid Net Model to the Clique and 
Star Net Models 
In this section, we show that the clique model is equivalent to the star model in 
quadratic placement if net weights are set appropriately. It follows that the clique, star 
and Hybrid net models are all equivalent. 
Lemma 1 For any net in the star model, under force equilibrium, the star node is at 
the center of gravity of all pins of the net. 
Proof: Consider a k-pin net. Let $x_s$ be the x-coordinate of the star node and let $W_s$
be the weight of the two-pin nets introduced. Then the total force on the star node by
all the pins of the net is given by:
$$F = \sum_{j=1}^{k} W_s (x_j - x_s)$$
Under force equilibrium, the total force $F = \sum_{j=1}^{k} W_s (x_j - x_s) = 0$. Therefore,
$$x_s = \frac{1}{k} \sum_{j=1}^{k} x_j \tag{5.1}$$
Hence the lemma follows. ∎
Theorem 1 For a k-pin net, if the weight of the two-pin nets introduced is set to $W_c$
in the clique model and $k W_c$ in the star model, the clique model is equivalent to the star
model in quadratic placement.
Proof: For the clique model, the total force on a pin i by all the other pins of the net
is given by:
$$F_i^{clique} = W_c \sum_{j=1,\, j \neq i}^{k} (x_j - x_i) \qquad (5.2)$$
For the star model, all the pins of the net are connected to the star node. Hence, the
force on a pin i due to the star node is given by:
$$\begin{aligned}
F_i^{star} &= k W_c (x_s - x_i) \\
&= k W_c \left( \frac{\sum_{j=1}^{k} x_j}{k} - x_i \right) \quad \text{(by Lemma 1)} \\
&= W_c \left( \sum_{j=1}^{k} x_j - k x_i \right) \\
&= W_c \sum_{j=1,\, j \neq i}^{k} (x_j - x_i) \\
&= F_i^{clique}
\end{aligned}$$
As the forces are the same in both the models for all pins, the two net models are
equivalent. ∎
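The equivalence can also be checked numerically. The following is a minimal sketch for a single net in the x-dimension, with the star node placed at the center of gravity of the pins as Lemma 1 requires. The function names are hypothetical.

```python
def clique_force(xs, i, w_c):
    """Force on pin i from all other pins of the net in the clique model."""
    return w_c * sum(xj - xs[i] for j, xj in enumerate(xs) if j != i)

def star_force(xs, i, w_c):
    """Force on pin i from the star node, with star weight k * w_c."""
    k = len(xs)
    x_star = sum(xs) / k  # Lemma 1: star node sits at the center of gravity
    return k * w_c * (x_star - xs[i])

xs = [0.0, 2.0, 5.0, 9.0]  # pin x-coordinates of one 4-pin net
for i in range(len(xs)):
    print(i, clique_force(xs, i, 0.5), star_force(xs, i, 0.5))
```

For every pin the two forces agree, which is the statement of Theorem 1 for this net.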
CHAPTER 6 CELL SHIFTING 
Global Optimization gives a placement which minimizes the quadratic objective function.
However, it does not consider the overlap among cells. Therefore, the resulting
placement has a lot of cell overlap and is not distributed over the placement area. Cell 
Shifting evens out the placement by distributing the cells over the placement region 
while retaining their relative ordering obtained during the Global Optimization step. 
6.1 Calculation of Bin Utilization 
In the initialization step the placement region is divided into equal sized bins as 
shown in Figure 6.1. The area of each bin is such that it can accommodate an average of
4 cells. Based on the current placement of cells obtained from the Global Optimization 
step, the utilization of each bin (Ui) is then computed. Ui is defined as the total area of 
all cells inside bin i. In calculating Ui we sum the areas of all standard cells which are 
completely covered by bin i and the overlap area between the bin and the standard cell 
for cells which partially overlap with bin i. The standard cells are then shifted around 
the placement region based upon the bin in which they lie and its current utilization. 
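The utilization computation described above can be sketched as follows. This is an illustrative version that assumes rectangular cells given by their lower-left corner, width and height, a placement region starting at the origin, and all cells lying inside that region. The function name and data layout are hypothetical.

```python
def bin_utilization(cells, bin_w, bin_h, nx, ny):
    """Return util[row][col]: total cell area falling inside each bin.

    cells: list of (x_left, y_bottom, width, height) tuples.
    """
    util = [[0.0] * nx for _ in range(ny)]
    for x, y, w, h in cells:
        # Range of bin columns and rows that the cell can touch.
        c0 = max(int(x // bin_w), 0)
        c1 = min(int((x + w) // bin_w), nx - 1)
        r0 = max(int(y // bin_h), 0)
        r1 = min(int((y + h) // bin_h), ny - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                # Overlap of the cell rectangle with bin (r, c).
                ox = min(x + w, (c + 1) * bin_w) - max(x, c * bin_w)
                oy = min(y + h, (r + 1) * bin_h) - max(y, r * bin_h)
                if ox > 0 and oy > 0:
                    util[r][c] += ox * oy
    return util

# Two unit cells in a row of two 2x2 bins; the second cell straddles the boundary.
cells = [(0.5, 0.5, 1.0, 1.0), (1.5, 0.5, 1.0, 1.0)]
print(bin_utilization(cells, 2.0, 2.0, 2, 1))  # [[1.5, 0.5]]
```

A cell fully covered by a bin contributes its whole area, while a cell that straddles a boundary contributes only the overlapping part to each bin.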
Figure 6.1 Regular bin structure 
6.2 Shifting of Cells 
Let us consider the case where the cells are shifted in the x-dimension. To shift cells 
in the x-dimension, we go through every row of the regular bin structure and move cells 
present in the row. Shifting of cells is a two step process. First, based on the current 
utilization of all the bins in a particular row an unequal bin structure reflecting the 
current bin utilization is constructed. Second, every cell belonging to a particular bin in 
the old regular bin structure is then linearly mapped to the corresponding new bin in the 
unequal bin structure. As a result of this mapping, cells in bins with a high utilization
will shift so as to reduce the utilization of those bins and the overlap amongst themselves.
These steps are described in more detail below. For shifting cells in the y-dimension 
we consider every column of the regular bin structure and follow the same two steps as 
described above. 
[Figure: bin utilization along one row of bins, with bins i and i+1 marked; panel (a) distribution before spreading, panel (b) distribution after spreading]
Figure 6.2 (a) Regular bin structure (b) Unequal bin structure and utilization after shifting
To illustrate the shifting consider a particular row in the regular bin structure as 
shown in Figure 6.1 (shaded row) . The utilization of all the bins in this row is given in 
Figure 6.2(a). Typically, during the initial placement stages, most of the cells will be 
clustered in the center of the placement region. Hence, bins in the center will have a 
very high utilization as compared to bins in the periphery. The unequal bin structure 
constructed from the regular bin structure is illustrated in Figure 6.2(b). It can be 
observed from Figure 6.2(b) that the new bins in the center of the placement region 
are wider than the original bins because of a high value of the bin utilization. Also, 
original bins having a small value of the bin utilization become narrower in the new bin 
structure. 
To get the equation for the new bin structure, from Figure 6.2 let,
• $OB_i$ : x-coordinate of the boundary of bin i corresponding to the regular bin
structure
• $NB_i$ : x-coordinate of the boundary of bin i corresponding to the unequal bin
structure
Then,
$$NB_i = \frac{OB_{i-1}(U_{i+1} + \delta) + OB_{i+1}(U_i + \delta)}{U_i + U_{i+1} + 2\delta} \qquad (6.1)$$
The intuition behind the above formula is that the new bin should be constructed in
such a way that it averages the utilization of bin i and bin i + 1. The reason for having
the parameter $\delta$ is as follows: Let $\delta = 0$ and the utilization of bin i + 1, $U_{i+1} = 0$, then
from equation (6.1) it can be seen that the x-coordinate of the new bin boundary of bin
i, $NB_i = OB_{i+1}$ and for bin i + 1, $NB_{i+1} = OB_i$. This results in a crossing of the bin
boundaries in the new bin structure which results in improper mapping of the cells. To
avoid this crossing we need the parameter $\delta$ which is set to a value of 1.5.
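Equation (6.1) can be evaluated for one row of bins as in the sketch below. It assumes that bin i lies between boundaries i-1 and i, and that the two outermost boundaries of the row stay fixed; the latter is an assumption made here for illustration and is not stated in the text. The function name is hypothetical.

```python
def new_bin_boundaries(ob, util, delta=1.5):
    """Apply equation (6.1) to the interior boundaries of one row.

    ob:   n + 1 boundaries of the regular bins, ob[0] .. ob[n].
    util: n utilizations, util[i - 1] belonging to bin i (between ob[i-1] and ob[i]).
    """
    n = len(util)
    assert len(ob) == n + 1
    nb = list(ob)  # assumed: the outer boundaries ob[0] and ob[n] do not move
    for i in range(1, n):
        u_i, u_next = util[i - 1], util[i]
        nb[i] = (ob[i - 1] * (u_next + delta) + ob[i + 1] * (u_i + delta)) / (
            u_i + u_next + 2 * delta
        )
    return nb

ob = [0.0, 10.0, 20.0, 30.0]
print(new_bin_boundaries(ob, [4.0, 4.0, 4.0]))             # equal utilization: unchanged
print(new_bin_boundaries(ob, [2.0, 10.0, 2.0]))            # dense middle bin becomes wider
print(new_bin_boundaries(ob, [1.0, 0.0, 1.0], delta=0.0))  # boundaries cross
print(new_bin_boundaries(ob, [1.0, 0.0, 1.0]))             # delta = 1.5 keeps them ordered here
```

With equal utilization the boundaries stay where they are. With $\delta = 0$ and an empty middle bin the two interior boundaries swap, which is the crossing described above; in this small example $\delta = 1.5$ keeps them in order.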
Following the creation of the unequal bin structure we do a linear mapping of all
cells present in a bin of the regular bin structure to the corresponding bin in the unequal
bin structure. If,
• $x_j$ : x-coordinate of cell j in bin i before mapping (obtained from the Global
Optimization step)
• $x_j'$ : x-coordinate of cell j in bin i after mapping
Then,
$$\frac{x_j - OB_{i-1}}{OB_i - OB_{i-1}} = \frac{x_j' - NB_{i-1}}{NB_i - NB_{i-1}}$$
or,
$$x_j' = \frac{NB_i (x_j - OB_{i-1}) + NB_{i-1} (OB_i - x_j)}{OB_i - OB_{i-1}} \qquad (6.2)$$
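The linear mapping preserves the relative position of a cell inside its bin. A minimal sketch with hypothetical argument names, where `ob_lo` and `ob_hi` stand for $OB_{i-1}$ and $OB_i$, and `nb_lo` and `nb_hi` for $NB_{i-1}$ and $NB_i$:

```python
def map_cell(x, ob_lo, ob_hi, nb_lo, nb_hi):
    """Map x from the old bin [ob_lo, ob_hi] to the new bin [nb_lo, nb_hi]."""
    return (nb_hi * (x - ob_lo) + nb_lo * (ob_hi - x)) / (ob_hi - ob_lo)

# A cell 20% of the way into the old bin [10, 20] lands 20% into the new bin [4, 24].
print(map_cell(12.0, 10.0, 20.0, 4.0, 24.0))  # 8.0
```

Cells on the old bin boundaries map exactly onto the new bin boundaries, so the ordering of cells within a row is retained.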
During the initial placement iterations, bins in the center of the placement region 
have an extremely high bin utilization value. Consequently, cells in such bins will have a 
tendency to shift over large distances. This will perturb the current placement solution 
by a large amount. This effect gets added over iterations and will result in a final 
placement solution with a high value of the total wirelength. 
Therefore, to control the actual distance moved by any cell during the shifting step
we introduce two Movement Control Parameters, $\alpha_x$ and $\alpha_y$ (< 1) for the x and y
dimensions respectively. $\alpha_x$ and $\alpha_y$ are increasing functions which are inversely
proportional to the maximum bin utilization and have a very small value during the initial
placement iterations. Considering the x-dimension, the actual distance moved by cell j
is $\alpha_x |x_j' - x_j|$ (where $x_j$ and $x_j'$ have the same definitions as before). This is just a
fraction of the total distance to be moved by the cell.
This way, the cells are shifted over very small distances during the initial placement
iterations. During the later placement iterations, the cells will be distributed quite
evenly and hence will not have a tendency to shift over large distances. Therefore $\alpha$ can
take a larger value to accelerate convergence. The expressions for $\alpha_x$ and $\alpha_y$ are:
$$\alpha_y = 0.02 + \frac{0.5}{\max_i(U_i)}$$
$$\alpha_x = 0.02 + \left( \frac{0.5}{\max_i(U_i)} \right) \left( \frac{averageCellWidth}{cellHeight} \right)$$
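The damping can be sketched as follows. The example assumes that the damped position is obtained by moving the cell a fraction $\alpha$ of the way from its current position towards its mapped position, which is one reading of the description above. The function names are hypothetical.

```python
def alpha_y(max_util):
    """Movement control parameter for the y-dimension."""
    return 0.02 + 0.5 / max_util

def alpha_x(max_util, avg_cell_width, cell_height):
    """Movement control parameter for the x-dimension."""
    return 0.02 + (0.5 / max_util) * (avg_cell_width / cell_height)

def damped_position(x, x_mapped, alpha):
    """Move only a fraction alpha of the distance towards the mapped position."""
    return x + alpha * (x_mapped - x)

# Early iteration: very high maximum utilization, so the cell barely moves.
a_early = alpha_y(50.0)
# Later iteration: lower maximum utilization, so the cell moves further.
a_late = alpha_y(5.0)
print(a_early, damped_position(10.0, 20.0, a_early))
print(a_late, damped_position(10.0, 20.0, a_late))
```

As the maximum bin utilization falls over the iterations, $\alpha$ grows and the cells are allowed to cover a larger share of the mapped displacement.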
Figure 6.3 Cell distribution before and after shifting: (a) Original bin structure and cell distribution (b) New bin structure and cell distribution after Cell Shifting
Figure 6.2(b) gives the utilization of the bins after the shifting step. Figure 6.3 shows 
the positions of cells in a particular row before and after Cell Shifting. It can be seen 
that cells j, k and l in bin i have spread out to reduce the overlap amongst themselves. 
6.3 Addition of Spreading Forces 
After the cells have been shifted in the x and y dimensions, additional forces need to 
be added to them so that they do not collapse back to their previous positions during 
the subsequent Global Optimization step. This is achieved by connecting each cell to a 
corresponding pseudo pin added at the boundary of the placement region. The pseudo 
pin and pseudo net addition is illustrated in Figure 6.4. 
[Figure: a cell at its original and target positions, the force components netForce_x and netForce_y with their resultant, and the pseudo pin on the boundary connected by a pseudo net that pulls the cell to the target position]
Figure 6.4 Pseudo pin and Pseudo net addition
Let $(x_j, y_j)$ and $(x_j', y_j')$ be the original and target position of cell j before and
after cell shifting. Since $(x_j, y_j)$ is the equilibrium position obtained by the Global
Optimization step, the total force acting on cell j in this position is zero. When it is
moved to the target position it will experience a force due to its connectivity with the 
other cells (or equivalently, the star node) in the placement region. This force can also 
be viewed as the force required to move the cell from the original to the target position. 
If,
• $netForce_x$ : x-component of the force required to move the cell
• $netForce_y$ : y-component of the force required to move the cell
• $pd_x$ : x-component of the distance between the pseudo pin and target position of
cell j
• $pd_y$ : y-component of the distance between the pseudo pin and target position of
cell j
Then, the position of the pseudo pin can be determined by the intersection of the
resultant force vector with the chip boundary. A pseudo net for cell j is one which
connects cell j from its target position to its pseudo pin. The spring constant or the
edge weight for the pseudo net is given as:
$$\beta = \frac{\sqrt{netForce_x^2 + netForce_y^2}}{\sqrt{pd_x^2 + pd_y^2}}$$
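The pseudo pin location and the weight $\beta$ can be sketched as below. The example assumes a rectangular chip boundary, a target position strictly inside it, and a force vector that points from the target position towards the boundary where the pseudo pin is placed. The function name and argument layout are hypothetical.

```python
import math

def pseudo_pin(target, force, chip):
    """Return ((pin_x, pin_y), beta) for one cell.

    target: (x, y) target position of the cell, strictly inside the chip.
    force:  (netForce_x, netForce_y), assumed to point towards the pseudo pin.
    chip:   (xmin, ymin, xmax, ymax) of the placement region.
    """
    x, y = target
    fx, fy = force
    xmin, ymin, xmax, ymax = chip
    if fx == 0 and fy == 0:
        return None, 0.0  # no force to balance, so no pseudo net is needed
    # Smallest positive step t at which the ray target + t * force hits the boundary.
    t = math.inf
    if fx > 0:
        t = min(t, (xmax - x) / fx)
    if fx < 0:
        t = min(t, (xmin - x) / fx)
    if fy > 0:
        t = min(t, (ymax - y) / fy)
    if fy < 0:
        t = min(t, (ymin - y) / fy)
    pdx, pdy = t * fx, t * fy
    beta = math.hypot(fx, fy) / math.hypot(pdx, pdy)
    return (x + pdx, y + pdy), beta

pin, beta = pseudo_pin((5.0, 5.0), (3.0, 4.0), (0.0, 0.0, 10.0, 10.0))
print(pin, beta)
```

In this example the pin lands on the top edge at (8.75, 10.0), and $\beta$ times the pin distance (3.75, 5.0) reproduces the force (3.0, 4.0), so the pseudo net exactly balances the net force at the target position.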
Since the pseudo pin is a fixed pin present at the boundary, we know from equation 
( 4.2) and the subsequent analysis given in Chapter 4, that only the diagonal of the Q 
matrix and the dx and dy vectors need to be updated for every cell. Hence, it takes only 
a single pass of O(n) time, where n is the total number of movable cells in the circuit, 
to regenerate the connectivity matrix for the next Global Optimization step. Thus we 
have incorporated an extremely fast Cell Shifting technique to distribute the cells evenly 
over the placement region. 
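The constant-time update per cell can be sketched as follows. The sketch assumes the common quadratic placement form in which the x-objective is written as $\frac{1}{2} x^T Q x + d_x^T x$ plus a constant, so that a connection of weight $\beta$ to a fixed pin adds $\beta$ to the diagonal entry and $-\beta$ times the pin coordinate to the linear term. The sign convention of Chapter 4 may differ, and all names here are hypothetical.

```python
def add_pseudo_net(q_diag, d_x, d_y, j, beta, pin):
    """Add a pseudo net of weight beta between movable cell j and a fixed pin."""
    q_diag[j] += beta          # only the diagonal of Q changes
    d_x[j] -= beta * pin[0]    # linear term for the x-dimension
    d_y[j] -= beta * pin[1]    # linear term for the y-dimension

# One cell connected only to its pseudo pin: the minimizer is the pin itself.
q_diag, d_x, d_y = [0.0], [0.0], [0.0]
add_pseudo_net(q_diag, d_x, d_y, 0, 2.0, (3.0, 4.0))
x_opt = -d_x[0] / q_diag[0]
y_opt = -d_y[0] / q_diag[0]
print(x_opt, y_opt)  # 3.0 4.0
```

Each cell needs one such update, so a single pass over the n movable cells is enough, with no change to the off-diagonal entries of Q.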
CHAPTER 7 ITERATIVE LOCAL REFINEMENT 
As previously stated, the quadratic objective function on its own does not yield the 
best possible result in terms of wirelength as it is just an indirect measure of the linear 
wirelength. Previous approaches using the quadratic objective function like GORDIAN 
employ min-cut partitioning with some local swapping between placement regions to 
reduce the wirelength. The swapping is done based upon the cut values generated 
during the partitioning stage. 
To offset the disadvantage of the quadratic objective, we incorporate an Iterative 
Local Refinement technique to further reduce the wirelength. The Iterative Local Re-
finement technique is interleaved with the Cell Shifting and Global Optimization steps 
during the WIGP stage of global placement. This technique acts on a coarse global 
placement obtained from the previous stage and hence is very effective in minimizing 
the wirelength. Unlike other approaches, this technique uses the actual position of a 
cell and the half-perimeter bounding rectangle measure of all nets connected to the cell 
for moving it around the placement region. The technique is based on a greedy heuris-
tic which mainly tries to minimize the wirelength while trying to reduce the current 
maximum bin utilization so as to speed-up the convergence of the algorithm. 
7.1 Bin Structure 
The Local Refinement technique also employs a regular bin structure for performing 
wirelength improvement. The bin structure is used to estimate the current utilization 
of a placement region as before. Cells are then moved from source bins to target bins 
based upon the wirelength improvement and target bin utilization. This bin structure is 
slightly different from that used in the Cell Shifting step. During the first iteration of the 
WIGP stage, the width and height of each bin for the Refinement is set to 5 times that 
of the bin used during the Cell Shifting step. This is done so as to enable cell movement 
over large distances. Such large bins are constructed to minimize the wirelength for long 
nets which might span over a large part of the placement area. The width and height 
of the bins for the refinement technique are gradually brought down to the values used 
in the Cell Shifting step over subsequent iterations of the WIGP stage. 
7.2 Description of the Technique 
Once the utilization of all the bins in the placement region has been determined, 
we traverse through all the cells in the placement region and determine their respective 
source bins. For every cell present in a bin we compute four scores corresponding to the 
four possible cell movement directions. For calculating the score, we assume that a cell is 
moving from its current position in a source bin to the same position in a target bin which 
is adjacent to it. That is, we move the cell by one bin width. Each score is a weighted 
sum of two components: The first component is the wirelength reduction for the move. 
The wirelength is computed as the total half-perimeter of the bounding rectangle of all 
nets connecting to the cell. Hence it is much more accurate than the quadratic objective 
function used in the Global Optimization step. The second component is a function of 
the utilization of the source and target bins. Since the Local Refinement technique is 
mainly used to reduce the wirelength, a higher weight is used for the first component. 
If all the four scores are negative, the cell will stay in the current bin. Otherwise, it will 
move to the target bin with the highest score for the move. During one iteration of the 
Local Refinement, we traverse through all the bins in the placement region and follow 
the above steps for cell movement. Subsequently, this iteration is repeated until there is 
no significant improvement in the wirelength. 
The Iterative Local Refinement technique is then followed by the Cell Shifting step 
in which we add pseudo pins and pseudo nets to reflect the current changed placement. 
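The scoring of a single cell can be sketched as below. The text does not give the weights or the exact utilization function, so the values `w_wl` and `w_util` and the use of the difference between source and target utilization are assumptions made purely for illustration. Pins are taken to sit at cell positions, and all names are hypothetical.

```python
def hpwl(points):
    """Half-perimeter of the bounding rectangle of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def cell_wirelength(cell, nets, pos):
    """Total HPWL of all nets connected to the cell."""
    return sum(hpwl([pos[c] for c in net]) for net in nets if cell in net)

def move_score(cell, step, nets, pos, u_src, u_dst, w_wl=0.8, w_util=0.2):
    """Weighted sum of wirelength reduction and a utilization term (assumed form)."""
    before = cell_wirelength(cell, nets, pos)
    moved = dict(pos)
    moved[cell] = (pos[cell][0] + step[0], pos[cell][1] + step[1])
    after = cell_wirelength(cell, nets, moved)
    return w_wl * (before - after) + w_util * (u_src - u_dst)

def choose_move(cell, nets, pos, bin_w, bin_h, u_src, u_nbr):
    """Return the best of the four directions, or None if all scores are negative."""
    steps = {"E": (bin_w, 0.0), "W": (-bin_w, 0.0), "N": (0.0, bin_h), "S": (0.0, -bin_h)}
    scores = {d: move_score(cell, s, nets, pos, u_src, u_nbr[d]) for d, s in steps.items()}
    best = max(scores, key=scores.get)
    return None if scores[best] < 0 else best

pos = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (10.0, 0.0)}
nets = [["a", "b"], ["a", "c"]]
same = {"E": 4.0, "W": 4.0, "N": 4.0, "S": 4.0}
print(choose_move("a", nets, pos, 5.0, 5.0, 4.0, same))  # E
```

Cell a is pulled east by both of its nets, so moving one bin width to the east gives the only non-negative score. A cell already at its best position receives four negative scores and stays in its bin.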
CHAPTER 8 DETAILED PLACEMENT 
Our algorithm performs fixed-die placement. In fixed-die placement, the placement 
region, row structure and legal positions within rows where the standard cells have to be 
placed are pre-defined. The height of each row in the placement region is equal to the 
standard cell height. The benchmarks on which our algorithm has been tested have
no space between the standard cell rows. This reflects present generation 5 or 6 metal 
layer processes, in which most of the inter-cell routing follows an over-the-cell routing 
procedure. The final placement solution obtained from the algorithm should have all 
the cells assigned to legal positions in the placement region. 
The Detailed Placement stage legalizes the solution obtained from the global place-
ment stage. It assigns all the standard cells to the pre-defined rows of the placement 
region. Once the cells are assigned to the rows, any remaining overlap among them 
is removed and they are assigned to legal positions within the rows. During legaliza-
tion, the detailed placement also tries to further reduce the wirelength. To reduce the 
wirelength, this stage employs a technique similar to the Iterative Local Refinement 
Technique described in Chapter 7. 
8.1 Bin Structure and Wirelength Reduction 
The bin structure employed for wirelength reduction during the Detailed Placement 
stage is different from the ones used in the previous stages. In this stage, the bins are 
constructed using the pre-defined rows of the placement region. A point to be noted here 
is that the height of the bins in the previous stages is much greater than the standard
cell height whereas it is equal to the standard cell height for the Detailed Placement 
stage. 
As mentioned before, this stage employs a technique similar to the Iterative Local 
Refinement technique for wirelength reduction. The difference between the Detailed 
Placement technique and the Local Refinement technique is that the Detailed Placement 
technique acts on cells which have been assigned to the actual rows present in the 
placement region. Secondly, it puts a higher weight on the bin utilization component 
than the wirelength reduction component. This is because the emphasis of this stage is 
on the removal of overlap among cells so as to obtain a legalized placement free of any 
overlap. 
CHAPTER 9 OTHER ATTEMPTED APPROACHES 
As a part of the research we tried out various approaches for the different steps of the 
placement algorithm before fixing on the steps and the flow described in the previous 
chapters. Given below are a few of the approaches which can be further explored in
the future.
9.1 Objective Function 
Instead of using a quadratic objective function, we tried using a linear objective 
function. A linear objective gives a more direct measure of the wirelength as compared 
to the quadratic objective function. The basic difference observed between the two 
objective functions is that the quadratic objective function tends to make the wirelength 
of very long nets shorter than what is done by the linear objective. This is done at the 
expense of the shorter nets, which become slightly longer. In other words, the standard 
deviation of the net lengths is smaller for a quadratic objective function than for a linear 
objective function [15]. 
The linear objective function was set up similarly to that of GORDIANL [19] and
was optimized using the quadratic programming techniques described in [19]. It was 
observed that the linear objective function, coupled with Cell Shifting and Iterative 
Local Refinement did not give us the desired wirelength results. The wirelength obtained 
by using the linear objective function was much greater than that obtained by using the 
quadratic objective function. Secondly, the time taken by the solver to converge to a 
solution during the Global Optimization step using the linear objective was much greater 
than that taken by using the quadratic objective. 
We attribute the increase in the wirelength to the various parameters used during the 
Cell Shifting and the Iterative Local Refinement steps. The Global Optimization, Cell 
Shifting and Iterative Local Refinement steps are very closely related to each other and 
a change in the parameters in one of the steps greatly affects the behavior of the others. 
We tried changing the parameters used in the Cell Shifting and Local Refinement steps, 
but were unable to obtain a good wirelength using the linear objective. Future work 
can be done to change the parameters so that they can be used with the linear objective 
function. The increase in the run-time of the Global Optimization step can be attributed 
to the quadratic programming techniques used in [19] to optimize the objective function. 
Future work can be done to determine efficient techniques to optimize the objective 
function so as to integrate the linear objective into our flow. 
9.2 Cell Shifting 
In our current Cell Shifting approach, the cells are shifted simultaneously in the x 
and y dimensions based on the placement obtained after the Global Optimization step. 
Previously, we used an approach in which the cells were shifted in only the x dimension 
till the bin utilization came down to a target bin utilization for that dimension. Based 
on the x coordinates of the cells obtained after this step, they were then shifted in the y 
dimension until the bin utilization reached a target utilization for the y dimension. We 
observed that the placement obtained from such an approach was not well distributed 
over the placement region and had clusters in the corners. This can be attributed to the 
fact that shifting in the two dimensions is inter-related and we cannot shift cells in one
dimension without any information of the positions of the cells in the other dimension. 
We also tried out a variety of formulae for bin based Cell Shifting. They were based 
on a three step approach. In the first step we constructed the regular bin structure 
to estimate the utilization as described before. In the second step, an unequal bin 
structure was constructed wherein the bin width was inversely proportional to the bin 
utilization calculated in the previous step. Bins with a very high utilization during step 
one became very narrow in the bin structure constructed during step two. We then 
went through all the cells and determined their bins in the unequal bin structure. In 
the third step the regular bin structure was again constructed and cells in a particular 
bin of the unequal bin structure were then linearly mapped to corresponding bins in 
the regular bin structure constructed during step 3. The disadvantage of this approach 
was that it resulted in cells shifting over long distances during Cell Shifting. Hence, the 
final wirelength obtained was quite high as compared to that obtained using the current 
approach. Also, this approach was more time consuming than the two step approach 
being currently adopted for Cell Shifting because we need to go through all the cells 
twice and determine their corresponding bins for spreading them. Future work can be 
done to determine other formulae which can be used for bin based Cell Shifting. 
CHAPTER 10 EXPERIMENTAL RESULTS 
10.1 Placement Benchmarks 
Table 10.1 Placement benchmark statistics
Circuit   #Nodes  #Terminals    #Nets    #Pins  #Rows
ibm01      12506         246    14111    50566     96
ibm02      19342         259    19584    81199    109
ibm03      22853         283    27401    93573    121
ibm04      27220         287    31970   105859    136
ibm05      28146        1201    28446   126308    139
ibm06      32332         166    34826   128182    126
ibm07      45639         287    48117   175639    166
ibm08      51023         286    50513   204890    170
ibm09      53110         285    60902   222088    183
ibm10      68685         744    75196   297567    234
ibm11      70152         406    81454   280786    208
ibm12      70439         637    77240   317760    242
ibm13      83709         490    99666   357075    224
ibm14     147088         517   152772   546816    305
ibm15     161187         383   186608   715823    303
ibm16     182980         504   190048   778823    347
ibm17     184752         743   189581   860036    379
ibm18     210341         272   201920   819697    361
FastPlace is implemented in C. The benchmarks used in our experiments are derived 
from the ISPD-02 suite downloaded from [1]. These benchmarks consist of macro blocks 
and hence had to be modified to be tested on FastPlace. The height of all the macro 
blocks was brought down to the standard cell height. The average width of all the 
modules in the original benchmark was computed and the width of all macros exceeding
4 times the average width was set to 4 times the average width. The placement
region was correspondingly modified to reflect these changes. All designs in the derived 
set have a whitespace of 10%. The IBM-Place Benchmarks used in Dragon [23] could 
not be used because they do not have any connectivity information between the movable 
cells and the fixed terminals on the placement boundary. This information is essential 
for a quadratic placement approach. Statistics for the placement benchmarks are given 
in Table 10.1. 
10.2 Comparison Between Clique and Hybrid Net Models 
To compare different net models, we consider two implementations of FastPlace, one 
by incorporating the traditional clique model and the other by incorporating the hybrid 
net model described in Chapter 5. Table 10.2 gives a comparison of the non-zero entries 
in the matrix Q for the two implementations. Columns 2 and 3 of Table 10.2 give 
the number of non-zero entries of matrix Q in the clique model and the hybrid model 
respectively. Column 4 gives the ratio of the number of non-zero entries in the clique 
model to that in the hybrid model. It can be seen that, on average, the hybrid net model 
leads to 2.95 times fewer non-zero entries in the matrix Q than the clique model. 
Tables 10.3 and 10.4 give a break-up of the run-times for the different steps of the 
algorithm incorporating the clique and hybrid models respectively. In both the tables, 
Table 10.2 Clique net model versus Hybrid net model
Circuit   #Non-zero Entries   #Non-zero Entries   Ratio of #Non-zero Entries
          (Clique Model)      (Hybrid Model)      (Clique / Hybrid)
ibm01            109183               41164               2.65
ibm02            343409               70014               4.90
ibm03            206069               74680               2.76
ibm04            220423               84556               2.61
ibm05            349676              108282               3.23
ibm06            321308              106835               3.01
ibm07            373328              147009               2.54
ibm08            732550              173541               4.22
ibm09            478777              185102               2.59
ibm10            707969              251101               2.82
ibm11            508442              230865               2.20
ibm12            748371              270849               2.76
ibm13            744500              295048               2.52
ibm14           1125147              456474               2.46
ibm15           1751474              607289               2.88
ibm16           1923995              668491               2.88
ibm17           2235716              753507               2.97
ibm18           2221860              711702               3.12
Average                                                   2.95
Column 2 gives the time taken to determine the Incomplete Cholesky factorization of 
matrix Q. Column 3 gives the time taken by the Conjugate Gradient solver. Together, 
they account for the run-time of the Global Optimization step. Columns 4 and 5 give 
the time taken by the Cell Shifting and Iterative Local Refinement steps. Column 6 
gives the run-time of the Detailed Placement step. 
Table 10.3 Run-times for different steps of FastPlace (Clique net model)
          Incomplete      Conjugate                   Iterative     Detailed
Circuit   Cholesky        Gradient    Cell Shifting   Local         Placement
          Factorization   Solver                      Refinement
          (sec)           (sec)       (sec)           (sec)         (sec)
ibm01         2.95           9.16         1.67           7.29          1.54
ibm02        26.9           38.31         4.35          15.73          3.17
ibm03         8.13          16.81         4.19          20.38          2.38
ibm04         6.56          23.20         3.72          16.81          2.77
ibm05        15.35          24.11         7.00          24.92          3.73
ibm06        10.84          37.59         5.50          24.43          4.18
ibm07        13.01          61.66         8.41          40.20          5.64
ibm08        39.10         107.51         9.57          53.07          6.51
ibm09        21.12          93.67        14.54          35.46          8.01
ibm10        33.81         125.32        13.17          73.68         10.60
ibm11        18.78         103.25        15.35          48.20          8.51
ibm12        37.84         147.14        14.38          60.10         10.64
ibm13        49.00         174.03        22.85          72.72          9.78
ibm14        77.45         295.81        36.49         112.49         22.06
ibm15       149.77         494.48        44.41         200.72         33.17
ibm16       177.36         556.70        62.91         216.71         53.17
ibm17       203.49         627.97        51.97         275.95         53.75
ibm18       195.91         621.39        52.44         356.78         76.65
For the clique model, we found that the Incomplete Cholesky factorization and the 
Conjugate Gradient solver together consumed 66% of the total run-time over all 18 
benchmark circuits. Assuming a direct proportionality between the number of non-zero 
Table 10.4 Run-times for different steps of FastPlace (Hybrid net model)
          Incomplete      Conjugate                   Iterative     Detailed
Circuit   Cholesky        Gradient    Cell Shifting   Local         Placement
          Factorization   Solver                      Refinement
          (sec)           (sec)       (sec)           (sec)         (sec)
ibm01         0.44           4.95         1.42           6.40          1.66
ibm02         1.08          10.81         3.09          17.93          4.07
ibm03         1.04          13.23         3.61          16.24          2.36
ibm04         1.12          15.73         4.20          19.76          3.71
ibm05         2.17          12.36         6.24          29.04          6.20
ibm06         1.22          17.22         4.05          25.20          3.55
ibm07         2.38          47.59         7.94          33.54          4.89
ibm08         2.97          46.50         8.93          43.40          7.99
ibm09         3.35          60.41        12.71          37.91          9.15
ibm10         4.94          77.89        12.47          63.50         13.33
ibm11         4.23          76.65        14.82          49.14         12.87
ibm12         5.39          79.41        12.51          59.86         11.45
ibm13         6.68         115.74        17.54          64.95         12.83
ibm14        11.93         202.99        32.91         140.02         24.54
ibm15        20.83         328.53        43.88         224.12         38.85
ibm16        22.56         373.21        54.80         310.98         52.84
ibm17        22.11         360.68        39.82         368.83         55.80
ibm18        23.95         413.48        57.89         353.94         58.81
entries in matrix Q and the run-time of the factorization and solver, the average run-time 
for these two steps should decrease by a factor of 2.95. Consequently, there should be 
a 1.7X speed-up in the total run-time over the 18 benchmark circuits by incorporating
the hybrid net model as compared to the clique model. Our experimental results show 
an average speed-up of 1.5X in the total run-time on incorporating the hybrid model as 
compared to incorporating the clique model. 
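The expected figure follows from a simple proportional argument: if a fraction of the run-time is accelerated by a given factor and the rest is unchanged, the overall speed-up is the reciprocal of the new relative run-time. A small check of this arithmetic, with a hypothetical function name:

```python
def expected_speedup(fraction, factor):
    """Overall speed-up when `fraction` of the run-time becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# 66% of the clique-model run-time is factorization plus solver, sped up by 2.95.
print(round(expected_speedup(0.66, 2.95), 2))  # 1.77
```

The result of about 1.77 corresponds to the 1.7X estimate quoted above, against a measured average of 1.5X.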
Table 10.5 Total run-time results for the two approaches
Circuit   Total Run-time   Total Run-time   Speed-up
          Clique Model     Hybrid Model     (Clique / Hybrid)
          (sec)            (sec)
ibm01          22.61            14.87           1.5
ibm02          88.46            36.98           2.4
ibm03          51.89            36.48           1.4
ibm04          53.06            44.52           1.2
ibm05          75.11            56.01           1.3
ibm06          82.54            51.24           1.6
ibm07         128.92            96.34           1.3
ibm08         215.76           109.79           2.0
ibm09         172.80           123.53           1.4
ibm10         256.58           172.13           1.6
ibm11         194.09           157.71           1.2
ibm12         270.10           168.62           1.6
ibm13         328.38           217.74           1.5
ibm14         544.30           412.39           1.3
ibm15         922.55           656.21           1.4
ibm16        1066.85           814.39           1.3
ibm17        1213.13           847.24           1.4
ibm18        1303.17           908.07           1.4
Average                                         1.5
The total run-time results for the two implementations are summarized in Table 
10.5. Columns 2 and 3 of Table 10.5 give the total run-time of the placement algorithm 
incorporating the clique and hybrid net models respectively. Column 4 gives the ratio of 
the total run-time using the clique net model to that using the hybrid net model. Thus 
it can be seen that by using the hybrid net model a significant speed-up can be achieved 
over using the traditional clique model. 
10.3 Comparison Between Placement Algorithms: FastPlace, 
Capo 8.6 and Dragon 2.2.3 
FastPlace is compared with two state-of-the-art academic placers - Capo 8.6 [3] and
Dragon 2.2.3 [23]. All the experiments are run on a Sun Sparc-2, 750 MHz machine. 
We run MetaPl-Capo8.6 for Solaris, which incorporates Capo, orientation optimizer and 
row ironing in the default mode. Dragon is run in the fixed die mode. For FastPlace we 
run experiments incorporating the clique and hybrid net models. The wirelength and 
runtime comparisons between Capo and FastPlace for all 18 benchmarks are given in 
Tables 10.6 and 10.8. The wirelength and runtime comparisons between Dragon and 
FastPlace for all 18 benchmarks are given in Tables 10.7 and 10.9. Tables 10.10 and 10.11 
give the comparison results of FastPlace with Capo and Dragon respectively for circuits 
with over 100k cells when FastPlace is run incorporating the hybrid model. Columns
2 and 3 in all the tables give the Half-Perimeter Wirelength (HPWL) of the respective 
placers. Column 4 in Tables 10.6, 10.8 and 10.10 gives the ratio of the FastPlace to Capo 
wirelength. Column 4 in Tables 10.7, 10.9 and 10.11 gives the ratio of the FastPlace to 
Dragon wirelength. Columns 5 and 6 in all the tables give the run times of the respective 
placers. Column 7 in Tables 10.6, 10.8 and 10.10 gives the speed-up of FastPlace over 
Capo. Column 7 in Tables 10.7, 10.9 and 10.11 gives the speed-up of FastPlace over
Dragon. 
For the Hybrid Net Model, from Table 10.6 it can be seen that on average, FastPlace
is 11.0 times faster than Capo with comparable wirelength. The average wirelength of
FastPlace over 18 circuits is just 0.8% higher than Capo. Also, for circuits with over
100k cells, it can be seen from Table 10.10 that FastPlace is 8.0 times faster than Capo
with the wirelength results being slightly better. From Table 10.7 it can be seen that on
average, FastPlace is 82.7 times faster than Dragon with the average wirelength being
just 1.6% higher. For circuits with over 100k cells, it can be seen from Table 10.11
that FastPlace is 77.2 times faster than Dragon with the wirelength results being 1.8%
lower than Dragon.
For the Clique Net Model, from Table 10.8 it can be seen that on average, FastPlace
is 7.5 times faster than Capo with the wirelength being 0.9% higher. From Table 10.9 it
can be seen that on average, FastPlace is 56 times faster than Dragon with the average
wirelength being 1.4% higher.
Table 10.6 Comparison of placement results with Capo 8.6 (Hybrid net model)
          Capo      FP        Wirelength   Capo       FP         Speed-Up
Circuit   HPWL      HPWL      Ratio        Run-time   Run-time
          (x 1e6)   (x 1e6)   (FP/Capo)                          (Capo/FP)
ibm01      1.86      1.91     1.03         4m 2s      15s        x16.1
ibm02      4.12      4.02     0.98         6m 59s     37s        x11.3
ibm03      5.17      5.45     1.05         8m 23s     36s        x14.0
ibm04      6.44      6.63     1.03         10m 44s    44s        x14.6
ibm05     10.51     10.96     1.04         10m 42s    56s        x11.5
ibm06      5.69      5.55     0.97         12m 27s    51s        x14.6
ibm07      9.50      9.56     1.00         18m 5s     1m 36s     x11.3
ibm08     10.23     10.01     0.98         19m 41s    1m 50s     x10.7
ibm09     10.49     11.26     1.07         22m 43s    2m 03s     x11.1
ibm10     19.58     19.31     0.99         28m 25s    2m 45s     x10.3
ibm11     15.57     16.03     1.03         30m 23s    2m 37s     x11.6
ibm12     25.72     25.04     0.97         30m 47s    2m 48s     x11.0
ibm13     19.02     19.46     1.02         38m 31s    3m 38s     x10.8
ibm14     35.90     36.09     1.01         1h 11m     6m 52s     x10.6
ibm15     43.64     45.21     1.04         1h 26m     10m 56s    x7.9
ibm16     49.70     48.43     0.97         1h 35m     13m 34s    x7.0
ibm17     68.68     68.09     0.99         1h 46m     14m 07s    x7.5
ibm18     47.66     46.89     0.98         1h 44m     15m 08s    x6.9
Average                       1.008                              x11.0
Table 10.7 Comparison of placement results with Dragon 2.2.3 (Hybrid net model)
          Dragon    FP        Wirelength    Dragon     FP         Speed-Up
Circuit   HPWL      HPWL      Ratio         Run-time   Run-time
          (x 1e6)   (x 1e6)   (FP/Dragon)                         (Dragon/FP)
ibm01      1.84      1.91     1.04          29m 6s     15s        x116.4
ibm02      3.98      4.02     1.01          31m 13s    37s        x50.6
ibm03      5.31      5.45     1.03          31m 49s    36s        x53.0
ibm04      6.22      6.63     1.07          1h 5m      44s        x88.5
ibm05     10.35     10.96     1.06          1h 48m     56s        x115.3
ibm06      5.45      5.55     1.02          1h 21m     51s        x95.9
ibm07      9.26      9.56     1.03          1h 47m     1m 36s     x67.0
ibm08      9.66     10.01     1.04          4h 30m     1m 50s     x147.4
ibm09     11.03     11.26     1.02          3h 43m     2m 03s     x108.9
ibm10     19.46     19.31     0.99          3h 19m     2m 45s     x72.5
ibm11     15.36     16.03     1.04          2h 22m     2m 37s     x54.2
ibm12     24.74     25.04     1.01          3h 48m     2m 48s     x81.5
ibm13     19.32     19.46     1.01          3h 4m      3m 38s     x50.8
ibm14     35.77     36.09     1.01          7h 37m     6m 52s     x66.6
ibm15     43.39     45.21     1.04          10h 34m    10m 56s    x58.0
ibm16     49.54     48.43     0.98          12h 6m     13m 34s    x53.5
ibm17     73.45     68.09     0.93          26h 54m    14m 07s    x114.3
ibm18     48.59     46.89     0.96          23h 39m    15m 08s    x93.7
Average                       1.016                               x82.7
Table 10.8 Comparison of placement results with Capo 8.6 (Clique net model)

Circuit  Capo HPWL    FP HPWL   Wirelength Ratio  Capo Run-time    FP Run-time  Speed-Up
         (x 1e6)      (x 1e6)   (FP/Capo)                                       (Capo/FP)
ibm01    1.86         1.91      1.03              4m 2s            23s          x10.5
ibm02    4.12         3.97      0.96              6m 59s           1m 28s       x4.8
ibm03    5.17         5.33      1.03              8m 23s           52s          x9.7
ibm04    6.44         6.37      0.99              10m 44s          53s          x12.2
ibm05    10.51        10.90     1.04              10m 42s          1m 15s       x8.6
ibm06    5.69         5.54      0.97              12m 27s          1m 22s       x9.1
ibm07    9.50         9.19      0.97              18m 5s           2m 09s       x8.4
ibm08    10.23        10.09     0.98              19m 41s          3m 36s       x5.5
ibm09    10.49        11.03     1.05              22m 43s          2m 53s       x7.9
ibm10    19.58        19.72     1.01              28m 25s          4m 16s       x6.7
ibm11    15.57        15.99     1.03              30m 23s          3m 14s       x9.4
ibm12    25.72        25.41     0.99              30m 47s          4m 30s       x6.8
ibm13    19.02        19.68     1.03              38m 31s          5m 28s       x7.0
ibm14    35.90        36.66     1.02              1h 11m           9m 4s        x7.8
ibm15    43.64        45.52     1.04              1h 26m           15m 22s      x5.6
ibm16    49.70        49.29     0.99              1h 35m           17m 47s      x5.3
ibm17    68.68        70.73     1.03              1h 46m           20m 13s      x5.3
ibm18    47.66        48.35     1.01              1h 44m           21m 43s      x4.8
Average                         1.009                                           x7.5
Table 10.9 Comparison of placement results with Dragon 2.2.3 (Clique net model)

Circuit  Dragon HPWL  FP HPWL   Wirelength Ratio  Dragon Run-time  FP Run-time  Speed-Up
         (x 1e6)      (x 1e6)   (FP/Dragon)                                     (Dragon/FP)
ibm01    1.84         1.91      1.04              29m 6s           23s          x75.9
ibm02    3.98         3.97      0.99              31m 13s          1m 28s       x21.3
ibm03    5.31         5.33      1.00              31m 49s          52s          x36.7
ibm04    6.22         6.37      1.02              1h 5m            53s          x73.5
ibm05    10.35        10.90     1.05              1h 48m           1m 15s       x86.0
ibm06    5.45         5.54      1.02              1h 21m           1m 22s       x59.7
ibm07    9.26         9.19      0.99              1h 47m           2m 09s       x49.9
ibm08    9.66         10.09     1.04              4h 30m           3m 36s       x75.1
ibm09    11.03        11.03     1.00              3h 43m           2m 53s       x77.5
ibm10    19.46        19.72     1.01              3h 19m           4m 16s       x46.7
ibm11    15.36        15.99     1.04              2h 22m           3m 14s       x43.9
ibm12    24.74        25.41     1.03              3h 48m           4m 30s       x50.7
ibm13    19.32        19.68     1.02              3h 4m            5m 28s       x33.7
ibm14    35.77        36.66     1.02              7h 37m           9m 4s        x50.4
ibm15    43.39        45.52     1.05              10h 34m          15m 22s      x41.2
ibm16    49.54        49.29     0.99              12h 6m           17m 47s      x40.8
ibm17    73.45        70.73     0.96              26h 54m          20m 13s      x79.8
ibm18    48.59        48.35     0.99              23h 39m          21m 43s      x65.3
Average                         1.014                                           x56

Table 10.10 Comparison of placement results with Capo 8.6 for circuits with
over 100k cells (Hybrid net model)

Circuit  Capo HPWL    FP HPWL   Wirelength Ratio  Capo Run-time    FP Run-time  Speed-Up
         (x 1e6)      (x 1e6)   (FP/Capo)                                       (Capo/FP)
ibm14    35.90        36.09     1.01              1h 11m           6m 52s       x10.6
ibm15    43.64        45.21     1.04              1h 26m           10m 56s      x7.9
ibm16    49.70        48.43     0.97              1h 35m           13m 34s      x7.0
ibm17    68.68        68.09     0.99              1h 46m           14m 07s      x7.5
ibm18    47.66        46.89     0.98              1h 44m           15m 08s      x6.9
Average                         0.998                                           x8.0
Table 10.11 Comparison of placement results with Dragon 2.2.3 for circuits
with over 100k cells (Hybrid net model)

Circuit  Dragon HPWL  FP HPWL   Wirelength Ratio  Dragon Run-time  FP Run-time  Speed-Up
         (x 1e6)      (x 1e6)   (FP/Dragon)                                     (Dragon/FP)
ibm14    35.77        36.09     1.01              7h 37m           6m 52s       x66.6
ibm15    43.39        45.21     1.04              10h 34m          10m 56s      x58.0
ibm16    49.54        48.43     0.98              12h 6m           13m 34s      x53.5
ibm17    73.45        68.09     0.93              26h 54m          14m 07s      x114.3
ibm18    48.59        46.89     0.96              23h 39m          15m 08s      x93.7
Average                         0.984                                           x77.2
10.4 Scalability Analysis and Placement Figures 
To determine the scalability of FastPlace, we plot the run-time versus the total
number of pins, which is a good measure of the circuit size, in logarithmic scale for all 18
benchmarks in Figure 10.1. We notice that the data points can be closely approximated
by a straight line of slope 1.412. Since a straight line of slope s on a log-log plot
corresponds to a run-time proportional to n^s, the run-time of FastPlace is roughly
O(n^1.412), where n is the circuit size given by the number of pins.
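The slope of such a log-log plot can be estimated with an ordinary least-squares fit on the logarithms of the data. The sketch below illustrates the idea; the function name `loglog_slope` is hypothetical, and the data points are synthetic values generated from a known power law, not the measurements behind Figure 10.1.

```python
import math


def loglog_slope(sizes, runtimes):
    """Least-squares slope and intercept of log(run-time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (NOT the thesis measurements): run-times generated from
# t = c * n^1.412 so that the fit has a known answer.
pins = [50e3, 100e3, 200e3, 500e3, 1000e3]
times = [2e-6 * n ** 1.412 for n in pins]
slope, intercept = loglog_slope(pins, times)
print(f"fitted exponent: {slope:.3f}")
```

On this noise-free data the fit recovers the exponent used to generate it; on measured run-times the slope is only an empirical estimate of the growth rate over the tested range of circuit sizes.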
Appendix A includes several figures that display the placement at different
stages of the algorithm for the circuit ibm01. Figure A.1 gives the placement obtained
after the first Global Optimization step. Figure A.2 gives the placement after 5 iterations
of the Coarse Global Placement stage. It can be seen that the cells have shifted by a small
amount during these iterations. Figure A.3 gives the placement after the Coarse Global
Placement stage. This placement is already quite spread out, so it can provide useful
wirelength and cell distribution information to the Iterative Local Refinement technique
applied during the Wirelength Improved Global Placement stage. Figure A.4 gives the
placement after the Wirelength Improved Global Placement stage. It can be seen that the
cells are very evenly distributed over the placement region at the end of this stage.
Finally, Figure A.5 gives the final placement after the Detailed Placement stage.
Figure 10.1 Run-time of FastPlace versus number of pins in logarithmic scale
[Log-log plot; horizontal axis: #Pins (K), 50 to 1000; vertical axis: Run-time (s), 20 to 1000]
CHAPTER 11 CONCLUSIONS 
The objectives that placement algorithms must handle are growing more complex by
the day. This calls for the development of faster placement algorithms to handle such
immense complexity. Future placement algorithms must be capable of placing circuits
with several million components in a reasonable time. In addition, they must be flexible
enough to accommodate changes in the design style of VLSI circuits.
In light of these facts, this thesis has presented FastPlace, an efficient and
scalable placement algorithm for large-scale standard cell circuits. FastPlace is based on
an Analytical Placement approach and uses a quadratic objective function to handle the
wirelength minimization problem. Three techniques are proposed to handle the drawbacks
associated with the quadratic objective function. First, an efficient Cell Shifting
technique is proposed to remove cell overlap and spread the cells over the placement area,
yielding a high-quality global placement with even cell distribution and proper cell
ordering in a very short time. Second, an Iterative Local Refinement technique is proposed
to reduce the wirelength according to the half-perimeter bounding-rectangle measure.
Third, a Hybrid Net Model is proposed, which is a combination of the traditional clique
and star models. This model greatly reduces the number of non-zero entries in the
connectivity matrix Q and results in a significant speed-up of the quadratic program
solver.
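To illustrate why combining the clique and star models reduces the number of non-zero entries in Q, the sketch below counts the two-pin connections that each model creates for a set of nets. It assumes that the hybrid model uses the clique model for nets with at most three pins and the star model for larger nets; that threshold (`clique_limit`), the function names and the list of net sizes are all assumptions made for illustration.

```python
def clique_edges(k):
    """A k-pin net modelled as a clique connects every pair of pins."""
    return k * (k - 1) // 2


def star_edges(k):
    """A k-pin net modelled as a star connects each pin to one extra star node."""
    return k


def hybrid_edges(k, clique_limit=3):
    """Clique for small nets, star otherwise (clique_limit is an assumed threshold)."""
    return clique_edges(k) if k <= clique_limit else star_edges(k)


def total_edges(net_sizes, model):
    """Total number of two-pin connections over all nets under a given model."""
    return sum(model(k) for k in net_sizes)


# Hypothetical pin counts for a handful of nets, chosen only for illustration.
nets = [2, 2, 3, 4, 5, 8, 20]
for name, model in [("clique", clique_edges),
                    ("star", star_edges),
                    ("hybrid", hybrid_edges)]:
    print(name, total_edges(nets, model))
```

Each connection between two movable objects contributes a symmetric pair of off-diagonal entries to Q, so the edge count is a proxy for matrix density. The clique count grows quadratically with the net degree, while the star model grows linearly at the cost of one extra variable per net.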
The current implementation produces placements comparable to those of existing
state-of-the-art academic placers, but in significantly less run-time. Such an ultra-fast
placement tool is very much needed for the timing convergence of the layout phase of IC
design. The run-time of FastPlace can be further reduced by incorporating it into the FPI
framework of [11] or in a general hierarchical framework, and by applying the algebraic
multigrid method of [5] to solve the system of linear equations (4.5). Future extensions
to the current FastPlace algorithm could include considering other placement objectives
such as timing, routing congestion, variable whitespace allocation and interconnect issues.
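As a reminder of the kind of linear system that dominates the run-time of a quadratic placer, the sketch below solves a toy one-dimensional instance with a plain conjugate gradient iteration. It assumes the standard quadratic placement form Qx = b, which may differ in notation from equation (4.5); the solver shown is neither the algebraic multigrid method of [5] nor necessarily the solver used in FastPlace, and the three-cell instance is made up for illustration.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given as a mat-vec function."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start point x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm is small enough
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy 1-D instance (assumed): fixed pads at x = 0 and x = 10, three movable
# cells connected in a chain pad - c1 - c2 - c3 - pad with unit net weights.
Q = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [0.0, 0.0, 10.0]                 # contribution of the fixed pads


def q_times(v):
    """Multiply the connectivity matrix Q by the vector v."""
    return [sum(qij * vj for qij, vj in zip(row, v)) for row in Q]


x = conjugate_gradient(q_times, b)
print([round(v, 3) for v in x])
```

The minimizer spaces the three cells evenly between the pads, at 2.5, 5 and 7.5. For real circuits Q is large and sparse, which is why both the number of non-zero entries and the choice of solver matter for run-time.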
APPENDIX PLACEMENT FIGURES 
Figure A.1 Initial placement (Circuit: ibm01)
Figure A.2 After 5 iterations during stage 1 of global placement (Circuit: ibm01)
Figure A.3 After stage 1 of global placement (Circuit: ibm01)
Figure A.4 After stage 2 of global placement (Circuit: ibm01)
Figure A.5 Final placement solution (Circuit: ibm01)
BIBLIOGRAPHY 
[1] S. Adya and I. L. Markov, ISPD02 Placement Benchmarks, GSRC Wirelength-driven
Standard Cell Placement Slot, 2002. URL: http://vlsicad.eecs.umich.edu/BK/ISPD02bench/
[2] R. Barrett, M. Berry, et al., Templates for the Solution of Linear Systems: Building 
Blocks for Iterative Methods. SIAM, Second Edition, 1994. 
[3] A. E. Caldwell, A. B. Kahng and I. L. Markov, "Can Recursive Bisection Alone 
Produce Routable Placements?" In IEEE/ACM Design Automation Conference, 
2000, Pages 477-482. 
[4] T. Chan, J. Cong, T. Kong and J. R. Shinnerl, "Multilevel Optimization for
Large-Scale Circuit Placement." In Proc. International Conference on Computer-Aided
Design, 2000, Pages 171-176.
[5] H. Chen, C.-K. Cheng, N.-C. Chou, A. Kahng, J. MacDonald, P. Suaris, B. Yao and
Z. Zhu, "An algebraic multigrid solver for analytical placement with layout based
clustering." In Proc. ACM/IEEE Design Automation Conference, 2003, Pages 794-799.
[6] H. Eisenmann and F. M. Johannes, "Generic Global Placement and Floorplanning." 
In Proc. 35th ACM/IEEE Design Automation Conference, 1998, Pages 269-274. 
[7] H. Etawil, S. Areibi and A. Vannelli, "Attractor-Repeller Approach for Global
Placement." In Proc. International Conference on Computer-Aided Design, 1999, Pages
20-24.
[8] C. M. Fiduccia and R. M. Mattheyses, "A linear-time heuristic for improving network
partitions." In Proc. ACM/IEEE Design Automation Conference, 1982, Pages 175-181.
[9] K. M. Hall, "An r-dimensional quadratic placement algorithm." Management Science,
Vol 17, 1970, Pages 219-229.
[10] B. Hu and M. Marek-Sadowska, "FAR: Fixed-Points Addition & Relaxation Based 
Placement." In Proc. International Symposium on Physical Design, 2002, Pages 
161-166. 
[11] B. Hu and M. Marek-Sadowska, "Fine granularity clustering for large scale
placement problems." In Proc. International Symposium on Physical Design, 2003, Pages
67-74.
[12] D. S. Kershaw, "The Incomplete Cholesky Conjugate Gradient Method for the 
iterative solution of systems of linear equations." Journal of Computational Physics, 
Vol 26, 1978, Pages 43-65. 
[13] J. M. Kleinhans, G. Sigl, F. M. Johannes and K. J. Antreich, "GORDIAN: VLSI
Placement by Quadratic Programming and Slicing Optimization." IEEE Transactions on
Computer-Aided Design, Vol 10, No 3, 1991, Pages 356-365.
[14] F. Mo, A. Tabbara and R. K. Brayton, "A Force-Directed Macro-Cell Placer." In 
Proc. International Conference on Computer-Aided Design, 2000, Pages 177-180. 
[15] B. T. Preas and M. J. Lorenzetti, Physical Design Automation of VLSI Systems. 
Benjamin/Cummings Publishing Company, 1998. 
[16] C. Sechen and A. L. Sangiovanni-Vincentelli, "TimberWolf 3.2: A new standard cell
placement and global routing package." In Proc. ACM/IEEE Design Automation
Conference, 1986, Pages 432-439.
[17] K. Shahookar and P. Mazumder, "VLSI Cell Placement Techniques." ACM Computing
Surveys, Vol 23, No 2, June 1991.
[18] N. Sherwani, Algorithms for VLSI Physical Design Automation. Kluwer Academic 
Publishers, Third Edition, 1999. 
[19] G. Sigl, K. Doll and F. M. Johannes, "Analytical Placement: A Linear or a
Quadratic Objective Function?" In Proc. 28th ACM/IEEE Design Automation Conference,
1991, Pages 427-431.
[20] M. J. S. Smith, Application Specific Integrated Circuits. Addison-Wesley, June 1997. 
[21] P. Villarrubia, "Important placement considerations for modern VLSI chips." In Proc.
International Symposium on Physical Design, 2003, Page 6.
[22] J. Vygen, "Algorithms for Large-Scale Flat Placement." In Proc. 34th Design
Automation Conference, 1997, Pages 746-751.
[23] M. Wang, X. Yang and M. Sarrafzadeh, "Dragon 2000: Standard-Cell Placement
Tool for Large Industry Circuits." In Proc. International Conference on Computer-Aided
Design, 2000, Pages 260-263.
[24] M. C. Yildiz and P. H. Madden, "Global objectives for standard cell placement." 
In Proc. 11th Great Lakes Symposium on VLSI, 2001, Pages 68-72. 
ACKNOWLEDGEMENTS 
I would like to express my deepest gratitude to my advisor, Dr. Chris C.-N. Chu, 
for giving me the opportunity to work under his guidance at Iowa State University. His 
motivating and enlightening discussions, timely suggestions and constructive criticisms 
have been a major driving force during the course of my research work. 
I am also greatly indebted to my family members, who have supported me throughout
the years and have been a constant source of encouragement when I needed
them the most. Thanks are also due to my friends. I really appreciate their patience in
listening to the numerous descriptions of my research and my constant ramblings, and
their support during the rough times. Their company has made my stay at Iowa State
University an unforgettable experience.
Finally, my thanks go to all those who have contributed directly or indirectly to
the successful completion of this work.
