Simultaneous placement and timing optimization using buffer insertion, cell replication and gate sizing by Nowak, Brian Thomas
Retrospective Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 
1-1-2001 
Simultaneous placement and timing optimization using buffer 
insertion, cell replication and gate sizing 
Brian Thomas Nowak 
Iowa State University 
Recommended Citation 
Nowak, Brian Thomas, "Simultaneous placement and timing optimization using buffer insertion, cell 
replication and gate sizing" (2001). Retrospective Theses and Dissertations. 21462. 
https://lib.dr.iastate.edu/rtd/21462 
Simultaneous placement and timing optimization using buffer 
insertion, cell replication and gate sizing 
by 
Brian Thomas Nowak 
A thesis submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
MASTER OF SCIENCE 
Major: Computer Engineering 
Major Professor: Chris Chu 
Iowa State University 
Ames, Iowa 
2001 
Copyright© Brian Thomas Nowak, 2001. All rights reserved. 
Graduate College 
Iowa State University 
This is to certify that the Master's thesis of 
Brian Thomas Nowak 
has met the thesis requirements of Iowa State University 
Signatures have been redacted for privacy 
TABLE OF CONTENTS 
LIST OF FIGURES ............................................. iv
LIST OF TABLES ............................................... v
ABSTRACT .................................................... vi
CHAPTER 1. INTRODUCTION ...................................... 1
CHAPTER 2. LITERATURE REVIEW ................................. 7
CHAPTER 3. TIMING ........................................... 11
3.1 Criticality ............................................. 15
CHAPTER 4. FORCE DIRECTED PLACEMENT ......................... 20
4.1 Design Entry ............................................ 23
4.1.1 Library Cells ......................................... 24
4.1.2 Design Netlist ........................................ 25
4.2 Initialization .......................................... 26
4.3 Fill Force .............................................. 27
4.4 Placement ............................................... 29
4.5 Row Assignment .......................................... 31
CHAPTER 5. OPTIMIZATIONS .................................... 32
5.1 Gate Sizing ............................................. 34
5.2 Type A Buffer Insertion ................................. 35
5.3 Type B Buffer Insertion ................................. 36
5.4 Cell Replication ........................................ 39
CHAPTER 6. EXPERIMENTAL RESULTS ............................. 42
CHAPTER 7. CONCLUSIONS ...................................... 54
GLOSSARY OF TERMS ........................................... 57
REFERENCES CITED ............................................ 59
LIST OF FIGURES 
Figure 1: VLSI Design Cycle ................................. 1
Figure 2: Basic Die Terminology ............................. 3
Figure 3: π Model of Interconnect .......................... 12
Figure 4: Interconnect Model with Cstar .................... 12
Figure 5: Interconnect Model with π Representation ......... 13
Figure 6: Clique and Star Based Net Models ................. 21
Figure 7: Cellnet and Starnet Terminology .................. 26
Figure 8: Type A Buffer Insertion .......................... 35
Figure 9: Type B Buffer Insertion .......................... 37
Figure 10: Cell Replication ................................ 40
Figure 11: Cell Replication Method of Balance .............. 41
Figure 12: Final Placement Due to Oscillation .............. 44
Figure 13: Cause of Oscillation ............................ 44
Figure 14: Constant α and β ................................ 45
Figure 15: Cooling Schedule for α and β .................... 45
Figure 16: Path Delay Graph for MacUnit32 .................. 49
Figure 17: Path Delay Graph for Matrix ..................... 50
Figure 18: Path Delay Graph for Addrgen .................... 50
Figure 19: Path Delay Graph for RGB_interp ................. 51
Figure 20: Path Delay Graph for TLC ........................ 51
Figure 21: Initial Random Placement of MacUnit32 ........... 52
Figure 22: Final Placement Before Row Alignment ............ 53
LIST OF TABLES 
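Table 1: Roadmap Values .................................... 42
Table 2: Design Parameters ................................. 43
Table 3: FD Placement Only ................................. 47
Table 4: Random Placement only DOA, AOI=1 .................. 47
Table 5: Random Placement only DOA, AOI=3 .................. 47
Table 6: FD Placement using DOA, AOI=1 ..................... 48
Table 7: FD Placement using DOA, AOI=3 ..................... 48
Table 8: Timing Results for All Methods .................... 49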
ABSTRACT 
Since the introduction of deep sub-micron technologies,
meeting timing specifications has become more difficult.
Interconnect delay is now a dominant factor in the total
circuit delay as die sizes become larger and device delays
decrease. With larger die sizes, interconnects are becoming
longer and directly determine the overall system performance.
In current practice, placement and delay optimization are
separate steps. This separation is no longer desirable:
because each separate step is over-constrained, it is hard
to optimize the final solution. To globally optimize the
circuit, these existing physical design stages must be
combined.
An algorithm that runs timing analysis and optimization
during the placement phase is demonstrated in this thesis.
Unlike existing methods, our timing-driven standard-cell
placement algorithm is capable of performing buffer insertion,
gate sizing, and cell replication entirely during the
placement phase. This allows the built-in optimization
algorithm to modify net information along with physical cell
locations.
The primary contribution of this thesis is the integration
of netlist modifications into the placement phase and the
algorithm used to perform them. Other contributions are bin
localization, discussed with the force directed placement
approach as a way to reduce run time, and the methods for
optimization selection. Experimental results show that
placement with integrated optimization achieves a better
delay than timing-driven placement alone.
CHAPTER 1. INTRODUCTION 
When designing an Integrated Circuit (IC), a predefined
design cycle is followed, much like the one shown in Figure 1
[1]. The most common Very Large Scale Integration (VLSI)
design cycle starts with an abstract view of the design, where
system specifications are set at the global level. At this
[...] defined. This leads into the architectural design phase,
where a textual block-level description is formed. In the
behavioral or functional design phase, functional units of
the system are identified and their characteristics defined
in terms of input and output timing. With these timing
definitions, a high-level logic description is formed in the
logic phase using a Hardware Description Language (HDL).
This HDL is then compiled into a gate-level representation by
a synthesis tool in the circuit design phase. These gate-level
representations are then optimized and the logic-level
dependencies removed. From here the design can be parsed
into cell and net representations based on a given set of
logic types. Cell and net information is then used in the
physical design phase to assign each cell a placement on the
IC.

Figure 1: VLSI Design Cycle

In the physical design phase, three major placement styles
are used. Each style reduces the complexity of physical
design to a different degree.
The standard cell design method, created by synthesis tools,
will be the focus throughout this thesis. This design style
consists of a collection of rectangular cells with the same
height, connected in some fashion to create the desired logic
representation. The functionality and electrical
characteristics of each predefined cell are stored in a
central library and the cell connections in a netlist. The
netlist details the connections between cells, thereby
creating the design functionality. Connections to the outside
of the die are made from I/O terminals, also called pads,
located on the boundary of the die (Figure 2).
Figure 2: Basic Die Terminology (labels: die boundary, pad)
A placement tool is then used to place the cells into rows.
This hierarchy greatly simplifies the layout representation,
as it is not necessary to duplicate the library information
for each cell. While standard cell designs are quicker to
develop, they usually occupy more area and have larger delay
times than full-custom or hand-crafted designs [1].
Current placement tools use a multi-step physical design
methodology, as seen in Figure 1. In this methodology,
interconnect information is rarely used prior to the physical
design stage because accurate values are not yet known.
Without interconnect information, timing-critical paths are
assigned using only cell connections and gate sizes. These
critical paths are then maintained throughout placement,
independent of cell location and interconnect information.
Placement is calculated from the net and cell representations
to reduce total area and total interconnect distance. Once a
physical representation of the circuit is complete and
physical cell locations are known, timing is generated, and
interconnect optimization methods are performed if the total
delay violates its constraint. At this point, gate sizing and
buffer insertion are performed on the existing placement,
given the physical constraints of the placed cells. When
inserting a new cell, only free space in the placement is
used, drastically constraining the optimization. In this
methodology each phase is independent, and if one fails, new
constraints must be assigned and the design moved back to an
earlier phase.
As VLSI circuits continue into the deep sub-micron
design range, timing is becoming more important. As
interconnect delay begins to outweigh device delay, the
physical location of all cells must be known in order to
optimize device characteristics. To accurately reduce this
delay, early timing analysis must be performed using physical
placement information to guide the final cell placement.
Existing automated cell placement tools have concentrated on
valid placement [2-6], minimum area designs [7], and total
path delay [8,9,10]. One indirect method of delay reduction
in placement tools has been to reduce routing complexity and
delay times by minimizing the total wire length. This method
does reduce routing complexity but is not an effective way to
reduce delay. As delay analysis shows, minimizing interconnect
length yields a minimum delay only in single-line,
single-load cases. For multi-terminal nets, the delay depends
on both total interconnect length and net geometry [9].
Unlike the above placement methodology, our approach
uses interconnect information to drive optimization during
the placement phase. Weights are given to each net depending
on past and present delay properties, so global delay can be
reduced by both physical and logical optimization. The
critical delay path is also monitored during the entire
placement phase rather than determined only once, before
interconnect information is known. This allows critical paths
to change during placement, giving the cell placement tool a
rough idea of cell priority and the ability to perform delay
optimizations such as buffer insertion and cell replication
to fix possible timing bottlenecks.
CHAPTER 2. LITERATURE REVIEW 
In the past, gate delay dominated circuit performance.
With gate sizes moving into the sub-micron range,
interconnect delay is increasing in importance. Most
placement tools are designed to reduce the total area of the
die. Though area is still important, it is becoming harder
to meet timing due to unknown interconnect values. Under the
current VLSI design cycle, the separation between delay
optimization and placement is non-optimal and imposes
constraints on delay optimization. To remove these
constraints, an algorithm is created in this thesis that runs
timing analysis and optimization during the placement phase.
The placement algorithm in this thesis is loosely based
on the force directed approaches in [2,3,7]. To alleviate
the clustering problem inherent in the force directed
approach, an attractive fill force is used to evenly
distribute the cells throughout the placement area [2,3,7].
The force directed approach allows overlap, so it adapts
easily to changes in the netlist. In [2], timing information
is used to increase priority for critical paths and create a
hierarchy of importance in the placement phase. In
comparison, placement in [8] is done by starting with a seed
and slowly inserting each connected cell back onto the die.
To combat delay in [8], two connection forces are added, one
based on net connections and the other based on timing. The
timing information in [8] is given before the placement
starts and is not recalculated during the placement phase.
When timing is based on non-interconnect information, an
optimal solution is hard to achieve, as new critical paths
will appear as the initial paths are reduced. With the greedy
insertion of [8], a cell placed where another already exists
causes that cell to be moved to the nearest empty location.
This method penalizes late-placed cells by reducing the total
free area near the center of the die.
Optimization with buffer insertion [12,13] and gate
sizing is covered in [11]. Some optimization algorithms
[14,15] even take final placements and insert buffers in free
space to reduce delay. The concept of feasible regions is
introduced in [14] as the possible buffer locations needed
to satisfy delay constraints. Once these regions are
determined, the chip is searched for unused area. This
method of optimization is limited by the insertion
restrictions: if free space is not found in the desired
feasible region, the placement must be modified. The method
produces good delay reduction but still separates the
optimizations from the placement. In [11], gate sizing and
netlist transformations were used to produce the best
area-delay trade-off. Gate sizing and buffer insertion are
used to reduce delay, while the area-delay trade-off of each
is used for method selection. The area-delay trade-off
discussed in [11] has direct implications for placement,
which were not specifically discussed there.
In [16], an iterative refinement algorithm was used to
decrease delay by performing logic-based netlist
modifications. This method started from a valid placement
and used physical information to refine the synthesis step.
In the synthesis step, structural transformations are made to
the netlist, which lead to the removal of existing gates or
the insertion of new gates. Once a netlist change was made,
the refinement phase was run to update the placement. Our
timing-driven standard-cell placement algorithm is capable of
performing buffer insertion, gate sizing, and cell
replication entirely during the placement phase. This is
done not by changing the logic structure as in [16] but by
adding cells to increase driving strength or isolate large
nets. This combines the placement and optimization phases
so the built-in optimization algorithm can modify net
information and directly affect physical cell locations.
CHAPTER 3. TIMING 
After each force directed placement iteration, timing
optimizations are performed. Even though the placement is not
optimal at this time, a rough estimate of interconnect
distances is known. The timing phase is used to generate the
critical path for each iteration. Unlike [8], the critical
path in our algorithm is determined at each iteration rather
than from the cell information of an unplaced circuit. This
allows delay optimizations to mold the final placement. At
this point cells can be added or removed without consequence,
and critical nets are given higher priority so as to affect
their placement in later iterations. For optimizations to be
performed, timing must be known for each cell and pin.
The delay on each net is calculated using the Elmore
Delay (ED) metric [17], which is the first moment of the
impulse response, although other models could easily be
adopted. During the timing phase, interconnect lengths are
fixed and the distances are calculated as the Manhattan
distance from source to star and from star to sink. To model
each interconnect segment electrically, a π model is used
(Figure 3). The interconnect between the driver cell and the
star forms the first segment, and the interconnects from the
star to each load cell form the remaining segments. To
calculate the Elmore delay for each sink cell, as seen in
Figure 4a, the capacitive sum called Cstar is used for
simplification (Figure 4b). CLstar is the lumped area
capacitance (Carea), fringing capacitance (Cfringe), and load
capacitance (Cload) from the star to all loads, as seen in
equation (1). The values in equation (2) [...]

Figure 3: π Model of Interconnect (labels: Ru, Cu/2, Cu/2)

Figure 4: Interconnect Model with Cstar ((b) reduced model with Cstar)

$$C_{Lstar} = \sum_{i=1}^{\#\,cell\ connections} \left( C_{u(i)\,area} + C_{u(i)\,fringe} + C_{u(i)\,load} \right) \tag{1}$$





It should be noted that the source-to-star interconnect
capacitance is not included in CLstar. For each sink n, the
electrical representation seen in Figure 5 is used, and
equation (4) is used to calculate its delay. From equation
(4), similarities between all multi-terminal delay
calculations were noted: all values on the first segment and
the upstream capacitance at the star node remain the same for
each delay calculation on the same net. This allows a
two-step solution that solves each multi-terminal net
efficiently without recalculating redundant information
(equation (5)). The delay from the source cell to the star is
calculated first, in equation (6). Then the delay in equation
(7) is added to find the total delay at each sink. With this
two-step process, equation (6) is calculated only once per
net; otherwise it would be repeated for each sink on the net.

Figure 5: Interconnect Model with π Representation

$$Elmore\ Dly = Dly_1 + Dly_2 \tag{5}$$
$$Dly_1 = R_D \left( C_{u1} + C_{Lstar} \right) + R_{u1} \left( C_{u1}/2 + C_{Lstar} \right) \tag{6}$$
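The two-step calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the names `Sink`, `elmore_delays`, `r_drv`, `r_unit` and `c_unit` are hypothetical, fringing capacitance is folded into the per-unit-length wire capacitance, and the second step is assumed to be the usual π-model term for the star-to-sink segment, R_un (C_un/2 + C_Ln).

```python
from dataclasses import dataclass


@dataclass
class Sink:
    x: float
    y: float
    c_load: float  # input pin capacitance of the sink cell


def manhattan(ax, ay, bx, by):
    return abs(ax - bx) + abs(ay - by)


def elmore_delays(src, star, sinks, r_drv, r_unit, c_unit):
    """Return the Elmore delay from the driver to every sink of one net.

    src, star: (x, y) tuples. r_unit and c_unit are wire resistance and
    capacitance per unit length (hypothetical parameters).
    """
    # Segment lengths: source to star, then star to each sink.
    len_u1 = manhattan(*src, *star)
    lens = [manhattan(*star, s.x, s.y) for s in sinks]

    c_u1 = c_unit * len_u1
    r_u1 = r_unit * len_u1

    # C_Lstar: all wire and load capacitance downstream of the star.
    c_lstar = sum(c_unit * l + s.c_load for l, s in zip(lens, sinks))

    # Step 1, computed once per net: delay from the driver to the star.
    dly1 = r_drv * (c_u1 + c_lstar) + r_u1 * (c_u1 / 2 + c_lstar)

    # Step 2, per sink (assumed form): delay of the star-to-sink segment.
    delays = []
    for l, s in zip(lens, sinks):
        r_un, c_un = r_unit * l, c_unit * l
        dly2 = r_un * (c_un / 2 + s.c_load)
        delays.append(dly1 + dly2)
    return delays
```

The first step is shared by every sink, so a net with n sinks needs one evaluation of the source-to-star term and n evaluations of the short per-sink term.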




The maximum delay at each cell is then stored using the
timing calculation described above. To find the maximum input
delay for each cell, the maximum delay of the preceding cell
is added to the interconnect delay, producing a total delay.
This total delay propagates from an input of the die or the
output of a flip-flop to an output of the die or the input of
a flip-flop. This method of timing is used because the clock
cycle time gives us the target delay. Any total delay longer
than one clock cycle causes a timing violation.
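The propagation described above can be sketched as a longest-path pass over the cell graph. The sketch assumes the logic between start points and endpoints is acyclic, and it adds a per-cell gate delay, which the text does not spell out; the names `fanin`, `gate_delay`, `wire_delay` and `violates_timing` are hypothetical.

```python
from graphlib import TopologicalSorter


def arrival_times(fanin, gate_delay, wire_delay):
    """fanin: cell -> list of driver cells (empty for start points).
    gate_delay: cell -> intrinsic delay (assumed input).
    wire_delay: (driver, cell) -> interconnect delay.
    Returns cell -> latest arrival time at the cell's output."""
    arrival = {}
    # static_order() yields each cell after all of its drivers.
    for cell in TopologicalSorter(fanin).static_order():
        at_input = max(
            (arrival[d] + wire_delay[(d, cell)] for d in fanin.get(cell, [])),
            default=0.0,
        )
        arrival[cell] = at_input + gate_delay[cell]
    return arrival


def violates_timing(arrival, endpoints, clock_period):
    """True if any endpoint arrives later than one clock cycle."""
    return any(arrival[e] > clock_period for e in endpoints)
```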
3.1 Criticality 
Optimization is based on give and take. To perform
reliable optimization, you must know which timing must be
reduced and which timing has enough slack to accommodate the
transfer. The exact particulars of its use will be discussed
when Type B buffer insertion is described in Chapter 5.
Criticality is defined as the level of timing importance
given to each cell or star. The level depends on its
importance in a timing path. A timing path is a route from a
chip input to a flip-flop, from a flip-flop to a flip-flop,
or from a flip-flop to a chip output. The path with the
maximum delay is considered the most critical. To reduce the
global delay, the most critical path in each iteration is
reduced by placement and optimization. The criticality of all
other paths is then used to guide optimization according to
their level of importance.
The method used to create net priority revolves around
the most critical path for each iteration. To determine the
most critical path, a backwards traversal is performed
starting from the outputs of the chip and the inputs of all
flip-flops. The initial value of cell criticality (xc) for
each of the primary outputs is set to the difference between
the maximum circuit delay and its path delay. During the
backwards traversal, a cell is added once all of its fan-outs
have an xc value.
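The backwards traversal can be sketched as below. The seeding of the endpoints and the "visit a cell once all fan-outs are known" rule follow the text. The rule for combining fan-out values is an assumption made for illustration only (the minimum over fan-outs of the fan-out's xc plus a per-connection slack), and gate sensitivity is left out. All names are hypothetical.

```python
from collections import deque


def cell_criticality(fanout, end_delay, slack):
    """fanout: cell -> list of driven cells (empty for endpoints).
    end_delay: endpoint -> path delay arriving at that endpoint.
    slack: (cell, fanout_cell) -> delay margin on that connection.
    Returns cell -> xc, where smaller values are more critical."""
    max_delay = max(end_delay.values())
    # Endpoints start at (maximum circuit delay - own path delay).
    xc = {e: max_delay - d for e, d in end_delay.items()}

    # Invert the fan-out lists so the walk can move toward the inputs.
    fanin = {}
    for cell, outs in fanout.items():
        for o in outs:
            fanin.setdefault(o, []).append(cell)

    pending = {c: len(outs) for c, outs in fanout.items()}
    queue = deque(xc)
    while queue:
        done = queue.popleft()
        for cell in fanin.get(done, []):
            cand = xc[done] + slack[(cell, done)]  # assumed combining rule
            xc[cell] = min(xc.get(cell, cand), cand)
            pending[cell] -= 1
            if pending[cell] == 0:  # every fan-out has reported: final
                queue.append(cell)
    return xc
```

The endpoint on the longest path receives xc = 0, so the most critical path can be read off by following the cells whose xc stays at zero.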
Factored into xc is gate sensitivity. From [11], the
sensitivity is given as the partial derivative in equation
(8), where d is delay and x is the size of the gate. Given
that [11] considers the continuous case and we are
considering the discrete case, there is a reduction, as seen
in equation (8).

$$sensitivity = \frac{\partial d}{\partial x} \cdot \Delta x = \Delta d \tag{8}$$

When moved into a discrete cell implementation, this
derivative can be represented as the slope of the d, x curve.
Since the slope of this curve is Δd/Δx, Δx cancels, leaving
only Δd, the change in delay.
Before the criticality is found, the source cell is
increased by one size. If the delay change is negative, a
delay savings is produced using gate sizing. If the delay
change is positive, the sensitivity is set to zero, as an
increase in delay is not desired. With the source sensitivity
calculated, xc_i can be found from each fan-out j; see
equation (9). The slack_j is defined as the increase in delay
before it becomes the largest delay for cell j. To calculate
the criticality, a reverse breadth-first search is performed.
First, all output pins and flip-flops are added to an array
and the cells under them are given a criticality. Once the
criticality for a lower cell has been calculated the same
number of times as it has output pins, its criticality is
final. This means the minimum value of criticality for that
cell has been retained. This cell is then used to calculate
the criticality for the cells below it. As the critical path
is constructed, each star on this path is assigned a new
level of criticality. This star criticality (xs) is used in
the placement as a method to assign timing importance. The
level of importance determines the strength with which cells
are drawn together. Star criticality (xs) is determined from
its present and last critical values, where m is the current
iteration and i is the star number. If the star is not on a
critical path, equation (10) is used. If the star is on a
critical path, equation (11) is used. The use of the previous
criticality ensures that a net that was previously critical
does not bounce between critical and non-critical over
several iterations.

$$xs_i^{(m)} = \frac{xs_i^{(m-1)}}{2} \tag{10}$$
$$xs_i^{(m)} = \frac{xs_i^{(m-1)} + 1}{2} \tag{11}$$
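The update in equations (10) and (11) amounts to averaging the previous star criticality with 1 (on a critical path) or with 0 (off it). A short sketch, with a hypothetical function name:

```python
def update_star_criticality(prev_xs, on_critical_path):
    """Apply equation (10) or (11) to one star for the current iteration."""
    if on_critical_path:
        return (prev_xs + 1.0) / 2.0   # equation (11)
    return prev_xs / 2.0               # equation (10)


# A star that is critical for three iterations and then drops off the path.
history = []
xs = 0.0
for critical in (True, True, True, False):
    xs = update_star_criticality(xs, critical)
    history.append(xs)
# history is [0.5, 0.75, 0.875, 0.4375]
```

The value rises toward 1 while the star stays critical and halves each iteration once it is not, which is what keeps a net from flipping abruptly between the two states.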
The values of xc are used to determine the most critical
path and are also used in delay optimizations. The
criticality of each cell allows the optimization phase to
separate important paths from less important paths. The
closer xc is to zero, the greater the importance. If the xc
values of two fan-outs are numerically close, care must be
taken not to promote the smaller value higher than the
present largest value. Using xc to stop optimizations of one
path that would increase the delay of another highly critical
path is very useful.
In order to meet timing, all critical paths must be
reduced below the timing specifications. When optimizing a
synchronous design, the path delay directly determines the
clock cycle time. With xs and xc, both physical and logical
optimization steps are performed in order to achieve the
desired timing.
CHAPTER 4. FORCE DIRECTED PLACEMENT 
The force directed approach is a cell-based placement
approach driven by Hooke's law [1]. In this approach each
net, the connection between cells, is given an attractive
spring force with constant k, and the distance between
connected cells is (Δx, Δy). The objective is then to find a
force-balanced position for each cell based on the
connections made to it. The force equations expressed in
equations (12) and (13) are used in (14) to compute the
potential energy and [...]

$$F_{x\,cell} = \sum_{i=0}^{\#cells} \left( -k \cdot \Delta x_i \right) \tag{12}$$
$$F_{y\,cell} = \sum_{i=0}^{\#cells} \left( -k \cdot \Delta y_i \right) \tag{13}$$
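As an illustration of the spring model, the sketch below sums the Hooke's-law pull on one cell and finds its force-balanced position. It assumes the same spring constant on every connection, in which case the balance point is simply the centroid of the connected points; the function names are hypothetical.

```python
def net_force(cell_pos, connected, k=1.0):
    """Sum the Hooke's-law pull on one cell from the points it connects to.

    connected: list of (x, y) positions; k: spring constant per connection.
    The displacement is measured from the connected point to the cell, so
    -k * displacement points back toward that point."""
    fx = sum(-k * (cell_pos[0] - x) for x, _ in connected)
    fy = sum(-k * (cell_pos[1] - y) for _, y in connected)
    return fx, fy


def balance_position(connected):
    """With equal spring constants the net force is zero at the centroid."""
    n = len(connected)
    return (sum(x for x, _ in connected) / n,
            sum(y for _, y in connected) / n)
```

For the three points (0, 0), (4, 0) and (2, 6), the balance position is (2, 2), and `net_force` evaluated there returns zero in both directions.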





In this thesis the force directed approach is an
iterative greedy placement approach based on cells and stars
[2]. To model multiple cell connections, an imaginary point
called a star is used. The star is where all nets, the
connections between cells, converge into a cluster (Figure
6b). This method is used instead of the more prevalent clique
model, where each cell is connected to all other cells on
that net (Figure 6a). In a comparison between the two models,
[2] found that, using statistics from the MCN92 benchmark
examples, the clique model produced on average 30% more
connections than the star-based model. This savings in
computation time was the basis for our selection.

Figure 6: Clique and Star Based Net Models ((a) clique model, (b) star model)
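The difference in connection count can be illustrated with a few lines of Python. A clique on a p-pin net needs one connection per pair of cells, while the star model needs one connection per cell (cell to star). The net sizes in the usage note are made up for illustration and are not benchmark data.

```python
def clique_connections(pins):
    """Every pair of cells on the net is connected."""
    return pins * (pins - 1) // 2


def star_connections(pins):
    """Every cell on the net is connected once, to the star."""
    return pins


def totals(net_sizes):
    """Return (clique total, star total) for a list of per-net pin counts."""
    clique = sum(clique_connections(p) for p in net_sizes)
    star = sum(star_connections(p) for p in net_sizes)
    return clique, star
```

For hypothetical net sizes `[2, 3, 4, 8]`, `totals` returns `(38, 17)`. Two-pin nets are cheaper in the clique model (one connection instead of two), so the overall ratio depends on how many large nets a design contains.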
Using the star method, cell force is calculated while the
stars are fixed. The net or spring force used in this thesis
is similar to [7]; it is based on a quadratic distance
between cells and a weight determined by timing dependencies.
This severely penalizes long interconnects and prioritizes
timing-critical nets by pulling their cells closer together.
The force directed method is not without its drawbacks.
It can be seen intuitively that the optimal location for all
cells is where all cells overlap. At this point all
interconnect distances equal zero and the total energy is
smallest. This clustering of cells must be prevented in
order to create a valid placement. To prevent cell overlap
in this thesis, a second force, referred to as the fill
force, has been added to each cell [2,7]. This force pulls
each cell toward less populated areas of the die, where cell
density is less than average. Even though overlap is
allowed, the fill force is used to discourage it. A fill
force is generated for each cell position based on the
assumption that regions of high cell density act as sources
and regions of low cell density act as sinks [7]. This causes
a cell to be pulled away from dense areas. To construct such
a force, the die area is divided into a grid structure. Each
grid square, referred to as a bin, is given a central
location, dimensions, and total area. The bin density is then
calculated based on the total cell area that overlaps each
bin. Every cell is considered and only the overlapping area
is added. To calculate the density, we must first find how
much area each bin is expected to contain. The total area of
all cells divided by the die area gives the average density
expected throughout the chip. This is used to produce an
even cell distribution.
The goal of our placement algorithm is to create a valid
placement in the desired die area and meet timing goals.
The placement algorithm in this thesis generates a macro-cell
design with cell placement based on the row information
given. The output is a valid placement where no cells
overlap and every cell is constrained to an existing row.
Cells are gathered from a central library and connected as
specified by the design netlist.
4.1 Design Entry 
Several assumptions are made before the force directed
program starts. It is assumed that die size, I/O pad
placement, and row information are known. Such constraints
are fixed during the initial design phase and thus are not
considered flexible in this implementation. As much of
industry works in a parallel design process, this assumption
is considered valid. The design files are also assumed to
have no hanging connections or multiple drivers. A hanging
connection is a net with a single output but no input, and a
multiple driver is any net with more than one output. If
multiple drivers exist, there is bus contention, which could
allow the circuit to go into a runaway case. In a runaway
case one driver holds the bus low and the other holds it
high. If this occurs, there will be a short between Vdd and
GND. As long as these assumptions hold, the placement program
will continue and a valid row-based solution is created.
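The two netlist assumptions are easy to check before placement starts. The sketch below assumes a simple in-memory representation in which each net is a list of (cell, direction) pins; this representation and the function name are hypothetical and are not taken from the DEF parser.

```python
def check_netlist(nets):
    """nets: net name -> list of (cell, direction) pins, where direction is
    "out" for a pin that drives the net and "in" for a pin that is driven.
    Returns (hanging, multi_driven) lists of net names."""
    hanging, multi_driven = [], []
    for name, pins in nets.items():
        drivers = sum(1 for _, d in pins if d == "out")
        loads = sum(1 for _, d in pins if d == "in")
        if drivers == 1 and loads == 0:
            hanging.append(name)       # single output, no input
        if drivers > 1:
            multi_driven.append(name)  # more than one output on the net
    return hanging, multi_driven
```

A design passes the check when both returned lists are empty.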
To help increase usefulness and compatibility, the
Cadence Library Exchange Format (LEF) and Design Exchange
Format (DEF) were used as library and design inputs. As of
2001, Cadence has released these formats and their parsing
source code as open source [18]. All information relevant to
each logic cell is contained in the LEF file, while all net
and component information is contained in the DEF file.
4.1.1 Library Cells 
As discussed before, a central library of cells is
needed to create the design. The cell information contained
in this file gives a full description of each cell. To store
this information the Cadence-designed LEF was used.
Contained in the library is the physical cell information,
such as, but not limited to, area, width and height. Listed
below is the information used:









Number of pins 
Pin descriptions 




• Location (with respect to center of cell) 
All information is gathered from the file and passed to a 
memory location for later use. 
4.1.2 Design Netlist 
The design, or connection map, of the circuit was gathered
from the Cadence DEF file. This file also contains placement
information such as die size, I/O pad placement and row
information. Shown below is a list of the information
gathered, modified, and then written back to a file:








I/O pad placement
cell name and library link 
design netlist 
Some information is modified, most is not, but both the LEF
and DEF files must be present for the force directed program
to work. When the DEF file assigns a cell, it gives it a name
and associates that named cell with a library cell. This
enables multiple copies of one library cell to be reused with
different pin connections.
4.2 Initialization 
With all information contained in memory, two netlists
are created: one called the cellnet and the other the
starnet. In the cellnet, every cell has a list of the stars
connected to it. To simplify placement tasks, all out-stars
are listed first. An out-star is defined as a star driven by
the current cell (Figure 7a). In the starnet, every star has
a list of the cells connected to it. Here the out-cells are
listed first. An out-cell is defined as a cell driven by the
current star (Figure 7b). Although the cellnet and starnet
contain the same information, keeping both allows connection
searches to be performed extremely fast, given that the
netlist does not change often.

Figure 7: Cellnet and Starnet Terminology ((a) cellnet, (b) starnet)
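A minimal sketch of the two cross-referenced netlists is shown below. It assumes each star has exactly one driving cell, in line with the no-multiple-driver assumption of Section 4.1; the input format and the name `build_netlists` are hypothetical.

```python
def build_netlists(nets):
    """nets: star name -> (driver cell, [driven cells]).
    Returns (cellnet, starnet) with out-stars / out-cells listed first."""
    cellnet, starnet = {}, {}
    in_stars = {}
    for star, (driver, loads) in nets.items():
        # starnet: cells driven by the star (out-cells) first, driver last.
        starnet[star] = list(loads) + [driver]
        # cellnet: the star driven by this cell is one of its out-stars.
        cellnet.setdefault(driver, []).append(star)
        for cell in loads:
            cellnet.setdefault(cell, [])
            in_stars.setdefault(cell, []).append(star)
    # Append the stars that drive each cell after all of its out-stars.
    for cell, stars in in_stars.items():
        cellnet[cell].extend(stars)
    return cellnet, starnet
```

Both dictionaries hold the same connectivity. The duplication trades memory for constant-time lookup in either direction, which pays off as long as netlist edits are rare compared with connection queries.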
To start the force directed algorithm a random placement 
is generated. Placement of each cell is randomly assigned 
within the die area. This allows a unique location for all 
cells and prevents group movement by the fill force. With 
the initialization completed, placement is now able to move 
the cells to their force balanced locations. 
4.3 Fill Force 
In this thesis, rather than constructing a repulsive
overlap force, an attractive connection force called the fill
force [2,7] is used to draw cells away from areas of overlap.
This method overlays a grid on the chip, where
each square, or bin, holds a total cell area. As described
in [2], a two-dimensional grid is used so the x and y
directions can have different bin counts. In most macro-based
chips, the height of the cell is greater than the width. To
increase the placement resolution for a row-based design, a
greater bin count is used along the x axis, while a smaller
count is used along the y axis to help decrease run time. The
balance density D_BAL is the total cell area per die area of
the chip.
The bin density, or level of cell overlap in each bin, is
calculated using:

$$D_{BIN}[x_B, y_B] = -D_{BAL} + \sum_{i=0}^{\text{all cells}} \frac{A[i]}{\text{binarea}} \qquad (16)$$
If the result is negative, the bin is under-filled, while a
positive value denotes an excess in cell area. When
calculating the bin density each cell is selected and the
surrounding bins checked for overlap. Any overlap is then
represented as area A[i], where cell i covers bin [x_B, y_B].
Before the fill force can be calculated, all bin densities
must be known. The fill force, or attraction force, is then
calculated by equation (17), where P[x,y] is the center
position of a cell and P[x_i, y_j] is the center position of a
bin.
F1 [x,y]=binarea (17) 
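The bin density of equation (16) can be sketched as follows. This is an illustrative sketch, not the thesis code: the function name `bin_densities`, the cell tuple layout (center x, center y, width, height) and the uniform grid are assumptions. The balance density is taken as total cell area divided by die area, following the glossary definition.

```python
def bin_densities(cells, die_w, die_h, nx, ny):
    """Return dens[xb][yb] as in equation (16).

    cells: list of (cx, cy, w, h), with (cx, cy) the cell center.
    nx, ny: number of bins in the x and y directions.
    """
    bin_w, bin_h = die_w / nx, die_h / ny
    bin_area = bin_w * bin_h
    # Balance density: total cell area per die area.
    d_bal = sum(w * h for _, _, w, h in cells) / (die_w * die_h)
    dens = [[-d_bal for _ in range(ny)] for _ in range(nx)]
    for cx, cy, w, h in cells:
        x0, x1 = cx - w / 2, cx + w / 2
        y0, y1 = cy - h / 2, cy + h / 2
        # Visit only the bins this cell can overlap.
        for xb in range(max(0, int(x0 // bin_w)), min(nx, int(x1 // bin_w) + 1)):
            ox = min(x1, (xb + 1) * bin_w) - max(x0, xb * bin_w)
            for yb in range(max(0, int(y0 // bin_h)), min(ny, int(y1 // bin_h) + 1)):
                oy = min(y1, (yb + 1) * bin_h) - max(y0, yb * bin_h)
                if ox > 0 and oy > 0:
                    # A[i] / binarea for cell i covering bin [xb, yb].
                    dens[xb][yb] += ox * oy / bin_area
    return dens


dens = bin_densities([(1.0, 1.0, 2.0, 2.0)], 4.0, 4.0, 2, 2)
```

In this small case the single cell exactly fills one bin, so that bin is over-filled (positive) and the three empty bins are under-filled (negative). When all cells lie inside the die, the densities sum to zero over the grid.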
The fill force calculation can become very time
consuming, as its complexity depends on the maximum bin count
in both the x and y directions. To reduce run time, [2] uses an
average of all outlying bins. In this thesis a method called
bin localization is introduced. Bin localization limits the
bin search to a small, localized number of bins as cell
movement diminishes. This method is based on our observation
that there is no reason to look at all bins once the cells
are positioned in their local areas. From then on a cell makes
only small movements that affect only the
cells in its local surroundings. Our experiments showed that
localization reduces the number of terms summed in the
fill force calculation and dramatically decreases run time.
4.4 Placement 
Cell placement in each iteration is done with a greedy 
two step calculation. First, all cell locations are fixed 
and each star location calculated. A star location is the 
average of all connecting cell locations. In the second 
step, all stars are fixed and the cell locations calculated. 
The location of the cell is determined by the sum of 
attractive forces of its connecting stars, and the fill force 
used to evenly distribute the cells. 
For each star net k, there is an associated weight w_k used to
promote the importance of timing critical star nets:

$$w_k = 1 + x_s \qquad (18)$$

This timing critical star weight is based on the value of x_s,
which ranges from 0 (never critical) to 1 (always critical).
The cell force F_c is calculated for each cell i from the
weighted location of each connected star:

$$F_{c_i} = \sum_{j=1}^{\text{star degree}} w_{k_j} \left( P_{star_j} - P_{cell_i} \right) \qquad (19)$$

Each cell is connected to at least two stars, input and
output, although any higher degree is acceptable. To find
the final position of the cell, the total cell force (F_c) and
fill force (F_f) are scaled with α and β and added to the cell's
current position:

$$P_{cell_i}^{new} = P_{cell_i} + \alpha F_{c_i} + \beta F_{f_i} \qquad (20)$$
After each iteration the cells are pulled closer to
their force balanced positions. By using a timing based
weighting factor, important nets are given priority and pulled
closer together. During the iterations, both α and β are
changed to either promote connection or evenly distribute the
cells. This cooling of α and β is discussed further in
Chapter 6. We found that starting with a small β and then
increasing it above α allowed connection clusters to form and
then be pulled apart. With the star weight, highly critical
paths remain closer together than other clustered cells.
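One iteration of the two step calculation can be sketched as below. This is a minimal illustration under stated assumptions: the function name `place_iteration` and the dictionary layouts are hypothetical, the fill force is supplied from outside (its formula is not reproduced here), and the position update follows the prose description of scaling the cell force and fill force by α and β and adding them to the current position.

```python
def place_iteration(cell_pos, stars, x_s, fill_force, alpha, beta):
    """One placement iteration.

    cell_pos:   dict cell -> (x, y)
    stars:      dict star -> list of connected cells
    x_s:        dict star -> criticality in [0, 1]
    fill_force: dict cell -> (fx, fy), computed elsewhere (assumed input)
    """
    # Step 1: cells fixed; each star moves to the average of its cells.
    star_pos = {}
    for star, cells in stars.items():
        n = len(cells)
        star_pos[star] = (sum(cell_pos[c][0] for c in cells) / n,
                          sum(cell_pos[c][1] for c in cells) / n)

    # Step 2: stars fixed; sum the weighted attraction of each star.
    force = {c: [0.0, 0.0] for c in cell_pos}
    for star, cells in stars.items():
        w = 1.0 + x_s[star]              # star weight, equation (18)
        sx, sy = star_pos[star]
        for c in cells:
            force[c][0] += w * (sx - cell_pos[c][0])   # equation (19)
            force[c][1] += w * (sy - cell_pos[c][1])

    # Scale cell force by alpha and fill force by beta, then move the cell.
    new_pos = {}
    for c, (x, y) in cell_pos.items():
        ffx, ffy = fill_force.get(c, (0.0, 0.0))
        new_pos[c] = (x + alpha * force[c][0] + beta * ffx,
                      y + alpha * force[c][1] + beta * ffy)
    return new_pos


pos = place_iteration({"a": (0.0, 0.0), "b": (4.0, 0.0)},
                      {"s": ["a", "b"]}, {"s": 1.0}, {}, 0.25, 0.5)
```

With a fully critical star (x_s = 1) the weight doubles, so the two cells in the example close half of their distance to the star in one step at α = 0.25.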
4.5 Row Assignment 
Before outputting the placement, a cleanup step is 
performed. In standard cell designs all cells are the same 
height but can have different widths. For a valid output a 
row based cell placement must be constructed from the 
unconstrained placement generated by the force directed 
approach. The goal of such an optimization is to evenly 
distribute the cells on each row without destroying the 
timing characteristics of the circuit. With the placement in 
row based form it is then output in Cadence DEF format for 
future modification and viewing in other software. 
CHAPTER 5. OPTIMIZATIONS 
The goal of placement driven delay optimization is to
guide the final solution to a local minimum while new cells
are easily integrated into an existing cell placement. This
optimization is used to reduce all critical paths longer
than the desired cycle time and to direct the placement so as
to reduce interconnect delays. Without the hard constraints
imposed by a final solution, our iterative method will produce
a better placement with a minimal area increase. For
this task four methods of optimization are used: gate sizing,
type A buffer insertion, type B buffer insertion, and cell
replication. For consistency with other papers, the
type A and type B buffer insertion terminology is retained as
described in [11].
To determine which method of optimization to apply, a
delay optimization algorithm (DOA) was created. This
algorithm determines the best location and insertion
method for each DOA iteration. During the selection process
large increases in area are deterred with the area optimized
insertion (AOI) constant. The AOI constant is used to
increase or decrease the penalty for added area. As each
method of optimization creates a different increase in area,
the selection method must determine the best trade-off between
delay savings and area. If the AOI constant is small, area has
little effect on the decision and the best change in delay is
selected. The selection is driven by a value called delay
gain. This value shows the delay savings given the area
penalty of the optimization method:

$$\text{Delay gain} = \Delta\text{delay} \cdot \left( 1 - aoi \cdot \frac{\Delta\text{area}}{\text{max cell area}} \right) \qquad (21)$$

The DOA selection is simply the largest value of delay gain
found in each iteration. With the best optimization method
selected, the desired modifications are made to the circuit
and the current iteration of the DOA is completed.
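The selection rule of equation (21) can be illustrated with a short sketch. The function names and the candidate tuples below are hypothetical, and the delay and area numbers are invented purely to show how the AOI constant shifts the choice; they are not measurements from this thesis.

```python
def delay_gain(delta_delay, delta_area, max_cell_area, aoi):
    """Equation (21): delay savings discounted by the area penalty."""
    return delta_delay * (1.0 - aoi * delta_area / max_cell_area)


def select_optimization(candidates, max_cell_area, aoi):
    """Pick the candidate with the largest delay gain.

    candidates: list of (method_name, delta_delay, delta_area).
    """
    return max(candidates,
               key=lambda c: delay_gain(c[1], c[2], max_cell_area, aoi))[0]


# Illustrative numbers only: the buffer saves more delay but costs more area.
candidates = [("gate sizing", 10.0, 1.0), ("type A buffer", 20.0, 5.0)]
low_penalty = select_optimization(candidates, max_cell_area=10.0, aoi=1)
high_penalty = select_optimization(candidates, max_cell_area=10.0, aoi=3)
```

With AOI = 1 the larger delay saving of the buffer wins, while with AOI = 3 its area penalty makes the gain negative and gate sizing is selected instead.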
When calculating the delay gain for each of the
optimization methods several criteria must be met. First, an
appropriate point must be chosen for the optimization to
affect. If this point cannot support the desired
optimization, that method is ignored for the current
iteration, which forces each method to check for causes of
error before the delay gain is calculated. Once a successful
optimization is determined, care must be taken not to disrupt
circuit components directly affected by the optimization
method. This forces each optimization method to retain
intelligence and observe its effects on the surrounding
cells.
5.1 Gate Sizing 
Gate sizing is the simplest of all optimization
approaches. By increasing the driving strength of a cell, a
decrease in delay can be achieved. This is due to the
decline in output resistance, which allows the capacitance of
large loads, such as long interconnects or large fan-outs, to
be switched at higher speeds. Increasing the gate size
produces only a small increase in cell size. In fact, in
some macro-cell libraries an increase in
gate size sometimes produces no increase in cell size. This
is due to the constrained height of all macro-cells in the
library. If extra area exists in the vertical direction, no
width increase is needed to change the gate size of the cell.
The best location for gate sizing is found on the most
critical path. When scanning the most critical path, every
cell along the way is increased in size and the change in
delay noted [11]. The cell with the largest
delay gain is selected as the gate sizing point. In most
cases the critical path is short and this brute-force method
is not computationally expensive.
5.2 Type A Buffer Insertion
Like gate sizing, type A buffer insertion is used to
increase the driving strength of the source cell to more
adequately drive the load (Figure 8). In this case the load
consists of a large capacitive value containing interconnect 

















gate sizing type A buffer insertion can place the new buffer 
at any location between the star and source cell. If there 
is a long interconnect between these two points the cell can 
be used to drive the long interconnect and the buffer to 
drive the remaining load. Insertion of the buffer does have
its downsides. Inserting the buffer adds a new cell,
causing an increase in area. This size increase can
be four to five times that of simple gate sizing.
However, a properly sized type A buffer will produce a much
larger decrease in delay.
The optimal location for type A buffer insertion is
based on the maximum path delay along the most
critical path. The maximum path delay is the largest
source-to-sink delay found along the critical
path. The same point is also used as the insertion point for
both type B insertion and cell replication.
5.3 Type B Buffer Insertion
The intent of a type B buffer insertion is to shield the
most critical path(s) from all non-critical nets. This
shielding effect replaces the capacitive sum of all
non-critical sinks with the gate capacitance of a minimum sized
buffer (Figure 9).
This minimum sized buffer decreases the sink
capacitance seen by the source cell but adds cell area. If
a presumed non-critical cell is placed behind the new buffer,
delay can increase in the next iteration. For this
reason care must be taken when selecting the non-critical
cells, because once the buffer is inserted there is
no function to remove it from the netlist. If a non-critical
cell becomes critical after a type B insertion then the cell



















Figure 9: Type B Buffer Insertion
As in type A buffer insertion, the net with the
largest path delay is used as the location for insertion.
This differs from [11], where the maximum fanout
capacitance along the most critical path is used as the
insertion location. Under the implementation discussed in
this thesis such a location would not be desirable, as a chip
output has a capacitive load several times larger than that
of any cell. Given that the chip output load cannot be
decreased by optimization, it will always remain the largest
fanout. We observed that if the chip outputs were
ignored, the maximum path delay would closely follow the
largest capacitive fanout.
For a type B insertion to be performed,
the net must have a degree larger than two. As type A
insertion is designed to handle all single-line, single-load
cases, there is no need to use type B insertion for
the same case. For the type B insertion to
succeed, less critical net segments must be isolated from
the highly critical net segments. This isolation is
determined from the criticality found in Section 3.1. A
cell criticality of zero marks the most critical path. Any
other value represents the amount of delay increase along the
path that can be absorbed before it becomes the most critical
path. To isolate the capacitance, two sets are created for
the type B insertion. One set contains all critical cells
and the other set contains all non-critical cells. The
maximum criticality of all cells on the net is used
to determine which cells are critical and which are not. The
maximum criticality is multiplied by a constant, c1, and
all cells with a criticality lower than this value are placed
in the critical set. All non-critical cells are then placed
behind the new buffer in order to isolate them from the
critical cells [11].
With the two segments created, timing must be re-run to
verify that no net placed behind the buffer becomes more
critical than the present critical path. This is done to
prevent future complications with the added buffer cell. If
any net does become more critical, it is moved from the
non-critical set back into the critical set and timing is run
again. This verification continues until all critical
cells are removed from the non-critical set. If no cells
remain in the non-critical set after timing is run, type B
insertion is ignored and zero delay gain is reported to the
DOA.
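The set construction and the verification loop described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: the function names are hypothetical, and `becomes_critical` is a stand-in callback for the timing run, which is not modeled here.

```python
def partition_sinks(criticality, c1):
    """Split the cells on a net into critical and non-critical sets.

    criticality: dict cell -> criticality (0 marks the most critical path).
    c1: constant scaling the maximum criticality on the net.
    """
    threshold = c1 * max(criticality.values())
    critical = {c for c, v in criticality.items() if v < threshold}
    non_critical = set(criticality) - critical
    return critical, non_critical


def verify_partition(critical, non_critical, becomes_critical):
    """Move cells back to the critical set until timing is satisfied.

    becomes_critical: hypothetical callback standing in for the timing
    run; given the current non-critical set it returns the cells that
    would become more critical than the present critical path.
    """
    critical, non_critical = set(critical), set(non_critical)
    while non_critical:
        violators = becomes_critical(non_critical) & non_critical
        if not violators:
            break
        critical |= violators
        non_critical -= violators
    # An empty non-critical set means type B insertion is skipped.
    return critical, non_critical


crit, non_crit = partition_sinks({"a": 0.0, "b": 2.0, "c": 10.0}, c1=0.5)
crit, non_crit = verify_partition(crit, non_crit, lambda cells: set())
```

If the callback keeps flagging cells until the non-critical set is empty, the caller would report zero delay gain to the DOA, as the text describes.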
5.4 Cell Replication 
As with type B buffer insertion, cell replication is used
to isolate the total interconnect capacitance. Unlike type B
buffer insertion, cell replication creates a whole new source
cell, allowing greater physical movement (Figure 10). The
goal of this type of optimization is to decrease interconnect
capacitance and divide the sink capacitance. Unlike type B
buffer insertion, criticality is not used to divide the
cells into critical and non-critical sets; instead the two
sets are divided by location and sink capacitance.
Figure 10: Cell Replication
The method of dividing the cells into two sets is based
on the assumptions that each star is the average of
all connecting cells, and that any reduction in capacitance
will decrease the delay time. For a good division the total
capacitance in each set should be approximately the same.
This prevents a large delay in one set and a small delay
in the other. Given that the star is the center of all cells,
an approximate average of the cells is already known. Two
possible divisions are first evaluated: one into sets x1, x2
and the other into sets y1, y2 (Figure 11). Both divisions are
then compared and the one whose sets are closest in total
capacitance is chosen. With
the cells for each set determined, new star locations are
found. Finding new star positions changes the interconnect
capacitance and gives a small view of the possible physical
delay savings. Now timing is run and delay gain












Figure 11: Cell Replication Method of Balance 
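The balance step can be sketched as below. The interpretation used here is an assumption: the x1, x2 division is taken to be a split of the sink cells by a vertical line through the star, and the y1, y2 division a split by a horizontal line through the star. The function name `balanced_split` and the sink tuple layout are hypothetical.

```python
def balanced_split(sinks, star):
    """Choose the x or y division with the most balanced capacitance.

    sinks: list of (name, x, y, capacitance).
    star:  (x, y) location of the star, assumed to be the cell average.
    """
    def split(axis, center):
        low = [s for s in sinks if s[axis] < center]
        high = [s for s in sinks if s[axis] >= center]
        return low, high

    def imbalance(pair):
        # Difference in total sink capacitance between the two sets.
        return abs(sum(s[3] for s in pair[0]) - sum(s[3] for s in pair[1]))

    x_division = split(1, star[0])   # sets x1, x2
    y_division = split(2, star[1])   # sets y1, y2
    return min((x_division, y_division), key=imbalance)


sinks = [("a", 0.0, 0.0, 1.0), ("b", 0.0, 2.0, 1.0),
         ("c", 2.0, 0.0, 3.0), ("d", 2.0, 2.0, 3.0)]
set_one, set_two = balanced_split(sinks, star=(1.0, 1.0))
```

In this example the x division would put all of the heavy sinks on one side, so the y division, which places one light and one heavy sink in each set, is chosen.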
CHAPTER 6. EXPERIMENTAL RESULTS
The experimental results in this section are based on
Verilog design files used to create the Cadence DEF. The
Verilog design files were compiled by Synopsys Design
Compiler into a gate level design and linked to the lsi ls15
library. The lsi ls15 library was originally used to
describe a 1.5 micron process but all values contained in the
library are unit values. This allowed me to generate a 0.18
micron process approximation based on the 1999 National
Technology Roadmap for Semiconductors [19] (see Table 1).
Listed in Table 2 are the design files acquired from [10].
The program presented in this thesis was written in C++ and
run on a 367 MHz single processor HP C360 machine with 512 MB
of memory.
Table 1: Roadmap Values
Tech. (µm)                                    0.18
Minimum wire width: Wmin (µm)                 0.18
Sheet resistance: r (Ω/□)                     0.068
Unit area capacitance: Ca (fF/µm²)            0.060
Unit fringing capacitance: Cf (fF/µm)         0.064
Output resistance of min. device: rg (Ω)      17100
Input capacitance of min. device: Cg (fF)     0.234
Table 2: Design Parameters
Design Name    Num. of Components    Num. of Pins    Num. of Nets
tlc                    47                  5               54
addrgen               646                 97              763
rgb_interp           1658                 71             1813
matrix               1773                118             1905
macunit32            6546                213             6758
To produce an evenly distributed placement a balance
must be maintained between the cell connection force and the
fill force. This balance was achieved by changing the scaling
factors α and β. Through experimentation it was found that a
constant value for each did not produce the best placement.
Bad placements appeared when cell oscillations occurred
(Figure 12). This oscillation occurs when the cell force
causes the cells to cluster during one iteration, as in
Figure 13a, and in the next iteration the fill force yanks
them back apart, as in Figure 13b. The delay values in
Figure 14 show the large changes in delay as the cells
oscillate. To reduce the oscillations, the connection and fill
forces need to remain small enough to prevent large
movements of the cells but large enough to keep the total
number of iterations small. The selection of α and β was
based on two basic observations. If α was higher than β, the
connection force would dominate and clustering of the cells
would occur. If β was larger, a very well distributed
placement was produced but connected cells were not close
together, causing delay to be large. From these
observations it was clear that a combination of each was
needed. To handle this, α was given a large initial value
while β started small and slowly increased. This allowed
connected cells to move close together and then let β slowly
pull the cells into empty space without destroying the
locality of the connected cells. Once β had been allowed to
pull the cells apart, both α and β were reduced so as to
produce smaller local overlap refinements. This tempering of
α and β can be seen in Figure 15.

Figure 12: Final Placement Due to Oscillation ((a) Oscillation, (b) No Oscillation)
Figure 13: (a) Clustering due to Connection Force, (b) Rebound due to Fill Force
Figure 14: Constant α and β (delay per iteration for alpha=0.5 with beta=0.5, 0.6 and 0.7)
Figure 15: Cooling Schedule for α and β (α, β factors per iteration)
To show the conceptual idea of integrating optimization
with placement, three levels of optimization are shown.
First, my force directed (FD) placement tool is used alone,
meaning no delay optimization is performed. To aid
the placement tool in delay minimization, timing critical
paths are still given a larger connection force. No other
methods of optimization are used in this example, which is
designed specifically to show the capabilities of the FD
placement algorithm. The results from this level based cell
placement method are shown in Table 3.
Table 3: FD Placement Only
Design Name    Run Time (s)    Init. Delay (ps)    Final Delay (ps)
tlc                0.90             837.39              77.766
addrgen            6.44            1704.01             156.407
rgb_interp        26.49           11887.70             954.219
matrix            27.45           13399.70             711.501
macunit32        119.25           32748.50            1811.610
To show the validity of the DOA a random placement was 
generated and 51 iterations of the DOA run. No changes in 
placement were performed during this time giving a good 
representation of the DOA's capability. During this 
experiment an AOI of 1 (Table 4) and AOI of 3 (Table 5) were 
used and the number of each optimization method noted. 
Table 4: Random Placement only DOA, AOI=1
Design Name   Run Time (s)   Type A   Type B   Cell Rep.   Gate Sizing   Area Inc.   Final Delay (ps)
tlc               0.19          0       18        0            33          47.5%          40.422
addrgen           1.18          6       14        5            26           3.0%         121.237
rgb_interp        2.91          6       19        2            24           2.0%        1289.800
matrix            3.31         13       13        0            25           2.3%        1436.680
macunit32        28.06          2       38        0            11           0.8%        2804.550
Table 5: Random Placement only DOA, AOI=3
Design Name   Run Time (s)   Type A   Type B   Cell Rep.   Gate Sizing   Area Inc.   Final Delay (ps)
tlc               0.19          0       17        1            33          45.9%          39.671
addrgen           1.17          4       11        4            32           1.9%         121.551
rgb_interp        3.29          5       16        1            29           1.6%        1292.200
matrix            3.39          8       13        0            30           1.9%        1434.260
macunit32        28.16          0       29        0            22           0.7%        3032.820
With the optimization and placement steps separated (Tables
3-5), individual delay gains are seen. In
Table 6 and Table 7 both placement and DOA are run together
and the resulting delay is shown.
Table 6: FD Placement using DOA, AOI=1
Design Name   Run Time (s)   Type A   Type B   Cell Rep.   Gate Sizing   Area Inc.   Final Delay (ps)
tlc               0.80          0        8        0             2          17.10%         26.442
addrgen          26.51          5        4        1             2           1.10%        101.568
rgb_interp       25.60          1        2        1            14           0.34%        810.184
matrix           28.50          3        7        0             0           0.81%        690.152
macunit32       122.85          0        9        0             0           0.18%       1573.600
Table 7: FD Placement using DOA, AOI=3
Design Name   Run Time (s)   Type A   Type B   Cell Rep.   Gate Sizing   Area Inc.   Final Delay (ps)
tlc               0.90          0        6        2             3          14.60%         23.081
addrgen          26.60          4        5        0             3           1.10%        103.422
rgb_interp       26.50          2        2        1            12           0.37%        828.460
matrix           28.50          2        6        1             1           0.70%        691.570
macunit32       118.80          0        9        0             0           0.18%       1573.640
From each individual method it is hard to distinguish
delay savings. For this reason Table 8 is used for direct
comparison. The numbers in parentheses represent relative
values with respect to the placement only delay. If a more
in-depth view of each placement and optimization algorithm is
desired, refer to Figures 16-20. These figures show the maximum
path delay per iteration for all designs. Also included are the
random placement for MacUnit32 (Figure 21) and the final
placement after 51 iterations with DOA (Figure 22).
Table 8: Timing Results for All Methods
                                                      Design Names
                                     TLC       Addrgen    RGB_interp    Matrix      MacUnit32
Placement Only, no timing (ps)      81.892     189.632     821.928      740.272     1808.650
                                    (1.00)     (1.00)      (1.00)       (1.00)      (1.00)
Placement Only, with timing (ps)    77.766     156.407     954.219      711.501     1811.610
                                    (0.95)     (0.82)      (1.16)       (0.96)      (1.00)
DOA Only, AOI=1 (ps)                40.422     121.237    1289.800     1436.680     2804.550
                                    (0.49)     (0.64)      (1.57)       (1.94)      (1.55)
DOA Only, AOI=3 (ps)                39.671     121.551    1292.200     1434.260     3032.820
                                    (0.48)     (0.64)      (1.57)       (1.94)      (1.68)
Placement and DOA, AOI=1 (ps)       26.442     101.568     810.184      690.152     1573.600
                                    (0.34)     (0.54)      (0.99)       (0.93)      (0.87)
Placement and DOA, AOI=3 (ps)       23.081     103.422     828.460      691.570     1573.640
                                    (0.28)     (0.55)      (1.01)       (0.93)      (0.87)

Figure 16: MacUnit32 Delay Values (delay per iteration; series: Just DOA with AOI=1, Just DOA with AOI=3, Placement and DOA with AOI=1, Placement and DOA with AOI=3, Just Placement with Timing, Just Placement no Timing)

























Figure 18: Path Delay Graph for Matrix
[Plot: Addrgen Delay Values, delay per iteration]













Figure 19: Path Delay Graph for RGB_interp




























CHAPTER 7. CONCLUSIONS 
Our use of a combined placement and optimization
algorithm has been verified as capable of delay vs. area
reductions. The resulting delay difference between placement
without circuit optimization and placement with circuit
optimization was good but not as large as expected. During
experimentation possible modifications were noted. It is
from this observation that future work on both the placement
and the optimization can produce better results.
In the placement algorithm balance must be introduced
between placement forces. Balance is needed between the cell
and fill forces or an oscillation effect is generated. This
balance was hard coded, but a balance adjusted in real time
would produce faster convergence. The timing driven spring
constant is assigned so that critical nets acquire a large
spring constant and non-critical nets are given a lower one.
The placement algorithm used star criticality for this spring
constant, which affected all nets connected to the star. This
is not desirable, as each star will have both
critical and non-critical connections. To produce better
delay values the slack of the non-critical nets needs to be
used by the critical nets. If this is not done delay savings
are not produced.
To reduce delay further the optimization criteria must 
better determine which method of optimization to use. In the 
insertion of a Type B buffer the critical and non-critical 
sets will need to be found more accurately. Presently a 
large gray area between critical and non-critical cells is 
retained in the Type B insertion algorithm. The present
algorithm only considers the most critical net, although a
secondary objective should also be added. If our primary
objective is to reduce the arrival time of the most critical 
nets by the largest possible amount, then the secondary 
objective should be to reduce the delay for as many highly 
critical nets as possible. For the cell replication method 
criticality should be integrated so an unbalanced set 
capacitance can be used. For an unbalanced set capacitance, 
all critical nets should be contained in the set with the 
least capacitance with the severity of the imbalance 
determined by the difference between critical and non-
critical values. In both of the above modifications 
sacrificing the slack in the non-critical nets will help 
reduce the highly critical net delays. 
Since timing is becoming more important to industry, 
reduction of delay in the placement phase must be considered. 
By eliminating the isolated stages between placement and 
optimization larger delay gains can be achieved. Our 
approach has been shown to reduce global delay time by both 
physical and logical optimization. Our method also allows 
for delay time designation where run time is dependent on 
complexity of the problem. 
GLOSSARY OF TERMS 
balance density - the total cell area per die area of the
chip
bin - bounded area created to hold information needed to
compute the fill force
bin density - the cell overlap in each bin
cell - a copy of, or reference to, a more detailed version
contained in the library cell
connection force - combination of spring forces for each
cell-to-star connection
criticality - (critical/non-critical) a path is critical if
that path violates the timing specifications. The magnitude
of the violation produces the level of criticality.
degree - total number of connections made to a given point,
i.e. star net degree = the total number of cells connected
driver resistance - resistance generated by the transistors
internal to each cell. As the size of the transistor is
increased the resistance is decreased.
driver strength - the level of transistor sizing determines
the driver resistance. A low resistance allows a very fast
transition speed or large load capacitance to be driven.
fill force - force used to pull cells away from dense
portions of the die in order to reduce cell overlap.
force directed placement - a cell based placement approach
driven by Hooke's law.
interconnect delay - delay time needed to drive a sink cell
(I/O).
library cell detailed description both 
physically. The physical boundaries and 
cell, are given along with the location, 








loading capacitance total capacitance seen by the driver 
resistance. Capacitance includes both interconnect and sink 
capacitance. 
macro-cell design - a layout of rectangular cells of the same 
height. Where the cells are placed in rows and then 
connected. 
net - a set of terminals that have to be made electrically 
equivalent. 
ie. electrical connection between two points 
netlist - collection of nets, description of cell connections 
pad - electrical connection to outside the die 
pin - an input or output connection on a cell 
sink capacitance - capacitance generated by the gate
transistors of each cell. As the size of the transistor is
increased the capacitance also increases.
sink cell - the cell driven by a source cell. 
source cell - the cell used to drive all connecting sink
cells.
spring force - force based on Hooke's law.
star - a central connection between two components 
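
As an illustration of how the spring force, star, degree and
connection force entries fit together, the short Python sketch
below computes a Hooke's-law force for each cell-to-star
connection of one net. It is only a sketch under stated
assumptions: the star is assumed to sit at the centroid of its
connected cells, every connection is assumed to share the same
spring constant k, and the function names are hypothetical.
None of these choices is taken from the definitions above.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def star_position(cells: List[Point]) -> Point:
    """Assumed star location: the centroid of the connected cells."""
    n = len(cells)  # n is the star net degree
    return (sum(x for x, _ in cells) / n, sum(y for _, y in cells) / n)


def spring_force(cell: Point, star: Point, k: float = 1.0) -> Point:
    """Hooke's law: force on the cell is k times its displacement to the star."""
    return (k * (star[0] - cell[0]), k * (star[1] - cell[1]))


def connection_forces(cells: List[Point], k: float = 1.0) -> List[Point]:
    """One spring force per cell-to-star connection of a single net."""
    star = star_position(cells)
    return [spring_force(cell, star, k) for cell in cells]


if __name__ == "__main__":
    net = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]  # three cells, degree 3
    for cell, force in zip(net, connection_forces(net)):
        print(cell, "->", force)
```

With the star at the centroid and a uniform k, the forces on the
cells of one net sum to zero, so the net as a whole is pulled
together without being shifted.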
REFERENCES CITED 
1. Naveed A. Sherwani, "Algorithms for VLSI Physical Design
Automation," 3rd edition, Kluwer Academic Publishers,
1999.
2. Fan Mo, Abdallah Tabbara, and Robert K. Brayton, "A Force-
Directed Macro-Cell Placer," IEEE Transactions on CAD of 
IC and Systems, Vol. 4, pp. 177-180, March 2000
3. Kleinhans, J.M.; Sigl, G.; Johannes, F.M.; and Antreich,
K.J., "GORDIAN: VLSI placement by quadratic programming
and slicing optimization," IEEE Transactions on CAD of IC
and Systems, Vol. 10, pp. 356-365, March 1991
4. Etawil, H.; Areibi, S.; and Vannelli, A., "Attractor-
repeller approach for global placement," IEEE/ACM
International Conference on Computer-Aided Design, pp.
20-24, 1999
5. Horvath, E.I., "A parallel force direct based VLSI 
standard cell placement algorithm," IEEE International 
Symposium on Circuits and Systems, vol. 3, pp. 2071-2074,
May 1993 
6. Xianlong Hong; Hong Yu; Changge Qiao; Yici Cai, "CASH: a
novel quadratic placement algorithm for very large
standard cell layout design based on clustering,"
International Conference Proceedings on Solid-State and
Integrated Circuit Technology, pp. 496-501, 1998
7. Eisenmann, H.; and Johannes, F.M., "Generic global 
placement and floorplanning," IEEE Proceedings on Design 
Automation Conference, pp. 269-274, 1998
8. Youssef, H.; Sait, S.M.; and Al-Farra, K.J., "Timing 
influenced force directed floorplanning," IEEE Proceedings 
EURO-Design Automation Conference, pp. 156-161, 1995
9. Xianlong Hong; Yici Cai; Changge Qiao; Pujiang Huang; 
Zhiwei Kang; Tianxiong Xue; Kuh, E.S.; and Chung-Kuan 
Cheng, "Tiger: a timing-driven gate array and standard 
cell layout system," IEEE 4th International Conference on 
Solid-State and Integrated Circuit Technology, pp.
338-342, 1995
10. Chou, Yih-Chih; Lin, Youn-Long, "A Performance-Driven 
Standard-Cell Placer Based on a Modified Force-Directed 
Algorithm," IEEE International Symposium on Physical 
Design, pp. 24-29, 2001
11. Yanbin Jiang; Sapatnekar, S.S.; Bamji, C.; and Juho Kim, 
"Interleaving buffer insertion and transistor sizing into 
a single optimization," IEEE Trans. on VLSI Systems, vol. 
6, pp. 625-633, Dec. 1998.
12. Alpert, C.J.; Devgan, A.; Quay, S.T., "Buffer insertion 
with accurate gate and interconnect delay computation," 
IEEE Proceedings on Design Automation Conference, pp.
479-484, 1999
13. Adler, V.; Friedman, E.G., "Optimizing RC tree delay in 
high speed ASICs through repeater insertion," IEEE 
Proceedings on International ASIC Conference, pp.
375-379, 1998
14. Cong, J.; Tianming Kong; and Pan, D.Z., "Buffer block 
planning for interconnect-driven floorplanning," IEEE/ACM 
International Conference on Computer-Aided Design, pp.
358-363, 1999
15. Jagannathan, A.; Sung-Woo Hur; Lillis, J., "A fast 
algorithm for context-aware buffer insertion," Proceedings 
on Design Automation Conference, pp. 368-373, 2000
16. Stenz, G.; Riess, B.M.; Rohfleisch, B.; and Johannes, 
F.M., "Performance optimization by interacting netlist 
transformations and placement," IEEE Trans. CAD of IC 
Systems, vol. 19, pp. 350-358, March 2000.
17. W. C. Elmore, "The transient response of damped linear
network with particular regard to wideband amplifiers," J.
Applied Physics 19, 1948, pp. 1-94.
18. The EDA Open Source Community. 
http://www.openeda.org 
19. Semiconductor Industry Association, "The International
Technology Roadmap for Semiconductors, 1999,"
http://www.itrc.net/ntrs/publntrs.nsf 
