Routing congestion analysis and reduction in deep sub-micron VLSI design by Shen, Zion Cien
Retrospective Theses and Dissertations Iowa State University Capstones, Theses and Dissertations
2004
Recommended Citation
Shen, Zion Cien, "Routing congestion analysis and reduction in deep sub-micron VLSI design " (2004). Retrospective Theses and
Dissertations. 1122.
https://lib.dr.iastate.edu/rtd/1122
Routing congestion analysis and reduction in deep sub-micron VLSI design 
by 
Zion Cien Shen 
A dissertation submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
DOCTOR OF PHILOSOPHY 
Major: Computer Engineering 
Program of Study Committee: 
Chris Chu, Major Professor 
Randall Geiger 
Degang Chen 
Akhilesh Tyagi 
Lu Ruan 
Iowa State University 
Ames, Iowa 
2004 
Copyright © Zion Cien Shen, 2004. All rights reserved. 
UMI Number: 3145682 
Graduate College 
Iowa State University 
This is to certify that the doctoral dissertation of 
Zion Cien Shen 
has met the dissertation requirements of Iowa State University 
Major Professor 
For the Major Program 
TABLE OF CONTENTS 
LIST OF TABLES 
LIST OF FIGURES 
1. GENERAL INTRODUCTION 
1.1 Routing Congestion 
1.2 Steiner Tree Construction among Blockages 
1.3 Dissertation Overview 
2. BOUNDS ON THE NUMBERS OF SLICING, MOSAIC AND GENERAL FLOORPLANS 
2.1 Introduction 
2.2 Contributions 
2.3 The Tight Bound for Slicing Floorplan 
2.3.1 The exact number of skewed slicing trees 
2.3.2 The tight bound on the number of slicing floorplans 
2.4 The Tight Bound for Mosaic Floorplan 
2.5 An Upper Bound for General Floorplan 
2.5.1 Empty room insertion 
2.5.2 An upper bound on the number of general floorplans 
2.6 Conclusion 
3. GENERAL FLOORPLAN REPRESENTATION-TWIN BINARY SEQUENCES 
3.1 Introduction 
3.1.1 Previous works 
3.1.2 Our contributions 
3.2 Twin Binary Sequences (TBS) Representation 
3.2.1 From floorplan to twin binary trees 
3.2.2 Definition of twin binary sequences 
3.2.3 From TBS to floorplan 
3.2.4 Size of solution space 
3.3 Extension to General Floorplan 
3.3.1 Empty room in mosaic floorplan 
3.3.2 Mapping between mosaic floorplan and general non-slicing floorplan 
3.3.3 Inserting empty rooms directly on TBS 
3.3.4 Tight bound on the number of irreducible empty rooms 
3.4 Floorplan Optimization by Simulated Annealing 
3.5 Experimental Results 
4. CONGESTION DRIVEN FLOORPLANNING 
4.1 Introduction 
4.1.1 Previous work 
4.1.2 Our contributions 
4.2 Overview of Our Floorplanner 
4.3 Congestion Estimation Model 
4.3.1 Underlying routing graph 
4.3.2 Inner dual graph construction from TBS 
4.3.3 Problem formulation 
4.3.4 Incoming flow balancing (IFB) phase 
4.3.5 Stepwise flow refinement (SFR) phase 
4.4 Global Routing Solution Generation 
4.5 Experimental Results 
4.6 Conclusion and Discussion 
5. CONGESTION ESTIMATION MODELS IN PLACEMENT 
5.1 Introduction 
5.2 Notations 
5.2.1 Lou's model and its revision (LKS) 
5.3 Simple Probabilistic Model (SPM) 
5.4 Multi-pin Net Probabilistic Model (MPM) 
5.5 Post-Probability Processing (PPP) 
5.6 Experimental Results 
5.6.1 Correlation 
5.6.2 Run time 
5.7 Conclusion and Discussion 
6. EFFICIENT RECTILINEAR STEINER TREE CONSTRUCTION WITH RECTILINEAR BLOCKAGES 
6.1 Introduction 
6.2 Problem Formulation 
6.2.1 Rectilinear blockages 
6.2.2 Directional blockages 
6.3 Escape Graph Versus Spanning Graph 
6.3.1 Redundancy in escape graph 
6.3.2 Spanning graph 
6.4 Spanning Graph Based Approach in RSMTRB 
6.4.1 Search regions 
6.4.2 Spanning graph construction in RSMTRB 
6.5 RMST Construction 
6.6 Experimental Results 
6.7 Conclusion and Discussion 
BIBLIOGRAPHY 
ACKNOWLEDGEMENTS 
LIST OF TABLES 
Table 2.1 A summary of results 
Table 3.1 Area minimization 
Table 3.2 Area and wirelength minimization 
Table 3.3 Comparisons with ECBL and enhanced Q-sequences 
Table 3.4 Comparisons with other representations for slicing floorplan 
Table 4.1 Experimental results for MCNC benchmarks 
Table 4.2 MCNC benchmark 
Table 5.1 Net weight w(p) in MPM 
Table 5.2 Placement benchmark statistics 
Table 5.3 Correlation of predicted congestion with actual results for heavily congested circuits 
Table 5.4 Correlation of predicted congestion with actual results for lightly congested circuits 
Table 5.5 Run time for congestion estimation models and global router 
Table 6.1 Statistics of testcases 
Table 6.2 Experimental results 
LIST OF FIGURES 
Figure 2.1 Relationship among the solution spaces of slicing, mosaic and general floorplans 
Figure 2.2 Structures that cannot be represented in mosaic floorplan 
Figure 2.3 Slicing floorplan and its corresponding Skewed Slicing Tree 
Figure 2.4 A "+"-rooted Skewed Slicing Tree T 
Figure 2.5 The distribution of G(k) 
Figure 2.6 An example of reducible and irreducible empty rooms 
Figure 2.7 A wheel structure 
Figure 2.8 A pair of T junctions produce a wheel structure 
Figure 2.9 An example of inserting maximum number of irreducible empty rooms 
Figure 3.1 Structures that cannot be represented in a mosaic floorplan 
Figure 3.2 An example of a twin binary trees 
Figure 3.3 Building a twin binary trees from a mosaic packing 
Figure 3.4 Proof of Observation 1 (if part) 
Figure 3.5 Proof of Observation 1 (only if part) 
Figure 3.6 An example of an extended tree 
Figure 3.7 A simple example of constructing a floorplan from its TBS 
Figure 3.8 Proof of Theorem 3 
Figure 3.9 Proof of Theorem 3 
Figure 3.10 Examples of reducible and irreducible empty rooms 
Figure 3.11 A wheel structure 
Figure 3.12 Proof of Lemma 4 
Figure 3.13 Proof of Lemma 5 
Figure 3.14 Tree structure of an irreducible empty room 
Figure 3.15 Mapping between mosaic floorplan and non-slicing floorplan 
Figure 3.16 The only two ways to insert X into a tree 
Figure 3.17 A simple example of constructing a non-slicing floorplan from a mosaic floorplan 
Figure 3.18 An example of searching the last module in the right subtree of n 
Figure 3.19 An example of searching the first module in the left subtree of tt, 
Figure 3.20 Floorplan example with many irreducible empty rooms 
Figure 3.21 Right-Rotate and Left-Rotate for a binary search tree 
Figure 3.22 Modified red-black rotations when subtree D is 0 or 1 
Figure 3.23 Four cases of Left-Rotate^, 7^) on ti 
Figure 3.24 Proof of Lemma 7 
Figure 4.1 Congestion estimation by the probability approach 
Figure 4.2 (a) A rectangular floorplan R. (b) Its inner dual graph G. The channel segment and the inner dual graph edge corresponding to the adjacent rooms C and D are highlighted 
Figure 4.3 An illustration of routing direction assignment 
Figure 4.4 An example of routing direction assignment 
Figure 4.5 A special case where cycle exists after direction assignment 
Figure 4.6 Determining the room neighborhood information directly from TBS 
Figure 4.7 Illustration of Incoming Flow Balancing technique. (a) Flow distribution before assigning incoming flow. (b) Flow distribution after assigning incoming flow 
Figure 4.8 The convergence of maximum congestion in IFB phase for circuit apte 
Figure 4.9 SFR approach to globally optimize the maximum congestion. In this example, $e = \{i, j\}$ is with $M_{cong}$, $V_{out} = \{j, H, L, M, N\}$, $V_{in} = \{i, C, B, k\}$. After one iteration, we derive $r_1 = \langle B, D, E, M \rangle$ and $r_2 = \langle B, i, j, M \rangle$ 
Figure 4.10 The convergence of $M_{cong}$ of apte in SFR phase, where $\gamma = 10$, $\epsilon = 0.001$ 
Figure 4.11 Floorplanning result of ami49a using F1 
Figure 4.12 Floorplanning result of ami49a using F2 
Figure 5.1 Tile structure for congestion estimation 
Figure 5.2 Proof of theorem 1 
Figure 5.3 n pseudo horizontal routes with weight of 1/n in SPM 
Figure 5.4 Calculating horizontal usages in SPM 
Figure 5.5 Congestion estimation by existing probabilistic approach 
Figure 5.6 Step one in PPP. (a) The horizontal usages before step one. (b) The horizontal usages after step one 
Figure 5.7 Step two in PPP: the over-congested horizontal boundary, RDij, with its six neighbors 
Figure 5.8 Congestion map by global router 
Figure 5.9 Congestion map by LKS 
Figure 5.10 Congestion map by MPM 
Figure 5.11 Congestion map by SPM 
Figure 5.12 Congestion map by SPM.PPP 
Figure 6.1 Sequential approach to solve RSMTRB 
Figure 6.2 Dissect a rectilinear blockage into 3 rectangular blockages 
Figure 6.3 Three types of blockages: (a) Complete blockage (b) Horizontal blockage (c) Vertical blockage 
Figure 6.4 Escape graph 
Figure 6.5 Eight regions defined for each point in spanning graph 
Figure 6.6 Search regions for blockages and pins 
Figure 6.7 Visible points in search region for different blockages 
Figure 6.8 Example of unseen region for point p 
Figure 6.9 Edge connection for region R2 
Figure 6.10 Complete spanning graph 
Figure 6.11 Six steps to construct RSMT 
CHAPTER 1. GENERAL INTRODUCTION 
The driving force behind the increasingly rapid growth of VLSI technology has been 
the constant shrinking of the feature size of VLSI devices (i.e., the minimum transistor 
size). The feature size decreased from about 1 µm in 1990 to 0.09 µm (or 90 nm) in early 
2004. The EDA community has now started to develop tools for 65 nm design. It 
is also projected that the major chip makers, like Intel and IBM, will be able to put 
65 nm chips into production in 2005 using 300 mm wafers. 
Scaling VLSI fabrication down to ultra deep sub-micron (UDSM) dimensions 
has a dramatic impact on VLSI technology in several ways. First, the device density on 
integrated circuits grows quadratically as the feature size shrinks. The 
total number of transistors on a single chip has increased from 500K in 1985 to 40M today 
and will reach 300M in 2006 [1]. Second, due to the reduction of wire size and gate size 
and the increase of wire length, the switching delays in gates decrease, while the signal delays 
due to interconnects increase rapidly and have become predominant in today's designs. All 
of this makes VLSI design, especially physical design (the back end of 
VLSI design), more and more challenging. 
In this dissertation, the author collects several works done during his Ph.D. years. These 
works address practical issues in physical design, such as routing 
congestion and Steiner tree construction among blockages. 
1.1 Routing Congestion 
Routing congestion can be interpreted as a supply and demand problem for routing 
resources. The supply of routing resources is roughly determined by technology 
parameters, such as die size, number of layers and position of preplaced macros. The 
demand for routing resources in each design is determined by the global routing and detailed 
routing solutions. 
In traditional design flows, routing congestion is not considered until the global routing 
stage. But with the growing complexity of chips, routing congestion needs to be addressed 
at the planning stages (i.e., the floorplanning and placement stages), since a highly congested 
region in the floorplan/placement often leads to routing detours around the region, which 
result in a larger routed wire length and worse timing. Congested areas can also 
degrade the performance of the global router and, in the worst case, create an unroutable 
floorplan/placement in the fixed-die regime. 
Congestion is one of the main optimization objectives in global routing. However, the 
room for optimization is limited because the cells are already fixed at this stage. 
Therefore, designers can save substantial time and resources by detecting and reducing 
congested regions during the planning stages. An efficient yet accurate congestion 
estimation model that can be included in the inner loop of floorplanning and placement 
is therefore crucial. In this dissertation, we mainly focus on routing congestion modeling and reduction 
during floorplanning and placement. 
1.2 Steiner Tree Construction among Blockages 
Given n points on a plane, a Rectilinear Steiner Minimal Tree (RSMT) connects these 
points through some extra points called Steiner points to achieve a tree with minimal total 
wire length. Much work has been done on this fundamental problem in VLSI physical 
design. However, most of it did not take blockages into consideration. In fact, today's 
designs often contain many rectilinear routing blockages, e.g., macro cells, IP blocks, 
and pre-routed nets. Thus, rectilinear Steiner minimal tree construction with rectilinear 
blockages (RSMTRB) becomes a very practical problem. 
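To illustrate why Steiner points help, the following minimal sketch compares a rectilinear minimum spanning tree with a tree routed through one extra point for a three-pin net without blockages. It relies on the standard fact that, for three pins, joining them at the point whose coordinates are the medians of the pin coordinates gives a tree whose length equals the half-perimeter of the bounding box. The function names and the pin coordinates are illustrative assumptions and do not come from the dissertation.

```python
def manhattan(p, q):
    """Rectilinear (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rectilinear_mst_length(pins):
    """Length of a minimum spanning tree under Manhattan distance (Prim's algorithm)."""
    in_tree = {0}
    total = 0
    while len(in_tree) < len(pins):
        # Pick the cheapest edge leaving the current tree.
        d, j = min(
            (manhattan(pins[i], pins[j]), j)
            for i in in_tree
            for j in range(len(pins))
            if j not in in_tree
        )
        in_tree.add(j)
        total += d
    return total

def three_pin_steiner_length(pins):
    """Tree length when three pins are joined through the median point."""
    xs = sorted(p[0] for p in pins)
    ys = sorted(p[1] for p in pins)
    steiner = (xs[1], ys[1])  # median x and median y
    return sum(manhattan(p, steiner) for p in pins), steiner

pins = [(0, 0), (4, 0), (2, 3)]  # hypothetical pin locations
mst_len = rectilinear_mst_length(pins)
rsmt_len, steiner_point = three_pin_steiner_length(pins)
print(mst_len, rsmt_len, steiner_point)  # 9 7 (2, 0)
```

The spanning tree may only connect pins directly, while the Steiner tree is allowed the extra junction at (2, 0), which shortens the total wire length from 9 to 7 in this example.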
Generally, RSMT is used in initial net topology creation for global routing or incremental 
net tree topology creation in physical synthesis. In addition, RSMT is utilized to 
accurately estimate congestion and wire length in early design stages, like block floorplanning 
and cell placement. The timing and congestion information obtained from RSMT 
can be used as a criterion in timing and congestion driven routing. Because RSMT construction is applied 
hundreds of thousands of times, and many instances have very large input sizes, it 
deserves intensive research in VLSI CAD. 
Unfortunately, RSMT itself was shown to be strongly NP-complete. Taking blockages 
into account dramatically increases the problem complexity. Thus, it is extremely unlikely 
that an efficient optimal algorithm exists for RSMTRB. In the past, several heuristic 
algorithms were proposed for this problem; however, they have either poor performance 
or expensive running time. In this dissertation, we propose an efficient and effective 
graph based approach to solve RSMTRB. 
In the following section, we give a brief overview of this dissertation. 
1.3 Dissertation Overview 
In Chapter 2, we derive tighter asymptotic bounds on the number of slicing, mosaic 
and general floorplans. Consider the floorplanning of n blocks. For slicing floorplan, 
we prove that the exact number is 
$$n!\,\frac{(-1)^{n+1}}{2}\sum_{k=0}^{n}\left(3+\sqrt{8}\right)^{n-2k}\binom{1/2}{k}\binom{1/2}{n-k}$$
and the tight bound is $\Theta(n!\,2^{2.543n}/n^{1.5})$. For mosaic floorplan, we prove that the tight bound is 
$\Theta(n!\,2^{3n}/n^{4})$. For general floorplan, we prove a tighter lower bound of $\Omega(n!\,2^{3n}/n^{4})$ and a 
tighter upper bound of $O(n!\,2^{5n}/n^{4.5})$. 
In Chapter 3, we propose a floorplan representation for general non-slicing floorplans. 
The new representation is called Twin Binary Sequences (TBS), which is the first complete 
and non-redundant topological representation for non-slicing structures. Like some 
previous work [34], we also make use of mosaic floorplan as an intermediate step. 
However, instead of including a more than sufficient number of extra dummy blocks in 
the set of modules (which would increase the size of the solution space significantly), our 
representation allows us to insert an exact number of irreducible empty rooms into a mosaic 
floorplan such that every non-slicing floorplan can be obtained uniquely from one 
and only one mosaic floorplan. The size of the solution space is only $O(n!\,2^{3n}/n^{1.5})$, since 
mosaic floorplan is used as an intermediate step, but every non-slicing floorplan can be 
generated uniquely and efficiently in linear time without any redundant representation. 
In Chapter 4, we design an accurate and efficient congestion estimation model by 
performing global routing. We interpret the global routing problem as a flow problem of 
several commodities and relax the integral flow constraints. The objective of the resulting 
fractional flow problem is to minimize the maximum congestion over all edges in the inner 
dual graph. The underlying routing graph for each commodity is derived by assigning 
directions to the inner dual graph edges. We design an efficient two-phase algorithm to 
solve this fractional flow problem. The first phase is denoted as Incoming Flow Balancing 
(IFB), by which a good initial solution is derived. The second phase is called Stepwise 
Flow Refinement (SFR), by which the maximum congestion of the solution from the first phase 
is iteratively reduced to its optimal value. In addition, a valid global routing solution 
can be obtained by applying a simple rounding procedure to the fractional flow solution. 
The maximum congestion after rounding is increased by only 2.82% on average according 
to our experimental results, which justifies the use of fractional flow to estimate the 
routing congestion. Finally, we demonstrate our model by integrating it into a simulated 
annealing (SA) based floorplanner, where we use the maximum congestion as part of 
the cost of SA. The experimental results show that, on average, our congestion-driven 
floorplanner can generate a much less congested floorplan (-36.44%) with a slight sacrifice 
in area (+1.30%) and wirelength (+2.64%). The runtime of the whole SA process is only 
increased moderately (+270%). 
In Chapter 5, we propose three congestion estimation models based on a probabilistic 
approach. In particular, the Simple Probabilistic Model (SPM) uses an 
efficient probabilistic usage assignment algorithm for two-pin nets. The Multi-pin Net Probabilistic 
Model (MPM), an extension of SPM, directly assigns probabilistic 
usage on multi-pin nets. Post-Probability Processing (PPP) is presented to improve the 
correlation between the congestion predicted by SPM and the actual congestion produced by 
the global router. The experimental results show that MPM is about 21.70 times faster 
than LKS, a revision of Lou's model [41], with comparable correlation. SPM with 
PPP achieves much better correlation than LKS with its run time even less than that 
of 
In Chapter 6, we propose an efficient and effective approach to construct a rectilinear 
Steiner minimum tree with rectilinear blockages. The connection graph we use in this 
approach is called a spanning graph, which contains only O(n) edges and vertices. An 
O(n log n) time algorithm is proposed to construct the spanning graph for RSMTRB. The 
experimental results show that this approach can achieve a solution with significantly 
reduced wire length. The increase in total runtime is negligible in the whole design flow. 
CHAPTER 2. BOUNDS ON THE NUMBERS OF SLICING, 
MOSAIC AND GENERAL FLOORPLANS 
2.1 Introduction 
Floorplanning is a major step in the physical design cycle of VLSI circuits. It is the 
step that plans the positions and the shapes of the top-level blocks of a hierarchical design. 
As circuit sizes keep increasing, floorplanning becomes more and more critical in 
determining the quality of a layout. 
Floorplanning can be viewed as the problem of placing flexible blocks, that is, blocks 
with fixed area but unknown dimensions. There are many variations in the problem formulation 
[2, 3, 4]. Unfortunately, all practical floorplanning formulations are NP-complete 
[2, 3]. As a result, many floorplanners adopt simulated annealing [5] or other stochastic 
techniques. A code, called a floorplan representation, is usually used to represent 
the geometrical relationship among the blocks. The code is perturbed repeatedly by the 
stochastic techniques to search for a good floorplan. The run time and the quality of the 
solutions depend strongly on the size of the solution space, i.e., the number of possible 
codes. 
The geometrical relationship among the blocks is commonly specified by a rectangular 
dissection of the floorplan region. The floorplan region is first dissected into rectangular 
rooms and each block is then mapped to a different room. In order to restrict the size 
of the solution space, three different ways of dissection have been proposed. The corresponding 
floorplanning structures are called slicing [6], mosaic [7] and general floorplan [8]. Slicing 
floorplan is a special case of mosaic floorplan, and mosaic floorplan is a special case 
of general floorplan. The relationship among the solution spaces of slicing, mosaic and 
general floorplans is illustrated in Figure 2.1. However, only very loose lower and upper 
bounds on the size of these three sets are available. The details are discussed below. 
Figure 2.1 Relationship among the solution spaces of slicing, mosaic and 
general floorplans (nested regions labeled Slicing, Mosaic and General). 
Slicing floorplan is a rectangular dissection that can be obtained by recursively cutting 
a rectangle horizontally or vertically into two smaller rectangles. In [9], Otten first 
proposed to represent slicing floorplan using a binary tree representation called slicing 
tree. Each leaf of the slicing tree corresponds to a block and each internal node represents 
a vertical or horizontal merge operation on the two descendants. Note that one slicing 
floorplan may correspond to more than one slicing tree. Later, the redundancy was identified 
by Wong and Liu in [6], where Normalized Polish Expression (NPE) was proposed 
to represent any slicing structure without redundancy. An upper bound on the number of 
NPEs, which is also an upper bound on the number of slicing floorplans, is $O(n!\,2^{3n}/n^{1.5})$¹. 
The best lower bound on the number of slicing floorplans is given by the number of 
binary trees without labels on internal nodes, which is $\Omega(n!\,2^{2n}/n^{1.5})$ [10]. 
¹ In this paper, we let n be the number of blocks in the floorplanning problem. 
Mosaic floorplan was proposed by Hong et al. in [7]. In mosaic floorplan, non-slicing 
structures (e.g., a wheel structure) are allowed. However, the floorplan region is dissected 
into exactly n rooms so that each room is occupied by one and only one block. In addition, 
there is no crossing cut in the mosaic floorplan. See Figure 2.2 for some structures that 
cannot be represented in mosaic floorplan. Corner Block List (CBL) was proposed in [7] 
to represent mosaic floorplan. The size of the solution space for CBL is $\Theta(n!\,2^{3n})$. Notice 
that some CBLs do not correspond to any floorplan. At about the same time, Sakanushi 
et al. [11] introduced the Quarter-State Sequence (Q-Sequence) representation for mosaic 
floorplan. A Q-Sequence is a concatenation of room names and two kinds of positional 
symbols, with a total length of 3n. It is a non-redundant representation of mosaic 
floorplan. An upper bound on the size of the solution space for Q-Sequence is $O(n!\,2^{3n})$. 
There is no previously available result in the literature on the lower bound on the number of 
mosaic floorplans. So the best lower bound is the same as the one for slicing floorplan. 
Figure 2.2 Structures that cannot be represented in mosaic floorplan 
(two panels, (a) and (b), each with rooms A, B, C and D). 
General floorplan is similar to mosaic floorplan in that non-slicing structures are allowed. 
However, the floorplan region can be dissected into more than n rooms such that 
some rooms are not occupied by any block. Many representations were proposed 
during the 1990s. In [12], Onodera used a branch-and-bound algorithm to solve the general 
floorplan problem. An upper bound on the size of the solution space for this approach is 
$O(2^{n(n+2)})$, which is extremely large. In [13], Murata et al. introduced the sequence pair 
(SP) representation for general floorplan. SP is one of the most elegant representations for 
general floorplan and has been widely used. Unfortunately, redundancy still exists in this 
representation. The number of different SPs is $O((n!)^2)$. In [14], Nakatake et al. proposed 
the Bounded-Sliceline Grid (BSG) representation. In BSG, n blocks are randomly placed 
in a special n-by-n grid. The corresponding size of the solution space is $O(n!\,C(n^2, n))$, 
which is even larger than that of SP. The huge solution spaces of SP and BSG restrict 
the applicability of these representations in large floorplan problems. Later, O-tree [15] 
and B*-tree [16] were proposed to represent a compacted version of general floorplan. 
Compared to SP and BSG, these two representations have a much smaller solution space 
of $\Theta(n!\,2^{2n}/n^{1.5})$. However, they represent only partial topological information, and the 
dimensions of all blocks are required in order to describe an exact floorplan. In addition, 
not all possible rectangular dissections can be represented by O-tree and B*-tree. For the 
lower bound on the number of general floorplans, there is no previously available result 
in the literature. So the best lower bound is again the same as the one for slicing floorplan. 
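To get a feeling for how far apart these solution spaces are, the following sketch evaluates the base-2 logarithm of the leading terms quoted above for a given n. It is only a rough numerical illustration: the hidden constants of the asymptotic notation are ignored, and the helper names are assumptions introduced here.

```python
import math

def log2_factorial(n):
    """log2(n!) computed through the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)

def log2_binomial(n, k):
    """log2 of the binomial coefficient C(n, k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

def log2_sizes(n):
    """log2 of the leading terms quoted in the text (constant factors ignored)."""
    return {
        "O-tree / B*-tree": log2_factorial(n) + 2 * n - 1.5 * math.log2(n),
        "sequence pair": 2 * log2_factorial(n),
        "BSG": log2_factorial(n) + log2_binomial(n * n, n),
    }

sizes = log2_sizes(50)
for name, bits in sizes.items():
    print(f"{name}: about 2^{bits:.0f}")
```

For n = 50 the three exponents come out in the order O-tree/B*-tree, sequence pair, BSG, which matches the qualitative comparison made in the text.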
Recently, several representations have been proposed to construct general floorplans 
by inserting empty rooms into mosaic floorplans. They make use of mosaic floorplan as 
an intermediate step to represent non-slicing structures. In such an approach, the number 
of empty rooms is crucial because it changes the size of the solution space significantly. 
In [17], Zhou et al. proved that $n^2 - n$ empty rooms are enough to produce all general 
floorplans. As a result, the size of the solution space is as large as $O(n!\,C(n^2, n)\,2^{3n^2})$. 
In [18], Zhuang et al. proved that $n - \lfloor\sqrt{4n-1}\rfloor$ empty rooms are enough to generate 
all general floorplans. But the size of the solution space, $O(2^{6n}(2n)!/n!)$, is still quite 
large. Recently, Young et al. introduced Twin Binary Sequences (TBS) [19]. TBS is a 
non-redundant mosaic floorplan representation in which the exact positions for irreducible 
empty room insertion can be found in linear time. So, by upper-bounding the number of 
ways to insert empty rooms into each TBS, we can derive an upper bound on the number 
of general floorplans. We use this idea to derive the bound in Section 2.5. 
In [20], Yao et al. showed that the exact number of slicing floorplans is given by the 
Super Catalan number and the exact number of mosaic floorplans is given by the Baxter 
number. However, the Super Catalan number is given as a recurrence relation and the Baxter 
number is given as a very complicated summation. The growth rates of those numbers 
are hard to comprehend. The asymptotic bounds derived in this paper give us a better 
understanding of those numbers as well as of the number of slicing and mosaic floorplans. 
2.2 Contributions 
Although many representations of these three types of floorplan have been studied intensively 
and several upper bounds on the number of combinations of those representations 
have been reported, it is still theoretically interesting to find the tight bounds on the number 
of slicing, mosaic and general floorplans. In this paper, we show that the exact number of 
slicing floorplans is 
$$n!\,\frac{(-1)^{n+1}}{2}\sum_{k=0}^{n}\left(3+\sqrt{8}\right)^{n-2k}\binom{1/2}{k}\binom{1/2}{n-k}.$$
We also prove that 
the tight bound on this number is $\Theta(n!\,(3+\sqrt{8})^{n}/n^{1.5}) = \Theta(n!\,2^{2.543n}/n^{1.5})$. For the number 
of mosaic floorplans, based on the Baxter number, we show that the tight bound is 
$\Theta(n!\,2^{3n}/n^{4})$. For the number of general floorplans, based on the idea of inserting empty 
rooms into TBS, we derive a tighter upper bound of $O(n!\,2^{5n}/n^{4.5})$. Based on the bound 
for mosaic floorplan, we also get a tighter lower bound of $\Omega(n!\,2^{3n}/n^{4})$ on the number of 
general floorplans. The results are summarized in Table 2.1. 
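Table 2.1 A summary of results. 
           Previous lower bound             Previous upper bound            Our lower bound             Our upper bound 
Slicing    $\Omega(n!\,2^{2n}/n^{1.5})$ [10]   $O(n!\,2^{3n}/n^{1.5})$ [6]       $\Theta(n!\,2^{2.543n}/n^{1.5})$ (tight, lower and upper) 
Mosaic     $\Omega(n!\,2^{2n}/n^{1.5})$ [10]   $O(n!\,2^{3n})$ [11]              $\Theta(n!\,2^{3n}/n^{4})$ (tight, lower and upper) 
General    $\Omega(n!\,2^{2n}/n^{1.5})$ [10]   $O((n!)^2)$ [13]                  $\Omega(n!\,2^{3n}/n^{4})$       $O(n!\,2^{5n}/n^{4.5})$ 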
These bounds give us a better understanding of the relative sizes of these three types 
of floorplan. In addition, these bounds could be utilized as a criterion to evaluate the size 
of the solution space of different floorplan representations. 
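As a quick numerical check of the constants in these bounds, the sketch below confirms that $\log_2(3+\sqrt{8}) \approx 2.543$ and evaluates the base-2 logarithm of the three new bounds for one value of n. Constant factors hidden by the asymptotic notation are ignored, so the printed numbers are only indicative; for small n the comparison between the slicing and mosaic expressions is not meaningful. The function name and the choice n = 100 are assumptions made for illustration.

```python
import math

ALPHA = 3 + math.sqrt(8)
slicing_exponent = math.log2(ALPHA)  # the 2.543 in Theta(n! 2^(2.543 n) / n^1.5)

def log2_bound(n, exponent, poly_power):
    """log2 of n! * 2^(exponent * n) / n^poly_power."""
    return math.lgamma(n + 1) / math.log(2) + exponent * n - poly_power * math.log2(n)

n = 100
slicing = log2_bound(n, slicing_exponent, 1.5)
mosaic = log2_bound(n, 3.0, 4.0)
general_upper = log2_bound(n, 5.0, 4.5)

print(f"log2(3 + sqrt(8)) = {slicing_exponent:.3f}")
print(f"slicing ~ 2^{slicing:.0f}, mosaic ~ 2^{mosaic:.0f}, general (upper) ~ 2^{general_upper:.0f}")
```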
The organization of this paper is as follows. In Section 2.3, we show the detailed 
proof of the exact number and the tight asymptotic bound on the number of slicing 
floorplans. In Section 2.4, we present the tight bound on the number of mosaic 
floorplans. In Section 2.5, a tighter upper bound on the number of general floorplans is 
derived. In Section 2.6, we conclude the paper. 
2.3 The Tight Bound for Slicing Floorplan 
In [9], Otten introduced a kind of binary tree called Slicing Tree (ST) to represent 
slicing structure. An ST is a hierarchical description of the direction of the cuts (vertical 
or horizontal) in a slicing floorplan. However, for a given slicing floorplan, there may 
be more than one slicing tree representation. In order to non-redundantly represent all 
slicing floorplans, Wong and Liu [6] proposed a special kind of slicing tree named Skewed 
Slicing Tree (SST). An SST is a slicing tree in which no node and its right child have the 
same label in {*, +} (Figure 2.3), where the symbols * and + are interpreted as two 
binary operators between slicing floorplans. They used the postorder traversal of an SST, 
called the Normalized Polish Expression (NPE), as the floorplan representation. Wong and Liu 
noted that there is a one-to-one correspondence between the set of NPEs of length 2n - 1 
and the set of SSTs with n leaves. Thus, a one-to-one correspondence also exists between 
all SSTs with n leaves and all slicing structures with n rectangular rooms. Therefore, 
we can obtain the number of slicing floorplan configurations with n blocks by counting 
the number of SSTs with n leaves. Before we explore the tight bound on the number of 
SSTs, we first show how the exact number of SSTs can be obtained in the following 
subsection. 
Figure 2.3 Slicing floorplan and its corresponding Skewed Slicing Tree. 
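The skewed-tree condition can be checked directly on a postfix expression, because an operator that immediately follows another operator is the parent of that operator's node as its right child. The sketch below tests the two structural conditions (every prefix has more operands than operators, and no two consecutive operators are identical) and counts the valid expressions by brute force with the operand order held fixed. The function names and the placeholder operand symbol are assumptions made for illustration, and the exhaustive enumeration is only practical for small n.

```python
from itertools import product

OPERATORS = "*+"

def is_normalized(expr):
    """Check the structural NPE conditions on a postfix token sequence.

    Any token that is not '*' or '+' is treated as an operand.
    """
    operands = 0
    operators = 0
    previous = None
    for token in expr:
        if token in OPERATORS:
            operators += 1
            # Balloting property: every prefix has more operands than operators.
            if operands <= operators:
                return False
            # Skewed condition: no two consecutive identical operators,
            # i.e. no node shares its label with its right child.
            if token == previous:
                return False
        else:
            operands += 1
        previous = token
    return operands == operators + 1

def count_npe_shapes(n):
    """Count normalized expressions with n operands kept in a fixed order."""
    return sum(is_normalized(seq) for seq in product("x" + OPERATORS, repeat=2 * n - 1))

print(is_normalized("AB+C*"))  # True
print(is_normalized("ABC**"))  # False: two identical operators in a row
counts = [count_npe_shapes(n) for n in range(1, 6)]
print(counts)  # [1, 2, 6, 22, 90]
```

Multiplying each count by n! accounts for the assignment of the n blocks to the leaves.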
2.3.1 The exact number of skewed slicing trees 
Suppose there are overall $t_n$ different SSTs with n leaves. When n = 1, obviously 
$$t_1 = 1. \tag{2.1}$$
When $n \ge 2$, we classify them into two types of SSTs as follows: $a_n$ "+"-rooted SSTs 
with n leaves and $b_n$ "*"-rooted SSTs with n leaves. Then 
$$t_n = a_n + b_n. \tag{2.2}$$
Given a "+"-rooted SST T (see Figure 2.4), according to the definition of SST, the left 
subtree L of T could be either an SST ("*"-rooted or "+"-rooted) with fewer than n leaves 
or a single leaf. The right subtree R of T could be either a "*"-rooted SST with fewer than 
n leaves or a single leaf. Then the number of "+"-rooted SSTs with n leaves becomes: 
$$a_n = t_1 b_{n-1} + t_2 b_{n-2} + \cdots + t_{n-2} b_2 + t_{n-1}\cdot 1. \tag{2.3}$$
Similarly, the number of "*"-rooted SSTs with n leaves is 
$$b_n = t_1 a_{n-1} + t_2 a_{n-2} + \cdots + t_{n-2} a_2 + t_{n-1}\cdot 1. \tag{2.4}$$
Figure 2.4 A "+"-rooted Skewed Slicing Tree T. 
According to Eq. (2.2), (2.3) and (2.4), the number of SSTs with n leaves becomes 
t n  — t i (a n - i  + b n~ 1) + ^2 (®n—2 + b n^ 2) + • • • 
+ tn—2 (@2 + 62) + 2in_i 
= t \ t n-l  + 2 + • • • + in-2^2 + t n - \ t i  + 1.  
13 
(2.5) 
In order to solve the recurrence equation (2.5) with the initial condition (2.1), we define
the generating function as
$$T(z) = t_1 + t_2 z + t_3 z^2 + \cdots \tag{2.6}$$
Then, we have
$$T^2(z) = t_1^2 + (t_1 t_2 + t_2 t_1) z + (t_1 t_3 + t_2 t_2 + t_3 t_1) z^2 + \cdots$$
and so
$$T^2(z) + T(z) = (t_1^2 + t_1) + (t_1 t_2 + t_2 t_1 + t_2) z + (t_1 t_3 + t_2 t_2 + t_3 t_1 + t_3) z^2 + \cdots \tag{2.7}$$
By (2.5), the coefficient of $z^k$ in (2.7) is $t_{k+2}$. Combining (2.6) and (2.7) therefore yields
$$t_1 + [T^2(z) + T(z)] z = T(z).$$
Since $t_1 = 1$, then
$$z T(z)^2 + (z - 1) T(z) + 1 = 0. \tag{2.8}$$
Solving Eq. (2.8) with initial condition $T(0) = t_1 = 1$ yields
$$T(z) = \frac{1 - z - \sqrt{z^2 - 6z + 1}}{2z}.$$
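Let $\alpha = 3 + \sqrt{8}$ and $\beta = 3 - \sqrt{8}$. Notice that $\alpha\beta = 1$. Thus
$$\begin{aligned}
T(z) &= \frac{1 - z - \sqrt{\alpha - z}\,\sqrt{\beta - z}}{2z} \\
     &= \frac{1}{2z}\left[1 - z - \sqrt{\alpha\beta}\,\sqrt{1 - \frac{z}{\alpha}}\,\sqrt{1 - \frac{z}{\beta}}\right] \\
     &= \frac{1}{2z}\left[1 - z - \sum_{i=0}^{\infty}\binom{1/2}{i}\left(-\frac{z}{\alpha}\right)^{i}\,\sum_{j=0}^{\infty}\binom{1/2}{j}\left(-\frac{z}{\beta}\right)^{j}\right].
\end{aligned}$$
By the definition of the generating function (2.6), for $n \ge 2$, $t_n$ is $-\frac{1}{2}$ times the
coefficient of $z^n$ in the product of the two series, i.e.,
$$t_n = -\frac{1}{2}\sum_{k=0}^{n}\binom{1/2}{k}\left(-\frac{1}{\alpha}\right)^{k}\binom{1/2}{n-k}\left(-\frac{1}{\beta}\right)^{n-k}
      = \frac{(-1)^{n+1}}{2}\sum_{k=0}^{n}\alpha^{n-2k}\binom{1/2}{k}\binom{1/2}{n-k},$$
where the second equality uses $1/\beta = \alpha$.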
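The exact number of SSTs thus has the general form
$$t_n = \begin{cases}
1 & \text{if } n = 1 \\[4pt]
\dfrac{(-1)^{n+1}}{2}\displaystyle\sum_{k=0}^{n}\alpha^{n-2k}\binom{1/2}{k}\binom{1/2}{n-k} & \text{if } n \ge 2
\end{cases} \tag{2.9}$$
In the following subsection, we derive the tight bound on $t_n$.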
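As a sanity check on the derivation above, the following Python sketch computes $t_n$ both from the recurrence (2.5) and from the closed form (2.9) and compares them for small $n$. The function names are illustrative and not from the original work, and the closed form is evaluated in floating point, so the comparison is only meaningful for small $n$.

```python
from fractions import Fraction
from math import sqrt

def sst_counts(n_max):
    """t[n] from recurrence (2.5): t_n = sum_{i=1}^{n-1} t_i t_{n-i} + t_{n-1}, t_1 = 1."""
    t = [0, 1]  # t[0] is unused padding so that t[n] matches t_n
    for n in range(2, n_max + 1):
        t.append(sum(t[i] * t[n - i] for i in range(1, n)) + t[n - 1])
    return t

def half_binom(k):
    """Generalized binomial coefficient C(1/2, k) as an exact fraction."""
    result = Fraction(1)
    for i in range(k):
        result *= (Fraction(1, 2) - i) / (i + 1)
    return result

def sst_closed_form(n):
    """Floating-point evaluation of Eq. (2.9) for n >= 2."""
    alpha = 3 + sqrt(8)
    total = sum(alpha ** (n - 2 * k) * float(half_binom(k) * half_binom(n - k))
                for k in range(n + 1))
    return (-1) ** (n + 1) / 2 * total

t = sst_counts(12)
print(t[1:8])  # first few values of t_n
print(all(round(sst_closed_form(n)) == t[n] for n in range(2, 13)))
```

The recurrence uses exact integers, so it can serve as the reference value when checking either formula.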
2.3.2 The tight bound on the number of slicing floorplans 
In order to obtain the tight bound on $t_n$ ($n \ge 2$), we rewrite Eq. (2.9) as
$$t_n = \sum_{k=0}^{n} F(k)$$
where
$$F(k) = \frac{(-1)^{n+1}}{2}\,\alpha^{n-2k}\binom{1/2}{k}\binom{1/2}{n-k}.$$
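First, we bound $F(0)$ as follows:
$$\begin{aligned}
F(0) &= \frac{(-1)^{n+1}}{2}\,\alpha^{n}\binom{1/2}{n} \\
     &= \frac{\alpha^{n}\,(1 \times 3 \times \cdots \times (2n-3))}{2^{n+1}\,n!} \\
     &= \frac{\alpha^{n}\,(2n)!}{2^{2n+1}\,(n!)^{2}\,(2n-1)};
\end{aligned}$$
then, by Stirling's approximation (footnote 2),
$$\begin{aligned}
F(0) &= \Theta\!\left(\frac{\alpha^{n}\sqrt{2\pi \cdot 2n}\,\left(\frac{2n}{e}\right)^{2n}}{2^{2n+1}\left(\sqrt{2\pi n}\right)^{2}\left(\frac{n}{e}\right)^{2n}(2n-1)}\right) \\
     &= \Theta(\alpha^{n}/n^{1.5}) \\
     &= \Theta(2^{2.543n}/n^{1.5}).
\end{aligned}$$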
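Second, we bound $\sum_{k=1}^{n-1} F(k)$.
For $1 \le k \le n-1$, let
$$r_k = \frac{F(k)}{F(k-1)} = \frac{(n-k+1)\left(k-\frac{3}{2}\right)}{\alpha^{2}\left(n-k-\frac{1}{2}\right)k}.$$
When $n \to \infty$, it is not difficult to observe that
$$r_1 \approx -0.0147, \qquad 0 < r_2 < r_3 < \cdots < r_{n-1} < 1, \qquad r_{n-1} \approx 0.1177.$$
Therefore, $F(1) = r_1 F(0) < 0$ and
$$\begin{aligned}
\sum_{k=1}^{n-1} F(k) &= F(1) + F(1) r_2 + F(1) r_2 r_3 + \cdots + F(1) r_2 \cdots r_{n-1} \\
 &\ge F(1) + F(1) r_{n-1} + F(1) r_{n-1}^{2} + \cdots + F(1) r_{n-1}^{n-2} \\
 &> \frac{F(1)}{1 - r_{n-1}} \\
 &\approx -0.0166\,F(0).
\end{aligned}$$
Footnote 2: Stirling's approximation states that $n! = \Theta\!\left(\sqrt{2\pi n}\,(n/e)^{n}\right)$ [10].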
Thus we bound $\sum_{k=1}^{n-1} F(k)$ as
$$-0.0166\,F(0) < \sum_{k=1}^{n-1} F(k) < 0.$$
Third, we bound $F(n)$ as
$$F(n) = \alpha^{-2n} F(0) = o(F(0)).$$
Since $t_n = F(0) + \sum_{k=1}^{n-1} F(k) + F(n)$, we have
$$F(0) - 0.0166\,F(0) + o(F(0)) \le t_n \le F(0) + o(F(0)),$$
and we thus get the tight bound on $t_n$ as
$$t_n = \Theta(F(0)) = \Theta(2^{2.543n}/n^{1.5}).$$
If we consider the labels of the leaves of SSTs, there are $n!$ ways to label the leaves
of an SST. Thus the total number of slicing floorplans is
$\Theta(n!\,2^{2.543n}/n^{1.5})$.
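The asymptotic claims of this subsection can be probed numerically. The sketch below is an illustration with hypothetical helper names, not part of the original analysis. It recomputes $t_n$ from recurrence (2.5), prints the normalized quantity $t_n n^{1.5}/\alpha^n$, which should level off if $t_n = \Theta(\alpha^n/n^{1.5})$, and evaluates the ratios $r_1$ and $r_{n-1}$ for a large $n$.

```python
from math import exp, log, log2, sqrt

ALPHA = 3 + sqrt(8)

def sst_counts(n_max):
    """t[n] from recurrence (2.5), with t[0] as unused padding."""
    t = [0, 1]
    for n in range(2, n_max + 1):
        t.append(sum(t[i] * t[n - i] for i in range(1, n)) + t[n - 1])
    return t

def normalized(t, n):
    """t_n * n^1.5 / alpha^n, computed through logarithms to avoid overflow."""
    return exp(log(t[n]) - n * log(ALPHA) + 1.5 * log(n))

def ratio_r(n, k):
    """r_k = F(k) / F(k-1) as defined in this subsection."""
    return (n - k + 1) * (k - 1.5) / (ALPHA ** 2 * (n - k - 0.5) * k)

t = sst_counts(400)
for n in (50, 100, 200, 400):
    print(n, normalized(t, n))          # should settle near a constant
print(log2(ALPHA))                      # exponent base: about 2.543 bits per block
print(ratio_r(10**6, 1), ratio_r(10**6, 10**6 - 1))
```

A finite computation like this cannot prove the bound, but a drifting normalized value would indicate a wrong exponent.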
2.4 The Tight Bound for Mosaic Floorplan 
In [20], Yao et al. first proved that the number of mosaic floorplans with $n$ blocks
is equal to the number of Baxter permutations on $\{1, \ldots, n\}$. Then,
$M(n) = B(n)$, where $M(n)$ is the exact number of mosaic floorplans with
$n$ blocks, and $B(n)$ is a Baxter number, which can be represented as follows:
$$B(n) = \sum_{k=1}^{n} \frac{\binom{n+1}{k-1}\binom{n+1}{k}\binom{n+1}{k+1}}{\binom{n+1}{1}\binom{n+1}{2}} \tag{2.10}$$
In order to get the tight bound on $B(n)$, we first simplify Eq. (2.10) as follows:
$$\begin{aligned}
B(n) &= \frac{2}{n(n+1)^{2}} \sum_{k=1}^{n} \frac{((n+1)!)^{3}}{(k-1)!\,(n-k+2)!\;k!\,(n-k+1)!\;(k+1)!\,(n-k)!} \\
     &= \sum_{k=1}^{n} \frac{2\,(n!)^{3}(n+1)}{n\,(k-1)!\,k!\,(k+1)!\,(n-k)!\,(n-k+1)!\,(n-k+2)!} \\
     &= \sum_{k=1}^{n} G(k),
\end{aligned}$$
where
$$G(k) = \frac{2\,(n!)^{3}(n+1)}{n\,(k-1)!\,k!\,(k+1)!\,(n-k)!\,(n-k+1)!\,(n-k+2)!}.$$
Without loss of generality, we assume $n$ is even. Note that for $1 \le k \le n$,
$$G(k) = G(n - k + 1),$$
and for $2 \le k \le \frac{n}{2}$,
$$\frac{G(k)}{G(k-1)} = \frac{(n-k+1)(n-k+2)(n-k+3)}{(k-1)\,k\,(k+1)} > 1.$$
Therefore, for $1 \le k \le \frac{n}{2}$, $G(k)$ increases with $k$; for $\frac{n}{2} + 1 \le k \le n$,
$G(k)$ decreases with $k$. Thus, the distribution of $G(k)$ is roughly
as shown in Figure 2.5.
Figure 2.5 The distribution of G(k).
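First, we bound $G\!\left(\frac{n}{2}\right)$ as
$$\begin{aligned}
G\!\left(\tfrac{n}{2}\right) &= \frac{2\,(n!)^{3}(n+1)}{n\left(\frac{n}{2}-1\right)!\left(\frac{n}{2}\right)!\left(\frac{n}{2}+1\right)!\left(\frac{n}{2}\right)!\left(\frac{n}{2}+1\right)!\left(\frac{n}{2}+2\right)!} \\
 &= \frac{(n!)^{3}(n+1)}{\left(\left(\frac{n}{2}\right)!\right)^{6}\left(\frac{n}{2}+1\right)^{3}\left(\frac{n}{2}+2\right)} \\
 &= \Theta\!\left(\frac{\left(\sqrt{2\pi n}\right)^{3}\left(\frac{n}{e}\right)^{3n}}{\left(\sqrt{\pi n}\right)^{6}\left(\frac{n}{2e}\right)^{3n} n^{3}}\right) \quad \text{by Stirling's approximation} \\
 &= \Theta(2^{3n}/n^{4.5}).
\end{aligned}$$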
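Second, we bound $\sum_{k=1}^{n/2} G(k)$.
Lemma 1. For $\frac{n}{2} - \lceil\sqrt{n \ln n}\,\rceil \le k \le \frac{n}{2}$, when $n \to \infty$, $G(k) = G\!\left(\frac{n}{2}\right)/M$, where
$M = \Theta\!\left(\left(\frac{2k}{n}\right)^{3k}\left(2 - \frac{2k}{n}\right)^{3(n-k)}\right)$.
Proof: For $\frac{n}{2} - \lceil\sqrt{n \ln n}\,\rceil \le k \le \frac{n}{2}$, when $n \to \infty$, notice that $k \to \infty$ and $n - k \to \infty$. By
Stirling's approximation, we work out $G(k)$ as
$$\begin{aligned}
G(k) &= \Theta\!\left(\frac{(n!)^{3}}{n^{3}\,(k!)^{3}\,((n-k)!)^{3}}\right) \\
 &= \Theta\!\left(\frac{(2\pi n)^{3/2}\left(\frac{n}{e}\right)^{3n}}{(2\pi k)^{3/2}\left(\frac{k}{e}\right)^{3k}(2\pi(n-k))^{3/2}\left(\frac{n-k}{e}\right)^{3(n-k)} n^{3}}\right) \\
 &= \Theta\!\left(\frac{n^{3n}}{n^{4.5}\,k^{3k}\,(n-k)^{3(n-k)}}\right) \\
 &= \Theta\!\left(\frac{2^{3n}}{n^{4.5}} \times \frac{1}{\left(\frac{2k}{n}\right)^{3k}\left(2 - \frac{2k}{n}\right)^{3(n-k)}}\right) \\
 &= \frac{G\!\left(\frac{n}{2}\right)}{M},
\end{aligned} \tag{2.11}$$
where $M = \Theta\!\left(\left(\frac{2k}{n}\right)^{3k}\left(2 - \frac{2k}{n}\right)^{3(n-k)}\right)$.
Lemma 2. When $k = \frac{n}{2} - \left\lceil\frac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil$, $G(k) = \Theta(n^{-1})\,G\!\left(\frac{n}{2}\right)$.
Proof: Assume
$$k = \frac{n}{2} - \left\lceil c\sqrt{n \ln n}\,\right\rceil \tag{2.12}$$
where $0 < c \le 1$. By Lemma 1, we have
$$M = \Theta\!\left(\left(1 - 2c\sqrt{\tfrac{\ln n}{n}}\right)^{3\left(\frac{n}{2} - c\sqrt{n \ln n}\right)}\left(1 + 2c\sqrt{\tfrac{\ln n}{n}}\right)^{3\left(\frac{n}{2} + c\sqrt{n \ln n}\right)}\right).$$
Using the limit
$$\lim_{x \to 0}(1 \pm x)^{k/x} = e^{\pm k},$$
we simplify $M$ as
$$M = \Theta\!\left(e^{-(3c\sqrt{n \ln n} - 6c^{2}\ln n)} \cdot e^{(3c\sqrt{n \ln n} + 6c^{2}\ln n)}\right) = \Theta\!\left(n^{12c^{2}}\right). \tag{2.13}$$
Let $c = \frac{1}{\sqrt{12}}$; by (2.11) and (2.13), we get
$$G\!\left(\frac{n}{2} - \left\lceil\tfrac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil\right) = \Theta(n^{-1})\,G\!\left(\frac{n}{2}\right).$$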
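Noticing that
$$G(1) < G(2) < \cdots < G\!\left(\frac{n}{2} - \left\lceil\tfrac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil\right),$$
we thus bound $\sum_{k} G(k)$ over this range as follows: the sum has fewer than $\frac{n}{2}$ positive terms, each at most the last one, so
$$0 < \sum_{k=1}^{\frac{n}{2} - \left\lceil\frac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil} G(k) < \frac{n}{2}\,\Theta(n^{-1})\,G\!\left(\frac{n}{2}\right) = O\!\left(G\!\left(\frac{n}{2}\right)\right).$$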
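Lemma 3. For $\frac{n}{2} - \left\lceil\frac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil \le k \le \frac{n}{2}$,
$\sum_{k} G(k)$ is bounded by $\Theta\!\left(\sqrt{n}\,G\!\left(\frac{n}{2}\right)\right)$.
Proof: For $\frac{n}{2} - \left\lceil\frac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil \le k \le \frac{n}{2}$, we treat $G(k)$ as a continuous function. Thus we
bound the summation of $G(k)$ by bounding the integral of the continuous function $G(k)$:
$$\sum_{k} G(k) = \Theta\!\left(\int G(k)\,dk\right)$$
where $\frac{n}{2} - \frac{1}{\sqrt{12}}\sqrt{n \ln n} \le k \le \frac{n}{2}$. By (2.11), (2.12) and (2.13), substituting $k = \frac{n}{2} - c\sqrt{n \ln n}$,
$$\sum_{k} G(k) = \sqrt{n \ln n}\;G\!\left(\frac{n}{2}\right)\Theta\!\left(\int_{0}^{1/\sqrt{12}} n^{-12c^{2}}\,dc\right).$$
In order to evaluate the integral of $n^{-12c^{2}}$, we use a property of the Normal distribution
with the probability density function
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2\sigma^{2}}}:$$
for $0 \le x \le 3\sigma$,
$$\int_{0}^{3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2\sigma^{2}}}\,dx \approx 0.5, \quad \text{i.e.,} \quad \int_{0}^{3\sigma} e^{-\frac{x^{2}}{2\sigma^{2}}}\,dx \approx \frac{\sigma\sqrt{2\pi}}{2}.$$
Then, since $n^{-12c^{2}} = e^{-12c^{2}\ln n}$ corresponds to $\sigma = \frac{1}{\sqrt{24 \ln n}}$,
$$\int_{0}^{3/\sqrt{24 \ln n}} n^{-12c^{2}}\,dc \approx \frac{\sqrt{2\pi}}{2\sqrt{24 \ln n}}.$$
Notice that when $n \to \infty$,
$$\frac{1}{\sqrt{12}} \ge \frac{3}{\sqrt{24 \ln n}},$$
thus
$$\int_{0}^{1/\sqrt{12}} n^{-12c^{2}}\,dc \approx \frac{\sqrt{2\pi}}{2\sqrt{24 \ln n}}.$$
Then, we have
$$\sum_{k = \frac{n}{2} - \left\lceil\frac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil}^{n/2} G(k) = \Theta\!\left(\sqrt{n}\,G\!\left(\frac{n}{2}\right)\right).$$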
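We thus obtain the tight bound on $B(n)$. Writing $k_0 = \frac{n}{2} - \left\lceil\frac{1}{\sqrt{12}}\sqrt{n \ln n}\,\right\rceil$ and using the symmetry $G(k) = G(n-k+1)$,
$$\begin{aligned}
B(n) &= \sum_{k=1}^{n} G(k) \\
 &= 2\sum_{k=1}^{k_0} G(k) + 2\sum_{k=k_0}^{n/2} G(k) - 2\,G(k_0) \\
 &= 2\,O\!\left(G\!\left(\tfrac{n}{2}\right)\right) + 2\,\Theta\!\left(\sqrt{n}\,G\!\left(\tfrac{n}{2}\right)\right) - \Theta(n^{-1})\,G\!\left(\tfrac{n}{2}\right) \\
 &= \Theta\!\left(\sqrt{n}\,G\!\left(\tfrac{n}{2}\right)\right) \\
 &= \Theta(2^{3n}/n^{4}).
\end{aligned}$$
If we consider the labeling of block names, the final tight bound on the number of mosaic
floorplans with $n$ blocks is $\Theta(n!\,2^{3n}/n^{4})$.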
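To make Eq. (2.10) concrete, the following Python sketch evaluates the summands $G(k)$ exactly and sums them to obtain the first few Baxter numbers. It also checks the symmetry $G(k) = G(n-k+1)$ used in the proof. The function names are illustrative choices, not taken from the original work.

```python
from fractions import Fraction
from math import comb

def G(n, k):
    """k-th summand of Eq. (2.10) for 1 <= k <= n, as an exact fraction."""
    numerator = comb(n + 1, k - 1) * comb(n + 1, k) * comb(n + 1, k + 1)
    return Fraction(numerator, comb(n + 1, 1) * comb(n + 1, 2))

def baxter(n):
    """Baxter number B(n) as the sum of G(k) over k = 1..n."""
    total = sum(G(n, k) for k in range(1, n + 1))
    assert total.denominator == 1  # the sum is always an integer
    return int(total)

print([baxter(n) for n in range(1, 8)])

n = 10  # an even n, as assumed in the text
print(all(G(n, k) == G(n, n - k + 1) for k in range(1, n + 1)))      # symmetry
print(all(G(n, k) > G(n, k - 1) for k in range(2, n // 2 + 1)))      # increasing up to n/2
```

Exact fractions are used so that the symmetry and monotonicity checks are not affected by rounding.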
2.5 An Upper Bound for General Floorplan
As shown in [19], a general floorplan F' can be constructed from a mosaic floorplan F by
inserting irreducible empty rooms at the right places in F. There
are two kinds of empty rooms. One is called a reducible empty room, which results from
assigning a small block to a big room (see an example in Figure 2.6(a)). The other kind
is called an irreducible empty room, which cannot be removed by merging
with another room in the packing (see an example in Figure 2.6(b)). In addition, a wheel
structure always exists around every irreducible empty room (see Figure 2.7). We discuss this
idea in more detail in the following subsection.
2.5.1 Empty room insertion
Observation 1. A wheel structure can be produced from and only from the following
mosaic structure: a pair of T-junctions sharing the same channel, one on each side.
This structure is shown in Figure 2.8.
Figure 2.6 An example of reducible and irreducible empty rooms.
Figure 2.7 A wheel structure. The four T-junctions at the corners of an irreducible empty
room form a wheel structure.
Figure 2.8 A pair of T junctions produce a wheel structure.
Based on Observation 1, it is not difficult to prove the following lemma.
Lemma 4. For a channel with p and q blocks on each side, respectively, the maximum
number of irreducible empty rooms which could be inserted along the channel is min(p, q) − 1.
Proof: Without loss of generality, we assume p ≤ q. Then, from the p blocks on one
side of a channel, we can find p − 1 T-junctions. Similarly, for the q blocks along the
other side of the channel, there are q − 1 T-junctions. Therefore, we can pick at most
p − 1 pairs of T-junctions, one from each side, at one time. According to Observation 1, an
empty room can be produced by any pair of T-junctions with one from each side of the
channel. We label these T-junctions as T_1, T_2, ..., T_{p−1} on each side from
top to bottom or from left to right, and then match them one by one in
order from T_1 to T_{p−1}. Thus p − 1 empty rooms can be inserted along the channel. An
example with 4 blocks and 5 blocks along a channel c on each side, respectively, is shown
in Figure 2.9. At most 3 irreducible empty rooms (represented by X) can be inserted in
this example. •
Figure 2.9 An example of inserting maximum number of irreducible empty
rooms.
Lemma 5. For a channel with p and q blocks on each side, respectively, the number of
ways to insert empty rooms along the channel is $\binom{p+q-2}{p-1}$.
Proof: Without loss of generality, we assume p ≤ q. According to Lemma 4, we can
insert at most p − 1 empty rooms along the channel, which means that j empty rooms,
0 ≤ j ≤ p − 1, can be inserted along the channel. Similar to the proof of Lemma 4,
we pick j T-junctions from each side of the channel, label them from top to
bottom or left to right, and then match them one by one in order. We can
thus insert j empty rooms at those T-junctions.
Let C(p, q) denote the number of ways to insert empty rooms along the channel with
p and q blocks on each side, respectively. We have
$$\begin{aligned}
C(p, q) &= \sum_{j=0}^{p-1}\binom{p-1}{j}\binom{q-1}{j} \\
 &= \sum_{j=0}^{p-1}\binom{p-1}{p-1-j}\binom{q-1}{j} \\
 &= \binom{p+q-2}{p-1} \quad \text{by the definition of combination} \\
 &= \binom{p+q-2}{q-1}.
\end{aligned}$$
In the next subsection, we will formulate the total number of ways to insert empty 
rooms into a mosaic floorplan with n blocks. 
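The counting argument of Lemma 5 can be checked by brute force. The Python sketch below enumerates every way of choosing the same number of T-junctions on both sides of a channel, with the in-order matching making each choice correspond to exactly one insertion pattern, and compares the count with $\binom{p+q-2}{p-1}$. The function name and the representation of T-junctions as integer indices are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def count_insertions(p, q):
    """Count empty-room insertion patterns for a channel with p and q blocks per side.

    Each side has p - 1 (resp. q - 1) T-junctions. A pattern picks j junctions
    on each side; matching them in order is then unique.
    """
    return sum(1
               for j in range(min(p, q))               # j = 0 .. min(p, q) - 1 rooms
               for _ in combinations(range(p - 1), j)  # junctions chosen on one side
               for _ in combinations(range(q - 1), j)) # junctions chosen on the other

p, q = 4, 5  # the channel of Figure 2.9
print(count_insertions(p, q), comb(p + q - 2, p - 1))
print(min(p, q) - 1)  # maximum number of irreducible empty rooms (Lemma 4)
```

For the channel of Figure 2.9 both counts come out as 35, and the maximum number of rooms is 3.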
2.5.2 An upper bound on the number of general floorplans 
Given a mosaic floorplan with $n$ blocks, by counting the total number of ways to
insert empty rooms into the mosaic floorplan, we can obtain the total number of general
floorplans generated from the mosaic floorplan.
A mosaic floorplan with $n$ blocks has $n + 3$ channels in total. Without loss of
generality, we assume it has $k$ ($2 \le k \le n + 1$) horizontal channels, with the uppermost
boundary as the 1st horizontal channel and the bottommost boundary as the $k$th horizontal channel.
In addition, it has $n + 3 - k$ vertical channels, with the leftmost boundary as the 1st vertical
channel and the rightmost boundary as the $(n + 3 - k)$th vertical channel.
Let $h_i$ ($i = 1, 2, \ldots, k$) be the number of blocks which touch the $i$th horizontal channel on
the top, and $h'_i$ be the number of blocks which touch the $i$th horizontal channel on the bottom.
Let $v_j$ ($1 \le j \le n + 3 - k$) be the number of blocks which touch the $j$th vertical channel on
the left, and $v'_j$ be the number of blocks which touch the $j$th vertical channel on the right.
Assuming $h_1 = h'_k = v_1 = v'_{n+3-k} = 1$, we have
$$\sum_{i=1}^{k} h_i = \sum_{i=1}^{k} h'_i = \sum_{j=1}^{n+3-k} v_j = \sum_{j=1}^{n+3-k} v'_j = n + 1.$$
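We denote by $L(n)$ the total number of ways to insert empty rooms into a mosaic
floorplan with $n$ blocks. According to Lemma 5,
$$\begin{aligned}
L(n) &= \prod_{i=1}^{k} C(h_i, h'_i) \times \prod_{j=1}^{n+3-k} C(v_j, v'_j) \\
 &= \prod_{i=1}^{k}\binom{h_i + h'_i - 2}{h_i - 1} \times \prod_{j=1}^{n+3-k}\binom{v_j + v'_j - 2}{v_j - 1}.
\end{aligned} \tag{2.14}$$
In order to get an upper bound on $L(n)$, we notice that $\forall p, q, s, t \in \{0, 1, 2, \ldots, n\}$,
$$\binom{p}{q}\binom{s}{t} \le \binom{p+s}{q+t}.$$
We thus bound (2.14) as
$$\begin{aligned}
L(n) &\le \binom{\sum_{i=1}^{k}(h_i + h'_i - 2)}{\sum_{i=1}^{k}(h_i - 1)} \times \binom{\sum_{j=1}^{n+3-k}(v_j + v'_j - 2)}{\sum_{j=1}^{n+3-k}(v_j - 1)} \\
 &= \binom{2n + 2 - 2k}{n + 1 - k} \times \binom{2n + 2 - 2(n + 3 - k)}{n + 1 - (n + 3 - k)} \\
 &\le \binom{2n-2}{n-1} \\
 &= \frac{(2n-2)!}{((n-1)!)^{2}};
\end{aligned}$$
then, by Stirling's approximation,
$$\begin{aligned}
L(n) &= O\!\left(\frac{\sqrt{2\pi(2n-2)}\,\left((2n-2)/e\right)^{2n-2}}{\left(\sqrt{2\pi(n-1)}\right)^{2}\left((n-1)/e\right)^{2n-2}}\right) \\
 &= O(2^{2n}/n^{0.5}).
\end{aligned}$$
Since the tight bound on the number of mosaic floorplans with $n$ blocks is $\Theta(n!\,2^{3n}/n^{4})$,
we obtain an upper bound on the number of general floorplans with $n$ blocks as $O(n!\,2^{5n}/n^{4.5})$.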
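The two ingredients of this bound can be checked numerically. The sketch below is illustrative only, with hypothetical helper names. It verifies the product inequality for all small arguments, then evaluates $\binom{2n-2}{n-1}\sqrt{n}/2^{2n}$, which should stay bounded if $\binom{2n-2}{n-1} = \Theta(2^{2n}/n^{0.5})$.

```python
from math import comb, log2

def product_bound_holds(n):
    """Check C(p,q) * C(s,t) <= C(p+s, q+t) for all p, q, s, t in {0, ..., n}."""
    values = range(n + 1)
    return all(comb(p, q) * comb(s, t) <= comb(p + s, q + t)
               for p in values for q in values
               for s in values for t in values)

def central_ratio(n):
    """C(2n-2, n-1) * sqrt(n) / 2^(2n), computed through base-2 logarithms."""
    return 2 ** (log2(comb(2 * n - 2, n - 1)) + 0.5 * log2(n) - 2 * n)

print(product_bound_holds(6))
for n in (50, 100, 200, 400):
    print(n, central_ratio(n))  # should approach a constant
```

Note that `math.comb(a, b)` returns 0 when `b > a`, which matches the usual convention for binomial coefficients in the inequality.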
2.6 Conclusion 
We have obtained tight bounds of $\Theta(n!\,2^{2.543n}/n^{1.5})$ on the number of
slicing floorplans and $\Theta(n!\,2^{3n}/n^{4})$ on the number of mosaic floorplans. However, for the
number of general floorplans, the lower bound $\Omega(n!\,2^{3n}/n^{4})$ is still significantly smaller
than the upper bound $O(n!\,2^{5n}/n^{4.5})$. We will work on the tight bound on the number of
general floorplans in the future.
Regarding floorplan representations, NPE is a non-redundant representation for slicing
floorplans. Q-sequence and TBS are two non-redundant representations for mosaic
floorplans. However, there is no non-redundant representation for general floorplans.
Although all general floorplans can be produced by inserting empty rooms into TBSs, the
information describing which empty rooms are to be inserted is not uniform. Hence TBS
cannot be easily extended to a succinct representation which describes a general floorplan
completely. We will also work on the problem of designing an elegant and non-redundant
general floorplan representation in the future.
CHAPTER 3. GENERAL FLOORPLAN 
REPRESENTATION TWIN BINARY SEQUENCES 
3.1 Introduction 
Floorplan design is a major step in the physical design cycle of VLSI circuits to 
plan the positions and shapes of a set of modules on a chip in order to optimize the 
circuit performance. As technology moves into the deep-submicron era, circuit sizes and
complexities are growing rapidly, and floorplanning has become more important than
ever. Area minimization used to be the most important objective in floorplan design,
but today interconnect issues such as delay, total wirelength, congestion and routability have
become the major optimization goals. Unfortunately, floorplanning problems
are NP-complete. Many floorplanners employ methods of perturbations with random 
searches and heuristics. The efficiency and effectiveness of these kinds of methods depend 
very much on the representation of the geometrical relationship between the modules. 
A good representation can shorten the searching process and allows fast realization of 
the floorplan so that more accurate estimations on area and interconnect costs can be 
performed. 
3.1.1 Previous works 
The problem of floorplan representation has been studied extensively. There are three 
types of floorplan: slicing, mosaic and non-slicing. A slicing floorplan is a floorplan 
that can be obtained by recursively cutting a rectangle into two by using a vertical line 
or a horizontal line. Normalized Polish expression [32] is the most popular method to
represent slicing floorplans. This representation can describe any slicing structure with
no redundancy. An upper bound on its solution space is $O(n!\,2^{3n-3}/n^{1.5})$. For general
floorplans that are not necessarily slicing, there was no efficient representation other than
the constraint graphs until the sequence pair (SP) [27] and the bounded-sliceline grid
(BSG) [28] appeared in the mid 90's. The SP representation has been widely used because
of its simplicity. Unfortunately, there are a lot of redundancies in these representations.
The size of the solution space of SP is $(n!)^{2}$ and that of BSG is $n!\,C(n^{2}, n)$. This drawback
has restricted the applicability of these methods in large scale problems. O-tree [24] and
B*-tree [21] were later proposed to represent compacted (admissible) non-slicing floorplans.
They have a very small solution space of $O(n!\,2^{2n-2}/n^{1.5})$ and can give a floorplan in linear
time. However, they describe only partial topological information, and module dimensions
are needed to give a floorplan exactly. The representation is not unique, and a single
O-tree or B*-tree representation, depending on the module dimensions, can lead to more
than one floorplan with modules of different topological relationships with each other.
The paper [25] proposes a new type of floorplan called mosaic floorplan. A mosaic
floorplan is similar to a general non-slicing floorplan except that it does not have any
unoccupied room (Figure 3.1(a)) and there is no crossing cut in the floorplan (Figure 3.1(b)).
A representation called Corner Block List (CBL) is proposed to represent mosaic
floorplans. This representation has a relatively small solution space of $O(n!\,2^{3n})$ (see footnote 1) and the time
complexity to realize a floorplan from its representation is linear. However, some corner
block lists do not correspond to any floorplan. As a remedy to the weakness that some
non-slicing structures cannot be represented (e.g., Figure 3.1(a)), CBL is extended by
including dummy blocks of zero area in the set of modules. In order to represent all
non-slicing structures, $O(n^{2})$ such dummy blocks are used, and this increases the
size of the solution space significantly [34]. In the paper [30], a new representation called
Q-sequence is proposed to represent mosaic floorplans, which is later enhanced in the
paper [35] by including empty rooms. It is also proved in [35] that the number of empty
rooms required is upper bounded by $n - \lfloor\sqrt{4n - 1}\rfloor$, where $n$ is the number of modules.
Footnote 1: In [25], the paper claims without proof that the size of the solution space for Corner Block List is
$O(n!\,2^{3n}/n^{1.5})$. However, we believe that the correct size of the CBL solution space should be $O(n!\,2^{3n})$.
In the CBL algorithm, the elements of the corner block list (S, L, T) are perturbed randomly and independently in the
simulated annealing process. There are $n!$ combinations for S, $2^{n-1}$ combinations for L, and $2^{2n-3}$
combinations for T. So the total number of combinations is $\Theta(n!\,2^{3n})$.
Figure 3.1 Structures that cannot be represented in a mosaic floorplan.
3.1.2 Our contributions 
Although the problem of floorplan representation has been studied extensively, and
numerous floorplan representations have been proposed in recent years, it is still practically
useful and theoretically interesting to find a complete (i.e., every non-slicing floorplan
can be represented) and non-redundant topological representation for general non-slicing
structures. In this chapter, we present such a representation, the Twin Binary
Sequences. This is the first representation of this kind. Like some previous work [34], we
make use of mosaic floorplans as an intermediate step to represent a non-slicing structure.
However, instead of including an extra number of dummy blocks in the set of modules,
the representation allows us to insert an exact number of irreducible empty rooms into a
mosaic floorplan such that every non-slicing structure can be generated uniquely and
non-redundantly. Besides, the representation can give a floorplan efficiently in linear time. We
have studied the relationship between mosaic and non-slicing floorplans and have proved
that the number of empty rooms needed to be inserted into a mosaic floorplan to obtain
a non-slicing structure is tightly bounded by $\Theta(n)$, where $n$ is the number of modules (see footnote 2).
In the following section, we define twin binary sequences and show how a floorplan
can be constructed from this representation in linear time. In Section 3.3, we show
how this representation can be used to describe non-slicing structures with the help of a
fast empty room insertion process. We also present some interesting results on the
relationship between mosaic and general floorplans. In Sections 3.4 and 3.5, we discuss our
floorplanner based on simulated annealing and show the experimental results.
3.2 Twin Binary Sequences (TBS) Representation 
In the paper [33], Yao, et al. first suggest that the Twin Binary Trees (TBT) can 
be used to represent mosaic floorplan. They have shown a one-to-one mapping between 
mosaic floorplan and TBT. We have made use of TBT in our representation. Recall that 
the definition of twin binary trees comes originally from the paper [23] as follows: 
Definition 1 The set of twin binary trees with $n$ nodes, $TBT_n \subset Tree_n \times Tree_n$, is the set
$$TBT_n = \{(b_1, b_2) \mid b_1, b_2 \in Tree_n \text{ and } \Theta(b_1) = \Theta^{c}(b_2)\}$$
where $Tree_n$ is the set of binary trees with $n$ nodes, and $\Theta(b)$ is the labeling of a binary
tree $b$ defined as follows. Starting with an empty sequence, we perform an inorder traversal of the
tree $b$. When a node with no left child is reached, we add a bit 0 to the sequence, and
when a node with no right child is reached, we add a bit 1 to the sequence. The first
0 and the last 1 are omitted. $\Theta^{c}$ is the complement of $\Theta$, obtained by interchanging
all the 0's and 1's in $\Theta$. An example of twin binary trees is shown in Figure 3.2.
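The labeling in Definition 1 is straightforward to compute. The Python sketch below is an illustration rather than the original implementation. It represents a binary tree as nested tuples `(left, name, right)`, which is an assumption made here for brevity, computes the labeling, and tests the twin condition on a small pair of trees chosen for this example.

```python
def theta(tree):
    """Labeling of a binary tree given as nested tuples (left, name, right) or None."""
    bits = []

    def inorder(node):
        left, _, right = node
        if left is None:
            bits.append(0)      # node with no left child
        else:
            inorder(left)
        if right is None:
            bits.append(1)      # node with no right child
        else:
            inorder(right)

    inorder(tree)
    return bits[1:-1]           # omit the first 0 and the last 1

def is_twin(b1, b2):
    """Twin condition of Definition 1: theta(b1) is the complement of theta(b2)."""
    return theta(b1) == [1 - bit for bit in theta(b2)]

b1 = (None, "A", (None, "B", None))   # A with right child B
b2 = ((None, "A", None), "B", None)   # B with left child A
print(theta(b1), theta(b2), is_twin(b1, b2))
```

A tree with n nodes has n + 1 missing children, so the labeling always has n − 1 bits after the first 0 and last 1 are dropped. Two trees with different node counts can therefore never pass the check.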
Instead of using an arbitrary pair of trees (which may not be twin binary to each other)
directly, we use a 4-tuple $s = (\pi, \alpha, \beta, \beta')$, called a twin binary sequences, to represent
a mosaic floorplan with $n$ modules, where $\pi$ is a permutation of the module names, $\alpha$ is a
sequence of $n - 1$ bits, and $\beta$ and $\beta'$ are sequences of $n$ bits. The properties of these
bit sequences will be described in detail in Section 3.2.2. This 4-tuple can be one-to-one
mapped to a pair of binary trees $t_1$ and $t_2$ such that $t_1$ and $t_2$ must be twin binary to
each other and they together represent a mosaic floorplan uniquely. Most importantly,
we are then able to insert empty rooms into $t_1$ and $t_2$ at the right places to give a
non-slicing floorplan. We proved that every non-slicing structure can be obtained by this
method from one and only one mosaic floorplan. In order to motivate the idea of our new
representation, we first show how twin binary trees can be obtained from a mosaic
floorplan in the following subsection.
Figure 3.2 An example of twin binary trees, with labelings $\Theta(b_1) = 100101$ and $\Theta(b_2) = 011010$.
Footnote 2: Together with the upper bound result in [35], the tight bound can be further improved to $\Theta(n - 2\sqrt{n})$.
3.2.1 From floorplan to twin binary trees 
Given a mosaic floorplan $F$, we can obtain a pair of twin binary trees $t_1$ and $t_2$ by
traveling along the slicelines of $F$. An example is shown in Figure 3.3. To construct
$t_1$, we start from the module at the lower left corner and travel upward (left subtree)
and to the right (right subtree). Whenever the lower left corner of another module $x$ is
reached, a node labeled $x$ is inserted into the tree and the process is repeated
starting from module $x$ until all the modules in the floorplan are visited. The tree $t_2$
can be built similarly by starting from the module at the upper right corner and traveling
downward (right subtree) and to the left (left subtree). Similarly, whenever the upper
right corner of another module $y$ is reached, a node labeled $y$ is inserted into the
tree and the process is repeated starting from $y$ until all the modules are visited.
The paper [33] has shown that the pair of trees built in this way must be twin binary to
each other, and there is a one-to-one mapping between mosaic floorplans and twin binary
trees. We observed that the inorder traversals of the two binary trees constructed by the
above method must be the same. Let us look at the example in Figure 3.3. We can see
that the inorder traversals of both $t_1$ and $t_2$ are ABCFDE. We have proved the following
observation that helps in defining the Twin Binary Sequences representation:
Figure 3.3 Building twin binary trees from a mosaic packing.
Observation 1 A pair of binary trees $t_1$ and $t_2$ can be constructed from a mosaic
floorplan by the above method if and only if (1) they are twin binary to each other, i.e.,
$\Theta(t_1) = \Theta^{c}(t_2)$, and (2) their inorder traversals are the same.
Proof: (if part) This part can be proved by induction on the number of modules in
the floorplan. The base case occurs when there is only one module in the floorplan and
conditions (1) and (2) follow trivially. Assume that these conditions are true when there
are not more than $k \ge 1$ modules in the floorplan. Consider a floorplan $F$ with $k + 1$
modules. Let the pair of binary trees constructed from $F$ by the above method be $t_1$ and
$t_2$. Consider the module $m$ at the upper left corner of $F$. There are only four possible
configurations for the position of $m$ in $F$ as shown in Figure 3.4. In each case, let $F'$ be the
floorplan obtained by sliding module $m$ out of $F$ by moving the thickened sliceline in the
direction shown. Let $t'_1$ and $t'_2$ be the pair of binary trees constructed from $F'$ by the above
method. Since floorplan $F'$ has only $k$ modules, $t'_1$ and $t'_2$ satisfy conditions (1) and (2)
according to the hypothesis, i.e., $\Theta(t'_1) = \Theta^{c}(t'_2)$, and their inorder traversals are the same.
From Figure 3.4, we can see that in cases (a) and (c), $\Theta(t_1) = 1\Theta(t'_1)$, $\Theta(t_2) = 0\Theta(t'_2)$ and
the inorder traversal of $t_1$ ($t_2$) is the same as that obtained by placing $m$ in front of the
inorder traversal of $t'_1$ ($t'_2$). Similarly, in cases (b) and (d), $\Theta(t_1) = 0\Theta(t'_1)$, $\Theta(t_2) = 1\Theta(t'_2)$
and the inorder traversal of $t_1$ ($t_2$) is the same as that obtained by placing $m$ in front
of the inorder traversal of $t'_1$ ($t'_2$). Therefore $t_1$ and $t_2$ also satisfy conditions (1) and (2).
(only if part) Again, this part is proved by induction. The base case occurs when
there is only one node in the pair of binary trees. If both conditions (1) and (2) are true
(note that condition (1) must be true since there is only one node in the trees and their
labelings are both empty), their nodes are labeled the same and they correspond to a
packing with only one module. Assume that this statement is true for any pair of trees
with $k \ge 1$ nodes, i.e., inorder traversal of length $k$ and labeling of length $k - 1$. Consider
a pair of trees ($t_1$ and $t_2$) with inorder traversal $m_1 m_2 \ldots m_{k+1}$, and labelings $b_1 b_2 \ldots b_k$
and $b'_1 b'_2 \ldots b'_k$. There are two cases as shown in Figure 3.5 according to the value of the
bit $b_1$. In both cases, the inorder traversal $m_2 m_3 \ldots m_{k+1}$, and the bit sequences $b_2 b_3 \ldots b_k$
and $b'_2 b'_3 \ldots b'_k$ will correspond to a floorplan $F'$ according to the hypothesis. We can
obtain a floorplan $F$ from $F'$ by putting the module $m_1$ on the right (case (a)) or at the
top (case (b)). $F$ will correspond to a pair of trees with inorder traversal $m_1 m_2 \ldots m_{k+1}$,
and labelings $b_1 b_2 \ldots b_k$ and $b'_1 b'_2 \ldots b'_k$. We can choose between cases (a) and (b) depending
on the value of $b_1$. Therefore this only if statement is also true when there are $k + 1$ nodes
in the pair of trees.
Figure 3.4 Proof of Observation 1 (if part).
Figure 3.5 Proof of Observation 1 (only if part).
If we extend a tree $t$ by adding a left child of bit 0 to every node (except the leftmost
node) that has no left child and by adding a right child of bit 1 to every node (except the
rightmost node) that has no right child, the tree obtained is called an extended tree of $t$.
An example of an extended tree is shown in Figure 3.6. Notice that the inorder traversal
of the extended tree of $t$ is $m_1 \alpha_1 m_2 \alpha_2 \cdots \alpha_{n-1} m_n$, where $m_1 m_2 \cdots m_n$ is the inorder
traversal of $t$ and $\alpha_1 \alpha_2 \ldots \alpha_{n-1}$ is the labeling of $t$. Observation 1 can be restated as
follows:
Observation 2 A pair of binary trees $t_1$ and $t_2$ can be constructed from a mosaic
floorplan by the above method if and only if the inorder traversals of their extended trees
are the same except that all the bits are complemented.
Figure 3.6 An example of an extended tree.
3.2.2 Definition of twin binary sequences 
From Observation 1, we know that a pair of binary trees $t_1$ and $t_2$ is valid (i.e.,
corresponds to a packing) if and only if their labelings are complements of each other
and their inorder traversals are the same. However, the labeling and the inorder traversal
are not sufficient to identify a unique pair $t_1$ and $t_2$. Given a permutation of module
names $\pi$ and a labeling $\alpha$, there can be more than one valid pair $t_1$ and $t_2$ such that
their inorder traversals are $\pi$ and $\Theta(t_1) = \Theta^{c}(t_2) = \alpha$. In order to identify a pair of trees
uniquely, we need two additional bit sequences $\beta$ and $\beta'$ for $t_1$ and $t_2$ respectively, such
that the $i$th bit in $\beta$ and $\beta'$ tells whether the $i$th module in $\pi$ is the left child (when the
bit is 0) or the right child (when the bit is 1) of its parent in $t_1$ and $t_2$ respectively. These
bits are called the directional bits. If a module is the root of a tree, its directional bit is
set to zero.
For a binary tree $t$, its labeling sequence $\alpha = \alpha_1 \alpha_2 \cdots \alpha_{n-1}$ and its directional bit
sequence $\beta = \beta_1 \beta_2 \cdots \beta_n$ must satisfy the following conditions:
(1) In the bit sequence $\beta_1 \alpha_1 \beta_2 \cdots \alpha_{n-1} \beta_n$, the number of 0's is one more than the
number of 1's.
(2) For any prefix of the bit sequence $\beta_1 \alpha_1 \beta_2 \cdots \alpha_{n-1} \beta_n$, the number of 0's is more than
or equal to the number of 1's.
We proved the following lemmas, which show that conditions (1) and (2) are necessary
and sufficient for a pair of labeling sequence $\alpha$ and directional bit sequence $\beta$ to correspond
to a binary tree.
Lemma 1 For any binary tree, its labeling sequence $\alpha$ and directional bit sequence $\beta$ must
satisfy conditions (1) and (2).
Proof: Given a binary tree $t$, the bit sequence $\beta_1 \alpha_1 \beta_2 \cdots \alpha_{n-1} \beta_n$ is the inorder traversal
of the extended tree $t'$ of $t$ (with the internal nodes labeled by their directional bits). To
verify condition (1), notice that each internal node of $t'$ has two children, one labeled by
0 and the other labeled by 1. We assume that the root is labeled by 0. Therefore
condition (1) must be satisfied. To verify condition (2), notice that for any two children
having the same parent, the child labeled 0 is always visited first in the inorder traversal.
Therefore condition (2) must be satisfied. •
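Lemma 1 can be illustrated on a concrete tree. The following Python sketch extracts $\pi$, $\alpha$ and $\beta$ from a binary tree in one inorder pass and then tests conditions (1) and (2) on the interleaved sequence $\beta_1\alpha_1\beta_2\cdots\alpha_{n-1}\beta_n$. The `Node` class, the function names and the sample tree are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def sequences(root):
    """Return (pi, alpha, beta): inorder names, labeling bits, directional bits."""
    pi, raw, beta = [], [], []

    def visit(node, direction):
        if node.left is None:
            raw.append(0)               # no left child: emit 0
        else:
            visit(node.left, 0)
        pi.append(node.name)
        beta.append(direction)          # 0 = left child (or root), 1 = right child
        if node.right is None:
            raw.append(1)               # no right child: emit 1
        else:
            visit(node.right, 1)

    visit(root, 0)
    return pi, raw[1:-1], beta          # drop the first 0 and the last 1

def satisfies_conditions(alpha, beta):
    """Check conditions (1) and (2) on beta_1 alpha_1 beta_2 ... alpha_{n-1} beta_n."""
    interleaved = []
    for i, bit in enumerate(beta):
        interleaved.append(bit)
        if i < len(alpha):
            interleaved.append(alpha[i])
    balance = 0                         # (number of 0's) - (number of 1's) so far
    for bit in interleaved:
        balance += 1 if bit == 0 else -1
        if balance < 0:                 # condition (2) violated on this prefix
            return False
    return balance == 1                 # condition (1)

tree = Node("C", Node("A", right=Node("B")), Node("E", left=Node("D")))
pi, alpha, beta = sequences(tree)
print(pi, alpha, beta, satisfies_conditions(alpha, beta))
```

The recursive traversal is adequate for small examples. A very deep tree would call for an explicit stack instead.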
Lemma 2 For any binary sequences $\alpha$ of $n - 1$ bits and $\beta$ of $n$ bits satisfying conditions (1)
and (2), there exists a unique binary tree $t$ such that the labeling sequence of $t$ is $\alpha$ and
the directional bit sequence of $t$ is $\beta$.
Proof: The uniqueness can be proved by induction on the number of modules. The
claim is trivially true when there is only one module, i.e., when $n = 1$. Assume that the
claim holds when the number of modules is at most $k$, i.e., when $n \le k$. Consider the case
when $n = k + 1$. Given a pair of binary sequences $\alpha = \alpha_1 \alpha_2 \ldots \alpha_k$ and $\beta = \beta_1 \beta_2 \cdots \beta_{k+1}$,
we can reduce the problem to the case with $k$ or fewer modules as follows. First of all, we
append a bit $\alpha_0 = 0$ in front of $\alpha$ and a bit $\alpha_{k+1} = 1$ at the end of $\alpha$. Then there exists at
least one $i$ such that $\alpha_{i-1} = 0$ and $\alpha_i = 1$. This is a place for a leaf node where the leaf is
either a left (when $\beta_i = 0$) or a right (when $\beta_i = 1$) child of its parent. We use $S$ to denote
the set of all such locations, i.e., $S = \{i \mid (1 \le i \le k + 1) \wedge (\alpha_{i-1} = 0) \wedge (\alpha_i = 1)\}$. Let $\alpha'$
be the binary sequence obtained from $\alpha$ by replacing $\alpha_{i-1}\alpha_i$ by $\beta_i$ for all $i \in S$, and $\beta'$ be
the binary sequence obtained from $\beta$ by deleting $\beta_i$ for all $i \in S$. Notice that the first bit
of $\alpha'$ must be 0 and the last bit must be 1, i.e., we can write $\alpha'$ as $0\alpha''1$. According to the
induction hypothesis, there exists a unique binary tree $t'$ such that the labeling sequence
of $t'$ is $\alpha''$ and the directional bit sequence of $t'$ is $\beta'$. The tree $t$ for the original pair of
binary sequences $\alpha = \alpha_1 \alpha_2 \cdots \alpha_k$ and $\beta = \beta_1 \beta_2 \ldots \beta_{k+1}$ can be constructed uniquely from
$t'$ by inserting a leaf node corresponding to the module $\pi_i$ at the position of bit $\beta_i$ for all
$i \in S$. Therefore the uniqueness still holds when $n = k + 1$. •
Now, we can define a twin binary sequences representation. A twin binary sequences
$s$ for $n$ modules is a 4-tuple:
$$s = (\pi, \alpha, \beta, \beta')$$
where $\pi$ is a permutation of the $n$ modules, and both the pair $\alpha$ and $\beta$ and the pair
$\alpha^{c}$ (the complement of $\alpha$) and $\beta'$ satisfy conditions (1) and (2). We have proved the following
two theorems, which establish one-to-one mappings from twin binary sequences to twin
binary trees and to mosaic floorplans.
Theorem 1 The mapping between twin binary sequences and twin binary trees is one-to-one.
Proof: Given a pair of twin binary trees, we can construct one unique twin binary sequences according to the definition in section 2.1. On the other hand, if we are given a twin binary sequences $s = (\pi, \alpha, \beta, \beta')$, according to Lemma 2, there exists a unique binary tree $t$ ($t'$) such that the labeling sequence of $t$ ($t'$) is $\alpha$ ($\alpha^c$) and the directional bit sequence of $t$ ($t'$) is $\beta$ ($\beta'$). Since $\Theta(t) = \Theta^c(t')$, $t$ and $t'$ are twin binary. We can then label their nodes according to the inorder traversal $\pi$. This is the unique pair of twin binary trees $t$ and $t'$ corresponding to $s$. Therefore the mapping between twin binary sequences and twin binary trees is one-to-one. •
Theorem 2 The mapping between twin binary sequences and mosaic floorplan is one-to-one.
Proof: The one-to-one mapping between twin binary sequences and mosaic floorplan follows from Theorem 1 and the proof in paper [33] that the mapping between twin binary trees and mosaic floorplan is one-to-one. •
3.2.3 From TBS to floorplan 
3.2.3.1 Algorithm for floorplan realization 
In order to realize a floorplan from its TBS representation efficiently, we devised an algorithm that only needs to scan the sequences once from right to left to construct the packing. We will construct the floorplan by inserting the modules one after another following the $\pi$ sequence in the reversed order. A simple example illustrating the steps of the algorithm is given in Figure 3.7. At the beginning, we will put the last module of the $\pi$ sequence, i.e., module D, into the packing $P$. We will then insert the other modules one after another. The next module to be considered after D is $\pi_4 = E$. Since $\alpha_4 = 0$, we will look at the sequence $\beta$ and find the closest bit "1" on the right of $\beta_4$, i.e., $\beta_5$. We will then add module E into $P$ from the left, pushing D (since $\pi_5 = D$) to the right as shown in Figure 3.7(b), and delete bit $\beta_5$ from $\beta$. The next module to be considered after E is $\pi_3 = C$. Since $\alpha_3 = 1$, we will look at the sequence $\beta'$ and find the closest bit "1" on the right of $\beta'_3$, i.e., $\beta'_4$. We will then add module C into $P$ from above, pushing E (since $\pi_4 = E$) down as shown in Figure 3.7(c), and delete bit $\beta'_4$ from $\beta'$. These steps repeat until the whole sequence $\pi$ is processed and a complete floorplan is obtained.
Algorithm TBStoFloorplan
Input: A TBS $s = (\pi, \alpha, \beta, \beta')$
Output: Packing $P$ corresponding to $s$
Begin
1. Append bit 1 to $\alpha$, i.e., $\alpha_n = 1$.
2. Initially, we have only module $\pi_n$ in $P$.
3. For $i = n - 1$ down to $i = 1$:
4.   If ($\alpha_i = 0$):
5.     Find the smallest $k$ s.t. $i < k \le n$ and $\beta_k = 1$.
6.     Note that the set $S$ of modules $\pi_{i+1}, \pi_{i+2}, \ldots, \pi_k$ (those with corresponding $\beta$ bit not deleted yet) will be lying on the left boundary of $P$. Add module $\pi_i$ to $P$ from the left, pushing those modules in $S$ to the right.
7.     Delete $\beta_{i+1}, \beta_{i+2}, \ldots, \beta_k$ from $\beta$.
8.   If ($\alpha_i = 1$):
9.     Find the smallest $k$ s.t. $i < k \le n$ and $\beta'_k = 1$.
10.    Note that the set $S$ of modules $\pi_{i+1}, \pi_{i+2}, \ldots, \pi_k$ (those with corresponding $\beta'$ bit not deleted yet) will be lying on the top boundary of $P$. Add module $\pi_i$ to $P$ from above, pushing those modules in $S$ down.
11.    Delete $\beta'_{i+1}, \beta'_{i+2}, \ldots, \beta'_k$ from $\beta'$.
End
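The combinatorial part of the algorithm above can be sketched in Python. This is only an illustrative sketch, not the author's implementation: it records, for every inserted module, the side it enters from and the set $S$ of modules it pushes, and leaves the geometric packing out. The function name `tbs_insertion_steps` and the tuple format of the result are assumptions made for this example, and deleted bits are modelled with boolean "live" flags.

```python
def tbs_insertion_steps(pi, alpha, beta, beta_p):
    """List the insertions performed by the right-to-left scan of a TBS.

    pi      : list of n module names
    alpha   : list of n-1 labeling bits (bit alpha_n = 1 is appended here)
    beta    : list of n directional bits for the first tree
    beta_p  : list of n directional bits for the second tree
    Returns a list of (module, side, pushed_modules) tuples.
    """
    n = len(pi)
    alpha = list(alpha[: n - 1]) + [1]          # step 1: alpha_n = 1
    live = {"left": [True] * n, "top": [True] * n}  # bits not deleted yet
    bits = {"left": beta, "top": beta_p}
    steps = [(pi[n - 1], "init", [])]           # step 2: only pi_n in P
    for i in range(n - 2, -1, -1):              # step 3 (0-based index)
        side = "left" if alpha[i] == 0 else "top"
        b, alive = bits[side], live[side]
        # smallest k > i whose bit is still present and equal to 1
        k = next(j for j in range(i + 1, n) if alive[j] and b[j] == 1)
        pushed = [pi[j] for j in range(i + 1, k + 1) if alive[j]]
        for j in range(i + 1, k + 1):           # delete bits i+1 .. k
            alive[j] = False
        steps.append((pi[i], side, pushed))
    return steps


# Sequences of the example in Figure 3.7
steps = tbs_insertion_steps(
    list("ABCED"), [0, 1, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 1]
)
for step in steps:
    print(step)
```

On the sequences of Figure 3.7 the first two insertions are E from the left pushing D and C from above pushing E, which agrees with the walk-through in the text.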
[Figure 3.7, panels (a) to (e): the packing after each insertion, together with the sequences $\pi$ = A B C E D, $\alpha$ = 0 1 1 0 1, $\beta$ = 0 0 1 0 1 and $\beta'$ = 0 0 0 1 1, from which bits are deleted step by step.]
Figure 3.7 A simple example of constructing a floorplan from its TBS. 
3.2.3.2 Proof of correctness 
The correctness of the above algorithm on floorplan realization can be proved by the 
following lemma and theorem: 
Lemma 3 In the for-loop of the above algorithm, when we scan to a point $i$ where $1 \le i \le n - 1$ and $\alpha_i = 0$ ($\alpha_i = 1$), the corresponding node $\pi_i$ in $t_1$ ($t_2$) has a right child $\pi_j$ and all the nodes in $t$, where $t$ is the subtree of $t_1$ ($t_2$) rooted at $\pi_j$, have been scanned immediately before $\pi_i$. In addition, any node $\pi_k \in t$ where $k \ne j$ and $\beta_k = 1$ ($\beta'_k = 1$) will have its $\beta$ ($\beta'$) bit deleted.
Proof: W.l.o.g., we only prove the case when $\alpha_i = 0$. The case when $\alpha_i = 1$ can be proved similarly. The proof can be done by induction on $i$. The base case is when $i = n - 1$. If $\alpha_{n-1} = 0$, $\pi_{n-1}$ must have a right child in $t_1$ according to the definition of TBS. Let $t$ be the right subtree of $\pi_{n-1}$ in $t_1$. Since we are performing the inorder traversal in the reversed order, the nodes in $t$ must have been scanned immediately before $\pi_{n-1}$. In this base case, there is only one node ($\pi_n$) in $t$, which is the right child of $\pi_{n-1}$, and $\beta_n = 1$. Therefore the statement is true for this base case.
Assume that the statement is true when $i \ge p$ for some $1 < p \le n - 1$. Consider the case when $i = p - 1$. If $\alpha_{p-1} = 0$, similarly, $\pi_{p-1}$ must have a right child $\pi_j$ in $t_1$ according to the definition of TBS. Let $t$ be the subtree of $t_1$ rooted at $\pi_j$. Since we are performing the inorder traversal in the reversed order, the nodes in $t$ must have been scanned immediately before $\pi_{p-1}$. Let them be $\pi_p, \pi_{p+1}, \ldots, \pi_{p+m-1}$ where $m$ is the size of $t$. (Note that $p \le j \le p + m - 1$.) If there is any node $\pi_k$ in $t$ where $k \ne j$ and $\beta_k = 1$, $\beta_k$ must have been deleted when the scan reaches $\pi_{p-1}$. This is because if $\beta_k = 1$, $\pi_k$ is the right child of its parent $\pi_l$ in $t_1$ and $\pi_l$ must also be in $t$. According to the inductive hypothesis, when we scan to $\pi_l$, we will find that $\alpha_l = 0$ (since $\pi_l$ has a right child in $t_1$) and $\pi_k$ will be the only node in the right subtree of $\pi_l$ in $t_1$ such that $\beta_k = 1$ at that moment. Since the nodes in the right subtree of $\pi_l$ will be lying immediately in front of $\pi_l$ in the reversed inorder traversal, we will delete all the $\beta$ bits up to and including $\beta_k$. Therefore, when we scan to $\pi_{p-1}$, any node $\pi_k \in t$ where $k \ne j$ and $\beta_k = 1$ will have its $\beta$ bit deleted. •
Figure 3.8 Proof of Theorem 3. 
Theorem 3 The algorithm TBStoFloorplan can convert a TBS to its corresponding floorplan correctly.
Proof: Again, the proof can be done by induction on the number of modules. The base case occurs when there are only two modules in the packing. There can only be two different mosaic packings with two modules, one with the two modules lying side by side and the other one with the two modules piled up vertically. It is easy to show that the algorithm is correct in both situations.
Assume that the algorithm is correct when there are $n$ modules in the floorplan for some $n \ge 2$. Consider the case when there are $n + 1$ modules. W.l.o.g., we assume that $\alpha_1 = 0$. The case when $\alpha_1 = 1$ can be proved similarly. Since $\alpha_1 = 0$, the upper left module $A = \pi_1$ has a right child $B = \pi_j$ in $t_1$ and $A$ should be packed in one of the two ways shown in Figure 3.8 in the floorplan $F$. Assume that the TBS of $F$ is $s = (\pi, \alpha, \beta, \beta')$ where $\pi = \pi_1, \pi_2, \ldots, \pi_{n+1}$, $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_n$, $\beta = \beta_1, \beta_2, \ldots, \beta_{n+1}$ and $\beta' = \beta'_1, \beta'_2, \ldots, \beta'_{n+1}$. Consider sliding module $A$ out of the floorplan $F$ (Figure 3.9) in the direction shown to obtain a floorplan $F_1$ with $n$ modules. Note that the TBS $s_1$ for $F_1$ can be obtained from $s$ by changing $\beta_j$ from 1 to 0 and removing $\pi_1$, $\alpha_1$, $\beta_1$ and $\beta'_1$ from $\pi$, $\alpha$, $\beta$ and $\beta'$ respectively. Since $F_1$ has only $n$ modules, the algorithm can construct the floorplan $F_1$ correctly from $s_1$ according to the inductive hypothesis.
Consider the sequence of operations of the algorithm on $s$. The first $n - 1$ steps of the for-loop will be the same as those for $s_1$. The two sequences of operations are the same although $\beta_j$ is changed from 1 to 0 because all the modules lying between $B$ and $A$ in the inorder sequence $\pi$ are in the left subtree of $B$ in $t_1$. After scanning past $B$, if there is an $\alpha_k = 0$ where $1 < k < j$, we will only delete those $\beta$ bits up to and including $\beta_l$, where $\pi_l$ is the right child of $\pi_k$, according to Lemma 3. Thus, the value of $\beta_j$ will not affect the first $n - 1$ steps of the for-loop. That means, when we reach $A$, the intermediate floorplan obtained is the same as $F_1$. At $A$, since $\alpha_1 = 0$, according to the above lemma, $B = \pi_j$ will be the only module in the right subtree of $A$ in $t_1$ such that $\beta_j = 1$. Therefore we will delete all the $\beta$ bits up to and including $\beta_j$ and insert module $A$ to $F_1$ from the left, pushing to the right all the modules from the upper left corner of $F_1$ down to and including module $B$. We will get back the correct packing $F$. Therefore the statement is also true when there are $n + 1$ modules in the packing. •
Figure 3.9 Proof of Theorem 3. 
3.2.4 Size of solution space 
The TBS representation is a complete and non-redundant representation for mosaic floorplan. Thus the number of different TBS configurations should give the Baxter number [33]. The Baxter number can be written analytically as a complicated summation (equation 3.1 in [33]). However, there is no known simple closed form expression for the Baxter number. In the following, an upper bound on the number of different TBS configurations (i.e., on the Baxter number) is presented.
Consider a TBS $(\pi, \alpha, \beta, \beta')$ for $n$ modules. $\alpha$ and $\beta$ uniquely specify a rooted ordered binary tree. Thus the number of combinations of $\alpha$ and $\beta$ is given by the Catalan number. Since the number of combinations for $\pi$ is $n!$, the number of combinations for $\beta'$ is upper-bounded by $O(2^n)$, and the Catalan number is upper-bounded by $O(2^{2n}/n^{1.5})$, the number of different TBS configurations is bounded by $O(n! \, 2^{3n}/n^{1.5})$.
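As a numerical sanity check of the counting argument, the sketch below compares $n!$ times the Baxter number with the product $n! \cdot C_n \cdot 2^n$ used in the bound, where $C_n$ is the Catalan number. Two assumptions are made here and are not taken from the text: the Baxter number is computed with the standard binomial summation (which may or may not be written exactly as equation 3.1 in [33]), and the factor $n!$ is attached to it to account for the permutation $\pi$. All function names are illustrative.

```python
from math import comb, factorial


def baxter(n):
    """Baxter number B(n) via the standard binomial summation (assumed form)."""
    total = sum(
        comb(n + 1, k - 1) * comb(n + 1, k) * comb(n + 1, k + 1)
        for k in range(1, n + 1)
    )
    return total // (comb(n + 1, 1) * comb(n + 1, 2))


def catalan(n):
    """Catalan number C(n): number of rooted ordered binary trees with n nodes."""
    return comb(2 * n, n) // (n + 1)


def tbs_count_upper_bound(n):
    """n! choices for pi, C(n) for (alpha, beta), at most 2**n for beta'."""
    return factorial(n) * catalan(n) * 2 ** n


for n in range(1, 9):
    exact = factorial(n) * baxter(n)
    print(n, baxter(n), exact, tbs_count_upper_bound(n))
```

The gap between the two columns grows quickly with $n$, which is consistent with the text describing the expression as an upper bound rather than a tight estimate.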
3.3 Extension to General Floorplan 
3.3.1 Empty room in mosaic floorplan 
A twin binary sequences $s$ represents a mosaic floorplan $F$. Now we want to insert an exact number of empty rooms at the right places in $F$ to obtain a corresponding non-slicing floorplan $F'$ such that every non-slicing floorplan can be generated by this method from one mosaic floorplan non-redundantly. There are two kinds of empty rooms. One kind results when a big room is assigned to a small module. This kind of empty room is called a reducible empty room. An example is shown in Figure 3.10(a). The other kind of empty room is called an irreducible empty room and is defined as follows:
Definition 2 An irreducible empty room is an empty room that cannot be removed by 
merging with another room in the packing. 
An example of an irreducible empty room is shown in Figure 3.10(b). We observed 
that an irreducible empty room must be of wheel shape and its four adjacent rooms (the 
rooms that share a T-junction at one of its corners) must not be irreducible empty rooms 
themselves: 
Figure 3.10 Examples of reducible and irreducible empty rooms. 
Lemma 4 The T-junctions at the four corners of an irreducible empty room must form 
a wheel structure (Figure 3.11). 
Proof: If an empty room X does not form a wheel structure, there is at least one slicing 
cut (Figure 3.12) on one of its four sides. By removing this slicing cut, we can merge X 
with the room on the other side of the slicing cut (room A in Figure 3.12) and X can be 
removed. • 
Figure 3.11 A wheel structure. 
Lemma 5 The adjacent rooms at the four T-junctions of an irreducible empty room must 
not be irreducible empty rooms themselves. 
Proof: W.l.o.g., we consider an irreducible empty room $X$ of clockwise wheel shape, and assume that its adjacent room $A$ sharing with $X$ the T-junction at its upper left corner is also an irreducible empty room (Figure 3.13). Then $A$ must be an anti-clockwise wheel. There are two cases: (1) If width($A$) > width($X$), $X$ can be merged with $A_1$ (Figure 3.13(a)) to form a new empty room. This empty room $X + A_1$ is reducible, and can be removed by combining with the modules on the right hand side (labeled B). (2) If width($A$) < width($X$), $A$ can be merged with $X_1$ (Figure 3.13(b)) to form a new empty room and a similar argument follows. In both cases, we are able to reduce the number of irreducible empty rooms by one. By repeating the above process, we will either end up with only one irreducible empty room, which must satisfy the condition, or reach the situation in which no two remaining irreducible empty rooms share a T-junction.
Figure 3.12 Proof of Lemma 4. 
Figure 3.13 Proof of Lemma 5. 
3.3.2 Mapping between mosaic floorplan and general non-slicing floorplan 
In this section, we will show how a non-slicing floorplan $F'$ can be constructed from a mosaic floorplan $F$ by inserting some irreducible empty rooms at the right places in $F$. For simplicity, we will make use of twin binary trees for explanation. That means, given a mosaic floorplan $F$ represented by a pair of twin binary trees $t_1$ and $t_2$, we want to insert the minimal number of empty rooms (represented by X) to the trees appropriately so that they will correspond to a valid non-slicing floorplan $F'$, and the method should be such that every non-slicing floorplan can be constructed by this method uniquely from one and only one mosaic floorplan. To construct a non-slicing floorplan from a mosaic floorplan, we only need to consider the irreducible empty rooms, because all reducible empty rooms can be removed by merging with some neighboring rooms. From Lemma 4, we know that an irreducible empty room must be of the shape of a wheel, so its structure in the twin binary trees must be of the form shown in Figure 3.14. In our approach, we will use the following mapping $M_x$ to create irreducible empty rooms from a sliceline structure.
Figure 3.14 Tree structure of an irreducible empty room. 
Definition 3 The mapping $M_x$ will map a vertical (horizontal) sliceline with one T-junction on each side to an irreducible empty room of anti-clockwise (clockwise) wheel shape (Figure 3.15).
It is not difficult to prove the uniqueness of this mapping as stated in the next lemma:
Lemma 6 Every non-slicing floorplan can be mapped by $M_x$ from one and only one mosaic floorplan.
Proof: Given a non-slicing floorplan $F$, each of its irreducible empty rooms must form a wheel structure, sharing its four corners with four different modules. Each of them can only be created from one slicing structure as described in the mapping $M_x$. It is thus obvious that the floorplan $F$ can only be mapped from one unique mosaic structure. •
Figure 3.15 Mapping between mosaic floorplan and non-slicing floorplan.
From Lemma 5, we know that the adjacent rooms of an irreducible empty room must be occupied. Therefore if we want to insert X's into the twin binary trees $t_1$ and $t_2$ of a mosaic floorplan, the X's must be inserted between some module nodes as shown in Figure 3.16. Given this observation, we will first insert as many X's as possible (i.e., $n - 1$) into $t_1$ and $t_2$ to obtain another pair of trees $t'_1$ and $t'_2$. An example is shown in Figure 3.17(b). Now, the most difficult task is to select those X's that are inserted correctly. According to Observation 2, a pair of twin binary trees is valid (corresponds to a packing) if and only if the inorder traversals of their extended trees are equivalent except that all the bits are reversed. Therefore, in order to find the valid X's, we will write down the inorder traversals of the extended trees of $t'_1$ and $t'_2$ and try to match the X's. The matching is not difficult since there must be an equal number of X's between any two neighboring module names (Figure 3.17(c)). We may need to make a choice when there is more than one X between two modules. For example, in Figure 3.17(c), there is one X between C and D in the first sequence and there are two X's in the second sequence. In this case, we can match one pair of X's. There are two choices from the second sequence, and they will correspond to different non-slicing structures as shown in Figure 3.17(c). Every matching will correspond to a valid floorplan, and each non-slicing floorplan can be constructed uniquely by this method from one and only one mosaic floorplan.
3.3.3 Inserting empty rooms directly on TBS 
Figure 3.16 The only two ways to insert X into a tree.
[Figure 3.17: panel (c) shows the inorder traversals of the two extended trees after insertion, X 0 A 0 X 0 B 1 C 1 X 1 D 0 E 0 F 1 X 1 X and X 0 A 1 B 0 C 0 X 0 X 0 D 1 E 1 F 1 X 1 X. One of the two X's between C and D in the second sequence will be matched with the X in the first sequence, and the two floorplans obtained when the first or the second X is chosen are drawn.]
Figure 3.17 A simple example of constructing a non-slicing floorplan from a mosaic floorplan.
In our implementation, we do not need to build the trees explicitly to insert empty rooms. We can scan the twin binary sequences $s = (\pi, \alpha, \beta, \beta')$ once to find all the positions of the X's in the inorder traversals of $t_1$ and $t_2$ after insertion. This is possible because of the following observation. Consider an X inserted at a node position A in a tree. If A has a left subtree B (Figure 3.16(a)), this inserted X will appear just before the left subtree of A in the inorder traversal of $t'$. Similarly, if A has a right child B (Figure 3.16(b)), this inserted X will appear just after the right subtree of A in the inorder traversal of $t'$. A simple algorithm can be used to break down the subtree structure of a tree and find all the positions of the X's in the sequences after insertion in linear time. The details of the algorithm are as follows.
We scan the TBS from left to right and assume that $\alpha_n = 1$. If $\alpha_i = 0$, module $\pi_i$ has a right subtree in $t_1$ according to the definition of TBS. By the observation above, we only need to find the position of the last module ($\pi_k$) in the right subtree of $\pi_i$ in $t_1$ from the TBS, and then insert one X just after $\pi_k$ in the inorder traversal of $t_1$. In addition, we will assign 1 as the labeling bit of the inserted X. Note that the right subtree of $\pi_i$ can be taken as a binary tree except that the directional bit of the root is 1, not 0 as usual. In addition, $\alpha_k = 1$. Thus, we obtain the modified conditions for the right subtree of $\pi_i$ as follows:
(a) In the bit sequence $\beta_{i+1}\alpha_{i+1}\beta_{i+2} \ldots \alpha_{k-1}\beta_k\alpha_k$, the number of 1's is two more than the number of 0's.
(b) For any proper prefix of the bit sequence $\beta_{i+1}\alpha_{i+1}\beta_{i+2} \ldots \alpha_{k-1}\beta_k\alpha_k$, the number of 1's is less than or equal to the number of 0's plus 1.
Based on the above conditions, we can count the number of 0's and 1's from $\beta_{i+1}$ and $\alpha_{i+1}$ until we reach the module $\pi_k$. It is not difficult to find $\pi_k$ by the following mathematical form:
$$\sum_{j=i+1}^{k} (\hat{\beta}_j + \hat{\alpha}_j) = 2 \qquad (3.1)$$
where we define $\hat{\beta}_j = 1$ if $\beta_j = 1$ and $\hat{\beta}_j = -1$ if $\beta_j = 0$, and $\hat{\alpha}_j$ analogously, so that the sum equals the number of 1's minus the number of 0's as required by condition (a).
A simple example is shown in Figure 3.18. After we insert X for module $\pi_i$, the inorder traversal of the extended $t'_1$ becomes E 1 $\pi_i$ 0 A 0 D 1 B 0 C 1 X 1 F 0 G. Note that the inserted X for module $\pi_i$ appears just after the last module (i.e., module C) of the right subtree of $\pi_i$ in $t_1$. The labeling bit for the inserted X is 1.
[Figure 3.18: tree $t_1$ and its TBS, with $\pi$ = E $\pi_i$ A D B C F G, $\alpha$ = 1 0 0 1 0 1 0 1 and $\beta$ = 0 0 1 0 1 1 0 1. The last module in the right subtree of $\pi_i$ is found by counting the number of 1's and 0's in the bit sequence $\beta_{i+1}\alpha_{i+1}\beta_{i+2} \ldots$ until module C is reached, which satisfies Equation (3.1).]
Figure 3.18 An example of searching the last module in the right subtree of $\pi_i$.
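The search for the last module of the right subtree can be sketched as a single running sum over Equation (3.1). This is a minimal illustration under stated assumptions: indices are 1-based as in the text, `alpha` already contains the appended bit $\alpha_n = 1$, a bit equal to 1 counts as +1 and a bit equal to 0 counts as -1, and the function name `last_in_right_subtree` is hypothetical.

```python
def hat(bit):
    """Map a bit to +1 (bit 1) or -1 (bit 0), as assumed for Equation (3.1)."""
    return 1 if bit == 1 else -1


def last_in_right_subtree(alpha, beta, i):
    """Return the 1-based index k of the last module in the right subtree of pi_i.

    alpha, beta : lists of n bits (alpha includes the appended alpha_n = 1)
    i           : 1-based position with alpha_i = 0
    """
    if alpha[i - 1] != 0:
        raise ValueError("pi_i has a right subtree in t1 only when alpha_i = 0")
    total = 0
    for k in range(i + 1, len(alpha) + 1):
        total += hat(beta[k - 1]) + hat(alpha[k - 1])
        if total == 2:          # first k at which Equation (3.1) holds
            return k
    raise ValueError("no k satisfies Equation (3.1); the input is not a valid TBS")


# Sequences of Figure 3.18; pi_i is the second module.
pi = ["E", "pi_i", "A", "D", "B", "C", "F", "G"]
alpha = [1, 0, 0, 1, 0, 1, 0, 1]
beta = [0, 0, 1, 0, 1, 1, 0, 1]
k = last_in_right_subtree(alpha, beta, 2)
print(k, pi[k - 1])
```

For the sequences of Figure 3.18 the scan stops at module C, so the X is placed immediately after C, as in the traversal given in the text.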
If $\alpha_i = 1$, module $\pi_i$ has a right subtree in $t_2$ according to the definition of TBS. Similarly, we can insert X for $\pi_i$ directly into the inorder traversal of the extended $t'_2$ by searching the last module of the right subtree of $\pi_i$ in $t_2$. The algorithm is exactly the same as above.
Now we consider the case that $\pi_i$ has a left subtree in the twin binary trees. If $\alpha_{i-1} = 1$, $\pi_i$ has a left subtree in $t_1$. According to the observation above, we only need to find the position of the first module ($\pi_k$) in the left subtree of $\pi_i$ in $t_1$ from the TBS, and insert one X just before $\pi_k$ in the inorder traversal of $t_1$. In addition, we assign 0 as the labeling bit of the inserted X. Note that the left subtree of $\pi_i$ in $t_1$ is exactly a general binary tree. In addition, $\alpha_{k-1} = 0$. We thus obtain the modified conditions for the left subtree of $\pi_i$ as follows:
(a) In the bit sequence $\beta_{i-1}\alpha_{i-2}\beta_{i-2}\alpha_{i-3} \ldots \alpha_k\beta_k\alpha_{k-1}$, the number of 0's is two more than the number of 1's.
(b) For any proper prefix of the bit sequence $\beta_{i-1}\alpha_{i-2}\beta_{i-2}\alpha_{i-3} \ldots \alpha_k\beta_k\alpha_{k-1}$, the number of 0's is less than or equal to the number of 1's plus 1.
Based on the above conditions, we can count the number of 0's and 1's from $\beta_{i-1}$ and $\alpha_{i-2}$ until we reach the module $\pi_k$. It is not difficult to find $\pi_k$ by the following mathematical form, in which $j$ runs downward from $i - 1$ to $k$:
$$\sum_{j=i-1}^{k} (\hat{\beta}_j + \hat{\alpha}_{j-1}) = -2 \qquad (3.2)$$
Another simple example is shown in Figure 3.19. After we insert X for module $\pi_i$, the inorder traversal of the extended $t'_1$ becomes G 1 F 0 X 0 A 0 D 1 B 0 C 1 $\pi_i$ 0 E. Note that the inserted X for module $\pi_i$ appears just before the first module (i.e., module A) of the left subtree of $\pi_i$ in $t_1$. The labeling bit for the inserted X is 0.
[Figure 3.19: tree $t_1$ and its TBS, with $\pi$ = G F A D B C $\pi_i$ E, $\alpha$ = 1 0 0 1 0 1 0 1 and $\beta$ = 0 0 0 0 1 1 1 1. The first module in the left subtree of $\pi_i$ is found by counting the number of 1's and 0's in the bit sequence $\beta_{i-1}\alpha_{i-2}\beta_{i-2}\alpha_{i-3} \ldots$ until module A is reached, which satisfies Equation (3.2).]
Figure 3.19 An example of searching the first module in the left subtree of $\pi_i$.
If $\alpha_{i-1} = 0$, module $\pi_i$ has a left subtree in $t_2$. Similarly, we can insert X for $\pi_i$ directly into the inorder traversal of $t'_2$ by searching the first module in the left subtree of $\pi_i$ in $t_2$. The algorithm is exactly the same as above.
After we have inserted all the possible X's, we obtain the inorder traversals of the trees $t'_1$ and $t'_2$. Matching can then be done as described in the previous subsection.
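The leftward search of Equation (3.2) mirrors the rightward one. The sketch below is illustrative only and rests on assumptions: 1-based indices as in the text, a bit equal to 1 counted as +1 and a bit equal to 0 as -1, and $\alpha_0 = 0$ used when the scan reaches the first module (the convention of prepending $\alpha_0 = 0$ appears in the proof of Lemma 2). The name `first_in_left_subtree` is hypothetical.

```python
def hat(bit):
    """Map a bit to +1 (bit 1) or -1 (bit 0), as assumed for Equation (3.2)."""
    return 1 if bit == 1 else -1


def first_in_left_subtree(alpha, beta, i):
    """Return the 1-based index k of the first module in the left subtree of pi_i.

    alpha, beta : lists of n bits (alpha includes the appended alpha_n = 1)
    i           : 1-based position with alpha_{i-1} = 1
    """
    if i < 2 or alpha[i - 2] != 1:
        raise ValueError("pi_i has a left subtree in t1 only when alpha_{i-1} = 1")
    total = 0
    for k in range(i - 1, 0, -1):                   # j runs from i-1 down to k
        alpha_prev = alpha[k - 2] if k >= 2 else 0  # alpha_{k-1}, with alpha_0 = 0
        total += hat(beta[k - 1]) + hat(alpha_prev)
        if total == -2:          # first k at which Equation (3.2) holds
            return k
    raise ValueError("no k satisfies Equation (3.2); the input is not a valid TBS")


# Sequences of Figure 3.19; pi_i is the seventh module.
pi = ["G", "F", "A", "D", "B", "C", "pi_i", "E"]
alpha = [1, 0, 0, 1, 0, 1, 0, 1]
beta = [0, 0, 0, 0, 1, 1, 1, 1]
k = first_in_left_subtree(alpha, beta, 7)
print(k, pi[k - 1])
```

On the sequences of Figure 3.19 the scan stops at module A, matching the position of the inserted X in the traversal quoted in the text.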
3.3.4 Tight bound on the number of irreducible empty rooms 
In order to describe non-slicing structure by a mosaic floorplan representation, some previous works [34, 35] include dummy blocks of zero area in the set of modules. The method described in section 2.3 is very efficient but it is applicable to the TBS representation only. In general, we only need to have $n - 1$ extra dummy blocks in order to represent all non-slicing structures by a mosaic floorplan representation. We have proved an upper bound of $n - 1$ and a lower bound of $n - 2\sqrt{n} + 1$ on the number of irreducible empty rooms in a general non-slicing floorplan. (An example with 49 modules and 36 irreducible empty rooms is shown in Figure 3.20.) It means that $n - 1$ dummy blocks are needed and we cannot use many fewer.
Theorem 4 In a non-slicing floorplan $P$, there can be at most $n - 1$ irreducible empty rooms.
Proof: According to Lemma 5, the adjacent rooms of an irreducible empty room in $P$ must be occupied. Therefore, each irreducible empty room will take up four corners of some occupied rooms. Since there are only $n$ occupied rooms in total and the four corners of the chip cannot be used, there are only $4n - 4$ corners to be used. Therefore, there are at most $(4n - 4)/4 = n - 1$ irreducible empty rooms. •
Theorem 5 There exists a non-slicing floorplan $P$ of $n$ modules and $n - 2\sqrt{n} + 1$ irreducible empty rooms.
Proof: A floorplan with $n - 2\sqrt{n} + 1$ irreducible empty rooms can be constructed similarly to the example in Figure 3.20. Let $k$ be the number of modules along each edge (for the example in Figure 3.20, $k = 7$). Then the number of modules is $n = k^2$ and the number of empty rooms is $(k - 1)^2 = (\sqrt{n} - 1)^2 = n - 2\sqrt{n} + 1$. •
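The two bounds are easy to evaluate numerically. The following sketch simply restates Theorems 4 and 5 as arithmetic; the function name `empty_room_bounds` is an assumption for this example, and the constructive lower bound is only evaluated when $n$ is a perfect square, as in the grid-like construction of the proof.

```python
from math import isqrt


def empty_room_bounds(n):
    """Return (upper, constructed) counts of irreducible empty rooms for n modules.

    upper       : n - 1, from the corner-counting argument of Theorem 4
    constructed : (k - 1)**2 for n = k**2 (Theorem 5), or None otherwise
    """
    upper = (4 * n - 4) // 4        # 4n - 4 usable corners, four per empty room
    k = isqrt(n)
    constructed = (k - 1) ** 2 if k * k == n else None
    return upper, constructed


print(empty_room_bounds(49))
```

For the 49-module example of Figure 3.20 this gives 36 constructed empty rooms against an upper bound of 48.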
Figure 3.20 Floorplan example with many irreducible empty rooms.
3.4 Floorplan Optimization by Simulated Annealing
Simulated annealing is used to search for a good TBS. The temperature is set to $1.5 \times 10^6$ initially and is lowered at a constant rate of 0.95 to 0.97 until it is below $1 \times 10^{-10}$. The number of iterations at one temperature step is 30. In every iteration of the annealing process, we will modify the TBS by one of the following four kinds of moves:
M1: Swap two modules in $\pi$.
M2: Change the width and height of a module.
M3: Rotation based on $t_1$.
M4: Rotation based on $t_2$.
We design the moves such that all TBS's are reachable. In Lemma 7, we prove that starting from any TBS, we can generate any other TBS with the same $\pi$ sequence by applying one or more moves from the set {M3, M4}. Since we can swap any two modules in the $\pi$ sequence by move M1 and M2 changes the dimensions of a module, all TBS's are reachable by applying moves from the set {M1, M2, M3, M4}. In addition, we will make sure that the sequences obtained after each move form a valid TBS (i.e., satisfying conditions (1)-(2)).
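The cooling schedule described above determines how many moves one annealing run evaluates. The sketch below generates the temperature sequence only; it does not implement the moves or any acceptance rule, since the text does not specify one. The names `temperature_schedule` and `total_moves` are illustrative, and the defaults use the slower-decaying end of neither range in particular, just the rate 0.95 as one of the quoted values.

```python
def temperature_schedule(t_start=1.5e6, rate=0.95, t_stop=1e-10):
    """Yield temperatures from t_start, multiplying by rate, until below t_stop."""
    t = t_start
    while t >= t_stop:
        yield t
        t *= rate


def total_moves(iterations_per_step=30, **schedule_args):
    """Number of TBS modifications attempted over one annealing run."""
    steps = sum(1 for _ in temperature_schedule(**schedule_args))
    return iterations_per_step * steps


for rate in (0.95, 0.97):
    steps = sum(1 for _ in temperature_schedule(rate=rate))
    print(rate, steps, total_moves(rate=rate))
```

A rate closer to 1 lengthens the schedule, so the choice between 0.95 and 0.97 trades runtime against the thoroughness of the search.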
For move M1, we only exchange the module names in two randomly selected rooms. For move M2, we change the width and height of a module within the given limits of its aspect ratio. Obviously, both moves M1 and M2 take O(1) time. For moves M3 and M4, we borrow and modify the idea of rotations in red-black trees [22]. A red-black tree is a binary search tree. The rotation in a red-black tree is an operation that changes the tree structure locally while preserving the inorder traversal of the tree. Two kinds of rotations, Right-Rotate and Left-Rotate, are defined originally in [22] (Figure 3.21). A and B represent two nodes. C, D and E represent arbitrary subtrees. Right-Rotate(T, A) transforms the left tree structure to the right tree structure, while keeping the inorder traversal of the tree unchanged (e.g., the inorder traversals of the tree before and after rotation are both equal to CBDAE in Figure 3.21). The operation of left rotation is similar. Both Left-Rotate and Right-Rotate run in O(1) time. When we apply red-black tree rotations on our twin binary trees, the subtree D in Figure 3.21 should not be 1 or 0. In the case that subtree D is 1 or 0, we modify the red-black rotations as shown in Figure 3.22, where D is designated to 0 or 1 after Right-Rotate(T, A) or Left-Rotate(T, B).
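The standard rotations referred to above can be sketched on a plain binary tree as follows. This shows only the textbook operation that preserves the inorder traversal; it does not include the modification for the case where subtree D is a 0 or 1 label (Figure 3.22), nor the updates of the TBS bits. The `Node` class and function names are assumptions for this example.

```python
class Node:
    """Minimal binary tree node used only for this illustration."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def right_rotate(a):
    """Rotate right around a; a's left child b becomes the new subtree root."""
    b = a.left
    a.left = b.right     # subtree D moves from b's right to a's left
    b.right = a
    return b


def left_rotate(b):
    """Rotate left around b; b's right child a becomes the new subtree root."""
    a = b.right
    b.right = a.left     # subtree D moves from a's left to b's right
    a.left = b
    return a


def inorder(node):
    """Return the keys of the tree in inorder."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# Left tree of Figure 3.21: A with left child B (subtrees C, D) and right subtree E.
tree = Node("A", Node("B", Node("C"), Node("D")), Node("E"))
print("".join(inorder(tree)))
tree = right_rotate(tree)
print(tree.key, "".join(inorder(tree)))
tree = left_rotate(tree)
print(tree.key, "".join(inorder(tree)))
```

Each rotation reassigns a constant number of pointers, which is why both operations run in O(1) time.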
Figure 3.21 Right-Rotate and Left-Rotate for a binary search tree 
Figure 3.22 Modified red-black rotations when subtree D is 0 or 1 
For the moves M3 and M4, we randomly pick one module $\pi_i$ from $\pi$, and check $\alpha_i$. If $\alpha_i = 0$, $\pi_i$ has a right child in $t_1$ and $\pi_{i+1}$ has a left child in $t_2$. We can then use move M3 to apply Left-Rotate(T, $\pi_i$) on $t_1$ or use move M4 to apply Right-Rotate(T, $\pi_{i+1}$) on $t_2$. They are similar to each other and one of them will be randomly picked and applied. W.l.o.g., we present the details of Left-Rotate(T, $\pi_i$) on $t_1$ according to the following four cases shown in Figure 3.23(a), (b), (c) and (d). For simplicity, we use letters B, C and D to represent the root of each subtree.
[Figure 3.23: the tree before and after Left-Rotate(T, $\pi_i$) in each case, with the resulting updates of the TBS:
(a) Case 1. $\alpha$: no change. $\beta$: flip $\beta_A$ (from 1 to 0) and $\beta_C$ (from 0 to 1). $\beta'$: no change.
(b) Case 2. $\alpha$: no change. $\beta$: flip $\beta_i$ (from 1 to 0) and $\beta_C$ (from 0 to 1). $\beta'$: no change.
(c) Case 3. $\alpha$: flip $\alpha_i$ (from 0 to 1). $\beta$: flip $\beta_A$ (from 1 to 0). $\beta'$: if $\beta'_A = 0$, flip $\beta'_A$ (from 0 to 1); else, flip $\beta'_i$ (from 0 to 1).
(d) Case 4. $\alpha$: flip $\alpha_i$ (from 0 to 1). $\beta$: flip $\beta_i$ (from 1 to 0). $\beta'$: if $\beta'_A = 0$, flip $\beta'_A$ (from 0 to 1); else, flip $\beta'_i$ (from 0 to 1).]
Figure 3.23 Four cases of Left-Rotate(T, $\pi_i$) on $t_1$
Case 1: $\beta_i = 0$ and the right child of $\pi_i$ has a left child.
Case 2: $\beta_i = 1$ and the right child of $\pi_i$ has a left child.
Case 3: $\beta_i = 0$ and the right child of $\pi_i$ has no left child.
Case 4: $\beta_i = 1$ and the right child of $\pi_i$ has no left child.
For Case 1, after left rotation of module $\pi_i$, the only change in $t_1$ is the directional bit of modules A and C, so we only need to flip $\beta_A$ and $\beta_C$. Because the labeling sequence $\alpha$ does not change, we do not need to update $t_2$. Thus, we keep $\beta'$ the same as before. Case 2 is similar to Case 1. For Case 3, both $\alpha_i$ and the directional bit of module A are flipped after left rotation of module $\pi_i$. In order to maintain conditions (1) and (2), we need to update $t_2$ by flipping one directional bit of $\beta'$ from 0 to 1. Note that $\pi_i$ is the left child of A in $t_2$. Thus, if $\beta'_A$ is 0, we will flip $\beta'_A$ from 0 to 1. Otherwise, we will flip $\beta'_i$ from 0 to 1. Case 4 is similar to Case 3. Actually, updating $t_2$ in Cases 3 and 4 is exactly the Right-Rotate(T, A) on $t_2$.
If $\alpha_i = 1$, $\pi_i$ has a right child in $t_2$ and $\pi_{i+1}$ has a left child in $t_1$. We can thus use move M4 to apply Left-Rotate(T, $\pi_i$) on $t_2$ or use move M3 to apply Right-Rotate(T, $\pi_{i+1}$) on $t_1$. One of them will be randomly picked and applied. The algorithm of right rotation is similar to that of left rotation.
In moves M3 and M4, if $\alpha$ does not change, we only need to update one tree and each move takes O(1) time. If $\alpha$ changes, we need to update both trees (i.e., apply two rotations). Therefore, both moves M3 and M4 take O(1) time in practice.
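The four cases can be read as a constant-time update of the three bit sequences. The sketch below is a hedged interpretation rather than the author's code: it assumes the caller already knows the 0-based positions of $\pi_i$, of its right child A, and of A's left child C (or `None` when A has no left child), and it takes the bit flips for Cases 2 and 4 from the figure annotations, whose subscripts are partly illegible in the source. The function name `left_rotate_update` is hypothetical.

```python
def left_rotate_update(alpha, beta, beta_p, i, a, c=None):
    """Return updated copies of (alpha, beta, beta_p) after Left-Rotate(T, pi_i) on t1.

    i : 0-based position of pi_i (requires alpha[i] == 0)
    a : 0-based position of A, the right child of pi_i in t1
    c : 0-based position of C, the left child of A in t1, or None
    """
    if alpha[i] != 0:
        raise ValueError("pi_i has a right child in t1 only when alpha_i = 0")
    alpha, beta, beta_p = list(alpha), list(beta), list(beta_p)
    if c is not None:
        # Cases 1 and 2: alpha and beta' are unchanged.
        beta[c] = 1               # C becomes the right child of pi_i
    else:
        # Cases 3 and 4: pi_i loses its right child, so alpha_i flips to 1,
        # and one bit of beta' flips from 0 to 1 to keep the trees twin.
        alpha[i] = 1
        if beta_p[a] == 0:
            beta_p[a] = 1
        else:
            beta_p[i] = 1
    if beta[i] == 0:
        beta[a] = 0               # Cases 1 and 3: A takes over pi_i's left-child slot
    else:
        beta[i] = 0               # Cases 2 and 4: pi_i becomes the left child of A
    return alpha, beta, beta_p


# Case 2 on a four-node tree: pi_2 is a right child, A = pi_4, C = pi_3.
print(left_rotate_update([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0], i=1, a=3, c=2))
```

Only a fixed number of bits are touched in every case, which is consistent with the O(1) cost stated for moves M3 and M4.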
Lemma 7 Starting from any TBS, we can generate any other TBS with the same $\pi$ sequence by applying one or more moves from the set {M3, M4}.
Proof: We observe that at most $n - 1$ Left Rotations suffice to transform any arbitrary $n$-node binary tree into a left-going chain [22]. Given a TBT, w.l.o.g., we can apply at most $n - 1$ Left Rotations by move M3. The binary tree $t_1$ will become a left-going chain (Figure 3.24(a)). Since move M3 always results in a TBT, the binary tree $t_2$ must also be transformed into a right-going chain (Figure 3.24(b)). The corresponding floorplan is shown in Figure 3.24(c).
Noticing that any Left Rotation in move M3 has its reversed rotation, which is the Right Rotation, an $n$-node TBT where $t_1$ is a left-going chain and $t_2$ is a right-going chain can thus be transformed into any other arbitrary TBT by applying at most $n - 1$ Right Rotations by move M3. Therefore, at most $2n - 2$ moves are sufficient to convert a TBS to any other arbitrary TBS with the same $\pi$ sequence. We design move M4 as a symmetric move to M3. •
Figure 3.24 Proof of Lemma 7. 
3.5 Experimental Results 
All experiments are carried out on a PC with a 1400MHz Intel Xeon Processor and 256Mb Memory. Simulated annealing as stated in Section 3.4 is used to search for a good TBS.
We test our algorithm using TBS with empty room insertion on six MCNC benchmarks. In addition, we also run the algorithm with empty room insertion disabled. In other words, only mosaic floorplans can be generated. For each case, two objective functions are considered. The first is to minimize area only. The second is to minimize a weighted sum of area and wirelength. The weights are set such that the costs of area and wirelength are approximately equal. Because of the stochastic nature of simulated annealing, for each experiment, ten runs are performed and the result of the best run is reported. The results for area minimization are listed in Table 3.1. The results for area and wirelength minimization are listed in Table 3.2.
As the results show, our floorplanner can produce high-quality floorplans in a very short runtime. We also notice that empty room insertion is very effective in reducing the floorplan area. If empty room insertion is disabled, the deadspace is worse for all but two cases. The deadspace is 32.84% more on average. However, with empty room insertion, the floorplanner is about 40.8% slower.
In Table 3.3, we compare our results with ECBL [34] and the enhanced Q-sequences [35].
Notice that ECBL is run on a Sun Sparc20 (248MHz) while Enhanced Q-seq is run on a Sun
Ultra60 (360MHz). We found that the scaling factors for the speeds of the three machines
are 1:1.68:5.03. The runtimes reported in brackets in Table 3.3 are the scaled runtimes.
We can see that TBS runs much faster, although the area results of all
three are similar. We also compared TBS with those representations
designed intrinsically for slicing structure. The performance of Fast-SP [31],
Enhanced O-tree [29], B*-tree [21] and TCG [26] is shown in Table 3.4. Notice that
Fast-SP and B*-tree are run on a Sun Ultra1 (166MHz) while Enhanced O-tree and TCG
are run on a Sun Sparc20 (248MHz), and the scaling factors for their speeds are 0.613:1.
Again, the runtimes reported in brackets in Table 3.4 are the scaled runtimes. We can see
that TBS has again outperformed the other representations in terms of runtime, while
Table 3.1 Area minimization.
MCNC        TBS (with X)               TBS (no X)
benchmark   % Deadspace  Runtime (s)   % Deadspace  Runtime (s)
apte        1.89         0.86          1.30         0.73
xerox       2.17         1.30          2.46         1.20
hp          2.10         0.76          2.22         0.63
ami33a      3.05         1.26          4.05         0.98
ami49a      4.05         2.55          4.38         2.08
playout     6.20         2.58          7.60         1.09
Table 3.2 Area and wirelength minimization.
MCNC        TBS (with X)                           TBS (no X)
benchmark   % Deadspace  Wirelength  Runtime (s)   % Deadspace  Wirelength  Runtime (s)
apte        1.79         12652       0.89          3.45         13267       0.62
xerox       2.64         14937       1.36          4.41         14738       1.22
hp          1.32         4246        0.73          3.43         4292        0.61
ami33a      8.41         6078        1.30          7.25         6488        1.02
ami49a      9.40         29668       2.60          10.82        30256       2.14
playout     5.19         2.373       2.50          6.32         2.265       1.08
the packing quality in terms of area is similar. TBS is thus a more desirable representation
since its fast computation allows us to handle very large circuits and to embed more
interconnect optimization issues in the floorplanning process.
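The bracketed runtimes in Table 3.3 can be reproduced by multiplying each measured runtime by the scaling factor of its machine. The following is a minimal sketch of that arithmetic; the function name `scaled_runtime` and the dictionary layout are illustrative assumptions, while the factors 1 : 1.68 : 5.03 and the sample runtimes are the ones quoted in the text and in the ami33 row of Table 3.3.

```python
# Scaling factors quoted in the text for Table 3.3
# (Sun Sparc20 : Sun Ultra60 : the PC used for TBS).
SCALE = {"ECBL": 1.0, "Enhanced Q-seq": 1.68, "TBS": 5.03}


def scaled_runtime(tool, runtime_s):
    """Convert a measured runtime to the common (Sparc20-equivalent) time base."""
    return runtime_s * SCALE[tool]


# ami33 row of Table 3.3: measured runtimes in seconds.
ami33 = {"ECBL": 73, "Enhanced Q-seq": 40, "TBS": 1.26}
for tool, seconds in ami33.items():
    print(f"{tool}: {seconds} s measured, {scaled_runtime(tool, seconds):.2f} s scaled")
```

Even after scaling, the TBS runtime for ami33 stays roughly an order of magnitude below the other two entries, which is the comparison the paragraph above relies on.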
Table 3.3 Comparisons with ECBL and enhanced Q-sequences.
MCNC        Total   ECBL [34]¹             Enhanced Q-seq [35]²    TBS
benchmark   area    Area     Runtime (s)   Area    Runtime (s)     Area    Runtime (s)
apte        46.56   45.93³   3 (3)         46.92   0.35 (0.59)     47.44   0.86 (4.33)
xerox       19.35   19.91    3 (3)         19.93   3.6 (6.05)      19.78   1.3 (6.54)
hp          8.30    8.918    11 (11)       9.03    3.5 (5.88)      8.48    0.76 (3.82)
ami33       1.16    1.192    73 (73)       1.194   40 (67.2)       1.196   1.26 (6.34)
ami49       35.4    36.70    117 (117)     36.75   57 (95.76)      36.89   2.55 (12.83)
¹ Using Sun Sparc20 machine. ² Using Sun Ultra60 workstation. ³ Negative deadspace.
Table 3.4 Comparisons with other representations for slicing floorplan.
MCNC        Fast-SP [31]¹        En. O-tree [29]²    B*-tree [21]¹        TCG [26]²
benchmark   Area    Time (s)     Area    Time (s)    Area    Time (s)     Area    Time (s)
apte        46.92   1 (0.61)     46.92   11 (11)     46.92   7 (4.29)     46.92   1 (1)
xerox       19.80   14 (8.58)    20.21   38 (38)     19.83   25 (15.33)   19.83   18 (18)
hp          8.947   6 (3.68)     9.16    19 (19)     8.947   55 (33.72)   8.947   20 (20)
ami33       1.205   20 (12.26)   1.242   119 (119)   1.27    3417 (2095)  1.20    306 (306)
ami49       36.5    31 (19.00)   37.73   406 (406)   36.8    4752 (2913)  36.77   434 (434)
¹ Using Sun Ultra1 machine. ² Using Sun Ultra60 machine.
CHAPTER 4. CONGESTION DRIVEN FLOORPLANNING 
4.1 Introduction 
Floorplan design produces a chip-level plan of a set of circuit modules by determining
their positions and shapes on the chip. It is the first stage of the physical
design process and hence has a significant effect on overall circuit quality. Traditionally,
the major objective in the floorplanning stage is to minimize area and total wirelength;
routability and congestion are not considered until global routing. However, due
to the continued scaling of VLSI technology, the design of chip-level interconnect has
become increasingly complicated [36]. Traditional floorplanners will produce floorplans
with congested routing regions that are difficult to eliminate in later stages. Therefore,
it is necessary to address congestion optimization at the floorplanning stage in
order to realize a single-pass design methodology.
4.1.1 Previous work 
In the past few years, several works have been proposed to address the congestion 
issue in floorplan design. Until recently, all previous congestion models in literature 
divide the whole chip area into tiles [37, 38, 39, 40, 41]. The number of wires crossing a 
tile boundary is estimated and is used as a measure of congestion. In other words, the 
underlying routing graph is a grid graph [36], in which each vertex corresponds to a tile 
and each edge corresponds to a tile boundary. There is a tradeoff between the accuracy 
of congestion estimation and the cost of computation. If the number of tiles is small, the 
congestion estimation will be inaccurate. If the number of tiles is large, the computation 
will be expensive. 
To estimate the congestion of each edge in the routing graph, previous approaches can 
be divided into two categories. The first category performs global routing on the grid 
graph [37]. Because the congestion estimation is performed inside the inner loop of the 
floorplanner, it is important to reduce the runtime of global routing. Thus, they restrict 
the routing geometry to L-shaped and Z-shaped. As a result, the congestion estimation 
may not correlate well with the real congestion. In addition, since all nets are routed one 
by one during global routing, even with restricted routing geometry, the computation is 
still very expensive. 
The second category applies a probabilistic map to estimate the probability of a net 
crossing each boundary [38, 39, 40, 41]. The congestion of a boundary is the summation 
over all nets of the probability on that boundary. The previous publications differ mainly 
in their probabilistic maps for a net and in the way they handle blockages. This approach is 
more efficient than the restrictive global routing approach in the first category. Therefore, 
this idea is also commonly used in placement stage to estimate congestion [42, 43, 44, 45, 
46]. Notice that the probability distribution of a net is determined independently of other
nets, whereas a realistic global router routes a net based on current routing congestion
information. Thus, the congestion estimated by the probabilistic approach can be very
different from that by a global router. For example, in Figure 4.1, we have two 2-pin
nets {A, B} and {C, D}. Their bounding boxes overlap in region II. By the probability
approach, we will reach the conclusion that routing region II is more congested than
regions I and III. However, a global router can avoid congestion by routing net {A, B} in
region I and {C, D} in region III. The resulting congestion in II can even be less than
that in I and III.
Figure 4.1 Congestion estimation by the probability approach.
Recently, Lai et al. [47] proposed a novel approach which is very different from all
previous approaches. For a floorplan of n modules, 2n regions are defined according
to the structure of the floorplan. Each region contains several adjacent modules. The
congestion for a region is evaluated as the wire density passing through the boundary of
the region (i.e., number of wires connecting modules inside the region to those outside
divided by the length of the region boundary). An O(n log n) time algorithm based on
least common ancestor computation is presented to evaluate the congestion of all regions.
This approach is very efficient. However, since most regions are quite large and only
a single number is provided for each region, only coarse congestion information can be
provided.
4.1.2 Our contributions 
In this chapter, we present a new congestion estimation model which is efficient and 
accurate. The basic idea is to perform global routing by a flow based approach to minimize 
the maximum congestion over channel segments. A channel segment is a segment of a 
channel shared by two adjacent rooms in a floorplan. If we represent each room by a 
vertex and connect each pair of adjacent rooms by an edge, the resulting graph is called 
an inner dual graph [48]. (See Figure 4.2 for an example.) Note that each edge in the 
inner dual graph corresponds to one channel segment. We use the inner dual graph as the 
underlying topology in global routing. In floorplanning, the exact pin positions inside a
module are still unknown, so it is a waste of time to use a fine grid graph to estimate the
routing congestion. It is enough for global routing to list the set of rooms that a net
passes through without specifying its exact route inside each room. Therefore, the inner
dual graph is an ideal choice as the underlying routing topology. The size of the inner
dual graph is linear in the number of modules and is typically much smaller than that of
a grid graph. Hence, it is much more efficient to use.
Inner dual graph is an undirected graph. In order to avoid detour in the routing 
solution, for each set of nets originating from a particular module, we use a different 
routing graph by assigning different directions to the edges of the inner dual graph. In 
order to solve this problem, we interpret it as a flow problem and we relax the integral 
flow constraints. We design an efficient two-phase algorithm to tackle this fractional flow 
problem. In the first phase, we propose an Incoming Flow Balancing (IFB) technique to 
derive a good initial fractional routing solution. In the second phase, we present a Stepwise 
Flow Refinement (SFR) technique to iteratively reduce the maximum congestion of the 
solution in the first phase. We prove that SFR always converges to the optimal solution. 
Since we relax the integral flow constraints, the optimal solution by our algorithm will 
only be a lower bound on the maximum congestion. A valid global routing solution can 
be obtained by a simple rounding procedure. We show experimentally that the maximum 
congestion after rounding is only increased by 2.82% on average. It justifies the use of 
fractional flow to estimate the routing congestion. 
We demonstrate our model by integrating it into a simulated annealing (SA) based 
floorplanner. The maximum congestion is used as part of the cost of SA. The experimental 
results show that, on average, our congestion-driven floorplanner can generate a much 
less congested floorplan (-36.44%) with a slight sacrifice in area (+1.30%) and wirelength 
(+2.64%). The runtime of the whole SA process is only increased moderately (+270%). 
The efficiency of our model is because of the use of inner dual graph, the simplicity of the 
two-phase algorithm, and the fact that we route a set of nets simultaneously rather than 
net by net. 
The remainder of the chapter is organized as follows. In Section 4.2, we will give an
overview of our congestion-driven floorplanner. In Section 4.3, we will present the two-phase
algorithm used to solve the fractional flow problem in detail. In Section 4.4, a rounding
procedure will be presented to derive a global routing solution from the fractional flow.
The experimental results will be described in Section 4.5. Finally, the chapter will be
concluded in Section 4.6.
4.2 Overview of Our Floorplanner 
We make use of Twin Binary Sequences (TBS) [49] as our floorplan representation in
simulated annealing. In principle, our congestion model can be employed with any floorplan
representation. We choose the TBS representation for two reasons.
First, TBS itself is a very efficient and effective floorplan representation for mosaic
floorplans and can be extended to represent general floorplans. Second, the inner dual graph of
a floorplan can be easily obtained in the TBS floorplan realization step.
In the annealing process, we use a 3-stage SA with three different cost functions in
different temperature ranges to reduce runtime. First, in the high temperature range, we
only consider area and total wirelength in the cost, i.e.,
$$\text{Cost} = \text{Area} + \alpha \times \text{Wirelength}.$$
Then, in the medium temperature range, we add an accurate, although not optimal, maximum
congestion $Mcong_1$ derived by IFB only as an additional part of the cost, i.e.,
$$\text{Cost} = \text{Area} + \alpha \times \text{Wirelength} + \beta \times Mcong_1.$$
Finally, when the annealing process reaches the low temperature range, we replace $Mcong_1$
with the maximum congestion $Mcong_2$ derived by IFB and SFR. The cost thus becomes:
$$\text{Cost} = \text{Area} + \alpha \times \text{Wirelength} + \beta \times Mcong_2.$$
The following section will describe how to estimate the congestion of a given floorplan in
detail.
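The three cost functions can be read as one stage-dependent cost. The sketch below is only an illustration under stated assumptions: the thresholds `t_high` and `t_low` that separate the temperature ranges are hypothetical parameters (the text does not give their values), and the two congestion estimators are passed in as callables so that the expensive one is evaluated only in the stage that uses it.

```python
def sa_cost(area, wirelength, temperature, alpha, beta,
            mcong_ifb, mcong_ifb_sfr, t_high, t_low):
    """Stage-dependent simulated annealing cost.

    mcong_ifb      -- callable returning Mcong_1 (IFB only)
    mcong_ifb_sfr  -- callable returning Mcong_2 (IFB followed by SFR)
    t_high, t_low  -- hypothetical boundaries of the temperature ranges
    """
    cost = area + alpha * wirelength
    if temperature > t_high:
        # High temperature range: area and wirelength only.
        return cost
    if temperature > t_low:
        # Medium temperature range: cheaper congestion estimate.
        return cost + beta * mcong_ifb()
    # Low temperature range: refined congestion estimate.
    return cost + beta * mcong_ifb_sfr()


# Example with made-up numbers: the same floorplan costed in each stage.
for temp in (1000.0, 10.0, 0.1):
    value = sa_cost(100.0, 50.0, temp, alpha=0.5, beta=2.0,
                    mcong_ifb=lambda: 1.5, mcong_ifb_sfr=lambda: 1.0,
                    t_high=100.0, t_low=1.0)
    print(temp, value)
```

Passing the estimators as callables mirrors the stated motivation for the staging: the congestion computation is skipped entirely while the annealer is still exploring at high temperature.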
4.3 Congestion Estimation Model 
In Section 4.3.1, we first introduce the inner dual graph and describe how to obtain
the underlying routing graph from the inner dual graph. Then we illustrate the method
of constructing the inner dual graph directly from TBS in Section 4.3.2. Based on the
inner dual graph, in Section 4.3.3, we formulate the congestion minimization problem as a
flow problem. Sections 4.3.4 and 4.3.5 describe an efficient two-phase algorithm to tackle the
problem formulated in Section 4.3.3.
4.3.1 Underlying routing graph 
The exact pin positions inside each module are not given in the floorplanning stage.
So it is not necessary to use a fine grid graph to estimate the routing congestion as in
previous works. It is enough for global routing in the floorplanning stage to list the set
of rooms that a net passes through without specifying its exact route inside each room.
Given a floorplan R, the room adjacency relationships can be described by channel
segments, where a channel segment is defined as a segment of a channel shared by two
adjacent rooms in the floorplan. For example, in Figure 4.2(a), the channel segment corresponding to the
adjacent rooms C and D is highlighted. The room adjacency relationships can also be
represented by the inner dual graph G = (V, E) [48] where
V = {v | v corresponds to a room of R}
E = {{u, v} | u and v are adjacent to each other in R}
See Figure 4.2(b) for an example. Note that there are one-to-one mappings between the
rooms in R and the vertices in G, and between the channel segments in R and the edges
Figure 4.2 (a) A rectangular floorplan R. (b) Its inner dual graph G. The
channel segment and the inner dual graph edge corresponding
to the adjacent rooms C and D are highlighted.
in G. In the rest of the chapter, the terms floorplan room and inner dual graph vertex, 
and channel segment and inner dual graph edge are used interchangeably. 
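To make the definition concrete, the following sketch builds the inner dual graph of a small floorplan given as room rectangles and records the length of each channel segment. It is a brute-force pairwise check written for illustration only, not the TBS-based construction used in this chapter; the function name, the rectangle encoding `(x1, y1, x2, y2)`, and the four-room floorplan are all assumptions.

```python
from itertools import combinations


def inner_dual_graph(rooms):
    """rooms: dict name -> (x1, y1, x2, y2) with integer coordinates.

    Returns {frozenset({u, v}): length of the shared channel segment}.
    """
    edges = {}
    for (u, a), (v, b) in combinations(rooms.items(), 2):
        if a[2] == b[0] or b[2] == a[0]:
            # Rooms meet along a vertical line: overlap of their y-ranges.
            overlap = min(a[3], b[3]) - max(a[1], b[1])
        elif a[3] == b[1] or b[3] == a[1]:
            # Rooms meet along a horizontal line: overlap of their x-ranges.
            overlap = min(a[2], b[2]) - max(a[0], b[0])
        else:
            continue
        if overlap > 0:  # rooms that touch only at a corner are not adjacent
            edges[frozenset((u, v))] = overlap
    return edges


# Hypothetical floorplan: A fills the left column, B sits above C and D.
rooms = {"A": (0, 0, 2, 4), "B": (2, 2, 6, 4), "C": (2, 0, 4, 2), "D": (4, 0, 6, 2)}
for edge, length in inner_dual_graph(rooms).items():
    print(sorted(edge), length)
```

In this example A and D share no channel segment, so the graph has five edges for four rooms, each edge carrying the segment length later needed for the capacity.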
The inner dual graph can be used as the underlying graph in global routing. The size
of the inner dual graph is linear in the number of modules and is typically much smaller
than the size of the grid graph used in previous approaches. Hence, it is much more efficient
to use. However, the inner dual graph is an undirected graph. If it is used directly as the
underlying routing graph, the routing solution may contain many detours. We avoid detours
by assigning directions to the edges of the inner dual graph. Notice that different nets may
require different direction assignments, but all nets originating from the same module
share the same direction assignment. So for each set of nets originating from a specific
module, we can derive a specific directed acyclic graph (DAG) G' as the routing graph
according to the following rule (as illustrated in Figure 4.3). Consider nets originating
from a source room s, and consider a channel segment e shared by a pair of adjacent
rooms in floorplan R. By extending the channel segment, the floorplan region will be
divided into two sides. We assign the direction of the edge e in the inner dual graph to
be from the room on the same side as the center of s to the other side. See Figure 4.4
for an example. Notice that even with direction assignment, some detours may still occur.
For example, in Figure 4.4, a net following the path <D, E, C, A> may have a detour
depending on the exact pin positions inside rooms D and A. However, with direction
assignment, major detours can be avoided.
Figure 4.3 An illustration of routing direction assignment.
Figure 4.4 An example of routing direction assignment.
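The direction rule can be sketched for a single edge as follows. This is an illustrative reading of the rule rather than the implementation used in the experiments: the encoding of a channel segment as `("v", x)` or `("h", y)`, the function name, and the tie-breaking when the source center lies exactly on the extended segment are assumptions.

```python
def assign_direction(segment, room_u, room_v, source):
    """Orient the inner dual graph edge {u, v} for nets originating from `source`.

    segment -- ("v", x) for a vertical channel segment on the line x = const,
               ("h", y) for a horizontal one on the line y = const
    room_u, room_v, source -- rectangles (x1, y1, x2, y2)
    Returns ("u", "v") if the edge points from u to v, otherwise ("v", "u").
    """
    axis = 0 if segment[0] == "v" else 1
    line = segment[1]

    def center(rect):
        return (rect[axis] + rect[axis + 2]) / 2.0

    # True means "on the low side of the extended channel segment".
    # Assumption: a source center lying exactly on the line counts as high side.
    source_low = center(source) < line
    u_low = center(room_u) < line
    # The edge leaves the room that is on the same side as the source center.
    return ("u", "v") if u_low == source_low else ("v", "u")


# Hypothetical rooms C and D separated by the vertical segment x = 4.
room_c, room_d = (2, 0, 4, 2), (4, 0, 6, 2)
room_a = (0, 0, 2, 4)
print(assign_direction(("v", 4), room_c, room_d, source=room_a))  # source left of x = 4
print(assign_direction(("v", 4), room_c, room_d, source=room_d))  # source right of x = 4
```

The same undirected edge therefore receives opposite orientations for the two sources, which is why each commodity needs its own routing graph.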
We observe that the underlying routing graph for a specific commodity obtained by
the assignment above is a directed acyclic graph (DAG) in most cases. However, there
exists a special case as illustrated in Figure 4.5. When D is considered as the source room, a
cyclic path <A, B, C, E, A> exists in this graph. To obtain an acyclic underlying routing
graph G' = (V', E'), we remove one edge from each cycle based on the rule described in
Section 4.3.4.
4.3.2 Inner dual graph construction from TBS 
The inner dual graph G describes the neighborhood information between any two
rectangular rooms. Given a floorplan in TBS, we can construct its inner dual graph in
linear time by finding all pairs of adjacent rooms. In the TBS representation, each floorplan
is one-to-one mapped to a pair of twin binary trees (t1, t2). t1 and t2 are obtained by
Figure 4.5 A special case where a cycle exists after direction assignment.
connecting, respectively, the lower-left corners and upper-right corners of all rooms. We use
the floorplan in Figure 4.6 as an example to illustrate how to obtain all room adjacency
relationships as well as the length of each channel segment directly from TBS. We define
a left-going (right-going) branch of a binary tree to be any right (left) child and all its left
(right) descendants. For example, in t1, the left-going branches are {C, B} and {E, D}. In
t2, the right-going branches are {A} and {F, C}. We notice that for each vertical channel
(not including the boundaries), the room(s) on its right side corresponds to the room(s) in
a particular left-going branch of t1. The room(s) on its left side corresponds to the room(s)
in a particular right-going branch of t2. See the vertical channel highlighted in Figure 4.6
as an example. If there are m vertical channels in the floorplan, there will also be m
left-going branches in t1 and m right-going branches in t2. The branches have already been
found in the original TBS packing procedure. Thus, in order to capture all horizontal
adjacency relationships as well as the length of each vertical channel segment, we only need
to compare the heights of rooms in a left-going branch of t1 with those in the corresponding
right-going branch of t2 (i.e., {C, B} with {A}, {E, D} with {F, C}). Similarly, we can
obtain the vertical adjacency relationships and horizontal channel segment lengths by
considering the horizontal channels (i.e., the right-going branches in t1 and the left-going
branches in t2).
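The height comparison along one vertical channel amounts to a linear merge of two stacks of rooms. The sketch below assumes that each branch is already available as a list of `(room, height)` pairs ordered from bottom to top and that both sides span the same extent of the channel; these encodings, the function name, and the heights in the example are hypothetical and are not read off Figure 4.6.

```python
def channel_adjacency(left_rooms, right_rooms):
    """Pair up rooms facing each other across one vertical channel.

    left_rooms, right_rooms -- lists of (name, height), bottom to top
    Returns [(left_name, right_name, shared_segment_length), ...].
    """
    pairs = []
    i = j = 0
    y = 0                        # current position along the channel
    top_left = left_rooms[0][1]  # top edge of the current left room
    top_right = right_rooms[0][1]
    while True:
        top = min(top_left, top_right)
        pairs.append((left_rooms[i][0], right_rooms[j][0], top - y))
        y = top
        if top_left == top:      # current left room ends here
            i += 1
            if i == len(left_rooms):
                break
            top_left += left_rooms[i][1]
        if top_right == top:     # current right room ends here
            j += 1
            if j == len(right_rooms):
                break
            top_right += right_rooms[j][1]
    return pairs


# Branch pairing {C, B} with {A}: A on the left, C below B on the right.
print(channel_adjacency([("A", 4)], [("C", 2), ("B", 2)]))
```

Each room is visited once, so the work per channel is linear in the number of rooms on its two sides, consistent with the linear-time construction claimed above.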
t1 left-going branches: {C, B}, {E, D}.
t2 right-going branches: {A}, {F, C}.
Figure 4.6 Determining the room neighborhood information directly from
TBS.
4.3.3 Problem formulation 
In our formulation, as in previous congestion estimation papers, we only handle 2-pin 
nets for the sake of simplicity. Notice that multi-pin nets can be easily broken down to 
several 2-pin nets by Minimum Spanning Tree or Rectilinear Steiner Tree techniques. 
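As an illustration of the decomposition step, the following sketch breaks a multi-pin net into 2-pin nets along a minimum spanning tree under Manhattan distance, using a simple form of Prim's algorithm. The pin coordinates and function names are hypothetical, and a Rectilinear Steiner Tree construction could be substituted.

```python
def manhattan(a, b):
    """Manhattan distance between two pins given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def decompose_to_two_pin_nets(pins):
    """Return MST edges (i, j) over pin indices; each edge is one 2-pin net."""
    if len(pins) < 2:
        return []
    in_tree = [0]
    outside = set(range(1, len(pins)))
    two_pin_nets = []
    while outside:
        # Cheapest connection from the current tree to a pin outside it.
        i, j = min(((i, j) for i in in_tree for j in outside),
                   key=lambda e: manhattan(pins[e[0]], pins[e[1]]))
        two_pin_nets.append((i, j))
        in_tree.append(j)
        outside.remove(j)
    return two_pin_nets


pins = [(0, 0), (1, 0), (5, 0), (1, 3)]   # a hypothetical 4-pin net
nets = decompose_to_two_pin_nets(pins)
print(nets, sum(manhattan(pins[i], pins[j]) for i, j in nets))
```

A net with p pins always yields p − 1 two-pin nets, which is how the 2-pin net counts reported for the benchmarks exceed the original net counts.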
Our congestion model is meant to estimate the best maximum congestion over all 
possible global routing solutions. If we do not assign directions to the inner dual graph G, 
we can formulate the global routing problem as a flow problem with several commodities, 
where each commodity corresponds to a set of nets originating from a particular module. 
We first introduce some notations.
• $N_i$: the set of neighboring vertices of vertex i.
• $c_{ki}$: the demand of vertex i for commodity k, i.e., the total number of nets with
source vertex k and sink vertex i.
• $f_{ij}^k$: the amount of flow from i to j for commodity k, for {i, j} ∈ E and j ≠ k.
• $cap_e$: the capacity of channel segment e in R, i.e., the maximum number of nets
that can cross it.
• $cong_e$: the congestion of channel segment e, i.e., the ratio of the number of nets
crossing it to its capacity.
• Mcong: the maximum congestion over all channel segments of floorplan R.
Note that all flow amounts should be integral in order to form a valid global routing
solution. Also note that $f_{ij}^k$ may be different from $f_{ji}^k$. The capacity of a channel
segment is technology dependent. We can calculate it as follows. Let $l_e$ be the length
of the channel segment e. Let b be the sum of the minimum wire width and the minimum wire
spacing. Then the routing capacity is calculated as $cap_e = \lfloor l_e / b \rfloor$. In general, the capacity
can also be modified to model routing blockage.
The congestion of edge e = {i, j} can be written as follows:
$$cong_e = \frac{\sum_k \left( f_{ij}^k + f_{ji}^k \right)}{cap_e}$$
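A small numerical sketch of the two definitions above may help. The segment length, the wire pitch, and the per-commodity flows are made-up values, and the dictionaries keyed by commodity are just one possible encoding of $f_{ij}^k$ and $f_{ji}^k$.

```python
import math


def capacity(segment_length, pitch):
    """cap_e = floor(l_e / b), with b = minimum wire width + minimum wire spacing."""
    return math.floor(segment_length / pitch)


def congestion(flow_ij, flow_ji, cap):
    """cong_e = (sum over commodities k of f_ij^k + f_ji^k) / cap_e."""
    return (sum(flow_ij.values()) + sum(flow_ji.values())) / cap


cap_e = capacity(segment_length=125, pitch=6)   # 125 / 6 = 20.83..., so 20 tracks
flow_ij = {"A": 3.0, "B": 2.5}                  # flow from i to j, per commodity
flow_ji = {"C": 4.5}                            # flow from j to i, per commodity
print(cap_e, congestion(flow_ij, flow_ji, cap_e))
```

Flows in both directions count against the same capacity, so a segment can be congested even when no single commodity uses much of it.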
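Then, the flow problem can also be formulated as the following integer linear program
(ILP):
Minimize Mcong
such that
$$\frac{\sum_k \left( f_{ij}^k + f_{ji}^k \right)}{cap_e} \le Mcong, \quad \forall e = \{i, j\} \in E \tag{4.1}$$
$$\sum_{j \in N_i} f_{ji}^k = \sum_{j \in N_i} f_{ij}^k + c_{ki}, \quad \forall k, \ \forall i \ne k \tag{4.2}$$
$$f_{ij}^k \ge 0, \quad \forall k, \ \forall \{i, j\} \in E \tag{4.3}$$
$$f_{ij}^k \in \mathbb{Z}, \quad \forall k, \ \forall \{i, j\} \in E \tag{4.4}$$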
Constraint (4.2) is the flow conservation constraint. It specifies that for each commodity
k and for each vertex i ≠ k, the total incoming flow equals the total outgoing flow plus
the demand of vertex i. Note that by summing constraint (4.2) over all i ≠ k, we can
derive:
$$\sum_{j \in N_k} f_{kj}^k = \sum_{i \in V} c_{ki}$$
This means for commodity k, the total outgoing flow from vertex k equals the total
demand.
Since we restrict the flow direction for different commodities as described in Section
4.3.1, we need to add $O(n^2)$ constraints to the ILP formulation above. For each commodity
k and each edge {i, j} ∈ E, if the flow direction is from i to j, we add the constraints
$f_{ij}^k \ge 0$ and $f_{ji}^k = 0$.
ILP is known to be NP-complete. To tackle this problem, we first relax the integral
flow constraint (4.4). Notice that the resulting problem is similar to the classical maximum
concurrent flow problem [50]. However, in our problem, the flow direction on each
edge may differ for different commodities. Our problem can be solved by any LP solver.
However, this is too time consuming to be applied in the inner loop of the floorplanning
process. Instead, we propose an efficient two-phase algorithm to derive the optimal fractional
flow solution. The algorithm will be explained in detail in the following two subsections.
4.3.4 Incoming flow balancing (IFB) Phase 
In this section, we present an Incoming Flow Balancing (IFB) technique to derive a
good fractional flow solution. This solution is included in the cost of the second stage
of SA and is also used as the initial solution of the SFR technique described in Section 4.3.5.
We construct the flow solution by iteratively deriving the flow of each commodity one
by one based on current congestion information. At the beginning, we set the flow amount
and congestion on each edge for each commodity to 0. When considering commodity k,
we obtain the underlying routing graph G' = (V', E') by assigning directions to each
edge of the inner dual graph G. If a cycle occurs, we remove the most congested edge in
the cycle according to the current congestion information. Then we consider vertices in
reverse topological order¹ [51] of graph G'. For each vertex i, in order to minimize the
maximum congestion, we balance incoming flow to make the congestion of incoming edges
as even as possible. Let $d_{in}$ be the number of incoming edges for vertex i. Let $f_{in,j}$ be the
flow amount of commodity k, $cong_{in,j}$ be the current congestion excluding commodity k,
and $cap_{in,j}$ be the capacity of the j-th incoming edge ($1 \le j \le d_{in}$). Our goal to minimize
the maximum congestion over all incoming edges can be written as follows:
$$\text{Minimize} \ \max_{1 \le j \le d_{in}} \left\{ cong_{in,j} + \frac{f_{in,j}}{cap_{in,j}} \right\}$$
Since we consider the vertices in reverse topological order, all outgoing flow of i has already
been determined. Let $f_{out}$ be the total outgoing flow amount. Then the flow conservation
constraint in equation (4.2) can be rewritten as follows:
$$\sum_{j=1}^{d_{in}} f_{in,j} = f_{out} + c_{ki} \tag{4.5}$$
¹ A reverse topological order of a directed acyclic graph (DAG) is a linear ordering of all its vertices such
that if it contains an edge (u, v), then u appears after v in the ordering. For instance, in Figure 4.4(b),
the corresponding reverse topological order is A, B, C, E, D.
This problem can be easily solved by adding flow to the least congested incoming edge(s)
until its congestion matches that of the next least congested edge. Note that there may
be more than one edge with the least congestion. In that case, we add flow to them such
that they still have the same congestion. We keep on adding flow until equation (4.5)
is satisfied. See an example in Figure 4.7. For commodity k, the total outgoing flow
amount is 6. The demand for vertex i is 2. So the total incoming flow amount should be
6 + 2 = 8. We assign $f_{in,1} = 0$, $f_{in,2} = 2$, $f_{in,3} = 6$ to edges 1, 2, and 3, respectively. The
maximum incoming flow congestion is thus 0.6.
Figure 4.7 Illustration of Incoming Flow Balancing technique. (a) Flow distribution
before assigning incoming flow: $c_{ki} = 2$, $f_{out} = 6$; $cong_{in,1} = 0.6$, $cap_{in,1} = 10$;
$cap_{in,2} = 20$; $cong_{in,3} = 0.3$, $cap_{in,3} = 30$. (b) Flow distribution
after assigning incoming flow: $f_{in,1} = 0$, $cong_{in,1} = 0.6$; $f_{in,2} = 2$, $cong_{in,2} = 0.5$;
$f_{in,3} = 6$, $cong_{in,3} = 0.5$.
The procedure of routing all commodities once is called a pass. In the IFB phase, we
perform several passes until the maximum congestion converges. Notice that in pass
i ≥ 2, for commodity k, we first remove all its flow from pass i − 1 and update the congestion.
Then we balance the flow of incoming edges according to the updated congestion. Since
our algorithm is very greedy, the maximum congestion converges in 2 to 4 passes in
practice. An example of the convergence for the MCNC benchmark circuit apte is shown
in Figure 4.8. The overall flow of this phase is summarized in the IFB Algorithm below.
Figure 4.8 The convergence of maximum congestion in IFB phase for circuit
apte. (Plot of Mcong versus # of pass.)
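The balancing step at a single vertex is a water-filling computation, which can be sketched as follows. This is an illustrative implementation, not the code used in the experiments; the function name and list-based encoding are assumptions. The example reuses the numbers of Figure 4.7, where the initial congestion 0.4 of the second edge is inferred from the final state (0.5 − 2/20) rather than read directly from the figure.

```python
def balance_incoming(cong, cap, total_in):
    """Split total_in over incoming edges so that the maximum of
    cong[j] + f[j] / cap[j] is as small as possible (water-filling).

    cong -- current congestion of each incoming edge, excluding this commodity
    cap  -- capacity of each incoming edge
    Returns the list of flows f, one per edge, summing to total_in.
    """
    order = sorted(range(len(cong)), key=lambda j: cong[j])
    level = cong[order[0]]   # common congestion level of the edges being filled
    active_cap = 0.0         # total capacity of the edges being filled
    remaining = float(total_in)
    for pos, j in enumerate(order):
        active_cap += cap[j]
        nxt = cong[order[pos + 1]] if pos + 1 < len(order) else float("inf")
        need = (nxt - level) * active_cap   # flow that lifts active edges to nxt
        if need >= remaining:
            level += remaining / active_cap
            break
        remaining -= need
        level = nxt
    # Edges whose congestion already exceeds the final level receive no flow.
    return [max(0.0, level - cong[j]) * cap[j] for j in range(len(cong))]


cong_in = [0.6, 0.4, 0.3]
cap_in = [10, 20, 30]
flows = balance_incoming(cong_in, cap_in, total_in=6 + 2)   # f_out + demand
print([round(f, 6) for f in flows])
print(max(c + f / k for c, f, k in zip(cong_in, flows, cap_in)))
```

The flows come out as approximately 0, 2 and 6, matching the assignment in the example; the maximum stays at 0.6 because the first edge is already that congested before any flow of commodity k is added.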
IFB Algorithm:
Input: Inner dual graph G = (V, E)
Output: $cong_e$ and $f_{ij}^k$ for each commodity k and edge e = {i, j}
Initialize $cong_e$ and $f_{ij}^k$ to 0 for all k and e = {i, j};
While Mcong is not converged do
    For each commodity k do
        Assign flow directions to all edges to obtain DAG G';
        If it is not the first pass,
            remove $f_{ij}^k$, and update $cong_e$ for each edge;
        Do a reverse topological ordering on DAG G';
        For each vertex in reverse topological order do
            Assign incoming flow in a balanced manner;
4.3.5 Stepwise flow refinement (SFR) phase
Since the IFB phase can only achieve local incoming flow balance at each step, we still
need an additional phase to obtain a global solution by stepwise refining the flow solution
given by the IFB phase. An iteration in the SFR phase is illustrated in Figure 4.9.
Figure 4.9 SFR approach to globally optimize the maximum congestion.
In this example, e = {i, j} is the edge with Mcong,
$V_{out}$ = {j, H, L, M, N}, $V_{in}$ = {i, C, B, k}. After one iteration,
we derive $r_1$ = <B, D, E, M> and $r_2$ = <B, i, j, M>.
First, we pick an edge e = {i, j} with maximum congestion Mcong. Second, we pick
a commodity k which contributes more than γ% of the total flow amount on edge e (i.e.,
$f_{ij}^k > Mcong \cdot cap_e \cdot \gamma\%$). Third, based on the DAG G' of commodity k, we find the
set of vertices $V_{out}$ which is reachable from j. $V_{out}$ includes j. Similarly, we find the
set of vertices $V_{in}$ which can reach i. $V_{in}$ includes i. Then, by applying breadth first
search (BFS), we find a simple path $r_1$ which links a vertex p ∈ $V_{in}$ and another vertex
q ∈ $V_{out}$. In the meantime, we require that $r_1$ should not pass through edge e and the
maximum congestion over the edges of $r_1$ should be at least ε less than Mcong. At the
same time, we derive another path $r_2$ which connects p and q by passing through edge e.
Then, we move an amount δf of flow from $r_2$ to $r_1$ in a way that the maximum congestion
over the edges of $r_1$ and $r_2$ is minimized. δf should also satisfy an additional constraint
that the total incoming flow amount for each vertex on $r_2$ after moving δf
should not be less than its demand of commodity k. Finally, we update the congestion
and flow amount on the edges of $r_1$ and $r_2$. We keep applying the SFR technique on the
edge with maximum congestion iteratively until it is not able to find $r_1$ and $r_2$ for the
current most congested edge after all commodities are tried. In practice, we could speed
up this process by tuning the parameters γ and ε. A convergence of Mcong in the SFR phase
is shown in Figure 4.10, where the test circuit is apte, and γ and ε are set to 10 and
0.001, respectively. We cannot find routes $r_1$ and $r_2$ to further improve Mcong after 13
iterations. The SFR Algorithm gives the overall flow of the procedure above.
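Two ingredients of one SFR iteration can be sketched in isolation: finding the vertex sets by BFS, and choosing the amount of flow to move once the two routes are known. The code below is a simplified illustration under stated assumptions: the DAG is a hypothetical example, the two routes are taken to be edge-disjoint, and the upper bound `limit` stands in for the demand constraint and the flow available on the congested route. The bisection in `best_shift` is one possible way to balance the two routes, not necessarily the method used in the experiments.

```python
from collections import deque


def reachable(adj, start):
    """Vertices reachable from start (inclusive) by BFS over adjacency dict adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(adj):
    """Adjacency dict of the graph with every edge reversed."""
    rev = {}
    for u, successors in adj.items():
        for v in successors:
            rev.setdefault(v, []).append(u)
    return rev


def best_shift(r1, r2, limit, iterations=60):
    """Amount of flow in [0, limit] to move from r2 to r1 so that the maximum
    congestion over both routes is minimized.

    r1, r2 -- lists of (congestion, capacity), one entry per edge of the route
    """
    def gap(delta):
        up = max(c + delta / cap for c, cap in r1)     # r1 becomes more congested
        down = max(c - delta / cap for c, cap in r2)   # r2 becomes less congested
        return up - down

    if gap(0.0) >= 0:        # r1 is already at least as congested as r2
        return 0.0
    if gap(limit) <= 0:      # even the largest allowed move does not balance them
        return limit
    lo, hi = 0.0, limit
    for _ in range(iterations):   # gap() is increasing in delta, so bisect
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical DAG of one commodity; the most congested edge is (i, j).
dag = {"k": ["B"], "B": ["i", "D"], "i": ["j"], "D": ["E"], "E": ["M"], "j": ["M"]}
v_out = reachable(dag, "j")
v_in = reachable(reverse(dag), "i")
r1 = [(0.2, 10), (0.3, 10), (0.1, 20)]   # edges of B-D-E-M
r2 = [(0.5, 10), (0.8, 10), (0.4, 10)]   # edges of B-i-j-M
print(sorted(v_in), sorted(v_out), best_shift(r1, r2, limit=4.0))
```

In this example moving 2.5 units brings both routes to a maximum congestion of 0.55, down from 0.8 on the edge (i, j).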
Lemma 8 If γ = ε = 0, the SFR algorithm always converges to the optimal solution.
Proof: Note that if the current flow solution is not optimal, SFR can always find two
routes $r_1$ and $r_2$ to reduce the maximum congestion. Since the maximum congestion is
bounded below, the SFR algorithm always converges to the optimal solution. □
Figure 4.10 The convergence of Mcong of apte in SFR phase, where γ = 10,
ε = 0.001. (Plot of Mcong versus # of iteration, for iterations 1 to 13; the marked
Mcong values range from 0.521 down to 0.470.)
SFR Algorithm:
Input: Inner dual graph G with initial solution obtained
       from IFB algorithm.
Output: Minimized maximum congestion Mcong.
Do
    Pick an edge e = {i, j} with maximum congestion;
    For each k with $f_{ij}^k$ (or $f_{ji}^k$) > Mcong · $cap_e$ · γ% do
        Find $V_{in}$ and $V_{out}$;
        Find $r_1$ and $r_2$;
        Move δf from $r_2$ to $r_1$ and update the flow amount
        and congestion of edges on $r_1$ and $r_2$;
While edge e can find $r_1$ and $r_2$
4.4 Global Routing Solution Generation
Based upon the final floorplan solution at the end of simulated annealing, we obtain a
global routing solution by a simple rounding technique applied to the fractional flow solution.
For each commodity k, we round the incoming flow of vertices in reverse topological
order. In the process, we maintain the flow conservation constraint in equation (4.2) by
adjusting the flow of the least congested incoming edge. Once the rounding process for all
commodities is finished, we update the congestion for each edge. Then, we apply SFR to
optimize the global routing solution. It is important to stress that this time we only allow
an integral amount of flow, namely δf, to be moved from one congested route to another less
congested route.
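The rounding at one vertex can be sketched as follows. This is an illustrative reading of the procedure: the function name, the round-half-up rule, and the policy of applying the whole correction to the least congested edge that can absorb it are assumptions, since the text only states that the least congested incoming edge is adjusted.

```python
import math


def round_incoming(flows, cong, required_total):
    """Round fractional incoming flows at one vertex to integers.

    flows          -- fractional incoming flow of one commodity, per edge
    cong           -- current congestion of each incoming edge
    required_total -- integral f_out + demand that the incoming flows must sum to
    """
    rounded = [math.floor(f + 0.5) for f in flows]
    diff = required_total - sum(rounded)   # flow still to add (+) or remove (-)
    # Restore conservation on the least congested edge that stays non-negative.
    for j in sorted(range(len(flows)), key=lambda j: cong[j]):
        if rounded[j] + diff >= 0:
            rounded[j] += diff
            return rounded
    raise ValueError("no incoming edge can absorb the rounding correction")


print(round_incoming([1.4, 2.4, 4.2], [0.6, 0.5, 0.3], required_total=8))
print(round_incoming([0.6, 2.7, 4.7], [0.2, 0.5, 0.3], required_total=8))
```

Because vertices are processed in reverse topological order, the outgoing flow of a vertex is already integral when its incoming flow is rounded, so `required_total` is an integer.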
Table 4.1 Experimental results for MCNC benchmarks.
MCNC         Area (mm²)        WL (mm)           F Mcong → I Mcong                  Runtime (s)
benchmark    F1       F2       F1       F2       F1               F2                F1      F2
apte         48.227   47.898   132.67   134.33   0.607 → 0.611    0.371 → 0.373     2.10    4.69
xerox        20.243   19.864   147.38   152.01   1.059 → 1.070    0.938 → 0.945     2.58    4.47
hp           9.397    10.019   44.93    48.91    1.931 → 2.014    0.803 → 0.826     1.74    3.73
ami33a       12.468   12.636   64.88    64.91    1.453 → 1.502    0.980 → 0.993     4.21    13.63
ami49a       39.748   40.221   302.56   294.53   2.570 → 2.738    1.590 → 1.699     8.14    75.04
(F2-F1)/F1   +1.30%            +2.64%            -35.88% (F Mcong), -36.44% (I Mcong)   +270%
4.5 Experimental Results 
We implement the algorithm in the C programming language and test five MCNC
benchmarks on a Sun4u machine with 8 GB memory and a 750MHz Sparcv9 processor. The
parameters of those benchmarks are listed in Table 4.2. The modules in the benchmarks
are soft modules with aspect ratios from 0.5 to 2. To calculate the capacity of each channel
segment, we assume the sum of the minimum wire spacing and the minimum wire width to be
6λ. Since our congestion model is based upon 2-pin nets, we decompose each multi-pin net
into a set of 2-pin nets. In addition, we randomly choose one pin as the source since the
benchmarks lack signal direction information. In the case that a pin is located
along the chip boundary, we assign this pin to its corresponding boundary room. Thus,
each net starts from one room and ends at another room.
Table 4.2 MCNC benchmark.
Circuit   Modules   Nets   2-pin nets
apte      8         97     172
xerox     10        203    455
hp        11        83     226
ami33a    33        123    363
ami49a    49        408    545
In order to test the effectiveness and efficiency of our algorithm, we compare two
floorplanners: F1, without congestion optimization, and F2, with congestion optimization. In F1,
a single-stage simulated annealing is used to search for a floorplan that minimizes
total area and wirelength only. The maximum congestion is obtained by applying IFB and
SFR to the final floorplan. In F2, we employ the aforementioned 3-stage simulated annealing
to obtain a floorplan that minimizes a weighted sum of area, wirelength, and maximum
congestion. The weights are set such that the costs of area, wirelength, and congestion
are approximately equal. The initial temperature in the annealing process is set to 1.5 × 10^6
and drops at a constant rate of 0.95 to 0.97 until it is below 1 × 10^-10. The number
of iterations at each temperature step is 30. For each experiment, 10 runs are performed
and the result of the best run is reported. The area, total wirelength, maximum congestion,
and total runtime are reported in Table 4.1. In terms of maximum congestion,
F Mcong denotes the maximum congestion based on fractional flow and I Mcong denotes
the maximum congestion based on integral flow after rounding. The experimental results
show that the maximum congestion increases by only 2.82% on average after rounding.
This means that our fractional-flow-based congestion model estimates the congestion
of a floorplan fairly accurately.
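As a rough illustration of how long such an annealing schedule runs, the following Python sketch counts the temperature steps of a geometric cooling schedule using the parameters quoted above. The function names are hypothetical, and the sketch assumes the temperature is simply multiplied by the cooling rate after each step; the staging of the 3-stage annealing is not modeled.

```python
import math

def temperature_steps(t_init: float, rate: float, t_min: float) -> int:
    """Count the temperature steps visited before the temperature drops below t_min."""
    steps = 0
    temperature = t_init
    while temperature >= t_min:
        steps += 1
        temperature *= rate  # assumed geometric cooling
    return steps

def temperature_steps_closed_form(t_init: float, rate: float, t_min: float) -> int:
    """Smallest k with t_init * rate**k < t_min, which equals the loop count above."""
    return math.floor(math.log(t_min / t_init) / math.log(rate)) + 1

ITERATIONS_PER_STEP = 30  # iterations at each temperature step, as stated in the text

if __name__ == "__main__":
    for rate in (0.95, 0.97):
        steps = temperature_steps(1.5e6, rate, 1e-10)
        print(f"rate={rate}: {steps} steps, {steps * ITERATIONS_PER_STEP} moves")
```

Under these assumptions a single annealing run evaluates a few tens of thousands of moves, which gives a feel for why the cost of the congestion estimate inside the loop matters.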
Figure 4.11 Floorplanning result of ami49a using F1.
From Table 4.1, we can see that, compared to floorplanner F1, the congestion-driven
floorplanner F2 reduces the maximum congestion by 36.44% on average. More
specifically, for ami33a, hp and xerox, F1 is not able to produce routable final floorplan
solutions (as their maximum congestion exceeds 1), whereas F2 is able to produce routable
solutions. For apte, both F1 and F2 generate a routable floorplan solution. However, F2
reduces the maximum congestion by almost 40% compared to F1. For ami49a, neither F1
nor F2 is able to produce a routable floorplan. Nevertheless, since the maximum
congestion of F2 is much lower than that of F1, the floorplan by F2 is more likely to be
successfully routed if more detours are allowed. Figure 4.11 and Figure 4.12 show the
floorplans obtained by F1 and F2, respectively. The thickness of the lines on the boundaries
denotes the degree of congestion. We can see a significant difference between Figure 4.11
and Figure 4.12 for ami49a in terms of congestion distribution, while the packing areas are
about the same. Hence, F2 can reduce the overall congestion and improve the routability of
a circuit significantly in the floorplanning stage with a slight increase in area (+1.30%) and
total wirelength (+2.64%), at the cost of a longer runtime (+270%).
4.6 Conclusion and Discussion 
Figure 4.12 Floorplanning result of ami49a using F2.
In this chapter, we presented a flow-based congestion estimation model for estimating
the routing congestion at the floorplanning level. This model is based on the inner dual
graph. The two-phase algorithm used in this model is optimal and efficient in estimating
the maximum congestion of a floorplan. The experimental results show that the maximum
congestion can be better optimized by incorporating this congestion model into
a floorplanner with a slight sacrifice in area and wirelength. In the future, we plan to extend
our algorithm to handle timing-driven floorplanning and noise-aware floorplanning.
CHAPTER 5. CONGESTION ESTIMATION MODELS IN 
PLACEMENT 
5.1 Introduction 
The task in placement is to find a location for each cell such that all cells are completely
contained in the placement region and no two cells overlap. Traditionally, the major
objective in the placement stage is to minimize the total wirelength. But with the growing
complexity of chips (state-of-the-art chips have several million movable objects), not only
wirelength but also routing congestion needs to be emphasized at the placement stage. A
highly congested region in the placement often leads to routing detours around the region,
which result in a larger routed wirelength and worse timing. Congested areas can also
deteriorate the performance of the global router and, in the worst case, create an unroutable
placement in the fixed-die regime [52]. Congestion is one of the main optimization objectives
in global routing. However, the optimization performance is constrained because the cells
are already fixed at this stage. Therefore, designers can save substantial time and resources
by detecting and reducing congested regions during the planning stages. An efficient
yet accurate congestion estimation model is crucial for inclusion in the inner loop of
floorplanning and placement design.
Congestion can be interpreted as a supply and demand problem for routing resources.
The supply of routing resources can be roughly determined by technology parameters,
such as die size, number of layers and position of preplaced macros. In the planning
stages, the demand for routing resources of each design solution can be predicted by
congestion estimation models.
In the past few years, several congestion estimation models have been proposed to be 
utilized in floorplanning and placement design. All existing congestion estimation models 
in placement divide the placement region into tiles [41, 53, 44, 55, 43, 45, 56, 54, 58, 59]. 
These models estimate congestion based on either tile region [41, 53, 44, 43, 54] or tile 
boundary [45, 55, 56, 57, 58, 59]. For each tile or tile boundary, the expected number of 
wires routed within the tile or across tile boundary is compared to its capacity, i.e., the 
number of free routing tracks crossing the tile or tile boundary. Models defined for one 
can be easily modified for the other. Since there is no fundamental difference between 
these two approaches, we discuss them together. In this chapter, congestion is estimated
based on tile boundaries.
Previous approaches can be divided into three categories. The first category performs 
global routing on tile graph [56, 54, 57, 55, 58]. However, the run time is quite expensive, 
especially when multiple iterations are required during optimization. The second category 
applies a probabilistic map to estimate the probability distribution on tile structure [43, 
44, 53, 41]. The congestion of a tile (tile boundary) is the summation over all nets of 
the probability on that tile (tile boundary). The previous publications differ mainly in 
their probabilistic maps for a net and in the way they handle blockages. This approach is 
more efficient than the global routing based approach in the first category. Therefore, this 
idea is commonly integrated in placement stage to estimate and minimize congestion [57, 
43, 44, 46]. The third category includes other approaches like [45, 59]. In [45], Rent's 
Rule is used to estimate the peak congestion value and regional congestions on a chip and 
in [59], a normal distribution of the number of nets per tile is assumed in their congestion 
estimation model. 
Most congestion estimation models in floorplanning are also tile based, using either a global
routing approach [37] or a probabilistic approach [38, 39, 40]. Recently, in [60] and [47],
congestion is estimated based on the rectangular dissection structure of a floorplan instead
of a tile structure. These two models are much more efficient than previous ones used in
floorplanning, but are not applicable to placement.
Among all models discussed above, the most widely used is the probabilistic approach
because of its simplicity and efficiency. However, the existing probabilistic approach
suffers from three main drawbacks. First, the number of turns for each
route is not limited in this approach. As a result, the predicted congestion could be quite
different from the actual congestion produced by a global router. Second, the probability
distribution of a net is determined independently of other nets. That means the probability
value assignment is not adaptive to the current congestion distribution. However, a global
router routes a net based on the current routing congestion distribution and also performs
rip-up and re-route on congested regions. As a result, the probabilistic map could overestimate
the congestion for heavily congested regions while underestimating the congestion for
non-congested regions. Third, the existing probabilistic map is only efficient for the placement
of small and medium circuits. For large-scale circuits, the run time of this approach
is still quite expensive.
In this chapter, three novel congestion estimation models are proposed to tackle these
drawbacks. We compare these models experimentally with LKS, a revision of the probabilistic
congestion estimation model proposed in [41].
1. Simple Probabilistic Model (SPM) is proposed to avoid an unlimited number of bends
in a route. In addition, a linear time algorithm is applied in SPM to assign
probability values. Experimental results show that SPM is faster and more accurate
than LKS.
2. Multi-pin net Probabilistic Model (MPM) is proposed to speed up SPM.
Unlike SPM, MPM does not need to apply the Minimum Spanning Tree (MST) algorithm
to dissect a multi-pin net into a set of two-terminal nets. Instead, it directly
applies SPM on each multi-pin net. As a result, MPM is about 21 times faster
than LKS.
3. Post-Probability Processing (PPP) is proposed to modify the predicted congestion
obtained by SPM so that the probability assignment is adaptive to the
congestion distribution. The experimental results show that PPP is quite effective
in improving the quality of the probabilistic map produced by SPM.
The remainder of the chapter is organized as follows. In Section 5.2, we introduce and define
some notations for congestion estimation based on tile structure in placement. Then,
we summarize the previous probabilistic model in [41] and propose its tile boundary version,
denoted as LKS. In Section 5.3, we present the details of the Simple Probabilistic Model
(SPM) with its linear time probability assignment algorithm. In Section 5.4, the Multi-pin
net Probabilistic Model (MPM) is proposed. In Section 5.5, the two steps of Post-Probability
Processing (PPP) are presented in detail. In Section 5.6, the experimental results
compare the new models with LKS in terms of run time and correlation.
Finally, the conclusion and discussion are given in Section 5.7.
5.2 Notations 
Given a placement solution with pin coordinate information of each net in net list, 
we discretize the placement region with a homogeneous rectangular mesh. We analyze 
the congestion for each tile boundary in the mesh. The number of tiles is a user-defined 
parameter which depends on the core area and process technology parameters of the 
placement. 
Suppose the size of the mesh is M x N, which means the rectangular mesh consists 
of MN homogeneous tiles. An example of a 6 x 7 tile mesh is shown in Figure 5.1. We 
define several notations in this tile structure as follows. 
• $T_{ij}$: the tile with coordinate $(i, j)$, where $1 \le i \le M$, $1 \le j \le N$.
• $RB_{ij}$: the right boundary of $T_{ij}$.
• $UB_{ij}$: the upper boundary of $T_{ij}$.
• $RC_{ij}$: the capacity of $RB_{ij}$, i.e., the number of available horizontal routing tracks
crossing $RB_{ij}$.
• $UC_{ij}$: the capacity of $UB_{ij}$, i.e., the number of available vertical routing tracks
crossing $UB_{ij}$.
• $RU_{ij}^k$: the probabilistic usage of $RB_{ij}$ for net $k$, i.e., the probabilistic amount of used
horizontal routing tracks which cross $RB_{ij}$ for net $k$.
• $UU_{ij}^k$: the probabilistic usage of $UB_{ij}$ for net $k$, i.e., the probabilistic amount of used
vertical routing tracks which cross $UB_{ij}$ for net $k$.
• $RD_{ij}$: the demand on $RB_{ij}$, i.e., the total number of used horizontal routing tracks
which cross $RB_{ij}$ for all nets.
• $UD_{ij}$: the demand on $UB_{ij}$, i.e., the total number of used vertical routing tracks which
cross $UB_{ij}$ for all nets.
Figure 5.1 Tile structure for congestion estimation
Obviously, $RD_{ij}$ and $UD_{ij}$ can be derived by accumulating $RU_{ij}^k$ and $UU_{ij}^k$ over all nets,
respectively. The horizontal (vertical) congestion on $RB_{ij}$ ($UB_{ij}$) is defined as the ratio
of $RD_{ij}$ ($UD_{ij}$) to $RC_{ij}$ ($UC_{ij}$).
Before we explain our new models, we first go over Lou's model in [41] and propose a 
simplified version of this approach. 
5.2.1 Lou's model and its revision (LKS)
Basically, Lou's model [41] is based on the supply and demand analysis of routing
resources, where the supply is determined by technology parameters and the demand is
computed by a probabilistic congestion model. The model only handles two-pin nets, so a
multi-pin net needs to be broken into a set of two-pin nets first. For each two-pin net, this
model considers all possible ways that a router can route it. The number of turns within a
possible route is not restricted as long as the route is monotonic. Then the model assigns
the same probability to each possible route and computes the probabilistic track usages
on each tile within the bounding box of the net.
More specifically, for a two-pin net $k$ covering an $m \times n$ mesh, the model first computes
$F(m, n)$, the total number of possible ways to optimally route this net. $F(m, n)$ can be
derived from Theorem 1.
Theorem 1:
• $F(m, 1) = F(1, n) = 1$.
• $F(m, n) = F(m-1, n) + F(m, n-1)$.
After that, the model computes the horizontal and vertical probabilistic usage, $P_x(i,j)$
and $P_y(i,j)$, on each tile $T_{ij}$ for net $k$. The expressions of $P_x(i,j)$ and $P_y(i,j)$ are listed
as follows, where $1 \le i \le m$ and $1 \le j \le n$.
$$P_x(i,j) = \frac{1}{F(m,n)} \times \begin{cases} F(m, n-1) & i = 1,\ j = 1 \\ 1 & i = 1,\ j = n \\ F(m-i+1, n-1) & 1 < i \le m,\ j = 1 \\ F(m, n-j+1) + F(m, n-j) & i = 1,\ 1 < j < n \\ \dfrac{F(i,j)\,F(m-i+1, n-j) + F(i,j-1)\,F(m-i+1, n-j+1)}{2} & \text{otherwise} \end{cases}$$
$$P_y(i,j) = \frac{1}{F(m,n)} \times \begin{cases} F(m-1, n) & i = 1,\ j = 1 \\ 1 & i = 1,\ j = n \\ F(m-1, n-j+1) & i = 1,\ 1 < j \le n \\ \dfrac{F(i,j)\,F(m-i, n-j+1) + F(i-1,j)\,F(m-i+1, n-j+1)}{2} & \text{otherwise} \end{cases}$$
It should be noted that Lou's model estimates congestion on tile regions instead of tile
boundaries. We derive its tile boundary version, denoted as LKS, with the corresponding
$RU_{ij}^k$ and $UU_{ij}^k$ given in Theorem 2. Obviously, the usage assignments in LKS are much
simpler than those in Lou's model.
Theorem 2:
$$RU_{ij}^k = \frac{F(i,j)\,F(m-i,\ n-j+1)}{F(m,n)}, \quad 1 \le i < m,\ 1 \le j \le n \tag{5.1}$$
$$UU_{ij}^k = \frac{F(i,j)\,F(m-i+1,\ n-j)}{F(m,n)}, \quad 1 \le i \le m,\ 1 \le j < n \tag{5.2}$$
Proof: Consider a net $k$ covering an $m \times n$ mesh as shown in Figure 5.2. Suppose the source
$s$ and the drain $d$ are located at the centers of the tiles in the lower left and upper right
corners, respectively. Then, the total number of possible routes which start from $s$ and arrive at
$T_{ij}$ is $F(i,j)$ according to the definition of $F(i,j)$. The total number of possible routes
which start from $T_{(i+1)j}$ and arrive at $d$ is $F(m-i, n-j+1)$. Thus for net $k$, the total
number of possible routes which horizontally cross $RB_{ij}$ is $F(i,j)\,F(m-i, n-j+1)$.
Since the total number of possible routes for net $k$ is $F(m,n)$, Eq. (5.1) holds. Similarly,
we can prove Eq. (5.2). □
Figure 5.2 Proof of Theorem 2.
After we obtain $RU_{ij}^k$ and $UU_{ij}^k$ for each net $k$, we accumulate them over all nets to
derive $RD_{ij}$ and $UD_{ij}$.
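The recurrence of Theorem 1 and the boundary usages of Theorem 2 can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in the experiments; the function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made here only so that the usages can be checked without rounding error.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def num_routes(m: int, n: int) -> int:
    """F(m, n) of Theorem 1: number of monotonic routes across an m x n mesh."""
    if m == 1 or n == 1:
        return 1
    return num_routes(m - 1, n) + num_routes(m, n - 1)

def lks_right_usage(m: int, n: int, i: int, j: int) -> Fraction:
    """RU_ij of Eq. (5.1), valid for 1 <= i < m and 1 <= j <= n."""
    return Fraction(num_routes(i, j) * num_routes(m - i, n - j + 1),
                    num_routes(m, n))

def lks_upper_usage(m: int, n: int, i: int, j: int) -> Fraction:
    """UU_ij of Eq. (5.2), valid for 1 <= i <= m and 1 <= j < n."""
    return Fraction(num_routes(i, j) * num_routes(m - i + 1, n - j),
                    num_routes(m, n))

if __name__ == "__main__":
    m, n = 4, 3
    for i in range(1, m):
        # Every monotonic route crosses exactly one right boundary in column i,
        # so the usages in that column sum to 1.
        column_total = sum(lks_right_usage(m, n, i, j) for j in range(1, n + 1))
        print(i, column_total)
```

A useful sanity check on Eq. (5.1) is that the usages of the right boundaries in any one column add up to 1, because each monotonic route crosses from column $i$ to column $i+1$ exactly once; the same holds row-wise for Eq. (5.2).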
5.3 Simple Probabilistic Model (SPM)
Although LKS discussed above is already quite elegant, there are some drawbacks in
this model.
Note that in LKS, the maximum number of bends on a possible route for a net covering
an $m \times n$ mesh can be $\min(m,n)$. In addition, it is assumed that a multi-bend route
has the same probability as an L, Z or W shape route. However, a realistic router restricts
the maximum number of bends on a route, especially for global or semi-global nets, and
most nets are routed in L, Z or W shapes. Thus, the congestion estimated by
LKS can be quite different from the actual result. In this section, a new probabilistic
congestion model called the Simple Probabilistic Model (SPM) is proposed as a remedy for
these drawbacks.
Unlike LKS, for a net $k$ covering an $m \times n$ mesh, SPM assumes every $RB_{ij}$ within
the bounding box of net $k$ has the same horizontal probabilistic usage, and every $UB_{ij}$
within the bounding box of net $k$ has the same vertical probabilistic usage. $RU_{ij}^k$
and $UU_{ij}^k$ are given as follows.
$$RU_{ij}^k = \frac{1}{n}, \quad 1 \le i < m,\ 1 \le j \le n \tag{5.3}$$
$$UU_{ij}^k = \frac{1}{m}, \quad 1 \le i \le m,\ 1 \le j < n \tag{5.4}$$
Figure 5.3 $n$ pseudo horizontal routes with weight of $1/n$ in SPM
Note that the probabilistic usage assignment is equivalent to evenly placing $m$ vertical
routing segments with weight of $1/m$ and $n$ horizontal routing segments with weight of
$1/n$ within the bounding box of net $k$. The horizontal routing segments are illustrated in
Figure 5.3. We denote these horizontal and vertical routing segments as weighted pseudo
horizontal routes and vertical routes, respectively. These segments are called pseudo
routes since they are not real routes in actual global routing. We only make use of them to
predict the horizontal and vertical usages. Since the horizontal (vertical) probabilistic usage
is identical across the bounding box of each net, we propose an efficient algorithm to
calculate $RD_{ij}$ and $UD_{ij}$.
W.L.O.G., we assume each weighted pseudo vertical route starts from the bottommost
tile and stops at the uppermost tile, while each weighted pseudo horizontal route starts from
the leftmost tile and stops at the rightmost tile in the bounding box. Then for all nets, we
define $RF_{ij}$ as the difference between the total amount of weighted pseudo horizontal routes
which start from $T_{ij}$ and that which stop at $T_{ij}$. Similarly, $UF_{ij}$ is the difference between
the total amount of weighted pseudo vertical routes which start from $T_{ij}$ and that which
stop at $T_{ij}$.
Note that for each net $k$, we only need to update $RF_{ij}$ for $T_{ij}$ on the leftmost and rightmost
columns, and update $UF_{ij}$ for $T_{ij}$ on the uppermost and bottommost rows of the bounding
box. More specifically, suppose the coordinate of the lower left tile in the bounding box is
$(x_1, y_1)$, and that of the upper right tile is $(x_2, y_2)$. We increase each $RF_{x_1 j}$ ($y_1 \le j \le y_2$)
by $1/n$ and decrease each $RF_{x_2 j}$ ($y_1 \le j \le y_2$) by $1/n$. Similarly, we increase each
$UF_{i y_1}$ ($x_1 \le i \le x_2$) by $1/m$ and decrease each $UF_{i y_2}$ ($x_1 \le i \le x_2$) by $1/m$. After
we scan all nets one by one, $RD_{ij}$ and $UD_{ij}$ can be derived by the following theorem.
Theorem 3:
$$RD_{1j} = RF_{1j}, \quad 1 \le j \le N \tag{5.5}$$
$$RD_{ij} = RD_{(i-1)j} + RF_{ij}, \quad 1 < i \le M,\ 1 \le j \le N \tag{5.6}$$
$$UD_{i1} = UF_{i1}, \quad 1 \le i \le M \tag{5.7}$$
$$UD_{ij} = UD_{i(j-1)} + UF_{ij}, \quad 1 \le i \le M,\ 1 < j \le N \tag{5.8}$$
Proof: Note that we assume every weighted pseudo horizontal route runs from left to
right. For each $RB_{1j}$ ($1 \le j \le N$), the total amount of routes which cross $RB_{1j}$ equals
the total amount of weighted pseudo horizontal routes which start from $T_{1j}$. Since no
pseudo horizontal route stops at $T_{1j}$, Eq. (5.5) in Theorem 3 holds. For each
$RB_{ij}$ ($1 < i \le M$, $1 \le j \le N$), the weighted pseudo horizontal routes which cross $RB_{ij}$
consist of two kinds of routes. The first kind starts from $T_{ij}$. The second
kind crosses $RB_{(i-1)j}$ and does not stop at $T_{ij}$. By the definitions of $RD_{ij}$
and $RF_{ij}$, Eq. (5.6) holds. Eq. (5.7) and (5.8) can be proved similarly. □
An example of calculating horizontal usage is shown in Figure 5.4, where the graph only
includes the weighted pseudo horizontal routes. The weight of each pseudo horizontal
route is assumed to be 0.5.
Figure 5.4 Calculating horizontal usages in SPM
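The start/stop bookkeeping and the running sums of Eq. (5.5) to (5.8) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the C implementation used in the experiments: the function name `spm_demand` is hypothetical, each two-pin net is assumed to be given directly by the tile coordinates of its bounding box, and exact fractions are used only to keep the example free of rounding error.

```python
from fractions import Fraction

def spm_demand(nets, M, N):
    """Accumulate SPM demands on an M x N mesh (tiles are 1-indexed).

    nets: iterable of bounding boxes (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
    Returns (RD, UD), where RD[i][j] and UD[i][j] are the demands on RB_ij and UB_ij.
    """
    # Index 0 is padding so that the prefix sums need no special first case.
    RF = [[Fraction(0)] * (N + 1) for _ in range(M + 1)]
    UF = [[Fraction(0)] * (N + 1) for _ in range(M + 1)]
    for x1, y1, x2, y2 in nets:
        m, n = x2 - x1 + 1, y2 - y1 + 1
        if m > 1:  # pseudo horizontal routes exist only if the box spans >1 column
            for j in range(y1, y2 + 1):
                RF[x1][j] += Fraction(1, n)  # route starts on the leftmost column
                RF[x2][j] -= Fraction(1, n)  # route stops on the rightmost column
        if n > 1:  # pseudo vertical routes exist only if the box spans >1 row
            for i in range(x1, x2 + 1):
                UF[i][y1] += Fraction(1, m)  # route starts on the bottommost row
                UF[i][y2] -= Fraction(1, m)  # route stops on the uppermost row
    RD = [[Fraction(0)] * (N + 1) for _ in range(M + 1)]
    UD = [[Fraction(0)] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            RD[i][j] = RD[i - 1][j] + RF[i][j]  # Eq. (5.5) and (5.6)
            UD[i][j] = UD[i][j - 1] + UF[i][j]  # Eq. (5.7) and (5.8)
    return RD, UD

if __name__ == "__main__":
    RD, UD = spm_demand([(1, 1, 3, 2)], M=4, N=3)
    print(RD[1][1], RD[2][1], RD[3][1])  # right boundaries in row 1
```

Each net touches only the border columns and rows of its bounding box, and a single sweep over the mesh then recovers every demand, which is what keeps the usage assignment linear in the bounding box perimeter rather than its area.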
The pseudocode of SPM is summarized in Algorithm 1. We compare the time complexities
of LKS and SPM as follows. Assume the number of multi-pin nets in a placement
is $K$ and we use MST for multi-pin nets. Assume the size of the mesh is $M \times M$ and the
maximum number of pins for any net is $P$. Then both LKS and SPM take $O(P^2)$ time [61]
(see footnote 1) to call MST for each net. Note that for each two-pin net $k$ covering an
$m \times n$ mesh, LKS needs to assign $2mn - m - n$ probabilistic usages to tile boundaries
(i.e., $(m-1)n$ times for $RU_{ij}^k$ and $(n-1)m$ times for $UU_{ij}^k$; see footnote 2), while SPM
only needs to perform $2f(m)n + 2f(n)m$ updates of $RD_{ij}$ or $UD_{ij}$ (i.e., $2f(m)n$ times for
$RD_{ij}$ and $2f(n)m$ times for $UD_{ij}$), where
$$f(x) = \begin{cases} 0 & x = 1 \\ 1 & x \ge 2 \end{cases} \tag{5.9}$$
Thus the run time of LKS is $O(KP^2 + M^2KP)$ and the run time of SPM is
$O(KP^2 + MKP + M^2)$, which equals $O(KP^2 + MKP)$.
Footnote 1: The MST algorithm we used is an $O(P^2)$ time implementation of Prim's algorithm in
RMST-Pack by Andrew Kahng and Ion Mandoiu, downloaded from the GSRC Bookshelf [61]. This
algorithm is significantly faster than the $O(P \log P)$ algorithm [?] due to the small $P$ typically
seen in VLSI circuits. The run time comparison between these two algorithms is referenced in [61].
Footnote 2: Lou's model [41] needs to assign $mn$ probabilistic usages to tiles.
Algorithm 1:
    Initialize RF_ij, UF_ij to 0
    For each net in the design
        MST(net)
        For each segment of the MST
            Update RF_ij and UF_ij for the specified T_ij within the
            bounding box.
    For each RB_ij
        Compute RD_ij and horizontal congestion.
    For each UB_ij
        Compute UD_ij and vertical congestion.
In SPM, we apply the same approach as described in Lou's model to handle routing
blockages (obstacles) in placement. We omit this part due to the page limitation.
5.4 Multi-pin Net Probabilistic Model (MPM)
Note that for each multi-pin net, LKS and SPM need to apply either a Minimum
Spanning Tree (MST) or a Rectilinear Steiner Tree (RST) algorithm to convert it into a set of
two-pin nets. However, the run time of MST takes a significant portion of the total run time
of LKS or SPM. This has been discussed in the previous section and will be verified
in the experimental results.
Thus, for large-scale designs, the existing probabilistic congestion model is still quite
expensive in terms of run time.
In order to speed up the estimation process, we propose an approach that directly applies
SPM to multi-pin nets. The new model is called the Multi-pin net Probabilistic Model
(MPM). Instead of applying MST to multi-pin nets, MPM treats each multi-pin net as
a weighted two-pin net. The bounding box of the weighted two-pin net is the rectangle
which covers all pins of the multi-pin net. The weight $w(p)$ is a function of the pin count,
$p$, of the multi-pin net. $w(p)$ is listed in Table 5.1, which is cited from [43].
The experimental results show that MPM dramatically reduces the total run time for
congestion estimation, especially for large circuits, while being only slightly
worse than LKS in terms of quality.
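The conversion of a multi-pin net into a weighted two-pin net can be sketched in Python as follows. The weights are the values of Table 5.1; the function names and the choice to reject pin counts outside the tabulated range 2 to 200 are assumptions made for this illustration only.

```python
import bisect

# Lower end of each pin-count range in Table 5.1 and the corresponding weight w(p).
_RANGE_START = [2, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50]
_WEIGHT = [1.0000, 1.0828, 1.1536, 1.2206, 1.2823, 1.3385, 1.3991, 1.4493,
           1.6899, 1.8924, 2.0743, 2.2334, 2.3895, 2.5356, 2.6625, 2.7933]

def net_weight(p: int) -> float:
    """Look up w(p) in Table 5.1 for a net with p pins."""
    if not 2 <= p <= 200:
        # Table 5.1 lists no weight outside 2..200; rejecting is an assumption here.
        raise ValueError("pin count outside the range covered by Table 5.1")
    return _WEIGHT[bisect.bisect_right(_RANGE_START, p) - 1]

def mpm_weighted_net(pin_tiles):
    """Return ((x1, y1, x2, y2), weight) for a net given the tiles of its pins."""
    xs = [x for x, _ in pin_tiles]
    ys = [y for _, y in pin_tiles]
    return (min(xs), min(ys), max(xs), max(ys)), net_weight(len(pin_tiles))

if __name__ == "__main__":
    box, weight = mpm_weighted_net([(2, 5), (4, 1), (3, 3), (6, 2)])
    print(box, weight)
```

One plausible way to use the result, not spelled out in the text, is to scale the per-boundary SPM usages $1/n$ and $1/m$ of the bounding box by the returned weight.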
Table 5.1 Net weight w(p) in MPM.
pin count p   net weight w(p)   pin count p   net weight w(p)
2-3           1.0000            15-19         1.6899
4             1.0828            20-24         1.8924
5             1.1536            25-29         2.0743
6             1.2206            30-34         2.2334
7             1.2823            35-39         2.3895
8             1.3385            40-44         2.5356
9             1.3991            45-49         2.6625
10-14         1.4493            50-200        2.7933
5.5 Post-Probability Processing (PPP)
As we discussed before, existing congestion estimation models simply accumulate the
probabilistic usage on tiles or tile boundaries over all nets. In the final congestion
distribution, these nets are independent of each other. In other words, the assignment of
probabilistic usage for a specific net is not adaptive to the current congestion distribution.
As a result, the congestion on some boundaries is overestimated while that on others is
underestimated. However, a global router typically tries to avoid over-congested regions by
adaptively routing each net according to the current congestion distribution, or reduces
over-congested regions as much as possible by performing post-route optimization (or
rip-up and reroute).
For example, in Figure 5.5, we have two 2-pin nets {A, B} and {C, D}. Their bounding
boxes overlap in region II. By the existing probabilistic approach, we will reach the conclusion
that routing region II is more congested than regions I and III. However, a global router
can avoid congestion by routing net {A, B} in region I and {C, D} in region III. The
resulting congestion in II can even be less than that in I and III.
Figure 5.5 Congestion estimation by existing probabilistic approach.
Post-Probability Processing (PPP) is a procedure proposed to deal with the over-congested
regions so that the estimated congestion correlates better with the actual result.
Basically, this procedure can be applied to SPM, MPM or LKS. However, in our
experiments, we only apply PPP to SPM since we want to achieve the best correlation
in comparable time. PPP consists of two steps. Step one reassigns the
probabilistic usage on tile boundaries for each net based on the congestion distribution
obtained from SPM. Step two locally evens out the congestion distribution of the
over-congested regions by moving a certain amount of probabilistic usage from the
over-congested tile boundary to its neighbors.
These two steps are described in more detail as follows.
In step one, we consider all nets one by one. Given a net covering an $m \times n$ mesh, we
have $n$ pseudo horizontal routes and $m$ pseudo vertical routes. For each pseudo horizontal
route $R_i$ ($1 \le i \le n$), we find its maximum congestion $MC_i$ over the $m - 1$ boundaries
$RB_{ij}$ on $R_i$. After that, from the $n$ pseudo horizontal routes, we pick two routes, denoted
as $A$ and $B$, with the largest $MC_i$ and the smallest $MC_i$, respectively. Then we move the
weight of route $A$ to route $B$, i.e., we decrease each $RD_{ij}$ on route $A$ by $1/n$ and increase
each $RD_{ij}$ on route $B$ by $1/n$. A similar procedure is performed on the $m$ pseudo vertical
routes. An example of reassigning horizontal probabilistic usage is given in Figure 5.6. In
this example, each $RC_{ij}$ is 10 and the two-pin net covers a $4 \times 4$ mesh. The horizontal
usages before step one and after step one are depicted in Figure 5.6(a) and (b), respectively.
A horizontal usage of 0.25 is moved from pseudo horizontal route $R_2$ to $R_4$.
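Step one for the pseudo horizontal routes of a single net can be sketched in Python as follows, using the numbers of the Figure 5.6 example. The function name is hypothetical, a single uniform capacity is assumed as in the example, and the first demand value of route $R_1$ is read from a partly illegible figure, so it should be treated as an assumption (it does not affect the outcome).

```python
def ppp_step_one_horizontal(route_demands, capacity, weight):
    """Move one net's weight from its most congested pseudo horizontal route
    to its least congested one.

    route_demands: one list of RD values per pseudo horizontal route R_1..R_n.
    capacity: the (uniform) capacity RC of every boundary, as in the example.
    weight: the weight 1/n carried by each pseudo horizontal route of the net.
    Returns (updated demands, list of MC_i).
    """
    max_congestion = [max(row) / capacity for row in route_demands]
    indices = range(len(route_demands))
    a = max(indices, key=max_congestion.__getitem__)  # route A: largest MC_i
    b = min(indices, key=max_congestion.__getitem__)  # route B: smallest MC_i
    updated = [list(row) for row in route_demands]
    if a != b:
        updated[a] = [d - weight for d in updated[a]]
        updated[b] = [d + weight for d in updated[b]]
    return updated, max_congestion

if __name__ == "__main__":
    # Demands on R_1..R_4 before step one (Figure 5.6(a)); a 4 x 4 net, so weight = 1/4.
    before = [[7, 5, 8], [6, 8, 9], [7, 7, 4], [4, 5, 3]]
    after, mc = ppp_step_one_horizontal(before, capacity=10, weight=0.25)
    print(mc)
    print(after)
```

In this example $R_2$ has the largest maximum congestion and $R_4$ the smallest, so the usage of 0.25 moves from $R_2$ to $R_4$, matching the description of Figure 5.6(b).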
Figure 5.6 Step one in PPP. (a) The horizontal usages before step one.
(b) The horizontal usages after step one.
[Figure data, listed from the top route to the bottom route. (a) R4: 4, 5, 3 (MC4 = 0.5); R3: 7, 7, 4 (MC3 = 0.7); R2: 6, 8, 9 (MC2 = 0.9); R1: 7, 5, 8 (MC1 = 0.8). (b) R4: 4.25, 5.25, 3.25; R3: 7, 7, 4; R2: 5.75, 7.75, 8.75; R1: 7, 5, 8.]
In step two, we first scan each $RB_{ij}$ ($1 \le i \le M$ and $1 \le j \le N$). If $RD_{ij}$ exceeds
$RC_{ij}$, a global router is expected to perform rip-up and reroute on some nets which
cross $RB_{ij}$. In addition, we observe that most post-route optimization procedures apply
local detours to some nets in congested regions. In order to achieve more accurate congestion
estimation on those over-congested regions, we propose step two to emulate the effect
of rip-up and reroute on congestion redistribution. We locally even out the congestion
distribution over the congested tile boundary and its six neighbors, illustrated in Figure 5.7.
In practice, we replace these seven horizontal (or vertical) demands with their average
demand, i.e., AverDem, shown in the following equation.
$$AverDem = \left( RD_{ij} + RD_{i(j-1)} + RD_{i(j+1)} + UD_{ij} + UD_{i(j-1)} + UD_{(i+1)j} + UD_{(i+1)(j-1)} \right) / 7$$
Figure 5.7 Step two in PPP: the over-congested horizontal boundary, $RD_{ij}$,
with its six neighbors
A similar procedure is performed on vertical over-congested tile boundaries. The
experimental results show that PPP can significantly improve the quality of congestion
estimation based on SPM while the run time increases only slightly.
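The averaging of step two for one over-congested right boundary can be sketched in Python as follows. The function name and the dictionary-based storage keyed by $(i, j)$ are assumptions for illustration, and the sketch handles only interior boundaries for which all six neighbors exist; how boundaries on the edge of the mesh are treated is not stated in the text.

```python
def even_out_right_boundary(RD, UD, RC, i, j):
    """Replace RD_ij and its six neighboring demands by their average (AverDem)
    if RB_ij is over-congested. RD, UD and RC are dicts keyed by (i, j).

    Returns True if the demands were changed, False otherwise.
    """
    if RD[(i, j)] <= RC[(i, j)]:
        return False  # not over-congested, nothing to emulate
    rd_keys = [(i, j), (i, j - 1), (i, j + 1)]
    ud_keys = [(i, j), (i, j - 1), (i + 1, j), (i + 1, j - 1)]
    aver_dem = (sum(RD[k] for k in rd_keys) + sum(UD[k] for k in ud_keys)) / 7
    for k in rd_keys:
        RD[k] = aver_dem
    for k in ud_keys:
        UD[k] = aver_dem
    return True

if __name__ == "__main__":
    RD = {(2, 2): 14, (2, 1): 7, (2, 3): 7}
    UD = {(2, 2): 7, (2, 1): 7, (3, 2): 7, (3, 1): 7}
    RC = {(2, 2): 10}
    changed = even_out_right_boundary(RD, UD, RC, 2, 2)
    print(changed, RD[(2, 2)])
```

Averaging keeps the total demand of the seven boundaries unchanged while removing the local peak, which is the effect a local detour would have.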
5.6 Experimental Results 
We implemented four models in C. The benchmarks in our experiments are derived 
from ISPD-02 suite [62] and tested on a Sun4u machine with 8 GB memory and 750MHz 
Sparcv9 processor. Statistics for placement benchmarks are given in Table 5.2. For each 
circuit, we place it using a wire-length driven placer, FastPlace [63], then perform routing 
by a global router which is based on maze routing and rip-up and reroute [64]. After that, 
we compare the congestion distributions generated by global router and four congestion 
estimation models. For each benchmark circuit, we take the number of rows as the size 
of the mesh. In order to test the effectiveness of our models on circuits with different 
levels of congestion, we provide two sets of capacities to the benchmark circuits: small 
capacities to test heavily congested circuits and large capacities to test lightly congested 
circuits. In addition, we define two metrics and measure the degree of correlation between 
the predicted and actual congestion. 
• Congestion Difference (CD): the average on the difference between the predicted 
and actual routing congestion on tile boundaries. Since the congestion map is to 
detect the congested tile boundaries, we only count the tile boundaries with their 
actual routing congestions no less than 0.8. 
• Percentage of over Congested tile boundaries (PC): the ratio of the number of
tile boundaries with routing congestion larger than 1 to the number of tile
boundaries in the design.
Note that a good correlation is indicated by a small CD or a predicted PC which is close 
to actual PC. 
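The two metrics can be sketched in Python as follows. The function names are hypothetical, and taking the absolute value of each difference in CD is an assumption, since the definition above only says "the average on the difference"; the congestion values are assumed to be given as flat lists with one entry per tile boundary.

```python
def congestion_difference(predicted, actual, threshold=0.8):
    """CD: average |predicted - actual| over boundaries whose actual congestion
    is no less than the threshold (0.8 in the text)."""
    diffs = [abs(p - a) for p, a in zip(predicted, actual) if a >= threshold]
    if not diffs:
        return 0.0  # no congested boundary to compare; convention assumed here
    return sum(diffs) / len(diffs)

def percentage_over_congested(congestion):
    """PC: fraction of tile boundaries whose congestion is larger than 1."""
    return sum(1 for c in congestion if c > 1.0) / len(congestion)

if __name__ == "__main__":
    actual = [0.5, 0.9, 1.2, 1.0]
    predicted = [0.7, 1.0, 1.0, 1.0]
    print(congestion_difference(predicted, actual))
    print(percentage_over_congested(predicted) - percentage_over_congested(actual))
```

The second printed value corresponds to the quantity PC minus the router's PC used in the correlation tables: a negative value means the model underestimates the share of over-congested boundaries.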
Table 5.2 Placement Benchmark Statistics
Circuit   #Nodes   #Nets    #Pins    #Rows
ibm01     12506    14111    50566    96
ibm02     19342    19584    81199    109
ibm03     22853    27401    93573    121
ibm04     27220    31970    105859   136
ibm05     28146    28446    126308   139
ibm06     32332    34826    128182   126
ibm07     45639    48117    175639   166
ibm08     51023    50513    204890   170
ibm09     53110    60902    222088   183
ibm10     68685    75196    297576   234
ibm11     70152    81454    280786   208
ibm12     70439    77240    317760   242
ibm13     83709    99666    357075   224
ibm14     147088   152772   546816   305
ibm15     161187   186606   715823   303
ibm16     182980   190048   778823   347
ibm17     184752   189581   860036   379
ibm18     210341   201920   819697   361
5.6.1 Correlation 
Table 5.3 and Table 5.4 report the congestion correlation between predicted results and
actual results. Small and large routing capacities are used in Table 5.3 and Table 5.4,
respectively. We assume $RC_{ij}$ is the same as $UC_{ij}$. $PC_R$ represents the PC given by the
global router. Columns 3 to 6 give the CD of LKS, MPM, SPM and SPM-PPP, respectively.
Columns 8 to 11 give $PC - PC_R$ for LKS, MPM, SPM and SPM-PPP, respectively. Obviously,
a positive (negative) number means the corresponding model overestimates (underestimates)
congestion on this circuit. The best CD and $PC - PC_R$ for each circuit in terms of congestion
correlation are in boldface. Note that Aver on $PC - PC_R$ is the average of the absolute value
of $PC - PC_R$. We compare the three proposed models with LKS by their Aver.
Table 5.3 and Table 5.4 show that SPM-PPP consistently has the best CD among the four
models. Compared with $CD_{LKS}$, $CD_{SPM-PPP}$ is on average 21.80% and 19.90% smaller
for highly congested and lightly congested circuits, respectively. $CD_{SPM}$ is on average
5.24% and 6.47% smaller than $CD_{LKS}$ for highly congested and lightly congested circuits,
respectively, while $CD_{MPM}$ is 8.06% and 9.95% larger than $CD_{LKS}$, on average, for
highly congested and lightly congested circuits, respectively. In terms of the PC estimated
by the four models, $PC_{SPM-PPP}$ is closest to $PC_R$ on average. Compared to LKS, the
average difference between $PC_{SPM-PPP}$ and $PC_R$ is decreased by 46% and 90.79% for
highly and lightly congested circuits, respectively. In other words, among these four
models, SPM-PPP shows the best correlation between its predicted congestion and the
actual congestion.
We also notice that for all lightly congested circuits, both LKS and SPM significantly
overestimate the percentage of congested tile boundaries. In other words, PPP can be
taken as a remedy to overcome this drawback in LKS and SPM.
Figure 5.8 Congestion map by global router.
Figure 5.9 Congestion map by LKS.
Figure 5.10 Congestion map by MPM.
Figure 5.11 Congestion map by SPM.
Figure 5.12 Congestion map by SPM-PPP.
A congestion map visually plots the congestion in the design by assigning different 
colors to different congestion costs. A lighter color means higher congestion cost. From 
Figure 5.8 to 5.12, we plot the congestion maps of circuit ibmld with both horizontal and 
vertical capacity of 17 by global router and four models. 
5.6.2 Run time 
Table 5.5 reports the run time of the different models and of the global router. Due to
the page limitation, we only report the run time for highly congested circuits. The run
time for lightly congested circuits is very similar. In this
table, Column 2 gives the run time used to call MST on multi-pin nets. Columns 3, 4
and 6 give the run time used by LKS, SPM, and SPM-PPP to estimate congestion for the
two-pin nets given by MST. Column 5 gives the run time used to directly apply MPM
to the nets given by the circuits. If we do not count the run time of MST, SPM is on
average 2.05 times faster than LKS, and SPM-PPP is on average 1.37 times faster than LKS. If
the run time of MST is included, MPM is much faster than the other three models because
MST accounts for a significant portion of their run time. Table 5.5 shows that MPM is on
average 21.70 times faster than LKS.
Overall, compared to LKS, SPM is slightly better in correlation and slightly faster
in run time; MPM is comparable in correlation and much faster in run time; SPM-PPP
is much better in correlation and slightly faster in run time.
5.7 Conclusion and Discussion 
In this paper, we presented three congestion estimation models for estimating
the routing congestion in placement. All of these models are based on a probabilistic
approach. An efficient yet more accurate probabilistic usage assignment is proposed in the
Simple Probabilistic Model (SPM). The Multi-net Probabilistic Model (MPM) is proposed
based on SPM to reduce the total run time of the congestion estimation process. SPM-PPP
is proposed to improve the correlation of SPM. However, there is still a lot of room to improve
the effectiveness of Post-Probability Processing. In the future, we plan to integrate our
congestion estimation models into different stages of congestion-driven placement.
Table 5.3 Correlation of predicted congestion with actual results for heavily
congested circuits.
Ckt    Cap.  CD                            PCR    PC - PCR
             LKS    MPM    SPM    SPM-PPP         LKS     MPM     SPM     SPM-PPP
ibm01  9     0.240  0.322  0.215  0.160    21.7   6.5     -4.7    6.2     0.3
ibm02  17    0.241  0.298  0.226  0.194    20.4   3.0     -0.6    3.5     0.4
ibm03  18    0.258  0.267  0.225  0.205    20.5   2.3     1.4     2.9     2.4
ibm04  16    0.262  0.286  0.254  0.188    28.9   -4.9    -6.5    -4.9    -4.7
ibm05  28    0.189  0.194  0.176  0.138    23.3   0.1     6.7     0       -0.1
ibm06  17    0.194  0.213  0.183  0.122    13.3   9.0     4.7     8.8     2.0
ibm07  17    0.258  0.289  0.249  0.216    17.4   3.0     -0.2    3.0     1.8
ibm08  17    0.266  0.271  0.255  0.206    21.9   1.0     -0.5    -1.1    -1.1
ibm09  16    0.245  0.262  0.233  0.175    14.2   7.9     6.6     8.0     4.3
ibm10  17    0.244  0.259  0.237  0.186    14.1   6.6     5.6     6.9     4.1
ibm11  17    0.250  0.260  0.237  0.183    18.4   5.0     2.4     4.9     2.0
ibm12  21    0.252  0.258  0.239  0.201    17.3   1.9     1.7     1.9     0.1
ibm13  19    0.261  0.289  0.249  0.230    24.4   -2.6    -4.0    -2.6    -1.1
ibm14  18    0.256  0.274  0.245  0.223    28.1   -6.2    -6.6    -6.0    -3.4
ibm15  22    0.282  0.288  0.267  0.226    40.4   -12.5   -10.6   -12.3   -11.5
ibm16  22    0.236  0.247  0.228  0.189    7.6    5.4     6.7     5.4     -0.2
ibm17  22    0.249  0.259  0.241  0.205    24.4   -2.1    -0.7    -2.3    -1.9
ibm18  18    0.280  0.281  0.274  0.241    32.4   -10.2   -10.3   -10.2   -8.8
Aver         0.248  0.268  0.235  0.194           5.0*    4.5*    5.0*    2.7*
Decr.               -8.06% 5.24%  21.80%                  10%     0%      46%
* Average of the absolute value of PC - PCR
Table 5.4 Correlation of predicted congestion with actual results for lightly
congested circuits.
Ckt    Cap.  CD                            PCR    PC - PCR
             LKS    MPM    SPM    SPM-PPP         LKS     MPM     SPM     SPM-PPP
ibm01  12    0.227  0.277  0.207  0.165    1.9    8.8     5.6     8.7     0.9
ibm02  22    0.221  0.279  0.208  0.165    0.8    8.9     9.7     8.9     -0.1
ibm03  23    0.179  0.187  0.160  0.133    0.1    8.5     6.3     8.2     -0.1
ibm04  22    0.211  0.233  0.200  0.159    0.5    6.4     5.2     6.6     0.3
ibm05  32    0.133  0.140  0.121  0.098    0.0    11.1    14.8    10.7    0
ibm06  20    0.149  0.174  0.135  0.115    0.0    9.0     5.7     8.9     0
ibm07  22    0.276  0.311  0.269  0.240    2.5    5.1     4.6     5.1     -1.4
ibm08  22    0.283  0.281  0.272  0.242    2.8    3.6     3.9     3.7     -0.6
ibm09  20    0.168  0.190  0.149  0.129    0.0    7.8     6.3     7.4     0
ibm10  21    0.188  0.201  0.173  0.142    0.2    8       7.3     8.1     0.2
ibm11  21    0.211  0.223  0.197  0.161    0.1    9.9     7.8     9.5     0.3
ibm12  24    0.250  0.251  0.238  0.201    3.6    7.1     6.5     7.0     -2.1
ibm13  24    0.192  0.219  0.179  0.158    0.0    8.0     7.5     7.8     0
ibm14  23    0.192  0.205  0.178  0.150    0.2    7.3     7.1     7.3     0.1
ibm15  32    0.199  0.207  0.187  0.157    0.0    7.6     8.0     7.1     2.0
ibm16  26    0.154  0.174  0.144  0.128    0.1    5.3     6.0     5.1     0.6
ibm17  28    0.191  0.204  0.181  0.176    0.1    7.6     8.3     7.4     2.0
ibm18  25    0.196  0.226  0.192  0.179    0.1    5.9     7.2     5.9     2.3
Aver         0.201  0.221  0.188  0.161           7.6*    7.1*    7.4*    0.7*
Decr.               -9.95% 6.47%  19.90%                  6.58%   2.63%   90.79%
* Average of the absolute value of PC - PCR
Table 5.5 Run time for congestion estimation models and global router.
Circuit  MST(s)   LKS(s)   SPM(s)         MPM(s)   SPM_PPP(s)     Router(h)
ibm01    2.479    0.332    0.240          0.142    0.302          1.150
ibm02    5.469    0.584    0.416          0.203    0.513          3.267
ibm03    4.148    0.908    0.438          0.281    0.594          7.417
ibm04    6.798    0.999    0.503          0.322    0.714          7.917
ibm05    7.758    1.773    0.658          0.285    1.226          14.617
ibm06    6.112    1.054    0.628          0.344    0.782          8.417
ibm07    6.992    1.710    0.851          0.459    1.379          27.300
ibm08    11.470   2.103    1.195          0.472    1.663          21.517
ibm09    8.971    2.286    1.293          0.563    1.779          31.617
ibm10    11.733   4.921    1.846          0.730    3.055          63.417
ibm11    10.757   3.106    1.568          0.758    2.454          46.450
ibm12    13.325   5.565    2.106          0.758    3.680          58.617
ibm13    13.715   4.329    2.185          1.050    3.320          41.050
ibm14    18.613   9.240    3.654          1.617    5.941          97.467
ibm15    26.349   11.367   5.126          2.064    7.649          103.667
ibm16    26.814   10.896   5.123          2.044    8.181          156.400
ibm17    34.918   17.684   6.263          2.355    10.944         195.350
ibm18    30.367   11.012   5.523          2.385    8.583          212.967
Speedup                    2.05* (1.14+)  21.70    1.37* (1.07+)
* Average times faster than LKS without considering run time of MST.
+ Average times faster than LKS with run time of MST included.
CHAPTER 6. EFFICIENT RECTILINEAR STEINER TREE 
CONSTRUCTION WITH RECTILINEAR BLOCKAGES 
6.1 Introduction 
Given n points on a plane, a Rectilinear Steiner Minimal Tree (RSMT) connects these
points through some extra points, called Steiner points, to achieve a tree with minimal
total wire length. Much work has been done on this fundamental problem in VLSI physical
design. However, most of it did not take blockages into consideration. In fact, today's
designs often contain many rectilinear routing blockages, e.g., macro cells, IP blocks,
and pre-routed nets. Thus, rectilinear Steiner minimal tree construction with rectilinear
blockages (RSMTRB) becomes a very practical problem.
Generally, RSMT is used in initial net topology creation for global routing or incremental
net tree topology creation in physical synthesis. It is also utilized to accurately
estimate congestion and wire length in early design stages, like block floorplanning and
cell placement. The timing and congestion information obtained from RSMT can be used
as a criterion in timing- and congestion-driven routing. Because the problem is solved
hundreds of thousands of times and many instances have very large input sizes, RSMT
deserves intensive research in VLSI CAD.
Unfortunately, RSMT itself was shown to be strongly NP-complete by Shi [65]. Taking
blockages into account dramatically increases the problem complexity. Thus, it is
extremely unlikely that an efficient optimal algorithm exists for RSMTRB. Although
there exist some heuristic algorithms for this problem, they have either poor performance
or expensive running time.
Existing heuristics for RSMTRB can be classified into three categories.
The first category is the maze routing [66] based approach. Maze routing can optimally
route two-pin nets. However, for multi-pin nets, designers need to introduce a
multi-terminal variant, which yields a solution far from optimal. In addition, since the
runtime and memory used in maze routing are proportional to the size of the routing area
rather than the size of the actual problem (i.e., the number of pins and blockages), maze
routing algorithms are inefficient in runtime and memory.
The second category is the sequential approach, which typically consists of two
steps. Step 1 constructs a tree T1, which is either a minimum spanning tree (MST)
or a Steiner minimal tree (SMT) in the absence of blockages. Step 2 transforms T1
into an RSMT with blockages by substituting edges around the blockages for the edges
overlapped by the blockages. These approaches are commonly used in industry due to
their simplicity and efficiency. However, since step 1 neglects the global view of blockages,
step 2 can only locally remove the overlap between T1 and the blockages. The quality (i.e.,
the total wire length) of the resulting RSMT can be much worse than expected in many
cases. A commonly used approach is illustrated in Figure 6.1. This approach first constructs
an MST without considering any blockages. Then an RSMT that does not overlap
any blockage is constructed through a simple line sweep technique.
Later on, Yang et al. [67] introduced a complicated 4-process heuristic to remove
the overlaps in step 2 in a smarter way. However, the approach still cannot avoid bad
solutions in many cases. This is because a bad T1, caused by neglecting blockages in step 1,
could introduce an unexpected routing detour in step 2.
The third category consists of connection graph [68] based approaches. Typically, an
approach in this category first constructs a connection graph from pins and blockage
boundaries, which guarantees that at least one rectilinear Steiner minimal (or close to
minimal) tree is embedded in the graph. Then, some graph searching technique is used to
find an RMST as a subgraph of the connection graph. Unlike the sequential approach, this
approach captures a global view of both pins and blockages in the design. Therefore,
the connection graph based approach can generally achieve an optimal (or near-optimal)
RMST.
Figure 6.1 Sequential approach to solve RSMTRB (panels: Step 1, Step 2).
The efficiency of the connection graph based approach depends on the size of the graph,
while its accuracy or effectiveness depends on whether the graph contains a
good Steiner minimal tree. Obviously, there is a trade-off between efficiency and accuracy
in this approach.
In [69], the connection graph is called an escape graph and is constructed from escape
segments. The number of vertices in the escape graph can be $O(n^2)$ in the worst case,
where n is the total number of pins and blockage boundaries. Even a reduced escape
graph is still quite large. In [70], the authors proposed a connection graph with
$O(n \log n)$ vertices and edges, and later, in [71], they introduced an even smaller graph
which contains $O(n \sqrt{\log n})$ vertices and $O(n \log^{3/2} n)$ edges. The construction
of that connection graph takes $O(n \log^{3/2} n)$ time and memory.
In this chapter, we propose an efficient and effective connection graph, called the
spanning graph. We show that the spanning graph contains only $O(n)$ vertices and $O(n)$
edges, which is smaller than any previous connection graph. In addition, we show that
our spanning graph can always produce an RMST of good quality. Due to the special
property of the spanning graph, the construction takes only $O(n \log n)$ time and memory,
which is also less than any previous connection graph construction.
We organize the rest of the chapter as follows. In Section 6.2, we formally define the
problem and explain the details of its basic components. In Section 6.3, we
first demonstrate the drawback of the escape graph and introduce the spanning graph
as a connection graph for RSMTRB. Section 6.4 describes the details of the construction
of the spanning graph for RSMTRB, and Section 6.5 describes the RMST construction on
this graph. The experimental results are shown in Section 6.6. The
chapter is concluded in Section 6.7.
6.2 Problem Formulation 
Let $P = \{p_1, p_2, p_3, \ldots, p_m\}$ be the set of pins of an m-pin net. Let
$B = \{b_1, b_2, b_3, \ldots, b_k\}$ be a set of rectangular blockages. Let
$V = \{v_1, v_2, v_3, \ldots, v_n\} = P \cup \{\text{corners in } B\}$
be the vertex set of the problem, where each $v_i$ has coordinates $(x_i, y_i)$. Since each
rectangular blockage has 4 corners, we have $n \le m + 4k$. The rectilinear distance between
$v_i$ and $v_j$ is given as $|x_i - x_j| + |y_i - y_j|$. An RSMT connects all pins through some
extra points (called Steiner points) to achieve a minimal total length, while avoiding
intersection with any blockage in the design.
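The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that pins are `(x, y)` tuples and that each rectangular blockage is a tuple `(x_min, y_min, x_max, y_max)`; the function names are hypothetical and not taken from the original implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def corners(b: Rect) -> List[Point]:
    """Return the four corners of a rectangular blockage."""
    x1, y1, x2, y2 = b
    return [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]


def vertex_set(pins: Sequence[Point], blockages: Sequence[Rect]) -> List[Point]:
    """V = P united with the blockage corners; shared points are counted once."""
    seen, out = set(), []
    for v in list(pins) + [c for b in blockages for c in corners(b)]:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


def rectilinear_distance(a: Point, b: Point) -> float:
    """|x_i - x_j| + |y_i - y_j|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```

With two pins and two abutting blockages, the vertex set has 8 points because two corners coincide, which illustrates why n is bounded by m + 4k rather than equal to it.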
6.2.1 Rectilinear blockages 
If all boundaries of a blockage are either horizontal or vertical, we call it
a rectilinear blockage. Note that each rectilinear blockage can be dissected into a set of
rectangular blockages (see Figure 6.2 for an example). In the rest of the chapter, for
simplicity, we assume each blockage to be rectangular.
Figure 6.2 Dissect a rectilinear blockage into 3 rectangular blockages.
Figure 6.3 Three types of blockages: (a) Complete blockage (b) Horizontal
blockage (c) Vertical blockage
6.2.2 Directional blockages 
In multi-layer routing, there exist three types of blockages. The first one is called
a complete blockage, which blocks all vertical and horizontal metal layers in its obstructed
area. A complete blockage requires all routes to detour around it. The second type is
called a horizontal blockage, in which all vertical layers in the obstructed area are blocked
while a certain number of horizontal layers are still available for routing. Routes are
allowed to pass through the obstructed area horizontally, but not in the vertical
direction. The third type is called a vertical blockage, where routes can still pass
through the obstructed area vertically, but not horizontally. These three types of
directional blockages are illustrated in Figure 6.3.
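The three blockage types can be expressed as a simple passability check. The following is a minimal sketch under stated assumptions: blockages are `(x_min, y_min, x_max, y_max)` tuples, a route segment is a pair of endpoints that is either horizontal or vertical, and running along a blockage boundary is treated as allowed. The names `BlockageType` and `segment_allowed` are hypothetical.

```python
from enum import Enum


class BlockageType(Enum):
    COMPLETE = "complete"      # no route may cross the obstructed area
    HORIZONTAL = "horizontal"  # only horizontal routes may cross
    VERTICAL = "vertical"      # only vertical routes may cross


def segment_allowed(seg, rect, kind):
    """Return True if the axis-parallel segment may be routed given the blockage."""
    (xa, ya), (xb, yb) = seg
    x1, y1, x2, y2 = rect
    is_horizontal = ya == yb
    if not is_horizontal and xa != xb:
        raise ValueError("segment must be horizontal or vertical")
    # The segment matters only if it crosses the open interior of the blockage.
    if is_horizontal:
        crosses = y1 < ya < y2 and min(xa, xb) < x2 and max(xa, xb) > x1
    else:
        crosses = x1 < xa < x2 and min(ya, yb) < y2 and max(ya, yb) > y1
    if not crosses:
        return True
    if kind is BlockageType.HORIZONTAL:
        return is_horizontal
    if kind is BlockageType.VERTICAL:
        return not is_horizontal
    return False
```

A horizontal wire through the middle of a blockage is therefore accepted only for a horizontal blockage, and a vertical wire only for a vertical blockage.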
Figure 6.4 Escape graph 
6.3 Escape Graph Versus Spanning Graph 
6.3.1 Redundancy in escape graph 
As we mentioned in Section 6.1, a connection graph can capture the global view of
both blockages and pins, and the efficiency of this approach highly depends on the size
of the connection graph. In other words, a good connection graph is able to describe
all necessary geometrical relationships between pins and blockages using as few edges as
possible. In [69], the connection graph is called an escape graph, which is constructed from
escape segments. Escape segments are formed by horizontal and vertical lines extending
from pins and blockage boundaries, and ending where they abut either a blockage
boundary or the internal perimeter of the routing region. The number of vertices in the
escape graph can be $O(n^2)$ in the worst case. An example is shown in Figure 6.4. The
collection of the escape segments (shown as dashed segments) composes the escape graph,
and the graph preserves a good RSMT for a multi-pin net. However, we notice that most
of the edges and vertices in the escape graph are redundant for finding an RSMT. For
example, instead of using 13 edges, 3 edges (shown as solid segments) are enough to represent
the connection relationship between the blockage b2 and pin p2 in the corresponding
connection graph.
Figure 6.5 Eight regions defined for each point in spanning graph. 
6.3.2 Spanning graph 
In [72], a spanning graph is introduced as an intermediate step in minimal spanning 
tree construction. Given a set of points on the plane, a spanning graph is an undirected 
graph over the points that contains at least one minimal spanning tree. The number of 
edges in the graph is called the cardinality of the graph. 
The construction of the spanning graph is illustrated in Figure 6.5. For each point p,
the plane can be divided into 8 regions by the horizontal, vertical and ±45° lines through p.
It can be proved that the rectilinear distance between any two points in one region is
always smaller than the larger of their distances to p. By the cycle property of
a minimal spanning tree, that is, the longest edge on any cycle should not be included in
any minimal spanning tree, only the closest point to p in each region needs
to be connected to p. Considering all given points, the connections form a spanning
graph of cardinality $O(n)$. In other words, the spanning graph is able to describe the
relative geometrical relationship between points in the plane using $O(n)$ edges.
Inspired by the spanning graph in MST construction, we borrow this idea to solve
RSMTRB. The details are presented in the following section.
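The octant construction described above can be sketched with a brute-force search. This is only an illustration of the idea: it takes quadratic time, whereas [72] uses a sweep line, and the half-open assignment of boundary rays to octants is an assumption made here so that every other point falls into exactly one region. All function names are hypothetical.

```python
def dist(a, b):
    """Rectilinear distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def octant(p, q):
    """Index 0..7 of the 45-degree region around p that contains q (q != p)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dy >= 0 and dx > 0:
        return 0 if dy < dx else 1
    if dy > 0 and dx <= 0:
        return 2 if dy > -dx else 3
    if dy <= 0 and dx < 0:
        return 4 if dy > dx else 5
    return 6 if -dy > dx else 7


def spanning_graph(points):
    """Connect each point to its nearest neighbor in each of its 8 regions."""
    edges = set()
    for p in points:
        nearest = {}
        for q in points:
            if q == p:
                continue
            k = octant(p, q)
            if k not in nearest or dist(p, q) < dist(p, nearest[k]):
                nearest[k] = q
        for q in nearest.values():
            edges.add((min(p, q), max(p, q)))
    return edges


def mst_weight(points, edges):
    """Kruskal's algorithm restricted to the given edge set."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    total = 0
    for a, b in sorted(edges, key=lambda e: dist(*e)):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += dist(a, b)
    return total
```

Each point contributes at most 8 edges, so the graph has at most 8n edges, and running `mst_weight` on it should give the same weight as running it on the complete graph.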
6.4 Spanning Graph Based Approach in RSMTRB 
6.4.1 Search regions 
In the general spanning graph, each point p corresponds to 8 regions, which are divided by
the horizontal, vertical and ±45° lines going through p. We search each region,
find the closest point in it, and connect that point to p. In RSMTRB, for each
blockage bi, we divide the whole plane into 8 regions as shown in Figure 6.6(a). Each
corner of blockage bi has three search regions adjacent to it. We call
these three regions the neighboring search regions of that corner. For example, the
neighboring search regions for the lower left corner p1 in Figure 6.6(a) are R8, R1 and R2.
For each pin p, we divide the whole plane into four search regions. The corresponding
neighboring search regions for p are R1, R2, R3 and R4, as shown in Figure 6.6(b).
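The region bookkeeping can be sketched as a lookup. The layout below is an assumption read from Figure 6.6 and from the sort orders used in the four passes: around a blockage, R1 is the lower left region, R2 the left, R3 the upper left, R4 the top, R5 the upper right, R6 the right, R7 the lower right and R8 the bottom. Only the triple for the lower left corner is stated in the text; the other three corner triples are filled in by symmetry, and how boundary points are assigned is also an assumption.

```python
# Region index for each (column, row) cell of the 3x3 grid around a blockage.
# column: 0 = left of the blockage, 1 = within its x-span, 2 = right of it
# row:    0 = below the blockage,   1 = within its y-span, 2 = above it
_REGION = {
    (0, 0): 1, (0, 1): 2, (0, 2): 3, (1, 2): 4,
    (2, 2): 5, (2, 1): 6, (2, 0): 7, (1, 0): 8,
}

# Neighboring search regions of each corner (lower left is from the text,
# the rest are inferred by symmetry).
NEIGHBORING = {
    "lower_left": (8, 1, 2),
    "upper_left": (2, 3, 4),
    "upper_right": (4, 5, 6),
    "lower_right": (6, 7, 8),
}


def blockage_region(q, rect):
    """Return the region 1..8 of blockage rect containing q, or 0 if q is on or inside it."""
    x, y = q
    x1, y1, x2, y2 = rect
    col = 0 if x < x1 else (2 if x > x2 else 1)
    row = 0 if y < y1 else (2 if y > y2 else 1)
    return _REGION.get((col, row), 0)


def pin_region(q, p):
    """Quadrant 1..4 of pin p containing q: R1 lower left, R2 upper left,
    R3 upper right, R4 lower right (boundary points assigned arbitrarily)."""
    if q[0] < p[0]:
        return 1 if q[1] < p[1] else 2
    return 3 if q[1] >= p[1] else 4
```

A point is a candidate for a given corner only if `blockage_region` places it in one of that corner's three neighboring regions.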
For each v ∈ V, we connect the closest visible point in each neighboring search region
to v. Note that the visible points in a search region could be different for a complete
blockage and a directional blockage. For example, the only difference between Figure 6.7
(a) and (b) is that the blockage b3 is a complete blockage in (a) but a horizontal blockage
in (b). The search region R2 for blockage b4 is the area enclosed by the dashed
segments. Obviously, the search region in (b) is larger than that in (a). In Figure 6.7(a),
there exists only one visible point in region R2 of b4, while in Figure 6.7(b) there are
three visible points in R2 of b4.
The reason why we only consider three search regions for each corner p of blockage b is
that for any visible point q in the remaining five search regions of b, we can always find
another corner p' of b such that q lies in one search region of p' and a shortest path from
p to q can be obtained by making p' an intermediate point on the path.
In addition, the reason why we only search the visible points is that between the visible
region and the invisible region there must exist at least one intermediate blockage; thus the
point p can always reach the points in the invisible region by making use of the boundaries
of the intermediate blockage(s) as part of the shortest path. For example, in Figure 6.8,
region R2 of blockage b2 is invisible to the upper left corner p of b1. Note that any point
in R2 of b2 can be reached from p by using the boundaries of b2 as part of the shortest path.
Figure 6.6 Search regions for blockages and pins: (a) regions R1 to R8 around a
blockage (b) regions R1 to R4 around a pin
Figure 6.7 Visible points in search region for different blockages: (a) (b)
6.4.2 Spanning graph construction in RSMTRB 
First, from the blockage set B, we find the subset Bs that contains each blockage lying
within or intersected by the bounding box of the net. Let Vs be the point set that includes
all corners of Bs and all pins. Our initial spanning graph G has Vs as its vertices and all
blockage boundary segments of Bs as its edges. Then, we incrementally construct the graph by
applying a sweep line based edge connection among the vertices in Vs. For any point v ∈ Vs,
we connect at most one visible point v' in each neighboring search region to v, where
v' ∈ Vs and v' is the closest point to v in the corresponding neighboring search region.
Figure 6.8 Example of unseen region for point p
Based on the $O(n \log n)$ sweep line algorithm proposed in [72], we propose a revised
sweep line algorithm to construct the spanning graph in RSMTRB as follows. Our
construction consists of four passes; each pass performs edge connection for a pair of search
regions for Vs. To simplify the exposition, we only present the details of
pass 1, which performs edge connection for search regions R2 and R6 of corners. The
remaining regions are similar to R2, and the discussion can easily be extended to handle them.
First, all points in Vs are sorted by their x coordinates in non-decreasing order. Note
that for each blockage, the lower left corner shares the same x coordinate as the upper left
corner. Similarly, the lower right corner shares the same x coordinate as the upper right
corner. Thus, we can sort the left and right boundary segments of each blockage instead of
sorting point by point to speed up the sorting process. A fundamental operation of the sweep
line algorithm is to keep an active set A such that all points in A are visible to v. We thus
build an active set A and maintain it dynamically by adding and deleting points.
We start with an empty set A and check each point v in the sorted list Vss. If v is not the
lower left corner of a blockage (i.e., v is a pin or any of the other three corners of a blockage),
we just add v to A; otherwise we perform edge connection as follows. Suppose v is
the lower left corner of blockage bi. We first pick the point v' which is right after v in the
list Vss. Note that v' must be the upper left corner of bi due to the property of
our sorting. Then, we check each point a in A. If a lies in region R2 of bi, we add
it to another set As and, if bi is not a horizontal blockage, we delete it
from A. After that, we find the two points q and q' in As which are closest to v and
v', respectively, in rectilinear distance. Finally, we add two edges, (v, q) and (v', q'), to
graph G. Note that q and q' could be the same point in As. Before we move to the next
point in Vss, we empty the set As.
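The pass described above can be sketched as follows. This is a simplified illustration, not the original implementation: the active set is a plain list scanned linearly (the text requires a structure with logarithmic operations), region R2 of a blockage is assumed to be the set of points to its left whose y coordinate lies within the blockage's y-span (boundaries included), and the upper left corner is taken directly from the blockage rather than from its position in the sorted list. The names `Blockage` and `connect_r2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blockage:
    x1: float
    y1: float
    x2: float
    y2: float
    horizontal: bool = False  # True: routes may pass through horizontally


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def connect_r2(pins, blockages):
    """Return the edges added for region R2 of every blockage (pass 1 sketch)."""
    events = [(x, y, "pin", None) for x, y in pins]
    for b in blockages:
        events += [
            (b.x1, b.y1, "ll", b), (b.x1, b.y2, "ul", b),
            (b.x2, b.y1, "lr", b), (b.x2, b.y2, "ur", b),
        ]
    events.sort(key=lambda e: (e[0], e[1]))  # non-decreasing x, ties by y

    active, edges = [], []
    for x, y, kind, b in events:
        if kind != "ll":
            active.append((x, y))  # pins and the other three corners join A
            continue
        v, v_up = (b.x1, b.y1), (b.x1, b.y2)
        in_r2 = [a for a in active if b.y1 <= a[1] <= b.y2]  # the set As
        if not b.horizontal:
            # These points are hidden from everything further to the right.
            active = [a for a in active if not (b.y1 <= a[1] <= b.y2)]
        if in_r2:
            q = min(in_r2, key=lambda a: manhattan(a, v))
            q_up = min(in_r2, key=lambda a: manhattan(a, v_up))
            edges += [(v, q), (v_up, q_up)]
    return edges
```

For a single blockage with one pin to its left inside the y-span, both left corners are connected to that pin; a second blockage further right then sees only the corners of the first one, since the pin has been removed from the active set.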
An example is shown in Figure 6.9. The bold segments are the edges added to graph
G after edge connection is performed for R2 of corners. Since search region R6 has the
reverse sweep sequence, we can reuse the sorted list Vss to perform edge connection
for R6 in pass 1. Similarly, in pass 2, we perform edge connection for search regions R4
and R8 of blockages after sorting Vss in non-decreasing order of y coordinate.
In pass 3, we perform similar edge connection for R1 and R5 of blockages and R1 and R3
of pins after sorting Vss in non-decreasing x + y. Similarly, in pass 4, edge connection
is performed for search regions R3 and R7 of blockages and R2 and R4 of pins after Vss is
sorted in non-decreasing y - x.
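The four sweep orders can be written down as sort keys. The sketch below only shows the ordering step; the region labels follow Algorithm 1 as reconstructed from the text, and the `PASSES` table and `sweep_orders` function are hypothetical names used for illustration.

```python
# One entry per pass: sort key, corner regions handled, pin regions handled.
PASSES = [
    {"key": lambda p: p[0],        "corner_regions": ("R2", "R6"), "pin_regions": ()},
    {"key": lambda p: p[1],        "corner_regions": ("R4", "R8"), "pin_regions": ()},
    {"key": lambda p: p[0] + p[1], "corner_regions": ("R1", "R5"), "pin_regions": ("R1", "R3")},
    {"key": lambda p: p[1] - p[0], "corner_regions": ("R3", "R7"), "pin_regions": ("R2", "R4")},
]


def sweep_orders(points):
    """Return the point list sorted in the order used by each of the four passes."""
    return [sorted(points, key=entry["key"]) for entry in PASSES]
```

Within a pass, the second region of each pair is swept in the reverse direction, so one sort per pass is sufficient.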
To achieve $O(n \log n)$ running time, the active sets A and As must be efficiently
maintained so that searching, deletion, and insertion can each be done in $O(\log n)$ time.
The spanning graph after four passes is shown in Figure 6.10. The algorithm for spanning
graph construction in RSMTRB is summarized in Algorithm 1. The details of edge connection
for search region R2 are summarized in Algorithm 2.
Figure 6.9 Edge connection for region R2.
Figure 6.10 Complete spanning graph
Algorithm 1: Spanning graph construction in RSMTRB
Input: Vs
    sort Vs by non-decreasing x;
    perform edge connection for R2 and R6 of corners;
    sort Vs by non-decreasing y;
    perform edge connection for R4 and R8 of corners;
    sort Vs by non-decreasing x + y;
    perform edge connection for R1 and R5
        of corners and R1 and R3 of pins;
    sort Vs by non-decreasing y - x;
    perform edge connection for R3 and R7
        of corners and R2 and R4 of pins;
Return: spanning graph G for a net
Algorithm 2: edge connection for R2
Input: a sorted Vss with non-decreasing x.
A = ∅;
For each v ∈ Vss {
    if (v is a pin, lowerright or upperright corner) {
        add v to A;
    }
    else if (v is a lowerleft corner of bi) {
        v' = the point right after v in Vss (the upperleft corner of bi);
        if (A != ∅) {
            As = ∅;
            add the points of A located in R2 of bi to As;
            if (bi is not a horizontal blockage)
                delete the points located in R2 of bi from A;
            if (As != ∅) {
                find points q and q' from As which are closest to
                    v and v', respectively;
                add new edges (v, q) and (v', q') to graph G;
            }
        }
    }
}
6.5 RMST Construction
After we complete the spanning graph, we apply a heuristic to construct an RMST
based on the graph G. The heuristic consists of six steps.
Step 1: Construct the complete undirected graph G1 = (V1, E1) from G and P, in
such a way that V1 = P and, for every (vi, vj) ∈ E1, the length of the edge (vi, vj) is equal
to the length of the shortest path from vi to vj in G.
Step 2: Find the minimum spanning tree T1 of G1. If there exist several minimum
spanning trees, pick an arbitrary one.
Step 3: Construct the subgraph Gs of G by replacing each edge in T1 by its corresponding
shortest path in G. If there are several shortest paths, pick an arbitrary one.
Step 4: Find the minimum spanning tree Ts of Gs. If there are several minimum spanning
trees, pick an arbitrary one.
Step 5: Construct a Steiner tree Th from Ts by deleting edges in Ts, if necessary, so
that all the leaves in Th are pins.
Step 6: Rectilinearize Th to obtain a rectilinear Steiner tree Tr. The above six steps
are illustrated by an example in Figure 6.11.
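Steps 1 to 5 can be sketched on an abstract weighted graph. The sketch assumes the spanning graph G is given as an adjacency dictionary `adj[u][v] = length`, that G is connected, and that node labels are mutually comparable (for example strings or coordinate tuples). Step 6, the rectilinearization, is omitted, and all function names are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_paths(adj, src):
    """Dijkstra from src; returns (dist, prev) dictionaries."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def kruskal(nodes, edges):
    """Minimum spanning tree of the (weight, u, v) edges over the given nodes."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def steiner_tree(adj, pins):
    sp = {p: shortest_paths(adj, p) for p in pins}
    # Step 1: complete graph on the pins, weighted by shortest-path length in G.
    g1 = [(sp[a][0][b], a, b) for a, b in combinations(pins, 2)]
    # Step 2: minimum spanning tree T1 of G1.
    t1 = kruskal(pins, g1)
    # Step 3: expand every edge of T1 into its shortest path in G.
    gs = set()
    for _, a, b in t1:
        prev, node = sp[a][1], b
        while node != a:
            gs.add((min(node, prev[node]), max(node, prev[node])))
            node = prev[node]
    # Step 4: minimum spanning tree Ts of the subgraph Gs.
    gs_nodes = {n for e in gs for n in e}
    ts = {(u, v) for _, u, v in kruskal(gs_nodes, [(adj[u][v], u, v) for u, v in gs])}
    # Step 5: repeatedly delete edges that end in a non-pin leaf.
    pinset = set(pins)
    while True:
        degree = {}
        for u, v in ts:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        dangling = {e for e in ts if any(degree[n] == 1 and n not in pinset for n in e)}
        if not dangling:
            return sorted((u, v, adj[u][v]) for u, v in ts)
        ts -= dangling
```

On a star graph where three pins and one unused node all attach to a central node with unit edges, the result is the three pin edges of total length 3, and the unused node is left out.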
6.6 Experimental Results 
We implemented the spanning graph based RSMTRB algorithm in C++.
We compiled and ran the program on an Intel Pentium 4 machine running at 2.80 GHz with
1.5 GB of RAM. We ran 3 industrial testcases and compared our approach with the traditional
sequential approach illustrated in Figure 6.1. The statistics of the testcases are listed in
Table 6.1. The total wirelength and runtime comparison is given in Table 6.2.
The experimental results show that our spanning graph based approach reduces
the wirelength of the RMST by 14.31% on average compared to the sequential approach,
while the runtime increases by 47.401% on average.
Table 6.1 Statistics of testcases
Test cases            1       2       3
number of instances   437444  277356  450367
number of I/O pins    774     1453    1276
number of nets        477380  285556  451250
2 pin net (%)         56.3    70.3    75.0
3-10 pin net (%)      21.2    16.3    16.8
10-50 pin net (%)     17.6    10.1    6.3
50-100 pin net (%)    5.1     3.5     1.6
> 100 pin net (%)     0.1     0.8     0.3
number of blockages   136     539     487
Figure 6.11 Six steps to construct RSMT
Table 6.2 Experimental results.
            Total wirelength (μm)                Total runtime (s)
Testcase    Sequential  Ours     decreased (%)   Sequential  Ours    increased (%)
1           2854122     2400499  15.89           18.671      27.339  42.465
2           2452867     2051254  16.37           14.998      20.668  37.805
3           1022723     913584   10.67           13.576      21.984  61.933
On average                       14.31                               47.401
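The wirelength columns of Table 6.2 can be checked with a few lines of arithmetic. The sketch assumes that "decreased (%)" means (Sequential - Ours) / Sequential and that the reported average is the mean of the three rounded per-testcase values; the variable names are illustrative.

```python
def percent_decrease(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return (baseline - ours) / baseline * 100.0


# (sequential, ours) total wirelength per testcase, copied from Table 6.2
wirelength = {
    1: (2854122, 2400499),
    2: (2452867, 2051254),
    3: (1022723, 913584),
}

decreases = {k: round(percent_decrease(*v), 2) for k, v in wirelength.items()}
average = round(sum(decreases.values()) / len(decreases), 2)
```

Under these assumptions the per-testcase values and their mean agree with the wirelength figures reported in the table.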
6.7 Conclusion and Discussion 
In this chapter, we propose an efficient and effective approach to construct a rectilinear
Steiner minimum tree with rectilinear blockages. The connection graph used in this
approach is called the spanning graph, which contains only $O(n)$ edges and vertices. An
$O(n \log n)$ time algorithm is proposed to construct the spanning graph for RSMTRB. The
experimental results show that this approach can achieve a solution with significantly
reduced wirelength. The increase in total runtime is negligible in the whole design flow.
A possible extension of this approach is to perform congestion- and timing-driven global
routing. We can handle routing congestion by modifying the cost of each edge in the
spanning graph and deal with timing by controlling the search direction in graph searching.
BIBLIOGRAPHY 
[1] U. Sigmund, N. Steinhaus, and T. Ungerer. On performance, transistor count,
and chip space assessment of multimedia enhanced simultaneous multithreaded
processors. Workshop on Multi-threaded Execution, Architecture and Compilation
(MTEAC-4), Monterey, CA, Dec. 10, 2000.
[2] S. Sait and H. Youssef. VLSI Physical Design Automation. IEEE Press, NY, 1997. 
[3] N. Sherwani. Algorithms for VLSI Physical Design Automation. Kluwer Academic, 
3rd edition, 1999. 
[4] A. Kahng. Is Classical floorplanning harmful? Intl. Symp. on Physical Design, pages 
207-213, 2000. 
[5] D. F. Wong, H. W. Leong, and C. L. Liu. Simulated Annealing for VLSI Design. 
Kluwer Academic, 1988. 
[6] D. F. Wong and C. L. Liu. A new algorithm for floorplan design. Design Automation 
Conf., pages 101-107, 1986. 
[7] X. Hong, G. Huang, Y. Cai, J. Gu, S. Dong, C. K. Cheng, and J. Gu. Corner Block 
List: An effective and efficient topological representation of non-slicing floorplan. 
Intl. Conf. on Computer-Aided Design, pages 8-12, 2000. 
[8] T. Ohtsuki, N. Suzigama, and H. Hawanishi. An optimization technique for
integrated circuit layout design, pages 67-68, 1970.
[9] M. Otten. Automatic floorplan design. Design Automation Conf., pages 261-267, 
1982. 
[10] D.E. Knuth. The Art of Computer Programming, Fundamental Algorithms, volume 1. 
Addison-Wesley, 2nd edition, 1973. 
[11] K. Sakanushi and Y. Kajitani. The quarter-state sequence (Q-sequence) to represent 
the floorplan and applications to layout optimization. Asian Pacific Conf. Circuit 
System, pages 829-832, 2000. 
[12] H. Onodera, Y. Taniquchi, and K. Tamaru. Branch-and-bound placement for building 
block layout. Design Automation Conf., pages 433-439, 1991. 
[13] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani. Rectangular-packing-based 
module placement. Intl. Conf on Computer-Aided Design, pages 472-479, 1995. 
[14] S. Nakatake, K. Fujiyoshi, H. Murata, and Y. Kajitani. Module placement on
BSG-structure and IC layout applications. Intl. Conf. on Computer-Aided Design, pages
484-491, 1996.
[15] P. N. Guo, G. K. Cheng, and T. Yoshimura. An O-Tree representation of non-slicing 
floorplan and its application. Design Automation Conf, pages 268-273, 1999. 
[16] Y. C. Chang, Y. W. Chang, G. M. Wu, and S. W. Wu. B*-trees: A new
representation for non-slicing floorplans. Design Automation Conf., pages 458-463, 2000.
[17] S. Zhou, S. Dong, X. Hong, Y. Cai, and C.-K. Cheng. ECBL: An Extended Corner 
Block List with solution space including optimum placement. Intl. Symp. on Physical 
Design, pages 156-161, 2001. 
[18] C. Zhuang, K. Sakanushi, L. Jin, and Y. Kajitani. An enhanced Q-sequence
augmented with empty-room-insertion and parenthesis trees. Conf. on Design
Automation and Test in Europe, pages 61-68, 2002.
[19] F.Y. Young, C. Chu, and Z.C. Shen. Twin Binary Sequences: A non-redundant 
representation for general non-slicing floorplan. Intl. Symp. on Physical Design, pages 
196-201, 2002. 
[20] B. Yao, H.Y. Chen, C.-K. Cheng, and R. Graham. Revisiting floorplan
representations. Intl. Symp. on Physical Design, pages 138-143, 2001.
[21] Y.C. Chang, Y.W. Chang, G.M. Wu and S.W. Wu. B*-Trees: A New Representation
for Non-Slicing Floorplans. Design Automation Conference, pages 458-463, 2000.
[22] T. H. Cormen, C. E. Leiserson and R. L. Rivest. Introduction to Algorithms. The
MIT Press, pages 265-266, 1990.
[23] S. Dulucq and O. Guibert. Baxter Permutations. Discrete Mathematics, 180:143-156, 
1998. 
[24] P.N. Guo, C.K. Cheng and T. Yoshimura. An O-Tree Representation of Non-Slicing 
Floorplan and Its Applications. Design Automation Conference, pages 268-273, 1999. 
[25] X. Hong, G. Huang, Y. Cai, J. Gu, S. Dong, C. K. Cheng and J. Gu. Corner Block 
List: An Effective and Efficient Topological Representation of Non-Slicing Floorplan. 
Intl. Conf. on Computer-Aided Design, pages 8-12, 2000. 
[26] J. M. Lin and Y. W. Chang. TCG: A Transitive Closure Graph-Based Representation 
for Non- Slicing Floorplans. Design Automation Conference, pages 764-769, 2001. 
[27] H. Murata, K. Fujiyoushi, S. Nakatake and Y. Kajitani. Rectangle-Packing-Based 
Module Placement. Intl. Conf. on Computer-Aided Design, pages 472-479, 1995. 
[28] S. Nakatake, K. Fujiyoushi, H. Murata and Y. Kajitani. Module Placement on
BSG-Structure and IC Layout Applications. Intl. Conf. on Computer-Aided Design, pages
484-491, 1996.
[29] Y. X. Pang, C. K. Cheng and T. Yoshimura. An Enhanced Perturbing Algorithm for 
Floorplan Design Using the O-tree Representation. Intl. Symp. on Physical Design, 
pages 168-173, 2000. 
[30] K. Sakanushi and Y. Kajitani. The Quarter-State Sequence (Q-Sequence) to Represent
the Floorplan and Applications to Layout Optimization. Asian Pacific Conf.
Circuits and Systems, pages 829-832, 2000.
[31] X. P. Tang and D. F. Wong. FAST-SP: A Fast Algorithm for Block Placement based 
on Sequence Pair. Asia South Pacific Design Automation Conference, pages 521-526, 
2001. 
[32] D.F. Wong and C.L. Liu. A New Algorithm for Floorplan Design. Design Automation 
Conference, pages 101-107, 1986. 
[33] B. Yao, H. Chen, C.K. Cheng and R. Graham. Revisiting Floorplan Representations. 
Intl. Symp. on Physical Design, pages 138-143, 2001. 
[34] S. Zhou, S. Dong, X. Hong, Y. Cai and C.-K. Cheng. ECBL: An Extended Corner
Block List with Solution Space Including Optimum Placement. Intl. Symp. on
Physical Design, pages 156-161, 2001.
[35] C. Zhuang, K. Sakanushi, L. Jin and Y. Kajitani. An Enhanced Q-Sequence Augmented
with Empty-Room-Insertion and Parenthesis Trees. Design, Automation and
Test in Europe, pages 61-68, 2002.
[36] N. Sherwani. Algorithms for VLSI Physical Design Automation, 3rd Edition. Published
by Kluwer Academic Publishers, 1999.
[37] H. M. Chen, H. Zhou, F. Y. Young, D. F. Wong, H. H. Yang, and N. Sherwani. 
Integrated floorplanning and interconnect planning. Intl. Conf. on Computer-Aided 
Design, pages 354-357, 1999.
[38] A. Ranjan, K. Bazargan and M. Sarrafzadeh. Fast hierarchical floorplanning with 
congestion and timing control. Intl. Conf. of Computer Design, pages 357-362, 2000. 
[39] C. K. Cheng, P. Sarkar and V. Sundararaman. Routability-driven repeater block
planning for interconnect-centric floorplanning. Intl. Symp. Physical Design, pages
186-191, 2000.
[40] C. W. Sham, W. C. Wong and F. Y. Young. Congestion Estimation with buffer 
planning in floorplan design. Design, Automation and Test in Europe Conf. and 
Exhibition, pages 696-701, 2002. 
[41] J. Lou, S. Krishnamoorthy and H. S. Sheng. Estimating routing congestion using
probabilistic analysis. Intl. Symp. on Physical Design, pages 112-117, 2001.
[42] M. Wang and M. Sarrafzadeh. On the behavior of congestion minimization during 
placement. Intl. ASIC Conf. and Exhibit, pages 145-150, 1999. 
[43] C. L. E. Cheng. RISA: Accurate and efficient placement routability modeling. Intl.
Conf. Computer-Aided Design, pages 690-697, 1994.
[44] W. Hou, H. Yu, X. Hong, Y. Cai, W. Wu, J. Gu and W. H. Kao. A new
congestion-driven placement algorithm based on cell inflation. Asia South Pacific Design
Automation, pages 605-608, 2001.
[45] X. Yang, R. Kastner, and M. Sarrafzadeh. Congestion reduction during top-down 
placement. Int. Conf. on Computer-Aided Design, pages 573-576, 2001. 
[46] U. Brenner and A. Rohe. An efficient congestion-driven placement framework. IEEE
Transaction of Computer-Aided Design, Volume 22, pages 387-394, 2003.
[47] S. T. W. Lai, F. Y. Young and C. N. Chu. A new and efficient congestion evaluation
model in floorplanning: Wire density control with Twin Binary Trees. Design,
Automation and Test in Europe Conf. and Exhibition, pages 856-861, 2003.
[48] S. M. Sait and H. Youssef. VLSI Physical Design Automation, page 117. Published 
by IEEE Press, 1995. 
[49] F. Y. Young, C. N. Chu and Z. C. Shen. Twin binary sequences: A non-redundant 
representation for general non-slicing floorplan. Intl. Sym. on Physical Design, pages 
457-469, 2002. 
[50] F. Shahrokhi and D. W. Matula. The maximum concurrent flow problem. Journal
of ACM, pages 318-334, 1990.
[51] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein. Introduction to Algorithms,
2nd Edition. Published by MIT Press, 2001.
[52] A. E. Caldwell, A. B. Kahng and I. L. Markov. Can Recursive Bisection Alone 
Produce Routable Placements? Design Automation Conf, pages 153-158, 2000. 
[53] P. Hung and M. J. Flynn. Stochastic congestion model for VLSI systems. Stanford
Univ., CSL-TR-97-737.
[54] P. N. Parakh, R. B. Brown, and K. A. Sakallah. Congestion driven quadratic placement.
Design Automation Conf, pages 275-278, 1998.
[55] R. S. Tsay, S. C. Chang, and J. Thorvaldson. Early wireability checking and 2-D
congestion-driven circuit placement. Intl. ASIC Conf. and Exhibit, pages 50-53, 1992.
[56] S. Mayrhofer and U. Lauther. Congestion-driven placement using a new multipartitioning
heuristic. Intl. Conf. on Computer-Aided Design, pages 332-335, 1990.
[57] M. Wang and M. Sarrafzadeh. On the behavior of congestion minimization during 
placement. Intl. ASIC Conf. and Exhibit, pages 145-150, 1999. 
[58] M. Wang and M. Sarrafzadeh. Modeling and minimization of routing congestion. 
Asian and South Pacific Design Automation Conf., pages 185-190, 2000.
[59] M. Wang, X. Yang, K. Eguro, and M. Sarrafzadeh. Multi-Center Congestion Estimation
and Minimization during Placement. Intl. Symp. on Physical Design, pages
147-152, 2000.
[60] Z. Shen and C. N. Chu. Accurate and Efficient Flow based Congestion Estimation
in Floorplanning. Asian and South Pacific Design Automation Conf., 2004.
[61] http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/RSMT/RMST. From VLSI CAD 
lab at Computer Science Department of University of California at San Diego. Date 
retrieved: May 15, 2004. 
[62] http://vlsicad.eecs.umich.edu/BK/ISPD02bench/. From VLSI CAD lab at Electrical
Engineering and Computer Science Department of University of Michigan. Date
retrieved: May 15, 2004.
[63] N. Viswanathan and C. N. Chu. FastPlace: Efficient Analytical Placement using Cell 
shifting, Iterative Local Refinement and a Hybrid Net Model. Intl. Sym. on Physical 
Design, 2004. 
[64] http://www.cs.ucla.edu/ kastner/labyrinth/. From VLSI CAD lab at Computer Science
Department of University of California at Los Angeles. Date retrieved: May 15,
2004.
[65] W. Shi and C. Su. The rectilinear Steiner arborescence problem is NP-complete. 
ACM-SIAM Symp. on Discrete Algorithms, pages 780-786, Jan. 2000. 
[66] C. Y. Lee. An algorithm for path connections and its application. IRE Transaction
on Electronic Computers, Vol. EC-10, pages 346-365, 1961.
[67] Y. Yang, Q. Zhu, T. Jing, X. Hong and Y. Wang. Rectilinear Steiner minimal tree
among obstacles. ASIC 5th Intl. Conf., Vol. 1, pages 348-351, Oct. 21-24, 2003.
[68] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. Wiley, England,
1990.
[69] J. Ganley and J. P. Cohoon. Routing a Multi-Terminal Critical Net: Steiner Tree 
Construction in the Presence of Obstacles. Intl. Symp. on Circuits and Systems, Vol. 
1, pages 113-116, 1994. 
[70] K. L. Clarkson, S. Kapoor and P. M. Vaidya. Rectilinear shortest paths through
polygonal obstacles in O(n log^2 n) time. ACM Symp. on Computational Geometry,
pages 251-257, 1987.
[71] K. L. Clarkson, S. Kapoor and P. M. Vaidya. Rectilinear shortest paths through
polygonal obstacles in O(n log^(3/2) n) time. Unpublished manuscript.
[72] H. Zhou, N. Shenoy, and W. Nicholls. Efficient spanning tree construction without
Delaunay triangulation. Information Processing Letters, 81(5), 2002.
ACKNOWLEDGEMENTS 
I would like to take this opportunity to express my thanks to those who helped me 
with various aspects of conducting research and the writing of this dissertation. First of 
all, I would like to thank my God for His bountiful grace to give me strength and wisdom 
to finish this work. I am also greatly indebted to my thesis advisor Professor Chris 
Chu for introducing me to the area of Physical Design. It is to his constant motivation 
and encouragement that I owe this work. I really appreciate his guidance and support 
throughout this research and the writing of this dissertation. 
I would also like to thank my committee members for their efforts and contributions 
to this work: Dr. Randall Geiger, Dr. Degang Chen, Dr. Akhilesh Tyagi and Dr. Lu 
Ruan. 
I thank all the former members of the VLSI CAD group in the ECE department of Iowa
State University: Sampath Dechu, Nataraj Viswanathan, Rejesh and Min Pan.
I also thank my co-workers at Cadence for their rich input and useful feedback on this
dissertation. I especially thank Ying Meng Li, John Shu and Robert Tien for their
contributions to this dissertation.
Last but not least, I would like to thank my parents, my parents-in-law, my dear
wife Jane, and baby Samuel, for their strong support during the past months.
