New Strategies for High Performance VLSI Physical Design by Xiang, Hua
NEW STRATEGIES FOR HIGH PERFORMANCE VLSI PHYSICAL DESIGN
BY
HUA XIANG
B.S., Peking University, 1997
M.S., Peking University, 2000
M.S., University of Texas at Austin, 2002
DISSERTATION
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Computer Science
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2004
Urbana, Illinois
© 2004 by Hua Xiang. All rights reserved.
NEW STRATEGIES FOR HIGH PERFORMANCE VLSI PHYSICAL DESIGN
Hua Xiang, Ph.D.
Department of Computer Science
University of Illinois at Urbana-Champaign, 2004
Martin D. F. Wong, Ph.D., Adviser
Physical design plays an important role in connecting front-end design and back-end
design in chip development. In this thesis, we solve several important problems in physical
design of VLSI circuits.
Chapter 2 addresses a floorplan problem that considers floorplanning and bus planning
simultaneously. We propose an efficient evaluation algorithm to transform a sequence pair
to a floorplan with buses inserted. Then simulated annealing is used to search for an optimal
or near optimal solution.
Chapter 3 addresses a wire planning problem with the bounded over-the-block con-
straint. Two exact polynomial-time algorithms are presented, and both algorithms are
guaranteed to find an optimal routing solution for a two-pin net as long as one exists.
Chapters 4 and 5 are based on a min-cost max-flow algorithm. In Chapter 4, we present
the first polynomial-time algorithm for simultaneous pin assignment and routing for all
two-pin nets between one source block and all other blocks. In Chapter 5, we propose a
polynomial-time algorithm for integrated pin assignment and buffer insertion.
Chapters 6 and 7 address ECO (engineering change order) problems. Chapter 6 presents
two algorithms to resolve overlaps between power rails and signal wires that are
introduced by power rail redesign. In Chapter 7, we propose an algorithm to eliminate
capacitive crosstalk violations.
To Father and Mother
ACKNOWLEDGMENTS
I would like to thank my adviser, Professor Martin D. F. Wong, for his constant guidance
and invaluable support throughout my graduate study. I have benefited greatly from his deep
insight into technical problems and his valuable advice on my research.
I am also grateful to the other members of my committee — Professors Janak Patel,
Lenny Pitt, and Josep Torrellas — for their interest in my work.
Also I would like to acknowledge my colleagues for their cooperation and many thought-
ful technical discussions. I am grateful to all the wonderful friends I made during my grad-
uate study. Their friendship made my graduate study much more productive and enjoyable.
Finally, I would like to express my special thanks to my parents for their love, encour-
agement, and understanding throughout my graduate study.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BUS-DRIVEN FLOORPLANNING
2.1 Introduction
2.2 Preliminary
2.3 Problem Formulation
2.4 Bus Ordering via Sequence Pair
2.4.1 A necessary condition for one bus
2.4.2 Bus ordering between two buses
2.4.3 Multiple bus ordering
2.5 Evaluation Algorithm
2.5.1 Feasible Bus Checking Orientation
2.5.2 Bus Ordering
2.5.3 Modified LCS Computation
2.6 BDF Algorithm
2.6.1 Perturbation (Move)
2.6.2 Cost Function
2.7 Soft Block Adjustment
2.8 Experimental Results
2.9 Conclusion
CHAPTER 3 WIRE PLANNING WITH BOUNDED OVER-THE-BLOCK WIRE CONSTRAINTS
3.1 Introduction
3.2 Problem Formulation
3.3 WP Algorithm
3.3.1 WP-Path algorithm
3.3.2 WP-Split algorithm
3.3.3 Comparison
3.4 Experimental Results
3.5 Conclusion
CHAPTER 4 SIMULTANEOUS PIN ASSIGNMENT AND ROUTING
4.1 Introduction
4.2 Problem Definition
4.3 The Algorithm
4.4 Applications
4.4.1 ECO
4.4.2 Improvement on any given solution
4.4.3 Multiple-pin nets
4.5 Experimental Results
4.6 Conclusion
CHAPTER 5 INTEGRATED PIN ASSIGNMENT AND BUFFER PLANNING
5.1 Introduction
5.2 Pin Assignment and Buffer Planning for One Source Block (PBO)
5.3 The Algorithm
5.4 Pin Assignment and Buffer Planning (PB)
5.5 Improvement with Node Clustering
5.6 Experimental Results
5.7 Conclusion
CHAPTER 6 ECO ALGORITHMS FOR REMOVING OVERLAPS BETWEEN POWER RAILS AND SIGNAL WIRES
6.1 Introduction
6.2 PSO (Power rail - Signal wire Overlap) Problem
6.3 FP-Range (Fixed-Pin-decided Range)
6.4 Consistency Graph
6.5 PSO-H Algorithm
6.6 PSO-G Algorithm
6.7 Experimental Results
6.8 Conclusion
CHAPTER 7 AN ECO ALGORITHM FOR ELIMINATING CROSSTALK VIOLATIONS
7.1 Introduction
7.2 Crosstalk Violation Elimination
7.3 Preliminaries
7.3.1 FP-Range
7.3.2 Crosstalk model
7.4 CVE Algorithm
7.4.1 FCVE algorithm
7.4.2 SCVE
7.5 Optimization
7.5.1 Node clustering
7.5.2 Edge omitting
7.6 Experimental Results
7.7 Conclusion
CHAPTER 8 CONCLUSION
8.1 Summary
8.2 Future Research
REFERENCES
APPENDIX
VITA
LIST OF FIGURES
FIGURE
1.1 The design flow of a chip.
2.1 Two floorplans have the same chip size. (a) Two buses u (A, C) and v (B, E,
H) are assigned. (b) Neither of the buses can be assigned.
2.2 A feasible horizontal bus $u = \langle H, t, \{A, B, C\} \rangle$. $y_{max} = y_c + h_c$,
$y_{min} = y_b$, and $y_{max} - y_{min} \ge t$.
2.3 A necessary condition for one bus. (a) The sequence pair must be
(... A ... D ... B ... C ..., ... A ... D ... B ... C ...) to fit in a horizontal bus.
(b) The sequence pair must be (... B ... C ... A ..., ... A ... C ... B ...) to fit in a
vertical bus.
2.4 Cases of relative positions of two horizontal buses.
2.5 Two kinds of cycles in bus ordering constraint graphs. (a) Two buses are cross-
ing. (b) The bus ordering constraint graph corresponding to (a). (c) Three
buses are crossing. (d) The bus ordering constraint graph corresponding to (c).
2.6 Independent set problem and node-deleting problem. (a) An instance of inde-
pendent set problem (ISP). (b) $G_d$ is a horizontal bus ordering constraint graph.
2.7 Node Deleting Algorithm. (a) An instance of bus ordering constraint graph G.
(b) Nodes whose in-degree or out-degree is zero are removed from G. (c) Node
c is deleted from G in order to break cycles. (d) The residual acyclic graph of
G after deleting c and i.
2.8 Insert two horizontal buses to the floorplan represented by (A D E B C F G,
E D A F B C G). (a) One horizontal bus {A, B, C} is assigned. (b) In order
to insert another bus {B, E, G}, blocks A and D have to move up and this
makes the bus {A, B, C} changed, too.
2.9 (a) Two buses overlap due to basic alignment adjustment. (b) Assignment of
two buses without overlap.
2.10 (Case A) Two buses share blocks A and B. Bus Overlap may happen. (Case
B) The blocks of two buses appear interlaced along x-axis. Bus Overlap may
happen. (Case C) Two buses have no overlaps along x-axis. Bus Overlap is
impossible.
2.11 Soft block adjustment. (a) A BDF solution. Block E is on an LCS path. (b)
The new BDF solution after changing the shape of block E.
2.12 The result packing of ami49-2 after soft block adjustment. There are 49 blocks
and 12 buses.
2.13 An optimal packing of grid7. There are 49 blocks and 14 buses.
3.1 The routing illustrated by thin lines is not valid; while the routing shown by
wide lines is a feasible solution.
3.2 (a) A routing-block $B_i$ is divided into $3 \times 3$ subblocks, and the interconnect
bound is 3. (b) A graph denotes all valid OB-wires within $B_i$.
3.3 (a) The two nodes $r_i^j$ and $r_i^k$ are within the same routing-block $B_i$. (b) Each
node is split into an in-node and an out-node.
3.4 The solid lines indicate a routing solution between blocks $B_1$ and $B_6$.
3.5 The corresponding path graph $G_p$. The wide lines illustrate a shortest path
from $u_1^1$ to $u_6^1$.
3.6 (a) $r_i^1$, $r_i^2$, $r_i^3$, and $r_i^4$ are subblocks of a routing-block $B_i$. (b) Each
subblock is represented by a node array.
3.7 The corresponding split graph for Figure 3.4. The wide lines illustrate a short-
est path from $v_1^1[1]$ to $v_6^1[1]$.
3.8 The comparison of WP-Path and WP-Split on the relationship between running
time and the number of nets.
4.1 (a) The two-step approach fails to route all nets. (b) The optimal solution of
pin assignment and routing by our approach.
4.2 (a) The net-by-net approach fails to route all nets. (b) The optimal solution of
pin assignment and routing by our approach.
4.3 A routing grid graph for two layers.
4.4 (a) A PAR problem in detailed routing. (b) The corresponding network graph.
4.5 (a) A solution in a flow network. A flow $f_1$ goes from p to q; and another flow
$f_2$ goes from q to p. (b) Another solution with less cost.
4.6 Node splitting for capacitated nodes. The capacity of the new edge is U(r) and
its cost is 0.
4.7 (a) A flow f in the network in Figure 4.4(b), $|f| = 3$. (b) The corresponding
solution of pin assignment and routing for the 3 nets in the problem of Figure
4.4(a).
4.8 (a) The initial pin assignment and routing solution. (b) The solution obtained
by applying PAR-by-Flow on Block A satisfying the new requirement.
4.9 Illustration of improvement on a given solution. (a) Illustration of net connec-
tions among Block A, B, C and D. (b) Initial net-by-net solution based on
the min-cost path approach. 5 nets (2 between B and C; 3 between C and
D) are not routed. The total cost is 51. (c) The solution after applying PAR-
by-Flow on Block A. Cost is reduced by 15. (d) The solution after applying
PAR-by-Flow on Block B. Two more nets between B and C are routed. (e)
The solution after applying PAR-by-Flow on Block C. All nets are routed
(3 more nets) with less cost (from 51 to 50). (f) The solution after applying
PAR-by-Flow on Block D. Nothing is changed.
4.10 Illustration of improvement for a pin assignment and routing of two/multiple-
pin nets. (a) A one-layer pin-assignment and routing solution. (b) When Block
A is selected to be the source block, all nets connecting to A are removed
to reroute. The routing e between B and C should not be changed. (c) The
corresponding flow network graph. (d) A flow f ($|f| = 4$) in the network.
4.11 An improved solution of pin assignment and routing of two/multiple-pin nets.
4.12 Two-layer pin assignment and routing for X18. (a) Net-by-net solution. (b)
The solution obtained by applying our method on (a).
5.1 (a) Three nets use 3 buffers and the total wire length is 19. (b) An optimal
solution with 1 buffer and wire length 14.
5.2 (a) A PBO problem with 3 macro blocks and 3 buffer blocks. (b) The corre-
sponding flow network graph.
5.3 Node splitting for capacitated nodes. The new edge has capacity U(v) and cost
C(v).
5.4 (a) A flow f in the network in Figure 5.2(b), $|f| = 3$. (b) The corresponding
solution of pin assignment and buffer planning to the PBO problem of Figure
5.2(a).
5.5 (a) A PB problem with 3 macro blocks and 3 buffer blocks. (b) The corre-
sponding flow network when $b_1$ is the source block.
5.6 The corresponding flow network when $b_2$ is the source block.
5.7 The corresponding flow network of the PB problem in Figure 5.5(b) using the
node clustering method.
5.8 (a) A flow f, $|f| = 3$, flows through a supernode. (b) The corresponding net-
work and a flow solution. (c) Deriving connections for original pin nodes.
6.1 (a) Some horizontal signal wire segments on M5 overlap with P1 and P2. (b)
A feasible PSO solution. (c) A solution with violations.
6.2 Wire separation requirement illustration.
6.3 (a) A PSO problem. (b) Overlaps: vertical segments $a'$ and $b'$ on M4; and
horizontal segments c and d on M5.
6.4 FP-Range illustration. The tiny squares are fixed pins.
6.5 Three cases of vertical overlaps.
6.6 Illustration of the FP-Range calculation when the width is taken into consider-
ation.
6.7 A routing solution of signal wires on the top layer.
6.8 (a) Full connections of adjacent segments. (b) Consistency graph.
6.9 Illustration of Onodes/Rnodes.
7.1 (a) A routing solution with crosstalk violations. (b) A routing solution with
overlap violations.
7.2 Segments A and C have capacitive crosstalk; while the crosstalk between seg-
ments B and C is zero.
7.3 (a) B, C, and D are three children of A. The position of A is fixed. (b) B is
first selected and put to its highest available position. (c) A solution according
to our approach.
7.4 (a) A CVEP problem. There are 4 signal wire segments $A_1$, $A_2$, $A_3$, and $A_4$,
and 1 power rail P. (b) The consistency graph is a path.
7.5 (a) A CVEP problem. (b) FSP graph G of the CVEP problem. (c) SP graph G
of the CVEP problem. (d) A feasible solution to the CVEP problem.
7.6 (a) $A_i$ is a wire segment and it has 12 available positions. (b) Every three nodes
are clustered as a “supernode”.
7.7 (a) FSP graph of a CVEP problem. p is a feasible position of $A_{i-1}$. (b) SP
graph of the CVEP problem.
7.8 (Case 1) u is the closest feasible position to $y_i$. Only one edge is needed. (Case
2) $y_i$ is the only feasible position in $(B_u, 2y_i - u)$. Two edges are added. (Case
3) There are two feasible positions in $(2y_i - u, B_l)$. Three edges are added.
LIST OF TABLES
TABLE
2.1 Test set 1.
2.2 Test sets 2 and 3.
3.1 Algorithm comparison.
3.2 Test results of WP-Path and WP-Split algorithms.
4.1 Average results of 10 times for detailed routing test files. All nets are routed
after refinement by RepIMProve-by-PAR.
4.2 Average results of 10 times for global routing test files. All nets are routed after
refinement by RepIMProve-by-PAR.
5.1 Average results of PB-Flow for 5 times. All nets are found using PB-Flow
algorithm.
6.1 Average results of PSO-H and PSO-G for 5 times.
7.1 Test files of CVE problem.
7.2 Test results of CVE problem.
7.3 Optimization for test file N3.
CHAPTER 1
INTRODUCTION
As very large scale integration (VLSI) technology advances toward ultradeep submicron
dimensions, designs are becoming increasingly complex, with millions of layout objects on a
monolithic chip. As a result, the chip design cycle becomes longer. However, because of
the rapid pace of technology development and tight marketing schedules, the chip design
time has to be short enough to satisfy time-to-market considerations. Therefore, sophisticated
computer-aided design (CAD) tools and methodologies are badly needed, and they
are widely used to facilitate chip development.
In general, the top-down design flow of a device can be abstracted into the following
steps, as illustrated in Figure 1.1 [1]. A design starts from specifications that describe the
behavior of the target chip. Next, architectural design defines larger “circuit modules”
such as arithmetic units, memory units, etc. Then the defined architecture is mapped to
the logic structure in the logic design stage. Finally, physical design transforms the logic
structure from the previous stage to geometric shapes that are used in the fabrication of the
chip. Physical design therefore plays an important role: it connects front-end and
back-end design. Moreover, the quality of physical design tools directly determines the
performance and cost of the final product.
Specification -> Architectural Design -> Logical Design -> Physical Design -> Fabrication -> Chip
Figure 1.1 The design flow of a chip.
In this thesis, we study several important problems in physical design. In Chapter 2, we
propose bus-driven floorplanning. In Chapter 3, we study the problem of wire planning with
bounded over-the-block wires. In Chapter 4, we address the simultaneous pin-assignment
and routing problem. In Chapter 5, we present an integrated approach for pin-assignment
and buffer insertion. In Chapters 6 and 7, we propose algorithms to solve some ECO
(engineering change order) problems. The results presented in these chapters can be briefly
summarized as follows.
In Chapter 2, we propose an integrated approach for floorplanning and bus planning,
i.e., bus-driven floorplanning (BDF). We are given a set of circuit blocks and bus specifi-
cations (i.e., the net list of blocks for the buses). A feasible BDF solution is a placement
of all circuit blocks such that each bus can be realized as a rectangular strip (horizontal or
vertical) going through all the blocks connected by the bus. The objective is to determine
a feasible BDF solution that minimizes the floorplan area and the total bus area. Our ap-
proach is based upon the sequence-pair floorplan representation. After a careful analysis
of the relationship between bus ordering and block ordering in the floorplan represented by
a sequence pair, we derive feasibility conditions on sequence pairs that give feasible BDF
solutions. We tested the approach on three sets of test files (MCNC benchmarks, industry
test files, and bus grid test files) and obtained excellent results.
In Chapter 3, we address the problem of wire planning (WP) with bounded over-the-
block wires. The constraints on over-the-block wires help the longest over-the-block wires
within a block satisfy signal integrity requirements without buffer insertion. We present two
exact polynomial-time algorithms to solve the WP problem. Both algorithms are guaranteed
to find an optimal routing solution for a two-pin net as long as one exists. One requires less
memory, while the other may run faster when processing a large number of
nets. Users can choose the appropriate algorithm according to their application
requirements.
In Chapter 4, we present an algorithm for simultaneous pin-assignment and routing. In
previous works, the algorithms for these problems can be classified into two categories:
(1) a two-step approach where pin assignment is followed by routing, and (2) a net-by-net
approach where pin assignment and routing for a single net are performed simultaneously.
But none of the existing algorithms is “exact” in the sense that they may fail to route
all nets even though a feasible solution exists. This remains true even if only two-
pin nets with fixed pins between two blocks are concerned. In this chapter, we consider
the problem of two-pin net connections from one macro block to all other blocks, and
we present the first polynomial-time exact algorithm for simultaneous pin assignment and
routing for all two-pin nets between one block (source block) and all other blocks. In
addition to finding a feasible solution whenever one exists, it is guaranteed to find a pin-
assignment/routing solution with minimum cost $\alpha \cdot W + \beta \cdot V$, where W is the total wire
length and V is the total number of vias. Our algorithm has various applications: (1) It is
suitable in ECO situations where an existing solution is modified incrementally. (2) Given
any pin assignment and routing solution obtained by any existing method, our algorithm
can be used to increase the number of routed nets and reduce the routing cost. Furthermore,
it provides an efficient algorithm for the pin assignment and routing problem of all blocks.
The method is applicable to both global and detailed routing with arbitrary routing obstacles
on multiple layers.
In Chapter 5, we present a polynomial-time exact algorithm for integrated pin assign-
ment and buffer planning for all two-pin nets from one macro block (source block) to all
other blocks. Moreover, the algorithm is guaranteed to minimize the total cost
$\alpha \cdot W + \beta \cdot R$ for any positive $\alpha$ and $\beta$, where W is the total wire length and R is the
number of buffers. Applying this algorithm iteratively (i.e., each time picking one block as
the source block) yields a polynomial-time algorithm for pin assignment and buffer planning
for nets among multiple macro blocks. Experimental results demonstrate that this approach
is efficient and effective.
In Chapter 6, we address the PSO (power rail - signal wire overlap) problem, which asks
for the removal of overlaps between power rails and signal wires on the top layer of a
multiple-layer routing region under certain constraints. PSO problems are frequently caused by changes
in the power delivery system or package design. Efficient and graceful solutions to PSO
are needed due to design constraints and tight schedules during the late ECO stages. In
this chapter, we propose two algorithms to remove the overlaps between power rails
and signal wires. Both algorithms are guaranteed to find a feasible solution as long as one
exists. One is faster, while the other attempts to minimize the total deviation as well
as the maximum deviation. For a set of industrial test circuits, we were able to remove all
overlaps between power rails and signal wires with minimal wire deviation.
In Chapter 7, we address the CVE (crosstalk violation elimination) problem. After
changes to a multiple-layer routing design, post-layout timing/noise analysis may show
that the total capacitive crosstalk on some signal wire segments exceeds the allowable
bounds. The goal is to find a new routing solution without crosstalk violations under
certain constraints that help keep the new design close to the original one. We propose
a two-stage algorithm to solve the CVE problem, and present optimization strategies to
speed up the execution. One possible application of the CVE algorithm is to eliminate
crosstalk violations in solutions to the PSO problem of Chapter 6.
Finally, in Chapter 8, we summarize our research and propose some directions for future
research work.
CHAPTER 2
BUS-DRIVEN FLOORPLANNING
2.1 Introduction
As deep submicron technology advances, chips become more congested even though
more metal layers are used for routing. Usually a chip includes several buses. As designs
increase in complexity, bus routing becomes a heavy task, especially for networking
chips or data processors. Since buses have different widths and go through several module
blocks, the positions of macro blocks greatly affect bus planning. To ease bus routing and
avoid unnecessary iterations in physical design, we need to consider bus planning in the
early floorplanning stage.
In this chapter, we address the problem of bus-driven floorplanning (BDF). We use the top
two layers for bus planning, and buses run either horizontally or vertically on one layer in
the floorplanning stage. This simple bus structure is efficient to handle at the planning stage
and facilitates bus routing in later stages. Furthermore, a more complicated bus structure
can always be decomposed into several horizontal/vertical bus segments.
Informally, the problem can be described as follows. Given a set of rectangular macro
blocks and the bus specifications (i.e., the net list of blocks for the buses), find a placement
of all circuit blocks such that each bus can be realized as a rectangular strip (horizontal or
vertical) going through all the blocks connected by the bus. At the same time, the chip area
as well as the total bus area is minimized.
Figure 2.1 Two floorplans have the same chip size. (a) Two buses u (A, C) and v (B, E,
H) are assigned. (b) Neither of the buses can be assigned.
Figure 2.1 gives an example. Figure 2.1(a) and (b) are two floorplans with the same
chip size. Two buses u (A, C) and v (B, E, H) are placed in the floorplan of Figure 2.1(a).
However, neither of the buses can be assigned based on the floorplan in Figure 2.1(b) since
blocks B, E and H are not aligned, and the vertical overlap between blocks A and C is
less than the width of bus u.
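The overlap condition used in this example can be illustrated with a small check. The sketch below is only an illustration and not the evaluation algorithm of this chapter: it assumes each block is an axis-aligned rectangle stored as a hypothetical (x, y, width, height) tuple with (x, y) as its lower-left corner, and it tests whether the vertical ranges of all blocks on a horizontal bus share an overlap at least as tall as the bus width t. The function name and all coordinates are invented for illustration.

```python
def horizontal_bus_fits(blocks, t):
    """Return True if a horizontal strip of width t can cross every block.

    blocks: list of (x, y, w, h) tuples, (x, y) = lower-left corner
            (an assumed layout encoding, not taken from the thesis).
    t: bus width.
    """
    # The lowest top edge and the highest bottom edge bound the
    # vertical range that is common to all blocks on the bus.
    top = min(y + h for (_x, y, _w, h) in blocks)
    bottom = max(y for (_x, y, _w, _h) in blocks)
    return top - bottom >= t


# Hypothetical coordinates: A and C overlap vertically on the range [3, 4].
block_a = (0, 0, 2, 4)
block_c = (5, 3, 2, 4)
print(horizontal_bus_fits([block_a, block_c], 1))  # the overlap is tall enough
print(horizontal_bus_fits([block_a, block_c], 2))  # the overlap is too short
```

A vertical bus would be checked the same way with the roles of x/width and y/height exchanged.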
In previous works, researchers have discussed some particular kinds of floorplan con-
straints related to alignment. However, these kinds of alignment constraints are not suitable
for bus-driven floorplanning. Young et al. [2] handle a kind of alignment in which the
modules involved are required to be aligned along their left (right/bottom/top) sides.
But this is not necessary in BDF problems. For example, the bottom sides of blocks B,
E and H are not aligned, but bus v still fits in the floorplan in Figure 2.1(a). Tang and
Wong [3] proposed another alignment constraint in which several blocks are aligned in a
row, abutting with each other. But blocks involved in one bus do not need to be placed
adjacent to each other. In Figure 2.1(a), A and C are not adjacent while bus u is assigned.
Liu et al. [4] discussed a predefined-coordinate alignment constraint in which some blocks
are to be placed along a predefined coordinate within a small region. In BDF, there are no
constraints on coordinates. Rafiq et al. [5, 6] proposed bus-based integrated floorplanning.
However, a bus as defined in their work is composed of bundles of wires connecting only
two blocks, and bus assignment is accomplished by global routing.
Most floorplan algorithms use simulated annealing to search for an optimal solution.
The implementation of the simulated annealing scheme depends on a floorplan representa-
tion where a neighbor solution is generated and examined by perturbing the representation
(called ‘move’). In this chapter, we use the sequence pair representation and analyze the
relationship between bus ordering and sequence pair representation. Then a fast evaluation
algorithm is proposed to transform a sequence pair representation to a floorplan with buses
inserted. The algorithm first derives a bus ordering based on some necessary conditions.
Then a modified longest common subsequence algorithm is applied to decide block po-
sitions as well as bus assignments. Moreover, we also develop an efficient algorithm to
handle soft modules to further improve solution quality. Experimental results on three sets
of test files (MCNC benchmarks, industry test files, and bus grid test files) demonstrate the
effectiveness and efficiency of our approach.
The rest of the chapter is organized as follows. Section 2.2 provides background infor-
mation on the sequence pair representation. The formal definition of the BDF problem is
given in Section 2.3. In Section 2.4, we analyze the relationship between bus ordering and
sequence pair representation. Then a fast evaluation algorithm is proposed to transform
a sequence pair to a BDF solution in Section 2.5. In Section 2.6, a simulated annealing
BDF algorithm is presented. Section 2.7 addresses how to handle soft blocks to improve
solution quality. Experimental results are given in Section 2.8, and Section 2.9
concludes the chapter.
2.2 Preliminary
A sequence pair is a pair of sequences of n elements representing a list of n blocks. In gen-
eral, a sequence pair imposes the relationship between any two blocks a and b as follows:
(i) If a is ahead of b in both sequences, a is to the left of b in the floorplan.
(ii) If a is ahead of b in the first sequence while behind b in the second sequence, a is
above b in the floorplan.
The original paper which proposed sequence pair [7] presented an algorithm to trans-
form a sequence pair to a floorplan in $\Theta(n^2)$ time. Recently, Tang et al. sped up the
evaluation algorithm to $O(n \log n)$ in [9], and later further to $O(n \log \log n)$ in [8].
The coordinates of blocks and the width and height of a floorplan can be obtained
by computing the longest common subsequence (LCS) of the two sequences [9, 8].
Given a sequence pair $(X, Y)$, the width of a floorplan equals the length of the longest
common subsequence of $X$ and $Y$ where weights are blocks' widths. Furthermore, given a
block $b$, let $(X, Y) = (X_1 b X_2,\ Y_1 b Y_2)$ and $LCS(X, Y)$ be the length of the longest common
subsequence of $(X, Y)$. Then the x-coordinate of block $b$ equals $LCS(X_1, Y_1)$ with
blocks' widths as weights. Similarly, the height of a floorplan is determined by
the longest common subsequence of $(X, Y^R)$ where $Y^R$ is the reverse of $Y$ and weights
are blocks' heights. Furthermore, all the computations of blocks' x/y coordinates can be
integrated into a single longest common subsequence computation for a sequence pair.
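The weighted LCS view of packing can be illustrated with a minimal sketch. It is a plain quadratic-time version, not the $O(n \log \log n)$ engine of [8]; the function name `weighted_lcs` and the three-block instance are hypothetical. To obtain y-coordinates measured from the bottom edge, the sketch reverses the first sequence, which is an assumption of this example; the floorplan height is the same as the one obtained from $(X, Y^R)$.

```python
def weighted_lcs(X, Y, weight):
    """Return (pos, total) for a sequence pair (X, Y).

    pos[b] is the weighted LCS of the prefixes that precede b in both
    sequences; total is the weighted LCS of the whole pair.
    """
    rank = {b: i for i, b in enumerate(Y)}  # position of each block in Y
    pos, total = {}, 0
    for i, b in enumerate(X):
        # Heaviest chain of blocks that come before b in both X and Y.
        pos[b] = max(
            (pos[a] + weight[a] for a in X[:i] if rank[a] < rank[b]),
            default=0,
        )
        total = max(total, pos[b] + weight[b])
    return pos, total


# Hypothetical instance: A is above B, and both are to the left of C.
widths = {"A": 2, "B": 1, "C": 3}
heights = {"A": 1, "B": 2, "C": 1}
X, Y = ["A", "B", "C"], ["B", "A", "C"]
x_coord, chip_width = weighted_lcs(X, Y, widths)
y_coord, chip_height = weighted_lcs(X[::-1], Y, heights)
```
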
2.3 Problem Formulation
Suppose the routing region has multiple layers and buses can be assigned on the top two
layers. So the orientation of buses is either horizontal or vertical. The problem of bus-
driven floorplanning (BDF) can be defined as follows.
Problem 2.1 Bus-Driven Floorplanning (BDF) Given $n$ rectangular macro blocks
$B = \{b_i \mid i = 1, \ldots, n\}$ and $m$ buses $U = \{u_i \mid i = 1, \ldots, m\}$, each bus $u_i$ has a width $t_i$ and goes
through a set of blocks $B_i$ where $B_i \subseteq B$ and $|B_i| = k_i$. Decide the positions of macro
blocks and buses such that there is no overlap between any two blocks or between any two
horizontal (vertical) buses, and bus $u_i$ goes through all of its $k_i$ blocks. At the same time,
the chip area as well as the total bus area is minimized.
In BDF problems, buses should go through all of their related blocks. So the positions of
blocks greatly affect bus assignments. For convenience, let $\langle g, t, \{b_1, \ldots, b_k\} \rangle$ represent
a bus $u$ where $g \in \{H, V\}$ is the orientation, $t$ is the bus width, and $b_i$ ($i = 1, \ldots, k$) are
the blocks the bus goes through. For short, a bus is just represented as $\{b_1, \ldots, b_k\}$. Also let
$(x_i, y_i)$ be the lower-left corner of block $b_i$. And the width and height of block $b_i$ are $w_i$ and
$h_i$ respectively. In the following, we give the necessary conditions of a feasible horizontal
and vertical bus respectively.
Figure 2.2 A feasible horizontal bus $u = \langle H, t, \{A, B, C\} \rangle$. $y_{\max} = y_c + h_c$, $y_{\min} = y_b$,
and $y_{\max} - y_{\min} \ge t$.
Lemma 2.1 Feasible Horizontal Bus (H-Bus) If a horizontal bus $u = \langle H, t, \{b_1, \ldots, b_k\} \rangle$
is feasible, then $y_{\max} - y_{\min} \ge t$ where $y_{\max} = \min\{y_i + h_i \mid i = 1, 2, \ldots, k\}$ and
$y_{\min} = \max\{y_i \mid i = 1, 2, \ldots, k\}$.
Lemma 2.2 Feasible Vertical Bus (V-Bus) If a vertical bus $u = \langle V, t, \{b_1, \ldots, b_k\} \rangle$ is
feasible, then $x_{\max} - x_{\min} \ge t$ where $x_{\max} = \min\{x_i + w_i \mid i = 1, 2, \ldots, k\}$ and
$x_{\min} = \max\{x_i \mid i = 1, 2, \ldots, k\}$.
Figure 2.2 illustrates an H-bus $u = \langle H, t, \{A, B, C\} \rangle$. In order to fit in bus $u$, the
vertical overlap of the three blocks has to be at least the bus width $t$.
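The condition of Lemma 2.1 can be checked directly from block coordinates. The following is a minimal sketch; the function name `h_bus_window` and the sample coordinates are hypothetical and are not taken from Figure 2.2.

```python
def h_bus_window(blocks, t):
    """Check the necessary condition for a horizontal bus of width t.

    blocks is a list of (y, h) pairs: lower edge and height of each block
    the bus goes through. Returns (y_min, y_max) if the common vertical
    window is at least t, otherwise None.
    """
    y_max = min(y + h for y, h in blocks)  # lowest top edge
    y_min = max(y for y, _ in blocks)      # highest bottom edge
    return (y_min, y_max) if y_max - y_min >= t else None


# Hypothetical blocks A, B, C given as (y, h).
sample = [(2, 6), (4, 5), (1, 6)]
window = h_bus_window(sample, t=2)
```

The vertical-bus condition of Lemma 2.2 follows from the same function applied to (x, w) pairs.
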
2.4 Bus Ordering via Sequence Pair
A sequence pair always entails a packing if no constraints are given. However, when con-
straints are introduced, there may not exist a corresponding packing for some sequence
pairs.
In this section, we discuss the relationship between bus ordering and sequence pair
representation. First, a necessary condition is derived when only one bus is considered.
Then we discuss the relative positions of any two horizontal (vertical) buses imposed by a
sequence pair. Based on the analysis of the ordering of two buses, we set up a bus ordering
constraint graph and propose an algorithm to remove infeasible buses.
2.4.1 A necessary condition for one bus
Since blocks cannot overlap in a BDF solution, blocks have at most one-dimension overlap;
i.e., if the projections on the x-axis of two blocks overlap, their projections on the y-axis
cannot overlap. On the other hand, if the projections on the y-axis of two blocks overlap,
their projections on the x-axis cannot overlap. However, in order to fit in a horizontal
(vertical) bus $\{b_1, \ldots, b_k\}$, the projections on the y-axis (x-axis) of $b_i$ and $b_j$
($i, j = 1, \ldots, k$; $i \ne j$) must overlap. In other
words, the position relationship of any two related blocks has to be left-right (below-above).
Thus we have the following necessary condition.
Theorem 2.1 (Block Ordering) Given a sequence pair $(X, Y)$ and a bus $u = \{b_1, \ldots, b_k\}$,
if $u$ is feasible, then the ordering of the $k$ blocks should be either the same or reverse in the
two sequences $X$ and $Y$. Furthermore, if the $k$ blocks appear in the same order in both $X$
and $Y$, the orientation of the bus is horizontal; otherwise the bus is vertical.
Figure 2.3 A necessary condition for one bus. (a) The sequence pair must be
(... A ... D ... B ... C ..., ... A ... D ... B ... C ...) to fit in a horizontal bus. (b) The
sequence pair must be (... B ... C ... A ..., ... A ... C ... B ...) to fit in a vertical bus.
For convenience, this necessary condition is also called block ordering. Figure 2.3
gives two examples. The sequence pair for Figure 2.3(a) is (... A ... D ... B ... C ...,
... A ... D ... B ... C ...), and a horizontal bus $\{A, B, C, D\}$ can be assigned. Figure 2.3(b)
shows another example. The sequence pair is (... B ... C ... A ..., ... A ... C ... B ...)
and the bus is a vertical one $\{A, B, C\}$. Note that Theorem 2.1 deals with only one bus.
When multiple buses are considered, it is likely that some buses cannot be assigned for the
floorplan although each bus satisfies the necessary condition.
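The block ordering test of Theorem 2.1 amounts to comparing two subsequences. A minimal sketch follows; the function name `bus_orientation` is hypothetical, and the concrete sequences are the blocks named in Figure 2.3 with all other blocks omitted.

```python
def bus_orientation(X, Y, bus):
    """Return "H", "V", or None according to the block ordering condition.

    "H": the bus blocks appear in the same order in X and Y.
    "V": they appear in reverse order.
    None: the necessary condition is violated, so the bus is infeasible.
    """
    members = set(bus)
    in_x = [b for b in X if b in members]
    in_y = [b for b in Y if b in members]
    if in_x == in_y:
        return "H"
    if in_x == in_y[::-1]:
        return "V"
    return None


horizontal = bus_orientation("ADBC", "ADBC", {"A", "B", "C", "D"})
vertical = bus_orientation("BCA", "ACB", {"A", "B", "C"})
```

A bus with a single block passes both comparisons; this sketch reports it as horizontal, which is an arbitrary choice.
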
2.4.2 Bus ordering between two buses
The relative positions of blocks are determined by a sequence pair. Since buses go through
blocks, the ordering of buses is also influenced by the sequence pair.
Given a sequence pair $(X, Y)$ and two horizontal buses $u = \{a_1, a_2, \ldots, a_k\}$ and
$v = \{b_1, b_2, \ldots, b_l\}$, denote the block sets $S_u = \{a_1, a_2, \ldots, a_k\}$, $S_v = \{b_1, b_2, \ldots, b_l\}$, and
$S = S_u \cup S_v$. Suppose $|S| = L$ ($L \le k + l$ since the two buses may go through the same blocks)
and $(X, Y)$ satisfies block ordering for the two buses. Also we assume these $L$ blocks
appear in the sequence pair as $(\ldots c_1 \ldots c_2 \ldots \cdots \ldots c_L \ldots,\ \ldots d_1 \ldots d_2 \ldots \cdots \ldots d_L \ldots)$ where $c_i \in S$
and $d_i \in S$ ($i = 1, \ldots, L$), and the subsequence pair is $(X', Y') = (c_1 c_2 \ldots c_L,\ d_1 d_2 \ldots d_L)$.
For convenience, let $p[c_i] = i$ ($i = 1, \ldots, L$), which denotes the position of $c_i$ in $X'$, and
let $q[d_i] = i$ represent the position of $d_i$ in $Y'$. From this subsequence pair $(X', Y')$, we can
derive the relative positions of the two buses.
Case 1. If $\forall a \in S_u$, $p[a] \ge q[a]$, and $\exists a \in S_u$, $p[a] > q[a]$, then bus $u$ is below bus $v$.
Suppose $p[a_i] > q[a_i]$, then $(X, Y)$ must be $(\ldots b_j \ldots a_i \ldots,\ \ldots a_i \ldots b_j \ldots)$. $b_j$ is above
$a_i$. Since $u$ goes through $a_i$ while $v$ goes through $b_j$, bus $u$ is below $v$.
Figure 2.4 (Case 1) shows an example. The subsequence pair is (D A E B F C,
A D B E C F). p[A] = 2 and q[A] = 1; p[B] = 4 and q[B] = 3; p[C] = 6 and q[C] = 5.
So bus $u = \{A, B, C\}$ is below bus $v = \{D, E, F\}$.
Case 2. If $\forall a \in S_u$, $p[a] \le q[a]$, and $\exists a \in S_u$, $p[a] < q[a]$, then bus $u$ is above bus $v$.
Figure 2.4 (Case 2) shows an example. Block B is shared by both buses. Bus
$u = \{A, B, C\}$ is above bus $v = \{D, B, E\}$.
Case 3. If $\exists a \in S_u$, $p[a] > q[a]$, and $\exists a' \in S_u$, $p[a'] < q[a']$, then the two buses $u$ and
$v$ cannot be assigned at the same time.
Suppose $p[a_i] > q[a_i]$ and $p[a_j] < q[a_j]$, then $(X, Y)$ must be
$(\ldots b_I \ldots a_i \ldots a_j \ldots b_J \ldots,\ \ldots a_i \ldots b_I \ldots b_J \ldots a_j \ldots)$. Block $b_I$ is above $a_i$ while $b_J$ is below $a_j$.
The positions of blocks are illustrated in Figure 2.4 (Case 3).
In the example, the two buses are $u = \{A, B\}$ and $v = \{C, D\}$. The subsequence
pair $(X', Y')$ is (A C D B, C A B D). For block A, p[A] = 1 and q[A] = 2, while
p[B] = 4 and q[B] = 3. In this case, the two buses cannot be assigned at the same time.
Case 4. If $\forall a \in S_u$, $p[a] = q[a]$, then the two buses have no firm ordering. Either bus
can be above the other.
Figure 2.4 (Case 4) illustrates an example. In this example, bus $u = \{A, B, C\}$ can be
below bus $v = \{D, E\}$. On the other hand, $u$ can also be above $v$. Therefore, the two
buses have no bus ordering constraints.
For any two vertical buses, we can get similar results from $(X, Y^R)$.
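The four cases can be decided mechanically from the positions $p$ and $q$. The sketch below assumes both buses are horizontal and already satisfy block ordering; the function name `bus_relation` and the returned labels are hypothetical, and positions are 0-based, which does not change the comparisons.

```python
def bus_relation(X, Y, u, v):
    """Classify horizontal bus u against horizontal bus v.

    Returns "below" (Case 1), "above" (Case 2), "crossing" (Case 3),
    or "free" (Case 4).
    """
    s = set(u) | set(v)
    x_sub = [b for b in X if b in s]  # X' restricted to blocks of u and v
    y_sub = [b for b in Y if b in s]  # Y' restricted to blocks of u and v
    p = {b: i for i, b in enumerate(x_sub)}
    q = {b: i for i, b in enumerate(y_sub)}
    later = any(p[a] > q[a] for a in u)
    earlier = any(p[a] < q[a] for a in u)
    if later and earlier:
        return "crossing"
    if later:
        return "below"
    if earlier:
        return "above"
    return "free"


# The four subsequence pairs of Figure 2.4.
case1 = bus_relation("DAEBFC", "ADBECF", "ABC", "DEF")
case2 = bus_relation("ADBCE", "DABEC", "ABC", "DBE")
case3 = bus_relation("ACDB", "CABD", "AB", "CD")
case4 = bus_relation("ADBEC", "ADBEC", "ABC", "DE")
```
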
2.4.3 Multiple bus ordering
In a BDF solution, it is impossible that the ordering of several buses forms a cycle. For
example, bus u is above bus v, bus v is above bus w, and bus w is above bus u. This kind
of relationship cannot exist in a feasible solution.
In the above section, we have discussed bus ordering imposed by the given sequence
pair. To express the relative positions among buses, we construct bus ordering constraint
Figure 2.4 Cases of relative positions of two horizontal buses. Case 1: subsequence pair
(D A E B F C, A D B E C F). Case 2: subsequence pair (A D B C E, D A B E C). Case 3:
subsequence pair (A C D B, C A B D). Case 4: subsequence pair (A D B E C, A D B E C).
graphs for horizontal buses and vertical buses, respectively. The construction rules for a
horizontal bus ordering constraint graph are listed as follows. The graph for vertical buses
can be derived similarly.
• Each bus is represented by a node.
• If one bus u is above another bus v (Case 1 or 2), add one edge (u, v).
• If one block related to bus u is above a block related to bus v, while another block
related to u is below a block related to v (Case 3), add two edges (u, v) and (v, u).
• If two buses have no bus ordering constraint (Case 4), no edge is added.
The horizontal bus ordering constraint graph serves in two ways:
(i) Given a BDF solution, the block packing must correspond to a sequence pair. Then
the horizontal bus relationship imposed by the sequence pair can be represented by
an acyclic constraint graph.
(ii) If a constraint graph contains a cycle, then at least one bus cannot be assigned. Ac-
cording to the construction rules, there are two kinds of cycles.
(a) A cycle includes only two nodes. Then the relative position of the two cor-
responding buses must comply with Case 3, and at least one bus cannot be
assigned. Figure 2.5(a) shows an example. Two buses $u = \{A, B\}$ and
$v = \{C, D\}$ are crossing, and the subgraph is given in Figure 2.5(b).
Figure 2.5 Two kinds of cycles in bus ordering constraint graphs. (a) Two buses are
crossing. (b) The bus ordering constraint graph corresponding to (a). (c) Three buses are
crossing. (d) The bus ordering constraint graph corresponding to (c).
(b) A cycle includes at least three nodes. Figure 2.5(c) illustrates an example.
There are three buses $u = \{A_1, A_2\}$, $v = \{B_1, B_2\}$ and $w = \{C_1, C_2\}$. The
sequence pair is $(\ldots A_1 \ldots B_1 \ldots B_2 \ldots C_1 \ldots C_2 \ldots A_2 \ldots,\ \ldots B_1 \ldots A_1 \ldots C_1 \ldots
B_2 \ldots A_2 \ldots C_2 \ldots)$. From this sequence pair, we can conclude that bus u should
be above v, v should be above w, and w should be above u. However, this is
impossible in a BDF solution. Therefore, at least one bus has to be discarded.
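The construction rules for the horizontal bus ordering constraint graph can be sketched as follows. The function names and the dictionary-of-buses input are hypothetical; an edge (u, v) means that bus u must be above bus v, as in the rules above, and all buses are assumed to be horizontal and to satisfy block ordering.

```python
from itertools import combinations


def compare(X, Y, u, v):
    """Return (later, earlier): whether some block of u has p > q or p < q."""
    s = set(u) | set(v)
    p = {b: i for i, b in enumerate([b for b in X if b in s])}
    q = {b: i for i, b in enumerate([b for b in Y if b in s])}
    return any(p[a] > q[a] for a in u), any(p[a] < q[a] for a in u)


def constraint_edges(X, Y, buses):
    """Build the edge set of the constraint graph for named buses."""
    edges = set()
    for (nu, u), (nv, v) in combinations(buses.items(), 2):
        later, earlier = compare(X, Y, u, v)
        if earlier:               # Case 2 or 3: u above v
            edges.add((nu, nv))
        if later:                 # Case 1 or 3: v above u
            edges.add((nv, nu))
    return edges


# The three-bus sequence pair of Figure 2.5(c), other blocks omitted.
X = ["A1", "B1", "B2", "C1", "C2", "A2"]
Y = ["B1", "A1", "C1", "B2", "A2", "C2"]
three = constraint_edges(X, Y, {"u": ["A1", "A2"], "v": ["B1", "B2"], "w": ["C1", "C2"]})
# The crossing pair of Figure 2.5(a).
two = constraint_edges("ACDB", "CABD", {"u": ["A", "B"], "v": ["C", "D"]})
```
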
If a bus-ordering constraint graph contains cycles, there must be some buses that cannot
be assigned. Since our target is to assign as many buses as possible, the problem becomes
how to remove the minimum number of buses so that the graph is acyclic. However, this
problem is NP-Complete.
For convenience, if some nodes are removed from the graph $G = (V, E)$, then edges
connecting to/from these nodes are also removed, and the resulting graph is called a residual
graph. Also all nodes are indexed, and node $u < v$ means that the index of $u$ is less than
that of $v$.
Problem 2.2 Node-Deleting Problem (NDP) Given a sequence pair and a set of buses, a
horizontal (vertical) bus-ordering constraint graph can be constructed. Remove nodes from
the constraint graph so that the residual graph is acyclic. At the same time, the number of
deleted nodes is minimized.
Theorem 2.2 Node-Deleting Problem (NDP) is NP-Complete.
Proof
NDP is the optimization problem of removing minimum number of nodes from a constraint
graph so that the residual graph is acyclic. As a decision problem, we ask simply whether
there exists a node set of size k such that the residual graph is acyclic by removing these
nodes from a constraint graph.
To show that NDP $\in$ NP, for a given constraint graph $G = (V, E)$, we use the set
$V' \subseteq V$ as a certificate of $G$. Checking whether the residual graph of removing $V'$ from $G$
is acyclic or not can be accomplished in polynomial time.
Next we prove that NDP is NP-hard by proving that independent set problem (ISP),
which is NP-complete, is polynomial-time reducible to NDP; i.e., ISP $\le_p$ NDP.
Figure 2.6 Independent set problem and node-deleting problem. (a) An instance of inde-
pendent set problem (ISP). (b) $G_d$ is a horizontal bus ordering constraint graph.
Let $G = (V, E)$ be an instance of ISP. Suppose $|V| = N$ and $|E| = M$. We form a
directed graph $G_d = (V, E_d)$ where $E_d = \{(i, j) \mid (i, j) \in E\} \cup \{(j, i) \mid (i, j) \in E\}$; i.e.,
each edge in $G$ is represented by a pair of edges with different directions in $G_d$. Obviously,
the construction of $G_d$ from $G$ takes $O(M)$ running time. Figure 2.6 shows an example.
Figure 2.6(a) is an instance of independent set problem $G$. Figure 2.6(b) is $G_d$. Note that
$G$ is an undirected graph.
Given a node subset V , we can get a residual graph Gd of Gd by removing nodes in
fV − V g. We show that V is an independent set of G if and only if the residual graph Gd
is acyclic. If V is an independent set of G, then there are no edges in Gd. Obviously, Gd is
acyclic. On the other hand, for any residual graph Gd of Gd, if it contains no cycles, then
there are no edges in Gd since edges always appear in pairs. Therefore, the nodes in Gd
also form an independent set of G.
Finally we show that Gd is a bus ordering constraint graph.
For any edge $e_i = (u, v) \in E$ ($u < v$), let block sequence $x_i = (a_i^u\ b_i^v\ c_i^v\ d_i^u)$, and block
sequence $y_i = (b_i^v\ a_i^u\ d_i^u\ c_i^v)$, where $a_i^u$, $b_i^v$, $c_i^v$ and $d_i^u$ are macro blocks. Suppose there are
$L$ independent nodes $w_i$ ($i = 1, \ldots, L$) in $G$ which are not incident on any edge. Let block
sequence $x_{M+i} = (a_{M+i}^{w}\ d_{M+i}^{w})$ and block sequence $y_{M+i} = (a_{M+i}^{w}\ d_{M+i}^{w})$ where $a_{M+i}^{w}$ and
$d_{M+i}^{w}$ are blocks.
We form a sequence pair $(X, Y) = (x_1 \ldots x_M x_{M+1} \ldots x_{M+L},\ y_1 \ldots y_M y_{M+1} \ldots y_{M+L})$.
Blocks in $X$ and $Y$ form the macro block set $B$. So totally there are $4M + 2L$ blocks.
Since there are $N$ nodes in $G$, the number of buses is also $N$. For each bus $p$, the blocks
that $p$ goes through are $\{z_i^p \mid z_i^p \in B;\ i = 1, \ldots, M + L;\ z = a, b, c, d\}$. Since the ordering
of blocks of each bus is always the same in both sequences, all buses are horizontal buses.
For each pair of buses $u$ and $v$ ($u < v$), we get the subsequence pair
$(X', Y') = (x'_1 x'_2 \ldots x'_M,\ y'_1 y'_2 \ldots y'_M)$, where $x'_i$ is a subsequence of $x_i$, and $y'_i$ is a subsequence of $y_i$.
Blocks appearing in $X'$ or $Y'$ are related to either bus $u$ or $v$. Furthermore, $(x'_i, y'_i)$
($i = 1, \ldots, M$) can be only one of the two cases.
(i) $(x'_i, y'_i) = (x_i, y_i)$
(ii) $x'_i = y'_i$
If $\exists J$, $(x'_J, y'_J) = (x_J, y_J)$, then $J$ is unique since there is only one edge between two
nodes $u$ and $v$ in $G$. At the same time, the subsequence pair $(X', Y')$ involves bus crossing
(Case 3) for bus $u$ and $v$. Therefore, the bus constraint graph contains two edges $(u, v)$ and
$(v, u)$.
On the other hand, if $\forall i \in \{1, \ldots, M + L\}$, $x'_i = y'_i$, the two buses have no bus ordering
constraint, and there is no edge between the two nodes $u$ and $v$ in the constraint graph. Also
if $x'_i = y'_i$ ($x'_i$/$y'_i$ can be empty), then there is no edge between $u$ and $v$ in $G$ either. Thus
Figure 2.7 Node Deleting Algorithm. (a) An instance of bus ordering constraint graph G.
(b) Nodes whose in-degree or out-degree is zero are removed from G. (c) Node c is deleted
from G in order to break cycles. (d) The residual acyclic graph of G after deleting c and i.
we can conclude that $G_d$ is the horizontal bus ordering constraint graph for the constructed
sequence pair. □
Since NDP is NP-Complete, we derive a heuristic method to remove nodes from a graph
so that the residual graph is acyclic. The method is based on the following lemma.
Lemma 2.3 Given a directed graph, if the in-degree and out-degree for each node are both
nonzero, then a cycle must exist in the graph.
For any given directed graph $\tilde{G} = (\tilde{V}, \tilde{E})$, if the in-degree and out-degree of each node
are nonzero, a cycle can be found in the following way. Randomly select a node $\tilde{v}_1$ as the
first node of a path. Then let the second node $\tilde{v}_2$ be a node incident on an out-going edge
of $\tilde{v}_1$. Since the out-degree of $\tilde{v}_1$ is nonzero, an out-going edge always exists. Repeat this
process until a node appears twice along the path. Since $|\tilde{V}|$ is finite, the loop must finish
within $|\tilde{V}| + 1$ steps. And if a node appears twice on a path, it means a cycle is formed.
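The path-walking argument behind Lemma 2.3 translates into a few lines of code. This is a minimal sketch: the function name `find_cycle` and the adjacency-dictionary input are hypothetical, and the walk deterministically starts at the first node and follows the first out-going edge instead of choosing at random.

```python
def find_cycle(succ):
    """Walk out-going edges until a node repeats, then return the cycle.

    succ maps each node to a non-empty list of successors, i.e. every
    node is assumed to have nonzero out-degree.
    """
    node = next(iter(succ))      # any start node works
    path, index = [], {}
    while node not in index:
        index[node] = len(path)  # remember where the node sits on the path
        path.append(node)
        node = succ[node][0]     # an out-going edge always exists
    return path[index[node]:]    # the part of the path from the repeated node


cycle = find_cycle({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
```
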
An algorithm is used to remove nodes to break cycles in a bus-ordering constraint graph.
Algorithm 1 Node Deleting (V, E)
for i = 1 to |V| do
    Calculate in-degree and out-degree of nodes in V
    Find min in-degree min_in and min out-degree min_out
    if (min_in = 0) or (min_out = 0) then
        Remove the corresponding node v from V
        Remove edges connecting to/from v from E
    else
        Find the node v with max degree
        Insert v into Remove Set
        Remove v from V and related edges from E
    end if
end for
return Remove Set
For each iteration, the size of V is reduced by one. If we can find a node whose in-
degree or out-degree is 0, then this node is treated as a good node. Otherwise, we select
the node v with max degree (in-degree + out-degree) and insert it into the Remove Set, i.e.,
v should be discarded in order to break cycles. This algorithm guarantees that if the graph
is acyclic, Remove Set is empty. The running time is $O(|V|^2)$.
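A direct Python rendering of the Node Deleting heuristic is sketched below. The function name, the tie-breaking by sorted node name, and the sample graph are assumptions of this example. The sketch recomputes all degrees in every iteration for clarity, so it does not attempt to match the running time stated above.

```python
def node_deleting(nodes, edges):
    """Return the list of nodes removed to make the graph acyclic."""
    nodes, edges = set(nodes), set(edges)
    remove_set = []
    while nodes:
        indeg = {v: 0 for v in nodes}
        outdeg = {v: 0 for v in nodes}
        for a, b in edges:
            outdeg[a] += 1
            indeg[b] += 1
        # A node with zero in-degree or out-degree cannot lie on a cycle.
        free = [v for v in sorted(nodes) if indeg[v] == 0 or outdeg[v] == 0]
        if free:
            victim = free[0]
        else:
            # Every node has nonzero degrees, so a cycle exists (Lemma 2.3).
            victim = max(sorted(nodes), key=lambda v: indeg[v] + outdeg[v])
            remove_set.append(victim)
        nodes.remove(victim)
        edges = {(a, b) for a, b in edges if victim not in (a, b)}
    return remove_set


# Hypothetical graph: b and c cross each other, a and d are harmless.
removed = node_deleting("abcd", [("b", "c"), ("c", "b"), ("a", "b"), ("c", "d")])
```
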
Figure 2.7 illustrates an example. Figure 2.7(a) shows an instance of bus ordering
constraint graph G, which includes nine buses. We first remove nodes whose in-degree
or out-degree is zero. Buses a, d, and e are removed from G, as shown in Figure 2.7(b). In
Figure 2.7(b), the in-degree and out-degree of all nodes are nonzero; therefore, a cycle must exist.
Since c has the maximum degree, c is deleted, making b and g free. The result is illustrated
in Figure 2.7(c). Finally i is deleted to break the cycle between i and h. Therefore, there
are two nodes c and i in Remove Set. Figure 2.7(d) shows the residual acyclic graph after
deleting c and i. Furthermore, based on Figure 2.7(d), it is easy to find a bus ordering
consistent with the below-above relationship imposed by the sequence pair. For instance,
the bus ordering (from bottom to top) could be g, d, f, h, a, b, e.
2.5 Evaluation Algorithm
The evaluation algorithm (Algorithm 2) transforms a sequence pair representation to a BDF
solution. However, for some sequence pairs, it is impossible to fit in all of the buses. For
example, if a sequence pair violates block ordering for a bus, then some buses cannot be
assigned. Therefore, the target of the evaluation algorithm is to find a floorplan that assigns
as many buses as possible. The algorithm is summarized as follows. Suppose there are n
blocks and m buses.
Algorithm 2 Evaluation BDF (Seq, Bus)
    Feasible Bus Checking Orientation
    Bus Ordering
    Modified LCS Computation
In the following, we explain the above three procedures one by one.
2.5.1 Feasible Bus Checking Orientation
According to Theorem 2.1, if the blocks of a bus violate block ordering in the given se-
quence pair, the bus cannot be assigned. Therefore, the first step is to identify these buses
and remove them from the bus set. For each bus, one scan of the sequence pair is enough
to make the judgment. At the same time, if blocks related to a bus appear in the same order
in both sequences, the bus is a horizontal bus; otherwise, the bus is a vertical one. This step
takes $O(mn)$ time.
2.5.2 Bus Ordering
Due to the bus ordering imposed by the given sequence pair, some buses cannot be assigned
at the same time as discussed in Section 2.4. We apply the Node Deleting algorithm to
further remove some buses. Since the residual constraint graph is then acyclic, we sort the
horizontal buses from bottom to top (from left to right for vertical buses) according to the
below-above (left-right) relationship. This bus order will be used in the next step. This step
takes $O(m^2 n)$ time.
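Sorting the remaining buses from bottom to top is a topological sort of the acyclic constraint graph. The sketch below uses Python's standard `graphlib` module; the function name `bottom_to_top` and the sample edges are hypothetical, and an edge (upper, lower) means that `upper` is above `lower`, following the construction rules of Section 2.4.3.

```python
from graphlib import TopologicalSorter


def bottom_to_top(buses, above_edges):
    """Order buses so that every bus comes after all buses below it."""
    sorter = TopologicalSorter({b: set() for b in buses})
    for upper, lower in above_edges:
        sorter.add(upper, lower)  # lower must be emitted before upper
    return list(sorter.static_order())


order = bottom_to_top(["u", "v", "w"], [("u", "v"), ("v", "w")])
```

If the graph still contained a cycle, `static_order` would raise `graphlib.CycleError`, which is why the Node Deleting step has to run first.
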
2.5.3 Modified LCS Computation
The algorithm is based on the engine of computing longest common subsequence (LCS)
presented in [8]. LCS computation calculates x coordinates and y coordinates separately.
And it always packs blocks from bottom to top (from left to right). In this section, we only
discuss the calculation of y coordinates of blocks with the assignment of horizontal buses.
Figure 2.8 Insert two horizontal buses to the floorplan represented by (A D E B C F G,
E D A F B C G). (a) One horizontal bus $\{A, B, C\}$ is assigned. (b) In order to insert
another bus $\{B, E, G\}$, blocks A and D have to move up and this makes the bus $\{A, B, C\}$
changed, too.
The calculation of x coordinates of blocks and vertical buses can be derived similarly.
For any given horizontal bus $\langle H, t, \{b_1, \ldots, b_k\} \rangle$, the y coordinates of the $k$ blocks
are first calculated with LCS computation. Then these $k$ blocks are aligned so that the bus
can be inserted. Suppose the height of $b_i$ is $h_i$ and the y-coordinate of the lower-left corner
of $b_i$ is $y_i$. The basic alignment can be performed as follows:
$y_{\max} = \max\{y_i \mid i = 1, 2, \ldots, k\}$
$y_i = \max\{y_i,\ y_{\max} + t - h_i\} \quad \forall i \in \{1, 2, \ldots, k\}$.
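The two alignment formulas can be sketched as a small function. The name `basic_alignment` and the sample blocks are hypothetical, and placing the bottom edge of the bus at $y_{\max}$ is an assumption read off the formulas, not a statement from the text. The sketch also assumes every block is at least as tall as the bus width.

```python
def basic_alignment(blocks, t):
    """Apply the basic alignment for one horizontal bus of width t.

    blocks maps a block name to (y, h). Returns (bus_y, new_y), where
    bus_y is the assumed bottom edge of the bus and new_y holds the
    adjusted lower-left y-coordinate of each block.
    """
    y_max = max(y for y, _ in blocks.values())
    # Lift a block only if its top edge would not reach y_max + t.
    new_y = {name: max(y, y_max + t - h) for name, (y, h) in blocks.items()}
    return y_max, new_y


sample = {"A": (0, 5), "B": (3, 4), "C": (1, 3)}
bus_y, new_y = basic_alignment(sample, t=2)
```

In this instance only block C moves up, from 1 to 2, so that all three blocks cover the strip between `bus_y` and `bus_y + t`.
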
However, the alignment adjustment for different buses may affect each other. For ex-
ample, in Figure 2.8, the given sequence pair is (A D E B C F G, E D A F B C G), and
two buses $u = \{A, B, C\}$ and $v = \{B, E, G\}$ are to be inserted. Suppose the horizontal
bus $u = \{A, B, C\}$ is first assigned as in Figure 2.8(a). When we want to place another bus
$v = \{B, E, G\}$, block E has to move up, which causes blocks A and D to move up too.
Furthermore, due to the change of block A, bus u has to be reassigned. Since LCS compu-
tation packs blocks from the bottom up, it is important to process buses in the same way.
Figure 2.9 (a) Two buses overlap due to basic alignment adjustment. (b) Assignment of
two buses without overlap.
Each time, the lowest bus is selected and assigned so that the bus would not be affected by
later processing of other buses. By calling Bus Ordering procedure, a sorted bus list can be
obtained.
At the same time, when multiple horizontal buses are considered, we also need to avoid
overlap between horizontal buses. Therefore, the above basic alignment is not enough.
For example, in Figure 2.9, two horizontal buses $u = \{C, D\}$ and $v = \{B, E\}$ are to be
assigned. First, bus u is assigned with $y_u$ as the y-coordinate of its bottom edge. However,
according to the basic alignment calculation, the y-coordinate of bus v's bottom edge is
also $y_u$. Then buses u and v overlap, as shown in Figure 2.9(a). This is not allowed in a BDF
solution. We call this situation Bus Overlap. Figure 2.9(b) shows a feasible solution.
If two buses have bus ordering constraint (only Case 1 and 2 in Section 2.4.2 are con-
sidered since only one bus can be assigned in Case 3), the above situation cannot happen.
This is because in Case 1 or 2, there must exist a block of one bus which is above a block
of the other bus. Then the two buses cannot overlap. On the other hand, if two buses have
no bus ordering constraint (Case 4 in Section 2.4.2), Bus Overlap may happen.
Figure 2.10 (Case A) Two buses share blocks A and B. Bus Overlap may happen. (Case
B) The blocks of two buses appear interlaced along x-axis. Bus Overlap may happen.
(Case C) Two buses have no overlaps along x-axis. Bus Overlap is impossible.
Suppose two horizontal buses $u = \{a_1, a_2, \ldots, a_k\}$ and $v = \{b_1, b_2, \ldots, b_l\}$ have no bus
ordering constraints. Denote the block sets $S_u = \{a_1, a_2, \ldots, a_k\}$ and $S_v = \{b_1, b_2, \ldots, b_l\}$. There
are still three cases:
Case A: $S_u \cap S_v \ne \emptyset$. Two buses share at least one block. Bus Overlap may happen,
like Figure 2.10 (Case A) which involves two buses $u = \{A, B\}$ and $v = \{A, B, C\}$.
Case B: $S_u \cap S_v = \emptyset$, and the subsequence pair $(X', Y')$ is
$(\ldots a_{i_1} \ldots b_j \ldots a_{i_k} \ldots,\ \ldots a_{i_1} \ldots b_j \ldots a_{i_k} \ldots)$. Figure 2.10 (Case B) shows an example. In this case, the blocks of two
buses $u = \{A, B, C\}$ and $v = \{D, E\}$ appear interlaced along x-axis. It is likely that
Bus Overlap happens.
Case C: $S_u \cap S_v = \emptyset$, and in the subsequence pair $(X', Y')$, all blocks in $S_u$ appear
ahead of (behind) blocks in $S_v$. Figure 2.10 (Case C) shows an example. In this case, the
two buses $u = \{A, B, C\}$ and $v = \{D, E\}$ do not have overlap along x-axis. Therefore,
the y-coordinates of the two buses can be decided independently; i.e., Bus Overlap cannot
happen.
Based on the above discussion, we can conclude that only Case A and B can lead to
Bus Overlap. Therefore, in the basic alignment calculation, we also need to check Case
A and B so that no overlaps between two buses are introduced. This kind of checking can
be incorporated in the Bus Ordering procedure, and the results are kept in a table. If Case
A or B is detected, the current bus has to move up until there is no overlap with the buses
below.
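The check for Cases A, B, and C can be sketched as below. It assumes the two buses have no bus ordering constraint (Case 4 of Section 2.4.2), so their blocks appear in the same order in both sequences and one sequence is enough. The function name `overlap_case` and the concrete orderings for Cases B and C are hypothetical, since Figure 2.10 does not list its sequence pairs.

```python
def overlap_case(X, u, v):
    """Return "A", "B", or "C" for two unordered horizontal buses."""
    su, sv = set(u), set(v)
    if su & sv:
        return "A"  # shared block: Bus Overlap may happen
    owners = ["u" if b in su else "v" for b in X if b in su or b in sv]
    # Count how often the owning bus changes along the sequence.
    switches = sum(1 for a, b in zip(owners, owners[1:]) if a != b)
    # At most one change means one bus lies entirely ahead of the other.
    return "C" if switches <= 1 else "B"


case_a = overlap_case("ABC", "AB", "ABC")
case_b = overlap_case("ADBEC", "ABC", "DE")
case_c = overlap_case("ABCDE", "ABC", "DE")
```

Only results "A" and "B" would need to be recorded in the table used by the Modified LCS Computation.
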
In summary, after we get the bus list from Bus Ordering procedure, we apply LCS
computation m times (suppose there are m buses in the bus list). In one iteration, after
the positions of blocks related to bus u are calculated by LCS, the position of bus u is first
calculated by basic alignment calculation. Then we check, using the table created during
Bus Ordering, whether bus u falls into Case A or B with any bus below u. If such buses
exist, we take the highest position among them, and the position of bus u is decided by
adding the bus width to that value. After the position of bus u is decided, check all of its
related blocks, and move up blocks if necessary in order to let the bus go through. Once the
position of a bus is calculated, the bus is not changed any more. Therefore, each iteration
fixes one bus. The running time of LCS is $O(n \log \log n)$ [8] where $n$ is the number of
blocks. So Modified LCS Computation is bounded by $O(mn \log \log n + m^2 n)$.
2.6 BDF Algorithm
Most floorplan algorithms based on the sequence pair representation use simulated anneal-
ing (SA). In this chapter, we also use SA to search for an optimal or near optimal solution to a
BDF problem.
2.6.1 Perturbation (Move)
We use the following operations to generate a neighboring sequence pair in simulated an-
nealing:
1. Swap is to swap two blocks in either the first sequence or the second sequence. Swap
can be done in constant time.
2. Rotation is to rotate a block (i.e., exchange the width and height of a block). Rotation
does not cause any changes to the sequence pair. This operation can be done in
constant time.
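The two moves can be sketched as a single perturbation function. The function name `perturb`, the equal probabilities for each move, and the sample data are assumptions of this example; the text does not specify how a move is selected.

```python
import random


def perturb(X, Y, dims, rng=random):
    """Return a neighboring (X, Y, dims) without modifying the inputs.

    dims maps a block name to (width, height).
    """
    X, Y, dims = list(X), list(Y), dict(dims)
    if rng.random() < 0.5:
        # Swap: exchange two blocks in one of the two sequences.
        seq = X if rng.random() < 0.5 else Y
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
    else:
        # Rotation: exchange width and height; the sequences are unchanged.
        block = rng.choice(X)
        w, h = dims[block]
        dims[block] = (h, w)
    return X, Y, dims


X0, Y0 = ["A", "B", "C"], ["C", "A", "B"]
dims0 = {"A": (2, 1), "B": (1, 3), "C": (4, 2)}
X1, Y1, dims1 = perturb(X0, Y0, dims0, random.Random(0))
```

Returning copies keeps the current solution intact, so a rejected move in simulated annealing needs no undo step.
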
2.6.2 Cost Function
The target of BDF problem is to minimize the chip area and the total bus area. At the same
time, we hope to insert all of the buses. Therefore, we define the cost function as follows:
$Cost = \alpha \cdot C + \beta \cdot B + \gamma \cdot M$
where $C$ is the chip area, $B$ is the bus area, $M$ is the number of unassigned buses, and $\alpha$,
$\beta$, and $\gamma$ are coefficients defined by users.
Figure 2.11 Soft block adjustment. (a) A BDF solution. Block E is on an LCS path. (b)
The new BDF solution after changing the shape of block E.
By applying the Evaluation BDF algorithm, the positions of blocks and buses are all
calculated. Therefore, it is easy to get the values of C, B, and M .
2.7 Soft Block Adjustment
In floorplanning, the shapes of some blocks may not be fixed. For example, for some blocks,
their areas are fixed, but their width/height ratio can be changed within some range. Such
blocks are called soft blocks. The flexibility of soft block shapes can help us improve
BDF solution quality further. Our strategy is as follows.
After applying the BDF algorithm, we get a BDF solution. Based on this solution, each
time we select a soft block on an LCS path that decides the size of the chip [9, 8], reduce
the width (height) of the block a little bit, and apply Modified LCS Computation to get a
new solution. This process is executed repeatedly.
In order to control iterations, simulated annealing is adopted again. The perturbation
operation is to choose a soft block on LCS path and change its width or height accordingly.
The cost function is the same as that of BDF algorithm. Figure 2.11 shows an example.
Figure 2.11(a) shows a BDF solution. Blocks B, D and E are on an LCS path. E is selected
and its width is reduced. Figure 2.11(b) illustrates the BDF solution after repacking. Both
the chip area and the bus area are reduced.
2.8 Experimental Results
Our algorithm was implemented in C++ and tested on an Intel Xeon (2.4 GHz) machine with 1 GB
of memory. The technique of simulated annealing is used to search for an optimal or near
optimal BDF solution with a special annealing schedule. The annealing process starts from
an initial temperature. Then the temperature drops linearly while only a small number of
moves are made at each temperature. As to the cost function $\alpha \cdot C + \beta \cdot B + \gamma \cdot M$, let
$\alpha = \beta = 1$. Since $M$ is the number of unassigned buses, we let $\gamma$ equal the chip area so that
$\gamma \cdot M$ is of the same order as the other two terms.
We tested on three sets of test files. The first set is derived from MCNC benchmarks
for block placement. We added different numbers of buses to the benchmarks. Once we
got a BDF solution, we also applied the soft block adjustment technique to further improve
the solution quality. Furthermore, we applied some post-processing to move buses apart
from each other. The test results are listed in Table 2.1. As an illustration, Figure 2.12
displays the final packing result of ami49-2 after soft block adjustment. The ami49-2 test file
includes 49 blocks and 12 buses. The buses are {0, 5, 9, 12, 18}, {1, 10, 21, 25}, {2, 28, 33},
Table 2.1 Test set 1.
                          Results                 Soft-Adjust
File      Blocks  Buses   time (s)  dead space    time (s)    dead space
apte      9       5       11        4.11%         12 (+1)     0.72%
xerox     10      6       12        3.88%         13 (+1)     0.95%
hp        11      14      28        5.02%         28 (+0)     0.62%
ami33-1   33      8       61        6.02%         62 (+1)     0.94%
ami33-2   33      18      81        6.10%         86 (+5)     1.27%
ami49-1   49      9       98        5.42%         101 (+3)    0.85%
ami49-2   49      12      278       6.09%         281 (+3)    0.84%
ami49-3   49      15      265       7.40%         268 (+3)    1.09%
Figure 2.12 The resulting packing of ami49-2 after soft block adjustment. There are 49
blocks and 12 buses.
Table 2.2 Test sets 2 and 3.
File    Blocks  Buses   time (s)  dead space
cad1    40      13      209       4.40%
cad2    57      16      191       5.16%
grid4   16      8       1         0%
grid5   25      10      23        0%
grid6   36      12      103       0%
grid7   49      14      150       0%
{3, 19, 22, 26, 29, 34}, {4, 23, 27}, {5, 35, 30, 6}, {32, 31, 17}, {11, 14, 15, 32, 33}, {12, 8,
14}, {44, 43, 7}, {0, 3}, and {2, 47}. The second test set (cad1 and cad2) includes two test
files that are derived from industry designs. To further test our approach, we created a set
of bus grid test files. Each test file includes $n^2$ ($n = 4, \ldots, 7$) blocks and $2n$ buses. All
blocks have the same size (600 × 700) and each bus goes through $n$ blocks. For example,
the test file grid7 includes 49 blocks and 14 buses. The buses are {0, 1, 2, 3, 4, 5, 6},
{7, 8, 9, 10, 11, 12, 13}, {14, 15, 16, 17, 18, 19, 20}, {21, 22, 23, 24, 25, 26, 27},
{28, 29, 30, 31, 32, 33, 34}, {35, 36, 37, 38, 39, 40, 41}, {42, 43, 44, 45, 46, 47, 48},
{0, 7, 14, 21, 28, 35, 42}, {1, 8, 15, 22, 29, 36, 43}, {2, 9, 16, 23, 30, 37, 44},
{3, 10, 17, 24, 31, 38, 45}, {4, 11, 18, 25, 32, 39, 46}, {5, 12, 19, 26, 33, 40, 47},
and {6, 13, 20, 27, 34, 41, 48}. This kind of problem is quite hard, since the position of
one bus heavily affects the assignment of other buses.
Still, our approach can find an optimal solution within a short time. Figure 2.13 illustrates
an optimal solution to the test file grid7. In this solution, both the chip area and the
total bus area are minimized. Table 2.2 shows the test results of these
two test sets.
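The bus sets of a grid test file follow a simple pattern: one bus per row and one bus per column of an n × n arrangement of block indices. A minimal Python sketch that reproduces the grid7 bus list given above is shown here; the function name `grid_buses` is hypothetical, and the row-major numbering of blocks is inferred from the listed sets.

```python
def grid_buses(n):
    """Return the 2n buses of an n x n bus grid test file.

    Blocks are numbered 0 .. n*n - 1 in row-major order (inferred from the
    grid7 bus list). The first n buses are the rows, the last n the columns.
    """
    rows = [[r * n + c for c in range(n)] for r in range(n)]
    cols = [[r * n + c for r in range(n)] for c in range(n)]
    return rows + cols


for bus in grid_buses(7):
    print(bus)
```

Every block lies on exactly two buses (its row bus and its column bus), which is why the placement of one bus constrains the others so strongly.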
Figure 2.13 An optimal packing of grid7. There are 49 blocks and 14 buses.
2.9 Conclusion
In this chapter, we considered the bus-driven floorplanning (BDF) problem. We first derived
necessary conditions for feasible buses. Then, based on an analysis of the relationship between
bus ordering and the sequence pair representation, we developed an efficient evaluation algorithm
which transforms a sequence pair representation into a BDF solution. Simulated annealing is
adopted to search for an optimal or near optimal BDF solution. We also proposed a simple
but efficient way to handle soft blocks. Experimental results demonstrate that our approach
is efficient and effective.
CHAPTER 3
WIRE PLANNING WITH BOUNDED
OVER-THE-BLOCK WIRE CONSTRAINTS
3.1 Introduction
Due to the extreme complexity in System-on-Chip (SoC) design, the hierarchical approach
is widely used. By hiding the vast amount of distracting details in low-level objects, design
problems can be greatly simplified such that those problems can be solved in efficient ways.
In high-level design, a design is partitioned into several functional blocks, and then
each functional block can be designed independently in low levels. However, an implicit
constraint is that all these modules have to get connected in a certain way so that they can
accomplish the required functionality as a whole. Moreover, as technology scales down,
interconnect delay, especially global interconnect delay, has become the dominant factor
in achieving high performance in deep submicron design. Therefore, wire planning, which
plans the routing of global interconnect among macro blocks, has become an important
stage in physical design.
Recently, researchers have been examining extensions of buffering algorithms that consider
routing [10, 11, 12, 13]. In general, considering the impact of routing
on buffering is the right direction, especially in the final implementation stages. However,
explicit buffering consideration in routing at early design stages is very CPU intensive.
Furthermore, the final buffering can be done only based on detailed RC data, which can be
obtained only after the low-level block implementation is completed. Before low-level block
implementation, it does not make sense to put down all buffers; what we need is to reserve
“proper” routing resources for the global routes when doing block implementation.
In our work, we abstract buffering by looking at the problem at a higher level, and
consider the problem of wire planning with bounded over-the-block wires (WP). Informally,
the problem can be described as follows. Given a placement of macro blocks, some large
blocks are evenly divided into several subblocks. A two-pin net is to be routed by going
through these blocks/subblocks, i.e., we need to find a block/subblock sequence as the routing
of the net. However, some blocks (or some subblocks) are routing obstacles, while some blocks,
such as IP (Intellectual Property) blocks whose internal structures have been fixed, allow
routing but no buffer insertion. We call the latter kind of block a “routing-block”.
Since a routing-block cannot hold any buffers, the longest over-the-block wires in it have
to be limited to a certain range: if the distance between two buffers is large, the
signal slew rate is slow, which in turn may cause signal integrity problems. In the WP problem,
each routing-block has an interconnect bound such that the longest over-the-block wires in
the routing-block can only cross a certain number of subblocks. The bound is in line with
designers’ practice of keeping long routes “buffer-able” to meet timing and transition time
constraints. Different routing-blocks may set different bounds.
Figure 3.1 The routing illustrated by thin lines is not valid, while the routing shown by
wide lines is a feasible solution.
Figure 3.1 illustrates an example of 16 macro blocks. Dark blocks are obstacles while
gray ones can be used for routing but cannot hold buffers. Since some macro blocks are big,
they are partitioned into several small subblocks in order to get a more accurate estimation.
The blocks $B_2$, $B_3$, $B_9$, $B_{10}$, $B_{11}$, and $B_{12}$ are divided into $1 \times 3$, $2 \times 3$, $2 \times 1$, $3 \times 1$,
$2 \times 1$, and $3 \times 3$ subblocks respectively. $B_4$, $B_6$, and $B_{14}$ as well as a subblock of $B_3$ are
obstacles. $B_3$, $B_{10}$, and $B_{12}$ are routing-blocks, and the longest over-the-block wires in
these three blocks should not go across more than two subblocks. One two-pin net is to be
routed between $B_1$ and $B_{16}$. However, the routing illustrated by thin lines does not satisfy
the interconnect constraints since the over-the-block wires in both blocks $B_3$ and $B_{12}$ cross
4 subblocks. On the other hand, the routing illustrated by wide lines is a valid solution. For
short, let OB-wire refer to over-the-block wire.
In this chapter, we present two exact polynomial-time algorithms for wire planning with
bounded over-the-block wires. Both algorithms are guaranteed to find a feasible routing solution
with minimum wire length as long as a solution exists. Furthermore, both algorithms use a
shortest path algorithm, but the constructions of the underlying graphs, which incorporate
the interconnect constraints, are different. One approach requires less memory, while the other
may take less time to adjust the graph after routing one net. Users can choose
the appropriate algorithm according to their requirements.
The rest of the chapter is organized as follows. Section 3.2 defines the WP problem. In
Section 3.3, we present two exact polynomial-time algorithms based on the shortest path
algorithm with different graph constructions. Finally we show the experimental results in
Section 3.4 and conclude the chapter in Section 3.5.
3.2 Problem Formulation
A placement of $m$ macro blocks $B = \{B_1, \ldots, B_m\}$ is given. Since some blocks are large,
they are divided into several subblocks. Suppose the block $B_i$ is evenly divided into $p_i \times q_i$
subblocks, and these subblocks form a $p_i \times q_i$ block grid. Let $R_i = p_i \times q_i$ be the number of
subblocks of $B_i$, and let $r_i^j$ ($j = 1, \ldots, R_i$) refer to a subblock of block $B_i$. For convenience,
if $B_i$ is not divided into subblocks, we still say it is divided into $p_i \times q_i$ subblocks where
$p_i = q_i = 1$.
Among these $m$ blocks, some blocks are routing-blocks, which allow over-the-block
routing but no buffer insertion. An interconnect bound $d_i$ is set for each routing-block
$B_i$ such that all over-the-block wires within $B_i$ go through at most $d_i$ subblocks.
Furthermore, some blocks do not even allow over-the-block routing and they become
routing obstacles. Some subblocks within a block can also be obstacles. Finally, if a block
is neither a routing-block nor an obstacle, then buffers can be inserted and OB-wires in the
block are not constrained.
With all these requirements, the goal is to find the shortest over-the-block routing for
a two-pin net such that the routing satisfies the interconnect constraints. The routing wire
length is measured by the Manhattan distance between the centers of two macro blocks.
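To make the geometric model concrete, the following minimal Python sketch computes the centers of the subblocks of an evenly divided block and the Manhattan distance between two centers. The representation of a block by its lower-left corner, width, and height, the interpretation of p as the number of columns and q as the number of rows, and the function names are all assumptions made for this illustration.

```python
def subblock_centers(x, y, width, height, p, q):
    """Centers of the p x q equal subblocks of a block.

    (x, y) is the lower-left corner of the block. Here p is taken as the
    number of columns and q as the number of rows (an assumption); subblocks
    are numbered 1 .. p*q row by row.
    """
    centers = {}
    for row in range(q):
        for col in range(p):
            j = row * p + col + 1
            cx = x + (col + 0.5) * width / p
            cy = y + (row + 0.5) * height / q
            centers[j] = (cx, cy)
    return centers


def manhattan(a, b):
    """Manhattan distance between two points given as (x, y) pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


centers = subblock_centers(0, 0, 6, 3, 3, 1)
print(centers)
print(manhattan(centers[1], centers[3]))
```

An undivided block is simply the case p = q = 1, which yields a single center.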
3.3 WP Algorithm
If no interconnect constraints are considered, the routing of a two-pin net can be interpreted
as a shortest path between two blocks. When interconnect constraints are involved, however, they
prevent us from applying a shortest path algorithm directly. Therefore, our approach is to
construct a graph that incorporates the interconnect constraints. A shortest path algorithm
is then called to find the optimal routing solution. The main scheme
of a WP algorithm is shown as Algorithm 3:
Algorithm 3 WP-Algorithm ( )
1: Construct a graph incorporating constraints;
2: Apply shortest path algorithm on the graph;
3: Derive routing solution for the given net;
In the following two subsections, we present two ways of constructing the graph for a
given WP problem.
Figure 3.2 (a) A routing-block $B_i$ is divided into $3 \times 3$ subblocks, and the interconnect
bound is 3. (b) A graph denoting all valid OB-wires within $B_i$.
3.3.1 WP-Path algorithm
Notice that all interconnect constraints are confined to OB-wires within routing-blocks.
Therefore, the main idea of the WP-Path algorithm is to explore all legal connections
between any two boundary subblocks within the same routing-block.
Figure 3.2 illustrates an example. Figure 3.2(a) shows a routing-block $B_i$ that is divided
into $3 \times 3$ subblocks. Suppose the interconnect bound is 3. In Figure 3.2(b), each subblock
is represented by a node, and an edge implies that the corresponding two subblocks can be
connected by an OB-wire which crosses at most three subblocks. In other words, any valid
OB-wire in $B_i$ corresponds to an edge in Figure 3.2(b).
However, if two or more successive edges are selected for one OB-wire, the interconnect
constraint may not be satisfied. For example, in Figure 3.2(b), if the edges $(r_i^1, r_i^3)$ and
$(r_i^3, r_i^9)$ are selected, then the corresponding OB-wire goes through five subblocks (i.e., $r_i^1$,
$r_i^2$, $r_i^3$, $r_i^6$, and $r_i^9$). In order to avoid selecting successive edges related to a routing-block,
each node is split into two nodes, which are called the in-node and the out-node, respectively.
Figure 3.3 illustrates an example. Two nodes $r_i^j$ and $r_i^k$ belong to the same routing-block $B_i$, and
there is an edge $\hat{e}_2$ connecting the two nodes. The edge $\hat{e}_1$, in contrast, connects $r_i^j$ to a node
which does not belong to the block $B_i$. The node $r_i^j$ is split into an in-node $v_i^{j,\mathrm{in}}$ and an
out-node $v_i^{j,\mathrm{out}}$, and similarly for the node $r_i^k$. The undirected edge $\hat{e}_2$ becomes two directed
edges $e_2$ and $e_2'$, each pointing from an in-node to an out-node. Furthermore, since the edge $\hat{e}_1$
connects a node not belonging to $B_i$, it becomes two edges $e_1$ and $e_1'$: $e_1$ is incident to the
in-node $v_i^{j,\mathrm{in}}$, while $e_1'$ is incident from the out-node $v_i^{j,\mathrm{out}}$.
Figure 3.3 (a) The two nodes $r_i^j$ and $r_i^k$ are within the same routing-block $B_i$. (b) Each
node is split into an in-node and an out-node.
Based on the above strategy, we construct a directed graph, the Path Graph $G_p = (V_p, E_p)$,
for a WP problem as follows:
1. Nodes $V_p = V_{ps} \cup V_{in} \cup V_{out}$ where
$V_{ps} = \{ u_i^j \mid i = 1, \ldots, m;\ r_i^j \text{ is a subblock of } B_i, \text{ and } B_i \text{ is neither an obstacle nor a routing-block} \}$
$V_{in} = \{ v_i^{j,\mathrm{in}} \mid i = 1, \ldots, m;\ r_i^j \text{ is not an obstacle and it is a subblock on the boundary of a routing-block } B_i \}$
$V_{out} = \{ v_i^{j,\mathrm{out}} \mid i = 1, \ldots, m;\ r_i^j \text{ is not an obstacle and it is a subblock on the boundary of a routing-block } B_i \}$
For convenience, a node $v_i^{j,\mathrm{in}} \in V_{in}$ is called an in-node, while a node $v_i^{j,\mathrm{out}} \in V_{out}$ is called
an out-node. Where no confusion arises, the corresponding subblock of a node $v_i^{j,\mathrm{in}}$ (or $v_i^{j,\mathrm{out}}$
or $u_i^j$) is $r_i^j$.
2. Edges $E_p = E_{pa} \cup E_{pb} \cup E_{pc} \cup E_{pd} \cup E_{pe}$ where
$E_{pa} = \{ (u_i^j, v_k^{l,\mathrm{in}}) \mid u_i^j \in V_{ps},\ v_k^{l,\mathrm{in}} \in V_{in};\ r_i^j \text{ and } r_k^l \text{ are adjacent} \}$
$E_{pb} = \{ (v_i^{j,\mathrm{out}}, u_k^l) \mid v_i^{j,\mathrm{out}} \in V_{out},\ u_k^l \in V_{ps};\ r_i^j \text{ and } r_k^l \text{ are adjacent} \}$
$E_{pc} = \{ (u_i^j, u_k^l) \mid u_i^j, u_k^l \in V_{ps};\ u_i^j \neq u_k^l;\ r_i^j \text{ and } r_k^l \text{ are adjacent} \}$
$E_{pd} = \{ (v_i^{j,\mathrm{in}}, v_i^{j,\mathrm{out}}) \mid v_i^{j,\mathrm{in}} \in V_{in},\ v_i^{j,\mathrm{out}} \in V_{out} \}$
$E_{pe} = \{ (v_i^{j,\mathrm{in}}, v_i^{k,\mathrm{out}}) \mid v_i^{j,\mathrm{in}} \in V_{in},\ v_i^{k,\mathrm{out}} \in V_{out};\ j \neq k; \text{ and there exists a routing between the two subblocks } r_i^j \text{ and } r_i^k \text{ that goes across at most } d_i \text{ subblocks} \}$
Edges in $E_{pd} \cup E_{pe}$ connect nodes whose corresponding subblocks belong to the
same routing-block. Notice that those edges are always directed from an in-node
to an out-node. Therefore, no path can take two successive edges related to the
same routing-block. Furthermore, any valid OB-wire in a routing-block corresponds
to an edge in $E_{pd} \cup E_{pe}$. Conversely, an edge in $E_{pd} \cup E_{pe}$ implies that
at least one valid OB-wire exists between the two corresponding subblocks. For
convenience, we say an edge of $E_{pd} \cup E_{pe}$ is within the related routing-block.
3. Edge cost
If $e \in E_{pd}$, $C(e) = 0$.
If $e \in E_{pa} \cup E_{pb} \cup E_{pc}$ and $e$ connects the nodes of subblocks $r_i^j$ and $r_k^l$,
$C(e) = |x_i^j - x_k^l| + |y_i^j - y_k^l|$, where $(x_i^j, y_i^j)$ and $(x_k^l, y_k^l)$ are the center
coordinates of subblocks $r_i^j$ and $r_k^l$.
If $e \in E_{pe}$ and $e = (v_i^{j,\mathrm{in}}, v_i^{k,\mathrm{out}})$, $C(e)$ is the shortest routing length within
routing-block $B_i$ between the two subblocks $r_i^j$ and $r_i^k$.
Figure 3.4 The solid lines indicate a routing solution between blocks $B_1$ and $B_6$.
Figure 3.4 illustrates an example with six macro blocks. $B_3$ is a routing-block,
and it is divided into $3 \times 3$ subblocks. Subblock $r_3^8$ of $B_3$ is an obstacle. Also, the block
$B_2$ is divided into $1 \times 2$ subblocks. Suppose the interconnect bound for $B_3$ is 3 and we
want to find the routing between $B_1$ and $B_6$ under the interconnect constraint. Figure 3.5 is
the corresponding Path Graph $G_p$. Since $r_3^8$ is an obstacle and $r_3^5$ is not a boundary subblock,
they are not included in $G_p$.
In Figure 3.5, edges within ellipses belong to $E_{pd}$. The two nodes in one ellipse are
the in-node and out-node corresponding to a subblock. Since this kind of edge is used to
connect in-nodes and out-nodes, the cost is zero. For other edges within the routing-block
$B_3$ (i.e., edges belonging to $E_{pe}$), an edge $(v_3^{i,\mathrm{in}}, v_3^{j,\mathrm{out}})$ ($i, j = 1, \ldots, 9$; $i \neq j$)
is added if the subblock $r_3^i$ can reach $r_3^j$ by crossing at most three subblocks of $B_3$. For
example, $r_3^7$ can reach $r_3^1$ by going through $r_3^4$. So one edge $(v_3^{7,\mathrm{in}}, v_3^{1,\mathrm{out}})$ is added. But the
shortest routing between $r_3^7$ and $r_3^9$ has to go through $r_3^4$, $r_3^5$, and $r_3^6$. Therefore
$(v_3^{7,\mathrm{in}}, v_3^{9,\mathrm{out}})$ is not inserted in $G_p$. The cost for this kind of edge is the shortest
routing length between the two subblocks.
Figure 3.5 The corresponding path graph $G_p$. The wide lines illustrate a shortest path
from $u_1^1$ to $u_6^1$.
Once the path graph $G_p$ is constructed, we can apply the shortest path algorithm [14, 15]
to decide the routing of a net. In this example, we plan to route one net between $B_1$ and
$B_6$. Therefore, we let $u_1^1$ be the starting point and $u_6^1$ be the ending point. The shortest
path from $u_1^1$ to $u_6^1$ is illustrated by wide lines in Figure 3.5, and it is easy to derive the
over-the-block routing from this solution, as shown in Figure 3.4.
In order to construct a path graph $G_p$, we need to find all valid connections within each
routing-block, i.e., decide how to create the edges in $E_{pe}$. For each routing-block $B_i$, an undirected
graph $\tilde{G}_i$ is set up by representing each subblock that is not an obstacle as a node and adding
edges according to subblock adjacency. The cost of every edge is 1. We then apply an all-pairs
shortest path algorithm on $\tilde{G}_i$. If the distance between two boundary subblocks is less than or equal
to the interconnect bound of $B_i$, then edges are added in $G_p$ accordingly.
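The creation of the intra-block edges can be sketched as follows. This is a minimal Python illustration under several assumptions: subblocks are numbered row by row, all subblocks have unit pitch, and an OB-wire is accepted when the number of subblocks it crosses (hops plus one) does not exceed the bound, which matches the worked example with $r_3^7$, $r_3^1$ and $r_3^9$. Because every edge has cost 1, the sketch uses a breadth-first search from each boundary subblock in place of a general all-pairs shortest path algorithm. The function name `intra_block_edges` is hypothetical.

```python
from collections import deque


def intra_block_edges(p, q, obstacles, bound):
    """Directed intra-block edges (in-node of s -> out-node of t) with costs.

    The block has p rows and q columns of subblocks numbered 1 .. p*q row by
    row (an assumption). `obstacles` is a set of subblock numbers. An edge
    (s, t) is created when a route from s to t crosses at most `bound`
    subblocks; its cost is the hop count (unit pitch assumed).
    """
    def neighbours(j):
        r, c = divmod(j - 1, q)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < p and 0 <= cc < q:
                k = rr * q + cc + 1
                if k not in obstacles:
                    yield k

    def on_boundary(j):
        r, c = divmod(j - 1, q)
        return r in (0, p - 1) or c in (0, q - 1)

    boundary = [j for j in range(1, p * q + 1)
                if j not in obstacles and on_boundary(j)]
    edges = {}
    for s in boundary:
        crossed = {s: 1}                 # subblocks crossed to reach each node
        queue = deque([s])
        while queue:
            j = queue.popleft()
            if crossed[j] == bound:      # cannot extend the wire any further
                continue
            for k in neighbours(j):
                if k not in crossed:
                    crossed[k] = crossed[j] + 1
                    queue.append(k)
        for t in boundary:
            if t != s and t in crossed:
                edges[(s, t)] = crossed[t] - 1
    return edges


edges = intra_block_edges(3, 3, {8}, 3)
print((7, 1) in edges, (7, 9) in edges)
```

For the block of Figure 3.4 (3 × 3 subblocks, subblock 8 an obstacle, bound 3) the sketch creates the edge from subblock 7 to subblock 1 but not the one from subblock 7 to subblock 9, and the interior subblock 5 never appears as an endpoint.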
Suppose each block $B_i$ is divided into $R_i$ ($R_i = p_i \times q_i$) subblocks. If $B_i$ is a routing-block,
the number of nodes related to $B_i$ is $O(p_i + q_i)$; otherwise, the number of nodes is $\Theta(R_i)$.
Therefore, we get
$$|V_p| = O\left(\sum_{i=1}^{m} R_i\right)$$
For edges in $G_p$, $|E_{pa} \cup E_{pb} \cup E_{pc}| = O(|V_p|)$ since each edge connects two adjacent
blocks/subblocks, and the adjacency relationship among blocks/subblocks on a plane
constitutes a planar graph. Also $|E_{pd}| = |V_{in}| = |V_{out}|$. Finally,
$|E_{pe}| = O\left(\sum_{i=1}^{m} \min\{(p_i + q_i)^2,\ d_i \times (p_i + q_i)\}\right)$ where $d_i$ is the interconnect
bound for block $B_i$. Without loss of generality, we assume $d_i = O(p_i + q_i)$. Then
$|E_{pe}| = O\left(\sum_{i=1}^{m} (p_i + q_i)^2\right)$. Therefore, we get
$$|E_p| = O\left(\sum_{i=1}^{m} (p_i + q_i)^2\right)$$
However, during the construction of the path graph $G_p$, we first need to set up
$\tilde{G}_{pi} = (\tilde{V}_{pi}, \tilde{E}_{pi})$ for each routing-block $B_i$ in order to decide the edges in $E_{pe}$.
Since $|\tilde{V}_{pi}| = O(R_i)$, $|\tilde{E}_{pi}| = O(R_i)$, and the all-pairs shortest path algorithm takes
$O(|\tilde{V}_{pi}||\tilde{E}_{pi}| + |\tilde{V}_{pi}|^2 \log |\tilde{V}_{pi}|)$ running time [14, 15], the creation of edges inside
routing-block $B_i$ takes $O(R_i^2 \log R_i)$ running time. Therefore, the total construction time
for $G_p$ is $O\left(\sum_{i=1}^{m} R_i^2 \log R_i\right)$.
Once the path graph $G_p = (V_p, E_p)$ is constructed, the single-source shortest path
algorithm can be used to find the routing with the over-the-block wiring constraint for a two-pin
net, and it can be accomplished in $O(|V_p||E_p|)$ time.
Figure 3.6 (a) $r_i^1$, $r_i^2$, $r_i^3$, and $r_i^4$ are subblocks of a routing-block $B_i$. (b) Each subblock is
represented by a node array.
3.3.2 WP-Split algorithm
In this section, we propose another algorithm based on constructing a directed graph, the Split Graph
$G_s$. The main idea of this approach is to represent each non-obstacle subblock of a routing-block
$B_i$ by $d_i$ nodes. Then each path segment in the routing-block starts with a
node whose index is 1, and ends at a node whose index is $d_i$. Since all nodes with index
$d_i$ only have outgoing edges pointing to a node which is not related to $B_i$, the length of a
path segment is forced to be $d_i$. Figure 3.6 illustrates an example. Suppose the interconnect
bound for $B_i$ is 3. $r_i^1$, $r_i^2$, $r_i^3$ and $r_i^4$ are subblocks of a routing-block $B_i$. Each subblock is
represented by a node array. Then a path segment in $B_i$ must go from a node with index 1
to a node with index 3.
The construction of a Split Graph $G_s = (V_s, E_s)$ is as follows. If $B_i$ is a routing-block,
let $L_i = d_i$ where $d_i$ is the interconnect bound of $B_i$. Otherwise, let $L_i = 1$.
1. Nodes $V_s = \bigcup_{i=1}^{m} V_{si}$ where
$V_{si} = \{ v_i^j[k] \mid r_i^j \text{ is a subblock of } B_i \text{ and it is not an obstacle};\ k = 1, \ldots, L_i \}$
2. Edges $E_s = E_{sa} \cup E_{sb} \cup E_{sc}$ where
$E_{sa} = \{ (v_i^j[k], v_i^j[k+1]) \mid v_i^j[k], v_i^j[k+1] \in V_{si};\ k = 1, \ldots, L_i - 1 \}$
$E_{sb} = \{ (v_i^j[k], v_i^l[k+1]) \mid v_i^j[k], v_i^l[k+1] \in V_{si};\ k = 1, \ldots, L_i - 1;\ j \neq l; \text{ and subblocks } r_i^j \text{ and } r_i^l \text{ are adjacent} \}$
$E_{sc} = \{ (v_i^j[L_i], v_k^l[1]) \mid v_i^j[L_i] \in V_{si},\ v_k^l[1] \in V_{sk}; \text{ and the two subblocks } r_i^j \text{ and } r_k^l \text{ are adjacent, but do not belong to the same routing-block} \}$
Notice that edges belonging to $E_{sb}$ always connect nodes whose indexes
increase by one. Hence a path segment within a routing-block $B_i$ always runs from a node
with index 1 to a node with index $L_i$. In other words, no OB-wire in $B_i$ can
exceed the interconnect bound $d_i$.
3. Edge Cost
If $e \in E_{sa}$, $C(e) = 0$.
If $e \in E_{sb}$, $e = (v_i^j[k], v_i^l[k+1])$, $C(e) = |x_i^j - x_i^l| + |y_i^j - y_i^l|$ where
$(x_i^j, y_i^j)$ and $(x_i^l, y_i^l)$ are the center coordinates of subblocks $r_i^j$ and $r_i^l$.
If $e \in E_{sc}$, $e = (v_i^j[L_i], v_k^l[1])$, $C(e) = |x_i^j - x_k^l| + |y_i^j - y_k^l|$ where
$(x_i^j, y_i^j)$ and $(x_k^l, y_k^l)$ are the center coordinates of subblocks $r_i^j$ and $r_k^l$.
Figure 3.7 The corresponding split graph for Figure 3.4. The wide lines illustrate a shortest
path from $v_1^1[1]$ to $v_6^1[1]$.
We still use the example of Figure 3.4 to illustrate our approach. Since the interconnect
bound for $B_3$ is 3, three nodes are created for each subblock excluding obstacles. Then
edges are added among these nodes. Edges in ellipses belong to $E_{sa}$ and their cost is
0. For any other edge $e$ inside $B_3$ (i.e., $e \in E_{sb}$), the index of its source node is less
than that of its target node, and the difference is 1. On the other hand, the subblocks of
$B_2$ are each represented by only one node since $B_2$ is not a routing-block and has no
interconnect constraint. Finally, if two subblocks are adjacent and they do not belong to
the same routing-block, then the tail of one node array points to the head of the other node array.
This kind of edge belongs to $E_{sc}$. The wide lines in Figure 3.7 show a shortest path from
$v_1^1[1]$ to $v_6^1[1]$, and it corresponds to the routing in Figure 3.4.
Since a routing-block is divided into $R_i = p_i \times q_i$ subblocks and each subblock $r_i^j$ is
represented by $d_i$ nodes, $|V_s| = O\left(\sum_{i=1}^{m} d_i \times R_i\right)$. $|E_{sa}| = O(|V_s|)$, $|E_{sb}| = O(|V_s|)$, and
$|E_{sc}| = O(|V_s|)$. Therefore $|E_s| = O(|V_s|)$.
Once the Split Graph $G_s = (V_s, E_s)$ is constructed, the single-source shortest path
algorithm can be used to find the over-the-block routing for a two-pin net, and it can be
accomplished in $O(|V_s||E_s|) = O\left(\left(\sum_{i=1}^{m} d_i \times R_i\right)^2\right)$ time.
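The split-graph idea can be sketched compactly in code. The following is a minimal Python illustration, not the original implementation: it assumes that all subblocks are unit cells on a grid (so every adjacent pair has Manhattan center distance 1), that obstacles are simply omitted from the input, and that the source and target lie in blocks that are not routing-blocks. The nodes $v[k]$ are generated on the fly as pairs (cell, k), and Dijkstra's algorithm is used as the shortest path routine since all edge costs are non-negative. The names `wp_split_route`, `block_of`, and `bound` are hypothetical.

```python
import heapq


def wp_split_route(block_of, bound, src, dst):
    """Length of a shortest legal route from src to dst, or None if none exists.

    block_of: dict mapping each non-obstacle subblock (x, y) to a block name.
    bound:    dict mapping each routing-block name to its interconnect bound.
    src, dst: subblocks assumed to lie outside routing-blocks.
    """
    def levels(cell):
        # L_i = d_i for routing-blocks, 1 otherwise
        return bound.get(block_of[cell], 1)

    dist = {(src, 1): 0}
    heap = [(0, src, 1)]
    while heap:
        d, cell, k = heapq.heappop(heap)
        if d > dist[(cell, k)]:
            continue                      # stale heap entry
        if cell == dst:
            return d
        top = levels(cell)
        moves = []
        if k < top:
            moves.append((cell, k + 1, 0))            # E_sa: same subblock
        x, y = cell
        for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nbr not in block_of:
                continue                              # obstacle or off-chip
            same_rb = (block_of[nbr] == block_of[cell]
                       and block_of[cell] in bound)
            if same_rb and k < top:
                moves.append((nbr, k + 1, 1))         # E_sb: inside the block
            elif not same_rb and k == top:
                moves.append((nbr, 1, 1))             # E_sc: leave the block
        for nxt, nk, w in moves:
            nd = d + w
            if nd < dist.get((nxt, nk), float("inf")):
                dist[(nxt, nk)] = nd
                heapq.heappush(heap, (nd, nxt, nk))
    return None


row = {(0, 0): "A", (1, 0): "R", (2, 0): "R", (3, 0): "R", (4, 0): "B"}
print(wp_split_route(row, {"R": 2}, (0, 0), (4, 0)))
print(wp_split_route(row, {"R": 3}, (0, 0), (4, 0)))
```

In the one-row example, the routing-block R must be crossed over three subblocks, so no legal route exists with bound 2, while bound 3 admits the direct route. Marking a subblock as an obstacle after routing a net amounts to deleting its entry from `block_of`.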
3.3.3 Comparison
In the above, we have presented two approaches to set up a graph which incorporates the
interconnect constraints. The comparison between the two graphs is listed in Table 3.1.
$R_i = p_i \times q_i$ is the number of subblocks in a block, and $d_i$ is the interconnect bound
of a routing-block $B_i$. Without loss of generality, we assume $d_i = O(p_i + q_i)$.
The advantage of the WP-Path algorithm is that it requires less memory since the underlying
graph is smaller. However, it takes a longer time to construct the graph. “Setup” in the
table shows the construction time. “Path” is the time to search for a shortest path based on
the graph. Since the underlying graph of WP-Path is smaller than that of
WP-Split, it takes a shorter time to find a routing solution for one net.
Table 3.1 Algorithm comparison.
Algorithm   WP-Path                                                                  WP-Split
Nodes       $O(\sum_{i=1}^{m} R_i)$                                                  $O(\sum_{i=1}^{m} d_i \times R_i)$
Edges       $O(\sum_{i=1}^{m} (p_i + q_i)^2)$                                        $O(\sum_{i=1}^{m} d_i \times R_i)$
Setup       $O(\sum_{i=1}^{m} R_i^2 \log R_i)$                                       $O(\sum_{i=1}^{m} d_i \times R_i)$
Path        $O((\sum_{i=1}^{m} R_i)(\sum_{i=1}^{m} (p_i + q_i)^2))$                  $O((\sum_{i=1}^{m} d_i \times R_i)^2)$
Adjust      $O(R_i^2 \log R_i)$                                                      $O(d_i^2)$
Furthermore, after routing one net, some subblocks may become routing obstacles since
the net consumes all of their routing resources. Therefore, we have to judge whether the edges
within a routing-block are still valid. For example, in Figure 3.4, the routing of the net goes
through routing-block $B_3$. Suppose the subblock $r_3^4$ can accommodate only one net. After
routing this net, $r_3^4$ becomes a routing obstacle. Then in Figure 3.5, several edges such as
$(v_3^1, v_3^7)$ and $(v_3^2, v_3^4)$ inside $B_3$ are not valid any more. Of the two approaches, WP-Path
has to recalculate the edges within every routing-block along the path, while WP-Split only
needs to remove some edges from the graph. The time required for adjusting edges within
one routing-block is shown in the “Adjust” row of Table 3.1. In general, WP-Split is faster for graph
adjustment, and it is suitable for handling a large number of nets.
3.4 Experimental Results
Our algorithms were implemented in C++ on a PC (733MHz) with 128MB memory. We
tested the WP-Path and WP-Split algorithms on three test files which were generated randomly.
We compared the results of our two approaches with another shortest path approach that
finds a shortest path from the source node to the end node without considering interconnect
constraints. If the interconnect constraint is not satisfied, the net is discarded. Each time,
one net is selected to be routed. Furthermore, we assume that all subblocks can
accommodate a large number of nets so that the routing of one net is not affected by others. In
this way, we can concentrate on the routing quality of the WP algorithms. Both WP-Path and
WP-Split are guaranteed to find a feasible routing as long as one exists. As
shown in Table 3.2, all nets can be routed by our approaches, while less than 50% of the
nets can find a feasible routing by the simple shortest path approach.
Table 3.2 Test results of WP-Path and WP-Split algorithms.
File                                 test1          test2          test3
Macro Blocks                         50             100            150
Routing-Blocks                       7              12             18
Nets                                 350            1600           3500
Total Subblocks                      1337           1736           2604
Obstacles                            18             48             131
Shortest Path   Time (s) (per net)   11 (0.031)     23 (0.014)     76 (0.022)
                Nets routed          165 (47.14%)   613 (38.31%)   1211 (34.6%)
WP-Path         Time (s) (per net)   9 (0.026)      104 (0.065)    202 (0.058)
                Nets routed          350 (100%)     1600 (100%)    3500 (100%)
WP-Split        Time (s) (per net)   18 (0.051)     91 (0.057)     227 (0.065)
                Nets routed          350 (100%)     1600 (100%)    3500 (100%)
The underlying graph of WP-Path is usually smaller than that of WP-Split. For
the test file “test2”, the graph of WP-Path has about 1200 nodes and about
25000 edges, while WP-Split needs about 16500 nodes and 73000 edges. WP-Path gains in
memory usage. However, WP-Split has the advantage that it can make changes to the underlying
graph more easily after routing one net. When a large number of nets are to be processed,
WP-Split may require less running time than WP-Path. Figure 3.8 shows the
relationship between the number of nets and the running time for “test2”. As the number of
nets increases, the running time of WP-Split becomes shorter than that of WP-Path.
Figure 3.8 The comparison of WP-Path and WP-Split on the relationship between running
time and the number of nets.
[Plot: x-axis nets (500 to 2000), y-axis time (s) (0 to 140); curves Split and Path.]
3.5 Conclusion
In this chapter, we presented two exact polynomial-time algorithms for wire planning with
bounded over-the-block wires. Both algorithms guarantee to find an optimal routing so-
lution for a two-pin net as long as one exists. One requires less memory, while the other
may take less running time when processing a large number of nets. According to different
application requirements, users can choose an appropriate one.
CHAPTER 4
SIMULTANEOUS PIN ASSIGNMENT AND ROUTING
4.1 Introduction
Due to the enormous complexity of VLSI design, a hierarchical approach is needed for the
placement and routing of millions of standard cells in order to reduce runtime and improve
solution quality. Pin assignment and routing for macro blocks are important steps in a
typical top-down hierarchical approach.
Figure 4.1 (a) The two-step approach fails to route all nets. (b) The optimal solution of
pin assignment and routing by our approach.
Existing algorithms for macro-block pin assignment and routing can be classified into
two categories: (1) a two-step approach where pin assignment is followed by routing [16,
17, 18, 19], and (2) a net-by-net approach where pin assignment and routing for a single
net are performed simultaneously [20, 21, 22, 23, 24]. None of the existing algorithms
is “exact” in the sense that the algorithm may fail to route all nets even though a feasible
solution exists. This remains true even if only two-pin nets with fixed pins between two
blocks are concerned. Let us use two examples to illustrate that previous approaches cannot
guarantee a feasible solution. The first example in Figure 4.1 includes two macro blocks
and three obstacles in a one-layer routing environment. Four nets {a, b, c, d} are to be
assigned pins and routed. Figure 4.1(a) is obtained by a two-step approach. But at most
three nets can be routed for the pin assignment solution; i.e., the pin assignment solution
leads to a routing problem that is not routable by any router. Figure 4.1(b) shows a feasible
solution. Figure 4.2 gives another example that includes two macro blocks and six obstacles
in a one-layer routing environment. Four nets {a, b, c, d} are to be assigned pins and routed.
Figure 4.2(a) is the result of the net-by-net approach. Still it is impossible to assign pins
and route for net d. Figure 4.2(b) gives a feasible solution. Note that both feasible solutions
can be obtained by our algorithm in this chapter.
Figure 4.2 (a) The net-by-net approach fails to route all nets. (b) The optimal solution of
pin assignment and routing by our approach.
In this chapter, we first consider the two-pin net connections from one block to all
other blocks. We present the first polynomial-time exact algorithm for simultaneous pin
assignment and routing for all two-pin nets between one block (source block) and all other
blocks. In addition to finding a feasible solution whenever one exists, it guarantees to find
a pin-assignment/routing solution with minimum cost $\alpha \cdot W + \beta \cdot V$ for any positive pair
$\alpha$ and $\beta$, where $W$ is the total wire length and $V$ is the total number of vias. Our algorithm
has various applications. (1) It is suitable in ECO situations where the existing solution is
modified incrementally. (2) Given any pin assignment and routing solution obtained by any
existing method, we can increase the number of routed nets and reduce the routing cost by
removing the routes connecting to one block and redoing pin assignment and routing with
our algorithm. Furthermore, applying the algorithm iteratively (each time randomly picking
one block as the source block) provides a polynomial-time randomized algorithm for the pin
assignment and routing problem among blocks. This method is applicable to both global
and detailed routing with arbitrary routing obstacles on multiple layers. Experimental
results demonstrate its efficiency and effectiveness.
Our method is based on min-cost flow computations. Although network flow formula-
tions have been proposed for routing in the past [25, 26, 27, 28, 29, 30], there are important
differences between our work and previous results. First, previous network flow formula-
tions were primarily designed for global routing, whereas ours combines pin assignment
with routing (detailed or global). Second, almost all of those previous works needed to
solve the multicommodity flow problem which is NP-hard, whereas our algorithm uses
min-cost flow which is a polynomial time solvable problem. Meixner and Lauther pro-
posed an algorithm using min-cost flow. However, the nets handled by the algorithm have
to be connected to one common node. Third, our algorithm exactly solves the simultaneous
pin assignment and routing problem for all two-pin nets from one block to all other blocks
in polynomial time. Note that the routing step alone is NP-complete even if there are only
two blocks and all nets are two-pin nets with fixed pins.
The rest of the chapter is organized as follows. Section 4.2 defines the problem of
simultaneous pin assignment and routing in multilayer. In Section 4.3, we give a network
flow formulation to find the routes for all two-pin nets between one block and all other
blocks. In Section 4.4, we discuss its application in ECO situation, and demonstrate how to
use it to improve any given solution and to solve the pin assignment and routing problem
among blocks. Furthermore, we extend the algorithm to handle multiple pin nets. Finally,
we show the experimental results in Section 4.5 and conclude the chapter in Section 4.6.
4.2 Problem Definition
The multilayer macro block layout is modelled by a three-dimensional routing
grid graph G = (V, E). For convenience, we call the three dimensions x, y, and z.
Each layer is an x-y grid. Along the z axis, the grid nodes with the same (x, y)
coordinates on different layers are connected by via edges. The adjacent nodes on the
same layer are connected by edges which represent wire segments. In some technologies, a
layer has a specified track orientation. In this case, if a layer is used for horizontal tracks,
horizontally adjacent nodes of the layer are connected by edges. A similar rule applies
to vertical tracks. For routing obstacles (routing congestion regions, prerouted wires, and
so on) where wiring is not allowed, there are no edges (so the nodes inside the obstacles
can be omitted). In practice, routing obstacles can be present in any layer, and a block can
occupy any number of layers. There is no node inside the region that a block occupies, but
the nodes on the boundary, which are possible pin locations, are connected to the nodes
outside the block. Furthermore, the nodes over a block in the graph are for over-the-block
routing. As an example, Figure 4.3 illustrates a two-layer routing grid graph. Layer 1 is
used for vertical tracks, and Layer 2 for horizontal tracks. Three shaded regions (A, B, C)
are macro blocks. The other three regions represent routing obstacles. Blocks occupy Layer
1. There are two obstacles in Layer 1 and one in Layer 2.
Figure 4.3 A routing grid graph for two layers.
The grid graph contains not only the topological information (layer and via informa-
tion), but also the routing obstacle information. So this model is quite flexible and accurate
for multilayer technologies, and is suitable for both global routing and detailed routing.
In this model, each edge and each node has a capacity which specifies how many
wires are allowed to go through. In detailed routing, the capacity is 1. Also each edge is
associated with a cost. The cost of a via edge is β, which is specified by users. For a wire edge,
the cost is α · le, where α is specified by users and le is the wire length. Since different
layers may have different width and resistance, the cost of edges of different layers could
be different accordingly. For example, for a two-layer routing grid, we assign α1 to Layer
1 and α2 to Layer 2. Then the cost function becomes
α1 · W1 + α2 · W2 + β · V
where W1 is the total wire length on Layer 1, W2 is the total wire length on Layer 2, and
V is the total number of vias. The algorithm is guaranteed to find a solution with minimum
total cost.
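As a small illustration of the layer-weighted cost above, the following sketch evaluates the sum of the per-layer wire costs plus the via cost. The function name and argument layout are assumptions made for this example only; they do not come from the implementation described in this chapter.

```python
def routing_cost(wire_length_per_layer, alpha_per_layer, num_vias, beta):
    """Weighted routing cost: sum over layers of alpha_k * W_k, plus beta * V.

    wire_length_per_layer: total wire length W_k on each layer
    alpha_per_layer:       user-specified wire weight alpha_k for each layer
    num_vias:              total number of vias V
    beta:                  user-specified cost of one via
    """
    if len(wire_length_per_layer) != len(alpha_per_layer):
        raise ValueError("one alpha weight is needed per layer")
    wire_cost = sum(a * w for a, w in zip(alpha_per_layer, wire_length_per_layer))
    return wire_cost + beta * num_vias
```

For a two-layer grid this reduces to the expression α1 · W1 + α2 · W2 + β · V.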
The goal of pin assignment is to decide the exact pin positions on macro blocks. Rout-
ing is to find an appropriate connection among the pins of the same net. The two tasks are
closely related. Pin assignment alone neglects many important factors since interconnect is
hard to estimate accurately without carrying out the actual routing step. Moreover, a global
view of net and routing resource information is critical for pin assignment and routing. In
this chapter, we consider the problem of simultaneous pin assignment and routing for all
two-pin nets between a block (source block) and all other blocks. The general problem of
pin assignment and routing among blocks will be discussed in Section 4.4.
The problem (called PAR: Pin Assignment and Routing) can be formally described as
follows:
Problem 4.1 PAR: Given a routing grid graph G = (V, E) with U and C, where U is a
function on edges and nodes denoting the capacity of edges and nodes and C is a function
on edges denoting the cost of edges, a set of m + 1 macro blocks B (one block is the source
block bs, and the others are sink blocks b1, b2, ..., bm), and a set of nets N = N1 ∪ N2 ∪
... ∪ Nm, where Ni (i = 1, 2, ..., m) is the set of nets between block bs and block bi, find a set
of paths connecting bs and b1, b2, ..., bm, each path corresponding to a net in N, such that
each edge/node is used no more than its capacity and the total cost for all nets is minimized.
Each endpoint of a path is a pin location.
In PAR problem, the connections from the points inside the block to the boundary points
of the same block are redundant since we can pick the boundary points as pins with less
wiring cost. It is true that for multilayer layout a pin can be placed inside a block. However,
the pin must be connected with some pin outside the block by over-the-block routing which
has to cross the block boundary. Therefore, without loss of generality, we may assume that
a pin is only on the boundary of a block and can be placed in any available layer (note that
stacked pins are allowed).
As we know, routing itself is an NP-hard problem. It remains NP-hard even if only
two-pin nets with fixed pins between two blocks are concerned. In the traditional two-step
approach, the problem is therefore inherently difficult even for two blocks. However, the PAR
problem, which combines pin assignment and routing, is solvable in polynomial time.
Figure 4.4 (a) A PAR problem in detailed routing. (b) The corresponding network graph.
4.3 The Algorithm
In this section, we mainly use single-layer routing graph to simplify the presentation, since
the single-layer illustration is easier for visualization.
To solve the PAR problem, we first construct a network graph based on the routing
graph, and then apply a min-cost flow algorithm [31] to get the solution.
Given a routing grid graph G = (V, E) with capacity U and cost C, blocks B =
{bs, b1, b2, ..., bm}, and nets N = N1 ∪ N2 ∪ ... ∪ Nm, the network graph GN = (VN, EN)
is constructed as follows.
1. VN = {s, t, t1, t2, ..., tm} ∪ V, where s is the source node, t is the sink node, and ti is a
subsink node.
2. EN = E ∪ {(s, v) | v ∈ Ps} ∪ {(u, ti) | u ∈ Pi, i = 1, 2, ..., m} ∪ {(ti, t) | i = 1, 2, ..., m},
where Ps is the set of the available nodes for pin assignment on the boundary of block
bs and Pi is the set of the available pin nodes on the boundary of block bi.
3. Edge capacity: for edges (s, v) and (u, ti), UN(s, v) = UN(u, ti) = 1 in detailed
routing and UN(s, v) = UN(u, ti) = pin node capacity in global routing; for edge
(ti, t), UN(ti, t) = |Ni|; for any other edge e ∈ E, UN(e) = U(e).
4. Node capacity: for v ∈ V, UN(v) = U(v). Other nodes are uncapacitated.
5. Cost function: CN(s, v) = 0, CN(u, ti) = 0, CN(ti, t) = 0; for any other edge e ∈ E,
CN(e) = C(e).
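The construction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name build_network, the tuple format (tail, head, capacity, cost), and the labels "s", "t" and ("t", i) for the source, sink and subsink nodes are all hypothetical. Node capacities (step 4) are deliberately left out here, since they are handled separately by node splitting.

```python
def build_network(grid_edges, source_pins, sink_pins, net_counts, pin_capacity=1):
    """Return the directed arcs (tail, head, capacity, cost) of the network graph.

    grid_edges  : iterable of (u, v, capacity, cost) undirected routing-grid edges
    source_pins : candidate pin nodes P_s on the boundary of the source block
    sink_pins   : dict i -> candidate pin nodes P_i on the boundary of sink block b_i
    net_counts  : dict i -> |N_i|, the number of nets between b_s and b_i
    pin_capacity: 1 in detailed routing, the pin node capacity in global routing
    """
    arcs = []
    # Each undirected grid edge becomes a pair of opposite directed arcs.
    for u, v, cap, cost in grid_edges:
        arcs.append((u, v, cap, cost))
        arcs.append((v, u, cap, cost))
    # Zero-cost arcs from the source node s to every source-block pin candidate.
    for v in source_pins:
        arcs.append(("s", v, pin_capacity, 0))
    for i, pins in sink_pins.items():
        # Zero-cost arcs from every pin candidate of block b_i to subsink t_i.
        for u in pins:
            arcs.append((u, ("t", i), pin_capacity, 0))
        # Arc (t_i, t) with capacity |N_i| limits the flow to the nets of b_i.
        arcs.append((("t", i), "t", net_counts[i], 0))
    return arcs
```

Removing a pin candidate from source_pins or sink_pins is enough to forbid pin assignment at that location, because the corresponding arc is then never created.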
As an example, Figure 4.4(a) shows a PAR problem of detailed routing. One net is to
be routed between bs and b1 and two nets between bs and b2. Figure 4.4(b) illustrates the
corresponding network graph for this PAR problem. The expression (u, c) specifies the
capacity u and cost c of an edge. All nodes have capacity 1. For obstacles, no edges or
nodes inside obstacles are created. Therefore, no routing is inside obstacles. Note that each
undirected edge represents a pair of directed edges with capacity 1 in opposite directions.
Although there are two directed edges between a pair of nodes, the two edges cannot
appear together in a min-cost solution, and the total flow going through the two edges
cannot exceed the capacity of one edge. Figure 4.5 illustrates the idea. In Figure 4.5(a),
suppose both edges between p and q appear in a min-cost solution. The capacity of (p, q)
and (q, p) is u and the cost is c. Let (1) be the route along which flow f1 goes from s to p; let (2) be
the route along which flow f1 goes from q to t; let (3) be the route along which flow f2 goes from s to q; and
let (4) be the route along which flow f2 goes from p to t. The flow f1 is s → (1) → p → q → (2) →
t and f2 is s → (3) → q → p → (4) → t, with f1 ≤ u and f2 ≤ u. Suppose f1 ≥ f2. Then
the flow in Figure 4.5(a) can be transformed to the flow in Figure 4.5(b). The flow f1 is
split into (f1 − f2) along s → (1) → p → q → (2) → t and f2 along s → (1) → p → (4) → t,
and the flow f2 becomes s → (3) → q → (2) → t. The total cost is reduced by
2 · c · f2, since (p, q) now carries only a flow of (f1 − f2) and (q, p) carries none. Therefore, in a min-cost flow solution, no
flow goes through both edges between any pair of nodes, and the flow will not exceed the
capacity of one of the two edges.
Figure 4.5 (a) A solution in a flow network. A flow f1 goes from p to q, and another flow
f2 goes from q to p. (b) Another solution with less cost.
Furthermore, it is necessary to make nodes capacitated in the network graph GN
(capacitating edges only is not enough, since some routes may share the same node without
sharing an edge). However, the classical network flow problem only capacitates edges. This
can be solved by splitting each capacitated node r into two nodes r′ and r″, adding an edge
(r′, r″) with capacity U(r′, r″) = U(r) and cost 0, and turning the original edges (u, r) and
(r, v) into edges (u, r′) and (r″, v) respectively (refer to Figure 4.6).
Figure 4.6 Node splitting for capacitated nodes. The capacity of the new edge is U(r)
and its cost is 0.
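The node-splitting transformation can be sketched as follows. The arc format (tail, head, capacity, cost), the function name split_nodes, and the labels (r, "in") and (r, "out") standing for r′ and r″ are assumptions of this example rather than part of the original implementation.

```python
def split_nodes(arcs, node_capacity):
    """Replace every capacitated node r by r' = (r, "in") and r'' = (r, "out").

    arcs          : list of (tail, head, capacity, cost) directed arcs
    node_capacity : dict r -> U(r) for the capacitated nodes only
    """
    def as_tail(n):
        # Arcs leaving a capacitated node now leave from r''.
        return (n, "out") if n in node_capacity else n

    def as_head(n):
        # Arcs entering a capacitated node now enter r'.
        return (n, "in") if n in node_capacity else n

    result = [(as_tail(u), as_head(v), cap, cost) for u, v, cap, cost in arcs]
    # The new arc (r', r'') carries the node capacity U(r) at zero cost.
    for r, cap in node_capacity.items():
        result.append(((r, "in"), (r, "out"), cap, 0))
    return result
```

After this step an ordinary edge-capacitated min-cost flow solver can be applied, because any path through r must use the arc (r′, r″).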
Any flow in the network GN can be mapped to a pin assignment and routing solution
for a subset of the given nets. Figure 4.7 illustrates a flow f, |f| = 3, corresponding to a
solution of pin assignment and routing for 3 nets. Given a set of nets N, let n = |N|, i.e.,
the number of nets in N. If a flow f exists and |f| = n, then we can feasibly assign pins
and route all nets in N. Furthermore, the cost of the flow is the cost for the solution of pin
assignment and routing. Therefore, min-cost flow guarantees a solution with minimum total
cost α · W + β · V. The total capacity of the edges going into the sink node t is
Σ_{i=1}^{m} UN(ti, t) = Σ_{i=1}^{m} |Ni| = |N|.
Therefore, the maximum flow fmax in GN satisfies |fmax| ≤ |N|. Then min-cost
maximum flow assigns pins and routes for as many nets as possible with minimum total
cost.
The following theorem shows that the PAR problem can be exactly solved by a min-cost
flow computation on GN .
Theorem 4.1 A min-cost flow f, |f| = |N|, in GN corresponds to a pin assignment and
routing solution to the PAR problem for all nets in N with minimum total cost α · W + β · V.
If the size of the max-flow satisfies |fmax| < |N|, then there is no feasible solution to the PAR
problem, i.e., not all nets in N are routable. A min-cost maximum flow assigns pins and
routes for the maximum number of nets with minimum total cost.
Figure 4.7 (a) A flow f in the network in Figure 4.4(b), |f| = 3. (b) The corresponding
solution of pin assignment and routing for the 3 nets in the problem of Figure 4.4(a).
Algorithm 4 summarizes the algorithm PAR-by-Flow.
Algorithm 4 PAR-by-Flow (G, U, C, B, N)
1: Construct the network graph GN = (VN, EN)
2: Assign capacities UN and costs CN
3: Apply min-cost max-flow algorithm on GN
4: Derive pin assignment and routing solution
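Steps 3 and 4 of the algorithm can be illustrated with a small self-contained sketch. It assumes the network has already been constructed as a list of arcs (tail, head, capacity, cost) with non-negative costs. For brevity it uses successive shortest augmenting paths with Bellman-Ford rather than the double scaling algorithm of [31], so it does not achieve the time bound stated later in this section. All function names are hypothetical.

```python
def min_cost_max_flow(arcs, source, sink):
    """Successive shortest augmenting paths (Bellman-Ford on the residual graph).

    arcs: list of (tail, head, capacity, cost) with integer capacities.
    Returns (flow_value, total_cost, flow_per_arc).
    """
    # Residual edge 2k is arc k; residual edge 2k+1 is its reverse.
    to, cap, cost, adj = [], [], [], {}
    for u, v, c, w in arcs:
        adj.setdefault(u, []).append(len(to)); to.append(v); cap.append(c); cost.append(w)
        adj.setdefault(v, []).append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    adj.setdefault(source, [])
    adj.setdefault(sink, [])
    flow_value = total_cost = 0
    while True:
        dist, pred = {source: 0}, {}
        for _ in range(len(adj) - 1):
            changed = False
            for u in list(dist):
                for e in adj[u]:
                    v = to[e]
                    if cap[e] > 0 and dist[u] + cost[e] < dist.get(v, float("inf")):
                        dist[v], pred[v], changed = dist[u] + cost[e], e, True
            if not changed:
                break
        if sink not in dist:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the shortest path.
        push, v = float("inf"), sink
        while v != source:
            push = min(push, cap[pred[v]])
            v = to[pred[v] ^ 1]
        # Augment along the path and update residual capacities.
        v = sink
        while v != source:
            e = pred[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow_value += push
        total_cost += push * dist[sink]
    return flow_value, total_cost, [cap[2 * k + 1] for k in range(len(arcs))]


def par_by_flow(arcs, num_nets, source="s", sink="t"):
    """Run the flow computation and report which arcs carry routed nets."""
    value, cost, flow = min_cost_max_flow(arcs, source, sink)
    used = [arc for arc, f in zip(arcs, flow) if f > 0]
    return {"routed": value, "all_routed": value == num_nets,
            "cost": cost, "used_arcs": used}
```

The arcs reported in used_arcs between grid nodes correspond to wire segments and vias, and the arcs leaving the source node or entering a subsink node identify the chosen pin locations.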
Finding a min-cost maximum flow in a network is a classical problem for which several
polynomial time optimal algorithms are available [14, 15]. Deriving a solution of PAR
from a flow in GN can be done in O(E) time. Thus, if we adopt the double scaling al-
gorithm in [31], which is capable of solving integer problems, we get the following time
complexity for the PAR problem.
Theorem 4.2 The PAR-by-Flow algorithm optimally solves the PAR problem in
O(V E log log Umax log(V Cmax)) time for G = (V, E), where Umax is the maximum value of U and Cmax
is the maximum value of C.
Note that the complexity of our algorithm PAR-by-Flow is mainly dependent on the
size of the routing grid graph G = (V; E). In global routing model where the size of the
routing graph is smaller, PAR-by-Flow requires less runtime.
In applications, some locations on the boundary of a block may not be allowed for
pin assignment. This problem can be solved easily as follows. If the forbidden location is
on the source block, we remove the directed edge from the source node to that boundary
node; if it is on a sink block, we remove the edge from that boundary node to the subsink
node. Then our network-flow based algorithm will not assign a pin to the location.
4.4 Applications
In this section, we discuss applications of PAR-by-Flow algorithm. PAR-by-Flow exactly
solves the PAR problem, and can be used as a powerful sub-routine in many situations.
4.4.1 ECO
PAR-by-Flow provides an optimal solution to the pin assignment and routing problem for all
2-pin nets from one block to other blocks. This problem matches well with some situations
in ECO. Usually, a design needs to go through many changes. At each step, designers
do not want to redo everything and will just modify the existing solution incrementally.
For instance, a designer changes the design of one block in a floorplan. As a result, net
connections between the block and other blocks have to be changed accordingly. Some
nets become unnecessary, and some new nets need to be added. Also, during rerouting,
some routes are kept untouched. Now the problem becomes how to find a new solution
subject to these constraints while minimizing the total cost α · W + β · V. The PAR-by-Flow
algorithm provides an ideal way to solve this kind of problem. For unchanged routes,
we regard them as obstacles. In this way, the pins, wire segments and vias occupied by
these nets cannot be used by others. Then we update the set of nets according to the added
or deleted nets. After removing the connections to the block, we apply PAR-by-Flow and
get an optimal solution.
Figure 4.8 (a) The initial pin assignment and routing solution. (b) The solution obtained
by applying PAR-by-Flow on Block A satisfying the new requirement.
Figure 4.8 illustrates an example. In Figure 4.8(a), we have a pin assignment and routing
design of a floorplan, and want to change net connections from Block A subject to: (1)
keep the routing of 3 nets c, d, and f unchanged; (2) add two new nets p and q between A
and B. Now we select Block A as the source block. Since the routes for nets c, d, f and the
nets among B, C, D, and E should not be changed, they are regarded as obstacles. The set
of nets becomes {a, b, e, g, h, i, j, k, p, q}. The result obtained by our algorithm is shown in
Figure 4.8(b).
4.4.2 Improvement on any given solution
Our approach has a global view of all two-pin nets connecting to one block, so it can
be used to improve any pin assignment and routing solution. Given a pin assignment and
routing solution, pick a block as the source block and regard the others as sink blocks, then
remove all routes connected to the source block and apply PAR-by-Flow to redo pin
assignment and routing. In the next step, another block is chosen as the source block, and
this process is repeated until every block has served as the source block. The optimality
of PAR-by-Flow guarantees that the new solution is no worse than the original one, either
increasing the number of routed nets or reducing the cost. By repeating the procedure on
each block (as source block), we can improve the solution obtained by any method. We
call this iterative method IMProve-by-PAR.
The ordering of source blocks in IMProve-by-PAR influences the
final result. Different orderings may lead to different results since each step is based on the
previous step. To alleviate the influence of block order, we implement IMProve-by-PAR
by applying PAR-by-Flow to the blocks in a random order. Furthermore, we repeat
IMProve-by-PAR several times to get a better result. Each time, we get a new solution
from IMProve-by-PAR, and let this new solution be the input of the next IMProve-by-PAR.
This repeated application of IMProve-by-PAR is referred to as RepIMProve-by-PAR, and
its pseudocode is shown in Algorithm 5. S is a previous solution, and T is the number of
iterations.
Algorithm 5 RepIMProve-by-PAR (G, U, C, N, B, S, T)
for i = 1 to T do
    Generate a random order Order on blocks B
    S = IMProve-by-PAR (G, U, C, N, B, S, Order)
end for
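The control flow of Algorithm 5 can be sketched as below. This is only an outline under stated assumptions: reroute_block is a hypothetical callback standing in for "remove all routes connected to the block and apply PAR-by-Flow with that block as the source", and the representation of a solution is left entirely to the caller.

```python
import random


def improve_by_par(solution, order, reroute_block):
    """One IMProve-by-PAR pass: each block in turn serves as the source block."""
    for block in order:
        # reroute_block is assumed to return a solution no worse than its input.
        solution = reroute_block(solution, block)
    return solution


def rep_improve_by_par(blocks, solution, iterations, reroute_block, rng=None):
    """Repeat IMProve-by-PAR, using a fresh random block order each time."""
    rng = rng or random.Random()
    for _ in range(iterations):
        order = list(blocks)
        rng.shuffle(order)  # random order alleviates the influence of block order
        solution = improve_by_par(solution, order, reroute_block)
    return solution
```

Passing a seeded random.Random instance as rng makes a run reproducible, which can help when comparing results across iterations.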
Figure 4.9 illustrates an example of IMProve-by-PAR. The numbers in Figure 4.9(a) are
the numbers of nets to be routed between two blocks. The target is to route 1 net between A
and B, 1 net between A and C, 4 nets between A and D, and 3 nets each between B and C,
B and D, and C and D. Figure 4.9(b) illustrates the initial net-by-net solution for pin assignment and routing
among the 4 blocks based on the min-cost path approach. In this solution, only 10 nets are
routed, and 5 nets (2 nets between blocks B and C; 3 nets between blocks C and D) are not
routed. The total cost is 51. First we choose Block A as the source block. After removing
all routes connected to Block A, we apply PAR-by-Flow to find net connections for Block
A and get a new solution as shown in Figure 4.9(c). In this step, the cost is reduced by 15.
Then we apply PAR-by-Flow to Block B, as shown in Figure 4.9(d). Two more nets between B and
C are routed. Similarly, Figure 4.9(e) shows the solution after applying PAR-by-Flow on
Block C. All nets are routed (3 more nets) with less cost (from 51 to 50). Finally, we apply
PAR-by-Flow on Block D and get the complete solution shown in Figure 4.9(f). Note that for the
final solution, the total cost is reduced from 51 to 50 even though 5 more nets are routed.
Figure 4.9 Illustration of improvement on a given solution. (a) Illustration of net connections
among Blocks A, B, C and D. (b) Initial net-by-net solution based on the min-cost
path approach. 5 nets (2 between B and C; 3 between C and D) are not routed. The total
cost is 51. (c) The solution after applying PAR-by-Flow on Block A. Cost is reduced by 15.
(d) The solution after applying PAR-by-Flow on Block B. Two more nets between B and
C are routed. (e) The solution after applying PAR-by-Flow on Block C. All nets are routed
(3 more nets) with less cost (from 51 to 50). (f) The solution after applying PAR-by-Flow
on Block D. Nothing is changed.
In addition, IMProve-by-PAR itself provides a new way to solve the pin assignment and
routing problem among multiple macro blocks. The general problem can be decomposed into
a set of PAR problems and solved by PAR-by-Flow iteratively. Again, to alleviate the
influence of block order, we choose the source block randomly. This yields a polynomial-time
randomized algorithm to solve the pin assignment and routing problem among blocks.
Of course, RepIMProve-by-PAR can be used to improve the result. In fact, if we let
the input solution S of RepIMProve-by-PAR be empty, we always get a solution when
the program terminates.
4.4.3 Multiple-pin nets
In the above sections, we focus on two-pin net connections among multiple macro blocks.
Now we extend the algorithm to handle both two-pin nets and multiple-pin nets.
Suppose a multiple-pin net is routed as a Steiner tree. Once a block is selected as the
source block, the branch of the tree connecting to the source block is removed for rerouting.
Since the routing of the rest of the tree (called the residual tree) should not be changed, the
edges and nodes occupied by the residual tree are treated as obstacles. Furthermore, one
new node v is added to the flow network to represent this multiple-pin net. All nodes on the
residual tree are connected to v and v is connected to the sink node t. The capacity of the
edge (v, t) is 1 and the cost is 0. Therefore only one unit of flow is allowed from v to t, i.e., only
one node on the residual tree is connected to the source block.
Figure 4.10 Illustration of improvement for a pin assignment and routing of two/multiple-pin
nets. (a) A one-layer pin-assignment and routing solution. (b) When Block A is selected
to be the source block, all nets connecting to A are removed for rerouting. The routing e
between B and C should not be changed. (c) The corresponding flow network graph. (d)
A flow f (|f| = 4) in the network.
Figure 4.11 An improved solution of pin assignment and routing of two/multiple-pin nets.
Figure 4.10 illustrates an example. Figure 4.10(a) shows the initial solution of pin
assignment and routing for a one-layer problem. When Block A is selected, all routed nets
connecting to A are removed for rerouting as illustrated in Figure 4.10(b), while the routing
e between B and C should not be changed. We construct the flow network as described in
Section 4.3. For the multiple-pin net among blocks A, B and C, a new node m is created
to represent this net. All nodes along e are connected to m and m is connected to the
sink node t. Furthermore, the edges along e have already been occupied. Therefore the
edges along e are deleted (represented by dashed lines in Figure 4.10(c) and (d)), and no more
flow can be pushed through them. The corresponding flow network is shown in Figure 4.10(c).
Figure 4.10(d) shows a flow f (|f| = 4) in the network. By mapping the flow to the original
routing grid, we get a new pin assignment and routing solution as shown in Figure 4.11. Compared
to the original one, the total wire length is reduced from 28 to 17.
Moreover, the above method is also a way to solve the pin assignment and routing problem
for two/multiple-pin nets. For each multiple-pin net, we can first set up one connection
between two blocks (just as a two-pin net between the two blocks). By connecting all nodes
on the tree (or line) to the node v which represents the multiple-pin net, we can construct a Steiner
tree by adding branches one by one. Once every block has been selected as the source
block, we get a pin assignment and routing solution for nets among multiple blocks.
We can further improve the solution by repeating the above procedure several times.
A different block ordering can be adopted each time.
4.5 Experimental Results
We have implemented the PAR-by-Flow and RepIMProve-by-PAR algorithms in C++
and carried out experiments on a Sun Sparc Ultra 5 (360MHz) with 128MB memory.
We have tested the refinement algorithm RepIMProve-by-PAR for two-pin nets by using
a net-by-net approach to get the original solution. The net-by-net approach considers
only one net at a time, and finds a min-cost path between source block and sink blocks
to assign pins and route the net. Nets in the net-by-net approach are processed in random order. In
RepIMProve-by-PAR, we repeat IMProve-by-PAR 10 times. Both RepIMProve-by-PAR
and net-by-net are executed 10 times on 8 data files, 4 for detailed routing and 4 for global
routing. Tables 4.1 and 4.2 list the average of these ten results for detailed routing and
global routing respectively. After refinement, all nets are routed with significant improvement
in the total cost, the wire length and the number of vias. As an illustration, Figure
4.12 shows the results of pin assignment and routing for input file "X18", where α = 1 and
β = 1. Vertical tracks are in Layer 1 and horizontal tracks are in Layer 2. Figure 4.12(a)
shows the net-by-net solution. Only 262 nets are routed and the total cost is 11023. Figure
4.12(b) is obtained by applying our method on (a). All 268 nets are routed, and
the total cost is 7153.
Figure 4.12 Two-layer pin assignment and routing for X18. (a) Net-by-net solution. (b)
The solution obtained by applying our method on (a).
Table 4.1 Average results of 10 times for detailed routing test files. All nets are routed
after refinement by RepIMProve-by-PAR.
Detailed Routing
File                            C2        Am3       P2        X18
Grid                            67x61     134x140   118x108   155x157
Block                           11        33        12        10
Node                            8458      38140     25970     49320
Edge                            29992     125956    86655     157119
Capacity                        1         1         1         1
Net                             152       355       295       268
Time (sec)  Previous            3.78      35.42     16.44     32.06
            Refined (per iter)  8.94      136.76    43.03     101.72
Routed      Previous            150.2     354.5     293.6     263.9
Nets        Refined             152       355       295       268
Cost        Previous            2575.0    16215.7   6093.8    9817.4
            (per net)           (17.14)   (45.74)   (20.76)   (37.20)
            Refined             1966.0    14597.6   5154.5    7191.0
            (per net)           (12.93)   (41.12)   (17.47)   (26.83)
            Improve             24.6%     10.1%     15.8%     27.9%
Wire        Previous            2502.3    15906.8   6012.9    9659.8
            (per net)           (16.66)   (44.87)   (20.48)   (36.60)
            Refined             1909.9    14362.7   5098.0    7081.9
            (per net)           (12.57)   (40.46)   (17.28)   (26.43)
            Improve             24.5%     9.8%      15.6%     27.8%
Via         Previous            72.7      308.9     80.9      157.6
            (per net)           (0.48)    (0.87)    (0.28)    (0.60)
            Refined             56.1      234.9     56.5      109.1
            (per net)           (0.37)    (0.66)    (0.19)    (0.41)
            Improve             22.9%     24.1%     32.1%     31.7%
Table 4.2 Average results of 10 times for global routing test files. All nets are routed after
refinement by RepIMProve-by-PAR.
Global Routing
File                            K1        V2        Z2        N3
Grid                            20x21     41x44     33x30     50x47
Block                           8         20        28        40
Node                            944       3824      2168      4980
Edge                            3695      14307     8615      18987
Capacity                        55        30        70        25
Net                             2131      3088      5498      2588
Time (sec)  Previous            5.15      27.52     34.83     43.31
            Refined (per iter)  2.48      26.62     15.57     54.17
Routed      Previous            2117.9    3050.1    5440.1    2572.8
Nets        Refined             2131      3088      5498      2588
Cost        Previous            17787.6   52744.8   73015.8   63484.7
            (per net)           (8.40)    (17.29)   (13.42)   (24.68)
            Refined             14966.3   48113.8   66722.2   57308.2
            (per net)           (7.02)    (15.58)   (12.14)   (22.14)
            Improve             16.4%     9.9%      9.5%      10.3%
Wire        Previous            16295.7   50782.4   68586.1   60861.0
            (per net)           (7.69)    (16.65)   (12.61)   (23.66)
            Refined             13728.9   46611.0   63031.3   55172.4
            (per net)           (6.44)    (15.09)   (11.46)   (21.32)
            Improve             16.3%     9.4%      9.1%      9.9%
Via         Previous            1491.9    1962.4    4429.7    2623.7
            (per net)           (0.70)    (0.64)    (0.81)    (1.02)
            Refined             1237.4    1502.8    3690.9    2135.8
            (per net)           (0.58)    (0.49)    (0.67)    (0.83)
            Improve             17.1%     23.4%     17.3%     18.6%
4.6 Conclusion
In this chapter, we considered the problem of two-pin net connections from one block to all
other blocks. We presented the first polynomial-time optimal algorithm for simultaneous
pin assignment and routing in multilayer for all two-pin nets between a source block and
all other blocks. Our algorithm is applicable for both global routing and detailed routing
with arbitrary routing obstacles on multiple layers, and guarantees a pin-assignment and
routing solution with minimum total cost α · W + β · V by computing a min-cost flow in
a network. In an ECO situation where a designer does not want to redo everything after a
change, the algorithm provides an ideal way for incremental modification of the existing
solution. Also, it can be applied to improve any pin assignment and routing solution by any
existing method. Furthermore, by applying the algorithm iteratively for all blocks (each
time randomly pick one block as the source block), it provides a polynomial-time random-
ized algorithm to solve the pin assignment and routing problem of all blocks. Experiments
demonstrate that the algorithm is very efficient and effective.
CHAPTER 5
INTEGRATED PIN ASSIGNMENT AND BUFFER
PLANNING
5.1 Introduction
As chip size grows larger and minimum feature size is reduced, the capacitance and
resistance of wires increase dramatically. This makes interconnects play a critical role in
achieving high performance and reducing circuit complexity in deep-submicron design.
Many techniques have been proposed to reduce interconnect delay. One effective way is
buffer insertion [32, 33], which is heavily used in chip design. It is estimated that the number
of inserted buffers for on-chip interconnect may be up to 800,000 in 50nm technology [34].
At the same time, the introduction of so many buffers raises many challenging problems in
physical design. In a hierarchical approach, buffers are clustered together as a buffer block
to facilitate floorplanning and routing. Cong et al. [35], Tang and Wong [36] and Sarkar et al.
[37] proposed algorithms for the buffer block planning problem to minimize the chip area and
the number of buffer blocks. In two recent works, Dragan et al. [38, 39] gave algorithms for
the global buffered routing problem with fixed pins. Their work is based on multicommodity
flow, which is an NP-hard problem. In this chapter, we address the problem of simultaneous
Pin assignment and Buffer planning (PB) for a given buffer block plan. Our algorithm
uses min-cost flow computation, which is solvable in polynomial time.
Figure 5.1 (a) Three nets use 3 buffers and the total wire length is 19. (b) An optimal
solution with 1 buffer and wire length 14.
Informally, the problem can be described as follows: given a placement of macro blocks
and buffer blocks, assign pins and plan buffer insertions for the given set of nets subject to
the required lower and upper bounds on connection intervals (i.e., the range of allowable
distance between two buffers, two pins, or a pin and a buffer) while minimizing the
total cost $\alpha \cdot W + \beta \cdot R$, where $W$ is the total wire length and $R$ is the number of buffers.
Figure 5.1 shows an example. The placement includes three macro blocks and three buffer
blocks. Buffer blocks may have different capacities; i.e., the number of buffers in a buffer
block can be different. Here r1 has capacity 1 while r2 and r3 each have capacity 2. The net
set includes 2 nets between b1 and b2 and 1 net between b1 and b3. The allowable
distance between two buffers or between a buffer and a pin is bounded by Manhattan distances 2
and 4. Also, if the distance between two pins is longer than 5, buffers have to be inserted; i.e.,
the lower and upper bounds for the allowable distance between two pins are 0 and 5. The
purpose is to assign pins and plan buffers for the 3 nets while minimizing the number of
buffers and the total wire length.
The goal of pin assignment is to find the exact locations of pins on macro blocks.
Buffer planning decides buffer usage along net connections to meet the required delay
constraints. The two tasks are closely related. Pin assignment alone may neglect many
important factors since interconnect is hard to predict, and a global view of net connections
is always helpful. Figure 5.1(a) illustrates a solution by a two-step approach (i.e., pin
assignment followed by buffer planning). The pin assignments are decided according to the
shortest Manhattan distance. In total, three buffers are used and the wire length is 19.
Figure 5.1(b) shows an optimal solution with only one buffer and a wire length of 14.
In this chapter, we present a polynomial-time exact algorithm for simultaneous pin
assignment and buffer planning for all two-pin nets from one macro block (the source block)
to all other blocks for a given buffer block plan such that each net satisfies the lower and
upper bounds on connection intervals while minimizing the total cost $\alpha \cdot W + \beta \cdot R$
for any positive constants $\alpha$ and $\beta$, where $W$ is the total wire length and $R$ is the number
of buffers. Applying this algorithm iteratively (each time picking one block as the source
block) yields a polynomial-time algorithm for pin assignment and buffer planning for
nets among multiple macro blocks. Experimental results demonstrate its efficiency and
effectiveness.
The rest of this chapter is organized as follows. Section 5.2 defines the PBO (Pin
assignment and Buffer planning for One source block) problem, which simultaneously assigns
pins and plans buffers for nets between one macro block and all other blocks. In Section
5.3, we present a network flow formulation to solve the PBO problem. In Section 5.4, we extend
the PBO problem to the PB (Pin assignment and Buffer Planning) problem, which considers all nets
among multiple macro blocks. In Section 5.5, we provide a node clustering
method to speed up the computation. Finally, we show the experimental results in
Section 5.6 and conclude the chapter in Section 5.7.
5.2 Pin Assignment and Buffer Planning for One Source
Block (PBO)
We are given a placement of $m + 1$ macro blocks $B = \{b_s, b_1, \ldots, b_m\}$ with $n$ buffer blocks
$R = \{r_1, \ldots, r_n\}$. (For convenience, we call $b_s$ the source block and the other blocks sink blocks.)
Each buffer block $r_i$ ($i = 1, \ldots, n$) is associated with a positive integer $c_i$ denoting the
capacity of $r_i$; i.e., buffer block $r_i$ can hold $c_i$ buffers.
Let $N = N_1 \cup N_2 \cup \cdots \cup N_m$, where $N_i$ ($i = 1, \ldots, m$) is the set of nets between blocks
$b_s$ and $b_i$; and $P = P_s \cup P_1 \cup \cdots \cup P_m$, where $P_i$ ($i = s, 1, \ldots, m$) is the set of available pin
locations of block $b_i$.
The distance of two points $u$ and $v$ on a planar region is denoted as $d_{uv}$. Let buffer
interval be the allowable distance range between two buffers or a pin and a buffer; and pin
interval be the allowable distance range between two pins. For convenience, let connection
interval refer to buffer interval or pin interval.

Figure 5.2 (a) A PBO problem with 3 macro blocks and 3 buffer blocks. (b) The corresponding
flow network graph.

Suppose the lower and upper bounds for buffer intervals are $\tilde{L}$ and $\tilde{U}$ respectively; and
those for pin intervals are $L$ and $U$ respectively. A valid path means:
1. If the path is $p = (p_s, r'_1, \ldots, r'_k, p_t)$ where $p_s, p_t \in P$ and $r'_i \in R$ ($i = 1, \ldots, k$), then
the distance of each path segment is bounded by $\tilde{L}$ and $\tilde{U}$, i.e., $\tilde{L} \le d_{p_s r'_1} \le \tilde{U}$,
$\tilde{L} \le d_{r'_i r'_{i+1}} \le \tilde{U}$ ($i = 1, \ldots, k - 1$) and $\tilde{L} \le d_{r'_k p_t} \le \tilde{U}$;
2. If the path is $p = (p_s, p_t)$ where $p_s, p_t \in P$, then $L \le d_{p_s p_t} \le U$.
The length of a path is the sum of the distances of all path segments.
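
The two cases of the valid-path definition can be checked mechanically. The following is a minimal sketch, assuming Manhattan distance (as in the chapter's examples) and representing a path as a list of points whose first and last entries are pins and whose interior entries are buffers; the function names are hypothetical and not part of the original formulation.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def manhattan(u: Point, v: Point) -> float:
    """Manhattan distance d_uv (an assumption; the definition only needs some distance)."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def is_valid_path(path: List[Point], buf_lo: float, buf_hi: float,
                  pin_lo: float, pin_hi: float) -> bool:
    """path = [pin, buffer, ..., buffer, pin]; interior points are buffers."""
    if len(path) < 2:
        return False
    if len(path) == 2:
        # Case 2: a direct pin-to-pin connection uses the pin interval.
        return pin_lo <= manhattan(path[0], path[1]) <= pin_hi
    # Case 1: every segment touches a buffer, so the buffer interval applies.
    return all(buf_lo <= manhattan(a, b) <= buf_hi
               for a, b in zip(path, path[1:]))

def path_length(path: List[Point]) -> float:
    """Sum of the distances of all path segments."""
    return sum(manhattan(a, b) for a, b in zip(path, path[1:]))
```

For instance, with buffer interval [2, 3] and pin interval [0, 3], the path [(0, 0), (2, 1), (5, 1)] is valid with length 6, whereas the direct connection [(0, 0), (5, 0)] is not.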
For any given positive constants $\alpha$ and $\beta$, the PBO problem is to find a set of valid paths
connecting $b_s$ and all other macro blocks as well as minimizing the total cost $\alpha \cdot W + \beta \cdot R$
where $W$ is the total length and $R$ is the number of buffers. Each path corresponds to a net
in $N$ and the two end points of the path are assigned pin locations for this net.
Figure 5.2(a) gives a simple example. There are three macro blocks and three buffer
blocks. The capacities of the three buffer blocks $r_1$, $r_2$, and $r_3$ are 2, 1, and 2, respectively.
For any two points $u$ and $v$ on the planar region, let their coordinates be $(u_x, u_y)$ and $(v_x, v_y)$,
respectively, and define the distance $d_{uv} = |u_x - v_x| + |u_y - v_y|$. The required lower and upper
bounds for buffer intervals are 2 and 3, respectively; and the bounds for pin intervals are 0
and 3, respectively. The tiny squares $p_{xy}$ on the boundaries of macro blocks are available
pin locations. The net set includes one net between $b_s$ and $b_1$ and two nets between $b_s$ and $b_2$.
The purpose is to decide pin and buffer locations for these three nets
with a minimum cost $\alpha \cdot W + \beta \cdot R$.
5.3 The Algorithm
To solve the PBO problem, we first construct a network graph, then apply a min-cost flow
algorithm to get the solution.
Given a PBO problem, we construct the network graph $G = (V, E)$ with capacity $U$
and cost $C$ as follows:
1. $V = \{s, t, t_1, t_2, \ldots, t_m\} \cup R \cup P$, where $s$ is the source node, $t$ is the sink node, and
$t_i$ ($i = 1, \ldots, m$) is a subsink node.
2. $E = E_s \cup E_t \cup E_{\tilde{t}} \cup E_b \cup E_r \cup E_{br} \cup E_{rb}$, where
$E_s = \{(s, p_s) \mid p_s \in P_s\}$,
$E_t = \{(t_i, t) \mid i = 1, 2, \ldots, m\}$,
$E_{\tilde{t}} = \bigcup_{i=1}^{m} \{(p, t_i) \mid p \in P_i\}$,
$E_b = \bigcup_{i=1}^{m} \{(p_s, p) \mid p_s \in P_s,\ p \in P_i,\ L \le d_{p_s p} \le U\}$,
$E_r = \{(r_i, r_j) \mid i \ne j;\ i, j = 1, \ldots, n;\ \tilde{L} \le d_{r_i r_j} \le \tilde{U}\}$,
$E_{br} = \bigcup_{i=1}^{n} \{(p_s, r_i) \mid p_s \in P_s,\ \tilde{L} \le d_{p_s r_i} \le \tilde{U}\}$,
$E_{rb} = \bigcup_{i=1}^{n} \{(r_i, p) \mid p \in P_j,\ j = 1, \ldots, m,\ \tilde{L} \le d_{r_i p} \le \tilde{U}\}$.
3. Edge capacity:
for edges $e = (t_i, t)$, $U(e) = |N_i|$ ($i = 1, \ldots, m$);
for all other edges $e$, $U(e) = 1$.
4. Node capacity:
for $r_i \in R$, $U(r_i) = c_i$;
for $p \in P$, $U(p) = 1$;
other nodes are uncapacitated.
5. Edge cost:
for $e \in E_s \cup E_t \cup E_{\tilde{t}}$, $C(e) = 0$;
for all other edges $e = (u, v)$, $C(e) = \alpha \cdot d_{uv}$.
6. Node cost:
for $r \in R$, $C(r) = \beta$;
for all other nodes $v$, $C(v) = 0$.
Figure 5.2(b) illustrates the constructed network graph for the PBO problem in Figure
5.2(a). Note that whether an edge $(u, v)$ ($u, v \in P \cup R$) should be added to $G$ depends on
whether $d_{uv}$ falls in the range $[\tilde{L}, \tilde{U}]$ (or $[L, U]$) or not. For example, the distance between
$p_{13}$ and $r_2$ is 1, which is less than the lower buffer interval bound $\tilde{L} = 2$. Thus, there is no
edge between $p_{13}$ and $r_2$ in the constructed flow network. Similarly, the distance between
$p_{s2}$ and $p_{13}$ is 5 which is larger than the upper pin interval bound $U = 3$, thus the edge
$(p_{s2}, p_{13})$ is not included.

Figure 5.3 Node splitting for capacitated nodes. The new edge has capacity $U(v)$ and
cost $C(v)$.

In the constructed flow network, every node in $P \cup R$ has a cost and a capacity. However,
the classical network flow problem assigns costs and capacities only to edges. This can be resolved
by splitting each capacitated node $v$ into two nodes $v'$ and $v''$. A new edge $(v', v'')$ is added
with capacity $U(v)$ and cost $C(v)$. Then the original edges $(u, v)$ and $(v, w)$ are changed
into edges $(u, v')$ and $(v'', w)$, respectively (refer to Figure 5.3).
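
The node-splitting transformation can be sketched in a few lines. This is only an illustration under assumed conventions: edges are tuples (tail, head, capacity, cost), node names are strings, and the suffixes `_in` and `_out` stand for $v'$ and $v''$; none of these names come from the original text.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int, float]  # (tail, head, capacity, cost)

def split_capacitated_nodes(edges: List[Edge],
                            node_cap: Dict[str, int],
                            node_cost: Dict[str, float]) -> List[Edge]:
    """Replace every node v in node_cap by v_in -> v_out with (U(v), C(v))."""
    result: List[Edge] = []
    for v, cap in node_cap.items():
        # The new internal edge carries the node's capacity and cost.
        result.append((v + "_in", v + "_out", cap, node_cost.get(v, 0.0)))
    for u, v, cap, cost in edges:
        tail = u + "_out" if u in node_cap else u   # edges leave from v''
        head = v + "_in" if v in node_cap else v    # edges enter at v'
        result.append((tail, head, cap, cost))
    return result
```

After this step every capacity and cost sits on an edge, so any standard min-cost flow routine can be applied without modification.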
Any flow in $G$ can be mapped to a pin assignment and buffer planning solution for a
subset of the given nets. The used capacities of the $R$ nodes in the flow give the number of
buffers needed in the solution. Figure 5.4(a) illustrates a flow $f$ with $|f| = 3$, and Figure
5.4(b) is the pin assignment and buffer planning solution derived from that flow.

Figure 5.4 (a) A flow $f$ in the network in Figure 5.2(b), $|f| = 3$. (b) The corresponding
solution of pin assignment and buffer planning to the PBO problem of Figure 5.2(a).

If a flow $f$ exists with $|f| = |N|$, then we can find a feasible solution of pin assignment
and buffer planning for all of the nets in $N$. On the other hand, given a pin assignment
and buffer planning solution for $n$ nets, a flow $f$ ($|f| = n$) can always be found on the
constructed flow network. Since the total capacity of the edges going into the sink node $t$ is
$\sum_{i=1}^{m} U(t_i, t) = \sum_{i=1}^{m} |N_i| = |N|$, the maximum flow $f_{max}$ in $G$ satisfies $|f_{max}| \le |N|$. Thus
if $|f_{max}| < |N|$, then there is no feasible solution to the original PBO problem, which
requires considering all of the nets between $b_s$ and all other macro blocks. Furthermore, the
cost of the flow is also the cost of the pin assignment and buffer planning solution. Therefore,
a min-cost maximum flow assigns pins and plans buffers for as many nets as possible with
minimum total cost.
The following theorem shows that the PBO problem can be exactly solved by min-cost
flow computation on $G$.
Theorem 5.1 A min-cost flow $f$, $|f| = |N|$, in $G$ corresponds to a pin assignment and
buffer planning solution to PBO problem for all nets in $N$ with minimum total cost
$\alpha \cdot W + \beta \cdot R$ for any given $\alpha$ and $\beta$ where $W$ is the total wire length and $R$ is the number of
buffers. If the size of the max-flow, $|f_{max}| < |N|$, then there is no feasible solution to the
PBO problem. A min-cost maximum flow assigns pins and plans buffers for the maximum
number of nets with minimum total cost.
The algorithm PBO-Flow is summarized in Algorithm 6.
Algorithm 6 PBO-Flow ($B$, $R$, $N$, $P$, $C$, $\alpha$, $\beta$)
1: Construct the network graph $G(V, E)$
2: Assign capacities $U$ and costs $C$
3: Apply min-cost maximum flow algorithm on $G$
4: Derive the pin assignment and buffer planning solution
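
Steps 1 to 3 of Algorithm 6 can be illustrated on a toy instance. The sketch below is not the implementation used in the experiments: it substitutes a simple successive-shortest-paths solver (Bellman-Ford on the residual graph) for the faster algorithms cited in the text, assumes Manhattan distance and integer costs, and uses hypothetical names (`MinCostFlow`, `pbo_flow`, the dictionaries of coordinates). Step 4, reading the paths back from the flow, is omitted.

```python
from collections import defaultdict

class MinCostFlow:
    """Successive shortest paths with Bellman-Ford; adequate for small graphs."""
    def __init__(self):
        self.adj = defaultdict(list)            # node -> indices of outgoing arcs
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Arc e and its residual twin are stored at indices e and e ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        flow = total = 0
        while True:
            dist, prev, changed = {s: 0}, {}, True
            while changed:                      # Bellman-Ford on the residual graph
                changed = False
                for u in list(dist):
                    for e in self.adj[u]:
                        v, d = self.to[e], dist[u] + self.cost[e]
                        if self.cap[e] > 0 and d < dist.get(v, float("inf")):
                            dist[v], prev[v], changed = d, e, True
            if t not in dist:
                return flow, total              # no augmenting path is left
            path, v = [], t
            while v != s:                       # walk back from t to s
                path.append(prev[v])
                v = self.to[prev[v] ^ 1]
            push = min(self.cap[e] for e in path)
            for e in path:
                self.cap[e] -= push
                self.cap[e ^ 1] += push
            flow += push
            total += push * dist[t]

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def pbo_flow(src_pins, sink_pins, buffers, nets, alpha, beta, buf_iv, pin_iv):
    """Build the split-node network and return (|f|, cost) of a min-cost max flow."""
    g = MinCostFlow()
    def node(v, cap, cost):                     # node splitting: v_in -> v_out
        g.add_edge(v + "_in", v + "_out", cap, cost)
    def link(u, pu, v, pv, iv):                 # add (u, v) only if d_uv is in range
        d = manhattan(pu, pv)
        if iv[0] <= d <= iv[1]:
            g.add_edge(u + "_out", v + "_in", 1, alpha * d)
    for ps, xy in src_pins.items():
        node(ps, 1, 0)
        g.add_edge("s", ps + "_in", 1, 0)       # E_s
        for r, (rxy, _) in buffers.items():
            link(ps, xy, r, rxy, buf_iv)        # E_br
    for r, (rxy, c) in buffers.items():
        node(r, c, beta)                        # capacity c_i, cost beta
        for r2, (r2xy, _) in buffers.items():
            if r != r2:
                link(r, rxy, r2, r2xy, buf_iv)  # E_r
    for b, pins in sink_pins.items():
        g.add_edge("t_" + b, "t", nets[b], 0)   # E_t with capacity |N_i|
        for p, xy in pins.items():
            node(p, 1, 0)
            g.add_edge(p + "_out", "t_" + b, 1, 0)   # subsink edges
            for ps, sxy in src_pins.items():
                link(ps, sxy, p, xy, pin_iv)    # E_b
            for r, (rxy, _) in buffers.items():
                link(r, rxy, p, xy, buf_iv)     # E_rb
    return g.solve("s", "t")

# Toy instance (all coordinates are made up for illustration).
flow, cost = pbo_flow(
    src_pins={"ps1": (0, 0), "ps2": (0, 1)},
    sink_pins={"b1": {"p11": (6, 0)}, "b2": {"p21": (2, 1)}},
    buffers={"r1": ((3, 0), 1)},
    nets={"b1": 1, "b2": 1},
    alpha=1, beta=10, buf_iv=(2, 3), pin_iv=(0, 3))
print(flow, cost)  # prints 2 18
```

In this instance the net to b1 must go through the buffer (wire length 6 plus one buffer, cost 16), and the net to b2 is connected directly from ps2 (cost 2). Since the flow value equals the number of nets, the solution is feasible in the sense of Theorem 5.1.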
Finding a min-cost maximum flow in a network is a classical problem for which several
polynomial-time optimal algorithms are available [14, 15]. Deriving a solution of PBO
from a flow in G can be done in O(E) time. Thus, if we adopt the double scaling algorithm
in [31], we get the following time complexity for the PBO problem.
Theorem 5.2 The PBO-Flow algorithm optimally solves the PBO problem in
$O(|V| |E| \log \log U_{max} \log(|V| C_{max}))$ time for $G = (V, E)$, where $U_{max}$ is the maximum value of $U$
and $C_{max}$ is the maximum value of $C$.
Note that the complexity of our PBO-Flow algorithm mainly depends on the size
of the constructed network graph $G(V, E)$. According to the graph construction rules,
$|V| = 2 + m + 2(|P| + |R|)$ (taking node splitting into account), and the number of edges is at most
$m + |P| + |P_s| \cdot |P| + |P| \cdot |R| + |R|^2 + |P| + |R|$ (in fact, the number of edges is much
smaller due to the distance constraints), which is $O((|P| + |R|)^2)$.
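
To get a feeling for these sizes, the two expressions can be evaluated directly. The helper below is a hypothetical convenience function, and the numbers in the usage note are made up for illustration.

```python
def network_size(m: int, num_pins: int, num_source_pins: int, num_buffers: int):
    """Return (|V|, upper bound on |E|) for the split-node PBO network."""
    # s, t, m subsinks, and two copies of every pin and buffer node.
    nodes = 2 + m + 2 * (num_pins + num_buffers)
    edge_bound = (m                                  # subsink-to-sink edges
                  + num_pins                         # source and subsink edges at pins
                  + num_source_pins * num_pins       # pin-to-pin edges
                  + num_pins * num_buffers           # pin-buffer edges
                  + num_buffers ** 2                 # buffer-buffer edges
                  + num_pins + num_buffers)          # edges created by node splitting
    return nodes, edge_bound
```

For example, with 2 sink blocks, 16 pins in total (6 of them on the source block) and 3 buffer blocks, `network_size(2, 16, 6, 3)` gives 42 nodes and at most 190 edges.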
In applications, we may put more effort into reducing the number of buffers. In this case,
we can assign a large weight to buffer nodes, i.e., a large $\beta$. For example, let $\beta$ be larger than
the upper distance bound $\tilde{U}$. If we set $\beta$ large enough, we tend to get a solution with the
minimum number of buffers.
Furthermore, in some circuits, some locations on a block may not be allowed for pin
assignment. In this case, these locations simply do not appear in the pin set $P$, so
our network-flow-based algorithm will not assign pins to them.
5.4 Pin Assignment and Buffer Planning (PB)
In the above section, we discussed how to solve the PBO problem using min-cost maximum
flow computation. The PBO problem only considers net connections between one source block
and the other blocks. In reality, we need to deal with net connections among all of the macro
blocks, which we call the PB problem. The definition of the PBO problem can be easily extended to the PB
problem as follows:
Given:
1. A placement of $m$ macro blocks $B = \{b_1, b_2, \ldots, b_m\}$ and $n$ buffer blocks
$R = \{r_1, r_2, \ldots, r_n\}$; buffer block $r_i$ has a capacity $c_i$.
2. A set of available pin locations $P = P_1 \cup P_2 \cup \cdots \cup P_m$, where $P_i$ ($i = 1, \ldots, m$) is the
set of available pin locations of macro block $b_i$.
3. A set of nets $N = N_1 \cup N_2 \cup \cdots \cup N_m$, where $N_i$ ($i = 1, \ldots, m$) is the set of nets
between block $b_i$ and all other blocks.
4. Two nonnegative numbers $\tilde{L}$ and $\tilde{U}$ denoting the lower and upper bounds of buffer
intervals, respectively.
5. Two nonnegative numbers $L$ and $U$ denoting the lower and upper bounds of pin
intervals, respectively.
Goal:
For any given positive $\alpha$ and $\beta$, find a set of valid paths corresponding to the net set $N$
while minimizing the total cost $\alpha \cdot W + \beta \cdot R$, where $W$ is the total wire length and $R$ is
the number of buffers.
PBO-Flow algorithm solves pin assignment and buffer planning for nets between one
block and all other blocks. Naturally, if we treat each macro block as the source block and
apply PBO-Flow algorithm repeatedly, we can get a solution for all nets among multiple
macro blocks. Algorithm 7 shows the basic idea of the PB-Flow algorithm.
Algorithm 7 PB-Flow ($B$, $R$, $N$, $P$, $C$, $\alpha$, $\beta$)
1: Let $b_1$ be the source block
2: Apply PBO-Flow($B$, $R$, $N_1$, $P$, $C$, $\alpha$, $\beta$)
3: for $i = 2$ to $m$ do
4:     Let $b_i$ be the source block
5:     Adjust the network graph $G(V, E)$
6:     Apply PBO-Flow($B$, $R$, $N_i$, $P$, $C$, $\alpha$, $\beta$)
7: end for
8: Derive the pin assignment and buffer planning solution
Note that when different blocks are selected as the source block, their constructed flow
networks are slightly different, i.e., the edges incident from/to the pin nodes of the two
blocks are different. So in the PB-Flow algorithm, at each iteration, we need to
adjust the existing network graph to obtain the one corresponding to the next
source block (line 5).
Now suppose two different macro blocks $b_i$ and $b_{i+1}$ are selected as the source block
sequentially. Let $G_i$ be the constructed flow network when $b_i$ is chosen as the source block.
The following steps change $G_i$ to $b_{i+1}$'s corresponding flow network $G_{i+1}$:
1. Delete edges connecting the pin nodes of $P_i$ and the pin nodes of other blocks.
2. Add edges connecting the pin nodes of $P_{i+1}$ and the pin nodes of other blocks, subject to
the distance constraints.
3. Reverse the direction of edges $(p, r)$ where $p \in P_i$ and $r \in R$.
4. Reverse the direction of edges $(r, p)$ where $r \in R$ and $p \in P_{i+1}$.
5. Reverse the direction of edges $(s, p)$ where $p \in P_i$.
6. Reverse the direction of edges $(p, t_{i+1})$ where $p \in P_{i+1}$.
7. Let $t_{i+1}$ be the source node $s$ and let the original source node become a subsink node $t_i$.
Thus remove the edge $(t_{i+1}, t)$ and add one new edge $(t_i, t)$.
Figure 5.5(a) is a PB problem. Figure 5.5(b) shows the constructed flow network when
$b_1$ is the source block. When the source block is switched to $b_2$, $t_2$ becomes the source node
and the original one becomes a subsink node $t_1$, but the edges connecting two buffer nodes
remain unchanged.

Figure 5.5 (a) A PB problem with 3 macro blocks and 3 buffer blocks. (b) The corresponding
flow network when $b_1$ is the source block.

For a newly added edge $(u, v)$, the cost is $\alpha \cdot d_{uv}$ and the capacity is 1. For any other edge
$(u', v')$, the cost is unchanged, and the capacity is 1 if no flow passed through the corresponding
edge in the previous iteration; otherwise, the capacity is 0. As to the capacities of
nodes, similarly, if a node $v$ has $c$ units of capacity left after pushing flow in $G_i$, then the capacity
of $v$ is $c$ in $G_{i+1}$.
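
The capacity carry-over rule between consecutive iterations can be written down compactly. The sketch below assumes that the flow of the previous iteration is available as two dictionaries (flow per unit-capacity edge and flow per capacitated node); the function name and data layout are hypothetical.

```python
def carry_over_capacities(edge_flow, node_cap, node_flow):
    """Capacities for G_{i+1} given the flow pushed in G_i.

    edge_flow: {(u, v): flow on a unit-capacity edge kept from G_i}
    node_cap:  {v: capacity of node v in G_i}
    node_flow: {v: flow through node v in G_i}
    """
    # A kept edge that already carries flow is saturated in the next iteration.
    edge_cap = {e: (0 if f > 0 else 1) for e, f in edge_flow.items()}
    # A node keeps only the capacity that was not used.
    remaining = {v: cap - node_flow.get(v, 0) for v, cap in node_cap.items()}
    return edge_cap, remaining
```

For example, a buffer block with capacity 2 that served one net in $G_i$ enters $G_{i+1}$ with capacity 1.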
$G_i$ and $G_{i+1}$ have the same node set. Also, it is easy to show that the number of changed
edges is bounded by $(|P_i| + |P_{i+1}|) \cdot (|P| + |R|)$. When taking the distance constraint into
consideration, this bound can be further greatly reduced. Thus the adjustment to the graph
can be efficiently accomplished.
A two-pin net connecting $b_i$ and $b_{i+1}$ belongs to both net sets $N_i$ and $N_{i+1}$. Suppose
we can get a feasible solution to the PB problem with PB-Flow algorithm. After applying
PBO-Flow on $G_i$, the nets belonging to $N_i \cap N_{i+1}$ should have already been found. Thus we
only need to consider $N_{i+1} - N_i - \cdots - N_1$ while pushing flows on $G_{i+1}$.

Figure 5.6 The corresponding flow network when $b_2$ is the source block.

On the other hand, we notice that a net belonging to $N_i \cap N_{i+1}$ should correspond to a
path $(p_u, r'_1, \ldots, r'_k, p_v)$ ($p_u \in P_i$, $p_v \in P_{i+1}$, $r'_i \in R$, $i = 1, \ldots, k$) in $G_i$. Then in the modified
flow network $G_{i+1}$, the path $(p_v, r'_k, \ldots, r'_1, p_u)$ must exist since the edge connections among
buffer blocks are not changed. Thus we can remove all paths connected to nodes in $P_{i+1}$
by increasing the capacities along all these paths, and then apply PBO-Flow with nets $N_{i+1}$.
The optimality of PBO-Flow guarantees that the cost will not increase.
Furthermore, even if the algorithm does not return a feasible solution (e.g., no feasible
solution exists), the second way still outperforms the first one, since the optimality of
PBO-Flow guarantees that the new solution will not become worse, i.e., it can either connect more
nets or reduce the cost.

Figure 5.7 The corresponding flow network of the PB problem in Figure 5.5(b) using the
node clustering method.

5.5 Improvement with Node Clustering
When we handle a big circuit that may include a huge number of nets, the corresponding
flow network might be quite large. In order to facilitate the processing of huge PB problems,
we propose the following node clustering method to speed up the computation.
For any $p \in P$, if $p$ is inside a macro block, it must be connected to some pin outside
the block. Without loss of generality, we may assume that a pin is only on the boundary
of a macro block. Then by grouping neighboring pin nodes together, we can greatly reduce
the number of nodes, and consequently the number of edges. Once several nodes are
grouped together, we can use their average coordinate as the location of the new “supernode”,
and the capacity of the supernode is the number of nodes it includes. Figure 5.7 illustrates
a constructed flow network for Figure 5.5(b) when 2 neighboring pins are clustered into one
supernode. In fact, we can set different scale rates for blocks according to their sizes or
other requirements.

Figure 5.8 (a) A flow $f$, $|f| = 3$, flows through a supernode. (b) The corresponding
network and a flow solution. (c) Deriving connections for original pin nodes.

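
The clustering step itself is simple. As a minimal sketch, assume the pins of one block are given as a list ordered along the block boundary and that a fixed number `k` of consecutive pins forms one supernode (the text allows different rates per block; the function name and the fixed-size grouping are assumptions made here for illustration).

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cluster_pins(pins: List[Point], k: int) -> List[Tuple[Point, int]]:
    """Group k consecutive boundary pins into supernodes.

    Returns a list of ((x, y), capacity): the location is the average
    coordinate of the grouped pins and the capacity is their count.
    """
    supernodes = []
    for i in range(0, len(pins), k):
        group = pins[i:i + k]
        cx = sum(x for x, _ in group) / len(group)
        cy = sum(y for _, y in group) / len(group)
        supernodes.append(((cx, cy), len(group)))
    return supernodes
```

With `k = 2`, five pins at x = 0, 1, 2, 3, 4 on the same edge become three supernodes with capacities 2, 2 and 1.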
Once we get a solution from the supernode flow network, we need to map the flow to a
solution of the original PB problem, i.e., distribute the flows through one supernode to its
pin nodes. Of course, we want the mapping to have the minimum connection length. In Figure
5.8(a), a flow $f$, $|f| = 3$, flows through a supernode which includes four pin nodes. Figure 5.8(c)
shows a feasible mapping. This mapping problem can again be solved with min-cost maximum flow, as shown in
Figure 5.8(b). Each node $p_i$ ($i = 1, 2, 3, 4$) in the supernode is connected to the destination
node $q_j$ ($j = 1, 2, 3$) if $d_{p_i q_j}$ satisfies the distance constraints. The capacity of each edge
is 1. The cost of edge $(p_i, q_j)$ is the distance $d_{p_i q_j}$ between the two nodes. All other edges have cost
0. A min-cost maximum flow is guaranteed to give an optimal mapping.
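
For a supernode with only a handful of pins, the same mapping can be illustrated by plain enumeration. The sketch below deliberately swaps the min-cost maximum flow formulation for a brute-force search over assignments, which is only practical for very small supernodes; it assumes Manhattan distance, and the function name and arguments are hypothetical.

```python
from itertools import permutations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def map_supernode_flow(pins, dests, lo, hi):
    """Assign each destination q_j a distinct pin p_i of the supernode.

    Returns (tuple of pin indices, total length), or (None, inf) if no
    assignment satisfies the distance bounds [lo, hi].
    """
    best, best_cost = None, float("inf")
    for chosen in permutations(range(len(pins)), len(dests)):
        ds = [manhattan(pins[i], q) for i, q in zip(chosen, dests)]
        if all(lo <= d <= hi for d in ds) and sum(ds) < best_cost:
            best, best_cost = chosen, sum(ds)
    return best, best_cost
```

With four pins at (0, 0), (0, 1), (0, 2), (0, 3) and three destinations at (2, 0), (2, 2), (2, 3), each destination is matched to the pin at the same height, for a total length of 6.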
5.6 Experimental Results
Our algorithms were implemented in C++ on a Sun Sparc Ultra 5 (360 MHz) with 128MB
memory. We tested PB-Flow on 6 circuits which were generated randomly. For all files,
we adopted the node clustering method in Section 5.5 to reduce runtime.
We compared the PB-Flow algorithm with a two-step approach (first assign pins, then plan
buffers). We used a classical way [40] to assign pins as follows: by connecting the centers
of two macro blocks, we got two crossing points on the boundaries of each block, and then
assigned pins around the crossing points for the nets between these two macro blocks. After
pin assignment was done for all nets, we used a net-by-net approach for buffer planning.
The net-by-net approach considered only one net at a time and found the min-cost path
between the two pins of the net for buffer insertion.
For each test circuit, we repeated both approaches 5 times. Table 5.1 lists the average
results of these five runs. Using PB-Flow, we could find a feasible solution of pin
assignment and buffer planning for all of the nets with a significant improvement in both
the total wire length and the number of buffers. The last two groups of rows show the comparison of
the number of buffers and the total wire length. For both methods, we list the test results
and the values per net. The percentage was calculated according to the values per net since
the numbers of found nets are different. The last two test files, F110 and M200, include
more than 4000 nets. The net-by-net approach cannot handle such large files. For comparison
purposes, we first applied the node clustering strategy to group pins, and then used net-by-net
for buffer insertion.
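
As a worked example of how the "reduced" percentages relate to the per-net values, the following sketch recomputes the buffer reduction for circuit A33n from the totals reported in Table 5.1. Rounding the per-net values to three decimals before taking the ratio is an assumption, chosen because it reproduces the reported figure; the function name is hypothetical.

```python
def per_net_reduction(base_total, base_nets, new_total, new_nets, digits=3):
    """Return (baseline per net, new per net, reduction in percent)."""
    base = round(base_total / base_nets, digits)
    new = round(new_total / new_nets, digits)
    return base, new, 100.0 * (base - new) / base

# A33n, number of buffers: net-by-net found 528.4 nets on average, PB-Flow all 640.
base, new, pct = per_net_reduction(391.8, 528.4, 327, 640)
print(base, new, round(pct, 2))  # prints 0.741 0.511 31.04
```

Normalizing by the number of found nets matters here because the net-by-net approach leaves some nets unconnected, so its raw totals cover fewer nets than those of PB-Flow.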

Table 5.1 Average results of PB-Flow for 5 times. All nets are found using PB-Flow
algorithm.

File                      A33n      X40       H80       S100      F110      M200
Grid                      107x106   146x160   235x230   232x227   367x401   587x560
Blocks                    33        40        60        91        110       100
Buffer Blocks             30        40        120       90        90        120
Nets                      640       1021      1317      2021      4180      5659
Time (second)  Net-Net    27.80     34.49     75.46     119.67    91.91     135.77
               PB-Flow    41.91     27.32     67.94     83.12     220.45    326.9
Found nets     Net-Net    528.4     940.2     1316      1994      4167      5578
               PB-Flow    640       1021      1317      2021      4180      5659
Buffers        Net-Net    391.8     1731.8    2259      6400.2    9964.4    15463
               (per net)  (0.741)   (1.842)   (1.717)   (3.210)   (2.391)   (2.772)
               PB-Flow    327       1439      1949.8    5627      8631.4    13873.4
               (per net)  (0.511)   (1.409)   (1.480)   (2.784)   (2.065)   (2.452)
               reduced    31.04%    23.51%    13.80%    13.27%    13.63%    11.54%
Wire Length    Net-Net    920.2     2672      3575      8394.2    14131.4   21041
               (per net)  (1.741)   (2.842)   (2.717)   (4.210)   (3.391)   (3.772)
               PB-Flow    967       2460      3266.8    7648      12811.4   19532.4
               (per net)  (1.511)   (2.409)   (2.480)   (3.784)   (3.065)   (3.452)
               reduced    13.21%    15.24%    8.72%     10.12%    9.61%     8.48%

5.7 Conclusion
In this chapter, we presented a polynomial-time algorithm for simultaneous pin assignment
and buffer planning for all two-pin nets between a source macro block and all other blocks
such that each net satisfies the lower and upper bounds on connection intervals while
minimizing the total cost $\alpha \cdot W + \beta \cdot R$. Applying this algorithm iteratively (each time
picking one block as the source block) yields a polynomial-time algorithm for pin
assignment and buffer planning for nets among multiple macro blocks. Experimental results
demonstrate that the algorithm is very efficient and effective.
CHAPTER 6
ECO ALGORITHMS FOR REMOVING OVERLAPS
BETWEEN POWER RAILS AND SIGNAL WIRES
6.1 Introduction
In ECO, a design needs to go through many changes [41, 42] due to constraints or target
changes from manufacturing, marketing, reliability, or performance. At each step, designers
usually want to modify the existing solution incrementally and keep the design as close
as possible to the existing one. For high-volume, high-revenue, multiple-year design products,
power rails may be changed due to added power rails for higher reliability, post-silicon
discovery, or design changes (such as cache size changes due to market reasons) that may
not have been anticipated in previous tapeout designs. Post-silicon debugging also mandates
that the design be fixed within the previously converged design. This is different from the ASIC or foundry
model of ECO, where the original design assumptions may change during the manufacturing process
according to its final volume recipe to tune yield and performance. Therefore, we
cannot risk the heavily invested design convergence effort and schedule by redoing ECO
routing in a way that changes the routing topology and requires spending time again on converging the
design with no guarantee.
In this chapter, we address the PSO (Power rail - Signal wire Overlap) problem and
propose two algorithms to solve it. PSO problems are usually caused by design changes
in power delivery or package. For most high-performance VLSI designs, where mesh power
rails are used to provide dense power supply for better signal integrity, the uppermost
metal layer can be changed for different reasons, such as changes in current consumption
requirements, package connection changes when the low-resistance highest metal is used
to deliver power, local routing ECO changes, etc. The changes of power rails on the
top metal layer may introduce design rule violations between power rails
and signal wires. It is not wise to rip up signal wires completely and reroute them for the new
power rails without keeping a routing topology that has already been proven to converge
in timing. Therefore, an efficient and graceful solution to PSO is very important due to
design constraints and tight schedules during the late ECO stages. Furthermore, it is
important to minimize disturbance to the existing converged design while implementing
ECO requests.
Informally, the PSO problem can be described as follows. On a multiple-layer routing
region with fixed power rails P on the top layer, there is a clean signal wire routing design
which has no wire separation violations. Now a new design of power rails on the top
layer is adopted to replace the existing P. Some power rails in the new design may overlap with signal
wires, or the horizontal/vertical spacing between two wire segments may be less than the wire
separation requirement. Without loss of generality, we assume the multiple-layer routing
design is in HVHVH style. We refer to the top layer as M5 and the second top layer as
M4. M5 is used for horizontal tracks, and M4 is used for vertical tracks. Figure 6.1 shows
an example. We use solid horizontal lines to represent the top layer M5 routing, and dotted
lines for M4 routing. Segments of different nets may have different widths. Due to the
introduction of the new power rails P1 and P2 (the shaded areas), some segments on M5
overlap with P1 and P2.

Figure 6.1 (a) Some horizontal signal wire segments on M5 overlap with P1 and P2. (b)
A feasible PSO solution. (c) A solution with violations.

Our purpose is to find a new clean routing solution without wire spacing violations. At
the same time, we hope to keep the topology of the new routing solution as close as possible
to that of the original one. Thus four constraints are set:
(1) Keep the routing of power rails in the new design unchanged. This is required by the problem itself.
(2) Only the routing of the top two layers is changed. The changes on the top layer M5 may cause changes on
M4. However, the changes should not propagate to all layers. By treating all connections to
M4 from lower layers as fixed pins, the changes can be restricted to the top two layers.
(3) Horizontal signal wire segments on the top layer M5 can only move up/down. At the same
time, keep the routing design pattern unchanged. This requires the following:
(i) If one end point of a horizontal (vertical) wire segment on the top layer is a fixed pin, this segment
cannot move. This kind of segment is called a “P-segment”, and its unmovable property
is called the “P-constraint”. In Figure 6.1, fixed pins are denoted by tiny squares. But some
pins on M5, which are end points of horizontal segments, are allowed to move with the
corresponding horizontal segments. They are unfixed pins and are represented by solid
dots. In Figure 6.1, horizontal segment b is a P-segment and cannot be moved as in Figure
6.1(c). Fixed pins and unfixed pins come from signals passing through the local power rail
ECO area, where some of them are absolutely critical and are not allowed to be changed,
while others have some minor freedom to shift. Furthermore, since pins on M4
may connect to both M5 and M3, we keep all pins on M4 fixed so that there is no influence on
lower layers.
(ii) If the vertical (horizontal) projections of two horizontal (vertical) signal wire
segments overlap, then their up/down (left/right) relationship should not be changed.
This is called “order consistency”. For example, in Figure 6.1(c), horizontal segment e
is above f while in the original design, e is below f. So this violates order consistency.
(iii) If two horizontal (vertical) segments belonging to different nets are on the same track,
their left/right (up/down) relationship should not be changed as long as the two segments
still exist in the new solution. This is called “track consistency”. In Figure 6.1(c), vertical
segment c' is below d' while c' should be above d' to maintain track consistency.
(4) For each signal segment, the deviation (i.e., the difference between its new position and the old
one) should not exceed the user-defined allowable deviation bound so that local changes
are confined. Different bounds can be set on different segments.
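
Two of these constraints, order consistency for horizontal segments and the deviation bound, are easy to state as checks. The sketch below is an illustration only: it assumes each horizontal segment is described by a hypothetical tuple (x1, x2, y), that moving a segment changes only its y coordinate, and that the original design is clean, so overlapping segments never share a y coordinate.

```python
def projections_overlap(a, b):
    """True if the x-extents of two horizontal segments (x1, x2, y) overlap."""
    return max(a[0], b[0]) < min(a[1], b[1])

def order_consistent(old, new):
    """old/new: {name: (x1, x2, y)}. Overlapping pairs must keep their up/down order."""
    names = list(old)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if projections_overlap(old[a], old[b]):
                before = old[a][2] - old[b][2]
                after = new[a][2] - new[b][2]
                if before * after <= 0:      # order flipped or collapsed
                    return False
    return True

def within_deviation(old, new, bound):
    """Constraint (4): each segment moves at most bound[name] from its old track."""
    return all(abs(new[n][2] - old[n][2]) <= bound[n] for n in old)
```

For example, if e = (0, 4, 1) lies below f = (2, 6, 2) in the original design, moving f to y = 3 keeps order consistency, while moving e to y = 3 violates it, in the same way as the pair e and f in Figure 6.1(c).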
For the given new design of power rails, does there exist a clean routing solution
satisfying the above constraints? If there is a solution, how do we find one? For Figure 6.1, (b)
gives a clean routing solution.
We notice that, once a segment is moved, it may overlap with some signal wire segments.
Consequently these signal segments have to be moved, which may cause overlaps
with other segments, and so on. In Figure 6.1(b), the move-up of a causes g to move up too.
Furthermore, the movement of wire segments on the top layer may make some segments
on M4 longer or shorter and introduce overlaps on M4.
Note that our problem is significantly different from the overlap removal problem in
macro-cell placement [43, 44, 45]. When macro cells are placed based on analytical techniques
(e.g., force-directed placement, quadratic placement), there are overlaps among the
macro cells. A clean-up phase is needed to shift the macro cells to remove all overlaps.
However, the PSO problem is significantly more difficult since moving the horizontal wire
segments on M5 also changes the vertical wire segments on M4.
In this chapter, we first define the PSO problem in Section 6.2. In Section 6.3,
we define the FP-Range and prove that if all segments move within their FP-Ranges
while keeping order consistency and horizontal wire separation, the new routing
solution also satisfies the vertical wire separation requirement, track consistency and
P-constraint. Conversely, if a solution of a PSO problem exists, the new position
of each segment must be in its FP-Range. In Section 6.4, we discuss the construction
of the consistency graph. Based on the FP-Range and the consistency graph, we propose two
polynomial-time algorithms, PSO-H and PSO-G, to solve PSO problems in Sections 6.5
and 6.6 respectively. Both algorithms are guaranteed to find a clean routing solution as long
as one exists. PSO-H is faster than PSO-G, but PSO-G makes an effort to minimize the total
deviation. We show the experimental results in Section 6.7 and conclude the chapter in
Section 6.8.
6.2 PSO (Power rail - Signal wire Overlap) Problem
A multiple-layer routing region is three-dimensional. For convenience, we call the three
dimensions the x, y and z dimensions. Each layer is an x-y plane, where the x-axis
goes horizontally and the y-axis goes vertically. Let the coordinate of the bottom-left
corner of the routing region be $(0, 0)$, and let $W$ and $H$ be the width and the height of the
routing region respectively. In most existing high-performance chips, each layer has its specific
track orientation, either horizontal or vertical, and its design rule, which specifies the minimum
width and separation of wires. Let $s$ be half of the minimum wire separation of a metal layer.
To simplify the presentation, suppose the wire separation requirement is $2s$ for both M4
and M5. The algorithms proposed in this chapter can be easily extended to handle different
wire separations for different layers.
A horizontal segment can be represented by $(x_1, x_2, y, w)$, where $(x_1, y)$ and $(x_2, y)$ are the
end point coordinates of the center line ($x_1 < x_2$), and $w$ is the half-width of the segment.
Similarly, a vertical segment can be represented as $(y_1, y_2, x, w)$. For any two horizontal
segments $(x_1, x_2, y, w)$ and $(x'_1, x'_2, y', w')$ ($x_2 < x'_1$), if
$(x_1 - s, x_2 + s) \cap (x'_1 - s, x'_2 + s) \neq \emptyset$, then $|y - y'| \geq w + w' + 2s$ must hold; also, if
$(y - w - s, y + w + s) \cap (y' - w' - s, y' + w' + s) \neq \emptyset$, then $|x_2 - x'_1| \geq 2s$.
A similar rule applies to vertical segments. Figure 6.2 gives an example
of three horizontal segments. We say a routing solution is a clean solution if it has no
overlap or wire separation violations. Sometimes we simplify the presentation. For
example, if we do not care about the width of a horizontal segment, it is represented by $(x_1, x_2, y)$.
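The separation rule above can be sketched as a small predicate. This is only an illustrative sketch: the names `HSeg`, `open_intervals_intersect` and `separation_ok` are hypothetical and do not appear in the thesis, and the second rule is applied to whichever segment lies further left rather than assuming $x_2 < x'_1$.

```python
from typing import NamedTuple


class HSeg(NamedTuple):
    """Horizontal segment: center line from (x1, y) to (x2, y), half-width w."""
    x1: float
    x2: float
    y: float
    w: float


def open_intervals_intersect(a_lo, a_hi, b_lo, b_hi):
    """True if the open intervals (a_lo, a_hi) and (b_lo, b_hi) share a point."""
    return max(a_lo, b_lo) < min(a_hi, b_hi)


def separation_ok(a: HSeg, b: HSeg, s: float) -> bool:
    """Check the wire separation rule for two horizontal segments (2s spacing)."""
    # Rule 1: x-spans extended by s intersect, so the center lines must be
    # at least w + w' + 2s apart vertically.
    if open_intervals_intersect(a.x1 - s, a.x2 + s, b.x1 - s, b.x2 + s):
        if abs(a.y - b.y) < a.w + b.w + 2 * s:
            return False
    # Rule 2: y-spans extended by w + s intersect, so the horizontal gap
    # between the two segments must be at least 2s.
    if open_intervals_intersect(a.y - a.w - s, a.y + a.w + s,
                                b.y - b.w - s, b.y + b.w + s):
        left, right = (a, b) if a.x1 <= b.x1 else (b, a)
        if right.x1 - left.x2 < 2 * s:
            return False
    return True
```

For instance, with $s = 0.5$ two segments of half-width 0.5 stacked at $y = 5$ and $y = 7$ pass the check, while moving the upper one to $y = 6.5$ fails it.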
Figure 6.2 Wire separation requirement illustration.

Given a clean routing solution $S$ with $N$ signal nets, there are $P$ power rails on the
top layer M5. Also, each signal segment is associated with a nonnegative number $d$ called the
allowable deviation bound, i.e., for a horizontal segment $(x_1, x_2, y, w)$, when it moves
up/down, its new position $(x_1, x_2, \bar{y}, w)$ should satisfy $|\bar{y} - y| \leq d$. Now if this design
$P$ is replaced by a new power rail design $\bar{P}$, the wire separation requirement might no
longer be satisfied. How can the existing routing solution $S$ be modified so that the new routing
solution $\bar{S}$ is a clean routing solution and satisfies the following constraints?
1. The power rails P on the top layer are not changed.
2. Only the routing of the top two layers M4 and M5 can be changed.
3. Horizontal signal wire segments on the top layer can only move up/down, i.e., the
x-coordinate of the two end points of the segment keep unchanged. This is called
“Shift movement”.
4. The difference between the new position of a wire segment and its old location should
not exceed its allowable deviation bound d.
5. If an end point of a signal wire segment on the top layer is a fixed pin, it is called
106
“P-segment” and cannot be moved. This property is called “P-constraint”. All pins
on lower layers are treated as fixed pins.
6. Suppose any two horizontal signal wire segments on M5 (x1; x2; y) and (x01; x02; y0)
(assume y > y0) have new positions (x1; x2; y) and (x01; x02; y0), respectively, in the
new routing solution S. If (x1 − s; x2 + s)\ (x01 − s; x02 + s) 6= , y > y0 must hold.
This property is called “order consistency”.
7. If two vertical signal wire segments (y1; y2; x; w) and (y01; y02; x0; w0) (assume y2 <
y01) belonging to different nets on the same layer satisfy (x − w − s; x + w + s) \
(x0 − w0 − s; x0 + w0 + s) 6= , then in the new design S, if the two segments still
exist, y2 < y01 holds. This property is called “track consistency”.
6.3 FP-Range (Fixed-Pin-decided Range)
In a PSO problem, since the original power rail design $P$ is replaced by $\bar{P}$, we can simply
ignore $P$ and reduce the problem to removing the overlaps between the new power rails and signal
wire segments on M5 while satisfying all of the constraints.
If we arbitrarily move one horizontal signal wire segment up or down, both horizontal
overlaps between segments on M5 and vertical overlaps on M4 may be introduced. For
example, in Figure 6.3(b), if horizontal segment d moves up, it overlaps with c. On the
other hand, if a moves down, the vertical segments $a'$ and $b'$ on M4 overlap with each other.

Figure 6.3 (a) A PSO problem. (b) Overlaps: vertical segments $a'$ and $b'$ on M4; and
horizontal segments c and d on M5.

In this section, we discuss how to avoid vertical segment overlaps. The following theorem
provides a rule for moving horizontal signal wire segments without introducing vertical
wire separation violations. The theorem also proves that if a horizontal signal wire segment
moves out of its FP-Range, at least one of the requirements, i.e., order consistency,
track consistency, P-constraint, or wire separation, no longer holds.
To simplify the presentation, we neglect the widths of wire segments; it is easy
to incorporate them into the theorem. Each item is represented by its xy-coordinates,
e.g., a segment is represented as $(x_1, x_2, y)$ and one of its end points is denoted as $(x_1, y)$.
Suppose the wire separation requirement is $2s$. A horizontal wire segment $R = (x_1, x_2, y_r)$
on M5 belongs to net $n_r$, and its two end points are $r_1 = (x_1, y_r)$ and $r_2 = (x_2, y_r)$. If $R$
is a P-segment, it cannot be moved, and its movable range is $[y_r, y_r]$. Otherwise, calculate two
pin sets $\tilde{P}$ and $\tilde{Q}$. If $r_1$ is an unfixed pin, $\tilde{P} = \emptyset$; otherwise let $\tilde{P}$ be the set of fixed pins
whose x-coordinates fall in $(x_1 - 2s, x_1 + 2s)$ and which do not belong to net $n_r$. Likewise, if $r_2$ is
an unfixed pin, $\tilde{Q} = \emptyset$; otherwise let $\tilde{Q}$ be the set of fixed pins whose x-coordinates fall in
$(x_2 - 2s, x_2 + 2s)$ and which do not belong to net $n_r$. Let
$U = \min(\{y - 2s \mid y \in \tilde{P} \cup \tilde{Q} \wedge y \geq y_r\} \cup \{H - 2s\})$ and
$V = \max(\{y + 2s \mid y \in \tilde{P} \cup \tilde{Q} \wedge y \leq y_r\} \cup \{2s\})$. The range $[V, U]$ is
called the “FP-Range”. Figure 6.4 shows the FP-Range of a horizontal segment $R$. Notice
that if a vertical segment is created or becomes longer/shorter, this is caused by the end
points of a horizontal segment. Therefore a vertical overlap or vertical wire separation
violation is only related to the end points of horizontal segments. When moving $R$ up/down,
the pins or segments in the shaded area cannot create vertical overlaps with $R$.

Figure 6.4 FP-Range illustration. The tiny squares are fixed pins.
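As a rough illustration of the definition above, the FP-Range of one segment can be computed by a direct scan over the fixed pins. The function name `fp_range`, the `Pin` record and the boolean flags describing the end points are assumptions made for this sketch; widths are neglected, as in the text.

```python
from typing import NamedTuple


class Pin(NamedTuple):
    """A fixed pin at (x, y) belonging to net `net`."""
    x: float
    y: float
    net: str


def fp_range(x1, x2, yr, net, r1_unfixed, r2_unfixed, is_p_segment,
             fixed_pins, s, H):
    """Return (V, U) for the horizontal segment (x1, x2, yr) of net `net`."""
    if is_p_segment:
        # A P-segment cannot move: its movable range is [yr, yr].
        return (yr, yr)
    near = []
    for x_end, unfixed in ((x1, r1_unfixed), (x2, r2_unfixed)):
        if unfixed:
            continue  # an unfixed end point contributes an empty pin set
        # Fixed pins of other nets within the open window (x_end - 2s, x_end + 2s).
        near += [p for p in fixed_pins
                 if p.net != net and x_end - 2 * s < p.x < x_end + 2 * s]
    U = min([p.y - 2 * s for p in near if p.y >= yr] + [H - 2 * s])
    V = max([p.y + 2 * s for p in near if p.y <= yr] + [2 * s])
    return (V, U)
```

With no nearby pins the result degenerates to $[2s, H - 2s]$, which matches the default terms in the definitions of $V$ and $U$.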
For any two end points $a$ and $b$ on layer M4, suppose their coordinates are $(x_a, y_a)$ and
$(x_b, y_b)$ respectively, and their positions in a new routing solution are $(x_a, \bar{y}_a)$ and
$(x_b, \bar{y}_b)$ respectively.
Lemma 6.1 Let all horizontal segments on the top layer M5 move up/down within their
FP-Ranges $[V, U]$ and satisfy the horizontal wire separation requirement and order
consistency. For any two end points $a$ and $b$ on M4 with $|x_a - x_b| \leq 2s$, if $\bar{y}_a \geq \bar{y}_b$,
then $|\bar{y}_a - \bar{y}_b| \geq 2s$ and $y_a \geq y_b$.
Proof
Each end point of a wire segment is a pin or is connected to the other layer through a
via. Due to the shift movement of horizontal segments, some vertical segments are created
and some disappear. If a vertical segment $T = (y_1, y_2, x_t)$ is created when a horizontal
segment moves up/down, one end point of $T$ must be a pin on M4. Suppose the pin is located
at $(x_t, y_1)$. Since $T$ does not exist in the original design, we let $T$’s end points correspond
to the end points of an empty vertical segment $E = (y_1, y_1, x_t)$, which connects to $T$ through
a via. In this way, for each end point in the new design, we can find its corresponding end
point in the original one.
For convenience, if an end point is an end point of a horizontal segment or is connected
to a horizontal segment through a via, we call it a “turning point”;
otherwise, it is called a “fixed point”. If an end point is a turning point in the new design, it
must be a turning point in the original one, since horizontal segments only move up/down.
For a fixed point, there are no horizontal segments on M5 connecting to it, so it must be a
fixed pin on M4 or connected to a fixed pin on M5 through a via. Therefore, the location of
a fixed point is not changed in the new design.
There are four cases:
1. Both $a$ and $b$ are fixed points. Then $\bar{y}_a = y_a$ and $\bar{y}_b = y_b$. Obviously, if $\bar{y}_a \geq \bar{y}_b$, then
$y_a \geq y_b$. Also, in the original design $|y_a - y_b| \geq 2s$; therefore, $|\bar{y}_a - \bar{y}_b| \geq 2s$.
2. Both $a$ and $b$ are turning points. They are connected to horizontal segments through
vias. Suppose $a$ and $b$ correspond to horizontal segments $A$ and $B$, respectively.
Since $\bar{y}_a \geq \bar{y}_b$, $A$ is above $B$ in the new routing solution. The shift movements of
$A$ and $B$ keep “order consistency” and satisfy the horizontal wire separation
requirement, so $|\bar{y}_a - \bar{y}_b| \geq 2s$, and $A$ must be above $B$ in the original routing
solution. Thus $y_a \geq y_b$.
3. Point $a$ is a fixed point and $b$ is a turning point. Then $b$ must be connected to a horizontal
segment through a via. Let the horizontal segment be $B$. Since $a$ is fixed, $\bar{y}_a = y_a$.
Suppose $y_a < y_b$. According to the calculation of $B$’s FP-Range $[V, U]$, $y_a < V$.
From $V \leq \bar{y}_b$, we get $y_a < \bar{y}_b$, i.e., $\bar{y}_a < \bar{y}_b$. This contradicts our assumption that
$\bar{y}_a \geq \bar{y}_b$. Therefore, $y_a \geq y_b$. Since $a$ is a fixed point and $y_a \geq y_b$, we know
$y_a \geq U + 2s > U \geq \bar{y}_b$ according to the calculation of the FP-Range. Therefore
$|\bar{y}_a - \bar{y}_b| \geq 2s$.
4. Point $a$ is a turning point and $b$ is a fixed point. The proof is similar to case 3. $\Box$
Theorem 6.1 If all horizontal segments on the top layer M5 move up/down within their
FP-Ranges $[V, U]$ and satisfy the horizontal wire separation requirement and order consistency, the
new routing solution has no vertical wire separation violation and satisfies track
consistency and P-constraint. Conversely, if a horizontal segment moves out
of its FP-Range, a violation of order consistency, wire separation, track consistency or
P-constraint is introduced.
Proof
First, we prove that if all horizontal segments move within their FP-Ranges while
maintaining order consistency and horizontal wire separation, no violation of P-constraint,
track consistency or vertical wire separation is introduced. (If two segments have a wire
separation violation, we also say they overlap.)
If a horizontal segment is a P-segment, its FP-Range makes it unmovable, so it causes no
changes to the new routing solution. If both end points of a horizontal segment are floating
pins, it can be moved up/down freely and causes no changes to M4.
From Lemma 6.1, we know the relative positions of end points from different nets are
not changed in the new design. Therefore, track consistency is kept. Obviously, the
P-constraint is satisfied. We only need to prove that if horizontal segments move within
their FP-Ranges $[V, U]$ while keeping order consistency and horizontal wire separation,
no vertical segment overlaps are introduced.
Suppose this claim does not hold. Then there exists a horizontal segment $A = (x_1, x_2, a)$
whose new position $(x_1, x_2, \bar{a})$ falls in its FP-Range $[V, U]$, and it causes an overlap between
two vertical segments $(\bar{a}, \bar{b})$ and $(\bar{c}, \bar{d})$ on M4 in the new routing solution. (We use
y-coordinates to denote a vertical segment.) The two segments must belong to two different
nets, since overlap only occurs between two different nets. Suppose the two vertical
segments in the original design are $(a, b)$ and $(c, d)$. Either could be an empty segment. (For
example, if no vertical segment is connected to $A$ at $(x_1, a)$ through a via, $(x_1, a)$ is a
pin on M4. Then $(\bar{a}, \bar{b}, x_1)$ is a new segment and $b = a$. Let it correspond to an empty segment
$(a, a, x_1)$ in the original design.) Without loss of generality, suppose $\bar{a} \leq \bar{c}$. Then their
positions can be classified into three cases, as shown in Figure 6.5.

Figure 6.5 Three cases of vertical overlaps.

1. $|\bar{b} - \bar{c}| < 2s$. This case cannot occur according to Lemma 6.1.
2. $\bar{a} \leq \bar{c} \leq \bar{b} \leq \bar{d}$. According to Lemma 6.1, we can conclude that $a \leq c \leq b \leq d$.
   (i) $a \neq b$ and $c \neq d$. Then $(a, b)$ and $(c, d)$ are vertical segments in the original solution
   that overlap each other. But this contradicts the fact that the original one is a clean solution.
   (ii) $a = b = c = d$. Both $(a, b)$ and $(c, d)$ are empty segments; suppose they are
   created from fixed pins $P_a$ and $P_c$, respectively. Since $P_a$ and $P_c$ belong to different
   nets, $P_a$ and $P_c$ have to be on different layers. Suppose $P_a$ is on M5 and $P_c$ is on M4.
   Then $P_a$ is an end point of the horizontal segment $A$. However, $A$ is then a P-segment and
   cannot be moved; therefore $(\bar{a}, \bar{b})$ cannot be created, and $a = b = c = d$ cannot occur.
   (iii) $a = b = c < d$. Here $(c, d)$ is a vertical segment in the original design, so $P_a$ must be on
   M5 above $(c, d)$. Therefore, $A$ is a P-segment, and the segment $(\bar{a}, \bar{b})$ cannot be created;
   $a = b = c < d$ cannot occur.
3. $\bar{a} \leq \bar{c} \leq \bar{d} \leq \bar{b}$. According to Lemma 6.1, we can conclude that $a \leq c \leq b$ and
$a \leq d \leq b$. This case cannot occur either; the proof is similar to case 2.
Thus, we have proved that no vertical overlaps and no track consistency or P-constraint
violations are introduced in the new design.
On the other hand, if a horizontal segment moves out of its FP-Range, at least one of the
constraints, i.e., order consistency, track consistency, P-constraint, or wire separation, no longer holds.
Suppose one horizontal segment $A = (x_1, x_2, y_a)$ moves out of its FP-Range $[V, U]$ and
all of the constraints still hold. Without loss of generality, assume $A$ moves below $V$ to
$\bar{A} = (x_1, x_2, \bar{y}_a)$. If $A$ is a P-segment, it cannot be moved; thus $A$ cannot be a P-segment. Also,
if both of $A$’s end points are unfixed pins, $V$ is $2s$ and $A$ cannot go below $V$. Therefore, we
assume at least one end point $a$ of $A$ is not an unfixed pin, i.e., $a$ connects through a via either
to a fixed pin on M4 or to a vertical segment $B$. Suppose this end point is $a = (x_1, y_a)$
and its new position is $\bar{a} = (x_1, \bar{y}_a)$. Also suppose $V$ is decided by a pin $P_v$ below $A$ whose
x-coordinate falls in $(x_1 - 2s, x_1 + 2s)$. For convenience, $P_v$ also denotes its y-coordinate.
Furthermore, we may refer to a horizontal segment by its y-coordinate.
1. $P_v \leq \bar{A} < V$. If $P_v$ is on M5, it has a wire separation violation with $\bar{A}$. Thus $P_v$ is
on M4. However, if $a$ connects to a fixed pin $p$ on M4 through a via, $P_v$ has a wire
separation violation with $p$. If $a$ connects to a vertical segment $B$ through a via, $P_v$
has a wire separation violation with $B$. Therefore, $P_v \leq \bar{A} < V$ cannot occur.
In the following cases, we suppose $\bar{A} < P_v$.
2. Point $a$ connects to a fixed pin $p$ through a via. When $A$ moves down, a new vertical
segment $B$ is created connecting $a$ and $\bar{a}$. If $P_v$ is on M4 or has a connection to M4
through a via, it overlaps with $B$. If $P_v$ is on M5 and has no connection to M4, $P_v$
must be an end point of a horizontal segment $\bar{C}$ in the new design, and $\bar{C}$ is above
$\bar{A}$. So in the original design, $C$ is above $A$. However, we know that $A$ is above $C$
since $P_v$ is a fixed pin below $A$. Therefore, this case cannot occur.
3. Point $a$ connects to a vertical segment $B$ through a via.
   (i) $P_v$ is on M4.
      (a) $P_v$ is connected to a horizontal segment $C$ through a via, and $C$ is below $A$ in
      the original design. Let $\bar{C}$ be $C$’s new position. To maintain order consistency,
      $\bar{C}$ is still below $\bar{A}$ in the new design. Then there must be a vertical segment
      $D$ connecting $\bar{C}$ and $P_v$, and $\bar{A}$ is between $\bar{C}$ and $P_v$. Therefore, $\bar{A}$ cannot have
      connections to M4; i.e., $B$ disappears in the new design and $\bar{a}$ must be a pin.
      Thus in the original design, $B$ connects $a$ and $\bar{a}$ through vias. But $P_v$ is between
      $a$ and $\bar{a}$. This leads to a contradiction.
      (b) $P_v$ has no connection to M5. Then $P_v$ is an end point of a vertical segment $D$. If $B$
      still exists, it has to be above $D$ in order to keep track consistency, and $\bar{A}$ cannot
      be below $P_v$. If $B$ does not exist in the new design, $B$ connects $a$ and
      $\bar{a}$ in the original design. But $P_v$ is between $a$ and $\bar{a}$. This leads to a contradiction.
   (ii) $P_v$ is on M5.
      (a) $P_v$ is an end point of a horizontal segment $C$. Then $C$ is a P-segment and cannot be
      moved. Therefore, $A$ cannot go below $P_v$ in order to maintain order consistency.
      (b) $P_v$ is connected to a vertical segment $D$ through a via. $D$ must still exist in
      the new design. If $B$ still exists, it has to be above $D$ in order to keep track
      consistency, and $\bar{A}$ cannot be below $P_v$. If $B$ does not exist in the new design,
      $B$ connects $a$ and $\bar{a}$ in the original design. But then $B$ and $D$ overlap in
      the original design. This leads to a contradiction. $\Box$
From the above discussion, we can conclude that each horizontal segment must stay within
its FP-Range in any solution of a PSO problem. In other words, if we cannot find a solution
that keeps all horizontal segments in their FP-Ranges while keeping order consistency
and horizontal wire separation, the PSO problem has no solution.
Since the locations of fixed pins are fixed, the FP-Range of each segment can be
pre-calculated. When searching for a solution of a PSO problem, we only need to consider
order consistency, horizontal wire separation and FP-Range, and do not need to consider
M4 any more.
Suppose the number of fixed pins is $N_p$. Calculating the FP-Range of a segment
takes $O(N_p)$ time. However, we can use a look-up table to speed up the search instead of
checking all of the fixed pins.
Given a routing region $(W, H)$ ($W$ and $H$ could be quite large), declare a two-dimensional
array $\mathrm{MAP}[\lceil W/R_w \rceil, \lceil H/R_h \rceil]$, where $R_w$ and $R_h$ are two positive numbers set by users.
Each element of MAP is a set of pins. A pin with location $(p_x, p_y)$ is put in
$\mathrm{MAP}[\lceil p_x/R_w \rceil, \lceil p_y/R_h \rceil]$. For any horizontal segment $(x_1, x_2, y)$, we only need to search
$\mathrm{MAP}[\lceil (x_1 - 2s)/R_w \rceil \ldots \lceil (x_1 + 2s)/R_w \rceil,\ 0 \ldots \lceil H/R_h \rceil]$ and
$\mathrm{MAP}[\lceil (x_2 - 2s)/R_w \rceil \ldots \lceil (x_2 + 2s)/R_w \rceil,\ 0 \ldots \lceil H/R_h \rceil]$.
In this way, the running time can be greatly reduced.
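A minimal sketch of such a pin look-up table follows. It uses a dictionary keyed by bucket indices instead of a dense two-dimensional array, which is an assumption of this sketch (convenient when $W$ and $H$ are large); the names `build_pin_map` and `pins_near_x` are hypothetical.

```python
import math
from collections import defaultdict


def build_pin_map(pins, Rw, Rh):
    """Bucket each pin (px, py) under the key (ceil(px / Rw), ceil(py / Rh))."""
    pin_map = defaultdict(list)
    for px, py in pins:
        pin_map[(math.ceil(px / Rw), math.ceil(py / Rh))].append((px, py))
    return pin_map


def pins_near_x(pin_map, x, s, Rw, Rh, H):
    """Return the pins whose x-coordinate lies in the open window (x - 2s, x + 2s).

    Only the bucket columns that can intersect the window are visited.
    """
    lo = math.ceil((x - 2 * s) / Rw)
    hi = math.ceil((x + 2 * s) / Rw)
    found = []
    for col in range(lo, hi + 1):
        for row in range(0, math.ceil(H / Rh) + 1):
            # Buckets are coarse, so filter by the exact window.
            found.extend(p for p in pin_map.get((col, row), ())
                         if x - 2 * s < p[0] < x + 2 * s)
    return found
```

The candidates returned for the two end points $x_1$ and $x_2$ can then be fed into the FP-Range calculation instead of the full pin list.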
This theorem can also take width into consideration. Suppose the pin width is the same as
the segment width. As illustrated in Figure 6.6, we only need to consider the fixed pins falling
in the rectangular area. Also, the $[V, U]$ bound should take the width of segment $A$ into
consideration, that is, $[V + w_a, U - w_a]$.

Figure 6.6 Illustration of the FP-Range calculation when the width is taken into consideration.
Furthermore, this theorem can handle different spacing requirements on different layers
without any extra work. Let $s_5$ and $s_4$ be half of the minimum wire separation of M5 and M4
respectively; usually the upper metal layer M5 has a larger spacing requirement than M4.
Since the FP-Range is used to avoid vertical segment overlaps on M4 under the assumption of
no horizontal overlaps on M5, the theorem still holds if we use $s_4$ to calculate the FP-Range.
6.4 Consistency Graph
From the theorem, we know that the influence of M4 can be totally reflected in FP-Range.
Thus we only need to deal with horizontal signal wire segments on the top layer.
An important requirement of the PSO problem is to keep “order consistency”. Given any two
horizontal segments $A = (x_{a1}, x_{a2}, y_a)$ and $B = (x_{b1}, x_{b2}, y_b)$, if
$(x_{a1} - s, x_{a2} + s) \cap (x_{b1} - s, x_{b2} + s) \neq \emptyset$ ($2s$ is the wire separation requirement), we call
$A$ and $B$ adjacent segments. According to order consistency, the relative positions of two adjacent
segments should not be changed in the new routing solution. A good way to represent their
relative positions is to construct a directed graph, the “consistency graph”.

Figure 6.7 A routing solution of signal wires on the top layer.

Figure 6.8 (a) Full connections of adjacent segments. (b) Consistency graph.

In the consistency graph, each node represents a horizontal signal wire segment. (Where
no confusion arises, we use the same notation for segments and nodes, and refer to a
node by its corresponding segment and vice versa.) For any two adjacent segments $A$ and $B$, if
$A$ is above $B$, there must exist a path from $A$ to $B$.
One simple way to set up the consistency graph is to create an edge for each pair of
adjacent segments. Figure 6.7 gives a routing solution of signal wire segments on M5.
(Since we only consider horizontal segments, the picture only shows the top layer.) Figure
6.8(a) is its corresponding graph. But in this graph, many edges are unnecessary. For
example, segment A is above C and C is above G, so the edge $(A, G)$ is not needed.
Algorithm 8 Consistency-Graph-Construction ($S_h$)
for each segment in $S_h$ do
    create a node;
end for
for $i = 1$ to $|S_h|$ do
    for $j = i + 1$ to $|S_h|$ do
        if $s_i$ and $s_j$ are close adjacent segments then
            if $s_i$ is above $s_j$ then
                add edge($s_i$, $s_j$)
            else
                add edge($s_j$, $s_i$)
            end if
        end if
    end for
end for
For any two adjacent wire segments $A = (x_{a1}, x_{a2}, y_a)$ and $B = (x_{b1}, x_{b2}, y_b)$ with $y_a > y_b$, let
$[x_1, x_2] = [x_{a1} - s, x_{a2} + s] \cap [x_{b1} - s, x_{b2} + s]$. The rectangle $(x_1, y_b, x_2, y_a)$, where
$(x_1, y_b)$ and $(x_2, y_a)$ are the coordinates of its bottom-left and top-right corners respectively,
is called a “clear box” if no other horizontal segment overlaps it. If the rectangle is a clear box,
the two adjacent segments are called close adjacent segments, and one edge $(A, B)$ is added.
In Figure 6.7, the shaded areas are clear boxes. Figure 6.8(b) shows the consistency graph
obtained by adding an edge for each pair of close adjacent segments.
The construction of the consistency graph is summarized in Algorithm 8. $S_h$ is the
set of horizontal signal wire segments on the top layer, and the segment with index $i$ is
denoted $s_i$.
The consistency graph $G$ is a planar graph and the number of nodes is $|S_h|$, so the
number of edges is no more than $3|S_h|$. It takes $O(|S_h|)$ time to check whether two segments
are close adjacent segments, and the graph construction involves two nested loops, so its
worst-case runtime is $O(|S_h|^3)$. However, if we use a look-up table to record segments (similar to the
pin look-up table in Section 6.3), the runtime can be greatly reduced.
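The cubic-time construction can be sketched directly from Algorithm 8 and the clear-box definition. The function name `build_consistency_graph` is hypothetical, widths are neglected, and "overlapping the rectangle" is interpreted here as another segment's center line lying strictly between the two segments in y and crossing the box in x; that interpretation is an assumption of this sketch.

```python
def build_consistency_graph(segs, s):
    """Return the set of edges (i, j), meaning segment i is directly above j.

    segs is a list of (x1, x2, y) horizontal segments; 2s is the wire spacing.
    An edge is added only for close adjacent segments (clear box between them).
    """
    edges = set()
    n = len(segs)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = segs[i], segs[j]
            # Intersection of the two x-spans, each extended by s.
            lo = max(a[0] - s, b[0] - s)
            hi = min(a[1] + s, b[1] + s)
            if lo >= hi:
                continue  # not adjacent
            top, bot = (i, j) if a[2] > b[2] else (j, i)
            y_hi, y_lo = segs[top][2], segs[bot][2]
            # The box is clear if no third segment lies inside it.
            blocked = any(
                k != i and k != j
                and y_lo < segs[k][2] < y_hi
                and segs[k][0] < hi and segs[k][1] > lo
                for k in range(n)
            )
            if not blocked:
                edges.add((top, bot))
    return edges
```

For three nested segments A above C above G, this yields the edges (A, C) and (C, G) but not the redundant edge (A, G), in line with the example discussed for Figure 6.8.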
6.5 PSO-H Algorithm
To solve the PSO problem, we draw on the consistency graph G to maintain the order
consistency. For convenience, for any two nodes A and B in G, if there is a path from A to
B, we say A is B’s parent, and B is A’s child.
Each time, select the nodes that have no parent nodes, and move them to their highest
available positions. These positions are their new locations. Then remove these nodes from
the graph. Repeat this process until no nodes are left.
For each segment, its available position is decided by its FP-Range, allowable deviation
bound, the distribution of power rails, fixed pins on M5, and the positions of its parents.
Let the wire separation requirement be $2s$. Suppose segment $A = (x_1, x_2, y, w)$ has an
FP-Range $[V, U]$ and an allowable deviation bound $d$. If $A$ moves in the range
$[V, U] \cap [y - d, y + d]$, vertical wire separation, track consistency and P-constraint are satisfied
according to Theorem 6.1. Obviously, the deviation bound is also satisfied. Also, each
node $t$ records a value Ubound, where Ubound $= \min\{y_r - 2s - w_r \mid y_r$ is the y-coordinate of
a parent node of $t$ (in its new position) and $w_r$ is its width$\}$. Thus if $A$ moves in the range
$[0, \mathrm{Ubound} - w]$, order consistency and horizontal signal wire separation are guaranteed. Let
$[\bar{V}, \bar{U}] = [V, U] \cap [y - d, y + d] \cap [0, \mathrm{Ubound} - w]$. If no power rails fall in the rectangle region
$(x_1 - 2s, \bar{U} - w - 2s, x_2 + 2s, \bar{U} + w + 2s)$, track $\bar{U}$ is the new position of the signal wire
segment $A$. Otherwise, suppose a power rail segment $P = (x_{p1}, x_{p2}, y_p, w_p)$ overlaps
the rectangle region; let $\bar{U} = y_p - w_p - w - 2s$. Repeat the checking until a suitable
position is found or $\bar{U} < \bar{V}$. The latter means there is no solution to the PSO problem. Furthermore,
since power rails are checked segment by segment, shorts between two power rail segments
have no effect on the calculation. Fixed pins on M5 are handled similarly to power
rails.
The PSO-H algorithm is summarized in Algorithm 9. $S_h$ is the set of horizontal
signal wire segments, $P$ is the power rail set, $R$ is the set of fixed pins, and $D$ records the
allowable deviation bound for each segment. For each node $t$, its Ubound is denoted as
$t$.Ubound.
In this algorithm, we always put a horizontal segment at its highest available
position. This leaves more room for other segments, since once a segment is processed,
its location is fixed and other segments below it cannot take the places above it. If we
arbitrarily assign an available position, some segments may be left with no feasible position.
Algorithm 9 PSO-H ($S_h$, $P$, $R$, $D$)
G = Consistency-Graph-Construction($S_h$);
Calculate the FP-Range for each node;
Calculate [FP-Range] $\cap$ [allowable deviation];
Push all nodes without parent nodes into a list L;
for all nodes t in L do
    t.Ubound = H;
end for
while $L \neq \emptyset$ do
    Remove a node q from L;
    Calculate q’s new position;
    if no position is found then
        return “No Solution”;
    end if
    Update q’s child nodes’ Ubound;
    Delete q from G;
    Push the nodes without parent nodes into L;
end while
In the calculation of available positions, the FP-Range, the deviation bound and the distribution
of power rails are decided by the problem itself. If no range satisfies these three constraints,
the problem has no solution. On the other hand, if the problem has a solution, the position of each
signal wire segment in this solution must fall in the range of available positions obtained in
PSO-H, since PSO-H always puts segments at their highest available positions. Therefore,
PSO-H is guaranteed to find a solution as long as one exists.
Constructing the consistency graph takes $O(|S_h|^3)$ time. The calculation of the FP-Ranges
may take $O(|R||S_h|)$. The while loop has $|S_h|$ rounds. In each round, checking the
intersection with power rails and fixed pins on M5 may take $O(|P| + |R|)$. To update Ubound,
each edge is visited only once over the whole loop. Since the number of edges is no more
than $3|S_h|$, the runtime for PSO-H is $O(|S_h|^3 + |S_h||R| + 3|S_h||P| + 3|S_h||R|)$. Still, we can
set up look-up tables for pins, signal wire segments and power rails so that the search
time can be greatly reduced.
Theorem 6.2 Given a PSO problem, the PSO-H algorithm is guaranteed to find a feasible
solution in polynomial time as long as one exists.
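The placement loop of PSO-H can be sketched as follows. This is a simplified illustration under stated assumptions rather than the thesis implementation: the FP-Ranges and the consistency-graph edges are taken as precomputed inputs, fixed pins on M5 are omitted (the text handles them like power rails), and the function name `pso_h` and the tuple layouts are hypothetical.

```python
from collections import deque


def pso_h(segs, ranges, edges, rails, s, H):
    """Place each segment at its highest available position.

    segs:   list of (x1, x2, y, w, d): span, old y, half-width, deviation bound
    ranges: list of (V, U) FP-Ranges, one per segment
    edges:  (parent, child) index pairs of the consistency graph
    rails:  list of (xp1, xp2, yp, wp) power rail segments
    Returns the list of new y-coordinates, or None if there is no solution.
    """
    n = len(segs)
    children = [[] for _ in range(n)]
    indeg = [0] * n
    for p, c in edges:
        children[p].append(c)
        indeg[c] += 1
    ubound = [H] * n
    new_y = [None] * n
    ready = deque(i for i in range(n) if indeg[i] == 0)
    while ready:
        q = ready.popleft()
        x1, x2, y, w, d = segs[q]
        V, U = ranges[q]
        lo = max(V, y - d, 0)
        hi = min(U, y + d, ubound[q] - w)
        moved = True
        while moved and hi >= lo:
            moved = False
            for xp1, xp2, yp, wp in rails:
                x_hit = xp1 < x2 + 2 * s and xp2 > x1 - 2 * s
                y_hit = abs(yp - hi) < wp + w + 2 * s
                if x_hit and y_hit:
                    hi = yp - wp - w - 2 * s  # drop just below this rail
                    moved = True
                    break
        if hi < lo:
            return None
        new_y[q] = hi
        for c in children[q]:
            # Ubound = min over parents of (y_r - 2s - w_r).
            ubound[c] = min(ubound[c], hi - 2 * s - w)
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return new_y
```

Processing nodes in order of vanishing in-degree mirrors "select the nodes that have no parent nodes", so every parent is placed before its children read their Ubound.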
6.6 PSO-G Algorithm
In Section 6.5, we proposed the algorithm PSO-H to solve the PSO problem. In that approach, all
horizontal signal wire segments are put at their highest available positions. For some
segments, this is not necessary, and in many applications we want to keep the changes
as small as possible. So in this section, we propose another algorithm, PSO-G, which tries to
reduce the total deviation. The “total deviation” is defined as the sum of the deviations of
all horizontal segments. We call a node that overlaps a power rail an Onode. If
an Onode overlaps a power rail $\hat{P}$ and has no parent nodes overlapping $\hat{P}$, the node
is called a URnode. Similarly, if an Onode has no child nodes in the same power rail, the
node is called a DRnode. An Rnode refers to either a URnode or a DRnode.
Here are two observations:
1. Given a routing solution $T$, suppose a solution $T'$ is obtained by moving one Onode $q$
outside power rail $\hat{P}$. To minimize the total deviation, the spacing between the new
position of $q$ and $\hat{P}$ must equal the wire separation requirement.
2. Given a routing solution $T$, suppose a solution $T'$ is obtained by moving an Onode $n_o$ which
is not an Rnode outside power rails. Then there must be another solution $T''$, obtained
by moving an Rnode outside power rails, such that the total deviation of
$T''$ is less than that of $T'$. This is because $n_o$ has at least one parent and one child in the
same power rail. If $n_o$ moves outside the power rail, it forces its parent/child nodes
to move outside the power rail in order to keep order consistency. Therefore, moving a
parent/child node of $n_o$ outside power rails leads to less total deviation. In Figure 6.9,
the shaded area is a power rail. All segments inside are Onodes. A, B, and D have
no parent nodes in this power rail; thus A, B, and D are URnodes. F and D have no
child nodes in this power rail; they are DRnodes. A node can be both a URnode and a
DRnode, like D. The total deviation for moving segment E must be larger
than that for moving B or F.

Figure 6.9 Illustration of Onodes/Rnodes.
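The URnode/DRnode definitions can be illustrated with a small classifier over the consistency graph. The function name `classify_rnodes`, the `rail_of` mapping, and the example edges are hypothetical; in particular, the edges used in the usage note are made up so that the outcome agrees with the roles stated for Figure 6.9, and are not taken from the figure itself. Parents and children are treated as ancestors and descendants, following the path-based definition in Section 6.5.

```python
def classify_rnodes(edges, rail_of):
    """Return (URnodes, DRnodes) as two sets.

    edges:   (parent, child) pairs of the consistency graph
    rail_of: maps each Onode to the id of the power rail it overlaps
    """
    children, parents = {}, {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
        parents.setdefault(b, []).append(a)

    def reachable(start, adj):
        """All nodes reachable from start by following adj (start excluded)."""
        seen, stack = set(), [start]
        while stack:
            for nxt in adj.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    ur, dr = set(), set()
    for node, rail in rail_of.items():
        # URnode: no ancestor overlaps the same rail.
        if all(rail_of.get(p) != rail for p in reachable(node, parents)):
            ur.add(node)
        # DRnode: no descendant overlaps the same rail.
        if all(rail_of.get(c) != rail for c in reachable(node, children)):
            dr.add(node)
    return ur, dr
```

With the assumed edges A to C, B to E, C to F and E to F, an isolated node D, and all six nodes on one rail, the classifier reports A, B, D as URnodes and D, F as DRnodes.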
Based on the two observations, we propose the PSO-G algorithm. In PSO-G, we need
to search the graph in both directions (to parent nodes and to child nodes). To facilitate
searching, another set of edges is added: if there is an edge $(a, b)$ in $G$, add an edge $(b, a)$.
To separate the two sets of edges, assign the color “black” to the original edges and “white”
to the newly added edges. If a node searches for its parents, it uses white edges; if it searches for
child nodes, it uses black edges.
First, we assign a cost to each Rnode $r$. Suppose $r$ is a URnode. If $r$ is moved outside
the power rail, the new position of $r$ may overlap with other segments or lie above its
parents’ positions. Then the affected signal segments are forced to move up accordingly.
The cost of an Rnode $r$ is the total deviation caused by moving $r$ outside power rails. Since
$r$ is a URnode, the affected segments can only be its parents. Let the node $r$ be the starting
point. Using the BFS (breadth-first search) algorithm on white edges, we can identify all
possibly affected segments and mark them “red”. Assume $r$ is on power rail $\hat{P}$; then its new
position is $y_p + w_p + w_r + 2s$, where $y_p$ is the y-coordinate of the center of $\hat{P}$, $w_p$ and $w_r$
are the widths of $\hat{P}$ and $r$ respectively, and $2s$ is the wire separation requirement. If this
position still overlaps a power rail $\tilde{P}$, test $y_{\tilde{P}} + w_{\tilde{P}} + w_r + 2s$. Repeat this
process until there is no overlap with power rails or the position is outside $r$’s available position.
$r$’s available position is the intersection of $r$’s FP-Range and its allowable deviation range.
If no suitable position is found, set its cost to $\infty$. Otherwise, go ahead to process the affected
segments. The position of an affected segment is not calculated until all its red child nodes
have calculated their new positions. Suppose an affected segment is $A = (x_1, x_2, y, w)$.
Let $z = \max\{y_c + w_c + 2s \mid y_c$ is the y-coordinate of a red child node of $A$ and $w_c$ is its
width$\}$. If $z$ is not a suitable position, use the above procedure until an available position
is found. If no suitable position is found, the cost is $\infty$. If all affected segments can find a
position, the cost is the sum of the differences between their new positions and the old ones. A
similar rule applies if $r$ is a DRnode.
Each time, select the Rnode with the minimum cost. If the minimum cost is not $\infty$, move it
outside power rails and adjust the positions of the affected segments. Once an Rnode is
moved outside power rails, it is just an ordinary segment and is no longer an Rnode.
Furthermore, some of its child/parent Onodes may become Rnodes. For each Rnode,
recalculate the cost and repeat this process until there is no Rnode (i.e., no segment
overlaps a power rail, and the result is a solution to the PSO problem), or until the
minimum cost is $\infty$ (i.e., there is no solution to the PSO problem). The algorithm is summarized
in Algorithm 10.
Algorithm 10 PSO-G ($S_h$, $P$, $R$, $D$)
Construct consistency graph, mark Rnodes;
Calculate the FP-Range for each node;
Calculate [FP-Range] $\cap$ [allowable deviation];
Push all Rnodes into List and assign costs;
while List $\neq \emptyset$ do
    Select the Rnode r with the minimum cost;
    if min cost $= \infty$ then
        return “No Solution”;
    end if
    Change positions of r and affected segments;
    Push new Rnodes into List;
    Adjust/assign costs to Rnodes in the List;
end while
In Algorithm 10, the order consistency and horizontal wire separation are satisfied.
Since all segments move within their FP-Range, no vertical wire separation or track con-
sistency or P-constraint is violated according to theorem 1. Furthermore, each segment is
in its deviation bound and has no overlap with power rails. Therefore, if PSO-G returns a
solution, the solution is a correct one. On the other hand, if PSO-G cannot find a solution,
the PSO problem has no solution.
If a URnode u has a cost of $\infty$, at least two parent nodes $p_1$ and $p_2$ of u overlap
or their up/down order is disturbed. Suppose $p_1$ is above $p_2$ in the original design. Since
$p_1$ cannot move up further, it must have reached its highest possible position, which is determined by the
intersection of its FP-Range and allowable deviation bound, excluding power rails. Also, along
the path from u to $p_2$, the spacing between two segments must be the required minimum
spacing, which is the sum of the widths of the two segments and $2s$. Thus if the cost of a
URnode is $\infty$, it cannot be moved up outside power rails, and this is decided entirely by the
problem itself rather than by the processing method. Similarly, whether u can be moved down
outside power rails is also decided entirely by the problem itself.
If PSO-G returns “No Solution”, the minimum cost must be $\infty$. For any Rnode r left in
List, there are two cases: (1) r is both a URnode and a DRnode. Then r cannot be moved up
or down outside power rails, and this is decided by the problem itself. Therefore, there is no solution
to the PSO problem. (2) r is a URnode and it has a child DRnode t. Both r and t have a
cost of $\infty$ and cannot be moved down outside power rails. Again the PSO problem has no
solution. Thus if the PSO-G algorithm cannot find a solution, the PSO problem has no solution.
Suppose the number of segments overlapped with power rails is $L$. Constructing the
consistency graph and marking Rnodes takes $O(|S_h|^3 + |S_h||P|)$. The calculation of the FP-Ranges
may take $O(|S_h||R|)$. To calculate the cost of one segment, each edge is visited at
most twice (first for BFS (breadth-first search), second for calculating the position), and it
takes $O(|S_h||P|)$ since the number of edges of the consistency graph is at most $6|S_h|$ (including both
black and white edges). Furthermore, each Rnode can be in the List at most twice. Thus the
while loop has at most $2L$ rounds, and each round can be finished in $O(L|P||S_h|)$. Thus
the total runtime is $O(|S_h|^3 + |S_h||R| + L^2|P||S_h|)$. By setting up look-up tables for pins,
signal wire segments and power rails, the runtime can be greatly reduced.
Furthermore, once a URnode (DRnode) is moved outside power rails, the segments
affected can only be its parents (children). So if the cost calculation of an Rnode does not
involve these segments, its cost will not change after the URnode (DRnode) moves out,
i.e., no cost adjustment (Line 12 in Algorithm 10) is needed for this node. Therefore, for
each Rnode r, we record a bounding box $(x_{b1}, y_{b1}, x_{b2}, y_{b2})$, where $(x_{b1}, y_{b1})$ and $(x_{b2}, y_{b2})$
are the coordinates of the bottom-left corner and the upper-right corner. The bounding box covers all
of the affected segments when r moves out. After one URnode/DRnode t is moved out,
if the bounding box of an Rnode has no overlap with t’s bounding box, its cost stays the
same and need not be recalculated. In this way, a lot of calculation can be saved.
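The bounding-box filter amounts to a rectangle intersection test. The sketch below is illustrative only; the helper names and the dictionary of boxes are hypothetical, and boxes that merely touch are treated as overlapping, which is a conservative assumption.

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def rnodes_to_recost(moved_box, rnode_boxes):
    """Names of Rnodes whose cost must be recomputed after a move.

    rnode_boxes maps an Rnode name to its recorded bounding box; every
    Rnode whose box is disjoint from moved_box keeps its old cost.
    """
    return [name for name, box in rnode_boxes.items()
            if boxes_overlap(moved_box, box)]
```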
Theorem 6.3 Given a PSO problem, PSO-G algorithm guarantees to find a feasible solu-
tion in polynomial time as long as one solution exists.
Table 6.1 Average results of PSO-H and PSO-G over 5 runs.
File                           N3                  S6                  M8                   F10
ECO Region Area (um^2)         (4908.92, 3295.52)  (3295.52, 4908.92)  (10872.90, 4799.54)  (4799.54, 10872.90)
Top Layer Signal Segments      1601                2098                1266                 726
Power Rail Segments            166                 1128                631                  747
Overlapped Signal Segments     465                 594                 441                  206
Allowable Deviation            2%                  2%                  2%                   2%
Time (second), PSO-H           5.62                9.17                3.81                 3.26
Time (second), PSO-G           21.44               41.28               23.72                11.38
Max Deviation, PSO-H           1.987%              1.992%              1.991%               1.996%
Max Deviation, PSO-G           1.617%              0.231%              1.522%               0.944%
Total Deviation, PSO-H (um)    63029.58            126855.37           108874.92            120950.08
Total Deviation, PSO-G (um)    13097.40            1208.10             7066.87              4200.80
Total Deviation, PSO-G/PSO-H   20.780%             0.952%              6.491%               3.473%
6.7 Experimental Results
Our algorithms were implemented in C++ on a PC (733 MHz) with 128MB memory. We
tested the PSO-H and PSO-G algorithms on four test files. These circuits were derived from
industry files, and the top layer is for horizontal tracks. Both approaches were run 5
times. Table 6.1 lists the average results of these five trials. For all of the test circuits, we
can find a clean routing solution, and the deviation of each signal segment is bounded by
2% of the height of the ECO region area. In the PSO-H algorithm, each top layer signal wire
segment is moved only once, so it is much faster than the PSO-G algorithm. If the requirement
is only a clean solution satisfying the deviation bound, PSO-H is preferred. But if no
deviation bound is given, or the goal is a solution as close as possible to the original
design, PSO-G is preferable: in all four test cases it returns a solution with less total deviation and a smaller
max-deviation.
6.8 Conclusion
In this chapter, we have presented two polynomial-time algorithms to resolve the overlaps
between power rails and signal wires on the top layer while satisfying the allowable
deviation bound, order consistency, track consistency, P-constraint and wire separation
requirement. Both algorithms guarantee to find a feasible solution as long as one exists.
One is faster, while the other makes an effort to reduce the total deviation. Users can choose
the appropriate one according to their application requirements. Experimental results
demonstrate their efficiency and effectiveness.
CHAPTER 7
AN ECO ALGORITHM FOR ELIMINATING
CROSSTALK VIOLATIONS
7.1 Introduction
Any change to an existing routing design may cause design rule violations, and it is necessary
to develop efficient and graceful algorithms to resolve these violations. In this chapter, we
propose an algorithm (CVE) to eliminate crosstalk violations in a given routing design.
The target is to find a new clean routing solution with no crosstalk violations under
constraints similar to those stated in the previous chapter. Therefore, the CVE
algorithm can also be applied to the output of the PSO problem.
Figure 7.1 (a) A routing solution with crosstalk violations. (b) A routing solution with
overlap violations.
Note that once a signal wire segment is moved, the total capacitive crosstalk on both
this segment and its neighbor segments may change. At the same time, design spacing
rule violations must be avoided. For convenience, if the spacing between two segments
is less than the minimum spacing requirement, we say the two segments overlap. Figure
7.1(a) gives a routing solution with five horizontal signal wire segments, two vertical wire
segments and one power rail. Suppose segments b and e violate the capacitive crosstalk
requirement; i.e., the total crosstalk on b and e exceeds defined thresholds. As illustrated
in Figure 7.1(b), if e is moved down, it overlaps with the power rail P. Also, if b is moved
down, a vertical overlap between $b'$ and $c'$ on $\hat{L}$ is introduced.
In this chapter, we propose a two-stage CVE (Crosstalk Violation Elimination) algorithm
to eliminate crosstalk violations for a given routing design while minimizing
the total deviation. The first stage, FCVE, processes signal wire segments on L one by
one and tries to find a clean routing solution satisfying all constraints. Then in the second
stage, SCVE, we make an effort to minimize the total deviation based on the shortest path
algorithm. Experimental results demonstrate that our approach is efficient and effective.
The rest of the chapter is organized as follows: Section 7.2 gives the definition of the CVE
problem, and Section 7.3 introduces some preliminaries. Then we propose a two-stage
algorithm in Section 7.4. In Section 7.5, some optimization strategies are presented, and
experimental results are given in Section 7.6. We conclude the chapter in Section 7.7.
7.2 Crosstalk Violation Elimination
Given a routing solution $S$ with $N$ signal nets, there are $P$ power rails on layer $L$. For
convenience, let the coordinate of the bottom-left corner of the routing region be $(0, 0)$.
Suppose $s$ is half of the minimum wire separation of a metal layer. A horizontal segment
can be represented by $(x_1, x_2, y, w, c, d)$, where $(x_1, y)$ and $(x_2, y)$ are the end point
coordinates of the center line ($x_1 < x_2$), $w$ is the half-width of the segment, $c$ is the
crosstalk threshold, i.e., the total capacitive crosstalk on the segment should not exceed this
bound, and $d$ is the allowable deviation bound, i.e., when the segment moves up/down, its new
position $(x_1, x_2, \bar{y}, w, c, d)$ should satisfy $|\bar{y} - y| \le d$. Similarly, a vertical segment can
be represented as $(y_1, y_2, x, w, c, d)$. Sometimes we simplify the representation. For
example, a horizontal segment can be represented by $(x_1, x_2, y)$ if we do not care about the other
factors.
Since the crosstalk on some sensitive segments in $S$ exceeds the given bounds, the target
is to modify the existing routing solution $S$ so that the new routing solution $\bar{S}$ is a clean
routing solution which satisfies the following constraints:
1. The power rails P on L are not changed.
2. Horizontal signal wire segments on L can only move up/down, i.e., the x-coordinates
of the two end points of the segment keep unchanged.
3. The total crosstalk on a wire segment should not exceed its capacitive crosstalk
threshold c.
4. The relative positions of two segments on all layers should not be changed.
For example, for any two horizontal signal wire segments on one layer, $(x_1, x_2, y)$
and $(x'_1, x'_2, y')$ (assume $y > y'$), let their new positions be $(x_1, x_2, \bar{y})$ and $(x'_1, x'_2, \bar{y}')$ in
the new routing solution $\bar{S}$, respectively. If $(x_1 - s, x_2 + s) \cap (x'_1 - s, x'_2 + s) \neq \emptyset$,
then $\bar{y} > \bar{y}'$ must hold. Similar requirements apply to vertical segments. This property is called
“order consistency”.
5. The difference between the new position of a wire segment and its old location should
not exceed its allowable deviation bound d.
The bound $d$ ensures that a segment does not deviate too much from
its original position. At the same time, it helps to prevent introducing new crosstalk
violations on other layers. When horizontal segments on $L$ are moved, the length
of vertical segments on $\hat{L}$ or $\tilde{L}$ may also change. However, the length change
is no more than $2d$ since each vertical segment connects to at most two horizontal
segments on $L$. The crosstalk introduced by the length increase is therefore also limited,
and by setting appropriate deviation bounds, new crosstalk violations on layer
$\hat{L}$ or $\tilde{L}$ can be avoided.
Although the CVE problem deals with crosstalk violations on one layer, it can be applied
layer by layer to resolve violations on all layers of a given multi-layer routing
design.
7.3 Preliminaries
7.3.1 FP-Range
Although the constraints of the CVE problem are slightly different from those of the PSO
problem, we have the following theorem related to FP-Range.
Theorem 7.1 If all horizontal segments on layer $L$ move up/down within their FP-Ranges
$[V, U]$ and satisfy the horizontal wire separation requirement and order consistency, the new
routing solution has no vertical wire separation violations.
7.3.2 Crosstalk model
In general, each segment has a coupling effect on all other segments. However, the coupling
capacitance decreases dramatically if the segment is out of the neighborhood of the other
segments [46, 47]. Therefore, we only consider the capacitive crosstalk between two neighboring
parallel wires and suppose the neighborhood distance is $D = \gamma \cdot 2s$ ($0 < \gamma < 2$).
Then the capacitive crosstalk between two segments can be expressed by the following
formula:
$$c = \begin{cases} \alpha \cdot \dfrac{l}{t^2}, & t \le D \\ 0, & t > D \end{cases}$$
where $\alpha$ is the coupling parameter, $l$ is the coupling length, and $t$ is the distance between
the two segments.
In Figure 7.2, there are three wire segments A, B and C. Segments A and C have
capacitive crosstalk, while the crosstalk between segments B and C is zero since the
distance between the two segments is larger than $D$.

Figure 7.2 Segments A and C have capacitive crosstalk, while the crosstalk between
segments B and C is zero.
Furthermore, for power rails, they act as a shield and do not cause crosstalk to their
adjacent segments.
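The crosstalk model can be written as a small function. This is a sketch under stated assumptions: the name `alpha` for the coupling parameter and the `is_power_rail` flag are chosen here for illustration and are not notation confirmed by the text.

```python
def coupling_crosstalk(length, distance, alpha, D, is_power_rail=False):
    """Capacitive crosstalk between two neighboring parallel segments.

    Follows the piecewise model above: alpha * l / t^2 within the
    neighborhood distance D, and zero beyond it. Power rails act as
    shields, so they contribute nothing.
    """
    if distance <= 0:
        raise ValueError("segments must be separated by a positive distance")
    if is_power_rail or distance > D:
        return 0.0
    return alpha * length / distance ** 2


def total_crosstalk(neighbors, alpha, D):
    """Sum the crosstalk over (length, distance, is_power_rail) triples."""
    return sum(coupling_crosstalk(l, t, alpha, D, rail)
               for l, t, rail in neighbors)
```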
7.4 CVE Algorithm
To solve the CVE problem, we develop a two-stage algorithm. The first stage FCVE pro-
cesses signal wire segments on L one by one and tries to find a clean routing solution
satisfying all constraints. Then in the second stage SCVE, efforts are made to minimize the
total deviation based on the shortest path algorithm.
7.4.1 FCVE algorithm
For convenience, for any two nodes A and B in G, if there is a path from A to B, we say A
is B’s parent, and B is A’s child.
The main idea of FCVE algorithm is as follows: each time, select the nodes that have
no parent nodes and try to move them to their highest available positions. These positions
are their new locations. Then remove these nodes from the graph. Repeat this process until
no nodes are left.
For each segment, its available position is related to its FP-range, allowable deviation
bound, crosstalk threshold, the distribution of power rails and the positions of its parents.
Let the wire separation requirement be $2s$. Suppose segment $A = (x_1, x_2, y, w, c, d)$ has an
FP-range $[V, U]$. $A$ also records a value $U_{bound} = \min\{y_p - 2s - w_p \mid y_p$ is the
y-coordinate of a parent node of $A$ and $w_p$ is its half-width$\}$. If $A$ moves in the range
$[0, U_{bound} - w]$, order consistency is guaranteed. Let $[\bar{V}, \bar{U}] = [V, U] \cap [y - d, y + d]
\cap [0, U_{bound} - w]$. Check tracks $t$ starting from $\bar{U}$. If $t$ is not occupied by any power
rail and no crosstalk violations are introduced on $A$ or its parents when $A$ is put at track
$t$, then $t$ is assigned as $A$’s new position. Otherwise, check the next track below $t$. Repeat this
process until a feasible position is found or the track goes below $\bar{V}$. The latter case means
no feasible solution is found. Once the position of $A$ is decided, the capacitive crosstalk
bounds of $A$ and $A$’s parents have to be adjusted accordingly, i.e., the crosstalk
between $A$ and each parent is subtracted from the crosstalk bounds of both $A$ and that parent.
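The top-down track scan for a single segment can be sketched as below. The callbacks `rail_free` and `crosstalk_ok` are hypothetical stand-ins for the power rail and crosstalk checks, tracks are assumed to lie on an integer grid with unit pitch, and a segment without parents is given an unbounded `Ubound` (Algorithm 11 uses the chip height H instead).

```python
import math


def fcve_position(fp_range, y, d, w, parents, s, rail_free, crosstalk_ok):
    """Highest feasible track for one segment, or None if there is none.

    parents is a list of (y_p, w_p) for parents whose positions are fixed.
    """
    V, U = fp_range
    # Ubound keeps the segment below every parent with separation 2s.
    ubound = min((yp - 2 * s - wp for yp, wp in parents),
                 default=float("inf"))
    lo = max(V, y - d, 0)
    hi = min(U, y + d, ubound - w)
    t = math.floor(hi)
    while t >= math.ceil(lo):
        if rail_free(t) and crosstalk_ok(t):
            return t
        t -= 1  # try the next track below
    return None
```

With `fp_range=(0, 20)`, `y=10`, `d=3`, `w=1`, one parent at `(14, 1)` and `s=1`, the scan starts at track 10 and moves downward until both checks pass.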
Furthermore, if one segment has several children, then the children selected first always
have higher priority. For example, in Figure 7.3, suppose the position of A has been fixed.
B, C and D are three children of A. The coupling length ratio of B, C and D is 2 : 1 : 1.
The crosstalk bound of all segments is 30. The numbers in the figure indicate the capacitive
crosstalk if the segments are placed at their highest available positions. Suppose B is first
selected and it is placed as in Figure 7.3(b). Then the crosstalk bound of A is reduced to 0.
Therefore C and D have to be placed lower, which pushes E down too. In order to avoid
one segment consuming all or most of the crosstalk budget, we use the following approach.
Suppose a segment $R$ is fixed and its crosstalk bound is $c_r$. Also, its total coupling length
with all of its unfixed children is $l_r$. Let $d_r = \min\{D, \sqrt{\alpha \cdot (l_r / c_r)}\}$. Then the distance
between $R$ and its first selected child $T$ must be no less than $d_r$. Once $T$ is fixed, $c_r$ is
adjusted accordingly, i.e., the crosstalk between $R$ and $T$ is subtracted from $c_r$. Then the new $c_r$
is used for $R$’s other children in the same way. Figure 7.3(c) shows a solution with this
approach. According to the crosstalk budget, the crosstalk between A and B, A and C, and A
and D should be 15, 7.5, and 7.5, respectively. Suppose segment B is first selected; then
the crosstalk upper bound of A is reduced to 15. Since the lengths of C and D are the same,
C and D each get a crosstalk budget of 7.5, and the new position of C can be calculated. However,
the highest available position of C is lower than the calculated position. Therefore, C is
put at its highest available position, and its crosstalk to A is 7. Finally, D takes all of the
remaining crosstalk budget.
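The budget rule can be checked numerically against the example of Figure 7.3. The sketch below assumes the crosstalk model $\alpha \cdot l / t^2$ with `alpha` as an illustrative name for the coupling parameter; the values `alpha = 1` and `D = 10` are arbitrary choices for the demonstration, and the guard for an exhausted budget is an added assumption.

```python
import math


def min_child_distance(c_r, l_r, alpha, D):
    """d_r = min{D, sqrt(alpha * l_r / c_r)} for a fixed segment R."""
    if c_r <= 0:
        return D  # no budget left: fall back to the neighborhood distance
    return min(D, math.sqrt(alpha * l_r / c_r))


def crosstalk(length, t, alpha, D):
    """Crosstalk alpha * l / t^2 within distance D, zero beyond it."""
    return alpha * length / t ** 2 if t <= D else 0.0


alpha, D = 1.0, 10.0
c_a = 30.0                                   # crosstalk bound of A
lengths = {"B": 2.0, "C": 1.0, "D": 1.0}     # coupling lengths, ratio 2:1:1

# B is selected first and placed exactly at distance d_r from A.
d_b = min_child_distance(c_a, sum(lengths.values()), alpha, D)
xt_b = crosstalk(lengths["B"], d_b, alpha, D)
c_a -= xt_b

# C is next; its budgeted share follows from the updated bound.
d_c = min_child_distance(c_a, lengths["C"] + lengths["D"], alpha, D)
xt_c = crosstalk(lengths["C"], d_c, alpha, D)
```

Placing a child exactly at distance $d_r$ gives it a share of the bound proportional to its coupling length, which is how B receives half of the budget of 30 and C a quarter.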
In FCVE, we always try to place a horizontal segment as high as possible. This leaves more room
for other segments, since once a segment is processed its location is fixed and the
segments below it cannot take places above it. If we arbitrarily assigned a segment to one
of its available positions, some segments might have nowhere to be placed.
Note that even if there are no crosstalk violations in the given input routing design,
segments may still be moved by the above procedure. However, our goal is not only to
eliminate crosstalk violations but also to minimize the total deviation. Therefore, we start
with a zero allowable deviation bound, and each time we increase the bound by a certain
percentage. For each deviation value, we calculate the positions of all segments according
to the above procedure. We repeat this process until a feasible solution is found or the
deviation bound exceeds the predefined value. In the latter case, no feasible solution is
found.

Figure 7.3 (a) B, C, and D are three children of A. The position of A is fixed. (b) B
is first selected and put at its highest available position. (c) A solution according to our
approach.
The algorithm is summarized in Algorithm 11. $S_h$ is the set of horizontal signal
wire segments, $P$ is the power rail set, and $F_p$ is the set of fixed pins. For each node $v$, $v.d$ is its
allowable deviation bound and $v.Ubound$ refers to its $U_{bound}$. The parameter $\delta$ is the increase
percentage for each iteration. Then $v.d \times increase$ is the allowable deviation of node $v$ in the current iteration.
Suppose the width and height of the chip are $W$ and $H$, respectively.
Algorithm 11 FCVE ($S_h$, $P$, $F_p$, $\delta$)
1: G = Consistency-Graph-Construction ($S_h$);
2: Calculate FP-Range for each node;
3: increase = 0;
4: while increase $\le$ 1 do
5:     Calculate [FP-Range] $\cap$ [$v.d \times increase$] of nodes v;
6:     Push nodes without parent nodes into a List T;
7:     for all nodes t in T do
8:         t.Ubound = H;
9:     end for
10:    while T $\neq \emptyset$ do
11:        Remove a node q from T;
12:        Calculate q’s new position;
13:        if no position is found then
14:            increase = increase + $\delta$;
15:            Restore G;
16:            Goto 4;
17:        end if
18:        Update Ubound of q’s children;
19:        Update crosstalk bounds of q and its parents;
20:        Delete q from G;
21:        Push the nodes without parent nodes into T;
22:    end while
23:    Return “New Solution”;
24: end while
25: Return “No Solution”
Figure 7.4 (a) A CVEP problem. There are 4 signal wire segments $A_1$, $A_2$, $A_3$, and $A_4$,
and 1 power rail P. (b) The consistency graph is a path.
7.4.2 SCVE
If FCVE returns a solution, then the solution must be a feasible solution satisfying all of the
constraints. However, FCVE tends to place segments to their “highest” available positions
while some segments do not need to deviate so much from their original positions. In this
section, we first consider a special case of CVE problem (CVEP) and propose an exact
polynomial-time algorithm to decide wire segment positions with minimum total deviation
under all constraints. Then by applying this algorithm repeatedly on the output of FCVE,
we can greatly reduce the total deviation.
Problem 7.1 CVEP is a special case of CVE problem when all horizontal segments on
layer L are placed in a line, i.e., the corresponding consistency graph is a path.
Figure 7.4(a) shows an example. There are 4 signal wire segments and 1 power rail.
Figure 7.4(b) is its consistency graph and it is a single path from node A4 to A1. For
convenience, segments in a CVEP problem are indexed as A1, ..., An from bottom to top.
To solve the CVEP problem, we first construct a “Segment Position” (SP) graph, and
then apply the shortest path algorithm to get the solution. The SP graph is constructed in
two steps. The first step graph (FSP) $G = (V, E)$ is formed as follows.
1. Nodes: Since the allowable deviation of segment $A_i$ is $d_i$, there are $2d_i + 1$
possible positions for $A_i$ in total. Let the node set $V' = \{v_i^j \mid i \in [1, n], j \in [-d_i, d_i]\}$ represent
the possible positions of the segments; i.e., $v_i^j$ refers to the position $y_i + j$ of $A_i$. For convenience, we
call $v_i^j$ a node of $A_i$. If a possible position is occupied by a power rail or
is outside $A_i$’s FP-Range, then $A_i$ cannot be placed there. Suppose the nodes corresponding
to such positions form the set $V''$. Then $V = V' - V''$.
2. Edges: $E = \{(v_i^j, v_{i+1}^k) \mid v_{i+1}^k - w_{i+1} - (v_i^j + w_i) \ge 2s, i \in [1, n-1], j \in [-d_i, d_i],
k \in [-d_{i+1}, d_{i+1}], v_i^j \in V, v_{i+1}^k \in V\}$. Each node of $A_i$ is connected to
the nodes of $A_{i+1}$ such that the distance between the two positions satisfies the minimum
spacing requirement.
3. Cost: each edge $(v_i^j, v_{i+1}^k)$ is assigned a cost which is the capacitive crosstalk between
$A_i$ and $A_{i+1}$, supposing the two segments are placed at $v_i^j$ and $v_{i+1}^k$ respectively.
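The three construction rules above can be sketched as follows. This is an illustration under assumptions not fixed by the text: positions lie on an integer grid, each segment is a tuple `(y, w, d, V, U)`, `coupling[i]` is the coupling length between consecutive segments, `on_rail` is a hypothetical power rail check, the edge cost uses the crosstalk model $\alpha \cdot l / t^2$ with the edge-to-edge gap as $t$, and $s > 0$.

```python
def build_fsp(segs, coupling, s, alpha, D, on_rail):
    """Build the first-step (FSP) graph for segments listed bottom to top.

    Returns (nodes, edges): nodes[i] lists the feasible positions of
    segment i, and edges maps (i, p, q) to the crosstalk cost of placing
    segment i at p and segment i + 1 at q.
    """
    nodes = []
    for y, w, d, V, U in segs:
        nodes.append([y + j for j in range(-d, d + 1)
                      if V <= y + j <= U and not on_rail(y + j, w)])
    edges = {}
    for i in range(len(segs) - 1):
        w_low, w_high = segs[i][1], segs[i + 1][1]
        for p in nodes[i]:
            for q in nodes[i + 1]:
                gap = (q - w_high) - (p + w_low)
                if gap >= 2 * s:  # minimum spacing requirement
                    cost = alpha * coupling[i] / gap ** 2 if gap <= D else 0.0
                    edges[(i, p, q)] = cost
    return nodes, edges
```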
Figure 7.5 shows an example. (a) is a CVEP problem with 3 signal wire segments $A_1$,
$A_2$, $A_3$ and 3 power rails. For simplicity, suppose all wires have the same length, and the
deviation bounds of the signal wire segments are all 2. Also, the crosstalk thresholds are all
0, i.e., the distance between any two signal wire segments must be larger than 1 unit. In
Figure 7.5(a), since segments $A_1$ and $A_2$ are adjacent to each other, the capacitive crosstalk
between them exceeds the crosstalk bounds of both $A_1$ and $A_2$. Suppose $A_1$ and $A_2$ violate
the crosstalk requirement.

Figure 7.5 (a) A CVEP problem. (b) FSP graph $G$ of the CVEP problem. (c) SP graph $\bar{G}$
of the CVEP problem. (d) A feasible solution to the CVEP problem.
Figure 7.5(b) shows the corresponding FSP graph $G$ for (a). Due to the overlap with
power rails, each segment has only 3 available positions, which are represented by
3 nodes. The costs of all edges are 0 except for the two edges $e_6$ and $e_{10}$.
In the FSP graph, the allowable deviation bound is reflected by the nodes, and the edge cost
records the crosstalk between two segments. However, the crosstalk constraint is not
included. Therefore, based on the FSP graph, we derive the SP graph $\bar{G} = (\bar{V}, \bar{E})$ so that the
shortest path algorithm can be applied to find the solution. $\bar{G}$ is formed as follows.
1. Nodes: Each edge in $G$ is represented by a node in $\bar{G}$. For convenience, an edge $(u, v)$
in the FSP graph also refers to the corresponding node in the SP graph. In addition, two nodes $s$ and $t$ are added,
representing the starting and ending nodes respectively.
2. Edges: For any two edges $(v_i^j, v_{i+1}^k)$ and $(v_{i+1}^k, v_{i+2}^l)$ in the FSP graph, if the total cost
of the two edges does not exceed $c_{i+1}$, which is the crosstalk bound of segment $A_{i+1}$, an
edge is added between the two corresponding nodes in $\bar{G}$. Also, $s$ is connected to all of
the nodes corresponding to the edges related to $A_1$ in the FSP graph, and all of the nodes
corresponding to the edges related to $A_n$ are connected to $t$.
3. Cost: If edge $e$ connects two nodes $(v_i^j, v_{i+1}^k)$ and $(v_{i+1}^k, v_{i+2}^l)$, the cost of $e$ is $|k|$ (i.e.,
the deviation of $A_{i+1}$); if edge $e$ starts from $s$ (i.e., $e$ connects $s$ and $(v_1^j, v_2^k)$), the cost
is $|j|$; if edge $e$ ends at $t$ (i.e., $e$ connects $(v_{n-1}^j, v_n^k)$ and $t$), the cost is $|k|$.
Figure 7.5(c) illustrates the SP graph $\bar{G}$ for the given CVEP problem. Each edge in
$G$ is represented by a node in $\bar{G}$. For edges $e_6$ and $e_{10}$ in $G$, since their cost is 1, edges
$(e_6, e_{12})$, $(e_2, e_{10})$, and $(e_4, e_{10})$ are not included in $\bar{G}$. Based on $\bar{G}$, we apply the shortest
path algorithm to find the shortest path from $s$ to $t$. In Figure 7.5(c), the shortest path is
indicated by thick curves. It is easy to derive a CVEP solution from the shortest path in $\bar{G}$,
as shown in Figure 7.5(d).
Suppose there are $n$ wire segments in total and $M$ is the maximum allowable deviation. The
number of nodes in the FSP graph is $O(n \cdot M)$. Each node in the FSP graph connects to at
most $M$ nodes. Therefore, the numbers of nodes and edges in the SP graph $\bar{G}$ are $O(n \cdot M^2)$
and $O(n \cdot M^3)$, respectively. Since $\bar{G}$ is a directed acyclic graph, the shortest path algorithm
can be accomplished in $O(|\bar{V}| + |\bar{E}|)$ [14, 15], i.e., $O(n \cdot M^3)$.
We now summarize the CVEP algorithm as Algorithm 12.
Algorithm 12 CVEP ($P$)
1: Construct SP graph $\bar{G}$ for the input path $P$;
2: Apply shortest path algorithm on $\bar{G}$;
3: Derive the solution to the given CVEP problem
The construction of the SP graph $\bar{G}$ takes $O(n \cdot M^3)$, and the derivation of a CVEP solution from a shortest
path in $\bar{G}$ takes $O(n)$. Therefore, the CVEP algorithm can solve CVEP
problems in $O(n \cdot M^3)$. Furthermore, the algorithm guarantees to return a feasible solution
with minimum deviation as long as there is a solution to the given CVEP problem.
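Because the SP graph is a layered DAG, the shortest path can be computed layer by layer. The sketch below is one possible realization, not the thesis code: `xt(i, p, q)` is a hypothetical callback returning the crosstalk between segment `i` at `p` and segment `i + 1` at `q` (or `None` if the spacing rule is violated), segments are 0-indexed bottom to top, and the two end segments are checked against their single neighbor, which the text does not spell out.

```python
from collections import defaultdict


def solve_cvep(pos, y, c, xt):
    """Minimum total deviation placement on a path, or None if infeasible.

    pos[i]: feasible positions of segment i; y[i]: original position;
    c[i]: crosstalk bound. Returns (total_deviation, positions).
    """
    n = len(pos)
    if n < 2:
        raise ValueError("sketch assumes at least two segments")
    # FSP edges per layer: (p, q) -> crosstalk cost.
    fsp = []
    for i in range(n - 1):
        layer = {}
        for p in pos[i]:
            for q in pos[i + 1]:
                cost = xt(i, p, q)
                if cost is not None:
                    layer[(p, q)] = cost
        fsp.append(layer)
    # best[i][(p, q)] = (deviation of segments 0..i, predecessor SP node).
    best = [{(p, q): (abs(p - y[0]), None)
             for (p, q), cost in fsp[0].items() if cost <= c[0]}]
    for i in range(1, n - 1):
        by_top = defaultdict(list)
        for (p, q), (dist, _) in best[i - 1].items():
            by_top[q].append((p, dist))
        layer = {}
        for (q, r), cost in fsp[i].items():
            options = [(dist + abs(q - y[i]), (p, q))
                       for p, dist in by_top[q]
                       if fsp[i - 1][(p, q)] + cost <= c[i]]
            if options:
                layer[(q, r)] = min(options)
        best.append(layer)
    finals = [(dist + abs(q - y[n - 1]), (p, q))
              for (p, q), (dist, _) in best[n - 2].items()
              if fsp[n - 2][(p, q)] <= c[n - 1]]
    if not finals:
        return None
    total, node = min(finals)
    chain = [node[1], node[0]]
    for i in range(n - 2, 0, -1):  # walk predecessors back to the bottom
        node = best[i][node][1]
        chain.append(node[0])
    return total, chain[::-1]
```

Grouping the previous layer by its upper position keeps the work per SP node proportional to the number of candidate positions of one segment, in line with the $O(n \cdot M^3)$ bound stated above.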
Based on CVEP algorithm, we have the SCVE algorithm as Algorithm 13. SCVE
algorithm performs as the second stage of CVE algorithm since its input is the output of
FCVE algorithm, which is a feasible solution to the given CVE problem. The target of
SCVE is to reduce the total deviation.
Based on the consistency graph, each time we select a path and apply the CVEP algorithm
to find the optimal solution corresponding to the selected path. Once a path is processed,
all nodes along the path are marked “Processed”, and their positions are no longer changed.
Since the FCVE algorithm traverses a consistency graph from top to bottom, and many segments
may be placed higher than their original positions, the SCVE algorithm selects
paths from the bottom of the consistency graph. For each path, the first node u must either
have no child or have all of its children marked. Then we trace up to its parents. If one of its
parents p has u as its only unmarked child, then p is selected, and this procedure continues
until no node satisfies the selection rule. Once a path is selected, we treat all other nodes
as unchanged and apply the CVEP algorithm. Note that the capacitive crosstalk of each possible
position of a signal wire segment is also affected by other segments which are not
on the path.
Algorithm 13 SCVE ()
1: Set all nodes in the consistency graph “UnProcessed”
2: while $\exists$ “UnProcessed” nodes do
3:     Select a path P from the consistency graph;
4:     Apply CVEP algorithm on P;
5:     Mark all nodes on P as “Processed”;
6: end while
Figure 7.6 (a) $A_i$ is a wire segment and it has 12 available positions. (b) Every three
nodes are clustered as a “supernode”.
7.5 Optimization
CVEP algorithm is the kernel part of SCVE algorithm. However, the number of possible
positions of wire segments may be quite large and it makes SP graph include a lot of nodes
and edges, which requires not only much memory but also long running time. In order to
speed up the execution, we develop the following optimization strategies.
7.5.1 Node clustering
When the deviation of a wire segment is large, the corresponding FSP graph and SP graph
must include a large number of nodes. In order to facilitate the process of huge CVEP
problems, we propose the following node clustering method to speed up the computation.
Figure 7.7 (a) FSP graph of a CVEP problem. $p$ is a feasible position of $A_{i-1}$. (b) SP
graph of the CVEP problem.

For any wire segment $A_i$, suppose the number of its possible positions is $M$. Then
by grouping neighboring positions together, we can greatly reduce the number of nodes in the
FSP graph and consequently the size of the SP graph. Once several nodes are grouped
together, we can use the average coordinate as the location of the new “supernode”. Figure
7.6 illustrates an example. $A_i$ has 12 feasible positions. When every 3 nodes are clustered as
a “supernode”, there are only 4 “supernodes”. Accordingly, the size of the SP graph can be
greatly reduced.
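Node clustering as described can be sketched in a few lines. Grouping consecutive positions in sorted order is an assumption made here; the function name is illustrative.

```python
def cluster_positions(positions, k):
    """Group every k neighboring positions into one supernode.

    Each supernode is located at the average coordinate of its group.
    The last group may be smaller when len(positions) is not a
    multiple of k.
    """
    ordered = sorted(positions)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    return [sum(group) / len(group) for group in groups]
```

For the 12 positions of Figure 7.6 and `k = 3`, this yields 4 supernodes.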
7.5.2 Edge omitting
The construction of SP graph G is based on FSP graph G. During the transformation from
FSP graph to SP graph, if we know that some edges will not appear in the final solution,
then these edges can be omitted in SP graph. Therefore, the target of this optimization
strategy is to identify this kind of edges.
Suppose a path $P = (A_1, \ldots, A_n)$ is the input of a CVEP problem, where $A_i$ ($i =
1, \ldots, n$) is a wire segment. Let $A_i = (x_{i1}, x_{i2}, y_i, w_i, c_i, d_i)$, and let its FP-range be $[V_i, U_i]$.
For convenience, we call a position a feasible position of $A_i$ if it is not occupied by any
power rail and its y-coordinate falls in $[V_i, U_i] \cap [y_i - d_i, y_i + d_i]$. For a feasible position $p$
of $A_{i-1}$, suppose there is one edge $e$ connecting to $p$ either from $s$ or from a feasible position of
$A_{i-2}$, as illustrated in Figure 7.7(a). In (a), $A_i$ has 7 feasible positions, $p$ is a feasible
position of $A_{i-1}$, $e$ connects to $p$, and $p$ connects to all feasible positions of $A_i$. Based on
this FSP graph, the corresponding SP graph is Figure 7.7(b), assuming each pair $(e, e_i)$ ($i = 1, \ldots, 7$)
satisfies the crosstalk constraint. However, in some cases, some of these edges may not be
needed.
Suppose $q$ is the lowest feasible position of $A_{i+1}$. Let $B_u = \min\{q - 2s - w_i - w_{i+1},
q - D - w_i - w_{i+1}\}$, where $2s$ is the minimum spacing between two segments and $D$ is the
distance beyond which there is no crosstalk between two segments.
Also let $B_l = \max\{p + 2s + w_i + w_{i-1}, p + D + w_i + w_{i-1}\}$. We have the following cases.
Case 1 $B_l \le y_i \le B_u$.
Let $r = \min\{y_i - B_l, B_u - y_i\}$. Start from $y_i$, and search within the range $[y_i - r, y_i + r]$.
If $y_i$ is occupied by power rails, then check $y_i - 1$, $y_i + 1$, $y_i - 2$, $y_i + 2$, ... until a feasible
position $u$ is found or the search is out of the range. If $u$ is found, then only one edge is needed in
the SP graph, i.e., the edge connecting the two nodes corresponding to $e$ and $(p, u)$ in the FSP graph.
In Figure 7.8 (Case 1), $u$ is the closest feasible position to $y_i$ in the range $[y_i - r, y_i + r]$.
Only one edge $(e, e_5)$ is needed in the SP graph.
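The Case 1 search can be sketched as follows. This is a hedged illustration: positions are assumed to lie on an integer grid, `feasible` is a hypothetical set of feasible positions of $A_i$, and both terms of $B_l$ are taken with $w_{i-1}$, the half-width of the segment placed at $p$.

```python
import math


def case1_position(p, q, y_i, w_prev, w_i, w_next, s, D, feasible):
    """Return the single position u kept in Case 1, or None.

    None means either Case 1 does not apply (y_i is outside [B_l, B_u])
    or no feasible position lies within [y_i - r, y_i + r].
    """
    b_u = min(q - 2 * s - w_i - w_next, q - D - w_i - w_next)
    b_l = max(p + 2 * s + w_i + w_prev, p + D + w_i + w_prev)
    if not (b_l <= y_i <= b_u):
        return None
    r = math.floor(min(y_i - b_l, b_u - y_i))
    # Probe y_i, y_i - 1, y_i + 1, y_i - 2, y_i + 2, ... within the range.
    for k in range(r + 1):
        for cand in (y_i - k, y_i + k):
            if cand in feasible:
                return cand
    return None
```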
Consider any other feasible position $v$ of $A_i$. Given an optimal solution $S$ of a CVEP
problem, suppose positions $p$, $v$ and $w$ ($w$ is a feasible position of $A_{i+1}$) are selected for
$A_{i-1}$, $A_i$ and $A_{i+1}$ respectively. Then $p$, $u$ and $w$ must also be feasible positions of the three
wire segments, since the capacitive crosstalk of $(p, u)$ and $(u, w)$ is zero. However, $u$ is the
closest feasible position to $y_i$ and has the least deviation among all feasible positions of
$A_i$. Therefore, a solution with $p$, $u$ and $w$ as the positions of $A_{i-1}$, $A_i$ and $A_{i+1}$ would have
less deviation, which contradicts the assumption that $S$ is an optimal solution.

Figure 7.8 (Case 1) $u$ is the closest feasible position to $y_i$. Only one edge is needed. (Case
2) $y_i$ is the only feasible position in $(B_u, 2y_i - u)$. Two edges are added. (Case 3) There
are two feasible positions in $(2y_i - u, B_l)$. Three edges are added.
Case 2 $B_l \le B_u \le y_i$
Start from $B_u$, and search within the range $[B_l, B_u]$. If $B_u$ is occupied by power rails,
then check $B_u - 1$, $B_u - 2$, ... until a feasible position $u$ is found or the search is out of the range. If $u$
is found, then add the edge $(e, (p, u))$ and the edges $(e, (p, u'))$ for every feasible position $u' \in (B_u, 2y_i - u)$. As illustrated in
Figure 7.8 (Case 2), $u$ is a feasible position in $[B_l, B_u]$, and $y_i$ is the only feasible position
in $(B_u, 2y_i - u)$. Therefore, only two edges $(e, e_3)$ and $(e, e_4)$ are added in the SP graph.
Any other feasible position $v$ of $A_i$ must be outside the range $[u, 2y_i - u)$. If $p$, $v$
and $w$ ($w$ is a feasible position of $A_{i+1}$) are selected for wire segments $A_{i-1}$, $A_i$, and $A_{i+1}$,
respectively, in a solution $S$, then there must exist a solution $S'$ with less total deviation.
In $S'$, the positions of all segments are the same as those in $S$ except that $A_i$ is placed at $u$
instead of $v$.
Case 3 yi ≤ Bl ≤ Bu
Start from Bl, and search within the range [Bl, Bu]. If Bl is occupied by power rails,
then check Bl + 1, Bl + 2, ... until a feasible position u is found or the search leaves the
range. If u is found, then add the edge (e, (p, u)) and the edges (e, (p, u′)) for each feasible
position u′ ∈ (2yi − u, Bl). As shown in Figure 7.8 (Case 3), u is a feasible position in
[Bl, Bu], and there are two feasible positions in (2yi − u, Bl). Therefore, three edges
(e, e3), (e, e4) and (e, e6) are added in the SP graph.
If none of the three cases above applies, then the nodes are simply connected in
the original way.
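The edge selection in Cases 2 and 3 can be sketched as follows. This is only an illustration
under stated assumptions: positions are integers, feasibility is abstracted as a hypothetical
predicate `is_feasible`, and the function name `edge_positions` is not from the thesis. The
function returns the positions of Ai that receive an edge from e: the position u found by
scanning [Bl, Bu] from the bound nearest to yi, plus every feasible position in the open
interval between that bound and the reflection 2yi − u.

```python
from typing import Callable, List, Optional


def edge_positions(y: int, b_lo: int, b_hi: int,
                   is_feasible: Callable[[int], bool]) -> Optional[List[int]]:
    """Positions of A_i that get an edge from e when y lies outside [b_lo, b_hi].

    Returns None when neither case applies or no u is found, meaning the
    caller falls back to the original edge construction.
    """
    if y >= b_hi:
        scan = range(b_hi, b_lo - 1, -1)   # Case 2: scan downward from Bu
    elif y <= b_lo:
        scan = range(b_lo, b_hi + 1)       # Case 3: scan upward from Bl
    else:
        return None
    u = next((t for t in scan if is_feasible(t)), None)
    if u is None:
        return None
    mirror = 2 * y - u                     # reflection of u about y
    if y >= b_hi:
        beyond = range(b_hi + 1, mirror)   # open interval (Bu, 2y - u)
    else:
        beyond = range(mirror + 1, b_lo)   # open interval (2y - u, Bl)
    return [u] + [t for t in beyond if is_feasible(t)]


# Hypothetical data shaped like Figure 7.8.
case2 = edge_positions(10, 4, 7, lambda t: t in {6, 10, 20})  # u = 6, plus y itself
case3 = edge_positions(2, 5, 9, lambda t: t in {0, 3, 6})     # u = 6, plus 0 and 3
```

With these inputs, `case2` contains two positions and `case3` contains three, matching the
edge counts stated for Cases 2 and 3 in Figure 7.8.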
7.6 Experimental Results
Our algorithms were implemented in C++ on a PC (733 MHz) with 128 MB of memory. We
tested the CVE algorithms on the four test files listed in Table 7.1. These circuits were obtained
from industry files. For all of the test circuits, the allowable deviation of each signal wire
segment is bounded by 2% of the height of the ECO region area.
Table 7.1 Test files of CVE problem.
File                           N3                   S6                   M8                    F10
ECO Region Area (um2)          (4908.92, 3295.52)   (3295.52, 4908.92)   (10872.90, 4799.54)   (4799.54, 10872.90)
Signal Segments                1601                 2098                 1266                  726
Power Rail Segments            166                  1128                 631                   747
Sensitive Segments             1439                 1868                 1085                  683
Crosstalk Violation Segments   406                  296                  177                   227
Allowable Deviation            2%                   2%                   2%                    2%
Node Clustering for CVE        10                   9                    30                    60
After applying the FCVE algorithm, we can find clean routing solutions for all four files, and
the max deviations (at most 0.85%) are much smaller than the given 2% bound. Then, based on the
output of the FCVE algorithm, we use SCVE to further improve the total deviation. The test
results in Table 7.2 show that SCVE can greatly reduce the total deviation. For example, the
total deviation is reduced to less than 5% of the original total deviation for both N3 and S6.
Table 7.2 Test results of CVE problem.
File                            N3        S6       M8         F10
Max Deviation                   0.12%     0.01%    0.28%      0.85%
Crosstalk Violation Segments    0         0        0          0
Time (second), FCVE             3         2        2          1
Time (second), SCVE             5         1        9          42
Total Deviation (um), FCVE      4533.54   768.15   11713.80   51930.80
Total Deviation (um), SCVE      215.32    23.28    633.90     6820.99
SCVE/FCVE                       4.75%     3.03%    5.41%      13.13%
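The percentages in the last row of Table 7.2 can be reproduced by dividing the SCVE total
deviation by the FCVE total deviation for each file. The short check below uses only the
values printed in the table; the variable names are illustrative.

```python
# Total deviation (um) after each stage, copied from Table 7.2.
fcve = {"N3": 4533.54, "S6": 768.15, "M8": 11713.80, "F10": 51930.80}
scve = {"N3": 215.32, "S6": 23.28, "M8": 633.90, "F10": 6820.99}

# Remaining deviation after SCVE, as a percentage of the FCVE deviation.
remaining_pct = {name: 100.0 * scve[name] / fcve[name] for name in fcve}

for name, pct in remaining_pct.items():
    print(f"{name}: {pct:.2f}%")
```

The ratio stays below 5% for N3 and S6, which is the reduction quoted in the text.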
Table 7.3 Optimization for test file N3.
Node Clustering   Total Deviation   Time (second)
                                    NEO     EO
2                 160.23            251     149
4                 176.56            49      32
6                 198.77            22      14
8                 209.17            12      8
10                215.32            8       5
Moreover, we tested the optimization strategies on the test file N3. Table 7.3 shows the
test results for different granularities of node clustering. When more nodes are clustered into a
“supernode”, the running time is much shorter, although the total deviation is somewhat larger:
going from clusters of 2 nodes to clusters of 10 nodes cuts the running time with edge omitting
from 149 to 5 seconds, while the total deviation grows from 160.23 to 215.32. The results also
show that the edge omitting strategy is very effective, shortening the running time by about
1/3. In Table 7.3, NEO means that no edge omitting is adopted, while EO refers to edge omitting.
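The "about 1/3" figure can be checked directly from Table 7.3 by computing, for each clustering
granularity, the fraction of running time saved by edge omitting. The snippet below uses only
the table values; the variable names are illustrative.

```python
# (node clustering, NEO time in seconds, EO time in seconds) from Table 7.3.
rows = [(2, 251, 149), (4, 49, 32), (6, 22, 14), (8, 12, 8), (10, 8, 5)]

# Fraction of running time saved by edge omitting at each granularity.
saving = {k: 1.0 - eo / neo for k, neo, eo in rows}

for k, s in saving.items():
    print(f"clustering {k}: {100 * s:.0f}% of the running time saved")
```

The saving ranges from roughly one third to about 40% across the five settings.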
7.7 Conclusion
In this chapter, we present a two-stage algorithm to solve the CVE (Crosstalk Violation
Elimination) problem. The first stage processes signal wire segments one by one and tries
to find a clean routing solution. Then efforts are made in the second stage to minimize
the total deviation. Furthermore, to handle very large problem instances, we propose
optimization strategies (node clustering and edge omitting) that speed up the execution.
Experimental results demonstrate the efficiency and effectiveness of our approach.
CHAPTER 8
CONCLUSION
8.1 Summary
Physical design plays an important role in connecting front-end design and back-end design
in chip development. In this thesis, we present various algorithms for problems in VLSI
physical design.
In Chapter 2, we propose bus-driven floorplanning that considers floorplanning and bus
planning simultaneously. An efficient evaluation algorithm is developed to transform a se-
quence pair representation to a BDF solution which is a placement of all circuit blocks such
that each bus can be realized as a rectangular strip (horizontal or vertical) going through all
the blocks connected by the bus.
In Chapter 3, we address the wire planning problem with bounded over-the-block
constraints. We present two exact polynomial-time algorithms. Both algorithms are guaranteed to
find an optimal routing solution for a two-pin net as long as one exists. One requires less
memory, while the other is faster when processing a large number of nets.
In Chapter 4, we present the first polynomial-time optimal algorithm for simultaneous
pin assignment and routing on multiple layers for all two-pin nets between a source block and
all other blocks. Our algorithm is applicable to both global routing and detailed routing
with arbitrary routing obstacles on multiple layers, and guarantees a pin-assignment and
routing solution with minimum total cost α · W + β · V, where W is the total wire length
and V is the number of vias. This algorithm matches well with ECO situations and can be
used to improve any routing solution.
In Chapter 5, we propose a polynomial-time algorithm for simultaneous pin assignment
and buffer planning for all 2-pin nets between a source macro block and all other blocks
such that each net satisfies the lower and upper bounds on connection intervals while the
total cost α · W + β · R is minimized, where W is the total wire length and R is the
number of buffers. Applied iteratively (i.e., each time selecting one block as the source
block), it provides a polynomial-time algorithm for pin assignment and buffer planning for
nets among multiple macro blocks.
In Chapter 6, we present two polynomial-time algorithms that resolve the overlaps
between power rails and signal wires on the top layer while satisfying other constraints.
Both algorithms are guaranteed to find a feasible solution as long as one exists. One is faster,
while the other makes an effort to reduce the total deviation. Users can choose the appropriate
one according to their application requirements.
In Chapter 7, we propose a two-stage algorithm to solve the CVE problem. The target is
to find a new routing solution without crosstalk violations under certain constraints which
help to keep the new design close to the original one. Furthermore, in order to handle very
large problems, we propose efficient optimization strategies to speed up the execution. This
algorithm can also be used to eliminate crosstalk violations in the output of the above ECO
wire legalization problem.
8.2 Future Research
We now discuss some possible directions for future research.
Advances in fabrication technology allow exponential growth in the number of transistors
integrated on a die while keeping the increase in manufacturing cost modest.
However, design cost does not follow the same flat trend as manufacturing cost.
Proliferation, the process of making a new design by modifying an existing
product (for example, a product upgrade for a speed push), saves the effort of designing a
totally new chip and produces the main revenue from the initial design. For high-end and
high-volume products, one good option for further improving chip performance is to add extra
metal layers to an existing design after all easy and quick circuit fixes and process
tricks have been applied. Since the decision window is very limited, utilizing
the newly added metal layer is a challenging task for design re-optimization. Open questions
include how to migrate wire segments to the new layer so that the coupling capacitance is
minimized, and how to adjust segment positions when the two layers have layer-dependent
design spacing rules.
Another exciting new direction of research is non-Manhattan design, which is prompted
by the X architecture. Compared with Manhattan design, the wire length can
be reduced by more than 20%, the number of vias by more than 30%, and the chip size
by 10% [48]. However, to take full advantage of the X architecture, it is necessary
to develop X-aware floorplanning and placement as well as mixed Manhattan-diagonal routing.
Research in this direction should be very valuable.
REFERENCES
[1] S. M. Sait and H. Youssef, VLSI Physical Design Automation - Theory and Practice,
McGraw-Hill Book Company, 1995.
[2] F. Y. Young, C. N. Chu and M. L. Ho, “A unified method to handle different kinds
of placement constraints in floorplan design,” in Proceedings of the 15th International
Conference on VLSI Design, 2002, pp. 661-667.
[3] X. Tang and D. F. Wong, “Floorplanning with alignment and performance constraints,”
in Proceedings of ACM/IEEE Design Automation Conference, 2002, pp. 848-853.
[4] R. Liu, X. Hong, S. Dong, Y. Cai, and J. Gu, “VLSI/PCB placement with predefined
coordinate alignment constraint based on sequence pair,” in Proceedings of the 4th In-
ternational Conference on ASIC, 2001, pp. 167-170.
[5] F. Rafiq, M. Chrzanowska-Jeske, H. H. Yang, and N. Sherwani, “Bus-based integrated
floorplanning,” in IEEE International Symposium on Circuits and Systems, 2002, pp.
875-878.
[6] F. Rafiq, M. Chrzanowska-Jeske, H. H. Yang, and N. Sherwani, “Integrated floorplan-
ning with buffer/channel insertion for bus-based microprocessor designs,” in Proceed-
ings of International Symposium on Physical Design, 2002, pp. 56-61.
[7] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, “VLSI module placement based
on rectangle-packing by the sequence-pair,” IEEE Trans. on Computer-Aided Design,
vol. 15:12, pp. 1518-1524, 1996.
[8] X. Tang and D. F. Wong, “FAST-SP: a fast algorithm for block placement based on
sequence pair,” in ASP-DAC, 2001, pp. 521-526.
[9] X. Tang, R. Tian, and D. F. Wong, “Fast evaluation of sequence pair in block placement
by longest common subsequence computation,” in DATE-00, 2000, pp. 106-111.
[10] L. D. Huang, M. H. Lai, D. F. Wong, and Y. X. Gao, “Maze routing with buffer inser-
tion under transition time constraints,” IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, Vol. 22-1, pp. 91-95, Jan 2003.
[11] J. Lillis, C. K. Cheng, and T. T. Lin, “Optimal and efficient buffer insertion and wire
sizing,” in CICC, 1995, pp. 259-262.
[12] L. P. P. P. van Ginneken, “Buffer placement in distributed RC-tree networks for minimal
Elmore delay,” in ISCAS, 1990, pp. 865-868.
[13] H. Zhou, D. F. Wong, I-M. Liu, and A. Aziz, “Simultaneous routing and buffer in-
sertion with restrictions on buffer locations,” IEEE Transaction on Computer-Aided De-
sign, 2000.
[14] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows, Prentice Hall, 1993.
[15] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, The MIT
Press, 1992.
[16] H. N. Brady, “An approach to topological pin assignment,” IEEE Transaction on
Computer-Aided Design, vol. CAD-3, pp. 250-255, 1984.
[17] N. L. Koren, “Pin assignment in automated printed circuit board design,” in Proceed-
ings of ACM/IEEE Design Automation Conference, 1972, pp. 72-79.
[18] L. Mory-Rauch, “Pin assignment on a printed circuit board,” in Proceedings of
ACM/IEEE Design Automation Conference, 1978, pp. 70-73.
[19] X. Yao, M. Yamada, and C. L. Liu, “A new approach to the pin assignment problem,”
in Proceedings of ACM/IEEE Design Automation Conference, 1988, pp. 566-572.
[20] S. G. Choi and C. M. Kyung, “Three-step pin assignment algorithm for building block
layout,” Electron. Lett., vol. 28, no. 20, pp. 1882-1884, 1992.
[21] J. Cong, “Pin assignment with global routing for general cell designs,” IEEE Trans.
on Computer-Aided Design, vol. 10, pp. 1401-1412, 1991.
[22] T. Koide, S. Wakabayashi, and N. Yoshida, “An integrated approach to pin assign-
ment and global routing for VLSI building-block layout,” in Proceedings of European
Conference on Design Automation with the European Event in ASIC Design, 1993, pp.
24-28.
[23] L. E. Liu and C. Sechen, “Multilayer pin assignment for macro cell circuits,” IEEE
Transaction on Computer-Aided Design, vol. 18, pp. 1452-1461, 1999.
[24] L. Y. Wang, Y. T. Lai, and B. D. Liu, “Simultaneous pin assignment and global wiring
for custom VLSI design,” in Proceedings of IEEE International Symposium on Circuits
and Systems, 1991, vol. 4, pp. 2128-2131.
[25] C. Albrecht, “Provably good global routing by a new approximation algorithm for
multicommodity flow,” in ACM International Symposium on Physical Design, 2000, pp.
19-25.
[26] C. Albrecht, A. B. Kahng, I. Mandoiu, and A. Zelikovsky, “Floorplan evaluation with
timing-driven global wireplanning, pin assignment, and buffer/wire sizing,” in IEEE
ASP-DAC, 2002, pp. 580-587.
[27] R. C. Carden and C. K. Cheng, “A global router using an efficient approximate mul-
ticommodity multiterminal flow algorithm,” in Proceedings of ACM/IEEE Design Au-
tomation Conference, 1991, pp. 316-321.
[28] J. D. Cho and M. Sarrafzadeh, “Four-bend top-down global routing,” IEEE Transac-
tion on Computer-Aided Design, vol. 17, pp. 793-802, 1998.
[29] J. Huang, X. L. Hong, C. K. Cheng, and E. S. Kuh, “An efficient timing-driven
global routing algorithm,” in Proceedings of ACM/IEEE Design Automation Conference,
1993, pp. 596-600.
[30] G. Meixner and U. Lauther, “A new global router based on a flow model and linear as-
signment,” in Proceedings of IEEE/ACM International Conference on Computer-Aided
Design, 1990, pp. 44-47.
[31] R. K. Ahuja, A. V. Goldberg, J. B. Orlin, and R. E. Tarjan, “Finding minimum-cost
flows by double scaling,” Mathematical Programming, pp. 243-266, 1992.
[32] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley,
1990.
[33] R. Otten, “Global wires harmful?” in Proceedings of International Symposium on
Physical Design, 1998, pp. 104-109.
[34] J. Cong, “Challenges and opportunities for design innovations in nanometer technolo-
gies,” SRC Working Papers, Dec 1997, http://www.src.org/prg mgmt/frontier.dgw.
[35] J. Cong, T. Kong, and D. Z. Pan, “Buffer block planning for interconnect driven
floorplanning,” in Proceedings of IEEE/ACM International Conference on Computer-
Aided Design, 1999, pp. 358-363.
[36] X. Tang and D. F. Wong, “Planning buffer locations by network flows,” in Proceedings
of International Symposium on Physical Design, 2000, pp. 180-185.
[37] P. Sarkar, V. Sundararaman, and C. K. Koh, “Routability-Driven repeater block plan-
ning for interconnect-centric floorplanning,” in Proceedings of International Symposium
on Physical Design, 2000.
[38] F. F. Dragan, A. B. Kahng, I. I. Mandoiu, S. Muddu, and A. Zelikovsky, “Prov-
ably good global buffering using an available buffer block plan,” in Proceedings of
IEEE/ACM International Conference on Computer-Aided Design, 2000, pp. 104-109.
[39] F. F. Dragan, A. B. Kahng, I. I. Mandoiu, S. Muddu, and A. Zelikovsky, “Provably
good global buffering by multiterminal multicommodity flow approximation,” in Proc.
ASP-DAC, 2001, pp. 120-125.
[40] B. Preas and M. Lorenzetti, Physical design automation of VLSI systems, Benjam-
in/Cummings, 1988.
[41] T. E. Dillinger, VLSI Engineering, Prentice Hall, 1988.
[42] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley,
1993.
[43] F. Mo, A. Tabbara, and R. K. Brayton, “A force-directed macro-cell placer,” in Pro-
ceedings of IEEE/ACM International Conference on Computer-Aided Design, 2000.
[44] S. Nag and K. Chaudhary, “Post-placement residual-overlap removal with minimal
movement,” in Proceedings of the Design, Automation and Test in Europe Conference
and Exhibition, 1999.
[45] N. Quinn and M. A. Breuer, “A forced directed component placement procedure for
printed circuit boards,” IEEE Transaction on Circuits and Systems, 1979.
[46] H. Zhou and D. F. Wong, “Global routing with crosstalk constraints,” in Proceedings
of ACM/IEEE Design Automation Conference, 1998.
[47] T. Sakurai and K. Tamaru, “Simple formulas for two and three dimensional capaci-
tance,” IEEE Transaction on Electron Devices, 1993.
[48] S. L. Teig, “The X architecture: not your father’s diagonal wiring,” in Proceedings
of the 2002 International Workshop on System-level Interconnect Prediction, 2002, pp.
33-37.
APPENDIX
PUBLICATIONS
Journal Papers
1. H. Xiang, X. Tang, and D. F. Wong, “Bus-driven floorplanning,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 2004.
2. L. Huang, X. Tang, H. Xiang, D. F. Wong, and I. Liu, “A polynomial time-optimal
diode insertion/routing algorithm for fixing antenna problem,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, No. 1, pp.
141-147, January 2004.
3. H. Xiang, X. Tang, and D. F. Wong, “Min-cost flow based algorithm for simultane-
ous pin assignment and routing,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 22, No. 7, pp. 870-878, July 2003.
Conference Papers
1. H. Xiang, K. Chao, and D. F. Wong, “An ECO algorithm for eliminating crosstalk vi-
olations,” in ACM/SIGDA 2004 International Symposium on Physical Design, 2004.
2. H. Xiang, X. Tang, and D. F. Wong, “Bus-driven floorplanning,” in IEEE/ACM In-
ternational Conference on Computer Aided Design, 2003.
3. S. Lee, H. Xiang, D. F. Wong, and R. Sun, “Wire type assignment for FPGA routing,”
in ACM International Symposium on Field-Programmable Gate Arrays, 2003.
4. H. Xiang, K. Chao, and D. F. Wong, “ECO algorithms for removing overlaps between
power rails and signal wires,” in IEEE/ACM International Conference on Computer
Aided Design, 2002, pp. 67-74.
5. H. Xiang, X. Tang, and D. F. Wong, “An algorithm for integrated pin assignment and
buffer planning,” in ACM/IEEE Design Automation Conference, New Orleans, 2002.
6. L. D. Huang, X. Tang, H. Xiang, D. F. Wong, and I. M. Liu, “A polynomial time
optimal diode insertion/routing algorithm for fixing antenna problem,” in Design,
Automation and Test in Europe, 2002.
7. H. Xiang, X. Tang, and D. F. Wong, “An algorithm for simultaneous pin assignment
and routing,” in IEEE/ACM International Conference on Computer Aided Design,
2001, pp. 232-238.
8. X. Tang, R. Tian, D. F. Wong, and H. Xiang, “A new algorithm for routing tree
construction with buffer insertion and wire sizing under obstacle constraints,” in
IEEE/ACM International Conference on Computer Aided Design, San Jose, 2001,
pp. 49-56.
9. H. Xiang, X. Tang, and D. F. Wong, “Simultaneous pin assignment and routing,”
in The Tenth Workshop on Synthesis and System Integration of Mixed Technologies,
2001.
10. L.D. Huang, X. Tang, H. Xiang, D. F. Wong, and I. M. Liu, “An exact diode in-
sertion/routing algorithm for fixing antenna problem,” in The Tenth Workshop on
Synthesis And System Integration of Mixed Technologies, 2001.
VITA
Hua Xiang was born in Shanghai, P.R. China, on December 26, 1973, the daughter of
Xinmin Xiang and Xueying Xu. She received the B.S. and M.S. degrees from the Computer
Science and Technology Department of Peking University, China, in 1997 and 2000,
respectively. She entered the Department of Computer Sciences at the University of Texas at
Austin in 2000 with an MCD Fellowship. In 2002, she transferred to the Department of Computer
Science at the University of Illinois at Urbana-Champaign. She joined Cadence Design
Systems Inc. in San Jose, CA, in January 2004.
She received many awards and honors during her undergraduate and graduate studies.
She published 3 journal papers and 10 conference papers in VLSI physical design during
her Ph.D. study.
