Floorplan Design and Yield Enhancement of 3-D Integrated Circuits by Nain, Rajeev Kumar
Portland State University
PDXScholar
Dissertations and Theses
2011
Part of the Computer Sciences Commons, and the Electrical and Computer Engineering
Commons
This Dissertation is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized
administrator of PDXScholar. For more information, please contact pdxscholar@pdx.edu.
Recommended Citation
Nain, Rajeev Kumar, "Floorplan Design and Yield Enhancement of 3-D Integrated Circuits" (2011). Dissertations and Theses. Paper
2810.
10.15760/etd.2804
ABSTRACT 
An abstract of the dissertation of Rajeev Kumar Nain for the Doctor of Philosophy in 
Electrical and Computer Engineering presented January 06, 2011. 
Title: Floorplan Design and Yield Enhancement of 3-D Integrated Circuits 
The semiconductor industry has witnessed aggressive scaling of transistors following 
Moore's law, and has harnessed its benefits in terms of speed, density, and die size in the 
past several decades. At present, the transistor count has crossed one billion per chip, and 
transistor delay has been reduced to the picosecond range. However, the aggressive scaling has 
slowed down in deep submicron technology because of several challenges in VLSI design 
and manufacturability. Due to increasing power, performance, cost of fabrication, 
challenges in lithography, and other financial bottlenecks beyond 28nm, the industry has 
begun to look for alternative solutions. This has led to the current focus of the industry on 
stacked three-dimensional (3-D) ICs. Three-dimensional integrated circuits, in which 
multiple device layers are stacked vertically, are an alternative solution to 
interconnect-related problems. One of the main objectives of 3-D ICs is to replace longer interconnects 
by shorter wires and vias. Thus it reduces total wirelength, signal delay, buffer count, and 
power consumption. In addition, 3-D ICs are more suitable for system-on-chip (SOC) 
design, in which heterogeneous technologies can be fabricated independently in different 
device layers prior to 3-D stacking. Thus different families of circuits such as logic, 
processor, memory, analog/RF circuits, sensors, optical I/Os, etc. can be integrated in the 
3-D stack. However, there are several challenges in the physical design of 3-D ICs such as 
optimal partitioning, floorplanning, placement and routing, yield, and reliability that need to 
be addressed before the mainstream acceptance of 3-D ICs by the electronics industry. 
Therefore several CAD tools and intelligent design methodologies are needed to evaluate 
and realize a 3-D design. Consequently, this work focuses on two sub-problems of 3-D IC 
design: (a) three-dimensional floorplanning, and (b) yield enhancement of 3-D ICs. 
We have developed a placement-aware 3-D floorplanning algorithm that enables additional 
wirelength reduction by planning for 3-D placement of logic gates in selected circuit 
modules during the floorplanning stage. Thus it also bridges the existing gap between 3-D 
floorplanning and 3-D placement. To reduce the solution space of 3-D floorplanning, 
which is known to be an NP-hard problem, we derive a set of feasibility conditions on the 
topological representation of a floorplan. In addition, we have designed a fast module 
packing algorithm that satisfies a set of constraints for placement-aware 3-D floorplanning. 
Furthermore, we have designed an efficient evolutionary algorithm that is used in the 
proposed 3-D floorplanning algorithm for multi-objective combinatorial optimization. Our 
results show that the proposed placement-aware 3-D floorplanning algorithm is very fast, 
and it reduces the system level total wirelength by 9.8% compared to existing state-of-the-
art floorplanning tools that do not plan for 3-D placement of floorplanning modules. 
Our module packing algorithm and previously derived feasibility conditions are used to 
design another 3-D floorplanning algorithm with vertical module alignment (3-D FMA) 
that is applicable in bus-driven 3-D floorplan design and heterogeneous 3-D IC 
design. Our results show that 3-D FMA can generate good quality floorplans while 
satisfying a user defined set of constraints such as vertical alignment of modules, layer 
assignment of modules, and module repulsion constraints. It also scales well with increasing 
problem size. 
Next, we identify the functional yield problem due to failure of vertical interconnects 
known as Through-Silicon Vias (TSVs) and propose yield improvement techniques based 
on via redundancy. Monte Carlo simulation results show that with the proposed 
redundancy structures, high yield can be achieved for a large number of TSVs and a wide 
range of defect rates. We then present a stochastic methodology to estimate parametric 
yield in terms of the number of fast/slow chips in a bin. 
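As a rough illustration of the kind of Monte Carlo experiment described above, the following sketch estimates functional yield under a deliberately simplified model. It assumes independent TSV failures with a fixed defect rate and plain parallel spares for each signal. The function name and parameters are hypothetical, and the redundancy structures proposed in this dissertation are not reproduced here.

```python
import random


def simulate_functional_yield(num_signals, defect_rate, spares_per_signal,
                              num_chips, seed=0):
    """Estimate the fraction of chips in which every signal keeps a working via.

    Assumed model (not the dissertation's lattice structures): each signal
    has one primary via plus `spares_per_signal` parallel spares, and every
    via fails independently with probability `defect_rate`.
    """
    rng = random.Random(seed)
    vias_per_signal = 1 + spares_per_signal
    good_chips = 0
    for _ in range(num_chips):
        chip_ok = True
        for _ in range(num_signals):
            # A signal is lost only if the primary via and all spares fail.
            if all(rng.random() < defect_rate for _ in range(vias_per_signal)):
                chip_ok = False
                break
        good_chips += chip_ok
    return good_chips / num_chips


if __name__ == "__main__":
    for spares in (0, 1, 2):
        y = simulate_functional_yield(1000, 0.01, spares, num_chips=1000)
        print(f"spares={spares}: estimated yield={y:.3f}")
```

Under this simplified model, the estimate can be checked against the closed form (1 - p^(r+1))^N for N signals, defect rate p, and r spares per signal.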
Finally, we present a set of redundant via dependent analytical yield models for functional 
as well as parametric yield. The results obtained by the analytical models match closely with 
Monte Carlo simulation results. Thus they eliminate the need for computationally expensive 
Monte Carlo simulations. These analytical models can be used in fast estimation of yield, 
and can be used in yield-aware physical design such as 3-D floorplanning and P&R. We 
further derive an analytical model for a sweet spot between the numbers of fast/slow chips 
obtained using our proposed solutions, and present an analytical model for the estimation 
of total chip revenue. The total chip revenue model takes the prices of fast and slow chips 
as input, and for a given TSV defect rate and our redundancy configuration, it estimates the 
total number of fast/slow chips in a bin for the total chip revenue estimation. 
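To make the revenue calculation concrete, a minimal sketch follows. It assumes independent via failures, plain parallel spares, and a hypothetical binning rule in which a chip is "fast" when no primary via fails and "slow" when it works only through a spare. This rule, the closed-form expressions, and all names are illustrative assumptions, not the analytical models derived in the dissertation.

```python
def bin_fractions(num_signals, defect_rate, spares_per_signal):
    """Return (fast, slow) fractions of chips under the assumed binning rule."""
    # Assumption: a chip is fast if every primary via works.
    fast = (1.0 - defect_rate) ** num_signals
    # Assumption: a chip is functional unless some signal loses its primary
    # via and all of its spares.
    functional = (1.0 - defect_rate ** (1 + spares_per_signal)) ** num_signals
    return fast, functional - fast


def total_revenue(chips_in_bin, fast, slow, price_fast, price_slow):
    """Expected revenue of a bin given the fast/slow fractions and prices."""
    return chips_in_bin * (fast * price_fast + slow * price_slow)


if __name__ == "__main__":
    fast, slow = bin_fractions(num_signals=100, defect_rate=0.01,
                               spares_per_signal=1)
    revenue = total_revenue(1000, fast, slow, price_fast=5.0, price_slow=3.0)
    print(f"fast={fast:.3f} slow={slow:.3f} revenue={revenue:.1f}")
```

The sketch only shows how the inputs named in the abstract (chip prices, TSV defect rate, redundancy configuration) combine into a single revenue figure.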
DEDICATION 
This dissertation is dedicated to my parents. 
ACKNOWLEDGMENTS 
I would like to thank my advisor Malgorzata Chrzanowska-Jeske, whose encouragement, 
supervision and support from the preliminary to concluding level enabled me to pursue the 
research. I sincerely thank all the members of my dissertation committee for their 
participation during various stages of my academic progress. My special thanks to Prof. 
Robert Daasch for providing useful feedback on 3-D IC yield work during my seminar 
presentation. Furthermore, I wish to thank my colleague Rehman Ashraf for the insightful 
discussions during the research work related to 3-D IC yield. My special thanks to Darcy 
Kennedy and Melissa Sutherland for their help during proofreading of my research articles, 
and this dissertation. 
Finally I would like to thank my family for their endless support and love. Without their 
support, the completion of this work would not have been possible. 
Floorplan Design and Yield Enhancement of 3-D Integrated Circuits 
by 
Rajeev Kumar Nain 
A dissertation submitted in partial fulfillment of the 
requirements for the degree of 
Doctor of Philosophy 
in 
Electrical and Computer Engineering 
Dissertation Committee: 
Malgorzata Chrzanowska-Jeske, Chair 
Dan W. Hammerstrom 
W. Robert Daasch 
Paul Van Halen 
Fei Xie 
Portland State University 
©2011 
TABLE OF CONTENTS 
Acknowledgments
List of Tables
List of Figures
Glossary
Chapter 1: Introduction
1.1 Major Challenges of 2-D Integrated Circuits
1.2 Three-Dimensional Integrated Circuits
1.3 3-D IC Technologies
1.4 Through-Silicon Via Technologies
1.5 Major Challenges in 3-D IC Design
1.5.1 CAD Tools for 3-D ICs
1.5.2 Heat Extraction and Thermal Management
1.5.3 Power and Ground Delivery in 3-D ICs
1.5.4 Clock Tree Synthesis for 3-D ICs
1.5.5 TSV Management in 3-D ICs
1.5.6 TSV-Induced Design for Manufacturability and 3-D IC Yield
1.6 Current Status of 3-D Integrated Circuits
1.7 Contributions of the Dissertation
1.7.1 CAD Tool Design for Native 3-D Floorplanning
1.7.2 Yield Improvement of 3-D ICs in the Presence of TSV failure
1.8 Structure of the Dissertation
Chapter 2: Floorplanning of 2-D and 3-D Integrated Circuits
2.1 Shapes of Circuit Modules in a Floorplan
2.2 Classification of Floorplanning
2.3 Complexity of the Floorplanning Problem
2.4 2-D Floorplan Representations
2.5 3-D Floorplan Representations
2.6 Previous Work on 3-D Floorplanning
2.7 A Basic 3-D Floorplanning Tool
Chapter 3: Vertical Constraints in Sequence Pair Representation
3.1 Introduction to Sequence Pair
3.2 Vertical Constraints on Sequence Pairs
3.3 Two Layer Feasibility Condition
3.4 Graph Representations of Feasibility Conditions
3.5 3DCG: A Module Packing Algorithm with Vertical Constraints
3.6 LCSLS: A Fast Module Packing Algorithm with Vertical Constraints
Chapter 4: Placement-Aware 3-D Floorplanning
4.1 Motivation
4.2 Placement-Aware Constraints
4.3 Problem Formulation
4.4 Stochastic Combinatorial Optimization
4.5 Cost Estimation of wirelength reduction due to module splitting inside 3-D Modules
4.6 Perturbation of Solution Space
4.7 Cost Function
4.8 Design Flow of the Placement-Aware 3-D Floorplanning Algorithm
4.9 Experimental Results
4.9.1 Experimental Setup
4.9.2 Comparison of 3-D Packing Algorithms with Vertical Constraints
4.9.3 Comparison of Area and Wirelength Minimization without Vertical Constraints
4.9.4 Impact of Placement-Aware 3-D Floorplanning with Vertical Constraints on System Level Wirelength
4.9.5 Effect of Feasibility Conditions on the Solution Quality of 3-D FVC
Chapter 5: 3-D Floorplanning with Module Alignment
5.1 Introduction and Motivation
5.2 Problem Formulation
5.3 Combinatorial Optimization and the Cost Function
5.4 Perturbation of the Solution Space
5.5 Experimental Results
5.5.1 Effect of Module Alignment on MCNC Benchmarks
5.5.2 Effect of Increasing the Number of Module Alignment Constraints on 3-D Floorplanning using GSRC Benchmarks
5.5.3 Composite Effect of Increasing the Number of Various Constraints on 3-D Floorplanning using GSRC Benchmarks
5.5.4 Runtime Comparison of 3-D FMA with LTCG based 3-D Floorplanning Algorithm
5.5.5 An Example of a 4-Layer 3-D Floorplan with Various Constraints
Chapter 6: TSV-Induced 3-D IC Yield
6.1 TSV Fabrication Technologies
6.2 Alternative Via Technologies: Wireless Vias
6.3 Carbon Nanotube based Inductors for Wireless Vias
6.3 Yield as a Function of TSV Failure
Chapter 7: Via Redundancy for 3-D IC Yield Improvement
7.1 Redundancy Lattices
7.2 Redundancy Evaluation Factors
7.3 Redundancy Configuration in a Device Layer
7.3.1 Wireless Via Redundancy Configuration
7.3.2 Physical Via Redundancy Configuration
7.4 Yield Estimation by Monte Carlo Simulation
7.5 Modeling Area, Delay and Power of Redundant Vias
7.5.1 Area Tradeoff
7.5.2 Delay Tradeoff
7.5.3 Power Tradeoff
7.6 Effect of Redundancy on Parametric Yield
7.6.1 Estimation of the Total Number of Global Wires
7.6.2 Estimation of the Total Number of Fast Chips
Chapter 8: Redundant Via Dependent Analytical Yield Models
8.1 Nomenclature
8.2 Analytical Model for Quad Wireless Plus (QWP) Configuration
8.3 Analytical Model for Octal Wireless Plus (OWP) Configuration
8.4 Analytical Model for Octal TSV Complete (OTC) Configuration
8.5 Analytical Model for Fast/Slow Chips
8.6 Analytical Model for Chip Revenue
8.7 Extension of Analytical Yield Models of Two-Layer 3-D Chips to Multi-Layer 3-D Chips
8.8 Application of Yield Improvement Strategies during Floorplanning
Chapter 9: Conclusions and Future Work
Chapter 10: Summary of Major Contributions
References
Appendix A: Mathematical Proofs of the Feasibility Condition Theorems
Appendix B: Stochastic Rectangular 3-D Wirelength Distribution Models
B.1 Introduction
B.2 General Background
B.3 Rectangular 2-D Wirelength Distribution Model
B.4 Extension to Rectangular 3-D Wirelength Distribution Models
B.5 Experimental Results
B.5.1 Effect of Aspect Ratio on Wirelength Distribution and Total Wirelength
B.5.2 Comparison Among Rectangular 3-D Wirelength Distribution Models
Appendix C: Rent's Parameter Extraction
C.1 Extraction of Rent's Parameters from The Netlist of A Circuit Block
C.2 Rent Parameter For Modules of Floorplan Benchmarks
Appendix D: Sensitivity Analysis of the Cost Function of Floorplanning Algorithms
D.1 Sensitivity Analysis of the Cost Function of 3-D PVC Algorithm Without Module Splitting
D.2 Sensitivity Analysis of the Cost Function of 3-D PVC Algorithm With Module Splitting
D.3 Sensitivity Analysis of the Cost Function of 3-D FMA
LIST OF TABLES 
TABLE 4.1: COMPARISON OF TOTAL MEASURED WIRELENGTH (226827.0 GATE PITCHES) 
OF 18 IFU CIRCUITS OF IBM POWER4 WITH THEIR ESTIMATED WIRELENGTH 
USING DIFFERENT WIRELENGTH DISTRIBUTION MODELS
TABLE 4.2: RUNTIME COMPARISON OF 3-D PACKING WITH VERTICAL CONSTRAINTS
TABLE 4.3: COMPARISON OF FOOTPRINT AREA, WIRELENGTH AND VIA COUNT 
OPTIMIZATION WITHOUT VERTICAL CONSTRAINTS
TABLE 4.4: RUNTIME COMPARISON WITHOUT VERTICAL ALIGNMENT
TABLE 4.5: EFFECT OF 3-D PVC WITH MODULE SPLITTING ON THE SYSTEM LEVEL 
TOTAL WIRELENGTH ON A 4-LAYER 3-D FLOORPLAN
TABLE 4.6: EFFECT OF FEASIBILITY CONDITIONS ON THE SOLUTION QUALITY OF 3-D 
FVC
TABLE 5.1: EFFECT OF DIFFERENT PLACEMENT CONSTRAINTS ON A 4-LAYER 3-D 
FLOORPLAN USING MCNC BENCHMARKS
TABLE 5.2: EFFECT OF INCREASING THE NUMBER OF MODULE ALIGNMENT 
CONSTRAINTS ON A 4-LAYER 3-D FLOORPLAN USING GSRC BENCHMARKS
TABLE 5.3: COMPOSITE EFFECT OF INCREASING THE NUMBER OF DIFFERENT 
CONSTRAINTS ON A 4-LAYER 3-D FLOORPLAN USING GSRC BENCHMARKS
TABLE 5.4: RUNTIME COMPARISON OF 3-D FMA WITH LTCG BASED 3-D 
FLOORPLANNER
TABLE 7.1: NUMBER OF TSVs REQUIRED TO BE CONNECTED IN PARALLEL WITH EACH 
PRIMARY TSV IN 3-D CHIPS TO OBTAIN 90% FUNCTIONAL YIELD
TABLE 7.2: MONTE CARLO YIELD RESULTS FOR REDUNDANT TSV CONFIGURATIONS
TABLE 7.3: MONTE CARLO YIELD RESULTS FOR WIRELESS VIA REDUNDANCY
TABLE 7.4: COMPARISON OF 3-D IC YIELD OBTAINED BY MONTE-CARLO SIMULATION 
WITH INCREASING VIA COUNT AND USING THE MOST PROMISING VIA 
REDUNDANCY CONFIGURATIONS
TABLE 7.5: AREA PENALTY OF REDUNDANT LATTICES IN TERMS OF THE EQUIVALENT 
AREA OF A TWO-INPUT NAND GATE
TABLE 7.6: DELAY PENALTY AND ITS SCALING WITH TECHNOLOGY NODE
TABLE 7.7: POWER PENALTY AND ITS SCALING WITH TECHNOLOGY NODE
LIST OF FIGURES 
Figure 1.1: Trend between interconnect delay and gate delay for different technology nodes 
[124]
Figure 1.2: Comparison between (a) 2-D ICs, and (b) 3-D ICs. In 2-D ICs, all blocks are 
integrated on a single silicon substrate. 3-D IC technology offers the flexibility of 
integration using heterogeneous types of substrates in a 3-D stack.
Figure 1.3: (a) Schematic of an epitaxially grown second device layer [98] (b) Schematic of 
the Ge seeded solid phase crystallization (SPC) process [1]
Figure 1.4: Cross-sectional view of different types of wafer bonding (a) face-to-face (b) 
face-to-back. [4]
Figure 1.5: Different types of stacking methods used in 3-D IC fabrication (a) Wafer-to-
Wafer, (b) Die-to-Die, and (c) Die-to-Wafer
Figure 1.6: TSVs in a stacked 3-D IC. [28]
Figure 1.7: Technology range and applications of vertical interconnects. [10]
Figure 1.8: Cross sectional view of a 3-D IC using integrated microchannel cooling 
technology. [22]
Figure 1.9: (a) High thermo-mechanical stress shown by the white rectangle at the edge of a 
TSV [38], (b) An SEM image of a visible crack due to stress in a TSV [23], and (c) 
High stress sites in a multi-layer 3-D IC [27]
Figure 2.1: (a) the structure of a T-tree, (b) A compacted placement of modules and the 
corresponding T-tree. [62]
Figure 2.2: (a) true 3-D floorplanning [125] (b) quasi 3-D floorplanning [13].
Figure 2.3: Flow-chart of a basic 3-D floorplanning tool inherited at the beginning of the 
research. The shaded boxes indicate the portions of the algorithm that have been 
modified as a part of this research work.
Figure 3.1: Example of vertical constraints in 3-D SoC (a) module alignment, bus planning, 
and each layer containing analog/RF blocks are vertically aligned together, (b) 
analog/RF blocks have been assigned to the top layer and they have been 
separated from noisy digital block.
Figure 3.2: (a) oblique grid for Γ+ = {a b c d} and Γ− = {c a d b}, (b) resultant placement 
of blocks, and (c) different packing for the same sequence pair due to change in 
sizes of modules
Figure 3.3: Weighted graphs Gh (left) and Gv (right) of the sequence pair shown in Figure 
3.2
Figure 3.4: (a) An example of a two-layer grouped sequence pair, and (b) its corresponding 
floorplan in two device layers
Figure 3.5: (a) Module pairs {A1, A2} and {B1, B2} under vertical constraints in two device 
layers are shown. (b) If A1 is moved rightward as shown by the arrow to align with 
A2, B1 moves further away from B2. Both pairs cannot be aligned simultaneously, 
and thus are infeasible.
Figure 3.6: Construction of a feasibility condition graph: (a) Given sequence pairs {Γ1+; Γ1−} 
of layer 1 and {Γ2+; Γ2−} of layer 2. Module pairs {A1, A2} and {B1, B2} are 
vertically constrained. (b) Constrained sequence pairs (c) The resultant feasibility 
condition graph
Figure 3.7: Graph representation showing six out of eight cases which are feasible 
conditions for two module pairs in two device layers
Figure 3.8: Graph representation of infeasible sequence pair constraints for two module 
pairs in two device layers
Figure 3.9: 3-D-X constraint graph creation. (a) 2-D constraint graph along +X axis in 
Layer 1 (b) 2-D constraint graph along +X axis in Layer 2 (c) Merging the two 2-D 
constraint graphs using a global source and a global sink node. To vertically align 
node 2 and node 7, two edges are inserted between node 2 and node 7 in the cyclic 
fashion (i.e. 2→7 and 7→2). All newly introduced edges are dotted and have zero 
weights.
Figure 3.10: Creation of a constrained adjacency list Adj_X for 2-device layers. (a) A = 
{A1, A2}, B = {B1, B2}, ..., E = {E1, E2} are vertically constrained in two device 
layers. (b) Adj_X obtained by merging the graphs of two layers and denoting the 
merged nodes by their group name
Figure 4.1: (a) A 3-D floorplan of 2-D modules (b) a new 3-D floorplan with sub-module 
pairs {A1, A2} and {C1, C2} transformed as 3-D modules after module splitting and 
imposing vertical constraints
Figure 4.2: Sub-module pair placed in (a) consecutive (b) alternate device layers. Sub-
modules in the consecutive device layers minimize the TSV height.
Figure 4.3: Sub-module pair with (a) the same planar location (b) different planar locations. 
Vertical constraint with the same planar location of sub-modules produces smaller 
intra-module wirelength.
Figure 4.4: Predicted vs. placed and routed wirelengths of ISPD'98 benchmark circuits. 
Wirelength of 3-D placement and routing is normalized w.r.t. 2-D design to show 
the percentage reduction due to 3-D design [75]
Figure 4.5: An example of Module split move. (a) Sub-modules {A1, B1, C1, D1}, {A2, B2, 
C2, D2} are vertically constrained in layer 1 and layer 2 under the feasibility 
configuration of a clique of size 4 and their sequence pairs are shown. Module "w" 
initially resides in layer 2 which is about to be split. (b) Resultant sequence pairs of 
layer 1 and layer 2 after splitting
Figure 4.6: Flowchart of the proposed placement-aware 3-D floorplanning using vertical 
constraints (3-D PVC). The grey shaded portions have been modified from the 
initial algorithm. The left most rectangular boxes in orange shade are the new 
methods ................................................................................................................................... 77 
Figure 4.7: Runtime of 3-D FVC (with increasing problem size) using various packing 
· algorithms with vertical constraints. LCSLS is faster than 3DCG ............................... 80 
Figure 4.8: Runtime comparison of various 3-D floorplan algorithms with increasing 
problem size. Thermal optimization and vertical alignment were disabled. All 
runtimes have been scaled linearly to 3 GHz CPU speed ............................................. 84 
Figure 4.9: Comparison of (a) inter-module wire, (b) total wirelength (c) footprint area, and 
(d) inter-module via count, obtained using different floorplanning tools. Total 
wirelength is the sum of inter and intra module wirelength. Vertical constraints are 
applied only when module splitting (MS) is activated. 3-D PVC (with MS) reduces
total wirelength by 9.8%. The bar charts represent the aggregate data of all the 
benchmarks. Please note that each chart has a different y-scale .................................. 87 
Figure 4.10: A 4-layer 3-D floorplan of ami49. Please notice that parts of module 0 (0A,
0B), 1, 2, 3, 4, 5, 7, 29, 32, 43, 44 and 47 have been placed in consecutive device
layers. The planar locations of sub-modules are the same ............................................ 89 
Figure 5.1: Example of different placement constraints in 3D floorplanning ......................... 94 
Figure 5.2: Runtime Comparison of 3-D FMA with LTCG .................................................... 110
Figure 5.3: A 4-layer 3-D floorplan of ami49 obtained using 3D-FMA. Module groups
under MA constraints are {0, 3}, {1, 2, 5, 32}, and {10, 17, 20, 48}. MR constraint
groups are {4, 33} and {11, 28}. Layer assignment constraints in Layer 1 = {3, 33,
40}, Layer 2 = {0, 7, 32}, Layer 3 = {12}, and Layer 4 = {13} ................................ 111
Figure 6.1: TSVs in 3-D ICs using (a) via-first, and (b) via-last methods. [107] ................... 112 
Figure 6.2: Concept of ACCI (a) Capacitive ACCI (b) Inductive ACCI. [82] ....................... 115 
Figure 6.3: Schematic of (a) transmitter, and (b) receiver circuits of a wireless via using 
inductive coupling [82] ........................................................................................................ 116
Figure 7.1: Proposed lattice structures for redundant via insertion in a device layer of a 3-D 
IC (a) Quad Lattice, (b) Octal Lattice, and (c) Dual Lattice. The redundant via can 
either be a wireless via or a physical TSV ....................................................................... 123 
Figure 7.2: Quad Wireless Plus (QWP) configuration. If all the four TSVs within the shaded
quad lattice fail, then it can be repaired by the neighboring wireless vias ................ 127 
Figure 7.3: Interaction of an Octal Lattice (shaded region at the center) with its 
neighborhood lattices in an Octal Wireless* Configuration ........................................ 128
Figure 7.4: Interaction of an Octal Lattice (grey shaded region at the center) with its 
neighboring lattices in an Octal Wireless Plus configuration ...................................... 128 
Figure 7.5: Interaction of a Dual (grey shaded) lattice with the neighboring lattices
(encircled by dotted lines) in a Dual TSV redundancy configuration ....................... 129
Figure 7.6: Performance Reduction due to via re-routing through MUX logic. The chip's
target frequency is 2.5 GHz (ideal case) .......................................................................... 136
Figure 7.7: A critical path spanning across two device layer using a TSV. We assume that a 
critical path crosses only once through a TSV ............................................................... 138 
Figure 7.8: Comparison between the number of fast chips obtained (a) for 4% defect rate
using Quad Wireless Plus (QWP), Octal Wireless Plus (OWP), and Octal TSV
Complete (OTC) configurations (b) for 1% and 4% defect rates by Quad Wireless
Plus (QWP) configuration ................................................................................................. 142 
Figure 8.1: Comparison of Analytical model results with Monte Carlo simulation results for 
yield obtained using (a) Quad Wireless Plus configuration, and (b) Octal Wireless 
Plus Configuration for different numbers of TSVs in 3-D ICs ................................. 148
Figure 8.2: Analytical and Monte Carlo simulation results for Fast/Slow chips for Quad
Wireless Plus configuration for (a) 1% defect rate (b) 4% defect rate ...................... 153
Figure 8.3: Decomposition of a 3-layer chip into a pair of identical two-layer 3-D ICs for 
yield calculation using superposition ............................................................................... 156 
Figure 8.4: The flow chart of a Yield-Aware 3-D Floorplanning ............................................. 159 
GLOSSARY
3DCG        3-D Constraint Graph
3-D STAF    3-D Scalable and Temperature Aware Floorplanning
ACCI        AC Coupled Interconnects
BCB         Benzocyclobutene
BSG         Bounded Sliceline Grid
BEOL        Back-end-of-line
CAD         Computer Aided Design
CBA         Combined Bucket Array
CBL         Corner Block List
CMOS        Complementary Metal Oxide Semiconductor Field Effect Transistor
CMP         Chemical Mechanical Polishing
CNT         Carbon Nano Tube
CPU         Central Processing Unit
CTE         Coefficient of thermal expansion
DAG         Directed Acyclic Graph
D2D         Die-to-Die
D2W         Die-to-Wafer
DTL         Dual TSV Lattice
EA          Evolutionary Algorithm
ES          Evolution Strategy
FEM         Finite Element Method
FPGA        Field Programmable Gate Array
GSRC        Giga Scale Research Center
GSP         Grouped Sequence Pair
HPWL        Half Perimeter Wire Length
IEEE        Institute of Electrical and Electronics Engineers
IBM         International Business Machines
IC          Integrated Circuit
IFU         Instruction Fetch Unit
ITRS        International Technology Roadmap for Semiconductors
LNA         Low Noise Amplifier
LCS         Longest Common Subsequence
LCSLS       Longest Common Subsequence and Lateral Shifting
MC          Monte Carlo
MCNC        Microelectronics Center of North Carolina
MUX         Multiplexer
N           Number of logic gates
OWL*        Octal Wireless* Lattice
OWP         Octal Wireless Plus Configuration
OW*         Octal Wireless* Configuration
OTL         Octal TSV Lattice
P&R         Placement & Routing
QWL         Quad Wireless Lattice
QWP         Quad Wireless Plus Configuration
QTL         Quad TSV Lattice
Rx          Receiver
SOC         System on Chip
SA          Simulated Annealing
SEM         Scanning Electron Microscope
SP          Sequence Pair
SPC         Solid Phase Crystallization
SRAM        Static Random Access Memory
TCG         Transitive Closure Graph
TSV         Through Silicon Via
Tx          Transmitter
VLSI        Very Large Scale Integration
W2W         Wafer-to-Wafer
CHAPTER 1: INTRODUCTION 
1.1 MAJOR CHALLENGES OF 2-D INTEGRATED CIRCUITS 
As device size continues to decrease, designers integrate more and more functional blocks
on a single die of a 2-D IC. As a result, circuit density, complexity of wiring, and the
number of global wires continue to increase. Furthermore, aggressive scaling with new 
technology results in a smaller wire cross-section, shorter wire pitch and longer global 
interconnects. Due to these scaling trends, interconnect delays continue to increase whereas 
the gate delay continues to decrease with shrinking technology node. Therefore 
interconnect delay dominates over gate delay in deep submicron technology [1], and 
remains a critical bottleneck of high performance VLSI chip design. According to ITRS, 
interconnect delay determines the system performance. For example, a 1 mm
minimum-pitch Cu global wire without repeaters at the 45 nm technology node has a 542 ps
delay, whereas a 10-level (logic depth) FO4 delay is around 150 ps at 45 nm [124]. Figure 1.1
shows the trend of global wire delay and gate delay with respect to different technology
nodes. From Figure 1.1, an increasing gap between the gate delay and the interconnect delay
can be observed with advancement in the technology node.
In the era of Information Technology, the need for system-on-chip (SOC) has been 
constantly increasing. The SOC integrates different types of circuits such as logic, memory, 
analog/RF blocks, etc. within a single chip. This heterogeneous integration on a single die
becomes difficult because different families of circuits might require different types of 
substrates. Furthermore, the unwanted interaction of electrical signals among these circuits 
[Figure 1.1 plot: delay versus technology node (nm), shown from 70 nm down to 20 nm. Legend: global wire w/o repeater, global wire with repeater, local wire w/o repeater, local wire with repeater, gate.]
Figure 1.1: Trend between interconnect delay and gate delay for different 
technology nodes [124]. 
(such as substrate noise in a mixed signal design) might require additional design steps and 
increase design complexity. 
The electronics industry miniaturizes ICs by advanced lithography technology that is 
expected to continue through a 22 nm technology node and beyond. However, due to the 
increasing power, performance, and financial bottlenecks beyond 28 nm, industry has 
begun to look for alternative solutions. This has led to the current focus of the
industry on stacked three-dimensional (3-D) integrated circuits. Research works have
shown that 3-D ICs can reduce global wirelength by 28-56% [1],[2],[3],[4], which
will in turn improve performance of 3-D chips, reduce buffer count, and decrease power
consumption in wires. Researchers in [121] have presented a case study of 3-D design for
ternary content-addressable memory (TCAM), and an 8192-point FFT processor fabricated
[Figure 1.2 panels: (a) 2-D IC, (b) 3-D IC. Block labels: DSP, microprocessor, analog/RF circuit, logic circuit, optical I/O.]
Figure 1.2: Comparison between (a) 2-D ICs, and (b) 3-D ICs. In 2-D ICs, all 
blocks are integrated on a single silicon substrate. 3-D IC technology offers the 
flexibility of integration using heterogeneous types of substrates in a 3-D stack. 
at 180nm technology. The 3-D TCAM design shows 23% power reduction whereas the 3-
D FFT shows 22% reduction in cycle time, 28% improvement in clock frequency, and 18% 
reduction in energy per Fourier transform compared to the same designs fabricated as a 
single layer 2-D chip [121]. Thus 3-D ICs offer advantages over 2-D chips even at the same 
technology node. 
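The cycle-time and clock-frequency figures quoted above for the 3-D FFT are two views of the same measurement, since frequency is the reciprocal of cycle time. The short sketch below checks this relation; the function name is an illustrative assumption and is not taken from [121].

```python
def frequency_gain(cycle_time_reduction: float) -> float:
    """Fractional clock-frequency gain implied by a fractional cycle-time reduction.

    Uses f = 1 / T, so f_new / f_old = 1 / (1 - reduction).
    """
    if not 0.0 <= cycle_time_reduction < 1.0:
        raise ValueError("cycle_time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - cycle_time_reduction) - 1.0


if __name__ == "__main__":
    # A 22% shorter cycle time corresponds to roughly a 28% higher clock frequency.
    print(f"{frequency_gain(0.22):.1%}")
```

A 22% reduction in cycle time therefore agrees with the reported 28% improvement in clock frequency.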
1.2 THREE-DIMENSIONAL INTEGRATED CIRCUITS 
Three-dimensional (3-D) integrated circuits, in which multiple device layers are stacked 
vertically, are an alternative solution to interconnect related problems. One of the main 
objectives of 3-D ICs is to replace longer interconnects by shorter wires and vias. Thus it 
reduces total wirelength [1]-[4], signal delay, buffer count, and power consumption. In 
addition, 3-D ICs are more suitable for system-on-chip (SOC) design, in which 
heterogeneous technologies can be fabricated independently in different device layers prior 
to 3-D stacking. Thus different families of circuits such as logic, processor, memory, 
analog/RF circuits, sensors, optical I/Os, etc. can be integrated in the 3-D stack.
Furthermore, circuits fabricated using different types of substrates and different technology 
nodes can also be integrated in 3-D ICs as shown in Figure 1.2. In addition, memory 
intense digital systems can attain lower latency and higher throughput/bandwidth by
stacking memory on top of digital logic circuits. It has been shown by the researchers that
3-D technology can close the processor-memory performance gap [96],[97].
1.3 3-D IC TECHNOLOGIES 
The main technologies available for 3-D ICs are Silicon Epitaxial Growth, Solid Phase 
Crystallization and Processed Wafer Bonding [1]. In the Silicon Epitaxial Growth technique,
an additional Si layer is formed by etching holes in the passivated wafer and epitaxially 
growing a single crystal Si that is seeded from open windows in the inter-layer dielectric 
(ILD). The silicon crystal first grows vertically and then laterally to cover the ILD as shown 
in Figure 1.3(a) [98]. Since the newly grown layer of Si is a single crystal with few defects, 
the quality of devices fabricated on the epitaxial layer can be theoretically as good as those
fabricated on the wafer surface. However, the high temperature (~1000°C) involved in this
process causes significant degradation in the quality of lower layer devices. Also, this 
technique cannot be used over metallization layers. 
As an alternative to high temperature Silicon Epitaxial Growth, the Solid Phase Crystallization 
(SPC) technique offers low temperature deposition and crystallization of amorphous silicon 
(a-Si), on top of the lower active layer devices. The amorphous film is randomly crystallized 
to form a polysilicon film. The device performance is enhanced with local crystallization 
using low-temperature processes (<600°C) such as patterned seeding of Germanium as 
[Figure 1.3 panel labels: (a) 1st layer; (b) Ge seeds, seeding, grain growth, lateral crystallization, islands, connection of islands, substrate.]
Figure 1.3: (a) Schematic of an epitaxially grown second device layer [98] (b) 
Schematic of the Ge seeded solid phase crystallization (SPC) process [1]. 
shown in Figure 1.3(b). In this method, Ge seeds implanted in narrow patterns made on a-Si
are used to induce lateral crystallization. This results in the formation of small islands,
which are nearly single-crystal Si [1]. This technique offers multiple active layer creation
compatible with the current CMOS processing environment. However, the electrical
characteristics of the fabricated devices are still not as good as single-crystal devices [99] due
to the fabrication processes (in upper device layer) involving temperature around 500°C
[99].
Another attractive technique, Processed Wafer Bonding, allows two fully processed wafers with
fabricated devices and some interconnects to be bonded together using copper wafer 
bonding [5],[6]. This technique, also known as 3-D stacking, is very suitable for further 
bonding of more device layers in the vertical direction and provides similar electrical 
properties of devices on all active layers. 
The processed wafer bonding technique allows face-to-face or face-to-back stacking as 
shown in Figure 1.4. In a face-to-face 3-D stacking, the silicon devices and interconnects in 
two device layers face each other. Thus it only allows stacking of two device layers. On the
other hand, in face-to-back stacking, the silicon devices of one layer face the thinned
substrate of another layer. Thus it is suitable for more than two layer 3-D integration.
Please note that face-to-face bonding does not need inter-layer vertical interconnects
because the electrical connections between two layers are established using metal layers.
However, face-to-back bonding requires such vertical interconnects known as
through-silicon vias (TSVs) that can pass through the thinned substrate of another layer.
[Figure 1.4 labels: device layer 1, device layer 2, top-level metal (device layer 1), top-level metal (device layer 2), inter-layer interconnect, layer-to-layer bond; panels (a) and (b).]
Figure 1.4: Cross-sectional view of different types of wafer bonding (a) face-
to-face (b) face-to-back. [4] 
[Figure 1.5 panels: (a) Wafer-to-Wafer (W2W), (b) Die-to-Die (D2D), (c) Die-to-Wafer (D2W).]
Figure 1.5: Different types of stacking methods used in 3-D IC fabrication (a) 
Wafer-to-Wafer, (b) Die-to-Die, and (c) Die-to-Wafer. 
Currently there are three different stacking methods used in 3-D IC fabrication: a) wafer-to-
wafer (WTW), b) die-to-wafer (DTW), and c) die-to-die (DTD) as shown in Figure 1.5. In 
wafer-to-wafer integration, entire wafers are bonded together [1],[7]. It offers the highest 
throughput and the smallest via diameter. However, WTW suffers serious yield loss because
a good die cannot be prevented from being stacked with a bad die. Die-to-wafer and
die-to-die methods allow the die to be diced and tested prior to 3-D stacking. Thus these
methods offer known-good-die stacking which can improve the 3-D yield. The yield
improvement, however, comes at the cost of lower throughput, lower TSV density, and
additional testing and bonding cost [8].
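The yield penalty of blind wafer-to-wafer stacking relative to known-good-die stacking can be illustrated with a deliberately simple model. The sketch below is not taken from [8]. It assumes independent die failures, an identical die yield on every layer, perfect pre-bond testing for known-good-die stacking, and a fixed yield per bonding step. All function names and numerical values are hypothetical.

```python
def w2w_stack_yield(die_yield: float, layers: int, bond_yield: float = 1.0) -> float:
    """Stack yield when untested dies are bonded blindly (wafer-to-wafer).

    Every one of the `layers` dies must be good, and each of the
    `layers - 1` bonding steps must succeed.
    """
    return die_yield ** layers * bond_yield ** (layers - 1)


def kgd_stack_yield(layers: int, bond_yield: float = 1.0) -> float:
    """Stack yield when only known-good dies are bonded (assumed perfect test).

    Only the `layers - 1` bonding steps can fail in this idealized model.
    """
    return bond_yield ** (layers - 1)


if __name__ == "__main__":
    # Hypothetical inputs: 90% die yield, 3 layers, 99% yield per bonding step.
    print(f"W2W: {w2w_stack_yield(0.9, 3, 0.99):.3f}")
    print(f"KGD: {kgd_stack_yield(3, 0.99):.3f}")
```

Under these assumptions the wafer-to-wafer yield falls geometrically with the number of layers, whereas the known-good-die yield is limited only by the bonding steps. The model ignores the testing cost and throughput penalty noted above.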
1.4 THROUGH-SILICON VIA TECHNOLOGIES 
Through-Silicon Vias (TSVs) are vertical interconnects that entirely pass through a silicon 
die in order to provide electrical connectivity across different device layers in a 3-D stack as 
shown in Figure 1.6. There are two different types of TSV fabrication technologies that 
depend on the order of their fabrication during 3-D integration. They are known as via-first 
and via-last technologies. In the via-first technology, TSVs are fabricated before CMOS or 
BEOL (back-end-of-line) metallization. The diameters of TSVs fabricated using this 
technology are typically in the range of 1 - 10 µm with aspect ratio (i.e. height : diameter)
ranging from 3:1 to 10:1. In contrast, the via-last TSVs are created after BEOL or bonding.
Thus, the processing for via-last can be done at the foundry or at the packaging house. The
diameters of TSVs obtained by the via-last technology are larger, typically in the range of 10
- 50 µm, with aspect ratios from 3:1 to 15:1 [9]. A detailed description of TSV fabrication
[Figure 1.6 label: device surface.]
Figure 1.6: TSVs in a stacked 3-D IC. [28] 
[Figure 1.7 chart: labels legible in the original include chip stack, via size, via density (per cm2), supported frequency (Hz), and year (approximately 1990 to 2020).]
Figure 1.7: Technology range and applications of vertical interconnects. [10] 
technologies will be presented in Chapter 6. 
1.5 MAJOR CHALLENGES IN 3-D IC DESIGN 
1.5.1 CAD Tools for 3-D ICs 
3-D IC design technology and 3-D CAD are still in the formative stages of development. 
Therefore various CAD tools, including those for physical design, are being developed. An 
easy solution to the lack of physical design tools for 3-D ICs is to build quasi 3-D tools that 
are based on the extension of existing CAD tools for 2-D ICs. These quasi 3-D tools can 
handle simpler 3-D designs wherein existing 2-D designs are simply stacked and connected 
vertically without any major design changes. An obvious example of this type of design is 
the stacking of processor and memory dies. In this scenario, the only change needed is the 
insertion of TSVs in the layout to deliver signal, power, ground, and clock in the vertical
direction. The insertion of TSVs can be performed in the layout's white space or by slightly
modifying the layout to leave the space for TSVs.
As the TSV technologies mature, their diameter is projected to shrink below 0.1 µm and via 
density is expected to increase by an order of magnitude by the year 2020 [10] as shown in
Figure 1.7. These future trends call for fine grained 3-D IC designs. Therefore there is a
need for native 3-D CAD tools that are built from scratch for fine-grained 3-D 
optimizations such as 3-D module floorplanning, 3-D gate placement across consecutive 
device layers for performance, and power improvement. Furthermore, TSV-aware 3-D 
design verification and analysis tools such as timing, power, signal integrity, 3-D 
DRC/LVS, etc. need to be integrated and efficiently managed.
In the 2009 edition of ITRS [11], a new metric has been introduced that captures the
physical design technology requirements for 3-D ICs. The metric is called "percentage of native
3-D physical design technology in TSV-based 3-D IC implementation flow." This metric is around
10% at present, and is expected to grow further in future. Thus by the year 2025, all 
physical design tools that will be used for 3-D ICs are required to be native 3-D to meet the 
demand of future 3-D designs. 
1.5.2 Heat Extraction and Thermal Management 
Heat extraction and thermal management are some of the most significant challenges for 3-
D ICs. The power density in 3-D ICs increases drastically due to stacked device layers that 
pack more circuits in a small footprint. Furthermore, due to enhanced performance, 
switching activity increases. Thus 3-D ICs generate more heat compared to the 
[Figure 1.8 labels: microchannel, electrical through-silicon via (TSV), fluidic TSV, bonding interface, fluidic pipe.]
Figure 1.8: Cross sectional view of a 3-D IC using integrated microchannel 
cooling technology. [22] 
conventional 2-D chips. However, there is only one path for heat removal (through a heat
sink) for a 3-D IC, the same as a 2-D chip. Thus thermal modeling, simulation and accurate
temperature estimation within 3-D ICs are important. This has led to the proposal to insert
thermal vias that create different vertical paths from the upper device layer towards the heat 
sink [12]. Incorporation of the thermal model based on a resistive thermal network during 
3-D floorplanning [13] and 3-D placement [14] has also been proposed. Based on the 
thermal model, hot modules which generate large amounts of heat are identified. These hot 
modules are placed closer to the heat sink whereas the cooler modules are placed away 
from the heat sink to minimize the peak temperature of 3-D ICs.
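The benefit of placing hot modules close to the heat sink can be illustrated with a simplified one-dimensional resistive ladder. The sketch below is not the thermal model of [13] or [14]. It assumes that each device layer is a single node, that all heat flows through the stack to one heat sink, and that the inter-layer and sink resistances are constant. The function name, resistance values, and layer powers are hypothetical.

```python
from typing import List


def layer_temperature_rises(powers: List[float], r_sink: float, r_layer: float) -> List[float]:
    """Temperature rise above ambient of each layer in a 1-D thermal ladder.

    powers[0] is the layer nearest the heat sink. Heat generated in a layer
    must cross every thermal resistance between that layer and ambient.
    """
    heat_crossing = sum(powers)          # all heat crosses the sink resistance
    temp = r_sink * heat_crossing        # rise of the layer nearest the sink
    rises = [temp]
    for k in range(1, len(powers)):
        heat_crossing -= powers[k - 1]   # only heat from layers k and above
        temp += r_layer * heat_crossing  # crosses the (k-1)/k interface
        rises.append(temp)
    return rises


if __name__ == "__main__":
    # Hypothetical values: powers in W, resistances in K/W.
    hot_near_sink = layer_temperature_rises([30.0, 10.0, 5.0], r_sink=0.5, r_layer=0.2)
    hot_far_from_sink = layer_temperature_rises([5.0, 10.0, 30.0], r_sink=0.5, r_layer=0.2)
    print(max(hot_near_sink), max(hot_far_from_sink))
```

With the same total power, the ordering that keeps the hottest layer next to the sink gives the lower peak temperature in this model, which matches the placement rule described above.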
In addition to the above heat removal techniques, researchers have been actively searching 
for other innovative and alternative heat removal techniques. Microchannel liquid cooling 
[15] has been shown to be a promising solution [16]-[19] because the combination of 
microfluidic channels and liquid coolant media could offer very low thermal resistance 
between the chip surface and the coolant. Researchers from IBM [20],[21] have recently 
demonstrated the feasibility of integrated microchannel 3-D ICs as shown in Figure 1.8. 
Microchannel heatsinks are distributed among different device layers, and cooling fluid is 
delivered to each microchannel heatsink through fluidic TSVs and pipes. The presence of a
cooling path at each layer enables direct heat dissipation from each individual device layer 
[22]. 
1.5.3 Power and Ground Delivery in 3-D ICs 
Power-ground (P/G) delivery in 3-D ICs is among the major challenges in 3-D technology.
The on-chip power-ground networks in different device layers are vertically connected
using P/G TSVs that demand higher current per TSV. The number of P/G TSVs is limited
in order to prevent Placement & Routing (P&R) congestion. Furthermore, signal routing
should be performed carefully in order to prevent coupling noise between P/G TSVs and
signal wires. This leads to a complex optimization problem and research is needed on P/G
network synthesis, optimization, and analysis to address the noise and power integrity issues
along with minimizing congestion and the on-chip resources such as P/G wires, P/G
TSVs, and on-chip decoupling capacitors [9].
1.5.4 Clock Tree Synthesis for 3-D ICs
Sequential circuit elements such as flip-flops and latches are basic building blocks in digital 
circuit design. In homogeneous 3-D ICs, they have the potential to be located in almost 
every device layer. To operate these circuits, clock signal delivery across different device 
layers is important while reducing power consumption, clock-skew, slew and jitter. Similar 
to signal and P/G TSVs, clock TSVs are used for clock delivery and they occupy layout
space and cause coupling. Furthermore, the clock tree is the longest wire, which contains
many buffers to control clock skew. Since the delays of clock wires, buffers and TSVs are
significantly affected by temperature, extreme care should be taken to minimize skew based
on the given non-uniform thermal profile within 3-D ICs.
1.5.5 TSV Management in 3-D ICs 
As described in sub-sections 1.5.3 and 1.5.4, TSVs are used for establishing several kinds of
electrical connections across different dies in 3-D ICs. In addition to providing vertical
connections, they consume routing resources and may create congestion. The location of
TSVs, via pitch and their diameters have significant impact on the quality and reliability of
3-D IC layout as well. The number of TSVs used in 3-D ICs depends on the design 
applications and their partitioning styles to divide circuits into multiple dies. Research work 
is needed to study the tradeoff among core-level, block-level and gate-level partitioning 
across the different device layers within 3-D ICs. 
1.5.6 TSV-Induced Design for Manufacturability and 3-D IC Yield 
TSVs in 3-D ICs cause non-uniform layout density distributions on the active, poly, and
metal layers. This density variation causes variation during the Chemical Mechanical
Polishing (CMP) steps for the individual die and requires new TSV-aware solutions.
Furthermore, due to the high thermal profile in 3-D ICs, TSVs suffer from
thermo-mechanical stress. The thermo-mechanical stress occurs due to a mismatch in the
coefficient of thermal expansion (CTE) of copper TSVs and the silicon that surrounds
them. As the temperature rises during the operation of a chip, TSVs expand more than the
surrounding dielectric as shown in Figure 1.9(a). In this figure the high stress sites within 3-
D ICs can be observed. Researchers from IBM have reported an SEM (Scanning Electron 
Microscope) image of a visible crack in a TSV due to high stress [23] as shown in Figure 
1.9(b). Figure 1.9(c) shows high stress sites in a multi-layer 3-D IC that are prone to 
cracking. Thus TSVs are prone to failure or may attain plasticity due to the high
thermo-mechanical stress resulting in significant yield loss.
Yield is directly related to the cost-effectiveness of 3-D technology [1],[24]. In spite of its
importance, very few research works directly address the yield problem. Patti et al. [24]
suggest insertion of redundant circuits in the uniform circuit architectures such as
memories and FPGAs to make them reparable in the presence of defects. Recently, Ferri et
al. [25] proposed the improvement in parametric yield of DTW and DTD integration
methods by carefully matching the speed of the dies in the stack. Smith et al.
[26] proposed the matching of wafers in WTW integration to minimize the stacking of
good dies with a bad die in order to enhance the yield. Finally, Smith et al. [8] presented the
impact of 3-D IC yield on the cost of WTW, DTW and DTD stacking methodologies.
These studies consider the failure of 2-D dies in the 3-D stack and assume that the 3-D 
integration process is defect free. However in real life, this assumption might not be 
accurate. It is a well known fact that TSVs suffer from thermo-mechanical stress and may
attain plasticity [27],[28] causing the failure of vertical interconnects. Bentz et al. [27]
proposed an FEM based model for the study of thermo-mechanical stresses of TSVs
within 3-D ICs, and reported the variation in stresses due to via pitch, via diameter, and
the thickness of Benzocyclobutene (BCB), which is used as a glue layer for stacking.
A recent study performed on a chain of 10K vias fabricated on a 10 µm via pitch showed
that only 67% of the physical vias were fully functional and met the desired electrical
[Figure 1.9 panels (a), (b), (c). Color scale: von Mises stress, 0 to 2.5 GPa.]
Figure 1.9: (a) High thermo-mechanical stress shown by the white rectangle at 
the edge of a TSV [38], (b) An SEM image of a visible crack due to stress in a 
TSV [23], and (c) High stress sites in a multi-layer 3-D IC [27]. 
characteristics [29]. Another study shows 99.98% via operability of regular arrays of 
256x256 vertical interconnects on a 30µm via pitch [30]. The large variation in yield could 
have been caused by the change in thermo-mechanical stresses [27],[28] due to difference in 
via density and via diameter. One important thing to note is that these studies were 
performed on a uniform configuration, which can be suitable for regular architectures such
as memory and FPGAs, and redundancy in circuitry can be inserted to avoid yield loss [24].
In a 3-D system-on-chip (3-D SOC) with heterogeneous circuit architecture, a single TSV 
failure due to thermo-mechanical stress will cause the entire chip to fail and therefore 
reduce the yield. The circuit redundancy in a 3-D SOC might have too much overhead in 
terms of silicon real estate (i.e. area). Thus the effect of thermo-mechanical stress in TSVs 
leading to yield loss needs to be explored. Furthermore, yield-aware 3-D design techniques 
are required to improve 3-D IC yield. 
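The sensitivity of chip yield to single-TSV failures can be illustrated with a short calculation. The sketch below assumes that TSVs fail independently with identical probability and that any single failure kills the chip, as stated above for a 3-D SOC without redundancy. It borrows the 99.98% via operability reported in [30] purely as an illustrative per-TSV probability. The function name and the TSV counts are hypothetical.

```python
def chip_yield_no_redundancy(tsv_ok_probability: float, num_tsvs: int) -> float:
    """Probability that every TSV works, assuming independent, identical TSVs."""
    if not 0.0 <= tsv_ok_probability <= 1.0:
        raise ValueError("tsv_ok_probability must be in [0, 1]")
    return tsv_ok_probability ** num_tsvs


if __name__ == "__main__":
    # Hypothetical TSV counts; 0.9998 is used only as an example probability.
    for n in (100, 1_000, 10_000):
        print(n, round(chip_yield_no_redundancy(0.9998, n), 4))
```

Even a very high per-TSV operability therefore yields a low chip-level yield once the number of TSVs grows into the thousands, which motivates the redundancy strategies studied later in this dissertation.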
1.6 CURRENT STATUS OF 3-D INTEGRATED CIRCUITS 
A few commercial 3-D ICs are available in the market at present. IBM has
produced a 3-D power amplifier chip [123]. In addition, a 3-D image sensor by Toshiba
was unveiled in the year 2008, and MEMS devices are being produced using 3-D
technology [122]. Researchers from NCSU have fabricated test chips for a 3-D FFT, and a
3-D ternary content-addressable memory (TCAM) [121]. Georgia Tech's GTCAD lab has
taped out the first many-core microprocessor from academia as a 2-layer 3-D test chip.
These are small scale designs, and current research works in industry and academia focus
on ASIC-like 3-D integration, such as microprocessors stacked as CPU and L2 cache, and
memory stacking on top of a microprocessor for improved speed and bandwidth. 
1.7 CONTRIBUTIONS OF THE DISSERTATION 
In section 1.5 we discussed the major challenges faced by the semiconductor industry in 
achieving mainstream acceptance of 3-D ICs. This dissertation focuses on two aspects of 
such challenges: a) CAD tool design for native 3-D floorplanning, and b) yield improvement 
of 3-D ICs in the presence of TSV failure.
1.7.1 CAD Tool Design for Native 3-D Floorplanning 
Most of the current designs are composed of various modules/blocks and require
floorplanning before placement can be performed. A good floorplan provides compact 
chip area and improves the timing performance of a circuit. Floorplanning is a well studied 
problem for 2-D ICs. Moving from 2-D to 3-D ICs, however, increases the complexity and 
the solution space [45]. Therefore development of efficient floorplan algorithms is 
important. As discussed in sub-section 1.5.1, the existing floorplanning [13],[45],[64],[65] 
tools are actually quasi-3-D floorplanning tools, i.e. they simply perform 3-D floorplanning 
of 2-D modules only. Therefore they miss the opportunity for 3-D placement of logic cells 
and hence they will not be able to completely harness the advantages of 3-D ICs. In other 
words, there exists a significant gap between existing 3-D floorplanning and 3-D placement 
tools that prohibits the usage of 3-D placement. 
To bridge the existing gap between 3-D floorplanning and 3-D placement, a novel 
placement-aware 3-D floorplanning algorithm has been designed (Chapter 4). It uses 
stochastic wirelength distribution models for rectangular 2-D and 3-D modules (Appendix 
B) to estimate the advantages of 3-D module placement (in terms of wiring reduction) and 
identifies potential 2-D modules that should be converted into 3-D modules to harness the 
advantages of 3-D placement. A set of constraints has been derived (Chapter 3) on the
topological representation of floorplanning that identifies feasible solutions and eliminates 
infeasible solutions quickly in the proposed placement-aware floorplanning. Thus it reduces 
the solution search space and improves the runtime. Finally a novel module-packing 
algorithm has been designed that quickly computes the geometrical 3-D floorplan (Chapter 
3) while satisfying the set of constraints for placement-aware 3-D floorplanning. 
Furthermore, a 3-D floorplanning algorithm with module alignment has been designed 
(Chapter 5) that can be useful in bus-driven 3-D design. It also considers heterogeneous
3-D stacking in which a certain set of modules can only be placed in specified types of
substrates/device layers due to technology and the thermal profile. In addition, a set of
modules can be specified by designers that need to be placed away from each other or need 
not overlap due to thermal or noise sensitivity. This algorithm also uses the set of feasibility 
constraints and fast 3-D packing algorithm presented in Chapter 3. 
This work falls under the constrained combinatorial optimization problem category. Most 
of this research has been published in IEEE sponsored international conferences and 
journals. For example, the statistical wirelength distribution models for rectangular 2-D and
3-D modules (Appendix B) are published in [32]. The initial version of the proposed
placement-aware 3-D floorplanning algorithm appears in [33]. In addition, an improved
version of the algorithm (Chapter 4) that uses a set of constraints for feasible solutions
(Chapter 3) along with the fast 3-D module packing algorithm satisfying such constraints
(Chapter 3) is currently in press for publication in IEEE Transactions on Very Large Scale 
Integration Systems [34]. 
1.7.2 Yield Improvement of 3-D ICs in the Presence of TSV Failure
As discussed in sub-section 1.5.6, TSVs are prone to failure due to thermo-mechanical
stress within 3-D ICs, and may cause serious yield loss. To mitigate the yield loss problem
of 3-D ICs, a novel set of yield improvement strategies that focus on defects in
through-silicon vias is proposed. Furthermore, the parametric yield and chip revenue that can
improve profitability are estimated. The impact of our approach on chip area, delay and power
is also analyzed quantitatively. Initially Monte-Carlo simulations are used to
evaluate the effect of our proposed yield enhancement techniques. The Monte-Carlo 
simulation results demonstrate that our strategies can improve the 3-D IC yield 
significantly. 
Since Monte-Carlo simulations are computationally expensive and time consuming, we
derive a set of analytical models for the most promising yield improvement strategies.
Furthermore, analytical models for parametric yield (slow and fast chips) and an
expression for the sweet spot between them are derived. The sweet spot ensures that the
number of fast chips produced for a particular yield improvement technique will always be
greater than the number of slow chips. The analytical models can be very helpful in quickly
analyzing the functional and parametric yield for a given 3-D design. In addition, they can
also be incorporated in physical design CAD tools such as floorplanning and P&R for
yield-aware 3-D IC design.
This part of the research was initially published in an IEEE sponsored workshop [35]
and in a conference [36]. In addition, a matured version of this work (Chapter 7)
along with analytical models (Chapter 8) has been submitted to IEEE Transactions on Very
Large Scale Integration (VLSI) Systems and is currently under review [37].
1.8 STRUCTURE OF THE DISSERTATION 
The remaining part of this dissertation is organized as follows: 
Chapter 2 presents the formal discussion about floorplanning in 2-D and 3-D ICs. It 
presents information about the different types of topological representations for 2-D
floorplanning and their extensions to 3-D floorplanning. Next, a summary of previous
work on 3-D floorplanning is presented. The second half of Chapter 2 includes a brief 
overview of the initial software that was inherited at the beginning of the research. Then an 
overview of changes made during the research to solve the targeted placement-aware 3-D 
floorplanning problem is presented. 
Chapter 3 introduces the concept of vertical constraints during 3-D floorplanning and its 
importance. Then a set of feasibility conditions are derived on a popular topological 
representation of floorplanning and they are later extended to a graph-based formulation. 
Furthermore, a set of module packing algorithms satisfying these vertical constraints are 
presented. 
Chapter 4 contains the architecture of the proposed placement-aware 3-D floorplanning 
algorithm, its implementation details followed by experimental results and its comparison 
with existing state-of-the-art 3-D floorplanning tools. 
Chapter 5 includes the problem formulation for 3-D floorplanning with module
alignment, its importance, implementation details and experimental results.
In Chapter 6, the TSV-induced 3-D IC yield problem is formulated, and other possibilities
of vertical interconnects are discussed. Chapter 7 presents the proposed yield improvement
techniques and Monte-Carlo simulation results for functional yield and parametric yield. In
addition, it also presents tradeoffs in terms of area, delay and power associated with 
different yield improvement strategies. Chapter 8 contains the derivation of analytical 
models for the most promising strategies and comparison of analytical models' results with 
Monte-Carlo simulation results. 
Chapter 9 presents the conclusions and future work, and Chapter 10 presents the summary 
of major contributions. 
Appendix A contains the mathematical proofs of feasibility condition theorems discussed 
in Chapter 3. Appendix B presents the derivation of rectangular 3-D wirelength distribution 
models that are used to estimate the advantages of 3-D placement for different modules 
during the placement-aware 3-D floorplanning. Appendix C presents a methodology for 
Rent's parameter estimation, and a list of Rent's parameters assigned to each module of 
floorplanning benchmark circuits for performing experiments. Finally, the sensitivity
analysis of the cost functions of the 3-D floorplanning algorithms is given in Appendix D.
CHAPTER 2: FLOORPLANNING OF 2-D AND 3-D INTEGRATED CIRCUITS 
Floorplan design is an important step in physical design of VLSI circuits to plan the 
position of circuit blocks or modules within a minimum area along with several other
design cost parameters such as wirelength, substrate noise, temperature, power, etc. in order
to optimize circuit performance. Floorplanning is performed after circuit partitioning in the
physical design cycle. Once the module positions that optimize the various design cost
parameters have been chosen during floorplanning, the positions of modules are
fixed in the placement stage, i.e. they cannot be moved to other locations, in order to
maintain the optimized values of the design cost parameters. With the advent of deep
sub-micron technology and advancement in VLSI technology there has been an increasing need
for early floorplanning which enables design budgeting for major design parameters such as 
wiring, area, substrate-noise [39], power/thermal estimation [40], and bus planning [41] 
when exact circuit information is unknown. If the floorplanning stage is neglected, 
placement might not meet these timing/power/thermal parameters, which can result in
failure of the chip. Thus floorplanning guides the placement stage, which helps in meeting 
design parameters. 
2.1 SHAPES OF CIRCUIT MODULES IN A FLOORPLAN 
The shapes of circuit modules are usually assumed to be rectangular and the aspect ratio for 
each module is defined as the ratio of the width to the height. The modules are classified as 
hard and soft modules. In the early stages of the design cycle, the exact dimensions of the 
modules are unknown. Thus it is reasonable to make assumptions about the area of each
block and have a flexible aspect ratio. Hard blocks have fixed aspect ratios while soft blocks 
have a fixed aspect ratio range. In addition to the rectangular shapes of modules, rectilinear 
shaped blocks are occasionally desired to save chip area. There are two types of rectilinear
blocks: a) convex rectilinear, which has an "L" or "T" shape, and b) concave rectilinear,
which has a "U" shape. These rectilinear shapes are usually handled by partitioning them 
into small rectangular blocks, and imposing rectilinear constraints on the floorplan 
representation such that the partitioned parts of a rectilinear block preserve its original 
rectilinear shape. The partition based rectilinear shape constraints have been reported on 
several floorplan representations such as SP [68],[100],[116], BSG [114],[115], O-tree
[117], CBL [118], B*-tree [119], and TCG [120]. Although the rectilinear block packing can
be done using any of these representations, it is easy to satisfy these constraints on 
sequence pair for which a fast module packing algorithm satisfying these constraints has 
been proposed in [100]. 
In this research, only hard modules with rectangular shapes have been considered because 
a) rectilinear shapes can be constructed by the methods described in the previous 
paragraph, and b) rectilinear blocks are not very frequently present in designs. 
2.2 CLASSIFICATION OF FLOORPLANNING 
Floorplanning can be classified in two categories: slicing and non-slicing. A slicing floorplan 
is obtained by recursively bisecting a rectangle using a horizontal or vertical line. It has a 
smaller solution space which implies a faster runtime. However, it limits the set of reachable 
layout topologies [42] and can degrade layout density, especially when modules are of 
different sizes. Slicing floorplans were predominantly used when design complexity was not 
a major concern. Unfortunately, most of today's real designs are complex and of the non-
slicing type, which requires complex topological representations for floorplanning. In 
addition, fixed outline floorplanning has been proposed by researchers in which module 
packing is done within a specified die size [31],[48]. To use the area within fixed outline 
floorplanning, rectilinear shape modules have been proposed. The method for handling 
rectilinear blocks has been described previously in section 2.1. In recent years, it has 
become mandatory for floorplan designers to move to non-slicing floorplanning 
[42],[43],[44]. Thus only non-slicing floorplanning is considered in this work.
2.3 COMPLEXITY OF THE FLOORPLANNING PROBLEM 
Floorplanning is an NP-hard problem and its solution search space increases exponentially 
with the problem size. Due to the simultaneous optimization of various design parameters 
(such as area, wirelength, noise, etc.), floorplanning becomes a combinatorial optimization 
problem. For a design consisting of $n$ modules, the solution search space for 2-D
floorplanning is $(n!)^2$ [45]. Moving from 2-D to a $k$-layer 3-D floorplanning, the solution
space further increases to $n^{k-1}(n!)^2/(k-1)!$ [45]. Let us consider 2-D/3-D floorplans
containing 50 modules (i.e. $n = 50$), which is a reasonable floorplan size according to the
MCNC benchmark. In this scenario, the solution search space is so large that
today's computers cannot search it within an acceptable time by
any deterministic search approach. Thus stochastic search based algorithms, which search
for an acceptable (near optimal) solution within a limited amount of time, are efficient for
the floorplan design. These stochastic algorithms iteratively perturb the solution space with
the hope of improving the quality of the solution.
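To make the growth of the search space concrete, the two counts quoted from [45] can be evaluated directly. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```python
import math

def search_space_2d(n: int) -> int:
    """2-D solution count quoted from [45]: (n!)^2."""
    return math.factorial(n) ** 2

def search_space_3d(n: int, k: int) -> int:
    """k-layer 3-D solution count quoted from [45]: n^(k-1) * (n!)^2 / (k-1)!.

    The integer division is exact whenever k - 1 <= n, because n! is then
    divisible by (k-1)!.
    """
    return n ** (k - 1) * math.factorial(n) ** 2 // math.factorial(k - 1)

if __name__ == "__main__":
    n = 50
    print("2-D digits:", len(str(search_space_2d(n))))
    for k in (2, 4):
        print(f"{k}-layer digits:", len(str(search_space_3d(n, k))))
```

For n = 50 the 2-D count alone is a 129-digit number, which is why exhaustive enumeration is not an option.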
Simulated annealing is one of the most popular optimization engines used
by floorplan designers [46],[47],[48],[49],[50]. Simulated annealing is based on the analogy of
annealing in metallurgy, in which a material is heated above its recrystallization temperature,
which allows the molecules to separate. In the next stage, cooling is gradually
applied and molecules are annealed to refine the material's structure, induce ductility, and
relieve internal stresses. Following the analogy, simulated annealing starts with a randomly 
generated initial solution, and iteratively searches for a better solution using a pre-defined 
set of moves. It involves a temperature scheduling in which the temperature is set higher at 
the beginning, and is lowered iteratively upon improvement of the solution quality. 
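The loop described above can be sketched in a few lines. This is a generic illustration and not the implementation used by any of the cited floorplanners: the Metropolis acceptance rule, the geometric cooling schedule (temperature multiplied by a constant after a fixed number of moves), the parameter values, and the toy cost function are all assumptions made for the example.

```python
import math
import random

def simulated_annealing(initial, cost, perturb, t_start=100.0, t_end=0.01,
                        cooling=0.95, moves_per_temp=50, seed=0):
    """Generic annealing loop; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = perturb(current, rng)
            delta = cost(candidate) - current_cost
            # Improvements are always accepted; uphill moves are accepted
            # with probability exp(-delta / t), which shrinks as t drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # assumed geometric cooling schedule
    return best, best_cost

def swap_two(seq, rng):
    """Example move: exchange two randomly chosen positions."""
    i, j = rng.sample(range(len(seq)), 2)
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return out

def displacement(seq):
    """Toy cost: total distance of each value from its sorted position."""
    return sum(abs(v - i) for i, v in enumerate(seq))

if __name__ == "__main__":
    print(simulated_annealing([4, 3, 2, 1, 0], displacement, swap_two))
```

In a floorplanner the solution would be a topological representation such as a sequence pair, and the cost would combine area, wirelength, and other design parameters.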
An evolutionary algorithm is an alternative method for stochastic search which is based on
Darwin's theory of evolution and starts with an initial set of solutions (i.e. population) in
parallel. The individual solutions in the population compete for survival among other
individuals of the population. Following nature's rule, the individuals evolve from one
generation to another using mutation and reproduction processes. In the mutation process,
the properties of an individual are changed by perturbing the solution using a
pre-defined set of moves, and an offspring is generated. In the reproduction process, the
properties of two individuals are mixed, and one or more offspring are created. The parents
and offspring compete for survival, and the best fit individuals become the next generation
parents. Evolutionary algorithms have been used by several researchers [33],[34],[51],[52] 
for floorplan design. 
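The mutation, reproduction, and survival steps described above can be illustrated on permutations, which are the building blocks of several floorplan representations. This is a sketch under stated assumptions: the swap mutation, the order-preserving crossover, the population handling, and the toy cost function are illustrative choices, not the operators of the cited tools.

```python
import random

def mutate(ind, rng):
    """Mutation: swap two randomly chosen positions of one individual."""
    i, j = rng.sample(range(len(ind)), 2)
    child = list(ind)
    child[i], child[j] = child[j], child[i]
    return child

def order_crossover(p1, p2, rng):
    """Reproduction: keep a slice of p1, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    kept = p1[i:j]
    rest = [x for x in p2 if x not in kept]
    return rest[:i] + kept + rest[i:]

def evolve(population, cost, generations=100, seed=0):
    """Return the fittest individual after the given number of generations."""
    rng = random.Random(seed)
    mu = len(population)
    parents = [list(p) for p in population]
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in parents]
        for _ in range(mu):
            a, b = rng.sample(parents, 2)
            offspring.append(order_crossover(a, b, rng))
        # Parents and offspring compete; the mu fittest survive.
        parents = sorted(parents + offspring, key=cost)[:mu]
    return parents[0]

def displacement(seq):
    """Toy cost: total distance of each value from its sorted position."""
    return sum(abs(v - i) for i, v in enumerate(seq))

if __name__ == "__main__":
    pop = [[3, 1, 0, 2], [2, 3, 1, 0], [0, 2, 3, 1]]
    print(evolve(pop, displacement, generations=50))
```

Because parents stay in the selection pool, the best cost in the population never gets worse from one generation to the next.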
2.4 2-D FLOORPLAN REPRESENTATIONS 
Due to the NP-hard complexity of floorplanning problems, it is important to use a suitable 
topological representation for representing floorplans. The topological representation 
should be simple, effective and the algorithm to transform it to a geometric floorplan 
should take the least computational effort. Researchers have come up with various non-
slicing floorplan representations such as Sequence Pair (SP) [53], Bounded Sliceline Grid 
(BSG) [54], O-tree [55], B*-tree [44], Corner Block List (CBL) [56], and Transitive Closure
Graph (TCG) [57]. These representations can be distinguished based on the P-admissibility
[53] of the solution space which satisfies the following requirements: 1) the solution space is 
finite, 2) every solution is feasible, 3) packing and cost evaluation can be performed in 
polynomial time, and 4) the best evaluated packing in the space corresponds to an optimal 
floorplan [53]. The P-admissible representations are BSG, SP and TCG. Both SP and TCG 
have one-to-one mapping between their topological representations and corresponding 
floorplans. In contrast, BSG itself generates multiple representations corresponding to one
packing, which implies a larger solution space than SP and TCG. The runtime for packing
from a sequence pair is only $O(n \lg \lg n)$ [58], compared to TCG and BSG, which have $O(n^2)$
complexity [54],[57]. Given an O-tree or a B*-tree, it may not be feasible to find a packing
corresponding to the original representation, and thus they are not P-admissible. Corner
block list (CBL) has a smaller solution space and faster packing scheme but it cannot
guarantee a feasible solution in each perturbation, and thus it is not P-admissible either.
P-admissibility provides a standard for the classification of floorplan representations. Since 
P-admissible representations have one-to-one mapping between the topological 
representation and their geometric floorplan, these representations have a smaller solution
space compared to those representations which generate infeasible solutions or have
one-to-many or many-to-one mapping between floorplan representations and their geometric
floorplan. Considering the large solution space of floorplanning (as described in section
2.3), P-admissible representations can be a better option. This does not mean that
non-P-admissible representations cannot produce good floorplan solutions, given the NP-hard
nature of the problem and the randomness of the stochastic search process. However, a
comparison of floorplan solution quality obtained using different P-admissible and
non-P-admissible representations shows that P-admissible representations produce better
solution quality [57].
2.5 3-D FLOORPLAN REPRESENTATIONS 
The topological representations of 3-D floorplanning can be grouped into two classes: the
true 3-D and the quasi-3-D. The true-3-D representations contain placement information for
modules in the x, y, and z directions, and modules can be treated as solid cuboids with finite
length, width and height as shown in Figure 2.1 (b). Researchers have extended several existing 
2-D representations of a floorplan to true 3-D representations, e.g. 3-D reconfigurable 
functional unit operation (RFUOP) [59], 3-D slicing tree [60], and sequence triple [61], in
which representation for the third dimension is appended. Present 3-D technology has a 
limited number of device layers and the inter-layer height is fixed. Thus true-3-D 
representations would have more redundancy in the z-axis data structure resulting in an 
increased time and space penalty. A T-tree representation was presented in [62], which also
considers modules as 3-D boxes. The T-tree representation was inspired by the 
Figure 2.1: (a) the structure of a T-tree, with left child ($t_j = t_i + T_i$), middle child
($t_k = t_i$, $y_k > y_i$), and right child ($t_l = t_i$, $y_l = y_i$); (b) a compacted placement of modules
and the corresponding T-tree [62].
binary representation of a 2D floorplan called B* tree [44] in which a node (module) has at 
most a left child and a right child that represent the dimensional relation of modules in a 2-D 
plane. In the T-tree representation a middle child is added which represents the information 
along the third dimension as shown in Figure 2.1 (a). The T-tree represents the geometric 
Figure 2.2: (a) true 3-D floorplanning [125], (b) quasi 3-D floorplanning [13].
relation between two modules as follows. If a node $n_j$ is the left child of node $n_i$, then
module j must be placed adjacent to module i in the T+ direction, where T is the third
dimension [62]. If a node $n_k$ is the middle child of node $n_i$, module k must be placed in the
Y+ direction of module i with the t-coordinate of module k equal to the t-coordinate of
module i (i.e. $t_k = t_i$ and $y_k > y_i$). Finally, if node $n_l$ is the right child of node $n_i$, module l
should be placed in the X+ direction of module i with the t- and y-coordinates equal to
those of module i (i.e. $t_l = t_i$, $y_l = y_i$). Although [62] reports linear time complexity for
operations on a T-tree, the packing scheme has $O(n^2)$ complexity. An example of a T-tree and
its corresponding module packing is shown in Figure 2.1 (b).
The quasi 3-D representation constructs floorplans of different device layers with an array 
of 2-D-representations. Each device layer has its own 2-D representation. Unlike true 3-D 
representations, module placement information along the z-axis is not included, and 
modules do not have a height in the quasi 3-D representation. The examples of quasi 3-D 
representations are two-layer BSG [2], four-layer SP [63], and four-layer TCG [13]. Using 
these representations, intra-layer operations within a layer and inter-layer operations across 
different layers are performed for the perturbation of solution space. Using quasi 3-D 
representations, many 3-D floorplanning algorithms have been reported recently in the 
literature. In Chapters 4 and 5, we will compare our floorplanning results with three existing 
state-of-the-art 3-D floorplanners. 
2.6 PREVIOUS WORK ON 3-D FLOORPLANNING 
Cong et al. [13] proposed a thermally driven 3-D floorplanning algorithm. It uses a compact 
resistive thermal model along with simulated annealing for optimization. Although the 
compact resistive thermal model is reasonably fast, it is still much more computationally 
expensive (9.7x) compared to non-thermal-driven floorplanning. In order to mitigate the
computational overhead of thermal driven floorplanning, the authors of [13] have
presented approximate closed form solutions of heat flow in vertical and horizontal
directions. The closed form heat flow solution is very fast, but less accurate. Thus Cong et
al. have also presented a hybrid approach in which the less accurate but fast closed form
equations are used during simulated annealing, but the accurate thermal resistive network
model is used for 20 consecutive iterations whenever the temperature of the simulated
annealing process drops. Thus it provides a tradeoff between accuracy and
computational overhead, which is reduced to 3.2x compared to 9.7x when the accurate
model was used. This work was the first to address the thermal optimization during 3-D 
floorplanning. 
Hung et al. presented an interconnect and thermal aware floorplanning algorithm for 3-D 
microprocessors [64]. Hung et al. have developed a thermal modeling tool called HS3D to
identify the hot spot and temperature of modules during floorplanning. The authors use the 
B* tree representation (quasi 3-D) and simulated annealing engine for optimization. The 
algorithm considers power consumption in wires during optimization. Hung et al. used the 
detailed model of an Alpha-like microprocessor in Verilog and extracted the actual power 
consumption in function modules and intra-module interconnects using Design Compiler and 
First Encounter tools. Then they used a 2-D floorplanning tool to compute the inter-module 
wirelength. By summing the inter-module wirelength and intra-module wirelength, they 
obtain the total wirelength in a 2-D chip, and use the power consumption in interconnects 
from First Encounter as real power data. Next they normalize the inter-module wirelength 
of a 3-D chip (during floorplanning) with respect to the inter-module wirelength of the 2-D 
chip. The normalized value is multiplied by the power consumption in all wires of the 2-D 
chip to scale and estimate the power consumption in the 3-D chip. The area, aspect ratio, 
and power consumption obtained after placement & routing are used as input to the
floorplanning algorithm. The algorithm is unique in the sense that it uses the power
consumption in interconnects along with the power consumption in circuit modules to
estimate the peak temperature of the chip.
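The wire-power scaling step described above amounts to multiplying the measured 2-D wire power by the ratio of 3-D to 2-D inter-module wirelength. A minimal sketch of that arithmetic follows; the function and parameter names are hypothetical, and the numbers in the usage line are made-up placeholders rather than data from [64].

```python
def estimate_3d_wire_power(inter_wl_3d: float, inter_wl_2d: float,
                           wire_power_2d: float) -> float:
    """Scale the 2-D wire power by the normalized inter-module wirelength.

    P_wire(3-D) ~= (WL_inter(3-D) / WL_inter(2-D)) * P_wire(2-D)
    """
    if inter_wl_2d <= 0:
        raise ValueError("2-D inter-module wirelength must be positive")
    normalized_wl = inter_wl_3d / inter_wl_2d
    return normalized_wl * wire_power_2d

if __name__ == "__main__":
    # Placeholder values: halving the wirelength halves the estimated power.
    print(estimate_3d_wire_power(50.0, 100.0, 2.0))
```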
Li et al. proposed a hierarchical 3-D floorplanning algorithm for wirelength optimization
[45]. The algorithm proposes a statistical method to partition and place small sets of 
modules in each device layer, and then performs 2-D floorplanning in each device layer 
without performing any inter-layer moves. Thus it decomposes the 3-D floorplanning 
problem into multiple 2-D floorplanning problems. Therefore any good 2-D floorplanning 
algorithm can be used in this approach.
Zhou et al. recently proposed a scalable temperature and leakage aware 3-D floorplanning
algorithm [65]. It uses an adaptive force directed technique, and integrates a power-thermal 
analysis to close the leakage-thermal feedback loop. The authors define different types of 
directed forces: a) thermal force and b) filling force. The thermal force is used to minimize 
temperature by keeping two hot modules away from each other. The filling force is used to 
eliminate overlap between blocks, and to evenly distribute modules. The power-thermal 
analysis model is used to minimize the leakage power along with chip temperature, area, 
wirelength, and via count. The reported floorplan solutions are better and the floorplanner
is faster than the thermal-driven 3-D floorplanning tool presented in [13].
These 3-D floorplanning algorithms provide different methodologies and focus on 
different sets of objectives during floorplan optimization. Each of them is unique and 
ground breaking. However, one important thing to notice is that they all perform 3-D 
floorplanning of 2-D modules only, and thus they limit the advantages of 3-D integration. 
2.7 A BASIC 3-D FLOORPLANNING TOOL 
At the beginning of this research work, a basic 3-D floorplanning tool was inherited from 
Benyi Wang, a previous researcher in the research group [101]. It is based on an 
evolutionary algorithm and Sequence Pair representation. A 3-D floorplan is represented by 
a set of sequence pairs which is called a Grouped Sequence Pair (GSP), proposed by Yu Xia 
for test scheduling of core based SoC [102]. It starts with a randomly generated initial set of
solutions (parents). Each solution goes through stochastic perturbation, and new solutions 
(offspring) are generated. Parents and offspring compete for survival and best fit set of 
solutions become the next generation parents. The algorithm iteratively searches for a 
better floorplanning solution (using a pre-defined set of moves) until a termination criterion is
met.
The tool included a module-splitting move which randomly selected a module and
aggressively split it into m parts, where m was the total number of device layers in a
3-D chip. Although it had a split move incorporated, it did not have any mathematical
model to quickly estimate the cost of splitting. Furthermore, the cost function of the
floorplanner included area and inter-module wirelength only. As a result, the split move had
virtually no effect on the solution quality because the optimization engine saw only
an increase in area due to splitting, but could not see any positive effect. Thus it mostly
rejected floorplan solutions with split modules. In addition, its 3-D constraint graph
based module packing algorithm had O(r!) time complexity, which made the tool very slow.
This packing algorithm will be elaborated in Chapter 3. 
The basic algorithm (which was inherited) was at the very initial stage of development. 
However, it provided a framework for continuing research as a part of this dissertation.
Figure 2.3 shows the flow chart of the initial 3-D floorplanning tool that was inherited at
the beginning of this research. Please note that the shaded parts of the algorithm in Figure 
2.3 indicate that significant changes were made as a part of this research. A detailed 
explanation of these changes will be presented in Chapters 3, 4, and 5. 
Figure 2.3: Flow-chart of a basic 3-D floorplanning tool inherited at the
beginning of the research: input; populate initial set of floorplans; perturb (unrestricted
swap, invert, rotate, exchange, change group, and split); pack modules (3-D graph based:
slow); compute fitness (area and inter-module wirelength only); select best fit set of
floorplans; repeat until termination, then output the optimized floorplan. The shaded boxes
indicate the portions of the algorithm that have been modified as a part of this research work.
CHAPTER 3: VERTICAL CONSTRAINTS IN SEQUENCE PAIR 
REPRESENTATION 
In this chapter, we focus on the vertical constraints of multi-layer 3-D ICs that enable 
vertical alignment of modules in different device layers. Vertical alignments, such as 
overlapping or non-overlapping constraints, might be necessary in designs such as a 
microprocessor composed of CPU and L2 Cache sub-blocks that are placed in consecutive 
device layers. The vertical alignment is also required in a bus-driven floorplanning design 
[66] where a set of blocks can be connected by a rectangular strip of bus (horizontally or 
vertically). In another application, a group of modules might be required to be vertically 
aligned within a large digital system. Examples of different applications of vertical 
constraints are shown in Figure 3.1. Furthermore, in a 3-D SoC design in which every 
device layer has digital and analog blocks as shown in Figure 3.1 (a), analog blocks residing in
different device layers may have to be vertically aligned for close connections, and isolated
from the rest of the circuits to avoid interaction with noisy blocks. An example would be an
audio amplifier block, TV output, and LCD driver, stacked on top of each other, with the
image processor integrated close to the LCD driver (on the same device layer). Similarly, a
video/camera interface is integrated on the same device layer as TV, and the audio
amplifier is close to the DSP block. In another case, if analog blocks have been placed in a 
particular device layer, these blocks may need to be kept away from noisy digital blocks as 
shown in Figure 3.1(b). An example of such a scenario is an FM and an audio amplifier 
block fabricated close to each other, yet separated from a DSP and/or an image
Figure 3.1: Examples of vertical constraints in a 3-D SoC: (a) module alignment and
bus planning, with the analog/RF blocks of each layer vertically aligned
together; (b) analog/RF blocks assigned to the top layer and separated
from noisy digital blocks (module repulsion).
processor block. These requirements must be known and preserved during the
floorplanning phase. 
In 2006, Law et al. [67] presented a vertical alignment in a 3-D floorplan using layered TCG 
to address the bus-driven 3-D floorplanning. However, the layered TCG based 3-D 
floorplanning is very slow and runtime grows rapidly with problem size (the runtime will be 
compared in Chapter 5). Furthermore, vertical constraints on other competitive 
representations such as sequence pair, and identification of feasible solutions to reduce the 
solution search space have not been explored. We will derive the vertical constraints on 
sequence pairs in this chapter. 
3.1 INTRODUCTION TO SEQUENCE PAIR 
In the sequence pair (SP) representation of a floorplan, two permutations $(\Gamma_+; \Gamma_-)$
of a set of modules are sufficient to define a 2-D packing [53]. For example:
(<... a ... b ...>; <... a ... b ...>)   place a to the left of b
(<... a ... b ...>; <... b ... a ...>)   place a above b
Thus a sequence pair provides the relative positions of modules without their physical
information. Given a sequence pair, one can construct an oblique lattice structure with the
lines labeled in the same order as they appear in $\Gamma_+$ and $\Gamma_-$. For example, let us assume that
$(\Gamma_+; \Gamma_-)$ = (<a, b, c, d>; <c, a, d, b>). Each module is placed at a lattice point
which is at the intersection of two lines with the same label as shown in Figure 3.2(a).
Figure 3.2(b) shows the resultant placement of modules satisfying their relative position as
defined by the sequence pair. Please note that if the shapes of modules are different, then 
the floorplan for the same sequence pair will be different. Figure 3.2(c) shows an example 
of a different floorplan of the same sequence pair due to a change in the sizes of modules. 
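The two rules above, together with their mirror images, determine the relative position of every pair of modules. The following sketch applies them to the example sequence pair; the function name `relation` is a hypothetical helper introduced for illustration.

```python
def relation(gamma_plus, gamma_minus, a, b):
    """Return how module a sits relative to module b in a sequence pair."""
    before_plus = gamma_plus.index(a) < gamma_plus.index(b)
    before_minus = gamma_minus.index(a) < gamma_minus.index(b)
    if before_plus and before_minus:
        return "left of"    # a precedes b in both sequences
    if before_plus and not before_minus:
        return "above"      # a precedes b in the first sequence only
    if not before_plus and before_minus:
        return "below"      # mirror image of "above"
    return "right of"       # mirror image of "left of"

if __name__ == "__main__":
    gp = ["a", "b", "c", "d"]
    gm = ["c", "a", "d", "b"]
    for x, y in [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]:
        print(x, relation(gp, gm, x, y), y)
```

For the example pair this gives a left of b, a above c, b above d, and c left of d, which matches the arrangement of Figure 3.2(b) with a and b in the upper row and c and d in the lower row.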
Figure 3.2: (a) oblique grid for $\Gamma_+$ = {a b c d} and $\Gamma_-$ = {c a d b}, (b) resultant
placement of blocks, and (c) different packing for the same sequence pair due
to change in sizes of modules.
Figure 3.3: Weighted graphs $G_h$ (left) and $G_v$ (right) of the sequence pair shown
in Figure 3.2.
Two weighted, acyclic digraphs (i.e. directed graphs) are constructed in which the modules 
at the lattice points form a vertex set. Figure 3.3 shows a method for constructing the 
horizontal and the vertical constraint graphs. In the horizontal graph $G_h$ an edge is placed
from module i to module j iff "i is to the left of j"; the vertical graph $G_v$ is built analogously
from the "below" relation. Then source and sink vertices are
added to each graph. A weight is associated with each edge of the horizontal (or vertical)
graph which indicates the width (or height) of the respective module. A longest path
algorithm applied on the horizontal and vertical graphs computes the width and height of
the chip as well as the co-ordinates of each module. 
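The longest-path computation can be sketched without building the graphs explicitly, because the second sequence is a topological order of both constraint graphs. The code below is a simple quadratic-time illustration, not the $O(n \lg \lg n)$ algorithm of [58]; the function name and the module sizes in the example are assumptions made for the sketch.

```python
def pack_sequence_pair(gamma_plus, gamma_minus, sizes):
    """Compute bottom-left coordinates from a sequence pair.

    sizes maps each module to (width, height). Returns (coords, chip_width,
    chip_height), where coords maps each module to its (x, y) position.
    """
    pos_plus = {m: i for i, m in enumerate(gamma_plus)}
    x = {m: 0 for m in gamma_plus}
    y = {m: 0 for m in gamma_plus}
    # Visiting modules in gamma_minus order guarantees that every module to
    # the left of or below module j has been finalized before j is visited.
    for idx, j in enumerate(gamma_minus):
        for i in gamma_minus[:idx]:
            if pos_plus[i] < pos_plus[j]:
                # i precedes j in both sequences: i is to the left of j.
                x[j] = max(x[j], x[i] + sizes[i][0])
            else:
                # i follows j in gamma_plus only: i is below j.
                y[j] = max(y[j], y[i] + sizes[i][1])
    coords = {m: (x[m], y[m]) for m in gamma_plus}
    chip_w = max(x[m] + sizes[m][0] for m in gamma_plus)
    chip_h = max(y[m] + sizes[m][1] for m in gamma_plus)
    return coords, chip_w, chip_h

if __name__ == "__main__":
    sizes = {"a": (4, 2), "b": (2, 3), "c": (3, 1), "d": (2, 2)}  # assumed
    print(pack_sequence_pair(list("abcd"), list("cadb"), sizes))
```

With the assumed sizes, the example sequence pair packs into a 6 by 5 bounding box with c at the origin, a above c, d to the right of both, and b above d.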
The Sequence Pair representation is extended to a quasi 3-D floorplan representation. A
classification of quasi 3-D floorplan representations is given in Chapter 2 (see Section
2.5). Thus a 3-D floorplan is represented by a set of sequence pairs, and we call it a Grouped
Sequence Pair (GSP). Each sequence pair of the GSP represents the module packing of one 
particular device layer. The sequence pairs representing different layers are independent of 
each other. Therefore inter-layer relations between modules are not included. Figure 3.4 
shows the example of a two-layer grouped sequence pair and its corresponding floorplan. 
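Since each layer has its own independent sequence pair, a GSP can be held as a plain list of pairs, and moving a module between layers only touches the two affected pairs. The sketch below uses the two-layer example of Figure 3.4; the function `move_between_layers` and its random-insertion policy are hypothetical illustrations and are not claimed to be the moves implemented in the tool.

```python
import random

def move_between_layers(gsp, module, dst, rng):
    """Return a new GSP with `module` moved to layer `dst`.

    gsp is a list with one (gamma_plus, gamma_minus) pair per device layer.
    The module is inserted at random positions of the destination pair.
    """
    new_gsp = [(list(gp), list(gm)) for gp, gm in gsp]
    for gp, gm in new_gsp:
        if module in gp:
            gp.remove(module)
            gm.remove(module)
    gp, gm = new_gsp[dst]
    gp.insert(rng.randint(0, len(gp)), module)
    gm.insert(rng.randint(0, len(gm)), module)
    return new_gsp

if __name__ == "__main__":
    gsp = [(list("abcd"), list("cadb")),   # layer 1
           (list("efgh"), list("gehf"))]   # layer 2
    print(move_between_layers(gsp, "a", 1, random.Random(0)))
```

Because the input is copied, a rejected move can be discarded without any undo logic.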
The advantages of GSP representation are that a GSP preserves all good properties (which 
were previously discussed in Section 2.4) of sequence pairs defined by P-admissibility, and it 
is very easy to perturb the solution search space on the GSP topological representation.
Furthermore, it is possible to design a fast module packing algorithm that satisfies vertical 
constraints on the geometrical floorplan. A fast packing algorithm has been developed as a 
part of this research and it will be presented in Section 3.6. Similar to the grouped sequence 
pair, it is possible to define vertical constraints on other quasi 3-D floorplan representations 
Figure 3.4: (a) An example of a two-layer grouped sequence pair (GSP), with
{<a b c d>, <c a d b>} as (Γ1+, Γ1−) and {<e f g h>, <g e h f>} as (Γ2+, Γ2−), and (b) its
corresponding floorplan in two device layers.
as well. However, the key point is whether the module packing and easy perturbation of the 
solution space can be achieved or not while satisfying the vertical constraints. For example, 
BSG, B*-tree, O*-tree, and layered TCG all have graph based module packing algorithms 
which have longer runtime than grouped sequence pair representation. Furthermore, 
satisfying vertical constraints further increases the runtime complexity of the module 
packing algorithm. Satisfying vertical constraints on true 3-D floorplan representations such 
as T-tree, 3-D slicing tree, and sequence triple will be easy because these representations 
treat modules as rectangular 3-D boxes. However, the solution search space has more 
redundancy in the z-axis data structure as described in Section 2.5. 
3.2 VERTICAL CONSTRAINTS ON SEQUENCE PAIRS 
Vertical constraints are specific relations between blocks that are assigned to different 
device layers. They are specified by defining geometrical relations between the xy-
coordinates of the bottom-left corner of blocks. The vertical constraints require geometrical 
information of module packing. As we discussed in the previous section, this geometrical 
information is not available in the sequence pair representation. However, blocks are 
assigned to different device layers and their vertical constraints should be established 
among multiple sequence pairs. Since the sequence pair specifies the relative positions of 
modules in a device layer, it is possible to identify a set of feasible GSPs that may lead to 
the fulfillment of the vertical constraints during geometric floorplanning. Thus in this work 
we derive feasibility conditions that identify feasible GSPs, which lead to satisfaction of 
vertical constraints during the geometric floorplan. These feasibility conditions can be 
checked on a GSP representation and our 3-D packing algorithms (Section 3.5 and Section 
3.6) will always satisfy the vertical constraints in the presence of these feasibility conditions. 
We call them GSP feasibility conditions, and they can be used in two ways: a) to detect and 
prune an infeasible GSP; or b) to guide stochastic moves to generate feasible GSPs only. 
Thus, the GSP feasibility conditions can significantly reduce the search space and therefore 
speed up the search process. In this chapter, we discuss vertical relations by using a simple 
example of specific vertical constraints that only require that the xy-coordinates of the 
modules under vertical constraint are the same, i.e. the modules are vertically aligned. 
3.3 TWO LAYER FEASIBILITY CONDITION
In this section, we present the feasibility conditions on a GSP that only contains two device 
layers. The two layer feasibility conditions are easy to explain. The two layer feasibility 
condition will be later extended to multiple device layers in this chapter. As it was discussed 
in Sections 3.1 and 3.2, each sequence pair in a GSP is independent of the other sequence 
pairs, and therefore the inter-layer placement information between two or more sequence 
pairs of a GSP is absent. Thus there is no straightforward method to detect the feasible set 
Figure 3.5: (a) Module pairs {A1, A2} and {B1, B2} under vertical constraints in
two device layers are shown. (b) If A1 is moved rightward as shown by the
arrow to align with A2, B1 moves further away from B2. Both pairs cannot be
aligned simultaneously, and thus are infeasible.
of GSPs. In this work, we study the relative order of modules in different sequence pairs of 
a GSP, and identify a set of common patterns that is present in each feasible GSP. We will 
elaborate our work using the following example: 
Let us consider an example of four modules A1, A2, B1 and B2 in two device layers. The
modules to be aligned vertically are labeled with the same letter and distinguished with the
subscripts that indicate their device layers. Thus A1 and A2 need to be aligned vertically.
Similarly B1 and B2 must be aligned vertically. Let us consider the example of an infeasible
GSP that is composed of the following sequence pairs representing two device layers:
Layer 1: {<... A1 ... B1 ...>; <... A1 ... B1 ...>}
Layer 2: {<... B2 ... A2 ...>; <... B2 ... A2 ...>}
According to these sequence pairs, A1 has to be to the left of B1 in layer 1 while A2 has to be
to the right of B2 in layer 2. Please notice that A1 and B1 are constrained in layer 1 along the
+X axis only because they have to satisfy and preserve their relative positions defined by
the sequence pair of layer 1. They are, however, allowed to have different Y coordinates as
shown in Figure 3.5(a). Similarly, A2 and B2 are restricted in the X-direction only. To satisfy
the vertical constraint, the {A1, A2} and {B1, B2} module pairs should align simultaneously.
It can be observed from Figure 3.5(b) that if A1 moves rightward to align with A2, B1 moves
further away to the right as well in order to preserve the sequence pair. Thus it results in B1
moving further away from B2. A similar observation can be made if B2 is moved rightward to
align with B1. It is clear that when aligning one pair, the modules in the other pair get even
more separated. Therefore these two module pairs can never overlap simultaneously which
implies that the vertical constraints imposed on these two module pairs can never be
satisfied for the given GSP. In the next sub-section we will define feasibility conditions for
these two pairs based on their relative orders in each of the sequences of the GSP.
3.4 GRAPH REPRESENTATIONS OF FEASIBILITY CONDITIONS 
In order to better visualize and represent the groups of feasible and infeasible solutions we
define a graph representation for the feasibility conditions. Let us assume that {Γ1+; Γ1−}
and {Γ2+; Γ2−} represent the sequence pairs of layer 1 and layer 2 respectively (see Figure
3.6(a)). For each sequence of the original GSP, we first construct a constrained sequence. We
scan a sequence from left to right, detect constrained modules, and record them in the
constrained sequence in the order of their detection. Using this procedure, we obtain {Ψ1+; Ψ1−}

Figure 3.7: Graph representation showing six out of eight cases which are
feasible conditions for two module pairs in two device layers.
Figure 3.8: Graph representation of infeasible sequence pair constraints
for two module pairs in two device layers.
examine the feasibility conditions using the xy-coordinates of the lower left corners of 
modules has been given in Appendix A. The infeasible cases are presented in Figure 3.8 in 
which all module pairs cannot be vertically aligned simultaneously. These cases can be 
identified on GSP and eliminated immediately without any unnecessary computation. The
unnecessary computation involves transformation of topological representation (i.e. GSP) 
to geometric module packing in each device layer. Since the floorplanning algorithms using 
stochastic search methods are iterative processes, the cost of unnecessary computation for 
module packing of infeasible solutions can add up and reduce the speed of the algorithm. 
The exact cost of this computation for each infeasible solution depends on the problem 
size (i.e. total number of modules in a floorplan) that will be discussed in Section 3.5. Please 
note that there are 25% infeasible cases (two out of eight configurations). A generic 
theorem for a feasibility condition for two module pairs in two layers is stated as theorem 1:
Theorem 1:
Feasibility Condition for two pairs of modules: Given a two-layer feasibility condition
graph G(V, E), there exists a feasible solution to the vertical constraint problem for two
module pairs represented by the graph if:
(a) G contains a clique of size K (K ∈ {3, 4}), or
(b) G contains two cliques of size 2, such that each clique contains only nodes of the same color.
The formal proof of theorem 1 is given in Appendix A. During software implementation,
theorem 1 can be implemented (satisfied) by checking if either {Ψ1+, Ψ2+} or {Ψ1−, Ψ2−} are
connected by an edge, i.e. modules in those node pairs are in the same order. Since we
compare the orders of modules (under vertical constraints), the feasibility conditions can be
expanded to more than two pairs of modules constrained in two device layers. For 
two layers. There are three possible ways of selecting two modules from the three modules 
of L1. Thus we can decompose L1 in three combinations of two modules as { { A1, B1}, { A1, 
C1}, {B1, C1}} such that modules in each pair maintain their relative order defined in L1.
Similarly { { A2, B2}, { A2, C2}, {B2, C2}} will be the decomposed pairs of L2. From this
decomposition, we will have three feasibility configurations of two module pairs as [ {A1,
decomposed configurations will map to Figure 3.7(a), i.e. they will satisfy theorem 1.
and (C1, C2) under vertical constraints can be aligned simultaneously. A theorem for a two
layer feasibility condition is stated as theorem 2:
Theorem 2:
D2, ...} are two sets of modules located in two different device layers L1 and L2
respectively. The packing on L1 and L2 are represented by Constrained Sequence Pairs SP1
and SP2 respectively. SP1 and SP2 are feasible if module pairs {(A1, A2), (B1, B2), (C1, C2),
(D1, D2), ...} can be vertically aligned simultaneously. The vertical alignment of all these
module pairs is feasible if:
Every combination of two module pairs decomposed from L1 and L2 (without changing their relative orders)
constructs a feasible configuration by satisfying theorem 1, i.e.
{u1, u2} and {v1, v2} form a feasible configuration ∀ {u1, v1} ∈ SP1; ∀ {u2, v2} ∈ SP2.
The formal proof of theorem 2 is given in Appendix A. The two layer feasibility condition
theorem can be easily extended to include modules vertically constrained in more than two
layers. Consider an L layer 3-D floorplan with vertical constraints. If the modules are
constrained in more than two layers, each combination of 2-device layers can be considered
and its feasibility can be detected based on theorem 2. If every such combination satisfies
theorem 2, then a solution to the vertical constraint will be feasible. The multi-layer
feasibility condition is stated as theorem 3:
Theorem 3:
Multi Layer Feasibility Condition: Given a set of multi-layer constrained sequence pairs,
there exists a feasible solution to the vertical constraint problem if:
Each combination of 2-device layers satisfies the 2-layer feasibility condition theorem (i.e. theorem 2).
The formal proof of theorem 3 is given in Appendix A.
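The order-comparison check described for theorem 1, applied to every pair of module groups (theorem 2) and every pair of layers (theorem 3), can be sketched as follows. This is only an illustration under stated assumptions: the names precedes and gsp_is_feasible and the input layout (one dict per aligned group, mapping a layer index to a module name) are hypothetical, and the check is the implementation shortcut quoted above, i.e. a configuration is accepted when the two constrained modules have the same relative order in both Γ+ sequences or in both Γ− sequences.

```python
from itertools import combinations


def precedes(sequence, u, v):
    """True if module u appears before module v in the sequence."""
    return sequence.index(u) < sequence.index(v)


def gsp_is_feasible(gsp, groups):
    """Check the GSP feasibility conditions by comparing relative orders.

    gsp:    list of (gamma_plus, gamma_minus), one pair per device layer.
    groups: list of dicts {layer index: module name}; the modules of one
            dict must be vertically aligned with each other.
    """
    # Theorem 3: examine every combination of two device layers.
    for la, lb in combinations(range(len(gsp)), 2):
        shared = [g for g in groups if la in g and lb in g]
        # Theorem 2: examine every combination of two module pairs.
        for g, h in combinations(shared, 2):
            plus_same = (precedes(gsp[la][0], g[la], h[la])
                         == precedes(gsp[lb][0], g[lb], h[lb]))
            minus_same = (precedes(gsp[la][1], g[la], h[la])
                          == precedes(gsp[lb][1], g[lb], h[lb]))
            # Theorem 1 shortcut: same order in the "+" sequences or in
            # the "-" sequences of the two layers.
            if not (plus_same or minus_same):
                return False
    return True


groups = [{0: "A1", 1: "A2"}, {0: "B1", 1: "B2"}]
# A1 left of B1 in layer 1, but A2 right of B2 in layer 2 (Figure 3.5).
infeasible_gsp = [(["A1", "B1"], ["A1", "B1"]), (["B2", "A2"], ["B2", "A2"])]
# A1 left of B1 in layer 1, A2 above B2 in layer 2.
feasible_gsp = [(["A1", "B1"], ["A1", "B1"]), (["A2", "B2"], ["B2", "A2"])]
print(gsp_is_feasible(infeasible_gsp, groups))
print(gsp_is_feasible(feasible_gsp, groups))
```

Because the check only reads the sequences, it can reject an infeasible GSP before any module packing is computed.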
3.5 3DCG: A MODULE PACKING ALGORITHM WITH VERTICAL 
CONSTRAINTS 
A module packing algorithm decodes the topological representation (i.e. data structure) of a
floorplan to a geometric floorplan. However, it does not search for an optimal solution. A
floorplanning algorithm searches for an optimal solution, and uses the module packing
algorithm as a decoder of a floorplan representation to a real floorplan. Thus, module
packing is just a small part of the overall floorplanning algorithm as seen in Figure 2.3. A
good module packing algorithm should be able to quickly translate a topological
representation into a geometric floorplan satisfying the vertical constraints. In a quasi 3-D
representation, module packing of each layer is generally performed independently because 
each device layer has its own independent topological representation. Thus the packing of 
each device layer does not have information about the module packing of the remaining 
device layers in a 3-D floorplan. In case of vertical alignment of modules located in 
different device layers, their vertical alignment information has to be shared across multiple 
device layers for the packing algorithm to satisfy the vertical constraints. We present a 
graph-based algorithm for module packing that satisfies vertical constraints among modules
in different device layers in a 3-D floorplan. This algorithm was inherited from Benyi
Wang, a former Ph.D. student of my advisor, Prof. Chrzanowska-Jeske. The algorithm was
a part of the basic 3-D floorplanning algorithm [101] discussed in Section 2.7.
In this method, we create two global 3-D constraint graphs (3-D-X and 3-D-Y) to evaluate
the 3-D module packing. The global constraint graphs combine the geometric floorplan of
all device layers and help us define the vertical constraints of modules that are located in 
different device layers. We will only elaborate the method to create a 3-D-X constraint 
graph in this section because the same method is valid for 3-D-Y as well. Let us assume that 
there are only two device layers in the 3-D floorplan. Layer 1 has four modules {1,2,3,4} 
and layer 2 contains modules {5,6,7,8}. Across these two layers, module 2 has to be
vertically aligned with module 7. We use the following steps to create the global 3-D-X 
constraint graph: 
1. Create the 2-D constraint graph Gh along the X-axis [53] for each device layer from 
the sequence pair. Thus we obtain two such constraint graphs corresponding to 
two device layers as shown in Figure 3.9(a) and Figure 3.9(b). Each directed edge of 
the graph is assigned a weight (not shown in Figure 3.9) equal to the width of a 
module connected at the tail of the edge. Edges connected to the source and sink 
nodes have zero weights. 
2. We merge the constraint graphs of both layers by adding a global source node in the 
3-D-X graph and connecting it with the source nodes of both device layers (src1 
and src2) using two zero weight edges (see Figure 3.9(c)). Similarly we add a global 
sink node and connect the sinks of each layer with zero weight edges. 
3. If node 2 and node 7 have to be vertically aligned, then we insert two edges of zero
weight between node 2 and node 7 in cyclic (i.e. 2 → 7 and 7 → 2) fashion to represent
the vertical alignment as shown in Figure 3.9(c). Please note that in the case of 
partial vertical alignment, where two modules need to be partly overlapped, these 
two cyclic edges can be assigned non-zero weights. 
Similarly, we create a 3-D-Y constraint graph. The Bellman-Ford shortest path algorithm
is used to find the critical path (by changing the signs of edge-weights) on 3-D-X and 3-D-Y
in order to calculate the module packing along with the chip size. The Bellman-Ford
algorithm also detects a negative cycle that indicates no solution. Our packing algorithm is
similar to a 1-D compaction algorithm [68] and it has O(n³) runtime complexity [68].
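The construction of the 3-D-X graph and its evaluation can be sketched as below. This is a minimal sketch and not the inherited implementation: the function name pack_x_3dcg, the sequence pairs and the widths are hypothetical values chosen for the two-layer example with modules {1,2,3,4} and {5,6,7,8}. Two simplifications are assumed: the global and per-layer source nodes are collapsed into an initial coordinate of zero for every module, and the longest path is relaxed directly, which is equivalent to running Bellman-Ford on sign-flipped weights (a positive cycle here corresponds to the negative cycle mentioned in the text).

```python
def pack_x_3dcg(layers, widths, aligned):
    """Compute x-coordinates from a merged 3-D-X constraint graph.

    layers:  list of (gamma_plus, gamma_minus), one pair per device layer.
    widths:  dict mapping every module to its width (at least one module).
    aligned: list of (u, v) module pairs that must share the same x.
    Returns (x, chip_width), or None if the constraints contain a cycle
    of positive weight (no solution).
    """
    edges = []
    for gamma_plus, gamma_minus in layers:
        rank = {m: i for i, m in enumerate(gamma_minus)}
        for i, a in enumerate(gamma_plus):
            for b in gamma_plus[i + 1:]:
                if rank[a] < rank[b]:
                    # a is left of b; weight = width of the tail module.
                    edges.append((a, b, widths[a]))
    for u, v in aligned:
        # Two zero-weight edges in cyclic fashion force x[u] == x[v].
        edges.append((u, v, 0))
        edges.append((v, u, 0))

    x = {m: 0 for m in widths}  # zero-weight edges from the global source
    for _ in range(len(x)):
        changed = False
        for a, b, w in edges:
            if x[a] + w > x[b]:
                x[b] = x[a] + w
                changed = True
        if not changed:
            break
    else:
        return None  # still relaxing after |V| rounds: infeasible
    chip_width = max(x[m] + widths[m] for m in x)
    return x, chip_width


# Hypothetical sequence pairs and widths for the two-layer example.
layers = [([1, 2, 3, 4], [3, 1, 4, 2]), ([5, 6, 7, 8], [5, 6, 7, 8])]
widths = {1: 2, 2: 3, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 1}
print(pack_x_3dcg(layers, widths, aligned=[(2, 7)]))
print(pack_x_3dcg(layers, widths, aligned=[(2, 7), (5, 8)]))
```

In the second call, module 5 is to the left of module 8 in layer 2 and is also required to align with it, so the relaxation never converges and the sketch reports no solution.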

3.6 LCSLS: A FAST MODULE PACKING ALGORITHM WITH VERTICAL
CONSTRAINTS 
The module packing algorithm using 3-D constraint graphs (3DCG) discussed in the
previous section is slow due to its O(n³) runtime complexity. In addition, it requires the
construction of graphs from the grouped sequence pair (GSP) which takes further
computational effort. In this section, we present an alternative packing algorithm, developed
as a part of this research, that uses Longest Common Subsequence (LCS) and Lateral Shifting
and does not require the creation of large constraint graphs like the 3DCG
algorithm. We call it the LCSLS algorithm.
The LCSLS is a fast algorithm for module packing with vertical constraints. We discuss the 
strategy for packing with alignment of constrained modules along the X-axis because the 
same strategy is valid for alignment along the Y-axis. As the name of our algorithm says, we 
compute packing of each device layer by LCS [58] and then laterally shift the modules 
under vertical constraint to align them vertically. However, the order in which modules are
shifted is very important for achieving a minimum packing with a minimum number of lateral
shifts. We achieve that by performing a topological sort on a constrained adjacency list Adj_X. The
construction steps of Adj_X are as follows: 
Let {A1, B1, C1, D1, E1}, and {A2, B2, C2, D2, E2} be the modules in Layer 1 and Layer 2 as
shown in Figure 3.10(a). An edge A1 → B1 in Figure 3.10(a) shows that A1 is to the
left of B1 which is obtained from GSP. A small subgraph is created in each device layer by
introducing edges between modules. Let A = {A1, A2}, B = {B1, B2}, ..., E = {E1, E2}

a directed acyclic graph (DAG), we check for the presence of cycles while sorting, and a
cycle indicates no solution. The main steps of the LCSLS algorithm are as follows:
1. Topologically sort the Adj_X and obtain a list L containing the sorted sub-module
groups. If cycles are detected during topological sorting, exit/terminate because there
will be no solution. (Please note that, similar to the Bellman-Ford algorithm used to
find packing in the 3DCG algorithm, LCSLS also detects a cycle for infeasible packing).
2. Calculate module packing along the X-axis in each device layer using the Longest
Common Subsequence (LCS) [58] method in O(n lg lg n) runtime. This will give the
x-coordinates of each module without vertical alignment and x-dimensions of all
layers of the chip.
3. Extract a group g from the topologically sorted list L.
4. Shift the x-coordinate of a module mi (in layer i), mi ∈ g, along the +X-axis to align
it with the rightmost module of g. For example, let us assume that g = {A1, A2, A3,
A4} and all elements of g are located in four different device layers. If A4 has the
largest x-coordinate then all other members of g should be aligned with A4 only.
5. Shift all modules which are to the right of mi in layer i by the minimum distance
such that there is no overlap caused due to the shifting of mi.
6. Repeat steps 4 and 5 for all mi ∈ g.
7. Go to step 3 and repeat the process for all g ∈ L.
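The steps above can be sketched for the X-axis as follows. This is a hedged illustration rather than the LCSLS implementation itself, and it rests on several assumptions: the names left_of_pairs, propagate and lcsls_pack_x are hypothetical; the adjacency between groups is built by adding an edge from group g to group h whenever a member of g is to the left of a member of h in some layer, which is only one plausible reading of Adj_X; and the initial packing of step 2 uses a simple quadratic longest-path evaluation in place of the O(n lg lg n) LCS method of [58].

```python
from collections import deque


def left_of_pairs(gamma_plus, gamma_minus):
    """All (a, b) with a left of b, ordered by the position of a in gamma_plus."""
    rank = {m: i for i, m in enumerate(gamma_minus)}
    return [(a, b) for i, a in enumerate(gamma_plus)
            for b in gamma_plus[i + 1:] if rank[a] < rank[b]]


def propagate(x, pairs, widths):
    """Push modules rightward by the minimum distance that removes overlap."""
    for a, b in pairs:
        x[b] = max(x[b], x[a] + widths[a])


def lcsls_pack_x(layers, widths, groups):
    """layers: list of (gamma_plus, gamma_minus); groups: list of module lists."""
    pairs = [left_of_pairs(gp, gm) for gp, gm in layers]
    group_of = {m: g for g, members in enumerate(groups) for m in members}
    layer_of = {m: i for i, (gp, _) in enumerate(layers) for m in gp}

    # Assumed construction of the constrained adjacency list between groups.
    succ = [set() for _ in groups]
    for layer_pairs in pairs:
        for a, b in layer_pairs:
            if a in group_of and b in group_of:
                succ[group_of[a]].add(group_of[b])

    # Step 1: topological sort (Kahn's method); a cycle means no solution.
    indegree = [0] * len(groups)
    for g in range(len(groups)):
        for h in succ[g]:
            indegree[h] += 1
    queue = deque(g for g in range(len(groups)) if indegree[g] == 0)
    order = []
    while queue:
        g = queue.popleft()
        order.append(g)
        for h in succ[g]:
            indegree[h] -= 1
            if indegree[h] == 0:
                queue.append(h)
    if len(order) < len(groups):
        return None

    # Step 2: unaligned packing of every layer (stand-in for the LCS method).
    x = {m: 0 for m in widths}
    for layer_pairs in pairs:
        propagate(x, layer_pairs, widths)

    # Steps 3 to 7: align each group with its rightmost member, then shift.
    for g in order:
        target = max(x[m] for m in groups[g])
        for m in groups[g]:
            if x[m] < target:
                x[m] = target
                propagate(x, pairs[layer_of[m]], widths)
    return x, max(x[m] + widths[m] for m in x)


# Hypothetical two-layer example: module 2 must align with module 7.
layers = [([1, 2, 3, 4], [3, 1, 4, 2]), ([5, 6, 7, 8], [5, 6, 7, 8])]
widths = {1: 2, 2: 3, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 1}
print(lcsls_pack_x(layers, widths, groups=[[2, 7]]))
```

Processing the groups in topological order means that a shift only ever moves modules belonging to groups that have not been aligned yet, so a group that is already aligned is not disturbed by later shifts.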
Now we perform a complexity analysis of the proposed LCSLS algorithm. Let s be the
number of vertically constrained groups, and n be the total number of modules such that s
< n. The topological sort of step 1 will take O(s + e) ~ O(l) time, where e is the total number of
edges. Step 2 takes O(n lg lg n) time. Step 3 executes in O(1) time. Steps 4 - 5 take O(n) time
and step 6 takes constant time because the size of g can at most be equal to the total
number of device layers which is very small and fixed. Thus, steps 4 - 6 take O(n) time and
step 7 repeats them s times in a loop. Therefore, the runtime from steps 3 - 7 will be O(s.n).
Finally, the total runtime of LCSLS (step 1 to step 7) will be O(l + n lg lg n + s.n).
However, in most practical cases, the total number of vertically constrained groups will be
much smaller than the total number of modules, i.e. s << n. In those cases, the runtime
will be reduced to O(n lg lg n) only. In the worst case scenario (s ≈ n) the runtime will be O(n²),
which is still faster than the 3DCG packing algorithm. The LCSLS algorithm can also be
used for partial vertical alignment by controlling the spatial distance during lateral shifting.
Both 3DCG and LCSLS algorithms produce identical floorplans for a given GSP and a set 
of vertical constraints. The quality of an optimized floorplan solution is determined by the 
design of a floorplanning algorithm instead of a module packing algorithm. Module packing 
algorithms only determine how fast they can decode a topological floorplan representation 
to a geometric floorplan. Thus module packing algorithms only affect the runtime of a 
floorplanning algorithm. From the time complexity analysis of 3DCG and LCSLS 
algorithms, it appears that 3DCG has O(n³) complexity, whereas LCSLS has average case
O(n lg lg n) and worst case O(n²) complexity. Thus LCSLS is asymptotically faster than the
3DCG algorithm which will be experimentally verified in Chapter 4 (please see Section 4.9). 
CHAPTER 4: PLACEMENT-AWARE 3-D FLOORPLANNING 
4.1 MOTIVATION 
As discussed in Chapter 2 (Section 2.6), the existing 3-D floorplanning algorithms
[13],[45],[64],[65] assume that an entire module is placed on one device layer. These
algorithms do not consider the possibility of distributing cells of a module in multiple
device layers to reduce its wirelength. In other words, floorplanning solutions might benefit
from 3-D placements of cells within some modules. If a 3-D floorplan is completed
without considering 3-D placement inside modules, it is too late to perform 3-D cell
placement because a particular module has been assigned to a single device layer. This idea
came from the unpublished work of Benyi Wang and Chrzanowska-Jeske [101] which
provided a groundwork for this research. However, the advantages of 3-D placement have
been reported by many researchers. For example, a min-cut partitioning based 3-D
placement [3], used on ISPD'98 placement benchmark circuits, indicates a 28% - 51%
reduction in total wirelength [14] (for 2 to 5 layer 3-D placement) compared to the total
wirelength of a 2-D implementation of the same benchmark. Furthermore, in 2008, Ma et
al. [94] presented a case study of the design driver of a superscalar microprocessor. Ma et al.
first implemented the design driver as a 3-D chip consisting of only 2-D micro-architectural
blocks, performed 2-D placement & routing within each block, and recorded the chip's
performance, temperature and power. In the next step, Ma et al. implemented the
micro-architectural blocks of the design driver of the superscalar microprocessor as 3-D modules.
For each block, they performed different 3-D block implementations by varying the number
of device layers in each implementation. Thus for a particular block, they performed
one-layer, two-layer, three-layer, and four-layer 3-D module implementations. Next they
performed thermally driven 3-D placement of logic gates inside all 3-D modules, and
characterized these modules in terms of temperature, power, and speed. They developed a
3-D box packing tool, and treated each module as a 3-D box to perform floorplanning
using simulated annealing based optimization to select 3-D modules (based on each
module's characterized values in terms of speed, power and temperature) to optimize the
speed, temperature and power of the new 3-D implementation of the design driver of the
superscalar microprocessor. Please note that this case study used a block selection based
3-D floorplanner and requires multiple 3-D implementations of the same block (by varying
the number of device layers for each block), which might not be feasible for a real design
containing a large number of modules because it will increase the design cost and the design
time.
Finally Ma et al. compared a) the initial 3-D implementation in which all modules were
implemented as 2-D modules, with b) the new 3-D implementation in which modules were
3-D modules and each module had a different number of device layers. During the
comparison, Ma et al. reported that multi-layer (3-D) micro-architectural blocks (i.e. case b)
can improve performance by 14%, reduce power by 10% - 30%, and decrease temperature
by 11% compared to single-layer (2-D) blocks (i.e. case a) in 3-D microprocessors. Thus
3-D placement of cells inside modules can further reduce total wirelength, power and
temperature resulting in improved performance of 3-D ICs.


Figure 4.2: Sub-module pair placed in (a) consecutive (b) alternate device
layers. Sub-modules in the consecutive device layers minimize the TSV height.
Figure 4.3: Sub-module pair with (a) the same planar location (b) different
planar locations. Vertical constraint with the same planar location of
sub-modules produces smaller intra-module wirelength.
split into non-identical rectangles, a new model to estimate the wiring cost of splitting will 
be required. The evaluation of wiring cost and the need for a wirelength estimation model 
will be discussed in Section 4.5. In addition, new 3-D placement and 3-D routing tools will 
need to be developed. 
4.3 PROBLEM FORMULATION 
Consider a set of n rectangular modules where each module Mi has a fixed area Ai, width Wi
and height Hi connected by m nets. Let (ki, pi) be the Rent's parameters of Mi and L be the
total number of fixed active layers. Let (xi, yi, zi) denote the lower left corner of the module
Mi, where 1 ≤ zi ≤ L and zi belongs to the set of natural numbers. A module can be split
into S sub-modules. A placement-aware 3-D floorplan is an assignment of (xi, yi, zi) for
each Mi and each sub-module Mij of a split module Mi such that no two modules or
sub-modules on the same device layer overlap, and all sub-modules Mij of the same module Mi
satisfy the placement-aware constraints. We seek a solution to the following problem:
Placement-aware 3-D Floorplanning with Vertical Constraints Problem (3-D FVC):
Given a technology node, a set of n rectangular modules, with areas, aspect ratios and known Rent's
parameters, connected by m nets in L device layers, and a set of placement-aware constraints, find a 3-D
floorplan that satisfies all the placement-aware constraints and minimizes chip area, inter-module and
intra-module wirelength while controlling the number of inter-module and intra-module vias.
Please note that Rent's parameters (k, p) can be preprocessed from the optimized netlist
(after logic synthesis) of modules as reported in [77]. A brief description of the Rent's
parameter extraction is given in Appendix C.
4.4 STOCHASTIC COMBINATORIAL OPTIMIZATION 
We use grouped sequence pair (GSP) [102] representation and an evolutionary algorithm 
(EA) [101] in our 3-D floorplanning algorithm. Unlike simulated annealing (SA), an 
evolutionary algorithm (EA) processes a population of potential solutions in parallel (a 
detailed description of SA and EA was presented in Section 2.3). Each individual in the
population is a unique solution. Parents are subjected to stochastic "reproduction" that
produces offspring. All parents produce one or more offspring at each generation; parents
and offspring are ranked according to fitness and best fit individuals become the next
generation population. This subset of an evolutionary algorithm is known as evolution strategy
(ES), in which the selection process is purely deterministic. In evolution strategy mutation is
used as the main operator [103]. Recombination (i.e. crossover) with small probability can
be used along with mutation to escape out of local minima. However, it is not a "must
have" condition in evolution strategy. A simple crossover can be achieved by mixing the
sequences of sequence pairs from two parents. Such crossover will create a very large
perturbation in the solution search space. Furthermore, in the presence of vertical
constraints, the crossover operator will most likely produce an infeasible solution.
Therefore recombination is not included and we use mutation operators only. At the
beginning, we randomly construct a different Grouped Sequence Pair (GSP) for each individual
solution. Although many CAD algorithms use simulated annealing, we chose the
evolutionary algorithm because we wanted to explore parallel searching of the solution
space. The conclusion will be presented in Section 4.9 while comparing the results of our
EA based algorithm with the results of other SA based algorithms.
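The mutation-only evolution strategy with deterministic selection described in this section can be sketched as follows. The sketch is illustrative only: the names swap_mutation, evolve and toy_cost are hypothetical, a single swap move stands in for the full set of moves, each parent produces exactly one offspring, and toy_cost is a placeholder for the real floorplan cost (area, wirelength and vias), which is not modeled here.

```python
import random


def swap_mutation(gsp, rng):
    """Return a copy of the GSP with two modules swapped in one sequence."""
    child = [(list(gp), list(gm)) for gp, gm in gsp]
    layer = rng.randrange(len(child))
    sequence = child[layer][rng.randrange(2)]  # gamma_plus or gamma_minus
    if len(sequence) >= 2:
        i, j = rng.sample(range(len(sequence)), 2)
        sequence[i], sequence[j] = sequence[j], sequence[i]
    return child


def evolve(population, cost, generations, rng):
    """Parents and offspring compete; the best individuals survive."""
    for _ in range(generations):
        offspring = [swap_mutation(parent, rng) for parent in population]
        ranked = sorted(population + offspring, key=cost)
        population = ranked[:len(population)]  # deterministic selection
    return population


def toy_cost(gsp):
    """Placeholder cost: modules of gamma_plus that are out of sorted order."""
    return sum(m != t for gp, _ in gsp for m, t in zip(gp, sorted(gp)))


rng = random.Random(0)
modules = list("abcd")
# One single-layer GSP per individual, each built randomly.
population = [[(rng.sample(modules, 4), rng.sample(modules, 4))]
              for _ in range(6)]
final = evolve(population, toy_cost, generations=20, rng=rng)
print(min(toy_cost(g) for g in final))
```

Because parents take part in the ranking, the best cost in the population can never get worse from one generation to the next.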
4.5 COST ESTIMATION OF WIRELENGTH REDUCTION DUE TO MODULE 
SPLITTING INSIDE 3-D MODULES 
Floorplanning is usually performed when no module layouts are given. Therefore it is 
unknown what the total wirelength is within a 2-D module and how it will change when the
module is partitioned into multiple sub-modules. Statistical wirelength prediction methods
become very useful in this scenario. Several wirelength distribution models are widely
available in the literature for system level interconnect prediction of square-shaped 2-D and
3-D chips [1],[71],[72],[73]. Some of these models have been further extended to handle
rectangular shapes [32]. These models show significant reduction in wirelength of 3-D
chips compared to 2-D ICs. The models are based on Rent's Rule [74]:
T = k N^p (4.1)
where T is the number of I/O terminals of the chip/module, N is the total number of
gates, k is Rent's coefficient and p is Rent's exponent. Additionally, the 3-D wirelength
distribution models assume that all device layers have the same area and aspect ratio, and
are placed exactly at the same x-y coordinates. To be able to use these 3-D wirelength
models we split a module into sub-modules of the same size only. Please note that in case
of non-identical sub-modules, a new mathematical/statistical model for wirelength
estimation will need to be developed. Furthermore, it is desirable to have an analytical solution
for the mathematical model to quickly estimate intra-module wirelength during
floorplanning. Due to a non-trivial mathematical model, there is no direct extension from
the previous work. It will require further research in this direction.
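As a small worked example of Rent's Rule (4.1), the terminal count of a module can be evaluated directly from its parameters. The function name rent_terminals and the numeric values of k, p and N below are hypothetical and chosen only for illustration; they are not taken from the benchmark data of this work.

```python
def rent_terminals(k, p, n_gates):
    """Rent's Rule (4.1): number of I/O terminals T = k * N**p."""
    return k * n_gates ** p


# Hypothetical module: Rent's coefficient 4.0, Rent's exponent 0.5,
# and 10,000 gates.
print(rent_terminals(k=4.0, p=0.5, n_gates=10_000))
```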
We assume that a) Rent's parameters (k, p) and the average fanout of logic gates inside each
module are given, b) a rectangular module contains only 2-input NAND gate based
circuitry, and c) gates are organized into a homogeneous array inside the module. Based on
the area of a 2-input NAND gate cell, we estimate the total number of gates N 
Let us assume that we start with a rectangular 2-D module containing N gates and T 
input/ output terminals which is split into two identical sub-modules. To find the 
wirelength reduction due to such partition, we estimate total wirelength inside the initial 2-
D module and the final 3-D module using our rectangular 2-D and 3-D wirelength 
distribution models [32]. The models in [32] have been generalized by incorporating the
aspect ratios (ar) of the modules as input to the models. Our models also converge with
models from [1],[71],[72],[73] for ar=1 which is a special case of square shape. A brief
derivation of the rectangular wirelength distribution models and our analytical solutions is
given in Appendix B. We have also derived the closed form analytical solutions for total
wirelength estimation inside 2-D/3-D modules and therefore we are able to estimate the
wirelength reduction in constant time. Experimental results related to our rectangular 
wirelength distribution models have been shown in Appendix B. 
We have observed from our experiments that when 2-D modules are split, the reduction in 
wirelength inside the 3-D module varies between 28 to 50% for 2 to 5 device layers, which 
is in close agreement with the 3-D placement result reported in [3]. The authors of [3] use 
ISPD'98 placement benchmark circuits and perform placement of logic cells in 2-D/3-D
designs and report percentage reduction in wirelength in 3-D chips with respect to 2-D
designs. Das et al. [75] have reported similar reduction in wirelength (see Figure 4.4) of 3-D
designs with respect to 2-D designs which matches closely with the actual wirelength
obtained after placement and routing of ISPD'98 placement benchmark circuits. This
verifies the accuracy of our estimation. In addition, we estimated the total wirelength of 18 
IFU control logic circuitry (2-D modules) of an IBM POWER4 dual core microprocessor 
[Figure 4.4 plot: normalized wirelength (y-axis, 0.5 to 0.9) versus number of device layers
(x-axis, 2 to 5). Series: predicted (placement-Rent), predicted (partition-Rent), placed, routed.]
Figure 4.4: Predicted vs. placed and routed wirelengths of ISPD'98
benchmark circuits. Wirelength of 3-D placement and routing is normalized
w.r.t. 2-D design to show the percentage reduction due to 3-D design [75].
TABLE 4.1: COMPARISON OF TOTAL MEASURED WIRELENGTH (226827.0 GATE
PITCHES) OF 18 IFU CIRCUITS OF IBM POWER4 WITH THEIR ESTIMATED
WIRELENGTH USING DIFFERENT WIRELENGTH DISTRIBUTION MODELS.
Wirelength Estimation Method    Estimated Wirelength (gate pitch)    Error (%)
Our ...                         212684.0                             -6.24
Our ...                         225139.6                             -0.74
...                             141256.9                             -38.0
...                             148317.0                             -35.0
...                             206593.7                             -9.0
and compared them with the actual (measured) wirelength obtained from [76]. These IFU
modules have different rectangular shapes spread over a wide spectrum of aspect ratios. A
cumulative estimation of the wirelength is shown in Table 4.1. The Rent's parameters (k, p)
and (k*, p*) were taken directly from the tables reported in [76],[77]. The top two rows
show our total wirelength estimations, which are closest to the measured total wirelength
when compared with the other existing methods reported in [76]. This again validates our
estimation.
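The error column of Table 4.1 can be reproduced from the listed estimates. The sketch below assumes that the error is the signed relative difference between an estimate and the measured total; the function name is illustrative and not part of the original tool.

```python
MEASURED = 226827.0  # total measured wirelength in gate pitches (Table 4.1)


def relative_error_percent(estimated, measured=MEASURED):
    """Signed relative error of an estimate with respect to the measurement, in percent."""
    return 100.0 * (estimated - measured) / measured


# Estimated totals in the row order of Table 4.1.
estimates = [212684.0, 225139.6, 141256.9, 148317.0, 206593.7]
errors = [round(relative_error_percent(e), 2) for e in estimates]
```

A negative value means that the model underestimates the measured wirelength; the last three rows of the table are quoted at a coarser rounding than the first two.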
4.6 PERTURBATION OF SOLUTION SPACE 
In every generation, each floorplan is perturbed by randomly selecting one move from a set
of pre-defined moves. The initial software inherited from [101] consisted of swap, invert, rotate,
exchange, change group, and module-split moves. However, all the existing moves have been
modified as a part of this dissertation such that the new algorithm searches within the feasible
search space. In addition, some new moves such as insert, submodule merge, and change-feasibility
configuration have been introduced.
1) Swap: positions of two randomly chosen modules are exchanged; we perform this
operation either between non-constrained modules or between constrained sub-modules.
For example, let us assume that modules b and e have been chosen in the
initial sequence <a, x2, b, c, d, e, y2> for swapping. The sub-modules x2 and y2 in the
sequence are under vertical constraints (with their respective sub-modules in another
device layer) but b and e are not under vertical constraints. Therefore swapping
between b and e is allowed. After performing the swap operation, the final sequence
will become <a, x2, e, c, d, b, y2>. When swapping occurs between constrained sub-modules
(such as swapping between x2 and y2), the feasibility configuration is kept
unchanged by the method that will be explained in the Module-split move.
2) Invert: the order of a sequence between two randomly chosen points is reversed. The
invert operation is achieved by repeatedly swapping modules from these two points
and moving inward. In addition, modules under vertical constraints are
not swapped, so that the feasibility conditions are preserved. For example, let us assume
that the sequence <a, x2, b, c, d, e> needs to be completely reversed and x2 is under
vertical constraint. Then we first take "a" and "e" from the two extreme ends and swap
them. Next we choose "x2" and "d", but do not swap them because x2 is under vertical
constraint. After that, "b" and "c" are swapped. Thus the inverted sequence becomes
<e, x2, c, b, d, a>.
3) Insert: randomly select a non-constrained module in a sequence and move it to another
randomly selected location in that sequence. For example, let us assume that we are
given a sequence <a, x, b, c, d, e, y> in which a non-constrained module b has been
randomly chosen to be moved from its original location to a location just before
module e in the sequence. After the insert operation, the new sequence becomes <a, x,
c, d, b, e, y>. Since this move operates on non-constrained modules, it does not
disturb the feasibility configuration.
4) Rotate: swap a module's width and height. We only rotate the modules which are not
square shaped. This move operates on the geometric information of a module (i.e.
height and width), and therefore it does not disturb the feasibility configuration.
5) Exchange: positions of two randomly chosen non-constrained modules on two different
device layers are exchanged. For example, let us assume that the sequence pairs of a two
layer 3-D IC are SP1 and SP2 such that SP1 = {<a, b, c, d, e>, <a, b, c, d, e>}, and
SP2 = {<u, v, w, x, y, z>, <z, y, x, w, v, u>}. If module b of SP1 is randomly chosen
to be exchanged with a randomly chosen module w of SP2, then we simply exchange
these modules in both sequences of SP1 and SP2. The resultant sequence pairs
become SP1 = {<a, w, c, d, e>, <a, w, c, d, e>}, and SP2 = {<u, v, b, x, y, z>,
<z, y, x, b, v, u>}. Please notice that modules w and b have been exchanged between
SP1 and SP2. Since exchange is performed on non-constrained
modules, this operation does not disturb the feasibility configuration.
6) Change group: randomly select a non-constrained module from a device layer and move it 
to a randomly selected different device layer. This move on non-constrained modules does 
not disturb the feasibility configuration. 
7) Module-split: split a randomly chosen module w into sub-modules; place them in
consecutive device layers and preserve the feasibility conditions. As we can see from
Figure 4.4, the single largest wiring reduction is achieved at two device layers
(approximately 30% reduction with respect to the 2-D design). When a module is split into
more than two layers, the incremental reduction in wirelength drops sharply and
the reduction saturates at four device layers. Thus we split modules into two sub-modules
only. In addition, we restrict the feasibility configuration such that every
combination of two sub-module pairs constructs the same type of feasibility
configuration, as will be explained at the end of the move description.
8) Submodule-merge: merge all sub-modules of a previously split module and restore its
original shape. The restored module is placed in a device layer that is randomly chosen
from the device layers of its sub-modules.
9) Change-feasibility configuration: change the present feasibility configuration to a
different feasibility configuration (Figure 3.7). For example, if the present configuration
is a clique of size 4 as shown in Figure 3.7(a), change it to a clique of size 3 as shown in
Figure 3.7(c) by simply swapping the constrained modules in 'I'~ to disconnect this node
from the original clique of size 4. In contrast, when we change the feasibility
configuration from a clique of size 3 to a clique of size 4, we reorder the
constrained sequence 'I'~ by moving a constrained module from one position to
another.
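The sequence-level moves above can be sketched on plain Python lists. This is a minimal illustration under stated assumptions, not the dissertation's C++ implementation: function names and signatures are hypothetical, the set `constrained` holds the sub-modules under vertical constraints, and swaps between constrained sub-modules (which need the feasibility bookkeeping of the module-split move) are left out.

```python
def swap(seq, i, j, constrained):
    """Swap positions i and j when both modules are non-constrained."""
    out = list(seq)
    if out[i] not in constrained and out[j] not in constrained:
        out[i], out[j] = out[j], out[i]
    return out


def invert(seq, lo, hi, constrained):
    """Reverse seq[lo..hi] by swapping inward, skipping pairs with a constrained module."""
    out = list(seq)
    while lo < hi:
        if out[lo] not in constrained and out[hi] not in constrained:
            out[lo], out[hi] = out[hi], out[lo]
        lo, hi = lo + 1, hi - 1
    return out


def insert(seq, module, before, constrained):
    """Move a non-constrained module to the slot just before `before`."""
    if module in constrained:
        return list(seq)
    out = [m for m in seq if m != module]
    out.insert(out.index(before), module)
    return out


def exchange(sp_a, sp_b, m_a, m_b):
    """Exchange m_a (one layer) with m_b (another layer) in both sequences of each pair."""
    new_a = tuple([m_b if m == m_a else m for m in s] for s in sp_a)
    new_b = tuple([m_a if m == m_b else m for m in s] for s in sp_b)
    return new_a, new_b
```

Applied to the sequences used in the move descriptions, these functions reproduce the resulting sequences quoted in the text.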
During the module-split move, we restrict the feasibility configuration such that every
combination of two sub-module pairs constructs the same type of feasibility configuration. For example,
let us consider sub-modules {A1, B1, C1} and {A2, B2, C2} that are vertically constrained in
two layers. Sub-modules defined by the same name but using different subscripts are split
sub-modules of the same module which need to be vertically aligned. If we select the
feasibility configuration of a clique of size 4, then every combination of two sub-module
pairs such as {(A1, B1), (A2, B2)}, {(A1, C1), (A2, C2)} and {(B1, C1), (B2, C2)} will all have a size
4 clique. We sacrifice some feasible solutions but save runtime. Since we grow the feasibility
configuration by splitting modules sequentially, we save the configuration information.
Before splitting, we retrieve the saved feasibility configuration and find the position of a
module w in the sequence pair which is about to be split. We identify the next consecutive
layer where a split part of the original module will be placed. For example, in a 4-layer chip,
if "w" is in layer 2, then its second split part can be sent to either layer 1 or layer 3, which is
determined randomly. Let us assume that layer 1 is selected and the initial sequence pair in
layer 2 is shown in Figure 4.5(a). {A2, B2, C2, D2} in layer 2 are initially under vertical
constraints with {A1, B1, C1, D1} of layer 1 with the feasibility configuration of a clique of
size 4. Module "w" of layer 2 has been chosen for splitting. Take the positive sequence of
layer 2 as a reference, which indicates that "w" resides between B2 and C2. In the negative
sequence of layer 2, use insert to move w2 such that it can reside between B2 and C2 as shown
by the dotted arrow in Figure 4.5(a). In the sequence pair of layer 1, find the indices such
[Figure 4.5 diagram: sequence pairs of layer 2 and layer 1 (a) before splitting, with the
insert and split & insert steps marked by arrows, and (b) after splitting, where w2 lies
between B2 and C2 in layer 2 and w1 lies between B1 and C1 in layer 1.]
Figure 4.5: An example of the Module-split move. (a) Sub-modules {A1, B1, C1,
D1}, {A2, B2, C2, D2} are vertically constrained in layer 1 and layer 2 under the
feasibility configuration of a clique of size 4 and their sequence pairs are
shown. Module "w", which initially resides in layer 2, is about to be split. (b)
Resultant sequence pairs of layer 1 and layer 2 after splitting.
that part of "w" can be placed between B1 and C1 after splitting. Split & insert the sub-module
w1 in both sequences of layer 1 at these indices as shown by the bold arrows of
Figure 4.5(a). The final GSP is shown in Figure 4.5(b). Please note that the feasibility
configuration remains unchanged.
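The split step of the example can be sketched as follows. This is a simplified illustration and not the dissertation's implementation: the function name, its arguments and the `partner` dictionary are hypothetical, the constrained sub-modules are assumed to appear in the same relative order in both sequences (as in the example of Figure 4.5), and the module being split is assumed to have a constrained sub-module to its left in the positive sequence.

```python
def split_module(sp_src, sp_dst, w, w_src, w_dst, partner):
    """Split w (source layer) into w_src (stays) and w_dst (adjacent layer).

    sp_src, sp_dst: (positive, negative) sequences of the two layers.
    partner: maps each constrained sub-module of the source layer to its
             vertically aligned sub-module in the destination layer.
    """
    pos, neg = list(sp_src[0]), list(sp_src[1])
    dst = [list(sp_dst[0]), list(sp_dst[1])]
    k = pos.index(w)
    # Reference: nearest constrained sub-module left of w in the positive sequence.
    left = next(m for m in reversed(pos[:k]) if m in partner)
    pos[k] = w_src
    # Insert step: relocate w in the negative sequence to just after `left`.
    neg.remove(w)
    neg.insert(neg.index(left) + 1, w_src)
    # Split & insert step: place the other half right after the partner of
    # `left` in both sequences of the destination layer.
    for seq in dst:
        seq.insert(seq.index(partner[left]) + 1, w_dst)
    return (pos, neg), (dst[0], dst[1])
```

With layer 2 given as positive sequence A2, B2, w, C2, D2 and negative sequence A2, B2, C2, D2, w, the call places w2 between B2 and C2 in layer 2 and w1 between B1 and C1 in layer 1, while the order of the constrained sub-modules is untouched.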
A non-uniform probability is assigned for the selection of each move. We dynamically
change the probabilities of all moves in three different stages based on the quality of the
solution as it proceeds towards convergence. At the beginning, we assign higher
probabilities to those moves which create large perturbations (e.g. invert, exchange, change-group,
and change-feasibility configuration). For insert, swap, and rotate, we increase their
probabilities gradually in stage 2 and stage 3 while reducing the probabilities of the larger
perturbing moves. We also use multi-stage termination criteria in the designed floorplanner.
For example, if a solution is stuck in a local minimum at stage 1 and reaches the termination
criteria, then we immediately move the probability table to stage 2 and reset the termination
criteria. This ensures that the floorplanner will use all three stages of the probability
table that are dynamically varied during runtime.
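A stage-dependent probability table of this kind can be sketched as below. The numeric weights are purely hypothetical placeholders (the dissertation does not list its probabilities here), and the move identifiers are illustrative names for the moves of this section.

```python
import random

LARGE = ("invert", "exchange", "change_group", "change_feasibility")
SMALL = ("insert", "swap", "rotate")
OTHER = ("module_split", "submodule_merge")
MOVES = LARGE + SMALL + OTHER

# Hypothetical relative weights for stages 1, 2 and 3 (index 0, 1, 2).
LARGE_W = (4.0, 2.0, 1.0)
SMALL_W = (1.0, 2.0, 4.0)
OTHER_W = (2.0, 2.0, 2.0)


def stage_weights(stage):
    """Relative selection weight of every move in the given stage."""
    weights = {m: LARGE_W[stage] for m in LARGE}
    weights.update({m: SMALL_W[stage] for m in SMALL})
    weights.update({m: OTHER_W[stage] for m in OTHER})
    return weights


def pick_move(stage, rng):
    """Draw one move according to the stage's non-uniform probabilities."""
    weights = stage_weights(stage)
    return rng.choices(MOVES, weights=[weights[m] for m in MOVES], k=1)[0]
```

For example, `pick_move(0, random.Random(0))` draws mostly large-perturbation moves, while `pick_move(2, ...)` favours insert, swap and rotate.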
In general, a set of modules in a floorplanning problem might be identified as eligible for
splitting; otherwise candidates might be chosen based on predefined criteria. We first
choose a set of modules which is eligible for splitting based on module sizes. We normalize
the areas of all modules with respect to the average module area. Next we disqualify the
smaller modules from being eligible candidates for module splitting because splitting
smaller modules might not give significant benefits in terms of the system level total
wirelength reduction. In the next step we prepare small buckets, each containing a set of
modules. The total number of modules can be different in each bucket. For each individual
solution of the evolutionary algorithm, we randomly assign one of these buckets that
contain a set of module split candidates. The module split move only chooses a module
from the bucket assigned to the corresponding individual floorplan in the population.
Once a module gets split, it is removed from the bucket. If the sub-modules of a module
are merged, the module is put back into the module split bucket. In addition to this
strategy, we tried keeping all candidates together in one bucket and assigning the same
bucket to each individual floorplan solution. However, our experiments did not show good
results, possibly due to aggressive splitting, which results in a large packing/footprint area
and hence a drastically increased inter-module wirelength.
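The candidate selection and bucket bookkeeping can be sketched as follows. The threshold `min_ratio`, the random scattering of candidates over disjoint buckets and all function names are assumptions made for illustration; the text does not specify how the buckets are filled.

```python
import random


def split_candidates(areas, min_ratio=1.0):
    """Keep modules whose area, normalized to the average area, is large enough."""
    average = sum(areas.values()) / len(areas)
    return [m for m, a in areas.items() if a / average >= min_ratio]


def make_buckets(candidates, num_buckets, rng):
    """Scatter candidates over buckets; bucket sizes may differ."""
    buckets = [[] for _ in range(num_buckets)]
    for m in candidates:
        rng.choice(buckets).append(m)
    return buckets


def assign_buckets(buckets, population_size, rng):
    """Give every individual floorplan its own copy of a randomly chosen bucket."""
    return [list(rng.choice(buckets)) for _ in range(population_size)]


def on_split(bucket, module):
    bucket.remove(module)   # a split module is no longer a candidate


def on_merge(bucket, module):
    bucket.append(module)   # a merged module becomes a candidate again
```

Because each individual holds only one small bucket, the number of modules it can split is bounded, which limits the aggressive splitting described above.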
4.7 COST FUNCTION 
In every generation of the evolutionary algorithm, we obtain a set of new floorplan 
solutions due to minimization of a weighted cost function. We incorporate inter-module 
wirelength, reduction in intra-module wirelength due to splitting, via count, and dead space 
in the cost. When vertical constraints are disabled, then the cost function optimizes dead 
space, inter-module wirelength, and inter-module via count only using the following cost 
function: 
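$$\mathrm{Cost} = \alpha \, DS + \beta \, WL + \gamma_1 \, VC_{inter} \tag{4.2}$$
Upon activation of placement-aware module splitting, the cost function changes to:
$$\mathrm{Cost} = \alpha \, DS + \beta \, (WL - \Delta W_{intra}) + \gamma_1 \, VC_{inter} + \gamma_2 \, VC_{intra} \tag{4.3}$$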
where DS is the dead space, WL is the inter-module wirelength (obtained from net information),
ΔW_intra is the reduction in intra-module wirelength due to placement of 3-D modules (obtained
using the statistical method of Section 4.5), and VC_inter and VC_intra are the inter-module and
intra-module via counts respectively. The constants α, β, γ1, and γ2 are real-valued tuning
parameters that are specified by the designer to change the solution quality. These tuning parameters
give additional weight to any component of the fitness function.
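The two cost functions can be written as a single routine in which the splitting terms default to zero. This is a sketch with illustrative argument names; it only restates eqn. (4.2) and eqn. (4.3).

```python
def cost(ds, wl, vc_inter, alpha, beta, gamma1,
         delta_w_intra=0.0, vc_intra=0, gamma2=0.0):
    """Weighted floorplan cost.

    With the default arguments (no module splitting) this is eqn. (4.2);
    passing delta_w_intra, vc_intra and gamma2 gives eqn. (4.3).
    """
    return (alpha * ds
            + beta * (wl - delta_w_intra)
            + gamma1 * vc_inter
            + gamma2 * vc_intra)
```

For instance, `cost(2.0, 10.0, 3, 1.0, 2.0, 5.0)` evaluates eqn. (4.2), and adding `delta_w_intra=4.0, vc_intra=2, gamma2=0.5` evaluates eqn. (4.3) for the same floorplan.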
In addition to the proposed cost functions of eqn. (4.2) and eqn. (4.3), several other cost
functions such as a) adding total area instead of dead space, b) adding the reciprocal of
ΔW_intra instead of subtracting it from the inter-module wirelength, c) combining inter and
intra module via counts, and d) introducing system level total wirelength instead of
reduction in inter-module wirelength were tried and their solution qualities were examined.
It was observed that none of these cost functions gave good solution quality. Therefore the
best performing cost function, shown in eqn. (4.3), was selected.
Although the values of the tuning parameters seem trivial, there is no straightforward way to
determine their exact values. One of the fundamental problems in tuning these parameters is that
the various components of the fitness function may have different importance as well as
different units [104]. For example, the dead space (DS) has units of µm² whereas
wirelength has units of µm. Similarly, via count has no unit, but if the heights of vias are
considered then they have units of µm. Thus determination of tuning parameters involves
learning from experimental results on different types of floorplanning
problems. For the floorplanning benchmarks that are frequently used by academic
researchers, we have observed the following ranges of tuning parameters: a) α is used
between 0.5 and 2.0, b) β is used between 0 and 20, c) γ1 is used between 100 and 5000, and d) γ2
is used between 500 and 15000.
The values of these tuning parameters can noticeably change the quality of floorplanning
solutions. For example, if α is chosen very high, then the floorplanner will put more
emphasis on area. Similarly, if β is kept high and α is kept low, the floorplanner will put
more emphasis on wirelength. Furthermore, if β is kept at zero then the floorplanner will not
optimize wirelength at all. If γ1 is kept in its lower range then the floorplanner will insert
more inter-module vias in order to minimize wirelength. In contrast, if γ1 is kept in its
upper range, inter-module via height will be minimized, but inter-module wirelength might
go up. If γ2 is kept in its lower range, the floorplanner may try to split larger numbers of
modules. Although this seems desirable, the increase in vertical constraints (imposed by
placement-aware module splitting) also increases area (see Section 4.9), which will in turn
worsen the inter-module wirelength.
4.8 DESIGN FLOW OF THE PLACEMENT-AWARE 3-D FLOORPLANNING 
ALGORITHM 
Figure 4.6 shows the flow chart of our placement-aware 3-D floorplanning algorithm that
was extended from the initial idea of module splitting (please note that the initial software
inherited from Benyi is not a placement-aware floorplanner, as it only optimizes area and inter-module
wirelength, and it does not have any engine/model to capture the effect of placement inside 3-D modules).
We start with a finite set of randomly generated floorplans without any split modules. Then
we improve the quality of the solution by making changes in the solution space using a pre-defined
set of moves (Section 4.6) whose probabilities are dynamically varied. During
perturbation, we explore the possibility of 3-D placement inside a module (module-split move
in Section 4.6) by statistically estimating the reduction in wirelength (Section 4.4). If a module is
split into multiple parts, its sub-modules are put under vertical constraints (Section 3.2) at runtime.
Furthermore, we restrict moves (see Section 4.6) such that they satisfy the feasibility condition
theorems of Section 3.4. We use the LCSLS algorithm (Section 3.6) to evaluate the module
packing.
As described in Section 4.6, we change the probabilities of selecting various moves in three
stages during the floorplanning process. To avoid premature termination, if the termination
criterion is reached before the end of the first stage, we switch the probabilities of moves to
the second stage and reset the termination criteria. Similarly, if the termination
criterion is reached prematurely in the second stage, we immediately move to the third stage. This step
ensures that the floorplanner utilizes all three stages and uses all perturbing moves of
Section 4.6. We call it a multi-stage termination criterion.
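The control flow of the multi-stage termination criterion can be sketched as below. The actual termination test is not spelled out in this section, so the sketch assumes a hypothetical criterion (a fixed number of consecutive generations without improvement); `step` is a placeholder for one generation of the evolutionary algorithm.

```python
def run_multistage(step, stall_limit, num_stages=3, max_generations=10_000):
    """Run generations until the criterion is met in the final stage.

    step(stage) performs one generation and returns True if the best cost
    improved. Returns the list of stages that were visited.
    """
    stage, stall, visited = 0, 0, [0]
    for _ in range(max_generations):
        improved = step(stage)
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:            # termination criterion reached
            if stage == num_stages - 1:
                break                       # final stage: stop for good
            stage += 1                      # otherwise switch the probability table
            stall = 0                       # and reset the criterion
            visited.append(stage)
    return visited
```

Even a run that never improves therefore passes through all three stages before it stops.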
In Figure 4.6, the leftmost rectangles show the significant modifications made, or the new 
methods (algorithms) introduced during the various steps of the floorplanning algorithm. 
[Figure 4.6 flowchart. Main flow labels: Input; Populate initial set of floorplans; Perturb;
Pack modules; Compute fitness; Best fit set of floorplans; Yes; Optimized floorplan. Side
boxes: Rent's parameter & technology parameter; Cost estimation of 3-D placement inside a
module; Maintain feasibility conditions: reduced solution space; Dynamic probability of
moves; LCSLS algorithm; Intra-module wire, via count, dead space; Multi-stage termination
criteria.]
Figure 4.6: Flowchart of the proposed placement-aware 3-D floorplanning
using vertical constraints (3-D FVC). The grey shaded portions have been
modified from the initial algorithm. The leftmost rectangular boxes in
orange shade are the new methods.
4.9 EXPERIMENTAL RESULTS
The proposed placement-aware 3-D floorplanning using vertical constraints (3-D FVC)
was implemented in C++/STL. All experiments were performed on a Sun V490 server
(4 x dual-core Sun SPARC IV+ CPUs, each running at 1.35 GHz, with 32 GB RAM in total).
The algorithm is designed to run on a single core only. Our 3-D FVC related experimental
data are an average of 20 runs of each benchmark.
We used the two largest MCNC benchmarks (ami33 and ami49) and the three largest GSRC
benchmarks (n100, n200 and n300) for our experiments. The number in each benchmark's
name denotes the total number of modules in the floorplan.
4.9.1 Experimental Setup 
For a module's intrinsic wirelength estimation, we assigned different Rent's parameters (k, p)
to different modules in the floorplan. Different values of Rent's parameters were taken
from [77], assuming that the modules represent different types of circuits, such as logic,
SRAM, microprocessor, control circuitry etc. The Rent's parameters were randomly chosen
from [77] for the different modules of the floorplanning benchmarks. These parameters for each
module of the benchmark circuits are given in Appendix C. The Rent's parameters for
different types of circuits differ due to differences in the routing topology, complexity of
wiring, and number of I/O terminals.
The choice of technology node is limited by the number of nets connecting different
modules, the number of gates inside modules and a need to stay within an acceptable range
of Rent's parameters that describe the relation between gate count and I/O pins of a
module. We used a 0.25 µm technology node for ami33 and ami49 because they are old
benchmarks. In addition we have chosen a 0.25 µm technology node for n100, a 0.13 µm
technology node for n200 and a 65 nm technology node for n300. This is because the
aggregate area of all modules remains the same in each of these GSRC benchmarks but the
complexity of inter-module wiring and the number of modules increase gradually. In other
words, the wiring complexity and the number of modules increase while the area remains
constant, which can be interpreted as technology scaling. The units of MCNC and GSRC
benchmarks are 1 µm and 10 µm respectively. The same units were assumed in CBA [13].
We assume that a 2-input NAND gate cell has an area equivalent to 1500λ² (to estimate the
number of gates inside a module) and the average fanout of the circuit is 3.0. Please note that, for
a real design, Rent's parameters (k, p) can be preprocessed from the optimized netlist (after
logic synthesis) of a module [77]. A brief description of the Rent's parameter extraction is
given in Appendix C. Furthermore, average fanout and the total number of gates can be
obtained from logic synthesis.
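The gate-count estimate and the role of Rent's parameters can be illustrated with a short sketch. It assumes that λ is half of the technology node's feature size (the usual scalable design-rule convention, not stated explicitly here) and uses the standard form of Rent's rule, T = k·N^p; the function names are illustrative.

```python
def gate_count(module_area_um2, tech_node_um, gate_area_lambda2=1500.0):
    """Number of 2-input NAND gate cells that fit in a module of the given area."""
    lam = tech_node_um / 2.0                      # assumption: lambda = half the feature size
    gate_area_um2 = gate_area_lambda2 * lam ** 2  # 1500 lambda^2 per gate cell
    return int(module_area_um2 / gate_area_um2)


def rent_terminals(num_gates, k, p):
    """Rent's rule: number of I/O terminals T = k * N**p of a block with N gates."""
    return k * num_gates ** p
```

Under these assumptions a 1 mm² module at the 0.25 µm node holds `gate_count(1.0e6, 0.25)` gates, and `rent_terminals` relates that gate count to the number of I/O pins for a given (k, p).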
4.9.2 Comparison of 3-D Packing Algorithms with Vertical Constraints 
The packing algorithm translates the topological floorplan representation into a geometric
floorplan, i.e. it assigns the (x, y) coordinates of the lower left corners of modules and
determines the chip dimensions. In real life this assignment of x-y coordinates can be
identified as floorplanning. However, in the floorplanning CAD tool, the module packing
algorithm only acts as a decoder of the data structure (i.e. floorplan representation) that
translates the floorplan representation into the real geometric floorplan. The
floorplanning tool requires many more operations to search for an optimal floorplan than
TABLE 4.2: RUNTIME COMPARISON OF 3-D PACKING WITH VERTICAL
CONSTRAINTS
Circuit   3D-FVC runtime     3D-FVC runtime      # of vertically aligned
          using 3DCG (s)     using LCSLS (s)     sub-modules (count)
ami33     54                 23                  28
ami49     98                 52                  18
n100      304                139                 40
n200      851                263                 50
n300      2001               445                 48
Total     3308               922                 184
[Figure 4.7 plot: runtime in seconds (y-axis, 0 to 2200) for ami33, ami49, n100, n200 and
n300 (x-axis). Series: 3DCG, LCSLS.]
Figure 4.7: Runtime of 3-D FVC (with increasing problem size) using
various packing algorithms with vertical constraints. LCSLS is faster than
3DCG.
simply decoding the data structure into a geometric floorplan, as was shown in Section 4.8.
Thus, the packing algorithm is a part of the overall floorplanning CAD tool.
We compare the effect of the two algorithms for 3-D packing with vertical constraints (Sections 3.5 and
3.6) on the runtime of 3-D FVC. Table 4.2 shows the runtimes of each benchmark
obtained using the 3DCG and LCSLS algorithms. The LCSLS algorithm is approximately 2x-5x
faster than the 3DCG algorithm on these benchmarks. When the problem size
increases, the difference in runtime is larger, and this trend can be easily observed in Figure
4.7. It also verifies the asymptotic complexity analysis of the 3DCG and LCSLS algorithms
presented in Section 3.6. Based on this observation we have performed all the remaining
experiments using the LCSLS packing algorithm. During our experiments we have observed
that the 3DCG and LCSLS algorithms produce identical floorplans for a given Grouped
Sequence Pair and a set of vertical constraints. This observation was made by comparing
the LaTeX output files generated by the 3DCG and LCSLS algorithms during a series of
floorplanning experiments. We have not shown floorplans obtained by the 3DCG and
LCSLS algorithms because it would be impossible to visually distinguish two identical
floorplans (because there is no difference).
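The runtime trend can be checked directly from the values of Table 4.2. The short sketch below only recomputes per-benchmark speedups and runtime gaps from the table; variable names are illustrative.

```python
# Runtimes in seconds, copied from Table 4.2.
runtime_3dcg = {"ami33": 54, "ami49": 98, "n100": 304, "n200": 851, "n300": 2001}
runtime_lcsls = {"ami33": 23, "ami49": 52, "n100": 139, "n200": 263, "n300": 445}

# Speedup of LCSLS over 3DCG and absolute runtime gap per benchmark.
speedup = {c: runtime_3dcg[c] / runtime_lcsls[c] for c in runtime_3dcg}
gap = {c: runtime_3dcg[c] - runtime_lcsls[c] for c in runtime_3dcg}
```

The absolute gap grows with every step in problem size, and the speedup is largest for n300.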
4.9.3 Comparison of Area and Wirelength Minimization without Vertical
Constraints
In this sub-section, we compare our results with two state-of-the-art algorithms, CBA [13]
and 3-D STAF [65]. The thermal optimization in CBA and 3-D STAF is disabled and their
results are taken directly from [13] and [65]. For a fair comparison, module splitting and
vertical constraints were disabled in 3-D FVC. We exclude via height from the inter-module
wirelength (HPWL) calculation, similar to CBA and 3-D STAF [78]. We report footprint
area, HPWL, via count and runtime. If a net spans from layer 1 to layer 4, then we count
three vias. The same method was used in CBA and 3-D STAF [78].
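The per-net metrics described above can be sketched as follows, assuming each net is given as a list of (x, y, layer) terminals; the function name and the data layout are illustrative.

```python
def net_hpwl_and_vias(pins):
    """Half-perimeter wirelength and via count of one net.

    pins: iterable of (x, y, layer) terminals. Via height is excluded from
    the wirelength; one via is counted per device-layer crossing.
    """
    xs, ys, layers = zip(*pins)
    hpwl = (max(xs) - min(xs)) + (max(ys) - min(ys))
    vias = max(layers) - min(layers)
    return hpwl, vias
```

A net with terminals on layer 1 and layer 4 therefore contributes three vias, independent of the lateral positions of its terminals.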
Compared to CBA, 3-D FVC produces a 12.6% smaller footprint area, 24.8% smaller
wirelength and 16.9% lower via count, as shown in Table 4.3. These parameters are also
better than or comparable to those generated by 3-D STAF.
Table 4.4 shows a runtime comparison among CBA, 3-D STAF and 3-D FVC. Please 
notice that the runtime of CBA taken from [13] was obtained using a 750 MHz CPU, the 
runtime of 3-D STAF [65] was reported using a 3 GHz CPU and our CPU speed is only 
1.35 GHz. Therefore it is only an approximate comparison. We leave it to readers to judge 
the performance. Using linear scaling of runtimes with respect to the 3 GHz CPU speed, it 
is clear that 3-D FVC is faster and it also scales well with increasing problem size compared 
to CBA and 3-D STAF as shown in Figure 4.8. 
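The linear scaling used for this comparison can be sketched as below. It assumes that runtime is inversely proportional to clock frequency, which ignores architectural differences between the machines; the totals are taken from Table 4.4 and the names are illustrative.

```python
def scale_runtime(seconds, cpu_ghz, target_ghz=3.0):
    """Linearly rescale a runtime measured at cpu_ghz to a target_ghz machine."""
    return seconds * cpu_ghz / target_ghz


# Total runtime (s) and CPU clock (GHz) of each floorplanner, from Table 4.4.
totals = {"CBA": (5896, 0.75), "3D-STAF": (966, 3.0), "3D-FVC": (442, 1.35)}
scaled = {name: scale_runtime(t, ghz) for name, (t, ghz) in totals.items()}
```

After scaling to 3 GHz, the total runtimes are ordered 3D-FVC, 3D-STAF, CBA from fastest to slowest.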
TABLE 4.3: COMPARISON OF FOOTPRINT AREA, WIRELENGTH AND VIA COUNT OPTIMIZATION
WITHOUT VERTICAL CONSTRAINTS
           CBA† [13]                       3D-STAF† [65]                   3D-FVC without module splitting
Circuit    Footprint  HPWL     Via        Footprint  HPWL     Via        Footprint  HPWL     Via
           (mm2)      (mm)     (count)    (mm2)      (mm)     (count)    (mm2)      (mm)     (count)
ami33      0.35       22.5     93         0.35       22.0     122        0.36       20.8     88
ami49      14.90      446.8    179        13.49      437.5    227        10.9       449      378
n100       5.29       1005     955        5.9        913      828        5.20       678      717
n200       5.77       2103     2093       5.9        1686     1729       5.42       1530     1495
n300       8.90       3150     2326       9.7        2379     1557       8.89       2380     2014
Aggregate relative to CBA                 +0.37%     -12.2%   -21%       -12.6%     -24.8%   -16.9%
†Thermal optimization was disabled in CBA and 3-D STAF whereas module splitting and vertical
constraints were disabled in 3-D FVC for a fair comparison. The results of CBA and 3-D STAF
were taken directly from their original sources [13] and [65] respectively.
TABLE 4.4: RUNTIME COMPARISON WITHOUT VERTICAL ALIGNMENT
Actual runtime on different processor speeds
Circuit   CBA at 750 MHz (s)   3D-STAF at 3.0 GHz (s)   3D-FVC at 1.35 GHz (s)
ami33     23                   52                       10.3
ami49     86                   57                       36.1
n100      313                  68                       60.5
n200      1994                 397                      125.2
n300      3480                 392                      209.7
Total     5896                 966                      442
[Figure 4.8 plot: scaled runtime in seconds (y-axis, 0 to 1000) for ami33, ami49, n100,
n200 and n300 (x-axis). Series: CBA, 3D-STAF, 3D-FVC.]
Figure 4.8: Runtime comparison of various 3-D floorplan algorithms with
increasing problem size. Thermal optimization and vertical alignment were
disabled. All runtimes have been scaled linearly to 3 GHz CPU speed.
4.9.4 Impact of Placement-Aware 3-D Floorplanning with Vertical Constraints 
on System Level Wirelength 
We present the results of 3-D FVC with module splitting on MCNC and GSRC
benchmarks. Table 4.5 shows the footprint area, inter-module wirelength (HPWL), total
wirelength, via count, runtime, and reduction in intra-module wirelength due to placement-aware
module splitting. When placement-aware module splitting is enabled, 3-D FVC
reduces the system level total wirelength (inter-module wire + intra-module wire) by 9.88%
(559628 vs. 504356 mm). The reduction in intra-module wirelength is ~10X compared to
the inter-module wirelength (5539 vs. 55705 mm). Furthermore, the runtime is comparable
to or better than other state-of-the-art algorithms.
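For reference, the two headline figures follow from the quoted totals by simple arithmetic (a worked restatement of the numbers above, not an additional result):

```latex
\frac{559628 - 504356}{559628} \times 100\%
  = \frac{55272}{559628} \times 100\% \approx 9.88\%,
\qquad
\frac{55705}{5539} \approx 10.06 .
```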
The inter-module wirelength goes up slightly upon activation of module splitting in 3-D
FVC because the cost of module splitting is activated in the cost function (see eqn. (4.3) in
Section 4.7), i.e. ΔW_intra changes from zero to a positive number. Therefore the EA engine
sees the introduction of an additional parameter in the cost function for combinatorial
optimization. If there is a large reduction in intra-module wirelength at the expense of a
minor increment in inter-module wirelength, the optimization engine accepts it as a better
solution. 3-D FVC purposely introduces additional vias (intra-module vias) to further reduce
wirelength. However, the increase in via count is still under the via density projected [10] for
the year 2010 onwards (please see Figure 1.7 in Chapter 1). Please note that this projected
via density is the basis of our assumption that vias will not be a strong limiting factor in
future 3-D ICs, as explained in Section 4.1. Upon activation of module splitting, we observe
an increment in footprint area which may be due to the following reasons: a) area and
wirelength optimization generally follows a trend similar to a banana curve; b) to satisfy the
vertical constraints, the LCSLS packing algorithm laterally shifts sub-modules, which may
result in area overhead.
In this experiment, the values of the tuning parameters were chosen for the various benchmarks as
follows: a) for ami33, α = 1, β = 17, γ1 = 2000, γ2 = 5000; b) for ami49, α = 1, β = 12, γ1 =
5000, γ2 = 15000; c) for n100, α = 1, β = 10, γ1 = 2000, γ2 = 5000; d) for n200, α = 1, β = 8, γ1
= 2000, γ2 = 4000; and e) for n300, α = 1, β = 15, γ1 = 2000, γ2 = 15000. As discussed in
Section 4.7, the floorplanning results may change if different tuning parameters are used in
the fitness function. A set of ranges for the tuning parameters has also been suggested for
designers in Section 4.7, which is based on the sensitivity analysis presented in Appendix D.
A comparison of various parameters obtained from different floorplanners is presented in
Figure 4.9, which shows that our footprint area, inter-module wirelength, and inter-module
vias remain smaller than or comparable to the results of CBA and 3-D STAF (without thermal
optimization; Table 4.3). Please note that prior to module splitting, the intra-module
wirelength is the same for any floorplanner which does not split modules. Thus we
statistically compute the intra-module wirelength of all 2-D modules and use it to estimate
the total wirelength of CBA, 3-D STAF and 3-D FVC by adding their respective inter-module
wirelengths. In addition, inter-module wirelength is a small fraction of the total
wirelength. Therefore Figure 4.9(b) shows similar total wirelength for CBA and 3-D STAF.
However, 3-D FVC (with MS) reduces total wirelength by ~9.8% compared to both
floorplanners. A 3-D floorplan of ami49 with split modules and satisfying all placement-aware
constraints is shown in Figure 4.10.
[Figure 4.9: four bar charts, panels (a) to (d), with bars for CBA, 3-D STAF (NT), 
3-D FVC (no MS), and 3-D FVC (with MS).] 
Figure 4.9: Comparison of (a) inter-module wirelength, (b) total wirelength, (c) 
footprint area, and (d) inter-module via count, obtained using different 
floorplanning tools. Total wirelength is the sum of inter and intra module 
wirelength. Vertical constraints are applied only when module splitting (MS) is 
activated. 3-D FVC (with MS) reduces total wirelength by 9.8%. The bar charts 
represent the aggregate data of all the benchmarks. Please note that each chart 
has a different y-scale. 
TABLE 4.5: EFFECT OF 3-D FVC WITH MODULE SPLITTING ON THE SYSTEM LEVEL TOTAL 
WIRELENGTH ON A 4-LAYER 3-D FLOORPLAN. 
Circuit     Footprint   Inter-module   System level   Inter-module   Intra-module   Runtime at   Reduction in 
            (mm²)       wirelength     total          via            via            1.35 GHz     intra-module 
                        (mm)           wirelength     (count)        (count)        CPU (s)      wirelength 
                                       (mm)                                                      (mm) 
ami33       0.44        26.7           1611           133            561            29.0         -485 
ami49       12.4        504            113448         448            9867           56.7         -30000 
n100        5.95        738            32102          972            3531           165.4        -7370 
n200        5.99        1590           69726          1776           4689           301.9        -6050 
n300        9.91        2500           287469         2269           4186           506.8        -11800 
Aggregate   34.69       5358.7         504356         5589           22834          1059.8       -55705 
[Figure 4.10: floorplan drawing with four panels, Layer 1 to Layer 4, showing module labels.] 
Figure 4.10: A 4-layer 3-D floorplan of ami49. Please notice that parts of modules 0 (0A, 
0B), 1, 2, 3, 4, 5, 7, 29, 32, 43, 44, and 47 have been placed in consecutive device layers. 
The planar locations of sub-modules are the same. 
TABLE 4.6: EFFECT OF FEASIBILITY CONDITIONS ON THE SOLUTION QUALITY OF 3-D FVC 
            3D-FVC without feasibility conditions                   3D-FVC with feasibility conditions 
Circuit     Footprint   Inter-module   Reduction in   No. of        Footprint   Inter-module   Reduction in   No. of 
            (mm²)       wirelength     intra-module   split         (mm²)       wirelength     intra-module   split 
                        (mm)           wirelength     modules                   (mm)           wirelength     modules 
                                       (mm)           (count)                                  (mm)           (count) 
ami33       0.40        24.7           -311           8             0.44        26.7           -485           15 
ami49       12.4        534            -29600         9             12.4        504            -30000         9 
n100        5.68        709            -5710          16            5.95        738            -7370          25 
n200        5.53        1560           -2680          9             5.99        1590           -6050          25 
n300        9.43        2510           -7780          17            9.91        2500           -11800         25 
Aggregate relative to 3D-FVC without feasibility conditions:        +3%         +0.3%          -20%           +68% 
4.9.5 Effect of Feasibility Conditions on the Solution Quality of 3-D FVC 
To study the benefits of feasibility conditions, we performed experiments in two different 
cases: a) we disabled the feasibility conditions and removed the restricted set of associated 
moves (of Section 4.6) from the 3-D FVC, and b) we used the 3-D FVC algorithm 
described in Section 4.8, which is our placement-aware 3-D floorplanning tool, i.e. feasibility 
conditions were activated in this case. Both of these cases have the same set of tuning 
parameters and the same cost function (as defined in Section 4.7). Next we compared the 
results of experiments obtained using these two cases. Table 4.6 shows the comparison of 
footprint, inter-module wirelength, reduction in intra-module wirelength due to splitting, and 
number of split modules. Please note that 3-D FVC with feasibility conditions (i.e. case b) splits 
68% more modules and reduces intra-module wirelength by 20% at the expense of a 3% 
increment in footprint compared to 3-D FVC without feasibility conditions (i.e. case a). The 
inter-module wirelength remains comparable. Runtime increases by 10% because packing 
takes longer due to an increased number of split modules/vertical constraints. Thus 
feasibility conditions split more modules and further reduce wirelength. As a result, they 
introduce additional intra-module vias, whose number is still under the via density projected 
by IBM, as previously discussed in Sub-section 4.9.4. 
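The aggregate row of Table 4.6 can be re-derived from the per-circuit rows. The sketch below is only a sanity check on the reported numbers and is not part of 3-D FVC; the dictionary layout and the helper name `aggregate_change` are assumptions made for illustration, and wirelength reductions are stored as positive magnitudes.

```python
# Per-circuit values copied from Table 4.6:
# (footprint mm^2, inter-module wirelength mm,
#  magnitude of intra-module wirelength reduction mm, number of split modules)
WITHOUT_FC = {
    "ami33": (0.40, 24.7, 311, 8),
    "ami49": (12.4, 534, 29600, 9),
    "n100": (5.68, 709, 5710, 16),
    "n200": (5.53, 1560, 2680, 9),
    "n300": (9.43, 2510, 7780, 17),
}
WITH_FC = {
    "ami33": (0.44, 26.7, 485, 15),
    "ami49": (12.4, 504, 30000, 9),
    "n100": (5.95, 738, 7370, 25),
    "n200": (5.99, 1590, 6050, 25),
    "n300": (9.91, 2500, 11800, 25),
}


def aggregate_change(case_a, case_b):
    """Percentage change of each column total, case (b) relative to case (a)."""
    n_cols = 4
    totals_a = [sum(row[c] for row in case_a.values()) for c in range(n_cols)]
    totals_b = [sum(row[c] for row in case_b.values()) for c in range(n_cols)]
    return [100.0 * (b - a) / a for a, b in zip(totals_a, totals_b)]


labels = ("footprint", "inter-module wirelength",
          "intra-module wirelength reduction", "split modules")
changes = aggregate_change(WITHOUT_FC, WITH_FC)
for label, pct in zip(labels, changes):
    print(f"{label}: {pct:+.1f}%")
```

The computed changes land within about one percentage point of the aggregate row of the table (+3%, +0.3%, -20%, +68%), where the -20% entry corresponds to a larger reduction in intra-module wirelength.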
Since our approach introduces intra-module vias to harness the advantages of 3-D 
integration, our approach is limited by the via density and the penalty associated with the 
TSV technologies. If the TSV technology does not allow dense TSV placement within 3-D 
ICs, then our approach will not work. Fortunately the continuous advancement of TSV 
technology favors our assumptions, and TSV size continues to shrink [10][69]. Furthermore, 
our algorithm is limited to the optimization of area, system-level wirelength, and via count. 
Our algorithm is based on a stochastic search method (evolutionary algorithm) in which 
different solutions are obtained for different experimental runs. Additionally, using 
different tuning parameters for the cost function (described in Section 4.7) changes the 
solution quality. This is a known issue in combinatorial optimization based on 
stochastic search methods (such as SA and EA) [104]. 
CHAPTER 5: 3-D FLOORPLANNING WITH MODULE ALIGNMENT 
5.1 INTRODUCTION AND MOTIVATION 
In this chapter, vertical alignments between modules assigned to different device layers in 
3-D ICs are considered at the floorplanning stage. Vertical alignments are important in 
memory-intensive designs, mixed-signal designs, and future comprehensive systems because 
they reduce delay, substrate noise, and power, to name just a few benefits. In a 
memory-intensive design such as a microprocessor, the CPU and the L2 cache memory need to be 
vertically aligned for low latency, as shown in Figure 5.1. The alignment of blocks in 
different device layers is also desired for aligning modules on the same bus (bus-driven 3-D 
floorplanning), e.g. in SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, 
Multiple Data) architectures. The SIMD architecture performs the same type of operation on 
multiple data, with a common data pool connected to multiple processing units. The MIMD 
architecture has a number of processing units that function independently using the same data 
pool. MIMD machines with shared memory have processors which share a common, central memory. In 
a simple approach, all processors are attached to a bus which connects them to memory. 
These constraints will be defined based on the design information which is available to 
designers. Furthermore, it might be desired to control the relative position of certain blocks 
to reduce temperature and noise inside the 3-D chip. One approach could be to use a 
thermal and noise model for simultaneous optimization during floorplanning. However, 
in the presence of macro blocks, the power and temperature profiles of macro blocks could 
be easily available to designers. Similarly the knowledge of noisy digital blocks and noise 
[Figure 5.1: diagram; recoverable labels: Layer Assignment, Module Repulsion.] 
Figure 5.1: Example of different placement constraints in 3-D floorplanning. 
sensitive analog blocks will be available to designers in the case of macro blocks. It might 
also be necessary to restrict certain groups of modules within a particular set of device layers 
due to different types of substrates (such as bulk-Si or Silicon-on-Insulators), different types 
of technology nodes, or due to thermal constraints. For example, in a microprocessor 
consisting of a CPU logic and an L2 cache block, the CPU is generally hotter (due to ALUs 
and control logic) than L2 cache memory. In this case, the CPU block can be placed closer 
to the heat sink whereas L2 cache memory can be placed above the CPU and away from 
the heat sink. Finally, certain groups of modules might be required to be spatially separated 
due to the noise sensitivity of the modules as shown in Figure 5.1. 
The advantage of designer-specified constraints is that designers get control over the 
optimization process. Based on the design information (block-level power, thermal, 
or noise analysis, or information from third-party designers for macro blocks), designers 
might want to impose certain constraints prior to the optimization process. 
Furthermore, in the case of design reuse, designers can use their prior experience and 
knowledge from the previous design. Finally, designer-specified constraints also reduce 
the search space of the floorplan. In the case of optimization through the cost function of the 
floorplanner, accurate and fast noise, thermal, and power models are required, and these 
models add dimensions to the solution space that will also increase runtime. 
Since the floorplanning algorithms are based on stochastic search methods, it might not be 
possible to reach the desired solution quickly. Furthermore, this approach does not allow 
designers to control the optimization process in terms of the vertical module alignment, module 
repulsion, and layer assignment constraints shown in Figure 5.1. However, if designers do 
not have sufficient information about the circuit blocks, optimization through the cost 
function might be desired. In this problem we focus on designer-specified sets of 
placement constraints only. 
5.2 PROBLEM FORMULATION 
We present a 3-D floorplanning with module alignment (3-D FMA) algorithm that 
a) vertically aligns modules located in different device layers (vertical 
constraint), b) assigns modules to layers based on technology node or substrate type 
information (layer assignment constraint), and c) places certain sets of modules away from each 
other (module repulsion constraint). Figure 5.1 shows an example of these constraints. In this 
algorithm, we allow designers to specify these constraints as input information. 
Consider a set of $n$ rectangular modules where each module $M_i$ has a fixed area $A_i$, width $W_i$ 
and height $H_i$, connected by $m$ nets. Let $L$ be the total number of fixed active layers. Let 
$(x_i, y_i, z_i)$ denote the lower left corner of the module $M_i$, where $1 \le z_i \le L$ and $z_i$ belongs 
to the set of natural numbers. A cluster is defined as a set of floorplan modules. A set of clusters is 
composed of several independent clusters. Let $G = \{g_1, g_2, g_3, \ldots, g_k\}$ be a set of clusters. Thus 
each cluster $g_i \in G$ contains a set of floorplan modules. Modules inside each cluster $g_i \in G$ 
need to be vertically aligned by having the same lower left corner $(x_i, y_i)$ but different $z_i$. 
Similarly, let $R = \{r_1, r_2, r_3, \ldots, r_k\}$ be a different set of clusters, where a particular cluster 
$r_j \in R$ contains a group of modules. Modules inside each cluster $r_j \in R$ need to be placed apart 
from each other by having different $x_i$ and/or $y_i$ such that they do not overlap. 
Furthermore, let $D_i$ be the set of allowed device layers for a module $M_i$ where it can be legally 
placed. A 3-D floorplan with module alignment is an assignment of $(x_i, y_i, z_i)$ for each $M_i$ 
such that all members of $G$ satisfy the vertical constraint, all members of $R$ satisfy the repulsion 
constraint, and each $M_i$ satisfies the layer assignment constraint. We seek a solution to the following 
problem: 
3-D Floorplanning with Module Alignment (3-D FMA): Given a set of n rectangular 
modules, with areas and aspect ratios, connected by m nets in L device layers, a set of vertical alignment 
constraints, module repulsion constraints, and layer assignment constraints, find a 3-D floorplan that 
satisfies all these constraints and minimizes chip area, inter-module wirelength, and the number of 
inter-module vias. 
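The three constraint types of the problem statement can be expressed as a small feasibility check on a candidate placement. The following is a minimal sketch, not the dissertation's C++ implementation: the `Module` class, the function names, and the example modules are hypothetical, and repulsion is interpreted here as "the x-y projections do not overlap", which is an assumption about the wording above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Module:
    name: str
    x: float   # lower-left corner, x
    y: float   # lower-left corner, y
    z: int     # device layer, 1..L
    w: float   # width
    h: float   # height


def vertically_aligned(group):
    """Same lower-left (x, y) for every module, and pairwise different layers."""
    same_corner = len({(m.x, m.y) for m in group}) == 1
    distinct_layers = len({m.z for m in group}) == len(group)
    return same_corner and distinct_layers


def footprints_overlap(a, b):
    """True if the x-y projections of a and b overlap, whatever their layers."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def repelled(group):
    """No two modules of the group overlap in the x-y projection."""
    return not any(footprints_overlap(a, b) for a, b in combinations(group, 2))


def satisfies_constraints(modules, G, R, D):
    """modules: name -> Module; G, R: lists of name lists; D: name -> allowed layers."""
    aligned = all(vertically_aligned([modules[n] for n in g]) for g in G)
    apart = all(repelled([modules[n] for n in r]) for r in R)
    layers = all(modules[n].z in allowed for n, allowed in D.items())
    return aligned and apart and layers


# Hypothetical three-module placement on two layers.
modules = {
    "cpu":   Module("cpu",   0.0, 0.0, 1, 4.0, 4.0),
    "cache": Module("cache", 0.0, 0.0, 2, 4.0, 3.0),
    "adc":   Module("adc",   6.0, 0.0, 2, 2.0, 2.0),
}
print(satisfies_constraints(modules,
                            G=[["cpu", "cache"]],
                            R=[["cpu", "adc"]],
                            D={"cpu": {1}}))
```

Exact floating-point equality is used for the shared corner, which is adequate for a sketch but would likely need a tolerance in a real packer.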
5.3 COMBINATORIAL OPTIMIZATION AND THE COST FUNCTION 
We use the evolutionary algorithm described in Chapter 4 as the main optimization engine 
for 3-D FMA. We use the vertical constraints of the grouped sequence pair (GSP) derived in 
Chapter 3 to explore within the feasible solution search space. However, a penalty function 
is incorporated in the cost function to satisfy the repulsion constraints. This is because 
in all cases of feasibility and infeasibility conditions (derived in Chapter 3), it is possible 
to separate modules from each other. Finally, we use special operators (restricted perturbation 
of the solution search space) to satisfy the layer assignment constraints and to maintain the 
feasibility of the solution. 
Let a module $i$ have the lower left X-coordinate $x_i$ and width $w_i$. Similarly, let another module $j$ 
have the lower left X-coordinate $x_j$ and width $w_j$. Thus the pairwise penalty $\Delta x_{ij}$ for these 
modules is calculated as: 
$$\Delta x_{ij} = \frac{1}{|x_i - x_j| + \varepsilon} + \frac{1}{|(x_i + w_i) - x_j| + \varepsilon} + \frac{1}{|x_i - (x_j + w_j)| + \varepsilon} + \frac{1}{|(x_i + w_i) - (x_j + w_j)| + \varepsilon} \qquad (5.1)$$
where $\varepsilon$ is an infinitesimally small positive real number. Please note that each term of 
eqn (5.1) represents the distance (along the X-axis) between two corners of the two 
modules. In addition, if we detect that any corner of a module vertically overlaps with 
another module under MR constraint (i.e. the x coordinates of the two modules are the 
same across different device layers), we heavily penalize it by artificially increasing the value 
of $\Delta x_{ij}$. Similarly, $\Delta y_{ij}$ is calculated for the Y-coordinates of the two modules. 
However, $\Delta z_{ij}$, which denotes the vertical distance across the two device layers containing 
module $i$ and module $j$, is calculated as shown in eqn (5.2): 
$$\Delta z_{ij} = \frac{1}{|\mathrm{Layer}(i) - \mathrm{Layer}(j)| + 1} \qquad (5.2)$$
where Layer(i) denotes the device layer ID of a module i. 
Thus the repulsion penalty function, Penalty(R), is computed as in eqn (5.3): 
$$\mathrm{Penalty}(R) = \sum_{k=1}^{\mathrm{size}(R)} \sum_{i=1}^{r_k - 1} \sum_{j=i+1}^{r_k} \left( \Delta x_{ij} + \Delta y_{ij} + \Delta z_{ij} \right) \qquad (5.3)$$
where size(R) is the total number of groups in the repulsion constraint set $R$, and $r_k$ is the 
number of modules in a repulsion group $k \in R$. 
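Eqns (5.1) to (5.3) translate almost directly into code. The sketch below is an illustrative assumption rather than the original C++ implementation: modules are plain dictionaries with hypothetical keys `x`, `y`, `w`, `h`, and `layer`, `EPS` stands in for the small positive constant of eqn (5.1), and the additional "heavy" penalty for vertically overlapping corners is not modeled separately.

```python
from itertools import combinations

EPS = 1e-9  # stands in for the small positive epsilon of eqn (5.1)


def axis_penalty(p_i, s_i, p_j, s_j, eps=EPS):
    """Eqn (5.1) along one axis: p is the lower-left coordinate, s the extent."""
    return (1.0 / (abs(p_i - p_j) + eps)
            + 1.0 / (abs((p_i + s_i) - p_j) + eps)
            + 1.0 / (abs(p_i - (p_j + s_j)) + eps)
            + 1.0 / (abs((p_i + s_i) - (p_j + s_j)) + eps))


def layer_penalty(layer_i, layer_j):
    """Eqn (5.2): largest when both modules share a layer."""
    return 1.0 / (abs(layer_i - layer_j) + 1)


def repulsion_penalty(groups):
    """Eqn (5.3): sum over all unordered module pairs inside each repulsion group."""
    total = 0.0
    for group in groups:
        for a, b in combinations(group, 2):
            total += axis_penalty(a["x"], a["w"], b["x"], b["w"])
            total += axis_penalty(a["y"], a["h"], b["y"], b["h"])
            total += layer_penalty(a["layer"], b["layer"])
    return total


# Hypothetical pair of modules placed close together, then far apart.
near = [{"x": 0, "y": 0, "w": 2, "h": 2, "layer": 1},
        {"x": 3, "y": 3, "w": 2, "h": 2, "layer": 1}]
far = [{"x": 0, "y": 0, "w": 2, "h": 2, "layer": 1},
       {"x": 10, "y": 10, "w": 2, "h": 2, "layer": 4}]
print(repulsion_penalty([near]) > repulsion_penalty([far]))
```

Because every term is a reciprocal distance, two modules whose edges coincide along an axis contribute roughly `1/EPS`, so coincident edges are already penalized very strongly even without a dedicated overlap rule.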
Finally, the cost function (Fitness) for the EA engine is designed as shown in eqn. (5.4): 
$$\mathrm{Cost} = \alpha \cdot DS + \beta \cdot WL + \gamma \cdot VC + \chi \cdot \mathrm{Penalty}(R) \qquad (5.4)$$
where DS is dead space, WL is inter-module wirelength, and VC is inter-module via 
count. In addition, Penalty(R) is the sum of the penalties for violations of module repulsion 
constraints. The constants α, β, γ, and χ are tuning parameters that can be changed by the 
user to fine tune the solution quality. In the floorplan optimization, we minimize the 
weighted sum given by eqn. (5.4). Please note that eqn. (5.4) is similar 
to eqn. (4.2), which is used for area, inter-module wirelength, and via count minimization. 
The only difference is that the penalty function has been added to eqn. (4.2) to obtain eqn. 
(5.4). For the floorplanning benchmarks that are frequently used by academic researchers, 
we have observed the following ranges of tuning parameters: a) α is used between 0.5 and 
2.0, b) β is used between 1 and 10, c) γ is used between 1000 and 5000, and d) χ is used 
between 1.0 and 2.0. The ranges of these tuning parameters are chosen based on the 
sensitivity analysis of the cost function presented in Appendix D. 
The values of tuning parameters can noticeably change the quality of floorplanning 
solutions. For example, if α is chosen very high, then the floorplanner will put more 
emphasis on minimizing area. Similarly, if β is kept high and α is kept low, the floorplanner 
will put more emphasis on wirelength minimization. If β is kept zero then the floorplanner 
will not optimize wirelength at all. If γ is kept in its lower range then the floorplanner will 
insert more inter-module vias in order to minimize wirelength. In contrast, if γ is kept in its 
upper range, inter-module via count will be minimized, and inter-module wirelength might 
go up. If χ is kept zero, the floorplanner puts no additional emphasis on separating 
modules under MR constraints. However, if χ is non-zero, a penalty is added to 
the cost function in order to guide the floorplanner to separate modules under MR 
constraints. 
The values of tuning parameters used for the experiments are given in Section 5.5. 
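The effect of the via weight γ can be illustrated with the weighted sum of eqn. (5.4). In this sketch the two candidate floorplans and their metric values are entirely hypothetical (the units and normalization of DS, WL, and VC are assumptions), and the function name `cost` is illustrative only.

```python
def cost(ds, wl, vc, penalty, alpha=1.0, beta=10.0, gamma=2000.0, chi=2.0):
    """Weighted sum of eqn. (5.4); all inputs are assumed to be pre-normalized."""
    return alpha * ds + beta * wl + gamma * vc + chi * penalty


# Hypothetical candidates: A uses more vias to shorten wires, B does the opposite.
candidate_a = {"ds": 0.10, "wl": 450.0, "vc": 0.30, "penalty": 0.0}
candidate_b = {"ds": 0.10, "wl": 460.0, "vc": 0.26, "penalty": 0.0}

for gamma in (1000.0, 5000.0):
    cost_a = cost(**candidate_a, gamma=gamma)
    cost_b = cost(**candidate_b, gamma=gamma)
    winner = "A" if cost_a < cost_b else "B"
    print(f"gamma={gamma:.0f}: cost A={cost_a:.1f}, cost B={cost_b:.1f}, preferred={winner}")
```

With these made-up numbers the via-heavy candidate A is preferred at the lower end of the γ range, and the via-light candidate B at the upper end, which mirrors the qualitative behavior described above.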
5.4 PERTURBATION OF THE SOLUTION SPACE 
We randomly generate an initial set of floorplan solutions at the beginning that do not 
satisfy any of the user specified constraints. Next, we adjust the locations of modules under 
vertical constraints, repulsion constraints and layer assignment constraints. Then we balance 
the number of modules in each device layer. 
To satisfy the vertical constraints of different sets of modules, we restrict the moves such 
that we only search within the feasible solution search space by applying the feasibility 
condition introduced by the theorems of section 3.4. It was stated in theorem 1, "from a 
software implementation point of view, theorem 1 can be satisfied by checking if either 
$\{\Psi^{+}_{1}, \Psi^{+}_{2}\}$ or $\{\Psi^{-}_{1}, \Psi^{-}_{2}\}$ are connected by an edge, i.e. modules in those node pairs 
are in the same order." Thus we satisfy 
this condition by simply keeping the same orders of constrained groups of modules under 
vertical constraints in each layer, either in their positive constrained sequences or in their 
negative constrained sequences. For example, let us assume that modules inside each of the 
following three groups, u = {u1, u2, u3, u4}, v = {v1, v2, v3, v4}, and w = {w1, w2, w3, w4}, are 
under vertical constraints in a 4-layer 3-D IC. In this case, we can create constrained 
sequence pairs such that either all positive constrained sequences $\{\Psi^{+}_{1}, \Psi^{+}_{2}, \Psi^{+}_{3}, \Psi^{+}_{4}\}$ or all 
negative constrained sequences $\{\Psi^{-}_{1}, \Psi^{-}_{2}, \Psi^{-}_{3}, \Psi^{-}_{4}\}$ of all device layers have the same order of 
modules belonging to different groups. Continuing our example, if all the positive (or 
negative) constrained sequences of different device layers are {u1, v1, w1}, {u2, v2, w2}, 
{u3, v3, w3}, and {u4, v4, w4}, the solution will always be feasible. This order can be changed, and 
as long as the corresponding changes in all device layers are the same, it will always ensure 
feasible solutions. For example, {u2, w1, v2}, {u1, w3, v1}, {u3, w2, v4} and {u4, w4, v3}, either in 
the positive or in the negative constrained sequences, will also produce feasible solutions. 
Please note that each set represents the positive (or negative) constrained sequence of a 
particular device layer. Furthermore, it does not matter which modules go to which device 
layer. It only matters that their group order be the same. In this method we sacrifice a 
fraction of the feasible solution space but avoid checking the feasibility of solutions. For example, let 
us assume that {a1, b1, c1} and {b2, a2, c2} are the two positive constrained sequences of two 
device layers. In these two positive constrained sequences, only the orders of {a1, b1} and 
{b2, a2} are different, which makes it a different example than the previous one in this 
paragraph. If the two negative constrained sequences of the two device layers have either a) 
{a1, b1} and {a2, b2}, or b) {b1, a1} and {b2, a2} orders of modules, the solution will be 
feasible. Thus the restriction might not allow searching all the feasible floorplan solutions, 
and in some cases a few good solutions might never be considered. In the worst case, i.e. if 
there is only one good solution which is not reachable by our restricted moves, the 
floorplanner might not find it. However, due to the enormous solution space, such as 
$n^{k-1}(n!)^2/(k-1)!$ for a k-layer 3-D chip containing n modules [45], the chances of 
reaching the worst case situation (i.e. the best solution is not reachable by the floorplanner due 
to the restricted moves) are negligible. 
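The restricted rule described above, keeping the same group order in every layer's constrained sequence, can be checked by projecting each sequence onto group labels. This is a minimal sketch under assumptions: module names encode their group in the first character (as in the u, v, w example), and the function names are hypothetical rather than taken from the original implementation.

```python
def group_order(sequence, group_of):
    """Project a constrained sequence of module names onto group labels."""
    return [group_of(m) for m in sequence]


def same_group_order(layer_sequences, group_of):
    """True if every layer lists the constrained groups in the same order."""
    orders = [group_order(seq, group_of) for seq in layer_sequences]
    return all(order == orders[0] for order in orders)


def group_of(name):
    # Assumption for this example only: "u3" belongs to group "u".
    return name[0]


# The two 4-layer examples from the text, one sequence per device layer.
example_1 = [["u1", "v1", "w1"], ["u2", "v2", "w2"],
             ["u3", "v3", "w3"], ["u4", "v4", "w4"]]
example_2 = [["u2", "w1", "v2"], ["u1", "w3", "v1"],
             ["u3", "w2", "v4"], ["u4", "w4", "v3"]]
# The two-layer example whose positive sequences differ in the order of a and b.
example_3 = [["a1", "b1", "c1"], ["b2", "a2", "c2"]]

for layers in (example_1, example_2, example_3):
    print(same_group_order(layers, group_of))
```

The third example fails this check on its positive sequences even though, as noted above, it can still be feasible through its negative sequences; that is exactly the fraction of the feasible space the restricted rule gives up.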
Next we use the following moves: insert, swap, invert, exchange, change group, and rotate, which are 
the same moves as presented in chapter 4 (please see section 4.6). In addition, we have 
introduced the following set of new restricted moves (special operators) to perturb the 
solution search space: 
• ValignExchange: In this move, we only exchange modules within a vertically 
aligned group. For example, if the group g = {m1, m2, m3, m4} has all its modules under 
vertical constraints and located in four different device layers, then we only perform 
inter-layer exchange of two randomly selected modules from this group. 
• ValignChangeGroup: This perturbation is similar to ValignExchange except that it 
moves a module $m_i$ of a group g from one device layer to another layer. Please note 
that this move can only be effective if the number of modules present in a vertically 
constrained group is smaller than the total number of device-layers in a 3-D chip. 
• ValignPerturbFeasibilityOrder: This move changes the order of modules within 
the constrained positive sequences while keeping their orders the same in all device layers. 
For example, let us say that we swap two modules from two different vertically 
constrained groups, such that their relative position changes in one device layer. 
Then we perform the swap operation in all the remaining device layers between 
modules from these two constrained groups, such that the resultant order of 
modules' groups will be the same across all device layers. Similarly, insert and invert 
operations are performed on the constrained positive sequences of all device layers. The 
same operations can also be performed on the constrained negative sequences of all device 
layers. 
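The swap variant of ValignPerturbFeasibilityOrder can be sketched as a single operation applied to the constrained sequence of every device layer. The code below is an illustration under assumptions, not the original C++ move: sequences hold only constrained modules, each layer has at most one member of each group, and the function and parameter names are hypothetical.

```python
def swap_groups_in_all_layers(layer_sequences, group_of, g1, g2):
    """Swap the members of groups g1 and g2 in each layer's constrained sequence."""
    swapped = []
    for seq in layer_sequences:
        seq = list(seq)  # work on a copy so the input is left untouched
        i = next((k for k, m in enumerate(seq) if group_of(m) == g1), None)
        j = next((k for k, m in enumerate(seq) if group_of(m) == g2), None)
        if i is not None and j is not None:
            seq[i], seq[j] = seq[j], seq[i]
        swapped.append(seq)
    return swapped


def group_of(name):
    # Assumption for this example only: "u3" belongs to group "u".
    return name[0]


layers = [["u1", "v1", "w1"], ["u2", "v2", "w2"],
          ["u3", "v3", "w3"], ["u4", "v4", "w4"]]
after = swap_groups_in_all_layers(layers, group_of, "u", "w")
print(after)
```

Because the same swap is applied in every layer, the group order stays identical across layers after the move, so the feasibility condition for the vertical constraints is preserved without a separate check.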
The ValignPerturbFeasibilityOrder move explores the different feasibility configurations for 
module alignment (MA) constraints. At the same time, it also creates a large perturbation in 
the search space because the order of modules (under MA constraints) in each device layer 
needs to be rearranged to maintain the feasibility of a solution. The larger the probability of 
the ValignPerturbFeasibilityOrder move, the more frequently the entire floorplan of all device 
layers will change with a large perturbation. The large perturbation may be desirable at an 
early stage of the floorplan optimization. However, frequent use of this move may not lead 
to a good solution, or the solution might not converge. The other two perturbations, 
ValignExchange and ValignChangeGroup, also search in the feasible solution space. However, 
the perturbations created by these two moves only perturb two device layers instead of all 
device layers. We sometimes perturb the layout of all the device layers in just one move, and 
the rest of the time we only perturb the layout of two device layers. Thus the combination of 
these three moves helps in creating drastic as well as moderate perturbations while 
maintaining the feasibility of vertical alignment of modules. 
The probabilities of moves are changed in three stages (similar to those discussed in 
Chapter 4). Thus, for each move we provide three probabilities corresponding to the three 
stages of the optimization. The probabilities are a) for insert (0.14, 0.14, 0.22), b) for swap (0.14, 
0.20, 0.28), c) for invert (0.24, 0.18, 0.15), d) for exchange (0.13, 0.09, 0.05), e) for ChangeGroup 
(0.13, 0.09, 0.05), f) for ValignExchange (0.06, 0.1, 0.02), g) for ValignChangeGroup (0.04, 
0.03, 0.01), h) for ValignPerturbFeasibilityOrder (0.07, 0.09, 0.02), and i) for rotate (0.05, 0.08, 
0.2). There is no straightforward mathematical theory behind the decision about the 
probabilities of moves. The values of these probabilities were chosen based on initial 
experiments performed by varying the probabilities of various moves and studying their 
effects on various floorplan benchmarks. A simple logic is to assign high probabilities at the 
early stage of the optimization process to those moves which create large perturbations, 
because initial solutions are random and they have not converged. At the later stage, high 
probabilities are assigned to those moves which create small perturbations, because at the 
later stage floorplan solutions start to converge to an optimal solution. 
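The staged move probabilities listed above amount to a lookup table from which one move is drawn per perturbation. The sketch below shows one possible way to do that in Python; the table values are taken from the text, while the function name `pick_move`, the zero-based stage index, and the use of `random.choices` are illustrative assumptions (how the optimizer decides when to switch stages is not specified here).

```python
import random

# (stage 1, stage 2, stage 3) probabilities for each move, as listed in the text.
MOVE_PROBABILITIES = {
    "insert": (0.14, 0.14, 0.22),
    "swap": (0.14, 0.20, 0.28),
    "invert": (0.24, 0.18, 0.15),
    "exchange": (0.13, 0.09, 0.05),
    "ChangeGroup": (0.13, 0.09, 0.05),
    "ValignExchange": (0.06, 0.10, 0.02),
    "ValignChangeGroup": (0.04, 0.03, 0.01),
    "ValignPerturbFeasibilityOrder": (0.07, 0.09, 0.02),
    "rotate": (0.05, 0.08, 0.20),
}


def pick_move(stage, rng=random):
    """Draw one move name for the given stage (0, 1, or 2)."""
    names = list(MOVE_PROBABILITIES)
    weights = [MOVE_PROBABILITIES[name][stage] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(2011)  # fixed seed so repeated runs draw the same moves
for stage in range(3):
    total = sum(p[stage] for p in MOVE_PROBABILITIES.values())
    print(f"stage {stage + 1}: probabilities sum to {total:.2f}, sample move: {pick_move(stage, rng)}")
```

The three columns each sum to 1, so they can be used directly as sampling weights without renormalization.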
5.5 EXPERIMENTAL RESULTS 
We implemented 3-D FMA, the proposed 3-D floorplanning with module alignment, in 
C++/STL. All experiments were performed on a Sun V490 server (4xDual Core Sun 
SPARC IV+ CPUs, each running at 1.35GHz speed and total 32GB RAM). The algorithm is 
designed to run on a single core only. Our 3D-FMA related experimental data are an average 
of 20 runs of each benchmark. 
We used the two largest MCNC benchmarks (ami33 and ami49) and three largest GSRC 
benchmarks (n100, n200 and n300) for our experiments. The number in each benchmark's 
name denotes the total number of modules in the floorplan. Since the MCNC benchmarks 
are old and small in problem size, we present our experiments for MCNC and GSRC 
separately. We used the following tuning parameters for various floorplan benchmarks: a) 
for ami33, α=1, β=20, γ=2000, χ=2, b) for ami49, α=1, β=12, γ=2000, χ=2, c) for n100, 
α=2, β=10, γ=2000, χ=2, d) for n200, α=1, β=10, γ=2000, χ=2, and e) for n300, α=1, 
β=12, γ=2000, χ=2. These values were chosen based on the sensitivity analysis presented in 
Appendix D, which suggests a certain range for each tuning parameter for the optimal 
solution. 
5.5.1 Effect of Module Alignment on MCNC Benchmarks 
TABLE 5.1: EFFECT OF DIFFERENT PLACEMENT CONSTRAINTS ON A 4-LAYER 3-
D FLOORPLAN USING MCNC BENCHMARKS 
# of constraints Footprint HPWL Via Runtime 
circuits LA Relative 
ami33 
3 
2 0 14.1 465 53.7 ami49 
0 14.5 
512 
50.0 
1.17 
0 490 439 1.09 
3 
3 2 6 14.5 499 450 58.7 1.28 
The ami33 and ami49 circuits from MCNC benchmarks only contain a small number of 
modules. Thus we keep the number of constraints the same on these benchmarks. First we 
perform experiments without any constraints as a baseline. Then we use only vertical 
constraints for module alignment (MA). Next we apply vertical constraints for module 
alignment (MA) and module repulsion constraints (MR). Finally we apply MA, MR, and 
layer assignment constraints (LA) simultaneously. 
We report footprint area, inter-module wirelength (HPWL), via count (i.e. number of 
TSVs), and runtime on 4-layer 3-D floorplanning. Table 5.1 shows the comparison of 
these parameters. From the table, it is clear that 3-D FMA optimizes area, HPWL, and via 
count in the presence of various constraints. The runtime increases between 9% - 40% 
when the different types of constraints are combined for simultaneous optimization. When 
the MA constraints are applied, the footprint area increases on average by 30% and the 
HPWL increases by 15% - 30%. Via count remains approximately the same. Please 
observe that the effect of MA constraints on the footprint area and HPWL dominates over 
the penalty caused by MR and LA constraints. 
5.5.2 Effect of Increasing the Number of Module Alignment Constraints on 3-D 
Floorplanning using GSRC Benchmarks 
In this subsection we use GSRC benchmarks and vary the number of module alignment 
(MA) constraints to observe its effect on the solution quality and runtime. Table 5.2 shows 
the varying numbers of MA constraints, and their effect on area, HPWL, via count, and 
runtime for n100, n200, and n300 benchmarks. We increase the number of MA constraints 
from 5 to 15. As a result, the footprint area increases on average by 11.5% - 32.7% and 
HPWL increases by 4.5% - 15.5%, whereas via count remains approximately the same. The 
runtime penalty is between 1.35x and 1.65x for module alignment constraints varying 
between 5 and 15. The area increases because module alignment is achieved by 
laterally shifting modules during the LCSLS packing algorithm. Due to the alignment of 
modules, the wiring may need to take a longer routing path, which results in increased HPWL. 
Module alignment has no effect on via count. 
5.5.3 Composite Effect of Increasing the Number of Various Constraints on 3-D 
Floorplanning using GSRC Benchmarks 
In this experiment, we simultaneously apply MA, MR, and LA constraints, vary the number 
of these constraints, and observe their effects on the quality of floorplan solutions. Table 
5.3 presents the comparison of footprint area, HPWL, via count, and runtime. The number 
of MA constraints is varied between 5 and 15, the same as in subsection 5.5.2. We 
observe that the footprint area and HPWL increments are approximately the same as 
TABLE 5.2: EFFECT OF INCREASING THE NUMBER OF MODULE ALIGNMENT 
CONSTRAINTS ON A 4-LAYER 3-D FLOORPLAN USING GSRC BENCHMARKS 
Circuits   # of MA       Avg. # of modules   Footprint   HPWL   Via    Runtime 
           constraints   per MA                                        Actual   Relative 
n100 
n200 
n300       5             3.0                 9.80        2480   2020   521.1    1.57 
           10            2.8                 10.70       2570   2012   579.8    1.75 
           15            2.8                 11.50       2700   2013   602.5    1.81 
TABLE 5.3: COMPOSITE EFFECT OF INCREASING THE NUMBER OF DIFFERENT 
CONSTRAINTS ON A 4-LAYER 3-D FLOORPLAN USING GSRC BENCHMARKS 
Circuit 
n100 
n200 
n300 
the previous subsection's results (presented in Table 5.2). This indicates that the effects of 
module repulsion (MR) and layer assignment (LA) constraints on the solution quality are 
negligible (on average within 1% of the results in Table 5.2) compared to MA constraints. It 
is due to the fact that MR constraints are satisfied by a penalty function, and they do not 
require any changes in the physical floorplan. The layer assignment constraints are satisfied 
by the restricted moves (special operators) during the perturbation of the solution search 
space. Thus it also does not require additional changes on the geometric floorplan whereas 
MA constraints involve lateral shifting of modules on the geometric floorplan for the 
vertical alignment. 
5.5.4 Runtime Comparison of 3-D FMA with LTCG based 3-D Floorplanning 
Algorithm 
In this sub-section we compare the runtime of our proposed 3-D FMA algorithm with the 
layered TCG (LTCG) based 3-D floorplan algorithm [67] for vertical module alignment. 
However, the comparison of 3-D FMA with LTCG is approximate due to the following 
differences: a) LTCG performs floorplanning of soft modules whereas 3-D FMA performs 
floorplanning of hard modules; b) the problem formulation of LTCG is such that it 
vertically aligns "m" modules out of a set of "k" modules where k > m. In contrast, the 
number of modules for vertical alignment in each group is fixed in 3-D FMA, i.e. m = k, 
which is a stricter constraint than LTCG's formulation; c) LTCG [67] optimizes only area 
for soft modules and maintains vertical module alignment, whereas 3-D FMA minimizes 
area, inter-module wirelength, and via count while maintaining the vertical module 
alignment. Therefore we only compare runtime with LTCG in Table 5.4. In addition we 
also report the total number of MA constraints for 3-D FMA and the LTCG based 
floorplanner. Please note that the runtime of LTCG was reported using a 3.2 GHz CPU 
[67] while our CPU's speed is only 1.35 GHz. Considering our slower CPU speed, 3-D 
FMA is faster than LTCG (please see Table 5.4). Figure 5.2 shows a runtime comparison of 
LTCG and 3-D FMA with the actual runtime and the linearly scaled runtime. 
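The linear scaling mentioned above can be reproduced from the runtimes of Table 5.4. The sketch below assumes that runtime scales inversely with clock frequency, which is only a first-order approximation since the two CPUs differ in architecture; the helper name `scale_runtime` is hypothetical, and only the three circuits for which both tools report a runtime are included.

```python
FMA_CLOCK_GHZ = 1.35   # Sun V490 server used for 3-D FMA
LTCG_CLOCK_GHZ = 3.2   # CPU reported for LTCG [67]

# Runtimes in seconds, from Table 5.4.
ltcg_runtime_s = {"ami33": 28.0, "ami49": 63.0, "n100": 548.0}
fma_runtime_s = {"ami33": 21.4, "ami49": 50.0, "n100": 102.5}


def scale_runtime(seconds, from_ghz, to_ghz):
    """Linearly rescale a runtime measured at from_ghz to a clock of to_ghz."""
    return seconds * from_ghz / to_ghz


scaled = {circuit: scale_runtime(t, FMA_CLOCK_GHZ, LTCG_CLOCK_GHZ)
          for circuit, t in fma_runtime_s.items()}

for circuit in ltcg_runtime_s:
    ratio = ltcg_runtime_s[circuit] / scaled[circuit]
    print(f"{circuit}: LTCG {ltcg_runtime_s[circuit]:.1f} s, "
          f"3-D FMA scaled {scaled[circuit]:.1f} s, ratio {ratio:.1f}x")
```

Note that the ami33 entries are not strictly like-for-like, because Table 5.4 lists 2 MA constraints for LTCG and 3 for 3-D FMA.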
5.5.5 An Example of a 4-Layer 3-D Floorplan with Various Constraints 
Figure 5.3 shows a 4-layer 3-D floorplan obtained using 3-D FMA. The module groups for 
MA constraints are {0, 3}, {1, 2, 5, 32}, and {10, 17, 20, 48}. Please observe in Figure 5.3 
that modules within each MA group have been placed in different device layers and they are 
vertically aligned. Similarly, module groups under MR constraints are {4, 33} and {11, 28}. 
Please note that modules within each group are placed away from each other, and they are 
not vertically aligned. Finally, the group of modules under the layer assignment constraints 
in Layer 1 is {3, 33, 40}, in Layer 2 is {0, 7, 32}, in Layer 3 is {12}, and in Layer 4 is {13}. 
Please observe that modules under the LA constraint have been placed in their specified 
device layers. Thus 3-D FMA produces a feasible solution satisfying these constraints. 
The 3-D FMA algorithm can optimize area, inter-module wirelength, and via count while
simultaneously satisfying a designer-specified set of constraints. These constraints are
useful in bus driven 3-D design and heterogeneous 3-D integration. The tradeoffs (in terms of
footprint area and wirelength) associated with the MA, MR and LA constraints can be
minimized by modifying the tuning parameters of the cost function as discussed in Section
5.3. An approximate runtime comparison with an LTCG based 3-D floorplanner [67]
shows that 3-D FMA is faster and scales well with increasing problem size.
TABLE 5.4: RUNTIME COMPARISON OF 3-D FMA WITH LTCG BASED 3-D
FLOORPLANNER

            LTCG @ 3.2 GHz CPU [67]           3-D FMA @ 1.35 GHz CPU
Circuit     Number of MA     Runtime (s)      Number of MA     Runtime (s)
            constraints                       constraints
ami33       2                28               3                21.4
ami49       3                63               3                50.0
n100        5                548              5                102.5
n200        N/A              N/A              5                233.8
n300        N/A              N/A              5                521.1
[Plot: runtime in seconds (axis from 0 to 600) for circuits ami33, ami49 and n100. Curves: LTCG @ 3.2 GHz, 3-D FMA @ 1.35 GHz, and 3-D FMA scaled to 3.2 GHz.]
Figure 5.2: Runtime Comparison of 3-D FMA with LTCG
[Floorplan drawing with four panels: Layer 1, Layer 2, Layer 3, Layer 4.]
Figure 5.3: A 4-layer 3-D floorplan of ami49 obtained using 3D-FMA. Module groups under MA
constraints are {0, 3}, {1, 2, 5, 32}, and {10, 17, 20, 48}. MR constraint groups are {4, 33}
and {11, 28}. Layer assignment constraints in Layer 1 = {3, 33, 40}, Layer 2 = {0, 7, 32},
Layer 3 = {12}, and Layer 4 = {13}.
CHAPTER 6: TSV-INDUCED 3-D IC YIELD 
6.1 TSV FABRICATION TECHNOLOGIES 
In this section we discuss the fabrication technologies of through silicon vias (TSVs)
that provide vertical interconnection across different device layers. As briefly discussed in
Chapter 1, there are two types of TSV technologies depending on the order of their
formation: a) via-first, and b) via-last.
In the via-first process, the majority of the inter-layer vias can be formed during the wafer
bonding process (i.e. right after the wafer alignment process) but before the top wafer
thinning process. The drilling of TSVs is performed by one of the two available processes:
a) laser drilling, and b) dry etching or Bosch etching. Laser drilling creates holes in the
wafers without any micro cracks. Similarly, dry etching creates deep, steep-sided holes and
trenches in wafers. A TSV consists of an electrical isolation layer such as SiO2 or other
dielectrics, a liner or barrier layer (made of titanium, tantalum, TiN, or TaN), and a via
metal filling (made of copper, tungsten, or highly doped polysilicon) [107]. After the
formation of the metal filled holes, the wafers are thinned and bonded together. The
thinning of wafers is done by grinding and chemical mechanical polishing (CMP) [9]. In
the via-last process, TSVs are formed after the wafers are aligned and bonded, and the top
wafer is thinned. Similar to the via-first process, TSVs formed by the via-last method
consist of an electrical isolation layer, a liner, and via metal fill [107]. The wafers are
attached either by adhesive-to-adhesive bonding or oxide-to-oxide bonding as shown in
Figure 6.1.
Figure 6.1: TSVs in 3-D ICs using (a) via-first, and (b) via-last methods. [107]
In the via-first technology, TSVs only pass through the thinned Si-substrate and the rest of
the connection is carried through local interconnects and bond vias (vias made at the
interface of the bonding layer) as shown in Figure 6.1(a). In via-last technology, TSVs pass
through the thinned Si-substrate as well as all metal layers (because wafers are already
aligned and bonded) as shown in Figure 6.1(b). As a result, TSVs formed using the via-first
technique are shorter than those formed using the via-last technique. To mechanically
support the long TSVs formed using the via-last technique, their diameter is kept larger
than the diameter of TSVs formed using the via-first technique [107].
We discussed the serious 3-D IC yield loss issue posed by TSV failure in sub-section
1.5.6. As a quick recap, TSVs suffer from thermo-mechanical stress caused by a difference
in the Coefficient of Thermal Expansion (CTE) of copper TSVs and the surrounding
dielectric. For a given temperature change, copper expands more than its surrounding
dielectric, resulting in thermo-mechanical stress. In this chapter we will focus more on the
TSV induced yield issue. Prior to that, let us look into alternative via technologies that
are available for vertical interconnections in 3-D ICs.
6.2 ALTERNATIVE VIA TECHNOLOGIES: WIRELESS VIAS 
In addition to through silicon vias, alternative via technologies such as AC
coupled interconnects (ACCI) have been proposed [82],[83],[84]. For a full swing digital
signal, the edges carry the digital information and the DC component carries no information
[82]. An ACCI therefore transmits digital information on the edges of a signal. ACCIs are
circuit based solutions in which a wireless link across multiple device layers is established
through transmitter (Tx) and receiver (Rx) circuits. Two types of ACCIs have been
proposed by researchers:
• Capacitive ACCI: In this topology, the transmitter and receiver circuits
communicate with each other using a capacitive interface as shown in Figure 6.2(a).
A voltage-mode driver transmits a signal which is converted into voltage pulses after
passing through the coupling capacitor. The voltage pulses are reshaped to a full
swing digital signal by the receivers.
• Inductive ACCI: In this topology, the transmitter and receiver circuits are interfaced
by two spiral inductors which are formed across the two device layers as shown in
Figure 6.2(b). These two spiral inductors form a transformer through which the Tx and
Rx circuits communicate. In an inductive ACCI, a current-mode driver transmits a
signal which is converted into current pulses after passing through the inductor pair.
The current pulses are reshaped to a full swing digital signal by the receivers.
Since wireless vias are circuit based solutions, they do not suffer from thermo-mechanical
stress, which makes them a promising alternative to TSVs. Capacitive coupling based vias
can only be used between two layers (face-to-face) and they are prone to low frequency noise.
Inductive vias, in contrast, are immune to low frequency noise and can be extended to more
than two layers, but they consume more power [82] (14.5 mW per via for 5 Gbps data transfer
at the 180 nm technology node) than capacitive vias and TSVs. Both capacitive and inductive
vias require additional area to implement the transmitter (Tx) and receiver (Rx) circuits: a
single wireless via needs Tx and Rx circuits in two device layers. The area overhead
estimation will be presented in Chapter 7. The capacitive and inductive ACCI use the same
type of Tx and Rx circuits and the coupling capacitors/inductors are formed in the
metallization layer. Thus both capacitive and inductive ACCI occupy the same amount of Si
area. Circuit schematics of Tx and Rx circuits for an inductive ACCI are shown in Figure 6.3.
As a proof of concept, researchers have also fabricated a 3-D test chip using inductive ACCI
and a 0.35 µm bulk-CMOS process as reported in [85]. They were able to achieve a speed of
2.5 Gbps with a 2^7 - 1 pseudo random binary sequence and observed no errors for more than
2.513 bits, after which they stopped the measurement due to time constraints.
Figure 6.2: Concept of ACCI (a) Capacitive ACCI (b) Inductive ACCI. [82]
Figure 6.3: Schematic of (a) transmitter, and (b) receiver circuits of a
wireless via using inductive coupling [82].
Furthermore, a new pulse based circuit
technique using a 90 nm technology node to raise the aggregated data rate up to 1 Tb/s,
with high reliability (BER <10·1~ has been presented in [95]. It reduces the pulse width in
the transmitter, which in turn reduces the power consumption because the transmitter
consumes power only when the pulse current flows. These inductive vias are based on
standard CMOS techniques that do not require new process development. Unlike TSVs,
they do not need ESD protection, and experimental results show that inductive ACCI
exhibit good misalignment tolerance (+/- 3 µm tolerance within a 5% increase in power
consumption) [95]. Two misaligned spiral inductors of an inductive ACCI are still able to
transmit a signal, although through a weaker magnetic field than a perfectly aligned pair of
inductors. The weaker magnetic field is compensated by increasing the current (hence the
increase in power consumption). Thus good misalignment tolerance is achievable in
inductive vias.
6.3 CARBON NANOTUBE BASED INDUCTORS FOR WIRELESS VIAS 
Researchers [86],[87] have recently reported the following properties of Carbon Nanotube
(CNT) based inductors:
• The radius (r) of a CNT is several nm compared to the radius of a copper wire which
is about several µm. Therefore the magnetic field (H) induced by a current (I) in a
CNT is about one thousand times larger than that induced by a copper wire:
$H \approx \frac{I}{2\pi r}$.
• The relation between the magnetic field (H) in the inductor and the inductance (L) of the
inductor is $\int \frac{\mu_0 H^2}{2}\, dV \approx \frac{L I^2}{2}$. Thus a large magnetic
field results in a large inductance.
• CNTs can be bent with a small radius of curvature. Therefore an inductor made using CNTs
has a smaller footprint than a copper inductor.
Furthermore, the large magnetic field induced by a small current in a CNT was
experimentally measured using a magnetic force microscope and reported in [86],[87]. Since
the on-chip CNT based inductor has a smaller footprint than a copper based inductor,
denser packing of inductive vias is possible. In addition, recent research works have shown
promising results for the use of CNTs as passive inductors in analog circuits such as LNA [88]
and decoupling capacitors [89]. Thus in our work, we assume that high density inductors
with smaller footprint areas are feasible in 3-D ICs.
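As a rough worked illustration of the first property, assume that the same current I flows in both conductors and that the field is evaluated at the conductor surface. The radii used below (1 nm for the CNT and 1 µm for the copper wire) are illustrative values chosen here to represent "several nm" and "several µm"; they are not figures taken from [86],[87].

```latex
\frac{H_{\mathrm{CNT}}}{H_{\mathrm{Cu}}}
  \approx \frac{I / (2\pi r_{\mathrm{CNT}})}{I / (2\pi r_{\mathrm{Cu}})}
  = \frac{r_{\mathrm{Cu}}}{r_{\mathrm{CNT}}}
  = \frac{1\,\mu\mathrm{m}}{1\,\mathrm{nm}}
  = 10^{3}
```

Because the stored magnetic energy grows with the square of H, a ratio of this order is what makes a compact CNT inductor plausible under these assumptions.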
6.4 YIELD AS A FUNCTION OF TSV FAILURE
We consider the yield problem in a 3-D system-on-chip in the presence of defects in
through silicon vias. For the chip to operate, we assume that all TSVs need to be fully
functional. The objectives for solving this problem are as follows:
• Find new strategies such that all connections between device layers inside a 3-D chip
can be established.
• Develop a model to estimate yield for a given TSV failure rate.
One solution to the 3-D yield problem (caused by thermo-mechanical stress) would be to
replace all TSVs with inductive vias in a 3-D chip. However, the power penalty associated
with them might be unacceptable. Therefore in this work we focus on minimizing power
consumption in wireless vias while solving the yield problem in 3-D ICs. In the next
chapter 3-D IC yield improvement will be discussed in detail.
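To give a feel for the power penalty mentioned above, the following sketch multiplies the per-via figure quoted from [82] (14.5 mW per inductive via at 5 Gbps, 180 nm) by a via count. The via count of 10K and the assumption that every via transmits continuously are chosen here for illustration only; they describe a worst case, not a measured design.

```python
POWER_PER_VIA_W = 14.5e-3   # inductive via power quoted from [82] (5 Gbps, 180 nm)
DATA_RATE_BPS = 5e9         # data rate at which that power figure was reported


def total_via_power_w(num_vias, power_per_via_w=POWER_PER_VIA_W):
    """Worst-case power if every via transmits continuously (assumption)."""
    return num_vias * power_per_via_w


def energy_per_bit_j(power_per_via_w=POWER_PER_VIA_W, data_rate_bps=DATA_RATE_BPS):
    """Energy spent per transmitted bit on one via."""
    return power_per_via_w / data_rate_bps


print(f"10K inductive vias: {total_via_power_w(10_000):.0f} W")
print(f"energy per bit: {energy_per_bit_j() * 1e12:.1f} pJ")
```

With these inputs the total is on the order of 145 W, which illustrates why replacing every TSV with an inductive via is not pursued and why wireless vias are used only as redundant vias.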
CHAPTER 7: VIA REDUNDANCY FOR 3-D IC YIELD IMPROVEMENT 
Let us consider via redundancy within 3-D chips. It is a well known technique for yield
enhancement. For a chip to be functional, all connections have to be satisfied. Thus we
propose redundant via insertion such that every connection through TSVs can be
established inside a chip. The objectives are to find new strategies for yield enhancement
and to develop models to estimate yield for a given via failure rate.
A simple approach for yield enhancement would be to use redundant TSVs in parallel with
the primary TSVs, which is a common practice in traditional 2-D ICs. We studied the same
technique for 3-D IC yield improvement by inserting a redundant TSV in parallel with a
primary TSV and connecting them directly with a wire. Please note that in this
configuration, there is no MUX inserted. We presented this study in [35],[36] and reported
that for 10K primary TSVs and a 1% defect rate, the obtained functional yield was 35%,
which is far below the acceptable range of yield. We further analyze the number of
redundant TSVs per primary TSV required in order to achieve an acceptable yield. Let us
assume that there are "r" redundant TSVs connected in parallel with one primary TSV. The
functional yield probability can be analytically expressed as:
$$Y_f = \left(1 - P_d^{\,r+1}\right)^{ViaCount} \qquad (7.1)$$
where Pd is the probability of defect (0 < Pd < 1), ViaCount is the total number of TSVs in
a chip, and Yf is the functional probability of yield (0 < Yf < 1). Solving eqn (7.1) for the
number of redundant TSVs required to be connected in parallel with each primary TSV for
an acceptable/desired value of functional yield gives:
$$r = \frac{\log\left(1 - Y_f^{\,1/ViaCount}\right)}{\log(P_d)} - 1 \qquad (7.2)$$

TABLE 7.1: NUMBER OF TSVs REQUIRED TO BE CONNECTED IN PARALLEL
WITH EACH PRIMARY TSV IN 3-D CHIPS TO OBTAIN 90% FUNCTIONAL YIELD

Defect      ViaCount: 10K    ViaCount: 20K    ViaCount: 90K    ViaCount: 1M
rate (%)    r =              r =              r =              r =
1           2                2                2                3
2           2                3                3                4
3           3                3                3                4
4           3                3                4                4
5           3                4                4                5
6           4                4                4                5
7           4                4                5                6
8           4                4                5                6
9           4                5                5                6
10          4                5                5                6
From Table 7.1, it is evident that for a reasonable yield of 90%, 2 to 5 redundant vias per
primary TSV are required for defect rates of 1 to 10% and via counts of 10K to 90K in a 3-D
chip. However, these redundant TSVs will also suffer from thermo-mechanical
stress and will be prone to failure. They will also consume more routing resources in the
vertical direction, and may cause congestion because electrical connections through TSVs
pass through all metal layers as well as thinned substrates in the 3-D stack. Thus adding so
many redundant vias will increase the thermo-mechanical stress, consume additional Si
area, and create routing congestion in the vertical direction. Furthermore, putting
two or more redundant vias very close to each other is not desired because the
thermo-mechanical stress shifts from the TSVs to the thinned substrate, which can
permanently damage the device layer [27],[28]. Thus it may not provide an efficient
solution. Therefore we consider redundancy solutions that require less than 100% redundant
vias.
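The entries of Table 7.1 can be reproduced with a short script. This is a sketch under stated assumptions: Eq. (7.1) is taken in the form Yf = (1 - Pd^(r+1))^ViaCount, which is the form implied by Eq. (7.2), and the result of Eq. (7.2) is rounded up to the next whole TSV. The function names are introduced here for illustration.

```python
import math


def functional_yield(p_defect, r, via_count):
    """Eq. (7.1): yield when each primary TSV has r redundant TSVs in parallel."""
    return (1.0 - p_defect ** (r + 1)) ** via_count


def redundant_tsvs_needed(p_defect, via_count, target_yield=0.90):
    """Eq. (7.2), rounded up to the next whole TSV (rounding is an assumption)."""
    # -expm1(x) evaluates 1 - exp(x) accurately when x is very close to zero.
    per_connection_failure = -math.expm1(math.log(target_yield) / via_count)
    r = math.log(per_connection_failure) / math.log(p_defect) - 1.0
    return max(0, math.ceil(r))


VIA_COUNTS = (10_000, 20_000, 90_000, 1_000_000)
for defect_percent in range(1, 11):
    row = [redundant_tsvs_needed(defect_percent / 100.0, n) for n in VIA_COUNTS]
    print(defect_percent, row)
```

For a single redundant TSV per primary TSV, 10K TSVs and a 1% defect rate, `functional_yield` evaluates to roughly 0.37, which is of the same order as the 35% yield quoted above.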
To reduce the number of redundant vias, an alternative approach would be to make
connections with redundant vias reconfigurable, i.e. a redundant via can be connected in
place of a failed primary TSV in its neighborhood. We can achieve the re-configurability
using MUX-logic. To minimize the effect of thermo-mechanical stress, a further approach
would be to use redundant wireless vias in addition to the primary TSVs. The advantages are
that these wireless vias will not fail due to stress, and they will save routing resources in
the vertical direction. We only consider inductive vias [82] because they can be used for
more than two layers.
To minimize the impact of process variation (such as CMP variation), TSVs are preferred
to be arranged uniformly. Thus we assume that: a) through silicon vias are uniformly
distributed in rows and columns, b) the probabilities of defects in TSVs are uniform, c)
wireless vias are 100% functional due to the absence of thermo-mechanical stresses, d) 3-D
integration is achieved by die-to-die (DTD) and die-to-wafer (DTW) methods using
known-good-dies only. For simplicity, we consider the insertion of redundant vias in
two-layer 3-D ICs only. We will show in Chapter 8 that it can be easily extended to more
than two device layers.
7.1 REDUNDANCY LATTICES
In order to elaborate our proposed redundancy, we present different types of redundancy
lattices that are used to construct various redundancy configurations in a device layer. Since
the redundancy will be achieved using MUX-logic to re-route the failed TSVs, we propose
lattices that use 2:1, 4:1, or 8:1 MUXes only, because going beyond an 8:1 MUX might have
an unacceptable area/delay penalty. The redundant via arrangement topologies have the
following lattices:
• Quad Lattice (QL): A redundant via is located at the center of a square and each
corner has a primary TSV as shown in Figure 7.1(a). Primary TSVs are those vias
which are originally introduced during the 3-D IC design. The redundant via can be
re-routed using a 4:1 MUX to connect any one of the four TSVs. Please note that the
redundant via can either be a wireless via or a physical TSV.
• Octal Lattice (OL): A redundant via is inserted at the center, and eight primary
TSVs are located around it (Figure 7.1(b)). An 8:1 MUX is used to connect any one
of the TSVs.
• Dual Lattice (DL): Two primary TSVs are covered by one reconfigurable
redundant via as shown in Figure 7.1(c). The redundant via re-routing is done using a
2:1 MUX.
[Drawing of the three lattices; circles denote primary TSVs.]
Figure 7.1: Proposed lattice structures for redundant via insertion in a device
layer of a 3-D IC (a) Quad Lattice, (b) Octal Lattice, and (c) Dual Lattice.
The redundant via can either be a wireless via or a physical TSV.
7.2 REDUNDANCY EVALUATION FACTORS 
To characterize the redundant via configurations, we define the following evaluation
factors:
• Coverage factor (RT): The number of primary TSVs covered by a redundant
via. It also indicates the size of the MUXes used. For example, in Figure 7.2, the
redundant via of the shaded lattice at the center covers four primary TSVs within the
lattice. Hence RT = 4.
• Redundancy factor (TR): The number of redundant vias covering a primary TSV.
In Figure 7.2, the primary TSVs of the shaded lattice at the center are each covered by
two redundant vias. Thus TR = 2.
• Lattice overlap factor (LO): The number of lattices that partially overlap with a
particular lattice. In Figure 7.2, the shaded lattice at the center overlaps with four
lattices (shown by dashed circles) and therefore LO = 4.
• TSV overlap factor (OV): A set of numbers that indicates the number of primary
TSVs which are common between two overlapping lattices. In Figure 7.2, the shaded
lattice at the center and any neighboring lattice (shown by dashed circles) have exactly
one common primary TSV. Thus OV = 1. However, OV can contain more than one
number in a given layout. For example, if we analyze the grey shaded lattice at the
center of Figure 7.4, OV is 3 for its overlapping lattices in adjacent rows (i.e.
neighboring lattices above and below the grey shaded lattice), but OV is 4 for its
overlapping lattices in the same row (i.e. lattices to the left and right of the grey
shaded lattice). Thus OV = {3, 4}.
The coverage factor (RT) indicates the sizes of MUXes used. If RT increases, then
theoretically yield is expected to improve because it increases the reachability of a
redundant via by primary TSVs. Similarly, increasing the redundancy factor (TR) should
increase the yield because it enhances the probability of repairing a failed primary TSV.
Furthermore, a larger lattice overlap factor (LO) should also increase the functional yield
because it enhances the functional probability of all TSVs within a lattice. The TSV overlap
factor (OV) is used to distinguish the layout of different redundancy configurations in a
device layer. Its effect on yield is already included in LO.
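The four evaluation factors can be computed mechanically once a layout is described as a mapping from each redundant via to the set of primary TSVs it covers. The sketch below does this for a hypothetical one-dimensional chain of dual lattices, chosen here only because it is easy to check by hand; it is not the layout of any figure in this chapter, and the function name `evaluation_factors` is introduced for illustration.

```python
def evaluation_factors(lattices, lattice_id):
    """Return (RT, TR, LO, OV) for one lattice.

    lattices maps a redundant-via id to the set of primary TSVs that it covers.
    TR is returned per primary TSV of the chosen lattice.
    """
    covered = lattices[lattice_id]
    rt = len(covered)
    # TR: how many redundant vias (lattices) cover each primary TSV of this lattice.
    tr = {tsv: sum(1 for members in lattices.values() if tsv in members) for tsv in covered}
    # Number of primary TSVs shared with every other lattice that overlaps this one.
    shared = {other: len(covered & members)
              for other, members in lattices.items()
              if other != lattice_id and covered & members}
    lo = len(shared)
    ov = set(shared.values())
    return rt, tr, lo, ov


# Hypothetical chain of dual lattices: redundant via j covers primary TSVs j and j + 1.
chain = {j: {j, j + 1} for j in range(5)}
rt, tr, lo, ov = evaluation_factors(chain, 2)
print(rt, tr, lo, ov)
```

For an interior lattice of this chain the function gives RT = 2, TR = 2 for both primary TSVs, LO = 2 and OV = {1}. Lattices at the ends of the chain have smaller TR and LO values, which mirrors the remark that edge and corner TSVs are covered by fewer redundant vias.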
7.3 REDUNDANCY CONFIGURATION IN A DEVICE LAYER 
Using the lattice structures introduced in Section 7.1, we evaluate the different arrangements
of primary and redundant vias in a device layer of 3-D chips for yield improvement. These
configurations are divided into two categories that will be explained in sub-sections 7.3.1
and 7.3.2 respectively.
7.3.1 Wireless Via Redundancy Configuration 
• Quad Wireless Plus (QWP) Configuration: In this case, RT = 4, TR = 2, LO =
4, and OV = 1. It is possible to repair all four failing TSVs within a lattice in this
configuration. A quad wireless plus configuration is shown in Figure 7.2, where all
edges are covered by regular wireless vias while the remaining rows are covered by
wireless vias in alternate columns. This configuration makes sure that each primary
TSV, excluding the corner ones, can be re-routed by at least two different wireless
vias. In this configuration, each lattice interacts with four neighboring lattices as
shown by the shaded lattice in Figure 7.2. Excluding the lattices at the edges, only
one TSV is shared with any neighboring lattice in this configuration. Due to the
redundant vias in alternate columns, there are vacant sites (where wireless vias could be
inserted) as shown in Figure 7.2.
• Octal Wireless* (OW*) Configuration: Here, RT = 8, TR = 2, LO = 4, and OV
= 2. The interaction of lattices in this configuration is shown in Figure 7.3. This
configuration can be useful in saving area in case of a smaller defect rate.
• Octal Wireless Plus (OWP) Configuration: Here, RT = 8, TR = 4, LO = 10, and
OV = {3, 4}. The interaction of an Octal Lattice with its neighboring lattices is
shown in Figure 7.4. Furthermore, there are vacant sites between any two
consecutive redundant vias in a row as shown in Figure 7.4. Please note that if all
eight TSVs within a lattice are failing, it is still possible to repair all of them
simultaneously.
7.3.2 Physical Via Redundancy Configuration
• Dual TSV (DT) Configuration: Here, RT = 2, TR = 2, LO = 2, and OV = 1.
Two primary TSVs are covered by one redundant TSV as shown by the shaded
lattice in Figure 7.5.
• Quad TSV Complete (QTC) Configuration: Here, RT = 4, TR = 4, LO = 8, and
OV = 2. It is constructed using redundant TSVs and Quad Lattices. The layout in a
device layer is similar to Quad Wireless Plus except that redundant TSVs are inserted in
consecutive columns instead of alternate columns in each row, i.e. they are also inserted
on the vacant sites between any two redundant vias in a row. Examples of vacant sites are
shown in Figure 7.2.
• Octal TSV Plus (OTP) Configuration: In this case, RT = 8, TR = 4, LO = 10,
and OV = {3, 4}. This is similar to the Octal Wireless Plus configuration (see Figure
7.4), except that its redundant wireless vias are replaced by redundant TSVs.
• Octal TSV Complete (OTC) Configuration: Here, RT = 8, TR = 8, LO = 20,
and OV = {3, 4, 6}. It uses Octal Lattices with redundant TSVs. Its layout is similar
to the Octal Wireless Plus configuration (please see Figure 7.4) except that redundant
TSVs are inserted in consecutive columns instead of alternate columns in each row, i.e.
they are also inserted on the vacant sites between any two redundant vias in a row.
Examples of vacant sites are shown in Figure 7.4. For a lattice in a given row, there are
seven overlapping lattices (including the redundant vias that can be placed on all vacant
sites) in each adjacent row above and below it, and six overlapping lattices in the same
row. Thus LO = 2 x 7 + 6 = 20.
Figure 7.2: Quad Wireless Plus (QWP) configuration (RT = 4, TR = 2, LO = 4, OV = 1; vacant
sites are marked). If all the four TSVs within the shaded quad lattice fail, then it can be
repaired by the neighboring wireless vias.
Figure 7.3: Interaction of an Octal Lattice (shaded region at the center) with
its neighborhood lattices in an Octal Wireless* Configuration (RT = 8, TR = 2, LO = 4,
OV = 2).
Figure 7.4: Interaction of an Octal Lattice (grey shaded region at the center)
with its neighboring lattices in an Octal Wireless Plus configuration (RT = 8, TR = 4,
LO = 10, OV = {3, 4}; vacant sites are marked).
Figure 7.5: Interaction of a Dual (grey shaded) lattice with the neighboring
lattices (encircled by dotted lines) in a Dual TSV redundancy configuration (RT = 2,
TR = 2, LO = 2, OV = 1).
7.4 YIELD ESTIMATION BY MONTE CARLO SIMULATION 
We implemented the proposed redundancy configurations for Monte Carlo simulations in
C++/STL and performed simulations by varying the defect rate. We used 100K chips and
each chip contained 10K primary TSVs. A chip is treated as failed when a primary TSV fails
and its connection cannot be repaired by a redundant via. Defects in TSVs (both primary and
redundant) are randomly inserted based on the given percentage of defect rate. Wireless vias
are assumed to be defect free due to their immunity to thermo-mechanical stress. We used
1 - 10% defects in TSVs for our experiments.
A comparison of 3-D IC yields obtained by different redundancy configurations that use
redundant TSVs is shown in Table 7.2. From the table, it can be observed that the Octal TSV
Complete configuration produces the best yield because it has the highest RT = 8, TR = 8, and
LO = 20 factors compared to all other redundant TSV configurations. The other redundancy
configurations in Table 7.2 produce an acceptable yield when the defect rate is less than
2%. We suspect that beyond the 2% defect rate, the functional probability of lattices
decreases significantly.

TABLE 7.2: MONTE CARLO YIELD RESULTS FOR REDUNDANT TSV
CONFIGURATIONS

            TSV Redundancy Configuration (yield in %)
Defect      Octal TSV        Octal TSV        Quad TSV         Dual TSV [35]
Rate        Complete         Plus             Complete
(%)         RT=8, TR=8,      RT=8, TR=4,      RT=4, TR=4,      RT=2, TR=2,
            LO=20,           LO=10,           LO=8,            LO=2,
            OV={3,4,6}       OV={3,4}         OV=2             OV=1
1           100              100              97.7             97
2           100              96.5             97.1             84
3           100              84               90               61.1
4           100              67.4             72.1             36.2
5           100              37.8             66.3             15.2
6           100              30.4             45               -------
7           100              7.17             30.9             -------
8           97               6.43             10               -------
9           95.8             2.69             0                -------
10          94.3             2.48             0                -------

TABLE 7.3: MONTE CARLO YIELD RESULTS FOR WIRELESS VIA REDUNDANCY

            Wireless Via Redundancy Configurations (yield in %)
Defect      Octal            Octal Wireless   Quad Wireless    Quad Wireless
Rate        Wireless*        Plus             [36]             Plus
(%)         RT=8, TR=2,      RT=8, TR=4,      RT=4, TR=2,      RT=4, TR=2,
            LO=4, OV=2       LO=10,           LO=2, OV=2       LO=4, OV=1
                             OV={3,4}
1           99.5             100              100              100
2           96               100              99.8             100
3           96.1             100              99.1             100
4           67.5             100              97.5             99.9
5           43.2             100              94.6             99.8
6           20.7             99               89.6             99.4
7           6.3              98.2             82.2             98.7
8           1.1              97.3             72.7             97.6
9           0                97.1             60.3             95.7
10          0                94.7             46.5             92.6
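The simulation procedure described in this section can be sketched in a few dozen lines. The sketch below is not the C++/STL implementation used for Tables 7.2 and 7.3. It assumes a hypothetical one-dimensional chain layout (redundant via j covers primary TSVs j and j + 1) instead of the two-dimensional lattices of Section 7.3, it assumes that one redundant via can repair at most one failed primary TSV, and it uses far fewer chips and TSVs than the reported experiments. All function names are introduced here for illustration, and its numbers are not comparable to the tables.

```python
import random


def repairable(failed, cover, working):
    """True if every failed primary TSV can be given its own working redundant via.

    Assumption: one redundant via repairs at most one primary TSV, so the check is a
    bipartite matching found with augmenting paths.
    """
    owner = {}  # redundant via -> failed primary TSV currently routed through it

    def assign(tsv, visited):
        for via in cover[tsv]:
            if via in working and via not in visited:
                visited.add(via)
                if via not in owner or assign(owner[via], visited):
                    owner[via] = tsv
                    return True
        return False

    return all(assign(tsv, set()) for tsv in failed)


def simulate_yield(n_primary, defect_rate, n_chips, redundant_can_fail=True, seed=0):
    """Fraction of simulated chips in which all failed primary TSVs are repaired."""
    rng = random.Random(seed)
    n_redundant = n_primary - 1
    # Hypothetical chain layout: redundant via j covers primary TSVs j and j + 1.
    cover = {i: [j for j in (i - 1, i) if 0 <= j < n_redundant] for i in range(n_primary)}
    good_chips = 0
    for _ in range(n_chips):
        failed = [i for i in range(n_primary) if rng.random() < defect_rate]
        if redundant_can_fail:   # redundant TSVs share the defect rate of primary TSVs
            working = {j for j in range(n_redundant) if rng.random() >= defect_rate}
        else:                    # wireless vias are assumed to be defect free
            working = set(range(n_redundant))
        if repairable(failed, cover, working):
            good_chips += 1
    return good_chips / n_chips


for percent in (1, 5, 10):
    tsv = simulate_yield(200, percent / 100.0, 1000, redundant_can_fail=True)
    wireless = simulate_yield(200, percent / 100.0, 1000, redundant_can_fail=False)
    print(f"defect rate {percent}%: TSV redundancy {tsv:.3f}, wireless redundancy {wireless:.3f}")
```

Replacing the `cover` mapping with the coverage sets of a two-dimensional configuration would be the natural next step; the matching check itself does not depend on the layout.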
Table 7.3 shows the Monte-Carlo simulation results for the proposed wireless via
redundancy configurations. It can be observed that Octal Wireless Plus and Quad Wireless Plus
produce better yield due to higher RT, TR and LO factors compared to the rest of the
redundant wireless via configurations. This agrees with our theoretical explanation
(please see Section 7.2) of the effects of these evaluation factors on the functional yield.
For each configuration (of Tables 7.2 and 7.3), the TR/RT ratio indicates the number of
redundant vias as a fraction of the total number of primary TSVs. When this ratio increases,
functional yield increases. A ratio of 1 indicates that the number of
redundant vias is equal to the number of primary TSVs. When the ratio remains the same
but LO increases, then yield increases because it increases the functional probability of a
redundancy lattice, i.e. the functional probability of all the primary TSVs within a lattice is
increased. Similarly, if the ratio remains the same but TR and RT increase simultaneously
then yield increases. The TSV overlap factor (OV) is used to distinguish the layout of
redundancy configurations in a device layer. Its effect on the functional probability of a
lattice is already included in LO.
Based on the results from Tables 7.2 and 7.3, we focus our study on the most
promising redundancy configurations (Octal TSV Complete, Octal Wireless Plus, and Quad
Wireless Plus) that produce better yield. We study the scalability of these configurations by
increasing the TSV count (i.e. problem size) in a design. Table 7.4 shows the comparison of
yield obtained by these three redundancy configurations for TSV counts of 20K, 90K and
1M. It can be observed that with increasing TSV count, yield decreases. However, this
decrease is not very steep (except for Quad Wireless Plus for 1M TSVs). Thus these
configurations can be used in complex designs with large numbers of TSVs in future 3-D
ICs. Please note that Quad Wireless Plus and Octal Wireless Plus only use 50% redundant
vias compared to Octal TSV Complete which uses 100% redundant TSVs. Quad Wireless
Plus has the added advantage of the smaller 4:1 MUX delay compared to Octal Wireless
Plus and Octal TSV Complete, which use 8:1 MUXes. In the next section we will compute the
cost of redundancy in terms of area, delay and power tradeoffs.
TABLE 7.4: COMPARISON OF 3-D IC YIELD OBTAINED BY MONTE-CARLO SIMULATION WITH INCREASING VIA
COUNT AND USING THE MOST PROMISING VIA REDUNDANCY CONFIGURATIONS

            Octal TSV Complete         Quad Wireless Plus         Octal Wireless Plus
            RT=8, TR=8, LO=20,         RT=4, TR=2, LO=4,          RT=8, TR=4, LO=10,
            OV={3,4,6}                 OV=1                       OV={3,4}
Defect      TSV count                  TSV count                  TSV count
Rate (%)    20K     90K     1M         20K     90K     1M         20K     90K     1M
1           100     100     100        100     100     100        100     100     100
2           100     100     100        100     100     99.8       100     100     100
3           100     100     100        100     99.8    98.6       100     99.9    99.9
4           100     100     100        99.9    99.4    94         99.9    99.8    99.5
5           100     100     99.9       99.6    99.1    82.3       99.8    99.6    98.9
6           100     99.9    99.7       98.9    95.4    60.4       99.4    99.1    97.5
7           99.8    99.8    99.4       97.5    90      32.4       99.1    98.4    95.4
8           99.7    99.4    98.7       95.3    81      10.6       98.4    97.4    92
9           99.5    99.2    97.7       91.5    68.1    1.4        97.6    95.4    87
10          99.4    99      96.2       85.5    50.8    0          96.5    93.2    80.6
7.5 MODELING AREA, DELAY AND POWER OF REDUNDANT VIAS 
For the via configurations proposed in the earlier section, we estimate the penalties in area, 
delay, and power due to redundancy. We first compute these tradeoffs for each type of 
redundancy lattice. The actual tradeoff can be obtained by simply multiplying cost values 
per lattice by the number of lattices present in a design for a given redundancy 
configuration. 
7.5.1 Area Tradeoff 
We use the 2-input NAND gate (NAND2) as the basic logic element in our analysis, and
the number of NAND2 gates serves as the metric for estimating the area penalty. Each
wireless via requires transmitter (Tx) and receiver (Rx) circuitry. We estimate the number of
NAND gates from the transistor count of the transmitter and receiver circuitry shown
earlier in Figure 6.3. If a wireless via is chosen to cover a failed physical via, the
transistor count of two multiplexers (one for Tx and another for Rx) must also be
included. Since the Tx and Rx are standard CMOS circuits, the NAND-gate estimates for
them are reasonable. Similarly, MUX circuits based on NAND gates are a good estimate.
Table 7.5 lists the number of NAND gates and the area needed for each of the
configurations. This number determines the area penalty incurred when re-routing a
failed physical via. The exact area can be calculated easily if the technology node is
known. For a redundant TSV we assume an area of 5x5 µm², including the contact pad.
Please note that one TSV requires two contact pads (one pad for each device layer).
Furthermore, we add the area of the MUX used in a particular redundancy lattice.
TABLE 7.5: AREA PENALTY OF REDUNDANT LATTICES IN TERMS OF THE
EQUIVALENT AREA OF A TWO-INPUT NAND GATE
                       Redundancy lattices
                       Quad        Octal        Octal TSV    Dual TSV
                       Wireless    Wireless*
# of NAND2             20          36           28           4
Area at 180nm (µm²)    200         360          330          90
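The per-lattice areas in Table 7.5 and the "multiply by the number of lattices" rule can be sketched as follows. This is only an illustration: the 10 µm² per NAND2 figure is inferred from the Quad Wireless column (200 µm² / 20 gates), and charging each TSV lattice for one redundant TSV with two contact pads is an assumption that happens to reproduce the tabulated areas. The function names are hypothetical.

```python
# Values read from Table 7.5 (180 nm). The per-gate area is inferred, not stated.
NAND2_AREA_UM2 = 10.0      # assumption: 200 um^2 / 20 NAND2 (Quad Wireless column)
TSV_PAD_AREA_UM2 = 25.0    # 5 x 5 um^2 per contact pad
PADS_PER_TSV = 2           # one pad on each device layer

NAND2_PER_LATTICE = {
    "Quad Wireless": 20,
    "Octal Wireless": 36,
    "Octal TSV": 28,
    "Dual TSV": 4,
}
# Assumption: only the TSV lattices pay for the contact pads of a redundant TSV.
TSV_LATTICES = {"Octal TSV", "Dual TSV"}


def lattice_area_um2(name):
    """Area penalty of one redundancy lattice at 180 nm, in um^2."""
    area = NAND2_PER_LATTICE[name] * NAND2_AREA_UM2
    if name in TSV_LATTICES:
        area += PADS_PER_TSV * TSV_PAD_AREA_UM2
    return area


def design_area_penalty_um2(name, lattice_count):
    """Total penalty: per-lattice cost times the number of lattices."""
    return lattice_area_um2(name) * lattice_count
```

For a design with 1,000 Quad Wireless lattices, `design_area_penalty_um2("Quad Wireless", 1000)` gives the total area overhead under these assumptions.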
TABLE 7.6: DELAY PENALTY AND ITS SCALING WITH TECHNOLOGY NODE
                Delay in ps for different technology nodes
Logic element   180nm   90nm    65nm    45nm    32nm
NAND2           26      18.4    13      9.2     6.5
2:1 MUX         123     87.0    61.51   43.51   30.8
4:1 MUX         214     151.3   107     75.7    53.5
8:1 MUX         337.5   238.3   168.5   119.2   84.3
7.5.2 Delay Tradeoff
We calculate the delay penalty by estimating the delay incurred by the redundant MUXes
and adding it to the path delay. We use the TSMC 180nm standard cell library [91] for the
delay estimation. The library provides delay values for each input pin transition. We
averaged these values over the input pins and report them in Table 7.6. The values are
scaled for lower technology nodes using the constant field scaling factor 1/√S [92]. After
re-routing a failed physical via, only one path through these multiplexers is active, and
its delay adds to the path delay. The delay for an 8:1 MUX is not reported in the library.
We calculate it for the octal wireless configuration as two 4:1 MUXes cascaded with a
2:1 MUX. Also, please note that the MUX delay causes clock stretching, which reduces
the overall performance.
Figure 7.6: Performance reduction due to via re-routing through MUX logic.
The chip's target frequency is 2.5 GHz (ideal case). [Plot of frequency in GHz versus
technology node (180nm, 90nm, 65nm, 45nm, 32nm) for the Dual, Quad, and Octal lattices.]
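The scaling of the Table 7.6 delays can be sketched as below. Two points are assumptions rather than statements from the text: S is taken to double at each node listed after 180nm (this reproduces the tabulated NAND2 and 4:1 MUX values), and the number of MUX delays that end up on a clocked path is left to the caller, so `stretched_frequency_ghz` is not claimed to reproduce the percentages read from Figure 7.6. All function names are hypothetical.

```python
import math

# 180 nm delays in ps, copied from Table 7.6.
DELAY_180NM_PS = {"NAND2": 26.0, "2:1 MUX": 123.0, "4:1 MUX": 214.0}
NODES = ["180nm", "90nm", "65nm", "45nm", "32nm"]


def scaled_delay_ps(delay_180nm_ps, node):
    """Scale a 180 nm delay by 1/sqrt(S).

    Assumption: S doubles at each node listed after 180 nm, which is how
    the tabulated values appear to have been generated.
    """
    s = 2.0 ** NODES.index(node)
    return delay_180nm_ps / math.sqrt(s)


def mux8_delay_ps(node):
    """8:1 MUX modeled as a 4:1 MUX stage cascaded with a 2:1 MUX."""
    return (scaled_delay_ps(DELAY_180NM_PS["4:1 MUX"], node)
            + scaled_delay_ps(DELAY_180NM_PS["2:1 MUX"], node))


def stretched_frequency_ghz(target_ghz, extra_delay_ps):
    """Clock frequency after the period is stretched by extra_delay_ps."""
    period_ps = 1000.0 / target_ghz
    return 1000.0 / (period_ps + extra_delay_ps)
```

For example, `stretched_frequency_ghz(2.5, mux8_delay_ps("32nm"))` estimates the clock rate when a single 8:1 MUX delay is added to a 400 ps period.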
We compare the performance of a 3D chip achieved using different redundancy
configurations against a 2.5GHz ideal-case target frequency. Figure 7.6 shows that the
performance penalty decreases as we move to lower technology nodes. At 180nm, there is
a drastic reduction in performance compared to the target frequency, while for current
technology nodes such as 45nm and 32nm, the performance reduction can be acceptable
as a tradeoff (since our approach increases the number of functional chips, it is
usually better to have slower but functional chips than a large number of failed chips). At
32nm, there is a 13% reduction for the Dual lattice, 17% for Quad lattices (i.e. all the
lattices that use a 4:1 MUX), and 29% for Octal lattices (i.e. all the lattices that use an
8:1 MUX). Octal lattices suffer the maximum delay penalty due to the presence of the 8:1
MUX. Because of the large slack, it could be a better choice for slower chips. MUX
re-routing also involves extra wiring, which causes additional delay that can be
calculated by the formula rcl², where r and c are the resistance and capacitance per unit
length of a wire, and l is the length of the wire.
TABLE 7.7: POWER PENALTY AND ITS SCALING WITH TECHNOLOGY NODE
                Power in µW for different technology nodes
Logic element   180nm   90nm    65nm    45nm    32nm
NAND2           28      14      7       3       2
2:1 MUX         55      28      14      7       3
4:1 MUX         110     55      28      14      7
8:1 MUX         166     83      41      21      10
7.5.3 Power Tradeoff
The TSMC 180nm standard cell library specifies values for power in µW/MHz. We assume
an operating frequency of 2.5GHz, following [82], where data rates up to 5Gbps can be
obtained. Power data for lower technology nodes is obtained by using constant field scaling
with a scaling factor of 1/S² [92]. Power values are shown in Table 7.7. It can be seen that
power dissipation drops drastically as we move to lower technology nodes. For simplicity,
we assume that leakage power is negligible. Please note that the power dissipation in a
wireless via is 14.5 mW at 2.5GHz [82] at the 180nm technology node, whereas the largest
8:1 MUX consumes only 166 µW. Thus the power consumption of the MUX is negligible
compared to that of a wireless via, and this remains true even if leakage power is
considered.
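The comparison above can be checked with a few lines. The halving of power at each listed node is read off Table 7.7 (whose entries are rounded), not derived independently, and the function names are hypothetical.

```python
NODES = ["180nm", "90nm", "65nm", "45nm", "32nm"]

WIRELESS_VIA_POWER_UW = 14500.0   # 14.5 mW at 2.5 GHz, 180 nm [82]
MUX8_POWER_180NM_UW = 166.0       # largest MUX in Table 7.7


def scaled_power_uw(power_180nm_uw, node):
    """Scale a 180 nm power value to another node.

    Assumption: power halves at each node listed after 180 nm, which is
    the pattern of the (rounded) entries in Table 7.7.
    """
    return power_180nm_uw / 2.0 ** NODES.index(node)


def share_of_wireless_via_power(mux_power_uw):
    """MUX power as a fraction of the wireless via power at 180 nm."""
    return mux_power_uw / WIRELESS_VIA_POWER_UW
```

With these numbers, `share_of_wireless_via_power(MUX8_POWER_180NM_UW)` is a little over 1%, which is the sense in which the MUX power is negligible.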
7.6 EFFECT OF REDUNDANCY ON PARAMETRIC YIELD 
The delay penalty due to additional Tx/Rx circuits, MUX logic, and signal re-routing may
degrade the performance of a 3D chip. In this section, we analyze the impact of
redundancy on critical paths and overall chip performance in a bin of 3D chips. We define
two types of chips as follows:
• Fast Chip: A chip without any redundant vias on its critical paths. Such a chip
operates at its designed speed (assuming other variability issues are taken care of).
• Slow Chip: A chip with a redundant via on at least one of its critical paths.
We assume that a critical path which spans two device layers passes through exactly
one primary TSV, as shown in Figure 7.7. This assumption can be realized using a min-cut
partitioning based 3-D placement tool [3]. The placement algorithm minimizes the cut-size
of a net spanning two device layers during placement, which is equivalent to
minimizing the number of TSVs. Our goal is to estimate the number of fast/slow chips.
We first use a statistical 3-D wirelength distribution model, described in the next
sub-section, to estimate the number of global wires in a chip.
Figure 7.7: A critical path spanning two device layers using a TSV. We
assume that a critical path crosses only once through a TSV. [Diagram labels: Layer 1,
Layer 2, TSV, 5x5 µm² contact pad.]
7.6.1 Estimation of the Total Number of Global Wires 
Let us consider N logic gates which are arranged as a uniform 2-D array inside a 2-D chip.
The number of wires of a given length can be estimated using a wirelength
distribution model [71] as follows:

$$
i(\ell, N, k, p) =
\begin{cases}
\dfrac{\alpha k}{2}\,\Gamma\left[\dfrac{\ell^{3}}{3} - 2\sqrt{N}\,\ell^{2} + 2N\ell\right]\ell^{2p-4}, & 1 \le \ell \le \sqrt{N} \\[2ex]
\dfrac{\alpha k}{6}\,\Gamma\left(2\sqrt{N} - \ell\right)^{3}\ell^{2p-4}, & \sqrt{N} \le \ell < 2\sqrt{N}
\end{cases}
\tag{7.1}
$$

where ℓ is the interconnect length in gate-pitch units and α is a function of the average
fanout (f.o.) as shown below:

$$
\alpha = \frac{f.o.}{1 + f.o.} \tag{7.2}
$$

and Γ is given by:

$$
\Gamma = \frac{2N\left(1 - N^{p-1}\right)}
{-N^{p}\,\dfrac{1 + 2p - 2^{2p-1}}{p(p-1)(2p-1)(2p-3)} - \dfrac{1}{6p} + \dfrac{2\sqrt{N}}{2p-1} - \dfrac{N}{p-1}}
\tag{7.3}
$$

At p = 0.5, Γ becomes indeterminate of the form 0/0 and its value can be determined using
L'Hôpital's rule [71].
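As a minimal sketch, eqns (7.1) to (7.3) can be evaluated numerically as shown below. The function names are hypothetical, and the singular case p = 0.5 is deliberately not handled.

```python
import math


def alpha(fanout):
    """Eqn (7.2): fraction of terminals that are sink terminals."""
    return fanout / (1.0 + fanout)


def gamma(n_gates, p):
    """Eqn (7.3): normalization factor (not valid at p = 0.5)."""
    n = float(n_gates)
    denominator = (
        -n ** p * (1 + 2 * p - 2 ** (2 * p - 1))
        / (p * (p - 1) * (2 * p - 1) * (2 * p - 3))
        - 1 / (6 * p)
        + 2 * math.sqrt(n) / (2 * p - 1)
        - n / (p - 1)
    )
    return 2 * n * (1 - n ** (p - 1)) / denominator


def wire_density(length, n_gates, k, p, fanout):
    """Eqn (7.1): interconnect density at a length given in gate pitches."""
    root_n = math.sqrt(n_gates)
    if not 1 <= length < 2 * root_n:
        return 0.0
    prefactor = alpha(fanout) * k * gamma(n_gates, p) * length ** (2 * p - 4)
    if length <= root_n:
        # Region I: 1 <= l <= sqrt(N)
        return prefactor / 2 * (
            length ** 3 / 3 - 2 * root_n * length ** 2 + 2 * n_gates * length
        )
    # Region II: sqrt(N) <= l < 2 sqrt(N)
    return prefactor / 6 * (2 * root_n - length) ** 3
```

The two branches of eqn (7.1) agree at ℓ = √N, which is a convenient sanity check on any implementation.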
Now, let us consider that the same 2-D chip is designed as an m-layer 3-D chip. We obtain
the wirelength distribution of each layer and the number of TSVs from [1] as:
(7.4) 
(7.5) 
$$
\#TSV = m \times k\left(1 - m^{p-1}\right)\left(\frac{N}{m}\right)^{p} \tag{7.6}
$$

Similar to [93], we define the lengths of local, semi-global and global wires as follows:

$$
\ell_{local}:\quad 1 \le \ell < 0.7\sqrt{N/m} \tag{7.7}
$$

$$
\ell_{semi}:\quad 0.7\sqrt{N/m} \le \ell < 1.2\sqrt{N/m} \tag{7.8}
$$

$$
\ell_{global}:\quad 1.2\sqrt{N/m} \le \ell < 2\sqrt{N/m} \tag{7.9}
$$

Using eqn (7.5) and eqn (7.9), we can estimate the total number of global wires inside a 3-D
chip by integrating (7.5) over the range of ℓ_global. Please note that, of the total number
of global wires, many span only a single device layer, while others span two device
layers using TSVs.
7.6.2 Estimation of the Total Number of Fast Chips
To find the number of fast chips, we performed Monte Carlo simulation on a set of 100K
3-D chips. Each 3-D chip contained 5 million gates in 2 device layers. We used Rent's
parameters k = 1.4 and p = 0.63 from [90]. In addition, we assumed that the circuit inside
the chip was built using 2-input NAND gates for the uniform logic gate arrays inside each
device layer [32]. The average fanout was chosen as 3.0 and the total number of primary
TSVs was calculated using eqn (7.6).
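For illustration, the TSV count of eqn (7.6) can be evaluated for this setup as sketched below. The reading of eqn (7.6) used here, m x k x (1 - m^(p-1)) x (N/m)^p, is an assumption about its exact form, and the function name is hypothetical.

```python
def tsv_count(n_gates, layers, k, p):
    """Primary TSV estimate, reading eqn (7.6) as
    m * k * (1 - m**(p - 1)) * (N / m)**p (assumed form)."""
    return layers * k * (1.0 - layers ** (p - 1.0)) * (n_gates / layers) ** p


# Setup of this sub-section: 5 million gates, 2 device layers, k = 1.4, p = 0.63.
primary_tsvs = tsv_count(5_000_000, 2, 1.4, 0.63)
```

Under this reading the expression equals the Rent terminal count of the m separate layers minus that of the whole chip, so it vanishes for a single layer.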
The defect rate in TSVs was varied from 1 - 10% and the number of critical paths passing
through TSVs was varied between 0 - 50% of the total critical paths. Figure 7.8(a) shows
the number of fast chips at a 4% defect rate for the Quad Wireless Plus (QWP), Octal
Wireless Plus (OWP), and Octal TSV Complete (OTC) configurations. It can be observed
that the number of fast chips obtained by these three configurations is approximately the
same, because their functional yields are approximately the same for the given problem
size. Figure 7.8(b) shows how the number of fast chips varies for different defect rates.
Please note that the fast chip count drops sharply with increasing defect rate and with an
increasing number of critical paths passing through TSVs. The analytical reasoning is
given in Chapter 8 (please see Section 8.5).
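A much reduced version of such an experiment is sketched below to show why the fast chip count falls so quickly. It is not the simulator used for Figure 7.8: it assumes every defective TSV is repaired successfully (functional yield of 100%), treats each TSV as independently lying on a critical path with probability `p_critical`, and uses small, hypothetical problem sizes.

```python
import random


def simulate_fast_fraction(via_count, p_defect, p_critical, chips, seed=0):
    """Fraction of chips with no repaired (redundant) via on a critical path.

    Assumptions: all defective TSVs are repairable, and defects and
    critical-path membership are independent for every TSV.
    """
    rng = random.Random(seed)
    p_slow_via = p_defect * p_critical  # TSV is defective AND on a critical path
    fast = 0
    for _ in range(chips):
        # A chip is fast only if none of its TSVs is a "slow via".
        if all(rng.random() >= p_slow_via for _ in range(via_count)):
            fast += 1
    return fast / chips
```

Raising either `p_defect` or `p_critical` increases the chance that at least one of the `via_count` TSVs forces a re-routed critical path, so the fast fraction decays roughly exponentially in `via_count`.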
The area penalty in terms of the number of NAND2 gates is 1.35% for the Quad Wireless
and 1.2% for the Octal Wireless configuration relative to the total gate count in a chip,
which is negligible. Since Octal TSV Complete has 100% TSV redundancy, its area penalty is
2.24e+06 µm² (including the 5x5 µm² area for a single TSV in one layer), which is equivalent
to 4.5% of the total gate count at the 180nm technology node. Thus the area penalty is
negligible. In the next chapter, we present analytical models to quickly analyze the yield
for a given defect rate when yield numbers are needed as design parameters.
Figure 7.8: Comparison between the number of fast chips obtained (a) for a 4%
defect rate using the Quad Wireless Plus (QWP), Octal Wireless Plus (OWP),
and Octal TSV Complete (OTC) configurations, and (b) for 1% and 4% defect
rates using the Quad Wireless Plus (QWP) configuration. [In both panels the horizontal
axis is the percentage of critical paths passing through TSVs.]
CHAPTER 8: REDUNDANT VIA DEPENDENT ANALYTICAL YIELD MODELS
In Chapter 7 we presented a functional yield enhancement methodology based on a via
redundancy technique and estimated the functional and parametric yield of 3-D ICs. Monte
Carlo simulation is an iterative process which computes yield for discrete input values based
on the problem size (i.e. TSV count) and the defect rate in TSVs. Because of this iterative
nature, Monte Carlo simulation is time consuming for a large problem size. Furthermore, the
yield obtained from an MC simulation is just one discrete point in the yield solution space.
In this chapter, we present analytical models for functional and parametric yields that
eliminate the need for computationally expensive Monte Carlo simulations. We further
provide an analytical model for the chip revenue. The analytical models quickly analyze the
yield for a given defect rate when yield numbers are needed as design parameters. These
yield numbers can be used in yield-aware physical design optimization processes such as
floorplanning, placement and routing. Based on the Monte Carlo yield results from Chapter
7, we have chosen to derive analytical models for the three redundancy configurations that
provide high functional yield for a large number of TSVs and a wide spectrum of defect
rates in 3D designs. These redundancy models are a) the Quad Wireless Plus configuration
(RT=4, TR=2, L0=4, OV=1), b) the Octal Wireless Plus configuration (RT=8, TR=4, L0=10,
OV={3,4}), and c) the Octal TSV Complete configuration (RT=8, TR=8, L0=20,
OV={3,4,6}). The functional yield is defined as the number of working chips expressed
as a percentage of the total number of chips in a bin. Thus functional yield = 100 x
Number of working chips / Total number of chips in a bin.
8.1 NOMENCLATURE 
We provide a list of variables that will be used for the derivation of analytical models:
Pd            Probability of a TSV to be defective (0 ≤ Pd ≤ 1.0)
Pw            Probability of a TSV to be functional
Pcritical     Probability of a TSV to be on a critical path
n             Number of TSVs in a redundancy lattice
ViaCount      Total number of primary TSVs in a chip
ChipCount     Total number of 3D chips in a bin
CritTSV       Number of global wires passing through TSVs
GlobalWire    Total number of global wires in a chip
Y_f^QWP       Functional yield of the Quad Wireless Plus configuration in percentage
Y_f^OWP       Functional yield of the Octal Wireless Plus configuration in percentage
Y_f^OTC       Functional yield of the Octal TSV Complete configuration in percentage
PF            Price of a fast chip
PS            Price of a slow chip
m             Number of redundant vias which cover one primary TSV
nCr           Number of possible ways of "r" TSVs being defective from "n" TSVs,
              n! / (r! (n-r)!)
8.2 ANALYTICAL MODEL FOR QUAD WIRELESS PLUS (QWP) 
CONFIGURATION 
To obtain the analytical model, we first focus on calculating the probability of a quad lattice
(shown in Figure 7.1 in Chapter 7) to be functional. Please note that this probability also
depends on how a lattice interacts with its neighboring lattices (in terms of lattice
overlaps). This lattice interaction for QWP is shown in Figure 7.2, and the redundancy
evaluation factors are RT=4, TR=2, L0=4, OV=1. To keep the model simple, we first
assume that if the TSVs within a lattice fail, then the redundant wireless vias in the
neighboring lattices are available for the repair.
The probability of a TSV being defective is given by Pd, which depends on the 3-D
integration technology. Because statistical data from 3-D technology is unavailable,
we treat Pd as a variable with 0 < Pd ≤ 0.1, i.e. the same range of defects that was used for
the Monte Carlo simulation in Chapter 7. Thus the probability of a TSV being functional
(i.e. Pw) equals 1 - Pd. Since there are four primary TSVs in the redundancy lattice of the
QWP configuration, n = 4.
Within a lattice, we first consider all the possible combinations of failed and working TSVs.
Next we examine, for each such combination, whether it is possible to repair all the failed
TSVs using neighboring redundant vias. If a particular combination of failed and working
vias within the lattice is repairable, then the joint probability of these vias is multiplied
by the number of ways that particular combination can occur within a lattice, and the
result is added to the functional probability of the lattice. For example, if one out of four
primary TSVs within a lattice of the QWP configuration has failed, the remaining three
TSVs within the lattice are working. Thus the joint probability of this combination of
failed and working TSVs is (Pd)^1 (Pw)^3. Next, the number of ways in which one out of
four primary TSVs in a lattice can fail is 4C1. Thus the total probability of this particular
combination of failed and working primary TSVs is 4C1 x (Pd)^1 x (Pw)^3. Similarly, we
consider all the possible combinations of failed and working TSVs for which the failed TSVs
can be repaired. The sum over all these combinations gives the functional probability of a
lattice. Please recall from Sub-section 7.3.1 that even if all four primary TSVs of a lattice
in QWP fail, it is still possible to repair all of them simultaneously. Therefore all
combinations of failed and working TSVs are repairable, and they should all be added when
calculating the functional probability of the lattice. Thus this probability is calculated by
Σ nCr (Pd)^r (Pw)^(n-r).
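The probability of a single combination of failed and working TSVs, as used in the example above, can be computed as in the following sketch. The function name is hypothetical.

```python
from math import comb


def combination_probability(n, failed, p_defect):
    """nCr x (Pd)^r x (Pw)^(n-r) for r = failed TSVs out of n, with Pw = 1 - Pd."""
    p_work = 1.0 - p_defect
    return comb(n, failed) * p_defect ** failed * p_work ** (n - failed)


# One failed TSV out of four in a QWP lattice at a 5% defect rate:
# 4C1 x 0.05 x 0.95^3
one_of_four = combination_probability(4, 1, 0.05)
```

Only the combinations that the redundancy scheme can actually repair should be accumulated into the functional probability of a lattice.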
Next we find the probability of all the TSVs in a chip to be working. Since the functional
probability of a lattice covers the functional probability of the n primary TSVs within the
lattice, the functional probability of all the TSVs is calculated as the joint probability of
ViaCount/n lattices. Thus the expression for the functional yield of QWP (represented as a
percentage of the total number of chips in a bin) is given by eqn (8.1), in which the
summation term calculates the functional probability of a lattice, and the power term
calculates the functional probability of all the TSVs in a 3-D chip.
yQWP 
f 
where n = 4 for QWP, and
ViaCount 
lOOx(~Pro."c, J-n-. (8.1) 
(8.2) 
Please note that in eqn. (8.2), the 2Pw - (Pw)^2 term has been introduced to minimize the
error because a) a primary TSV can be part of two lattices, which is accounted for by the
logical OR probability introduced in eqn (8.2), and b) our primary assumption was that if
the TSVs within a lattice fail, then the redundant wireless vias in the neighboring lattices
are available for the repair. However, a redundant wireless via might already have been
used to repair TSVs of another redundancy lattice in a device layer.
A comparison of analytical and Monte Carlo results for functional yield with 20K, 90K and
1M TSVs in a chip is shown in Figure 8.1(a). Please observe that the analytical model
matches the Monte Carlo simulation results quite closely. We have used ten discrete
defect rate points in Figure 8.1(a) to show the yield trend for each of the problem sizes (i.e.
TSV counts). However, in real life the problem size as well as the defect rate may not be a
discrete point. Thus the trend may shift depending on the problem size and the defect rate.
For each of the ten discrete defect rates, Monte Carlo simulations were performed
independently (i.e. MC simulations were run ten times), which was very time consuming. In
contrast, for the analytical yield results, the analytical model of eqn (8.1) was used to obtain
the yield for the ten discrete defect rates, which was a simple mathematical computation.
Thus the analytical model provides fast yield estimation.
Figure 8.1: Comparison of analytical model results with Monte Carlo
simulation results for yield obtained using (a) the Quad Wireless Plus
configuration, and (b) the Octal Wireless Plus configuration for different
numbers of TSVs in 3-D ICs. [Both panels plot percentage yield versus percentage
defect for 20K, 90K, and 1M TSVs.]
8.3 ANALYTICAL MODEL FOR OCTAL WIRELESS PLUS (OWP) 
CONFIGURATION 
In this configuration, n = 8. The method of deriving the analytical model for OWP (RT=8,
TR=4, L0=10, OV={3,4}), shown in Figure 7.4, is similar to the derivation for
the QWP configuration described in Section 8.2. As in QWP, all the combinations of
failed and working TSVs are repairable in the OWP configuration, and therefore they should
all be added when calculating the functional probability of the lattice. Due to this
similarity, the rest of the expression for Y_f^OWP is the same as the expression in the
analytical model for the Quad Wireless Plus configuration (i.e. Y_f^QWP) given by eqn (8.1)
and eqn (8.2) in Section 8.2. Please note that this similarity might not be present in all
other redundancy configurations: any combination of failed and working TSVs that cannot
be repaired by redundant vias should not be included when calculating the functional
probability of a redundancy lattice.
A comparison of analytical and Monte Carlo simulation results for the functional yield with
20K, 90K and 1M TSVs in 3D chips is shown in Figure 8.1(b). It can be observed that the
analytical model matches the Monte Carlo results quite closely. However, for 1M
TSVs, the error grows larger beyond an 8% defect rate. We suspect that this is due to
exponential growth of the truncation error.
8.4 ANALYTICAL MODEL FOR OCTAL TSV COMPLETE (OTC) 
CONFIGURATION 
As described in Sub-section 7.3.2, the primary and redundant TSVs are placed uniformly in
the OTC configuration, which has evaluation factors RT=8, TR=8, L0=20, OV={3,4,6}.
For the derivation of the analytical model for OTC, we first calculate the probability of a
primary TSV to be functional. For simplicity, we assume that if the redundant TSVs for a
particular primary TSV are working, then they are available for repairing that particular
primary TSV.
In OTC, a primary TSV is covered by eight redundant TSVs. Let us call this number "m";
therefore m = 8. For a primary TSV, its probability of working is Pw. The probability of its
repair (in case it fails) is calculated as the joint probability of the primary TSV
failing (Pd) and at least one out of the m redundant TSVs working.
Please note that the term containing the summation of a series in eqn. (8.3) calculates this
joint probability. Finally, we calculate the probability of all primary TSVs working by
incorporating the power term in eqn. (8.3). The final equation for calculating the functional
yield of the Octal TSV Complete configuration is:
yore 
f (8.3) 
We compared the analytical model's results with the Monte Carlo simulation results for
1 - 10% defect rates. The analytical results are within 1 - 3% of the Monte Carlo results for
20K to 1M TSVs. Thus the analytical model's results match the Monte Carlo simulation
results closely.
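The verbal description of the OTC model can be turned into a short sketch as follows. It follows the prose only (a primary TSV is functional if it works, or if it fails and at least one of its m redundant TSVs works, with ViaCount as the power term), so the exact form of eqn (8.3) may differ in detail. The function names are hypothetical.

```python
from math import comb


def otc_via_functional_probability(p_defect, m=8):
    """Pw + Pd x P(at least one of m redundant TSVs works)."""
    p_work = 1.0 - p_defect
    # Summation over r = 1..m working redundant TSVs out of m.
    at_least_one_spare = sum(
        comb(m, r) * p_work ** r * p_defect ** (m - r) for r in range(1, m + 1)
    )
    return p_work + p_defect * at_least_one_spare


def otc_yield_percent(p_defect, via_count, m=8):
    """Functional yield in percent when all via_count primary TSVs must work."""
    return 100.0 * otc_via_functional_probability(p_defect, m) ** via_count
```

Under these assumptions the per-TSV probability simplifies to 1 - (Pd)^(m+1), i.e. a primary TSV is lost only when it and all of its redundant TSVs fail.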
8.5 ANALYTICAL MODEL FOR FAST/SLOW CHIPS
In this section, we derive the analytical models for estimating the number of fast/slow
chips in a bin of 3-D ICs. The assumptions for defining the fast and slow chips are given in
Section 7.6 (in Chapter 7).
We first find the probability of a TSV being on a critical path (i.e. Pcritical), which is
given by the ratio of the total number of TSVs that can be on critical paths (CritTSV) to the
total number of TSVs in a chip. Here, CritTSV is equal to the percentage of the total number
of global wires that span different device layers (i.e. using TSVs), and it is
calculated using the statistical 3-D wirelength distribution method explained in Sub-section
7.6.1. Thus the expression for Pcritical is given by eqn (8.4).

$$
P_{critical} = \frac{CritTSV}{ViaCount} \tag{8.4}
$$

The probability of a TSV to be functional and on a critical path = 1 - (the probability of a
TSV to be defective and on a critical path) = 1 - (Pd x Pcritical). Please note that in this
expression, "1" is the cumulative probability of a) the TSV being functional and on a
critical path, and b) the TSV being defective and on a critical path.
Therefore, the probability of all TSVs to be functional and on a critical path =
(1 - Pd x Pcritical)^ViaCount.
Fast Chips (FC) as a percentage of the total number of chips for a given redundancy
configuration RC is calculated by eqn (8.5):

$$
FC_{RC} = Y_f^{RC}\left(1 - P_d \times P_{critical}\right)^{ViaCount} \tag{8.5}
$$

where $Y_f^{RC} \in \{Y_f^{QWP}, Y_f^{OWP}, Y_f^{OTC}\}$.
Due to ViaCount as the power term in eqn. (8.5), the number of fast chips may drop
exponentially with increasing Pd and/or Pcritical. Furthermore, any increase in Pd will also
decrease the functional yield Y_f^RC, which is also a factor in eqn (8.5). Thus Pd has a dual
impact on the total number of fast chips.
Similarly, Slow Chips (SC) as a percentage of the total number of chips for a given
redundancy configuration RC would be as shown in eqn (8.6).

$$
SC_{RC} = Y_f^{RC}\left(1 - \left(1 - P_d \times P_{critical}\right)^{ViaCount}\right) \tag{8.6}
$$
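Eqns (8.4) to (8.6) translate directly into a few lines of code. The sketch below takes the functional yield Y_f^RC as an input in percent; the function and parameter names are hypothetical.

```python
def critical_probability(crit_tsv, via_count):
    """Eqn (8.4): probability that a TSV lies on a critical path."""
    return crit_tsv / via_count


def fast_chip_percent(yield_percent, p_defect, p_critical, via_count):
    """Eqn (8.5): fast chips as a percentage of all chips in the bin."""
    return yield_percent * (1.0 - p_defect * p_critical) ** via_count


def slow_chip_percent(yield_percent, p_defect, p_critical, via_count):
    """Eqn (8.6): slow chips as a percentage of all chips in the bin."""
    all_fast = (1.0 - p_defect * p_critical) ** via_count
    return yield_percent * (1.0 - all_fast)
```

By construction the two percentages add up to the functional yield, since every working chip is either fast or slow.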
Using our analytical model, we calculated the number of fast/slow chips for the input setup
presented in Sub-section 7.6.2 in Chapter 7 (i.e. the 5M gate design). A comparison of
Monte Carlo and our analytical models' results for fast/slow chips (for 1% and 4% defect
rates) obtained using the Quad Wireless Plus configuration is shown in Figure 8.2. It
demonstrates that the analytical model matches the Monte Carlo results closely. We have
observed similar agreement for the Octal Wireless Plus and Octal TSV Complete
configurations as well. Please notice that the FC and SC curves cross each other at a
certain point in Figure 8.2.
Figure 8.2: Analytical and Monte Carlo simulation results for fast/slow chips
for the Quad Wireless Plus configuration for (a) 1% defect rate and (b) 4% defect rate.
[Both panels plot the percentage of fast and slow chips versus the percentage of critical
paths passing through TSVs; the crossing of the two curves is marked as the sweet spot.]
This intersection in Figure 8.2 is a sweet spot below which the number of fast chips
produced will be larger than the number of slow chips for a given redundancy
configuration. We define a variable Psweet as shown in eqn (8.7), which follows from
setting FC = SC in eqns (8.5) and (8.6), i.e. (1 - Pd x Pcritical)^ViaCount = 0.5.

$$
P_{sweet} = \frac{1}{P_d}\left(1 - (0.5)^{1/ViaCount}\right) \tag{8.7}
$$

Next, we define a variable "G" which denotes the highest percentage of global wires that
can pass through TSVs while still yielding a larger number of fast chips than slow chips.
"G" is given by eqn (8.8):

$$
0 \le G \le \left(\frac{P_{sweet} \times ViaCount}{GlobalWire}\right) \times 100 \tag{8.8}
$$
This model allows a designer to quickly estimate the maximum number of global wires
that can pass through TSVs while still obtaining a higher number of fast chips. It can also
be incorporated in physical design tools such as floorplanners for 3-D ICs. A detailed
description of the analytical models' use is given in Section 8.8. Please note that the sweet
spot in Figure 8.2 shifts leftwards with increasing defect rate, which in turn produces
fewer fast chips.
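The sweet spot of eqn (8.7) and the bound of eqn (8.8) can be evaluated as in this sketch. The function names are hypothetical.

```python
def sweet_spot_probability(p_defect, via_count):
    """Eqn (8.7): Pcritical at which fast and slow chip counts are equal."""
    return (1.0 - 0.5 ** (1.0 / via_count)) / p_defect


def max_global_wire_percent(p_defect, via_count, global_wires):
    """Upper bound on G in eqn (8.8), in percent of all global wires."""
    p_sweet = sweet_spot_probability(p_defect, via_count)
    return p_sweet * via_count / global_wires * 100.0
```

Because Psweet is inversely proportional to Pd, a higher defect rate moves the sweet spot to a smaller fraction of critical paths, which matches the leftward shift noted above.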
8.6 ANALYTICAL MODEL FOR CHIP REVENUE 
In this section, we present chip revenue estimations obtained from a bin of chips. We
assume that the prices of fast and slow chips include the packaging, assembly and bonding
costs in addition to the design and fabrication costs of 3-D chips in our revenue model. The
revenue model is a function of ChipCount, ViaCount, defect rate, CritTSV, and redundancy
configuration, and it is given by eqn (8.9).

$$
Revenue = \left(P_F \cdot FC_{RC} + P_S \cdot SC_{RC}\right) \times \frac{ChipCount}{100} \tag{8.9}
$$

where PF is the price of a fast chip and PS is the price of a slow chip; PF > PS. The variables
FC_RC and SC_RC are obtained from Section 8.5 and they depend on the redundancy
configuration. The total chip revenue model provides a tool for quick estimation of chip
profitability.
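Eqn (8.9) can be sketched in code as follows; the prices and percentages in the usage line are made-up illustration values, and the function name is hypothetical.

```python
def chip_revenue(price_fast, price_slow, fast_percent, slow_percent, chip_count):
    """Eqn (8.9): revenue from a bin, with FC and SC given in percent."""
    return (price_fast * fast_percent + price_slow * slow_percent) * chip_count / 100.0


# Hypothetical bin: 1000 chips, 50% fast at a price of 10, 40% slow at a price of 6.
example_revenue = chip_revenue(10.0, 6.0, 50.0, 40.0, 1000)
```

The division by 100 converts the fast and slow percentages into chip counts before they are multiplied by the prices.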
8.7 EXTENSION OF ANALYTICAL YIELD MODELS OF TWO-LAYER 3-D
CHIPS TO MULTI-LAYER 3-D CHIPS
The analytical yield models described in Section 8.2 to Section 8.6 were derived
for 2-layer 3-D ICs only. In this section, we extend these analytical models of 2-layer
3-D ICs to multi-layer 3-D ICs using the 3-D wirelength distribution
model presented by Zhang et al [72].
We assume that TSVs are uniformly distributed in each device layer, as assumed
in the two-layer 3-D chip's analytical model. Please note that in the case of multi-layer
3-D ICs, the heights of TSVs will differ [72] depending upon how far apart the
devices/circuits that require vertical interconnections have been placed in the 3-D stack.
For example, in the case of 4-layer 3D ICs, the via height could be h, 2h, or 3h, where h is
the vertical distance between two adjacent device layers. Let x1 be the number of TSVs of
height 1h, x2 the number of TSVs of height 2h, and x3 the number of TSVs of height 3h.
Here, we decompose the vias of different heights into multiple vias of height 1h. For
example, if a via's height is 3h, then we count it as three 1h vias. Thus ViaCount is
calculated as in eqn. (8.10).

$$
ViaCount = x_1 \cdot 1 + x_2 \cdot 2 + x_3 \cdot 3 \tag{8.10}
$$
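The decomposition of eqn (8.10) amounts to a weighted sum over via heights, as in this sketch. The input list and the function name are hypothetical.

```python
def decomposed_via_count(tsvs_by_height):
    """Eqn (8.10): count a TSV of height z*h as z vias of height 1h.

    tsvs_by_height[0] is x1 (height 1h), tsvs_by_height[1] is x2, and so on.
    """
    return sum(z * x for z, x in enumerate(tsvs_by_height, start=1))


# Hypothetical 4-layer chip: x1 = 100, x2 = 20, x3 = 5.
example_via_count = decomposed_via_count([100, 20, 5])
```

For a two-layer chip the list has a single entry and the result is simply the original TSV count.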
After the conversion of via count using eqn. (8.10), we have found that the analytical models
for multi-layer 3-D ICs converge to the models that were derived for 2-layer 3-D ICs in
this chapter. The convergence follows from the principle of superposition, in which a
multi-layer 3-D IC is decomposed into a set of identical two-layer 3-D ICs composed of
consecutive device layers of the original chip, as shown in Figure 8.3. The number of global
wires can be obtained using eqn. (7.5) and eqn. (7.9) for an m-layer 3-D IC containing N
logic gates, as explained in Sub-section 7.6.1 (in Chapter 7). Furthermore, the number of
TSVs of different heights x1, x2, x3, ..., x(m-1) (i.e. the TSV height distribution) can be
obtained from Zhang's 3-D wirelength distribution model [72] as given by eqn. (8.11).
(2 2 )
akN(l-Np-i_m p-2 +m-1N p-i) 
x= m-z ------------
z m(m-l) (8.11) 
where z = 1, 2, 3, ..., m-1; N is the total number of gates in the 3-D chip, k and p are Rent's
parameters, and α is the same as defined in eqn. (7.2) in Chapter 7.
Figure 8.3: Decomposition of a 3-layer chip into a pair of identical two-layer
3-D ICs for yield calculation using superposition.
We assume that critical path TSVs also have a height distribution that is proportional to the
TSV distribution given by eqn (8.11). This assumption is fairly accurate because these
critical path TSVs are a small subset of all TSVs in a 3-D chip. Similar to eqn (8.10), we
decompose the critical path TSVs of different heights into critical path TSVs of height 1h.
Thus the original CritTSV is replaced by NewCritTSV using eqn (8.12).
N ewCritTSV = CritTSV . x L z x2 ( 1 m~ J VzaCount z=I (8.12) 
where ViaCount is calculated by eqn (8.10), and x_z is computed using eqn (8.11). Please note
that for a 2-layer 3D chip (i.e. m = 2), eqn (8.12) converges to CritTSV, i.e. the original
number of critical path TSVs.
For a multi-layer 3-D IC, eqns. (8.1) to (8.9) are modified by computing ViaCount using
eqn. (8.10), and by replacing CritTSV with NewCritTSV as calculated by eqn (8.12). Apart
from these two changes, the rest of the expressions remain the same.
8.8 APPLICATION OF YIELD IMPROVEMENT STRATEGIES DURING 
FLOORPLANNING 
As discussed in Chapter 2, the floorplanning stage guides the placement stage during
the physical design of VLSI chips. The proposed yield improvement techniques can be
applied during floorplanning to estimate functional yield and chip revenue (please see the
flow chart in Figure 8.4). The analytical yield models presented in Sections 8.3 to 8.7 show
that the functional yield depends on the TSV defect rate and ViaCount. It is also important to
note that ViaCount appears as an exponent in the functional yield expressions (given by eqn (8.1) to
eqn (8.3)), and therefore the functional yield may change rapidly with a change in ViaCount. Thus
the floorplanning algorithm can optimize ViaCount such that an acceptable range of
functional yield is achieved for a given TSV defect rate by incorporating functional
yield in the cost function of the floorplanner. Furthermore, the chip revenue model of eqn
(8.9) can be incorporated in the cost function to maximize profitability. For example, let us
say that a typical floorplanner optimizes area, inter-module wirelength, and ViaCount using
the cost function given by eqn (8.13).
$$\mathit{Fitness} = \alpha \, DS + \beta \, WL + \gamma_1 \, VC_{inter} \qquad (8.13)$$
where DS is the dead space, WL is the inter-module wirelength, and VC_inter is the inter-module via
count. The constants α, β, and γ_1 are real-valued tuning parameters. For yield-aware 3-D
floorplanning, the fitness function can be designed to incorporate functional yield and chip
revenue as shown in eqn (8.14).
$$\mathit{Fitness} = \alpha \, DS + \beta \, WL + \gamma_1 \, VC_{inter} + \gamma_2 \, Y_{fc} + \gamma_3 \, \mathit{Revenue}(RC) \qquad (8.14)$$
where Y_fc is the functional yield, and Revenue(RC) is the chip revenue for a redundancy
configuration RC that was presented in Section 8.6. In addition, γ_2 and γ_3 are additional
tuning parameters. The area penalty due to the insertion of via redundancy can be added by
increasing the sizes (width and height) of modules based on the area required by the redundant
vias and the MUXes used for via rerouting. Since floorplanning algorithms based on
stochastic search methods such as simulated annealing (SA) and evolutionary algorithms
(EA) are iterative procedures, the analytical yield models can be very useful for the fast
estimation of these yield metrics during the runtime of the floorplanner. Furthermore, a library
of the different redundancy configurations can be prepared, and the floorplanning
algorithm can be allowed to select a redundancy configuration from the library
using a set of new moves (see Figure 8.4) that can either randomly select a redundancy
configuration for a floorplan solution or swap the present redundancy configuration
with another redundancy configuration from the library. The methodology to derive
analytical models for other redundancy configurations is similar to the method presented in
Section 8.1. However, the mathematical term introduced to minimize the error in eqn. (8.2)
might be different, or it may be necessary to introduce different values of r in the same
equation. It might be possible to derive a unified analytical model for the three redundancy
configurations (QWP, OWP, and OTC) if a generic method to handle the error for all of
them can be established. Please note that there is an asymmetry in the analytical models
because wireless vias are considered defect free whereas all TSVs (primary and redundant)
have a non-zero probability of being defective.

Figure 8.4: The flow chart of a yield-aware 3-D floorplanning algorithm. Flow: input (Rent's
parameters, technology parameter, defect rate, chip pricing); populate the initial set of
floorplans; perturb (new moves that select a redundancy configuration from the library or
swap it with another one from the library, with dynamic probability of moves); pack modules;
compute fitness (wirelength, via count, dead space, functional yield, and revenue); keep the
best-fit set of floorplans; repeat until the multi-stage termination criteria are met; output
the optimized floorplan.
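The yield-aware fitness of eqn (8.14) and the redundancy-library move can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the class and function names are hypothetical, the yield model is a stand-in that assumes independent via defects and no redundancy (it does not reproduce eqns (8.1) to (8.3)), and the revenue model is left to the caller. If the fitness is minimized, γ_2 and γ_3 would presumably be chosen negative so that higher yield and revenue lower the cost; that sign convention is an assumption.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class Floorplan:
    dead_space: float   # DS
    wirelength: float   # WL, inter-module wirelength
    via_count: int      # VC_inter, inter-module via count
    redundancy: str     # name of the redundancy configuration RC


@dataclass(frozen=True)
class Weights:
    alpha: float
    beta: float
    gamma1: float
    gamma2: float       # weight on functional yield
    gamma3: float       # weight on chip revenue


def placeholder_yield(via_count: int, defect_rate: float) -> float:
    """Stand-in yield model (assumption): every via must work, no redundancy."""
    return (1.0 - defect_rate) ** via_count


def fitness(fp: Floorplan, w: Weights,
            yield_fn: Callable[[Floorplan], float],
            revenue_fn: Callable[[Floorplan, float], float]) -> float:
    """Weighted sum with the same five terms as eqn (8.14)."""
    y_fc = yield_fn(fp)
    return (w.alpha * fp.dead_space
            + w.beta * fp.wirelength
            + w.gamma1 * fp.via_count
            + w.gamma2 * y_fc
            + w.gamma3 * revenue_fn(fp, y_fc))


def swap_redundancy(fp: Floorplan, library: Sequence[str],
                    rng: random.Random) -> Floorplan:
    """Move: replace the current redundancy configuration with another one."""
    others = [rc for rc in library if rc != fp.redundancy]
    if not others:
        return fp
    return replace(fp, redundancy=rng.choice(others))


# Usage with hypothetical numbers.
library = ("QWP", "OWP", "OTC")
weights = Weights(alpha=1.0, beta=1.0, gamma1=1.0, gamma2=-100.0, gamma3=0.0)
fp = Floorplan(dead_space=10.0, wirelength=20.0, via_count=1000, redundancy="QWP")
cost = fitness(fp, weights,
               yield_fn=lambda f: placeholder_yield(f.via_count, 0.001),
               revenue_fn=lambda f, y: 0.0)
neighbour = swap_redundancy(fp, library, random.Random(0))
```

Under the stand-in model, doubling the via count squares the yield, which illustrates why a cost function that sees ViaCount only as a linear term can miss a rapid yield loss.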
CHAPTER 9: CONCLUSIONS AND FUTURE WORK 
The following conclusions have been derived from the research work presented in this 
dissertation: 
• Placement-aware 3-D floorplanning provides an opportunity for system level total 
wirelength reduction which can in turn reduce power consumed by interconnects. 
• Placement-aware module splitting enables 3-D placement of logic gates which has 
the potential to reduce chip temperature and improve performance. 
• The placement-aware 3-D floorplanning tool bridges the existing gap between 3-D 
floorplanning and 3-D placement. 
• Feasibility conditions derived on a sequence pair representation help in eliminating
the infeasible floorplan solutions that cannot satisfy vertical sub-module alignment.
• LCSLS provides a fast 3-D packing algorithm satisfying a set of vertical constraints.
It eliminates the need for the creation of the time-consuming 3-D constraint graphs
used in the 3DCG algorithm.
• Statistical 2-D/3-D wirelength distribution models with analytical solutions help in 
the quick prediction of wirelength reduction due to placement-aware module 
splitting. 
• The EA-based stochastic method that uses a parallel search for an optimal solution
has been shown to be faster and better than an SA-based optimization for 3-D
floorplanning problems.
• Vertical constraints on sequence pairs coupled with the LCSLS packing algorithm
can be used for a fast 3-D floorplan with vertical module alignment for bus-driven
3-D design.
• Through-Silicon Vias (TSVs) that connect circuits across different device layers suffer
from thermo-mechanical stress and pose a serious yield loss problem. Wireless via
redundancy along with TSVs introduced in the original design shows promising
results for the yield enhancement of 3-D ICs.
• The proposed Quad Wireless Plus redundancy configuration provides a good balance
between high yield and low redundancy overhead in terms of area and delay.
• The Octal TSV Complete configuration provides high yield, but it requires 100%
redundant vias and uses 8:1 MUXes, which have a larger delay penalty than the Quad
Wireless Plus configuration (which uses 4:1 MUXes).
Research work is endless and there are always chances for improvement and the 
advancement of existing knowledge. Consequently, we propose the following future work: 
• Improvement in the cost function of 3-D Floorplanning: The cost function is
the single most important factor that significantly determines the quality and
convergence of the search process in a combinatorial optimization. At present the
3-D PVC and 3-D FMA algorithms require different tuning parameters for different
benchmarks. A range for each of the tuning parameters has been identified using the
sensitivity analysis of the cost function. However, it would be worth investing in an
improved cost function that does not require frequent changes in the tuning
parameters. A meta-GA based approach may be useful in exploring improvement of
the cost function.
• Development of wirelength models with non-identical sub-modules: At
present, 3-D PVC splits modules into identical sub-modules only. However,
depending on design requirements, it might be desirable to split modules into
non-identical sub-modules. To estimate the wiring cost for such partitions, mathematical
models that can handle non-identical rectangular sub-modules need to be developed.
This type of design requirement can occur in a fixed outline 3-D floorplan design in
which a module might be split into non-identical blocks if non-identical spaces are
available within fixed dies of two or more device layers.
• Investigation of novel redundancy configurations: This dissertation presents a
set of redundancy configurations for yield improvement. However, it would be worth
searching for other efficient redundancy configurations that can improve yield
while keeping the redundancy cost minimal (in terms of the number of redundant
vias, delay, power, etc.).
• Yield-aware 3-D floorplanning: A tentative flow chart of a yield-aware 3-D
floorplanning algorithm was presented in Chapter 8 which can be designed to quickly
evaluate yield and revenue metrics during floorplan optimization. Since floorplanning
is an iterative process, the number of TSVs may change during several iterations by
an inter-layer perturbation that moves modules from one device layer to another.
Thus the yield metrics will also change even for the same design at a fixed defect rate.
Therefore analytical yield models will be very helpful in fast estimation of yield
metrics during floorplanning.
• Thermally Driven Placement-Aware 3-D Floorplanning: Heat extraction and
thermal management are among the major challenges faced in the design of 3-D ICs.
Previous works on thermally driven 3-D floorplanning consider heat extraction using
a heat sink only. However, recent advances in heat extraction using the
microchannel liquid cooling technique have been reported in [15], [16], [17], [18], [19]. It
would be worth designing a new thermally driven 3-D floorplanner that considers
microchannel liquid cooling and optimizes the chip temperature along with area,
wirelength, via count, etc., and comparing its effectiveness with the previous 3-D
floorplanners.
• CNT-based spiral inductor design for wireless vias: The wireless vias in 3-D ICs
use spiral inductors as transformers for interfacing between two device layers. As
discussed in Chapter 7, the power consumption in wireless vias due to copper-based
inductors is higher compared to MUXes and TSVs used for redundancy. The
inductor design should optimize the power dissipation and footprint area of the
spiral inductor while considering the amount of inductance required and the
tradeoffs in terms of parasitic resistance. Furthermore, the design should also
consider imperfections imposed by CNT fabrication technology. For example, the
CNTs needed for inductor design should be metallic. However, due to imperfections
in the manufacturing process, semiconducting and metallic CNTs are usually mixed
in a bundle. Therefore designers must also consider these fabrication imperfections
during inductor design. This problem would make a good MS thesis topic with strong
prospects for research publications.
• Noise analysis for wireless vias within 3-D ICs: The insertion of wireless vias
will create an additional electromagnetic field inside 3-D ICs [82]. The spiral inductors
that are used as an interface between two device layers are generally created in the
upper metal layer and they may interact with neighboring wires and TSVs, causing
cross-talk noise. Recent studies on TSV-to-TSV coupling have been presented in
[110], [111], [112], [113]. Similarly, electromagnetic studies on wireless vias and
misalignment tolerance between inductors have been presented in [95]. However,
there are no studies on the interaction of wireless vias with TSVs or interconnects.
Thus further study is required to determine how these inductors will interact with the
surrounding interconnects, TSVs and CMOS devices. One important thing to note is
that even if there are several redundant wireless vias inside 3-D chips, all of them
need not be active at the same time. For example, let us assume that there are 1000
redundant wireless vias in a 3-D IC but there are only 100 failed TSVs. In this case,
only 100 wireless vias will be activated using via re-routing. Furthermore, depending
on the circuit functionality, all of them might not be switching simultaneously. Thus
a careful noise analysis will be required.
CHAPTER 10: SUMMARY OF MAJOR CONTRIBUTIONS 
The major contributions of this dissertation can be briefly summarized as follows: 
• Extended a 3-D floorplanning tool inherited at the beginning of the research (which
optimized area and inter-module wirelength, included module splitting to allow 3-D
placement within split modules, and satisfied vertical constraints). The new
extended tool is called a placement-aware 3-D floorplan with vertical constraints
algorithm (3-D FVC). The extension was accomplished by a) extending existing
stochastic wirelength distribution models from square to rectangular 2-D modules
and 2-D modules that are divided into two parts and placed in two consecutive
device layers in 3-D ICs, b) designing a fast packing algorithm, c) deriving a set of
vertical constraints on a sequence pair representation for vertical alignment of
sub-modules, and d) making changes to many steps of the basic EA-based floorplanning
algorithm such as adding Rent's parameters and the technology node in the input,
dynamic probabilities of moves, introducing additional cost components to the cost
function, introducing two new moves (submodule-merge and change feasibility configuration)
and multi-stage termination criteria. The extended algorithm (3-D FVC) bridges the
pre-existing gap between 3-D floorplanning and 3-D placement tools. The vertical
constraints derived on the sequence pair allow us to identify feasible solutions on the
topological representation that can satisfy vertical constraints. 3-D FVC statistically
captures the wirelength reduction due to 3-D placement inside modules and
identifies certain sets of modules that can benefit from 3-D placement. As a result, it
reduces system level total wirelength by approximately 9.8% compared to existing state-of-the-art
3-D floorplanners that do not include placement-aware module splitting. This work
appears in [33] and [34].
• Based on models available in the literature for square 2-D module wirelength
distributions, a set of rectangular 2-D and 3-D wirelength distribution models (one
rectangular 2-D model and three rectangular 3-D models) has been derived. These
models allow us to estimate the wiring reduction within the modules due to 3-D
placement of logic gates inside 3-D modules. This work has been published in [32].
• A fast module packing algorithm (LCSLS) has been designed which quickly translates
the topological floorplan representation to a geometrical floorplan while satisfying the
vertical constraints imposed on modules/sub-modules. This work appears in [34].
• Modified our 3-D FVC algorithm to include vertical module alignment. The 
modified tool is called 3-D floorplan with vertical module alignment (3-D FMA). It 
satisfies designer-specified sets of constraints for bus-driven 3-D design and 
heterogeneous 3-D integration. It optimizes footprint area, inter-module wirelength 
and via count while satisfying the given set of constraints. An approximate runtime 
comparison with LTCG (another 3-D floorplanner that also performs vertical 
module alignment) shows that 3-D FMA is faster than LTCG. 
• The 3-D IC yield problem due to failure of TSVs caused by thermo-mechanical
stress has been formulated, and a via redundancy technique that combines redundant
physical vias and wireless vias has been proposed. Based on the via redundancy
technique, various redundancy configurations have been presented. Monte-Carlo
simulation results show that the functional yield of 3-D ICs can be enhanced using
our proposed via redundancy solutions. The initial version of this work has been
published in [35] and [36]. A matured version of this work has been submitted for
publication in a journal and is currently under review [37].
• The cost of redundancy overhead in terms of area, delay and power has been 
presented. Furthermore, a stochastic method for parametric yield estimation has been 
presented. This work appears in [36] and [37]. 
• A set of analytical models has been presented to quickly estimate functional yield,
parametric yield, and chip revenue. The comparison of the analytical models' results
with Monte-Carlo simulation results shows close agreement between them. This
work appears in [36] and [37].
REFERENCES 
[1] K. Banerjee, S. Souri, P. Kapur, and K. C. Saraswat, "3-D ICs: A Novel Chip
Design for Improving Deep-Submicrometer Interconnect Performance and
System-on-Chip Integration," Proceedings of the IEEE, vol. 89, no. 5, pp. 602-633,
May 2001.
[2] Y. Deng and W. P. Maly, "Interconnect Characteristics of 2.5-D System
Integration Scheme," Proc. Int'l Symp. Physical Design, pp. 171-175, Apr. 2001.
[3] S. Das, A. Chandrakasan, and R. Reif, "Design Tools for 3-D Integrated Circuits,"
Proc. ASP-DAC, pp. 53-56, Jan. 2003.
[4] S. Das, A. Chandrakasan, and R. Reif, "Three-Dimensional Integration:
Performance, Design Methodology, and CAD Tools," Proc. ISVLSI, pp. 13-18,
Feb. 2003.
[5] A. Fan, A. Rahman, and R. Reif, "Copper wafer bonding," Electrochem. Solid State
Lett., vol. 2, pp. 534-536, 1999.
[6] A. Fan, R. Reif, K. Chen, and S. Das, "Fabrication Technologies for
Three-Dimensional Integrated Circuits," Proc. Int'l Symp. Quality Electronic Design, pp. 33-37,
2002.
[7] S. Reda, G. Smith, and L. Smith, "Maximizing the Functional Yield of
Wafer-to-Wafer 3-D Integration," IEEE Trans. VLSI, vol. 17(9), pp. 1357-1362, 2009.
[8] L. Smith, G. Smith, S. Hosali, and S. Arkalgud, "3-D: It All Comes Down to Cost,"
3-D Architectures for Semiconductor Integration and Packaging, 2007.
[9] S. Lim, "TSV-Aware 3-D Physical Design Tool Needs for Faster Mainstream
Acceptance of 3-D ICs," DAC 2010 knowledge center article, pp. 1-11, 2010.
[10] K. Bernstein, P. Andry, J. Cann, P. Emma, D. Greenberg, W. Haensch, M.
Ignatowski, S. Koester, J. Magerlein, R. Puri, and A. Young, "Interconnects in
the Third Dimension: Design Challenges for 3-D ICs," Proc. Design Automation
Conference, pp. 562-567, 2007.
[11] International Technology Roadmap for Semiconductors (ITRS), www.itrs.net
[12] B. Goplen and S. Sapatnekar, "Thermal Via Placement in 3-D ICs," Proc. Int'l Symp.
on Physical Design, pp. 167-174, 2005.
[13] J. Cong, J. Wei, and Y. Zhang, "A Thermal-Driven Floorplanning Algorithm for 3-D
ICs," Proc. Int. Conf. on Computer Aided Design, pp. 306-313, Nov. 2004.
[14] B. Goplen and S. Sapatnekar, "Efficient Thermal Placement of Standard Cells in
3-D ICs using a Force Directed Approach," Proc. Int. Conf. on Computer Aided Design,
pp. 86-89, 2002.
[15] D. Tuckerman and R. W. Pease, "High-Performance Heat Sinking for VLSI,"
IEEE Electron Device Letters, vol. 2(5), pp. 126-129, 1981.
[16] H. Mizunuma, M. Behnia, and W. Nakayama, "Forced Convective Boiling of a
Fluorocarbon Liquid in Reduced Size Channels - An Experimental Study," Journal of
Enhanced Heat Transfer, vol. 9(2), pp. 69-76, 2003.
[17] W. Qu and I. Mudawar, "Thermal Design Methodology for High-Heat-Flux
Single-Phase and Two-Phase Micro-Channel Heat Sinks," IEEE Transactions on Components
and Packaging Technologies, vol. 26(3), pp. 598-609, 2003.
[18] X. Wei and Y. Joshi, "Optimization Study of Stacked Micro-Channel Heat Sinks
for Micro-Electronics Cooling," IEEE Transactions on Components and Packaging
Technologies, vol. 26(1), pp. 55-61, 2003.
[19] N. Lei, A. Ortega, and R. Vaidyanathan, "Modeling and Optimization of Multilayer
Minichannel Heat Sinks in Single-Phase Flow," in IEEE InterPACK Conference,
InterPACK2007-33329, pp. 29-43, 2007.
[20] T. Brunschwiler, B. Michel, H. Rothuizen, U. Kloter, B. Wunderle, H. Oppermann,
and H. Reichl, "Forced Convective Interlayer Cooling In Vertically Integrated
Packages," Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic
Systems, pp. 1114-1125, 2008.
[21] D. Sekar, C. King, B. Dang, T. Spencer, H. Hacker, P. Joseph, M. Bakir, and J.
Meindl, "A 3-D-IC Technology with Integrated Microchannel Cooling,"
International Interconnect Technology Conference, vol. 1(4), pp. 13-15, 2008.
[22] H. Mizunuma, C. Yang, and Y. Lu, "Thermal Modeling for 3-D-ICs with Integrated
Microchannel Cooling," Proc. Int. Conf. on Computer Aided Design, pp. 256-263, 2009.
[23] R. Filippi, J. McGrath, T. Shaw, C. Murray, H. Rathore, P. McLaughlin, V.
McGahay, L. Nicholson, P. Wang, J. Lloyd, M. Lane, R. Rosenberg, X. Liu, Y.
Wang, W. Landers, T. Spooner, J. Demarest, B. Engel, J. Gill, G. Goth, E. Barth,
G. Biery, C. Davis, R. Wachnik, R. Goldblan, T. Ivers, A. Swinton, C. Barile, and J.
Aitken, "Thermal Cycle Reliability of Stacked Via Structures with Copper
Metallization and an Organic Low-k Dielectric," Int'l Reliability Physics Symposium, pp.
61-67, 2004.
[24] R. S. Patti, "Three-Dimensional Integrated Circuits and the Future of
System-on-Chip Designs," Proceedings of the IEEE, vol. 94(6), pp. 1214-1224, 2006.
[25] C. Ferri, S. Reda, and I. Bahar, "Strategies for Improving the Parametric Yield and
Profits of 3-D ICs," Proc. Intl. Conference on Computer-Aided Design, pp. 220-226, 2007.
[26] G. Smith, L. Smith, S. Hosali, and S. Arkalgud, "Yield Considerations in the Choice
of 3-D Technology," Intl. Symp. on Semiconductor Manufacturing, pp. 535-537, 2007.
[27] D. Bentz, J. Zhang, M. Bloomfield, J.-Q. Lu, R. J. Gutmann, and T. S. Cale,
"Modeling Thermal Stresses of Copper Interconnects in 3-D IC Structures," Proc.
COMSOL Multiphysics User's Conference, pp. 321-326, 2005.
[28] J. Zhang, M. Bloomfield, J.-Q. Lu, R. J. Gutmann, and T. S. Cale, "Modeling
Thermal Stresses in 3-D IC Inter-wafer Interconnects," IEEE Transactions on
Semiconductor Manufacturing, vol. 19(4), pp. 437-447, 2006.
[29] B. Swinen, W. Ruythooren, P. Moor, L. Bogaerts, L. Carbonell, K. Munck, B.
Eyckens, S. Stoukatch, D. Tezcan, Z. Tokei, J. Aelst, and E. Beyne, "3-D
Integration by Cu-Cu Thermo-compression Bonding of Extremely Thinned Bulk-Si
Die Containing 10 µm pitch Through-Si vias," International Electron Devices Meeting,
pp. 1-4, 2006.
[30] C. Bower, D. Malta, D. Temple, J. Robinson, P. Coffman, M. Skokan, and T.
Welch, "High Density Vertical Interconnects for 3-D Integration of Silicon
Integrated Circuits," IEEE Electronic Components and Technology Conference, pp. 399-403,
2006.
[31] A. Kahng, "Classical Floorplanning Harmful," Proc. ISPD, pp. 207-213, 2006.
[32] Rajeev Nain, Rajarshi Ray, and Malgorzata Chrzanowska-Jeske, "Rectangular 3-D
Wirelength Distribution Models," Proc. IEEE International Conference on Electronics,
Circuits, and Systems (ICECS), pp. 109-112, 2008.
[33] Rajeev Nain and Malgorzata Chrzanowska-Jeske, "Placement-aware 3-D
Floorplanning," Proc. IEEE International Symposium on Circuits and Systems (ISCAS),
pp. 1727-1730, 2009.
[34] Rajeev Nain and Malgorzata Chrzanowska-Jeske, "Fast Placement-Aware 3-D
Floorplanning using Vertical Constraints on Sequence Pairs," In press, IEEE
Transactions on Very Large Scale Integration (TVLSI) Systems, vol. PP, no. 99, pp. 1-14, doi:
10.1109/TVLSI.2010.2055247, URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5545492&isnumber=4359553
[35] Rajeev Nain and Malgorzata Chrzanowska-Jeske, "3-D Yield in the Presence of
Defects in Through Signal Vias," IEEE International Workshop on Design for
Manufacturability & Yield (DFM&Y), pp. 84-87, 2009.
[36] Rajeev Nain, Shantesh Pinge, and Malgorzata Chrzanowska-Jeske, "Yield
Improvement of 3-D ICs in the Presence of Defects in Through Signal Vias,"
IEEE Int'l Symposium on Quality Electronic Design (ISQED), pp. 598-605, 2010.
[37] Rajeev Nain, Rehman Ashraf, and Malgorzata Chrzanowska-Jeske, "3-D IC Yield
Enhancement in the Presence of Through-Silicon Via Failure," Under Review,
IEEE Transactions on Very Large Scale Integration (TVLSI) Systems.
[38] P. Miranda and A. Moll, "Thermo-Mechanical Characterization of Copper
Through-Wafer Interconnects," IEEE Electronic Components and Technology Conference, pp. 844-848,
2006.
[39] G. Blakiewicz, Malgorzata Chrzanowska-Jeske, M. C. Jeske, and J. Zhang,
"Substrate Noise Modeling in Early Floorplanning of MS-SOCs," ASPDAC, pp.
819-823, 2005.
[40] M. Healy, M. Vittes, M. Ekpanyapong, C. Ballapuram, S. Lim, H. Lee, and G. Loh,
"Microarchitectural floorplanning under performance and thermal tradeoff," Design
& Test in Europe (DATE), pp. 1288-1293, 2006.
[41] H. Xiang, X. Tang, and Martin D. F. Wong, "Bus-Driven Floorplanning," Int.
Conference on Computer Aided Design, pp. 66, 2003.
[42] F. Balasa, "Modeling Non-Slicing Floorplans with Binary Trees," Int. Conference on
Computer Aided Design, pp. 13-16, 2000.
[43] X. Tang and D. F. Wong, "Floorplanning with Alignment and Performance
Constraints," Proc. Design Automation Conference, pp. 848-853, 2002.
[44] G. Wu, S. Wu, Y. W. Chang, and Y. C. Chang, "B*-trees: A New Representation for
Non-slicing Floorplans," Proc. Design Automation Conference, pp. 458-463, Jun. 2000.
[45] Z. Li, X. Hong, Q. Zhou, Y. Cai, J. Bian, H. Yang, V. Pitchumani, and C. K.
Cheng, "Hierarchical 3-D Floorplanning Algorithm for Wirelength Optimization,"
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 12, pp. 2637-2646, Dec. 2006.
[46] E. Young, C. Chu, and M. Ho, "Placement Constraints in Floorplan Design," IEEE
Transactions on VLSI Systems, vol. 12, issue 7, pp. 735-745, July 2004.
[47] H. Murata and E. Kuh, "Sequence Pair Based Module Placement for
hard/soft/pre-placed Modules," ISPD 1998, pp. 167-172, 1998.
[48] S. N. Adya and I. L. Markov, "Fixed-Outline Floorplanning: Enabling Hierarchical
design," Proceedings of the International Conference on Computer Design, pp. 1120-1135,
2003.
[49] Y. Tam, E. F. Y. Young, and C. Chu, "Analog Placement with Symmetry and other
Placement Constraints," Intl. Conference on Computer Aided Design, pp. 349-354, 2006.
[50] L. Xiao and E. F. Y. Young, "Analog Placement with Common Centroid and 1-D
Symmetry Constraints," Proceedings of ASP-DAC, pp. 353-360, 2009.
[51] B. Wang, Malgorzata Chrzanowska-Jeske, and G. Greenwood, "ELF-SP -
Evolutionary Algorithm for Non-slicing Floorplans with Soft Modules," Proc. Int.
Symposium on Circuits and Systems, vol. 2, pp. 681-684, 2002.
[52] Malgorzata Chrzanowska-Jeske, B. Wang, and G. Greenwood, "Floorplanning with
Performance-based Clustering," Proc. Int. Symposium on Circuits and Systems, vol. 4, pp.
724-727, 2003.
[53] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, "VLSI Module Placement
Based on Rectangle Packing by the Sequence Pair," IEEE Trans. Comput.-Aided Des.
Integr. Circuits Syst., vol. 15, no. 12, pp. 1518-1524, Dec. 1996.
[54] S. Nakatake, K. Fujiyoshi, H. Murata, and Y. Kajitani, "Module Packing Based on
the BSG-structure and IC Layout Applications," IEEE Trans. Comput.-Aided Des.
Integr. Circuits Syst., vol. 17, no. 6, pp. 519-530, June 1998.
[55] P. N. Guo, C. Cheng, and T. Yoshimura, "An O-tree Representation of
Non-Slicing Floorplan and its Applications," in Proc. DAC, pp. 268-273, 1999.
[56] X. Hong, G. Huang, Y. Cai, J. Gu, S. Dong, C. Cheng, and J. Gu, "Corner Block
List: An Effective and Efficient Topological Representation of Non-slicing
Floorplan," Proc. Int. Conf. Comput.-Aided Des., pp. 8-12, Nov. 2000.
[57] J. M. Lin and Y. W. Chang, "TCG: A Transitive Closure Graph Based
Representation for Non-slicing Floorplans," Proc. Design Automation Conference, pp.
764-769, June 2001.
[58] X. Tang and D. Wong, "FAST-SP: A Fast Algorithm for Block Placement Based on
Sequence Pair," Proc. ASP-DAC, pp. 521-526, 2001.
[59] K. Bazargan, R. Kastner, and M. Sarrafzadeh, "3-D Floorplanning: Simulated
Annealing and Greedy Placement Methods for Reconfigurable Computing
Systems," Proc. Int. Workshop Rapid Syst. Prototyping, pp. 38-43, June 2000.
[60] L. Cheng, L. Deng, and D. F. Wong, "Floorplanning for 3-D VLSI design," Proc.
ASP-DAC, pp. 405-411, 2005.
[61] H. Yamazaki, K. Sakanushi, S. Nakatake, and Y. Kajitani, "The 3-D-packing by
Meta Data Structure and Packing Heuristics," IEICE Trans. Fundamentals, vol. E38-A,
pp. 639-645, 2000.
[62] P. Yuh, C. Yang, and Y. Chang, "Temporal Floorplanning using the T-tree
Formulation," Proc. Int. Conference on Computer Aided Design, pp. 300-305, 2004.
[63] P. Shiu, R. Ravichandran, S. Easwar, and S. Lim, "Multi-layer Floorplanning for
Reliable System-on-Package," Proc. Int. Symp. Circuits Syst., vol. 5, pp. V-69-V-72,
May 2004.
[64] W. L. Hung, G. M. Link, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, "Interconnect
and thermal-aware floorplanning for 3-D microprocessors," Proc. Int. Symposium on
Quality Electronic Design, pp. 98-104, Mar. 2006.
[65] P. Zhou, Y. Ma, Z. Li, R. Dick, L. Shang, H. Zhou, X. Hong, and Q. Zhou, "3-D
STAF: Scalable temperature and leakage aware floorplanning for three-dimensional
integrated circuits," Proc. Int. Conference on Computer Aided Design, pp. 590-597, 2007.
[66] T. Ma and E. Y. Young, "TCG-based Multi-Bend Bus Driven Floorplanning,"
Proc. ASP-DAC, pp. 192-197, 2008.
[67] J. Law, E. Young, and R. Ching, "Block Alignment in 3-D Floorplan Using Layered
TCG," Proc. GLSVLSI, pp. 376-380, 2006.
[68] K. Fujiyoshi and H. Murata, "Arbitrary Convex and Concave Rectilinear Block
Packing Using Sequence-Pair," IEEE Trans. CAD, vol. 19, no. 2, pp. 224-233, 2000.
[69] X. Dong and Y. Xie, "System-Level Cost Analysis and Design Exploration for
Three-Dimensional Integrated Circuits (3-D ICs)," Proc. ASP-DAC, pp. 234-241,
2009.
[70] J. Cong, G. Luo, J. Wei, and Y. Zhang, "Thermal-Aware 3-D IC Placement Via
Transformation," Proc. ASP-DAC, pp. 780-785, 2007.
[71] J. Davis, V. De, and J. Meindl, "A Stochastic Wirelength Distribution for Gigascale
Integration (GSI)-Part I: Derivation and Validation," IEEE Trans. Electron Devices,
vol. 45, no. 3, pp. 580-589, Mar. 1998.
[72] R. Zhang, K. Roy, C. Koh, and D. Janes, "Stochastic Wire-length and Delay
distributions of 3-dimensional circuits," Proc. Int. Conference on Computer Aided Design,
pp. 208-214, 2000.
[73] A. Rahman, A. Fan, and R. Reif, "Wire-length Distribution of Three-Dimensional
Integrated Circuits," Proc. IEEE Int. Conference on Interconnect Technology, pp. 671-678,
1999.
[74] B. S. Landman and R. L. Russo, "On a pin versus block relationship for partitions
of logic blocks," IEEE Trans. Comput., vol. 20, no. 12, pp. 1469-1479, Dec. 1971.
[75] S. Das, A. Chandrakasan, and R. Reif, "Calibration of Rent's rule models for
three-dimensional integrated circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 12, no. 4, pp. 359-366, Apr. 2004.
[76] M. Lanzerotti, G. Fiorenza, and A. Rand, "Predicting Interconnect Requirements in
Ultra-large-scale Integrated Control Logic Circuitry," Proc. SLIP, pp. 43-50, 2005.
[77] M. Lanzerotti, G. Fiorenza, and A. Rand, "Microminiature packaging and
integrated circuitry: The work of E. F. Rent, with an application to on-chip
interconnection requirements," IBM J. Res. Dev., vol. 49, no. 4/5, pp. 777-803,
Jul./Sep. 2005.
[78] Pingqiang Zhou (the author of 3-D STAF [65]), Personal communication, May 6, 2008.
[79] J. W. Joyner, "Opportunities and Limitations of Three-dimensional Integration for
Interconnect Design," Ph.D. Thesis, Georgia Institute of Technology, July 2003.
[80] A. Rahman, "System Level Performance Evaluation of Three-Dimensional
Integrated Circuits," Ph.D. Thesis, MIT, Feb. 2001.
[81] W. E. Donath, "Placement and Average Interconnections Lengths of Computer
Logic," IEEE Trans. on Circuits Syst., vol. CAS-26, pp. 272-277, Apr. 1979.
[82] J. Xu, L. Luo, S. Mick, J. Wilson, and P. Franzon, "AC Coupled Interconnect for 
Dense 3-D ICs," IEEE Trans. on Nuclear Science, vol. 51, no. 5, pp. 2156-2160, Oct. 
2004. 
[83] S. Mick, J. Wilson, and P. Franzon, "4 Gbps High-Density AC Coupled 
Interconnection," Proc. Custom Integrated Circuits Conferepce, pp: 133-140, 2002. 
[84] S. Kuhn, M. Kleiner, R. Thewes, and W. Weber, ''Vertical Signal Transmission in 
Three-dimensional Integrated Circuits by Capacitive Coupling," 199 5. IEEE 
International Symposium on Circuits and Systems, pp.37-40, 1995. 
[85] W. Davis, J. Wilson, S. Mick, J. Xu, H. Hua, C. Mineo, A. Sule, M. Streer, and P. 
Franzen, "Demystifying 3-D ICs: The Pros and Cons of Going Vertical," IEEE 
Design & Test of Computers, vol. 22, no. 6, pp. 498-510, Nov.-Dec.2005. 
[86] K. Tsubaki, H. Shioya, J. Ono, Y. Nak~jima, T. Hanajiri, and H. Yamaguchi, "Large 
magnetic field incluced by carbon nanotube current -proposal of carbon nanotube 
inductors," Device Research Conference Digest, vol. 1, pp. 119-120,June 2005. 
[87] K. Tsubaki, Y. Nakajima, T. Hanajiri, and H. Yamaguchi, "Proposal of Carbon 
Nanotube Inductors," Institute of Physics Publishing]ournal of Physics: Conference Series 38, 
pp. 49-52, 2006. 
[88] A. Nieuwoudt, and Y. Massoud, "Carbon Nanotube Bundle-Based Low Loss 
Integrated Inductors," IEEE Int'l Conf. on Nanotechnology, pp. 714-718, 2007.
[89] M. Budnik, A. Raychowdhury, A. Bansal, and K. Roy, "A High Density, Carbon 
Nanotube Capacitor for Decoupling Applications," Proc. Design Automation Conf.,
2006, pp. 935-938.
[90] H. B. Bakoglu, "Circuits, Interconnections, and Packaging for VLSI," Addison-Wesley, 1990.
[91] www.ece.virginia.edu/~mrs8n/cadence/SynthesisTutorial/tsmc18.pdf
[92] N. Weste and D. Harris, "CMOS VLSI Design: A Circuits and Systems
Perspective," Addison-Wesley, 2005.
[93] J. A. Davis, V. K. De, J. A. Meindl, "A Stochastic Wire-Length Distribution for
Gigascale Integration (GSI)-Part II: Applications to Clock Frequency, Power 
Dissipation, and Chip Size Estimation," IEEE Trans. on Electron Devices, Vol. 45, 
No. 3, pp. 590-597, 1998.
[94] Y. Ma, Y. Liu, E. Kursun, G. Reinman, and J. Cong, "Investigating the Effects of 
Fine-grain Three-Dimensional Integration on Microarchitecture Design," ACM J.
Emerg. Technol. Comput. Syst., vol. 4, no. 4, 2008, Article 17.
[95] T. Kuroda, "Wireless Proximity Communications for 3D System Integration,"
IEEE Int'l Workshop on Radio Frequency Integration Technology, pp. 21-25, Dec. 2007.
[96] P. Jacob, O. Erdogan, A. Zia, P. Belemjian, R. Kraft, and J. McDonald, "Predicting
the performance of a 3D Processor-Memory Chip Stack," IEEE Design & Test of 
Computers, vol. 22, Issue 6, pp. 540-547, Nov. - Dec. 2005. 
[97] C. Liu, I. Ganusov, M. Burtscher, and S. Tiwari, "Bridging the Processor-Memory
Performance Gap with 3D IC Technology," IEEE Design & Test of Computers, vol. 
22, Issue 6, pp. 556 - 564, Nov. - Dec. 2005. 
[98] G. Neudeck, S. Pae, J. Denton, and T. Su, "Multiple Layers of Silicon-on-Insulator
for Nanostructure Devices," Journal of Vacuum Science and Technology B: Microelectronics 
and Nanometer Structures, vol. 17, no. 3, pp. 994-998, 1999. 
[99] K. Saraswat, K. Banerjee, A. Joshi, P. Kalavade, P. Kapur, and S. Souri, "3-D ICs:
Motivation, performance analysis, and technology," in Proc. 26th European Solid-State
Circuits Conference (ESSCIRC), pp. 406-414, 2000.
[100] X. Tang, and M. Wong, "On Handling Arbitrary Rectilinear Shape Constraint,"
Proc. ASP-DAC, pp. 38-41, 2004.
[101] Benyi Wang, and Malgorzata Chrzanowska-Jeske, "A Basic 3-D Floorplan tool," 
C++ Software, Unpublished. 
[102] Y. Xia, and Malgorzata Chrzanowska-Jeske, "Considering Layout For Test
Scheduling of Core-Based SoCs," in Proc. ICECS, pp. 1-4, 2005.
[103] Kenneth De Jong, "Evolutionary Computation," MIT Press, 2002. 
[104] Garrison Greenwood, "ECE559 : Genetic Algorithms," Graduate Coursework at 
Portland State University, Spring 2010. 
[105] M. Lanzerotti, G. Fiorenza, and R. Rand, "Assessment of On-Chip Wire-Length 
Distribution Models," IEEE Transactions on VLSI Systems, vol. 12, no. 10, pp. 
1108-1112, Oct. 2004. 
[106] G. Karypis, and V. Kumar, "hMetis: A Hypergraph Partitioning Package." [Online]
Available: http://www.users.cs.umn.edu/~karypis/metis/hmetis/index.html
[107] J. Lu, "3-D Hyperintegration and Packaging Technologies for Micro-Nano
Systems," Proceedings of the IEEE, vol. 97, no. 1, pp. 18-30, January 2009.
[108] [Online] http://en.wikipedia.org/wiki/Sensitivity_analysis 
[109] J. Helton, J. Johnson, C. Sallaberry, and C. Storlie, "Survey of Sampling-Based
Methods for Uncertainty and Sensitivity Analysis," Sandia National Laboratories 
Report, June 2006. 
[110] T. Song, C. Liu, D. H. Kim, J. Cho, J. Kim, J. S. Park, S. Ahn, J. Kim, and S. K.
Lim, "Analysis of TSV-to-TSV Coupling with High-Impedance Termination in 3D
ICs," to appear in 12th IEEE International Symposium on Quality Electronic Design
(ISQED), March 2011.
[111] J. Cho, J. Shim, E. Song, J. Pak, J. Lee, H. Lee, K. Park, and J. Kim, "Active circuit 
to through silicon via (TSV) noise coupling," in IEEE Electrical Performance of
Electronic Packaging and Systems, pp. 97-100, 2009.
[112] K. Yoon, G. Kim, W. Lee, T. Song, J. Lee, H. Lee, K. Park, and J. Kim, "Modeling
and analysis of coupling between TSVs, metal, and RDL interconnects in TSV-based
3D IC with silicon interposer," in Proc. IEEE Electronics Packaging Technology
Conf., pp. 702-706, 2009.
[113] B. Curran, I. Ndip, S. Guttowski, and H. Reichl, "The impacts of dimensions and
return current path geometry on coupling in single-ended Through Silicon Vias,"
IEEE Electronic Components and Technology Conference, pp. 1092-1097, 2009.
[114] M. Kang and W. Dai, "General Floorplanning with L-shaped, T-shaped and Soft
Blocks based on Bounded Slicing Grid Structure," in Proc. ASP-DAC, pp. 265-270,
1997.
[115] S. Nakatake, M. Furuya, and Y. Kajitani, "Module Placement on BSG structure
with Pre-placed Modules and Rectilinear Modules," Proc. ASP-DAC, pp. 571-576, 
1998. 
[116] J. Xu, P. N. Guo, and C. K. Cheng, "Rectilinear Block Placement using Sequence-
Pair," Proceedings of International Symposium on Physical Design, pp. 173-178, 1998. 
[117] Y. Pang, C.-K. Cheng, K. Lampaert, and W. Xie, "Rectilinear Block Placement 
using O-tree representation," Proceedings of International Symposium on Physical Design,
pp. 156-161, 2001. 
[118] Y. Ma, X. Hong, S. Dong, Y. Cai, C.-K. Cheng, and J. Gu, "Floorplanning with 
abutment constraints and L-shaped/T-shaped blocks based on corner block list," in 
Proc. Design Automation Conference, pp. 770-775, 2001. 
[119] G. M. Wu, Y.C. Chang, and Y.W. Chang, "Rectilinear Block Placement using b*-
trees," ACM Transactions on Design Automation of Electronic Systems, vol. 8(2), pp.188-
202, 2003. 
[120] J.-M. Lin, H.-L. Lin, and Y.-W. Chang, "Arbitrarily Shaped Rectilinear Module 
Placement using the Transitive Closure Graph Representation," IEEE Transactions 
on VLSI Systems, vol. 10(6), pp. 886-901, 2002. 
[121] W. Davis, E. Oh, A. Sule, and P. Franzon, "Application Exploration for 3-D
Integrated Circuits: TCAM, FIFO, and FFT Case Studies," IEEE Transactions on 
Very Large Scale Integration Systems, vol. 17, no. 4, pp. 496-506, Apr. 2009. 
[122] 3-D IC & TSV Interconnects, Yole Développement, 2010 reports. [online]:
http://www.i-micronews.com/upload/Rapports/3D%20flyer.pdf
[123] A. Joseph, J. Gillis, M. Doherty, P. Lindgren, R. Kelly, R. Malladi, P. Wang, M. 
Erturk, H. Ding, E. Gebreselasie, M. McPartlin, and J. Dunn, "Through-silicon vias 
enable next-generation SiGe power amplifiers for wireless communications," IBM 
Journal of Research and Development, vol. 52, no. 6, pp. 635-648, Nov. 2008.
[124] [online]: cseweb.ucsd.edu/classes/wi10/cse241a/slides/t_line.ppt
[125] S. Salewski, E. Barke, "An upper bound for 3D slicing floorplans," Proc. ASP-
DAC, pp. 567-572, 2002. 
APPENDIX A: MATHEMATICAL PROOFS OF THE FEASIBILITY CONDITION 
THEOREMS 
We present formal proofs of the feasibility condition theorems stated in section 3.4. 
Theorem 1:
Feasibility Condition for two pairs of modules: Given a two-layer feasibility condition
graph G(V, E), there exists a feasible solution to the vertical constraint problem for two
module pairs represented by the graph if:
(a) G contains a clique of size K ($K \in \{3, 4\}$), or
(b) G contains two cliques of size 2, such that each clique contains only nodes of the same color.
Proof: Let us consider the case of a clique of size 4 as shown in Figure A.1, in which
module A1 has to be aligned with A2, and B1 has to be aligned with B2.
Let $X_{m_i}$ denote the x-coordinate of the lower-left corner of (sub)module m in the ith
device layer.
Sequence Pair of Layer 1: <A1, B1; A1, B1>
=> A1 is to the left of B1 => $X_{A_1} < X_{B_1}$    (A.1)
Sequence Pair of Layer 2: <A2, B2; A2, B2>
=> A2 is to the left of B2 => $X_{A_2} < X_{B_2}$    (A.2)
Let us assume $X_{A_1} - X_{A_2} = \delta$, where $\delta$ is an arbitrary constant such that $\delta \ge 0$
=> $X_{A_1} = X_{A_2} + \delta$    (A.3)
Figure A.1: Two module pairs {A1, A2} and {B1, B2} with vertical constraints in
2-device layers.
From Eqn (A.1) and (A.3):
$X_{A_2} + \delta < X_{B_1}$    (A.4)
From Eqn (A.2):
$X_{A_2} < X_{B_2}$ => $X_{A_2} + \delta < X_{B_2} + \delta$    (A.5)
Combining Eqn (A.4) and (A.5):
=> A2 can be shifted rightward by $\delta$ to align it with A1 first. After that, {B1, B2} can be
aligned together without disturbing the alignment of {A1, A2}, because the x-coordinates
of B1 and B2 are to the right of both A1 and A2.
Therefore a clique of size 4 forms a feasible solution for vertical alignment. Similarly, we can show that a
clique of size 3 forms a feasible solution.
Condition (b) of theorem 1 is the same as a clique of size 4 except that lateral shifting will
take place along the y-axis. Hence theorem 1 is proved.
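The shifting argument in the proof can be illustrated with a small numeric sketch. It is only an illustration under simplifying assumptions that are not part of the proof: modules are treated as points on the x-axis, all other modules are ignored, and modules may only move rightward. The function name `align_two_pairs` is hypothetical.

```python
def align_two_pairs(x_a1, x_b1, x_a2, x_b2):
    """Return common x-coordinates (x_a, x_b) for the pairs {A1, A2} and {B1, B2}.

    Assumes A is left of B on both layers (x_a1 < x_b1 and x_a2 < x_b2),
    treats modules as points, and only ever shifts modules rightward.
    """
    if not (x_a1 < x_b1 and x_a2 < x_b2):
        raise ValueError("expected A to be left of B on both layers")
    # Align the A pair at the larger coordinate; one layer shifts by delta >= 0.
    x_a = max(x_a1, x_a2)
    shift_layer1 = x_a - x_a1
    shift_layer2 = x_a - x_a2
    # Carry each layer's shift over to its B module, then align the B pair.
    x_b = max(x_b1 + shift_layer1, x_b2 + shift_layer2)
    return x_a, x_b
```

Because each B module starts to the right of its A module and receives at least the same shift, the aligned B coordinate always ends up to the right of the aligned A coordinate, which is the property the proof relies on.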
Theorem 2:
Two Layer Feasibility Condition: Let L1 = {A1, B1, C1, D1, ...} and L2 = {A2, B2, C2,
D2, ...} be two sets of modules located in two different device layers L1 and L2
respectively. The packings on L1 and L2 are represented by Constrained Sequence Pairs SP1
and SP2 respectively. SP1 and SP2 are feasible if module pairs {(A1, A2), (B1, B2), (C1, C2),
(D1, D2), ...} can be vertically aligned simultaneously. The vertical alignment of all these
module pairs is feasible if:
Every combination of two module pairs decomposed from L1 and L2 (without changing their relative orders)
constructs a feasible configuration by satisfying theorem 1, i.e.
$\{u_1, u_2\}$ and $\{v_1, v_2\}$ form a feasible configuration $\forall \{u_1, v_1\} \in SP_1$; $\forall \{u_2, v_2\} \in SP_2$.
Proof (by contradiction): Let us assume that SP1 and SP2 are feasible constrained sequence
pairs, but not every combination of two module pairs makes a feasible configuration.
=> There exists at least one combination of two module pairs which makes an infeasible
configuration, i.e.
$(r_1, r_2)$ and $(s_1, s_2)$ do not form a feasible configuration, where $\{r_1, s_1\} \in SP_1$; $r_1 \ne s_1$ and
$\{r_2, s_2\} \in SP_2$; $r_2 \ne s_2$
=> SP1 and SP2 become infeasible because $(r_1, r_2)$ and $(s_1, s_2)$ cannot be vertically aligned
simultaneously. (Contradiction to our initial assumption.)
Hence it is proved by contradiction that for SP1 and SP2 to be feasible, every combination of
two module pairs should construct a feasible configuration, thereby satisfying theorem 1.
Theorem 3:
Multi Layer Feasibility Condition: Given a set of multi-layer constrained sequence pairs,
there exists a feasible solution to the vertical constraint problem if:
Each combination of 2-device layers satisfies the 2-layer feasibility condition theorem (i.e. theorem 2).
Proof (using principle of superposition): Let us assume that there are L device layers (L >
2) in which modules are vertically constrained. The L-layer feasibility condition can be
decomposed into $\binom{L}{2}$ combinations of the two-layer feasibility condition (theorem 2). If any one
of the $\binom{L}{2}$ combinations violates theorem 2, then sub-modules constrained between those
two layers can never be aligned. Thus the multi-layer feasibility condition will fail.
=> For the multi-layer feasibility condition, all $\binom{L}{2}$ combinations must satisfy the two-layer
feasibility condition (theorem 2).
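The decomposition used in theorems 2 and 3 (check every pair of layers, and within each layer pair every pair of constrained modules) can be sketched as follows. Several things here are assumptions made for illustration: a sequence pair is a tuple of two name lists, modules that must be aligned share the same name across layers, the left/right/above/below reading follows the usual sequence-pair convention, and the pairwise test simply rejects directly opposite relations. That pairwise test is a simplified stand-in and does not reproduce the feasibility condition graph of section 3.4.

```python
from itertools import combinations

OPPOSITE = {"left": "right", "right": "left", "above": "below", "below": "above"}

def relation(seq_pair, a, b):
    """Relative position of module a with respect to b in one sequence pair."""
    positive, negative = seq_pair
    a_first_pos = positive.index(a) < positive.index(b)
    a_first_neg = negative.index(a) < negative.index(b)
    if a_first_pos and a_first_neg:
        return "left"
    if not a_first_pos and not a_first_neg:
        return "right"
    return "above" if a_first_pos else "below"

def two_layer_feasible(sp1, sp2):
    """Check every combination of two constrained modules (theorem 2 style)."""
    shared = sorted(set(sp1[0]) & set(sp2[0]))
    return all(
        relation(sp1, u, v) != OPPOSITE[relation(sp2, u, v)]  # stand-in pair test
        for u, v in combinations(shared, 2)
    )

def multi_layer_feasible(seq_pairs):
    """Check all C(L, 2) layer combinations (theorem 3 style)."""
    return all(two_layer_feasible(a, b) for a, b in combinations(seq_pairs, 2))
```

For L layers the outer check visits L(L-1)/2 layer pairs, which matches the count of two-layer conditions named in the proof.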
APPENDIX B: STOCHASTIC RECTANGULAR 3-D WIRELENGTH 
DISTRIBUTION MODELS 
B.1 INTRODUCTION 
The wirelength distribution model provides a stochastic relation between the total number of
interconnects and the length of those interconnects. The length is measured in "gate pitch"
units, where one gate pitch is the average separation between adjacent gates in a chip. A wirelength
distribution model is used for interconnect prediction, mostly for early interconnect
planning and design of high performance chips. In addition, it provides mathematical
models to establish several performance metrics such as total wirelength, average wire
length, and number of global and semi-global wires. Zhang [72], Banerjee [1], Rahman [73]
and Joyner [79] have previously derived statistical wirelength distribution models applicable
to square shaped 3-D chips only.
Since the module shapes are generally rectangular in the floorplan, it is necessary to develop
a rectangular 3-D wirelength distribution model for fast wirelength estimation. Zhang,
Banerjee and Rahman's models have one thing in common: all of them were derived from
Davis' square shaped 2-D model [71], which is computationally fast. Joyner derived a new
gate pair expression [79] that could be used to derive a rectangular model. However, no
further work was done to complete the rectangular 3-D model and performance metrics
such as total wirelength, average wirelength, etc. Joyner's square shaped 3-D model is also
complex and computationally expensive due to the lack of a closed form analytical solution.
Therefore we have chosen Zhang, Banerjee and Rahman's models for further study and
extension, with an objective of fast wirelength prediction inside rectangular modules.
B.2 GENERAL BACKGROUND 
The derivation of the previous 3-D wirelength distribution models of Zhang, Banerjee and
Rahman for square shaped chips is based on Davis' 2-D wirelength model. Davis' 2-D
wirelength distribution model was derived using a terminal-gate relationship known as
Rent's rule [74] as follows:
$$T = k N^{p} \qquad \text{(B.1)}$$
where T is the number of I/O terminals of the chip, N is the total number of gates, k is Rent's
coefficient and p is Rent's exponent. Rent's exponent p denotes the complexity of
wiring inside a chip and its value is a real number between 0 and 1. Zero denotes absolutely
no wiring and one denotes maximum wiring, such as a clique in a graph in which every node
is connected to the remaining nodes using edges.
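As a quick illustration of Rent's rule, the following sketch evaluates T = k N^p for a few gate counts. The function name is hypothetical; the sample values k = 3.88 and p = 0.66 are the ones quoted in the plots of this appendix, and the gate counts are arbitrary.

```python
def rent_terminals(num_gates, k, p):
    """Number of I/O terminals predicted by Rent's rule, T = k * N**p."""
    if num_gates < 1:
        raise ValueError("num_gates must be at least 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("Rent's exponent p is expected to lie in [0, 1]")
    return k * num_gates ** p

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, round(rent_terminals(n, k=3.88, p=0.66), 1))
```

Since p < 1, the terminal count grows more slowly than the gate count: multiplying N by 100 multiplies T by 100^p rather than by 100.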
Davis assumed all the gates to be organized in a uniform array of square logic gates inside
the square 2-D chip. He calculated the total number of gate pairs separated by a given
Manhattan distance $\ell$ inside the array. Based on the conservation of I/O terminals [13], he
also calculated the expected number of interconnects between these gate pairs. In short
form, his distribution function can be written as eqn (B.2):
$$f_{2D}(\ell) = \Gamma\, M(\ell)\, I_{exp}(\ell) \qquad \text{(B.2)}$$
where $\Gamma$ is a normalization constant, $M(\ell)$ is the total number of gate pairs [71] separated by a
length $\ell$ (in gate pitch units) and $I_{exp}(\ell)$ is the expected number of interconnects between these gate pairs, as
shown in eqn (B.3) and (B.4).
$$M(\ell) = \begin{cases} \dfrac{\ell^{3}}{3} - 2\ell^{2}\sqrt{N} + 2N\ell, & 1 \le \ell < \sqrt{N} \\[2mm] \dfrac{1}{3}\left(2\sqrt{N} - \ell\right)^{3}, & \sqrt{N} \le \ell < 2\sqrt{N} \end{cases} \qquad \text{(B.3)}$$
$$I_{exp}(\ell) = \frac{\alpha k}{N_C}\left[(N_A + N_B)^{p} - N_B^{p} + (N_B + N_C)^{p} - (N_A + N_B + N_C)^{p}\right] \approx \alpha k\, p(1 - p)\, \ell^{2p-4} \qquad \text{(B.4)}$$
where $N_A = 1$, $N_B \approx \ell(\ell - 1)$, $N_C \approx 2\ell$ and $\alpha$ is a function of the average fanout (f.o.), equal to
f.o./(f.o. + 1); k and p are Rent's parameters.
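The two ingredients of eqn (B.2) can be evaluated directly. The sketch below is a minimal reading of eqn (B.3) and eqn (B.4) under stated assumptions: the function names are hypothetical, and the four-term bracket in `expected_interconnects_exact` follows Davis' published form, which is assumed here to be what eqn (B.4) expresses before the approximation.

```python
import math

def gate_pairs_square(ell, n_gates):
    """Gate pairs at Manhattan distance ell in a square array, per eqn (B.3)."""
    root_n = math.sqrt(n_gates)
    if 1 <= ell < root_n:
        return ell**3 / 3 - 2 * root_n * ell**2 + 2 * n_gates * ell
    if root_n <= ell < 2 * root_n:
        return (2 * root_n - ell) ** 3 / 3
    return 0.0

def expected_interconnects_exact(ell, k, p, fanout):
    """Expected interconnects per gate pair, four-term form (assumed)."""
    alpha = fanout / (fanout + 1)
    n_a, n_b, n_c = 1, ell * (ell - 1), 2 * ell
    bracket = ((n_a + n_b) ** p - n_b**p
               + (n_b + n_c) ** p - (n_a + n_b + n_c) ** p)
    return alpha * k * bracket / n_c

def expected_interconnects_approx(ell, k, p, fanout):
    """Large-ell approximation: alpha * k * p * (1 - p) * ell**(2p - 4)."""
    alpha = fanout / (fanout + 1)
    return alpha * k * p * (1 - p) * ell ** (2 * p - 4)
```

The two branches of `gate_pairs_square` agree at ell = sqrt(N), and for moderately large ell the exact and approximate interconnect expressions differ by well under a few percent, which is why the approximate power law is convenient for closed-form integration.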
3-D distribution models derived from Davis' model consist of a horizontal distribution
model and a vertical distribution model. The horizontal component gives the distribution
within each layer while the vertical component gives the via distribution across different layers.
Banerjee et al. [1] applied Davis' 2-D model directly to calculate the
horizontal distribution. Banerjee transformed the Rent's coefficient k of a 2-D chip to an
effective external Rent's coefficient, $k_{eff}$, for each layer of a 3-D chip. The transformation is based
on the conservation of I/O terminals [71]. Banerjee also calculated the total number of
vertical vias. However, his method does not provide the via distribution across different layers.
The Rent's exponent p remains unchanged under the transformation from a 2-D to a 3-D
model, assuming that the routing algorithm remains the same [1]. Banerjee's distribution
model can be summarized mathematically as follows:
$$k_{eff} = k\, m^{p-1}$$
$$I_{int(B)}(m) = m k \left(1 - m^{p-1}\right)\left(\frac{N}{m}\right)^{p}$$
$$f_{3D(B)}(\ell) = m f_{2D}(\ell) + I_{int(B)}(m) \qquad \text{(B.5)}$$
where m is the total number of layers, (k, p) are Rent's parameters of the entire chip and $k_{eff}$
is the Rent's coefficient of each layer. Please note that $f_{2D}$ for each layer will use $k_{eff}$ as input
instead of k. Also, $I_{int(B)}$ is constant and only gives the total number of inter-layer vias instead
of their distribution.
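The two scalar quantities in Banerjee's model are easy to evaluate, and doing so makes the conservation of terminals visible. The sketch below assumes the forms k_eff = k m^(p-1) and I_int = m k (1 - m^(p-1)) (N/m)^p; the function names are hypothetical. The identity checked in the usage note is an observation about these formulas, not a statement taken from the text.

```python
def effective_rent_coefficient(k, p, num_layers):
    """Per-layer external Rent coefficient, k_eff = k * m**(p - 1)."""
    return k * num_layers ** (p - 1)

def total_interlayer_vias(k, p, n_gates, num_layers):
    """Total inter-layer via count, m * k * (1 - m**(p - 1)) * (N / m)**p."""
    m = num_layers
    return m * k * (1 - m ** (p - 1)) * (n_gates / m) ** p

if __name__ == "__main__":
    k, p, n, m = 3.88, 0.66, 1_000_000, 4
    isolated = m * k * (n / m) ** p        # terminals if each layer stood alone
    chip = k * n**p                        # terminals of the whole chip
    print(isolated - chip, total_interlayer_vias(k, p, n, m))
```

With these forms, the terminals of m isolated layers of N/m gates each exceed the terminals of the whole chip by exactly the via count, and a single-layer chip (m = 1) has no inter-layer vias.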
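Zhang et al. split $I_{exp}(\ell)$ into vertical and horizontal components, and derived the
horizontal and vertical 3-D wirelength distribution models without transforming the
original Rent's parameter of a 2-D chip [72]. Zhang's model is summarized as shown in eqn
(B.6):
$$f_{3D(Zhang)}(\ell) = h(\ell) + v(z)$$
$$h(\ell) = \begin{cases} \Theta\left(\dfrac{\ell^{3}}{3} - 2\ell^{2}\sqrt{\dfrac{N}{m}} + 2\dfrac{N}{m}\ell\right)\ell^{2p-4}, & 1 \le \ell < \sqrt{N/m} \\[2mm] \dfrac{\Theta}{3}\left(2\sqrt{\dfrac{N}{m}} - \ell\right)^{3}\ell^{2p-4}, & \sqrt{N/m} \le \ell \le 2\sqrt{N/m} \end{cases} \qquad \text{(B.6)}$$
where $\Theta$ is a normalization constant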
kN(l N p-I p-2 -I Np-I) v(z) = a - -m +m · (2m-2z) 
m(m-1) 
where z = 1, 2, ..., m-1
and m is the total number of device layers in a 3-D IC.
The third model was developed by Rahman et al. [73],[80]. Rahman initially modified eqn
(B.4) by incorporating the effect of vertical wire distribution in $N_B$ and $N_C$ as follows:
(B.7)
where $t_z$ is the via height (in gate pitches), u is the unit step function and $N_z = m$ is the total number
of device layers. Using eqn (B.7) and eqn (B.4), the expected number of interconnects ($I_{3D,exp}$)
is obtained. Rahman's distribution model is finally derived as follows:
$$f_{3D}(\ell, t_z) = \Gamma'\, M_{3D}(\ell, t_z)\, I_{3D,exp}(\ell, t_z)$$
$$M_{3D}(\ell, t_z) = N_z M(\ell) + \beta M(\ell) \qquad \text{(B.8)}$$
where $\beta$ is a constant which depends on $N_z$.
B.3 RECTANGULAR 2-D WIRELENGTH DISTRIBUTION MODEL
As the previously described 3-D models [1],[72],[73] used Davis' square 2-D model, we first
modified Davis' model to a rectangular 2-D model and incorporated the aspect ratio
$a_r$ ($0 < a_r \le 1$) of the chip/module in our derivation. We assume that all the logic gates inside the
rectangular module are uniformly arranged in a rectangular array as shown in Figure B.1.
The summary of our derivation is given in eqn (B.9):
$$f_{R,2D}(\ell) = \Lambda\, M_R(\ell)\, I_{exp}(\ell) \qquad \text{(B.9)}$$
where $\Lambda$ is a normalization constant, $M_R(\ell)$ is the number of gate pairs inside the
rectangular array for a given length $\ell$ and $I_{exp}(\ell)$ is the expected number of interconnects as
defined in eqn (B.4). In Figure B.1, the gate counts in a row and in a column define the length and
width of the array respectively. We define the aspect ratio $a_r$ such that:
$$a_r = \frac{\text{width}}{\text{length}}, \quad 0 < a_r \le 1.0; \qquad \text{width} = \sqrt{a_r N}, \quad \text{length} = \sqrt{N / a_r} \qquad \text{(B.10)}$$
where length and width are integer values in gate pitch units. The gate pitch is calculated using
the area of a rectangular module $A_m$ and the number of gates N inside the module as
follows:
$$\text{gate pitch} = \sqrt{A_m / N} \qquad \text{(B.11)}$$
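The array dimensions and gate pitch follow directly from the gate count, aspect ratio and module area. The sketch below assumes width = sqrt(a_r N), length = sqrt(N / a_r) and gate pitch = sqrt(A_m / N); the function names are hypothetical, and rounding to integer gate counts is left to the caller.

```python
import math

def array_dimensions(n_gates, aspect_ratio):
    """Return (width, length) of the gate array, in gates per column and per row."""
    if not 0 < aspect_ratio <= 1:
        raise ValueError("aspect ratio must satisfy 0 < a_r <= 1")
    width = math.sqrt(aspect_ratio * n_gates)
    length = math.sqrt(n_gates / aspect_ratio)
    return width, length

def gate_pitch(module_area, n_gates):
    """Average separation between adjacent gates, sqrt(A_m / N)."""
    return math.sqrt(module_area / n_gates)
```

For example, N = 6000 gates with a_r = 0.6 gives a 60 by 100 array, so width times length recovers N and width divided by length recovers the aspect ratio.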
Figure B.1: Gate pair calculation in a partial manhattan circle.
To calculate $M_R(\ell)$, we traverse the rectangular array of logic gates row-wise from the top left
as shown in Figure B.1. We choose a reference gate (shown as "A" in Figure B.1) and
count the total number of gates which are at a Manhattan distance $\ell$ away from it. Then we
remove the reference gate from further counting and set a new reference right next to it.
We keep counting and adding the number of such gates until all the gates are traversed in
the array. The final sum gives $M_R(\ell)$. In Figure B.1, the dotted gates indicate that they have
already been traversed and removed to avoid double counting. The reference gate is called
type "A", the gates located at a distance $\ell$ are called type "C" and all other gates residing
between type A and type C are known as type B. These three types of gates together form a partial
Manhattan circle [71]. The traversal procedure of calculating $M_R(\ell)$ is mathematically
defined as:
$$M_R(\ell) = \sum_{i=1}^{\sqrt{a_r N}} \; \sum_{j=1}^{\sqrt{N/a_r}} \Phi_R(i, j, \ell) \qquad \text{(B.12)}$$
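where $\Phi_R(i, j, \ell)$ computes the total number of gates that are a distance $\ell$ away from the
reference gate "A" located in the ith row and jth column of the rectangular array. The
function $\Phi_R$ is defined as follows:
$$\begin{aligned} \Phi_R(i, j, \ell) ={} & (\ell + 1)\,u_0(\ell + 1) - \left(\ell - \sqrt{N/a_r} + j\right)u_0\!\left(\ell - \sqrt{N/a_r} + j\right) \\ & - 2\left(\ell - \sqrt{a_r N} + i\right)u_0\!\left(\ell - \sqrt{a_r N} + i\right) - u_0\!\left[-\left(\ell - \sqrt{a_r N} + i\right)\right] \\ & + \left[\ell - \left(\sqrt{a_r} + \tfrac{1}{\sqrt{a_r}}\right)\sqrt{N} + j + i - 1\right]u_0\!\left[\ell - \left(\sqrt{a_r} + \tfrac{1}{\sqrt{a_r}}\right)\sqrt{N} + j + i - 1\right] \\ & + \ell\,u_0(\ell) - (\ell - j)\,u_0(\ell - j) + \left(\ell - \sqrt{a_r N} - j + i\right)u_0\!\left(\ell - \sqrt{a_r N} - j + i\right) \end{aligned} \qquad \text{(B.13)}$$
where $u_0$ is the unit step function.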
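The traversal can be checked on small arrays. The sketch below implements one reading of the function Φ_R from eqn (B.13), with the integer array dimensions `width` (rows) and `length` (columns) used in place of the square-root expressions, and compares the row-wise sum against a brute-force count over all gate pairs. The convention u0(0) = 1 and all function names are assumptions made for this illustration.

```python
from itertools import combinations

def u0(x):
    """Unit step, taken here as 1 for x >= 0 (assumed convention)."""
    return 1 if x >= 0 else 0

def phi_r(i, j, ell, width, length):
    """Untraversed gates at Manhattan distance ell from the gate at row i, column j.

    Rows and columns are 1-indexed; gates earlier in row-major order are excluded.
    """
    def ramp(x):
        return x * u0(x)
    return (ramp(ell + 1)
            - ramp(ell - length + j)
            - 2 * ramp(ell - width + i)
            - u0(-(ell - width + i))
            + ramp(ell - width - length + j + i - 1)
            + ramp(ell)
            - ramp(ell - j)
            + ramp(ell - width - j + i))

def gate_pairs_rect(ell, width, length):
    """Row-wise traversal sum of phi_r over the whole array, as in eqn (B.12)."""
    return sum(phi_r(i, j, ell, width, length)
               for i in range(1, width + 1)
               for j in range(1, length + 1))

def gate_pairs_brute_force(ell, width, length):
    """Reference count: test every unordered gate pair directly."""
    gates = [(r, c) for r in range(width) for c in range(length)]
    return sum(1 for (r1, c1), (r2, c2) in combinations(gates, 2)
               if abs(r1 - r2) + abs(c1 - c2) == ell)
```

The brute-force reference grows quadratically with the number of gates, so it is only practical for small arrays; it serves here purely as a cross-check of the step-function expression.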
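The procedure of calculating $M_R(\ell)$ using computer simulation has O(length x width) runtime
inside the rectangular array of N logic gates, which is computationally expensive for a large
number of gates. Therefore we have derived a closed form analytical solution of $M_R(\ell)$ as
shown in eqn (B.14), which can be computed in constant time:
Region I: $1 \le \ell < \sqrt{a_r N}$
Region II: $\sqrt{a_r N} \le \ell < \sqrt{N/a_r}$
Region III: $\sqrt{N/a_r} \le \ell < \sqrt{N/a_r} + \sqrt{a_r N} - 2$
(B.14)
Assuming $\sqrt{N/a_r} + \sqrt{a_r N} \gg 1$ and $N \gg 1$, eqn (B.14) is finally approximated as shown
in eqn (B.15):
Region I: $1 \le \ell < \sqrt{a_r N}$
$$M_R(\ell) \approx \frac{\ell^{3}}{3} - \left(\sqrt{a_r N} + \sqrt{N/a_r}\right)\ell^{2} + 2N\ell$$
Region II: $\sqrt{a_r N} \le \ell < \sqrt{N/a_r}$
$$M_R(\ell) \approx -a_r N \ell + N\left(\frac{a_r}{3} + 1\right)\sqrt{a_r N}$$
Region III: $\sqrt{N/a_r} \le \ell < \sqrt{N/a_r} + \sqrt{a_r N}$
$$M_R(\ell) \approx \frac{1}{3}\left(\sqrt{a_r N} + \sqrt{N/a_r} - \ell\right)^{3} \qquad \text{(B.15)}$$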
A comparison between the closed form analytical solution obtained using eqn (B.15) and
computer simulated gate-pair counts for different aspect ratios is shown in Figure B.2. It
indicates that our approximate gate pair count matches closely with the computer simulated
data.
[Plot: gate pair count (log scale) versus L (gate pitch), 1 to 2000; simulated and analytical curves for ar = 0.64 and ar = 0.36.]
Figure B.2: Comparison of closed form analytical solution and computer
simulation of MR(L), for different aspect ratios.
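A comparison of this kind can be reproduced in a few lines. In the sketch below, the three-region closed form is an assumption: it is inferred from the horizontal distribution of the extended Zhang model (eqn B.24) with a single layer, and is taken to stand for the approximation of eqn (B.15). The exact count sums over all horizontal and vertical offsets whose lengths add up to the requested distance. Function names are hypothetical.

```python
import math

def gate_pairs_closed_form(ell, n_gates, aspect_ratio):
    """Approximate gate pair count in a rectangular array (assumed three-region form)."""
    width = math.sqrt(aspect_ratio * n_gates)
    length = math.sqrt(n_gates / aspect_ratio)
    if 1 <= ell < width:
        return ell**3 / 3 - (width + length) * ell**2 + 2 * n_gates * ell
    if width <= ell < length:
        return -aspect_ratio * n_gates * ell + n_gates * (aspect_ratio / 3 + 1) * width
    if length <= ell < length + width:
        return (width + length - ell) ** 3 / 3
    return 0.0

def gate_pairs_exact(ell, width, length):
    """Exact count of unordered gate pairs at Manhattan distance ell.

    width is the number of rows and length the number of columns.
    """
    total = 0
    for dy in range(0, min(ell, width - 1) + 1):
        dx = ell - dy
        if dx > length - 1:
            continue
        # A diagonal offset can point down-right or down-left; axis offsets cannot.
        orientations = 2 if (dx > 0 and dy > 0) else 1
        total += orientations * (length - dx) * (width - dy)
    return total

if __name__ == "__main__":
    for ell in (30, 80, 130):
        print(ell, gate_pairs_exact(ell, 60, 100),
              gate_pairs_closed_form(ell, 6000, 0.6))
```

The exact count takes time proportional to the array width per distance value, while the closed form is constant time; the gap between them shrinks relative to the count as the array grows.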
The total number of interconnects in a system for the known values of k, p, $\alpha$ and N is
given by [81]:
(B.16)
Now, the normalization coefficient $\Lambda$ is calculated by eqn (B.17):
$$\int_{\ell=1}^{\sqrt{a_r N} + \sqrt{N/a_r}} f_{R,2D}(\ell)\, d\ell = I_{total} \qquad \text{(B.17)}$$
Thus,
$$\Lambda = \frac{2N\left(1 - N^{p-1}\right)}{\dfrac{N^{p}\,\Psi}{6p(p-1)(2p-1)(2p-3)} - \dfrac{1}{6p} + \dfrac{\sqrt{N/a_r} + \sqrt{a_r N}}{2p-1} - \dfrac{N}{p-1}} \qquad \text{(B.18)}$$
where $\Psi$, a function of Rent's exponent p and aspect ratio $a_r$, is given by eqn (B.19):
'P= 
(2p-3)(p-1)(2p-I)a;- 6p(2p-3)(p-I)(a; +a;-1) 
+ 3p(2p-3)(2p-l)(ar -ar2-p +2a/-1) 
+ 2p(p-1)(2p-1)(3a?-p - 3a;-1 - a;) 
+3(ar +ar-l +2)P-3ar-p - 3p(2p-I)ar2-p -6pa!-p (B.19) 
At p = 0.5, $\Lambda$ becomes indeterminate of the form 0/0, but by using L'Hôpital's rule at p =
0.5, $\Lambda$ converges to eqn (B.20).
4N-4JN (B.20) 
A= ~~~~--~~----~--~----~----~~----~~~-( J;,N+~)[-tn(NJ + {ra> Ja}1]- 2JN(a,+ lJ + 4N-~ 
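This completes the derivation of a rectangular 2-D wirelength distribution. A performance
metric, the length demand function $D(\ell)$, which gives the total length of interconnects with
lengths from one gate pitch to $\ell$ gate pitches [21], is calculated as shown in eqn (B.21):
$$D(\ell) = \int_{1}^{\ell} l\, f_{R,2D}(l)\, dl \qquad \text{(B.21)}$$
The complete evaluation of eqn (B.21) gives eqn (B.22) as shown below:
Region I: $1 \le \ell < \sqrt{a_r N}$
$$D(\ell) = \frac{\alpha k}{2}\Lambda\left[\frac{1}{3}\,\frac{\ell^{2p+1} - 1}{2p+1} - \left(\sqrt{a_r N} + \sqrt{N/a_r}\right)\frac{\ell^{2p} - 1}{2p} + 2N\,\frac{\ell^{2p-1} - 1}{2p-1}\right]$$
Region II: $\sqrt{a_r N} \le \ell < \sqrt{N/a_r}$
$$D(\ell) = \frac{\alpha k}{2}\Lambda\left\{\frac{1}{3}\,\frac{(\sqrt{a_r N})^{2p+1} - 1}{2p+1} - \left(\sqrt{a_r N} + \sqrt{N/a_r}\right)\frac{(\sqrt{a_r N})^{2p} - 1}{2p} + 2N\,\frac{(\sqrt{a_r N})^{2p-1} - 1}{2p-1} - a_r N\,\frac{\ell^{2p-1} - (\sqrt{a_r N})^{2p-1}}{2p-1} + N\sqrt{a_r N}\left(\frac{a_r}{3} + 1\right)\frac{\ell^{2p-2} - (\sqrt{a_r N})^{2p-2}}{2p-2}\right\}$$
Region III: $\sqrt{N/a_r} \le \ell < \sqrt{N/a_r} + \sqrt{a_r N}$
$$D(\ell) = \frac{\alpha k}{2}\,\cdots \qquad \text{(B.22)}$$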
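The normalization of eqn (B.17) and the length demand of eqn (B.21) can also be approximated numerically, which is a useful cross-check on the closed forms. The sketch below rests on several assumptions that are not taken from the text: the integrals are replaced by sums over integer lengths, the gate pair count uses a three-region approximation inferred from eqn (B.24) with a single layer, and the total interconnect count is assumed to take the form I_total = αkN(1 - N^(p-1)) as in Davis' square model. All names are hypothetical.

```python
import math

def gate_pairs(ell, n_gates, a_r):
    """Approximate gate pair count in a rectangular array (assumed form)."""
    w = math.sqrt(a_r * n_gates)
    l = math.sqrt(n_gates / a_r)
    if 1 <= ell < w:
        return ell**3 / 3 - (w + l) * ell**2 + 2 * n_gates * ell
    if w <= ell < l:
        return -a_r * n_gates * ell + n_gates * (a_r / 3 + 1) * w
    if l <= ell < w + l:
        return (w + l - ell) ** 3 / 3
    return 0.0

def wirelength_distribution(n_gates, a_r, k, p, fanout):
    """Map each integer length to an interconnect count that sums to I_total."""
    alpha = fanout / (fanout + 1)
    i_total = alpha * k * n_gates * (1 - n_gates ** (p - 1))  # assumed I_total
    max_len = int(math.sqrt(a_r * n_gates) + math.sqrt(n_gates / a_r))
    raw = {ell: gate_pairs(ell, n_gates, a_r) * ell ** (2 * p - 4)
           for ell in range(1, max_len + 1)}
    scale = i_total / sum(raw.values())  # plays the role of the normalization constant
    return {ell: scale * value for ell, value in raw.items()}

def length_demand(distribution, ell):
    """Discrete analogue of D(ell): total wire length up to length ell."""
    return sum(length * count for length, count in distribution.items()
               if length <= ell)
```

Evaluating `length_demand` at the largest length gives the total wirelength, and dividing that by the total interconnect count gives the average wire length.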
B.4 EXTENSION TO RECTANGULAR 3-D WIRELENGTH DISTRIBUTION 
MODELS 
Since we extend the previous 3-D models from square shape to rectangular shape, our models
are referred to with the prefix "extended" during this discussion. Using our rectangular 2-D
distribution model, we have extended Banerjee's 3-D square model to a 3-D rectangular
model by modifying eqn (B.5) using eqn (B.9) as follows:
$$f_{R,3D(B)}(\ell) = m f_{R,2D}(\ell) + I_{int(B)}(m) \qquad \text{(B.23)}$$
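The modification in Zhang's model is only in the horizontal wirelength distribution, while
the vertical distribution remains unchanged even for the rectangular shape. Thus we
modified the horizontal distribution of eqn (B.6) as follows:
$$h_R(\ell) = \begin{cases} \Theta_R\left[\dfrac{\ell^{3}}{3} - \left(\sqrt{\dfrac{a_r N}{m}} + \sqrt{\dfrac{N}{m a_r}}\right)\ell^{2} + 2\ell\dfrac{N}{m}\right]\ell^{2p-4}, & 1 \le \ell < \sqrt{a_r N/m} \\[3mm] \Theta_R\left[-\dfrac{a_r N}{m}\ell + \dfrac{N}{m}\left(\dfrac{a_r}{3} + 1\right)\sqrt{\dfrac{a_r N}{m}}\right]\ell^{2p-4}, & \sqrt{a_r N/m} \le \ell < \sqrt{N/(m a_r)} \\[3mm] \dfrac{\Theta_R}{3}\left[\sqrt{\dfrac{a_r N}{m}} + \sqrt{\dfrac{N}{m a_r}} - \ell\right]^{3}\ell^{2p-4}, & \sqrt{N/(m a_r)} \le \ell < \sqrt{N/(m a_r)} + \sqrt{a_r N/m} \end{cases} \qquad \text{(B.24)}$$
Thus the extended Zhang's model appears as:
$$f_{R,3D(Zhang)}(\ell) = h_R(\ell) + v(z) \qquad \text{(B.25)}$$
where v(z) is the same as defined in eqn (B.6). The normalization constant $\Theta_R$ is given by
eqn (B.26):
(B.26)
At p = 0.5, $\Theta_R$ becomes indeterminate of the form 0/0. Using L'Hôpital's rule at p = 0.5,
$\Theta_R$ becomes as follows: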
2ak.mP ( N /m-.JN Im) 
@R= ~~~~~~~~~~~~~~~~~~~~~~~~~~ 
(~arN/m +~N/mar)[-In N + In[fo: + J:-]2 -1] - 2.JN /m[fo: + i:-J + 4 N -~ 
m var var m 3 
(B.27) 
Finally, Rahman's model is extended to a 3-D rectangular model by modifying eqn (B.8)
with the use of eqn (B.9), and the modified model is represented by eqn (B.28).
$$f_{R,3D}(\ell, t_z) = \Gamma'\, M_{R,3D}(\ell, t_z)\, I_{3D,exp}(\ell, t_z)$$
$$M_{R,3D}(\ell, t_z) = N_z M_R(\ell) + \beta M_R(\ell) \qquad \text{(B.28)}$$
B.5 EXPERIMENTAL RESULTS
We implemented our newly developed 3-D rectangular models in C++/STL to generate
data for experimental results. These results are also valid for the original 3-D models of
square chips because they are special cases of our models ($a_r = 1$). Our models are referred
to with the prefix "extended" during the discussion.
B.5.1 Effect of Aspect Ratio on Wirelength Distribution and Total Wirelength 
We started by plotting a graph to observe the impact of aspect ratio on our newly
developed 2-D wirelength distribution model, which was later extended to the 3-D domain.
Figure B.3 indicates that a reduction in aspect ratio will increase the global wire count but
decrease the semi-global wire count. We can see these dual trends as the two curves cross
each other in Figure B.3. When we plot the total wirelength against various aspect ratios, we
observe that the total wirelength decreases with decreasing aspect ratio for $a_r < 0.6$, as
shown in Figure B.4. However, it saturates for $0.6 < a_r \le 1$. This trend is seen in 3-D circuits
as well. Therefore a smaller aspect ratio (less than 0.6) is preferred for total wirelength
reduction.
[Plot: wirelength distribution versus length (gate pitch) for aspect ratios 1.0 and 0.36; N = 1000000, k = 3.88, p = 0.66, fanout = 3.0.]
Figure B.3: Effect of aspect ratio on 2-D rectangular wirelength distribution.
[Plot: total wirelength versus aspect ratio; N = 1000000, k = 3.88, p = 0.66, fanout = 3.0.]
Figure B.4: Effect of varying aspect ratio on total wirelength (in gate pitch).
B.5.2 Comparison Among Rectangular 3-D Wirelength Distribution Models 
The comparison between the extended Banerjee's and Zhang's models shows that their
horizontal components converge in spite of different derivation approaches. Figure B.5
indicates this convergence. The extended Banerjee's model, however, does not provide the
distribution of vertical vias among different layers. The extended Zhang's model gives the
distribution of vias, but it does not tell which via is associated with which net/wire.
The extended Rahman's model takes care of this issue. It considers a via as a part of a particular
net and calculates the length of the net. A separate comparison between the extended Zhang's
model and the extended Rahman's model is presented in Figure B.6. It can be seen that the
extended Rahman's model predicts fewer global and semi-global wires compared to the
extended Zhang's model. However, the number of local wires is larger than in Zhang's model
due to the assumption of direct vertical integration [73],[80] in Rahman's derivation. In this
integration, any gate is allowed to be connected to gates in different layers. Therefore many global
and semi-global wires are replaced by local wires and vias. Rahman's approach is
computationally expensive due to the lack of a closed form analytical solution because of
changes imposed by eqn (B.7) and eqn (B.8). However, it gives a closer prediction of the
wirelength distribution compared to Banerjee and Zhang's models. The summary of our
comparison is presented in Table B.1.
[Plot: log-log wirelength distribution versus length (gate pitch) for Extended 3D (Banerjee) and Extended 3D (Zhang); N = 1000000, k = 3.88, p = 0.66, fanout = 3.0, aspect ratio = 0.49, number of layers = 4.]
Figure B.5: Extended Zhang's and Banerjee's horizontal 3-D models.
[Plot: log-log wirelength distribution versus length (gate pitch) for Extended 3D (Zhang) and Extended 3D (Rahman); N = 1000000, k = 3.88, p = 0.66, fanout = 3.0, aspect ratio = 0.49, number of layers = 4.]
Figure B.6: Comparison of extended Zhang's and Rahman's 3-D models.
TABLE B.1: COMPARISON OF VARIOUS RECTANGULAR 3-D WIRELENGTH
DISTRIBUTION MODELS
Rectangular 3-D models   Analytical solution   Via distribution   Relative            Relative
derived from:            available?            available?         computation time    accuracy
Banerjee's               Yes                   No                 Fastest             Low
Zhang's                  Yes                   Yes                Faster              Medium
Rahman's                 No                    Yes                Slower              High
APPENDIX C: RENT'S PARAMETER EXTRACTION 
C.1 EXTRACTION OF RENT'S PARAMETERS FROM THE NETLIST OF A 
CIRCUIT BLOCK 
Let us assume that we are given the netlist of a circuit block for which Rent's parameters
need to be extracted. The first step is to perform a k-way partition [105] of the netlist
with multilevel recursive bisection using a circuit partitioner such as hMetis [106]. For these
partitions, the number of parts k (not to be confused with Rent's coefficient) is taken from the
list {2, 4, 8, 16, 32, ...}, with the highest value in the sequence
chosen such that no partition contains zero gates. As a result of the multilevel recursive
circuit partitioning, the number of I/O terminals (T) and the number of gates (N) for each partition are
obtained. Next, a log-log plot of T vs. N over all partitions is drawn and a least square
linear fit is obtained from the plot. The linear fit can be expressed as:
$$\log(T) = p \log(N) + \log(k) \qquad \text{(C.1)}$$
$$\Rightarrow T = k N^{p} \qquad \text{(C.2)}$$
Thus, from the least square linear fit, the values of k and p are obtained.
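The fitting step can be sketched with an ordinary least-squares line through the points (log N, log T). This is a minimal illustration, assuming the per-partition gate and terminal counts have already been produced by a partitioner; running hMetis itself is outside the scope of the sketch, and the function name is hypothetical.

```python
import math

def fit_rent_parameters(samples):
    """Fit T = k * N**p from (num_gates, num_terminals) pairs.

    Performs a least-squares linear fit of log(T) against log(N);
    the slope is Rent's exponent p and exp(intercept) is Rent's coefficient k.
    """
    samples = list(samples)
    if len(samples) < 2:
        raise ValueError("at least two partitions are needed")
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("partitions must have differing gate counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    k = math.exp(mean_y - p * mean_x)
    return k, p
```

With synthetic data generated from a known power law, the fit returns the original coefficient and exponent up to floating-point error; with real partition data the points scatter around the line and the fit gives averaged parameters.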
C.2 RENT PARAMETER FOR MODULES OF FLOORPLAN BENCHMARKS
For the placement-aware 3D floorplanning experiments, we randomly assigned Rent's
parameters to modules, assuming that the modules represented different types of circuits such
as microprocessor, SRAM, ASIC-like control logic circuit, etc. The Rent's parameters for
different types of circuits were directly taken from [77]. Tables C.1 to C.5 show the Rent's
parameters assigned to different modules in each floorplanning benchmark.
TABLE C.1: RENT PARAMETERS FOR EACH MODULE OF AMI33 BENCHMARK
Module  k     p     Module  k     p     Module  k     p
0       6     0.12  11      23.3  0.3   22      0.82  0.45
1       1.4   0.63  12      7.3   0.46  23      2.09  0.36
2       1.4   0.63  13      2.2   0.66  24      1.4   0.63
3       0.82  0.45  14      1.4   0.63  25      1.4   0.63
4       2.09  0.36  15      0.82  0.45  26      1.4   0.63
5       3.8   0.75  16      2.14  0.8   27      2.2   0.66
6       2.14  0.8   17      2.09  0.36  28      7.3   0.46
7       0.8   0.69  18      3.8   0.75  29      4.4   0.61
8       2.2   0.66  19      6     0.12  30      0.8   0.69
9       4.4   0.61  20      0.82  0.45  31      0.8   0.69
10      20.5  0.3   21      2.09  0.36  32      7.3   0.46
TABLE C.2: RENT PARAMETERS FOR EACH MODULE OF AMI49 BENCHMARK
Module  k     p     Module  k     p     Module  k     p
0       3.8   0.75  17      1.4   0.63  34      0.82  0.45
1       1.4   0.63  18      2.09  0.36  35      2.09  0.36
2       2.2   0.66  19      3.8   0.75  36      2.09  0.36
3       2.14  0.8   20      1.4   0.63  37      3.8   0.75
4       2.09  0.36  21      1.4   0.63  38      1.4   0.63
5       3.8   0.75  22      0.82  0.45  39      1.4   0.63
6       2.14  0.8   23      3.8   0.75  40      1.4   0.63
7       0.8   0.69  24      4.4   0.61  41      0.82  0.45
8       2.2   0.66  25      0.8   0.69  42      2.2   0.66
9       4.4   0.61  26      3.8   0.75  43      2.14  0.8
10      20.5  0.3   27      2.09  0.36  44      2.14  0.8
11      23.3  0.3   28      1.4   0.63  45      4.4   0.61
12      7.3   0.46  29      3.8   0.75  46      7.3   0.46
13      2.2   0.66  30      2.14  0.8   47      0.82  0.45
14      0.8   0.69  31      2.2   0.66  48      1.4   0.63
15      1.9   0.5   32      2.09  0.36
16      6     0.12  33      1.4   0.63
TABLE C.3: RENT PARAMETERS FOR EACH MODULE OF N100 BENCHMARK
n100                  n100                  n100
Module  k     p       Module  k     p       Module  k     p
0       6     0.12    34      0.82  0.45    68      3.8   0.75
1       1.4   0.63    35      1.4   0.63    69      1.4   0.63
2       1.4   0.63    36      0.8   0.69    70      3.8   0.75
3       0.82  0.45    37      4.4   0.61    71      1.4   0.63
4       0.82  0.45    38      6     0.12    72      20.5  0.3
5       2.09  0.36    39      0.8   0.69    73      0.8   0.69
6       0.8   0.69    40      1.4   0.63    74      6     0.12
7       0.8   0.69    41      0.82  0.45    75      2.09  0.36
8       0.82  0.45    42      0.82  0.45    76      0.82  0.45
9       2.14  0.8     43      1.4   0.63    77      23.3  0.3
10      2.2   0.66    44      1.4   0.63    78      2.09  0.36
11      4.4   0.61    45      0.82  0.45    79      6     0.12
12      7.3   0.46    46      2.2   0.66    80      0.8   0.69
13      2.2   0.66    47      2.2   0.66    81      23.3  0.3
14      0.8   0.69    48      1.9   0.5     82      0.82  0.45
15      6     0.12    49      4.4   0.61    83      1.4   0.63
16      1.4   0.63    50      0.8   0.69    84      3.8   0.75
17      3.8   0.75    51      2.2   0.66    85      3.8   0.75
18      3.8   0.75    52      2.2   0.66    86      1.4   0.63
19      1.4   0.63    53      3.8   0.75    87      1.9   0.5
20      1.4   0.63    54      1.4   0.63    88      1.9   0.5
21      3.8   0.75    55      1.4   0.63    89      2.2   0.66
22      4.4   0.61    56      2.2   0.66    90      0.8   0.69
23      0.8   0.69    57      3.8   0.75    91      0.8   0.69
24      2.09  0.36    58      2.2   0.66    92      3.8   0.75
25      1.4   0.63    59      0.8   0.69    93      0.8   0.69
26      3.8   0.75    60      3.8   0.75    94      2.2   0.66
27      2.2   0.66    61      0.8   0.69    95      3.8   0.75
28      1.4   0.63    62      0.8   0.69    96      2.2   0.66
29      0.82  0.45    63      2.2   0.66    97      7.3   0.46
30      2.09  0.36    64      1.9   0.5     98      2.09  0.36
31      3.8   0.75    65      1.9   0.5     99      0.82  0.45
32      3.8   0.75    66      3.8   0.75
33      1.4   0.63    67      1.4   0.63
TABLE C.4: RENT PARAMETERS FOR EACH MODULE OF N200 BENCHMARK
n200                  n200                  n200
Module  k     p       Module  k     p       Module  k     p
0       6     0.12    37      4.4   0.61    74      6     0.12
1       1.4   0.63    38      6     0.12    75      2.09  0.36
2       4.4   0.61    39      4.4   0.61    76      0.82  0.45
3       0.82  0.45    40      1.4   0.63    77      23.3  0.3
4       0.82  0.45    41      0.82  0.45    78      2.09  0.36
5       2.09  0.36    42      0.82  0.45    79      6     0.12
6       0.8   0.69    43      1.4   0.63    80      0.8   0.69
7       0.8   0.69    44      1.4   0.63    81      23.3  0.3
8       0.82  0.45    45      0.82  0.45    82      0.82  0.45
9       2.14  0.8     46      2.2   0.66    83      1.4   0.63
10      2.2   0.66    47      2.2   0.66    84      3.8   0.75
11      4.4   0.61    48      1.9   0.5     85      3.8   0.75
12      7.3   0.46    49      4.4   0.61    86      1.4   0.63
13      2.2   0.66    50      0.8   0.69    87      1.9   0.5
14      0.8   0.69    51      2.2   0.66    88      1.9   0.5
15      3.8   0.75    52      2.2   0.66    89      2.2   0.66
16      1.4   0.63    53      3.8   0.75    90      0.8   0.69
17      3.8   0.75    54      1.4   0.63    91      0.8   0.69
18      3.8   0.75    55      1.4   0.63    92      3.8   0.75
19      1.4   0.63    56      2.2   0.66    93      0.8   0.69
20      1.4   0.63    57      3.8   0.75    94      2.2   0.66
21      3.8   0.75    58      2.2   0.66    95      3.8   0.75
22      4.4   0.61    59      0.8   0.69    96      2.2   0.66
23      0.8   0.69    60      3.8   0.75    97      7.3   0.46
24      2.09  0.36    61      0.8   0.69    98      2.09  0.36
25      1.4   0.63    62      0.8   0.69    99      0.82  0.45
26      3.8   0.75    63      2.2   0.66    100     6     0.12
27      2.2   0.66    64      1.9   0.5     101     1.4   0.63
28      1.4   0.63    65      1.9   0.5     102     1.4   0.63
29      0.82  0.45    66      3.8   0.75    103     0.82  0.45
30      2.09  0.36    67      1.4   0.63    104     0.82  0.45
31      3.8   0.75    68      3.8   0.75    105     2.09  0.36
32      3.8   0.75    69      1.4   0.63    106     0.8   0.69
33      1.4   0.63    70      3.8   0.75    107     0.8   0.69
34      0.82  0.45    71      1.4   0.63    108     0.82  0.45
35      1.4   0.63    72      20.5  0.3     109     2.14  0.8
36      0.8   0.69    73      0.8   0.69    110     2.2   0.66
n200 contd. n200 contd. n200 contd. 
Module k p Module k p Module k p 
4.4 0.82 0.45 " ' 1]3,, 0.8 0.69 
7.3 1.4 6 0.12 
2.2 1.4 2.09 0.36 
0.8 0.82 0.82 0.45 
6 2.2 23.3 0.3 
1.4 2.2 2.09 0.36 
3.8 1.9 0.5 '' 179 ; 6 0.12 
3.8 4.4 0.61 ,J8Q 0.8 0.69 
1.4 0.8 23.3 0.3 
1.4 2.2 0.82 0.45 
3.8 2.2 o.66 , .rrs3 · ' 
w 4f' II; 
1.4 0.63 
4.4 0.61 )!53 ; 3.8 3.8 0.75 
0.8 1.4 0.63 ,"'" 185 '' 3.8 0.75 
2.09 1.4 1.4 0.63 
1.4 2.2 o.66 1ar - 1.9 0.5 
3.8 3.8 1.9 0.5 
#!, 121 < 2.2 2.2 o.66 a~9 : 2.2 0.66 
1.4 0.8 0.69 ~@ < 19QJ ' 0.8 0.69 
0.82 3.8 0.75 ' i 9:1 "· 0.8 0.69 
2.09 0.8 3.8 0.75 
3.8 0.8 0.8 0.69 
3.8 2.2 2.2 0.66 
1.4 L9 3.8 0.75 
0.82 1.9 2.2 0.66 
1.4 3.8 7.3 0.46 
0.8 1.4 2.09 0.36 
4.4 3.8 0.75 : , 19~H" 0.82 0.45 
6 0.12 '1~~ u 1.4 
0.8 3.8 0.75 ,:@@ -----~ ',,, 
1.4 1.4 
0.82 20.5 
TABLE C.5: RENT PARAMETERS FOR EACH MODULE OF N300 BENCHMARK 
n300 n300 n300 
Module k p Module k p Module k p
6 6 0.12 7,ti,.; 0.82 0.45 
1.4 0.8 0.69 :: ' 1), ~ 23.3 0.3 
1.4 1.4 0.63 78 ' 2.09 0.36 
0.82 0.45 ,'' 41 "": 0.82 6 0.12 
0.82 0.82 0.8 0.69 
2.09 0.36 43 1.4 0.63 r ~81., 23.3 0.3 
0.8 1.4 0.82 0.45 
0.8 o.69 : ~. ~S· : 0.82 0 45 ' 83 ' 
. ' 
1.4 0.63 
0.82 2.2 3.8 0.75 
2.14 2.2 3.8 0.75 
. fo 2.2 0.66 ~ 411 1.9 o.5 : ·st 1.4 0.63 
4.4 4.4 0.61 ~ :87 ~ 1.9 0.5 
7.3 0.8 0.69 ~8' 1.9 0.5 
2.2 2.2 0.66 ~ ' 82 . 2.2 0.66 
0.8 2.2 0.8 0.69 
6 3.8 0.75 Qf.' ' 0.8 0.69 
1.4 1.4 3.8 0.75 
3.8 1.4 0.8 0.69 
3.8 2.2 0.66 94' ' 2.2 0.66 
1.4 0.63 I~: Si '~ 3.8 3.8 0.75 
1.4 2.2 2.2 0.66 
3.8 0.8 7.3 0.46 
4.4 3.8 0.75 98: 2.09 0.36 
0.8 0.69 h,f, : ~ 0.8 0.69 ,·99' IV 0.82 0.45 
2.09 0.36 ' ~62 '· ... ; 0.8 0.69 I• • 1 <JO~ 2.2 0.66 
1.4 0.63 I• Q~' , , 2.2 o.66 1~ soi~ ·%% 1.4 0.63 
3.8 1.9 0.5 ' )~f 0.8 0.69 
2.2 1.9 2.2 0.66 
1.4 3.8 0.75 .,. 1-0j, '2 2.2 0.66 
0.82 1.4 0.63 ~ }OS · 6 0.12 
2.09 0.36 6:8 ' ' 3.8 1.4 0.63 
3.8 0.75 .~ .... ~;? .: ' 1.4 0.63 }O~~ ': 1.4 0.63 
3.8 0.75 : ;;0;: 3.8 3.8 0.75 
1.4 0.63 ,,, • ~71 ; 1.4 2.2 0.66 
0.82 20.5 2.2 0.66 
1.4 
,, 
0.63 7~ 0.8 o.69 ~tn 11 
4 ' ' ' 
0.8 0.69 
0.8 0.69 )lzf 6 4.4 0.61 
4.4 0.61 ~ 7i ~ 2.09 0 36 ' 1t3 : 
• ,.Y# 1.9 0.5 
n300 contd. n300 contd. n300 contd. 
Module k p Module k p Module k p 
2.2 
2.2 o.66 : ··-r 5rf .. ~ o.8 0.69 ' '1:91 " 2.2 0.66 
0.82 o.69 ;.>f9:a · 1.9 o.s 
1.4 0.45 ' '19~~ 4.4 0.61 
6 0.12 ' f~~,. ": 2.14 o.8 zpo .· · o.8 o.69 
1.4 o.66 ·· · zot · 2.2 o.66 
1.4 o.63 "' ~ f 6L~ ~ 4.4 
0.82 0.46 •@ ~Q~, 3.8 0.75 
0.82 0.66 . w ~04 1.4 0.63 
2.09 0.36 ' :1~4 ., : 0.8 o.69 · ·= 405: 1.4 o.63 
0.8 0.12 20'6, ,' 2.2 0.66 
0.8 
2.14 0.75 208· . 2.2 0.66 
2.2 0.75 '2\)9 l 0.8 0.69 
4.4 0.63 ?~©·. ' 3.8 0.75 
7.3 0.63 l,r 21,1 ; 0.8 0.69 
2.2 0.66 1~ ·~~i~ 1 3.8 0.75 . 2:11 2 0.8 0.69 
0.8 
6 0.12 I@, J~p; ' 0.8 0.69 ff w,414 1.9 0.5 
1.4 0.36 . 21~· 1.9 0.5 
wdi "' 
3.8 0.63 .21Q 3.8 0.75 
3.8 0. 7 5 , ) 6 ~ ~ 3.8 0.75 e '217¥' ' 1.4 0.63 
1.4 0.63 * 1"~:. ~: 2.2 0.66 "218 ' 3.8 0.75 
1.4 0.63 ~ ·,~JJ~ 1.4 0.63 Zf.9' , 1.4 0.63 
3.8 0.75 )i9~ 0.82 o.45 ~ 2io: , 3.8 0.15 
4.4 
0.8 0.75 ~2i > 20.5 0.3 
2.09 
1.4 0.63 ' ~2~4 ·, 6 0.12 
3.8 0.45 2is· · 2.09 o.36 
2.2 0.66 ~ ... 1~~1$ . 1.4 0.63 '2j6' ,. 0.82 0.45 
1.4 0.69 . '212:7¥' ' 23.3 0.3 
0.82 0.61 . · 27s·~. 2.09 o.36 
2.09 
3.8 o.75 #!' tM ~ o.8 
"' toi w ~ ~ o.69 ~ao .,, o.8 o.69 
3.8 
6 . 0.45 ·2~2 : 0.82 0.45 
1.4 0.63 . \ ~2 ~· 0.82 
1.4 0.63 J:~~ ': 1.4 
0.82 0.63 235 3.8 0.75 
0.82 0.45 . ' .1:~~ 0.82 . 0.45 . 12~6 1.4 0.63 
n300 contd.           n300 contd.           n300 contd.
Module  k     p       Module  k     p       Module  k     p
237     1.9   0.5     258     3.8   0.75    279     7.3   0.46
238     1.9   0.5     259     2.2   0.66    280     2.2   0.66
239     2.2   0.66    260     2.2   0.66    281     0.8   0.69
240     0.8   0.69    261     0.8   0.69    282     6     0.12
241     0.8   0.69    262     4.4   0.61    283     1.4   0.63
242     3.8   0.75    263     1.9   0.5     284     3.8   0.75
243     0.8   0.69    264     2.2   0.66    285     3.8   0.75
244     2.2   0.66    265     2.2   0.66    286     1.4   0.63
245     3.8   0.75    266     0.82  0.45    287     1.4   0.63
246     2.2   0.66    267     1.4   0.63    288     3.8   0.75
247     7.3   0.46    268     6     0.12    289     4.4   0.61
248     2.09  0.36    269     1.4   0.63    290     0.8   0.69
249     0.82  0.45    270     1.4   0.63    291     2.09  0.36
250     2.2   0.66    271     0.82  0.45    292     1.4   0.63
251     1.4   0.63    272     0.82  0.45    293     3.8   0.75
252     0.8   0.69    273     2.09  0.36    294     2.2   0.66
253     2.2   0.66    274     0.8   0.69    295     1.4   0.63
254     2.2   0.66    275     0.8   0.69    296     0.82  0.45
255     6     0.12    276     2.14  0.8     297     2.09  0.36
256     1.4   0.63    277     2.2   0.66    298     3.8   0.75
257     1.4   0.63    278     4.4   0.61    299     3.8   0.75
APPENDIX D: SENSITIVITY ANALYSIS OF THE COST FUNCTION OF 
FLOORPLANNING ALGORITHMS 
D.1 SENSITIVITY ANALYSIS OF THE COST FUNCTION OF 3-D FVC
ALGORITHM WITHOUT MODULE SPLITTING
In this section, we perform the sensitivity analysis of the cost function when module
splitting is disabled. In this case, the cost is a function of dead space, inter-module
wirelength, and inter-module via count, as shown by eqn. (4.2). The tuning parameters
$\alpha$, $\beta$, and $\gamma_1$ are independent of each other, and we would like to find
how sensitive the cost function is with respect to each of them.
We use the sampling-based method for the sensitivity analysis of the cost function.
Assuming a Gaussian distribution, we randomly generate 300 samples for each of the three
tuning parameters. The $\sigma$ and $\mu$ values for each tuning parameter are denoted by
using the parameter's name as a subscript for $\sigma$ and $\mu$, and their values are
chosen as follows: a) $\sigma_\alpha = 2.0$, $\mu_\alpha = 0.32$, b) $\sigma_\beta = 10.0$,
$\mu_\beta = 3.2$, and c) $\sigma_{\gamma_1} = 3000$, $\mu_{\gamma_1} = 990$. Please note
that for the sampling-based sensitivity analysis approach [108],[109], the ranges of the
parameters are assumed to be known; in our experiment, we have chosen their ranges based on
our experience with floorplan experiments performed on various benchmarks. Using these
sampled values of the tuning parameters, we run 3-D FVC without module splitting on a
floorplan benchmark (n100) that contains 100 modules. Next, we plot cost vs. each
individual tuning parameter, and thus obtain three scatter plots corresponding to $\alpha$,
$\beta$, and $\gamma_1$. Finally, we use linear regression to obtain a best-fit linear
relation for cost vs. each individual tuning parameter. The scatter plots are shown in
Figure D.1. Please note that the linear relation, represented by a solid line, and the
equation of that line are shown on each of the three scatter plots.

[Figure D.1, panels: (a) Cost vs. Alpha, best fit y = 2E+06x + 1E+07; (b) Cost vs. Beta,
best fit y = 735489x + 9E+06; (c) Cost vs. Gamma_1, best fit y = 531.4x + 1E+07]
Figure D.1: Scatter plots for sensitivity analysis of the tuning parameters of the
cost function for 3-D FVC without module splitting.
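The sampling-and-regression procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: `toy_cost` is a hypothetical stand-in for one floorplanner run (it is not eqn. (4.2)), its coefficients are made up, and redrawing non-positive Gaussian samples is an assumption about how out-of-range draws might be handled. Only the sample count (300) and the $\sigma$ and $\mu$ values come from the text.

```python
import random


def sample_positive_gaussian(rng, mean, sigma, count):
    """Draw `count` Gaussian samples, redrawing any non-positive value.

    The redraw rule is an assumption of this sketch, not taken from the text.
    """
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, sigma)
        if value > 0:
            samples.append(value)
    return samples


def fit_line(xs, ys):
    """Ordinary least-squares fit of y against x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def toy_cost(alpha, beta, gamma1):
    """Hypothetical stand-in for one floorplanner run; NOT eqn. (4.2)."""
    dead_space, wirelength, via_count = 1.0e6, 5.0e5, 4.0e2  # made-up values
    return alpha * dead_space + beta * wirelength + gamma1 * via_count


def sensitivity_slopes(count=300, seed=1):
    """Sample the three tuning parameters and fit cost against each one."""
    rng = random.Random(seed)
    samples = {
        "alpha": sample_positive_gaussian(rng, 0.32, 2.0, count),
        "beta": sample_positive_gaussian(rng, 3.2, 10.0, count),
        "gamma1": sample_positive_gaussian(rng, 990.0, 3000.0, count),
    }
    costs = [
        toy_cost(a, b, g)
        for a, b, g in zip(samples["alpha"], samples["beta"], samples["gamma1"])
    ]
    # The slope of each one-parameter fit plays the role of the sensitivity.
    return {name: fit_line(values, costs)[0] for name, values in samples.items()}


slopes = sensitivity_slopes()
```

In an actual experiment, `toy_cost` would be replaced by a call that runs the floorplanner with the sampled parameter values and returns its final cost.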
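The equations of the straight lines are given as follows:
\[ \mathrm{Cost} = 2 \times 10^{6}\,\alpha + 10^{7} \tag{D.1} \]
\[ \mathrm{Cost} = 7.35 \times 10^{5}\,\beta + 9 \times 10^{6} \tag{D.2} \]
\[ \mathrm{Cost} = 531.4\,\gamma_1 + 10^{7} \tag{D.3} \]
Performing the partial differentiation of cost with respect to each individual tuning
parameter gives:
\[ \frac{\partial(\mathrm{Cost})}{\partial \alpha} = 2 \times 10^{6} \tag{D.4} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \beta} = 7.35 \times 10^{5} \tag{D.5} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \gamma_1} = 531.4 \tag{D.6} \]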
From the slopes obtained after the partial differentiation, it can be observed that cost is
most sensitive to $\alpha$, then to $\beta$, and least sensitive to $\gamma_1$.
Since the 3-D floorplanner minimizes the cost, it would be desirable to keep the values of
$\alpha$ and $\beta$ low. From Figure D.1, the cost can be kept minimal if we keep $\alpha$
within the range of 0.5 to 2.0. Similarly, $\beta$ should be kept in the range of 1 to 10
for minimal cost. Since the cost is least sensitive to $\gamma_1$, its value can be chosen
within 1000 to 5000.
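As a small worked illustration of the ranking stated above, the slopes from eqns. (D.4) to (D.6) can be ordered by magnitude. The dictionary keys and the helper name `rank_by_sensitivity` are illustrative choices, not part of the original work.

```python
# Slopes taken from eqns. (D.4)-(D.6); the key names are illustrative.
D1_SLOPES = {"alpha": 2e6, "beta": 7.35e5, "gamma_1": 531.4}


def rank_by_sensitivity(slopes):
    """Order parameter names from largest to smallest slope magnitude."""
    return sorted(slopes, key=lambda name: abs(slopes[name]), reverse=True)


ranking = rank_by_sensitivity(D1_SLOPES)
```

The comparison uses the raw slopes, as the text does; each slope is expressed in cost units per unit of its own parameter.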
D.2 SENSITIVITY ANALYSIS OF THE COST FUNCTION OF 3-D FVC
ALGORITHM WITH MODULE SPLITTING
Once module splitting is enabled in 3-D FVC, the cost function changes to eqn. (4.3) and a
fourth tuning parameter, $\gamma_2$, is introduced. The Gaussian distribution of each tuning
parameter is described by a) $\sigma_\alpha = 2.0$, $\mu_\alpha = 0.32$, b) $\sigma_\beta =
10.0$, $\mu_\beta = 3.2$, c) $\sigma_{\gamma_1} = 3000$, $\mu_{\gamma_1} = 990$, and d)
$\sigma_{\gamma_2} = 7000$, $\mu_{\gamma_2} = 2200$. Similar to the previous sub-section, we
sample 300 points and plot cost vs. each individual tuning parameter. The scatter plots are
shown in Figure D.2 and the corresponding linear fits are given as follows:
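\[ \mathrm{Cost} = 7 \times 10^{6}\,\alpha - 4 \times 10^{7} \tag{D.7} \]
\[ \mathrm{Cost} = -5 \times 10^{6}\,\beta + 3 \times 10^{7} \tag{D.8} \]
\[ \mathrm{Cost} = 2 \times 10^{3}\,\gamma_1 - 3 \times 10^{7} \tag{D.9} \]
\[ \mathrm{Cost} = 1.4 \times 10^{3}\,\gamma_2 - 3 \times 10^{7} \tag{D.10} \]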
Performing the partial differentiation of cost with respect to each individual tuning
parameter gives:
\[ \frac{\partial(\mathrm{Cost})}{\partial \alpha} = 7 \times 10^{6} \tag{D.11} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \beta} = -5 \times 10^{6} \tag{D.11} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \gamma_1} = 2 \times 10^{3} \tag{D.12} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \gamma_2} = 1.4 \times 10^{3} \tag{D.13} \]

[Figure D.2, panels: (a) Cost vs. Alpha, best fit y = 7E+06x - 4E+07; (b) Cost vs. Beta,
best fit y = -5E+06x + 3E+07; (c) Cost vs. Gamma_1, best fit y = 2036.5x - 3E+07;
(d) Cost vs. Gamma_2, best fit y = 1425.5x - 3E+07]
Figure D.2: Scatter plots for sensitivity analysis of the tuning parameters of the
cost function for 3-D FVC with module splitting.
From the partial differentiation, we can observe that cost is most sensitive to $\alpha$ and
$\beta$. Furthermore, the cost increases with increasing value of $\alpha$, whereas the cost
decreases with increasing value of $\beta$. From the scatter plots it can be observed that
the value of $\alpha$ is most suitable in the range of 0.5 to 1.5 for minimum cost.
Similarly, the value of $\beta$ can be chosen between 10 and 20. Furthermore, the cost is not
very sensitive to $\gamma_1$ and $\gamma_2$ compared to $\alpha$ and $\beta$, which can be
observed from the near-horizontal best-fit regression lines in the scatter plots. Thus
$\gamma_1$ can be chosen in the range of 1000 to 5000, and $\gamma_2$ can be chosen in the
range of 1000 to 12000.

[Figure D.3: cost (vertical axis) vs. number of generations (horizontal axis, 1 to 5001) for
two curves, "With Module Splitting" and "Without Module Splitting"]
Figure D.3: Fitness comparison of 3-D FVC with and without module splitting.

Figure D.3 shows the cost vs. number of generations for 3-D FVC with module splitting
and 3-D FVC without module splitting. It can be observed that in both cases, the fitness
saturates quickly, within 1000 generations.
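One way to make the notion of saturation concrete is to look for the first generation after which the cost changes very little over a fixed look-ahead window. The sketch below is an assumption-laden illustration: the function name, the window of 100 generations, the relative tolerance, and the synthetic exponentially decaying curve are all hypothetical and are not taken from the experiments behind Figure D.3.

```python
def saturation_generation(costs, window=100, rel_tol=1e-3):
    """Return the first generation index after which the cost changes by at
    most `rel_tol` times the overall cost span during the next `window`
    generations, or None if no such generation exists."""
    span = max(costs) - min(costs)
    if span == 0:
        return 0  # a flat curve is saturated from the start
    for g in range(len(costs) - window):
        if abs(costs[g] - costs[g + window]) <= rel_tol * span:
            return g
    return None


# Synthetic stand-in for a cost-vs-generation curve (made-up decay rate).
synthetic_costs = [1.0e7 + 4.0e7 * 0.99 ** g for g in range(5001)]
saturated_at = saturation_generation(synthetic_costs)
```

Normalizing by the overall span rather than by the current cost keeps the criterion usable when the cost crosses zero, as the curve with module splitting does in Figure D.3.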
D.3 SENSITIVITY ANALYSIS OF THE COST FUNCTION OF 3-D FMA
Similar to the previous section, we perform the sensitivity analysis of the cost function
of 3-D FMA. Please note that when the placement constraints are disabled, the cost
function is the same as in section D.1. Therefore, in this section we perform the
sensitivity analysis when the placement constraints are activated in 3-D FMA and the cost
is defined by eqn. (5.4).
Similar to the previous two sections, we randomly generate 300 samples for each of the
four tuning parameters. The $\sigma$ and $\mu$ values for each tuning parameter are denoted
by using the parameter's name as a subscript for $\sigma$ and $\mu$, and their values are
chosen as follows: a) $\sigma_\alpha = 2.0$, $\mu_\alpha = 0.32$, b) $\sigma_\beta = 10.0$,
$\mu_\beta = 3.2$, c) $\sigma_\gamma = 3000$, $\mu_\gamma = 990$, and d) $\sigma_\chi = 3.0$,
$\mu_\chi = 0.65$. The scatter plots are shown in Figure D.4. From the linear regression of
the scatter plots, we obtain the following relations:
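\[ \mathrm{Cost} = 5 \times 10^{6}\,\alpha + 2 \times 10^{7} \tag{D.14} \]
\[ \mathrm{Cost} = 8.1 \times 10^{5}\,\beta + 2 \times 10^{7} \tag{D.15} \]
\[ \mathrm{Cost} = 4.7 \times 10^{2}\,\gamma + 3 \times 10^{7} \tag{D.16} \]
\[ \mathrm{Cost} = 4 \times 10^{6}\,\chi + 2 \times 10^{7} \tag{D.17} \]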
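Differentiating eqns. (D.14) to (D.17) gives the slope, that is, the sensitivity, of the
cost function with respect to each parameter:
\[ \frac{\partial(\mathrm{Cost})}{\partial \alpha} = 5 \times 10^{6} \tag{D.18} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \beta} = 8.1 \times 10^{5} \tag{D.19} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \gamma} = 4.74 \times 10^{2} \tag{D.20} \]
\[ \frac{\partial(\mathrm{Cost})}{\partial \chi} = 4 \times 10^{6} \tag{D.21} \]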
From the above slopes, it can be observed that cost is most sensitive to $\alpha$ and $\chi$
and least sensitive to $\gamma$. From the scatter plots of Figure D.4, $\alpha$ can be chosen
between 0.5 and 2.0, $\beta$ between 1 and 10, and $\chi$ between 1.0 and 2.0 to achieve the
optimal cost. Since the cost is least sensitive to $\gamma$ (as can be seen from the nearly
horizontal line in Figure D.4(c)), it can be chosen between 1000 and 5000.
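To see what the fitted lines imply over the recommended ranges, the relations (D.14) to (D.17) can be evaluated at the range endpoints. This is a plain arithmetic sketch; the dictionary layout and function names are illustrative, while the slopes, intercepts, and ranges are those quoted in the text.

```python
# (slope, intercept) pairs from eqns. (D.14)-(D.17).
FITS = {
    "alpha": (5e6, 2e7),
    "beta": (8.1e5, 2e7),
    "gamma": (4.7e2, 3e7),
    "chi": (4e6, 2e7),
}

# Recommended ranges quoted in the text.
RANGES = {
    "alpha": (0.5, 2.0),
    "beta": (1.0, 10.0),
    "gamma": (1000.0, 5000.0),
    "chi": (1.0, 2.0),
}


def predicted_cost(name, value):
    """Evaluate the fitted straight line for one tuning parameter."""
    slope, intercept = FITS[name]
    return slope * value + intercept


def cost_change_over_range(name):
    """Change in fitted cost between the two ends of the recommended range."""
    low, high = RANGES[name]
    return predicted_cost(name, high) - predicted_cost(name, low)


changes = {name: cost_change_over_range(name) for name in FITS}
```

For instance, the fitted line for $\alpha$ rises from $2.25 \times 10^{7}$ at $\alpha = 0.5$ to $3 \times 10^{7}$ at $\alpha = 2.0$, a change of $7.5 \times 10^{6}$, while the line for $\gamma$ changes by about $1.88 \times 10^{6}$ across 1000 to 5000.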
[Figure D.4, panels: (a) Cost vs. Alpha, best fit y = 5E+06x + 2E+07; (b) Cost vs. Beta,
best fit y = 810838x + 2E+07; (c) Cost vs. Gamma, best fit y = 474.42x + 3E+07;
(d) Cost vs. Ki, best fit y = 4E+06x + 2E+07]
Figure D.4: Scatter plots for the sensitivity analysis of tuning parameters of the
cost function for 3-D FMA algorithm.
