Optimal signal, power, clock and thermal interconnect networks for high-performance 2d and 3d integrated circuits by Sekar, Deepak Chandra
  
 
Optimal 
Signal, Power, Clock and Thermal Interconnect Networks 
for High-Performance 2D and 3D Integrated Circuits 
 
A Doctoral Dissertation  
submitted to the Academic Faculty 
 
by 
 
Deepak Chandra Sekar 
 
in Partial Fulfillment of the Requirements for the Degree of 
Doctor of Philosophy in Electrical and Computer Engineering 
 
 
 
 
 
 
School of Electrical and Computer Engineering 
Georgia Institute of Technology 
December 2008
 Optimal Signal, Power, Clock and Thermal Interconnect Networks 
for High-Performance 2D and 3D Integrated Circuits 
 
 
 
 
 
 
 
 
 
 
Approved by:

Dr. James Meindl, Advisor
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Jeffrey Davis, Co-Advisor
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Thomas Gaylord
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Saibal Mukhopadhyay
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. W. Russell Callen
School of Electrical and Computer Engineering
Georgia Institute of Technology

Dr. Paul Kohl
School of Chemical and Biomedical Engineering
Georgia Institute of Technology

Date Approved: 07/31/2008
ACKNOWLEDGEMENTS 
 
I would like to sincerely thank Prof. James Meindl for the guidance he has provided 
me during my graduate career. His ability to identify problems that create an impact, 
people skills, broad knowledge base and enthusiasm for learning new things are a few 
things I have greatly admired. It has been a tremendous experience to learn about 
integrated circuits from him. 
I am also grateful to Prof. Jeff Davis for teaching me the fundamentals of interconnect 
modeling during the first few years of my PhD program. The work in this dissertation 
benefited significantly from interactions with him. I wish to express my gratitude to Prof. 
Thomas Gaylord, Prof. Paul Kohl, Prof. Russ Callen, Prof. Saibal Mukhopadhyay and 
Prof. Linda Milor for being on my dissertation committee and for improving the quality of 
this research with their constructive comments. I would like to thank several Georgia Tech 
researchers for productive discussions and for training and input in the cleanroom. These 
include Bing Dang, Ragu Venkatesan, Keith Bowman, Azad Naeemi, Muhannad Bakir, 
Calvin King, Reza Sarvari, Gang Huang, Kaveh Shakeri, Todd Spencer, Hiren Thacker, 
Paul Joseph, Farhana Zaman and Ajay Joshi. I am also indebted to Jennifer Root for the 
help she has given me.  
Most importantly, I would like to thank my wife and parents for the support they 
provided me over the duration of this work.  
TABLE OF CONTENTS 
 
Acknowledgements
List of Tables
List of Figures
Summary

Chapter 1
Introduction
1.1. Origin and History of the Problem
1.1.1. The First Interconnect Problem
1.1.2. The Second Interconnect Problem
1.1.3. The Third Interconnect Problem
1.2. Dissertation Outline

Chapter 2
Thermal Interconnect Networks
2.1. A Model to Relate Thermal and Electrical Performance of an IC
2.1.1. The Nose-Sakurai Model
2.1.2. Summary of Proposed Model
2.1.3. Application of Model to a Pentium Microprocessor
2.2. Heat Removal for 3D Integrated Circuits
2.2.1. Concept
2.2.2. Fabrication Process
2.2.3. Theoretical Analysis
2.2.4. Characterization
2.2.5. Benefits
2.3. Summary

Chapter 3
Signal Interconnect Networks
3.1. Signal Interconnect Challenges in Scaled CMOS Technologies
3.2. A Technological Solution: Carbon Nanotube Interconnects
3.2.1. Circuit Models
3.2.2. Manufacturing Challenges
3.2.3. Power Benefits
3.2.4. Performance Benefits
3.2.5. Conclusion
3.3. A Circuit Solution: Improved Repeater Insertion Techniques
3.4. An Architectural Solution: Parallel Processing Architectures
3.4.1. Intra-Core Interconnect Networks
3.4.2. Inter-Core Interconnect Networks
3.4.3. Intra-Core and Inter-Core Communication Networks
3.5. Summary

Chapter 4
Power Interconnect Networks
4.1. Electromigration Resistant Power Delivery Systems
4.1.1. Proposed Technique
4.1.2. Measurements
4.1.3. Conclusions
4.2. MIM Power-Ground Plane Decoupling Capacitors
4.2.1. Proposed Technique
4.2.2. Benefits for Power Delivery
4.2.3. Benefits for Clock Distribution
4.2.4. Benefits for Signal Interconnects
4.2.5. Conclusions
4.3. Power Delivery for 3D Integrated Circuits
4.4. Summary

Chapter 5
Co-Design of Signal, Power, Clock and Thermal Interconnect Networks
5.1. Models
5.1.1. New Stochastic Signal Wire Length Distribution Model
5.1.2. Logic Gate Model
5.1.3. Global Interconnect Model
5.1.4. Local Interconnect Model
5.1.5. Intermediate and Semi-Global Interconnect Model
5.2. Implementation of CAD Tool
5.2.1. Algorithm
5.2.2. Graphical User Interface
5.3. Verification and Case Studies
5.3.1. Verification of CAD Tool
5.3.2. Case Study of a 22nm Air-Cooled 2D Integrated Circuit
5.3.3. Case Study of a 22nm Microchannel-cooled 3D Integrated Circuit
5.3.4. Applications
5.4. Summary

Chapter 6
Conclusions and Future Work
6.1. Contributions of this Research
6.2. Avenues for Future Research
6.2.1. Wafer-to-Wafer Bonding Approach for Microchannel Cooled 3D-ICs
6.2.2. 3D Stacked SRAM Cache Memory
6.2.3. Stacking of 3D Phase Change Memory Arrays with Microprocessors
6.2.4. Physical Limits of Copper Interconnects

Appendix A
Solution of Equations to Co-Design Thermal and Electrical Functionality

Appendix B
A Model to Minimize Energy-Delay Product of a Repeated Wire

References
List of Publications
Vita
 
LIST OF TABLES 
 
Table 1.1: Comparison of transistor delay with that of a benchmark 1mm interconnect.
Table 1.2: Microprocessor scaling trends [1.9].
Table 2.1: Description of symbols in model.
Table 2.2: Comparison of model predictions with actual data from the 130nm Pentium 4 microprocessor.
Table 2.3: Silicon thickness optimization for 100 um wide microchannels (from Fig. 2.10).
Table 2.4: Micropump technology demonstrated in the literature (from [2.18]).
Table 2.5: Benefits of microchannel-cooled 3D integrated circuits for a processor with 40 million logic gates constructed in a 65nm technology.
Table 3.1: Interconnect pitches of 65nm logic technologies.
Table 3.2: Wire pitch prediction using MINDS.
Table 3.3: Power reduction with CNT interconnects for a 22nm technology.
Table 3.4: Wire pitch prediction using MINDS.
Table 3.5: Power reduction with CNT interconnects for a 22nm technology.
Table 3.6: Wire pitch prediction using MINDS.
Table 3.7: Comparison of model with SPICE simulations for a 100nm technology. Transistor models [3.19] have Ro=14.1kohm, Co=0.7fF, Ileak=52.5nA. Interconnect dielectric constant is 3. Repeater size is defined at the beginning of Section 3.3.
Table 3.8: Parameters of a 22nm technology [3.5].
Table 3.9: Power consumption for different repeater insertion models for a 22nm 10mm wire.
Table 3.10: Benefits of energy-delay product based repeater insertion for a 22nm logic block.
Table 3.11: Optimized supply and threshold voltage.
Table 3.12: Components of power for each logic core.
Table 3.13: Pitches of metal levels for the considered logic cores.
Table 3.14: Impact of size effects in parallel processing architectures.
Table 3.15: Inter-core communication overheads.
Table 3.16: Summary of results.
Table 4.1: Electrical characteristics of clock wire.
Table 5.1: Validation of model with actual data for average wire length.
Table 5.2: Comparison of results from IntSim with actual data.
Table 5.3: Design space exploration with IntSim.
Table 5.4: Wiring predictions from IntSim.
Table 5.5: Design space exploration with IntSim.
Table 5.6: Wiring predictions from IntSim.
Table A.1: Description of symbols used in model.
LIST OF FIGURES 
 
Figure 1.1: (a) Kilby’s first IC: an oscillator. (b) Noyce’s first IC: a flip-flop.
Figure 1.2: Guidelines for scaling (a) transistors, and (b) interconnects.
Figure 1.3: A Venn diagram showing four types of interconnect networks in today’s ICs.
Figure 1.4: Commercially available 3D-DRAMs (Courtesy: Samsung).
Figure 2.1: Power density of commercial microprocessors [2.1].
Figure 2.2: (a) Original concept of a microchannel heat sink from Tuckerman and Pease [2.13]. (b) An assembly technology for microchannel heat sinks proposed by Dang, Bakir, Joseph, Kohl and Meindl [2.14].
Figure 2.3: Manual attachment of tubes to the backside of each chip on a printed wiring board.
Figure 2.4: A microchannel-cooled 3D integrated circuit.
Figure 2.5: Chip-level fabrication process for microchannel-cooled 3D integrated circuits. The extra lithography steps for fluidic network fabrication are indicated.
Figure 2.6: Cross-sectional microscope image of a sample after chip-level fabrication. Microchannels are about 200um tall and 150um wide while copper vias are about 50um in diameter. Silicon thickness is 400um. Electrical through-silicon via density is 2500/cm².
Figure 2.7: Assembly process for microchannel-cooled 3D integrated circuits.
Figure 2.8: (a) SEM image of a chip with solder bumps and polymer pipes. (b) Cross-sectional SEM image of an assembled two chip 3D stack with fluidic networks.
Figure 2.9: Equations to describe operation of a microchannel-cooled 3D-IC with square dice. Quantities whose units are not indicated in the above table are dimensionless.
Figure 2.10: Thermal resistance vs. silicon thickness.
Figure 2.11: Measurement of thermal resistance. Fluid inlet and outlet temperatures also indicated (Courtesy: Bing Dang [2.14]).
Figure 2.12: Organization of a sample 2D server [2.19] in a 3D configuration. Besides higher logic density, 3D stacking also enables higher memory density.
Figure 3.1: Equations in MINDS.
Figure 3.2: Projections from MINDS for a 22nm logic core.
Figure 3.3: Types of carbon nanotube interconnects.
Figure 3.4: Comparison of wire resistivity of a multi-walled CNT interconnect and a copper interconnect at the 22nm technology node.
Figure 3.5: Comparison of interconnect architecture of (a) carbon nanotube interconnects, and (b) copper. t is the wire thickness, w is the wire width and s is the wire spacing.
Figure 3.6: Logic core area optimization using MINDS.
Figure 3.7: Performance optimization with multi-walled CNT interconnects.
Figure 3.8: Power consumption of logic cores.
Figure 3.9: An energy-delay product minimization model for repeater insertion.
Figure 3.10: Three types of transistors in GSI chips.
Figure 3.11: Energy-delay product for a repeated wire as a function of threshold voltage.
Figure 3.12: Delay of a 22nm wide 10mm long repeated wire when repeater insertion is carried out to minimize the energy-delay product.
Figure 3.13: Fan-out of 4 inverter delay for a 22nm technology.
Figure 3.14: Percentage increase in gate sizes required to have the same FO4 delay at the specified threshold voltage as the delay at threshold voltage = 0.18V.
Figure 3.15: Single core, dual core and quad core chips used for case study. A 22nm device technology is considered.
Figure 3.16: Number of interconnect levels required for intra-core wiring.
Figure 3.17: Inter-core communication network of dual core and quad core chips.
Figure 4.1: (a) Current delivery path for a microprocessor. (b) On-chip power distribution network. Sleep transistor symbols are drawn as shown for convenience.
Figure 4.2: Temperature increases in the packaging of an Intel microprocessor [4.4].
Figure 4.3: Unidirectional current flow in a standard power delivery system.
Figure 4.4: Schematic of the newly proposed power delivery system when the microprocessor is (a) powered up the first time, and (b) powered up the second time. Current direction is indicated with arrows.
Figure 4.5: (a) Schematic of test structure. (b) Chip fabricated for testing.
Figure 4.6: (a) Comparison of measurements with Eq. (4.1). (b) SEM image of solder bump (i) before failure (ii) after failure.
Figure 4.7: (a) Proposed MIM power-ground plane capacitor structure (henceforth referred to as MIM plane decaps). (b) Test structure.
Figure 4.8: (a) I-V curve for capacitor with 400nm dielectric. (b) TEM of a 6nm Ta2O5 layer above a 17nm Ta electrode.
Figure 4.9: (a) Power delivery network used for analysis. (b) Reduction in first droop magnitude.
Figure 4.10: (a) Clock wire with grid based power distribution. Shields flank clock wire. (b) Clock wire with MIM plane decaps.
Figure 4.11: (a) Conventional grid based configuration with coplanar shields in blue. (b) Signal wires with MIM plane decaps.
Figure 5.1: An illustration of the gate socket concept.
Figure 5.2: Block definitions to find average number of wires between a gate socket pair.
Figure 5.3: Validation of average wire lengths with new model with actual data from 22 ISCAS’89 circuit blocks. Average error of: Donath distribution = 75%, Davis distribution = 38%, New distribution = 8%-24%.
Figure 5.4: (a) Comparison of new distribution with Davis distribution in the log scale. (b) Comparison of new distribution with Davis distribution in the linear scale for short lengths.
Figure 5.5: Cross-section of global wiring layers for an integrated circuit.
Figure 5.6: Top view of a global power grid between four pads.
Figure 5.7: A tapered H tree that is typically used for clock distribution purposes. The wire resistance per unit length is denoted as r and the wire capacitance per unit length is denoted as c. Due to symmetry, x1, x2, x3 and x4 are equipotential points.
Figure 5.8: The equipotential points from Fig. 5.7 are merged in this figure. Distributed RC wire models show the equivalence between the top and bottom configurations in this figure. Due to symmetry, y1 and y2 are equipotential points.
Figure 5.9: Equipotential points from Fig. 5.8 are merged in this figure.
Figure 5.10: Power predictions from IntSim.
Figure 5.11: Schematic of Microchannel-cooled 3D Integrated Circuit under consideration.
Figure 5.12: Power predictions from IntSim.
Figure 6.1: Wafer-to-wafer bonding approach for fabrication of microchannel-cooled 3D Integrated Circuits.
Figure 6.2: 3D stacking of SRAM with logic circuits to form a microprocessor.
Figure 6.3: Stacking of 3D phase change memory arrays with microprocessors.
Figure B.1: Diagrammatic representation of a repeated wire.
Figure B.2: Analytical solutions for γ (gamma) and δ (delta) for different values of Φgate (transistor dynamic power fraction).
Figure B.3: Comparison of empirical solutions with analytical solutions.
 
SUMMARY 
 
A high-performance 2D or 3D integrated circuit typically has (i) a ratio of the delay of a 
1mm wire to the delay of an nMOS transistor > 500, (ii) a target impedance of the power 
delivery network < 1mΩ, (iii) a clock frequency > 2GHz, and (iv) a thermal resistance 
requirement for the heat removal path < 0.6°C/W. These data illustrate the difficulty of obtaining high-quality 
signal, power, clock and thermal interconnect networks for gigascale 2D and 3D integrated 
circuits. Specific material, process, circuit, packaging, and architecture solutions to 
enhance these four types of interconnect networks are proposed and quantitatively 
evaluated. A microchannel-cooled 3D integrated circuit technology is developed to deal 
with thermal interconnect problems inherent to stacked dice. The benefits of carbon 
nanotube technology, improved repeater insertion techniques and parallel processing 
architectures for signal interconnect networks are evaluated. A circuit technique to 
periodically reverse current direction in power interconnect networks is proposed. It 
provides several orders of magnitude improvement in electromigration lifetimes. Methods 
to control power supply noise and reduce its impact on clock interconnect networks are 
investigated. Finally, a CAD tool to co-design signal, power, clock and thermal 
interconnect networks in high-performance 2D and 3D integrated circuits is developed.    
CHAPTER 1 
INTRODUCTION 
 
The gigascale integration (GSI) era began in 2006 with Intel Corporation releasing an 
Itanium processor with 1.72 billion transistors [1.1]. The Itanium, like the IBM Power, Sun 
SPARC, AMD Opteron and Intel Xeon, is a high-performance microprocessor family with 
yearly system sales greater than $3 billion and profit margins greater than 60% [1.2]. 
Servers and workstations utilizing these high-performance microprocessors form the 
backbone of the internet and play an important role in business and scientific computing 
applications. This research focuses on interconnect issues in such high-performance 
microprocessors. 
 
1.1.   Origin and History of the Problem    
Since the invention of the transistor in 1947, interconnects have posed the following 
three major challenges to solid-state electronics: 
1.1.1.   The First Interconnect Problem 
In the 1950s, semiconductor firms manufactured hundreds of discrete transistors on a 
wafer. Technicians attired in identical lab coats sat side-by-side, hunched over 
microscopes that magnified the wafer, so that they could slice apart individual transistors 
and attach leads and wires to them using tweezers. These transistors were then tested, 
packaged and shipped to customers. The customers would connect transistors to each other 
to configure a circuit. This approach was difficult for several reasons: 
• As the number of transistors in a circuit grew, the number of interconnects increased 
exponentially. Soldering thousands of connections was difficult, costly and unreliable.  
• Connecting discrete transistors together to form a circuit increased the size of the 
resulting circuit. 
Jack Kilby of Texas Instruments and Robert Noyce of Fairchild Semiconductor 
independently proposed a solution to this interconnect problem: the integrated circuit (IC).  
Different components of a circuit were now fabricated on the same silicon or germanium 
substrate. While Kilby had components on the same chip connected together using 
soldered gold wires [1.3], Noyce connected components together with deposited metal 
[1.3]. Metal was deposited over oxide layers obtained in accordance with Jean Hoerni’s 
planar process [1.4]. Fig. 1.1 shows the integrated circuits built by Kilby and Noyce. 
To summarize, as more and more transistors began to be used in circuits, the number 
of interconnects grew exponentially. Soldering thousands of interconnects by hand was 
costly, difficult and unreliable. This can be considered to be the first interconnect problem. 
The integrated circuit was invented to solve this issue. 
 
 
                           
 
 
Figure 1.1: (a) Kilby’s first IC: an oscillator. (b) Noyce’s first IC: a flip-flop. 
 
1.1.2.   The Second Interconnect Problem 
After the invention of the integrated circuit, more and more transistors were integrated 
on the same chip. Robert Dennard from IBM gave guidelines for scaling down transistor 
and interconnect sizes in ICs to gain performance [1.5]. Fig. 1.2 shows tables reproduced 
from Dennard’s paper. As can be observed from Fig. 1.2, transistor performance improved 
with scaling, while interconnect delay remained the same even when wire lengths were 
scaled. Furthermore, wires that stretched from one end of an IC to another did not scale in 
length due to architectural reasons and had delays that increased with scaling. Table 1.1 
shows a comparison of the delay of a minimum size nMOS transistor with that of a 1mm 
long interconnect.  
 
 
 
Figure 1.2: Guidelines for scaling (a) transistors, and (b) interconnects.     
 
Several solutions were pursued to tackle this interconnect problem. These include: 
• Addition of more layers of metal – While a 0.5µm technology had four metal layers, a 
0.18µm technology had six metal layers. Typically, two metal layers were added to a 
microprocessor every three years. 
• Repeaters – In the mid 1980s, Halil Bakoglu and James Meindl suggested the use of 
repeaters for interconnects in ICs [1.6]. Wires with repeaters were found to have an 
order of magnitude improvement in delay.  
• Copper interconnects – IBM pioneered the use of copper interconnects in ICs in the 
1990s. Copper was found to be about 50% less resistive than aluminum, and also gave 
important electromigration advantages [1.7]. 
• Optimal interconnect network design – In the 1990s, Jeffrey Davis and James Meindl 
derived compact expressions to predict the distribution of wire lengths in an IC [1.8]. 
They also developed a methodology to estimate optimal wire pitches for different 
metal layers in a multilevel interconnect network using the stochastic wiring 
distribution. The resulting optimal wiring networks were found to provide significant 
performance, die size and power benefits for IC designers. 
 
Table 1.1: Comparison of transistor delay with that of a benchmark 1mm interconnect. 
Technology    nMOS CV/I delay    RC delay of 1mm wire 
0.6um         17ps               8ps 
0.35um        9ps                25ps 
0.25um        7ps                50ps 
0.18um        5ps                90ps 
 
 
To summarize, transistor delay improved on scaling integrated circuits, but signal 
interconnect delay did not. This issue consumed considerable process and design effort in 
the 1990s and can be considered the second interconnect problem.   
1.1.3.   The Third Interconnect Problem 
Beyond the 0.18um technology generation (circa 2000), new interconnect issues 
emerged for high-performance microprocessors. Table 1.2 shows that power consumption 
increased with time and reached 82W at the 0.13um technology node. Heat removal 
became a serious issue due to power density approaching limits of air-cooling. Thus, 
thermal interconnects, i.e. interconnects that removed heat from an integrated circuit, 
became important.   
 
Table 1.2: Microprocessor scaling trends [1.9]. 
Technology    Frequency    Power    Voltage    Current    Power density 
0.6um         100MHz       10W      3.3V       3A         10W/cm2 
0.35um        200MHz       16W      3.3V       5A         15W/cm2 
0.25um        400MHz       17W      2V         9A         20W/cm2 
0.18um        1000MHz      26W      1.7V       15A        25W/cm2 
0.13um        3000MHz      82W      1.5V       55A        55W/cm2 
 
 
Table 1.2 indicates that higher power dissipation combined with lower supply voltages 
caused current requirements of microprocessors to approach 50-100A. It became a 
challenge to efficiently deliver such high currents from an off-chip DC-DC converter to 
transistors on the integrated circuit through parasitics of the packaging and on-chip 
interconnects. Therefore, power interconnects, i.e. interconnects responsible for delivering 
power to transistors on the integrated circuit, gained significance. 
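
The current column of Table 1.2 follows from dividing power by supply voltage. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative and do not come from the original work.

```python
# Rows of Table 1.2: (technology, power in W, supply voltage in V).
TABLE_1_2 = [
    ("0.6um", 10.0, 3.3),
    ("0.35um", 16.0, 3.3),
    ("0.25um", 17.0, 2.0),
    ("0.18um", 26.0, 1.7),
    ("0.13um", 82.0, 1.5),
]


def supply_current(power_w, vdd_v):
    """Average supply current I = P / Vdd, in amperes."""
    return power_w / vdd_v


currents = {tech: supply_current(p, v) for tech, p, v in TABLE_1_2}

for tech, amps in currents.items():
    print(f"{tech}: {amps:.1f} A")
```

The computed values round to the currents listed in the table, which shows that the growth in current is driven jointly by rising power and falling supply voltage.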
It can also be observed from Table 1.2 that clock frequency increased exponentially 
and reached several GHz in a 0.13um technology.  For a synchronous IC, a clock signal is 
typically generated at a central phase-locked loop and has to travel through long 
interconnect paths to get to hundreds of thousands of flip-flops at the same instant. 
Distributing such a high frequency synchronous clock became difficult due to across-chip 
supply voltage fluctuations and manufacturing variations in ever-smaller devices and 
interconnects. Thus, clock interconnects, i.e. interconnects that delivered a synchronous 
clock signal to all the flip-flops on the integrated circuit, became important too. 
 
 
Figure 1.3: A Venn diagram showing four types of interconnect networks in today’s ICs. 
 
The above issues with power, clock and thermal interconnects were accompanied by 
further degradation of signal interconnect networks as described in Table 1.1. Beyond the 
0.18um technology generation, one could thus think of high-performance microprocessors 
having four important types of interconnect networks as shown in Fig. 1.3. This is the 
world we live in today. When a new technology was developed in the 1990s, the focus was 
on it being optimized for signal interconnect networks. Today, however, any new 
technology needs to be co-optimized for signal, power, clock and thermal interconnect 
networks. This can be considered the third interconnect problem.  
While 2D integrated circuits have been the topic of discussion thus far, 3D stacked 
integrated circuits are being used today to gain higher packing density and to improve 
signal interconnect performance. Fig. 1.4 shows a set of DRAM chips stacked on top of 
each other in a 3D fashion. Interestingly, the idea of 3D stacking is almost 50 years old. 
The theoretical foundations for this concept were laid by James Early in 1960 [1.10].   
Although 3D stacking has been used for low-power applications, high-performance 
3D stacked microprocessors have been infeasible largely due to roadblocks associated with 
thermal and power interconnects. To illustrate this with an example, if two 1V 100W/cm2 
chips are stacked on top of each other, the effective power density to be cooled is 
200W/cm2 compared to 100W/cm2 when these chips are placed side-by-side. One would 
also need to deliver 200A/cm2 to the 3D-IC compared to 100A/cm2 when these chips are 
placed side-by-side. These are serious challenges. 
 
 
      
 
Figure 1.4: Commercially available 3D-DRAMs (Courtesy: Samsung). 
 
To summarize, present-day high-performance 2D and 3D integrated circuits have 
serious issues with signal, power, clock and thermal interconnects. Each of these 
interconnect networks needs to be carefully designed to obtain high quality microchips. 
This can be thought of as the third interconnect problem. The objective of this dissertation 
is to discover techniques to ameliorate the third interconnect problem. 
 
1.2.   Dissertation Outline    
This manuscript begins by investigating techniques to enhance thermal, signal, power 
and clock interconnect networks in Chapters 2, 3 and 4. Following this, a CAD tool called 
IntSim that simulates a GSI chip by co-designing these four types of interconnect networks 
is presented in Chapter 5. The dissertation concludes with a summary of the major 
contributions of this research in Chapter 6. Appendices A and B are used to streamline the 
material in this dissertation and improve its readability. 
CHAPTER 2 
THERMAL INTERCONNECT NETWORKS 
 
CMOS scaling over the past twenty years has been accompanied by a tremendous 
increase in power densities. This is depicted in Fig. 2.1. In fact, power densities of today’s 
2D microprocessors are so high that they require designers to limit performance to use 
standard air cooling technology [2.2]. Three dimensional stacking of high-performance 
microprocessors has also been difficult due to power density limitations [2.3]. Thus, 
thermal interconnect networks are crucial to high-performance 2D and 3D integrated 
circuits. 
 
 
 
 
 
 
 
 
 
 
Figure 2.1: Power density of commercial microprocessors [2.1]. 
This chapter begins with the derivation of a model that relates thermal and electrical 
performance of an integrated circuit. Following this, the theory and fabrication process for 
a 3D microprocessor with integrated microchannel cooling are developed. The work in this 
chapter has been described by the author in [2.4]. 
 
2.1. A Model to Relate Thermal and Electrical Performance of an IC 
   This section explores the relationship between electrical performance of an 
integrated circuit and thermal resistance of its heat removal solution. Some background is 
first provided in Section 2.1.1, following which a summary of the model is provided in 
Section 2.1.2. An example showing how this model can be applied is described in Section 
2.1.3.  
2.1.1. The Nose-Sakurai Model 
In 2000, Nose and Sakurai considered an inverter driving a fan-out of 4 (FO4) and 
derived expressions for optimal supply and threshold voltages [2.5]. The researchers 
essentially minimized power consumption of the FO4 inverter for a certain target clock 
frequency, logic depth, temperature and device technology. Henceforth, these optimal 
supply and threshold voltage values are denoted as Vdd(T) and Vt(T) respectively, where T 
is the absolute temperature. This work from Nose and Sakurai yielded the following 
interesting relationships between leakage power (Pleak), dynamic power (Pdyn) and total 
power (P) of any CMOS circuit.  
\[ \frac{P_{leak}}{P_{dyn}} = \frac{2 \alpha N_s}{\alpha - 1} \tag{2.1} \]
\[ \frac{P_{leak}}{P} = \frac{2 \alpha N_s}{2 \alpha N_s + \alpha - 1} \tag{2.2} \]
 Here, α is an exponent in the power-law MOSFET model [2.6] and Ns is the 
sub-threshold slope parameter in V. Substituting α = 1.3 and Ns = 0.04V (which corresponds to 
around 100mV/decade), we find that the ratio of leakage power to total power of a 
microprocessor is about 25%. Commercial microprocessors such as [2.7] have such 
leakage power to total power ratios.  
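
The 25% figure can be checked numerically. The sketch below assumes that Eq. (2.1) and Eq. (2.2) read as Pleak/Pdyn = 2αNs/(α−1) and Pleak/P = 2αNs/(2αNs+α−1), and that Ns is expressed in volts (0.04V, the value consistent with roughly 100mV/decade). Function names are illustrative.

```python
def leak_to_dyn(alpha, n_s):
    """Eq. (2.1) as read here: ratio of leakage power to dynamic power."""
    return 2.0 * alpha * n_s / (alpha - 1.0)


def leak_to_total(alpha, n_s):
    """Eq. (2.2) as read here: ratio of leakage power to total power."""
    return 2.0 * alpha * n_s / (2.0 * alpha * n_s + alpha - 1.0)


ALPHA = 1.3   # exponent of the power-law MOSFET model
N_S = 0.04    # sub-threshold slope parameter in volts (assumed unit)

ratio_dyn = leak_to_dyn(ALPHA, N_S)
ratio_total = leak_to_total(ALPHA, N_S)
print(f"Pleak/Pdyn = {ratio_dyn:.2f}, Pleak/P = {ratio_total:.2f}")
```

The two ratios are linked by Pleak/P = r/(1+r) with r = Pleak/Pdyn, so either one determines the other.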
2.1.2. Summary of Proposed Model 
The proposed model is summarized below using three simple relationships denoted as 
Eq. (2.3), Eq. (2.4) and Eq. (2.5). Meanings of the symbols below are provided in Table 
2.1. 
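\[ R_{th} = \frac{T - T_{amb}}{P} \tag{2.3} \]
\[ P = P_{dyn} + P_{leak} = \frac{\alpha - 1}{2 \alpha N_s} P_{leak} + P_{leak} = P_{leak} \left( 1 + \frac{\alpha - 1}{2 \alpha N_s} \right) \tag{2.4} \]
\[ P_{leak} = N_{gates} \left( \frac{W}{L} \right)_{av} V_{dd}(T) \, I_{leak0} \, e^{-V_t(T)/N_s} \tag{2.5} \]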
Eq. (2.3) represents the definition of thermal resistance while Eq. (2.4) follows from 
Eq. (2.1). Short-circuit power of a microprocessor is typically less than 10% of the total 
power and is neglected [2.6]. Eq. (2.5) computes total leakage power of logic cores of a 
microprocessor. It takes leakage power of a single minimum size gate and multiplies this 
quantity by the number of gates and typical gate size. 
Table 2.1: Description of symbols in model. 
Symbol                 Description 
$R_{th}$               Thermal resistance of logic cores in °C/W 
$T_{amb}$              Ambient temperature in K 
$P$                    Total power of logic cores in W 
$N_{gates}$            Number of logic gates 
$(W/L)_{av}$           Width to length ratio of transistors in a typical logic gate 
$I_{leak0}$            Leakage current co-efficient of a typical minimum size logic gate in A 
 
 
Eq. (2.3) when rearranged is 
\[ P = \frac{T - T_{amb}}{R_{th}} \tag{2.6} \]
Combining Eq. (2.4) and Eq. (2.5), we get 
\[ P = N_{gates} \left( \frac{W}{L} \right)_{av} V_{dd}(T) \, I_{leak0} \, e^{-V_t(T)/N_s} \left( 1 + \frac{\alpha - 1}{2 \alpha N_s} \right) \tag{2.7} \]
Eq. (2.6) and Eq. (2.7) represent two equations with two unknowns, total logic power 
P and temperature T. These can be solved self-consistently to obtain closed-form solutions 
for T and P. An interested reader can refer to Appendix A for details of this solution. An 
example for application of this model is provided in the following section. 
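
As a rough illustration of what solving Eq. (2.6) and Eq. (2.7) self-consistently means, the Python sketch below finds the temperature at which the heat removed equals the power dissipated. It is not the closed-form solution of Appendix A. It makes several simplifying assumptions that are not in the original work: supply and threshold voltages are held constant instead of using the optimal Vdd(T) and Vt(T), Ns is modeled as n·kT/q with a hypothetical ideality factor n, the root is found by bisection, and every numeric input is a placeholder chosen only to exercise the solver.

```python
import math

K_B_OVER_Q = 8.617e-5  # Boltzmann constant over electron charge, in V/K


def electrical_power(t_kelvin, p):
    """Right-hand side of Eq. (2.7) with constant Vdd and Vt (an assumption)."""
    n_s = p["ideality"] * K_B_OVER_Q * t_kelvin  # assumed model for Ns, in V
    p_leak = (p["n_gates"] * p["w_over_l"] * p["vdd"] * p["i_leak0"]
              * math.exp(-p["vt"] / n_s))
    return p_leak * (1.0 + (p["alpha"] - 1.0) / (2.0 * p["alpha"] * n_s))


def solve_temperature(p, t_max=400.0, tol=1e-6):
    """Bisect on Eq. (2.6) minus Eq. (2.7) between T_amb and t_max."""
    def mismatch(t):
        return (t - p["t_amb"]) / p["r_th"] - electrical_power(t, p)

    lo, hi = p["t_amb"], t_max
    if mismatch(hi) <= 0.0:
        raise ValueError("no self-consistent temperature below t_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    t = 0.5 * (lo + hi)
    return t, electrical_power(t, p)


# Hypothetical inputs, not taken from the dissertation.
params = {
    "n_gates": 12.5e6, "w_over_l": 10.0, "i_leak0": 1.0e-5,
    "vdd": 1.2, "vt": 0.2, "alpha": 1.3, "ideality": 1.5,
    "t_amb": 300.0, "r_th": 0.79,
}
temperature, power = solve_temperature(params)
print(f"T = {temperature:.1f} K, P = {power:.1f} W")
```

Bisection is used here only because it is robust when the power rises steeply with temperature; a lower thermal resistance moves the intersection point to a lower temperature and power.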
2.1.3. Application of Model to a Pentium Microprocessor 
The proposed model is used to evaluate logic power consumption, temperature, 
optimal supply voltage and optimal threshold voltage for the 130nm Intel Pentium 4 
microprocessor. This processor had 12.5 million logic gates, ran at a clock frequency of 
3GHz (16 FO4 delays) and had a die size of 76mm2 for its logic core [2.8]. Based on data in 
[2.9], each minimum size nFET in this processor had a capacitive load of around 4.2fF and 
a leakage current of 100nA/µm at a threshold voltage of 0.19V (at room temperature). 
Thermal resistance is taken to be 0.6°C-cm2/W, the 3 sigma threshold voltage variation is 
assumed to be 0.1V [2.10] and the range of operating temperatures for this chip is specified as 
300K to 400K [2.5].   
 
Table 2.2: Comparison of model predictions with actual data from the 130nm Pentium 4 
microprocessor. 
                             Model     Actual value 
Total logic power            94W       ~80W 
Temperature                  100°C     80°C-100°C 
Supply voltage               1.2V      1.4V 
Threshold voltage at 300K    0.21V     0.19V 
 
 
These values yield a total logic power of 94W, temperature of 100°C, optimal supply 
voltage of 1.2V and optimal threshold voltage of 0.21V, as shown in Table 2.2. The error in 
values of these quantities with respect to actual data from the Pentium 4 processor 
[2.8][2.9][2.10][2.11] could be due to multiple reasons: (i) Drain induced barrier lowering 
is neglected in the model. (ii) An increased supply voltage value could have been chosen 
by the Pentium 4 designers to ensure that analog circuits functioned reliably, or to avoid disruptive 
changes in design of power delivery systems. (iii) Power reduction techniques such as 
multiple threshold voltages and downsizing of gates in non-critical paths were utilized in 
the processor. Despite these potential sources of error, the model seems to be a reasonably 
accurate relationship between electrical and thermal performance of an integrated circuit. It 
can be used to find the impact of different types of heat removal solutions on an integrated 
circuit. It can also provide approximate estimates for supply voltage, threshold voltage and 
logic power of an integrated circuit prior to design. 
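
The thermal side of the Pentium 4 example can be cross-checked with Eq. (2.3). The sketch below converts the area-normalized thermal resistance quoted above into °C/W for the 76mm2 logic core and evaluates the temperature at the predicted 94W. The 27°C (300K) ambient is an assumption taken from the lower end of the stated operating range, and the function names are illustrative.

```python
def absolute_thermal_resistance(r_area_c_cm2_per_w, die_area_mm2):
    """Convert an area-normalized thermal resistance to degC/W for a given die."""
    die_area_cm2 = die_area_mm2 / 100.0
    return r_area_c_cm2_per_w / die_area_cm2


def chip_temperature_c(power_w, r_th_c_per_w, t_amb_c):
    """Eq. (2.3) rearranged: T = T_amb + R_th * P."""
    return t_amb_c + r_th_c_per_w * power_w


r_th = absolute_thermal_resistance(0.6, 76.0)   # values quoted in the text
t_chip = chip_temperature_c(94.0, r_th, 27.0)   # 27 degC ambient is an assumption
print(f"R_th = {r_th:.2f} degC/W, T = {t_chip:.0f} degC")
```

Under these assumptions the result lands close to the 100°C model temperature listed in Table 2.2.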
 
2.2. Heat Removal for 3D Integrated Circuits 
 As mentioned earlier in this chapter, heat removal is one of the most critical 
challenges with high-performance 3D integrated circuits. Today’s air cooled heat sinks 
have thermal resistance limits around 0.6°C-cm2/W. This makes their use as heat removal 
solutions for a 3D-IC consisting of two 100W/cm2 chips difficult [2.3]. Microchannel 
cooling technology is a potential solution to this problem. 
 
    
Figure 2.2: (a) Original concept of a microchannel heat sink from Tuckerman and Pease 
[2.13]. (b) An assembly technology for microchannel heat sinks proposed by Dang, Bakir, 
Joseph, Kohl and Meindl [2.14]. 
Microchannel cooling technology was invented in the early 1980s at Stanford 
University by David Tuckerman and Fabian Pease [2.13]. These inventors proposed 
etching channels on the back side of a silicon chip and passing a coolant such as water 
through these channels. This improved heat removal rates from integrated circuits 
tremendously.  Fig. 2.2(a) is a picture from Tuckerman and Pease’s original paper. Over 
the past 25 years, several enhancements to this original concept have been proposed and 
evaluated [2.13][2.14]. One such concept from Dang, Bakir, Joseph, Kohl and Meindl (Fig. 
2.2(b)) involves delivering fluid to a microchannel heat sink using fluidic I/Os and 
microchannels on a printed wiring board [2.14]. The authors contend that this would allow 
cheap, wafer-level fabrication and assembly of microchannel heat sinks to multiple chips 
on a single printed wiring board. It would also preclude manual attachment of tubes to the 
backside of each chip in a multi-chip server (Fig. 2.3).  
 
 
 
 
   
 
 
Figure 2.3: Manual attachment of tubes to the backside of each chip on a printed wiring 
board. 
 
A point to note is that all the published experimental work on microchannel cooling 
thus far has dealt with 2D integrated circuits. A microchannel-cooled high-performance 3D 
integrated circuit technology is described in the rest of this chapter.   
2.2.1. Concept 
 
 
 
 
 
 
 
 
 
 
Figure 2.4: A microchannel-cooled 3D integrated circuit. 
 
Potential methods to implement a microchannel-cooled 3D integrated circuit 
technology are illustrated in Fig. 2.4. Cooling fluid can be delivered to the 3D stack either 
using tubes on the back side of the 3D stack or using fluidic channels on the substrate. This 
fluid is delivered to microchannel heat sinks on the back side of each chip in the 3D stack 
using fluidic through-silicon vias (TSVs) and fluidic pipes. Electrical through-silicon vias 
are present to deliver power to different chips in the 3D stack and to communicate between 
different chips. A polymer such as Avatrel is used to cover the microchannels [2.14].   
2.2.2. Fabrication Process 
The fabrication process for such microchannel-cooled 3D-ICs involves two major 
parts: 
(i) Chip-Level Fabrication Technology 
 
 
 
 
 
 
 
 
 
 
 
Figure 2.5: Chip-level fabrication process for microchannel-cooled 3D integrated circuits. 
The extra lithography steps for fluidic network fabrication are indicated. 
 
The chip-level fabrication process is depicted in Fig. 2.5. It begins with fabricating 
electrical TSVs on an integrated circuit that has undergone front end of line (FEOL) and 
back end of line (BEOL) processing. Several processes are available for fabricating 
electrical TSVs such as [2.15] and [2.16]. In this work, [2.15] is used due to its simplicity 
and compatibility with equipment available in Georgia Tech’s cleanroom. Following 
this, the Bosch process is utilized for etching fluidic through-silicon vias and 
microchannels as shown in step (3) of Fig. 2.5. A sacrificial polymer material such as Unity 
is then spin-coated on the microchannels and polished in step (4). An interested reader can 
refer to [2.14] for more details of steps (3) and (4). A polymer (Avatrel) is spun-on, 
patterned and cured to form a cover for the microchannels and fluidic TSVs in step (5). 
Unity is then decomposed by heating to 260°C. Fig. 2.6 shows a cross-sectional 
microscope image of a sample after this chip-level fabrication process is 
complete. The entire process outlined above proceeds at no more than 260°C, allowing it to be 
CMOS compatible. Furthermore, the fluidic network processing in Fig. 2.5 occurs at the 
wafer-level and requires just three micron scale lithography steps. This makes the process 
economically feasible.  
 
 
 
 
 
Figure 2.6: Cross-sectional microscope image of a sample after chip-level fabrication. 
Microchannels are about 200um tall and 150um wide while copper vias are about 50um in 
diameter. Silicon thickness is 400um. Electrical through-silicon via density is 2500/cm2. 
 
(ii) Assembly Technology 
This part of the project (Section 2.2.2(ii)) was completed by Calvin King. The 
assembly technology for microchannel-cooled 3D integrated circuits is outlined in Fig. 2.7. 
After solder bumping, fluidic pipes are fabricated with a polymer such as Avatrel [2.14] for 
the top chip in a two chip 3D stack. The bottom chip in the two chip 3D stack is first 
assembled onto the substrate with a flip-chip bonder. Following this, the top chip in the 3D 
stack is assembled onto the bottom chip as shown in step (3) of Fig. 2.7. Underfill is 
dispensed to seal fluidic pipes and control co-efficient of thermal expansion mismatches 
between the chip and the substrate. Fig. 2.8(a) shows SEM images of a chip with solder 
bumps and polymer pipes. Fig. 2.8(b), on the other hand, is a cross-sectional SEM image of 
two silicon chips with solder bumps, fluidic pipes, fluidic through-silicon vias, Avatrel 
covers and electrical pads. These chips are assembled on top of each other and to a silicon 
substrate. Fig. 2.6 and Fig. 2.8 thus demonstrate the chip-level fabrication technology and 
assembly technology required for a microchannel-cooled 3D integrated circuit.   
 
              
   
 
 
 
 
 
 
 
 
 Figure 2.7: Assembly process for microchannel-cooled 3D integrated circuits.  
     
 
Figure 2.8: (a) SEM image of a chip with solder bumps and polymer pipes. (b) 
Cross-sectional SEM image of an assembled two chip 3D stack with fluidic networks. 
 
2.2.3. Theoretical Analysis 
The thermal resistance of a microchannel heat sink can be written as a sum of three 
components: Rcond, which is due to conduction of heat from the circuits to the microchannel 
heat sink, Rconv, which is due to convective heat transfer between the heat sink and the 
coolant fluid and Rheat, which is due to heating of the fluid as it passes through the heat 
exchanger. Rcond is typically less than 0.01°C/W due to the excellent thermal conductivity 
of silicon and the short distance between heat generating circuits and microchannels [2.14]. 
Some commonly used equations for Rconv and Rheat are given in Fig. 2.9 [2.14]. Equations 
for pressure drop of the cooling fluid in microchannels (∆Pchannels) and fluidic 
through-silicon vias or pipes (∆Pvia/pipe) are also provided in Fig. 2.9. These equations have 
been verified with experimental data multiple times since the original work from 
Tuckerman and Pease [2.13][2.14].  
 
Model: 

\[ R_{conv} = \frac{2 W_c H_c}{k_f \, Nu_\infty \, n_c L_c (H_c + W_c)(2 H_c + W_c)} \]
where 
\[ Nu_\infty = 8.2 \left( 1 - 1.9\alpha + 3.8\alpha^2 - 5.8\alpha^3 + 5.4\alpha^4 - 2\alpha^5 \right) \]

\[ R_{heat} = \frac{f Re \, \mu L_c \left( 1 + \frac{W_c}{H_c} \right)^2}{2 \rho C_p n_c H_c W_c^3 \Delta P} \]
where 
\[ f Re = 24 \left( 1 - 1.4\alpha + 1.9\alpha^2 - 1.7\alpha^3 + \alpha^4 - 0.3\alpha^5 \right) \]

\[ \Delta P_{channels} = \frac{f Re \, \mu L_c \dot{V} \left( 1 + \frac{W_c}{H_c} \right)^2}{2 n_c H_c W_c^3} \]

\[ \Delta P_{via/pipe} = \frac{512 \, \mu H_{via/pipe} \dot{V}}{\pi \, n_{via/pipe} D_{via/pipe}^4} \]

Description of symbols: 
kf = Thermal conductivity of water in W/m-K 
nc = Number of channels 
Nu∞ = Nusselt number 
α = Wc/Hc 
f = Friction factor 
Re = Reynolds number 
µ = Viscosity of water in Pa.s 
ρ = Density of water in kg/m3 
Cp = Specific heat of water in J/kg-K 
∆P = Pressure drop in microchannels in Pa 
Hvia/pipe = Height of via/pipe in m 
Dvia/pipe = Diameter of via/pipe in m 
nvia/pipe = Number of vias/pipes 
$\dot{V}$ = Overall flow rate of coolant in m3/s 
 
Figure 2.9: Equations to describe operation of a microchannel-cooled 3D-IC with square 
dice. Quantities whose units are not indicated in the above table are dimensionless. 
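
The relationships in Fig. 2.9 can be exercised with a short Python sketch. It relies on a particular reading of the figure that should be treated as an assumption: the convective term uses the factor (Hc+Wc)(2Hc+Wc), and Rheat uses Wc cubed so that it equals 1/(ρ·Cp·flow rate) when combined with the channel pressure-drop equation. The water properties are generic room-temperature values and the geometry is illustrative; none of these numbers are taken from the dissertation's experiments.

```python
def nusselt(aspect):
    """Nusselt number correlation from Fig. 2.9, with aspect = Wc / Hc."""
    a = aspect
    return 8.2 * (1 - 1.9*a + 3.8*a**2 - 5.8*a**3 + 5.4*a**4 - 2.0*a**5)


def f_re(aspect):
    """Friction factor times Reynolds number from Fig. 2.9."""
    a = aspect
    return 24.0 * (1 - 1.4*a + 1.9*a**2 - 1.7*a**3 + a**4 - 0.3*a**5)


def r_conv(k_f, n_c, l_c, w_c, h_c):
    """Convective thermal resistance in degC/W."""
    nu = nusselt(w_c / h_c)
    return 2*w_c*h_c / (k_f * nu * n_c * l_c * (h_c + w_c) * (2*h_c + w_c))


def flow_rate(dp, mu, n_c, l_c, w_c, h_c):
    """Total flow rate in m^3/s, from inverting the channel pressure-drop equation."""
    return 2*n_c*h_c*w_c**3*dp / (f_re(w_c/h_c) * mu * l_c * (1 + w_c/h_c)**2)


def r_heat(dp, mu, rho, c_p, n_c, l_c, w_c, h_c):
    """Thermal resistance due to heating of the coolant, in degC/W."""
    return (f_re(w_c/h_c) * mu * l_c * (1 + w_c/h_c)**2
            / (2*rho*c_p*n_c*h_c*w_c**3*dp))


# Assumed room-temperature water properties (SI units) and an illustrative geometry.
K_F, MU, RHO, C_P = 0.6, 1.0e-3, 1000.0, 4180.0
geom = dict(n_c=50, l_c=10e-3, w_c=100e-6, h_c=200e-6)
dp = 30 * 6894.76  # 30 psi converted to Pa

v_dot = flow_rate(dp, MU, **geom)
rc = r_conv(K_F, **geom)
rh = r_heat(dp, MU, RHO, C_P, **geom)
print(f"flow = {v_dot*6e7:.0f} ml/min, Rconv = {rc:.3f}, Rheat = {rh:.4f} degC/W")
```

Writing the flow rate as a separate function makes the trade-off explicit: a taller or wider channel lowers Rheat through a higher flow rate at fixed pressure drop, while Rconv depends on the wetted perimeter and the Nusselt number.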
For a microchannel-cooled 3D integrated circuit with an external fluidic tube, the ratio 
of the pressure drop in microchannels (∆Pchannels) to the pressure drop in fluidic vias and 
pipes (∆Pvia-pipe-total) can be written as follows: 
\[ \frac{\Delta P_{channels}}{\Delta P_{via-pipe-total}} = \frac{\pi \, f Re \, L_c \, n_{via/pipe} D_{via/pipe}^4 \left( 1 + \frac{W_c}{H_c} \right)^2}{1024 \, n_c H_{via-pipe-total} H_c W_c^3} \tag{2.8} \]
       
Here, Hvia-pipe-total represents the total length of vias/pipes in the path of the cooling 
fluid. Note that separate fluidic pathways exist to the top and bottom chip of the two-chip 
3D stack. For typical values for these parameters, i.e. Lc=10mm, Wc=100um, Hc=200um, 
Dvia/pipe=250um, Hvia-pipe-total=0.9mm (for 400um silicon chips), nc=2nvia/pipe = 50, we get 
\[ \frac{\Delta P_{channels}}{\Delta P_{via-pipe-total}} \sim 12 \quad \text{and} \quad \frac{\text{Silicon area for fluidic through vias}}{\text{Chip area}} = 2.5\% \tag{2.9} \]
The above numbers reveal that fluidic vias and fluidic pipes consume minimal surface 
area for a two-chip microchannel-cooled 3D integrated circuit, and at the same time have 
negligible pressure drop through them.  This is largely because the total length of fluidic 
vias and pipes is only 0.9mm while the length of microchannels is as high as 10mm. 
Further reduction in pressure drop of fluidic vias and pipes is possible by increasing silicon 
area allocated to these structures. The result in Eq. (2.9) is interesting, since this indicates 
the fluidic network that provides liquid coolant to the microchannel heat sink does not 
impose any significant overhead for a two-chip stack. Essentially, integrated circuits in a 
multi-chip server would have the same thermal resistance irrespective of whether they are 
placed side-by-side or form part of two-chip 3D stacks! Note that experiments by Dang, et 
al. in [2.14] revealed that pressure drop through fluidic pipes and through-silicon fluidic 
vias were negligible for a microchannel-cooled 2D integrated circuit as well.  
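
The pressure-drop ratio of Eq. (2.8) can be evaluated for the typical parameters quoted above. The sketch below assumes Eq. (2.8) reads as the microchannel pressure drop divided by the via and pipe pressure drop, with the f·Re correlation taken from Fig. 2.9 using rounded coefficients. Function and argument names are illustrative.

```python
import math


def f_re(aspect):
    """Friction factor times Reynolds number (Fig. 2.9), aspect = Wc / Hc."""
    a = aspect
    return 24.0 * (1 - 1.4*a + 1.9*a**2 - 1.7*a**3 + a**4 - 0.3*a**5)


def channel_to_via_pressure_ratio(l_c, w_c, h_c, n_c, d_via, h_via_total, n_via):
    """Eq. (2.8) as read here: microchannel pressure drop over via/pipe pressure drop."""
    num = math.pi * f_re(w_c / h_c) * l_c * n_via * d_via**4 * (1 + w_c / h_c)**2
    den = 1024.0 * n_c * h_via_total * h_c * w_c**3
    return num / den


# Typical values quoted in the text, in SI units.
ratio = channel_to_via_pressure_ratio(
    l_c=10e-3, w_c=100e-6, h_c=200e-6, n_c=50,
    d_via=250e-6, h_via_total=0.9e-3, n_via=25,
)
print(f"dP_channels / dP_via-pipe-total = {ratio:.1f}")
```

With these inputs the ratio comes out at roughly an order of magnitude, in line with the value of about 12 in Eq. (2.9). The fourth-power dependence on via diameter means a modest increase in diameter reduces the via and pipe pressure drop sharply.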
Another interesting trade-off exists and can be studied with the models in Fig. 2.9. It is 
well-known that high aspect ratio microchannels are required for heat removal [2.13]. This, 
in turn, requires a high silicon thickness. However, a high silicon thickness increases the 
diameter of electrical through-silicon vias. Table 2.3 helps explain this phenomenon. 
Aspect ratios of through-silicon vias possible today are in the range of 6:1 to 11:1 
[2.15][2.16]. Even for a 16:1 aspect ratio electrical through-silicon via technology, Table 
2.3 reveals that the diameter of through-silicon vias cannot be reduced below 6um for a 
thermal resistance limit of 0.6°C/W.   
 
 
 
 
 
 
 
 
 
 
  
Figure 2.10: Thermal resistance vs. silicon thickness. 
On-chip global interconnect widths of today’s microprocessors are around 0.5um 
[2.17]. To move a significant number of global interconnects into the third dimension, it 
seems reasonable to expect that through-silicon via diameter needs to be around the same 
range as global interconnect width. However, due to the silicon thickness requirement 
imposed by microchannels, the electrical through-silicon via diameter in Table 2.3 is an 
order of magnitude higher than the global interconnect width.  This could limit the benefits 
of microchannel-cooled 3D integrated circuits as far as global interconnects are concerned. 
Note that this analysis considers aggressive pump technology (Table 2.4), and therefore, 
for standard pump technology, it might be difficult to even reach the numbers given in 
Table 2.3. Table 2.4 summarizes the varieties of micropumps that have been 
demonstrated in the literature [2.18]. It reveals the aggressiveness of the pump technology 
assumed for Table 2.3 based on pressure drop and flow rate values.   
 
Table 2.3: Silicon thickness optimization for 100um wide microchannels (from Fig. 2.10).  
                                                                            Via diameter (um) 
Silicon      Thermal resistance    Pressure drop    Flow rate for each chip    Aspect       Aspect        Aspect 
thickness    (°C/W)                (psi)            in 3D stack (ml/min)       ratio 6:1    ratio 11:1    ratio 16:1 
75um         2.05                  30               7.6                        13           7             4.7 
85um         0.95                  30               19                         14           8             5.3 
100um        0.55                  30               49                         17           9             6.3 
200um        0.15                  30               506                        33           18            12 
300um        0.09                  30               1075                       50           27            19 
400um        0.07                  30               1650                       67           36            25 
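
The via diameter columns of Table 2.3 appear to follow from dividing the silicon thickness by the via aspect ratio, since a through-silicon via must span the full thickness. The short Python sketch below makes that relationship explicit; treating via depth as equal to silicon thickness is an assumption inferred from the table values, and the function name is illustrative.

```python
def via_diameter_um(silicon_thickness_um, aspect_ratio):
    """Smallest via diameter when via depth equals the silicon thickness."""
    return silicon_thickness_um / aspect_ratio


for thickness in (75, 85, 100, 200, 300, 400):
    row = [via_diameter_um(thickness, ar) for ar in (6, 11, 16)]
    print(thickness, [f"{d:.1f}" for d in row])
```

This shows why the thermal requirement on silicon thickness sets a floor on via diameter: at a fixed aspect ratio, the only way to shrink the via is to thin the silicon, which raises thermal resistance.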
 
Table 2.4: Micropump technology demonstrated in the literature (from [2.18]). 
Micropump description                                               Volume of pump          Pressure drop (psi)    Flow rate (ml/min) 
Rotary micropump with a magnetic micromotor                         6mm3 (without motor)    2                      0.350 (water) 
Vibrating diaphragm micropump with piezoelectric actuation          42mm3                   2.4                    1.5 (water) 
Valveless nozzle-diffuser micropump with piezoelectric actuation    122mm3                  0.14                   1.5 (ethanol) 
Electroosmotic micropump                                            1413mm3                 23                     7 
Injection EHD micropump                                             7mm3                    0.35                   14 (ethanol) 
Mini centrifugal magnetic drive pump                                900cm3                  30                     1650 
(from Cole Parmer Product Manual) 
 
 
2.2.4. Characterization 
The thermal resistance for each chip in a two chip 3D integrated circuit can be 
obtained by measuring properties of the single microchannel heat sink shown in Fig. 2.11. 
This is because fluidic TSVs and fluidic pipes delivering fluid to microchannels have a 
negligible pressure drop overhead. Platinum thin film resistors were fabricated on a chip 
with a microchannel heat sink to facilitate this measurement. Current was passed through 
these platinum resistors to produce heat, while temperature changes were measured by 
monitoring changes in platinum resistance. The chip was packaged and copper pads on the 
silicon substrate were used to deliver current to the platinum resistors and to monitor their 
resistances. Deionized water was circulated through the microchannels at a flow rate of 
65ml/min using a pump [2.14]. Fig. 2.11 shows details of these measurements. Pump 
power was measured as 0.3W while pressure drop was measured as 3.4psi [2.14]. The 
junction-to-ambient thermal resistance was measured as 0.24°C/W based on the data in 
Fig. 2.11. Each chip in the two-chip 3D integrated circuit would thus have a 
junction-to-ambient thermal resistance of 0.24°C/W. 
 
 
 
 
 
 
 
 
 
Figure 2.11: Measurement of thermal resistance. Fluid inlet and outlet temperatures also 
indicated (Courtesy: Bing Dang [2.14]). 
 
2.2.5. Benefits  
Microchannel-cooled 3D integrated circuits provide tremendous benefits to 
high-performance servers. Fig. 2.12 shows the organization of chips in a server when it is 
composed exclusively of 2D integrated circuits. It also depicts the organization of chips 
when 3D die stacking is enabled due to microchannel cooling techniques. The distances 
between three representative chips, denoted as chip 1, chip 2 and chip 3 are indicated. It can 
be observed that microchannel-cooled 3D integrated circuits enable up to an order of 
magnitude reduction in chip-to-chip interconnect lengths. This leads to significant 
improvements in chip-to-chip interconnect latency, bandwidth and power dissipation. 
 
 
 
 
 
 
 
 
 
Figure 2.12: Organization of a sample 2D server [2.19] in a 3D configuration. Besides 
higher logic density, 3D stacking also enables higher memory density. 
 
Another benefit of microchannel-cooled 3D integrated circuits is the reduced thermal 
resistance when compared to today’s air cooled integrated circuits. This can be studied 
using the electrothermal model summarized in Section 2.1. When a 3GHz logic core in a 
2D air cooled server has 0.6°C/W thermal resistance, the model reveals that its chip 
temperature is 88°C and power is 102W. This data is obtained for the 65nm technology 
described in [2.17]. Table 2.5 shows results obtained from the equations in Section 2.1 
when this logic core forms part of a 0.24°C/W microchannel-cooled 3D-IC. Three cases are 
considered where the microchannel-cooled logic core has (i) the same frequency, (ii) the 
same power, and (iii) the same temperature as the air cooled 3GHz logic core. For the 
same frequency case, an 18% reduction in power results along with a 41°C reduction in 
temperature. For the same power case, a 10% increase in frequency is obtained along with 
a 36°C reduction in temperature. This reduction in chip temperatures is beneficial for 
server reliability. If chip temperature is fixed at 88°C, a 3GHz air cooled logic core can be 
clocked at 4.5GHz with a microchannel heat sink. This 50% increase in frequency over the 
3GHz logic core is substantial. However, the equations in Section 2.1 reveal that the 
4.5GHz logic core would have a current of 203A compared to 105A for the 3GHz logic 
core, indicating improved power delivery schemes would be needed. These improved 
power delivery schemes are described in Chapter 4.  
 
Table 2.5: Benefits of microchannel-cooled 3D integrated circuits for a processor with 40 
million logic gates constructed in a 65nm technology. 
                                   Frequency    Power    Temperature    Optimal supply    Optimal threshold 
                                                                        voltage           voltage 
Air cooled processor with          3GHz         102W     88°C           0.97V             0.29V 
thermal resistance 0.6°C/W 
Microchannel-cooled processor      3GHz         83W      47°C           0.87V             0.29V 
with thermal resistance            3.3GHz       102W     52°C           0.92V             0.29V 
0.24°C/W                           4.5GHz       254W     88°C           1.25V             0.27V 
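
The entries of Table 2.5 can be cross-checked with two simple relations: supply current is power divided by supply voltage, and Eq. (2.3) implies an ambient temperature of T − Rth·P. The Python sketch below applies both to each row. The row labels are illustrative, and the check assumes the thermal resistances are used in °C/W as printed.

```python
# Rows of Table 2.5: (label, R_th in degC/W, power in W, Vdd in V, temperature in degC).
CASES = [
    ("air cooled, 3GHz", 0.60, 102.0, 0.97, 88.0),
    ("microchannel, 3GHz", 0.24, 83.0, 0.87, 47.0),
    ("microchannel, 3.3GHz", 0.24, 102.0, 0.92, 52.0),
    ("microchannel, 4.5GHz", 0.24, 254.0, 1.25, 88.0),
]

results = {}
for label, r_th, power, vdd, temp in CASES:
    current = power / vdd                  # supply current in A
    implied_ambient = temp - r_th * power  # ambient implied by Eq. (2.3), in degC
    results[label] = (current, implied_ambient)
    print(f"{label}: I = {current:.0f} A, implied ambient = {implied_ambient:.1f} degC")
```

All four rows imply nearly the same ambient temperature, which suggests the table is internally consistent, and the currents match the 105A and 203A values quoted in the text for the 3GHz air cooled and 4.5GHz microchannel-cooled cores.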
 
 
2.3. Summary 
A microchannel-cooled 3D integrated circuit technology is developed in this chapter 
to address heat removal challenges inherent to stacked high-performance microprocessors. 
The presented microfluidic cooling networks are CMOS compatible, involve four 
minimally demanding lithography steps and are fabricated at the wafer-level. Electrical 
through-silicon via density for fabricated samples is 2500/cm2. Measurements reveal each 
chip in a two-chip microchannel-cooled 3D stack would have a junction-to-ambient 
thermal resistance of 0.24°C/W. A tremendous reduction in chip-to-chip interconnect 
lengths is obtained due to this technology. However, on-chip global interconnect length 
reduction with microchannel-cooled high-performance 3D integrated circuits appears 
difficult, due to silicon thickness limitations imposed by the heat removal solution. A 
newly derived electrothermal model that co-designs electrical and thermal functionality of 
an IC reveals that a 3GHz logic core of an air-cooled microprocessor could run at 4.5GHz 
when it forms part of a microchannel-cooled 3D integrated circuit due to improved thermal 
resistance values. The main challenges associated with microchannel-cooled 3D integrated 
circuits are preventing fluid leakage and developing pumps that work reliably over the lifetime of a server.  
CHAPTER 3 
SIGNAL INTERCONNECT NETWORKS 
 
Signal interconnect networks consume the bulk of the wiring area of today’s 
microprocessors. Table 3.1 shows wire pitches of high-performance and low-power 65nm 
technologies [3.1] from Intel Corporation. Pitches of interconnect levels are normally 
selected by semiconductor manufacturers using a stochastic wire length distribution. This 
looks at previous generations of a certain microprocessor and predicts wire lengths of a 
microprocessor that needs to be designed with the current logic technology [3.2][3.3]. 
Once the wire length distribution is known, algorithms are used to find pitches of different 
interconnect levels based on certain performance criteria and cost limitations [3.2][3.3]. 
 
 Table 3.1: Interconnect pitches of 65nm logic technologies. 
 High-performance 65nm technology Low-power 65nm technology 
M1 210nm 210nm 
M2 210nm 210nm 
M3 220nm 220nm 
M4 280nm 280nm 
M5 330nm 275nm 
M6 480nm 280nm 
M7 720nm 420nm 
M8 1080nm 1080nm 
 
 
The methodology summarized in the previous paragraph forms the basis of a CAD 
tool called MINDS that was developed by Venkatesan, Davis and Meindl in 2003 to study 
signal interconnect networks [3.2]. MINDS is used in this chapter to evaluate different 
opportunities and ideas for enhancing signal interconnect networks. 
                                                          
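$$2\,e_w A_m = \chi\, P \sqrt{\frac{A_m}{N_{gates}}} \int_{l_{min}}^{l_{max}} l\, i(l)\, dl$$

$$\tau_{rc} = \frac{2.2\,\rho\, c\, l^2}{P^2} = \frac{\beta}{f} \qquad \ldots(3.1)$$

$$\tau_{rc} = \frac{2.75\, l \sqrt{2\rho\, c\, R_o C_o}}{P} = \frac{\beta}{f} \qquad \ldots(3.2)$$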
         
  
Figure 3.1: Equations in MINDS. 
 
The algorithm in MINDS can be understood better with the equations in Fig. 3.1. In 
these equations, A_m is the core area, χ is a factor to convert point-to-point wire length 
to net length, e_w is a wire efficiency factor, P is the wire pitch, l_min and l_max are the minimum 
and maximum wire lengths on the pair of metal levels respectively, N_gates is the number of 
logic gates, c is the wire capacitance per unit length, β is a factor which is 0.25 for local 
wires and 0.9 for other wires, R_o and C_o are the resistance and capacitance of a minimum 
size repeater respectively, ρ is the wire resistivity and f is the clock frequency. Also, i(l)dl is the number of 
point-to-point interconnects whose length lies between l and l+dl. Estimates for this 
quantity are obtained from a stochastic wire length distribution model derived by Davis 
and Meindl in [3.4]. The right hand side of the wiring-area equation in Fig. 3.1 thus indicates the area needed for 
routing wires of length l_min to l_max in a pair of wiring levels, while its left hand side 
gives the area available for routing wires in that pair of wiring levels. It is assumed 
that w = s = P/2 and t = h = P, where w, t, s, h and P are as indicated in Fig. 3.1. The RC delays 
of interconnects without and with repeater insertion are given in Eq. (3.1) and Eq. (3.2) respectively. 
Repeater insertion is carried out using a sub-optimal Bakoglu methodology [3.2].         
Starting from the first pair of wiring levels, Eq. (3.1) and Eq. (3.2) are solved 
simultaneously. This helps determine the minimum wire pitch that allows the delay of the 
longest wire in a pair of wiring levels to be a certain fraction of the clock period. If the 
pitch obtained is smaller than the minimum pitch projected by the International 
Technology Roadmap for Semiconductors (ITRS) [3.5] for a certain logic technology, the 
ITRS minimum pitch is used. Using Eq. (3.1), the length of the longest wire is then 
calculated. The maximum length for wire pair 1 becomes the minimum length for wire pair 
2. These steps are repeated for all wire pairs until the longest interconnect in the wiring 
distribution is routed. Rent’s constants [3.4] are assumed to be k=4 and p=0.6 for all the 
analysis in this chapter. Logic gates are modeled as two-input NAND gates and are sized 
based on average wire length estimates. Simulations using MINDS have been shown to 
match data from commercial microprocessors in [3.2]. Leakage power models from [3.6] 
are used. The impact of size effects such as surface scattering and grain boundary 
scattering on copper resistivity is modeled as shown in [3.7]. The specularity parameter 
and the reflectivity parameter for these two types of scattering are both chosen to 
be 0.5 [3.7]. 
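
The delay constraint applied to each pair of wiring levels can be sketched numerically. The sketch below assumes that the two delay expressions of Fig. 3.1 take the form τ = 2.2ρcl²/P² (no repeaters) and τ = 2.75 l sqrt(2ρcR_oC_o)/P (with repeaters), each set equal to β/f, and solves them for the longest wire that a given pitch can support. All numerical values are hypothetical placeholders rather than MINDS inputs, and the function names are illustrative.

```python
import math


def max_length_unrepeated(pitch, rho, c, beta, f):
    """Longest wire satisfying 2.2*rho*c*l^2/P^2 = beta/f (assumed form)."""
    return pitch * math.sqrt(beta / (2.2 * rho * c * f))


def max_length_repeated(pitch, rho, c, r_o, c_o, beta, f):
    """Longest wire satisfying 2.75*l*sqrt(2*rho*c*R_o*C_o)/P = beta/f (assumed form)."""
    return beta * pitch / (2.75 * f * math.sqrt(2.0 * rho * c * r_o * c_o))


if __name__ == "__main__":
    # HYPOTHETICAL values in SI units, chosen only to exercise the formulas.
    rho = 3.0e-8      # wire resistivity [ohm*m]
    c = 2.0e-10       # wire capacitance per unit length [F/m]
    r_o = 1.0e4       # minimum-size repeater resistance [ohm]
    c_o = 1.0e-16     # minimum-size repeater capacitance [F]
    f = 4.0e9         # clock frequency [Hz]
    for pitch_nm in (44, 88, 176):
        pitch = pitch_nm * 1e-9
        l_local = max_length_unrepeated(pitch, rho, c, beta=0.25, f=f)
        l_global = max_length_repeated(pitch, rho, c, r_o, c_o, beta=0.9, f=f)
        print(f"P = {pitch_nm:3d} nm: unrepeated {l_local * 1e6:7.1f} um, "
              f"repeated {l_global * 1e6:7.1f} um")
```

In both assumed forms the supported wire length scales linearly with pitch, which is why longer wires are assigned to metal levels with larger pitch.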
This chapter begins by using MINDS to study signal interconnect challenges in 
scaled CMOS technologies. Solutions to these challenges are then suggested and their 
impact is quantified. The work in this chapter has been described by the author in [3.8], 
[3.9] and [3.10].     
 
3.1. Signal Interconnect Challenges in Scaled CMOS Technologies    
This section quantifies the impact of signal interconnect networks on a 4GHz 0.5V 
30M gate logic core constructed in a 22nm technology [3.5]. MINDS is used to predict 
pitches of interconnect levels, power and minimum die size that allow the wiring to be 
packed in ten metal levels. 
   
 
 
       
 
 
 
Metal levels Pitch   (nm) 
M1, M2 44 
M3, M4 52 
M5, M6 65 
M7, M8 97 
M9, M10 186 
Figure 3.2: Projections from MINDS for a 22nm logic core. 
 
Results from MINDS are summarized in Fig. 3.2. It can be observed that interconnects 
and repeaters consume 61% of the total power of this logic core. Moreover, repeaters 
consume 26% of the total power and repeater leakage power is about 50% of the total 
leakage power. This data underscores how interconnect technology heavily influences 
power dissipation of microprocessors in aggressively scaled CMOS technologies. 
Interestingly, after the author published the above results on repeater power dissipation 
[3.8], IBM presented data from their chips confirming these results [3.11].  
It is well known that designers of today’s high-performance microprocessors are given 
a certain target power dissipation number and are asked to maximize performance within 
that power budget. There are two main reasons for this: (i) Electricity bills of today’s servers over 
their lifetimes cost more than the price of the servers themselves [3.12]. To get a low cost 
of ownership, it is important to keep power dissipation under control. (ii) Thermal 
management and power delivery for chips with high power dissipation are difficult. 
Essentially, if the dominant interconnect and repeater power in Fig. 3.2 is somehow 
reduced, one could get substantially higher performance for the considered logic core 
within the same power budget. Thus, it is clear that interconnects impact performance of a logic core significantly. 
The cost of ownership of a high-performance microprocessor today depends on its die 
size, number of interconnect levels and power costs. The 10 interconnect levels needed for 
this 22nm high-performance microprocessor require 20 lithography steps, compared to 39 
lithography steps for the entire microprocessor [3.5]. Ref. [3.2] gives an analysis that 
shows how the die size of a microprocessor is determined both by the area needed for 
routing its wires and the silicon area needed for its transistors.  
Signal interconnect networks therefore play an important role in determining 
performance, power and cost of a GSI chip.  
3.2. A Technological Solution: Carbon Nanotube Interconnects    
Carbon nanotubes are cylinders of graphene, a two dimensional form of graphite. 
They can be classified as single-walled and multi-walled based on the number of graphene 
shells forming the nanotube, as depicted in Fig. 3.3. Carbon nanotube interconnects are 
considered to be a promising alternative to copper interconnects in the long-term [3.13] 
due to their lower resistivity.   
 
 
 
 
 
 
Figure 3.3: Types of carbon nanotube interconnects. 
 
Although several publications have compared the performance of carbon nanotube 
(CNT) interconnects with copper wires [3.13][3.14], the impact of carbon nanotube 
interconnects on an entire GSI chip is not known. For example, it is not known what the 
power or performance benefits to a GSI chip could be if carbon nanotube interconnects are 
used. If the power or performance benefits are substantial, it could motivate further 
research on CNTs, while if the power or performance benefits are not significant, it could 
help focus research expenditure towards more promising approaches. The objective of this 
work is, therefore, to evaluate the power and performance benefits of using carbon 
nanotube interconnects in GSI chips.  
3.2.1. Circuit Models  
Carbon nanotube interconnects are examples of quantum wires, where transport 
properties are affected by quantum effects. Due to the confinement of conduction electrons 
in the transverse direction of the wire, their transverse energy is quantized to a series of 
discrete values. This causes carbon nanotube interconnects to have some rather unique 
properties. Multi-walled CNTs form the focus of this study, since previous work has shown 
that multi-walled CNTs offer higher performance benefits [3.15] than single-walled CNTs 
for long interconnects. 
(i) Resistance Models 
 An important characteristic of carbon nanotube interconnects is quantum resistance 
[3.13]. A single shell of graphene, irrespective of length, has two electronic sub-bands that 
cross the Fermi level and a quantum resistance of h/4e² = 6.5kΩ, where h is Planck’s 
constant and e is the charge on an electron [3.13]. Due to multiple graphene shells 
contributing to conduction in multi-walled CNTs, the effective quantum resistance is lower 
than the 6.5kΩ value mentioned above. 
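
The quantum resistance values quoted here can be checked directly from the fundamental constants. The short sketch below evaluates h/(2e²N) for N parallel conduction channels; the function name is illustrative, and the constants are the exact SI values of h and e.

```python
PLANCK_H = 6.62607015e-34           # Planck's constant [J*s], exact SI value
ELECTRON_CHARGE = 1.602176634e-19   # elementary charge [C], exact SI value


def quantum_resistance(n_channels):
    """Ballistic resistance h / (2 e^2 N) of N parallel conduction channels."""
    return PLANCK_H / (2.0 * ELECTRON_CHARGE ** 2 * n_channels)


if __name__ == "__main__":
    # One channel gives roughly 12.9 kOhm; the two sub-bands of a single
    # graphene shell give h / 4e^2, roughly 6.5 kOhm.
    for n in (1, 2, 10):
        print(f"{n:2d} channel(s): {quantum_resistance(n) / 1e3:.2f} kOhm")
```

Because the channels conduct in parallel, the effective quantum resistance falls inversely with the number of conducting channels, which is why a multi-walled CNT with many shells has a lower value than a single shell.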
The effective resistivity, ρ, of multi-walled CNTs is modeled using Eq. 3.3 [3.15].  
[Equation (3.3): closed-form expression for the effective resistivity ρ of a multi-walled CNT interconnect in terms of R_o, δ, L, l_o, D_max, D_min, a, b and T; see [3.15].]  …(3.3)
Here, δ is the spacing between adjacent shells in a multi-walled CNT interconnect. 
This quantity is the van der Waals distance between adjacent graphene layers in graphite, 
and is taken to be 0.34nm. L is the length of the interconnect, R_o = 12.9kΩ is the quantum 
resistance of a single conduction channel of a graphene shell, T is the absolute temperature, 
D_max and D_min are the outer and inner diameters of the multi-walled CNT interconnect 
respectively, a is 2.04×10⁻⁴ nm⁻¹K⁻¹, b is 0.425 and l_o = 1000/(T/100 − 2). This model makes 
three key assumptions: (a) good contacts exist to all shells in the multi-walled CNT wire, (b) 
interaction between different shells of a multi-walled CNT does not impact performance, and 
(c) one-third of all shells in the multi-walled CNT are metallic.  
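
The length dependence implied by these parameters can be illustrated with a shell-by-shell numerical sum. The sketch below is not Eq. (3.3) itself. It assumes that a shell of diameter D contributes aTD + b conduction channels, that the mean free path in that shell is l_o x D (in nm), that shells are spaced 2δ apart in diameter and conduct in parallel, and that the resistance is normalized to a square footprint of side D_max. The choice D_min = D_max/2 in the usage lines is also hypothetical.

```python
DELTA_NM = 0.34      # shell-to-shell spacing [nm], from the text
R_O_OHM = 12.9e3     # quantum resistance of one conduction channel, from the text
A_COEF = 2.04e-4     # [1/(nm*K)], from the text
B_COEF = 0.425       # from the text


def mwcnt_resistivity_uohm_cm(length_nm, d_max_nm, d_min_nm, temp_k=300.0):
    """Effective resistivity in micro-ohm*cm from a sum over parallel shells."""
    l_o = 1000.0 / (temp_k / 100.0 - 2.0)               # from the text
    n_shells = int((d_max_nm - d_min_nm) / (2.0 * DELTA_NM)) + 1
    conductance = 0.0
    for i in range(n_shells):
        d = d_max_nm - 2.0 * DELTA_NM * i
        channels = A_COEF * temp_k * d + B_COEF         # ASSUMED channels per shell
        mean_free_path_nm = l_o * d                     # ASSUMED mean free path
        conductance += channels / (R_O_OHM * (1.0 + length_nm / mean_free_path_nm))
    area_nm2 = d_max_nm ** 2                            # ASSUMED square footprint
    rho_ohm_nm = area_nm2 / (conductance * length_nm)
    return 0.1 * rho_ohm_nm                             # 1 ohm*nm = 0.1 micro-ohm*cm


if __name__ == "__main__":
    for length_um in (1, 10, 100, 1000):
        rho = mwcnt_resistivity_uohm_cm(length_um * 1e3, d_max_nm=22.0, d_min_nm=11.0)
        print(f"L = {length_um:5d} um: rho_eff = {rho:.2f} micro-ohm*cm")
```

Under these assumptions the effective resistivity falls steadily as the wire gets longer, because the length-independent quantum resistance is amortized over a greater length. This is the trend behind the observation that short wires do not benefit from carbon nanotubes.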
 
 
 
 
 
 
 
 
 
Figure 3.4: Comparison of wire resistivity of a multi-walled CNT interconnect and a 
copper interconnect at the 22nm technology node. 
 
Fig. 3.4 shows a resistivity comparison of 22nm wide multi-walled CNT interconnects 
with 22nm wide and 44nm tall copper wires.  This is done for all wire lengths in the logic 
core analyzed in Section 3.1. It can be seen that carbon nanotube interconnects reduce wire 
resistivity for a majority of wire lengths in the logic core. For example, carbon nanotube 
interconnects reduce resistivity of a 1mm wire by as much as 66%. To put this in 
perspective, the transition from aluminum to copper interconnects reduced resistivity by 
45-50% [3.16]. Another observation from Fig. 3.4 is that for short wire lengths, carbon 
nanotube interconnects do not provide an advantage over copper interconnects. This is due 
to the quantum resistance [3.13] of carbon nanotube interconnects.     
(ii) Capacitance Models 
 
 
 
(a)                                                                     (b) 
 
 
Figure 3.5: Comparison of interconnect architecture of (a) carbon nanotube interconnects, 
and (b) copper. t is the wire thickness, w is the wire width and s is the wire spacing. 
 
Besides the lower resistivity, another advantage of using 22nm wide multi-walled 
CNT interconnects over 22nm wide and 44nm tall copper wires is the lower capacitance. 
This is due to the lower effective aspect ratio and cylindrical shape of multi-walled carbon 
nanotube interconnects. Using tables and compact models in [3.2][3.17], the structure in 
Fig. 3.5(a) is found to have 45% reduced electrostatic capacitance compared to the 
structure in Fig. 3.5(b). Since wire capacitance depends on the ratio of different wire 
cross-sectional dimensions and not the cross-sectional dimensions themselves [3.2][3.17], 
this 45% capacitance advantage is available to all metal levels on a GSI chip that use 
multi-walled CNT interconnects. 
To add electric charge to a quantum wire, one must add electrons to available states 
above the Fermi level (Pauli Exclusion Principle). A quantum capacitance can thus be 
defined in series with the electrostatic capacitance for each conduction channel. This 
capacitance has a value in the order of 200aF/µm, and in virtually all cases is much larger 
than the electrostatic capacitance. In most practical cases, it can therefore be ignored 
[3.18]. 
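
The reason the quantum capacitance can be neglected follows from the series combination of the two capacitances. The sketch below assumes that the quantum capacitances of the individual conduction channels add in parallel before being placed in series with the electrostatic capacitance. The electrostatic capacitance of 50aF/µm and the channel count of 20 are hypothetical values chosen for illustration.

```python
C_QUANTUM_PER_CHANNEL = 200.0   # [aF/um], order of magnitude quoted in the text


def effective_capacitance(c_electrostatic, n_channels,
                          c_quantum_per_channel=C_QUANTUM_PER_CHANNEL):
    """Series combination of electrostatic and total quantum capacitance [aF/um]."""
    # ASSUMPTION: channel quantum capacitances add in parallel.
    c_quantum = n_channels * c_quantum_per_channel
    return c_electrostatic * c_quantum / (c_electrostatic + c_quantum)


if __name__ == "__main__":
    c_e = 50.0        # HYPOTHETICAL electrostatic capacitance [aF/um]
    channels = 20     # HYPOTHETICAL number of conduction channels
    c_eff = effective_capacitance(c_e, channels)
    print(f"effective capacitance = {c_eff:.2f} aF/um "
          f"({100.0 * c_eff / c_e:.1f}% of the electrostatic value)")
```

When the total quantum capacitance is much larger than the electrostatic capacitance, the series combination is dominated by the smaller electrostatic term.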
3.2.2. Manufacturing Challenges  
Although carbon nanotube interconnects offer a significant reduction in wire 
resistivity, several important challenges need to be overcome before they are viable. These 
challenges include large scale directed growth of horizontally oriented carbon nanotubes, 
obtaining good contacts to all shells of a horizontal multi-walled CNT and manufacturing 
at CMOS compatible process temperatures (<400°C) [3.13].  Several research projects in 
industry and academia are currently directed towards surmounting these barriers [3.13]. 
The analysis in this manuscript evaluates the benefits provided by carbon nanotube 
interconnects to a GSI chip if the above challenges can be overcome.  
Wires in local metal levels of GSI chips typically have lengths less than 100µm. The 
logic core considered in Section 3.1, for example, is predicted by MINDS to have wires of 
length ranging from 0.5µm to 28µm in M1 and M2. The quantum resistance of 
multi-walled CNTs reduces their applicability to such short wires (Fig. 3.4). Many of these 
short wires in M1 and M2 also have large fan-outs. Distributing large fan-outs is difficult 
due to the quantum resistance. Thus, wires in M1 and M2 are assumed to be constructed 
exclusively with copper for this analysis.  
While horizontally oriented carbon nanotubes help transport signals horizontally 
across a chip, vertical interconnects (vias) are required in each wire level to access higher 
levels of metal. Since the electrical conductivity of CNTs in non-axial directions is 
negligible, horizontally oriented CNTs cannot serve as vias. Copper is therefore required in 
each metal level for forming vias to higher levels of metal. Power interconnects also form a 
significant portion of the total wiring area of a microprocessor today. These 
interconnects are distributed using grid structures with segment lengths less than 1µm, and 
would be affected by the quantum resistance. Copper would therefore be required in all 
metal levels for power wiring purposes as well. This constraint of having both copper and 
carbon nanotubes in each metal level is an important barrier to carbon nanotube 
interconnect technology. Cost forms one of the major requirements for any interconnect 
technology, and having both copper and CNTs on the same metal level would cause a 
significant increase in lithography and process cost.     
3.2.3. Power Benefits  
A 4GHz 0.5V 30M gate logic core constructed in a 22nm technology is considered for 
this analysis. Two cases are considered: (a) copper is the only metal used for interconnect 
levels, and (b) both multi-walled CNTs and copper are used for interconnect levels. The 
power benefits of case (b) are first studied when the number of metal levels for case (a) and 
case (b) is the same. 
The minimum logic core area that allows all the wiring to be packed into 10 metal 
levels is obtained with MINDS.  Fig. 3.6 shows that while case (a) has a die area 
requirement of 7.7mm², the die area requirement for case (b) is only 4.7mm². If logic cores 
form 40% of the total die area of the chip, this 39% reduction in die area for the logic core 
translates to a 15% reduction in total die area. The die area reduction results in more dice 
being processed per wafer and ameliorates the process cost overhead of having both CNTs 
and copper in various metal levels. 
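
The area figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that the non-core portion of the die is unchanged when the logic core shrinks; the function name is illustrative.

```python
def fractional_reduction(before, after):
    """Fractional reduction when a quantity drops from `before` to `after`."""
    return (before - after) / before


# Logic core areas from Fig. 3.6 [mm^2]
core_reduction = fractional_reduction(7.7, 4.7)

# ASSUMPTION: logic cores are 40% of the die and the rest of the die is unchanged.
CORE_FRACTION_OF_DIE = 0.40
total_die_reduction = CORE_FRACTION_OF_DIE * core_reduction

if __name__ == "__main__":
    print(f"logic core area reduction: {100.0 * core_reduction:.1f}%")
    print(f"total die area reduction:  {100.0 * total_die_reduction:.1f}%")
```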
 
 
 
 
 
 
 
 
 
Figure 3.6: Logic core area optimization using MINDS. 
 
The reason for this die area reduction with CNT interconnects is apparent from Table 
3.2 which shows wire pitches predicted with MINDS for case (a) and case (b). Essentially, 
due to the lower resistivity and capacitance of CNT interconnects, a chip using CNTs can 
have its wires sized smaller for the same clock frequency. Since the area required for 
routing these smaller size wires is reduced, die area is lower for the chip with CNT 
interconnects.  
Table 3.2: Wire pitch prediction using MINDS.

                Case (a): Chip with    Case (b): Chip with
                copper only            CNTs and copper
Metal levels    Pitch (nm)             Pitch (nm)
M1, M2          44                     44
M3, M4          52                     44
M5, M6          65                     45
M7, M8          97                     76
M9, M10         186                    79
  
The power predicted by MINDS for both cases is summarized in Table 3.3. 
Interconnect power is reduced by 47% for the logic core with CNT interconnects because 
of a decrease in total wire capacitance. This is due to two reasons: (a) The 39% reduction in 
logic core area results in shorter wires. (b) The capacitance per unit length of CNT 
interconnects is lower than copper.  There is also a 48% reduction in logic gate power when 
CNT interconnects are used. This is primarily due to lower values of wire capacitance, 
which result in smaller sizes for logic gates required to drive these wires.  For the same 
reason, latch sizes are also reduced. Furthermore, the die area savings cause shorter local 
clock wires. This reduction in clock wire and latch capacitances allows smaller local clock 
buffers to be used. The decrease in capacitance associated with latches, local clock wires 
and clock buffers leads to clock power savings. The clock power for the chip with CNT 
interconnects is 0.4W compared to 0.8W for the chip constructed exclusively with copper 
interconnects. This represents a 50% reduction in clock power. Repeater power is also 
significantly reduced from 2.1W to 0.4W largely due to improved performance offered by 
multi-walled CNT interconnects. For the entire logic core, a 56% reduction in power is 
obtained with multi-walled carbon nanotube interconnects.  
 
Table 3.3: Power reduction with CNT interconnects for a 22nm technology.

                Logic core with    Logic core with     Percentage reduction
                copper only        CNTs and copper     in power
Logic gates     2.3W               1.2W                48%
Repeaters       2.1W               0.4W                81%
Interconnects   3W                 1.6W                47%
Clock           0.8W               0.4W                50%
Total           8.2W               3.6W                56%
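
The entries of Table 3.3 can be cross-checked with a short script. The sketch below recomputes each percentage reduction and the column totals from the tabulated powers; the variable and function names are illustrative.

```python
# Table 3.3: component -> (copper only [W], CNTs and copper [W], quoted reduction [%])
TABLE_3_3 = {
    "Logic gates":   (2.3, 1.2, 48),
    "Repeaters":     (2.1, 0.4, 81),
    "Interconnects": (3.0, 1.6, 47),
    "Clock":         (0.8, 0.4, 50),
}


def percent_reduction(copper_w, cnt_w):
    """Percentage power reduction relative to the copper-only case."""
    return 100.0 * (copper_w - cnt_w) / copper_w


total_copper = round(sum(row[0] for row in TABLE_3_3.values()), 1)
total_cnt = round(sum(row[1] for row in TABLE_3_3.values()), 1)

if __name__ == "__main__":
    for name, (cu, cnt, quoted) in TABLE_3_3.items():
        print(f"{name:14s} {percent_reduction(cu, cnt):5.1f}% (table: {quoted}%)")
    print(f"{'Total':14s} {percent_reduction(total_copper, total_cnt):5.1f}% "
          f"({total_copper}W -> {total_cnt}W)")
```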
 
Thus, a 4GHz 22nm logic core with 30M gates and 10 metal levels could have a 56% 
reduction in power and a 39% reduction in area if multi-walled carbon nanotubes are used 
along with copper for its interconnect technology. Of course, there is a process cost 
overhead to utilizing both carbon nanotubes and copper in various metal levels. 
The power benefits of carbon nanotube interconnects are now studied for a fixed die area. This is done by 
assuming the 4GHz 0.5V 30M gate 22nm logic core with carbon nanotube interconnects 
has the same die area as the logic core constructed exclusively with copper interconnects. 
The wire pitches and number of metal levels obtained with MINDS are shown in Table 3.4. 
While the logic core with CNT interconnects requires 8.8 metal levels, the logic core 
constructed exclusively with copper interconnects requires 10 metal levels. Carbon 
nanotube interconnects thus allow a 12% reduction in the total number of metal levels. The 
cause for this reduction in number of metal levels is apparent from Table 3.4. The lower 
resistivity and capacitance of carbon nanotube interconnects allows smaller wire pitches, 
and this, in turn, leads to reduced wire area and fewer metal levels.  
 
Table 3.4: Wire pitch prediction using MINDS.

                          Case (a): Chip with    Case (b): Chip with
                          copper only            CNTs and copper
Metal levels              Pitch (nm)             Pitch (nm)
M1, M2                    44                     44
M3, M4                    52                     44
M5, M6                    65                     74
M7, M8                    97                     91
M9, M10                   186                    121
Number of metal levels    10                     8.8
Die area                  7.7mm²                 7.7mm²
  
Table 3.5 shows a comparison of power consumption for the chip with carbon 
nanotube interconnects when it has the same die area as the chip constructed exclusively 
with copper interconnects. As described previously, wire capacitance reduces with carbon 
nanotube interconnects due to their lower effective aspect ratio and cylindrical shape. This 
decrease in wire capacitance causes a reduction in gate, latch and clock buffer sizes. 
Repeater power is also lowered due to the improved resistance and capacitance 
characteristics of carbon nanotube interconnects.  
Thus, a 4GHz 22nm logic core with 30M gates and a die area of 7.7mm² could have a 
43% reduction in power and 12% reduction in number of metal levels if multi-walled 
carbon nanotubes are used along with copper for its interconnect technology.  
 
Table 3.5: Power reduction with CNT interconnects for a 22nm technology.

                Logic core with    Logic core with     Percentage reduction
                copper only        CNTs and copper     in power
Logic gates     2.3W               1.6W                30%
Repeaters       2.1W               0.6W                71%
Interconnects   3W                 2W                  33%
Clock           0.8W               0.6W                25%
Total           8.2W               4.7W                43%
 
3.2.4. Performance Benefits 
This section of the manuscript describes the performance benefits of using 
multi-walled carbon nanotube interconnects in the logic core studied in Section 3.2.3. Two 
cases are considered: (a) The logic core has copper interconnects only and runs at 4GHz 
with a power budget of 8.2W (b) The logic core has both multi-walled CNT interconnects 
and copper and has a power budget of 8.2W (same as case (a)). The maximum clock 
frequency for the logic core in case (b) is evaluated.   
Multiple simulations are run with MINDS for this purpose. For each clock frequency, 
the minimum die size for which the interconnects can be packed into 10 metal levels is 
computed. Power consumption is evaluated for this value of the die size. Fig. 3.7 indicates 
that a logic core with carbon nanotube interconnects can be clocked at frequencies as high 
as 6.1GHz and still have a power dissipation of less than 8.2W. A logic core constructed 
exclusively with copper interconnects can be clocked only at 4GHz for 8.2W power 
dissipation, as shown in Table 3.5. There is a 6.5% die area reduction for the case involving 
multi-walled CNT interconnects.  
 
 
 
 
 
 
          
 
Figure 3.7: Performance optimization with multi-walled CNT interconnects. 
 
Table 3.6 reveals the wire pitches predicted by MINDS for the chip with just copper 
and for the chip with both multi-walled CNT interconnects and copper. Although the chip 
with multi-walled CNT interconnects and the chip with copper interconnects have 
approximately the same power consumption, the distribution of power is different (Fig. 
3.8). The clock power, interconnect power and logic gate power for the chip with 
multi-walled CNT interconnects are higher due to its higher clock frequency. Repeater 
power is lower due to the reduced capacitance and resistance associated with carbon 
nanotube interconnects.  
Table 3.6: Wire pitch prediction using MINDS.

                Case (a): Chip with    Case (b): Chip with
                copper only            CNTs and copper
Metal levels    Pitch (nm)             Pitch (nm)
M1, M2          44                     44
M3, M4          52                     44
M5, M6          65                     75
M7, M8          97                     119
M9, M10         186                    119
 
 
 
 
 
 
   Case (a): Copper interconnects only.                       Case (b): CNT and copper interconnects. 
Figure 3.8: Power consumption of logic cores. 
 
Thus, carbon nanotube interconnects allow a 43% increase in clock frequency for a 
22nm logic core with a die area reduction of 6.5%. There is a process cost penalty of 
having both copper and CNT interconnects on eight metal levels. 
3.2.5. Conclusion  
For a 4GHz 8.2W 22nm logic core with 30M gates and 10 metal levels, carbon 
nanotube interconnects are found to provide (a) 56% reduced power and 39% reduced die 
size, or (b) 43% reduced power and 12% fewer metal levels, or (c) 43% higher frequency 
and 6.5% reduced die size. While these are substantial benefits, carbon nanotube 
interconnects also come with challenges. There is a process cost overhead since eight metal 
levels require both carbon nanotube interconnects and copper. Fabrication of carbon 
nanotube interconnects is immature today and needs significant progress before being 
considered a contender to replace copper. To the best of the author’s knowledge, this work 
represents the first system level comparison between carbon nanotube and copper 
interconnects.      
 
3.3. A Circuit Solution: Improved Repeater Insertion Techniques    
Improved repeater insertion techniques could help mitigate the repeater power issues 
discussed in Section 3.1. The equations in Fig. 3.9 summarize a repeater insertion model 
derived by the author to minimize energy-delay product (EDP). In this model, k is the 
number of repeaters, h is the size of repeaters i.e. the ratio of width of a repeater transistor 
to the width of a minimum size transistor, R_int is the wire resistance, C_int is the wire 
capacitance, V_dd is the supply voltage, b is a sleep-gating factor (the fraction of time the repeaters draw leakage current), 
f is the frequency, a is the activity factor and R_o, C_o and I_leak are the resistance, capacitance 
and leakage of a minimum size repeater respectively. The derivation of this model is 
described in Appendix B. The repeater number and repeater size that minimize 
energy-delay product of wires in a 100nm technology are obtained from this model and 
results are compared with SPICE simulations for the same in Table 3.7. It can be observed 
from Table 3.7 that the error of this model is <10% for all the cases that are considered.   
   
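$$\text{Delay} = k\left[0.7\,\frac{R_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + \frac{R_{int}}{k}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right]$$

$$\text{Power} = \left(\tfrac{1}{2}\,a\,C_oV_{dd}^2 f + bV_{dd}I_{leak}\right)hk + \tfrac{1}{2}\,a\,C_{int}V_{dd}^2 f$$

$$\text{Energy-delay product (EDP)} = \text{Delay}^2 \cdot \text{Power}$$

Set $\frac{d(EDP)}{dh} = 0$, $\frac{d(EDP)}{dk} = 0$, simplify and approximate to get

$$k = \gamma\sqrt{\frac{R_{int}C_{int}}{R_oC_o}}, \qquad h = \delta\sqrt{\frac{R_oC_{int}}{R_{int}C_o}}$$

where

$$\gamma = \left(0.73 + 0.07\ln\phi_{gate}\right)^2, \quad \delta = \left(0.88 + 0.07\ln\phi_{gate}\right)^2 \quad \text{and} \quad \phi_{gate} = \frac{\tfrac{1}{2}\,a\,C_oV_{dd}^2 f}{\tfrac{1}{2}\,a\,C_oV_{dd}^2 f + bV_{dd}I_{leak}}$$

$$\text{Optimal delay} = \sqrt{R_{int}C_{int}R_oC_o}\left(\frac{0.7}{\delta} + 0.7\gamma + \frac{0.4}{\gamma} + 0.7\delta\right)$$

$$\text{Optimal energy-delay product} = R_{int}C_{int}^2R_o\left(\tfrac{1}{2}\,a\,C_oV_{dd}^2 f + bV_{dd}I_{leak}\right)\left(\frac{0.7}{\delta} + 0.7\gamma + \frac{0.4}{\gamma} + 0.7\delta\right)^2\left(\gamma\delta + \phi_{gate}\right)$$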
 
Figure 3.9: An energy-delay product minimization model for repeater insertion. 
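
The way the model of Fig. 3.9 is used can be sketched in a few lines of code. The sketch below assumes the fitting factors take the form γ = (0.73 + 0.07 ln φ_gate)² and δ = (0.88 + 0.07 ln φ_gate)², with φ_gate the ratio of dynamic power to total power of a repeater. The repeater parameters are those quoted in the caption of Table 3.7, while the wire resistance and capacitance, supply voltage, frequency, activity factor and sleep-gating factor are hypothetical values chosen only for illustration.

```python
import math

# Repeater parameters quoted in the caption of Table 3.7 (100nm technology)
R_O = 14.1e3        # [ohm]
C_O = 0.7e-15       # [F]
I_LEAK = 52.5e-9    # [A]

# HYPOTHETICAL wire and operating point (not taken from the text)
R_INT = 1100.0      # resistance of a 1mm wire [ohm]
C_INT = 0.16e-12    # capacitance of a 1mm wire [F]
V_DD = 1.2          # [V]
FREQ = 2.0e9        # [Hz]
ACTIVITY = 0.1      # activity factor a
B_FACTOR = 1.0      # sleep-gating factor b


def repeater_power():
    """Dynamic and leakage power of one minimum-size repeater."""
    dynamic = 0.5 * ACTIVITY * C_O * V_DD ** 2 * FREQ
    leakage = B_FACTOR * V_DD * I_LEAK
    return dynamic, leakage


def delay(k, h):
    """Delay of a wire with k repeaters of size h."""
    return k * (0.7 * (R_O / h) * (C_INT / k + h * C_O)
                + (R_INT / k) * (0.4 * C_INT / k + 0.7 * h * C_O))


def power(k, h):
    """Total power of the repeated wire."""
    dynamic, leakage = repeater_power()
    return (dynamic + leakage) * h * k + 0.5 * ACTIVITY * C_INT * V_DD ** 2 * FREQ


def edp(k, h):
    """Energy-delay product, taken as Delay^2 * Power."""
    return delay(k, h) ** 2 * power(k, h)


def optimal_point():
    """Repeater number k and size h from the assumed closed-form fit."""
    dynamic, leakage = repeater_power()
    phi = dynamic / (dynamic + leakage)
    gamma = (0.73 + 0.07 * math.log(phi)) ** 2      # ASSUMED form of the fit
    delta = (0.88 + 0.07 * math.log(phi)) ** 2      # ASSUMED form of the fit
    k = gamma * math.sqrt(R_INT * C_INT / (R_O * C_O))
    h = delta * math.sqrt(R_O * C_INT / (R_INT * C_O))
    return k, h, gamma, delta, phi


if __name__ == "__main__":
    k, h, gamma, delta, phi = optimal_point()
    print(f"phi_gate = {phi:.3f}, gamma = {gamma:.3f}, delta = {delta:.3f}")
    print(f"k = {k:.2f} repeaters, h = {h:.1f}")
    print(f"delay = {delay(k, h) * 1e12:.1f} ps, EDP = {edp(k, h):.3e} J*s")
```

Substituting k and h back into the delay and power expressions reproduces the closed-form optimal delay and optimal energy-delay product, which gives a convenient consistency check on the algebra.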
 
Fig. 3.10 illustrates that transistor area of future microchips would consist mainly of 
logic transistors, memory transistors (SRAM cells) and communication transistors 
(repeaters). Logic and memory transistors perform inherently different functions; they 
used to have the same device parameters, but this is not the case anymore. Similarly, the 
author proposes that logic and communication transistors also need to have their own 
uniquely optimized device parameters in the future. For example, communication 
transistors could have different values of threshold voltage (Vt) from logic transistors. The 
rationale behind this proposal can be obtained from the newly derived repeater insertion 
model as described in the following pages of this manuscript. A 22nm microprocessor with 
parameters outlined in Table 3.8 is considered for analysis.  
 
Table 3.7: Comparison of model with SPICE simulations for a 100nm technology. 
Transistor models [3.19] have R_o = 14.1kΩ, C_o = 0.7fF, I_leak = 52.5nA. Interconnect 
dielectric constant is 3. Repeater size is defined at the beginning of Section 3.3.   

Wire width   Number of repeaters   Repeater size   Number of repeaters   Repeater size
             (from model)          (from model)    (from SPICE)          (from SPICE)
100nm        2.1/mm                40              2.2/mm                40
200nm        1.05/mm               80.4            1.15/mm               80
400nm        0.52/mm               161             0.55/mm               150
600nm        0.35/mm               241             0.35/mm               240
800nm        0.26/mm               322             0.28/mm               300
 
 
 
 
 
 
 
 
 
 
 Figure 3.10: Three types of transistors in GSI chips. 
Table 3.8: Parameters of a 22nm technology [3.5]. 
 
Quantity Value 
Technology node 22nm 
Frequency 4GHz 
Number of gates 40M 
Die area 12mm²
Supply voltage 0.5V 
Threshold voltage 0.18V 
Metal levels 10 
 
 
 
 
 
 
      
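$$\text{Optimal EDP} = R_{int}C_{int}^2\left[R_o\left(\tfrac{1}{2}\,a\,C_oV_{dd}^2 f + bV_{dd}I_{leak}\right)\left(\frac{0.7}{\delta} + 0.7\gamma + \frac{0.4}{\gamma} + 0.7\delta\right)^2\left(\gamma\delta + \phi_{gate}\right)\right]$$

where $\gamma = \left(0.73 + 0.07\ln\phi_{gate}\right)^2$ and $\delta = \left(0.88 + 0.07\ln\phi_{gate}\right)^2$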
 
 
 
 
 
 
 
 
 
Figure 3.11: Energy-delay product for a repeated wire as a function of threshold voltage. 
The repeated wire EDP vs. Vt plot of Fig. 3.11 indicates that an optimal Vt  =0.25V 
exists for all repeated wires on a chip that minimizes their EDP. This is because the 
wire-independent term inside square brackets in the optimal EDP expression must be 
minimized to minimize EDP. Since all repeaters on a chip have the same optimal Vt value, 
a single lithography and implant step is sufficient to fix the threshold voltage for all 
repeater transistors. 
Fig. 3.12 shows that delay of a repeated wire with the new model is fairly insensitive to 
increase in Vt near the optimal point. The delay at Vt = 0.25V is only 5% higher than the 
delay at Vt =0.18V. The reason for this is again evident from the model. The expression for 
delay of the repeated wire using the model is 
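$$\text{Optimal delay} = \sqrt{R_{int}C_{int}R_oC_o}\left(\frac{0.7}{\delta} + 0.7\gamma + \frac{0.4}{\gamma} + 0.7\delta\right) \qquad \ldots(3.4)$$

where $\gamma = \left(0.73 + 0.07\ln\phi_{gate}\right)^2$ and $\delta = \left(0.88 + 0.07\ln\phi_{gate}\right)^2$, and

$$R_o \propto \frac{V_{dd}}{\left(V_{dd} - V_t\right)^{1.3}} \quad \text{(by the power law MOSFET model [3.21])}$$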
When V_t is increased, two terms are affected in Eq. (3.4). The term inside the square 
root sign increases due to the increase of repeater output resistance R_o. This has the effect 
of increasing the delay. However, φ_gate, the ratio of dynamic power to total power of a 
repeater, increases because leakage power goes down when V_t is increased. The effect of 
the larger φ_gate is to have larger values of γ and δ, i.e. when V_t is increased, the model 
inserts more repeaters and bigger repeaters. This decreases the (0.7/δ + 0.7γ + 0.4/γ + 0.7δ) 
term in the above equation and has the effect of decreasing the delay. An example is 
provided in Fig. 3.12 to illustrate this. When the V_t value is increased from 0.18V to 0.25V, 
the square root term in Eq. (3.4) increases by a factor of 1.17 while the (0.7/δ + 0.7γ + 0.4/γ + 0.7δ) 
term decreases to 0.89 times its original value, with the net result that the delay increases by 
only 5%.  This delay overhead can be counteracted in a power-efficient manner by 
increasing wire pitch by just 3.6%.  
 
 
 
 
 
 
 
 
 
Figure 3.12: Delay of a 22nm wide 10mm long repeated wire when repeater insertion is 
carried out to minimize the energy-delay product. 
 
The delay of an inverter driving a fan-out of 4 (FO4 delay) is considered to be 
representative of the delay of a typical logic circuit such as an ALU or a multiplexer which 
generally has short wires [3.22]. Fig. 3.13 shows that the delay of a fan-out of 4 inverter 
having average length wires is sensitive to its threshold voltage. It can be seen that if the 
threshold voltage is increased from 0.18V to 0.25V, the delay of the fan-out of 4 inverter 
increases by 38%, compared to just a 5% increase in delay of a repeated wire for the same 
increase in threshold voltage. The reason for this can be understood better with the 
equation for delay of a fan-out of 4 inverter. 
\[
\text{FO4 delay} = 0.7\,\frac{R_o}{W}\left(C_{int} + C_o W\right)
\]
Here, W is the width of an average size inverter and Cint is the capacitance of an 
average length wire. When Vt is increased, the output resistance Ro of the inverter increases 
and impacts the FO4 delay. As can be observed in Eq. (3.4), a repeated wire has its delay 
depending on the square root of Ro. A fan-out of 4 inverter, on the other hand, has its delay 
depending on Ro, and is more sensitive to increases in Ro. Fig. 3.14 reveals that a 91% 
increase in gate sizes is required to have the same delay at Vt = 0.25V as the delay at Vt = 
0.18V. This large gate size overhead prevents increase in Vt values of logic transistors. 
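
The gate size penalty can be illustrated with a short calculation. The sketch below is an illustration only: it assumes the FO4 delay has the form 0.7(Ro/W)(Cwire + Cgate·W), treats the ratio x of wire capacitance to gate load capacitance as a free, hypothetical parameter (it is not given in the text), and solves for the width multiplier that restores the original delay when Ro rises by a factor k. The supply voltage of 0.5V is likewise an assumption.

```python
def upsizing_for_equal_delay(k, x):
    """Width multiplier s = W'/W that keeps the FO4 delay unchanged.

    k: factor by which the per-unit-width output resistance Ro increases.
    x: wire capacitance divided by gate load capacitance at the original
       width (hypothetical parameter, not taken from the dissertation).
    Obtained by solving k * (x / s + 1) = x + 1 for s.
    """
    denom = x + 1.0 - k
    if denom <= 0:
        raise ValueError("delay cannot be recovered by upsizing alone")
    return k * x / denom

# Ro increase for Vt going from 0.18V to 0.25V at an assumed Vdd of 0.5V
k = (0.32 / 0.25) ** 1.3

for x in (1.0, 2.0, 4.0):
    s = upsizing_for_equal_delay(k, x)
    print(f"x = {x:.1f}: gate widths must grow by {100 * (s - 1):.0f}%")
```

Because the gate load grows with the width while the wire load does not, the required upsizing always exceeds k itself whenever k > 1, which is consistent with the large overhead shown in Fig. 3.14.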
 
 
 
 
 
 
 
 
Figure 3.13: Fan-out of 4 inverter delay for a 22nm technology.  
To summarize the results obtained in the last few pages, optimally repeated wires can 
have higher threshold voltages than specified. This is because their delay is not sensitive to 
threshold voltage. Any small delay increases can be counteracted power-efficiently by 
using slightly larger wires. On the other hand, the delay of logic circuits increases 
substantially with increase in threshold voltage. This is because there is no comparably 
efficient technique to compensate for the delay penalty when threshold voltage values are 
increased; restoring the delay requires a large increase in gate sizes. 
Thus, an optimized GSI chip would have higher threshold voltages for its repeater 
transistors compared to its logic transistors.  Using the same arguments, an optimized GSI 
chip could also have longer channel lengths and/or thicker gate oxides for its repeater 
transistors compared to its logic transistors. To the best of the author’s knowledge, this is 
the first time this idea has been proposed.  
 
 
 
 
 
 
 
Figure 3.14: Percentage increase in gate sizes required to have the same FO4 delay at the 
specified threshold voltage as the delay at threshold voltage = 0.18V. 
Table 3.9 shows power benefits of the EDP repeater insertion methodology when it is 
compared to the commonly used sub-optimal repeater insertion model for a 22nm 
technology with supply voltage=0.5V and threshold voltage=0.18V. The EDP model with 
unique threshold voltage values for logic and repeater transistors would reduce power of a 
repeated wire by 53% for 13% higher wire size.  The total repeater area for this case is less 
than the total repeater area for the sub-optimal repeater insertion model. 
 
Table 3.9: Power consumption for different repeater insertion models for a 22nm 10mm 
wire. 
| Repeater insertion model | Repeater number | Repeater size | Wire pitch (nm) | Delay (ns) | Power (µW) |
|---|---|---|---|---|---|
| Bakoglu [3.23] | 766 | 27 | 44 | 2.07 | 384 |
| Sub-optimal model [3.2] | 346 | 30 | 47 | 2.07 | 223 |
| EDP model | 293 | 19 | 51 | 2.07 | 147 |
| EDP model with unique Vt values for logic and repeaters | 307 | 28 | 53 | 2.07 | 105 |
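
The percentages quoted for Table 3.9 follow from its entries. As a quick check, the sketch below recomputes them, using repeater number times repeater size as a stand-in for total repeater area. That proxy is an assumption made here for illustration, and the dictionary keys are illustrative names.

```python
# Rows of Table 3.9: (repeater number, repeater size, wire pitch in nm, power in uW)
table_3_9 = {
    "sub_optimal": (346, 30, 47, 223),
    "edp_unique_vt": (307, 28, 53, 105),
}

def percent_change(new, old):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old

n0, s0, pitch0, power0 = table_3_9["sub_optimal"]
n1, s1, pitch1, power1 = table_3_9["edp_unique_vt"]

power_saving = -percent_change(power1, power0)
pitch_increase = percent_change(pitch1, pitch0)
area_change = percent_change(n1 * s1, n0 * s0)  # number x size as an area proxy

print(f"power saving:         {power_saving:.0f}%")
print(f"wire pitch increase:  {pitch_increase:.0f}%")
print(f"repeater area change: {area_change:.0f}%")
```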
 
 
The benefits of these repeater insertion techniques are analyzed with MINDS at the 
system level for the 22nm 40M gate logic block considered in Section 3.1.  Table 3.10 
indicates power savings obtained when the EDP model is used along with unique threshold 
voltage values for logic and repeater transistors. A 26% reduction in total logic block 
power is obtained for a 9% increase in the number of wire levels. Repeater power, in 
particular, is lowered by 78%, while repeater area is reduced by 33%.  
Table 3.10: Benefits of energy-delay product based repeater insertion for a 22nm logic 
block. 
| | Sub-optimal model | New EDP model with unique threshold voltage values for logic and repeaters |
|---|---|---|
| Die area | 12mm2 | 12mm2 |
| Wire pitches of pairs of metal levels (nm) | 44, 45, 60, 93, 250 | 44, 57, 60, 109, 109, 285 |
| Power: total | 17.4W | 12.9W |
| Power: logic gates | 5.7W | 5.7W |
| Power: repeaters | 5.8W | 1.25W |
| Power: wires | 4.5W | 4.5W |
| Power: clock | 1.5W | 1.5W |
| Number of metal levels | 9.3 | 10.1 |
| Repeater area | 3.6mm2 | 2.4mm2 |
| Number of repeaters | 9.6 million | 5.8 million |
 
 
 
3.4. An Architectural Solution: Parallel Processing Architectures    
In the early 1990s, Chandrakasan, Sheng and Brodersen proposed the use of parallel 
processing architectures as a potential technique to reduce power [3.20]. The impact of that 
early work has been substantial, with several multi-core processors available today. This 
section studies the implications of such parallel processing architectures on signal 
interconnect networks. Single core, dual core and quad core chips depicted in Fig. 3.15 are 
compared for this purpose. 
Fig. 3.15 shows that while the single core chip is clocked at 10GHz, the dual core chip 
is assumed to have two 5GHz cores, and the quad core chip is assumed to have four 
2.5GHz cores. If software is completely parallel, which is the case for many 
high-performance applications, these three architectures would have the same throughput. 
Note that Chandrakasan, Sheng and Brodersen compared similar systems for their work 
[3.20]. The single core chip is taken to have 60% of its area, i.e. 45mm2, consumed by 
cache memory [3.2]. Hence, the dual core and quad core chips are taken to have 45mm2 of 
cache memory as well.  
 
 
 
 
 
 
 
 
 
Figure 3.15: Single core, dual core and quad core chips used for case study. A 22nm device 
technology is considered. 
 
3.4.1. Intra-Core Interconnect Networks 
The interconnect networks within a core are first analyzed. Multiple simulations are 
run in MINDS with different supply and threshold voltage values such that the total power 
of logic cores is minimized. The chosen supply and threshold voltages and power 
consumption of 10GHz, 5GHz and 2.5GHz logic cores are summarized in Table 3.11. As 
can be observed, supply voltages can be lowered and threshold voltages can be raised with 
multiple lower frequency cores. This is due to the reduced performance requirements 
associated with lower frequencies [3.20]. This results in a reduction in both dynamic and 
leakage power. Different components of the power of each core are listed in Table 3.12.  
 
 Table 3.11: Optimized supply and threshold voltage. 
| | Single core 10GHz chip | Dual core 5GHz chip | Quad core 2.5GHz chip |
|---|---|---|---|
| Supply voltage | 0.75V | 0.58V | 0.48V |
| Threshold voltage | 0.18V | 0.22V | 0.24V |
| Total power of logic cores | 167W | 78W | 49.2W |
 
 
Table 3.12: Components of power for each logic core. 
| | 10GHz 30mm2 80M gate core | 5GHz 30mm2 80M gate core | 2.5GHz 30mm2 80M gate core |
|---|---|---|---|
| Dynamic power of logic gates | 27W | 4W | 0.8W |
| Leakage power of logic gates | 24W | 3.6W | 1.1W |
| Dynamic power of repeaters | 7W | 4.5W | 0.8W |
| Leakage power of repeaters | 11W | 2.8W | 2W |
| Interconnect power | 61W | 18W | 6.2W |
| Clock power | 36W | 5.6W | 1.2W |
 
The interconnect requirements for these three logic cores are obtained with MINDS 
and are shown in Table 3.13 and Fig. 3.16. Table 3.13 indicates that the dual core chip has 
smaller wire pitches than the single core chip and the quad core chip has smaller wire 
pitches than the dual core chip. This is because MINDS evaluates wire pitch by equating 
the RC delay of the longest wire in each pair of metal levels to a certain fraction β of the 
clock period 1/f. Eq. (3.1) and Eq. (3.2) (reproduced below) describe this 
condition. It should be noted that Eq. (3.2) is modified to reflect the fact that an 
energy-delay product based repeater insertion methodology is used for this analysis.      
                                            
                              
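\[
\tau_{rc} = \frac{2.2\,\rho\, c\, l^2}{P^2} = \frac{\beta}{f} \qquad \text{for unrepeated wires, and}
\]
\[
\tau_{rc} = \frac{l}{P}\sqrt{2 \rho c R_o C_o}\left(\frac{0.7}{\delta} + 0.7\gamma + \frac{0.4}{\gamma} + 0.7\delta\right) = \frac{\beta}{f} \qquad \text{for repeated wires}
\]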
 
These equations indicate clearly that when clock frequencies are lowered for a logic 
core, the wire pitch reduces for both repeated and unrepeated wires. This helps explain the 
trend shown in Table 3.13.  
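
To make the scaling explicit, the sketch below solves the two delay conditions for pitch. It assumes the delay varies as k1/P² for unrepeated wires and as k2/P for repeated wires, with every other quantity (wire length, resistivity, repeater parameters) held fixed. The constants `k1`, `k2` and `beta` and the function names are placeholders chosen for illustration, not MINDS values.

```python
import math

def pitch_unrepeated(f, beta=0.25, k1=1.0):
    """Pitch P satisfying k1 / P**2 = beta / f, so P grows as sqrt(f)."""
    return math.sqrt(k1 * f / beta)

def pitch_repeated(f, beta=0.25, k2=1.0):
    """Pitch P satisfying k2 / P = beta / f, so P grows linearly with f."""
    return k2 * f / beta

F_REF_GHZ = 10.0
for f_ghz in (10.0, 5.0, 2.5):
    rel_unrep = pitch_unrepeated(f_ghz) / pitch_unrepeated(F_REF_GHZ)
    rel_rep = pitch_repeated(f_ghz) / pitch_repeated(F_REF_GHZ)
    print(f"{f_ghz:4.1f} GHz: unrepeated pitch x{rel_unrep:.2f}, "
          f"repeated pitch x{rel_rep:.2f}")
```

Since the supply and threshold voltages, the wire lengths assigned to each level and the size-dependent resistivity also change between the three designs, the pitches in Table 3.13 would not be expected to follow these idealized ratios exactly.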
When wire pitches are reduced, the area needed for routing wires also reduces. For a 
fixed die area, the number of interconnect levels is decided by the area needed for routing 
wires. Thus, multiple core lower frequency chips with smaller wire pitches would require 
fewer metal levels. Fig. 3.16 reveals that the dual core chip needs 25% fewer 
metal levels for intra-core communication than the single core chip. The quad core chip 
needs 38% fewer metal levels for intra-core communication than the single core chip.  
Table 3.13: Pitches of metal levels for the considered logic cores. 
| | 10GHz 30mm2 80M gate core | 5GHz 30mm2 80M gate core | 2.5GHz 30mm2 80M gate core |
|---|---|---|---|
| M1, M2 pitch | 44nm | 44nm | 44nm |
| M3, M4 pitch | 81nm | 54nm | 47nm |
| M5, M6 pitch | 83nm | 73nm | 64nm |
| M7, M8 pitch | 134nm | 103nm | 97nm |
| M9, M10 pitch | 141nm | 171nm | 262nm |
| M11, M12 pitch | 199nm | 415nm | |
| M13, M14 pitch | 292nm | | |
| M15, M16 pitch | 650nm | | |
  
 
 
 
 
 
 
 
 
 Figure 3.16: Number of interconnect levels required for intra-core wiring. 
 
MINDS provides another interesting result. Table 3.14 reveals that while a benchmark 
1mm wire for the 10GHz single core chip is routed in a 134nm pitch metal level, the same 
wire is routed in a 103nm pitch metal level for the 5GHz dual core chip and in a 47nm pitch 
metal level for the 2.5GHz quad core chip. It is well known that copper resistivity increases 
exponentially with smaller wire dimensions [3.7]. A resistivity calculation using the compact 
models in [3.7] shows a 91% increase in the resistivity of a 1mm wire (from 3.3 to 6.3 µΩ-cm) 
as one moves from the 10GHz single core chip to the 2.5GHz quad core chip. This indicates 
that the impact of copper resistivity increases due to size effects would be more 
pronounced in lower frequency parallel processing architectures.  
 
Table 3.14: Impact of size effects in parallel processing architectures.  
| | 10GHz 30mm2 80M gate core | 5GHz 30mm2 80M gate core | 2.5GHz 30mm2 80M gate core |
|---|---|---|---|
| Pitch of metal level used for routing 1mm wire | 134nm | 103nm | 47nm |
| Resistivity of metal level used for routing 1mm wire | 3.3 µΩ-cm | 3.8 µΩ-cm | 6.3 µΩ-cm |
 
 
3.4.2. Inter-Core Interconnect Networks 
Currently available dual core and eight core chips use a crossbar switch for inter-core 
communication [3.24][3.25]. For this reason, a crossbar is considered in this work as well. 
Publications on the above mentioned dual core chip [3.24] suggest that 144 byte lines are 
needed for its inter-core wiring. The eight core chip [3.25] uses a 134 GB/s crossbar for 
inter-core wiring. In this thesis, it is assumed that each core has a dedicated connection 
from its center to its crossbar. This connection has an aggressive bandwidth of 512GB/s in 
each direction, and is shown in Fig. 3.17. The topmost metal layer pitch for intra-core 
wiring of each chip in the case study is used for routing inter-core busses as well. 
 
 
 
 
 
 
 
 
Figure 3.17: Inter-core communication network of dual core and quad core chips. 
 
The number of wiring levels needed for inter-core wiring of the 5GHz dual core chip is 
obtained as follows: If a is the length of each side of the logic core, the longest wire in the 
stochastic wiring distribution of each core has length 2a. Its bandwidth is 5Gbps, since its 
delay is equal to the clock period and the clock frequency is 5GHz. The bandwidth of each 
wire of the bus from the center of a core to the crossbar (shown in Fig. 3.17) is thus 
10Gbps, since the bus length is a. To get 512GB/s in each direction, we need 
(512GB/s × 8 bits/byte ÷ 10Gbps) × 2 directions ≈ 819 wires. Since each wire has a pitch of 
415nm and a length of a (about 5.5mm for a 30mm2 core) in the dual core case, the area 
needed for each bus to the crossbar, assuming 40% wiring efficiency, is 
819 wires × 415nm × a / 0.4 = 4.7mm2. As the wiring area available 
for each metal level is 30mm2, the bus to the crossbar takes up 16% of each metal level. 
Similar calculations for the quad core chip suggest that the bus from each of its cores to the 
crossbar takes up 39% of a metal level. Thus, as the number of cores increases, the wiring 
overhead of inter-core wiring increases. This overhead, however, is small compared to the 
intra-core wiring, in spite of the aggressive bandwidth chosen for the inter-core wiring 
busses.  
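
The dual core bus estimate above can be reproduced in a few lines. The sketch below is a minimal check that assumes a square 30mm2 core (so that a is the square root of the core area); the constant names are illustrative.

```python
import math

CORE_AREA_MM2 = 30.0           # area of one logic core
BANDWIDTH_GBYTES_PER_S = 512   # required bandwidth per direction
WIRE_RATE_GBPS = 10.0          # per-wire bandwidth for a bus of length a at 5GHz
PITCH_NM = 415.0               # topmost metal pitch of the dual core chip
WIRING_EFFICIENCY = 0.4

side_mm = math.sqrt(CORE_AREA_MM2)  # assumes a square core
wires = BANDWIDTH_GBYTES_PER_S * 8 / WIRE_RATE_GBPS * 2  # both directions
bus_area_mm2 = wires * (PITCH_NM * 1e-6) * side_mm / WIRING_EFFICIENCY
fraction_of_level = bus_area_mm2 / CORE_AREA_MM2

print(f"wires needed:         {wires:.0f}")
print(f"bus area:             {bus_area_mm2:.1f} mm2")
print(f"share of metal level: {100 * fraction_of_level:.0f}%")
```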
 
 Table 3.15: Inter-core communication overheads. 
| | Dual core 5GHz chip | Quad core 2.5GHz chip |
|---|---|---|
| Metal levels for inter-core communication | 16% of one metal level | 39% of one metal level |
| Power overhead of inter-core communication | 6W | 8W |
 
 
Evaluation of the power overhead of the inter-core communication network is 
difficult. It depends on the design of the crossbar switch, the extent of parallelism in the 
software (which decides the activity factor of the buses) and cache coherence protocols. 
The power overhead of inter-core communication for the dual core chip is roughly 
assumed to be 8% of the total power of the cores based on previous work [3.20]. Similar 
estimates obtained for the power overhead of the quad core chip’s inter-core 
communication network are shown in Table 3.15. Although these power estimates are 
approximate and are not rigorously obtained, the error in total power is minimal due to the 
small values of these numbers (when compared to the total power of logic cores obtained in 
Table 3.11). 
3.4.3. Intra-Core and Inter-Core Communication Networks 
Combining the results from Section 3.4.1 and Section 3.4.2, one can obtain the results 
shown in Table 3.16.  
 
Table 3.16: Summary of results. 
| | Single core 10GHz chip | Dual core 5GHz chip | Quad core 2.5GHz chip |
|---|---|---|---|
| Number of metal levels considering both intra-core and inter-core communication | 15.1 | 11.5 | 9.7 |
| Total power considering logic cores and inter-core network | 167W | 82W | 57W |
| Power density | 550W/cm2 | 137W/cm2 | 48W/cm2 |
| Die area | 75mm2 | 105mm2 | 165mm2 |
 
 
It can be observed that the dual core chip requires 24% fewer metal levels than the 
single core chip and the quad core chip requires 35% fewer metal levels than the single 
core chip. Table 3.16 also reveals that the power and power density drop dramatically due 
to reduced frequency parallel processing architectures, as predicted by Chandrakasan, 
Sheng and Brodersen [3.20]. In particular, the single core 10GHz chip has an unacceptable 
power density of 550W/cm2. To put this in perspective, the maximum power density that 
can be cooled with today’s air cooled heat sinks is about 100W/cm2 [3.26]. The dual core 
chip reduces the power density by a factor of almost four while the quad core chip provides 
an order of magnitude reduction in power density. In today’s data centers where power and 
cooling costs are comparable to server hardware costs, this tremendous reduction in power 
and power density provides a significant reduction of cost of ownership of a server. The 
contribution of this work is the finding that the number of metal levels reduces significantly 
with parallel processing architectures. This, in the author’s opinion, is another attractive 
benefit of parallel processing, which along with the power and power density decrease, 
would make parallel processing architectures more attractive. The main drawback is the 
die area increase depicted in Table 3.16. However, the significant reductions in power, 
power density and number of metal levels seem promising enough to overlook this die area 
penalty (at least for the transition from single core to dual core). This is primarily because 
power and cooling costs are comparable to hardware costs in today’s data centers. Another 
interesting result from this study is the increased impact of copper size effects in parallel 
processing architectures.  
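
The die areas and power densities in Table 3.16 are consistent with a simple accounting in which each chip carries 45mm2 of cache plus 30mm2 per core, and power density is taken over the logic core area only. That reading is inferred from the numbers rather than stated explicitly, so the sketch below should be treated as a consistency check with illustrative names.

```python
CACHE_AREA_MM2 = 45.0
CORE_AREA_MM2 = 30.0

# name: (number of cores, total power in W from Table 3.16)
chips = {
    "single core 10GHz": (1, 167.0),
    "dual core 5GHz": (2, 82.0),
    "quad core 2.5GHz": (4, 57.0),
}

def die_area_mm2(cores):
    """Cache area plus one core area per core."""
    return CACHE_AREA_MM2 + cores * CORE_AREA_MM2

def power_density_w_per_cm2(cores, power_w):
    """Total power divided by logic core area (assumed definition)."""
    logic_area_cm2 = cores * CORE_AREA_MM2 / 100.0  # 100 mm2 per cm2
    return power_w / logic_area_cm2

for name, (cores, power_w) in chips.items():
    print(f"{name}: die area {die_area_mm2(cores):.0f} mm2, "
          f"power density {power_density_w_per_cm2(cores, power_w):.0f} W/cm2")
```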
 
3.5. Summary    
Signal interconnect networks are projected to play an important role in future GSI 
chips. For a high-performance logic core constructed in a 22nm technology, signal 
interconnects are predicted to consume 35% of the total power while repeaters are 
predicted to consume 26% of the total power. Interconnects also account for more than 
half of the lithography steps in a CMOS process, and thus have a significant impact on the 
cost of a GSI chip. Several techniques are studied to tackle these issues.  
For a 4GHz 8.2W 22nm logic core with 30M gates and 10 metal levels, carbon 
nanotube interconnects are found to provide (a) 56% reduced power and 39% reduced die 
size, or (b) 43% reduced power and 12% fewer metal levels, or (c) 43% higher frequency 
and 6.5% lower die size. The drawback of using these interconnects is the increased 
process cost of utilizing both carbon nanotube and copper interconnects in various metal 
levels. Of course, this analysis assumes carbon nanotube technology will be mature enough 
to be manufacturable. 
A compact model to minimize energy-delay product of a repeated wire is derived. This 
model has less than 10% error when compared to SPICE simulations. It is also found that 
utilizing uniquely optimized device technologies for logic and repeater transistors is 
beneficial to future GSI chips. To the best of the author’s knowledge, this represents the 
first time this idea has been proposed. The above energy-delay product based repeater 
insertion technique, when used along with uniquely optimized threshold voltage values for 
logic and repeater transistors, is found to provide a 26% reduction in power for a 9% 
increase in number of metal levels for a 22nm 4GHz logic core.  
Architectures where multiple logic cores run at lower frequencies are found to give a 
24%-35% reduction in number of metal levels compared to architectures which have a 
single high-frequency core. It is also found that these lower frequency parallel processing 
architectures are impacted more by size-dependent resistivity increases in copper. 
   
 
CHAPTER 4 
POWER INTERCONNECT NETWORKS 
 
There are three main concerns for on-chip power interconnect network design in GSI 
chips [4.1]: (a) IR drop – This is the resistive drop in the power distribution network. (b) 
Simultaneous switching noise – This represents the potential drop across the inductance of 
the packaging and solder bumps. Decoupling capacitors are typically added to the on-chip 
power distribution network to compensate this drop. (c) Electromigration (EM) – This is a 
reliability issue caused by long-term current flow through copper wires and solder bumps. 
Three-dimensional integrated circuits have higher current densities than two-dimensional 
ones, and therefore face even more significant challenges with the three issues mentioned 
above.  
Power supply noise in integrated circuits is typically restricted to 10% of the supply 
voltage [4.1] due to noise margin considerations. With the reduced supply voltages and 
increased chip currents that accompany CMOS scaling, this goal has become tougher to 
reach due to IR drop and simultaneous switching noise concerns. Electromigration is also 
exacerbated with scaling due to the increased chip currents and feature size reduction for 
on-chip interconnects, vias and solder bumps. This, in turn, leads to higher current 
densities through these components. This chapter is organized as follows. Section 4.1 
provides the summary of a technique proposed by the author to reduce EM in solder bumps 
and on-chip global power distribution networks. Section 4.2 describes a technique to 
reduce IR drop and Ldi/dt noise issues in future GSI chips. Section 4.3 discusses power 
distribution in 3D GSI chips. The chapter concludes with a summary in Section 4.4. The 
work in this chapter has been described by the author in [4.2] and [4.3]. 
 
4.1. Electromigration Resistant Power Delivery Systems    
Current is normally fed from an off-chip DC-DC converter to transistors on the 
processor through a printed wiring board, a package, solder bumps and chip-level global 
interconnects (Fig. 4.1). Electromigration concerns could make the transfer of such high 
currents through many of these interconnections difficult. In fact, the ITRS predicts that 
current density per solder bump would reach 4×10^4 A/cm2 at the 32nm node, and that EM 
would necessitate material changes. In the long term, the ITRS says a new solder 
technology may need to be invented that can tolerate higher current densities. Temperature 
increases in the package and higher levels of on-chip interconnect stacks exacerbate the 
above EM concerns [4.4][4.5]. For example, [4.4] shows experimental data which indicate 
that chip-level solder bump temperature can be 30°C higher than that of the transistors. Fig. 
4.2 is an IR thermal image showing temperature increases in the packaging of an Intel  
microprocessor [4.4]. 
 
 
(a)                                                                 (b) 
 
 
 
Figure 4.1: (a) Current delivery path for a microprocessor. (b) On-chip power distribution 
network. Sleep transistor symbols are drawn as shown for convenience.  
It has been known for the last 10-15 years that EM lifetimes for AC (TTF_AC) are 
several orders of magnitude higher than DC lifetimes (TTF_DC) [4.6][4.7]. Dependence of 
EM on AC frequency f is normally understood with the following model [4.6][4.7]. 
 
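\[
TTF_{AC}
\begin{cases}
= TTF_{DC} & \text{when } f \le \dfrac{1}{2\,TTF_{DC}} \\[1ex]
= 2 f \,(TTF_{DC})^2 & \text{when } f > \dfrac{1}{2\,TTF_{DC}} \\[1ex]
\approx 1000\, TTF_{DC} & \text{for high AC frequencies}
\end{cases}
\]
 …(4.1)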
 
Eq. (4.1) has been confirmed with experimental data for several interconnect and via 
materials such as Cu, Al, Al/Cu and W [4.4][4.5]. Power wires and bumps carrying DC are 
thus the bottleneck for EM lifetimes of a system because signal wires and bumps normally 
carry AC [4.6][4.7]. This work proposes a way to deliver high currents through the bumps 
and chip-level global wiring of a power delivery system with reduced EM. 
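
Eq. (4.1) can be written as a small function. The sketch below is one possible reading: it treats the high frequency limit as a hard ceiling of 1000 times the DC lifetime, which is an assumption made here to join the three regimes. The function name and the `saturation` parameter are illustrative.

```python
def ttf_ac(f, ttf_dc, saturation=1000.0):
    """EM time to failure under bidirectional current, following Eq. (4.1).

    f and ttf_dc must use reciprocal units (for example 1/day and days).
    The ceiling of saturation * ttf_dc is an assumed way of applying the
    high frequency limit.
    """
    if f <= 1.0 / (2.0 * ttf_dc):
        return ttf_dc                      # low frequency: no benefit over DC
    return min(2.0 * f * ttf_dc ** 2, saturation * ttf_dc)

# Example: DC lifetime of 10 years (3650 days), polarity reversed once a day
ttf_dc_days = 3650.0
improvement = ttf_ac(1.0, ttf_dc_days) / ttf_dc_days
print(f"lifetime improvement over DC: {improvement:.0f}x")
```

At the crossover frequency f = 1/(2 TTF_DC) the two low frequency branches give the same value, so the function is continuous there.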
 
 
 
 
 
Figure 4.2: Temperature increases in the packaging of an Intel microprocessor [4.4]. 
 
4.1.1. Proposed Technique 
The power delivery system for a microprocessor can be represented as shown in Fig. 
4.3. On-chip sleep transistors [4.8] T1 and T2 are used to cut off the power supply from 
inactive circuit blocks and reduce leakage power. These sleep transistors are normally 
placed between on-chip global power distribution networks and on-chip local power 
distribution networks as shown in Fig. 4.1(b) [4.8].  It can be seen from Fig. 4.3 that current 
flow through the power delivery system is unidirectional, causing EM concerns. 
 
 
 
 
 
 
 
 
 
 
Figure 4.3: Unidirectional current flow in a standard power delivery system. 
 
The author suggests the alternate power delivery system which is depicted in Fig. 4.4. 
Two on-chip sleep transistors, T3 and T4 are added. Four FETs (T5, T6, T7 and T8) are 
also added at the output of the DC-DC converter. This modified power delivery system 
works as follows: During the first power-up of the processor, T1, T2, T5 and T6 are ON 
while T3, T4, T7 and T8 are OFF. Fig. 4.4(a) indicates the direction of current flow in this 
case.  
  
 
 
(a) 
 
 
 
 
           
 
 
 
 
(b) 
 
 
 
               
Figure 4.4: Schematic of the newly proposed power delivery system when the 
microprocessor is (a) powered up the first time, and (b) powered up the second time. 
Current direction is indicated with arrows. 
When the user shuts down the computer and boots it up again, T1, T2, T5 and T6 are 
turned OFF while T3, T4, T7 and T8 are turned ON as shown in Fig. 4.4(b). This results in 
current through the board, package, solder bumps and on-chip global wires reversing, as 
shown in Fig. 4.4(b). Fig. 4.4 shows that current received by the on-die transistors is 
always in the same direction.  Switching between Fig. 4.4(a) and 4.4(b) continues every 
time the processor is shut down and restarted. Thus, bidirectional current flows through the 
power delivery system and causes EM time-to-failure to improve for solder bumps, 
board/package wiring and on-chip global wires. It should be kept in mind that MOSFETs 
T5, T6, T7 and T8 need high W/L ratios to reduce I²R losses. Sleep transistors T3 and T4 
have the same size as T1 and T2, and therefore take 1%-6% of the die area of the chip 
[4.9][4.10].  
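
The alternation between the two configurations of Fig. 4.4 can be summarized as a simple control rule. The sketch below is illustrative only: how the power-up count would be stored across power cycles is not described in the text, and the function names are hypothetical.

```python
GROUP_A = frozenset({"T1", "T2", "T5", "T6"})  # ON for the 1st, 3rd, ... power-up
GROUP_B = frozenset({"T3", "T4", "T7", "T8"})  # ON for the 2nd, 4th, ... power-up

def transistors_on(boot_count):
    """Return the set of transistors that are ON for a given power-up (1-based)."""
    if boot_count < 1:
        raise ValueError("boot_count starts at 1")
    return GROUP_A if boot_count % 2 == 1 else GROUP_B

def supply_current_sign(boot_count):
    """+1 or -1: current direction through board, package, bumps and global wires."""
    return 1 if boot_count % 2 == 1 else -1

for boot in range(1, 5):
    on = ", ".join(sorted(transistors_on(boot)))
    print(f"power-up {boot}: ON = {on}, current sign = {supply_current_sign(boot):+d}")
```

Over an even number of power-ups the signed current directions sum to zero, which is the bidirectional stress the scheme relies on.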
4.1.2. Measurements 
Assuming (i) the microprocessor is turned on/off once a day, and (ii) DC EM lifetime 
=10 years, Eq. (4.1) predicts that EM lifetimes for the new power delivery system shown in 
Fig. 4.4 could be as much as 1000 times higher than that of the standard power delivery 
system shown in Fig. 4.3. However,  practical constraints such as varying processor on/off 
times, current variations with workload and non-uniform current density across the die 
would reduce benefits of this scheme. It is difficult to estimate how much these factors 
would affect efficiency of the proposed system. However, it should be noted that with a DC 
EM lifetime of 10 years and a processor switched on/off once a day, the total on/off cycles 
is 3650. The law of large numbers [4.11] suggests that a certain amount of averaging of the 
above variations would occur. Since EM with a non-symmetric AC stress depends on 
average DC value [4.7], this could reduce impact of the above non-idealities.   
  
 
        
 (a)                                                                       (b) 
 
 
 
Figure 4.5: (a) Schematic of test structure. (b) Chip fabricated for testing. 
 
A simple experiment was conducted with bumps made of eutectic solder to show 
operation of the proposed EM resistant systems. A structure shown in Fig. 4.5(a) was 
fabricated and 4A of current was passed at 115°C. A 30% resistance increase was chosen as 
the failure criterion. Fig. 4.5(b) is a picture of the chip having the structure shown in Fig. 
4.5(a).  
Testing revealed that the DC lifetime was 65 minutes. The polarity of DC was changed 
periodically as shown in Fig. 4.4. EM lifetimes were obtained and are plotted in Fig. 4.6(a). 
The results indicate that the EM lifetimes with bi-directional current follow Eq. (4.1) quite 
well. SEMs of normal and failed solder bumps are shown in Fig. 4.6(b). It can thus be 
concluded that the proposed power delivery system shown in Fig. 4.4 significantly 
improves EM lifetimes of solder bumps and on-chip global interconnects. The 
improvement in I/O interconnect EM obtained with this technique could enable use of 
novel packaging technologies such as conductive adhesives that suffer from EM issues 
[4.12].  
 
 
(a)                                                                                           (b) 
 
 
 
 
Figure 4.6: (a) Comparison of measurements with Eq. (4.1). (b) SEM image of solder 
bump (i) before failure (ii) after failure. 
 
The two added sleep transistors, T3 and T4, are connected between power and ground 
wirings and consume leakage power. The power consumed by these transistors at the 32nm 
node can be computed based on sleep transistor sizing models given in [4.13] and ITRS 
device technology parameters. Assuming chip power is 100W and sleep transistors are 
high threshold voltage (Vt) devices with 0.15V higher Vt [4.13] than ITRS low Vt 
transistors, the total leakage power consumed by T3 and T4 would be 0.3W. One concern 
with switching power and ground networks as shown in this chapter is that a printed wiring 
board normally has other chips sharing a common power and/or ground connection with 
the processor. These additional chips would need to have their power and ground networks 
switched with sleep transistors to implement this scheme. The large EM benefits possible 
with this technique could encourage its implementation in spite of the above weakness, 
especially for applications that are EM critical, such as automotive electronics [4.14], 3D 
stacked high-performance microprocessors and military electronics.     
4.1.3. Conclusions 
This work, for the first time, proposes interchanging power and ground networks of a 
microprocessor chip every time it is rebooted. This helps improve solder bump EM, a 
serious challenge for future technology generations, by as much as three orders of 
magnitude. This can be achieved without the need for material changes. Joule Heating 
induced EM concerns with solder bumps, on-chip global wires and on-chip global vias can 
also be reduced. This technique could be particularly useful in the fast-growing automotive 
electronics market. Reliability, and not performance, is the driver for integrated circuits 
built for these applications [4.14][4.15]. Chips for automotive applications typically 
operate around 150°C [4.15][4.16], and solder bump EM is an important concern since 
these temperatures are close to the melting point for solder [4.15][4.16].  
High-performance 3D integrated circuits consume significantly high currents as well, and 
would benefit from this technique.  
 
4.2. MIM Power-Ground Plane Decoupling Capacitors    
Currently available high-performance microprocessors typically use grid structures 
for power distribution and MOS capacitors for decoupling power supply noise. 
Unfavorable scaling trends with chip currents, supply voltages, MOS capacitor leakage 
and wire resistance, however, are causing severe challenges to this power distribution 
architecture. The trend towards Silicon-on-Insulator wafers that significantly worsen 
power supply noise is another issue [4.17].  
4.2.1. Proposed Technique 
An improved power distribution architecture that uses on-chip power and ground 
planes separated by a high k dielectric is depicted in Fig. 4.7. The experimental data below 
gives a proof-of-concept demonstration that large area power and ground planes separated 
by a high k dielectric can be manufactured with good yield. A low temperature process was 
used to fabricate and test 3000 1mm x 1mm MIM (metal-insulator-metal) capacitors with a 
Ta2O5 dielectric (dielectric constant=25) and 1um thick copper layers. Measurements 
showed that for a 400nm Ta2O5 dielectric, the capacitance was 47nF/cm2 and the 
breakdown voltage was higher than 100V. Not even a single defect was detected among the 
3000 capacitors tested, indicating excellent quality of the manufacturing process. For a 200 
nm thick dielectric, the measured capacitance was 77nF/cm2 and breakdown occurred at 
5MV/cm. Only one defect was detected among the 3000 capacitors fabricated. I-V curves 
of the measurements are shown in Fig. 4.8(a). 1cm x 1cm and 4cm x 4cm capacitors were 
also fabricated with good yield.  
 
 
                                                                        
                                                                       
(a)                                                                    (b) 
 
 
 Figure 4.7: (a) Proposed MIM  power-ground plane capacitor structure (henceforth 
referred to as MIM plane decaps).  (b) Test structure.  
To test if the dielectric was scalable to thinner dimensions, 2.5nm and 6nm dielectrics 
were deposited in an oxide-metal-silicon structure and probed with a Hg probe. Fig. 4.8(b) 
shows a TEM of the structure. At 1-2V, measured leakage currents were small (of the order 
of 10nA/cm2), indicating scalability of the PVD process to thin dielectric layers. This 
behavior was reproduced at all points on the PVD film indicating its uniformity. This 
experiment indicates that several orders of magnitude improvement in capacitance density 
are possible beyond the 77nF/cm2 value reported above. The requirement 
for this is that sufficiently planar capacitor electrodes are available. The experimental data 
in this section represent the results of the author’s collaboration with Symmorphix Inc, a 
startup company based in Silicon Valley.  
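
For orientation, the ideal parallel-plate capacitance density can be estimated for the dielectric thicknesses mentioned in this section. The sketch below assumes that the dielectric constant of 25 holds at every thickness, which is not established here for the thinnest films; the function name is illustrative. The measured values of 47nF/cm2 and 77nF/cm2 are lower than this ideal estimate.

```python
EPS0_F_PER_M = 8.8541878128e-12  # vacuum permittivity

def cap_density_nf_per_cm2(k, thickness_nm):
    """Ideal parallel-plate capacitance per unit area, in nF/cm2."""
    farads_per_m2 = EPS0_F_PER_M * k / (thickness_nm * 1e-9)
    return farads_per_m2 * 1e9 / 1e4  # F/m2 -> nF/cm2

K_TA2O5 = 25.0  # dielectric constant quoted for Ta2O5 in the text

for t_nm in (400, 200, 6, 2.5):
    density = cap_density_nf_per_cm2(K_TA2O5, t_nm)
    print(f"{t_nm:>5} nm: {density:8.1f} nF/cm2 (ideal)")
```

Under this idealization the density scales inversely with thickness, so a 2.5nm film would exceed the measured 77nF/cm2 by roughly two orders of magnitude.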
               
 
                        
(a)                                                                                  (b) 
 
 
 
Figure 4.8: (a) I-V curve for capacitor with 400nm dielectric. (b) TEM of a 6nm Ta2O5 
layer above a 17nm Ta electrode. 
 
4.2.2. Benefits for Power Delivery  
A power plane is essentially a very fine power grid where the grid segments are right 
next to each other. Using the IR drop equation in [4.1] and comparing the on-chip IR drop 
of a power plane to that of an equal thickness commercial microprocessor global power 
grid [4.16], it is found that the IR drop for the power plane case is about 25% that of the 
grid.  
 
 (a) 
 
 
                     
                                                 
Parameter                 Symbol     Baseline   MIM plane decap
Power grid resistance     Rdie       0.34Ω      0.08Ω
On-die decoupling cap.    Cdie       300nF      Cap. density x 4cm2
Package inductance        Lpackage   3.3pH      3.3pH
Current step              Istep      40A        40A
 
 
 
 
 
 
 
(b)  
 
 
 
Figure 4.9: (a) Power delivery network used for analysis. (b) Reduction in first droop 
magnitude.  
A lumped element power delivery system model (Fig. 4.9) is simulated in SPICE to 
get an approximate estimate of the simultaneous switching noise caused by package 
inductance (called the first droop). Fig. 4.9 shows that an 8% reduction in first droop 
magnitude can be obtained by using a MIM plane decap arrangement with 77nF/cm2 
instead of a grid-based power delivery system with 300nF transistor decoupling 
capacitance [4.18]. A 61% reduction in first droop magnitude can be obtained by using a 
MIM plane decap arrangement with 450nF/cm2.      
4.2.3. Benefits for Clock Distribution  
Fig. 4.10(a) shows a clock wire of a commercial chip [4.18] that has shields to provide 
nearby return paths. When ground planes are present as shown in Fig. 4.10(b), they (i) 
provide nearby return paths for the clock transmission line, allowing coplanar shields to be 
placed further away from the clock wire to reduce capacitance, and (ii) reduce return path 
resistance. Both of these improve latency.  
 
 
(a)                                                                      (b)                     
 
 
 
 
Figure 4.10: (a) Clock wire with grid based power distribution. Shields flank clock wire. 
(b) Clock wire with MIM plane decaps. 
A section of a 2GHz clock tree with 2.5mm of the above wires, in which an inverter 
drives a fan-out of two, is considered. Drivers are modeled in SPICE using 180nm device 
models. RAPHAEL, a CAD tool for parasitic extraction [4.19], is used for finding the 
interconnect RLC parameters. Table 4.1 gives a summary of the resistance, capacitance 
and inductance per unit length of the configurations in Fig. 4.10(a) and Fig. 4.10(b). SPICE 
simulations show that the latency of that particular section of the clock tree is about 5% 
better for the clock wire with MIM plane decaps. 
 
Table 4.1: Electrical characteristics of clock wire. 
Configuration   Resistance per mm   Capacitance per mm   Inductance per mm
Fig. 4.10(a)    12.1Ω               0.37pF               0.35nH
Fig. 4.10(b)    11Ω                 0.33pF               0.58nH
 
 
4.2.4. Benefits for Signal Interconnects 
Two global bus configurations (Fig. 4.11) are compared in SPICE with Partial 
Element Equivalent Circuit (PEEC) models [4.1] for the RLC wires. Fig. 4.11(b) uses 
power planes, so coplanar shields are not needed, unlike the power grid case in Fig. 
4.11(a). This enables a designer to have increased spacing between signal wires for the 
same cross-sectional area of the bus. Thus, wire capacitance for Fig. 4.11(b) is reduced 
compared to Fig. 4.11(a). Bus length is 2.8mm and each wire is assumed to have an 80fF 
load capacitance. Signal wires in Fig. 4.11(a) have 100Ω resistive drivers. SPICE 
simulations predict worst case crosstalk noise for Fig. 4.11(a) to be 0.38*Vdd. To have the 
same peak crosstalk noise, the driver resistance for the wires in Fig. 4.11(b) is found to be 
170Ω. SPICE simulations show that the delay of the global busses in Fig. 4.11(a) and Fig. 
4.11(b) are 112ps and 87ps respectively. These simulations also indicate energy per bit for 
a 180 nm technology is 22% lower for a wire in Fig. 4.11(b).   
 
 
 
(a)                                                                          
 
 
 
 
 
 
 
(b) 
  
 
 
 
Figure 4.11: (a) Conventional grid based configuration with coplanar shields in blue. (b) 
Signal wires with MIM plane decaps. 
4.2.5. Conclusions 
This work suggests using power and ground planes separated by high-k dielectrics in 
interconnect levels of high-performance microprocessors. Such structures could reduce IR 
drop by a factor of 4 and decrease simultaneous switching noise caused by package 
inductance by 8%-60% in power interconnect networks of high-performance 
microprocessors. This MIM plane decap structure also provides important benefits for 
signal and clock interconnects. Prototypes of this structure have been fabricated. Testing 
reveals it is possible to fabricate such MIM plane decaps with good yield. Symmorphix 
Inc., a startup company based in Sunnyvale, was responsible for the experimental work on 
this project while the author was responsible for the theoretical analysis. Interestingly, 
when this work was in progress, Freescale Semiconductor conducted research on using 
high-k capacitors in interconnect levels with power grids [4.20]. Such structures are now 
part of Freescale Semiconductor’s high-performance microprocessor products.  
 
4.3. Power Delivery for 3D Integrated Circuits 
As mentioned previously, power delivery is a serious concern for 3D integrated 
circuits. If a 3D integrated circuit consists of four 100A 1V chips (for example), it would 
consume a total current of 400A at 1V. Delivering power to such a stack conventionally 
would cause many challenges with simultaneous switching noise, electromigration and IR 
drop. One solution to this problem involves having a MIM plane decap structure for each 
chip in the 3D stack as discussed in Section 4.2. The 3D integrated circuit could also have 
its power and ground networks interchanged periodically to control electromigration as 
described in Section 4.1.  
A number of approaches to ameliorate power delivery issues of 2D integrated circuits 
have been recently proposed in the literature. These include clock-data compensation 
[4.21], use of active filtering circuits [4.22], placement of the DC-DC converter closer to 
the IC [4.23][4.24] and embedded decoupling capacitance in the package/board [4.25]. 
These techniques are applicable to 3D integrated circuits as well. 
 
4.4. Summary 
This chapter describes two techniques to ameliorate power interconnect concerns in 
future high-performance 2D and 3D integrated circuits. The first technique involves 
periodic reversal of current direction in power delivery systems to enhance 
electromigration lifetimes of solder bumps and on-chip global power interconnect 
networks by up to three orders of magnitude. The second technique involves fabrication of 
power and ground planes separated by a high-k dielectric in global interconnect levels. 
Resistive drop in the global power interconnect network is reduced by a factor of 4 due to 
the low resistance of power planes. Simultaneous switching noise due to package 
inductance is reduced by 8% to 60% depending on the decoupling capacitance integrated in 
this structure.  
CHAPTER 5 
CO-DESIGN OF SIGNAL, POWER, CLOCK AND THERMAL 
INTERCONNECT NETWORKS 
 
The interconnect stack and die size of present-day integrated circuits are typically 
optimized using a CAD tool such as MINDS as described in Chapter 3.  While this 
approach is reasonably accurate for older technologies, sub-90nm chips are significantly 
interconnect limited and bring up several issues. These include: 
• Power distribution networks took up more than 25% of all wiring tracks in a 180nm 
microprocessor [5.1], and are expected to consume a bigger percentage of total 
wiring tracks with scaling [5.2]. Power distribution networks thus need to be 
modeled rigorously and have to be co-optimized along with signal/clock wiring and 
via blockage. 
• Currently available stochastic wire length distributions show significant error when 
compared to actual data. For example, the commonly used Davis distribution [5.3] 
shows as much as 38% error with respect to measurement data for circuit blocks 
analyzed later in this chapter. More accurate wire length estimates are needed. 
• Via blockage can take up as much as 10-30% of the total wiring area for some metal 
levels [5.4]. Assignment of wires in multiple interconnect levels should be done 
with via blockage considerations in mind. 
• Global interconnect pitch needs to be selected based on signal, power and clock 
wiring considerations.    
• Repeater leakage power is substantial as discussed in Chapter 3, and needs to be 
considered when repeater insertion is performed. 
• Wire resistivity increases due to size effects [5.5] need to be modeled. 
This chapter presents IntSim, a GUI based CAD tool that helps answer the above 
concerns and thereby enables better optimization of sub-90nm interconnect networks. 
After presenting a new stochastic wire length distribution model, this chapter describes 
logic gate sizing in IntSim. Following this, global, local and intermediate/semi-global 
interconnect optimization in IntSim are described. The algorithm used to combine together 
all these models is then presented. Results from IntSim are compared with data from a 
commercial microprocessor and several case studies are presented to show how IntSim can 
be used. The work done in this chapter was published by the author in [5.6]. 
   
5.1. Models  
This section describes various models utilized in the CAD tool developed by the 
author. 
5.1.1. New Stochastic Signal Wire Length Distribution Model    
Several publications have discussed stochastic wiring distributions [5.7]. The Davis 
distribution [5.3], which is considered one of the most accurate [5.8], assumes gates are 
uniformly distributed all over the chip and then finds a distribution of wire lengths using 
Rent’s rule. This assumption leads to a sea of uniformly distributed gates with uniform 
white space between them. In general, however, this assumption of uniformly distributed 
gates is not valid. A look at the layout of an integrated circuit shows, in most cases, that 
gates are placed close to each other with little blank space between them. The derivation of 
a new wire length distribution that considers random arrangement of gates in a circuit 
block is discussed in this section. Comparison with actual data later in this document shows 
that error is reduced substantially compared to the Davis distribution. 
For the purpose of this derivation, the author defines a new quantity called a gate 
socket. Any chip is considered to have many gate sockets, some of which are occupied by 
gates, as shown in Fig. 5.1. The number of gate sockets Nsockets is related to the number of 
gates Ngates by the relation: 
$$N_{gates} = N_{sockets}\, p_{gates} \tag{5.1}$$
where pgates is the fraction of die area that is occupied by logic gates. For example, a chip 
with 10 million gates and 50% of the die area occupied by logic gates [5.9] would have 20 
million gate sockets, with gates randomly distributed in 10 million of them. If Nsockets 
calculated with Eq. (5.1) is not an integer, it is rounded off to the nearest integer as an 
approximation. 
 
 
                                          
                            Figure 5.1: An illustration of the gate socket concept. 
The expected number of interconnects of a certain length l is given as the product of 
M(l), the number of gate socket pairs separated by a distance l, and Iexp(l), the average 
number of interconnects between a gate socket pair separated by l [5.3]. 
$$i(l) = M(l)\, I_{exp}(l) \tag{5.2}$$
The number of gate socket pairs separated by a distance l is similar to Davis’ 
derivation [5.3] of the number of gate pairs separated by a distance l. Therefore,  
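$$M(l) = \begin{cases} \dfrac{l^3}{3} - 2 l^2 \sqrt{N_{sockets}} + 2 l N_{sockets}, & 1 \le l < \sqrt{N_{sockets}} \\[2ex] \dfrac{1}{3}\left(2\sqrt{N_{sockets}} - l\right)^3, & \sqrt{N_{sockets}} \le l < 2\sqrt{N_{sockets}} \end{cases} \tag{5.3}$$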
It should be noted that the value of l is in gate socket lengths. A gate socket length is 
defined as the distance between two adjacent gate sockets and is equal to (Die 
area/Nsockets)^0.5. Davis [5.3] defines gate pitch as (Die area/Ngates)^0.5. A gate socket length is 
thus equal to (Ngates/Nsockets)^0.5 = pgates^0.5 gate pitches.  
 
                 
Figure 5.2: Block definitions to find average number of wires between a gate socket pair. 
The average number of interconnects between a gate socket pair separated by l is given 
by: 
$$I_{exp}(l) = P(\text{Gate in block A}) \cdot \frac{I_{A-to-C}}{N_C} \tag{5.4}$$
where P(Gate in block A) is the probability that block A of Fig. 5.2 is occupied by a gate, 
IA-to-C is the average number of interconnects connecting block A to block C and NC is the 
number of gates in block C. 
$$P(\text{Gate in block A}) = \frac{N_{gates}}{N_{sockets}} = p_{gates} \tag{5.5}$$
From the Davis derivation [5.3], 
$$I_{A-to-C} = \alpha k \left[ (N_A + N_B)^p - N_B^p + (N_B + N_C)^p - (N_A + N_B + N_C)^p \right] \tag{5.6}$$
where f.o. is the average fan-out of the system, α=f.o./(f.o.+1), k and p are Rent’s constants and 
NA, NB are the number of gates in blocks A and B respectively. If gates are randomly 
distributed in gate sockets, the following approximations hold from Fig. 5.2.                   
$$N_A = 1, \qquad N_B = p_{gates}\, l\,(l-1), \qquad N_C = 2\, p_{gates}\, l \tag{5.7}$$
Combining Equations (5.2), (5.3), (5.4), (5.5), (5.6), (5.7) and normalizing, we get the 
average number of interconnects of length l gate socket lengths to be: 
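$$i(l) = \begin{cases} \dfrac{\alpha k}{2}\, \Gamma \left( \dfrac{l^3}{3} - 2 l^2 \sqrt{N_{sockets}} + 2 l N_{sockets} \right) l^{2p-4}, & 1 \le l < \sqrt{N_{sockets}} \\[2ex] \dfrac{\alpha k}{6}\, \Gamma \left( 2\sqrt{N_{sockets}} - l \right)^3 l^{2p-4}, & \sqrt{N_{sockets}} \le l < 2\sqrt{N_{sockets}} \end{cases} \tag{5.8}$$
where
$$\Gamma = \frac{2 N_{gates} \left( 1 - N_{gates}^{p-1} \right)}{-N_{sockets}^{p}\, \dfrac{1 + 2p - 2^{2p-1}}{p\,(p-1)(2p-1)(2p-3)} - \dfrac{1}{6p} + \dfrac{2\sqrt{N_{sockets}}}{2p-1} - \dfrac{N_{sockets}}{p-1}}$$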
 
The average wire length for this interconnect distribution is 
$$L_{avg}\ (\text{in gate socket lengths}) = \frac{\int_1^{2\sqrt{N_{sockets}}} l\, i(l)\, dl}{\int_1^{2\sqrt{N_{sockets}}} i(l)\, dl} = \frac{\sqrt{N_{sockets}}}{p - 0.5} \cdot \frac{\dfrac{p-0.5}{p} - \sqrt{N_{sockets}} - \dfrac{p-0.5}{6\sqrt{N_{sockets}}\,(p+0.5)} + N_{sockets}^{p}\, \dfrac{-p - 1 + 4^{p-0.5}}{2\,(p+0.5)\,p\,(p-1)}}{-N_{sockets}^{p}\, \dfrac{1 + 2p - 2^{2p-1}}{p\,(p-1)(2p-1)(2p-3)} - \dfrac{1}{6p} + \dfrac{2\sqrt{N_{sockets}}}{2p-1} - \dfrac{N_{sockets}}{p-1}}$$
For a large number of gates and p>0.5, this expression can be simplified to 
$$L_{avg}\ (\text{in gate pitches}) = p_{gates}^{1-p}\, N_{gates}^{p-0.5} \left( \frac{1 + p - 4^{p-0.5}}{2p\,(p-0.5)(p+0.5)} \right) \tag{5.9}$$
When gates are uniformly distributed over the die area, Davis derived the expression 
for average wire length to be: 
$$L_{avg}\ (\text{in gate pitches}) = N_{gates}^{p-0.5} \left( \frac{1 + p - 4^{p-0.5}}{2p\,(p-0.5)(p+0.5)} \right) \tag{5.10}$$
   Average wire length with the new wiring distribution is therefore the Davis average 
wire length multiplied by a factor that depends on the Rent’s constant p and the fraction of 
total die area occupied by logic gates. Most typical circuit blocks have 50-75% of the total 
die area occupied by logic gates [5.9].  Fig. 5.3 shows a comparison of measured average 
lengths and average lengths predicted by the Donath distribution [5.3], the Davis 
distribution and the new distribution for 22 ISCAS’89 circuit blocks. Rent’s constants and 
number of gates for these benchmark circuits are obtained from [5.10]. While the Donath 
distribution and the Davis distribution have an average error of 75% and 38% with respect 
to actual data respectively, the new model has an error between 8% and 24% corresponding 
to values of pgates ranging from 0.5 to 0.75.   
  
 
Figure 5.3: Validation of average wire lengths with new model with actual data from 22 
ISCAS’89 circuit blocks. Average error of: Donath distribution = 75%, Davis distribution 
= 38%, New distribution = 8%-24%. 
 
Table 5.1 shows a comparison of average wire length obtained from measurements 
with values predicted by the Davis distribution and the new distribution, for benchmark 
circuits provided by Davis in [5.11]. It can be seen that while the Davis distribution has an 
average error of 26% for these circuits, the new distribution has average errors of only 
2%-12 %. 
Table 5.1: Validation of model with actual data for average wire length. 
Number     Rent’s       Actual   Davis average   New model with   New model with
of gates   constant p   data     length          pgates=0.5       pgates=0.75
2146       0.75         3.53     5.26            4.37             4.87
576        0.75         2.98     3.9             3.22             3.6
528        0.59         2.20     3.12            2.44             2.79
671        0.57         2.63     3.12            2.45             2.82
1239       0.47         2.14     2.96            2.26             2.64
73         0.667        2.00     2.35            1.89             2.14
78         0.667        2.27     2.38            1.91             2.17
72         0.667        1.88     2.34            1.88             2.13
252        0.667        2.73     2.96            2.39             2.71
236        0.667        2.198    2.93            2.36             2.67
237        0.667        2.887    2.93            2.36             2.67
55         0.667        1.579    2.23            1.79             2.03
59         0.667        1.38     2.25            1.81             2.06
62         0.667        2.08     2.28            1.83             2.08
Average error                    26%             2%               12%
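
The closed-form average length can be checked numerically against the rows of Table 5.1. The Python sketch below is only an illustration and is not part of IntSim: the function names are hypothetical, the closed form is written as the ratio of the two integrals of l.i(l) and i(l) worked out term by term, and it assumes p is not equal to 0.5, 1 or 1.5. The gate socket model is applied by evaluating the same expression on Nsockets sites and converting socket lengths to gate pitches with the factor pgates^0.5. For the first row of the table (2146 gates, p=0.75) the sketch returns values close to the tabulated 5.26, 4.37 and 4.87.

```python
import math


def davis_avg_length(n, p):
    """Average wire length, in site pitches, for n uniformly placed sites."""
    root_n = math.sqrt(n)
    # Integral of l*i(l), scaled by (p - 0.5)/sqrt(n)
    numerator = (
        (p - 0.5) / p
        - root_n
        - (p - 0.5) / (6.0 * root_n * (p + 0.5))
        + n**p * (-p - 1.0 + 4.0 ** (p - 0.5)) / (2.0 * (p + 0.5) * p * (p - 1.0))
    )
    # Integral of i(l); the same expression appears in the denominator of Gamma
    denominator = (
        -(n**p) * (1.0 + 2.0 * p - 2.0 ** (2.0 * p - 1.0))
        / (p * (p - 1.0) * (2.0 * p - 1.0) * (2.0 * p - 3.0))
        - 1.0 / (6.0 * p)
        + 2.0 * root_n / (2.0 * p - 1.0)
        - n / (p - 1.0)
    )
    return root_n / (p - 0.5) * numerator / denominator


def socket_avg_length(n_gates, p, p_gates):
    """Average wire length in gate pitches when gates fill a fraction p_gates of sockets."""
    n_sockets = round(n_gates / p_gates)      # Eq. (5.1), rounded to an integer
    # One gate socket length equals p_gates**0.5 gate pitches
    return math.sqrt(p_gates) * davis_avg_length(n_sockets, p)


if __name__ == "__main__":
    print(davis_avg_length(2146, 0.75))         # Davis distribution
    print(socket_avg_length(2146, 0.75, 0.5))   # new model, pgates = 0.5
    print(socket_avg_length(2146, 0.75, 0.75))  # new model, pgates = 0.75
```

Writing the socket model as a thin wrapper around the Davis expression keeps the only difference between the two models visible: the number of sites and the length unit.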
 
 
Fig. 5.4 shows how the new wiring distribution differs from the Davis distribution for 
a 36 sq. mm circuit block with 12 million gates, pgates = 0.5, average fan-out = 3, and Rent’s 
constants k=4 and p=0.55.  Equations (5.9) and (5.10) suggest average length for the new 
distribution would be 27% less than the average length for the Davis distribution. While the 
log scale plot in Fig. 5.4(a) indicates only a small difference for short lengths, the linear 
scale plot in Fig. 5.4(b) shows a noticeable difference and captures the trend of the wiring 
distribution moving towards shorter lengths.  
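
As a quick check of the 27% figure, dividing Eq. (5.9) by Eq. (5.10) leaves only the prefactor that depends on pgates and p. With the values used for this block (pgates = 0.5, p = 0.55):

$$\frac{L_{avg}^{new}}{L_{avg}^{Davis}} = p_{gates}^{1-p} = 0.5^{0.45} \approx 0.73$$

so the new distribution predicts an average length roughly 27% shorter than the Davis distribution.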
 
          
             
 
(a) 
 
 
 
 
 
 
 
 
(b) 
 
 
 
 
Figure 5.4: (a) Comparison of new distribution with Davis distribution in the log scale. (b) 
Comparison of new distribution with Davis distribution in the linear scale for short lengths. 
5.1.2. Logic Gate Model  
Logic gates are modeled as two input NAND gates and are sized based on average 
wire length estimates provided by the new wiring distribution. If W is the device width, the 
delay of a logic path having two input NAND gates driving a fan-out f.o. is given by [5.9]: 
$$t_d = 0.7\, L_d\, \frac{R_{NAND}}{W} \left( f.o. \cdot C_{NAND}\, W + \chi \cdot f.o. \cdot C_{int} \right) \tag{5.11}$$
where Ld is the logic depth, χ = 4/(f.o.+3) is a factor that converts point-to-point net 
length to wiring net length, RNAND is the average drive resistance of a minimum size 2 input 
NAND gate, CNAND is the input capacitance of the NAND gate and Cint is the capacitance of 
an average wire. CNAND is computed assuming nMOS and pMOS devices are sized equally 
in a 2 input NAND gate, while RNAND  is obtained from equations given in [5.9]. If c is the 
capacitance per unit length of a wire, A is the die area and F is the feature size, the area of a 
NAND gate of width W is given by 20.4(7.3+W)F^2 [5.3]. Eq. (5.9) indicates
$$\begin{aligned} C_{int} = c\, L_{avg} &= c\, p_{gates}^{1-p}\, N_{gates}^{p-0.5} \left( \frac{1 + p - 4^{p-0.5}}{2p\,(p-0.5)(p+0.5)} \right) \sqrt{\frac{A}{N_{gates}}} \\ &= c \left( \frac{20.4\,(7.3 + W)\, F^2}{A} \right)^{1-p} \left( \frac{1 + p - 4^{p-0.5}}{2p\,(p-0.5)(p+0.5)} \right) \sqrt{A} \end{aligned} \tag{5.12}$$
It is interesting to note that unlike with previous wiring distributions, the average wire 
length with the new distribution is a function of logic gate size. Essentially, it 
means if the die area is fixed and we use smaller size gates, they can be placed closer to 
each other, and so average wire lengths would reduce. If we define a constant  
$$k_1 = c \sqrt{A} \left( \frac{20.4\, F^2}{A} \right)^{1-p} \left( \frac{1 + p - 4^{p-0.5}}{2p\,(p-0.5)(p+0.5)} \right)$$
Eq. (5.11) becomes: 
$$t_d = 0.7\, L_d\, \frac{R_{NAND}}{W} \left( f.o. \cdot C_{NAND}\, W + \chi \cdot f.o. \cdot k_1 (7.3 + W)^{1-p} \right) \tag{5.13}$$
The delay expression in Eq. (5.13) is equated to (1-margin)/f for finding gate size 
where f is the frequency and margin is the fraction of a clock cycle that constitutes skew 
and variability. 
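
The sizing step amounts to solving a one-dimensional equation in W. A minimal Python sketch is given below, assuming W is treated as a normalized width and that the delay of Eq. (5.13) falls monotonically as W grows (which holds for p > 0), so bisection suffices. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from IntSim.

```python
def path_delay(w, l_d, fan_out, r_nand, c_nand, k1, p):
    """Delay of Eq. (5.13) for a gate of normalized width w."""
    chi = 4.0 / (fan_out + 3.0)              # point-to-point to net-length factor
    c_int = k1 * (7.3 + w) ** (1.0 - p)      # average wire capacitance, Eq. (5.12)
    return 0.7 * l_d * (r_nand / w) * (fan_out * c_nand * w + chi * fan_out * c_int)


def size_gate(target, l_d, fan_out, r_nand, c_nand, k1, p, w_lo=1.0, w_hi=1.0e4):
    """Smallest width in [w_lo, w_hi] whose path delay meets the target (bisection)."""
    args = (l_d, fan_out, r_nand, c_nand, k1, p)
    if path_delay(w_hi, *args) > target:
        raise ValueError("target delay cannot be met within the width range")
    if path_delay(w_lo, *args) <= target:
        return w_lo
    for _ in range(200):                     # interval halves on every pass
        mid = 0.5 * (w_lo + w_hi)
        if path_delay(mid, *args) > target:
            w_lo = mid                       # too slow: gate must be wider
        else:
            w_hi = mid                       # fast enough: try a smaller gate
    return w_hi


# Illustrative (hypothetical) numbers, not taken from the dissertation.
params = dict(l_d=20, fan_out=3, r_nand=10e3, c_nand=0.1e-15, k1=2e-15, p=0.6)
target = (1 - 0.2) / 2e9                     # (1 - margin) / f with margin=0.2, f=2GHz
width = size_gate(target, **params)
print(width, path_delay(width, **params))
```

Because the delay approaches 0.7.Ld.RNAND.f.o..CNAND as W grows, a target below that floor has no solution; the sketch raises an error in that case rather than returning a width.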
5.1.3. Global Interconnect Model    
The choice of global interconnect pitch significantly impacts performance of signal, 
power and clock interconnect networks [5.1]. In this section, a compact physical model for 
global wire pitch is derived by considering all three types of interconnect networks. 
The existing literature for global wire pitch models consists of [5.12], which considered 
only signal interconnects while deciding global interconnect pitch, and [5.13], which gave 
a design plane for choice of global interconnect pitch without providing a single optimal 
value that could be used in a technology.    
 
       
       Figure 5.5: Cross-section of global wiring layers for an integrated circuit. 
 
Fig. 5.5 shows two thick global metal layers of a certain integrated circuit. An 
expression for IR drop of a uniform power grid with flip chip packaging is given by [5.2] 
$$V_{IR} = \frac{\rho\, I_T\, d_{pad\_to\_pad}}{2\pi\, N \cdot AR \cdot k_p\, P^2} \ln\left( \frac{0.65\, d_{pad\_to\_pad}}{l_{pad}} \right) \tag{5.14}$$
where ρ is the wire resistivity, dpad_to_pad is the distance between two adjacent power 
pads, lpad is the length of a pad, IT is the current distributed per pad and N is the number of 
power grid segments between two pads. The total area occupied by the power distribution 
in two orthogonal global wire levels can be estimated using Fig. 5.6. 
 
 
 
 
 
 
 
 
 
Figure 5.6:  Top view of a global power grid between four pads. 
 
Essentially, the area occupied by a single power wire is (kpP+P/2).dpad_to_pad and there 
are N segments between two pads. Therefore, the area occupied by power wires on two 
orthogonal metal levels is 2.N.(kpP+P/2).dpad_to_pad. Since the number of power pads is 
Npower_pads, the total area occupied by power wiring is               
$$A_{power} = N_{power\_pads} \cdot 2N \left( k_p P + \frac{P}{2} \right) d_{pad\_to\_pad} \tag{5.15}$$
Since the global wire area of clock 
wires is negligible [5.14], the area occupied by global signal, power and clock wires is: 
$$A_{total} = 2 A_{power} + l_{signal}\, P \tag{5.16}$$
Note that lsignal is the total length of global signal interconnects and 2Apower is 
considered since power and ground networks have equal areas. Using Equations (5.14), 
(5.15) and (5.16),  
 
$$A_{total} = 2\,(k_p + 0.5)\, N_{power\_pads}\, \frac{\rho\, I_T\, d_{pad\_to\_pad}^2}{\pi \cdot AR \cdot k_p\, V_{IR}} \ln\left( \frac{0.65\, d_{pad\_to\_pad}}{l_{pad}} \right) \frac{1}{P} + l_{signal}\, P \tag{5.17}$$
 Also, Atotal=ew.Adie where ew is a wiring efficiency factor and Adie is the die area. Using 
the above two equations,  
 
$$l_{signal} = \frac{e_w A_{die}}{P} - 2\,(k_p + 0.5)\, N_{power\_pads}\, \frac{\rho\, I_T\, d_{pad\_to\_pad}^2}{\pi \cdot AR \cdot k_p\, V_{IR}} \ln\left( \frac{0.65\, d_{pad\_to\_pad}}{l_{pad}} \right) \frac{1}{P^2}$$
 
Since a designer typically wants to route as many signal wires in the global wire levels 
as possible, dlsignal/dP=0. This gives 
$$P = 4\,(k_p + 0.5)\, N_{power\_pads}\, \frac{\rho\, I_T\, d_{pad\_to\_pad}^2}{e_w A_{die}\, \pi \cdot AR \cdot k_p\, V_{IR}} \ln\left( \frac{0.65\, d_{pad\_to\_pad}}{l_{pad}} \right) \tag{5.18}$$
The global wire pitch is also a function of clock wiring. Based on [5.15], the maximum 
distance a buffer can drive four similar buffers (with acceptable slew) through a tapered H 
Tree is a good metric for how well a clock tree operates. Fig. 5.7, Fig. 5.8 and Fig. 5.9 
reveal that the H tree in Fig. 5.7 is equivalent to the wire in Fig. 5.9.  
The slew for the wire [5.16] in Fig. 5.9 is  
 
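$$\begin{aligned} Slew &= \frac{t_{90\%} - t_{10\%}}{0.8} \\ &= 1.1 \left( \frac{r}{2} \right) (2c)\, D^2 + 2.75 \left[ 4 R_o C_o + \frac{R_o}{W}\, 2 c D + \left( \frac{r}{2} \right) D \left( 4 C_o W \right) \right] \end{aligned} \tag{5.19}$$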
To get the buffer size that gives the best-case slew, we set d(Slew)/dW=0 and 
substitute the obtained W in Eq. (5.19) to get 
$$Slew_{best} = 1.1\, r c D^2 + 11 \left( R_o C_o + D \sqrt{R_o C_o\, r c} \right) \tag{5.20}$$
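
The intermediate step can be written out as follows, assuming the slew expression of Eq. (5.19), in which only the driver term and the load term depend on W:

$$\frac{d(Slew)}{dW} = 2.75 \left( -\frac{2 R_o c D}{W^2} + 2 r C_o D \right) = 0 \quad \Rightarrow \quad W = \sqrt{\frac{R_o c}{r C_o}}$$

$$2.75 \left( 4 R_o C_o + \frac{2 R_o c D}{W} + 2 r C_o D W \right) \Bigg|_{W = \sqrt{R_o c / (r C_o)}} = 11 \left( R_o C_o + D \sqrt{R_o C_o\, r c} \right)$$

Adding the wire term 1.1rcD^2, which does not depend on W, gives the best-case slew of Eq. (5.20).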
 
 
 
 
             
 
Figure 5.7: A tapered H tree that is typically used for clock distribution purposes. The wire 
resistance per unit length is denoted as r and the wire capacitance per unit length is denoted 
as c.  Due to symmetry, x1, x2, x3 and x4 are equipotential points. 
  
 
 
 
 
 
 
 
Figure 5.8: The equipotential points from Fig. 5.7 are merged in this figure. Distributed RC 
wire models show the equivalence between the top and bottom configurations in this figure. 
Due to symmetry, y1 and y2 are equipotential points. 
 
 
 
 
 
 
 
 
Figure 5.9: Equipotential points from Fig. 5.8 are merged in this figure. 
The maximum drive distance is obtained when the best-case slew in Eq. (5.20) just 
meets a particular slew requirement fixed by a designer i.e. Slewbest=β/f where f is the clock 
frequency and β is normally 0.15-0.25. Using this equation, the fact that r=ρ/(kcP^2) and Eq. 
(5.20), we get 
$$P = \frac{D}{2} \sqrt{\frac{\rho\, c}{k_c R_o C_o}} \left[ \frac{\sqrt{\dfrac{4.4\, \beta}{f R_o C_o} + 72.6} + 11}{\dfrac{\beta}{f R_o C_o} - 11} \right] \tag{5.21}$$
In Eq. (5.21), c is a function of ratios of wire dimensions [5.17] and therefore does not 
change with wire pitch as long as all wire dimensions are scaled. From Eq. (5.18) and Eq. 
(5.21),  
 
$$P = \mathrm{Max} \left[ \; 4\,(k_p + 0.5)\, N_{power\_pads}\, \frac{\rho\, I_T\, d_{pad\_to\_pad}^2}{e_w A_{die}\, \pi \cdot AR \cdot k_p\, V_{IR}} \ln\left( \frac{0.65\, d_{pad\_to\_pad}}{l_{pad}} \right), \;\; \frac{D}{2} \sqrt{\frac{\rho\, c}{k_c R_o C_o}} \left( \frac{\sqrt{\dfrac{4.4\, \beta}{f R_o C_o} + 72.6} + 11}{\dfrac{\beta}{f R_o C_o} - 11} \right) \right] \tag{5.22}$$
To verify this equation, the parameters in Eq. (5.22) are chosen based on published 
data from a commercial microprocessor [5.1] and values from the ITRS as:  
 
 
kp=2, Npower_pads=2100, ρ=2.2x10^-8 ohm-m, IT=0.13, ew=0.4, dpad_to_pad=400um, Adie=400 sq. mm, 
AR=0.6, VIR=30mV, lpad=100um, D=2mm, c=4x10^-10 F/m, kc=1.9, RoCo=1.85ps, β=0.25, and f=1.3GHz.  
This gives P=max(0.5um,1.35um)=1.35um. This result matches well with data from 
that particular microprocessor, which had a pitch of 1.26um [5.1].  
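
The power-limited term of Eq. (5.22), i.e. the pitch of Eq. (5.18), can be evaluated directly from the parameters listed above. The Python sketch below is an illustration only: the function name is hypothetical, all quantities are converted to SI units, and IT is assumed to be in amperes since the text lists it without a unit. With these assumptions the result is close to the 0.5um quoted for the first argument of the max.

```python
import math


def power_limited_pitch(k_p, n_power_pads, rho, i_t, d_pad, e_w, a_die, ar, v_ir, l_pad):
    """Global wire pitch (m) that maximizes signal wire length, Eq. (5.18)."""
    log_term = math.log(0.65 * d_pad / l_pad)
    numerator = 4.0 * (k_p + 0.5) * n_power_pads * rho * i_t * d_pad**2 * log_term
    denominator = e_w * a_die * math.pi * ar * k_p * v_ir
    return numerator / denominator


pitch = power_limited_pitch(
    k_p=2,
    n_power_pads=2100,
    rho=2.2e-8,        # ohm-m
    i_t=0.13,          # current per pad, assumed to be in amperes
    d_pad=400e-6,      # 400um pad-to-pad distance, in m
    e_w=0.4,
    a_die=400e-6,      # 400 sq. mm expressed in m^2
    ar=0.6,
    v_ir=30e-3,        # 30mV IR drop budget, in V
    l_pad=100e-6,      # 100um pad length, in m
)
print(f"power-limited pitch = {pitch * 1e6:.2f} um")
```

The clock-limited term of Eq. (5.22) is not evaluated here; the global pitch is the larger of the two terms.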
5.1.4. Local Interconnect Model    
 IntSim has two wire levels for routing local signal, power and clock wiring. Local 
interconnect pitch, Plocal, is selected as 2F, where F is the feature size [5.17]. The length of 
the longest wire routed in local interconnect levels, lmax, is obtained from [5.17] as              
 
$$2\, e_w A = \chi\, P_{local} \sqrt{\frac{A}{N_{sockets}}} \int_1^{l_{max}} l\, i(l)\, dl \tag{5.23}$$
Essentially, the left hand side of Eq. (5.23) represents the area available for routing 
wires in the two local interconnect levels, and the right hand side of Eq. (5.23) denotes the 
area required for routing all wires having lengths between 1 and lmax gate socket lengths. ew 
is a wiring efficiency factor given by: 
$$e_w = 1 - e_{router} - e_{power/gnd} - e_{signal\_vias} \tag{5.24}$$
where erouter is the efficiency of the wire routing tool (typically around 0.5), epower/gnd is 
the fraction of area used by power and ground wires and esignal_vias is the fraction of area 
used by signal vias. epower/gnd is obtained from the model for local power distribution 
networks derived in [5.18]. Via blockage in local interconnect levels comes from vias to 
wires routed in higher metal levels and vias for repeaters. Based on the model for via 
blockage given in [5.4],  
 
$$e_{signal\_vias} = \frac{\left( 2 N_{wires\_higher} + 2 N_{repeaters} \right) \left( P_{local} + s\lambda \right)^2}{A} \tag{5.25}$$
where Nwires_higher is the number of wires routed in higher metal levels, Nrepeaters is the 
number of repeaters for higher metal levels, λ is the design rule unit and s is a via covering 
factor which is typically 3 [5.4]. Nwires_higher is found from the stochastic wiring distribution 
by finding the number of wires whose length is greater than lmax. IntSim also runs 
electromigration checks on local power wiring based on maximum current density limits 
set by the user. The local interconnect model used in IntSim is more accurate than local 
interconnect models used in prior multilevel interconnect simulators such as MINDS. This 
is because IntSim rigorously models wiring efficiency by calculating the area consumed by 
signal via blockage and power wires, while MINDS assumes wiring efficiency is 0.4 for all 
designs. Also, IntSim utilizes the improved stochastic wire length distribution described in 
Section 5.1.1 while MINDS makes use of the Davis distribution. 
5.1.5. Intermediate and Semi-Global Interconnect Model    
  Intermediate and semi-global wires in IntSim are modeled based on Eq. (5.26) and 
Eq. (5.27). The right hand side of Eq. (5.26) denotes the area required for routing wires of 
length lying between lmin and lmax in a pair of wire levels, and the left hand side denotes the 
area available for routing. Here, P is the pitch of the pair of wiring levels. Eq. (5.27) 
represents the condition that the delay of the longest wire in a pair of metal levels should be 
a certain fraction of the clock period, as discussed in [5.17]. Eq. (5.27a) represents this 
criterion when no repeaters are inserted while Eq. (5.27b) represents the case when 
repeaters are inserted with the Energy-Delay Product minimization strategy discussed in 
Chapter 3.  Width of wires is equal to spacing between wires. 
 
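$$2\, e_w A = \chi\, P \sqrt{\frac{A}{N_{sockets}}} \int_{l_{min}}^{l_{max}} l\, i(l)\, dl \tag{5.26}$$
$$\tau_{rc} = 4.4\, \frac{\rho(P, ar)}{ar \cdot P^2}\, c\, l_{max}^2\, \frac{A}{N_{sockets}} = \frac{\beta}{f} \tag{5.27a}$$
$$\tau_{rc} = l_{max} \sqrt{\frac{A}{N_{sockets}}} \cdot \frac{2}{P} \sqrt{\frac{\rho(P, ar)\, c\, R_o C_o}{ar}} \left( 0.7\gamma + 0.4\delta + \frac{0.7}{\gamma} + \frac{0.7}{\delta} \right) = \frac{\beta}{f} \tag{5.27b}$$
where
$$\gamma = \left( 0.73 + 0.07 \ln \phi_{gate} \right)^2, \qquad \delta = \left( 0.88 + 0.07 \ln \phi_{gate} \right)^2$$
$$\phi_{gate} = \frac{\frac{1}{2}\, a\, C_o V_{dd}^2 f}{\frac{1}{2}\, a\, C_o V_{dd}^2 f + b\, V_{dd} I_{leak}}$$
$$e_w = 1 - e_{router} - e_{power/gnd} - e_{signal\_vias}$$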
In these equations, ρ is the wire resistivity, c is the wire capacitance per unit length, b 
is the fraction of time the circuit is not sleep gated, f is the clock frequency, Vdd is the 
supply voltage, a is the activity and ar is the wire aspect ratio. Ro, Co and Ileak are the 
resistance, capacitance and leakage of a minimum size repeater respectively. The value of 
β is 0.25 for short wires and 0.9 for long wires. The wiring efficiency factor for 
intermediate and semi-global levels (ew) has two sources. The first source is via blockage 
due to vias to higher levels of metal and due to repeaters (esignal_vias). These are modeled 
based on [5.4]. The second source is power via blockage (epower/gnd), which is modeled based 
on [5.18]. Wire resistivity increases due to size effects are modeled as shown in [5.5].  
The intermediate and semi-global wire model for IntSim is superior to models for the 
same in prior multilevel interconnect simulators such as MINDS due to the following 
reasons: (i) IntSim utilizes an energy-delay product based repeater insertion model while 
MINDS makes use of the sub-optimal repeater insertion model [5.17]. As explained in 
Chapter 3, the energy-delay product based repeater insertion model is more relevant to 
sub-90nm technologies. (ii) IntSim rigorously models wiring efficiency by considering the 
area consumed by signal via blockage and power via blockage, while MINDS assumes 
wiring efficiency is 0.4 for all designs. (iii) IntSim utilizes the improved stochastic wire 
length distribution described in Section 5.1.1 while MINDS makes use of the Davis 
distribution. 
5.2. Implementation of CAD Tool 
This section describes the algorithm and graphical user interface design for IntSim. 
5.2.1. Algorithm  
In IntSim, the process of selecting wire pitches for different interconnect levels 
proceeds in several steps: 
1. Input all parameters:  The user inputs various details of the system that is being modeled. 
2. Logic gate sizing: Logic gates are sized based on Eq. (5.13) such that clock frequency 
targets are reached. 
3. Generation of stochastic wiring distribution: Based on the logic gate size chosen in Step 
2, the fraction of die area occupied by logic gates, pgates, is found. This is used to generate 
the stochastic wiring distribution given in Eq. (5.8).  
4. Set baseline parameters for iterations: The design of power interconnects and their area 
allocation depends on the chip power. However, chip power is not known until repeaters 
are designed in the multilevel wiring network, especially in sub-90nm chips where 
repeaters consume a significant fraction of total power.  Also, design of the interconnect 
stack needs some knowledge of via blockage caused by repeaters. Thus, an iterative 
process is followed for assigning wires in a multilevel wiring network. An initial chip 
power estimate is set (as 100W, say) and the number of repeaters is set as 0.   
5. Local interconnect modeling: Local wire pitch is set as 2F. Using Equations (5.23), 
(5.24) and (5.25), the longest wire routed in M1 and M2 is determined. 
6. Arrangement of wires without repeaters: Once the longest wire routed in M1/M2 is 
determined, it is set as lmin in Eq. (5.26). Equations (5.26) and (5.27a) are then used to find 
the pitch of M3/M4 and the maximum wire length routed in them. This in turn is set as lmin 
for the next pair of metal levels and this process continues till the longest interconnect of 
the wiring distribution is assigned a pitch. 
7. Global interconnect modeling: A top-down process of global interconnect pitch 
selection and repeater insertion then begins. Global wire pitch is constrained to be the value 
found from Eq. (5.22). The area needed for routing power wires is then found from Eq. 
(5.15), and it helps calculate the area available for signal wires in global wire levels.  Clock 
wire area is neglected in IntSim because previous work has shown it is small [5.14]. 
Repeaters are inserted into these global signal wires, and the shortest signal wire routed in 
global wire levels is found based on a formula similar to Eq. (5.26). 
8. Assignment of wires with repeaters: Based on the length of shortest global signal wire, 
wires with repeaters are assigned to the pair of metal levels below the global wire levels, 
based on Equations (5.26) and (5.27b). The pitch and shortest wire lmin are found for this 
pair of wiring levels, and lmin is set as lmax for the pair of wiring levels below it. Repeater 
insertion is then performed for that lower pair. This process continues until die area for 
placing more repeaters runs out or until the addition of repeaters no longer improves 
wire delay. 
9. Power computation and iteration: Once repeaters are assigned, the total chip power is 
calculated. Logic gate power is found using device widths calculated in Step 2 and 
formulae given in [5.9]. Local clock power is computed by extending models in [5.19]. 
Wire power is calculated based on the stochastic wiring distribution [5.3], and repeater 
power is calculated based on Step 8 and repeater power models given in [5.20]. Leakage 
power variability is modeled as discussed in [5.21]. If the total power calculated is different 
from the power estimate used for designing power distribution wiring, IntSim sets  
$$ \text{Estimated power} = \frac{\text{Old estimated power} + \text{Calculated power}}{2} $$
 
and goes back to Step 5. For the next iteration, the number of repeaters is set as the value 
calculated in Step 8. 
10. Data output: When the simulation converges, the total number of wire levels, pitches of 
each wire level and a power estimate are output. 
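
The iteration over Steps 4 to 9 can be sketched as a damped fixed-point loop. The Python sketch below is illustrative only and is not IntSim code (IntSim itself runs in MATLAB): `design_and_compute_power` is a hypothetical callback standing in for Steps 5 to 9, `toy_model` is a made-up placeholder rather than an IntSim model, and the convergence tolerance and iteration cap are assumptions.

```python
def iterate_power(design_and_compute_power, initial_power_w=100.0,
                  rel_tol=1e-3, max_iter=100):
    """Damped fixed-point iteration on the chip power estimate (Steps 4 to 9)."""
    estimate = initial_power_w   # Step 4: initial chip power estimate
    repeaters = 0                # Step 4: initial number of repeaters
    for iteration in range(1, max_iter + 1):
        # Steps 5 to 9 (hypothetical callback): design the wiring network for
        # the current estimate, return the calculated power and repeater count.
        calculated, repeaters = design_and_compute_power(estimate, repeaters)
        if abs(calculated - estimate) <= rel_tol * calculated:
            return calculated, repeaters, iteration
        # Averaging rule of Step 9: new estimate = (old estimate + calculated) / 2.
        estimate = 0.5 * (estimate + calculated)
    raise RuntimeError("power estimate did not converge")


def toy_model(estimate_w, repeaters):
    """Made-up stand-in, NOT an IntSim model: power grows weakly with the estimate."""
    return 40.0 + 0.2 * estimate_w, 1000


power, repeaters, n_iter = iterate_power(toy_model)
print(f"converged to {power:.2f} W after {n_iter} iterations")
```

Averaging the old estimate with the calculated value damps the update, which helps the loop settle when the calculated power itself depends on the estimate used to size the power distribution wiring.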
5.2.2. Graphical User Interface 
IntSim is designed to run in MATLAB with an easy-to-use graphical user interface 
(GUI). A description of how a typical simulation in IntSim is performed is given below.  
1. Type “intsim” in the command prompt for MATLAB. 
 
 
 
 
 
 
 
 
 
 
2. Click “Start” in the pop-up menu that opens up. 
 
 
 
 
 
 
3. Enter system parameters such as supply voltage, threshold voltage, clock frequency, 
logic depth, Rent’s constants, activity factor, die area and number of gates. 
 
 
 
 
 
 
 
 
 
4. Enter device technology parameters such as minimum feature size, saturation drain 
current of an nFET, leakage current of an nFET, effective oxide thickness, ratio of drive 
currents of pMOS to nMOS and subthreshold slope. 
 
 
 
 
 
 
 
 
5. Enter interconnect/package technology parameters such as dielectric constant, wire 
aspect ratio, number/size of power pads and average distance between power pads.  
 
 
 
 
 
 
 
 
 
6. Specify filename where text output of simulation needs to be stored. 
 
 
 
 
7. A pop-up window opens with some of the results of the simulation. The user can also 
consult the specified text file for more results. 
 
 
 
 
 
 
 
 
 
5.3. Verification and Case Studies 
This section shows the verification of IntSim and details its use with two case studies. 
5.3.1. Verification of CAD Tool 
In this section, IntSim is used to predict wiring requirements of a commercially 
available 65nm 3GHz high-performance dual core microprocessor [5.22]. The predictions 
for number of interconnect levels, wire pitches and logic core power are compared with 
actual values of these quantities. Details of this chip’s transistor parameters and number of 
gates in each core are obtained from published data in [5.22][5.23]. The dielectric constant 
for interconnects is 2.9, contacted gate pitch is 220nm and supply voltage is 1.325V 
[5.22][5.23]. Rent’s constants k and p are chosen as 4 and 0.55 respectively based on 
guidelines in [5.9] that custom chips would have Rent’s parameters around these values. 
Area of a logic core is obtained from die photos and published information about total die 
area [5.22].  Package technology parameters are obtained from data on older 
high-performance chips with the assumption that package technology does not scale. The 
values of wire pitch obtained are not very sensitive to package technology parameters, so 
these rough calculations are not expected to cause significant error.     
 
Table 5.2: Comparison of results from IntSim with actual data. 
Metal level | Actual data | Prediction from IntSim 
M1 | 210nm | 220nm 
M2 | 210nm | 220nm 
M3 | 220nm | 283nm 
M4 | 280nm | 283nm 
M5 | 330nm | 283nm 
M6 | 480nm | 283nm 
M7 | 720nm | 880nm 
M8 | 1080nm | 880nm 
 
Table 5.2 shows a comparison between wire pitches predicted by IntSim and actual 
wire pitches used for that technology [5.23]. IntSim predicts the number of metal levels to 
be 8, which is exactly what is used for that interconnect technology. The wire pitches 
predicted by IntSim are similar to the ones actually used. One notable difference is that 
IntSim chooses wire pitches of two adjacent orthogonal metal levels to be the same, while 
the actual data has different wire pitches for adjacent orthogonal wiring levels.   
IntSim also predicts the total power of logic cores of this chip to be 62.3W, while total 
chip power based on measured data is 80W [5.22]. Published data is not available 
regarding the percentage of chip power consumed by caches and I/Os for this 
microprocessor. However, another 65nm processor had 19% of its total power consumed 
by these components and 81% of total power taken up by logic cores [5.24]. Assuming the 
processor analyzed with IntSim has similar numbers, the logic core power for this 
processor is 65.6W, which is quite close to IntSim’s prediction of 62.3W. 
5.3.2. Case Study of a 22nm Air-Cooled 2D Integrated Circuit    
This section shows the results of a case study conducted with IntSim on a future 22nm 
air-cooled logic core with 58M gates.  The purpose of this study is to show how IntSim can 
be used to project interconnect requirements and generate die size, frequency and power 
estimates in future generations of technology.  Device technology parameters are chosen to 
be ITRS low operating power technology parameters. Rent’s parameters k and p are 4 and 
0.6 respectively. Two “fat” global wire levels are used for this design. The logic core is 
assumed to be cooled with an air-cooled heat sink having a thermal resistance of 
0.6°C-cm²/W. For a maximum operation temperature of 85°C (for reliability) and ambient 
temperature of 25°C, the maximum power density of the logic core can be 
(85-25)/0.6 = 100W/cm².  
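
The thermal limit above is a one-line calculation. The following Python sketch, with illustrative function and variable names that are not part of IntSim, reproduces it and checks one row of Table 5.3 (5GHz, 25mm2, 20.9W) against the limit.

```python
def max_power_density(t_max_c, t_ambient_c, r_th_c_cm2_per_w):
    """Maximum power density in W/cm^2 for an area-normalized thermal resistance."""
    return (t_max_c - t_ambient_c) / r_th_c_cm2_per_w


# Air-cooled heat sink: 0.6 C-cm^2/W, 85 C maximum, 25 C ambient.
air_cooled_limit = max_power_density(85.0, 25.0, 0.6)

# 5GHz, 25mm2 row of Table 5.3: 20.9 W.
power_w, die_area_mm2 = 20.9, 25.0
density = power_w / (die_area_mm2 / 100.0)   # convert mm^2 to cm^2

print(f"limit = {air_cooled_limit:.1f} W/cm^2, design = {density:.1f} W/cm^2")
print("within thermal limit:", density < air_cooled_limit)
```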
Table 5.3 shows power, power density and interconnect requirements predicted by 
IntSim for different frequency and die size values for this logic core. Tables such as these 
can help a microprocessor architect generate rough estimates for frequency, die size, 
pitches of interconnect levels and power of a chip prior to design. For example, a clock 
frequency of 6.5GHz and die size of 25mm2 would be a good choice for the logic core 
analyzed in Table 5.3.  This is because 6.5GHz is the highest frequency that gives a power 
density less than 100W/cm2.  
 
Table 5.3: Design space exploration with IntSim. 
Frequency | Die size | Power predicted by IntSim | Power density predicted by IntSim | Number of metal levels predicted by IntSim 
5GHz | 15mm² | 18W | 120W/cm² | 14 
5GHz | 20mm² | 19.9W | 100W/cm² | 12 
5GHz | 25mm² | 20.9W | 84W/cm² | 10 
5.5GHz | 15mm² | 24.6W | 164W/cm² | 14 
5.5GHz | 20mm² | 26.6W | 133W/cm² | 12 
5.5GHz | 25mm² | 25.4W | 102W/cm² | 12 
6GHz | 15mm² | 28W | 187W/cm² | 14 
6GHz | 20mm² | 31W | 155W/cm² | 12 
6GHz | 25mm² | 33.8W | 135W/cm² | 12 
   
 
 
[Pie chart: Total power = 23.2W, Die area = 25mm². Wires 38%, Clock 20%, Leakage - logic 18%, Active - logic 13%, Leakage - repeaters 8%, Active - repeaters 3%.] 
Figure 5.10: Power predictions from IntSim. 
 
Power predictions from IntSim for this 25mm2 5.5GHz logic core are summarized in 
Fig. 5.10. The wiring requirement is summarized in Table 5.4. These wire pitch values can 
be used to develop a multilevel interconnect technology suitable for this logic core. 
  
Table 5.4: Wiring predictions from IntSim. 
Metal levels | Wire pitches | Repeater count | Percentage of total wire area available for signal wires | Percentage of total wire area available for power wires and vias | Percentage of total wire area used for signal via blockage 
M1, M2 | 44nm | - | 29% | 15% | 6% 
M3, M4 | 100nm | - | 42% | 4% | 4% 
M5, M6 | 111nm | - | 38% | 4% | 8% 
M7, M8 | 176nm | 2.5M | 45% | 5% | ~0% 
M9, M10 | 224nm | 0.5M | 43% | 7% | ~0% 
M11, M12 | 782nm | 0.02M | 25% | 25% | ~0% 
 
5.3.3. Case Study of a 22nm Microchannel-cooled 3D Integrated Circuit    
This section shows the results of a case study conducted with IntSim on a future 22nm 
microchannel-cooled logic core with 58M gates.  The microchannel-cooled logic core 
forms part of the 3D integrated circuit configuration described in Chapter 2. This structure 
is shown below in Fig. 5.11. 
 
 
 
 
 
 Figure 5.11: Schematic of Microchannel-cooled 3D Integrated Circuit under 
consideration. 
 
The junction-to-ambient thermal resistance for each chip in this 3D integrated circuit 
is 0.24°C-cm²/W, as discussed in Chapter 2. For a maximum operation temperature of 
85°C (for reliability) and ambient temperature of 25°C, the maximum power density of the 
logic core can be (85-25)/0.24 = 250W/cm².  Supply voltage is 0.8V while other device and 
system parameters are taken from the ITRS. Through-silicon electrical and fluidic vias are 
assumed to consume 5% of the total area of this logic core. 
  Simulations are run with IntSim for different clock frequency and die size values as 
shown in Table 5.5. The maximum clock frequency that results in a power density of less 
than 250W/cm2 is 8.5GHz while the minimum die size that allows this clock frequency and 
power density is 35mm2. Thus, a designer could choose 8.5GHz as the clock frequency and 
35mm2 as the die size. 
 
Table 5.5: Design space exploration with IntSim. 
Frequency | Die size | Power predicted by IntSim | Power density predicted by IntSim | Number of metal levels predicted by IntSim 
8GHz | 30mm² | 83W | 276W/cm² | 10 
8GHz | 35mm² | 79W | 226W/cm² | 12 
8GHz | 40mm² | 81W | 202W/cm² | 10 
8.5GHz | 30mm² | 92W | 307W/cm² | 12 
8.5GHz | 35mm² | 86W | 246W/cm² | 12 
8.5GHz | 40mm² | 92W | 229W/cm² | 12 
9GHz | 30mm² | 102W | 340W/cm² | 14 
9GHz | 35mm² | 105W | 300W/cm² | 12 
9GHz | 40mm² | 112W | 280W/cm² | 10 
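
The selection rule applied to Table 5.5 (highest clock frequency whose power density stays under the thermal limit, then the smallest die size at that frequency) can be sketched in a few lines of Python. The data are the frequency, die size and power columns of Table 5.5; the function name and the tie-breaking order are illustrative assumptions, not part of IntSim.

```python
# Rows of Table 5.5: (frequency in GHz, die size in mm^2, power in W).
TABLE_5_5 = [
    (8.0, 30, 83), (8.0, 35, 79), (8.0, 40, 81),
    (8.5, 30, 92), (8.5, 35, 86), (8.5, 40, 92),
    (9.0, 30, 102), (9.0, 35, 105), (9.0, 40, 112),
]


def pick_design(rows, max_density_w_per_cm2):
    """Highest frequency, then smallest die, among rows below the density limit."""
    feasible = [(freq, area) for freq, area, power in rows
                if power / (area / 100.0) < max_density_w_per_cm2]
    if not feasible:
        return None
    best_freq = max(freq for freq, _ in feasible)
    best_area = min(area for freq, area in feasible if freq == best_freq)
    return best_freq, best_area


limit = (85.0 - 25.0) / 0.24   # W/cm^2 for the microchannel-cooled chip
choice = pick_design(TABLE_5_5, limit)
print(f"limit = {limit:.0f} W/cm^2, chosen design = {choice}")
```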
 
  
The wiring requirement is summarized in Table 5.6. These wire pitch values can be 
used to develop a multilevel interconnect technology suitable for this logic core. An 
interesting point to note is that for certain metal levels such as M1, M2, M9 and M10, the 
area occupied by signal interconnects is less than the area occupied by ancillary 
functionality such as power interconnects, via connections to higher levels of metal, 
electrical through-silicon vias and fluidic through-silicon vias. Table 5.6 reveals that, on 
the average, interconnects responsible for ancillary functionality consume 40% of all 
routable wiring tracks for the logic core! 
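
As a check on the 40% figure, reading the averaged row of Table 5.6 as 30% of total area for signal wires and 20% for all ancillary functionality combined (an interpretation of that row, not a statement from the table itself):

$$ \frac{20\%}{30\% + 20\%} = 40\% $$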
Table 5.6: Wiring predictions from IntSim. 
Metal levels | Wire pitches | Percentage of total area for signal wires | Percentage of total area for power wires and vias | Percentage of total wire area for signal via blockage | Percentage of total area for electrical and fluidic through-silicon vias 
M1, M2 | 44nm | 22% | 15% | 3% | 5% 
M3, M4 | 74nm | 39% | 4% | 2% | 5% 
M5, M6 | 164nm | 37% | 7% | 1% | 5% 
M7, M8 | 296nm | 33% | 12% | ~0% | 5% 
M9, M10 | 944nm | 23% | 22% | ~0% | 5% 
Averaged from M1 to M10 | | 30% | 20% (power wires and vias, signal via blockage and through-silicon vias combined) 
  
 
[Pie chart: Total power = 86W, Die area = 35mm². Wires 40%, Clock 22%, Active - logic 13%, Leakage - repeaters 9%, Leakage - logic 8%, Active - repeaters 8%.] 
Figure 5.12: Power predictions from IntSim. 
 
Power predictions from IntSim for this 35mm2 8.5GHz logic core are summarized in 
Fig. 5.12. 
5.3.4. Applications 
IntSim could be used in industry for prediction of interconnect pitches, number of 
metal levels, die size, power and/or clock frequency of an integrated circuit prior to design. 
It could also be used to study scaling trends and estimate benefits of different technology 
and design innovations. Students in universities could also use this CAD tool to learn 
interactively how a GSI chip works. 
  
5.4. Summary   
This chapter describes a CAD tool called IntSim that co-designs signal, power, clock 
and thermal interconnects along with via blockage. IntSim utilizes a new stochastic signal 
wire length distribution model, a newly derived global interconnect model, an 
energy-delay product based repeater insertion model, and an algorithm that co-designs 
various types of interconnects and vias on an integrated circuit. IntSim is verified with 
actual data from a commercial microprocessor and is used in this chapter to predict 
characteristics of logic cores in future 2D and 3D stacked integrated circuits. It is available 
for download from www.ece.gatech.edu/research/labs/gsigroup. 
CHAPTER 6 
CONCLUSIONS AND FUTURE WORK 
 
This chapter begins with a summary of the contributions of this dissertation in Section 
6.1. Following this, avenues for future research are described in Section 6.2. 
 
6.1. Contributions of this Research 
The main findings of this research are as follows: 
• A microchannel-cooled high-performance 3D Integrated Circuit technology with 
each die having a thermal resistance of 0.24°C/W to its heat sink is developed. 
Fluidic network fabrication proceeds at the wafer-level, is compatible with CMOS 
processing and flip-chip assembly and requires four lithography steps. Demonstrated 
through-silicon electrical via density is 2500/cm². To the best of the author’s 
knowledge, this represents the first experimental effort towards microchannel 
cooling of 3D integrated circuits.  
• Carbon nanotube interconnects can potentially provide a 56% reduction in power 
and 39% reduction in die size for a 22nm 4GHz logic core, if manufacturing 
challenges with carbon nanotube interconnects are solved. 
• Compact repeater insertion models that involve minimization of energy-delay 
product of a repeated wire are developed. It is also found, for the first time, that use 
of unique device technologies for logic and repeater transistors could be beneficial to 
future GSI chips. 
• Parallel processing architectures could significantly reduce interconnect 
requirements in future chips. A 2.5GHz quad core chip, for example, requires five 
fewer metal levels than a 10GHz single core chip for a 22nm technology. 
• A circuit technique that improves solder bump electromigration lifetimes by several 
orders of magnitude is invented. This can be particularly useful in applications where 
solder electromigration is a serious challenge, such as 3D-ICs and automotive 
electronics.  
• An improved on-chip power distribution architecture for 2D and 3D integrated 
circuits consisting of power and ground planes separated by a high k dielectric is 
proposed. The feasibility of fabricating such structures with high yield is 
demonstrated.  
• A CAD tool called IntSim is developed to simulate multilevel interconnect networks 
in GSI chips. IntSim includes an improved stochastic signal wire length distribution, 
a newly derived global interconnect model and a methodology to co-design signal, 
power, clock and thermal interconnects. Results from IntSim have been validated 
with data from a 65nm microprocessor. This CAD tool is available for download 
from www.ece.gatech.edu/research/labs/gsigroup and is also provided in a CD on 
the back-cover of hard copies of this document. 
 
6.2. Avenues for Future Research 
Several opportunities for future research on this topic exist. Some of these are listed in 
this section. 
6.2.1. Wafer-to-Wafer Bonding Approach for Microchannel-Cooled 3D-ICs 
The work on thermal interconnects can be extended by exploring wafer-to-wafer 
bonding approaches [6.1] for enclosing microchannels instead of using the sacrificial 
polymer and Avatrel approach that is described in Chapter 2. Essentially, silicon dioxide 
obtained during back-end-of-line (BEOL) processing of one of the chips in a 3D stack 
would cover the microchannels of a chip below it in the 3D stack, as shown in Fig. 6.1. 
 
 
 
 Figure 6.1: Wafer-to-wafer bonding approach for fabrication of microchannel-cooled 3D 
Integrated Circuits. 
  
6.2.2. 3D Stacked SRAM Cache Memory 
A high-performance microprocessor today typically has 40% of its area occupied by 
logic circuits while 60% of its area is consumed by SRAM arrays. While logic circuits 
require 10 to 12 levels of metal for sub-45nm technologies, SRAM caches rarely require 
more than 5 to 6 levels of metal. SRAM cells also frequently have longer channel lengths, 
higher threshold voltages and thicker gate oxides compared to logic circuits. This opens up 
opportunities for fabricating SRAM arrays on a separate chip and 3D stacking them with 
logic circuits to form a microprocessor, as shown in Fig. 6.2. The process for fabricating 
SRAM arrays could have fewer levels of metal, higher threshold voltages, longer channel 
lengths and/or higher oxide thickness thereby saving process and mask cost. The 
challenges involved with implementing this scheme are the increased cost of 3D stacking 
and the elevated power densities that arise from such stacking. 
 
 
Figure 6.2: 3D stacking of SRAM with logic circuits to form a microprocessor. 
 
6.2.3. Stacking of 3D Phase Change Memory Arrays with Microprocessors 
Three-dimensional one time programmable (OTP) arrays were introduced by Matrix 
Semiconductor in 2003 at the Intl. Solid State Circuits Conference [6.2]. These arrays 
consisted of multiple layers of polysilicon diodes as steering elements and antifuses as 
memory elements. One could envision the antifuse being replaced by a resistive memory 
element such as a phase change material. Phase change materials such as Ge2Sb2Te5 (GST) 
and GeSb are being investigated by several DRAM and flash memory manufacturers as 
next-generation memory materials, since phase change memory arrays could have the 
speed of DRAM and the non-volatility of flash memory. Stacking such 3D phase change 
memory arrays with microprocessor chips could provide high memory bandwidths along 
with low latency, while reducing the area occupied by storage class memory in a 
high-performance server considerably. Fig. 6.3 shows a pictorial representation of this 
concept.  
 
Figure 6.3: Stacking of 3D phase change memory arrays with microprocessors. 
 
6.2.4. Physical Limits of Copper Interconnects  
Copper wire resistivity increases exponentially when feature sizes scale down as 
discussed in Chapter 3. Scaling a circuit block from one technology generation to the next 
typically brings in about 50% die area savings. However, increases in wire resistivity could 
cause a die area penalty to be incurred while scaling down. This is because wires would 
need to be sized larger to compensate for increases in wire resistivity, which in turn, would 
require more die area. It would be interesting to study how this die area penalty changes 
with scaling. With speed and power advantages of scaling reducing in the post-classical 
scaling era [6.3], most integrated circuit manufacturers scale just to get higher component 
density. If even component density is sacrificed due to copper wire resistivity increases, 
that might prevent many manufacturers from scaling device technologies in the future.  
APPENDIX A 
SOLUTION OF EQUATIONS TO CO-DESIGN THERMAL AND 
ELECTRICAL FUNCTIONALITY 
The two equations representing the model derived in Section 2.1 are given below for 
convenience. These essentially represent two equations in two unknowns, temperature (T) 
and power (P). 
$$ P = \frac{T - T_{amb}}{R_{th}} $$ …(A.1)
0
1 1
2
t
s
V ( T )
N
gates dd leak
av s
WP N V (T )I e
L N
α
α
− ⎛ ⎞−⎞= +⎜ ⎟⎟⎠ ⎝ ⎠
 
…(A.2)
 
Table A.1: Description of symbols used in model. 
Symbol | Description 
$R_{th}$ | Thermal resistance of logic cores in °C/W 
$T_{amb}$ | Ambient temperature in K 
$N_{gates}$ | Number of logic gates 
$N_s$ | Sub-threshold slope factor in mV 
$(W/L)_{av}$ | Width to length ratio of transistors in a typical logic gate 
$I_{leak0}$ | Leakage current co-efficient of a typical minimum size logic gate in A 
$V_{dd}(T)$ | Optimal supply voltage obtained from the Nose-Sakurai model in V 
$V_t(T)$ | Optimal threshold voltage obtained from the Nose-Sakurai model in V 
A description of symbols used in Eq. (A.1) and Eq. (A.2) is given in Table A.1. 
Substituting equations for optimal supply and threshold voltage values from Nose and 
Sakurai’s paper in Eq. (A.1) and Eq. (A.2) and solving them self-consistently, we get the 
following cubic equation for temperature. 
3 2
2
2 2 2 2 2
2 2
2 1 1 2    0
amb th
amb amb
th th th
nk AT T T R
B q B
T TA nkC CT R R R
B B B qB B B
α α
α α
⎛ ⎞+ − − −⎜ ⎟⎝ ⎠
⎛ ⎞− −⎛ ⎞ ⎛ ⎞+ + − − − −⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
1 =
 
…(A.3)
where 
1
1 1
1
t 1
0
1
2 340 1 1
1 340
V
/
l l
l /
o ol d
o
/
l d
o
L
leak
gates
av o
afC fC KLnk nk ( )A k afC log . .
q I q e IfC KL nk
I eq
fC KL nkB
I eq
C .k afC
IWk N
L I
d
α
α
α
α
∆
⎛ ⎞⎛ ⎞⎜ ⎜ ⎟ ⎛ ⎞⎜ −⎜ ⎟= − + ⎜ ⎟⎜ ⎜ ⎟⎛ ⎞ ⎝ ⎠⎜ ⎜ ⎟− ⎜ ⎟⎜ ⎟⎜ ⎝ ⎠⎝ ⎠⎝ ⎠
⎛ ⎞= ⎜ ⎟⎝ ⎠
=
⎞= ⎟⎠
 
 
In the above equations, a is the activity factor, f is the clock frequency in Hz, Cl is the 
load capacitance of a minimum size nMOS transistor in F, n is a sub-threshold slope factor, 
k is Boltzmann’s constant, q is charge on an electron in C, Io is leakage current of a 
minimum size nMOS transistor in A, Ld is the logic depth, α is the exponent of the Alpha 
power law MOSFET model, ∆Vt is the change in threshold voltage due to all sources of 
variation in V, and K is a delay co-efficient. A reader could refer to [A.1] for more details on 
these symbols. Quantities whose units are not indicated are dimensionless. 
There exist closed form solutions to cubic equations such as Eq. (A.3). For example, 
the Cardano solution [A.2] could be utilized to solve Eq. (A.3) to obtain temperature. Once 
temperature is obtained, Eq. (A.1) can be used to obtain power.  
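
A minimal Python sketch of this two-step procedure is given below: a general Cardano solver for a cubic, followed by Eq. (A.1) for power. The coefficients of Eq. (A.3) are not reproduced here, so the example cubic is made up purely to exercise the solver, and the rule used to pick the physical root (smallest real root above ambient) is an assumption rather than something stated in this appendix.

```python
import cmath
import math


def cubic_roots(a, b, c, d):
    """All three complex roots of a*x^3 + b*x^2 + c*x + d = 0 (a != 0), via Cardano."""
    # Depressed cubic t^3 + p*t + q = 0 with x = t - b/(3a).
    p = (3 * a * c - b * b) / (3 * a * a)
    q = (2 * b**3 - 9 * a * b * c + 27 * a * a * d) / (27 * a**3)
    shift = -b / (3 * a)
    root = cmath.sqrt(q * q / 4 + p**3 / 27)
    u_cubed = -q / 2 + root
    if u_cubed == 0:                 # use the other branch if this one vanishes
        u_cubed = -q / 2 - root
    if u_cubed == 0:                 # p = q = 0: triple root
        return [complex(shift)] * 3
    u0 = u_cubed ** (1 / 3)          # principal complex cube root
    omega = complex(-0.5, math.sqrt(3) / 2)   # primitive cube root of unity
    return [u0 * omega**n - p / (3 * u0 * omega**n) + shift for n in range(3)]


def physical_temperature(coeffs, t_ambient):
    """Smallest real root above ambient, or None (selection rule is an assumption)."""
    real = [r.real for r in cubic_roots(*coeffs)
            if abs(r.imag) < 1e-6 * max(1.0, abs(r))]
    above = [t for t in real if t > t_ambient]
    return min(above) if above else None


def power_from_temperature(t, t_ambient, r_th):
    """Eq. (A.1): P = (T - T_amb) / R_th."""
    return (t - t_ambient) / r_th


# Made-up cubic with roots 250, 350 and 600 (NOT the coefficients of Eq. (A.3)).
example_coeffs = (1.0, -1200.0, 447500.0, -52500000.0)
temperature = physical_temperature(example_coeffs, t_ambient=298.0)
power = power_from_temperature(temperature, t_ambient=298.0, r_th=0.5)
print(f"T = {temperature:.3f} K, P = {power:.3f} W")
```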
APPENDIX B 
A MODEL TO MINIMIZE ENERGY-DELAY PRODUCT OF A 
REPEATED WIRE 
For the purpose of this derivation, repeaters are considered to be inserted into an 
interconnect with resistance Rint and capacitance Cint. The output resistance and input 
capacitance of a minimum size repeater are Ro and Co respectively. If the size of each 
repeater is h times the size of a minimum size repeater and if k repeaters are inserted into 
the wire, the repeated wire can be represented as shown in Fig. B.1. Here, Ro/h is the output 
resistance of a single repeater and hCo is the input capacitance of a single repeater while 
Rint/k and Cint/k are the resistance and capacitance of a single segment of the repeated wire 
respectively. 
 
 
 
 
 
Figure B.1: Diagrammatic representation of a repeated wire. 
 
The delay of this repeated wire [B.1] can be written as shown in Eq. (B.1). If a is the 
activity factor, Vdd is the supply voltage, f is the clock frequency, b is the fraction of leakage 
power that is not saved with clock gating and Ileak is the leakage current of a minimum size 
repeater, the power of the repeated wire can be approximated by Eq. (B.2).  
$$ \text{Delay} = k\left[0.7\,\frac{R_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + \frac{R_{int}}{k}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right] $$ …(B.1)
$$ \text{Power} = \left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)hk + \frac{1}{2}aC_{int}V_{dd}^2 f $$ …(B.2)
A reader might observe that while dynamic power and leakage power are considered 
in Eq. (B.2), short circuit power is neglected. This is because short circuit power forms less 
than 10-15% of the total power of a sub-100nm repeated wire and previous work has 
shown that it can be neglected in a repeater optimization analysis [B.2-97]. Results from 
the model are compared with SPICE simulations in Chapter 3 to confirm this assumption.       
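$$
\begin{aligned}
\text{Energy-Delay Product (EDP)} &= \text{Delay}^2 \cdot \text{Power} \\
&= \left[0.7\,\frac{kR_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + R_{int}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right]^2 \left[\left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)hk + \frac{1}{2}aC_{int}V_{dd}^2 f\right]
\end{aligned}
$$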
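
The delay, power and energy-delay product expressions translate directly into code. The Python sketch below follows Eq. (B.1) and Eq. (B.2) as read in this appendix; all numerical values are illustrative placeholders, not data from this dissertation.

```python
def repeated_wire_delay(h, k, r_int, c_int, r_o, c_o):
    """Eq. (B.1): delay of a wire with k repeaters, each h times minimum size."""
    return k * (0.7 * (r_o / h) * (c_int / k + h * c_o)
                + (r_int / k) * (0.4 * c_int / k + 0.7 * h * c_o))


def repeated_wire_power(h, k, c_int, c_o, a, v_dd, f, b, i_leak):
    """Eq. (B.2): repeater dynamic and leakage power plus wire dynamic power."""
    repeater = (0.5 * a * c_o * v_dd**2 * f + b * v_dd * i_leak) * h * k
    wire = 0.5 * a * c_int * v_dd**2 * f
    return repeater + wire


def energy_delay_product(h, k, r_int, c_int, r_o, c_o, a, v_dd, f, b, i_leak):
    """EDP = Delay^2 * Power."""
    delay = repeated_wire_delay(h, k, r_int, c_int, r_o, c_o)
    power = repeated_wire_power(h, k, c_int, c_o, a, v_dd, f, b, i_leak)
    return delay**2 * power


# Illustrative placeholder values only.
R_INT, C_INT = 2000.0, 2e-12      # wire resistance (ohm) and capacitance (F)
R_O, C_O = 10e3, 1e-16            # minimum-size repeater R (ohm) and C (F)
A, V_DD, F, B, I_LEAK = 0.1, 1.0, 1e9, 1.0, 5e-9

delay = repeated_wire_delay(100.0, 20.0, R_INT, C_INT, R_O, C_O)
power = repeated_wire_power(100.0, 20.0, C_INT, C_O, A, V_DD, F, B, I_LEAK)
edp = energy_delay_product(100.0, 20.0, R_INT, C_INT, R_O, C_O,
                           A, V_DD, F, B, I_LEAK)
print(f"delay = {delay:.3e} s, power = {power:.3e} W, EDP = {edp:.3e}")
```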
 
To obtain optimal values of h and k that minimize the energy-delay product, 
d(EDP)/dh and d(EDP)/dk are set as zero. The equation d(EDP)/dh=0 gives 
$$
\begin{aligned}
&\left[0.7\,\frac{kR_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + R_{int}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right]^2 \left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)k \\
&\quad + 2\left[0.7\,\frac{kR_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + R_{int}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right]\left(-\frac{0.7R_oC_{int}}{h^2} + 0.7R_{int}C_o\right) \\
&\qquad \times \left[\left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)hk + \frac{1}{2}aC_{int}V_{dd}^2 f\right] = 0
\end{aligned}
$$ …(B.3)
 
The equation d(EDP)/dk=0 gives 
$$
\begin{aligned}
&\left[0.7\,\frac{kR_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + R_{int}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right]^2 \left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)h \\
&\quad + 2\left[0.7\,\frac{kR_o}{h}\left(\frac{C_{int}}{k} + hC_o\right) + R_{int}\left(0.4\,\frac{C_{int}}{k} + 0.7\,hC_o\right)\right]\left(-\frac{0.4R_{int}C_{int}}{k^2} + 0.7R_oC_o\right) \\
&\qquad \times \left[\left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)hk + \frac{1}{2}aC_{int}V_{dd}^2 f\right] = 0
\end{aligned}
$$ …(B.4)
Eq. (B.3) and Eq. (B.4) therefore represent two equations in two unknowns, h and k. If 
$$ k = \gamma\sqrt{\frac{R_{int}C_{int}}{R_oC_o}} \quad\text{and}\quad h = \delta\sqrt{\frac{R_oC_{int}}{R_{int}C_o}} $$
where γ and δ are arbitrary variables, Eq. (B.4) is equivalent to 
$$ \left(0.7\gamma + \frac{0.7}{\delta} + \frac{0.4}{\gamma} + 0.7\delta\right) + \frac{2}{\delta}\left(0.7 - \frac{0.4}{\gamma^2}\right)\left(\gamma\delta + \frac{\frac{1}{2}aC_oV_{dd}^2 f}{\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}}\right) = 0 $$
Denoting the ratio of dynamic power to the total power of a repeater as Φgate, the 
above expression is 
$$ \left(0.7\gamma + \frac{0.7}{\delta} + \frac{0.4}{\gamma} + 0.7\delta\right) + \left(\gamma\delta + \Phi_{gate}\right)\frac{2}{\delta}\left(0.7 - \frac{0.4}{\gamma^2}\right) = 0 $$ …(B.5)
where 
$$ \Phi_{gate} = \text{Transistor dynamic power fraction} = \frac{\frac{1}{2}aC_oV_{dd}^2 f}{\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}} $$ …(B.6)
Just as Eq. (B.4) has been simplified to Eq. (B.5), Eq. (B.3) can be simplified to 
$$ \left(0.7\gamma + \frac{0.7}{\delta} + \frac{0.4}{\gamma} + 0.7\delta\right) + \left(\gamma\delta + \Phi_{gate}\right)\frac{1.4}{\gamma}\left(1 - \frac{1}{\delta^2}\right) = 0 $$ …(B.7)
Eq. (B.5) and Eq. (B.7) represent two equations in two unknown variables, γ and δ, and 
one known variable, Φgate. Thus, it is clear that γ and δ are functions of Φgate and are 
independent of other transistor and wire parameters. 
From Eq. (B.5) and Eq. (B.7),  
$$ \frac{1.4}{\gamma}\left(1 - \frac{1}{\delta^2}\right) = \frac{2}{\delta}\left(0.7 - \frac{0.4}{\gamma^2}\right) $$
This equation when rearranged is 
$$ 0.7\delta^2 + \delta\left(\frac{0.4}{\gamma} - 0.7\gamma\right) - 0.7 = 0 $$ …(B.8)
Eq. (B.5) when rearranged is 
$$ 0.7\delta^2 + \delta\left(2.1\gamma - \frac{0.4}{\gamma}\right) + \left(0.7 + 1.4\Phi_{gate} - \frac{0.8\Phi_{gate}}{\gamma^2}\right) = 0 $$ …(B.9)
Finding the difference between Eq. (B.8) and Eq. (B.9) and simplifying,  
$$ \delta = \frac{1.4 + 1.4\Phi_{gate} - \dfrac{0.8\Phi_{gate}}{\gamma^2}}{\dfrac{0.8}{\gamma} - 2.8\gamma} $$ …(B.10)
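Substituting Eq. (B.10) in Eq. (B.8) and simplifying yields 
$$
\begin{aligned}
&\left(\gamma^2\right)^3\left(\Phi_{gate} - 1\right) + \left(\gamma^2\right)^2\left(0.5\Phi_{gate}^2 - 0.43\Phi_{gate} + 0.79\right) \\
&\quad + \left(\gamma^2\right)\left(0.08\Phi_{gate} - 0.57\Phi_{gate}^2\right) + \left(0.16\Phi_{gate}^2 - 0.09\Phi_{gate}\right) = 0
\end{aligned}
$$ …(B.11)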
This is a cubic equation in γ². Several closed form solutions are available for cubic 
equations, with the most prominently used solution being the one proposed by Cardano 
[B.3]. Using Cardano’s solution, the three values of γ² that satisfy Eq. (B.11), and hence 
its six roots in γ, are given by  
$$ \gamma^2 = \frac{p}{3u} - u - \frac{1}{3}\left(\frac{0.5\Phi_{gate}^2 - 0.43\Phi_{gate} + 0.79}{\Phi_{gate} - 1}\right) $$
where 
$$ p = \frac{-0.083\Phi_{gate}^4 - 0.43\Phi_{gate}^3 + 0.33\Phi_{gate}^2 + 0.15\Phi_{gate} - 0.21}{\left(\Phi_{gate} - 1\right)^2}, $$
$$ q = \frac{0.0093\Phi_{gate}^6 + 0.071\Phi_{gate}^5 + 0.034\Phi_{gate}^4 - 0.24\Phi_{gate}^3 + 0.25\Phi_{gate}^2 - 0.13\Phi_{gate} + 0.04}{\left(\Phi_{gate} - 1\right)^3} $$
and 
$$ u = \sqrt[3]{\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} $$ …(B.12)
  
 
 
 
 
 
 Figure B.2: Analytical solutions for γ (gamma) and δ (delta) for different values of Φgate 
(transistor dynamic power fraction). 
  
Closed form solutions for δ can be obtained from Eq. (B.12), Eq. (B.8) and Eq. (B.9). 
Plotting the obtained values of γ and δ from these analytical solutions in Fig. B.2, it can be 
seen that γ and δ have a logarithmic dependence on Φgate. 
While the derived analytical solutions for γ and δ are closed-form and exact, their 
solution is slightly cumbersome since it requires manipulation of complex numbers. Eq. 
(B.13) and Eq. (B.14) give empirical solutions for γ and δ that are compact and simple to 
evaluate. These solutions have <5% error, as shown in Fig. B.3.  
 
$$ \gamma = \left(0.73 + 0.07\log\Phi_{gate}\right)^2 $$ …(B.13)
$$ \delta = \left(0.88 + 0.07\log\Phi_{gate}\right)^2 $$ …(B.14)
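
The comparison between the exact and empirical solutions can be reproduced numerically. The Python sketch below solves Eq. (B.7) for γ by bisection after eliminating δ through Eq. (B.8), then evaluates the empirical expressions of Eq. (B.13) and Eq. (B.14). It rests on three assumptions that should be checked against the original typeset equations: the equations are taken in the form read in this appendix, the empirical expressions are taken as squared, and "log" is taken to be the natural logarithm. The bisection bracket is also an assumption, chosen for Φgate between roughly 0.01 and 1.

```python
import math


def delta_from_gamma(gamma):
    """Positive root of Eq. (B.8): 0.7*d^2 + d*(0.4/g - 0.7*g) - 0.7 = 0."""
    b = 0.4 / gamma - 0.7 * gamma
    return (-b + math.sqrt(b * b + 1.96)) / 1.4


def residual_b7(gamma, phi):
    """Left-hand side of Eq. (B.7), with delta eliminated through Eq. (B.8)."""
    delta = delta_from_gamma(gamma)
    bracket = 0.7 * gamma + 0.7 / delta + 0.4 / gamma + 0.7 * delta
    return bracket + (gamma * delta + phi) * (1.4 / gamma) * (1.0 - 1.0 / delta**2)


def solve_gamma_delta(phi, lo=0.05, hi=0.75, iterations=80):
    """Bisection on gamma over an assumed bracket [lo, hi]."""
    f_lo, f_hi = residual_b7(lo, phi), residual_b7(hi, phi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in the assumed bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = residual_b7(mid, phi)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    gamma = 0.5 * (lo + hi)
    return gamma, delta_from_gamma(gamma)


def empirical_gamma_delta(phi):
    """Eq. (B.13) and Eq. (B.14), assuming a natural logarithm."""
    return ((0.73 + 0.07 * math.log(phi)) ** 2,
            (0.88 + 0.07 * math.log(phi)) ** 2)


for phi in (0.1, 0.5, 1.0):
    g_num, d_num = solve_gamma_delta(phi)
    g_emp, d_emp = empirical_gamma_delta(phi)
    print(f"phi={phi}: gamma {g_num:.4f} vs {g_emp:.4f}, "
          f"delta {d_num:.4f} vs {d_emp:.4f}")
```

Solving the two simplified equations directly avoids the complex arithmetic of the Cardano route and is less sensitive to the rounding of the coefficients in Eq. (B.11).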
 
 
 
 
 
 
 
 
 
Figure B.3: Comparison of empirical solutions with analytical solutions.  
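Thus, the model can be summarized as follows 
$$ k = \gamma\sqrt{\frac{R_{int}C_{int}}{R_oC_o}} \quad\text{where}\quad \gamma = \left(0.73 + 0.07\log\Phi_{gate}\right)^2 $$
$$ h = \delta\sqrt{\frac{R_oC_{int}}{R_{int}C_o}} \quad\text{where}\quad \delta = \left(0.88 + 0.07\log\Phi_{gate}\right)^2 $$
Substituting these expressions in Eq. (B.1) and Eq. (B.2),  
$$ \text{Optimal delay} = \sqrt{R_{int}C_{int}R_oC_o}\left(0.7\gamma + \frac{0.7}{\delta} + \frac{0.4}{\gamma} + 0.7\delta\right) $$
$$ \text{Optimal energy-delay product} = R_{int}C_{int}^2R_o\left(\frac{1}{2}aC_oV_{dd}^2 f + bV_{dd}I_{leak}\right)\left[\left(0.7\gamma + \frac{0.7}{\delta} + \frac{0.4}{\gamma} + 0.7\delta\right)^2\left(\gamma\delta + \Phi_{gate}\right)\right] $$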
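
The summarized model is compact enough to evaluate in a few lines. The Python sketch below computes Φgate from Eq. (B.6), then k, h and the optimal delay. It assumes the empirical expressions for γ and δ are squared and use the natural logarithm, and every numerical value is an illustrative placeholder rather than data from this dissertation.

```python
import math


def dynamic_power_fraction(a, c_o, v_dd, f, b, i_leak):
    """Eq. (B.6): repeater dynamic power divided by total repeater power."""
    dynamic = 0.5 * a * c_o * v_dd**2 * f
    return dynamic / (dynamic + b * v_dd * i_leak)


def edp_optimal_repeaters(r_int, c_int, r_o, c_o, phi_gate):
    """Return (k, h, optimal delay) from the summarized model.

    Assumes natural log and squared empirical expressions for gamma and delta.
    """
    gamma = (0.73 + 0.07 * math.log(phi_gate)) ** 2
    delta = (0.88 + 0.07 * math.log(phi_gate)) ** 2
    k = gamma * math.sqrt(r_int * c_int / (r_o * c_o))   # number of repeaters
    h = delta * math.sqrt(r_o * c_int / (r_int * c_o))   # repeater size
    delay = math.sqrt(r_int * c_int * r_o * c_o) * (
        0.7 * gamma + 0.7 / delta + 0.4 / gamma + 0.7 * delta)
    return k, h, delay


# Illustrative placeholder values only.
R_INT, C_INT = 2000.0, 2e-12      # wire resistance (ohm) and capacitance (F)
R_O, C_O = 10e3, 1e-16            # minimum-size repeater R (ohm) and C (F)

phi = dynamic_power_fraction(a=0.1, c_o=C_O, v_dd=1.0, f=1e9, b=1.0, i_leak=5e-9)
k, h, delay = edp_optimal_repeaters(R_INT, C_INT, R_O, C_O, phi)
print(f"phi_gate = {phi:.2f}, k = {k:.1f}, h = {h:.1f}, delay = {delay:.3e} s")
```

In practice k would be rounded to an integer number of repeaters; the sketch leaves it continuous to stay close to the closed-form expressions.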
  
 
REFERENCES 
 
 
Chapter 1 
[1.1] S. Naffziger, B. Stackhouse, T. Grutkowski, “The implementation of a 2 core 
multi-threaded Itanium family processor”, Proc. Intl. Solid State Circuits Conference, pp. 
182-183, 2005. 
[1.2]  http://www.wininsider.com/news/?2167 (08/31/2008) 
[1.3]  L. Berlin, “The Man behind the Microchip: Robert Noyce and the Invention of 
Silicon Valley”, Oxford University Press. 
[1.4]  J. Hoerni, US patent  3025589, March 20, 1962. 
[1.5]  R.H. Dennard, F.H. Gaensslen, V.L. Rideout, E. Bassous, A.R. LeBlanc, “Design of 
Ion-Implanted MOSFET's with Very Small Physical Dimensions,” IEEE Journal of 
Solid-State Circuits, Vol. 9, Issue 5, pp. 256-268, Oct. 1974. 
[1.6]  H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley, 
1990. 
[1.7]  D. Edelstein, J. Heidenreich, R. Goldblatt, W. Cote, C. Uzoh, N. Lustig, P. Roper, T. 
McDevitt, W. Motsiff, A. Simon, J. Dukovic, R. Wachnik, H. Rathore, R. Schulz, L. Su, S. 
Luce, J. Slattery, “Full Copper Wiring in a sub-0.25 µm CMOS ULSI Technology”, Proc. 
Intl. Electron Devices Meeting, pp. 773-776, 1997. 
[1.8]  J. Davis, PhD dissertation, Georgia Institute of Technology, 1999. 
[1.9]  http://en.wikipedia.org (08/31/2008) 
[1.10] J. Early, “Speed, Power and Component Density in Multi-element High-Speed 
Logic Systems”, Proc. Intl. Solid State Circuits Conference, pp. 78-79, 1960. 
[1.11] www.eetimes.com (08/31/2008) 
 
Chapter 2 
[2.1] C. Isci, PhD dissertation, Princeton University, 2007. 
[2.2] G. Shahidi, “Evolution of CMOS Technology at 32nm and Beyond”, Proc. Custom 
Integrated Circuits Conference, pp. 413-416, 2007.  
[2.3] M. Turkowski, “Multi-Scale Thermal Simulations of 3D ICs from Full Chip to 
Nanoscale Phonon Transport”, Focus Center Research Program Workshop on Power and 
Thermal Management, May 2007. 
[2.4] D. C. Sekar, C. King, B. Dang, H. Thacker, T. Spencer, P. Joseph, M. Bakir, J. 
Meindl, “Microchannel-cooled 3D Integrated Systems”, Proc. Intl. Interconnect 
Technology Conference, 2008.  
[2.5] K. Nose, T. Sakurai, “Optimization of Supply Voltage and Threshold Voltage for 
Low Power and High Speed Applications”, Proc. Asia and South Pacific-Design 
Automation Conference, pp. 469-474, 2000. 
[2.6] K. A. Bowman, PhD dissertation, Georgia Institute of Technology, 2001. 
[2.7] S. Naffziger, B. Stackhouse, T. Grutkowski, “The implementation of a 2 core 
multi-threaded Itanium family processor”, Proc. Intl. Solid State Circuits Conference, pp. 
182-183, 2005. 
[2.8] D. Deleganes, J. Douglas, B. Kommandur, M. Patyra, “Designing a 3GHz 130nm 
Intel Pentium4 Processor”, Proc. Symp. on VLSI Circuits, pp. 130-133, 2002. 
[2.9] S. Thompson, M. Alavi, M. Hussein, P. Jacob, C. Kenyon, P. Moon, M. Prince, S. 
Sivakumar, S. Tyagi, M. Bohr, “130nm Logic Technology Featuring 60nm Transistors, 
Low-K Dielectrics, and Cu Interconnects”, Intel Technology Journal, Vol. 6, Issue 2, 2002. 
[2.10] R. Gonzalez, B. Gordon, M. Horowitz, “Supply and Threshold Voltage Scaling for 
Low Power CMOS”, IEEE Journal of Solid State Circuits, Vol. 32, Issue 8, pp. 1210-1216, 
Aug. 1997.  
[2.11] http://en.wikipedia.org (08/31/2008) 
[2.12] G. Chandra, P. Kapur, K. Saraswat, “Scaling trends for the on-chip power 
consumption”, Proc. Intl. Interconnect Technology Conference, pp. 170-172, 2002. 
[2.13] D. Tuckerman and R. Pease, “High Performance Heat Sinking for VLSI”, IEEE 
Electron Device Letters, Vol. 2, Issue 5, pp. 126-129, May 1981. 
[2.14] B. Dang, PhD dissertation, Georgia Institute of Technology, 2006. 
[2.15] J. Wu, J. Scholvin, J. del Alamo, “A Through Wafer Interconnect in Silicon for 
RFICs”, IEEE Transactions on Electron Devices, Vol. 51, Issue 11, pp. 1765-1771, Nov. 
2004. 
[2.16] A. W. Topol, D. C. La Tulipe, Jr., L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. 
Kumar, G. U. Singco, A. M. Young, K. W. Guarini, M. Ieong, “Three Dimensional 
Integrated Circuits”, IBM Journal of Research & Development, Vol. 50, Number 4/5, 2006. 
[2.17] P. Bai, C. Auth, S. Balakrishnan, et al., “A 65nm Logic Technology featuring 35nm 
Gate Lengths, Enhanced Channel Strain, 8 Cu Interconnect Layers, low-k ILD and a 
0.57um2 SRAM Cell”, Proc. Intl. Electron Devices Meeting, pp. 657-660, 2004. 
[2.18] S. Garimella, V. Singhal, D. Liu, “On Chip Thermal Management with 
Microchannel Heat Sinks and Integrated Micropumps”, Proc. IEEE, Vol. 94, Issue 8, pp. 
1534-1548, Aug. 2006. 
[2.19] A. F. Benner, M. Ignatowski, J. A. Kash, D. M. Kuchta, and M. B. Ritter, 
“Exploitation of Optical Interconnects in Future Server Architectures”, IBM Journal of 
Research & Development, Vol. 49, Number 4/5, 2005. 
[2.20] www.eetimes.com (08/31/2008) 
 
Chapter 3 
[3.1] C. -H. Jan, P. Bai, J. Choi, et al., “A 65nm Ultra Low Power Logic Platform 
Technology using Uni-Axial Strained Silicon Transistors”, Proc. Intl. Electron Devices 
Meeting, pp. 60-63, 2005.  
[3.2] R. Venkatesan, PhD dissertation, Georgia Institute of Technology, 2003. 
[3.3] I. Young, K. Raol, “A comprehensive metric for evaluating interconnect 
performance”, Proc. Intl. Interconnect Technology Conference, pp. 119-121, 2001. 
[3.4] J. Davis, PhD dissertation, Georgia Institute of Technology, 1999. 
[3.5] http://public.itrs.net (08/31/2008) 
[3.6] Models in BACPAC. Available online at www.eecs.umich.edu/~dennis/bacpac 
[3.7] R. Sarvari, A. Naeemi, R. Venkatesan, J. Meindl, “Impact of Size Effects on the 
Resistivity of Copper Wires and Consequently the Design and Performance of Metal 
Interconnect Networks”, Proc. Intl. Interconnect Technology Conference, pp. 197-199, 
2005. 
[3.8] D. C. Sekar, R. Venkatesan, K. Bowman, A. Joshi, J. Davis, J. Meindl, “Optimal 
Repeaters for sub-50nm Interconnect Networks”, Proc. Intl. Interconnect Technology 
Conference, pp. 199-201, 2006. 
[3.9] D. C. Sekar, A. Naeemi, R. Sarvari, J. Davis, J. Meindl, “IntSim: A CAD Tool for 
Optimization of Multilevel Interconnect Networks”, Proc. Intl. Conference on Computer 
Aided Design, pp. 560-567, 2007. 
[3.10] D. C. Sekar, J. Meindl, “The Impact of Parallel Processing Architectures on the 
Design of Chip-Level Interconnect Networks”, Proc. Intl. Interconnect Technology 
Conference, pp. 123-125, 2007. 
[3.11] R. Puri (IBM), “3D design and CAD needs”, Proc. SRC Interconnect Forum, Sep. 
2006. 
[3.12] C. Belady (HP), “In the Data Center, Power and Cooling Costs more than the IT 
Equipment it Supports”, Electronics Cooling Magazine, Feb. 2007. 
[3.13] A. Naeemi, J. Meindl, “Design and Performance Modeling for Single-Walled 
Carbon Nanotubes as Local, Semiglobal, and Global Interconnects in Gigascale Integrated 
Systems”, IEEE Transactions on Electron Devices, Vol. 54, pp. 26-37, Jan. 2007. 
[3.14] K. Banerjee, N. Srivastava, “Are Carbon Nanotubes the Future of VLSI 
Interconnections?”, Proc. Design Automation Conference, pp. 809-814, 2006.  
[3.15] A. Naeemi, J. Meindl, “Compact Physical Models for Multiwall Carbon Nanotube 
Interconnects”, IEEE Electron Device Letters, Vol. 27, Issue 5, pp. 338-340, May 2006. 
[3.16]  D. Edelstein, J. Heidenreich, R. Goldblatt, W. Cote, C. Uzoh, N. Lustig, P. Roper, 
T. McDevitt, W. Motsiff, A. Simon, J. Dukovic, R. Wachnik, H. Rathore, R. Schulz, L. Su, 
S. Luce, J. Slattery, “Full Copper Wiring in a sub-0.25 µm CMOS ULSI Technology”, 
Proc. Intl. Electron Devices Meeting, pp. 773-776, 1997. 
[3.17] E. G. Cristal, “Coupled Circular Cylindrical Rods between Parallel Ground Planes”, 
IEEE Transactions on Microwave Theory and Techniques, Vol. 12, Issue 4, pp. 428-439, 
July 1964. 
[3.18] A. Naeemi, R. Sarvari, J. Meindl, “Performance Modeling and Optimization for 
Single and Multi-Wall Carbon Nanotube Interconnects”, Proc. Design Automation 
Conference, pp. 568-573, 2007.  
[3.19] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, C. Hu, "New Paradigm of Predictive 
MOSFET and Interconnect Modeling for Early Circuit Design," Proc. Custom Integrated 
Circuit Conference, pp. 201-204, 2000. Available at http://www.eas.asu.edu/~ptm 
[3.20] A. Chandrakasan, S. Sheng, R. Brodersen, “Low Power CMOS Digital Design”, 
IEEE Journal of Solid State Circuits, Vol. 27, Issue 4, pp. 473-484, Apr. 1992. 
[3.21] K. A. Bowman, PhD dissertation, Georgia Institute of Technology, 2001. 
[3.22] R. Gonzalez, B. Gordon, M. Horowitz, “Supply and Threshold Voltage Scaling for 
Low Power CMOS”, IEEE Journal of Solid State Circuits, Vol. 32, Issue 8, pp. 1210-1216, 
Aug. 1997.  
[3.23] H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, 
Addison-Wesley, 1990. 
[3.24] J. Tendler, J. Dodson, J. Fields, et al., “POWER4 System Microarchitecture”, IBM 
Journal of Research & Development, Vol. 46, Number 1, 2002. 
[3.25] A.S. Leon, J.L. Shin, et al, “A Power Efficient High Throughput 32 Thread SPARC 
Processor”, Proc. Intl. Solid State Circuits Conference, pp. 295-304, 2006. 
[3.26] G. Shahidi, “Evolution of CMOS Technology at 32nm and Beyond”, Proc. Custom 
Integrated Circuits Conference, pp. 413-416, 2007. 
[3.27] www.eetimes.com (08/31/2008) 
 
Chapter 4 
[4.1] K. Shakeri, PhD dissertation, Georgia Institute of Technology, 2005. 
[4.2] D. C. Sekar, B. Dang, J. Davis, J. Meindl, “Electromigration Resistant Power 
Delivery Systems”, IEEE Electron Device Letters, Vol. 28, Issue 8, pp. 767-769, Aug. 
2007. 
[4.3] D. C. Sekar, E. Demaray, H. Zhang, P. Kohl, J. Meindl, “A New Global Interconnect 
Paradigm: MIM Power Ground Plane Decoupling Capacitors”, Proc. Intl. Interconnect 
Technology Conference, pp. 48-50, 2006. 
[4.4] D. Mallik, K. Radhakrishnan, J. He, C-P Chiu, T. Kamgaing, D. Searls, J. D. Jackson, 
“Advanced Package Technologies for High-Performance Systems”, Intel Technology 
Journal, Vol. 9, Issue 4, pp. 259-272, Nov. 2005. 
[4.5] T-Y. Chiang, B. Shieh, K. C. Saraswat, “Impact of Joule Heating on Scaling of deep 
Sub-Micron Cu/low-k Interconnects”, Proc. Symposium on VLSI Technology, pp. 38-39, 
2002.  
[4.6] J. Tao, J. F. Chen, N. W. Cheung, C. Hu, “Modeling and Characterization of 
Electromigration Failures under Bidirectional Current Stress”, Transactions on Electron 
Devices, Vol. 43, Issue 5, pp. 800-808, May 1996. 
[4.7] J. Tao, N. W. Cheung, C. Hu, “An Electromigration Failure Model for Interconnects 
under Pulsed and Bidirectional Current Stressing”, Transactions on Electron Devices, Vol. 
41, Issue 4, pp. 539-545, April 1994. 
[4.8] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, J. Yamada, “1V Power 
Supply High Speed Digital Circuit Technology with Multithreshold Voltage CMOS”, 
Journal of Solid State Circuits, Vol. 30, Issue 8, pp. 847-854, Aug. 1995. 
[4.9] K. Shi, D. Howard, “Challenges in Sleep Transistor Design and Implementation in 
Low-Power Designs”, Proc. Design Automation Conference, pp. 113-116, 2006. 
[4.10] J. N. Kozhaya, L. A. Bakir, “An Electrically Robust Method for Placing Power 
Gating Switches in Voltage Islands”, Proc. Custom Integrated Circuits Conference, pp. 
321-324, 2004. 
[4.11] http://en.wikipedia.org/wiki/Law_of_large_numbers (08/31/2008) 
[4.12] K-S. Moon, J. Wu, C. P. Wong, “Improved Stability of Contact Resistance of Low 
Melting Point Alloy Incorporated Isotropically Conductive Adhesives”, Transactions on 
components and packaging technologies, Vol. 26, Issue 2, pp. 375-381, June 2003. 
[4.13] M. Anis, S. Areibi, M. Elmasry, “Design and Optimization of Multithreshold 
CMOS (MTCMOS) Circuits”, Transactions on Computer-Aided Design of Integrated 
Circuits and Systems, Vol. 22, Issue 10, pp. 1324-1342, Oct. 2003. 
[4.14] H. Hiller, “There is more than Moore in Automotive”, Proc. Design Automation 
Conference, pp. 376-376, 2007. 
[4.15] H. Casier, P. Moens, K. Appeltans, “Technology Considerations for Automotive”, 
Proc. European Solid State Device Research Conference, pp. 37-41, 2004. 
[4.16] T. Braun, K.-F. Becker, J.-P. Sommer, T. Loher, K. Schottenloher, R. Kohl, R. 
Pufall, V. Bader, M. Koch, R. Aschenbrenner, H. Reichl, “High Temperature Potential of 
Flip Chip Assemblies for Automotive Applications”, Proc. Electronic Components and 
Technology Conference, 2005. 
[4.17] C-T. Chuang, P-F. Lu, C. J. Anderson, “SOI for Digital CMOS VLSI: Design 
Considerations and Advances”, Proc. IEEE, Vol. 86, Issue 4, pp. 689-720, Apr. 1998. 
[4.18] J. Warnock, J. Keaty, J. Petrovick, et al., “The Circuit and the Physical Design of the 
POWER4 Microprocessor”, IBM J. of Research & Development, Vol. 46, Number 1, Jan. 
2002. 
[4.19] Raphael Interconnect Analysis Tool, from Synopsys. 
[4.20] D. Roberts, et al, “Application of On-Chip MIM Decoupling Capacitor for 90nm 
SOI Microprocessor”, Proc. Intl. Electron Devices Meeting, pp. 72-75, 2005. 
[4.21] S. Naffziger, “High-Performance Processors in a Power Limited World”, Proc. Symp. on 
VLSI Circuits, pp. 93-97, 2006. 
[4.22] J. Xu, P. Hazucha, et al, “On-Die Supply Resonance Suppression using Band 
Limited Active Damping”, Proc. Intl. Solid State Circuits Conference, pp. 286-603, 2007. 
[4.23] P. Hazucha, et al., “A 233-MHz 80%-87% Efficient Four-Phase DC-DC Converter 
utilizing Air-Core Inductors on Package”, IEEE J. of Solid State Circuits, Vol. 40, Issue 4, 
pp. 838-845, Apr. 2005. 
[4.24] J. Stinson, S. Rusu, “A 1.5GHz Third Generation Itanium2 Processor”, Proc. 
Design Automation Conference, pp. 706-709, 2003. 
[4.25] D. Amey, K. Dietz, “Application of Embedded Capacitor Technology for High 
Performance Semiconductor Packaging”, Proc. DesignCon, 2007. 
[4.26] www.fabtech.org (08/31/2008) 
 
Chapter 5 
[5.1] J. Warnock, J. Keaty, J. Petrovick, et al., “The Circuit and the Physical Design of the 
POWER4 Microprocessor”, IBM J. of Research & Development, Vol. 46, Number 1, Jan. 
2002. 
[5.2] K. Shakeri, PhD dissertation, Georgia Institute of Technology, 2005. 
[5.3] J. Davis, PhD dissertation, Georgia Institute of Technology, 1999. 
[5.4] Q. Chen, J. Davis, P. Zarkesh-Ha, J. Meindl, “A Compact Physical Via Blockage 
Model”, Transactions on VLSI Systems, Vol. 8, Issue 6, pp. 689-692, Dec. 2000. 
[5.5] R. Sarvari, A. Naeemi, R. Venkatesan, J. Meindl, “Impact of Size Effects on the 
Resistivity of Copper Wires and Consequently the Design and Performance of Metal 
Interconnect Networks”, Proc. Intl. Interconnect Technology Conference, pp. 197-199, 
2005. 
[5.6] D. C. Sekar, A. Naeemi, R. Sarvari, J. Davis, J. Meindl, “IntSim: A CAD Tool for 
Optimization of Multilevel Interconnect Networks”, Proc. Intl. Conference on Computer 
Aided Design, pp. 560-567, 2007. 
[5.7] M. Lanzerotti, G. Fiorenza, R. Rand, “Assessment of On-Chip Wire Length 
Distribution Models”, Trans. VLSI Design, Vol. 12, Issue 10, pp. 1108-1112, Oct. 2004. 
[5.8] M. Lanzerotti, G. Fiorenza, R. Rand, "Interpretation of Rent's rule for 
Ultralarge-Scale Integrated Circuit Designs, with an Application to Wirelength 
Distribution Models" Trans. VLSI Design, Vol. 12, Issue 12, pp. 1330-1347, Dec. 2004. 
[5.9] Models in BACPAC. Available online at www.eecs.umich.edu/~dennis/bacpac 
[5.10] D. Stroobandt, “Apriori Wire Length Estimates for Digital Design”, Kluwer 
Academic Publishers. 
[5.11] J. Davis, V. De, J. Meindl, “Apriori Wiring Estimations and Optimal Multilevel 
Wiring Networks for Portable ULSI Systems”, Proc. Electronic Components and 
Technology Conference, pp. 1002-1008, 1996.  
[5.12] A. Naeemi, PhD dissertation, Georgia Institute of Technology, 2003. 
[5.13] P. Zarkesh-Ha, PhD dissertation, Georgia Institute of Technology, 2001. 
[5.14] J. Davis, J. Meindl, “Interconnect Technology and Design for Gigascale 
Integration”, Kluwer Academic Publishers. 
[5.15] D. C. Sekar, “Clock Trees: Differential or Single Ended?”, Proc. Intl. Symp. on 
Quality Electronic Design, pp. 548-553, 2005. 
[5.16] J. Davis, PhD dissertation, Georgia Institute of Technology, 1999. 
[5.17] R. Venkatesan, PhD dissertation, Georgia Institute of Technology, 2003. 
[5.18] R. Sarvari, A. Naeemi, P. Zarkesh-Ha, J. Meindl, “Design and Optimization for 
Nanoscale Power Distribution Networks in Gigascale Systems”, Proc. Intl. Interconnect 
Technology Conference, pp. 190-192, 2007. 
[5.19] G. Chandra, P. Kapur, K. Saraswat, “Scaling Trends for the On Chip Power 
Dissipation”, Proc. Intl. Interconnect Technology Conference, pp. 170-172, 2002. 
[5.20] K. Banerjee, A. Mehrotra, “A Power Optimal Repeater Insertion Methodology for 
Global Interconnects”, Trans. Electron Devices, Vol. 49, Issue 11, pp. 2001-2007, Nov. 
2002. 
[5.21] S. Narendra, V. De, S. Borkar, D. Antoniadis, A. Chandrakasan, "Full-Chip 
Subthreshold Leakage Power Prediction and Reduction Techniques for Sub-0.18um 
CMOS," J. of Solid State Circuits, Vol. 39, Issue 3, pp. 501-510, Mar. 2004. 
[5.22] N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, A. Kovacs, “The 
Implementation of the 65nm Dual Core Merom Processor”, Proc. Intl. Solid State Circuits 
Conference, pp. 106-590, 2007. 
[5.23] P. Bai, C. Auth, S. Balakrishnan, et al., “A 65nm Logic Technology featuring 35nm 
Gate Lengths, Enhanced Channel Strain, 8 Cu interconnect layers, Low k ILD and a 0.57 
um2 SRAM cell”, Proc. Intl. Electron Devices Meeting, pp. 657-660, 2004. 
[5.24] S. Naffziger, B. Stackhouse, T. Grutkowski, “The implementation of a 2 core 
multi-threaded Itanium family processor”, Proc. Intl. Solid State Circuits Conference, pp. 
182-183, 2005. 
 
Chapter 6 
[6.1] A. W. Topol, D. C. La Tulipe, Jr., L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. 
Kumar, G. U. Singco, A. M. Young, K. W. Guarini, M. Ieong, “Three Dimensional 
Integrated Circuits”, IBM Journal of Research & Development, Vol. 50, Number 4/5, 2006. 
[6.2] M. Crowley, A. Al-Shamma, D. Bosch, et al., “512Mb PROM with eight layers of 
antifuse diode cells”, Proc. Intl. Solid State Circuits conference, pp. 284-493, 2003. 
[6.3] G. Shahidi, “Evolution of CMOS Technology at 32nm and Beyond”, Proc. Custom 
Integrated Circuits Conference, pp. 413-416, 2007.  

Appendix A 
[A.1] K. Nose, T. Sakurai, “Optimization of Supply Voltage and Threshold Voltage for 
Low Power and High Speed Applications”, Proc. Asia and South Pacific-Design 
Automation Conference, pp. 469-474, 2000. 
[A.2] http://en.wikipedia.org/wiki/Cubic_equation (08/31/2008) 
 
Appendix B 
[B.1] H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-Wesley, 
1990. 
[B.2] K. Banerjee, A. Mehrotra, “A Power Optimal Repeater Insertion Methodology for 
Global Interconnects”, Trans. Electron Devices, Vol. 49, Issue 11, pp. 2001-2007, Nov. 
2002. 
[B.3] http://en.wikipedia.org/wiki/Cubic_equation (08/31/2008) 
LIST OF PUBLICATIONS 
 
[1] D. C. Sekar, R. Venkatesan, K. Bowman, A. Joshi, J. Davis, J. Meindl, “Optimal 
Repeaters for sub-50nm Interconnect Networks”, Proc. Intl. Interconnect Technology 
Conference, pp. 199-201, 2006. 
[2] D. C. Sekar, A. Naeemi, R. Sarvari, J. Davis, J. Meindl, “IntSim: A CAD Tool for 
Optimization of Multilevel Interconnect Networks”, Proc. Intl. Conference on Computer 
Aided Design, pp. 560-567, 2007. 
[3] D. C. Sekar, J. Meindl, “The Impact of Parallel Processing Architectures on the Design of 
Chip-Level Interconnect Networks”, Proc. Intl. Interconnect Technology Conference, pp. 
123-125, 2007. 
[4] D. C. Sekar, C. King, B. Dang, H. Thacker, T. Spencer, P. Joseph, M. Bakir, J. Meindl, 
“Microchannel-cooled 3D Integrated Systems”, Proc. Intl. Interconnect Technology 
Conference, 2008. 
[5] D. C. Sekar, “Clock Trees: Differential or Single Ended?”, Proc. Intl. Symposium on 
Quality Electronic Design, pp. 548-553, 2005. 
[6] D. C. Sekar, E. Demaray, H. Zhang, P. Kohl, J. Meindl, “A New Global Interconnect 
Paradigm: MIM Power Ground Plane Decoupling Capacitors”, Proc. Intl. Interconnect 
Technology Conference, pp. 48-50, 2006. 
[7] B. Dang, D. C. Sekar, M. Bakir, P. Kohl, J. Meindl, “Chip Scale Cooling with Integrated 
Heat Sink and Microfluidic I/Os”, Proc. SRC Techcon, 2005.  
[8] G. Huang, D. C. Sekar, A. Naeemi, K. Shakeri, J. Meindl, "Physical Model for Power 
Supply Noise and Chip/Package Co-Design in Gigascale Systems with the Consideration 
of Hot Spots", Proc. Custom Integrated Circuits Conference, pp. 841-844, 2007.  
[9] G. Huang, D. C. Sekar, A. Naeemi, K. Shakeri, J. Meindl, "Compact Physical Models for 
Power Supply Noise and Chip/Package Co-Design of Gigascale Integration", Electronic 
Components and Technology Conference, pp. 1659-1666, 2007.  
[10] C. King, D. C. Sekar, M. Bakir, B. Dang, J. Pikarsky, J. Meindl, “3D Stacking of Chips 
with Electrical and Fluidic I/O Interconnects”, Proc. Electronic Components and 
Technology Conference, 2008. 
[11] M. Bakir, C. King, D. C. Sekar, H. Thacker, B. Dang, G. Huang, A. Naeemi, J. Meindl, 
“Ultra Compact and High-Performance 3D Integrated Heterogeneous Systems: Liquid 
Cooling, Power Delivery and Implementation”, Proc. Custom Integrated Circuits 
Conference, 2008. 
[12] D. C. Sekar, B. Dang, J. Davis, J. Meindl, “Electromigration Resistant Power Delivery 
Systems”, IEEE Electron Device Letters, Vol. 28, Issue 8, pp. 767-769, Aug. 2007. 
 
 
VITA 
 
Deepak was born in Vellore, India in 1982. He received a B.Tech. from the Indian 
Institute of Technology (Madras) in 2003, an M.S. from the Georgia Institute of Technology 
in 2005 and a PhD from the Georgia Institute of Technology in 2008, all in Electrical and 
Computer Engineering. His internship experiences include a summer at IBM 
Microelectronics and a semester at SanDisk Corporation. He joined SanDisk as a Senior 
Device Engineer in December 2006, where he has worked on non-volatile memory 
technology, device and circuit research. Some of the awards he has received include the 
Intel PhD Fellowship, the Maharashtra State Government Fellowship for undergraduate 
studies, the SRC inventor recognition award and the National Talent Search Scholarship 
given by the Govt. of India. His research interests span VLSI technology, device physics, 
interconnects, packaging, low-power circuits and VLSI system design. He has more than 
fifteen patents pending and has authored several publications. 
 
 
