Design and modelling of variability tolerant
on-chip communication structures for
future high performance system on chip
designs by Hassan, F.
 
 
 
 
 
Hassan, F. (2011) Design and modelling of variability tolerant on-chip 
communication structures for future high performance system on chip 
designs. PhD thesis 
 
http://theses.gla.ac.uk/2729/
 
 
Copyright and moral rights for this thesis are retained by the author 
 
A copy can be downloaded for personal non-commercial research or 
study, without prior permission or charge 
 
This thesis cannot be reproduced or quoted extensively from without first 
obtaining permission in writing from the Author 
 
The content must not be changed in any way or sold commercially in any 
format or medium without the formal permission of the Author 
 
When referring to this work, full bibliographic details including the 
author, title, awarding institution and date of the thesis must be given. 
 
Glasgow Theses Service 
http://theses.gla.ac.uk/ 
theses@gla.ac.uk 
Design and Modelling of Variability Tolerant 
on-Chip Communication Structures for 
Future High Performance System on Chip 
Designs 
 
by 
Faiz-ul-Hassan 
Thesis submitted in fulfilment of the requirements for 
the degree of 
Doctor of Philosophy 
in 
Electronics and Electrical Engineering 
at 
 
University of Glasgow 
 
May 2011 
Copyright © Faiz-ul-Hassan, 2011, all rights reserved. 
 
Abstract 
Incessant technology scaling has enabled the integration of functionally complex 
System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single 
chip. The processing elements on these chips are integrated through on-chip 
communication structures, which provide the infrastructure necessary for the exchange of 
data and control signals while meeting stringent physical and design constraints. 
Extensive on-chip communication will be central to future designs, in which 
variability is an inherent characteristic. For this reason, this thesis investigates the 
performance and variability tolerance of typical on-chip communication structures. 
Understanding the relationship between variability and communication is paramount for 
designers, who must devise new methods and techniques for designing performance- and 
power-efficient communication circuits in the face of the challenges presented by deep 
sub-micron (DSM) technologies.  
The initial part of this work investigates the impact of device variability due to Random 
Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. 
The characterization data so obtained can be used to estimate the performance and failure 
probability of simple links through the methodology proposed in this work. For the 
Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate 
estimation of the probability density functions of different circuit parameters is proposed. 
Moreover, its significance for pipelined circuits is highlighted. Power and area are among 
the most important design metrics for any integrated circuit (IC) design. This thesis 
emphasises the consideration of communication reliability while optimizing for power and 
area. A methodology is proposed for the simultaneous optimization of performance, 
area, power and delay variability for a repeater-inserted interconnect. Similarly, for 
multi-bit parallel links, bandwidth-driven optimizations are performed. Power- and 
area-efficient semi-serial links, which are less vulnerable to delay variations than the 
corresponding fully parallel links, are introduced. Furthermore, due to technology scaling, the coupling 
noise between the link lines has become an important issue. With ever decreasing supply 
voltages, and the corresponding reduction in noise margins, severe challenges are 
introduced for performing timing verification in the presence of variability. For this reason, 
an accurate model of crosstalk noise in an interconnect as a function of time and skew 
is introduced in this work. This model can be used to identify the skew condition 
that gives maximum delay noise, and also for efficient design verification. 
  
Dedicated 
to 
My Family 
  
Acknowledgements 
I would like to thank almighty Allah for giving me health, knowledge and blessings that 
have made it possible to complete this work. 
I would like to express my profound gratitude to my supervisor Dr. Fernando Rodríguez-
Salazar for giving me the opportunity to work with him and for his inspiration, guidance 
and continuous support. Fernando, your impressive knowledge, extreme patience, big heart 
and special attention to my needs have made it possible for me to comfortably complete 
this research. Thank you very much again for your special help and guidance. 
Special thanks and appreciation to my co-supervisor, Dr. Wim Vanderbauwhede, who 
always gave me very useful comments and suggestions during my research. I would also 
like to thank Dr. Binjie Cheng of the Device Modeling Group for his help in the area of 
device model cards. I would also like to acknowledge the guidance and encouragement of 
Dr. Scott Roy, especially in the early period of my research.  
Infinite thanks to my beloved parents for their prayers and wishes throughout my life. I am 
also thankful to my sisters and brothers for their continuous support and wishes, without 
which I could not have achieved my goal. Special thanks to my beloved wife Farkahnda, my 
daughters Sumayyah, Juwariyah, Maria and son Bilal for providing me comfort and joy 
and also for their patience and sacrifices.  
Finally, I acknowledge the valuable support of the computer support staff, the library staff and all 
my colleagues, who helped me with several issues. 
 
Faiz-ul-Hassan 
University of Glasgow  
Glasgow, United Kingdom 
May 2011 
  
List of Publications 
The following list details publications that have been produced while undertaking this 
research project. 
 
Journal Papers 
1.  Faiz-ul-Hassan, Wim Vanderbauwhede, Fernando Rodriguez-Salazar, "Impact of 
Random Dopant Fluctuations on the Timing Characteristics of Flip-Flops," IEEE 
Transactions on Very Large Scale Integration (VLSI) Systems, Princeton, USA, 2011. 
2.  Faiz-ul-Hassan, Wim Vanderbauwhede, Fernando Rodriguez-Salazar, “Performance 
Analysis of on-chip Communication Structures under Device variability,” INVITED 
PAPER in International Journal of Embedded and Real-Time Communication Systems 
(IJERTCS), vol. 1(4), 2010, pp. 40-62. 
Conference Papers 
1. Faiz-ul-Hassan, Wim A. Vanderbauwhede, Fernando Rodriguez-Salazar, 
"Optimization of on-chip link performance under area, power and variability 
constraints," in Proceedings of IEEE International Conference on Microelectronics 
(ICM 2010), pp. 48-51, 19-22 Dec. 2010, Cairo, Egypt. 
2. Faiz-ul-Hassan, Wim Vanderbauwhede, Fernando Rodriguez-Salazar, "Power 
dissipation in NoC repeaters under random dopant fluctuations," in Proceedings of 
IEEE PrimeAsia 2009, pp. 388-391, 19-21 Jan. 2009, Shanghai, China. 
3. Faiz-ul-Hassan, Binjie Cheng, Wim Vanderbauwhede, Fernando Rodriguez-Salazar, 
“Impact of Device Variability in the Communication Structures for Future 
Synchronous SoC Designs,” in Proceedings of IEEE International Symposium on 
System-on-Chip (SoC 2009), pp. 68-72, Tampere, Finland.  
4. Faiz-ul-Hassan, Wim Vanderbauwhede, Fernando Rodriguez-Salazar, “Timing Yield 
of Pipelined Circuits under Statistical Device variability,” in Proceedings (on CD) of 
2nd IEEE Latin American Symposium on Circuits and Systems (LASCAS 2011), 23-25 
Feb. 2011, Bogota, Colombia. 
5.  Faiz-ul-Hassan, Wim Vanderbauwhede, Fernando Rodriguez-Salazar, “Maximizing 
Bandwidth Over Faulty Links,” 21st International Conference on Field Programmable 
Logic and Applications (FPL 2011), Sep. 5-7, 2011, Chania, Crete, Greece 
(ACCEPTED). 
  
Table of Contents 
 
Abstract ........................................................................................................................ ii 
Acknowledgements ....................................................................................................... iv 
List of Publications ........................................................................................................ v 
Table of Contents .......................................................................................................... vi 
List of Figures ............................................................................................................... xi 
List of Tables .............................................................................................................. xix 
List of Abbreviations .................................................................................................. xxi 
Author’s Declaration .................................................................................................xxiii 
Chapter 1 ........................................................................................................................... 1 
Introduction ....................................................................................................................... 1 
1.1 Interconnect-Centric Design Paradigm ................................................................. 2 
1.1.1 Scaling .............................................................................................................. 3 
1.1.2 Power Dissipation .............................................................................................. 3 
1.1.3 Crosstalk ........................................................................................................... 4 
1.1.4 Variability ......................................................................................................... 4 
1.2 Research Overview .............................................................................................. 6 
1.2.1 Research Objective 1 ......................................................................................... 6 
1.2.2 Research Objective 2 ......................................................................................... 7 
1.2.3 Research Objective 3 ......................................................................................... 7 
1.2.4 Research Objective 4 ......................................................................................... 8 
1.2.5 Research Objective 5 ......................................................................................... 8 
1.3 Thesis Organization ............................................................................................. 8 
Chapter 2 ......................................................................................................................... 10 
On-Chip Communication Structures ................................................................................ 10 
2.1 Communication Architectures for SoCs .................................................................. 10 
2.1.1 Buses ............................................................................................................... 10 
2.1.2 Point-to-Point Direct Links .............................................................................. 13 
2.1.3 Network Architecture ...................................................................................... 13 
2.3 Performance of On-Chip Communication ............................................................... 17 
2.4 Interconnect Modelling in DSM Technologies ....................................................... 17 
2.4.1 Parasitic Resistance ......................................................................................... 18 
2.4.2 Parasitic Capacitance ....................................................................................... 18 
2.4.3 Inductance ....................................................................................................... 20 
2.4.4 Impact of Technology Scaling on Interconnect Parasitics................................. 21 
2.5 Performance Metrics .............................................................................................. 22 
2.5.1 Signal Delay .................................................................................................... 22 
2.5.2 Skew ............................................................................................................... 24 
2.5.3 Delay Variability ............................................................................................. 27 
2.5.4 Crosstalk ......................................................................................................... 27 
2.5.5 Power Dissipation ............................................................................................ 29 
2.5.6 On-Chip Area .................................................................................................. 33 
2.5.7 Throughput ...................................................................................................... 34 
2.5.8 Bandwidth ....................................................................................................... 34 
2.5.9 Parametric Yield .............................................................................................. 34 
2.6 Performance Characterization Methodology ...................................................... 34 
2.6.1 Extraction of I-V Characteristics of MOSFETs ................................................ 35 
2.7 Summary ................................................................................................................ 37 
Chapter 3 ......................................................................................................................... 38 
Communication Structures under Device Variability ........................................................ 38 
3.1 Technology Scaling and Gate Delay .................................................................. 40 
3.2 Delay Uncertainty in Buffers ............................................................................. 42 
3.2.1 Skewness of Delay Distributions................................................................. 45 
3.3 Ring Oscillator (RO) .......................................................................................... 46 
3.4 Tapered Buffer Drivers........................................................................................... 47 
3.5 Repeaters ........................................................................................................... 52 
3.6 Data Storage Elements (Flip-flops) .................................................................... 54 
3.6.1 Timing Measurement Procedure ...................................................................... 56 
3.6.2 Results and Discussion .................................................................................... 58 
3.7 Interconnect ....................................................................................................... 59 
3.8 Performance of Communication Links ................................................................... 59 
3.8.1 Estimation of Link Performance ...................................................................... 60 
3.8.2 Link Failure Probability ................................................................................... 62 
3.8.3 Case Study ...................................................................................................... 63 
3.9 Summary ........................................................................................................... 64 
Chapter 4 ......................................................................................................................... 66 
SSTA of Pipelined Communication Circuits .................................................................... 66 
4.1 Introduction to STA ........................................................................................... 66 
4.2 Introduction to SSTA ............................................................................................. 68 
4.3 Representation of Characterization Data ................................................................. 69 
4.4 Estimation of the Timing Distributions ................................................................... 72 
4.4.1 Pearson Distributions ....................................................................................... 72 
4.4.2 Johnson Distribution ........................................................................................ 74 
4.5 Estimation of Timing Distributions and Yield ........................................................ 76 
4.6 Timing Distributions of Pipelined Circuits ......................................................... 76 
4.7 Pipeline Delay ................................................................................................... 77 
4.8 Statistical Analysis of the Timing Yield ................................................................. 79 
4.9 Experimental Setup and Results ......................................................................... 80 
4.10 Summary ........................................................................................................ 86 
Chapter 5 ......................................................................................................................... 88 
Optimal Scaling for Variability Tolerant Repeaters .......................................................... 88 
5.1 Introduction ............................................................................................................ 88 
5.2 Methodology for Power Measurement .................................................................... 90 
5.3 Results and Discussion ...................................................................................... 92 
5.3.1 Impact on Repeater Inserted Links ................................................................... 95 
5.3.2 Impact of Repeater Size on Power Dissipation ................................................. 95 
5.3.3 Impact on NoC links ........................................................................................ 96 
5.4 Power and Area Optimal Repeater Insertion ........................................................... 97 
5.4.1 Unconstrained Repeater Insertion .................................................................... 97 
5.4.2 Repeater Insertion under Area Constraints ....................................................... 98 
5.4.3 Repeater Insertion under Power Constraint .................................................... 100 
5.4.4 Communication Reliability ............................................................................ 100 
5.5 Optimization Methodology .............................................................................. 101 
5.5.1 Case Study .................................................................................................... 102 
5.6 Summary ......................................................................................................... 104 
Chapter 6 ....................................................................................................................... 106 
Design of Variability Tolerant Data Channels ................................................................ 106 
6.1 Inter-Resource Communication ............................................................................ 106 
6.2 Channel Configuration and Modelling .................................................................. 108 
6.2.1 Interconnect Resistance ................................................................................. 109 
6.2.2 Interconnect Capacitance ............................................................................... 109 
6.2.3 Interconnect Delay......................................................................................... 111 
6.3 Repeater Insertion ................................................................................................ 113 
6.4 Bandwidth Estimation .......................................................................................... 115 
6.4.1 Bandwidth as a Function of Length................................................................ 117 
6.5 Channel Performance under Variability ................................................................ 117 
6.5.1 Sensitivity Analysis of the Delay under Variability ........................................ 118 
6.6 Area Constrained Channel Bandwidth .................................................................. 120 
6.6.1 Experimental Setup and Simulation Results ................................................... 121 
6.6.2 Results........................................................................................................... 121 
6.7 Optimization under Different Trade-offs .............................................................. 130 
6.8 Failure of Channels under Variability ................................................................... 131 
6.9 Channel Serialization ........................................................................................... 134 
6.9.1 Concept ......................................................................................................... 134 
6.9.2 Channel Structure .......................................................................................... 135 
6.9.3 Experimental Results ..................................................................................... 138 
6.10 Link Utilization and Power Dissipation .............................................................. 139 
6.11 Summary ............................................................................................................ 140 
Chapter 7 ....................................................................................................................... 142 
Crosstalk in Coupled Interconnects ................................................................................ 142 
7.1 Introduction .......................................................................................................... 142 
7.2 Coupled RC Transmission Lines .......................................................................... 143 
7.2.1 Voltage Representation .................................................................................. 144 
7.2.2 Model Validation ........................................................................................... 145 
7.3 Skew Amplification under Variability .................................................................. 147 
7.4 Summary .............................................................................................................. 149 
Chapter 8 ....................................................................................................................... 150 
Conclusions and Future Work ........................................................................................ 150 
8.1 Conclusions .......................................................................................................... 150 
8.2 Future Work ......................................................................................................... 154 
Appendix A ............................................................................................................... 155 
References ................................................................................................................. 161 
 
 
  
List of Figures 
Figure 1.1: Typical system implementation of Marvell 88F6282 SoC [3]. .......................... 2 
Figure 1.2: Projected relative delay of devices and interconnects (local and global) for 
different technology generations. The relative performance of the global interconnect is 
decreasing with technology scaling. ................................................................................... 3 
Figure 1.3: Random discrete dopant effects in deep sub-micrometer CMOS devices [21]. 
The figure on the left hand side is a solid model of a CMOS transistor and that on the right 
side is its transparent version showing the discreteness due to dopants in the channel 
region. ............................................................................................................................... 5 
Figure 1.4: Impact of technology scaling on the average number of dopant atoms in the 
channel. ............................................................................................................................. 6 
Figure 2.1: A SoC in which different components are integrated through the bus 
communication architecture. ............................................................................................ 11 
Figure 2.2: A simple shared bus, allowing different FUs to share the same communication 
channel. ........................................................................................................................... 11 
Figure 2.3: A bus divided into two sub-buses using a bridge. ........................................... 12 
Figure 2.4: An example of AMBA bus. The bridge provides an interface to connect two 
different types of buses. ................................................................................................... 12 
Figure 2.5: A point-to-point communication architecture. ................................................ 13 
Figure 2.6: A conceptual realization of a NoC [34]. ......................................................... 14 
Figure 2.7: A 4×4 grid structured NoC. Each intellectual property (IP) block is connected 
to a router through a network interface (NI) adapter. The routers are connected with each 
other through communication links in a certain topology. ................................................ 15 
Figure 2.8: A bidirectional link. There is a shared interconnect between the transmitter and 
receiver. ........................................................................................................................... 16 
Figure 2.9: A unidirectional link. ..................................................................................... 16 
Figure 2.10: Repeater inserted interconnect. ..................................................................... 16 
Figure 2.11: Flip-flop inserted pipelined interconnect. ..................................................... 17 
Figure 2.12: Different interconnect models, (a) the ‘T’, (b) the ‘pi’ and (c) the ‘ladder’. A 
long wire is divided into N segments using ladder model and is shown in (d). .................. 18 
Figure 2.13: The cross-sectional view of an interconnect surrounded by two parallel similar 
interconnects over a ground plane (in the top global layer) showing different components 
of capacitance. ................................................................................................................. 19 
Figure 2.14: Impact of technology scaling on interconnect resistance and capacitance per 
unit length (Fig. (a) and (b) respectively) for local, intermediate and global interconnects 
with minimum width and pitch. ........................................................................................ 22 
Figure 2.15: The circuit used for the derivation of the delay expression, where an 
interconnect is driven by an input buffer and at the output another buffer is connected. .... 23 
Figure 2.16: (a) A simple H-tree with 16 nodes, and (b) an illustration of skew in the clock 
signals due to difference in their arrival times at location 1 and location 16 of the H-tree. 25 
Figure 2.17: A high-speed differential serial link. Skew beyond a limit can also affect 
its functioning. ................................................................................................................. 26 
Figure 2.18: An N-bit parallel link. The skew reduces the amount of the bit overlap. ....... 26 
Figure 2.19: Two RC coupled interconnects. Due to switching of the aggressor line, a 
voltage is induced in the victim line as shown in (a). The equivalent circuit of the crosstalk 
model is given in (b). ....................................................................................................... 28 
Figure 2.20: A rough sketch of voltage and current waveforms of a simple buffer 
circuit: (a) input and output voltage waveforms, (b) the short circuit current peaks that appear 
when both nMOS and pMOS conduct, and (c) the switching current used for the charging 
and discharging of the capacitive load. ............................................................................. 32 
Figure 2.21: I-V characteristic curves of 200 devices for each of nMOS (left) and pMOS 
(right) for the technology generations of 25, 18 and 13nm. Along with each set of curves, 
the characteristic curve for the uniformly doped device is also plotted and the dispersion of 
other curves around this curve shows the effect of variability due to RDF. ....................... 36 
Figure 3.1: Communication structures in CDN and data channels: (a) an H-type CDN, (b) a 
repeater inserted synchronous data channel, (c) a flip-flop based pipelined data channel. . 39 
Figure 3.2: The definition of FO4 delay. .......................................................................... 40 
Figure 3.3: FO4 delay for different technology generations. The error bars represent the 
uncertainty in delay (1σ). .............................................................................................. 41 
Figure 3.4: Delay distribution of minimum sized inverters with a fan-out of four for the 
technology generations of 25, 18, and 13 nm. ................................................................... 42 
Figure 3.5: Mean buffer delay (a) and delay variability (b), plotted as a function of buffer size 
for the 18 nm technology generation. The curves have been plotted for the average response in 
low-to-high and high-to-low transitions. .......................................................................... 44 
Figure 3.6: Delay variability plotted against buffer size for 18 nm buffers. The smaller 
dashed lines represent delay variability for low-to-high transition and bigger dashed lines 
for high-to-low transition. Similarly, the solid lines are for the average response.............. 45 
Figure 3.7: Skewness of delay distributions as a function of the buffer size for 13 nm 
technology. ...................................................................................................................... 46 
Figure 3.8: A five-stage ring oscillator circuit constructed of minimum sized devices. ..... 47 
Figure 3.9: Tapered buffer driver system. ......................................................................... 48 
Figure 3.10: Cumulative mean delay in tapered buffer drivers of the given three technology 
generations along with the delay uncertainty shown as error bars (corresponding to 1σ). . 48 
Figure 3.11: Delay variability introduced by different stages of the tapered buffer driver for 
low-to-high input transition. ............................................................................................. 49 
Figure 3.12: Cumulative and stage delay variability during low-to-high and high-to-low 
transitions for 13 nm tapered buffer driver. ...................................................................... 50 
Figure 3.13: Delay variability of tapered buffer drivers for different tapering factors during 
high-to-low and low-to-high input transitions. .................................................................. 51 
Figure 3.14: Delay variability in a chain of minimum sized repeaters of 13 nm plotted 
against the number of repeater stages. .............................................................................. 53 
Figure 3.15: Cumulative delay variability plotted as a function of repeater size in a chain of 
20 repeaters. ..................................................................................................................... 54 
Figure 3.16: Schematic view of a standard CMOS D flip-flop circuit [84]-[85]. ............... 55 
Figure 3.17: Basic timing parameters of a flip-flop........................................................... 55 
Figure 3.18: Dependence of CLK-to-Q delay on the D-to-CLK time. ............................... 57 
Figure 3.19: 3D-space occupied by the timing parameters of the DFF. ............................. 59 
Figure 3.20: A simple data communication link. The signal coming out from the 
combinational logic is powered up through tapered buffer driver and then it passes through 
the repeater inserted interconnect to reach the input of the flip-flop. ................................. 60 
Figure 3.21: Link failure probability as a function of link operating frequency, as 
calculated using the analytical model and Monte Carlo simulation. .................................. 64 
Figure 4.1: Demonstration of static timing analysis of a simple circuit. ............................ 67 
Figure 4.2: An example of the timing graph for delay traversal from source to sink.......... 67 
Figure 4.3: Basic statistical operations used in STA and SSTA. The SUM operation (a), 
and the MAX operation (b) [89]. ...................................................................................... 69 
Figure 4.4: Histograms of observed data taken through Monte Carlo simulations for the 
timing parameters of the FFs of 13 nm. ............................................................................ 71 
Figure 4.5: The probability density function of setup time for the 13 nm flip-flops plotted 
with different systems. ..................................................................................................... 74 
Figure 4.6: Cumulative delay distribution of setup time of 18 nm flip-flops. The SU system 
from Johnson family of distributions better fits the simulation data as compared to normal 
distribution....................................................................................................................... 75 
Figure 4.7: Cumulative distribution functions for the setup time of 13 nm flip-flops with 
Normal and Pearson type IV approximations. .................................................................. 76 
Figure 4.8: N-stage flip-flop based pipeline. ..................................................................... 78 
Figure 4.9: Transistor level model of the pipeline segments. ............................................ 81 
Figure 4.10: MAX delay distributions of individual pipeline stages and overall pipeline for 
18nm technology generation. ........................................................................................... 82 
Figure 4.11: Overall pipeline delay distributions of a pipeline consisting of 6 stages 
simulated for the technology generations of 18 and 13 nm. .............................................. 82 
Figure 4.12: Maximum delay distributions plotted for low-high and high-low transitions 
for the 13 nm pipeline. ..................................................................................................... 83 
Figure 4.13: Histograms of the timing variables comprising D-CLK time, CLK-Q time and 
combinational delay for a 13 nm pipeline. ........................................................................ 84 
Figure 4.14: Probability density functions for the pipeline delay with a combinational logic 
of 60 inverters in series for 13 nm. ................................................................................... 85 
Figure 4.15: Difference in timing yield estimation with normal and skew-normal 
approximations. ............................................................................................................... 85 
Figure 5.1: Optimal number and size of uniformly inserted buffers in an 
interconnect of minimum width and spacing for the three technology generations. ........... 89 
Figure 5.2: Arrangement for the measurement of power dissipation in the repeater. ......... 91 
Figure 5.3: Different components of power dissipation along with the total power in a 
minimum sized inverter (MSI). The inverter under investigation refers to ‘R’ in Figure 5.2 
operating at a frequency of 2GHz. .................................................................................... 92 
Figure 5.4: A plot of FO4 delay and the leakage power in MSI. ....................................... 93 
Figure 5.5: Normalized power distribution components in MSI operating at 2GHz. ......... 93 
Figure 5.6: Histogram of leakage power in 25nm MSIs. The distribution is quite 
asymmetric about the mean. ............................................................................................. 95 
Figure 5.7: Effect of repeater size on leakage power. Leakage power and its variability 
increases with repeater size. ............................................................................................. 96 
Figure 5.8: Buffer inserted interconnect. .......................................................................... 97 
Figure 5.9: An interconnect between the transmitter and receiver (a), optimal buffer 
insertion (b), buffer insertion under area constraint (c). .................................................... 98 
Figure 5.10: Optimal repeater size and inter-repeater segment length (both normalized) for 
different area ratios. ......................................................................................................... 99 
Figure 5.11: Delay variability as a function of different ratios of repeater size, in the 
absence of crosstalk (a), Dependence of delay variability on repeater size and inter-repeater 
segment length (b). ........................................................................................................ 102 
Figure 5.12: Comparison between analytical model and simulation results for performance 
degradation due to area scaling. ..................................................................................... 103 
Figure 5.13: Performance, area, power and performance certainty trade-off curves. ....... 104 
Figure 6.1: Simple Core-Core link consisting of multiple interconnects (a), Functional unit-
Router and Router-Router links in a Network-on-Chip (b). ............................................ 107 
Figure 6.2: Structure of a multi-bit bus, where the number of interconnects in a fixed 
channel width  depends on the interconnect width and spacing. (a) the cross-sectional 
view showing different dimensions and (b) the top view of the bus indicating outer and 
middle lines. The input signals on any two adjacent lines are opposite in phase, thus 
simulating the worst case of crosstalk. Each line in the bus can be considered as an 
aggressor or victim, as they can affect the performance of each other. ............................ 108 
Figure 6.3: Capacitance curves for minimum width global interconnects of 18nm plotted as 
a function of interconnect spacing. ................................................................................. 110 
Figure 6.4: Capacitance curves for 18nm global interconnects plotted as a function of 
width at minimum interconnect spacing. ........................................................................ 110 
Figure 6.5: The total capacitance of an interconnect (not at the outer edge) of a bus in 18 
nm technology plotted as a function of the interconnect spacing and width. ................... 111 
Figure 6.6: Propagation delay of the middle interconnect of minimum width of a bus for 
the given three technologies plotted as a function of the spacing between the conductors.
 ...................................................................................................................................... 112 
Figure 6.7: Propagation delay of the middle interconnect of a bus with neighbouring 
interconnects at minimum spacing for the given three technologies plotted as a function of 
the width of the conductors. ........................................................................................... 113 
Figure 6.8: Optimum number of repeaters for minimum interconnect delay for different 
lengths of the global interconnect plotted as a function of the interconnect width. The 
interconnect is of 13 nm technology and the spacing between interconnects is Smin. ....... 114 
Figure 6.9: Optimum repeater size for minimum interconnect delay for different 
interconnect widths (global interconnect) for 13 nm technology. The spacing between 
interconnects is Smin. ...................................................................................................... 115 
Figure 6.10: Data rate per wire of a channel bus in 13nm technology plotted as a function 
of spacing and width. ..................................................................................................... 116 
Figure 6.11: Maximum allowed interconnect length for a particular bandwidth with and 
without the use of repeaters for the given three technologies. These curves have been 
plotted for minimum interconnect width and spacing. .................................................... 117 
Figure 6.12: Scatter plot of interconnect resistance and capacitance with thickness variation 
of 3σ=5% in a 13nm technology interconnect of 1mm length. ........................................ 119 
Figure 6.13: Scatter plot of interconnect resistance and capacitance with width and 
thickness variation of 3σ=5% in a 13nm technology interconnect of length 1mm. .......... 119 
Figure 6.14: Contribution of different parametric variations on the delay of a bus line of 
length 1mm of minimum width and spacing in 13nm technology. .................................. 120 
Figure 6.15: Mean delay (in picoseconds) of interconnects (without repeaters) in the 
channel bus of 13 nm for different geometrical configurations under variability Case 1. 122 
Figure 6.16: The standard deviation (in picoseconds) of the delay of interconnects (without 
repeaters) in the channel bus of 13nm for different geometrical configurations under 
variability Case 1. .......................................................................................................... 122 
Figure 6.17: Delay variability (%) of interconnects (without repeaters) in the channel bus 
of 13nm for different geometrical configurations under variability Case 1. ..................... 123 
Figure 6.18: The number of repeaters per unit length required for different interconnect 
dimensions (width and spacing) for a 13 nm bus under worst crosstalk. The numbers have 
been rounded-off. ........................................................................................................... 123 
Figure 6.19: The size of the repeaters for different interconnect dimensions (width and 
spacing) for a 13 nm bus under worst crosstalk. The repeater sizes have been rounded-off.
 ...................................................................................................................................... 124 
Figure 6.20: Mean delay (in picoseconds) of interconnects (with repeaters) in the channel 
bus of 13nm for different geometrical configurations under variability Case 1. .............. 124 
Figure 6.21: The standard deviation (in picoseconds) of the delay of interconnects (with 
repeaters) in the channel bus. ......................................................................................... 125 
Figure 6.22: Delay variability (%) of interconnects (with repeaters) in the channel bus of 
13nm for different geometrical configurations under variability Case 1. ......................... 125 
Figure 6.23: Bandwidth of the individual interconnect lines (without repeaters) in Gb/s 
given as a function of the interconnect width and spacing for 13 nm. ............................. 126 
Figure 6.24: Bandwidth of the individual interconnect lines (with repeaters) in Gb/s given 
as a function of the interconnect width and spacing for 13 nm. ....................................... 126 
Figure 6.25: Total bandwidth (Gb/s), without repeaters, plotted as a function of 
interconnect width and spacing. ..................................................................................... 127 
Figure 6.26: Total bandwidth (Gb/s), with repeaters, plotted as a function of interconnect 
width and spacing. ......................................................................................................... 127 
Figure 6.27: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm 
technology without repeaters. ......................................................................................... 128 
Figure 6.28: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm 
technology with repeaters............................................................................................... 129 
Figure 6.29: Total bandwidth per unit power (Gb/s.mW) consumption for interconnects 
with repeaters. ................................................................................................................ 129 
Figure 6.30: Surface plot of the area consumed by the channel bus interconnects, with and 
without repeaters. ........................................................................................................... 130 
Figure 6.31: The figure of merit  plotted as a function of spacing and width for the 
repeater inserted interconnect. ........................................................................................ 131 
Figure 6.32: A multi-bit communication link. Tapered buffers have been used on the 
transmission side, whereas flip-flop registers have been used at the receiving end. ......... 132 
Figure 6.33: Probability of link failure as a function of operating frequency. .................. 134 
Figure 6.34: Structure of a semi-serial communication channel. ..................................... 136 
Figure 6.35: Conventional shift-register type SerDes...................................................... 136 
Figure 6.36: Wave front train Serializer and Deserializer [138]. ..................................... 137 
Figure 6.37: Different performance metrics for a bus with different serialization ratios (1, 
1.5, 2.0…, 4.5 corresponding to S= 1Smin to 8Smin or W= 1Wmin to 8Wmin), (a) by increasing 
spacing and keeping width constant, (b) by increasing width and keeping spacing constant.
 ...................................................................................................................................... 139 
Figure 6.38: Leakage power normalized with the total power for different link utilization 
rates. .............................................................................................................................. 140 
Figure 7.1: Coupled RC transmission line model with distributed RC parameters. ......... 144 
Figure 7.2: Typical responses of aggressor and victim lines during up/up transitions for 
finite lines with open ends. ............................................................................................. 146 
Figure 7.3: Typical responses of aggressor and victim lines during up/down transitions for 
finite lines with open ends. ............................................................................................. 146 
Figure 7.4: Typical responses of aggressor and victim lines during up/up transitions for 
finite lines with capacitive loads..................................................................................... 147 
Figure 7.5: Typical responses of aggressor and victim lines during up/down transitions for 
finite lines with capacitive loads..................................................................................... 147 
  
List of Tables 
Table 2.1: Interconnect Technology Parameters for the Three Wiring Tiers ..................... 21 
Table 3.1: Statistical Analysis of the Timing Parameters of a Standard Flip-flop .............. 58 
Table 3.2: Statistical Delay Characteristics of Different Elements of the Link. These values 
have been taken from the characterization data of different elements................................ 63 
Table 4.1: Statistical Analysis of the Timing Parameters of the Standard Flip-flop shown in 
Figure 3.16 ...................................................................................................................... 71 
Table 4.2: Goodness of Fit Statistics (for Figure 4.5) in terms of R-Square, Sum of Squares 
due to Error (SSE), Adjusted R-Square, Root Mean Squared Error (RMSE) .................... 74 
Table 4.3: Statistical Parameters of the MAX Delay Distribution of the Complete Pipeline
 ........................................................................................................................................ 83 
Table 5.1: Statistics of Power Measurements for MSI ...................................................... 94 
Table 6.1: Coefficients of the delay model for different switching patterns [135]. .......... 112 
Table 6.2: Primary interconnect and device parameters based on the ITRS and the device 
model cards [76], [77]. The device parameters are for the uniformly doped devices. ...... 118 
Table 6.3: Performance of a parallel and a serial bus of degree 2 for the same throughput
 ...................................................................................................................................... 138 
Table 7.1: Monte Carlo simulation results for studying the effect of input signal variability 
on skew amplification. ................................................................................................... 149 
Table A.1: Mean delay (in picoseconds) of interconnects (without repeaters) in the channel 
bus of 13nm for different geometrical configurations under variability Case 1. The columns 
of the table show the interconnect spacing and the rows show the width. ....................... 155 
Table A.2: The standard deviation (in picoseconds) of the delay of interconnects (without 
repeaters) in the channel bus of 13nm for different geometrical configurations under 
variability Case 1. .......................................................................................................... 155 
Table A.3: Delay variability (%) of interconnects (without repeaters) in the channel bus of 
13nm for different geometrical configurations under variability Case 1. ......................... 155 
Table A.4: The size of the repeaters for different interconnect dimensions (width and 
spacing) for a 13 nm bus under worst crosstalk. The repeater sizes have been rounded-off.
 ...................................................................................................................................... 156 
Table A.5: The number of repeaters per unit length required for different interconnect 
dimensions (width and spacing) for a 13 nm bus under worst crosstalk. The numbers have 
been rounded-off. ........................................................................................................... 156 
Table A.6: Mean delay (in picoseconds) of interconnects (with repeaters) in the channel 
bus of 13nm for different geometrical configurations under variability Case 1. .............. 156 
Table A.7: The standard deviation (in picoseconds) of the delay of interconnects (with 
repeaters) in the channel bus. ......................................................................................... 157 
Table A.8: Delay variability (%) of interconnects (with repeaters) in the channel bus of 
13nm for different geometrical configurations under variability Case 1. ......................... 157 
Table A.9: Bandwidth of the individual interconnect lines (without repeaters) in Gb/s given 
as a function of the interconnect width and spacing for 13 nm. ....................................... 157 
Table A.10: Bandwidth of the individual interconnect lines (with repeaters) in Gb/s given 
as a function of the interconnect width and spacing for 13 nm. ....................................... 158 
Table A.11: Total bandwidth (Gb/s) through the bus constrained in channel width , 
without repeaters in 13nm. ............................................................................................. 158 
Table A.12: Total bandwidth (Gb/s) through the bus constrained in channel width , with 
repeaters in 13nm. .......................................................................................................... 158 
Table A.13: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm 
technology without repeaters. ......................................................................................... 159 
Table A.14: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm 
technology with repeaters............................................................................................... 159 
Table A.15: Total bandwidth per unit power (Gb/s.mW) consumption for interconnects 
with repeaters. ................................................................................................................ 159 
Table A.16: Probability of link failure (in parts per thousand) of the individual lines of the 
channel under variability. ............................................................................................... 160 
Table A.17: Probability of link failure (in parts per thousand) for the channel under area 
constraint. ...................................................................................................................... 160 
 
  
List of Abbreviations 
 
ADC  Analog to Digital Converter 
AMBA Advanced Microcontroller Bus Architecture 
CDF  Cumulative Distribution Function 
CDN  Clock Distribution Network 
CMOS  Complementary Metal Oxide Semiconductor 
CMP  Chemical Mechanical Polishing 
DAC  Digital to Analog Converter 
DCs  Data Channels 
DFF  Data Flip-Flop 
DSM  Deep Sub-micron 
DSPs  Digital Signal Processors 
FFs  Flip-Flops 
FPGA  Field Programmable Gate Array 
FUs  Functional Units 
IC  Integrated Circuit 
ILD  Inter layer Dielectric 
IP  Intellectual Property 
ITRS  International Technology Roadmap for Semiconductors 
LER  Line edge roughness 
LFP  Link Failure Probability 
MC  Monte Carlo 
MPU  Microprocessor Unit 
MSI  Minimum Size Inverter 
NAs  Network Adapters 
NoC  Network-on-Chip 
OTV  Oxide thickness variation 
PDF  Probability Density Function 
RAM  Random Access Memory 
RO  Ring Oscillator 
ROM  Read only Memory 
RV  Random Variable 
SerDes  Serializer Deserializer 
SoC  System-on-Chip 
SSTA  Statistical Static Timing Analysis 
STA  Static Timing Analysis 
VC  Video Controller 
µP  Microprocessor 
  
Author’s Declaration 
 
This thesis presents the work that was carried out at the Department of Electronics and 
Electrical Engineering, University of Glasgow under the supervision of Dr. Fernando 
Rodríguez-Salazar, during the period from June 2007 to January 2011. I declare that the 
work is entirely my own, except where reference is made to the work of others, and it has 
not been previously submitted for any other degree or qualification in any university. 
 
Faiz-ul-Hassan 
Glasgow, UK 
May 2011 
 
 
Chapter 1                                                                                                              Introduction 
 
Chapter 1 
 
 
Introduction 
 
As transistor gate lengths continue to shrink according to Moore’s law [1], designers are 
able to integrate increasingly complex systems in a single microchip. Although in principle 
it is possible to construct a multi-billion transistor chip in today's technology, the practical 
problems faced while designing and testing such designs have proven to be too arduous, as 
evidenced by the increasing designer's productivity gap [50]. Techniques, such as SoC 
design, where the design complexity is managed by the use of a hierarchy of 
interconnected modules, have been introduced to overcome this limitation. A typical SoC 
may include different functional units (FUs) like Microprocessors (μPs), Digital Signal 
Processors (DSPs), Random Access Memory (RAM), Read only Memory (ROM), Digital 
to Analog Converters (DACs), Analog to Digital Converters (ADCs), Video Controllers 
(VCs) and several other Intellectual Property (IP) elements, which typically have already 
been designed and validated independently (perhaps by third parties). The current state-of-
the-art SoCs allow the design and integration of highly diversified and complex systems 
using adaptive circuits and increased parallelism [2]. Figure 1.1 shows the example of a 
SoC with diversified functionalities. For such systems, the designer still faces a number of 
challenging problems in the design, project management, simulation and verification of 
these devices. For instance, as the number of FUs integrated into a SoC increases, the role 
played by the on-chip communication structures becomes progressively more important. 
Moreover, the semiconductor industry predicts that future generations of SoCs may 
contain several thousands of cores. According to the International Technology Roadmap 
for Semiconductors (ITRS), on-chip communication is becoming the limiting factor in 
designing high performance and power efficient SoCs. 
 
Figure 1.1: Typical system implementation of Marvell 88F6282 SoC [3]. 
1.1 Interconnect-Centric Design Paradigm 
Historically, the performance of designs was limited by that of the individual functional 
units, as communication (through wires) was substantially faster than computation (via 
transistors). However, the effect of technology scaling is not equally favourable for 
transistors and wires. With technology scaling, the performance of the devices is 
continuously improving, whereas the wires are becoming relatively slower, as highlighted 
by the ITRS [4] and shown in Figure 1.2. Several clock cycles are required for the signals 
to travel across newer chips. Therefore modern SoC designs, which are abundant with 
interconnects, are faced with the difficult task of orchestrating the computation of a large 
number of fast local islands, across the whole chip, by using (relatively) progressively 
slower interconnects. In order to mitigate this problem, the design paradigm has shifted 
from computation-centric to interconnect-centric, in line with the SoC methodology described 
above. In the DSM region, the interconnect has become the main bottleneck in the 
design of high-performance, complex SoCs [5], [6]. The design of efficient 
interconnects is affected by many issues, as detailed in the following subsections. 
 
Figure 1.2: Projected relative delay of devices and interconnects (local and global) for different technology 
generations. The relative performance of the global interconnect is decreasing with technology scaling. 
1.1.1 Scaling 
The objective of technology scaling is to produce faster devices, increase on-chip 
component density and reduce energy per storing [7]. The impact of technology scaling on 
the computational units is that they can now be constructed in smaller sizes (due to device 
scaling) with the same or even much more functionality. Therefore the local wires within the 
cores become shorter. However, the global wires which connect the cores do not. 
This allows the cores to operate at a higher frequency, whereas the communication 
between the cores does not speed up in the same proportion [8]. Furthermore, according to the ITRS, 
the interconnect width and pitch decrease with technology scaling, while chip size 
increases. The result is that the devices and local wires scale with the process technology, 
whereas the global interconnects do not improve much [9]. 
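The contrast between local and global wires can be illustrated with a short sketch. It assumes the textbook first-order model of a distributed RC line (delay of roughly 0.38·r·c·L²) and idealised scaling in which the wire cross-section shrinks by a factor s in both dimensions, so the resistance per unit length grows as 1/s² while the capacitance per unit length stays roughly constant. The function names and the scaling assumptions are illustrative only and are not taken from the models used later in this thesis.

```python
def wire_delay(r_per_len, c_per_len, length):
    """First-order delay of a distributed RC line (about 0.38 * R * C)."""
    return 0.38 * r_per_len * c_per_len * length ** 2


def scaled_wire_delay(r_per_len, c_per_len, length, s, is_global):
    """Delay after scaling the technology by a factor s (0 < s <= 1).

    Assumptions (idealised, for illustration only):
      - resistance per unit length grows as 1 / s**2
      - capacitance per unit length is unchanged
      - local wires shrink with the technology, global wires keep their length
    """
    r_scaled = r_per_len / s ** 2
    length_scaled = length if is_global else length * s
    return wire_delay(r_scaled, c_per_len, length_scaled)


if __name__ == "__main__":
    base = wire_delay(1.0, 1.0, 1.0)
    for s in (1.0, 0.7, 0.5):
        local = scaled_wire_delay(1.0, 1.0, 1.0, s, is_global=False)
        glob = scaled_wire_delay(1.0, 1.0, 1.0, s, is_global=True)
        print(f"s={s}: local/base={local / base:.2f}, global/base={glob / base:.2f}")
```

Under these assumptions the local wire delay stays roughly constant while the global wire delay grows as 1/s², which is consistent with the qualitative trend described above.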
1.1.2 Power Dissipation 
Circuits are designed to operate at higher and higher frequencies in the interest of 
improved performance. However, very dense interconnects switching at high frequencies 
become a major source of power consumption in the circuits, and this trend is continuously 
growing with technology scaling. It has been reported that in a 130nm microprocessor, 
about 50% of the total power is consumed in the interconnect [10]. In circuit design, 
power consumption is taken as a design constraint [11] and designers are always struggling 
to reduce it. However, in DSM technologies, reducing power consumption is quite 
challenging. Supply voltages are decreasing with technology scaling to prevent junction 
breakdown due to higher fields, which requires threshold voltages to decrease as well. 
However, the leakage current depends exponentially on the threshold 
voltage, so leakage power is expected to become the prevailing part of the total power [12]. Thus the 
dynamic power, which was previously the dominant component of the power dissipation, may not 
account for the maximum share of the total power in DSM technologies. 
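The exponential dependence of leakage on threshold voltage can be made concrete with a minimal sketch. It assumes the simplified textbook expression for subthreshold current, I = I0·exp(−Vt/(n·kT/q)), and the usual switching-power expression α·C·Vdd²·f. The slope factor n = 1.5 and all numerical values are hypothetical and are not taken from the device models used in this thesis.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q in volts at about 300 K


def subthreshold_leakage(i0, vt, n=1.5, v_thermal=THERMAL_VOLTAGE):
    """Simplified subthreshold current: I0 * exp(-Vt / (n * kT/q)).

    i0 and n are illustrative fitting parameters, not measured values.
    """
    return i0 * math.exp(-vt / (n * v_thermal))


def dynamic_power(activity, c_load, vdd, freq):
    """Switching power: activity * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


def leakage_ratio(vt_high, vt_low, n=1.5):
    """Factor by which leakage grows when Vt drops from vt_high to vt_low."""
    return subthreshold_leakage(1.0, vt_low, n) / subthreshold_leakage(1.0, vt_high, n)


if __name__ == "__main__":
    # Hypothetical example: lowering Vt by 100 mV
    print(f"leakage grows by a factor of {leakage_ratio(0.4, 0.3):.1f}")
```

With these assumed parameters, a 100 mV reduction in threshold voltage increases the leakage current by roughly an order of magnitude, whereas the dynamic power only falls quadratically with the supply voltage.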
1.1.3 Crosstalk 
In order to incorporate more and more functionality, the number of transistors on a chip is 
continuously increasing for every new generation of SoCs [13]. Reduction in the gate delay 
of devices has made it possible to switch the circuits at higher frequencies to obtain higher 
performance. However, this has introduced the important issue of crosstalk, which can cause 
functional noise and delay variation. The main reason behind the emergence of crosstalk in 
the DSM region is the increase in capacitive and inductive coupling due to the shrinkage of 
geometries. Functional noise can cause a glitch on the victim line, which can travel to 
a dynamic node, causing the circuit state to change and resulting in functional failures. Each 
victim line in a bus may experience a different coupling capacitance, so its 
propagation delay may vary significantly under different switching patterns of the 
neighbouring lines. This introduces uncertainty in the timing of the signals, thus 
affecting the communication reliability. As we will demonstrate, crosstalk failures are 
particularly sensitive to skew variations, which are of course a prevailing characteristic of 
future designs. 
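The dependence of the victim's delay on the neighbours' switching pattern can be sketched with the common first-order Miller-factor approximation, in which each coupling capacitance is weighted by 0 when the neighbour switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. This is a simplified textbook model given only for illustration; it is not the delay model with fitted coefficients used later in this thesis, and the names below are hypothetical.

```python
# Miller coupling factor for each neighbour behaviour (first-order approximation)
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_ground, c_coupling, left, right):
    """Effective capacitance seen by a victim line with two neighbours.

    left / right describe how each neighbour switches relative to the victim:
    "same", "quiet" or "opposite".
    """
    factor = MILLER_FACTOR[left] + MILLER_FACTOR[right]
    return c_ground + factor * c_coupling


def delay_spread(c_ground, c_coupling):
    """Ratio of worst-case to best-case effective capacitance.

    For a fixed driver and wire resistance, the RC delay scales with the
    effective capacitance, so this ratio approximates the delay spread.
    """
    best = effective_capacitance(c_ground, c_coupling, "same", "same")
    worst = effective_capacitance(c_ground, c_coupling, "opposite", "opposite")
    return worst / best


if __name__ == "__main__":
    # Hypothetical case where coupling equals ground capacitance
    print(delay_spread(c_ground=1.0, c_coupling=1.0))
```

In this approximation the spread between the best and worst switching patterns grows with the ratio of coupling to ground capacitance, which is why shrinking the spacing between wires increases timing uncertainty.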
1.1.4 Variability 
In the semiconductor industry, variability is often defined as the deviation of the process 
parameters from their intended or designed values. It has always been an important aspect 
of semiconductor manufacturing, process control and circuit design. As the semiconductor 
feature sizes continue to shrink with every new technology generation, the importance of 
the underlying variability is increasing, so much so that in the DSM region variability has 
become one of the major design challenges and is considered a hindrance to 
technology scaling [14]-[16]. Variability affects devices as well as interconnects, 
causing significant unpredictability in the performance and power characteristics of the 
integrated circuits (ICs). This can lead to certain undesirable effects such as malfunctioning 
of circuits or performance degradation. 
 
Amongst various sources of device variability, intrinsic parametric fluctuations play an 
increasingly important role in contemporary and future CMOS devices [17]. These 
variations are introduced due to the discreteness of charge and matter and cannot be 
controlled or diminished by tightening the process tolerances. Some of the sources of 
intrinsic device variability are 
• Random dopant fluctuation (RDF) 
• Local oxide thickness variation (OTV) 
• Gate line edge roughness (LER) 
• Strain variations 
For state-of-the-art nano-scale circuits and systems, intrinsic parametric fluctuations have 
significantly affected the signal system timing [18] and the behaviour of circuits at higher 
frequencies [19]-[20]. In circuits, they result in component mismatch and can thus reduce 
the yield and performance. 
 
Figure 1.3: Random discrete dopant effects in deep sub-micrometer CMOS devices [21]. The figure on the 
left hand side is a solid model of a CMOS transistor and that on the right side is its transparent version 
showing the discreteness due to dopants in the channel region. 
One of the most important sources of intrinsic parameter fluctuations is random dopant 
fluctuation (RDF) [17] which is caused by the randomness of the dopant position and 
number in the devices, thus making every device microscopically different from its 
counterparts. Therefore, the devices which are macroscopically identical will have 
different performance characteristics, mainly due to the variation in the threshold voltage 
(Vt). Figure 1.3 shows the significance of RDF in deep sub-micrometer CMOS 
technologies. The normalized magnitude of the variations due to random dopant 
fluctuations increases steadily with technology scaling, as fewer dopant atoms 
remain in smaller devices (see Figure 1.4 and [21]). 
 
Figure 1.4: Impact of technology scaling on the average number of dopant atoms in the channel. 
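The link between a shrinking dopant count and a growing relative fluctuation can be illustrated with a short sketch. It assumes, purely for illustration, that the number of dopant atoms in the channel follows Poisson statistics, so that the relative spread is one over the square root of the mean count. This statistical model and the dopant counts below are assumptions and are not data from [21].

```python
import math


def relative_dopant_fluctuation(mean_dopants: float) -> float:
    """Relative spread (sigma / mean) of the dopant count.

    Assumes Poisson statistics, for which sigma = sqrt(mean).
    """
    if mean_dopants <= 0:
        raise ValueError("mean_dopants must be positive")
    return 1.0 / math.sqrt(mean_dopants)


if __name__ == "__main__":
    # Hypothetical average dopant counts for successively smaller devices.
    for count in (10000, 1000, 100, 10):
        spread = relative_dopant_fluctuation(count)
        print(f"{count:6d} dopants -> relative spread {spread:.1%}")
```

Under this assumption, a device with a hundred dopant atoms shows ten times the relative spread of one with ten thousand.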
Variability is also affecting interconnects in deep submicron technologies causing variation 
in their width, spacing, thickness and inter-layer dielectric thickness. However, there could 
exist strong spatial pattern dependencies, especially when interconnect variability in 
chemical mechanical polishing (CMP) is considered. Therefore, total variability can be 
classified into systematic and random components. A significant portion of the systematic 
component of variations can be modelled by analyzing the layout characteristics, whereas 
random variations cannot be modelled in this way.  
1.2 Research Overview 
The challenges imposed by interconnects in the development of high performance SoCs, 
and ways to overcome them are an active field of academic research. The aim of this thesis 
is to advance this effort, in particular on understanding how variability intrinsically affects 
communication performance, fault tolerance, signal integrity, area and power consumption 
of the interconnect. To achieve this goal the following objectives are defined. 
1.2.1 Research Objective 1 
On-chip communication involves the use of different circuit elements and interconnects to 
move data from one location of the circuit to another. The communication performance 
entirely depends on these elements. The intrinsic device variability cannot be eliminated in 
nanometer CMOS devices as it is process independent. This defines a minimum amount of 
variations in the circuit parameters for a particular size of the devices. In order to design 
communication structures for DSM technologies, an accurate and realistic estimation of the 
delay performance of all related circuit elements is required. Unfortunately, there is 
insufficient data available in this regard. Therefore the objective is: 
Accurate characterization of the delay performance of on-chip communication circuit 
elements for future CMOS technologies in the presence of variability due to RDF.  
This data is required to estimate the performance of a complete channel. Based on this 
information, it is possible to explore and design circuit level fault tolerant communication 
(sub) systems. 
1.2.2 Research Objective 2 
In the presence of characterization data of circuit elements, it is more convenient to use 
computationally efficient analysis techniques like Static Timing Analysis (STA) or 
Statistical Static Timing Analysis (SSTA). Presently, SSTA is preferred over STA, being 
computationally efficient and more accurate than STA. The SSTA technique can be used to 
evaluate the performance of a communication link. However, its accuracy strongly depends 
on accurate representation of the characterization data of the associated circuit elements. 
So far, the underlying timing distributions have been assumed to be Normal, but the validity of 
this assumption needs to be investigated in DSM technologies. Therefore objective 2 of this thesis is: 
Study the nature of the timing distributions of communication elements and try to find their 
accurate probability density function. Once this is done, apply these distributions for the 
SSTA of a large communication channel. 
1.2.3 Research Objective 3 
As pointed out in [10], as much as 50% of the chip power is consumed by the global 
interconnects. This power is mainly dissipated in the drivers and repeaters used to improve 
the delay performance of interconnects. Different coding techniques are used at software 
level for efficient data transmission [22], [23]. A power optimal repeater insertion 
technique proposed in [24] is commonly used along with data coding. This technique gives 
excellent results in terms of power and area savings at the cost of nominal performance 
degradation. However, the implications of this technique are yet to be investigated for 
DSM technologies where variability and leakage power effects become quite prominent. 
Therefore objective 3 of this thesis is: 
To measure different components of power dissipation in repeaters of future technology 
generations. This data can be used by designers to make a choice between low 
activity parallel links or high activity serial links (as low activity parallel links will 
dissipate a large amount of leakage power as compared to the serial links, for 
particular data requirements). Similarly, a power-optimal repeater insertion technique 
which accounts for delay variability needs to be developed. 
1.2.4 Research Objective 4 
A significant amount of academic work has been undertaken on finding the optimum 
configuration of a multi-bit communication channel for best possible performance under 
power and area constraints [25]-[27]. Again, very little work in this area 
considers variability in the figure of merit. So objective 4 is: 
Find the optimum configuration of the channel link which gives best bandwidth under 
power, area and variability constraints. Moreover, a comparison of serial and parallel 
links is also required to be made in this perspective. 
1.2.5 Research Objective 5  
In order to ascertain signal integrity in the channel bus, accurate modelling of the crosstalk 
in aggressor and victim lines is required. In the past, many researchers have published 
crosstalk analysis models and algorithms [28]-[30] but all of them either require numerical 
techniques to solve them or do not give sufficient insight into the underlying crosstalk 
effects on signal responses. In order to reduce this difficulty, this thesis aims: 
To find closed form expressions that give accurate voltages for the aggressor and victim 
lines in time domain, as a function of wire length, due to switching transitions on them. 
Also study the effect of variability on the delay performance of interconnects in the 
presence of crosstalk. 
1.3 Thesis Organization 
The rest of the thesis is organized as follows: 
Chapter 2- In the beginning of the chapter, different structures used for on-chip 
communication are briefly discussed. Subsequently, different performance metrics that 
have been used throughout the thesis to evaluate the performance of on-chip 
communications are defined. 
Chapter 3- In this chapter, the performance of on-chip communication structures under 
device variability has been characterized through HSPICE simulations. The 
characterization results of all the basic elements have been included and discussed. At the 
end of the chapter, a methodology is given that can be used to estimate the performance of 
a complete channel link using the characterization data. Moreover, link failure probability 
has been estimated using this approach. 
Chapter 4- In core-to-core or router-to-router (for NoCs) communication 
links, flip-flops are normally used at the input and output of the functional units. Therefore, 
the output of a router or functional unit is emitted from the flip-flops and is then amplified 
through the tapered buffer drivers before transmitting through the link. Similarly, at the 
receiving end, flip-flops are used at the input of the functional unit or router. Again, flip-
flops are also used in pipelined interconnects. Therefore, in order to estimate the 
performance of a link using Statistical Static Timing Analysis (SSTA), accurate 
representation of the characterization data of the timing parameters of the flip-flops (in the 
form of PDFs) is required. Furthermore, accurate approximation of the probability 
distribution functions is also required. In this chapter, this aspect has been described in 
detail and its application in pipelined communication circuits has been discussed. 
Chapter 5- In the start of this chapter, the measurement results for the power dissipation in 
repeaters for the given three technology generations have been presented. The impact of 
device variability on the leakage power has also been studied and its implication on NoC 
links has been discussed. In the next part of this chapter, the optimization of the 
performance of a single wire link under area, power and variability constraints has been 
described. The impact of repeater size and inter-repeater segment length on the delay, 
power, area and variability has been discussed and an optimization scheme has also been 
proposed. 
Chapter 6- This chapter describes the performance of a multi-bit parallel link under area 
and power constraints. The optimization of bandwidth under area, power and variability 
constraints has been discussed. Moreover, a comparison of parallel vs. serial links has also 
been described. 
Chapter 7- In this chapter, analytical models for the voltages on the aggressor and victim lines 
under crosstalk effects are presented. The validity of the models is demonstrated through 
comparison with the simulation results. Moreover, the effect of crosstalk on 
input skew variability has been studied. 
Chapter 8- This chapter concludes the study and outlines some future 
work. 
Chapter 2                                                                       On-Chip Communication Structures 
 
 
 
 
 
 
 
 
Chapter 2 
 
On-Chip Communication 
Structures 
 
 
A SoC design typically consists of many functional units (FUs) that work together to 
perform desired functions. The FUs always need to communicate with each other during 
the execution of the application and it is the responsibility of the on-chip communication 
structure/ architecture to provide a mechanism for the correct and reliable transfer of 
information from the source units to the destination units [31]. In addition to this, the on-
chip communication structure must satisfy certain metrics like latency, bandwidth, area and 
power dissipation. The performance of SoC designs largely depends on the choice and 
design of the underlying communication architecture. Therefore, depending upon the 
performance requirements, a suitable communication architecture is designed or selected 
for the SoC design. 
2.1 Communication Architectures for SoCs 
2.1.1 Buses 
The simplest on-chip communication architecture which is widely used in SoCs is the bus 
interconnection network [8]. In its simplest form, a bus is a group of wires which provides 
a communication media for the exchange of data between different functional units 
connected to it. Figure 2.1 [31] shows the example of a simple system with many 
functional units connected through on-chip buses. 
 
Figure 2.1: A SoC in which different components are integrated through the bus communication architecture. 
There are several types of bus configurations used in SoCs and the simplest one is called 
a simple shared bus, as shown in Figure 2.2. In this case only one FU at a time has control 
over the bus and transfers data. If another unit tries to use the bus at the same time 
in order to transfer data, this causes bus contention. Arbiters are used to resolve the 
conflict by granting control to one of the units on the basis of the assigned priorities. 
Contention is one of the major problems in bus based systems and efforts have been made to 
reduce it. 
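Priority-based resolution of bus contention can be sketched as follows. This is a minimal illustration of a fixed-priority scheme, not the arbitration logic of any particular bus standard; the unit names and the convention that a lower number means a higher priority are hypothetical.

```python
from typing import Dict, Iterable, Optional


def arbitrate(requests: Iterable[str], priority: Dict[str, int]) -> Optional[str]:
    """Return the requesting unit that is granted the bus, or None if idle.

    Assumed convention: a lower number means a higher priority.
    """
    pending = list(requests)
    if not pending:
        return None
    return min(pending, key=lambda unit: priority[unit])


# Hypothetical functional units and priorities, for illustration only.
PRIORITY = {"CPU": 0, "DSP": 1, "DMA": 2}

if __name__ == "__main__":
    # Two units request the bus in the same cycle; only one is granted.
    print(arbitrate(["DMA", "DSP"], PRIORITY))
```

With fixed priorities a low-priority unit can wait indefinitely while higher-priority units keep requesting, which is one reason contention limits the scalability of shared buses.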
 
Figure 2.2: A simple shared bus, allowing different FUs to share the same communication channel. 
In such systems, every unit attached to the bus adds capacitance which results in large 
delays and large power consumption. This allows only a limited number of components to 
be attached to the bus in order to keep the delay and power consumption within 
permissible limits. For this reason, the simple bus architecture is not scalable. 
 
Figure 2.3: A bus divided into two sub-buses using a bridge. 
This difficulty is typically reduced by dividing a common bus into several buses using 
bridges [32]. Figure 2.3 shows a bus split up into two sub-buses using a bridge. The 
implementation of a bridge is fairly simple if it connects buses with the same protocols and 
operating frequencies. There are also other types of bus configurations used in SoCs. 
Amongst them, Advanced Microcontroller Bus Architecture (AMBA) from ARM [33] 
defines several bus types which are widely used in SoCs. AMBA proposes various bus 
solutions for SoCs ranging from simple bus architectures to multi-master high performance 
bus structures. An example of an AMBA bus is shown in Figure 2.4. 
 
Figure 2.4: An example of AMBA bus. The bridge provides an interface to connect two different types of 
buses. 
2.1.2 Point-to-Point Direct Links 
In this architecture, each functional unit is directly connected with a subset of other 
functional units on the chip, as shown in Figure 2.5. The point-to-point communication 
architecture eliminates the contention problem of shared medium (buses). Each functional 
unit, in this architecture, has a network interface block, usually called a router and is 
directly connected with the neighbouring functional units through the communication 
links. These links can either be of input, output or bidirectional type. Unlike buses, as the 
number of routers (nodes) in this architecture increases, the total bandwidth increases. This 
property makes point-to-point links suitable to make large scale systems [34]. 
Unfortunately the number of links (and hence the power and area) grows with the square of 
the number of functional units. Hence this architecture is not promising for very large 
systems. 
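The quadratic growth in the number of links can be made concrete with a small sketch. It assumes the limiting case in which every functional unit is directly connected to every other one; real designs connect each unit only to a subset of the others, so the figures below are an upper bound.

```python
def full_connectivity_links(n_units: int) -> int:
    """Number of bidirectional links needed to connect every pair of units."""
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    return n_units * (n_units - 1) // 2


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n:3d} units -> {full_connectivity_links(n):4d} links")
```

Doubling the number of units roughly quadruples the number of links, and with it the wiring area and power.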
 
Figure 2.5: A point-to-point communication architecture. 
2.1.3 Network Architecture 
Network-on-Chip (NoC) has been proposed as a promising solution for on-chip 
communication in large SoC designs, where the complexity of the design is managed by 
the use of a number of networked, but self contained blocks [35], [36]. NoC provides a 
generalized scheme for on-chip global communication. Routing nodes (R) are spread over 
the chip and connected by point-to-point communication links. The resources or IP blocks 
are connected to NoC through network adapters (NAs), as shown in Figure 2.6. In a 
Network-on-Chip, data is exchanged amongst computing elements (IP blocks) by 
transmitting and relaying data packets through the interconnection network. There are 
similarities between the conventional computer networks and NoC, like layered 
communication models and decoupling of computation and communication. However, 
there are also some differences which are mainly due to the difference in the cost ratio of 
wiring and processing resources [37].  
 
Figure 2.6: A conceptual realization of a NoC [34]. Labelled blocks in the figure: Memory, DSP, UART, uP, CPU, Router, Network Adapters and Link. 
In NoC the whole chip can be partitioned into several regions, each of which contains one 
(or several) IP block(s). These IP blocks can operate with their own clocks and exchange 
data with other IPs through the switches and communication links. In this way the 
requirement of a global synchronization is relaxed. Computations are undertaken within 
locally synchronous IP blocks, and global synchronization is obtained by the execution of 
semantics embedded within the global communications network. Similarly, in addition to 
communication infrastructure, NoC can also provide standard IP interfaces which will 
facilitate the reuse of already verified IP resources [37]. This can simplify the design 
process and also reduce verification efforts. Due to a layered structure, the signal integrity 
issues can be addressed at physical, data-link or any higher layer [38]. 
NoC can be constructed in different types of topologies such as 2D mesh, Star, Torus, 
Octagon, Hypercube [37], [39]. The topology defines the connectivity and layout of the 
nodes and links on the chip. A 4×4 grid topology is shown in Figure 2.7 which presents a 
regular structure. The topology can be application specific having an irregular structure. 
Depending upon the specific requirements of, say bandwidth, the protocol dictates how the 
nodes and links of NoC will be utilized in the operation. 
 
Figure 2.7: A 4×4 grid structured NoC. Each intellectual property (IP) block is connected to a router through 
a network interface (NI) adapter. The routers are connected with each other through communication links in a 
certain topology. 
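The connectivity of a regular grid such as the one in Figure 2.7 can be sketched as follows. The code assumes a plain 2D mesh without wrap-around links, with routers addressed by (row, column); both function names are hypothetical.

```python
from typing import List, Tuple


def mesh_link_count(rows: int, cols: int) -> int:
    """Bidirectional router-to-router links in a rows x cols 2D mesh."""
    return rows * (cols - 1) + cols * (rows - 1)


def mesh_neighbours(r: int, c: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Routers directly linked to the router at position (r, c)."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]


if __name__ == "__main__":
    print(mesh_link_count(4, 4))        # links in a 4x4 mesh
    print(mesh_neighbours(0, 0, 4, 4))  # a corner router has two neighbours
```

In contrast to full point-to-point connectivity, the number of links in a mesh grows only linearly with the number of routers.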
2.2 Link as an Important Communication Medium 
All communication architectures rely on underlying communication links between the 
functional units or between the functional units and routers. These links 
form the backbone of any communication architecture. These links can be synchronous, 
asynchronous or self-timed. However, in this thesis we have chosen to focus on 
synchronous links due to their prevalence in the industry. Ideally these links should consist 
of a certain number of parallel wires running between the source and destination. However, 
in practical circuits (especially in DSM technologies), their construction is not so simple in 
order to meet certain design requirements. Therefore, it is of great importance to study 
these links in detail to design high efficiency links. 
A link can be bidirectional or unidirectional as shown in Figures 2.8 and 2.9 respectively 
[40]. A bidirectional link allows the signals to travel in either direction. This provides 
flexibility in the routing of interconnects and makes it possible to effectively use available 
metal tracks on the chip. The implementation of this approach requires the use of tristate 
buffers on transmitter and receiver sides, as shown in Figure 2.8. 
 
Figure 2.8: A bidirectional link. There is a shared interconnect between the transmitter and receiver. 
A unidirectional channel allows the signals to travel in only one direction, which means 
that a pair of wires should be used in each channel. This approach is less flexible than the 
bidirectional approach for routing the tracks on the chip; however, it provides less 
contention and more bandwidth. 
 
Figure 2.9: A unidirectional link. 
Furthermore, in each interconnect line, different circuit elements like tapered buffer 
drivers, repeaters and flip-flops are used. There are two basic designs for the 
interconnect: repeater inserted interconnects and flip-flop (or latch) inserted pipelined 
interconnects, as shown in Figures 2.10 and 2.11 respectively.  
 
Figure 2.10: Repeater inserted interconnect. 
 
Figure 2.11: Flip-flop inserted pipelined interconnect. 
2.3 Performance of On-Chip Communication 
The performance of on-chip communication in the physical layer can be evaluated from 
several aspects. In this thesis, however, we consider the metrics presented in this chapter as 
important for interconnect centric circuits. We start with a short review on interconnect 
design. Subsequently, some basic concepts and the mathematical equations describing 
these metrics are provided. We will make use of these metrics in subsequent chapters for 
evaluating the merits of different interconnects and for quantifying the effects that 
variability introduces in the design. 
2.4 Interconnect Modelling in DSM Technologies  
In the early days of VLSI design, the clock speeds and integration densities on the chip were 
low and so the signal integrity effects were minimal. However, with the rapid evolution of 
semiconductor technology, several important issues associated with interconnects in deep 
sub-micron technologies have emerged that are affecting the performance of high speed 
circuits. Problems such as interconnect delay, device and interconnect variability, 
power dissipation, crosstalk, substrate coupling, inductive coupling and IR drop are among 
the many emergent challenges which the circuit designers are facing [5], [6].  
The fundamental parameters influencing the interconnect delay are on-resistance of the 
driver, output capacitance of the driver and wire parasitics. The interconnect parasitics of 
interest are the wire resistance and the wire capacitance (and inductance for very high 
frequency signalling). These parasitics are a function of the physical properties of the 
construction and layout of the wires, and will act as an RC load increasing the propagation 
delay. 
A simple lumped element model is not sufficiently accurate to model state of the art 
interconnects, which of course are formed by continuously distributed RC (or RLC) 
elements in space. For simulation purposes, an approximation to a distributed element 
model can be formed by breaking the interconnect into a large number (N) of smaller 
identical lumped sections (RLC cells). Some possible models are shown in Figure 2.12 
[41]. The accuracy of the simulation results depends on the number of RLC cells 
(segments) used (i.e. the resolution of the lumped RLC model). However, this number is 
limited in practice by the correspondingly large simulation time of the model. 
 
Figure 2.12: Different interconnect models, (a) the ‘T’, (b) the ‘pi’ and (c) the ‘ladder’. A long wire is divided 
into N segments using ladder model and is shown in (d). 
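The influence of the number of segments on the accuracy of a ladder model can be illustrated with a short sketch. It uses the Elmore delay as a simple first-order delay estimate; this metric is an assumption introduced here for illustration and is not the simulation method used in this thesis. The wire is treated as an RC ladder only (inductance is omitted), with each segment made of a series resistance R/N followed by a capacitance C/N to ground. The resistance and capacitance values are hypothetical.

```python
def ladder_elmore_delay(r_total: float, c_total: float, n_segments: int) -> float:
    """Elmore delay at the far end of an N-segment RC ladder.

    Each segment is a series resistance r_total / N followed by a
    shunt capacitance c_total / N to ground.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    upstream_r = 0.0
    delay = 0.0
    for _ in range(n_segments):
        upstream_r += r_seg          # resistance between the source and this node
        delay += upstream_r * c_seg  # contribution of this node's capacitance
    return delay


if __name__ == "__main__":
    r_wire, c_wire = 1.0e3, 1.0e-12  # hypothetical 1 kOhm and 1 pF wire
    for n in (1, 2, 5, 10, 100):
        print(f"N = {n:3d}: {ladder_elmore_delay(r_wire, c_wire, n) * 1e9:.4f} ns")
```

The sum evaluates to RC(N+1)/(2N): a single lumped section gives RC, while a large N approaches the distributed value of RC/2, which shows why a single lumped element overestimates the wire delay.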
2.4.1 Parasitic Resistance 
The signal speed through a wire depends, to a first order approximation, on its distributed 
RC constants, and hence on the parasitic resistance. The resistance depends on the wire 
dimensions and the type of the material used (gold, aluminium, copper or polysilicon). For 
an interconnect having thickness T and width W, the resistance can be calculated as [41] 

$$R = \frac{\rho L}{W T} \tag{2.1}$$

where $\rho$ is the resistivity and $L$ is the length of the interconnect. Using this formula, the 
parasitic resistance of a wire of given dimensions can be estimated. 
With technology scaling, the wires are becoming thinner and so the parasitic resistance per 
unit length is increasing for minimum wire widths (according to ITRS). 
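As a worked example of equation (2.1), the following sketch estimates the resistance of a 1 mm global wire at the 25 nm generation. The resistivity (2.2 µΩ-cm), pitch (210 nm) and aspect ratio (2.3) are the values listed in Table 2.1; taking the width as half the pitch and the thickness as the aspect ratio times the width are assumptions made here for illustration.

```python
def wire_resistance(resistivity_ohm_m: float, length_m: float,
                    width_m: float, thickness_m: float) -> float:
    """Resistance of a rectangular wire, R = rho * L / (W * T), in ohms."""
    if width_m <= 0 or thickness_m <= 0:
        raise ValueError("width and thickness must be positive")
    return resistivity_ohm_m * length_m / (width_m * thickness_m)


if __name__ == "__main__":
    rho = 2.2e-8                # 2.2 micro-ohm-cm expressed in ohm-m
    pitch = 210e-9              # global wiring pitch at 25 nm (Table 2.1)
    width = pitch / 2           # assumption: width equals half the pitch
    thickness = 2.3 * width     # assumption: thickness = aspect ratio * width
    r_per_mm = wire_resistance(rho, 1e-3, width, thickness)
    print(f"Resistance of a 1 mm global wire: {r_per_mm:.0f} ohm")
```

Because both W and T shrink with the pitch, the resistance per unit length rises roughly with the inverse square of the pitch when the aspect ratio is held constant.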
2.4.2 Parasitic Capacitance 
The accurate estimation of the parasitic capacitances of the interconnects in DSM 
technologies is a complex task. This is due to the fact that each interconnect is a three 
dimensional metal structure surrounded by a number of other interconnects with significant 
variations of shape, width, thickness and spacing with respect to other conductors and 
ground planes [42]. Unlike the simple calculation of the capacitance of a parallel 
plate capacitor, capacitance estimation in integrated circuits requires the 
consideration of other factors like coupling capacitance and fringe capacitance in addition 
to ground capacitance, as shown in Figure 2.13. It has been observed that the contribution 
of the coupling capacitance in the total interconnect capacitance is increasing rapidly with 
technology scaling due to the reduction of interconnect spacing and an increased aspect 
ratio of wires. 
 
Figure 2.13: The cross-sectional view of an interconnect surrounded by two parallel similar interconnects 
over a ground plane (in the top global layer) showing different components of capacitance. 
An accurate estimation of the parasitic capacitance can be made by solving Maxwell’s 
equations in three dimensions, provided all material and geometrical details are available. 
Presently, computer aided software tools like Raphael [43] and FASTCAP [44] are 
available which are based on 2D or 3D field solvers and can calculate the parasitic 
capacitance with reasonable accuracy. However, some important aspects of interconnect 
parasitic capacitance can also be calculated using closed form models such as those of [45], as 
follows. The ground capacitance per unit length (considering the fringe flux) to the 
underlying plane is given by 
     3.28 #

  2$
%.%&' # ((  2$
).)*+                                2.2
 
Where  is the dielectric constant of the insulating material and ,, ( and  are the 
geometrical dimensions shown in Figure 2.13. Similarly the coupling capacitance per unit 
length is given by 
-   1.064 #($#
  2
  2  0.5($
%.*12  #   0.8($
).3)34 #   2  2  0.5($
%.4%3
 0.831#   0.8($
%.%22 # 22  0.5($
'.23&5                                          2.3
 
The total capacitance of the wire can be calculated as 
678    2-                                                             2.4
 
Typically such derivations are limited to particular domains. In this case the valid range for 
using the approximation is 

$$0.3 < \frac{W}{H} < 10, \quad 0.3 < \frac{S}{H} < 10, \quad \frac{T}{H} < 10$$
Other closed form capacitance models with different interconnect configurations are also 
given in [46], [47], [134]. 
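The closed form model of equations (2.2)-(2.4) can be evaluated with a short sketch. The coefficients are those of the expressions above, with W the wire width, S the spacing, T the thickness and H the height above the ground plane. The function names, the dielectric height H = 200 nm and the relative permittivity of 2.7 are hypothetical values chosen for illustration; the width and thickness follow the same half-pitch assumption for a 25 nm global wire.

```python
EPS0 = 8.854e-12  # permittivity of free space in F/m


def _check_range(w: float, s: float, t: float, h: float) -> None:
    """Reject geometries outside the stated validity range of the model."""
    if not (0.3 < w / h < 10 and 0.3 < s / h < 10 and t / h < 10):
        raise ValueError("geometry outside the validity range of the model")


def ground_capacitance(eps: float, w: float, s: float, t: float, h: float) -> float:
    """Ground capacitance per unit length (F/m), equation (2.2)."""
    _check_range(w, s, t, h)
    return eps * (w / h
                  + 3.28 * (t / (t + 2 * h)) ** 0.023 * (s / (s + 2 * h)) ** 1.16)


def coupling_capacitance(eps: float, w: float, s: float, t: float, h: float) -> float:
    """Coupling capacitance per unit length to one neighbour (F/m), equation (2.3)."""
    _check_range(w, s, t, h)
    a = (t + 2 * h) / (t + 2 * h + 0.5 * s)
    b = w / (w + 0.8 * s)
    return eps * (1.064 * (t / s) * a ** 0.695
                  + b ** 1.4148 * a ** 0.804
                  + 0.831 * b ** 0.055 * (2 * h / (2 * h + 0.5 * s)) ** 3.542)


def total_capacitance(eps: float, w: float, s: float, t: float, h: float) -> float:
    """Total capacitance per unit length with two neighbours (F/m), equation (2.4)."""
    return (ground_capacitance(eps, w, s, t, h)
            + 2 * coupling_capacitance(eps, w, s, t, h))


if __name__ == "__main__":
    eps = 2.7 * EPS0                 # hypothetical relative permittivity of 2.7
    w = s = 105e-9                   # assumption: width = spacing = half pitch
    t, h = 2.3 * 105e-9, 200e-9      # thickness from aspect ratio; H is hypothetical
    c_total = total_capacitance(eps, w, s, t, h)
    print(f"Total capacitance: {c_total * 1e9:.3f} fF/um")
```

For this geometry the two coupling terms dominate the total, in line with the growing share of coupling capacitance noted above.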
2.4.3 Inductance 
Inductance is another important parasitic. It can be described by the magnetic flux 
generated due to the flow of current in a loop. In integrated circuits several electrical loops 
can exist which produce inductive parasitic effects. At high enough operational frequencies 
of the circuits, the inductive impedance associated with interconnects becomes comparable to 
or prevails over the resistive part [48]. The inductive interference caused by the 
interaction of the magnetic fields can affect the signal integrity in the form of signal 
distortion, delay variation, crosstalk noise and glitches.  
In this research we have ignored the effects of inductance due to the following reasons: 
(a) The interconnect delay is not significantly affected by the inductive effects. For 
scaled global interconnects, the line resistance per unit length increases (according 
to the ITRS) and so the effect of inductance on the performance of global 
interconnects actually diminishes [49]. This is true, especially for the technologies 
and interconnect geometries we have considered in this thesis. Using the delay 
models of [143] for RC and RLC interconnects, it has been found that the percent 
increase in the propagation delay caused by neglecting inductance and considering 
an RLC line as an RC line, is nominal. For instance, for the global interconnects of 
25, 18 and 13 nm technology generations at S=1Smin and W=1Wmin, this increase is 
1.74%, 1.25% and 1.16% respectively. Similarly for the fastest interconnects with 
S=10Smin, W=10Wmin (as used in this thesis), the maximum increase in delay is 
14.2%, 10.92% and 10.01% for the corresponding technologies. 
(b) The inductive effects have a much longer spatial range in contrast to the capacitive 
effects, which primarily depend on features in close proximity. The inductance 
matrix generally becomes very dense and is difficult to specify in a straightforward 
way. Therefore, accurately simulating inductive effects might not be practical [48]. 
(c) The effective interconnect inductance in a chip environment is very difficult to 
predict accurately. For the estimation of the inductance associated with a wire, the 
return current path should be defined. However, the return current path can be 
dynamic in a real chip environment, as it depends strongly on the signal condition 
and the overall layout and configuration of the integrated circuit. 
2.4.4 Impact of Technology Scaling on Interconnect Parasitics 
In order to study the impact of technology scaling on interconnect resistance and 
capacitance parasitics, particular interconnect parameters have been taken from the 
International Technology Roadmap for Semiconductors (ITRS) [50] for the technology 
generations of 25, 18, 13 and 10 nm. These are given in Table 2.1. It is important to 
mention that these lengths (technologies) correspond to the MPU physical gate length. The 
data shows that interconnect pitch is reducing and height is increasing with technology 
scaling for all three wiring tiers. The parasitics have been calculated using equations (2.1)-
(2.4) for minimum wire width and pitch and are plotted in Figure 2.14 as a function of the 
technology generation. 
Table 2.1: Interconnect Technology Parameters for the Three Wiring Tiers 
Parameter / Technology Generation       25 nm     18 nm     13 nm     10 nm
Local wiring pitch (nm)                 136       90        64        50
Local wiring aspect ratio               1.7       1.8       1.9       1.9
Intermediate wiring pitch (nm)          136       90        64        50
Intermediate wiring aspect ratio        1.8       1.8       1.9       1.9
Global wiring pitch (nm)                210       135       96        75
Global wiring aspect ratio              2.3       2.4       2.5       2.6
Metal resistivity (µΩ-cm)               2.2       2.2       2.2       2.2
Dielectric constant                     2.5-2.9   2.3-2.7   2.1-2.5   1.9-2.3
On-chip local clock frequency (MHz)     4,700     5,875     7,344     8,522
Chip size at production (mm²)           310       310       310       195
The curves show that the interconnect resistance per unit length increases rapidly with 
technology scaling, whereas the capacitance decreases (as wire widths are decreasing). 
The increase in resistance outpaces the decrease in capacitance, so the RC delay increases 
with technology scaling and will contribute a larger portion of the path delay. 
        
Figure 2.14: Impact of technology scaling on interconnect resistance and capacitance per unit length (Fig. (a) 
and (b) respectively) for local, intermediate and global interconnects with minimum width and pitch. 
2.5 Performance Metrics 
2.5.1 Signal Delay 
Signal delay is the most important parameter describing the performance of on-chip 
communication, as it determines the maximum possible speed at which communication can 
be made. For reliable communication, it is required that the signals reach their destinations 
within the specified timing constraints. Consider the simple circuit of Figure 2.15 where a 
signal propagates through two buffers via the interconnect. The signal delay depends on 
the interconnect RC, the driver resistance and load capacitance.  
If $t_{delay}$ is the time between the step input voltage excitation $V_{in}$ and the output
voltage $V_{out}$ reaching 50 percent of its final value, then according to Bakoglu [51], the
signal delay to the first order is given by
$$t_{delay} = 0.7R_{int}C_L + 0.4R_{int}C_{int} + 0.7R_{tr}C_L + 0.7R_{tr}C_{int} \qquad (2.5)$$
 
where the interconnect parameters are shown in the equivalent circuit in Figure 2.15 and
are defined as follows:
$R_{int}$ = total interconnect resistance,
$R_{tr}$ = on-resistance of the transistors in the buffer,
$C_{int}$ = total interconnect capacitance,
$C_L$ = load capacitance (capacitance of the output buffer).
 
Figure 2.15: The circuit used for the derivation of the delay expression, where an interconnect is driven by an 
input buffer and at the output another buffer is connected.  
It is assumed that when the nMOS transistor in the buffer turns ON, the pMOS transistor
immediately turns OFF and vice versa (so no crowbar current occurs). The on-resistance
of the nMOS and pMOS transistors can be approximated as
$$R_{trn} = \frac{L_{eff}}{\mu_n C_{ox} W (V_{DD} - V_{tn})} \qquad (2.6)$$
and
$$R_{trp} = \frac{L_{eff}}{\mu_p C_{ox} W (V_{DD} - V_{tp})} \qquad (2.7)$$
where,
$L_{eff}$ = transistor gate length,
$W$ = transistor width,
$\mu$ = mobility of carriers in the transistor,
$C_{ox}$ = gate capacitance per unit area.
If the interconnect is long such that $C_{int} \gg C_L$, then equation (2.5) reduces to
$$t_{delay} = 0.4R_{int}C_{int} + 0.7R_{tr}C_{int} \qquad (2.8)$$
 
The expressions given above can provide a qualitative idea of the effects of different 
parameters on the delay. Moreover, they help to understand how variations in these 
parameters can affect the delay characteristics. 
It may be noted that the delay given by the expressions (2.5) and (2.8) is the Elmore delay 
[52]. Elmore delay is the most common and fastest approach for computing the signal 
delay of a wire. However, it accounts for only the first order moment and thus gives an 
approximation of the actual RC delay. When better accuracy in delay estimation is 
required, higher moments will have to be included using SPICE simulation. 
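
The two delay expressions are easy to compare numerically. The following is a minimal
sketch using the coefficients of equations (2.5) and (2.8); the function names and the
component values are illustrative assumptions, not data from this work.

```python
def bakoglu_delay(r_int, c_int, r_tr, c_load):
    """First-order delay of equation (2.5); SI units in, seconds out."""
    return (0.7 * r_int * c_load
            + 0.4 * r_int * c_int
            + 0.7 * r_tr * c_load
            + 0.7 * r_tr * c_int)


def long_wire_delay(r_int, c_int, r_tr):
    """Long-interconnect limit of equation (2.8), for c_int >> c_load."""
    return 0.4 * r_int * c_int + 0.7 * r_tr * c_int


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the thesis).
    r_int, c_int = 2.0e3, 200e-15   # 2 kohm, 200 fF
    r_tr, c_load = 5.0e3, 1e-15     # 5 kohm, 1 fF
    full = bakoglu_delay(r_int, c_int, r_tr, c_load)
    approx = long_wire_delay(r_int, c_int, r_tr)
    print(f"eq. (2.5): {full * 1e12:.1f} ps")
    print(f"eq. (2.8): {approx * 1e12:.1f} ps")
    print(f"relative difference: {(full - approx) / full:.2%}")
```

With a load capacitance two orders of magnitude below the wire capacitance, the two
expressions differ by well under one percent, which is the regime in which (2.8) is intended
to be used.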
2.5.2 Skew 
The difference in the arrival times amongst a group of signals (at a specific location) is 
defined as the skew in the group. The skew is a critical parameter for high speed circuits, 
as it can limit their performance. Therefore its minimization is emerging as a difficult 
engineering challenge to afford proper circuit operation under the tight design margins left 
by the increasingly short clock period. Traditionally, skew has always been a point of 
concern for the clock distribution network in synchronous circuits. However, it is also 
becoming an important parameter to control in high speed data transmission between 
different functional blocks on the chip. 
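
The definition above translates directly into code. This is a minimal sketch; the function
name and the arrival times are hypothetical.

```python
def skew(arrival_times):
    """Skew of a signal group: latest arrival minus earliest arrival."""
    times = list(arrival_times)
    if not times:
        raise ValueError("need at least one arrival time")
    return max(times) - min(times)


if __name__ == "__main__":
    # Hypothetical arrival times (ps) of four signals at the same location.
    arrivals_ps = [100.0, 104.5, 98.0, 101.2]
    print(f"skew = {skew(arrivals_ps):.1f} ps")
```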
2.5.2.1 Clock Skew 
There is a fundamental difference between clock distribution and data distribution because 
clock signal is periodic and predictable and every sequential element in a synchronous 
circuit needs it. Generally, the delay of the clock signals does not matter, as long as the 
clock signal reaches all circuit locations simultaneously [53]. However, in all practical 
systems (especially large synchronous systems), the clock signals do not exactly arrive at 
the same time at different spatial locations, and hence are skewed. Figure 2.16 gives an 
illustration of clock skew in a simple H-tree clock distribution network (CDN). 
The possible causes of skew in the clock signals may be the mismatch of the signal path 
length in the clock tree, imbalance of loads at different nodes of CDN, or process 
variations in the devices and interconnects. Clock drivers (buffers) of different sizes are 
used in the CDN which can be a potential source for introducing skew (due to device 
variability) along with the interconnect variability. 
Figure 2.16: (a) A simple H-tree with 16 nodes, and (b) an illustration of skew in the clock signals due to 
difference in their arrival times at location 1 and location 16 of the H-tree.  
2.5.2.2 Skew in Data Links 
With the speed increase of digital systems, the demand for high speed links used for the 
exchange of data between different functional blocks on the chip has also increased. The 
link can consist of a single wire, a group of wires forming a parallel link or a more 
complex serial link. However, all these links have to perform the difficult task of 
orchestrating fast computation and data transfers through the functional units connected to 
them. 
The serial links use a small number of wires and usually operate at high frequency to meet
bandwidth requirements. The overall bandwidth of serial links depends on the
characteristics of the interconnect and the abilities (complexity) of the receiver (and
transmitter).
A high speed differential serial link is shown in Figure 2.17. Ideally, the differential signals 
travelling on two separate lines should remain synchronous at any time until they reach the 
receiver. However, in reality there are certain factors such as mismatching of wire lengths 
due to routing constraints, variability in interconnects and/ or devices, due to which the 
signals arrive at slightly different times. This effect causes skew in differential pairs as 
shown in the figure. Skew beyond a certain value may not be tolerable for proper 
functioning of the receiver. Thus the skew beyond permissible limits can either limit the 
speed of these links or can cause functional errors. 
Figure 2.17: A high speed differential serial link carrying the signals S(t) and -S(t), shown
without and with skew. Skew beyond a limit can also affect its functioning.
Alternatively, parallel links can also be used for data communication. Here a group of bits 
is simultaneously transferred through a number of wires (typically the number of bits is 
equal to the word size). Ideally, all the bits arrive simultaneously and are sampled with the
arrival of a clock edge. Again, this is an idealization; in reality, signals
travelling through different wires of a parallel link arrive at the destination at slightly
different time instants, as shown in Figure 2.18.
 
Figure 2.18: An N-bit parallel link. The skew reduces the amount of the bit overlap. 
Due to the presence of the skew in the signals, the amount of overlap at the destination 
reduces, thereby increasing the probability of data sampling error. The skew can either 
reduce the operational distance or the throughput of a parallel link. If left unbounded, data 
corruption and functional errors will ensue.  
2.5.3 Delay Variability 
The variability in the devices and/ or interconnect has a direct impact on the performance 
of circuits. In the presence of variability, the signal delay no longer remains a deterministic 
fixed quantity and so the arrival times of signals can vary significantly. Thus the signal 
paths which are not critical in a circuit design may become critical under the impact of 
variability and can result in the malfunctioning of the circuit; in other words, there ceases
to exist a unique critical path. On-chip communication circuits may also suffer from such
variability issues, which can degrade circuit performance. Delay variability is,
therefore, an important design metric and should be considered in the design process for
making accurate signal timing plans.
Under the impact of variability, the signal delay becomes a random variable (RV). The
characteristics of this RV can be determined by computing its probability density
function (PDF) or cumulative distribution function (CDF). The moments of the
distribution represent its different characteristics. For instance, the first
moment represents the mean value (µ) and the second central moment gives the dispersion of
the distribution about the mean (in terms of the standard deviation, σ). Similarly, other
aspects of the distribution, such as whether it is skewed or peaked, are described by
higher moments.
The delay variability is defined as (3σ/µ) where σ is the standard deviation and µ is the 
mean value of a set of delay data. It provides a measure of the dispersion of delay values 
about the mean value. This metric should be as small as possible for the circuits. 
Depending upon the shape of the distribution, other higher moments are also required for 
accurate timing analysis. 
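
The metric can be computed from a set of delay samples as sketched below. The helper names
are hypothetical, and the synthetic log-normal samples only stand in for simulated delays;
they are not results from this work. The skewness helper illustrates one of the higher
moments mentioned above.

```python
import random
import statistics


def delay_variability(delays):
    """Return 3*sigma/mu for a set of delay samples."""
    mu = statistics.fmean(delays)
    sigma = statistics.stdev(delays)  # sample standard deviation
    return 3.0 * sigma / mu


def skewness(delays):
    """Third standardized moment; positive for a long right-hand tail."""
    mu = statistics.fmean(delays)
    sigma = statistics.pstdev(delays)  # population standard deviation
    n = len(delays)
    return sum((d - mu) ** 3 for d in delays) / (n * sigma ** 3)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, positively skewed "delays" in ps (assumed distribution).
    samples = [10.0 * rng.lognormvariate(0.0, 0.1) for _ in range(1000)]
    print(f"3*sigma/mu = {delay_variability(samples):.3f}")
    print(f"skewness   = {skewness(samples):.3f}")
```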
2.5.4 Crosstalk 
Crosstalk arises when a neighbouring wire (aggressor) unintentionally affects (couples 
energy into) another wire (victim). It occurs due to the coupling between the neighbouring 
wires and can be classified into functional noise and delay variation. Functional noise 
refers to a fluctuation in the signal state of a quiet wire (non-switching) due to switching in 
the neighbouring wire. This noise produces a glitch that may propagate through the 
interconnect to the dynamic node or a latch and may tend to change the signal state. This is 
illustrated in Figure 2.19, where the effect is shown on a quiet victim line due to the 
switching in a neighbouring aggressor line. 
 
Figure 2.19: Two RC coupled interconnects. Due to switching of the aggressor line, a voltage is induced in 
the victim line as shown in (a). The equivalent circuit of the crosstalk model is given in (b). 
Crosstalk can also cause variation in the delay of signals depending on the phases of the 
aggressor and victim line signals. If the aggressor and victim lines switch in the same 
phase, the signal speed on the victim line will increase and this is called in-phase crosstalk. 
On the other hand, if the two signals switch in the opposite phase, the crosstalk will reduce 
the signal speed in the victim line and this is called out-of-phase crosstalk [54]. On a chip, 
an interconnect may have multiple couplings with neighbouring wires and simultaneous 
switching on these wires will increase the magnitude of the crosstalk, thereby affecting the 
propagation delay and introducing delay variations [55]. These delay variations may result 
in timing failures. Therefore, crosstalk effects are critical in the design of high
performance circuits.
There are several publications [30], [56], [57] which have discussed crosstalk in
interconnects and have derived analytical expressions. A relatively simple expression used
to calculate the induced voltage due to a rising step of amplitude $V_{dd}$ and rise time
$t_r$ at the aggressor driver output, in an RC coupled interconnect, is given in [58] as
$$V_x = \frac{R_v C_c V_{dd}}{\tau_0 t_r}\left\{\tau_1\left(e^{-t/\tau_1} - e^{-(t-t_r)/\tau_1}\right) - \tau_2\left(e^{-t/\tau_2} - e^{-(t-t_r)/\tau_2}\right)\right\}, \quad t \ge t_r \qquad (2.9)$$
The equivalent circuit of the crosstalk model is shown in Figure 2.19(b), where $R_a$, $C_a$,
$C_v$ and $C_c$ are respectively the aggressor line resistance, total capacitance of the
aggressor line, total capacitance of the victim line and coupling capacitance between the two
lines. $R_{v-int}$ is the victim line resistance and $R_x$ is the driver resistance of the
victim line. The victim resistance $R_v$ is $R_{v-int} + R_x$. The time constants $\tau_0$,
$\tau_1$ and $\tau_2$ are given by
$$\tau_0 = \sqrt{\alpha^2 - 4\beta} \qquad (2.10)$$
$$\tau_1 = \frac{2\beta}{\alpha + \tau_0} \qquad (2.11)$$
$$\tau_2 = \frac{2\beta}{\alpha - \tau_0} \qquad (2.12)$$
where,
$$\alpha = R_v(C_v + C_c) + R_a(C_a + C_c) \qquad (2.13)$$
$$\beta = R_v R_a (C_a C_c + C_a C_v + C_c C_v) \qquad (2.14)$$
The above expressions clearly show the dependence of crosstalk noise on interconnect and 
device parameters.  
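
To make this dependence concrete, the expressions (2.9)-(2.14) can be evaluated numerically.
The sketch below follows one reading of the typeset equations (time constants named tau0,
tau1 and tau2, with tau0 the square-root term); the function name and the component values
are assumptions chosen only for illustration.

```python
import math


def crosstalk_voltage(t, v_dd, t_r, r_a, c_a, r_v, c_v, c_c):
    """Induced victim voltage of equation (2.9); valid for t >= t_r."""
    if t < t_r:
        raise ValueError("equation (2.9) only covers t >= t_r")
    alpha = r_v * (c_v + c_c) + r_a * (c_a + c_c)            # (2.13)
    beta = r_v * r_a * (c_a * c_c + c_a * c_v + c_c * c_v)   # (2.14)
    tau0 = math.sqrt(alpha ** 2 - 4.0 * beta)                # (2.10)
    tau1 = 2.0 * beta / (alpha + tau0)                       # (2.11)
    tau2 = 2.0 * beta / (alpha - tau0)                       # (2.12)

    def bracket(tau):
        # exp(-t/tau) - exp(-(t - t_r)/tau), the term repeated in (2.9)
        return math.exp(-t / tau) - math.exp(-(t - t_r) / tau)

    scale = r_v * c_c * v_dd / (tau0 * t_r)
    return scale * (tau1 * bracket(tau1) - tau2 * bracket(tau2))


if __name__ == "__main__":
    # Assumed example: symmetric lines, 1 kohm, 100 fF each, 50 fF coupling.
    params = dict(v_dd=1.0, t_r=50e-12, r_a=1e3, c_a=100e-15,
                  r_v=1e3, c_v=100e-15, c_c=50e-15)
    for t_ps in (50, 100, 200, 500, 1000):
        v = crosstalk_voltage(t_ps * 1e-12, **params)
        print(f"t = {t_ps:5d} ps  ->  induced voltage = {v * 1e3:.3f} mV")
```

The induced voltage stays positive for a rising aggressor and decays towards zero once the
aggressor has settled, as expected for capacitively coupled noise on a quiet victim.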
With technology scaling, signal speeds are increasing, interconnect aspect ratios are 
increasing and also interconnects are coming closer. Moreover, the supply voltages and 
also the design margins are reducing. More importantly, variability is also influencing 
crosstalk effects. Therefore, it is important to analyse the performance of on-chip 
communication networks in crosstalk environment under the impact of variability. 
2.5.5 Power Dissipation 
Buffer (repeater) insertion is a common technique to optimize the performance of global 
interconnects for on-chip communication networks. With technology scaling, more and 
more functionality is being integrated and thus on-chip communication networks are also 
growing rapidly. Moreover, the number of optimal buffers per unit interconnect length is
also increasing (due to progressively more resistive interconnect) and therefore a very large
number of these buffers is used in high performance designs [59]-[60]. Optimal repeaters
are used to construct delay optimal interconnections and are of a significant size. Thus they 
consume a proportionally large portion of the silicon and power [61]. The power 
dissipation has been pointed out as the main limiting factor in the scaling of the future 
CMOS circuits [62]. Therefore, power estimation for on-chip communication (and the 
whole chip) is an important metric to consider during the design process. 
The power dissipation in CMOS circuits comprises: (1) the dynamic (switching) power
$P_{sw}$, (2) the short circuit power $P_{sc}$ and (3) the leakage power $P_{leak}$. The
average power can be expressed as the sum of these three components
$$P_{total} = P_{sw} + P_{sc} + P_{leak} \qquad (2.15)$$
 
A brief description of these power components is given below [24], [42]: 
2.5.5.1 Switching Power 
Switching power is the power dissipation whenever there is a state transition, from low-to-
high or from high-to-low, in the circuit. The energy during this transition is actually 
consumed in charging or discharging (low-high or high-low) the load capacitance 
connected at the output of the driver (a buffer). In deep sub-micron on-chip communication 
networks, the load capacitance consists primarily of the interconnect and gate capacitance. 
The switching power dissipation in a buffer driving an interconnect of length $l$ having
resistance $r$ and capacitance $c$ per unit length is given by [24]
$$P_{sw} = \alpha\left(s(C_0 + C_p) + cl\right)V_{dd}^2 f_{clk} \qquad (2.16)$$
where
$C_0$ = input capacitance of a minimum sized buffer,
$C_p$ = output parasitic capacitance of the minimum sized buffer,
$f_{clk}$ = clock frequency,
$V_{dd}$ = power supply voltage,
$s$ = buffer size,
$\alpha$ = switching or activity factor, which gives the fraction of buffers switching during
an average clock cycle.
The switching power is independent of the rise or fall time of the input waveform. The
expression for $P_{sw}$ (Eq. 2.16) shows that the switching power can be reduced by reducing
the supply voltage $V_{dd}$. However, this is at the cost of increased delay.
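
The quadratic dependence on the supply voltage can be illustrated with a short sketch of
equation (2.16). The function name and all numerical values below are assumptions made for
illustration; only the three supply voltages are those quoted later in Section 2.6.

```python
def switching_power(alpha, s, c0, cp, c_per_len, length, v_dd, f_clk):
    """Equation (2.16): switching power of one buffer and its wire segment."""
    c_switched = s * (c0 + cp) + c_per_len * length
    return alpha * c_switched * v_dd ** 2 * f_clk


if __name__ == "__main__":
    # Assumed example values (SI units), not taken from the thesis.
    common = dict(alpha=0.15, s=50, c0=0.1e-15, cp=0.1e-15,
                  c_per_len=0.2e-9, length=100e-6, f_clk=5e9)
    for v_dd in (1.1, 1.0, 0.9):
        p = switching_power(v_dd=v_dd, **common)
        print(f"Vdd = {v_dd:.1f} V  ->  P_sw = {p * 1e6:.2f} uW")
```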
2.5.5.2 Short Circuit Power 
The buffers or repeaters which are used to drive interconnects consist of inverters 
constructed with nMOS and pMOS devices. If the input to a buffer has a finite rise time 
and fall time, then during the switching process both nMOS and pMOS transistors may 
conduct simultaneously for a short interval of time, forming a direct path between the 
supply and ground for the flow of the current. The short circuit power is that dissipated 
during this eventuality. Unlike the switching power, the rise time and fall time play an 
important role in the determination of the magnitude of the short circuit power. If $V_{tn}$
and $V_{tp}$ are the threshold voltages of the nMOS and pMOS transistors respectively, then
the following condition holds during the short circuit phase
$$V_{tn} < V_{in} < V_{dd} - |V_{tp}|$$
Approximating the short circuit current by a triangular waveform, the total short circuit
power is given by [24]
$$P_{sc} = \alpha t_r V_{dd} W_{n_{min}} s I_{sc} f_{clk} \qquad (2.17)$$
where
$W_{n_{min}}$ = minimum width of the nMOS transistor,
$s$ = transistor size,
$I_{sc} \approx 65\ \mu A/\mu m$ across all technologies.
$t_r$ is given by
$$t_r = \left(r_s(C_0 + C_p) + \frac{r_s}{s}cl + rlsC_0 + \frac{1}{2}rcl^2\right)\ln 3 \qquad (2.18)$$
If the input rise and fall times are much larger than the output rise and fall times, the
transistors will conduct for a longer time and therefore the short circuit current will
increase. It is proposed in [63] that the short circuit current can be eliminated if the power
supply voltage is adjusted such that
$$V_{dd} < V_{tn} + |V_{tp}|$$
Under this condition, both nMOS and pMOS transistors will not be ON simultaneously for 
any input voltage. However, this technique will make the circuit more vulnerable to noise 
effects due to reduced signal to noise ratio. 
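
Equations (2.17) and (2.18), together with the supply voltage condition, can be sketched as
follows. The function names and example values are assumptions. In particular, r_s is not
defined in the text above; it is taken here to be the output resistance of a minimum sized
buffer, which is an assumption.

```python
import math

I_SC_A_PER_M = 65.0  # 65 uA/um expressed in A/m (1 uA/um equals 1 A/m)


def rise_time(r_s, s, c0, cp, r, c, length):
    """Equation (2.18). r_s: assumed minimum-buffer output resistance (ohm)."""
    tau = (r_s * (c0 + cp)
           + (r_s / s) * c * length
           + r * length * s * c0
           + 0.5 * r * c * length ** 2)
    return tau * math.log(3)


def short_circuit_power(alpha, t_r, v_dd, w_n_min, s, f_clk,
                        i_sc=I_SC_A_PER_M):
    """Equation (2.17), with i_sc given per unit transistor width."""
    return alpha * t_r * v_dd * w_n_min * s * i_sc * f_clk


def short_circuit_possible(v_dd, v_tn, v_tp):
    """True when a range of V_in exists in which both transistors conduct."""
    return v_dd > v_tn + abs(v_tp)


if __name__ == "__main__":
    # Assumed example values (SI units), not taken from the thesis.
    t_r = rise_time(r_s=10e3, s=50, c0=0.1e-15, cp=0.1e-15,
                    r=2.8e6, c=0.2e-9, length=100e-6)
    p_sc = short_circuit_power(alpha=0.15, t_r=t_r, v_dd=1.0,
                               w_n_min=50e-9, s=50, f_clk=5e9)
    print(f"t_r  = {t_r * 1e12:.2f} ps")
    print(f"P_sc = {p_sc * 1e6:.3f} uW")
    print("short circuit path possible:",
          short_circuit_possible(1.0, 0.3, -0.3))
```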
Figure 2.20 shows a rough sketch of the voltage and current waveforms of a simple buffer 
(inverter) circuit during its switching. Figure 2.20(b) shows the short circuit current and 
Figure 2.20(c) shows the switching current. Note that short circuit current is much smaller 
as compared to the switching current. 
2.5.5.3 Leakage (static) Power 
Ideally, the power dissipation in CMOS circuits is thought to occur only during their state 
transitions and once the circuits are in a stable state, there should not be any power 
dissipation. However, a leakage current flows through the CMOS circuits during any of the 
states. This constitutes an increasingly important component of the total power dissipation,
called leakage power.
Figure 2.20: A rough sketch of the voltage and current waveforms of a simple buffer circuit:
(a) input and output voltage waveforms, (b) the short circuit current peaks that appear when
both nMOS and pMOS conduct, and (c) the switching current used for the charging and
discharging of the capacitive load.
Five major sources of leakage power in CMOS devices are [64]:
(i) Sub-threshold leakage
(ii) Gate oxide tunneling leakage
(iii) Reverse bias junction leakage
(iv) Gate induced drain leakage
(v) Gate current due to hot carrier injection
These effects are becoming more important as the devices are miniaturized with 
technology scaling and so leakage power is rapidly increasing and dominating in the 
CMOS circuits [65]. 
The buffers used in on-chip communication also exhibit this mode of power dissipation. 
According to [24], the average amount of leakage power in the buffers inserted in the
interconnect is given by
$$P_{leak} = V_{dd} I_{leak} \qquad (2.19)$$
$$\phantom{P_{leak}} = V_{dd}\,\frac{1}{2}\left(I_{off_n} W_{n_{min}} + I_{off_p} W_{p_{min}}\right)s \qquad (2.20)$$
where,
$I_{leak}$ = leakage current through the buffer,
$I_{off_n}$ ($I_{off_p}$) = leakage current per unit width of the nMOS (pMOS) transistor,
$W_{n_{min}}$ ($W_{p_{min}}$) = width of the nMOS (pMOS) transistor in a minimum size buffer
(inverter).
Like delay, statistical device variability has also introduced variability in the leakage
power, which has become a point of serious concern in deep sub-micron technologies. Both delay
and leakage power variability are seriously affecting the performance, yield and reliability
of the circuits and seem to be an obstacle in the progression of designing
power-constrained high performance circuits using miniaturized devices [65]-[67].
2.5.6 On-Chip Area 
On-chip communication networks are deeply spread over the whole chip to provide 
communication media to the functional units. However, as previously stated, they consume
a large portion of the chip area due to the large number of buffers. In future technology
generations, unconstrained optimal buffering of interconnects might require up to 80% of 
the total on-chip area [68]. 
The area of the on-chip communication network is simply the area occupied by the wires 
and the area of CMOS circuitry used to drive these wires (line drivers, buffers, switches, 
etc.). The total area of the repeaters of size $s$ placed at regular intervals of length $l$
in an interconnect of length $L$ can be estimated as
vF>M>@8>Fx  >GGi                                                                2.21
 
where $L_{eff}$ is the effective transistor gate length. (This is actually a lower bound;
routing might add more area.)
Limited area resources available on the chip have made this metric very important for the 
present and future system designs. 
2.5.7 Throughput 
Throughput is one of the important parameters of interest and is defined as the average rate 
of error free delivery of data over a communication channel. It is generally measured in 
bits per second (bps) or data packets per second. 
2.5.8 Bandwidth 
Bandwidth refers to the maximum capacity of error free data transmission over a
communication channel. The higher the bandwidth, the better the system performance, and
so there have always been design efforts to maximize it.
2.5.9 Parametric Yield 
Due to process variations, the uncertainty in the performance and power characteristics of 
the designs is increasing [69].  This can lead to a significant deviation of the manufactured 
products from their actual designs.  
Parametric yield is defined as the percentage of the manufactured dies which meet the 
specified frequency and power consumption requirements [70]. It can be calculated as
$$P(x \le x_{target}) \qquad (2.22)$$
where $x$ is the observed delay or power dissipation and $x_{target}$ is the corresponding
constraint.
The yield measurement could result in discarding a large number of dies which do not meet 
the performance or power criteria, even if they are otherwise functional. This results in 
parametric yield loss. Since power dissipation and delay are negatively correlated, fast 
designs may consume more power, causing an increased yield loss. Similarly, power 
efficient designs may not fulfill the performance requirements and again result in yield 
loss. Therefore, careful consideration of this metric is required in the designs. 
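
Given a set of simulated or measured dies, the parametric yield can be estimated empirically
as the fraction that meets both constraints. The sketch below is one simple way to do this;
the function name, the data layout and the numbers are hypothetical.

```python
def parametric_yield(dies, delay_target, power_target):
    """Fraction of dies whose delay and power both meet their constraints.

    dies: iterable of (delay, power) pairs, one per die.
    """
    dies = list(dies)
    if not dies:
        raise ValueError("need at least one die")
    passing = sum(
        1 for delay, power in dies
        if delay <= delay_target and power <= power_target
    )
    return passing / len(dies)


if __name__ == "__main__":
    # Hypothetical (delay in ns, power in mW) pairs for four dies.
    dies = [(1.0, 5.0), (1.2, 4.0), (0.9, 6.5), (1.1, 4.5)]
    y = parametric_yield(dies, delay_target=1.1, power_target=6.0)
    print(f"parametric yield = {y:.0%}")
```

In this toy data set the slow die fails the delay constraint and the fastest die fails the
power constraint, which mirrors the two yield loss mechanisms described above.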
2.6 Performance Characterization Methodology 
In some recent studies, the effect of intrinsic parameter fluctuations introduced due to RDF 
and other sources, on the performance of CMOS circuits has been studied for the future 
technology generations [71]-[73]. The 3-D atomistic simulation method [74], [75] is used 
to study the effect of different sources of variability at device level. However, this method 
is not feasible for circuit level analysis, being computationally expensive.  
In this research, the performance of on-chip communication circuits for the future
technology generations of 25, 18 and 13 nm physical gate length bulk MOSFETs has been
accurately characterized using the Monte Carlo (MC) method and HSPICE simulations of a
large number of distinct realizations of the circuit under investigation. The industry
standard BSIM4 model card libraries have been used for the given technology generations
[76]. These model card libraries are developed through a parameter extraction strategy [77]
in which comprehensive Glasgow 3D statistical physical device simulations are
performed and the fluctuation information due to random dopant fluctuation (RDF) is
transferred into the model card libraries.
The devices in each library are macroscopically similar but are microscopically different 
due to the difference in the number and position of the dopant atoms in the channel. So all 
the devices in each library have different characteristics due to statistical variations in the 
device parameters and the distribution of these variations represents the distribution of 
variations found in the general population. For the statistical analysis, a Monte Carlo 
simulation method has been used (as previously stated) with random selection of the 
devices from the given model card libraries, while constructing different circuit 
realizations. The circuits are biased with a supply voltage of 1.1V, 1.0V and 0.9V for the 
technology generations of 25, 18, and 13 nm, respectively [78]. Different delay 
measurements taken during this study correspond to 50% of the signal levels during the 
transitions. Power measurements for the circuits have also been made through this 
methodology. Several sets of HSPICE simulations have been performed for the transient 
analysis of the circuits. 
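
The random selection step can be sketched as follows. This is not the tool flow used in this
work: the model card names, library size and helper function are hypothetical, and the
simulation of each realization is left out.

```python
import random


def build_realizations(nmos_library, pmos_library, n_inverters, n_runs, seed=0):
    """Draw random (nMOS, pMOS) model cards for every inverter of every run.

    Returns a list of n_runs circuit realizations; each realization is a
    list of n_inverters (nmos_card, pmos_card) tuples.
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    runs = []
    for _ in range(n_runs):
        circuit = [
            (rng.choice(nmos_library), rng.choice(pmos_library))
            for _ in range(n_inverters)
        ]
        runs.append(circuit)
    return runs


if __name__ == "__main__":
    # Hypothetical model card names standing in for the real libraries.
    nmos_cards = [f"nmos_{i:03d}" for i in range(200)]
    pmos_cards = [f"pmos_{i:03d}" for i in range(200)]
    realizations = build_realizations(nmos_cards, pmos_cards,
                                      n_inverters=5, n_runs=3)
    for index, circuit in enumerate(realizations):
        print(f"realization {index}: {circuit}")
```

Each realization would then be written out as a netlist and simulated, and the delay and
power of all realizations collected into the distributions analysed in the following
chapters.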
2.6.1 Extraction of I-V Characteristics of MOSFETs 
The dependence of the device drain current on the gate voltage is given by the I-V 
characteristic curves. In order to validate the test methodology, the I-V characteristics of
the devices in the library have been measured. These curves have been plotted for the nMOS
and pMOS devices of the given three technology generations and are shown in Figure 2.21. 
Each set of the curves is plotted for 200 devices taken from the model card libraries. The 
blue curve (with symbols) over the red curves and red curve (with symbols) over the blue 
curves is for the uniformly doped devices. These curves match the data in [145] and show 
that I-V characteristics of the devices in the model card libraries differ from each other 
under the impact of RDF and lie on both sides of the uniformly doped device curves. It 
may also be noted that the spread of these curves increases with technology scaling. This 
hints that the delay characteristics of the devices (and circuits) will certainly be affected 
due to the variability in the I-V performance. 
[Plots for Figure 2.21; horizontal axis: gate voltage, V_G (V)]
Figure 2.21: I-V characteristic curves of 200 devices for each of nMOS (left) and pMOS (right) for the 
technology generations of 25, 18 and 13nm. Along with each set of curves, the characteristic curve for the 
uniformly doped device is also plotted and the dispersion of other curves around this curve shows the effect 
of variability due to RDF. 
 
2.7 Summary 
This chapter gives an introduction to the on-chip communication structures used in SoCs. 
In all the structures, the underlying communication links play an important role in their 
design. Therefore, modeling of interconnects used in these links is first presented. 
Subsequently, different performance metrics used to evaluate the performance of the 
communication structures in DSM regions have been discussed. Finally, the methodology 
used in this thesis to characterize the performance of different circuits is outlined. 
 
 
Chapter 3                                              Communication Structures under Device Variability 
 
 
 
 
 
 
 
 
Chapter 3 
 
Communication Structures under 
Device Variability 
 
 
A clock distribution network (CDN) and a data channel (DC) consist of basic circuit
elements like tapered buffer drivers, buffers (repeaters), flip-flops (or latches) and 
interconnects, as shown in Figure 3.1. Hence the performance of CDN and DC (and 
consequently the synchronous system) depends on the performance of these circuit 
elements. 
The performance of on-chip communication circuits (CDN or DCs) can be estimated either 
through modelling or simulation. However, it is very difficult (if at all possible) to accurately
model these circuits while considering variability effects due to different parameters. In 
this situation, simulation can provide accurate results. The performance can be 
characterized by simulating the complete communication network or from the known 
performance of the individual communication elements. Again, evaluating the performance 
of a complete communication network through simulation is computationally very 
expensive and might not be feasible for large systems. Therefore, the performance of such 
large systems can be estimated with reasonable accuracy using the performance of the 
individual communication structures in some statistical framework. 
Figure 3.1: Communication structures in CDN and data channels: (a) an H-type CDN, (b) a repeater inserted 
synchronous data channel, (c) a flip-flop based pipelined data channel.  
In this chapter we present a systematic study to investigate the effect that variability will 
introduce in the communication structures for future technology generations. Such a study 
becomes important for designers and academia so that they can formulate efficient design 
methodologies for the coming technology generations under tight design margins and other 
technology challenges. 
3.1 Technology Scaling and Gate Delay 
In a particular technology generation, the maximum clock speed and the speed at which
computation can be performed are determined by the gate delay. On-chip communication
will need to be designed to support these speeds in order to preclude data starvation. Due to
statistical variation in the devices, gate delay is no longer a fixed quantity, but a random
variable (RV) which follows a given distribution. For better estimation of the maximum
clock speed, a statistically accurate description of the delay must be derived with
consideration of the effects introduced by variability. In this section, we study the
impact of device variability on the gate delay of an inverter in a given technology, as
representative of delay in more complex combinational circuits and gates. This delay has
been measured in terms of FO4 delay and is used as a reference or benchmark to which we 
can compare the results of the communication structures. The metric FO4 delay or “fan-
out-of-four inverter delay” has been used elsewhere [6] and is a quite reasonable metric, as 
four is the typical average gate connectivity in a digital circuit [79]. This is defined as the 
delay through an inverter driving four copies of itself. Since the effect of variability is 
more pronounced in smaller geometries, FO4 delay has been measured corresponding to 
 
Figure 3.2: The definition of FO4 delay. 
Chapter 3                                              Communication Structures under Device Variability 
 
41 
 
the delay of minimum sized inverters, as shown in Figure 3.2. Here we have used 
minimum sized inverters of size CD=25, 18 and 13 nm for the given three technology 
generations of 25, 18 and 13 nm, respectively. 
HSPICE simulations were performed (using the Monte Carlo method, described in section 
2.6) and FO4 delay measurements were taken for the given technology generations. The 
mean value of the FO4 delay is plotted for the three technologies and results are shown in 
Figure 3.3. The standard deviation of the FO4 delay is also represented in the form of error 
bars. It can be seen that the mean value of the FO4 delay decreases, whereas the delay 
variability increases, with the decrease of the gate length. This is to be expected. However, 
we are interested in determining the nature of the delay distributions. For this reason, 
histograms are plotted from the measurement data and shown in Figure 3.4. It becomes 
evident that the dispersion of the distributions increases with gate length scaling. 
Moreover, the distributions are asymmetric about the mean delay and the degree of 
asymmetry increases with the decrease of the gate length. The positively skewed nature of 
the distributions has a detrimental impact on the performance of the circuits: a significant 
number of samples lie beyond the nominal value, forming a long tail which will limit 
the speed of the circuits and also introduce reliability issues. 
[Figure 3.3 plot; axis label: FO4 Delay (ps)]
 
Figure 3.3: FO4 delay for different technology generations. The error bars represent the uncertainty in 
delay (1σ). 
More importantly, it becomes obvious that the dispersion and the worst case of FO4 delay 
grow super-linearly as the technology scales down. Because of this, the performance of 
circuits will certainly be affected unless corrective measures are incorporated in 
their design. The effect becomes more important in the design of synchronous systems 
under the tight design margins typical of high performance circuits. It is obvious, then, that 
variability in the devices warrants careful consideration during the design of high 
performance circuits. 
[Figure 3.4 histograms; legend: 25nm, 18nm, 13nm; axis label: FO4 Delay (ps)]
 
Figure 3.4: Delay distribution of minimum sized inverters with a fan-out of four for the technology 
generations of 25, 18, and 13 nm. 
3.2 Delay Uncertainty in Buffers 
The efficiency of high performance circuits not only depends on the performance of 
computational elements but also depends greatly on the communication network 
responsible for the exchange of data between the computational elements. Delay 
uncertainty in the clock signal can produce setup and hold time violations at the data 
registers. Similar violations can also occur in the data signals. A large number of buffers 
are used in these communication networks that can introduce delay uncertainty in the 
signals. For designing high performance circuits (with correspondingly tight timing 
constraints), the delay uncertainty will have to be reduced. Therefore, design 
methodologies that reduce delay uncertainty should be explored. 
 
The delay of a CMOS buffer (inverter), to the first order, as given by Bakoglu [51], is 
$$ t_d = 0.7\, R_{tr}\, C_L \tag{3.1} $$
where $R_{tr}$ is the on-resistance and $C_L$ is the capacitive load at the output of the inverter. 
The inverter resistance $R_{tr}$, which is approximated by averaging the drain currents at the 
extreme points (0 and $V_{DD}$) of the high-to-low and low-to-high transitions, is given by 
$$ R_{tr} = \frac{L}{W \mu C_{ox} (V_{DD} - V_T)} \tag{3.2} $$
where 
$L$ = transistor gate length, 
$W$ = transistor gate width, 
$C_{ox}$ = gate capacitance per unit area, 
$\mu$ = mobility of the transistor, 
$V_{DD}$ = supply voltage, 
$V_T$ = threshold voltage. 
A variation in these factors will cause the inverter resistance (and consequently the drain 
current) to change and will eventually result in variability of the gate delay. In the deep 
sub-micron (DSM) region, it is impossible to precisely control all transistor parameters during 
the fabrication process. Therefore, in a batch of similar transistors, each parameter 
follows a distribution with some nominal value and a spread about this 
nominal value. For instance, due to variations (in particular random dopant fluctuations), 
the threshold voltage $V_T$ of the transistors will have some distribution (wider or narrower). 
Hence, the on-resistance of the transistors can no longer be treated as a fixed quantity; 
rather it will follow a distribution, resulting in a distribution of the inverter delay. Let 
$\delta_{RDF}$ represent the effect of RDF on $V_T$, entering as a factor that scales $V_T$; then 
the on-resistance of the inverter will be given by 
$$ R_{tr} = \frac{L}{W \mu C_{ox} (V_{DD} - V_T\, \delta_{RDF})} \tag{3.3} $$
Therefore, 
$$ t_d = \frac{0.7\, L\, C_L}{W \mu C_{ox} (V_{DD} - V_T\, \delta_{RDF})} \tag{3.4} $$
In order to evaluate the effect of variations in $V_T$ on the delay of the inverter, we 
differentiate $t_d$ with respect to $\delta_{RDF}$, yielding 
$$ \frac{\partial t_d}{\partial \delta_{RDF}} = \frac{0.7\, L\, V_T\, C_L}{W \mu C_{ox} (V_{DD} - V_T\, \delta_{RDF})^2} \tag{3.5} $$
This shows that the sensitivity of the inverter delay is inversely proportional to the size 
(width) of the inverter. In the same way, the delay sensitivity to other transistor parameters 
can be determined. Therefore, we can deduce that the simple technique of circuit scaling 
can be used to minimize the effect of RDF on delay variability. 
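As a rough numerical illustration of equations (3.1) to (3.5), the following Python sketch evaluates the first-order delay and its sensitivity to the RDF factor for two device widths. All parameter values (mobility, gate capacitance, voltages, load) are illustrative assumptions rather than values from the technologies studied here, and `delta_rdf` is a hypothetical name for the factor that scales the threshold voltage.

```python
def on_resistance(L, W, mu, c_ox, v_dd, v_t, delta_rdf=1.0):
    """First-order on-resistance, equation (3.3); delta_rdf scales V_T."""
    return L / (W * mu * c_ox * (v_dd - v_t * delta_rdf))


def inverter_delay(c_load, **device):
    """First-order delay t_d = 0.7 * R * C_L, equations (3.1) and (3.4)."""
    return 0.7 * on_resistance(**device) * c_load


def delay_sensitivity(c_load, L, W, mu, c_ox, v_dd, v_t, delta_rdf=1.0):
    """Analytic derivative of t_d with respect to delta_rdf, equation (3.5)."""
    return 0.7 * L * v_t * c_load / (W * mu * c_ox * (v_dd - v_t * delta_rdf) ** 2)


# Assumed, purely illustrative values in SI units
device = dict(L=25e-9, W=25e-9, mu=0.03, c_ox=0.02, v_dd=1.0, v_t=0.3)
c_load = 1e-16

sens_1x = delay_sensitivity(c_load, **device)
sens_2x = delay_sensitivity(c_load, **{**device, "W": 2 * device["W"]})
ratio = sens_1x / sens_2x  # sensitivity scales as 1/W for a fixed load

# Central-difference check of the analytic derivative
h = 1e-6
numeric = (inverter_delay(c_load, **device, delta_rdf=1 + h)
           - inverter_delay(c_load, **device, delta_rdf=1 - h)) / (2 * h)
```

Under these assumptions, doubling the width halves the sensitivity, which is the behaviour the text attributes to equation (3.5). The load is held fixed here; in a real design the load often grows with the driver, which this sketch does not model.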
We proceed to quantify the effect of RDF on the delay performance of individual buffers 
of different sizes. To this end SPICE models are developed for the buffers of sizes 1, 2, 3, 
5, 7, 10, 15, 20, and 25 times CD, with a load of a 25CD buffer connected at their 
output (where CD = size of the minimum sized buffer = 25, 18 and 13 nm for the given 
three technology generations). The results of Monte Carlo simulations are shown in Figure 
3.5, where mean delay and delay variability are plotted for the given buffer sizes. It can be 
seen that the buffer delay and the dispersion in delay are inversely proportional to the buffer 
size, as expected from equations (3.4) and (3.5). More importantly, the relation is not linear 
and a small increase in the size of the buffer can give a significant 
improvement in delay and delay variability, especially at smaller buffer sizes. 
It has also been found that there is a difference in the amount of delay variability for 
low-to-high and high-to-low transitions, as shown by the dashed lines in Figure 3.6. For 
instance, it is larger during high-to-low transitions and the effect is more prominent at 
smaller buffer sizes. This is due to the inherent nMOS and pMOS asymmetries, i.e. the size 
of the pMOS transistor is normally taken as twice the size of the nMOS transistor to obtain 
identical delay in both swings. Therefore, when considering delay variability, its 
magnitude in both swings must be considered. 
   
(a)                                                                                    (b) 
Figure 3.5: Mean buffer delay (a), Delay variability (b), plotted as a function of buffer size for 18 nm 
technology generation. The curves have been plotted for the average response in low-to-high and high-to-low 
transitions. 
 
If we assume that delay variations in buffers of different sizes are independent of each 
other and if $V_d(1\,\mathrm{CD}) = 3\sigma(1\,\mathrm{CD})/\mu(1\,\mathrm{CD})$ is the delay variability of a minimum 
sized inverter in a given technology generation, then the delay variability of an inverter of 
size $n\,\mathrm{CD}$ can be approximated as 
$$ V_d(n\,\mathrm{CD}) = \frac{3\sigma(n\,\mathrm{CD})}{\mu(n\,\mathrm{CD})} \approx \frac{3\sigma(1\,\mathrm{CD})/\mu(1\,\mathrm{CD})}{\sqrt{n}} \tag{3.6} $$
(due to properties of the normal distribution). This relation can be used to make an estimate 
of the delay variability in a buffer of given size. It is, however, important to mention that 
equation (3.6) gives only an approximate result, especially in deep sub-micron 
technologies because this relation is valid for the distributions which are close to the 
normal distribution. However, we have seen that the delay distributions under RDF are 
skewed and the degree of skewness increases with scaling down of the technology.  
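The scaling in equation (3.6) can be tabulated with a few lines of Python. This is a minimal sketch: the starting variability `v_min` is an assumed placeholder value, not a simulation result, and the buffer sizes are the multiples of CD listed earlier in this section.

```python
import math


def scaled_variability(v_min, n):
    """Equation (3.6): 3*sigma/mu of a buffer n times the minimum size."""
    if n <= 0:
        raise ValueError("buffer size multiple must be positive")
    return v_min / math.sqrt(n)


v_min = 0.30  # assumed 3*sigma/mu of a minimum sized inverter (hypothetical)
sizes = [1, 2, 3, 5, 7, 10, 15, 20, 25]
estimate = {n: scaled_variability(v_min, n) for n in sizes}
```

Because the estimate rests on near-normal delay distributions, it should be read as a first guess that becomes less reliable for the smallest, most skewed buffers.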
 
Figure 3.6: Delay variability plotted against buffer size for 18 nm buffers. The smaller dashed lines represent 
delay variability for low-to-high transition and bigger dashed lines for high-to-low transition. Similarly, the 
solid lines are for the average response. 
3.2.1 Skewness of Delay Distributions 
Skewness is a measure of the degree of asymmetry (lack of symmetry) of a probability 
distribution of a real valued random variable. The skewness of a distribution can be 
positive or negative or zero. If the tail on the right side of the probability density function 
is more pronounced than the left tail, the distribution is said to have positive skewness. In 
this case, the bulk of the values lie to the left of the mean. If the reverse is true, it is said to 
have negative skewness. Zero skewness indicates that the values are relatively evenly 
distributed on both sides of the mean. The skewness of a distribution is defined as 
$$ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} $$
where $\mu_i$ is the ith central moment. 
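The definition above, the third central moment divided by the second central moment raised to the power 3/2, can be applied directly to a set of Monte Carlo delay samples. The sketch below uses synthetic data (a Gaussian and a log-normal population, both assumptions chosen only to show the sign of the result), not the simulated delays of this work.

```python
import random


def central_moment(samples, k):
    """k-th central moment of a list of samples (population form)."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** k for x in samples) / len(samples)


def skewness(samples):
    """Skewness as the third central moment over the second to the power 3/2."""
    return central_moment(samples, 3) / central_moment(samples, 2) ** 1.5


rng = random.Random(1)
symmetric = [rng.gauss(10.0, 1.0) for _ in range(20000)]            # near zero skew
right_tailed = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]  # positive skew

skew_symmetric = skewness(symmetric)
skew_right_tailed = skewness(right_tailed)
```

A long right tail, as seen in the delay histograms, gives a positive value, while the symmetric population stays close to zero.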
As we have mentioned earlier, delay distributions of the buffers under RDF are positively 
skewed. The degree of skewness, however, depends on the size of the buffers. Figure 3.7 
shows the dependence of the skewness on the size of the buffers for 13 nm technology 
generation. The curve shows that the delay distributions corresponding to small buffers are 
significantly skewed and the degree of skewness decreases as the size of the buffers 
increases. Thus for larger buffers, the delay distributions tend to approximate a Gaussian 
distribution. 
We will discuss skewness in more detail in Chapter 4. 
 
Figure 3.7: Skewness of delay distributions as a function of the buffer size for 13 nm technology. 
3.3 Ring Oscillator (RO) 
A ring oscillator is a type of test structure which is commonly used [80]-[81] for timing 
tests. It requires only one input start up signal (or no signal in case of self oscillating) and 
gives output in the form of frequency. This circuit can be used to assess the performance of 
buffers under the impact of RDF for a certain input signal and load conditions. A five stage 
ring oscillator is shown in Figure 3.8, where the inverters have been constructed of 
minimum sized square devices and the interconnect capacitance has been assumed to be 
negligible. 
[Figure 3.8 schematic; node labels: VDD, VIN, VOUT]
 
Figure 3.8: A five-stage ring oscillator circuit constructed of minimum sized devices. 
The netlists for the ring oscillator were generated with random selection of the 
devices from the model card libraries and HSPICE simulations were performed. The 
results show that the average delay of a five-stage ring oscillator for the 25 and 18 nm 
technology generations is 20.4 ps and 16.6 ps, which corresponds to a frequency of 
24.5 GHz and 30.1 GHz, respectively. However, due to RDF, the frequency has a 
spread with a standard deviation of 0.8 GHz and 1.67 GHz (corresponding to a five-stage 
delay variation of σ = 0.67 ps and σ = 0.925 ps) for the 25 and 18 nm technology 
generations, respectively. This shows that the uncertainty in the timing signals increases with 
technology scaling. 
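The quoted frequencies and spreads can be cross-checked from the five-stage delays with a short calculation. The sketch below assumes that one oscillation period equals twice the five-stage loop delay and that the frequency spread follows from first-order error propagation; both are assumptions made here because they reproduce the quoted figures, not statements of how the values were originally extracted.

```python
def ro_frequency_ghz(loop_delay_ps):
    """Oscillation frequency, assuming period = 2 x loop delay (1/ps = 1000 GHz)."""
    return 1e3 / (2.0 * loop_delay_ps)


def ro_frequency_sigma_ghz(loop_delay_ps, sigma_ps):
    """First-order propagation: sigma_f / f is approximately sigma_t / t."""
    return ro_frequency_ghz(loop_delay_ps) * sigma_ps / loop_delay_ps


f_25nm = ro_frequency_ghz(20.4)
f_18nm = ro_frequency_ghz(16.6)
sigma_f_25nm = ro_frequency_sigma_ghz(20.4, 0.67)
sigma_f_18nm = ro_frequency_sigma_ghz(16.6, 0.925)
```

With these assumptions the results land within rounding of 24.5 GHz and 30.1 GHz for the mean frequencies and 0.8 GHz and 1.67 GHz for the spreads.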
3.4 Tapered Buffer Drivers 
In CMOS integrated circuits, large capacitances are common in large fan-out circuits and/or 
in long range interconnects. Therefore, in order to source and sink a relatively large 
amount of current, a tapered buffer system is used to drive such circuitry, especially where 
the load is predominantly capacitive. For instance, in a clock distribution network, such 
drivers are used to power up the clock source signal. As in any element, device variability 
will introduce delay uncertainty in these drivers resulting in the introduction of skew in 
clock distribution networks and in on-chip communication networks, thus limiting the 
performance and yield. 
Such drivers are composed of a chain of cascaded inverters with increasing buffer sizes, as 
shown in Figure 3.9. The drivers are sized according to [82], with the total number of inverters 
in the system equal to $n$, such that the last inverter in the chain can drive the load 
connected at its output. For the optimal delay performance of tapered buffers, a logarithmic 
tapering factor ($\beta = e \approx 2.72$) has been proposed [83], though in practice this value is 
seldom used. 
 
Figure 3.9: Tapered buffer driver system. 
While using such buffers in the circuits, their delay performance under device variability 
needs to be known. Therefore, in this work we have investigated their delay performance 
when implemented in the given three technologies. A chain of five inverters (the first stage 
being of minimum size) has been used for this study and adjacent inverters in the driver 
chain are sized with a tapering factor β equal to 3. The delay performance of the drivers 
has been studied during low-to-high and high-to-low input transitions. 
 
Figure 3.10: Cumulative mean delay in tapered buffer drivers of the given three technology generations along 
with the delay uncertainty shown as error bars (corresponding to 1σ). 
 
The results show that as we proceed along the chain of inverters, the cumulative mean 
delay increases at each next stage, in a linear manner, as shown by the straight lines in 
Figure 3.10. However, the slope of these lines decreases with technology scaling, which 
means that tapered buffers can be constructed with a relatively smaller delay penalty in 
smaller technologies. Due to device variability, however, the inverters used in the tapered 
buffer drivers introduce delay uncertainty at each stage, which accumulates statistically and 
appears at the output of the driver. This delay variability increases in a non-linear 
fashion with the number of stages and is shown in the form of error bars in Figure 
3.10. It will have a detrimental effect on the design of high speed 
circuits. The tapered buffer drivers from all the given technology generations show the 
same response, and the maximum delay variability has been observed in the 13 nm drivers. 
[Figure 3.11 plot; axis label: Delay Variability (%)]
 
Figure 3.11: Delay variability introduced by different stages of the tapered buffer driver for low-to-high input 
transition. 
Since inverters of different sizes are used in the driver chain, the share of each stage 
towards delay variability cannot be the same. The results show that the earlier stages of the 
tapered buffer drivers contribute a major portion of the delay variability (as shown in 
Figure 3.11), because they are constructed with relatively smaller transistors. Again, it has 
also been found that the delay uncertainty introduced by each stage is different during 
low-to-high and high-to-low transitions, for the reason mentioned before. However, this 
difference reduces as we move along the chain towards larger sizes. This is shown in 
Figure 3.12, where the gap between the two solid lines gradually decreases with stage 
number and the lines almost coincide after the fifth stage. The difference developed 
in all the stages travels through the chain and accumulates accordingly, thus making a 
difference in the delay variability at the output of the $n$th stage, depending upon the type of 
the input transition. For instance, the difference in the delay variability for low-to-high and 
high-to-low input transitions at the output of the 3rd and 5th stages is about 9% and 5%, 
respectively, for 13 nm drivers. It is also observed that the maximum delay variability 
appears in the cumulative and stage delays for low-to-high input transitions, as shown in 
Figure 3.12. 
 
Figure 3.12: Cumulative and stage delay variability during low-to-high and high-to-low transitions for 13 nm 
tapered buffer driver. 
As previously stated, a tapering factor of $\beta = e$ is not always the 
best choice during circuit design, and so tapered buffers with different tapering factors are used. Therefore, we 
have extended the study on tapered buffers to see the effect of $\beta$ on their delay 
characteristics. The results are shown in Figure 3.13, where delay variability has been 
plotted for the tapered buffers having tapering factors of two, three and four for the given 
three technology generations. In all these cases, the tapered buffers are constructed so that their 
first stage is a minimum sized inverter (CD = 25, 18 and 13 nm for the technology 
generations of 25, 18 and 13 nm, respectively), with the number of stages equal to 6, 4 and 3 
for tapering factors of 2, 3 and 4, respectively. All these tapered buffers 
drive a load equivalent to a 64CD inverter in the corresponding technology. 
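The stage counts quoted above follow from growing the chain geometrically from a minimum sized first stage until the next stage would reach the size of the load. The Python sketch below encodes that stopping rule, which is an assumption made here because it reproduces the 6, 4 and 3 stages quoted for a 64CD load; the function name is hypothetical.

```python
def tapered_stage_sizes(beta, load_size):
    """Stage sizes in multiples of the minimum inverter: 1, beta, beta**2, ...

    Stages are added while they are still smaller than the load
    (assumed stopping rule).
    """
    sizes = []
    size = 1
    while size < load_size:
        sizes.append(size)
        size *= beta
    return sizes


chains = {beta: tapered_stage_sizes(beta, 64) for beta in (2, 3, 4)}
stage_counts = {beta: len(sizes) for beta, sizes in chains.items()}
```

For a tapering factor of 3 the last stage (27 times the minimum size) drives the 64CD load with a slightly smaller effective fan-out than the inner stages, which is a consequence of the load not being an exact power of 3.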
 
An interesting observation from the results is that delay variability increases 
with the tapering factor, and this effect becomes more prominent for smaller 
technology generations. This is to be expected since the majority of variability is 
introduced by the smaller inverters. As discussed before, the delay variability is different 
for low-to-high and high-to-low transitions for even properly T-sized devices in the 
inverters (for identical performance in both swings). However, it has been observed that 
this difference in performance also increases with the increase of the tapering factor and 
becomes worse for smaller technologies at larger tapering factors. 
 
Figure 3.13: Delay variability of tapered buffer drivers for different tapering factors during high-to-low and 
low-to-high input transitions. 
Larger tapering factors are sometimes attractive for power and area efficient designs. 
However, in the presence of device variability, the designers will have to make a trade-off 
between these parameters and the amount of tolerable delay variability (larger β means 
lesser power and area requirement as compared to smaller β, but greater delay variability). 
If $n$ is the stage number in the tapered buffer driver, then its size (relative to the minimum 
sized first stage) will be given by 
$$ W(n) = \beta^{\,n-1} \tag{3.7} $$
Due to random dopant fluctuations, the delay uncertainties introduced by the individual stages of the 
tapered buffer driver are independent of each other (independent RVs). Therefore, the delay 
uncertainty introduced by the $n$th stage can be approximated by 
$$ \sigma_d(n) \approx \frac{\sigma(1\,\mathrm{CD})}{\sqrt{W(n)}} \tag{3.8} $$
For an optimally sized chain of buffers (according to $\beta$) in the tapered buffer driver, the mean 
value of the delay at the $n$th stage is 
$$ t_{d,\mathrm{mean}}(n) \approx n \cdot t_{d,\mathrm{mean}}(1\,\mathrm{CD}) \tag{3.9} $$
By using equations (3.8) and (3.9), the delay variability at the $n$th stage of the tapered 
buffer driver can be approximated to first order as 
$$ V_d(n) \approx \frac{3\sqrt{\sigma_d^2(1) + \sigma_d^2(2) + \sigma_d^2(3) + \dots + \sigma_d^2(n)}}{t_{d,\mathrm{mean}}(n)} $$
$$ V_d(n) \approx \frac{3\sqrt{\sum_{i=1}^{n} \sigma_d^2(i)}}{t_{d,\mathrm{mean}}(n)} \tag{3.10} $$
The denominator of equation (3.10) increases linearly whereas the numerator increases as a 
square root with the increase of the number of stages in a tapered buffer driver. This means 
that the delay variability decreases with the increase of buffer stages; however, at the cost 
of a relatively slower driver. 
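Equations (3.7) to (3.10) can be combined into a small calculation of how the variability evolves along the chain and how much each stage contributes. The following Python sketch does this under assumed values for the minimum inverter (a 10 ps mean stage delay with a 1 ps standard deviation, both placeholders) and a tapering factor of 3; the function names are hypothetical.

```python
import math


def stage_sigma(sigma_min, beta, i):
    """Equations (3.7) and (3.8): sigma of stage i (1-based), size beta**(i-1)."""
    return sigma_min / math.sqrt(beta ** (i - 1))


def cumulative_variability(sigma_min, t_min, beta, n):
    """Equation (3.10): 3*sqrt(sum of stage variances) over n times the stage delay."""
    variance = sum(stage_sigma(sigma_min, beta, i) ** 2 for i in range(1, n + 1))
    return 3.0 * math.sqrt(variance) / (n * t_min)


def variance_share(beta, n):
    """Fraction of the accumulated variance contributed by each stage."""
    weights = [1.0 / beta ** (i - 1) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


sigma_min, t_min, beta = 1.0, 10.0, 3  # assumed values (ps, ps, unitless)
profile = [cumulative_variability(sigma_min, t_min, beta, n) for n in range(1, 6)]
shares = variance_share(beta, 5)
```

In this model the first, minimum sized stage accounts for roughly two thirds of the accumulated variance in a five-stage chain, in line with the observation that the early stages dominate the delay variability.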
3.5 Repeaters 
Owing to the technology scaling, the interconnect is becoming slower relative to the 
devices. Therefore, the use of repeaters is very common in long interconnects for reducing 
the dependence of the interconnect delay on length from quadratic to linear. Although the 
insertion of repeaters in the interconnect lines reduces the overall delay, it introduces delay 
uncertainty in the lines. In the individual interconnect lines, the effect of the delay 
uncertainty introduced by the repeaters is that the bandwidth will have to be reduced in 
order to obtain a particular yield. In clock distribution networks, the delay variation due to 
these repeaters can produce skew across various branches and will limit its performance. 
This delay variation is particularly unfavourable in wider communication channels because 
in synchronous links, the speed of the link is limited by the slowest line in the complete 
channel. Due to statistical variations in the devices, the cumulative delay at the receiving 
end of the communication channel will become a random variable. Moreover, the delay 
characteristics of the same communication channel on different chips produced in the same 
batch will not be the same but randomly distributed. Therefore, while designing such 
communication links, the delay characteristics of the repeaters should be known to explore 
different design options for better performance. 
In this study, we have quantified the amount of delay variability in a chain of repeaters of 
various sizes. Figure 3.14 shows the results for the repeaters constructed with minimum 
sized inverters (MSI). It is evident that the mean cumulative delay increases linearly with 
the number of repeater stages in the chain. The dispersion (standard 
deviation) of the delay also increases, but only as the square root of the number of repeater stages. This is 
because the statistical variations in each repeater stage are independent of each other and can 
be additive or subtractive towards the cumulative delay. The delay variability, on the other 
hand, decreases with the number of repeater stages, but at the expense of reduced speed of 
the repeater line. 
 
Figure 3.14: Delay variability in a chain of minimum sized repeaters of 13 nm plotted against the number of 
repeater stages. 
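If $t_{d,\mathrm{sec}}$ is the mean delay and $\sigma_{d,\mathrm{sec}}$ is the standard deviation in the delay for every section of 
repeated interconnect line, then the cumulative mean delay at the $n$th stage will be 
$$ t_{d,\mathrm{cum}}(n) \approx n\, t_{d,\mathrm{sec}} \tag{3.11} $$
and the standard deviation for cumulative delay at this stage will be 
$$ \sigma_{d,\mathrm{cum}}(n) \approx \sqrt{n}\, \sigma_{d,\mathrm{sec}} \tag{3.12} $$
Therefore, the normalized delay variability at the $n$th stage will be 
$$ V_{d,\mathrm{cum}}(n) = \frac{3\sigma_{d,\mathrm{cum}}(n)}{t_{d,\mathrm{cum}}(n)} \approx \frac{3\sqrt{n}\,\sigma_{d,\mathrm{sec}}}{n\, t_{d,\mathrm{sec}}} = \frac{3\sigma_{d,\mathrm{sec}}}{\sqrt{n}\, t_{d,\mathrm{sec}}} \tag{3.13} $$
 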
Since the magnitude of the delay uncertainty introduced by buffers (inverters) depends on 
their size, a repeater line having large sized repeaters will have less delay variability as 
compared to one constructed with small repeaters driving a particular interconnect load. 
The simulation results shown in Figure 3.15 confirm this. Here the cumulative delay 
uncertainty has been plotted as a function of repeater size for a chain of 20 repeaters. The 
results demonstrate that a repeated interconnect with large repeaters offers less delay 
uncertainty than a similar chain constructed with small repeaters. However, a 
trade-off will have to be made for getting this advantage, as large sized repeaters consume 
more power and chip area. 
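The scaling expressed by equations (3.11) to (3.13) can be checked with a small Monte Carlo experiment in which each repeater section contributes an independent random delay. The sketch below assumes Gaussian section delays with a 10 ps mean and a 1 ps standard deviation; these values and the Gaussian shape are illustrative assumptions, since the simulated distributions in this chapter are skewed.

```python
import random
import statistics


def simulate_chain(n_stages, t_sec, sigma_sec, n_trials=5000, seed=0):
    """Return (mean, sigma, 3*sigma/mean) of the cumulative delay of a chain."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(t_sec, sigma_sec) for _ in range(n_stages))
        for _ in range(n_trials)
    ]
    mean = statistics.fmean(totals)
    sigma = statistics.pstdev(totals)
    return mean, sigma, 3.0 * sigma / mean


mean_4, sigma_4, var_4 = simulate_chain(4, t_sec=10.0, sigma_sec=1.0)
mean_16, sigma_16, var_16 = simulate_chain(16, t_sec=10.0, sigma_sec=1.0)
```

Going from 4 to 16 sections, the mean should grow by about a factor of four, the standard deviation by about a factor of two, and the normalized variability should roughly halve, up to sampling noise.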
 
Figure 3.15: Cumulative delay variability plotted as a function of repeater size in a chain of 20 repeaters. 
3.6 Data Storage Elements (Flip-flops) 
A common technique to enhance the throughput in synchronous digital circuits is the use 
of Flip-Flops (FFs) to implement pipelined designs. Similarly, flip-flops are also used for 
the storage of different digital signals on the chip, for instance as the last stage of a 
communication channel. Thus clocked storage elements are essential for a digital circuit. 
As a result of this tendency, the number of flip-flops on a chip is growing and therefore 
FFs represent a significant area of the chip. 
 
The performance of the circuits incorporating flip-flops as storage elements depends, to a 
great extent, on the timing characteristics of the flip-flops. However, as before, these 
become random variates due to variability. Consequently, the performance of the whole 
circuit is affected by this variability. Hence it becomes imperative to estimate the timing 
characteristics of the flip-flops under the impact of statistical variations in order to design 
high-performance circuits with high yield. 
 
Figure 3.16: Schematic view of a standard CMOS D flip-flop circuit [84]-[85]. 
 
Figure 3.17: Basic timing parameters of a flip-flop. 
In this work, the effect of device variability due to RDF on the timing characteristics of a 
standard CMOS D-flip-flop (DFF) [84]-[85], as shown in Figure 3.16, has been studied. 
Flip-flops are typically characterized by different timing parameters which are pictorially 
represented in Figure 3.17. Since accurate analytical modelling of flip-flops with statistical 
variations in the devices is difficult, transient analysis of the timing parameters of the FFs 
has been performed through HSPICE simulations for accurate results. Although flip-flops 
of various sizes are available in the standard cell libraries used for modern designs, we 
chose to construct them with the minimum size (i.e. minimum transistor dimensions). 
3.6.1 Timing Measurement Procedure 
The procedure adopted for the measurement of different timing parameters of the flip-flops 
is given below: 
3.6.1.1 Setup time 
The minimum data-to-clock rising edge time for which the flip-flop correctly latches the 
data is the setup time. In order to find the setup time for a large sample of flip-flops under 
RDF, a rough estimate of it is made first. For this purpose the flip-flop circuit is 
constructed using uniform devices (devices with a uniform dopant distribution, which are available in 
the device models). The clock pulse width is made sufficiently large and the data is also 
kept stable for a sufficiently long time after the arrival of the clock signal. The data is made 
available at the data input D of the flip-flop well before the arrival of the clock edge, 
so that the flip-flop safely latches the data at output Q. In the next step, the data at input port 
D is made available slightly later than in the previous case and latching of the data at the 
output of the flip-flop is monitored. The process is repeated until the flip-flop is just able 
to hold the data. At this point, the time difference between the arrival of the data and the 
clock signal is the setup time for the uniform devices. This value gives a reference point, 
and the setup times of a large number of devices under RDF are expected to lie around this 
value. 
Now 5500 netlists of the flip-flop circuits were generated with random selection of devices 
from the model card libraries. For each of these netlists, several new netlists were 
generated by gradually delaying the arrival time of the data (with an increment of 0.2ps), 
starting from a large value with reference to the setup time we have already measured for 
the uniform devices. HSPICE simulations were carried out and setup time was measured 
for each of the flip-flops. 
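The sweep described above can be summarised in a short Python sketch. The transient simulation is replaced here by a hypothetical predicate `latches_correctly`, which stands in for running one netlist and checking output Q; the toy flip-flop model and the population parameters are assumptions for illustration only.

```python
import random


def measure_setup_time(latches_correctly, start_ps, step_ps=0.2, max_steps=1000):
    """Sweep the D-to-CLK time downwards from a safe value in fixed steps.

    Returns the smallest tested D-to-CLK time that still latches correctly,
    or None if even the starting point fails.
    """
    last_good = None
    for k in range(max_steps):
        d_to_clk = start_ps - k * step_ps
        if not latches_correctly(d_to_clk):
            break
        last_good = d_to_clk
    return last_good


def toy_flip_flop(true_setup_ps):
    """Hypothetical stand-in for a simulation: latches iff data is early enough."""
    return lambda d_to_clk_ps: d_to_clk_ps >= true_setup_ps


# One instance with an assumed true setup time of 9.5 ps
single = measure_setup_time(toy_flip_flop(9.5), start_ps=20.0)

# A small population of assumed instances, measured one by one
rng = random.Random(0)
true_values = [rng.gauss(9.5, 1.25) for _ in range(100)]
measured = [measure_setup_time(toy_flip_flop(t), start_ps=20.0) for t in true_values]
```

Each measured value overestimates the underlying threshold by less than one step, which is the sense in which the 0.2 ps increment sets the accuracy of the measurement.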
3.6.1.2 Hold Time 
The hold times were measured in a similar way to the setup time. During the 
measurements, the setup time and the clock pulse width were made sufficiently large to 
avoid setup time and other timing violations. The time for which the data remains stable 
after the clock pulse was gradually reduced (starting from a long time) and the hold time 
was measured as the minimum time between the rising edge of the clock and the falling 
edge of the data for which the data at output Q is correctly registered. Again, the hold times 
were measured with an accuracy of 0.2 ps for the flip-flop circuits used for the setup time 
measurement. 
3.6.1.3 CLK-to-Q time 
The CLK-to-Q time is measured as the time delay between the rising edge of the clock and 
the output Q. Since the CLK-to-Q time depends on the arrival time of the data prior to the 
clock edge (D-to-CLK time), as shown in Figure 3.18, it has been measured in this study 
for a large value of the D-to-CLK time. Similarly, the hold time and 
clock pulse width were also made quite large to avoid any timing violations due to 
these parameters. These measurements were made for 5500 flip-flops constructed 
with random selection of devices, and the CLK-to-Q time was measured with an accuracy of 0.2 ps. 
[Figure 3.18 plot; axis label: D-to-CLK Time (ps)]
 
Figure 3.18: Dependence of CLK-to-Q delay on the D-to-CLK time. 
3.6.1.4 Minimum Clock Pulse Width 
Again for these measurements, the setup time and hold times were made sufficiently large. 
The clock pulse width was gradually reduced to measure minimum clock pulse width for 
which the flip-flop can hold data, similar to the technique used for setup time 
measurement. 
 
3.6.2 Results and Discussion 
From the Monte Carlo simulations, different timing parameters of the flip-flops have been 
characterized and are given in Table 3.1 in terms of the first four moments of their 
distribution. The results show that the timing parameters of the flip-flops are very sensitive 
to statistical variation in the devices. It has been observed that, while the mean decreases, 
the dispersion of these timing parameters increases with technology scaling. The 
increase in the standard deviation quantifies this dispersion and calls for careful 
timing variability analysis during the design of synchronous systems. For 
instance, for the 13 nm technology generation, the variability (σ/µ) in the setup time increases 
up to 13%. Similarly, the variation in the hold time, the clock-to-Q time and the minimum 
pulse width requirement reaches up to 15%, 19% and 22%, respectively. Due to the 
variability in the timing parameters of the flip-flops, extra safety margins will have to be 
assigned, thus slowing the pipeline. Although the hold time is negative for the Master-Slave 
flip-flops used, its spread also increases, which suggests transparent latches will be 
affected by this increase. 
Table 3.1: Statistical Analysis of the Timing Parameters of a Standard Flip-flop 
Technology  Statistical attribute       Setup Time (ps)  Hold Time (ps)  CLK-Q Time (ps)  D-Q Time (ps)  Min. PW (ps)
25 nm       Mean, µ (ps)                17.5             -12.7           13.9             43.7           12.8
            Standard deviation, σ (ps)  0.78             0.72            0.88             4.84           1.10
            Skewness                    0.33             -0.32           0.25             1.69           0.09
            Kurtosis                    3.46             3.08            3.29             7.32           2.99
18 nm       Mean, µ (ps)                14.5             -10.2           11.1             36.2           10.4
            Standard deviation, σ (ps)  1.06             0.91            1.09             4.67           1.37
            Skewness                    0.53             -0.44           0.36             1.74           -0.385
            Kurtosis                    3.81             3.67            3.28             7.87           6.97
13 nm       Mean, µ (ps)                9.5              -6.38           6.9              23.9           7.54
            Standard deviation, σ (ps)  1.25             0.98            1.29             3.94           1.49
            Skewness                    0.94             -0.88           0.88             1.76           -0.17
            Kurtosis                    4.46             4.48            4.77             9.40           5.82
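As a quick check of how the relative variability grows with scaling, the ratio σ/|µ| can be evaluated directly from the means and standard deviations of Table 3.1. The short sketch below does this for the setup, hold and CLK-Q times; the dictionary layout and function name are our own choices for illustration.

```python
# Mean and standard deviation (ps) copied from Table 3.1.
table_3_1 = {
    "25 nm": {"setup": (17.5, 0.78), "hold": (-12.7, 0.72), "clk_q": (13.9, 0.88)},
    "18 nm": {"setup": (14.5, 1.06), "hold": (-10.2, 0.91), "clk_q": (11.1, 1.09)},
    "13 nm": {"setup": (9.5, 1.25), "hold": (-6.38, 0.98), "clk_q": (6.9, 1.29)},
}


def variability_percent(mean_ps, sigma_ps):
    """Relative variability sigma/|mu| expressed as a percentage."""
    return 100.0 * sigma_ps / abs(mean_ps)


cv = {
    node: {name: variability_percent(mu, sd) for name, (mu, sd) in params.items()}
    for node, params in table_3_1.items()
}

for node, params in cv.items():
    print(node, {name: round(value, 1) for name, value in params.items()})
```

For each of the three parameters the ratio rises monotonically from 25 nm to 13 nm, even though the absolute delays shrink.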
From Figure 3.19, we can see that the setup time, hold time and CLK-to-Q time span a large 
timing space (there appears to be no visible correlation between the parameters) and design 
margins will have to be chosen while keeping in view this space in order to achieve a 
particular yield. 
 
Figure 3.19: 3D-space occupied by the timing parameters of the DFF. 
Circuits become increasingly faster with technology scaling, demanding a drastic reduction 
in the tolerances allowed to their clocks. However, the magnitude of the timing variability 
we have observed in the flip-flop circuits will certainly tend to reduce the performance of 
the circuits, unless corrective measures are taken. 
3.7 Interconnect  
The interconnect also exhibits variation in its characteristics due to structural variation 
in the lateral and vertical dimensions. Besides material variations, the structural variation 
in the interconnect can appear in the conductor thickness, the width, and the interlayer 
dielectric thickness. It is important to mention that interconnect spacing is not an 
independent parameter and is automatically affected by variation in the interconnect 
width. In addition, there are other sources of interconnect variability, such as surface and 
edge roughness or sidewall thickness, but all of these geometrical variations result in a 
deviation of the electrical properties of the interconnect, such as the resistance, the 
capacitance and the inductance. Consequently, this results in delay variability of the 
interconnects. 
3.8 Performance of Communication Links 
The variability in the delay characteristics of the individual communication circuits 
(discussed above) will cause uncertainty in the signal delay through the complete channel. 
If the delay of a signal is larger than the nominal value plus the design margin, a link 
failure occurs. To get the best performance from the design, we need to 
quantify this effect and allow for the expected variation in the design margins. It is 
important that these margins are neither pessimistic (which wastes resources) nor optimistic 
(which affects yield). Whatever these margins are, under delay variability 
the throughput of the channel will have to be compromised (compared to the 
deterministic case) in order to keep the probability of link failure below an 
acceptable limit. Conversely, additional resources (area and power) will be required to 
attain the same bandwidth. It is clear then that device variability will contribute 
significantly to the performance/area/power trade-off of clock distribution 
networks and data links, which are basically composed of these structures. 
3.8.1 Estimation of Link Performance 
A simple communication link is shown in Figure 3.20, which consists of tapered buffer 
drivers, interconnect wires, repeaters and data storage elements. The output of the 
combinational logic is powered up using a tapered buffer driver before it is transmitted 
through the link. The repeaters are used to improve the delay characteristics, especially in 
predominantly resistive interconnects. Similarly, flip-flops or latches are used to hold the 
data at the receiving end. The link operating frequency depends upon the cumulative delay 
introduced by each of these elements plus the setup time of the flip-flop. The nominal 
delay of such a data link from input to output can be calculated using the following 
equation 
$$T_{\mathrm{link}} = T_{\mathrm{buff}} + T_{\mathrm{int}} + T_{\mathrm{CLK\text{-}Q}} \tag{3.14}$$
In the above expression, $T_{\mathrm{link}}$ is the total delay of the link, $T_{\mathrm{buff}}$ is the delay of the 
tapered buffer driver, $T_{\mathrm{int}}$ is the repeater-inserted interconnect delay and 
$T_{\mathrm{CLK\text{-}Q}}$ is the CLK-Q time of the flip-flop. 
 
 
Figure 3.20: A simple data communication link. The signal coming out from the combinational logic is 
powered up through tapered buffer driver and then it passes through the repeater inserted interconnect to 
reach the input of the flip-flop. 
Let $k$ be the number of repeaters (each of size $h$ times the size of the minimum-sized 
repeater). In a particular technology, if the output impedance of a minimum-sized inverter 
is $R_{\mathrm{drv}}$ and its output capacitance is $C_{\mathrm{drv}}$, then the output impedance of a repeater of size $h$ 
becomes $R_{\mathrm{drv}}/h$ and the output capacitance $h \cdot C_{\mathrm{drv}}$. In Figure 3.20, the interconnect symbol 
represents a capacitively coupled interconnect. If we assume that $R$ is the interconnect 
resistance, $C_c$ is the coupling capacitance with the neighbouring interconnects and $C_s$ is the 
self capacitance of the interconnect, the propagation delay of one section of the repeated 
interconnect [86], which is taken to be the time difference of the input and output 
waveforms at 50% of the transition points, is given by 
$$T_{\mathrm{sec}} = 0.7\,R_{\mathrm{drv}}\left(C_s + C_{\mathrm{drv}} + 2.2 \cdot 2C_c\right) + R\left(0.4\,C_s + 0.58\,C_c + 0.7\,C_{\mathrm{drv}}\right) \tag{3.15}$$
 
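The total delay of the interconnect inserted with repeaters is given by 
$$T_{\mathrm{int}} = k\left[0.7\,\frac{R_{\mathrm{drv}}}{h}\left(\frac{C_s}{k} + h\,C_{\mathrm{drv}} + 2.2\,\frac{2C_c}{k}\right) + \frac{R}{k}\left(0.4\,\frac{C_s}{k} + 0.58\,\frac{C_c}{k} + 0.7\,h\,C_{\mathrm{drv}}\right)\right] \tag{3.16}$$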
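The relation between the single-section delay and the total repeated-wire delay can be sketched in a few lines: the total delay is $k$ times the section delay evaluated with the driver scaled by $h$ and the wire parasitics divided by $k$. The function names and the numerical parasitics below are assumptions for illustration only and are not characterization data from this work.

```python
def section_delay(r_drv, c_drv, r_wire, c_self, c_couple):
    """Eq. (3.15): 50% delay of one wire section and its driver."""
    return (0.7 * r_drv * (c_self + c_drv + 2.2 * 2 * c_couple)
            + r_wire * (0.4 * c_self + 0.58 * c_couple + 0.7 * c_drv))


def repeated_wire_delay(k, h, r_drv, c_drv, r_wire, c_self, c_couple):
    """Eq. (3.16): total delay with k repeaters, each h times minimum size."""
    per_section = section_delay(r_drv / h, h * c_drv,
                                r_wire / k, c_self / k, c_couple / k)
    return k * per_section


# Assumed example parasitics (ohms and farads), for illustration only.
params = dict(r_drv=10e3, c_drv=0.1e-15, r_wire=500.0,
              c_self=50e-15, c_couple=40e-15)

for k in (1, 5, 10, 20):
    delay_ps = repeated_wire_delay(k, 5, **params) * 1e12
    print(k, round(delay_ps, 2))
```

Sweeping $k$ and $h$ in this way is the usual route to locating the repeater count and size that minimise the delay for a given wire.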
 
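Under the assumption of statistical independence, the time delay in the link can be 
calculated from its components' distributions as follows 
$$T_{\mathrm{link}} \approx \mu_{\mathrm{buff}} + \mu_{\mathrm{int}} + \mu_{\mathrm{CLK\text{-}Q}} + \sqrt{\sigma^2_{\mathrm{buff}} + \sigma^2_{\mathrm{int}} + \sigma^2_{\mathrm{CLK\text{-}Q}}} \tag{3.17}$$
This equation consists of two parts, the mean value and the standard deviation of the delay 
distribution. The standard deviation has been added to the mean delay in order to estimate 
the maximum delay ($3\sigma$ or $6\sigma$ can also be used to estimate the worst cases of delay). The two 
parts of equation (3.17) can be denoted as 
$$\mu_{\mathrm{link}} = \mu_{\mathrm{buff}} + \mu_{\mathrm{int}} + \mu_{\mathrm{CLK\text{-}Q}} \tag{3.18}$$
$$\sigma_{\mathrm{link}} = \sqrt{\sigma^2_{\mathrm{buff}} + \sigma^2_{\mathrm{int}} + \sigma^2_{\mathrm{CLK\text{-}Q}}} \tag{3.19}$$
Similarly, $\mu_{\mathrm{int}}$ and $\sigma^2_{\mathrm{int}}$ are given by 
$$\mu_{\mathrm{int}} = \mu_{\mathrm{sec},1} + \mu_{\mathrm{sec},2} + \mu_{\mathrm{sec},3} + \cdots + \mu_{\mathrm{sec},n} \tag{3.20}$$
and 
$$\sigma^2_{\mathrm{int}} = \sigma^2_{\mathrm{sec},1} + \sigma^2_{\mathrm{sec},2} + \sigma^2_{\mathrm{sec},3} + \cdots + \sigma^2_{\mathrm{sec},n} \tag{3.21}$$
where $\mu_{\mathrm{sec},i}$ and $\sigma^2_{\mathrm{sec},i}$ represent the mean and variance of the delay of the $i$-th section of the 
repeater-inserted interconnect. 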
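As a worked instance of equations (3.17)-(3.19), the sketch below combines the component statistics that are listed later in Table 3.2 for the 18 nm case study. The function names and the `n_sigma` argument are our own; the independence of the components is the same assumption made in the text.

```python
import math

# (mean, standard deviation) in ps, taken from Table 3.2.
buffer_delay = (20.95, 1.11)
wire_delay = (152.9, 2.12)
clk_q_delay = (11.12, 1.09)


def link_delay_stats(*components):
    """Eqs. (3.18)-(3.19): means add, and variances add under independence."""
    mean = sum(mu for mu, _ in components)
    sigma = math.sqrt(sum(sd ** 2 for _, sd in components))
    return mean, sigma


mu_link, sigma_link = link_delay_stats(buffer_delay, wire_delay, clk_q_delay)


def link_delay_estimate(n_sigma=1):
    """Eq. (3.17) with an n-sigma allowance (n_sigma = 1 as written)."""
    return mu_link + n_sigma * sigma_link


print(round(mu_link, 2), round(sigma_link, 2), round(link_delay_estimate(3), 2))
```

Because the variances add rather than the standard deviations, the combined spread is noticeably smaller than the sum of the individual standard deviations.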
Now if we have a complete description of the delay characteristics of the individual 
communication structures under the impact of device and/or interconnect variability due to 
any of their parameters, we can approximate the overall performance of the complete link 
using equations (3.17)-(3.21). The results can also be used to estimate the probability of the 
link failure due to variability, as explained below.  
3.8.2 Link Failure Probability 
Let us assume that the link is operating at a clock frequency $f$ with clock period $T_{\mathrm{CLK}}$. 
For the flip-flop (having setup time $t_{\mathrm{setup}}$) to correctly latch the data, the delay of the 
interconnect must satisfy the following constraint 
$$T_{\mathrm{CLK}} - t_{\mathrm{setup}} \ge T_{\mathrm{buff}} + T_{\mathrm{int}}$$
Therefore, the probability that correct data is transmitted between the input and output is 
given by 
$$P_c = P\left(T_{\mathrm{CLK}} - t_{\mathrm{setup}} \ge T_{\mathrm{buff}} + T_{\mathrm{int}}\right) \tag{3.22}$$
A design margin, denoted $\Delta$, is also used to cater for the delay variation due to different 
circuit parameters. Therefore, expression (3.22) can be written as 
$$P_c = P\left(T_{\mathrm{CLK}} - \Delta - t_{\mathrm{setup}} \ge T_{\mathrm{buff}} + T_{\mathrm{int}}\right) \tag{3.23}$$
 
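We define the time delay between the input of the tapered buffer driver and the input D of 
the receiving flip-flop to be $T_{\mathrm{Dlink}}$. Then the probability that the delay of the link will be 
greater than $T_{\mathrm{CLK}} - \Delta - t_{\mathrm{setup}}$ is given by 
$$P\{T_{\mathrm{Dlink}} > T_{\mathrm{CLK}} - \Delta - t_{\mathrm{setup}}\} = 1 - P\{T_{\mathrm{Dlink}} < T_{\mathrm{CLK}} - \Delta - t_{\mathrm{setup}}\}$$
$$= 1 - \Phi\left(\frac{T_{\mathrm{CLK}} - \Delta - \mu_{\mathrm{setup}} - \mu_{\mathrm{Dlink}}}{\sqrt{\sigma^2_{\mathrm{Dlink}} + \sigma^2_{\mathrm{setup}}}}\right) \tag{3.24}$$
$$= Q\left(\frac{T_{\mathrm{CLK}} - \Delta - \mu_{\mathrm{setup}} - \mu_{\mathrm{Dlink}}}{\sqrt{\sigma^2_{\mathrm{Dlink}} + \sigma^2_{\mathrm{setup}}}}\right) \tag{3.25}$$
In the above expressions, 
$$\mu_{\mathrm{Dlink}} = \mu_{\mathrm{buff}} + \mu_{\mathrm{int}}$$
and 
$$\sigma^2_{\mathrm{Dlink}} = \sigma^2_{\mathrm{buff}} + \sigma^2_{\mathrm{int}}$$
If we assume that the delay variability in the clock signal is $\sigma_{\mathrm{CLK}}$ with some mean $\mu_{\mathrm{CLK}}$, 
Eq. (3.25) will become 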
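$$P\{T_{\mathrm{Dlink}} > T_{\mathrm{CLK}} - \Delta - t_{\mathrm{setup}}\} = Q\left(\frac{\mu_{\mathrm{CLK}} - \Delta - \mu_{\mathrm{setup}} - \mu_{\mathrm{Dlink}}}{\sqrt{\sigma^2_{\mathrm{Dlink}} + \sigma^2_{\mathrm{setup}} + \sigma^2_{\mathrm{CLK}}}}\right) \tag{3.26}$$
Again, in expressions (3.24)-(3.26), $\Phi$ is the cumulative distribution function of the 
(normalised) link delay and $Q$ is the classical Gaussian error (tail) function, $Q(x) = 1 - \Phi(x)$. 
Börjesson's approximation [87], as given below, can be used to evaluate $Q$. 
$$Q(x) \approx \frac{1}{(1-a)\,x + a\sqrt{x^2 + b}} \; \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \qquad \text{for } x \ge 0$$
with $a = 1/\pi$ and $b = 2\pi$. 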
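To give a feel for the quality of this approximation, the sketch below compares it with the Gaussian tail probability computed from the complementary error function of the Python standard library. The function names and the evaluation grid are our own choices.

```python
import math


def q_exact(x):
    """Gaussian tail probability Q(x) = 1 - Phi(x), via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_borjesson(x, a=1.0 / math.pi, b=2.0 * math.pi):
    """Closed-form approximation quoted in the text, stated for x >= 0."""
    if x < 0:
        raise ValueError("approximation is stated for x >= 0 only")
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return density / ((1.0 - a) * x + a * math.sqrt(x * x + b))


grid = [i * 0.1 for i in range(0, 81)]          # x from 0 to 8
worst = max(abs(q_borjesson(x) / q_exact(x) - 1.0) for x in grid)
print("largest relative error on the grid:", worst)
```

With these constants the approximation is exact at $x = 0$ and stays within about two percent of the exact value over the grid used here, which is adequate for locating the frequency at which the failure probability starts to rise.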
3.8.3 Case Study 
Consider a typical interconnect of length 500 µm, width 0.675 µm, and thickness 0.324 µm in 18 nm 
technology. The delay characteristics of this interconnect inserted with 10 
repeaters of size 5× are given in Table 3.2 along with the performance characteristics 
of the tapered buffer driver and flip-flop used in the complete link. 
Table 3.2: Statistical Delay Characteristics of Different Elements of the Link. These values have been taken 
from the characterization data of different elements. 
                       Tapered Buffer  Repeater Inserted Interconnect  Flip-Flop Setup Time  Flip-Flop CLK-Q Time
Mean, µ                20.95 ps        152.9 ps                        14.52 ps              11.12 ps
Standard Deviation, σ  1.11 ps         2.12 ps                         1.05 ps               1.09 ps
For a design margin $\Delta = 5$ ps and an uncertainty in the clock period $\sigma_{\mathrm{CLK}} = 2$ ps, the 
probability of the link failure has been plotted as a function of the operating frequency and 
is shown in Figure 3.21. Both curves, one obtained using equation (3.26) and the other 
through Monte Carlo simulation of the complete channel, are shown for comparison. It has 
been observed that beyond a certain operating frequency, the link failure probability starts 
increasing from zero (observe the slope due to spread in the PDF). These particular curves 
correspond to delay variability due to only RDF in the devices. However in the real case, 
there are other sources of variability in the devices as well as in the interconnect, and 
therefore overall delay variability in the link will be even larger. Thus for a particular link 
failure probability, the operating frequency will have to be reduced, otherwise the yield 
will be reduced for high speed links under tight design margins. 
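The analytical curve of this case study can be reproduced approximately with a few lines of code. The sketch below evaluates equation (3.26) with the values of Table 3.2, a 5 ps margin and a 2 ps clock uncertainty; it assumes that the mean clock period is simply the reciprocal of the operating frequency, and it uses the exact Gaussian tail rather than the closed-form approximation.

```python
import math

# Mean and standard deviation (ps) from Table 3.2.
MU_BUFF, SD_BUFF = 20.95, 1.11
MU_INT, SD_INT = 152.9, 2.12
MU_SETUP, SD_SETUP = 14.52, 1.05
MARGIN_PS = 5.0     # design margin, Delta
SD_CLK = 2.0        # clock period uncertainty, sigma_CLK


def link_failure_probability(freq_ghz):
    """Eq. (3.26), with the mean clock period taken as 1/f (assumption)."""
    mu_clk = 1000.0 / freq_ghz                       # period in ps
    mu_dlink = MU_BUFF + MU_INT
    spread = math.sqrt(SD_BUFF**2 + SD_INT**2 + SD_SETUP**2 + SD_CLK**2)
    z = (mu_clk - MARGIN_PS - MU_SETUP - mu_dlink) / spread
    return 0.5 * math.erfc(z / math.sqrt(2.0))       # Q(z)


for f in (4.4, 4.6, 4.8, 5.0, 5.2):
    print(f, link_failure_probability(f))
```

The steep rise of the failure probability over a span of a few hundred MHz reflects the small combined standard deviation relative to the clock period.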
 
Figure 3.21: Link failure probability as a function of link operating frequency, as calculated using the 
analytical model and Monte Carlo simulation. 
It may also be noted in Figure 3.21 that the results of the analytical model deviate slightly 
from the simulation results, especially at the beginning of the curves. This is because 
the probability distribution functions of the delay of different communication 
structures, for smaller devices, deviate from the normal distribution (as explained before). 
Therefore, the cumulative delay distribution of the complete channel may also be non-
normal (skewed). Hence, in order to obtain accurate results, the delay distributions of all 
the communication structures should be accurately characterised and corresponding 
statistical operators may be used to obtain the cumulative delay distribution.  
3.9 Summary 
In this chapter, we have critically examined the effect of device variability due to RDF on 
the performance of the basic elements of on-chip communication, such as tapered buffer 
drivers with different tapering factor, repeaters of different sizes, and data storage registers 
(FFs). FO4 delay measurements have also been taken, as representative of the logic 
circuitry and results can be used as a performance benchmark. The study revealed that 
RDF has significant impact on the performance of communication structures and their 
performance deteriorates very significantly with technology scaling from 25 to 13 nm. As a 
design methodology, scaling up of circuits in the critical paths can be employed to 
minimize the effects of device variability, in particular, since we have shown that this 
trade-off is not linear and a small increase in the repeater size can give substantial benefit 
towards the performance. For instance, we have corroborated that large sized repeaters can 
be used in the interconnect to reduce delay variability; however, the power and area 
penalties due to this passive technique of circuit scaling should be compared with active 
countermeasure techniques which can be used to mitigate the delay variability. 
Although NoC is more robust against on-chip communication faults than simpler designs, 
we note that such occurrences have increased hyper-linearly (and will continue to do so) 
due to device variability. In order to evaluate the performance of a typical NoC link, we 
have derived analytical models to predict link failure probability (LFP) using the 
characterization data of the individual on-chip communication elements. The results show 
that link failure probability increases significantly with the increase of device variability 
and is a limiting factor in the maximum operating frequency of a synchronous link.  
 
Chapter 4                                                            SSTA of Pipelined Communication Circuits 
 
Chapter 4 
 
SSTA of Pipelined Communication 
Circuits 
 
 
The performance of circuits under variability can be evaluated accurately through 
simulation (as has been done so far in this thesis). However, for large designs this 
method is not feasible because it is computationally expensive. The solution to this problem is 
the use of Statistical Static Timing Analysis (SSTA) which is a powerful analysis tool and 
provides a convenient means of estimating the circuit performance under the impact of 
variability. In this chapter we describe the use of SSTA to examine the performance of 
large on-chip communication networks, formed by the components that have been 
analysed and characterized so far (FFs, Buffers and Tapered Buffers). 
4.1 Introduction to STA 
During the design of digital circuits, it is always necessary to ensure that timing 
constraints are met. This requires finding the maximum delay between the inputs and 
outputs along different paths. In traditional design, this analysis is used to identify (and 
subsequently optimize) a critical path in the circuit. The delay along this path determines 
the maximum operating frequency. Figure 4.1 shows a simple circuit consisting of seven 
combinational blocks between two flip-flops. The critical path for such a circuit can be 
determined using Static Timing Analysis (STA), in which individual circuit elements are 
pre-characterized through simulation and then delays (corresponding to the worst-case) are 
added up along different paths from input to output. The latest arrival time of the signals 
along different paths for which the data is correctly received at the output is calculated and 
is then compared with the required timing. The difference between these two values is the 
signal slack. For the example of Figure 4.1, the latest arrival time of the signal is 4.4. If the 
slack is negative, the circuit will not meet the performance requirements. The path with the 
minimum slack in a circuit is the critical path. 
 
Figure 4.1: Demonstration of static timing analysis of a simple circuit. 
 
Figure 4.2: An example of the timing graph for delay traversal from source to sink. 
A timing graph is very useful for the timing analysis of the circuits and describes the 
timings of the combinational logic between the source and the sink along different paths. It 
is a Directed Acyclic Graph (DAG), as shown in Figure 4.2. In the timing graph, the signal 
lines are denoted as nodes and the input-output transformation through every gate in the circuit 
is shown as an edge. The delay associated with every input-output pair is represented as the 
weight of the corresponding edge. For STA, the weight of every edge usually 
corresponds to the worst-case delay. 
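The traversal of such a timing graph can be sketched as a longest-path computation in topological order. The graph, its edge delays and the required time below are hypothetical and do not correspond to Figure 4.1 or Figure 4.2; the sketch relies on `graphlib`, which is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter


def latest_arrival_times(edges):
    """Longest-path arrival time at every node of a timing DAG.

    edges maps (from_node, to_node) -> worst-case delay of that timing arc.
    """
    predecessors = {}
    for (u, v), delay in edges.items():
        predecessors.setdefault(v, []).append((u, delay))
        predecessors.setdefault(u, [])
    order = TopologicalSorter(
        {node: [u for u, _ in fan_in] for node, fan_in in predecessors.items()}
    )
    arrival = {}
    for node in order.static_order():           # predecessors come first
        fan_in = [arrival[u] + d for u, d in predecessors[node]]
        arrival[node] = max(fan_in, default=0.0)  # nodes without fan-in start at 0
    return arrival


# Hypothetical timing graph with worst-case edge delays.
edges = {("src", "a"): 1.0, ("src", "b"): 1.5, ("a", "c"): 1.2,
         ("b", "c"): 0.8, ("c", "sink"): 2.0, ("a", "sink"): 2.5}

arrival = latest_arrival_times(edges)
required_time = 5.0                              # assumed timing requirement
slack = required_time - arrival["sink"]
print(arrival["sink"], slack)
```

A negative value of `slack` would indicate that the assumed requirement is violated along the longest path.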
4.2 Introduction to SSTA 
In traditional circuit design, corner based approaches are used alongside STA in which the 
best-case or worst-case corner values are identified corresponding to different sources of 
variability. Thus for die-to-die variations, it is then assumed that 3σ deviation of circuit 
parameters for different manufactured circuits will not be beyond these corner values [88]. 
However, due to technology scaling, the magnitude of the variability due to different 
sources is increasing manifold and so guard-banding based on 3σ corners will significantly 
affect the performance due to excessive margins for delay variations. Moreover, in actual 
chips with many sources of variability, it is extremely unlikely that all the factors 
contributing to delay variability are at their corner values simultaneously, and so this approach 
produces pessimistic results and too much slack in the design [89]. 
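The pessimism of the corner approach can be illustrated with a small calculation. The sketch below compares the 3σ margin obtained when every stage is placed at its corner simultaneously with the 3σ point of the summed delay, under the assumption that the stage delays are independent. The ten identical stages are an assumed example; for fully correlated variations the two margins would coincide.

```python
import math


def corner_margin(sigmas, n_sigma=3.0):
    """Worst-case corner: every stage sits at its n-sigma value at once."""
    return n_sigma * sum(sigmas)


def statistical_margin(sigmas, n_sigma=3.0):
    """n-sigma point of the summed delay for independent stage delays."""
    return n_sigma * math.sqrt(sum(s * s for s in sigmas))


stage_sigmas = [1.0] * 10       # assumed: ten independent stages, 1 ps each

corner = corner_margin(stage_sigmas)
statistical = statistical_margin(stage_sigmas)
print(corner, round(statistical, 2), round(corner / statistical, 2))
```

For n identical independent stages the corner margin exceeds the statistical one by a factor of √n, which is the excess slack referred to above.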
Under the impact of statistical variability, the delay of each gate becomes a random 
variable. Therefore, statistical methods are required to accurately analyze the circuit delay. 
SSTA modifies STA such that the random variations of the delay are considered as random 
variables. During the SSTA of large digital circuits, the probability density function (PDF) 
and cumulative distribution function (CDF) of the timing parameters of different circuit 
elements are analytically processed to estimate the timing characteristics of the complete 
circuit. Notice then that the design paradigm is shifted from deterministic to stochastic. 
There is no single critical path in the circuit; any path can potentially become the critical 
path. Because of its statistical nature, the accuracy of the analysis depends on the 
characterization data of individual circuit elements, accurate representation of the 
characterization data in the form of PDFs and finally the correctness of different analytical 
operations, like MIN, MAX, or SUM which are applied during the analysis, and are 
usually computed with fast approximations.  
The statistical SUM and MAX operations are used to calculate the PDF of the delay at 
each node of the timing graph. These operations take delay variations of the gates and 
interconnect as input and give that of the outputs. Thus by traversing the timing graph 
using the statistical operations, the overall PDF of different circuit parameters can be 
calculated. The basic statistical operations (SUM and MAX) are pictorially shown in 
Figure 4.3. For the signal paths in series, the delay at the output is calculated using the 
SUM operation. If the two circuits in series have PDFs ‘g1’ and ‘g2’, then the PDF of the 
delay at the output can be calculated using the convolution integral. Similarly, if a gate 
has multiple inputs, the delay distribution at the output is calculated using the MAX 
operation. 
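A minimal numerical sketch of the two operations is given below for delay distributions discretised on a common grid, where entry i of a list is the probability of a delay of i grid steps. It assumes that the two inputs are statistically independent, in which case the SUM is a discrete convolution and the CDF of the MAX is the product of the input CDFs. The two example distributions are invented for illustration.

```python
def pmf_sum(p, q):
    """SUM: distribution of X + Y for independent X, Y (discrete convolution)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def pmf_max(p, q):
    """MAX: distribution of max(X, Y) for independent X, Y (product of CDFs)."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    out, cdf_p, cdf_q, previous = [], 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        out.append(cdf_p * cdf_q - previous)   # difference of successive CDF values
        previous = cdf_p * cdf_q
    return out


def pmf_mean(pmf):
    return sum(i * w for i, w in enumerate(pmf))


g1 = [0.0, 0.2, 0.6, 0.2]       # assumed delay distribution of the first input
g2 = [0.0, 0.0, 0.5, 0.5]       # assumed delay distribution of the second input

series = pmf_sum(g1, g2)
merged = pmf_max(g1, g2)
print(pmf_mean(series), merged)
```

The mean of the MAX exceeds the mean of either input, which is why a gate with several statistically similar inputs is slower on average than any single path feeding it.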
 
Figure 4.3: Basic statistical operations used in STA and SSTA. The SUM operation (a), and the MAX 
operation (b) [89]. 
[Plots not reproduced: panels (a) and (b), vertical axes labelled Probability.] 
4.3 Representation of Characterization Data 
For the SSTA of circuits, accurate characterization of the timing parameters of the 
combinational logic, interconnect and sequential elements (flip-flops and latches) is vitally 
important [90]. This need becomes more crucial in the design of high speed circuits due to 
the strict design margins required [91]. The task becomes more challenging when statistical 
device variability effects are also considered. While the maximum achievable performance 
and yield of a circuit depends on the magnitude of the variability in the timing parameters, 
a better estimate of these parameters can only be made by the transient analysis of the 
circuits through the SPICE simulation using detailed device models. In current state-of-the-
art chips, the device count has already exceeded one billion, mandating the estimation of 
the distributions more precisely, especially in the tail regions, as events deep within the 
tails will most likely be realized.  
Parametric analysis, in which a known parametric distribution (e.g. normal) is fitted on the 
experimental data, can be used to undertake this estimation. However, the limitation of this 
approach is that its accuracy depends on the choice of a particular a-priori density function 
[92]. Therefore, the distribution functions may be determined through non-parametric 
estimations. With correct approximation of the density functions, a better estimate of the 
circuit yield can be made which is neither optimistic nor pessimistic and thus helps in 
enhancing circuit performance with minimum yield loss. 
In order to demonstrate the effectiveness of using non-parametric estimations, we use the 
simulation data obtained during the characterization of different timing parameters of the 
CMOS flip-flops (Chapter 3). The histograms of various timing parameters of FFs for 13 
nm technology are shown in Figure 4.4. The histograms indicate that the timing 
distributions are asymmetric (positively skewed except hold time which is negatively 
skewed). The degree of asymmetry (around the mean) and the shape of distributions have 
been measured in terms of skewness and kurtosis and are given in Table 4.1 for all the 
timing parameters. As mentioned earlier, skewness is a measure of the degree of 
asymmetry, whereas kurtosis is a measure of whether the data is peaked or flat relative to a 
normal distribution (high kurtosis means a peaked distribution). The non-zero 
skewness and the departure of the kurtosis from 3 (its value for a normal distribution) confirm 
that the distributions are not normal, supporting our 
conclusion drawn from the visual inspection of the distributions. Similar asymmetry has 
recently been reported for the distribution of $V_t$ in 65nm technology generation [93] and in 
35nm channel length MOSFETs [94]. The increasing value of these parameters with 
technology scaling shows that the asymmetry increases as the technology scales. 
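For reference, the two shape statistics can be computed from a set of samples as the standardised third and fourth central moments. The sketch below uses the simple (biased) moment estimators and synthetic samples; the sample sizes, seed and distributions are assumptions for illustration and are not the Monte Carlo data of this work.

```python
import random


def shape_statistics(samples):
    """Return (skewness, kurtosis) as standardised central moments.

    Kurtosis is not mean-corrected, so a normal distribution gives 3.
    """
    n = len(samples)
    mean = sum(samples) / n

    def central(power):
        return sum((x - mean) ** power for x in samples) / n

    m2, m3, m4 = central(2), central(3), central(4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(0)
symmetric = [rng.gauss(10.0, 1.0) for _ in range(20000)]       # normal data
skewed = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]  # right-skewed data

print(shape_statistics(symmetric))
print(shape_statistics(skewed))
```

The normal samples return values close to (0, 3), whereas the right-skewed samples return a clearly positive skewness and a kurtosis above 3.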
The characterization of these timing parameters can be used alongside SSTA to determine 
analytically the impact that variability will have on a more complex circuit. This is done 
by determining its probability distribution function (PDF). For instance, the timing analysis 
of flip-flop based sequential circuits involves the timing characteristics of the sequential 
elements and circuit elements pertaining to a clock network, in addition to the 
combinational logic.  
 
Figure 4.4: Histograms of observed data taken through Monte Carlo simulations for the timing parameters of 
the FFs of 13 nm. 
[Histograms not reproduced; recoverable panel: horizontal axis CLK-Q Time (ps), 0 to 15; vertical axis Frequency, 0 to 800.] 
Table 4.1: Statistical Analysis of the Timing Parameters of the Standard Flip-flop shown in Figure 3.16 
Technology  Statistical attribute  Setup Time (ps)  Hold Time (ps)  CLK-Q Time (ps)  D-Q Time (ps)  Min. PW (ps)
25 nm       Skewness               0.33             -0.32           0.25             1.69           0.09
            Kurtosis               3.46             3.08            3.29             7.32           2.99
18 nm       Skewness               0.53             -0.44           0.36             1.74           -0.385
            Kurtosis               3.81             3.67            3.28             7.87           6.97
13 nm       Skewness               0.94             -0.88           0.88             1.76           -0.17
            Kurtosis               4.46             4.48            4.77             9.40           5.82
Most of the previous work on statistical static timing analysis (SSTA) is based on the 
assumption that the underlying distributions are Gaussian (i.e. the distributions of various 
timing, physical and electrical parameters of the devices). Any deviation from normality 
(as for instance the skewness and kurtosis shown in Figure 4.4) in the timing parameters of 
the circuit elements will introduce inaccuracy in the analysis results. However, the use of 
non-Gaussian distributions is likely to pose several challenges for efficient SSTA, as 
analytical results for the combination of non-Gaussian PDFs would need to be determined. 
As a first step into this uncharted territory, it is required to determine an analytical 
distribution which provides a good match to the observed data. 
4.4 Estimation of the Timing Distributions 
Statistical methods can be used to estimate the distributions from the experimental data. As 
mentioned before, parametric estimation of the distributions (for instance with a Normal or 
Gaussian distribution) does not give a satisfactory fit to the experimental data, since the 
higher moments deviate from their normal-distribution values in our case (non-zero skewness 
and kurtosis different from 3). Therefore, in this work we chose to use 
non-parametric statistical methods and found that Pearson and Johnson systems fit the data 
much more precisely, as they have the ability to adapt themselves to the data and do not 
require a priori or a posteriori knowledge of the data-producing process. They have the 
property of being able to capture skew and kurtosis and so provide a good match to the 
data. 
The PDF based on the simulation data has been compared with the normal distribution, 
Pearson and Johnson systems. It has been found that the normal distribution does not 
provide an accurate fit to the simulation data due to its asymmetric nature, whereas Pearson 
type IV from the Pearson system and the SU system from the Johnson family of systems 
closely match the data. A good description of Pearson and Johnson systems is available 
in [144]. 
4.4.1 Pearson Distributions 
The Pearson distribution is a family of continuous probability distributions to model 
skewed observations. The Pearson system defines a family of distributions parameterized 
on the mean, standard deviation, skewness, and kurtosis. There are seven basic types of 
distributions all available in a single parametric framework [95].  
The Pearson type IV distribution is characterized by four parameters, $(m, \nu, a, \lambda)$, and 
these parameters uniquely determine the first four moments of the distribution. The 
probability density function of the Pearson type IV distribution can be expressed as [96], [97] 
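$$f(x) = k(m,\nu,a)\left[1 + \left(\frac{x-\lambda}{a}\right)^2\right]^{-m} \exp\left[-\nu\tan^{-1}\left(\frac{x-\lambda}{a}\right)\right], \qquad \left(m > \frac{1}{2}\right) \tag{4.1}$$
Here the parameters $a$ and $\lambda$ are for the scale and location, whereas the shape parameters 
$m$ and $\nu$ jointly determine the degree of skewness and kurtosis of the distribution. 
$k(m,\nu,a)$ is a normalization constant given as 
$$k(m,\nu,a) = \frac{\Gamma(m)}{\sqrt{\pi}\,a\,\Gamma\left(m - \frac{1}{2}\right)} \left|\frac{\Gamma\left(m + i\nu/2\right)}{\Gamma(m)}\right|^2 \tag{4.2}$$
where $\Gamma$ is the Gamma function. 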
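The maximum likelihood fitting requires minimizing the negative log likelihood [96] given 
below, which can be computed numerically. 
$$-\ln L = m\sum_{i=1}^{n}\ln\left[1 + \left(\frac{x_i-\lambda}{a}\right)^2\right] + \nu\sum_{i=1}^{n}\tan^{-1}\left(\frac{x_i-\lambda}{a}\right) - n\ln k(m,\nu,a) \tag{4.3}$$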
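The negative log likelihood above can be evaluated with the standard library alone, as sketched below. Since the standard library has no Gamma function for complex arguments, this sketch does not use the closed form (4.2); instead it obtains the normalization constant by Simpson integration of the unnormalised density after the substitution $x = \lambda + a\tan\theta$, which is a numerical shortcut of ours and is only used here for $m \ge 1$. The function names and the sample data are assumptions for illustration.

```python
import math


def pearson4_norm(m, nu, a, steps=2000):
    """Normalization constant k(m, nu, a) by numerical integration.

    With x = lambda + a*tan(theta) the normalising integral becomes
    a * integral of cos(theta)**(2m-2) * exp(-nu*theta) over (-pi/2, pi/2).
    """
    if m < 1.0:
        raise ValueError("this numerical shortcut assumes m >= 1")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = math.pi / steps

    def g(theta):
        return max(math.cos(theta), 0.0) ** (2.0 * m - 2.0) * math.exp(-nu * theta)

    total = g(-math.pi / 2) + g(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(-math.pi / 2 + i * h)
    return 1.0 / (a * total * h / 3.0)


def pearson4_nll(data, m, nu, a, lam):
    """Negative log likelihood of Eq. (4.3)."""
    k = pearson4_norm(m, nu, a)
    u = [(x - lam) / a for x in data]
    return (m * sum(math.log1p(t * t) for t in u)
            + nu * sum(math.atan(t) for t in u)
            - len(data) * math.log(k))


data = [9.0, 9.5, 10.0, 10.5, 11.0]            # assumed sample values (ps)
print(pearson4_nll(data, m=2.0, nu=0.0, a=1.0, lam=10.0))
```

In a fitting procedure this function would be passed to a numerical minimiser over $(m, \nu, a, \lambda)$; the choice of minimiser is left open here.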
We have used this equation to fit a Pearson type IV distribution for the PDF of the setup 
time from the simulation data of 13 nm flip-flops. This is shown in Figure 4.5 along with 
the normal distribution fit. It can be seen that the Pearson type IV distribution closely 
matches the PDF of simulation data, as determined by the goodness of fit statistics given in 
Table 4.2. This clearly shows that the assumption that the timing distributions are normal is 
not correct and can produce incorrect conclusions. As an example (refer to Figure 4.5), 
consider the probability of occurrence of a timing event at $t_{\mathrm{setup}} = 12.76$ ps 
(corresponding to 3σ of the normal distribution). With the assumption of a normal distribution, 
this probability is 0.0043, whereas it is 0.0243 with the Pearson type IV 
estimation (5.6 times higher than the normal case). 
The CDF of Pearson type IV is given by [96] 
$$P(x) = \frac{k\,a}{2m-1}\left[1 + \left(\frac{x-\lambda}{a}\right)^2\right]^{-m} \exp\left[-\nu\tan^{-1}\left(\frac{x-\lambda}{a}\right)\right] \left(i - \frac{x-\lambda}{a}\right) F\left(1,\, m + \frac{i\nu}{2};\, 2m;\, \frac{2}{1 - i\,\frac{x-\lambda}{a}}\right) \tag{4.4}$$
where $F$ is a hypergeometric function and can be calculated using the method given in [98]. 
The above function converges for $x < \lambda - a\sqrt{3}$. For $x > \lambda + a\sqrt{3}$, the symmetry identity 
$P(x \mid m, \nu, a, \lambda) \equiv 1 - P(-x \mid m, -\nu, a, -\lambda)$ can be used and in the case $|x - \lambda| < a\sqrt{3}$, the 
linear transformation as given in [99] can be employed. 
 
Figure 4.5: The probability density function of setup time for the 13 nm flip-flops plotted with different 
systems. 
Table 4.2: Goodness of Fit Statistics (for Figure 4.5) in terms of R-Square, Sum of Squares due to Error 
(SSE), Adjusted R-Square, Root Mean Squared Error (RMSE) 
Distribution     R-square  SSE        Adjusted R-square  RMSE
Normal           0.9845    0.004279   0.9826             0.01635
Pearson type IV  0.9989    0.0002925  0.9986             0.004571
4.4.2 Johnson Distribution 
Statistician Norman Johnson formulated a system of distributions such that for every valid 
combination of mean, standard deviation, skewness and kurtosis, there is a unique 
distribution. The Johnson system is based on exponential, logistic, and hyperbolic sine 
transformations, plus the identity transformation [95]. The systems of distributions 
corresponding to these transformations are known as $S_L$, $S_B$, $S_U$ and $S_N$, respectively. The 
general form of the three normalizing transformations (exponential, logistic and hyperbolic 
sine) is given by [100] 
$$z = \gamma + \delta\, f\left(\frac{x - \xi}{\lambda}\right) \tag{4.5}$$
where $z$ is a standard normal random variable, $f$ is the transformation, $\gamma$ and $\delta$ are shape 
parameters, $\lambda$ is a scale parameter and $\xi$ is a location parameter. 
Chapter 4                                                            SSTA of Pipelined Communication Circuits 
 
75 
 
The lognormal system of distributions, SL, is given by 
å    ælog Tç L èÓ 5 ,    ç Á è                                       4.6
 
The unbounded system of distributions SU is defined by 
å    æ log êTç L èÓ 5  ëTç L èÓ 5&  1ì
) &Y í ,   L ∞ m ç m  ∞           4.7
 
and the bounded system SB is given by 
å    ælog T ç L èè  Ó L ç5 ,      è m ç m è  Ó                      4.8
 
In order to generate a sample from the Johnson distribution that matches the given data, 
first the sample quantiles of the data for the cumulative probabilities of 0.067, 0.309, 
0.691, and 0.933 are computed. These probabilities correspond to four evenly spaced 
standard normal quantiles of -1.5, -0.5, 0.5 and 1.5 [95]. 
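
The correspondence between these probabilities and quantiles, and the way four sample quantiles can point to a member of the Johnson family, can be sketched as follows. This is an illustration only: the selection rule shown is the quantile-ratio rule of Slifker and Shapiro, which we assume here to be the kind of rule used in [95], and the function names and example quantile values are hypothetical.

```python
from statistics import NormalDist


def johnson_quantile_probs(z=0.5):
    """Cumulative probabilities of the standard normal quantiles -3z, -z, z, 3z."""
    nd = NormalDist()
    return [nd.cdf(q) for q in (-3.0 * z, -z, z, 3.0 * z)]


def select_johnson_family(x_m3z, x_mz, x_pz, x_p3z, tol=1e-9):
    """Pick a Johnson family from four sample quantiles (assumed quantile-ratio rule).

    m, n are the outer quantile spacings and p the inner spacing; the ratio
    m*n/p**2 is > 1 for SU, < 1 for SB and equal to 1 for SL (and for the normal case).
    """
    m = x_p3z - x_pz
    n = x_mz - x_m3z
    p = x_pz - x_mz
    ratio = m * n / (p * p)
    if abs(ratio - 1.0) <= tol:
        return "SL"
    return "SU" if ratio > 1.0 else "SB"


if __name__ == "__main__":
    print([round(p, 3) for p in johnson_quantile_probs()])
    # Hypothetical sample quantiles with wide outer spacings (heavy tails).
    print(select_johnson_family(-4.0, -1.0, 1.0, 4.0))
```

Data whose outer quantiles are spread more widely than the inner ones, as is the case for heavy-tailed timing data, fall into the SU branch of this rule.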
The cumulative distribution function of the experimental data for the setup time of the 
flip-flop and that of the Johnson system which matches the four evenly spaced standard 
normal quantiles of -1.5, -0.5, 0.5, and 1.5 (cumulative probabilities of 0.067, 0.309, 
0.691, and 0.933) are plotted in Figure 4.6. The normal CDF has also been plotted for 
comparison. Again, the type of distribution within the Johnson family of systems which 
matches these quantiles is the SU system. 
[Plot for Figure 4.6; vertical axis: cumulative density] 
Figure 4.6: Cumulative delay distribution of setup time of 18 nm flip-flops. The SU system from Johnson 
family of distributions better fits the simulation data as compared to normal distribution. 
4.5 Estimation of Timing Distributions and Yield 
Accurate estimation of the yield depends significantly on the evaluation of the CDF at the 
tail of the distribution. With a better estimation of the probability distributions with 
Pearson or Johnson systems, the designer can predict the yield of a design more accurately. 
The use of a normal approximation will produce optimistic results, whereas fabricated 
chips will suffer from significant yield loss. For instance, the cumulative distribution 
function (CDF) for the setup time of 13 nm flip-flops is plotted in Figure 4.7. The 
performance yield for the target setup time of 11.5 ps is 96.69% with normal and 91.57% 
with a Pearson IV approximation. Since typical designs include a large number of flip-
flops, and no failures are tolerable, the failure probability for the whole system behaves as 
a power function of the probability of failure of a single device. Therefore even small 
errors in the estimation of this probability are readily scaled up and will provide very 
different failure rates for the complete system. 
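
The scaling of a single-device estimation error to the system level can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that all flip-flops are identical and fail independently, so that the system yield is the single-device yield raised to the number of devices. The function names and the device count of 10 are hypothetical; the two single-device yields are the values quoted above.

```python
def system_yield(single_yield, n_elements):
    """Yield of a system in which all n_elements must pass (independence assumed)."""
    return single_yield ** n_elements


def system_failure_probability(single_yield, n_elements):
    """Probability that at least one of n_elements fails."""
    return 1.0 - system_yield(single_yield, n_elements)


if __name__ == "__main__":
    n = 10  # hypothetical number of flip-flops on the path of interest
    for label, y in (("normal", 0.9669), ("Pearson IV", 0.9157)):
        print(label, system_yield(y, n), system_failure_probability(y, n))
```

Under this assumption, the gap of about five percentage points between the two single-device estimates grows considerably once even a handful of devices must all meet timing.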
 
Figure 4.7: Cumulative distribution functions for the setup time of 13 nm flip-flops with Normal and Pearson 
type IV approximations. 
4.6 Timing Distributions of Pipelined Circuits 
In high performance designs, data and control paths are aggressively pipelined to enhance 
the throughput. The pipelining is realized by inserting sequential elements (flip-flops or 
latches) in the circuit at different locations, thus dividing it into several segments. 
However, after a certain pipeline depth, the timing overheads of the pipeline become a 
significant bottleneck for the throughput of the circuits [101], so the number of segments 
for maximum throughput is bounded. In any case, a large number of sequential elements 
are used in heavily pipelined designs.  
The effectiveness of high performance system design strongly depends on the timing yield 
of the fabricated chips. The timing yield is defined as the ratio of the chips that meet 
a certain target delay (or the target frequency) to the total number of fabricated chips. 
Conventionally, high performance circuits are designed for particular target frequencies. In 
synchronous data transmission through the pipeline, the speed of the circuit is limited by 
the pipe segment which is slowest (having largest delay) amongst the other pipe segments 
[102] in the complete path, which becomes the critical path. However, due to variability 
any pipe segment can potentially be the critical one. Therefore, statistical approaches are 
required to determine the maximum pipeline delay so that an estimation of the maximum 
achievable speed of the circuit can be made under permissible yield loss. From the arrival 
time distributions of different pipeline segments, the maximum arrival time distribution of 
the complete pipeline is computed in SSTA through the use of SUM and MAX operations. 
Most of the existing statistical static timing analysis (SSTA) approaches [103], [104] are 
based on Clark’s approximation [105] to compute the distribution of the maximum arrival 
time. Clark’s approximation for the MAX operation gives the exact first two moments of the 
maximum for operands having a joint bivariate normal distribution. The MAX operation is 
intrinsically a nonlinear function, as the maximum of two normally distributed arrival times 
is typically a positively skewed distribution [106]. Moreover, the variability in the devices 
and interconnect also results in asymmetric non-normal distributions [107], [108]. 
Therefore, performing Clark’s MAX operation by approximating the non-normal 
distributions with normal distributions will produce inaccurate results. 
There are some recent studies [109]-[111] which propose analytical evaluation of SUM 
and MAX operations by approximating the arrival times with skew-normal distributions. 
However, the accuracy of the proposed models strongly depends on how accurately the 
arrival times are represented by the skew-normal distributions. 
4.7 Pipeline Delay 
Consider an N-stage pipeline as shown in Figure 4.8. The flip-flops have been inserted at 
regular intervals to store the signal states. If we denote the delay of the combinational 
logic in the $i$-th segment by $T_{CL_i}$, the CLK-Q delay of the flip-flop by 
$T_{CLK\text{-}Q_i}$, and the setup time of the $(i+1)$-th flip-flop by $T_{setup_{i+1}}$, 
then the delay of the $i$-th pipeline segment, $T_{seg_i}$, will be given by 

$$T_{seg_i} = T_{CL_i} + T_{CLK\text{-}Q_i} + T_{setup_{i+1}} \tag{4.9}$$
 
 
Figure 4.8: N-stage flip-flop based pipeline. 
Under the impact of variability, the delay of each pipeline segment is a random variable 
(RV) with a certain distribution and the delay of the overall pipeline will depend upon the 
distributions of the individual segment delays. 
In order to determine the overall delay of the pipeline, we will make use of Jensen’s 
inequality [105], [112]. It states that the expected value $E$ of the convex transformation 
$f$ of a random variable $x$ is at least the value of the convex function at the mean of the 
random variable: 

$$E[f(x)] \ge f(E[x])$$

Since “max” is inherently a convex function [112], according to Jensen’s inequality, the 
overall delay of the pipeline, $T_{PL}$, will be the maximum of the individual pipeline 
segment delays and a relatively less tight lower bound on the expected maximum is given by 

$$E\left[\max_{i=1,\ldots,N} T_{seg_i}\right] \ge \max_{i=1,\ldots,N}\left\{E\left[T_{seg_i}\right]\right\} \tag{4.10}$$

$$E\left[\max_{i=1,\ldots,N} T_{seg_i}\right] \ge \max_{i=1,\ldots,N}\left\{E\left[T_{CL_i} + T_{CLK\text{-}Q_i} + T_{setup_{i+1}}\right]\right\} \tag{4.11}$$
 
The statistical static timing analysis of pipelined circuits can be performed using a 
numerical integration method, a Monte Carlo method, or a probabilistic analysis method 
[106]. However, the first two approaches are quite expensive in runtime as compared to the 
third approach. 
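
The Jensen lower bound on the expected maximum segment delay (Equation 4.10) can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: it assumes independent Gaussian segment delays, whereas the simulated delays in this work are skewed, and the stage means, spread and function name are hypothetical.

```python
import random


def jensen_check(stage_means, stage_sigma, n_samples=20000, seed=1):
    """Return (Monte Carlo estimate of E[max_i T_i], max_i E[T_i]).

    Segment delays are drawn as independent Gaussians, which is an
    assumption made for this illustration only.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(rng.gauss(mu, stage_sigma) for mu in stage_means)
    expected_max = total / n_samples
    lower_bound = max(stage_means)
    return expected_max, lower_bound


if __name__ == "__main__":
    # Hypothetical mean delays (ps) of a six-stage pipeline.
    e_max, bound = jensen_check([20.0, 20.5, 21.0, 20.2, 20.8, 20.6], 0.8)
    print(e_max, bound)
```

When several stages have similar mean delays, the estimated expected maximum lies clearly above the largest mean, which is why the bound is described as relatively loose.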
The overall pipeline delay can be approximated as [103] 

$$T_{PL} = \max\left(T_{seg_1}, T_{seg_2}, T_{seg_3}, \ldots, T_{seg_{N-1}}, T_{seg_N}\right) \tag{4.12}$$

$$\phantom{T_{PL}} = \max\left(T_{seg_1}, T_{seg_2}, T_{seg_3}, \ldots, \max\left(T_{seg_{N-1}}, T_{seg_N}\right)\right) \tag{4.13}$$

$$\phantom{T_{PL}} = \max\left(T_{seg_1}, T_{seg_2}, T_{seg_3}, \ldots, \max\left(T_{seg_{N-2}}, \hat{T}_{seg_{N-1,N}}\right)\right) \tag{4.14}$$

where $\hat{T}_{seg_{N-1,N}}$ represents a distribution which is obtained as a result of the 
max operation on $T_{seg_{N-1}}$ and $T_{seg_N}$. Once $\hat{T}_{seg_{N-1,N}}$ is determined, 
we can find $\hat{T}_{seg_{N-2,N}} = \max\left(T_{seg_{N-2}}, \hat{T}_{seg_{N-1,N}}\right)$ 
by iteratively applying the above procedure. Hence, by repeating 
this procedure N-1 times, taking two variables at a time, we can get the overall 
distribution of the pipeline delay in terms of its moments that can accurately represent the 
distribution. 
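
The pairwise reduction can be sketched in code for the simplest case, in which each intermediate result is re-approximated by a normal distribution using Clark's moment formulas. This is an illustration of the iterative procedure, not of the moment-based representation advocated in this chapter: it assumes Gaussian, mutually independent stage delays, and the function names and stage values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def clark_max(mu1, s1, mu2, s2, rho=0.0):
    """Mean and standard deviation of max(X1, X2) for jointly normal X1, X2 (Clark)."""
    a2 = s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2
    if a2 <= 0.0:
        # Degenerate case: the two operands differ only by a constant shift.
        return (mu1, s1) if mu1 >= mu2 else (mu2, s2)
    a = math.sqrt(a2)
    alpha = (mu1 - mu2) / a
    cdf = _STD_NORMAL.cdf(alpha)
    pdf = _STD_NORMAL.pdf(alpha)
    first = mu1 * cdf + mu2 * (1.0 - cdf) + a * pdf
    second = ((mu1 * mu1 + s1 * s1) * cdf
              + (mu2 * mu2 + s2 * s2) * (1.0 - cdf)
              + (mu1 + mu2) * a * pdf)
    variance = max(second - first * first, 0.0)
    return first, math.sqrt(variance)


def pipeline_max(stages):
    """Fold N (mean, sigma) stage delays into one by N-1 pairwise MAX operations.

    Stages are treated as independent and every intermediate maximum is
    re-approximated as normal (assumptions of this sketch).
    """
    mu, sigma = stages[-1]
    for mu_i, sigma_i in reversed(stages[:-1]):
        mu, sigma = clark_max(mu_i, sigma_i, mu, sigma)
    return mu, sigma


if __name__ == "__main__":
    # Hypothetical six-stage pipeline with identical stage statistics (ps).
    print(pipeline_max([(20.0, 0.8)] * 6))
```

Because every intermediate maximum is forced back into a normal shape, the skewness created by each MAX operation is discarded at every step of this fold.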
The maximum of two normally distributed random variables typically produces non-
normal positively skewed distributions [111]. The skewed arrival time distribution 
resulting from the MAX operation at a given node becomes input for the max operation at 
a downstream node. Moreover, due to device and interconnect variability, the timing 
distributions of the circuits themselves are asymmetric (non-normal) [93], [106], [107]. 
Hence, in the pipeline system described above, if Clark’s approximation is used at each 
stage, the final distribution will deviate significantly from the actual distribution. Again, 
there are some recent works [109]-[111] which propose the evaluation of the max function by 
approximating the timing distributions with skew-normal distributions. However, the 
SSTA results entirely depend on how accurately the timing distributions are represented by 
the underlying approximation models. 
4.8 Statistical Analysis of the Timing Yield 
We now proceed to discuss the yield of a pipelined circuit. The timing yield of a pipeline 
depends on the timing constraints introduced by the setup time and the hold time of the 
sequential elements. The pipeline should be designed so that the signal from one flip-flop 
reaches the next flip-flop at least one setup time earlier than the next clock edge. 
Moreover, the signal should not be so fast that the second register cannot latch the data 
correctly. Under statistical variations, both the shortest and the longest paths in the 
pipeline no longer remain fixed and therefore both setup and hold time constraints need to 
be considered in the statistical analysis and for yield estimation.  
Considering data transmission between flip-flops $FF_i$ and $FF_{i+1}$ such that $FF_i$ is 
the source and $FF_{i+1}$ is the receiver, the constraint introduced by the setup time for 
proper data latching by $FF_{i+1}$ is 

$$0 \le T_{CLK\text{-}Q_i} + T_{CL_i} \le T_{CLK} - T_{skew} - T_{setup_{i+1}} \tag{4.15}$$

where $T_{CLK}$ is the clock period and $T_{skew}$ is the skew between the clock signals 
$CLK_i$ and $CLK_{i+1}$. In order to avoid a race-through condition, the constraint imposed 
by the hold time is 

$$T_{CLK\text{-}Q_i} + T_{CL_i} \ge T_{hold_{i+1}} - T_{skew} \tag{4.16}$$

The above constraints dictate that, for successful data transmission, the longest path delay 
should be less than and the shortest path delay should be greater than some target values. 
The time margin under the setup time constraint for the pipe segment $i$ is given by 

$$\Delta T_{setup_i} = T_{CLK} - T_{skew} - T_{setup_{i+1}} - T_{CLK\text{-}Q_i} - T_{CL_i} \tag{4.17}$$

Similarly, the time margin under the hold time constraint for the pipe segment $i$ is given by 

$$\Delta T_{hold_i} = T_{CLK\text{-}Q_i} + T_{CL_i} - T_{hold_{i+1}} + T_{skew} \tag{4.18}$$
 
In order to minimize the yield loss, both these time margins should be greater than zero for 
all the pipeline segments. Therefore, we need to find the minimum of both the timing 
margins for the whole pipeline so as to check that these are greater than zero.  
All the parameters in the expressions of $\Delta T_{setup_i}$ and $\Delta T_{hold_i}$, except 
$T_{CLK}$, are circuit dependent and have certain timing distributions that can either be 
obtained through detailed device and circuit modeling or through simulation. From the 
distributions of timing margins of different pipeline segments, the timing margins of the 
complete pipeline under setup and hold time constraints ($\Delta T_{setup}$, 
$\Delta T_{hold}$) can be determined by applying the 
MIN operation over all pipeline segments, following the same procedure as laid down in 
the previous section. Finally, the MIN operation is again applied over ($\Delta T_{setup}$, 
$\Delta T_{hold}$) to find the combined time margin $\Delta T_{co}$ of the pipeline. The MIN 
operation can be performed in the same way as that of MAX: MIN(x1,x2) = -MAX(-x1,-x2). 
The timing yield of the pipeline at a clock period $T_{CLK}$ can then be determined as  

$$\mathrm{Yield}(T_{CLK}) = P\left(\Delta T_{co}(T_{CLK}) > 0\right)$$
 
It can be seen that for SSTA, the accuracy of the timing yield depends on how accurately 
the timing distributions are represented and MIN/MAX operations are performed.  
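
The yield model of this section can be sketched as a small Monte Carlo experiment. The code below is an illustration under stated assumptions, not the HSPICE-based flow used in this work: the timing parameters are drawn as independent Gaussians, and the stage values, the dictionary layout and the function names are hypothetical. The setup and hold margins follow Equations 4.17 and 4.18, and MIN is evaluated through MIN(x1,x2) = -MAX(-x1,-x2).

```python
import random


def min_via_max(values):
    """MIN expressed through MAX, as in MIN(x1, x2) = -MAX(-x1, -x2)."""
    return -max(-v for v in values)


def timing_yield(t_clk, stage_params, t_skew=0.0, n_samples=5000, seed=7):
    """Fraction of sampled pipelines whose combined time margin is positive.

    stage_params: one dict per segment with (mean, sigma) tuples under the keys
    "clk_q", "logic", "setup" and "hold" (hypothetical layout, values in ps).
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        setup_margins = []
        hold_margins = []
        for p in stage_params:
            t_clkq = rng.gauss(*p["clk_q"])
            t_cl = rng.gauss(*p["logic"])
            t_setup = rng.gauss(*p["setup"])
            t_hold = rng.gauss(*p["hold"])
            # Setup margin, Equation 4.17
            setup_margins.append(t_clk - t_skew - t_setup - t_clkq - t_cl)
            # Hold margin, Equation 4.18
            hold_margins.append(t_clkq + t_cl - t_hold + t_skew)
        combined = min_via_max([min_via_max(setup_margins),
                                min_via_max(hold_margins)])
        if combined > 0.0:
            passed += 1
    return passed / n_samples


if __name__ == "__main__":
    stage = {"clk_q": (5.0, 0.3), "logic": (8.0, 0.5),
             "setup": (4.0, 0.3), "hold": (2.0, 0.2)}
    pipeline = [stage] * 6
    for period in (17.0, 18.0, 19.0, 20.0):
        print(period, timing_yield(period, pipeline))
```

Sweeping the clock period in this way gives a yield curve; replacing the Gaussian draws with samples from a fitted Pearson or Johnson distribution would change only the sampling lines.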
4.9 Experimental Setup and Results 
We used Monte Carlo simulations in HSPICE for the pipeline structure of Figure 4.8 with 
six pipeline stages. The transistor level structure of the single segment of the pipeline is 
shown in Figure 4.9. The study has been carried out for the technology generations of 18 
and 13 nm. During the simulation of the pipeline, the variation in the clock signal is not 
considered and a common clock signal is applied at all the flip-flops. A large number of 
simulations (5000) was run to extract the timing parameters. All timing measurements 
were taken corresponding to 50% of the maximum swing level. 
 
Figure 4.9: Transistor level model of the pipeline segments. 
The CLK-Q delay of each flip-flop and the propagation delay of the combinational logic 
between the flip-flops have been measured. Based on these measurements, the delay 
distribution of the maximum of the complete pipeline, using Clark’s approximation [105], 
has been determined and plotted in Figure 4.10 along with the delay distributions of the 
individual stages. It may be observed that the individual stage delays are not 
Gaussian but rather have skewed distributions under the impact of RDF. Therefore, 
the maximum delay distribution of the complete pipeline can no longer be Gaussian, as 
expected. However, Clark’s approximation always gives the results in terms of a Normal 
distribution. The maximum delay distribution of the complete pipeline has also been 
obtained through Monte Carlo simulations and is also plotted in Figure 4.10. Visual 
inspection shows that the actual distribution has a long positive tail and differs 
significantly from the Normal distribution. The statistical parameters of the two 
distributions verify this fact. 
In order to examine the impact of technology scaling on the evaluation of MAX 
distribution, the simulations were performed for the technology generations of 18 and 13 
nm and the results are shown in Figure 4.11. The results show that the asymmetry in 
different timing parameters of the flip-flops and the combinational logic increases with 
technology scaling, resulting in increased asymmetry in the MAX distribution, as is also 
evident from the statistical parameters given in Table 4.3 (for 0-1 input transition). 
[Plot for Figure 4.10; vertical axis: density] 
Figure 4.10: MAX delay distributions of individual pipeline stages and overall pipeline for 18nm technology 
generation. 
 
Figure 4.11: Overall pipeline delay distributions of a pipeline consisting of 6 stages simulated for the 
technology generations of 18 and 13 nm. 
The increased asymmetry means greater deviation from the Gaussian distribution and more 
error in estimating MIN/MAX distribution using Clark’s approximation, as is evident from 
Figure 4.11. Although Clark’s approximation provides a conveniently fast means of 
finding the MAX distribution, the inaccuracy of its results, particularly in the tail 
section, makes it a poor choice for the given purpose, as it will give very optimistic 
results for the pipeline delay. Therefore, it will result in yield loss due to the 
difference in the PDFs at the tail section. 
Table 4.3: Statistical Parameters of the MAX Delay Distribution of the Complete Pipeline 
Parameters         18 nm    13 nm 
Mean Delay (ps)    25.2     16.93 
Std. Dev. (ps)     0.843    0.994 
Skewness           0.464    0.676 
Kurtosis           0.426    0.922 
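
Statistics of the kind listed in Table 4.3 can be computed from Monte Carlo samples as sketched below. This is an illustration only: the samples here are synthetic maxima of independent Gaussian stage delays rather than HSPICE data, the function names and stage values are hypothetical, and we assume that the kurtosis is reported as excess kurtosis (zero for a normal distribution).

```python
import math
import random


def sample_moments(samples):
    """Return mean, standard deviation, skewness and excess kurtosis of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / (m2 * m2) - 3.0
    return mean, std, skewness, excess_kurtosis


def simulate_pipeline_max(n_stages=6, mu=20.0, sigma=0.8, n_samples=20000, seed=3):
    """Synthetic MAX delay samples for identical, independent Gaussian stages."""
    rng = random.Random(seed)
    return [max(rng.gauss(mu, sigma) for _ in range(n_stages))
            for _ in range(n_samples)]


if __name__ == "__main__":
    print(sample_moments(simulate_pipeline_max()))
```

Even with perfectly Gaussian stages, the sampled maximum shows a positive skewness, which is the effect that a normal approximation of the MAX distribution cannot capture.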
It has also been observed that stage delays are different in opposite transitions even if 
NMOS and PMOS transistors are properly T-sized. For instance, the stage delay 
distributions for low-high and high-low transitions are shown in Figure 4.12. Although the 
size of the PMOS transistors is chosen to be double the size of the NMOS transistors to 
keep the circuit delay close in the two swings, the delay variability is inversely 
proportional to the size of the transistors [107] and therefore the statistical parameters 
(mean, standard deviation, skewness, and kurtosis) are also different for the two 
transitions. Therefore, while determining the MAX distribution, the delay distributions in 
both swings need to be considered. 

Figure 4.12: Maximum delay distributions plotted for low-high and high-low transitions for the 13 nm 
pipeline. 
While measuring the timing parameters of the flip-flops, different interdependencies need 
to be considered. These interdependencies also have a negative impact on the shape of the 
distributions due to variability, thus pushing them away from the normal distribution. For 
example, Figure 4.13 shows two histograms for a timing random variable formed by the 
sum of D-CLK time, CLK-Q time and combinational logic delay. The narrow and high 
peak histogram corresponds to the case when the D-CLK time is very large. Similarly, the 
wider histogram corresponds to the case when the D-CLK time is short. The setting of the 
D-CLK time depends on the clock period and combinational logic delay. However, its value 
greatly affects the shape of the timing distributions, with increased deviation from 
normality for small D-CLK time. 
[Plot for Figure 4.13; vertical axis: frequency] 
Figure 4.13: Histograms of timing variable comprising of D-CLK time, CLK-Q time and combinational 
delay for a 13 nm pipeline. 
As mentioned before, some recently reported works [108], [111], [113] propose to 
approximate skewed arrival time distributions with skew-normal distributions and also 
present analytical models for the computation of the MAX function. However, we have seen in 
this work that skew-normal is also not an appropriate choice for representing skewed timing 
distributions of highly scaled devices, as shown by the probability distributions in Figure 
4.14. The solid curve corresponds to the actual simulation data and the other two curves are 
for the normal and skew-normal approximations. It can be seen that although the skew-normal 
distribution matches the actual data better than the normal approximation, it still does 
not reproduce it exactly. A similar discrepancy is also reported in the arrival 
time plots given in [108], [111], [113].  
 
Figure 4.14: Probability density functions for the pipeline delay with a combinational logic of 60 inverters in 
series for 13 nm. 
 
Figure 4.15: Difference in timing yield estimation with normal and skew-normal approximations. 
The inaccuracy in approximating the arrival times for SSTA has a severe impact on yield 
estimation. In Figure 4.15, timing yield loss as a function of operating frequency has been 
plotted for normal and skew-normal distributions for comparison with the MC 
simulation results. For this purpose, the model laid down in section 4.8 has been followed. 
Again it can be seen that the normal approximation produces optimistic results that are 
quite different from the MC results. The skew-normal approximation gives relatively better 
results but still with some error. For instance, the yield loss at a frequency of 7GHz is 9.3% 
with MC simulation data, 5.2% with normal and 7.8% with skew-normal. This error 
increases to significant levels for deeply pipelined circuits with multiple MIN/MAX 
operations during SSTA. Therefore, in order to keep yield loss below permissible limits, 
the operating frequency will have to be reduced. 
4.10 Summary 
In this chapter accurate estimation of the shape of timing distributions of flip-flop 
parameters has been discussed. The study of the exact shape of these distributions, 
especially in the tail section, is of fundamental importance in the design and modeling of 
high-performance, reliable, economically feasible circuits. In this chapter, the distribution 
tails are estimated based on simulation data, with the aid of statistical nonparametric 
probability density functions, and it has been found that timing distributions can better be 
represented by certain nonparametric distributions, in particular Pearson and Johnson 
systems. The use of these representations during the statistical static timing analysis will 
provide more accurate results as compared with the normal approximation of distributions 
and will eventually reduce the probability of yield loss. The skew normal distribution 
provides an interesting alternative to represent the skewed data; however, it does not give 
better results than Pearson and Johnson systems. Since in current state-of-the-art systems 
the device count has already crossed several billion, accurate representation of data for 
SSTA is imperative to avoid yield loss. Therefore for such large systems also, Pearson and 
Johnson distributions provide very accurate results as compared to other distributions. 
Under statistical device variations, the delay distributions of the pipeline stages follow a 
skewed distribution in highly scaled devices. Therefore, in order to determine the 
maximum operating frequency of the pipelined circuits, the delay of the slowest 
pipeline stage will have to be estimated accurately. This study shows that identifying the 
slowest pipeline stage using Clark’s approximation will produce quite optimistic results and 
will lead to significant yield loss. Moreover, it has been shown that while estimating the yield, 
the stage delay distributions in both low-to-high and high-to-low transitions need to be 
considered and hold time distributions should also be considered along with setup time 
distributions. 
 
Chapter 5                                                 Optimal Scaling for Variability Tolerant Repeaters 
 
 
 
 
 
 
 
 
Chapter 5 
 
Optimal Scaling for Variability 
Tolerant Repeaters 
 
 
5.1 Introduction 
As technology scales, on-chip interconnections are becoming progressively slower when 
normalised by the logic delay. Techniques to manage this discrepancy, and avoid a 
possible bottleneck, are therefore required. The use of caching and of wide buses are both 
possible. However, the most fundamental solution is the use of repeaters inserted in the 
communication links. The placement and size of repeaters can be tuned to construct delay 
optimal interconnections. Again due to technology scaling, the number of optimal repeaters 
per unit length is also increasing. Optimal repeaters are of significantly large size as 
compared to the minimum sized repeaters. Thus they require larger portions of the silicon 
and routing area [114] and a significantly larger portion of the chip power [61]. Due to 
their large number and size, their total power consumption can be as high as 60W [115]. 
For the future technology generations, unconstrained optimal buffering of interconnects 
might require up to 80% of the total on-chip area [68]. The impact of technology scaling on 
the number and size of delay optimal repeaters is shown in Figure 5.1. 
 
Figure 5.1: Optimal number and size of uniformly inserted buffers in an interconnect of minimum 
width and spacing for the three technology generations. 
Owing to its increasing trend, on-chip power dissipation has been pointed out as the 
main limiting factor in the scaling of CMOS circuits [62]. In previous technology 
generations, the switching power was the dominant component of power dissipation. 
However, the relative contribution of different components of power dissipation (switching, 
short circuit, and leakage) is changing with scaling. Therefore, it becomes important to 
determine the different components of power dissipation individually, as this approach may 
be helpful in producing more power efficient designs. 
The increasing magnitude of the variability in deep sub-micron (DSM) technologies is not 
only affecting the delay characteristics of the devices but also their power dissipation. In 
this work we will show that RDF causes inherent variability in the power dissipation of the 
devices. Therefore, similar to the operating frequency and yield which are affected by the 
delay variability, the variability in the power dissipation may also affect yield. 
In the first part of this chapter, we present the results for the power measurement in 
repeaters. We used the Monte Carlo simulation method for the accurate characterization of 
power dissipation in repeaters of 25, 18 and 13 nm bulk MOSFETs, and to see the effect of 
RDF on all the components of power dissipation. Since repeaters of different sizes are used 
on the chip, the effect of repeater size on power dissipation has also been 
investigated under the impact of RDF. The results obtained through this study can be used 
to develop accurate models for other sizes and configuration of devices. Moreover, the 
characterization data so obtained can be used to design more effective power optimal links. 
In the second part of this chapter, we investigate the impact of variability on area and 
power optimal repeater insertion technique for on-chip links. In [116] it has been shown 
that absolute performance is expensive in terms of power dissipation and silicon area and 
we can make significant savings in these parameters at the cost of a little performance 
penalty. However, we argue that in addition to the delay performance, the predictability of 
the timing of the signals for all the wires in a multi-bit link is another important parameter 
for high performance designs. The timing variability not only degrades the system 
performance but can also produce timing violations and system faults, thus reducing 
system yield. With aggressive technology scaling, the variability in the devices and 
interconnect is continuously increasing, posing many challenges for high performance and 
yet reliable designs [117], [118]. The power optimal repeater insertion methodologies in 
[116], [24] suggest the use of smaller sized buffers (and increased inter-repeater segment 
length), whereas it has been shown in [107] that delay variability of the buffers is inversely 
proportional to their size and that this relation is not linear. Therefore, reducing the size of 
the buffers may be of little benefit if variability, reliability and yield are to be maintained 
within certain acceptable limits. Hence, the robust design of communication links requires 
that any power and area efficient methodology be studied against the reliability of 
the system, and any such methodology should also include this metric in the optimization 
process. 
5.2 Methodology for Power Measurement 
The arrangement for the measurement of different components of power dissipation is 
shown in Figure 5.2. Minimum sized inverters (MSI) of the 25, 18 and 13 nm technology 
generations were used with supply voltages of 1.1V, 1.0V and 0.9V, respectively. Based 
on the predictive model card libraries, the Monte Carlo simulation method has been used and 
10,000 HSPICE simulations were run for accurate measurements for each of the given 
technology generations. The measurements were taken during both swings (VHL and VLH) 
for the repeaters switching at a frequency of 2GHz for all the three technology generations. 
A typical value of 0.15 for the activity factor [119] is used in this study. 
The leakage power of the inverter R is measured using the leakage current flowing through 
zero-volt voltage sources VP and VN in each of its possible states. 
The short circuit power has been determined by measuring the energy dissipated across the 
supply voltage $V_{DD}$ by integrating the current over the period $T$ of interest: 

$$E = V_{DD}\int_{0}^{T} i(t)\,dt \tag{5.1}$$

where $i(t)$ is the short circuit current flowing through the inverter which can be sensed 
through the zero-volt voltage source VN for the LH transition and through the zero-volt 
voltage source VP for the HL transition. The transient analysis was carried out over the 
whole switching period, which was taken to be significantly long to cover the whole 
transition. 
The switching power is determined by first measuring the total energy dissipated by the 
inverter over both transitions and then subtracting the short circuit and leakage 
components. 
 
Figure 5.2: Arrangement for the measurement of power dissipation in the repeater. 
$$E_{total} = V_{DD}\int_{0}^{2T} i(t)\,dt \tag{5.2}$$

$$\phantom{E_{total}} = E_{sw_{LH}} + E_{sw_{HL}} + E_{sc_{LH}} + E_{sc_{HL}} + 2P_{leak}T$$

$$E_{sw_{LH}} = E_{sw_{HL}} = \frac{1}{2}\left\{E_{total} - E_{sc_{LH}} - E_{sc_{HL}} - 2P_{leak}T\right\} \tag{5.3}$$

where $E_{total}$, $E_{sw}$, $E_{sc}$ are the total, switching and short circuit energies, 
respectively. The subscripts LH and HL represent the transitions from low-high and 
high-low, respectively. $P_{leak}$ is the leakage power.  
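
The bookkeeping of Equations 5.1 to 5.3 can be sketched as a short post-processing routine. The code below is illustrative only and is not the measurement script used with HSPICE: the function names and the sample waveform are hypothetical, the integral is approximated with the trapezoidal rule on sampled current data, and the leakage term is taken as the leakage power multiplied by the period so that all terms are energies, which is our reading of the equations.

```python
def supply_energy(v_dd, times, currents):
    """V_DD times the trapezoidal integral of the sampled current (Equation 5.1 form)."""
    charge = 0.0
    for k in range(1, len(times)):
        charge += 0.5 * (currents[k - 1] + currents[k]) * (times[k] - times[k - 1])
    return v_dd * charge


def switching_energy_per_transition(e_total, e_sc_lh, e_sc_hl, p_leak, t_period):
    """Equation 5.3: remove short circuit and leakage energy, then split equally.

    The leakage contribution over both transitions is assumed to be 2 * p_leak * t_period.
    """
    return 0.5 * (e_total - e_sc_lh - e_sc_hl - 2.0 * p_leak * t_period)


if __name__ == "__main__":
    # Hypothetical waveform: a constant 1 uA drawn for 1 ns from a 1.1 V supply.
    t = [0.0, 0.5e-9, 1.0e-9]
    i = [1.0e-6, 1.0e-6, 1.0e-6]
    print(supply_energy(1.1, t, i))
```

In practice the sampled time and current vectors would come from the transient analysis of the zero-volt sources VP and VN described above.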
5.3 Results and Discussion 
Different components of power dissipation along with the total power are shown in Figure 
5.3 for the three technology generations. The curves show the trend of these components 
with technology scaling. It may be noted that leakage power increases, whereas the other 
two components decrease, as the technology scales from 25nm to 13nm. The pace at 
which leakage power and short circuit power change is roughly the same, whereas the 
switching power decreases more rapidly.  
 
Figure 5.3: Different components of power dissipation along with the total power in a minimum sized 
inverter (MSI). The inverter under investigation refers to ‘R’ in Figure 5.2, operating at a frequency of 2GHz. 
Technology scaling has made it possible to switch the circuits at higher speeds. As 
mentioned before, the FO4 delay metric can be used to compare the speed of the circuits in 
different technologies. In Figure 5.4, the FO4 delay in the given three technologies has 
been plotted along with the leakage power in MSI of the corresponding technologies. It can 
be seen that the devices become faster with technology scaling, as expected. However, this
gain in performance is accompanied by a dramatic increase in the leakage power. Hence there
is an inverse correlation between circuit delay and leakage power.
The relative contribution of the different components of power dissipation to the total power is
shown graphically in Figure 5.5 and the corresponding data is given in Table 5.1. It has
been found that leakage power is no longer an insignificant quantity in comparison with
the other two components and can affect the performance of high performance designs. The
increase in the leakage power is mainly due to the increase of sub-threshold leakage
current. The short circuit power decreases because devices become smaller with
technology scaling and therefore have a relatively higher output resistance.
Amongst all components of power dissipation, switching power remains the dominant
one.
 
Figure 5.4: A plot of FO4 delay and the leakage power in MSI. 
 
[Figure 5.5 plot: x-axis Technology Generation (nm): 25, 18, 13; y-axis scale 0 to 100; series: Short Circuit, Leakage, Switching]
 
Figure 5.5: Normalized power distribution components in MSI operating at 2GHz. 
 
From Table 5.1, we can see that the leakage power represents a growing portion of
the total power dissipation: its share of the mean total power rises from about 0.7% at
25nm to about 7.5% at 13nm. It becomes even more prominent if the system is operating at
lower frequencies, because the short circuit and switching power components are frequency
dependent and become small as frequency decreases. Therefore power optimization
methodologies should consider the individual power dissipation components along with
the total power dissipation.
Table 5.1: Statistics of Power Measurements for MSI
Tech.  Statistic       Pleak    Psc      Psw      Ptot
25nm   Mean (µW)       0.0125   0.16     1.5      1.67
25nm   St. Dev. (µW)   0.01     0.004    0.016    0.022
25nm   3σ/Mean (%)     259.1    7.0      3.3      4.0
18nm   Mean (µW)       0.021    0.0754   0.8701   0.966
18nm   St. Dev. (µW)   0.021    0.0034   0.0095   0.0209
18nm   3σ/Mean (%)     303.1    13.7     3.3      6.5
13nm   Mean (µW)       0.0318   0.0285   0.364    0.425
13nm   St. Dev. (µW)   0.0368   0.0066   0.0121   0.0341
13nm   3σ/Mean (%)     346.6    69.4     10.0     24.1
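
The leakage share of the total power can be recomputed directly from the mean values in Table 5.1. The following Python sketch does this; the dictionary layout and the function name are illustrative and not part of the original work.

```python
# Mean power values (µW) copied from Table 5.1; the dictionary layout is illustrative.
table_5_1 = {
    "25nm": {"leak": 0.0125, "sc": 0.16, "sw": 1.5, "tot": 1.67},
    "18nm": {"leak": 0.021, "sc": 0.0754, "sw": 0.8701, "tot": 0.966},
    "13nm": {"leak": 0.0318, "sc": 0.0285, "sw": 0.364, "tot": 0.425},
}


def leakage_share_percent(row):
    """Leakage power as a percentage of the total mean power."""
    return 100.0 * row["leak"] / row["tot"]


shares = {tech: leakage_share_percent(row) for tech, row in table_5_1.items()}
for tech, share in shares.items():
    print(f"{tech}: leakage is {share:.2f}% of the total mean power")
```

The share grows by roughly a factor of ten between the 25nm and 13nm generations, even though the absolute leakage power only grows by about 2.5 times, because the switching and short circuit components shrink at the same time.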
Due to variability in the devices, power dissipation becomes a random variable. For
instance, due to the variation in the threshold voltage of devices, the leakage current is
different for different devices on the chip. Similarly, due to the mismatch in the
switching timing of the NMOS and PMOS devices in the inverter, the short circuit power
varies for different inverters. However, there is little effect of device variability on the
switching power. As a result of this behaviour, power dissipation follows a certain
distribution, with statistical data given in Table 5.1. It may be noted that there is a
significant variation in the power dissipation, especially in the leakage component. As can
be seen in Table 5.1, the 3σ variability of leakage power in 13nm inverters reaches 346.6%
of the mean leakage power.
Figure 5.6 shows the histogram of leakage power in 25nm minimum sized repeaters. The 
spread of the distribution is quite evident which means that the leakage power of a large 
number of repeaters is away from the mean value. Therefore, when considering power 
issues (for instance, in optimizing a circuit for power consumption), the complete 
distribution of power dissipation needs to be considered instead of just the mean value. 
More importantly, it is apparent that the distribution is not normal; rather it is quite 
asymmetric about the mean value having positive skewness. This implies that some 
devices will dissipate a far larger amount of power than the mean. 
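
One way to see where such positive skewness can come from is the exponential dependence of sub-threshold leakage on threshold voltage: a symmetric (Gaussian) threshold voltage shift then produces a long-tailed leakage distribution. The Python sketch below illustrates this under that simplifying assumption. The parameter values `SIGMA_VTH` and `SLOPE` are hypothetical and are not taken from the simulations reported here.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical parameters, chosen only for illustration (not taken from the thesis).
SIGMA_VTH = 0.03  # standard deviation of the threshold voltage shift (V)
SLOPE = 0.035     # sub-threshold slope factor times thermal voltage (V)

# Leakage relative to its nominal value: exp(-dVth / SLOPE) for a Gaussian dVth.
samples = [math.exp(-random.gauss(0.0, SIGMA_VTH) / SLOPE) for _ in range(20000)]

mean = statistics.fmean(samples)
median = statistics.median(samples)
stdev = statistics.pstdev(samples)
# Third standardized moment: positive values indicate a long right-hand tail.
skewness = sum((x - mean) ** 3 for x in samples) / (len(samples) * stdev ** 3)

print(f"mean={mean:.3f} median={median:.3f} skewness={skewness:.2f}")
```

In this toy model the mean lies above the median and the skewness is positive, which is the same qualitative shape as the histogram in Figure 5.6.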
5.3.1 Impact on Repeater Inserted Links 
Due to the long tail in the distribution, a significant number of on-chip repeaters will
dissipate an excessively large amount of power. A similar asymmetry has already been observed in the
delay distribution of the repeaters [107]. This is relevant, since an inverse correlation
between the repeater delay and leakage power exists [120], and therefore the simultaneous
optimization for delay and power becomes challenging. The variability in the devices, with
asymmetric distributions of delay and power, has serious implications for the yield of the
chips, as many of the chips would have to be discarded due to unacceptable delays and
many more due to excessive power dissipation. If spatial correlations exist (due to process
issues; not due to RDF), there may be clustering of such highly leaky devices on the
chip, which can further create reliability issues. We have also observed that the skewness in
the leakage power distribution greatly increases with technology scaling, which further
deteriorates the situation. This motivates the use of preventive measures to control
the leakage power in the circuits.
 
Figure 5.6: Histogram of leakage power in 25nm MSIs. The distribution is quite asymmetric about the mean. 
5.3.2 Impact of Repeater Size on Power Dissipation 
In the global interconnect, repeaters of different sizes are used. Therefore, the study has 
been extended to investigate the effect of repeater size on power variability. We have 
chosen repeaters of sizes 1X, 2X, 4X, 8X and 16X, each with a similar repeater connected at
its output to act as the load. HSPICE simulations were performed and the results are shown
in Figure 5.7 for 18nm technology. The error bars represent the uncertainty (corresponding
to 1σ) in the leakage power. It can be seen that the leakage power increases linearly
with repeater size. The uncertainty in the leakage power also increases almost
linearly with the repeater size. We have shown in [107] that increasing the size of the
repeaters reduces the delay uncertainty, but this advantage is not achieved in the case
of power. The normalized leakage power, on the other hand, decreases
with the increase of repeater size.
5.3.3 Impact on NoC links 
In a Network-on-Chip (NoC), links of different widths are designed to achieve a given
throughput and latency. The width of a communication link is usually defined in terms of
the phit size, which determines the number of bits that can be simultaneously transferred
through the link. In many cases the link utilization rates are not constant and can be very
low, just a few percent [121]. Large phit sizes are preferred to meet latency requirements,
but such links also remain idle for most of the time. Thus in such links, leakage power will
be the main contributor to power dissipation. Therefore, a stronger trade-off will have to be
made between power and other performance metrics in the presence of increased
variability.
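
The effect of low link utilization on the leakage share can be estimated with a simple model. The Python sketch below assumes, as a simplification that is not part of the measurements above, that the switching and short circuit power scale linearly with link utilization while the leakage power stays constant. The power values are the 13nm means from Table 5.1, taken here to represent full activity.

```python
# Mean power (µW) of a 13nm minimum sized inverter at 2GHz, from Table 5.1.
P_LEAK, P_SC, P_SW = 0.0318, 0.0285, 0.364


def leakage_fraction(utilization):
    """Leakage share of total power, assuming (as a simplification) that the
    switching and short circuit power scale linearly with link utilization."""
    dynamic = utilization * (P_SC + P_SW)
    return P_LEAK / (P_LEAK + dynamic)


for u in (1.0, 0.5, 0.1, 0.05):
    print(f"utilization {u:4.0%}: leakage share {leakage_fraction(u):.1%}")
```

Under this assumption, leakage already accounts for more than half of the power at a utilization of 5%, which is consistent with the argument that leakage dominates in wide, mostly idle links.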
[Figure 5.7 plot: left axis Leakage Power (microWatt); right axis Leakage Power, Normalized with Ptotal (%)]

Figure 5.7: Effect of repeater size on leakage power. Leakage power and its variability increase with
repeater size.
 
5.4 Power and Area Optimal Repeater Insertion 
5.4.1 Unconstrained Repeater Insertion 
We consider a global interconnect having resistance $r$ and capacitance $c$ per unit length,
inserted with repeaters of equal size at equal distance as shown in Figure 5.8. The whole
interconnect, therefore, consists of $n$ wire-segments, each with a repeater of size $s$ and
interconnect length $l$ (which is the length of the interconnect between any two repeaters).
We assume that the output resistance of a minimum sized repeater in a given technology
generation is $r_s$, the input capacitance is $c_o$, and the output parasitic capacitance is $c_p$. These
values are scaled accordingly for the repeaters of different sizes such that for a repeater of
size $s$, the total output resistance becomes $R_{tr} = r_s/s$, the total input capacitance becomes
$C_L = c_o s$ and the total output parasitic capacitance becomes $C_p = c_p s$. The delay per unit
length corresponding to 50% of the full swing voltage is given by [24], [51]

$$\frac{\tau}{l} = \left(\frac{1}{l}\, r_s (c_o + c_p) + \frac{r_s}{s}\, c + r s c_o + \frac{1}{2}\, r c l\right)\log 2 \tag{5.4}$$
 
 
Figure 5.8: Buffer inserted interconnect. 
The values of $l$ and $s$ which give the optimal delay per unit length are given by [51]

$$l_{opt} = \sqrt{\frac{2 r_s (c_o + c_p)}{r c}} \tag{5.5}$$

$$s_{opt} = \sqrt{\frac{r_s c}{r c_o}} \tag{5.6}$$
 
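Using $l_{opt}$ and $s_{opt}$ in equation (5.4), the optimal delay per unit length is given by

$$\left(\frac{\tau}{l}\right)_{opt} = 2\sqrt{r_s c_o r c}\left[1 + \sqrt{\frac{1}{2}\left(1 + \frac{c_p}{c_o}\right)}\right]\log 2 \tag{5.7}$$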
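
The closed-form optimum can be checked numerically. The Python sketch below implements the delay expression of equation (5.4) and the optimal values of equations (5.5) to (5.7), carrying the $\log 2$ factor of equation (5.4) through to the optimum. The symbol names (`r`, `c`, `r_s`, `c_o`, `c_p`, `l`, `s`) follow the notation assumed in this section, and the parameter values are hypothetical; they are not the ITRS-derived values used in the case study.

```python
import math


def delay_per_unit_length(l, s, r, c, r_s, c_o, c_p):
    """Equation (5.4): 50% delay per unit length for segment length l and repeater size s."""
    return (r_s * (c_o + c_p) / l + r_s * c / s + r * s * c_o + 0.5 * r * c * l) * math.log(2)


def optimal_insertion(r, c, r_s, c_o, c_p):
    """Equations (5.5)-(5.7): optimal segment length, repeater size and delay per unit length."""
    l_opt = math.sqrt(2 * r_s * (c_o + c_p) / (r * c))
    s_opt = math.sqrt(r_s * c / (r * c_o))
    tau_opt = (2 * math.sqrt(r_s * c_o * r * c)
               * (1 + math.sqrt(0.5 * (1 + c_p / c_o))) * math.log(2))
    return l_opt, s_opt, tau_opt


# Hypothetical parameter values in SI units, for illustration only.
params = dict(r=2.0e6, c=2.0e-10, r_s=2.0e4, c_o=1.0e-16, c_p=1.0e-16)
l_opt, s_opt, tau_opt = optimal_insertion(**params)

print(f"l_opt = {l_opt * 1e6:.1f} um, s_opt = {s_opt:.1f}, "
      f"optimal delay = {tau_opt * 1e9:.2f} ps/mm")
```

Evaluating `delay_per_unit_length(l_opt, s_opt, **params)` reproduces `tau_opt`, and moving either `l` or `s` away from its optimal value increases the delay, as expected for a minimum.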
 
5.4.2 Repeater Insertion under Area Constraints 
We consider again the interconnect of Figure 5.8. Let $s_{opt}$ and $l_{opt}$ be the optimal repeater
size and inter-repeater segment length and let $s$ and $l$ be the corresponding values under
some area constraint. Then $l = \alpha l_{opt}$ and $s = \beta s_{opt}$, where $\alpha$ and $\beta$ are taken to be $\alpha \ge 1$
and $0 < \beta \le 1$. The area required for a repeater inserted interconnect of length $L$ is the
sum of the area of all the repeaters inserted at a regular interval of length $l$. The area
occupied by the interconnect itself is not included in the total area because it remains the
same in the area constrained and area unconstrained case. Only the number and size of the
repeaters will be reduced in the area constrained case as shown in Figure 5.9.
 
Figure 5.9: An interconnect between the transmitter and receiver (a), optimal buffer insertion (b), buffer 
insertion under area constraint (c). 
If $A$ is the total buffer area for the area constrained case and $A_{opt}$ is the area for the area-unconstrained case, then we define the area ratio $A_r = A/A_{opt}$, $0 < A_r \le 1$ [116].

$$A_r = \frac{A}{A_{opt}} = \frac{L \cdot a_{eff} \cdot s / l}{L \cdot a_{eff} \cdot s_{opt} / l_{opt}} = \frac{s / s_{opt}}{l / l_{opt}} = \frac{\beta}{\alpha} \tag{5.8}$$
 
Using the values of $l_{opt}$ and $s_{opt}$ from equations (5.5) and (5.6) and performing some simple
mathematical steps, we can find the optimal value of $\alpha$ under the area constraint as
follows.
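$$\alpha_{opt}^{2} = \frac{\sqrt{1 + c_p/c_o} + \sqrt{2}\,A_r^{-1}}{\sqrt{1 + c_p/c_o} + \sqrt{2}\,A_r} \tag{5.9}$$

The value of $\alpha_{opt}$ can be used in equation (5.8) to get the value of $\beta_{opt}$ such that

$$\beta_{opt} = A_r\,\alpha_{opt} \tag{5.10}$$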
 
The speed at which the signal travels through the interconnect of length $L$ in time $\tau$ is the
signal velocity $v$. Under area constraint, the delay per unit length given by equation (5.7)
can be used to derive the optimum value of reciprocal velocity, which can be written as

$$v_{opt}^{-1} = 2\log 2\,\left(r_s r c\right)^{1/2}\left[\left(\sqrt{\frac{c_o + c_p}{2}} + \sqrt{c_o}\,A_r\right)\left(\sqrt{\frac{c_o + c_p}{2}} + \frac{\sqrt{c_o}}{A_r}\right)\right]^{1/2} \tag{5.11}$$
 
Now for any value of the area ratio $A_r$, there will be a combination of $\alpha_{opt}$ and $\beta_{opt}$ (equations (5.9) and
(5.10)) that will give the best possible performance through equation (5.11), as shown in
Figure 5.10.
 
Figure 5.10: Optimal repeater size and inter-repeater segment length (both normalized) for different area 
ratios. 
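
The trade-off between area ratio and delay can be explored numerically. The Python sketch below evaluates the optimal length ratio of equation (5.9) and the size ratio of equation (5.10), and then computes the resulting delay per unit length from equation (5.4), normalized to the unconstrained optimum. The names `alpha` (segment length ratio), `beta` (repeater size ratio) and `area_ratio` reflect the notation assumed in this section, and the value of `CP_OVER_CO` is hypothetical.

```python
import math


def alpha_opt(area_ratio, cp_over_co):
    """Equation (5.9): optimal length ratio alpha for a given area ratio."""
    g = math.sqrt(1 + cp_over_co)
    return math.sqrt((g + math.sqrt(2) / area_ratio) / (g + math.sqrt(2) * area_ratio))


def normalized_delay(alpha, beta, cp_over_co):
    """Delay per unit length from equation (5.4), divided by its unconstrained optimum."""
    a = math.sqrt(0.5 * (1 + cp_over_co))  # relative weight of the length-dependent terms
    return (a * (alpha + 1 / alpha) + (beta + 1 / beta)) / (2 * (a + 1))


CP_OVER_CO = 1.0  # hypothetical ratio c_p / c_o
for area_ratio in (1.0, 0.8, 0.6, 0.4, 0.2):
    alpha = alpha_opt(area_ratio, CP_OVER_CO)
    beta = area_ratio * alpha  # equation (5.10)
    penalty = normalized_delay(alpha, beta, CP_OVER_CO)
    print(f"A/A_opt={area_ratio:.1f}: alpha={alpha:.3f} beta={beta:.3f} delay x{penalty:.3f}")
```

The normalized delay equals one at an area ratio of one and grows slowly at first as the area ratio is reduced, which is the behaviour that makes moderate area savings attractive.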
 
5.4.3 Repeater Insertion under Power Constraint 
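The power dissipation in repeaters $P_{rep}$ consists of three components, namely short
circuit power ($P_{sc}$), switching power ($P_{sw}$) and leakage power ($P_{leak}$), such that the total
power is

$$P_{rep} = P_{sc} + P_{sw} + P_{leak} \tag{5.12}$$

In an interconnect of length $L$ having $n$ uniformly inserted repeaters, the power dissipation
per unit length is given by [24]

$$\frac{P_{rep}}{l} = \left[\alpha\, t_r V_{DD} W_{min} I_{sc} f_{clk} + \alpha (c_o + c_p) V_{DD}^{2} f_{clk} + \frac{1}{2} V_{DD}\left(I_{off_n} W_{n_{min}} + I_{off_p} W_{p_{min}}\right)\right]\frac{s}{l} + \alpha\, c\, V_{DD}^{2} f_{clk} \tag{5.13}$$

In equation (5.13), $\alpha$ denotes the switching activity factor as in [24], which is distinct from the length ratio $\alpha$ of Section 5.4.2. Equation (5.13) can be written as

$$\frac{P_{rep}}{l} = \mu_1 \frac{s}{l} + \mu_2, \tag{5.14}$$

where $\mu_1$ and $\mu_2$ are constants. But $l = \alpha l_{opt}$ and $s = \beta s_{opt}$, therefore,

$$\frac{P_{rep}}{l} = \mu_1 \frac{\beta s_{opt}}{\alpha l_{opt}} + \mu_2 \tag{5.15}$$

This expression shows that in order to reduce power dissipation per unit length (due to
repeaters), the ratio $\beta/\alpha$ will have to be minimized. This will simultaneously reduce the area
because $A/A_{opt} = \beta/\alpha$.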
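
The link between power and area can be made explicit by writing the power per unit length in terms of the area ratio. The following short derivation is a sketch that assumes the notation used in this section: $A_r$ for the area ratio $A/A_{opt}$, $\alpha$ and $\beta$ for the length and size ratios, and $\mu_1$, $\mu_2$ for the constants of equation (5.14).

```latex
\frac{P_{rep}}{l}
  = \mu_1 \frac{s_{opt}}{l_{opt}} \, \frac{\beta}{\alpha} + \mu_2
  = \mu_1 \frac{s_{opt}}{l_{opt}} \, A_r + \mu_2 ,
\qquad
\left.\frac{P_{rep}}{l}\right|_{A_r} - \left.\frac{P_{rep}}{l}\right|_{A_r = 1}
  = -\,\mu_1 \frac{s_{opt}}{l_{opt}} \, (1 - A_r) \le 0 .
```

Under these assumptions the repeater-dependent part of the power falls linearly with the area ratio, while the term $\mu_2$ is unaffected by the choice of repeater size and spacing.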
5.4.4 Communication Reliability 
Reducing the repeater size seems attractive in terms of the silicon area and power savings. 
However, in the deep sub-micron region, reducing the size of the repeaters for area and/or
power savings will increase variability and produce reliability issues in data transmission.
This is because the delay variability is inversely proportional to the size of the repeaters
and the spread in the delay distribution increases with technology scaling [107]. The variability
in the devices and interconnect can produce uncertainty in the arrival times of the signals 
with respect to target values and thus can cause critical data loss. In this section we will 
determine the probability of such a failure in a single line interconnect. 
We again consider Figure 5.8, where at the receiving end of the interconnect, a positive-edge
triggered D-flip-flop (DFF) is used to register the data. For the DFF, let $t_{setup}$ be the
setup time and $t_{prop}$ be the propagation delay from D to Q after the positive clock edge
and $t_{wire}$ be the propagation delay of the interconnect of length $L$. We assume that the link
is operating at a clock frequency $f_{clk}$ having period $T_{clk}$.
For a data bit to meet the desired timing constraint and reach the output of the FF, the
following delay constraint must be satisfied

$$0 \le t_{wire} \le T_{clk} - t_{setup} - t_{prop} \tag{5.16}$$

The probability of correct data transmission can, therefore, be expressed as follows

$$P_c = \Pr\left(0 \le t_{wire} \le T_{clk} - t_{setup} - t_{prop}\right) \tag{5.17}$$

where the clock period $T_{clk}$, wire delay $t_{wire}$, propagation delay $t_{prop}$, and setup time of
the DFF, $t_{setup}$, are random variables. Therefore, the total delay through the interconnect
(from source to receiver output) will also be a random variable. This distribution can be
determined analytically (by considering all possible sources of variability) or through
simulation (as in this work). This relies on accurate characterization of the underlying
distributions. Let $\mu_c$ and $\sigma_c$ be the mean and standard deviation of the resultant pdf of the
timing margin $T_{clk} - t_{setup} - t_{prop} - t_{wire}$, then the probability of correct
data transmission is given by the error function [122]

$$P_c = \frac{1}{2} + \mathrm{erf}\left(\frac{\mu_c}{\sigma_c}\right) \tag{5.18}$$

where $\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_0^{x} \exp\left(-\frac{t^2}{2}\right) dt$,
and the probability of failure for the data bit transmitted through the on-chip
communication channel is then given by

$$P_f = 1 - P_c \tag{5.19}$$
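
Equations (5.18) and (5.19) are straightforward to evaluate numerically. The Python sketch below assumes a Gaussian timing margin with mean `mu_c` and standard deviation `sigma_c`, and it assumes that the error function used in equation (5.18) is the integral of the standard normal density from 0 to x, which corresponds to `0.5 * math.erf(x / sqrt(2))` in terms of the conventional error function. The function names and the example values are hypothetical.

```python
import math


def prob_correct(mu_c, sigma_c):
    """Equation (5.18) for a Gaussian timing margin.

    The integral of the standard normal density from 0 to x is written here
    with the conventional error function as 0.5 * erf(x / sqrt(2)).
    """
    return 0.5 + 0.5 * math.erf(mu_c / (sigma_c * math.sqrt(2)))


def prob_failure(mu_c, sigma_c):
    """Equation (5.19): probability that a data bit is not registered correctly."""
    return 1.0 - prob_correct(mu_c, sigma_c)


# Hypothetical timing margin: mean 30 ps, standard deviation 10 ps.
p_fail = prob_failure(30e-12, 10e-12)
print(f"probability of link failure: {p_fail:.2e}")
```

Since the delay distributions of small repeaters are reported to be asymmetric, a Gaussian margin should be read as a first approximation; the failure probability depends only on the ratio of the mean margin to its standard deviation in this model.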
5.5 Optimization Methodology 
The design objective can be the optimization of area, power or performance, under
the permissible limits of delay variability. These metrics are coupled with each other, so a
trade-off needs to be established. For a particular communication link design, a unique
cost function is established. For this function an optimum configuration is found, which
gives the best results in terms of the given parameters. This optimum is determined
through a standard optimization technique using the trade-off curves connecting these
parameters.
 
5.5.1 Case Study 
We used the Monte Carlo simulation method to perform experiments for this optimization
study under the impact of device variability due to RDF. The interconnect structure is such
that the middle wire under consideration is surrounded by two similar wires. The width of
each wire and the spacing between them were kept at 0.048 µm for the 13 nm and 0.0675 µm
for the 18 nm technology generation. The interconnect parameters were taken from ITRS 2007
[50] and the interconnect capacitances have been derived using the analytical models given in
[45]. The wires are modelled as distributed RC interconnect with 100 ladder-segments. The
variability in the interconnect wires is not considered in this study. The buffer size and
inter-repeater segment length for optimal repeater insertion are $s_{opt}$ = 140 and $l_{opt}$ = 80.67
µm for the 13 nm and $s_{opt}$ = 137 and $l_{opt}$ = 152.1 µm for the 18 nm technology generation. A
supply voltage of 0.9V for 13 nm and 1.0V for 18 nm circuits was used. A large number of
HSPICE simulations (6000) were performed and delay and power measurements were taken
during each run. The delay measurements were made corresponding to 50% of the
maximum swing level. The total power measurement results are based on leakage,
switching and short circuit power components at a frequency of 2GHz.
Based on the simulation data, the delay variability ($\sigma_{delay}/\mu_{delay}$) is determined and
plotted in Figure 5.11(a) for the given technologies. It can be seen that the delay variability
increases rapidly with the decrease of repeater size and also increases with technology
scaling. In the presence of other sources of variability, the delay variability increases to a
greater extent. The dependence of delay variability on $s$ and $l$ is also shown in Figure
5.11(b) for 13 nm technology. It may be noted that delay variability not only increases with
the decrease of repeater size but also with the increase of interconnect segment length. To
use this trade-off in the optimization process, the surface plot can be converted into an
empirical expression using multiple regression techniques. This helps to understand how
the typical value of the dependent variable (for instance, delay variability) changes when
any one of the independent variables ($l$ and $s$) is varied over a particular range.

Figure 5.11: Delay variability as a function of different ratios of repeater size, in the absence of crosstalk (a),
Dependence of delay variability on repeater size and inter-repeater segment length (b).
As we have already seen, in order to reduce area, we need to increase the $l/l_{opt}$ and decrease
the $s/s_{opt}$ ratios according to Figure 5.10 to get the optimum performance for a
particular configuration. The performance degradation due to different values of $s$ and $l$
with respect to $s_{opt}$ and $l_{opt}$ can be predicted using equation (5.11). In Figure 5.12, these
predictions are compared with the simulation results, which match very well at most of
the area ratios. The model, however, deviates slightly from the simulation results at smaller
area ratios. This is because at smaller repeater sizes, the delay distributions deviate from
normality due to RDF [93], [94] and show some asymmetry. Figure 5.12 also shows the
effect of area scaling on delay uncertainty. It can be seen that the standard deviation of the
delay increases almost 3 times with $A/A_{opt}$ = 0.2.
 
Figure 5.12: Comparison between analytical model and simulation results for performance degradation due to 
area scaling. 
 
In Figure 5.13 different trade-off curves have been plotted together to explore different
design choices. The area, power and performance curves show that in order to get ultimate
performance, we will have to consume a significant amount of power and area. However,
with only 4% of performance degradation, we can reduce power dissipation by 30% and
area by 40%. In the presence of variability, an adverse effect of this trade-off is that delay
certainty (defined as the reciprocal of delay variability) will reduce by 24% from the
optimum level. This will increase the probability of failure of the link at a particular
frequency, which can be estimated using equations (5.18) and (5.19). Therefore, the speed of the
link will be limited in order to keep the probability of link failure below some acceptable
limits. This problem will be aggravated in high speed wider links, where the skew amongst
various wires in the link will play a detrimental role in determining its performance. This
effect will be considered in Chapter 6. It becomes evident then that during the optimization
process, the delay variability should also be considered in the figure of merit; otherwise the
yield will be badly affected.
 
Figure 5.13: Performance, area, power and performance certainty trade-off curves. 
5.6 Summary 
In the first part of this chapter we have measured power dissipation in repeaters of the
three given technology generations under RDF. The results show that the relative proportion of
different components of power dissipation is changing and leakage power is emerging as a
serious problem in the design of high performance and power optimal chips. Therefore,
design methodologies should consider individual components of power dissipation along
with the total power. Wider links in NoCs, which are preferred for better latency, will
consume more power due to higher leakage currents at low activity levels.
The variability in the devices, which is affecting the delay characteristics, is also affecting
the distribution of power dissipation. A significant asymmetry has been observed in the
distribution of leakage power and hence, effectively, leakage is increasing more rapidly
than anticipated. This, in turn, is badly affecting the yield. It will be more advantageous to
consider power variability along with delay variability while making different circuit
optimizations. Active countermeasures, such as the use of sleep transistors, could be a
possible solution against leakage power.
In the second part of this chapter, we have analysed the impact of device variability on the
performance of on-chip single bit data links. We emphasize that, due to the increasing trend of
variability, power and area optimal repeater insertion methodologies should also
consider performance variability. Analytic models for area, power, performance and
probability of link failure have been presented in terms of the size of the repeaters and 
inter-repeater segment length. It has been found that beyond a certain reduction in the size 
of the repeaters, the delay variability may exceed acceptable limits while still satisfying 
other constraints. Therefore, while optimizing area, power and performance of on-chip 
communication links, delay (and power) variability should also be included in the figure of 
merit. 
  
Chapter 6                                                         Design of Variability Tolerant Data Channels 
 
Chapter 6 
 
Design of Variability Tolerant Data Channels
 
 
6.1 Inter-Resource Communication 
Different functional units in SoCs communicate with each other through the 
communication infrastructure, consisting of several links. The inter-resource 
communication link usually consists of a large number of parallel interconnects, as shown 
in Figure 6.1(a), which are coupled with each other (RC/RLC) along the length of the 
channel. In a Network-on-Chip (NoC) platform, the functional units are connected to the 
routers through such communication links. Similarly, the routers are also connected with 
each other, in a certain topology, through another group of communication channels, as 
shown in Figure 6.1(b). The communication channels can be wider or narrower in terms of 
the number of lines they contain and this determines the phit size. 
In order to reduce the resistance-capacitance (RC) delay of interconnects, low resistivity 
and low dielectric constant materials are used [124], [125]. Common techniques to reduce
the delay of the global interconnects are the use of repeaters and increasing the width of the
wires [126]. However, increasing the width of the wires may reduce the channel capacity,
as fewer wires can then be accommodated in the same channel width $W_C$. Similarly,
interconnect spacing also affects the delay and bandwidth (since the coupling capacitance
changes with spacing). The literature contains many works on the optimization of
the performance of global interconnects considering different metrics [24], [25], [127]-
[129]. However, most of the literature ignores variability and sources of noise (for
instance, crosstalk) in the proposed optimization techniques.
[Figure 6.1 diagrams: (a) two cores connected by a LINK/CHANNEL, with labels L and W_C; (b) Router and FU (functional unit) blocks]

Figure 6.1: Simple Core-Core link consisting of multiple interconnects (a), Functional unit-Router and
Router-Router links in a Network-on-Chip (b).
As the process dimensions are shrinking to the nanometer region, the impact of variability 
has become extremely critical to the performance of the communication channels. The 
variability is affecting both the device (front-end of the line) and interconnect (back-end of 
the line) [130] resulting in the performance degradation of the whole channel. Moreover, 
under device scaling, leakage power is becoming an important source of power dissipation 
alongside the switching power and therefore channel designers should also consider these
aspects during optimization for a certain parameter. Although some recent work has been
done on the modelling and analysis of the global interconnects with the consideration of
variability [131]-[133], no comprehensive work has been published on the optimization of data
channels under different trade-offs for future technologies, where these effects are quite
prominent.
6.2 Channel Configuration and Modelling 
 
Figure 6.2: Structure of a multi-bit bus, where the number of interconnects in a fixed channel width $W_C$
depends on the interconnect width and spacing. (a) the cross-sectional view showing different dimensions
and (b) the top view of the bus indicating outer and middle lines. The input signals on any two adjacent lines
are opposite in phase, thus simulating the worst case of crosstalk. Each line in the bus can be considered as an
aggressor or victim, as they can affect the performance of each other.
The performance of a data channel strongly depends on its geometry. There are several 
possible configurations of a channel corresponding to different values of interconnect 
width (W), spacing (S), thickness (T) and dielectric thickness (H), as shown in Figure 6.2. 
The variation of these parameters affects the capacitance, resistance and inductance of 
interconnects, which in turn changes the delay and other metrics. Inductance is less of an 
issue for interconnects under consideration due to the reasons mentioned in section 2.4.3. 
Amongst these geometrical parameters, T and H are technology dependent and so the 
designers have only the choice of varying W and S to design a channel for the required 
performance. While designing such channels, these parameters are set at the designed
values. However, these parameters are also affected by process variations, which
change the geometrical dimensions of interconnects. These changes are process
dependent and controllable only to some extent.
6.2.1 Interconnect Resistance 
In a given technology generation, only interconnect width (W) affects the resistance (T and 
H are assumed to be fixed). The interconnect resistance for a given geometry can be 
calculated using equation (2.1). 
6.2.2 Interconnect Capacitance 
The capacitance of an interconnect in the channel comprises the fringe capacitance, the
coupling or mutual capacitance and the parallel plate capacitance. The fringe capacitance and
parallel plate capacitance add up to form the self or ground capacitance. In order to 
investigate the characteristics of the interconnect capacitance, we will use the electrical 
model of [134] as it closely matches the actual situation for interconnects in a bus. 
The capacitance of a global interconnect for 18 nm technology with minimum width has 
been plotted as a function of the spacing between the neighbouring interconnects in Figure 
6.3. The technology parameters have been taken from the ITRS 2007 [50]. The curves 
show that the coupling capacitance quickly drops with the increase of the spacing.
In contrast, the ground capacitance increases with the increase of the spacing. The reason for
the increase of ground capacitance with spacing is not very obvious. The parallel
plate capacitance is not affected by the increase or decrease of the spacing; however, the
fringe capacitance of interconnects (except at the outer edges of the bus) increases with the
increase of the interconnect spacing. This results in the increase of the ground capacitance
with spacing. The effect of spacing on the coupling capacitance is more dominant than on the
ground capacitance and therefore the total capacitance decreases with the increase of the
spacing. Consequently, the signal delay through widely spaced interconnects is less than
through closely spaced interconnects.
 
Figure 6.3: Capacitance curves for minimum width global interconnects of 18nm plotted as a function of 
interconnect spacing. 
The impact of line width variation on the capacitance is shown in Figure 6.4. The total
capacitance increases linearly with the increase of the interconnect width. The main
contributor to this increased capacitance is the parallel plate capacitance, whereas the
coupling capacitance remains almost constant because the spacing is constant.
[Figure 6.4 plot: x-axis Normalized Interconnect Width (W/Wmin), 0 to 20; y-axis Capacitance (fF/mm), 0 to 400; series: Ground Capacitance, Coupling Capacitance, Total Capacitance]
 
Figure 6.4: Capacitance curves for 18nm global interconnects plotted as a function of width at minimum 
interconnect spacing. 
 
Figure 6.5: The total capacitance of an interconnect (not at the outer edge) of a bus in 18 nm technology 
plotted as a function of the interconnect spacing and width. 
A 3D surface plot of the total capacitance of a bus interconnect as a function of the width 
and spacing is shown in Figure 6.5. The surface plot shows that the interconnect 
capacitance is largest for wider interconnects running parallel to each other at shorter inter-
spacing. Both resistance and capacitance affect the delay. 
6.2.3 Interconnect Delay 
We assume that all lines in the channel bus are uniformly coupled with two neighbouring 
aggressor lines. The lines at the extreme edges are, however, coupled with only one line. 
We also assume that the length of the bus is $L$ and all lines in the bus have the same 
designed geometrical dimensions. Let $R$, $C_s$ and $C_c$ be the total interconnect resistance, 
self capacitance and coupling capacitance of each line, respectively. Now for a step input, 
the delay corresponding to the 50% transition level for the middle and outer edge conductors 
in the bus has been approximated by a simple linear model in [135] as 
\[ t_{mid} = R\,(0.4\,C_s + \lambda_i C_c) \tag{6.1} \]
\[ t_{outer} = R\left(0.4\,C_s + \lambda_i \frac{C_c}{2}\right) \tag{6.2} \]
 
In equations (6.1) and (6.2), the coefficient $\lambda_i$ is selected according to the type of 
switching activity in the neighbouring aggressor lines. Six possible cases are given in [135], 
and the corresponding values of $\lambda_i$ are given in Table 6.1. 
Case 1: Both the neighbouring aggressors switch from state 1 to state 0. 
Case 2: One aggressor is quiet and the other switches from state 1 to state 0. 
Case 3: Both the aggressors are quiet. 
Case 4: One aggressor switches from 0 to 1 and the other switches from 1 to 0. 
Case 5: One of the aggressors switches from 0 to 1 and the other remains quiet. 
Case 6: Both the aggressors switch from 0 to 1. 
Table 6.1: Coefficients of the delay model for different switching patterns [135]. 
Case i   Switching pattern   $\lambda_i$   $\mu_i$
1        (a)                 1.51          2.20
2        (b)                 1.13          1.50
3        (c)                 0.57          0.65
4        (d)                 0.57          0.65
5        (e)                 N/A           N/A
6        (f)                 0             0
If the victim line switches from 0 to 1, then Cases 1 and 2 slow down the victim line 
and Cases 5 and 6 speed it up. For the time being, we consider only Case 3 to find the 
reference delay. In this case, equations (6.1) and (6.2) reduce to 
\[ t_{mid} = R\,(0.4\,C_s + 0.57\,C_c) \tag{6.3} \]
\[ t_{outer} = R\,(0.4\,C_s + 0.285\,C_c) \tag{6.4} \]
 
 
Figure 6.6: Propagation delay of the middle interconnect of minimum width of a bus for the given three 
technologies plotted as a function of the spacing between the conductors. 
 
Using these equations, the delay of the bus interconnects has been estimated for the three 
technology generations. The dependence of the delay on interconnect spacing and width has 
also been studied, and the results are shown in Figures 6.6 and 6.7. The curves show that 
the interconnect delay can be reduced by increasing the interconnect width and/or the 
spacing between neighbouring conductors. It is, however, important to note that increasing 
the spacing beyond a certain value brings little further reduction in delay, because the 
coupling capacitance is already small at that spacing. Increasing the spacing beyond this 
point simply wastes chip area. 
 
Figure 6.7: Propagation delay of the middle interconnect of a bus with neighbouring interconnects at 
minimum spacing for the given three technologies plotted as a function of the width of the conductors. 
On the other hand, increasing the width keeps improving the delay over a larger range than 
increasing the spacing does. The decrease in the delay is due to the decrease of the 
interconnect resistance, but at the same time the ground capacitance increases with the 
width. This has a negative effect, as the switching power increases with the capacitance. 
Therefore, there will be some optimum value of the spacing and width that gives the best 
delay performance under given area and/or power constraints. 
6.3 Repeater Insertion 
The delay of a long interconnect can be reduced by inserting repeaters at appropriate 
locations along its length, thus dividing it into small sections. For such a system, the delay 
of each section can be approximated by [135] 
\[ t_{sec} = 0.7\,R_{drv}\,(C_s + C_{drv} + \mu_i\,2C_c) + R\,(0.4\,C_s + \lambda_i C_c + 0.7\,C_{drv}) \tag{6.5} \]
where the coefficients $\mu_i$ and $\lambda_i$ are given in Table 6.1, and $R_{drv}$ and $C_{drv}$ are the 
driver output resistance and capacitance, respectively. 
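The total delay of an interconnect of length $L$ is given by 
\[ t_L = k\left[0.7\,\frac{R_{drv0}}{h}\left(\frac{C_s}{k} + h\,C_{drv0} + \mu_i\,\frac{2C_c}{k}\right) + \frac{R}{k}\left(0.4\,\frac{C_s}{k} + \lambda_i\,\frac{C_c}{k} + 0.7\,h\,C_{drv0}\right)\right] + \frac{t_r}{2} \tag{6.6} \]
where $R_{drv0}$ and $C_{drv0}$ are the output resistance and capacitance of a minimum sized 
repeater, $h$ is the size of the repeaters and $k$ is the number of repeaters inserted in the 
interconnect. $t_r$ refers to the rise time of the signal. The optimal values of $h$ and $k$ are 
obtained by setting the partial derivatives of equation (6.6) with respect to $k$ and $h$ 
to zero: 
\[ \frac{\partial t_L}{\partial k} = 0 \;\Rightarrow\; k_{opt} = \sqrt{\frac{R\,(0.4\,C_s + \lambda_i C_c)}{0.7\,R_{drv0}\,C_{drv0}}} \tag{6.7} \]
\[ \frac{\partial t_L}{\partial h} = 0 \;\Rightarrow\; h_{opt} = \sqrt{\frac{0.7\,R_{drv0}\,C_s + 1.4\,\mu_i\,R_{drv0}\,C_c}{0.7\,R\,C_{drv0}}} \tag{6.8} \]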
 
 
Figure 6.8: Optimum number of repeaters for minimum interconnect delay for different lengths of the global 
interconnect plotted as a function of the interconnect width. The interconnect is of 13 nm technology and the 
spacing between interconnects is Smin. 
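The closed-form optimum of equations (6.7) and (6.8) can be checked numerically against the total delay of equation (6.6). The sketch below is only an illustration: the function names and all component values are hypothetical, `Rd0` and `Cd0` stand for the output resistance and capacitance of a minimum sized repeater, and the additive rise-time term is passed as `t_r` (it does not affect the optimum).

```python
import math

def total_delay(k, h, R, Cs, Cc, Rd0, Cd0, lam, mu, t_r=0.0):
    """Delay of a line split into k sections, each driven by a size-h repeater."""
    driver = 0.7 * (Rd0 / h) * (Cs / k + h * Cd0 + mu * 2.0 * Cc / k)
    wire = (R / k) * (0.4 * Cs / k + lam * Cc / k + 0.7 * h * Cd0)
    return k * (driver + wire) + t_r / 2.0

def optimal_repeaters(R, Cs, Cc, Rd0, Cd0, lam, mu):
    """Closed-form k_opt and h_opt from setting the partial derivatives to zero."""
    k_opt = math.sqrt(R * (0.4 * Cs + lam * Cc) / (0.7 * Rd0 * Cd0))
    h_opt = math.sqrt((0.7 * Rd0 * Cs + 1.4 * mu * Rd0 * Cc) / (0.7 * R * Cd0))
    return k_opt, h_opt

# Hypothetical line and minimum-repeater values with the Case 1 coefficients.
line = dict(R=5e3, Cs=40e-15, Cc=100e-15, Rd0=20e3, Cd0=0.05e-15,
            lam=1.51, mu=2.20)
k_opt, h_opt = optimal_repeaters(**line)
t_min = total_delay(k_opt, h_opt, **line)
print(f"k_opt = {k_opt:.1f}, h_opt = {h_opt:.1f}, delay = {t_min * 1e12:.1f} ps")
```

In a real design the number of repeaters is an integer, so `k_opt` would be rounded and the delay re-evaluated for the neighbouring integer values.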
 
The optimal value of the delay can be obtained by using the values of $h_{opt}$ and $k_{opt}$ in 
equation (6.6). For different interconnect lengths, the optimum number of repeaters 
is plotted in Figure 6.8 as a function of the line width. The number of repeaters which 
minimizes the propagation delay of the signals decreases with the increase of the line 
width for all lengths of the interconnect. The results also show that the maximum line 
length for an interconnect of width 25Wmin which requires no repeater, or only one driver, 
is 0.152 mm. Therefore, for typical interconnect lengths, a large number of repeaters is 
required for optimum signalling (particularly as chip sizes are increasing). 
As we increase the interconnect width for faster signalling, the line capacitance per unit 
length increases. Although fewer repeaters are required to drive wider lines, each repeater 
has to drive a longer and wider section of the interconnect, and therefore a larger 
capacitance. The repeaters must consequently be larger to reduce the overall delay. 
Figure 6.9 shows that the repeater size for optimum delay is an increasing function of the 
interconnect width. 
[Figure 6.9 plot. Vertical axis: hopt (smin).]
 
Figure 6.9: Optimum repeater size for minimum interconnect delay for different interconnect widths (global 
interconnect) for 13 nm technology. The spacing between interconnects is Smin. 
6.4 Bandwidth Estimation 
If $T_p$ is the minimum pulse width that can be transmitted through the channel interconnects 
and correctly registered at the receiving register, then the bandwidth of a single 
interconnect is given by 
\[ B_{wire} = \frac{1}{T_p} \tag{6.9} \]
If $t_r$ is the rise time of the signal from 10% to 90%, then the duration of a good signal is at 
least $3t_r$ [136]. The rise time of the signal can be approximated from the RC time constant 
$\tau$ as $t_r = 2.2\tau$ [6]. Since the 0-50% time is $t_{0.5} = 0.69\tau$, it follows that 
$t_r = 3.188\,t_{0.5}$ and $T_p \approx 9\,t_{0.5}$. The pulse widths of the signals in the bus 
interconnects will then be 
\[ T_{p,mid} = 9\,t_{mid} \tag{6.10} \]
\[ T_{p,outer} = 9\,t_{outer} \tag{6.11} \]
For $N$ conductors in the channel bus, the total bandwidth is given by 
\[ B_{total} = \frac{N - 2}{T_{p,mid}} + \frac{2}{T_{p,outer}} \tag{6.12} \]
 
Equation (6.4) shows that the outer edge wires will offer less delay as compared to the 
middle wires and thus can give larger bandwidth. However, when a complete data word is 
transmitted over all the lines, the early arrival of the data bits travelling on the outer edge 
lines may not be very beneficial until the complete word is registered at the receiver (or 
complex receivers will be required). Therefore, we will estimate the worst case bandwidth 
due to the middle wires. 
\[ B_{total} = \frac{N}{\max\left(T_{p,mid},\, T_{p,outer}\right)} = \frac{N}{T_{p,mid}} \]
 
Figure 6.10: Data rate per wire of a channel bus in 13nm technology plotted as a function of spacing and 
width. 
 
Figure 6.10 shows the possible data rate per wire for a 13nm technology bus plotted as a 
function of the interconnect spacing and width without using repeaters. The plot shows that 
the bandwidth per wire can be increased by increasing wire width and/or spacing. 
6.4.1 Bandwidth as a Function of Length 
The interconnect delay increases with length, both with and without the use of repeaters, 
and this directly impacts the bandwidth. The maximum allowed interconnect length 
corresponding to some desired bandwidth, with and without the use of repeaters, is plotted 
in Figure 6.11. Repeater inserted interconnects provide more bandwidth than interconnects 
without repeaters, and the benefit of repeaters grows at larger interconnect lengths. The 
curves also show that interconnects become slower with technology scaling and provide 
reduced bandwidth for the same length. 
 
Figure 6.11: Maximum allowed interconnect length for a particular bandwidth with and without the use of 
repeaters for the given three technologies. These curves have been plotted for minimum interconnect width 
and spacing. 
6.5 Channel Performance under Variability 
In practical circuits, the performance of the communication links is always affected by the 
device and interconnect variability. Similarly due to the capacitive coupling, the switching 
activity in the neighbouring interconnects affects the delay characteristics of an 
interconnect (crosstalk effects). Therefore, in order to make a realistic estimate of the 
channel performance, both these effects should be considered in the analysis.  
 
In order to study the worst case due to the crosstalk effect on the victim line, we consider 
Case 1 in section 6.2.3 where both the neighbouring aggressor lines switch simultaneously 
in opposite direction with respect to the victim line. This will slow down the victim line 
and thus will reduce its bandwidth. The delay equation (6.6) will then be modified 
accordingly using the appropriate coefficients from Table 6.1 corresponding to Case 1. 
6.5.1 Sensitivity Analysis of the Delay under Variability  
The uncertainties in the communication structures (drivers, interconnects, repeaters, FFs) 
introduce uncertainty in the delay characteristics of the interconnect-buffer system. We will 
study the impact of process variations in the interconnect and statistical device variations on 
the delay performance of the link. Variations in the width, spacing, thickness and ILD 
thickness are taken into consideration. It is assumed that each variation is uniform along 
the whole length of the bus wires. The primary interconnect parameters have been extracted 
from the ITRS 2007 [50] and are given in Table 6.2 along with some device parameters. Since 
the actual levels of interconnect variability are not available from the manufacturing industry 
for the future technology generations, we assume three cases of interconnect variability 
in which the $3\sigma$ percentage variation of the given interconnect dimensions is kept 
at 5%, 10% and 15% of their mean value, corresponding to Case 1, 2 and 3 respectively. We 
also assume that the variation in these parameters follows a Gaussian distribution. 
Table 6.2: Primary interconnect and device parameters based on the ITRS and the device model cards [76], 
[77]. The device parameters are for the uniformly doped devices. 
Technology Generation / Parameters    25nm      18nm       13nm
$W_{min}$ (nm)                        105       67.5       48
$P_{min}$ (nm)                        210       135        96
A/R                                   2.3       2.4        2.5
$T$ (nm)                              241.5     162        120
$H$ (nm)                              241.5     162        120
$\varepsilon_r$                       2.5       2.3        2.1
$\rho$ ($10^{-2}\,\mu\Omega\cdot$m)   2.2       2.2        2.2
$r_s$ ($\Omega$)                      18487     21166      23936
$c_0$ (fF)                            0.1436    0.086592   0.046315
$c_p$ (fF)                            0.0425    0.083741   0.029071
[row label illegible]                 310       310        310
$V_{dd}$ (V)                          1.1       1.0        0.9
The interconnect capacitance and resistance are not statistically independent. Figure 6.12 
shows the relation between interconnect resistance and capacitance for a 5% thickness 
variation in a global interconnect of minimum width. Similarly, Figure 6.13 shows the 
same relation for a variation of 5% in both the width and thickness. Both plots show that 
with the increase of width and thickness, the interconnect resistance decreases but the 
capacitance increases. 
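The negative correlation between resistance and capacitance can be reproduced with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: the capacitance uses crude parallel-plate expressions rather than the analytical models of this work, the spacing is taken as pitch minus width, and the nominal dimensions are read from the 13 nm column of Table 6.2. The function names are hypothetical.

```python
import math
import random

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def line_rc(W, T, H, S, length, rho=2.2e-8, eps_r=2.1):
    """Crude parallel-plate R and C of one line (illustrative model only)."""
    R = rho * length / (W * T)
    C_ground = EPS0 * eps_r * W * length / H
    C_couple = EPS0 * eps_r * T * length / S
    return R, C_ground + 2.0 * C_couple  # two neighbours

def sample_rc(n, three_sigma=0.05, seed=1):
    """Draw n Gaussian samples of W and T (3-sigma = 5% of nominal) at fixed pitch."""
    rng = random.Random(seed)
    W0, T0, H0, pitch, length = 48e-9, 120e-9, 120e-9, 96e-9, 1e-3
    sigma = three_sigma / 3.0
    out = []
    for _ in range(n):
        W = W0 * (1.0 + rng.gauss(0.0, sigma))
        T = T0 * (1.0 + rng.gauss(0.0, sigma))
        out.append(line_rc(W, T, H0, pitch - W, length))
    return out

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

samples = sample_rc(2000)
print(f"corr(R, C) = {pearson(samples):.3f}")
```

Because wider or thicker lines lower R while raising both capacitance components, the printed coefficient is negative, which is the trend visible in the scatter plots.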
 
Figure 6.12: Scatter plot of interconnect resistance and capacitance with thickness variation of 3σ=5% in a 
13nm technology interconnect of 1mm length. 
 
Figure 6.13: Scatter plot of interconnect resistance and capacitance with width and thickness variation of 
3σ=5% in a 13nm technology interconnect of length 1mm. 
The variability in the geometrical dimensions of the interconnect and in the repeaters affects 
the delay in different proportions, and a comparison is shown in Figure 6.14. The bar chart 
shows the impact on the delay of an interconnect of minimum dimensions of a variation of 
$3\sigma$ = 5% in W, S, T and H (separately and all variations together) in the interconnect, 
and of RDF in the repeaters. Interconnects with and without repeaters have been considered. 
The results have been obtained by first transforming the interconnect geometrical variations 
into electrical variations (using the analytical models) and then modelling and simulating 
the interconnect in HSPICE using MC simulations. From the results, it can be inferred that 
the interconnect delay is most sensitive to width and thickness variation. The effect of RDF 
is the smallest of all the sources of variation, owing to the large size of the delay optimal 
repeaters. It may also be noted that interconnects without repeaters are more vulnerable to 
delay variability. Moreover, a small variation in all interconnect parameters together can 
introduce significant variability in the delay. 
[Figure 6.14 bar chart. Categories: W, S, T, H, RDF, ALL; series: Without Repeaters, With Repeaters.]
 
Figure 6.14: Contribution of different parametric variations on the delay of a bus line of length 1mm of 
minimum width and spacing in 13nm technology. 
6.6 Area Constrained Channel Bandwidth 
On-chip area is a precious resource, so on-chip communication channels must use it 
efficiently. During floor-planning, a fixed area is allocated to each link and a particular 
number of lines is fitted into this area. In order to minimize the effects of capacitive 
coupling, shielding wires are also used alongside the signal wires. The shielding wires are 
normally of the minimum width permitted by the technology generation, independent of the 
size of the signal wires. In this way, effective shielding against RC coupling can be 
achieved with minimum area consumption. 
Let -  be the channel width and  be the number of lines, each having width  and 
interspacing (. Then the constraints relating these quantities are approximated by [137] for 
the shielded and unshielded wires respectively. 
Chapter 6                                                         Design of Variability Tolerant Data Channels 
 
121 
 
-     L 1
(                                                             6.13
 -  xCD@?   L 1
2(  xC>?=
                           6.14
 
In the following sections we will explore the impact of variability on channel performance 
under a fixed channel width $W_c$ for the channel without shielding wires. 
6.6.1 Experimental Setup and Simulation Results 
We consider a channel bus consisting of 128 lines connecting two cores or NoC routers. 
The physical width of the channel is assumed to be $W_c = 128 \times P_{min}$, where $P_{min}$ is the 
minimum allowable pitch in the given technology generation. The wires have been 
considered as parallel global copper traces placed over a ground plane. Interconnect 
geometrical and material parameters have been extracted from the ITRS 2007 and are given in 
Table 6.2 along with the device parameters. The interconnects have been designed with 
and without the use of repeaters. Variability in the devices due to RDF, and variability in 
the width, spacing, thickness and ILD thickness of the interconnects, have been 
considered. Again we consider the following three cases of interconnect variability: 
Case 1: $3\sigma_W = 5\%$, $3\sigma_S = 5\%$, $3\sigma_T = 2\%$, $3\sigma_H = 5\%$, $3\sigma_{dev}$: [illegible] 
Case 2: $3\sigma_W = 10\%$, $3\sigma_S = 10\%$, $3\sigma_T = 2\%$, $3\sigma_H = 10\%$, $3\sigma_{dev}$: [illegible] 
Case 3: $3\sigma_W = 15\%$, $3\sigma_S = 15\%$, $3\sigma_T = 2\%$, $3\sigma_H = 15\%$, $3\sigma_{dev}$: [illegible] 
These values are with respect to the minimum interconnect dimensions in the 
corresponding technology. It is also assumed that the variability follows a Gaussian distribution. 
The repeaters have been constructed using the model card libraries with RDF effects. The 
bus length is taken to be 1mm in this study and worst case crosstalk effects (aggressor lines 
switch in opposite direction with reference to the victim line) are considered. 
The objective of this study is to explore the channel configuration which gives optimum 
performance under the impact of variability and its relation with the power and area. For 
this purpose several experiments were designed and extensive Monte Carlo simulations 
performed to get the results. The circuit netlists were generated and HSPICE simulations 
were performed until convergence (~6000 simulations in each case). In order to simulate 
the distributed nature of interconnects, each wire has been made up of 250 ladder 
segments. 
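A simple way to see why a few hundred ladder segments are enough to represent a distributed line is the Elmore delay of a uniform RC ladder. This is a standard first-moment estimate and not the HSPICE 50% delay used in this study; the sketch assumes each segment is an L-section (series resistance followed by a shunt capacitance), and the function name is illustrative.

```python
def elmore_ladder(R_total, C_total, n_segments):
    """Elmore delay at the far end of a uniform RC ladder of n L-sections."""
    r = R_total / n_segments
    c = C_total / n_segments
    # The capacitor at node j (1-based) is charged through j series resistors.
    return sum(j * r * c for j in range(1, n_segments + 1))

# Normalised line (R = C = 1): the result is a fraction of the product R*C.
for n in (1, 10, 250):
    print(n, elmore_ladder(1.0, 1.0, n))
```

The sum equals R·C·(n + 1)/(2n), so a single lumped section gives R·C while 250 sections give 0.502·R·C, within 0.4% of the distributed limit of R·C/2.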
6.6.2 Results 
Here we present results for Case 1 of variability for 13nm technology, as the results for the 
other cases are similar. 
 
6.6.2.1 Delay 
The mean delay, the standard deviation and the delay variability of the channel bus at 
different values of the interconnect width and spacing are shown in Figures 6.15, 6.16 and 
6.17 respectively. The actual data is given in Tables A.1, A.2 and A.3 respectively in 
Appendix A. As expected, the delay decreases with the increase of both the interconnect 
spacing and width. However, under the same channel width, increasing the interconnect 
width improves the delay more than increasing the spacing. In the same way, the standard 
deviation and delay variability decrease more rapidly with the increase of the interconnect 
width than with the spacing. 
 
Figure 6.15: Mean delay (in picoseconds) of interconnects (without repeaters) in the channel bus of 13 nm for 
different geometrical configurations under variability Case 1.  
 
Figure 6.16: The standard deviation (in picoseconds) of the delay of interconnects (without repeaters) in the 
channel bus of 13nm for different geometrical configurations under variability Case 1. 
 
 
Figure 6.17: Delay variability (%) of interconnects (without repeaters) in the channel bus of 13nm for 
different geometrical configurations under variability Case 1. 
Simulations were also performed to find the performance of the channel with optimal 
repeaters inserted. The size and number of repeaters depend on the geometrical 
dimensions of the interconnect (width, spacing, etc.) and the parameters of the minimum sized 
repeater in a given technology. For 13 nm technology, the number and size of the repeaters 
per unit length of the interconnect are shown in Figures 6.18 and 6.19, respectively. The 
corresponding data is given in Tables A.4 and A.5 respectively. 
 
Figure 6.18: The number of repeaters per unit length required for different interconnect dimensions (width 
and spacing) for a 13 nm bus under worst crosstalk. The numbers have been rounded-off. 
 
The mean delay, the standard deviation and the delay variability have been measured for 
different configurations of the bus inserted with repeaters, and the results are shown in 
Figures 6.20, 6.21 and 6.22 respectively. The corresponding data is given in Tables A.6, A.7 
and A.8 respectively. The results show that the delay of interconnects improves with the 
insertion of the repeaters, as expected. More importantly, the delay variability also 
decreases as compared to the case when repeaters are not used. 
 
Figure 6.19: The size of the repeaters for different interconnect dimensions (width and spacing) for a 13 nm 
bus under worst crosstalk. The repeater sizes have been rounded-off. 
 
Figure 6.20: Mean delay (in picoseconds) of interconnects (with repeaters) in the channel bus of 13nm for 
different geometrical configurations under variability Case 1. 
 
 
 
Figure 6.21: The standard deviation (in picoseconds) of the delay of interconnects (with repeaters) in the 
channel bus. 
 
Figure 6.22: Delay variability (%) of interconnects (with repeaters) in the channel bus of 13nm for different 
geometrical configurations under variability Case 1. 
6.6.2.2 Bandwidth 
Using the data of Tables A.1 and A.6, the bandwidth of the individual lines of the bus has 
been calculated without and with repeaters, and the results are shown in Figures 6.23 and 
6.24 respectively. The corresponding data is given in Tables A.9 and A.10 respectively. The 
results clearly show that the bandwidth can be increased by increasing the width of the 
interconnect and/or by increasing the spacing between interconnects. Moreover, the 
insertion of repeaters further increases the bandwidth. 
 
 
Figure 6.23: Bandwidth of the individual interconnect lines (without repeaters) in Gb/s given as a function of 
the interconnect width and spacing for 13 nm. 
 
Figure 6.24: Bandwidth of the individual interconnect lines (with repeaters) in Gb/s given as a function of the 
interconnect width and spacing for 13 nm. 
For a channel link, it is important to determine the total bandwidth which it can support. 
When the area is unconstrained, the configuration of the bus interconnects which gives the 
maximum bandwidth of the individual lines (large values of W and S) is used to obtain the 
maximum total bandwidth for a particular channel width (number of lines). This may, 
however, occupy a very large chip area. In actual designs, only a limited area budget is 
allocated for the channel links, so the bandwidth will be less than in the unconstrained 
case. Hence some optimization is required to obtain the best possible bandwidth within the 
available area budget. During this optimization, the delay variability and the power 
dissipation must also be considered, because these quantities may become worse while 
looking for the configuration which gives the best bandwidth. 
In this study, we have explored the geometrical space of interconnects to find the 
configuration which gives the optimum total bandwidth under a channel area constraint. The 
total bandwidth has been calculated using equation (6.12), where the value of $N$ has been 
computed from equation (6.13) for different values of $W$ and $S$. The results are shown in 
Figure 6.25 (without repeaters) and Figure 6.26 (with repeaters), and the corresponding data 
is given in Table A.11 and Table A.12. 
 
Figure 6.25: Total bandwidth (Gb/s), without repeaters, plotted as a function of interconnect width and 
spacing. 
 
Figure 6.26: Total bandwidth (Gb/s), with repeaters, plotted as a function of interconnect width and spacing. 
 
From the results, it can be seen that there is a clear optimum point which gives the 
maximum total bandwidth. For the channel bus without repeaters, this point 
corresponds to $W = 4W_{min}$ and $S = 2S_{min}$. Similarly, for the channel bus with 
repeaters, the optimum bandwidth is achieved at $S = S_{min}$ and $W = W_{min}$. 
6.6.2.3 Power Dissipation 
The total (switching) power dissipation in the channel bus at maximum throughput for each 
bus configuration is given in Table A.13 (for the bus without repeaters) and in Table 
A.14 (for the bus with repeaters). The results are shown in Figures 6.27 and 6.28. The power 
dissipation increases with the interconnect width due to the increased wire capacitance. 
The additional power dissipated in the repeaters can also be observed from the results. It 
is important to note that this power dissipation corresponds to the maximum bandwidth of 
the channel. 
 
Figure 6.27: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm technology 
without repeaters. 
The cost of data transfer in terms of power consumption is measured as the total bandwidth 
per unit power and is shown in Figure 6.29 (see Table A.15) for the repeater inserted case. 
It can be inferred from the results that transferring data from one point to the other through 
widely spaced interconnects is cheaper in terms of power consumption. This cost is 
different for different channel configurations. 
 
 
Figure 6.28: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm technology with 
repeaters. 
 
Figure 6.29: Total bandwidth per unit power (Gb/s.mW) consumption for interconnects with repeaters. 
6.6.2.4 Area 
The area consumed by interconnects and repeaters in the channel bus is given by 
\[ Area_{nr} = W_{wire}\, L\, N_{lines} \tag{6.15} \]
\[ Area_{r} = W_{wire}\, L\, N_{lines} + N_{opt}\, S_{opt}\, L_{eff}\, s_{min}\, L\, N_{lines} \tag{6.16} \]
where 
$Area_{nr}$ = total area when repeaters are not used, 
$Area_{r}$ = total area when repeaters are used, 
$W_{wire}$ = wire width, 
$L$ = bus length, 
$N_{lines}$ = number of interconnect lines in the channel, 
$N_{opt}$ = number of optimal repeaters per unit interconnect length, 
$S_{opt}$ = size of the optimal repeaters, 
$L_{eff}$ = effective gate length, 
$s_{min}$ = width of a minimum sized repeater. 
 
Figure 6.30: Surface plot of the area consumed by the channel bus interconnects, with and without repeaters. 
Figure 6.30 shows that maximum area is required when we use wider wires at minimum 
spacing. The area required with repeater insertion is larger than the case when no repeaters 
are used. However, the major portion of the area is consumed by the wires. Figure 6.30 
may be compared with Figures 6.15 and 6.20 to see the relation between performance and 
area cost. 
6.7 Optimization under Different Trade-offs 
An ideal data channel is expected to give the maximum bandwidth, small latency per unit 
length and minimum uncertainty in the arrival times of the signals at the receiver with 
minimum area and power costs. However, there are trade-offs between delay performance, 
 
bandwidth, area and power. Therefore the aim of an optimization study can be the 
maximization of one or more parameters.  
We define a figure of merit $F$ to achieve the most desired objectives 
\[ F = \frac{B_T}{D \times P \times A \times V} \tag{6.17} \]
where $B_T$ is the total bandwidth, $D$ is the delay, $V$ is the delay variability, $P$ is the 
power dissipation and $A$ is the area. For the repeater inserted interconnect, the figure of 
merit $F$ is shown in Figure 6.31. Again one can find an optimum interconnect configuration 
for the maximum figure of merit. For instance, for the channel configuration under 
consideration, the optimum value of $F$ corresponds to $S = 4S_{min}$ and $W = 5W_{min}$. 
 
Figure 6.31: The figure of merit $F$ plotted as a function of spacing and width for the repeater inserted 
interconnect. 
6.8 Failure of Channels under Variability 
During the optimization of the channel, the magnitude of the delay variability should also 
be considered in conjunction with other parameters like delay, power and area. In a 
sequential channel link, the data from the transmitter moves through the interconnect lines 
to the receiver simultaneously with a common synchronous clock, as shown in Figure 6.32. 
As we have seen, variability in the devices and interconnects introduces delay variability; 
this will produce data skew at the receiving end of the channel. The skew beyond a certain 
 
acceptable limit can result in data loss. This may also result in timing failures, as the data 
may not be properly latched at the receiving register. 
Let x>8¹Mr  be the setup time of the  L <¶ flip-flop, ýCF>r  be the delay of the  L <¶ 
interconnect line, the clock frequency keEl  and eEl  be the clock period. 
[Figure 6.32 diagram. Data lines b1, b2, b3, ..., bN-1, bN (generic line bi); clock CLK.]
 
Figure 6.32: A multi-bit communication link. Tapered buffers have been used on the transmission side, 
whereas flip-flop registers have been used at the receiving end. 
For proper latching of the data bit, the following delay constraint must be satisfied 
\[ 0 \le t_{wire,i} \le T_{CLK} - t_{setup,i} \tag{6.18} \]
The probability of correct data transmission can, therefore, be expressed as follows 
\[ P_c = \Pr\{0 \le t_{wire,i} \le T_{CLK} - t_{setup,i}\} \tag{6.19} \]
Since $t_{wire,i}$, $T_{CLK}$ and $t_{setup,i}$ are random variables, $y = t_{wire,i} + t_{setup,i} - T_{CLK}$ 
will also be a random variable with a p.d.f. $P(y) = P(t_{wire,i}) * P(t_{setup,i}) * P(-T_{CLK})$, 
where (*) is the convolution operator. If the flip-flops used in the receiving 
register are of large size, the timing distribution of the setup time will be Normal. 
Similarly, due to the sufficiently large size of the optimal repeaters (see Table A.4), the delay 
distribution of the repeaters will also be Normal. The delay distribution of 
interconnects under variability is also assumed to be Normal. Then 
\[ \mu_y = \mu_{wire} + \mu_{setup} - \mu_{CLK} \tag{6.20} \]
\[ \sigma_y^2 = \sigma_{wire}^2 + \sigma_{setup}^2 + \sigma_{CLK}^2 \tag{6.21} \]
 
and the probability of correct data transmission is given by the error function [20] 
 
º  12  erf #H!!$                                                      6.22
 
where 
erfÊ
  1√2Í exp L <&2I% 
<                                        6.23
  
The probability of failure for one data bit transmitted through the interconnect is given by 
bxCD?>  1L º                                                          6.24
 
In a multi-bit link consisting of  channel lines, if the signal timing does not meet the 
target value in one or more lines, the communication link fails. So the probability of failure 
in such a link is given by 
b¹?8CM?>  1 L ºo                                              6.25
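The failure model of equations (6.19) to (6.25) can be sketched numerically as follows. This is a minimal illustration, not the procedure used to produce Table A.16. It assumes Normal timing distributions, ignores the lower bound $0 \le t_{wire_i}$ (negative delays are taken to have negligible probability), and evaluates $\Pr(\tau \le 0)$ with the standard normal CDF built from Python's conventional `math.erf`, which is scaled differently from the function defined in (6.23). The function names, the wire delay, its spread and the clock period are hypothetical; only the setup-time statistics are taken from the text.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written with the conventional math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def link_failure(mu_wire, sd_wire, mu_setup, sd_setup, t_clk,
                 sd_clk=0.0, n_lines=1):
    """Return (p, P_single, P_multiple) for a link of n_lines wires."""
    # tau = t_wire + t_setup - T_CLK; a bit is latched correctly when tau <= 0.
    mu_tau = mu_wire + mu_setup - t_clk                   # eq. (6.20)
    sd_tau = sqrt(sd_wire**2 + sd_setup**2 + sd_clk**2)   # eq. (6.21)
    p = normal_cdf(-mu_tau / sd_tau)                      # Pr(tau <= 0)
    return p, 1.0 - p, 1.0 - p**n_lines                   # eqs. (6.24), (6.25)


# Setup statistics are the values quoted in the text (ps); the wire delay,
# its spread and the clock period are made-up illustration values (ps).
p, p_single, p_multiple = link_failure(
    mu_wire=80.0, sd_wire=2.0, mu_setup=12.1, sd_setup=0.15,
    t_clk=100.0, n_lines=128)
print(p, p_single, p_multiple)
```

Because $P_{multiple} = 1 - p^N$, the link failure probability grows with the number of lines even when the per-line probability is unchanged, which is the effect discussed for wide links.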
 
As we have seen, the magnitude of the delay variability also depends upon the channel configuration (width, spacing). This directly impacts the link failure probability unless the operating frequency of the link is reduced. The maximum frequency at which a link can operate depends upon the delay of the interconnects. Table A.6 gives the delay and Table A.7 gives the standard deviation of the associated delay variability as a function of the interconnect width and spacing. At the receiving end of the channel, flip-flop registers have been used with setup time $\mu_{setup} = 12.1$ ps and $\sigma_{setup} = 0.15$ ps. The probability of failure (PoF) has been calculated using equation (6.24) at 5% below the maximum possible frequency of the link for each geometrical configuration and the results are given in Table A.16. The results show that the PoF is highest for S=1X and W=1X, due to the large variability in this configuration, and decreases with an increase in width and/or spacing. The operating frequency of the link and the PoF depend on the delay and delay variability as shown in Figure 6.33.
As the channel width (number of lines) increases, the PoF increases as governed by equation (6.25). For an area-constrained channel, the PoF given in Table A.17 has been obtained from equation (6.25) by considering the possible number of lines in the given area. The results show that the PoF is extremely large for the wider links. Therefore, while optimizing a channel for any of the parameters, the PoF should also be considered in the figure of merit. Otherwise, the target channel speed, and hence the target operating frequency, will not be met. This is obviously undesirable for high performance designs.
 
Figure 6.33: Probability of link failure as a function of operating frequency. 
6.9. Channel Serialization 
The channel width (bit-width) determines the size of the physical transfer unit (phit) or 
vice versa. The data packet is accordingly divided into smaller units and transmitted 
through the on-chip communication network. If the bit-width of a processing unit (PU) is larger than the phit size of the channel, serialization will be required by a factor of
$$\text{Degree of serialization} = \frac{\text{PU bit-width}}{\text{Channel phit size}}$$
The throughput is the average rate of successful data transmission over a communication channel. The throughput is usually less than the bandwidth, which is the maximum capacity of a channel. In a throughput-centric design, the channel can be designed in such a way that the desired throughput requirements are achieved at optimum power and area consumption. In this section, we investigate the effect of channel serialization on throughput, area and power consumption.
6.9.1 Concept 
The power dissipation in a repeater-interconnect system is given by
$$P_{rep\text{-}int} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} \tag{6.26}$$

The switching power is the most dominant component of power dissipation. It depends strongly on the interconnect capacitance (along with the size and input/output capacitances of the driver) and is governed by the following expressions
$$P_{switching\text{-}wr} = (c_s + 2c_c)\, L\, V_{dd}^2\, f_{clk}\, N_{lines} \tag{6.27}$$

$$P_{switching\text{-}rep} = \alpha \left[ s\,(c_p + c_o) + l\,(c_s + 2c_c) \right] V_{dd}^2\, f_{clk}\, N_{opt}\, L\, N_{lines} \tag{6.28}$$

where
$P_{switching\text{-}wr}$ = switching power of the bus without repeaters,
$P_{switching\text{-}rep}$ = switching power of the bus with repeaters,
$c_o$ = input capacitance of the repeater,
$c_p$ = output parasitic capacitance of the repeater,
$c_s$ = self capacitance per unit length of the interconnect,
$c_c$ = coupling capacitance per unit length of the interconnect,
$l$ = interconnect length between repeaters,
$L$ = total interconnect length,
$N_{opt}$ = number of optimal repeaters per unit length,
$\alpha$ = switching activity,
$f_{clk}$ = clock frequency,
$N_{lines}$ = number of lines in the bus.
Equations (6.27) and (6.28) indicate that, in order to reduce bus power, the coupling capacitance (the principal component of the bus capacitance) should be reduced. This is possible by increasing the spacing between interconnects, which means that the bit-width has to be reduced in an area-constrained design. This motivates the use of serial links.
6.9.2 Channel Structure 
The conceptual diagram of a serial data channel is given in Figure 6.34. Multi-bit parallel data (having U bits) from the computational unit (or a router in a NoC) is transformed into serial data (having V bits) using a special unit called the serializer. The degree of serialization is defined as $D = U/V$. The serial data moves through interconnects which are widely spaced as compared to the parallel case. Before entering the receiver, the serial data is converted back into U bits of parallel data through a special unit called the de-serializer. In this way, the serializer and de-serializer provide an interface between the computational units and the link. The serializer is based on a chain of multiplexers in conjunction with flip-flops as shown in Figure 6.35. A more intelligent serializer and deserializer (SerDes) is shown in Figure 6.36.
 
Figure 6.34: Structure of a semi-serial communication channel. 
 
 
Figure 6.35: Conventional shift-register type SerDes. 
 
Figure 6.36: Wave front train Serializer and Deserializer [138]. 
The throughput of the parallel link, $TP_{par}$, and of the serial link, $TP_{ser}$, can be calculated as follows
$$TP_{par} = f_{par} \times D\,V \tag{6.29}$$
$$TP_{ser} = f_{ser} \times V \tag{6.30}$$

To obtain the same throughput from the serial link as that of the parallel link
$$f_{ser} = D\, f_{par} \tag{6.31}$$

Therefore the serial bus will have to operate $D$ times faster than the parallel bus.
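As a small worked check of equations (6.29) to (6.31), the sketch below computes the serialization degree and the serial clock frequency needed to match the parallel throughput. The function names are illustrative, and the inputs (128 parallel lines at 0.5174 GHz, serialized onto 64 lines) are the unrepeated-bus figures listed in Table 6.3.

```python
def serialization_degree(u_bits, v_bits):
    """Degree of serialization D = U / V."""
    return u_bits / v_bits


def throughput_gbps(freq_ghz, n_lines):
    """Throughput of a link in Gb/s: one bit per line per clock cycle."""
    return freq_ghz * n_lines


u_bits, v_bits = 128, 64          # parallel and serial bit-widths
f_par = 0.5174                    # parallel clock frequency in GHz

degree = serialization_degree(u_bits, v_bits)
f_ser = degree * f_par            # eq. (6.31): same throughput on fewer wires

print(degree, f_ser)
print(throughput_gbps(f_par, u_bits), throughput_gbps(f_ser, v_bits))
```

Both links deliver about 66.23 Gb/s, with the serial link clocked twice as fast as the parallel one.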
The total power dissipation in a parallel and a serial link is given by
$$P_{par\text{-}link} = P_{drivers} + P_{rep}$$
$$P_{ser\text{-}link} = P_{drivers} + P_{rep} + P_{SerDes} \tag{6.32}$$

Note that the power dissipation in parallel links does not include the power dissipation in the SerDes (Serializer-Deserializer). The power dissipation in the repeaters, drivers and SerDes is mainly due to switching and leakage power. The short-circuit power makes a relatively small contribution to the total power during bus operation and can therefore be neglected.
Using equation (6.28), the switching power dissipation in a repeater-inserted parallel link is given by
$$P_{switching\text{-}par} = \alpha \left[ s\,(c_p + c_o) + l\,(c_{s\text{-}par} + 2c_{c\text{-}par}) \right] V_{dd}^2\, f_{par}\, N_{opt}\, L\, U \tag{6.33}$$

and in a repeater-inserted serial link by
$$P_{switching\text{-}ser} = \alpha \left[ s\,(c_p + c_o) + l\,(c_{s\text{-}ser} + 2c_{c\text{-}ser}) \right] V_{dd}^2\, f_{ser}\, N_{opt}\, L\, V \tag{6.34}$$

Equations (6.33) and (6.34) show that the switching power of a serial bus is less than that of the parallel bus by a factor $c_{c\text{-}par}/c_{c\text{-}ser}$.
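The comparison between equations (6.33) and (6.34) can be illustrated with a short sketch. The reading of the symbols (repeater size, segment length, repeaters per unit length) follows the form of equation (6.28) and is an assumption, and every numerical value below is hypothetical and chosen only for illustration. At equal throughput the product of frequency and line count is the same for both links, so the power ratio reduces to the ratio of the switched capacitances per segment. It approaches $c_{c\text{-}par}/c_{c\text{-}ser}$ only when the coupling capacitance dominates.

```python
def switching_power_rep(alpha, rep_size, c_p, c_o, seg_len, c_s, c_c,
                        vdd, freq, n_opt, total_len, n_lines):
    """Switching power of a repeater-inserted bus, in the form of eq. (6.28)."""
    # Capacitance switched per repeater segment: repeater plus wire.
    cap_per_segment = rep_size * (c_p + c_o) + seg_len * (c_s + 2.0 * c_c)
    return (alpha * cap_per_segment * vdd**2 * freq
            * n_opt * total_len * n_lines)


# Hypothetical values: capacitances in F (wire values per mm), lengths in mm.
common = dict(alpha=0.15, rep_size=40.0, c_p=0.05e-15, c_o=0.05e-15,
              seg_len=0.5, c_s=20e-15, vdd=1.0, n_opt=2.0, total_len=1.0)
f_par, u_bits, degree = 1.0e9, 128, 2

# Parallel link: U lines at f_par with tightly coupled wires.
p_par = switching_power_rep(c_c=90e-15, freq=f_par, n_lines=u_bits, **common)
# Serial link: U/D lines at D*f_par with widely spaced (weakly coupled) wires.
p_ser = switching_power_rep(c_c=30e-15, freq=degree * f_par,
                            n_lines=u_bits // degree, **common)

print(p_par, p_ser, p_par / p_ser)
```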
6.9.3 Experimental Results 
The performance of parallel and serial buses constrained to the same total width and operating at the same throughput has been calculated and the results are given in Table 6.3. The results show that for $D = 2$ (W=Wmin, S=3Smin) the power dissipation decreases by 55.21% and 47.05% for the bus with and without repeaters respectively. Excluding the area of the SerDes, the area of the serial bus is 34.57% and 33.21% less than the area of the corresponding parallel buses. Although delay variability is only a weak function of interconnect spacing, the serial bus shows smaller variability effects than the parallel bus. Similarly, a serial bus will also be less vulnerable to crosstalk effects due to the increased interconnect spacing. An additional advantage of serial links is the minimization of skew between different lines of the link due to the reduced number of wires. The operational duty of a parallel link is lower than that of a serial link, and therefore leakage power becomes a significant portion of the total power in parallel links. A serial bus therefore also reduces leakage power.
Table 6.3: Performance of a parallel and a serial bus of degree 2 for the same throughput

Parameters                 Parallel bus               Serial bus of degree 2
                           Without       With         Without       With
                           repeaters     repeaters    repeaters     repeaters
Interconnect width         1Wmin         1Wmin        1Wmin         1Wmin
Interconnect spacing       1Smin         1Smin        3Smin         3Smin
Number of interconnects    128           128          64            64
Throughput (Gb/s)          66.23         154.8        66.23         154.8
Frequency (GHz)            0.5174        1.2093       1.0349        2.4194
Power dissipation (W)      0.008499      0.014289     0.0045        0.0064
Area (mm^2)                0.02511       0.05229      0.01677       0.03421
Delay variability (%)      18.16         14.35        10.13         7.39
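The percentage savings quoted in the text follow directly from the power and area rows of Table 6.3. The short sketch below recomputes them; the helper name and dictionary keys are illustrative only.

```python
def percent_reduction(parallel, serial):
    """Relative saving of the serial bus with respect to the parallel bus."""
    return 100.0 * (parallel - serial) / parallel


# (parallel, serial) pairs copied from Table 6.3.
table_6_3 = {
    "power_without_repeaters_W": (0.008499, 0.0045),
    "power_with_repeaters_W": (0.014289, 0.0064),
    "area_without_repeaters_mm2": (0.02511, 0.01677),
    "area_with_repeaters_mm2": (0.05229, 0.03421),
}

for name, (par, ser) in table_6_3.items():
    print(f"{name}: {percent_reduction(par, ser):.2f}% lower for the serial bus")
```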
By considering all possible geometrical configurations of the bus (the space spanned by W and S), we can explore different possibilities which give the best performance for a particular parameter, and the serialization degree may be ascertained accordingly. The extreme case of serialization is the conversion of a multi-bit link to a single wire link. For instance, a serialization degree of 1, 1.5, 2.0, ..., 4.5 can be obtained either by increasing the interconnect spacing from 1Smin to 8Smin (keeping the width constant at Wmin) or by increasing the width from 1Wmin to 8Wmin (keeping the spacing constant at Smin). If we want to operate the link at a bandwidth of 87.9 Gb/s, the channel performance in the two cases will be different, as shown in Figure 6.37. The results show that the serial bus using wide interconnects is efficient in terms of signal speed and delay variability and inefficient in terms of power and area, as compared to the bus with widely spaced interconnects. Also observe the reduction in bandwidth capacity in the two cases. Therefore, depending upon the metrics of interest and the constraints, the channel configuration for serialization can be selected.
Figure 6.37: Different performance metrics for a bus with different serialization ratios (1, 1.5, 2.0…, 4.5 
corresponding to S= 1Smin to 8Smin or W= 1Wmin to 8Wmin), (a) by increasing spacing and keeping width 
constant, (b) by increasing width and keeping spacing constant. 
6.10 Link Utilization and Power Dissipation 
As we have already seen, leakage power increases significantly with technology scaling, especially in circuits where the activity level is low. In SoC and NoC designs, different link types with various utilization rates are used, and these rates can be as low as a few percent [121]. It has been reported that the average activity level of microprocessor nets is 4.5% [123]; however, some links may operate at higher utilization rates approaching 100%.
In order to investigate the impact of link utilization on the total power consumption, we 
have considered two types of links (S=Smin, W=Wmin and S=Smin, W=5Wmin) and 
contribution of leakage power in the total power dissipation has been measured 
corresponding to different link utilization rates. The results are shown in Figure 6.38. It can 
be seen that contribution of the leakage power in the total power dissipation increases as 
the link utilization rates reduce. The leakage power becomes the dominant source of power 
dissipation at very low link utilization rates. 
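The trend described above can be reproduced with a very simple model. The sketch below assumes that dynamic power scales linearly with link utilization while leakage power stays constant. This is a simplifying assumption and not the simulation setup behind Figure 6.38, and the power values are hypothetical.

```python
def leakage_fraction(utilization, p_dynamic_full, p_leakage):
    """Share of leakage in total power at a given link utilization (0..1]."""
    p_dynamic = utilization * p_dynamic_full   # assumed linear in utilization
    return p_leakage / (p_leakage + p_dynamic)


# Hypothetical link: 9 mW dynamic power at 100% utilization, 1 mW leakage.
for utilization in (1.0, 0.5, 0.1, 0.045, 0.01):
    share = leakage_fraction(utilization, p_dynamic_full=9e-3, p_leakage=1e-3)
    print(f"utilization {utilization:6.3f} -> leakage share {share:.3f}")
```

In this model leakage is a minor term at full utilization but becomes the larger share of the total once the utilization drops to a few percent.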
 
Figure 6.38: Leakage power normalized with the total power for different link utilization rates. 
NoC links are designed to operate at low utilization rates in order to meet the stringent 
requirements for latency. Moreover the links with higher bandwidth capacity are used to 
reduce packet collisions [146]. For such designs, leakage power may become a critical 
design parameter and therefore, a careful consideration of all the performance parameters 
will help to achieve better optimization. 
6.11 Summary 
In this chapter we have discussed the performance of multi-bit links under the impact of variability. We started with the modelling of interconnects in the DSM regime and then presented simulation results for the delay and power measurements. From these results, several plots for the delay, delay variability, bandwidth and power dissipation have been presented. A figure of merit has been introduced for the optimization of channel performance under delay, power, area, and variability constraints. The failure of channels under variability has then been discussed. Finally, it has been shown that channel serialization is an attractive approach for power, area and variability efficient designs for throughput-centric systems. Moreover, it has been demonstrated that leakage power becomes an important component of power dissipation for links operating at low activity levels. Taking this into account can therefore be very beneficial for power-efficient link designs.
 
Chapter 7                                                                          Crosstalk in Coupled Interconnects 
 
142 
 
 
 
 
 
 
 
Chapter 7 
 
Crosstalk in Coupled Interconnects 
 
 
7.1 Introduction 
Coupling capacitances have increased due to the reduced spacing and larger aspect ratios of wires in progressive DSM technologies. Technology scaling results in the increased dominance of coupling capacitance, which can be as high as 80% of the total wire capacitance [139].
Technology scaling has also pushed signal frequencies into the gigahertz region, and at such high speeds transmission line effects such as crosstalk, distortion and reflection are becoming evident. Crosstalk represents the situation in which a neighbouring wire unintentionally affects the performance of another wire through electromagnetic field interaction. It occurs due to coupling between neighbouring wires and can be classified into functional noise and delay noise. Functional noise refers to a fluctuation in the signal state of a quiet (non-switching) wire due to switching in the neighbouring wire. This noise produces a glitch that may propagate through the interconnect to a dynamic node or a latch and may tend to change the signal state. Depending on the noise margin available, excessive noise will change the signal state and result in circuit malfunction. Crosstalk can also cause variation in the delay of signals depending on the phases of the aggressor and victim line signals. On a chip, an interconnect may have multiple couplings with neighbouring wires, and simultaneous switching on these wires will affect propagation delays, thereby resulting in delay variations [55], referred to as delay noise. Delay noise may result in timing failures and contributes a significant fraction of the circuit delay [140]. Therefore, crosstalk effects need serious consideration during the design process; otherwise, the system will suffer from performance degradation or even system failure.
In actual circuits there are equal chances that the signal transitions on the victim and aggressor lines appear simultaneously or with some skew. Similarly, process variations in the circuits are translated into delay variations, resulting in the introduction of skew at the inputs of the aggressor and victim drivers. It has been observed that the amount of delay noise on the victim line depends on the victim-aggressor skew [140]. This will cause delay variability at the receiver. In these situations, delay noise and crosstalk seriously affect the performance of high performance designs. Accurate estimation of these effects is necessary for the design of high performance systems; otherwise the designers will have to go through extra design iterations which are computationally expensive and time consuming [28].
In the past, many researchers have published crosstalk analysis models and algorithms [28]-[30], [141], but all of them either require numerical techniques to solve them or do not give sufficient insight into the underlying crosstalk effects on signal responses. Therefore, we present closed form expressions that give accurate voltages for the aggressor and victim lines in the time domain, as a function of wire length, due to switching transitions on them. This work is being extended to derive analytical expressions that determine the conditions that give maximum crosstalk effects under the impact of variability.
7.2 Coupled RC Transmission Lines 
Consider a coupled RC transmission line consisting of two signal conductors and a ground line with distributed RC parameters amongst them. A lumped element representation is shown in Figure 7.1, where the capacitances ($c_s$, $c_c$) are the self and coupling capacitance (per unit length), and the resistance $R$ is the series resistance per unit length for each line. We are interested in determining the transient behavior of the system when the lines are driven by a unit step input at the source (x=0), corresponding to a high/low or low/high transition in any combination. In real digital systems, the transitions of the line drivers do not occur concurrently; rather, the transitions are mutually delayed by a short time $\tau$, called the skew of the lines. The skew is not constant along the line; it is increased or decreased depending on specific conditions. The aim of this study is to determine those conditions and to quantify the amount of passive skew amplification or reduction in such a system (defined as the ratio between the input and output skew). As a first step towards this objective, an accurate crosstalk model has been developed that can be used to determine those conditions. Here we skip the derivation and present only the final results of this model, developed by F. Rodriguez [142].
 
Figure 7.1: Coupled RC transmission line model with distributed RC parameters. 
7.2.1 Voltage Representation 
The voltage on the victim line as a function of the interconnect length and time is given below for up-up and up-down transitions
$$V_{uu}(x,t) = \frac{1}{2}\,\mu(t) \begin{bmatrix} \Psi_1(x,t) + \Psi_2(x,t) \\ \Psi_1(x,t) - \Psi_2(x,t) \end{bmatrix} + \frac{1}{2}\,\mu(t-\tau) \begin{bmatrix} \Psi_1(x,t-\tau) - \Psi_2(x,t-\tau) \\ \Psi_1(x,t-\tau) + \Psi_2(x,t-\tau) \end{bmatrix} \tag{7.1}$$

$$V_{ud}(x,t) = \frac{1}{2}\,\mu(t) \begin{bmatrix} \Psi_1(x,t) + \Psi_2(x,t) \\ \Psi_1(x,t) - \Psi_2(x,t) \end{bmatrix} - \frac{1}{2}\,\mu(t-\tau) \begin{bmatrix} \Psi_1(x,t-\tau) - \Psi_2(x,t-\tau) \\ \Psi_1(x,t-\tau) + \Psi_2(x,t-\tau) \end{bmatrix} \tag{7.2}$$

where $\mu(t)$ denotes the unit step. These expressions describe the signals in the wires due to a skewed input. Notice that the response is formed by two functions which act at different times. When the input to the aggressor line is turned on at $t = 0$, a transitory waveform, $\Psi_1(x,t) - \Psi_2(x,t)$, is induced in the victim line. Similarly, the switching in the victim line induces a transient response in the aggressor line whose magnitude is $|\Psi_1(x,t) - \Psi_2(x,t)|$, when it is switched at $t = \tau$. In both cases, the steady state solution is started in each line when its corresponding input switches.
$\Psi_1$ and $\Psi_2$ used in expressions (7.1) and (7.2) can be calculated for the following two cases:
7.2.1.1 Finite Line with Open end 
For the finite line with open end, the vector $\Psi$ is given by
$$\Psi(x,t) = \mu(t) \left[ 1 - \frac{4}{\pi} \sum_{i=1}^{\infty} k_i\, e^{-\alpha_i^2 t / (R\lambda)} \sin(\alpha_i x) \right] \tag{7.3}$$

where $k_i = (2i-1)^{-1}$ and $\alpha_i = \pi(2i-1)/2$.
The eigenvalues $\lambda$ (used in (7.3)) are given below in the form of a column vector
$$\lambda = \begin{bmatrix} c_s \\ c_s + 2c_c \end{bmatrix} \tag{7.4}$$
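A numerical sketch of the open-end case is given below. It uses the textbook step-response series of an open-ended distributed RC line, evaluated once per eigenvalue of (7.4) and combined as in (7.1). Several points are assumptions rather than details confirmed by the text: the position x is normalised to the line length, R and the capacitances are total line values, the first row of (7.1) is read as the aggressor and the second as the victim, and the capacitance values of Section 7.2.2 are taken to be in fF. The function names are illustrative.

```python
import math


def psi_open(x, t, r_total, c_mode, terms=200):
    """Truncated step-response series of an open-ended distributed RC line.

    x is the position normalised to the line length (0..1); r_total and
    c_mode are the total line resistance and the modal capacitance.
    """
    if t <= 0.0:
        return 0.0  # the unit step has not been applied yet
    acc = 0.0
    for i in range(1, terms + 1):
        alpha_i = math.pi * (2 * i - 1) / 2.0
        k_i = 1.0 / (2 * i - 1)
        decay = math.exp(-alpha_i**2 * t / (r_total * c_mode))
        acc += k_i * decay * math.sin(alpha_i * x)
    return 1.0 - (4.0 / math.pi) * acc


def line_voltages_up_up(x, t, skew, r_total, c_s, c_c):
    """(aggressor, victim) voltages for an up/up transition, after eq. (7.1)."""
    def modes(time):
        # One series per eigenvalue: c_s and c_s + 2*c_c.
        return (psi_open(x, time, r_total, c_s),
                psi_open(x, time, r_total, c_s + 2.0 * c_c))
    p1, p2 = modes(t)           # response to the aggressor step at t = 0
    q1, q2 = modes(t - skew)    # response to the victim step at t = skew
    aggressor = 0.5 * (p1 + p2) + 0.5 * (q1 - q2)
    victim = 0.5 * (p1 - p2) + 0.5 * (q1 + q2)
    return aggressor, victim


# Far-end voltages 50 ps after the aggressor switches, victim skewed by 0.1 ns.
print(line_voltages_up_up(1.0, 0.05e-9, 0.1e-9, 867.6, 22.3e-15, 88.8e-15))
```

Before the victim driver switches, the victim voltage returned by this sketch is the coupled glitch alone. Once both steps have settled, both lines reach the full unit level.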
 
7.2.1.2 Finite Line with Capacitive Load 
Similarly, the vector $\Psi$ for finite lines with capacitive loads connected at their output is given by
$$\Psi(x,t) = 1 - \sum_{i=1}^{\infty} c_i\, e^{-\beta_i^2 t / (R\lambda)} \sin(\beta_i x) \tag{7.5}$$

The coefficients of the series are given below
$$c_i = \frac{2\left(\beta_i^2 + \xi^2\right)}{\beta_i \left(\beta_i^2 + \xi^2 + \xi\right)} \tag{7.6}$$

where the parameter $\xi = \lambda / C_L$ and is related to $\beta$ through the following equation
$$\beta \tan(\beta) = \xi \tag{7.7}$$

There are an infinite number of such roots, of which we only need the positive ones as the proposed solution is an even function. The periodicity of the tangent function implies that the $i$-th root lies within $(i-1)\pi < \beta_i < (2i-1)\pi/2$ for $i \ge 1$, and so numerical solutions can easily be found by the bisection method.
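The bisection step mentioned above can be sketched as follows. The bracketing interval for the i-th positive root of equation (7.7) is taken as $((i-1)\pi,\ (2i-1)\pi/2)$, on which $\beta\tan\beta$ increases monotonically from 0 to infinity, so there is exactly one root per interval for $\xi > 0$. The function name, the tolerance and the value of $\xi$ are illustrative assumptions. The root is found on the equivalent function $\beta\sin\beta - \xi\cos\beta$ to avoid the pole of the tangent at the upper end of the interval.

```python
import math


def beta_root(i, xi, tol=1e-12):
    """i-th positive root (i >= 1) of beta * tan(beta) = xi, for xi > 0."""
    lo = (i - 1) * math.pi
    hi = (2 * i - 1) * math.pi / 2.0

    def f(beta):
        # Same roots as beta*tan(beta) - xi, but without the pole of tan().
        return beta * math.sin(beta) - xi * math.cos(beta)

    f_lo = f(lo)
    for _ in range(200):                 # hard cap on the number of halvings
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid        # root lies in the upper half
        else:
            hi = mid                     # root lies in the lower half
    return 0.5 * (lo + hi)


# First three roots for a hypothetical ratio xi = lambda / C_L = 1.
roots = [beta_root(i, xi=1.0) for i in (1, 2, 3)]
print(roots)
```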
7.2.2 Model Validation 
For the validation of the proposed model, we consider the victim and aggressor line configuration of Figure 7.1. Interconnects of length 1 mm from the 25 nm technology generation have been used, having $R = 867.6\,\Omega$, $c_s = 22.3$ fF and $c_c = 88.8$ fF. The victim and aggressor lines are excited by step inputs, and the signal on the victim line appears 0.1 ns later than the signal on the aggressor line. The response of the system using our model for the finite line with open end is shown in Figures 7.2 and 7.3 for the up/up and up/down transitions respectively. HSPICE simulation results are also shown in the same figures. The curves clearly show that the model accurately matches the simulation results, which confirms its validity.
 
Figure 7.2: Typical responses of aggressor and victim lines during up/up transitions for finite lines with open 
ends. 
 
Figure 7.3: Typical responses of aggressor and victim lines during up/down transitions for finite lines with 
open ends. 
Similarly, the response of the model for finite lines with capacitive loads is plotted in 
Figure 7.4 and 7.5 for the up / up and up / down transitions respectively. Again the 
responses accurately match the simulation results shown along with the model curves.  
 
Figure 7.4: Typical responses of aggressor and victim lines during up/up transitions for finite lines with 
capacitive loads. 
 
Figure 7.5: Typical responses of aggressor and victim lines during up/down transitions for finite lines with 
capacitive loads. 
7.3 Skew Amplification under Variability 
As mentioned before, the skew amplification is defined as the ratio between the input and output skew. The analytical model and the plots show that the arrival time of the signals at the output of the victim line depends on the input skew. The arrival time will be maximized (or minimized) when the skew in its driver occurs at the same time at which the aggressor line has managed to couple the maximum amount of energy into the victim. Under this condition, the input skew is amplified at the far end of the interconnect. In a particular circuit configuration, if the signal transitions in the aggressor and victim lines always occur such that this condition is satisfied, then a constant skew will be observed at the output of the channel. However, in the presence of variability, the output skew (and hence the skew amplification) will take the form of a probability distribution. Therefore, under this condition, the uncertainty in the arrival time will also be amplified. As stated before, analytical expressions are being developed to determine the conditions and also to quantify the effects. In order to emphasise its significance, a case study is given below.
We consider three coupled interconnects such that the victim line is surrounded by two aggressor lines. The resistance, self capacitance and coupling capacitance of the interconnect lines are taken to be 92.22 ohms/mm, 126.99 fF/mm and 39.26 fF/mm respectively. The supply voltage is taken to be 1.15 V. The system response can be measured either using our proposed model or using simulations. Here we used the Monte Carlo simulation method to incorporate the variability effects. We assume that, due to variability, the arrival time of the signals at the input of the victim line driver follows a normal distribution with a standard deviation of 3 ps. The system response has been measured for different values of input skew (taken as the time between the aggressor switching and the mean of the arrival time distribution for the victim line). The delay measurements have been taken between the input and output of the victim line corresponding to 95% of the voltage levels. The results are shown in Table 7.1.
The victim delay has been measured in the absence of crosstalk for reference and is about 81.34 ps. An input signal is then applied on the victim line with an input skew of 0 and a standard deviation of the input delay of 3 ps. In order to simulate the in-phase crosstalk situation, both aggressors were allowed to switch simultaneously in phase with the victim line. It has been observed that the mean delay reduces to 72.25 ps due to in-phase crosstalk. However, the variability of 3 ps in the input signal is amplified by 20.83%, as the variability in the output signal increases to 3.625 ps. The amplification in the delay variability reduces as the input skew is either increased or decreased from zero.
Similar experiments were repeated to measure the effect of out-of-phase crosstalk on delay variability. It may be noted that the input delay variability is amplified as the input skew increases from negative values. Negative values of the offset represent the situation when the aggressor switches prior to the victim. An amplification of the input delay variability of up to 43.46% has been observed with an input skew of 60 ps.
Table 7.1: Monte Carlo simulation results for studying the effect of input signal variability on skew amplification.

No X-talk    Input    In-phase X-talk                        Out-of-phase X-talk
Mean delay   skew     Mean delay  Increased    % age         Mean delay  Increased    % age
(ps)         (ps)     (ps)        stdev (ps)   increase      (ps)        stdev (ps)   increase
81.34         60      -           -            -             105.9       1.304        43.46
81.34         50      63.78       0.863        28.76         101.48      1.216        40.53
81.34         40      65.513      0.844        28.13         98.21       1.019        33.96
81.34         30      68.17       0.470        15.66         95.4        0.696        23.20
81.34         20      69.07       0.227        7.56          93.15       0.572        19.06
81.34         10      70.26       0.483        16.10         91.14       0.606        20.20
81.34          0      72.24       0.625        20.83         89.09       0.543        18.10
81.34        -10      70.26       0.480        16.00         87.27       0.44         14.66
81.34        -20      76.32       0.460        15.33         85.82       0.375        12.50
81.34        -30      77.722      0.316        10.53         84.71       0.284        9.46
81.34        -40      78.744      0.254        8.466         83.82       0.217        7.23
81.34        -50      79.46       0.184        6.133         83.07       0.173        5.70
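The "% age increase" columns of Table 7.1 relate the increase in the output standard deviation to the 3 ps input standard deviation assumed in the case study. A minimal check of that relation is sketched below, using two entries from the table; the function name is illustrative.

```python
INPUT_STDEV_PS = 3.0   # standard deviation of the input arrival time (ps)


def amplification_percent(increase_ps, input_stdev_ps=INPUT_STDEV_PS):
    """Percentage growth of the delay standard deviation at the line output."""
    return 100.0 * increase_ps / input_stdev_ps


# In-phase crosstalk at zero skew: the stdev grows by 0.625 ps.
print(INPUT_STDEV_PS + 0.625, amplification_percent(0.625))
# Out-of-phase crosstalk at 60 ps skew: the stdev grows by 1.304 ps.
print(INPUT_STDEV_PS + 1.304, amplification_percent(1.304))
```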
7.4 Summary 
In this chapter we have presented a crosstalk model that can be used to accurately describe 
the signals in the aggressor and victim lines under crosstalk effects due to RC coupling. 
Then we have shown that under crosstalk conditions, the delay variability in the arrival 
times of the signals is also amplified and can result in increased failure rates. 
Chapter 8                                                                                  Conclusions and Future Work 
 
150 
 
 
 
 
 
 
 
Chapter 8 
 
Conclusions and Future Work 
 
 
8.1 Conclusions 
Variability is a major constraint in the design of state-of-the-art systems, especially in deep sub-micron technologies, and technology scaling has caused communication to slow relative to computation. Future designs will therefore need to enhance on-chip communication while tolerating the inherent variability present in the system. Regardless of the communication architecture employed, this study has shown that variability in the communication infrastructure can compromise the ability to meet the design targets, unless due attention is given to it during the design phase. In particular, we have critically examined the effect of device variability due to RDF on the performance of the basic elements of on-chip communication structures, such as tapered buffer drivers with different tapering factors, repeaters of different sizes, and data storage registers (FFs). FO4 delay measurements have also been taken as representative of the logic circuitry, and the results can be used as a performance benchmark. The study revealed that RDF has a significant impact on the performance of communication structures and that their performance deteriorates very significantly with technology scaling from 25 to 13 nm.
A simple design methodology, scaling up the circuits in the critical paths, can be employed to minimize the effects of device variability. In particular, we have shown that this trade-off is not linear and that a small increase in the repeater size can give substantial performance benefits. In a real system, however, the power and area penalties due to this passive technique of circuit scaling should be compared with any active countermeasure techniques which can be used to mitigate the delay variability.
Although NoC is more robust against on-chip communication failure than simpler designs, 
we note that such occurrences have increased hyper-linearly (and will continue to do so) 
due to device variability. In order to evaluate the performance of a typical point-to-point 
link, we have derived analytical models to predict link failure probability (LFP) using the 
characterization data of the individual on-chip communication elements. The results show 
that link failure probability increases significantly with the increase of device variability 
and is a limiting factor in the maximum operating frequency of a synchronous link.  
It has also been observed that the timing distributions of different communication circuits 
are non-Gaussian, especially for smaller geometries. We have extended the study of these 
distributions to flip-flops and flip-flop based pipelined circuits. The simulation data show 
that the timing distributions of FFs are positively skewed (except for the hold time, which 
is negatively skewed) and exhibit nonzero higher moments, such as kurtosis, which 
increase as the technology scales. Accurate estimation of the shape of the distributions, 
especially in the tails, is of great importance for large circuit designs in order to improve 
performance and reliability in the presence of variability. The Gaussian approximation is 
common in SSTA, mainly because the necessary SSTA operations are known and easy to 
compute. However, as this work shows, the real distributions of the timing parameters 
deviate significantly from normality in the region of interest (the tail of the distribution), 
and hence the Gaussian approximation will ultimately produce inaccurate results. The 
skew-normal distribution is an interesting alternative; however, it lacks enough degrees of 
freedom to fit the fourth moment of the distribution. Furthermore, it has been argued that 
it does not accurately represent the skewed distributions of arrival times. The Pearson and 
Johnson systems have enough degrees of freedom and can provide a very good fit to the 
timing distributions of FFs, as shown in this thesis; their use during SSTA will therefore 
provide improved results and significantly reduce the probability of yield loss. 
However, for this approach to be fully successful, the different SSTA operations (e.g., 
SUM, MIN, or MAX) must be analytically formulated for the Pearson and Johnson 
systems to allow efficient analysis. 
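
The effect of skewness and kurtosis on tail estimates can be illustrated with a small 
numerical sketch. The sample below is a hypothetical, scaled lognormal stand-in for a 
positively skewed delay distribution; it is not simulation data from this thesis, and the 
helper names are chosen for illustration only. The sketch computes the sample skewness 
and excess kurtosis and compares the point a Gaussian fit places at three standard 
deviations above the mean with the empirical quantile at the same probability. 

```python
import random
import statistics


def sample_moments(xs):
    """Return (mean, std, skewness, excess kurtosis) of a sample."""
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs, mu)
    skew = sum(((x - mu) / sigma) ** 3 for x in xs) / n
    kurt = sum(((x - mu) / sigma) ** 4 for x in xs) / n - 3.0
    return mu, sigma, skew, kurt


def empirical_quantile(xs, q):
    """Simple order-statistic estimate of the q-quantile."""
    s = sorted(xs)
    return s[min(len(s) - 1, int(q * len(s)))]


rng = random.Random(0)
# Hypothetical stand-in for a positively skewed delay sample (ps):
# a scaled lognormal, NOT data from the thesis.
delays = [20.0 * rng.lognormvariate(0.0, 0.25) for _ in range(50_000)]

mu, sigma, skew, kurt = sample_moments(delays)

# Probability that a Gaussian assigns to values below mean + 3 sigma.
q = statistics.NormalDist().cdf(3.0)
gaussian_tail_point = mu + 3.0 * sigma
empirical_tail_point = empirical_quantile(delays, q)

print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
print(f"Gaussian 3-sigma point : {gaussian_tail_point:.2f} ps")
print(f"Empirical same quantile: {empirical_tail_point:.2f} ps")
```

For a positively skewed sample of this kind the empirical tail point lies above the 
Gaussian one, which is the sense in which a Gaussian fit is optimistic in the tail. 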
The implications of skewed timing distributions for the SSTA of pipelined circuits have also 
been discussed in this thesis. Because the timing distributions of FFs are skewed, the 
pipeline segment delay distributions are positively skewed about the mean, and the degree 
of skewness increases with technology scaling. In this situation, using Clark’s 
approximation during SSTA to determine the slowest pipeline segment (which sets the 
operating frequency of the pipeline) is not a good choice: it gives inaccurate results, 
which lead to yield loss. Again, the skew-normal distribution is not an ideal choice for 
approximating the timing distributions in highly scaled devices, especially where the 
device count on a chip has reached several billion. This is because a small deviation of 
the approximation from the actual distribution will produce significant yield loss. 
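
As an illustration of the point above, the following sketch implements the standard 
moment-matching formulas of Clark for the maximum of two jointly Gaussian variables and 
applies them to two skewed segment delays. The segment delays are hypothetical lognormal 
samples (not thesis data), and the function and variable names are assumptions made for 
this example. The sketch first checks the formulas on two independent standard normal 
variables, then compares the Gaussian tail point implied by Clark's moments with the 
empirical quantile of the slowest segment. 

```python
import math
import random
import statistics


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1, s1, mu2, s2, rho=0.0):
    """Mean and std of max(X1, X2) for jointly Gaussian X1, X2."""
    a = math.sqrt(s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2)
    if a == 0.0:  # equal spread, fully correlated inputs
        return max(mu1, mu2), s1
    alpha = (mu1 - mu2) / a
    m1 = mu1 * cdf(alpha) + mu2 * cdf(-alpha) + a * pdf(alpha)
    m2 = ((mu1 * mu1 + s1 * s1) * cdf(alpha)
          + (mu2 * mu2 + s2 * s2) * cdf(-alpha)
          + (mu1 + mu2) * a * pdf(alpha))
    return m1, math.sqrt(m2 - m1 * m1)


# Sanity check on Gaussian inputs: two independent N(0, 1) variables.
g_mean, g_std = clark_max(0.0, 1.0, 0.0, 1.0)

# Two hypothetical, positively skewed segment delays (ps).
rng = random.Random(1)
n = 50_000
seg_a = [20.0 * rng.lognormvariate(0.0, 0.25) for _ in range(n)]
seg_b = [20.0 * rng.lognormvariate(0.0, 0.25) for _ in range(n)]
slowest = sorted(max(x, y) for x, y in zip(seg_a, seg_b))

# Clark's moments from the matched mean and std of each segment,
# followed by the usual Gaussian assumption for the result.
c_mean, c_std = clark_max(
    statistics.fmean(seg_a), statistics.pstdev(seg_a),
    statistics.fmean(seg_b), statistics.pstdev(seg_b),
)
clark_tail_point = c_mean + 3.0 * c_std
empirical_tail_point = slowest[min(n - 1, int(cdf(3.0) * n))]

print(f"Clark + Gaussian 3-sigma point: {clark_tail_point:.2f} ps")
print(f"Empirical same quantile       : {empirical_tail_point:.2f} ps")
```

With skewed inputs the Gaussian tail point derived from Clark's moments falls below the 
empirical one, so a clock period chosen from it would be violated more often than expected. 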
Power dissipation is an important design metric which plays a critical role in the design of 
on-chip communication architectures. The impact of technology scaling on the power 
dissipation of buffers has been investigated in this thesis. The results show that the relative 
proportions of the different components of power dissipation are changing and that leakage 
power is emerging as a serious problem in the design of high-performance, power-optimal 
chips. Therefore, design methodologies should consider the individual components of 
power dissipation along with the total power. Wider point-to-point links, which are 
preferred for better latency, will consume more power due to higher leakage currents at 
low activity levels.  
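
The dependence of the leakage share on link activity can be sketched with the usual 
first-order CMOS power model (dynamic power proportional to activity, capacitance, supply 
voltage squared and frequency; leakage power proportional to supply voltage and leakage 
current). This model and every numerical value below are assumptions made for 
illustration; they are not the models or the data used in this thesis. 

```python
def link_power(n_lines, activity, c_line, vdd, freq, i_leak):
    """Return (dynamic, leakage) power in watts for a parallel link.

    First-order model, assumed for illustration:
      dynamic = n * activity * C * Vdd^2 * f
      leakage = n * Vdd * I_leak
    """
    dynamic = n_lines * activity * c_line * vdd ** 2 * freq
    leakage = n_lines * vdd * i_leak
    return dynamic, leakage


def leakage_share(dynamic, leakage):
    """Fraction of the total power that is leakage."""
    return leakage / (dynamic + leakage)


# Hypothetical per-line parameters (not taken from the thesis).
params = dict(c_line=100e-15, vdd=0.9, freq=1e9, i_leak=2e-6)

busy_dyn, busy_leak = link_power(n_lines=64, activity=0.5, **params)
idle_dyn, idle_leak = link_power(n_lines=64, activity=0.01, **params)

share_busy = leakage_share(busy_dyn, busy_leak)
share_idle = leakage_share(idle_dyn, idle_leak)

print(f"leakage share at activity 0.50: {share_busy:.1%}")
print(f"leakage share at activity 0.01: {share_idle:.1%}")
```

In this model the leakage term grows with the number of lines regardless of activity, 
so at low activity a wider link pays mostly for leakage. 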
The device variability that affects the delay characteristics also affects the 
distribution of power dissipation. Since there is an inverse correlation between delay 
performance and leakage power, a significant asymmetry has also been observed in the 
distribution of leakage power. This, in turn, will degrade the yield in addition to the loss 
caused by delay variability. Therefore, it is advantageous to consider power variability 
along with delay variability when making circuit optimizations. Active countermeasures, 
such as the use of sleep transistors, could be a possible solution against leakage power. 
In this thesis we emphasize that power- and area-optimal repeater insertion methodologies 
should also take variability into account. Analytical models for area, power, performance 
and probability of link failure have been presented in terms of the repeater size and the 
inter-repeater segment length. It has been found that beyond a certain reduction in the size 
of the repeaters, the delay variability may exceed acceptable limits while the other 
constraints are still satisfied. For instance, with only 4% 
of performance loss due to the use of smaller repeaters, almost 30% of power and 40% of 
area savings can be achieved; however, timing certainty is reduced by 24%. Therefore, 
when optimizing the area, power and performance of on-chip communication links, delay 
(and power) variability should also be included in the figure of merit; performance and 
area alone are no longer suitable metrics. 
The performance of multi-bit parallel links under the impact of variability has also been 
discussed in this thesis. Based on the simulation data, the optimum channel configuration 
for maximum bandwidth has been determined under area and power constraints. It has been 
found that delay variability also depends on the channel configuration (interconnect width 
and spacing), which therefore determines the link operating frequency and the link failure 
probability. Moreover, under variability the link failure probability increases as the 
number of lines in the channel increases. We have also compared the performance of 
parallel and semi-serial (serial) links for a particular throughput under an area 
constraint. This thesis proposes the use of semi-serial links for power-efficient and 
fault-tolerant communication; these links have the additional benefit of being less 
vulnerable to crosstalk because of their larger interconnect spacing. Moreover, it has 
been shown that leakage power becomes an important component of power dissipation for 
links operating at a low activity level, and therefore this aspect needs to be considered 
in the link optimization methodology. 
In DSM technologies, the effects of crosstalk cannot be avoided and crosstalk severely 
affects the performance of data links. Analytical models have been presented in this thesis 
that can be used for accurate analysis of crosstalk effects in RC coupled interconnects. The 
simulation results confirm their validity for different channel configurations. The models 
are computationally efficient, more accurate and give direct outputs in the time domain. 
These models can be very effective for the design of variability tolerant links. This work 
also shows that crosstalk increases the input skew as well as skew variability. 
 
 
 
 
 
 
8.2 Future Work 
Although the research undertaken for this thesis is extensive, there are still several 
directions in which it can be extended. The suggested areas for future work are as 
follows: 
• Variability effects from other sources can also be considered when evaluating the 
performance of on-chip communication architectures in the DSM regime. 
• Using the characterization data of communication structures and applying the methods 
proposed in this thesis, a variability-tolerant network-on-chip can be designed and its 
performance evaluated with different network topologies. 
• A complete set of statistical analysis tools can be developed that works with the 
skewed distributions of the Pearson and Johnson systems for accurate statistical 
static timing analysis (SSTA) in deep submicron technologies. 
• It would be an interesting area of research to devise active fault-tolerant techniques 
that effectively minimize communication errors under the increased level of 
variability in DSM circuits. Similarly, there is a need to develop circuit-level 
techniques to reduce leakage power, which is a significant component of 
power dissipation in future technologies. 
 
 
 
Appendix A 
The following tables are related to Chapter 6 
Table A.1: Mean delay (in picoseconds) of interconnects (without repeaters) in the channel bus of 13nm for 
different geometrical configurations under variability Case 1. The columns of the table show the interconnect 
spacing and the rows show the width. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 180.7 121.4 107.4 102.1 99.53 97.95 97.01 96.46 95.98 95.59 
2X 97.51 67.25 60.27 57.6 56.29 55.52 55.08 54.66 54.49 54.24 
3X 69.81 49.32 44.57 42.79 41.93 41.41 41.08 40.86 40.7 40.53 
4X 55.92 40.36 36.75 35.38 34.72 34.33 34.08 33.93 33.79 33.69 
5X 47.59 34.95 32.05 30.96 30.4 30.09 29.89 29.73 29.67 29.58 
6X 42.08 31.39 28.95 28.00 27.53 27.27 27.09 26.97 26.9 26.84 
7X 38.11 28.84 26.66 25.87 25.47 25.24 25.13 25.03 24.93 24.87 
8X 35.15 26.92 25.02 24.31 23.96 23.72 23.6 23.55 23.45 23.4 
9X 32.82 25.41 23.71 23.05 22.75 22.58 22.48 22.39 22.33 22.26 
10X 30.97 24.23 22.68 22.08 21.79 21.63 21.51 21.46 21.4 21.36 
Table A.2: The standard deviation (in picoseconds) of the delay of interconnects (without repeaters) in the 
channel bus of 13nm for different geometrical configurations under variability Case 1. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 10.94 5.95 5.14 4.98 4.84 4.79 4.78 4.74 4.72 4.72 
2X 4.27 1.93 1.85 1.79 1.78 1.79 1.78 1.77 1.79 1.77 
3X 2.65 1.28 1.27 1.28 1.29 1.28 1.28 1.25 1.28 1.24 
4X 2.03 1.05 1.08 1.08 1.09 1.11 1.10 1.08 1.10 1.09 
5X 1.67 0.96 1.00 1.00 1.02 1.03 1.02 1.00 1.02 1.00 
6X 1.47 0.90 0.93 0.93 0.96 0.95 0.95 0.94 0.96 0.95 
7X 1.29 0.87 0.89 0.91 0.93 0.92 0.92 0.91 0.91 0.91 
8X 1.22 0.85 0.87 0.88 0.89 0.87 0.89 0.89 0.88 0.88 
9X 1.11 0.83 0.84 0.85 0.87 0.87 0.87 0.88 0.85 0.86 
10X 1.05 0.81 0.83 0.84 0.85 0.84 0.85 0.85 0.85 0.85 
Table A.3: Delay variability (%) of interconnects (without repeaters) in the channel bus of 13nm for different 
geometrical configurations under variability Case 1. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 18.16 14.7 14.35 14.65 14.59 14.67 14.78 14.74 14.75 14.81 
2X 13.12 8.632 9.219 9.322 9.5 9.686 9.68 9.718 9.837 9.776 
3X 11.4 7.81 8.554 8.955 9.199 9.276 9.328 9.211 9.42 9.202 
4X 10.89 7.793 8.826 9.154 9.436 9.671 9.657 9.536 9.747 9.717 
5X 10.51 8.241 9.317 9.736 10.08 10.22 10.26 10.1 10.27 10.17 
6X 10.46 8.587 9.671 9.989 10.47 10.49 10.56 10.47 10.67 10.64 
7X 10.19 9.024 10.05 10.54 10.95 10.89 11.03 10.88 10.94 11.02 
8X 10.38 9.453 10.45 10.85 11.13 10.98 11.3 11.31 11.29 11.33 
9X 10.14 9.847 10.68 11.08 11.46 11.5 11.58 11.76 11.48 11.54 
10X 10.13 9.974 11.01 11.47 11.71 11.7 11.91 11.84 11.93 11.87 
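
As a cross-check of Tables A.1 to A.3, the tabulated delay variability appears to be 
consistent with three standard deviations divided by the mean, expressed as a percentage. 
This relation is inferred here from the numbers and is an assumption, not a definition 
quoted from this appendix. The sketch below recomputes a few cells; small differences are 
expected because the tabulated means and standard deviations are rounded. 

```python
def delay_variability_percent(mean_ps, sigma_ps, k=3.0):
    """Assumed metric: k * sigma / mean, in percent (k = 3 fits the tables)."""
    return 100.0 * k * sigma_ps / mean_ps


# (width, spacing): (mean from Table A.1, sigma from Table A.2, value in Table A.3)
cells = {
    ("1X", "1X"): (180.7, 10.94, 18.16),
    ("1X", "2X"): (121.4, 5.95, 14.7),
    ("2X", "1X"): (97.51, 4.27, 13.12),
    ("10X", "10X"): (21.36, 0.85, 11.87),
}

recomputed = {
    key: delay_variability_percent(mean, sigma)
    for key, (mean, sigma, _) in cells.items()
}

for key, value in recomputed.items():
    print(key, f"recomputed {value:.2f} %, tabulated {cells[key][2]} %")
```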
 
Table A.4: The size of the repeaters for different interconnect dimensions (width and spacing) for a 13 nm 
bus under worst crosstalk. The repeater sizes have been rounded-off. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 266 199 172 157 148 141 136 133 130 128 
2X 384 289 252 231 218 209 203 198 194 192 
3X 479 364 319 294 278 267 260 254 250 247 
4X 563 431 379 351 333 321 313 306 302 298 
5X 640 494 436 405 386 373 363 356 351 347 
6X 713 553 491 458 436 422 412 405 399 395 
7X 783 611 545 508 486 471 460 452 446 442 
8X 850 667 597 558 535 519 508 499 493 488 
9X 916 722 648 608 583 566 554 546 539 534 
10X 980 776 698 656 630 613 601 592 585 579 
 
Table A.5: The number of repeaters per unit length required for different interconnect dimensions (width and 
spacing) for a 13 nm bus under worst crosstalk. The numbers have been rounded. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 30 23 20 18 17 17 16 16 15 15 
2X 22 16 14 13 13 12 12 12 12 11 
3X 18 14 12 11 11 10 10 10 10 9 
4X 16 12 11 10 10 9 9 9 9 9 
5X 15 11 10 9 9 9 9 8 8 8 
6X 13 11 10 9 9 8 8 8 8 8 
7X 13 10 9 9 8 8 8 8 8 8 
8X 12 10 9 8 8 8 8 7 7 7 
9X 12 9 8 8 8 7 7 7 7 7 
10X 11 9 8 8 7 7 7 7 7 7 
 
Table A.6: Mean delay (in picoseconds) of interconnects (with repeaters) in the channel bus of 13nm for 
different geometrical configurations under variability Case 1. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 70.57 52.95 45.93 42.08 39.65 37.99 36.83 35.97 35.3 34.77 
2X 50.91 38.49 33.65 30.99 29.31 28.18 27.39 26.77 26.33 25.97 
3X 42.37 32.31 28.4 26.27 24.94 24.05 23.41 22.93 22.58 22.28 
4X 37.38 28.73 25.39 23.57 22.44 21.68 21.14 20.74 20.44 20.2 
5X 34.03 26.34 23.39 21.79 20.79 20.13 19.65 19.3 19.04 18.83 
6X 31.62 24.63 21.96 20.52 19.62 19.02 18.6 18.29 18.05 17.87 
7X 29.76 23.32 20.87 19.55 18.74 18.19 17.81 17.53 17.31 17.14 
8X 28.31 22.3 20.02 18.81 18.05 17.54 17.19 16.94 16.73 16.58 
9X 27.09 21.46 19.34 18.19 17.5 17.03 16.71 16.46 16.28 16.12 
10X 26.1 20.77 18.77 17.7 17.04 16.6 16.29 16.07 15.89 15.76 
 
Table A.7: The standard deviation (in picoseconds) of the delay of interconnects (with repeaters) in the 
channel bus. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 2.384 1.412 1.133 1.011 0.935 0.894 0.874 0.854 0.837 0.829 
2X 1.338 0.581 0.445 0.387 0.363 0.36 0.353 0.354 0.359 0.358 
3X 0.998 0.39 0.289 0.258 0.254 0.256 0.26 0.261 0.272 0.27 
4X 0.851 0.317 0.233 0.217 0.22 0.229 0.237 0.242 0.25 0.256 
5X 0.747 0.278 0.213 0.205 0.216 0.225 0.234 0.238 0.248 0.251 
6X 0.687 0.256 0.207 0.201 0.214 0.222 0.233 0.238 0.249 0.253 
7X 0.624 0.244 0.201 0.205 0.22 0.226 0.237 0.242 0.248 0.256 
8X 0.594 0.234 0.205 0.208 0.219 0.225 0.239 0.246 0.251 0.257 
9X 0.552 0.235 0.208 0.208 0.223 0.233 0.242 0.252 0.251 0.258 
10X 0.523 0.228 0.206 0.215 0.227 0.234 0.247 0.251 0.258 0.262 
 
Table A.8: Delay variability (%) of interconnects (with repeaters) in the channel bus of 13nm for different 
geometrical configurations under variability Case 1.  
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 10.13 8.003 7.398 7.208 7.074 7.062 7.12 7.12 7.112 7.15 
2X 7.881 4.527 3.964 3.745 3.717 3.828 3.866 3.962 4.084 4.133 
3X 7.067 3.619 3.049 2.943 3.051 3.194 3.334 3.408 3.612 3.631 
4X 6.829 3.31 2.75 2.763 2.939 3.172 3.36 3.496 3.672 3.805 
5X 6.585 3.163 2.726 2.826 3.117 3.349 3.577 3.705 3.913 3.993 
6X 6.514 3.117 2.821 2.931 3.276 3.506 3.763 3.91 4.134 4.255 
7X 6.295 3.134 2.894 3.145 3.517 3.723 3.989 4.135 4.299 4.472 
8X 6.297 3.148 3.07 3.319 3.633 3.839 4.172 4.361 4.506 4.652 
9X 6.11 3.279 3.23 3.436 3.817 4.11 4.344 4.595 4.632 4.798 
10X 6.009 3.294 3.285 3.636 3.997 4.222 4.54 4.692 4.877 4.987 
 
Table A.9: Bandwidth of the individual interconnect lines (without repeaters) in Gb/s given as a function of 
the interconnect width and spacing for 13 nm. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 0.615 0.915 1.035 1.088 1.116 1.134 1.145 1.152 1.158 1.162 
2X 1.14 1.652 1.844 1.929 1.974 2.001 2.017 2.033 2.039 2.048 
3X 1.592 2.253 2.493 2.597 2.65 2.683 2.705 2.72 2.73 2.742 
4X 1.987 2.753 3.024 3.141 3.2 3.236 3.26 3.275 3.288 3.298 
5X 2.335 3.179 3.467 3.589 3.655 3.692 3.718 3.737 3.746 3.757 
6X 2.641 3.54 3.838 3.968 4.036 4.074 4.102 4.12 4.13 4.14 
7X 2.916 3.853 4.168 4.295 4.362 4.402 4.422 4.439 4.458 4.468 
8X 3.161 4.127 4.44 4.571 4.638 4.683 4.707 4.718 4.738 4.748 
9X 3.385 4.374 4.687 4.82 4.884 4.921 4.943 4.963 4.975 4.991 
10X 3.588 4.585 4.9 5.032 5.1 5.138 5.165 5.178 5.193 5.202 
 
Table A.10: Bandwidth of the individual interconnect lines (with repeaters) in Gb/s given as a function of the 
interconnect width and spacing for 13 nm. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 1.574 2.099 2.419 2.641 2.802 2.925 3.017 3.089 3.147 3.195 
2X 2.183 2.887 3.302 3.586 3.79 3.943 4.057 4.15 4.219 4.279 
3X 2.622 3.439 3.912 4.229 4.454 4.621 4.747 4.845 4.922 4.986 
4X 2.972 3.867 4.377 4.714 4.951 5.125 5.256 5.356 5.437 5.501 
5X 3.265 4.218 4.751 5.099 5.343 5.521 5.654 5.756 5.835 5.9 
6X 3.514 4.511 5.059 5.415 5.663 5.841 5.975 6.076 6.155 6.219 
7X 3.734 4.764 5.324 5.682 5.93 6.108 6.238 6.337 6.419 6.482 
8X 3.925 4.983 5.549 5.908 6.155 6.334 6.464 6.56 6.641 6.703 
9X 4.101 5.179 5.746 6.107 6.351 6.524 6.651 6.75 6.826 6.891 
10X 4.257 5.349 5.919 6.278 6.521 6.694 6.822 6.916 6.992 7.052 
 
Table A.11: Total bandwidth (Gb/s) through the bus for a constrained channel width, without repeaters, in 
13 nm. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 78.70 78.10 66.23 55.72 47.63 41.49 36.65 32.77 29.64 27.05 
2X 97.62 106.16 94.77 82.62 72.47 64.29 57.60 52.24 47.64 43.87 
3X 102.66 116.26 107.21 95.70 85.46 76.92 69.78 63.78 58.69 54.41 
4X 102.93 118.83 111.88 101.69 92.10 83.82 76.76 70.68 65.51 61.01 
5X 101.18 118.09 112.66 103.69 95.03 87.27 80.55 74.75 69.56 65.12 
6X 98.46 115.49 111.31 103.56 95.76 88.60 82.35 76.80 71.86 67.53 
7X 95.49 112.17 109.20 102.31 95.24 88.72 82.76 77.53 72.99 68.86 
8X 92.37 108.54 106.17 100.17 93.84 87.98 82.54 77.55 73.30 69.38 
9X 89.38 104.96 103.12 97.89 92.10 86.61 81.56 77.07 72.97 69.35 
10X 86.43 101.25 99.88 95.24 90.09 85.10 80.51 76.24 72.43 68.93 
 
Table A.12: Total bandwidth (Gb/s) through the bus for a constrained channel width, with repeaters, in 
13 nm. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 201.5 179.1 154.8 135.2 119.6 107.0 96.6 87.9 80.6 74.4 
2X 187.0 185.5 169.7 153.6 139.2 126.7 115.9 106.7 98.6 91.6 
3X 169.1 177.5 168.2 155.9 143.7 132.5 122.5 113.6 105.8 99.0 
4X 154.0 166.9 162.0 152.6 142.5 132.7 123.8 115.6 108.3 101.8 
5X 141.5 156.7 154.4 147.3 138.9 130.5 122.5 115.1 108.4 102.3 
6X 131.0 147.2 146.7 141.3 134.4 127.0 119.9 113.3 107.1 101.4 
7X 122.3 138.7 139.5 135.3 129.5 123.1 116.7 110.7 105.1 99.9 
8X 114.7 131.0 132.7 129.5 124.5 119.0 113.3 107.8 102.7 97.9 
9X 108.3 124.3 126.4 124.0 119.8 114.8 109.7 104.8 100.1 95.7 
10X 102.5 118.1 120.7 118.8 115.2 110.9 106.3 101.8 97.5 93.4 
 
Table A.13: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm technology 
without repeaters. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 10.13 6.29 4.46 3.43 2.78 2.34 2.01 1.77 1.58 1.43 
2X 13.38 9.33 7.06 5.67 4.74 4.07 3.57 3.19 2.87 2.62 
3X 14.90 11.08 8.77 7.25 6.19 5.42 4.82 4.34 3.95 3.63 
4X 15.78 12.21 9.95 8.43 7.32 6.49 5.84 5.30 4.87 4.50 
5X 16.33 13.01 10.84 9.34 8.23 7.38 6.69 6.13 5.66 5.26 
6X 16.71 13.58 11.52 10.06 8.97 8.12 7.42 6.84 6.35 5.93 
7X 16.99 14.03 12.07 10.66 9.60 8.75 8.04 7.46 6.95 6.52 
8X 17.20 14.38 12.52 11.17 10.13 9.29 8.59 8.00 7.50 7.06 
9X 17.35 14.68 12.90 11.61 10.59 9.76 9.08 8.49 7.98 7.54 
10X 17.49 14.92 13.23 11.97 10.99 10.19 9.52 8.93 8.42 7.98 
 
Table A.14: Power dissipation (mW) at maximum bandwidth for the interconnect of 13 nm technology with 
repeaters. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 18.55 9.50 6.35 4.76 3.81 3.18 2.72 2.39 2.12 1.91 
2X 18.01 10.51 7.54 5.92 4.89 4.17 3.64 3.23 2.91 2.65 
3X 17.00 10.70 8.05 6.52 5.51 4.79 4.24 3.81 3.46 3.17 
4X 16.13 10.67 8.30 6.89 5.93 5.22 4.68 4.24 3.88 3.58 
5X 15.42 10.58 8.44 7.14 6.23 5.55 5.02 4.58 4.22 3.92 
6X 14.83 10.47 8.52 7.32 6.47 5.82 5.30 4.87 4.51 4.21 
7X 14.36 10.36 8.58 7.46 6.65 6.03 5.53 5.11 4.76 4.46 
8X 13.96 10.26 8.61 7.56 6.80 6.21 5.73 5.32 4.98 4.68 
9X 13.63 10.18 8.64 7.65 6.93 6.36 5.90 5.50 5.17 4.87 
10X 13.35 10.10 8.66 7.73 7.04 6.50 6.05 5.67 5.34 5.05 
 
Table A.15: Total bandwidth per unit power consumption (Gb/s per mW) for interconnects with repeaters. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 10.86 18.84 24.37 28.38 31.38 33.66 35.43 36.83 37.95 38.85 
2X 10.38 17.64 22.5 25.95 28.47 30.37 31.83 32.98 33.89 34.63 
3X 9.947 16.59 20.89 23.89 26.06 27.67 28.9 29.85 30.61 31.23 
4X 9.545 15.65 19.51 22.14 24.03 25.42 26.46 27.27 27.91 28.43 
5X 9.176 14.81 18.29 20.63 22.29 23.49 24.4 25.11 25.65 26.1 
6X 8.831 14.06 17.21 19.31 20.78 21.84 22.64 23.26 23.73 24.11 
7X 8.513 13.38 16.27 18.15 19.46 20.41 21.11 21.65 22.08 22.42 
8X 8.216 12.77 15.41 17.12 18.3 19.16 19.79 20.26 20.64 20.94 
9X 7.942 12.21 14.64 16.2 17.28 18.04 18.61 19.04 19.38 19.65 
10X 7.684 11.69 13.94 15.38 16.36 17.05 17.57 17.96 18.26 18.5 
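
The entries of Table A.15 can be reproduced by dividing the total bandwidth of Table A.12 
by the power dissipation of Table A.14 for the same width and spacing. The short sketch 
below does this for three cells; the dictionary layout and function name are illustrative 
only, and small differences from the table come from rounding of the inputs. 

```python
def bandwidth_per_mw(total_bw_gbps, power_mw):
    """Bandwidth efficiency in Gb/s per mW."""
    return total_bw_gbps / power_mw


# (width, spacing): (Table A.12 bandwidth, Table A.14 power, Table A.15 value)
cells = {
    ("1X", "1X"): (201.5, 18.55, 10.86),
    ("5X", "5X"): (138.9, 6.23, 22.29),
    ("10X", "10X"): (93.4, 5.05, 18.5),
}

efficiency = {
    key: bandwidth_per_mw(bw, power) for key, (bw, power, _) in cells.items()
}

for key, value in efficiency.items():
    print(key, f"recomputed {value:.2f}, tabulated {cells[key][2]}")
```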
 
 
Table A.16: Probability of link failure (in parts per thousand) of the individual lines of the channel under 
variability. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 34.29 8.02 3.81 2.69 2.06 1.88 1.89 1.81 1.72 1.75 
2X 6.92 0.06 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
3X 2.31 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
4X 1.34 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
5X 0.77 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
6X 0.58 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
7X 0.35 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
8X 0.31 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
9X 0.21 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
10X 0.16 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 
 
Table A.17: Probability of link failure (in parts per thousand) for the channel under area constraint. 
S/W 1X 2X 3X 4X 5X 6X 7X 8X 9X 10X 
1X 988.51 496.90 216.64 128.74 84.38 66.34 58.87 50.19 43.21 40.02 
2X 448.43 3.69 2.73 2.27 1.94 1.70 1.51 1.36 1.24 1.14 
3X 138.36 2.73 2.28 1.95 1.71 1.52 1.37 1.24 1.14 1.05 
4X 67.18 2.28 1.96 1.71 1.52 1.37 1.25 1.14 1.06 0.98 
5X 32.93 1.97 1.72 1.53 1.38 1.25 1.15 1.06 0.98 0.92 
6X 21.48 1.73 1.54 1.38 1.26 1.15 1.06 0.99 0.92 0.86 
7X 11.53 1.54 1.39 1.26 1.16 1.07 0.99 0.93 0.87 0.82 
8X 9.11 1.39 1.27 1.16 1.07 1.00 0.93 0.87 0.82 0.78 
9X 5.47 1.27 1.17 1.08 1.00 0.93 0.87 0.82 0.78 0.74 
10X 3.90 1.17 1.08 1.00 0.94 0.88 0.83 0.78 0.74 0.71 
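
The relation between the per-line values of Table A.16 and the channel values of Table 
A.17 can be sketched under the assumption that the lines fail independently, so that the 
channel fails if at least one line fails. Both the independence assumption and the line 
count used below are assumptions: the value of 128 lines is not stated in this appendix 
and is chosen only because it reproduces the tabulated cell for width 1X and spacing 1X. 

```python
def channel_failure_probability(p_line, n_lines):
    """Probability that at least one of n independent lines fails."""
    return 1.0 - (1.0 - p_line) ** n_lines


# Table A.16, width 1X, spacing 1X: 34.29 parts per thousand per line.
p_line = 34.29e-3
# Assumed number of lines in the channel for this cell (not from the text).
n_lines = 128

p_channel = channel_failure_probability(p_line, n_lines)
print(f"channel failure probability: {1000.0 * p_channel:.2f} parts per thousand")
```

The same expression shows why the failure probability of the channel rises quickly with 
the number of lines even when each line is individually reliable. 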
 
 
                                                                                                                                References 
 
161 
 
References 
[1] Gordon E. Moore, “Cramming More Components onto Integrated Circuits”, 
Electronics, Volume 38, Number 8, April 19, 1965. 
[2] Ethiopia Enideg Nigussie, “Exploration and Design of High Performance Variation 
Tolerant On-Chip Interconnects,” Ph.D Thesis Turun Yliopisto University of Turku, 2010. 
[3] Marvell 88F6282 SoC data sheet. Marvell ®. Available online at: 
http://www.datasheets.org.uk/88F6282-datasheet.html 
[4] International Technology Roadmap for Semiconductor, Semiconductor Industry 
Association, 2003, http://www.itrs.net/ 
[5] J. W. McPherson, “Reliability challenges for 45nm and beyond,” in Proceedings of the 
43rd Design Automation Conference, pages 176– 181, July 2006. 
[6] R. Ho, K. Mai, and M. Horowitz, “The future of wires,” in Proceedings of the IEEE, 
89(4):490–504, April 2001. 
[7] S. Borkar, “Design challenges of technology scaling,” Micro, IEEE, 19(4):23–29, 1999. 
[8] Simon Ogg, “Serialization and Asynchronous Techniques for Reliable Network-on-
Chip communication,” PhD. Thesis, Southampton University, May 2009. 
[9] H. Ron, M. Ken and M. Horowitz, “Managing wire scaling: a circuit perspective,” in 
Proc. of Interconnect Technology Conference, 2003. 
[10] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect power dissipation in 
a microprocessor,” in IEEE/ACM International Workshop on System Level Interconnect 
Prediction, pp. 7-13, Feb. 2004. 
[11] T. Mudge, “Power: A first-class architectural design constraint,” IEEE Computer, 
34(4):52–58, April 2001. 
[12] D. Blaauw, A. Devgan, F. Najm, “Leakage power: trends, analysis and avoidance,” 
Embedded Tutorial II, ASP-DAC 2005. 
[13] Y. Chen and S. Y. Kung, “Trend and Challenge on System-on-a-Chip designs,” 
Journal of Signal Processing systems, vol. 53, issue 1-2, Nov. 2008. 
[14] S. R. Nassif, “Model to Hardware Matching; For nanometer Scale Technologies,” 
International Conference on Simulation of Semiconductor Processes and Devices 2006, 
pp. 5-8, Sept. 2006. 
[15] Dennis Sylvestera, Kanak Agarwalb and Saumil Shaha, “Variability in nanometer 
CMOS: Impact, analysis, and minimization,” Integration, the VLSI Journal, Vol. 41, No. 3, 
pp. 319-339, May 2008. 
                                                                                                                                References 
 
162 
 
[16] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, “Parameter 
variations and impact on circuits and microarchitectures,” in Proc. of the Design 
Automation Conf., 2003, pp. 338–342. 
[17] Asen Asenov, “Statistical device variability and its impact on yield and performance,” 
13th IEEE International On-Line Testing Symposium (IOLTS 2007). 
[18] H. Mahmoodi, S. Mukhopadhyay, and K. Roy, “Estimation of delay variations due to 
random-dopant fluctuations in nanoscale CMOS circuits,” IEEE J. Solid-State Circuits, 
vol. 40, no. 9, pp. 1787–1796, Sep. 2005. 
[19] Y. Li and C.-H. Hwang, “High-frequency characteristic fluctuations of nano-
MOSFET circuit induced by random dopants,” IEEE Trans.Microw. Theory Tech., vol. 56, 
no. 12, pp. 2726–2733, Dec. 2008. 
[20] Y. Li, C.-H. Hwang, T.-C. Yeh, H.-M. Huang, T.-Y. Li, and H.-W. Cheng, 
“Reduction of discrete-dopant-induced /high-frequency characteristic fluctuations in 
nanoscale CMOS circuit,” in Int. Conf. Simul. Semicond. Process. Devices, Sep. 2008, pp. 
209–212. 
[21] K. Kuhn et al, “Managing process variation in Intel’s 45nm CMOS technology,” Intel 
Technology Journal, vol. 12, issue 02, June 17, 2008. 
[22] Paul P. Sotiriadis and Anantha Chandrakasan, “Low Power Bus Coding Techniques 
Considering Inter-wire Capacitances,” IEEE Custom Integrated Circuits Conference, 2000. 
[23] P. Subrahmanya, R. Manimegalai, V. Kamakoti, Madhu Mutyam, “A Bus Encoding 
Technique for Power and Cross-talk Minimization,” Proceedings of the 17th International 
Conference on VLSI Design (VLSID’04), 2004. 
[24] K. Banerjee and A. Mehrotra, “A power-optimal repeater insertion methodology for 
global interconnects in nanometer designs,” IEEE Trans. on Elec. Dev., vol. 49, pp. 2001-
2007, Nov. 2002. 
[25] Man Lung Mui, K. Banerjee and Amit Mehrotra, “A global interconnect optimization 
scheme for nanometer scale VLSI with implications for latency, bandwidth, and power 
dissipation,” IEEE transactions on Electron Devices, vol. 51, no. 2, February 2004. 
[26] Anastasia Barger and David Goren, Avinoam Kolodny, “Simple Design Criterion for 
Maximizing Data Rate in NoC Links,” IEEE Workshop on signal propagation on 
interconnects, pp. 149-152, 2006. 
[27] Magdy A. El-Moursy, Eby G. Friedman, “Optimum wire sizing of RLC interconnect 
with repeaters,” Integration, the VLSI Journal, 38 (2004) 205–225. 
                                                                                                                                References 
 
163 
 
[28] [5] B. K. Kaushik, S. Sarkar, R. P. Agarwal and R. C. Joshi, “Crosstalk analysis of 
simultaneously switching coupled interconnects driven by unipolar inputs through 
heterogeneous resistive drivers,” ICET 2007. 
[29] L. M. Coulibaly, H. J. Kadim, “Analytical crosstalk noise and its induced delay 
estimation for distributed RLC interconnects under ramp excitation,” ISCAS 2005. 
[30] Xiaopeng Ji, Long Ge, X. Han, Z. Wang, “Crosstalk noise analysis for distributed 
parameter high-speed interconnect lines based on the transfer function,” CCDC 2008. 
[31] Sudeep Pasricha and Nikil Dutt, “On-Chip Communication Architectures-System on 
Chip Interconnect,” Morgan Kaufmann Publishers, 2008. 
[32] C. Sangik, and K. Shinwook, “Implementation of an on-chip bus bridge between 
heterogeneous buses with different clock frequencies,” in Proceedings of 5th System-on-
Chip for Real-Time Applications, 2005, pp. 530-534. 
[33] AMBA system architecture, 
http://www.arm.com/products/solutions/AMBA/HomePage.html 
[34] Giovanni De Micheli and Luca Benini, “Networks on Chips,” Morgan Kaufmann, 
Elsevier Inc. 2006. 
[35] L. Benini and G. D. Micheli, “Networks on chips: A new SoC paradigm,” IEEE 
Computer, 35(1):70–78, January 2002. 
[36] W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection 
networks,” In Proceedings of the 38th Design Automation Conference, 2001. 
[37] T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network on-
chip”, ACM Computing Survey, 38(1):1–54, 2006. 
[38] Zhonghai Lu, “Design and Analysis of On-Chip Communication for Network- on-
Chip Plateforms”, PhD thesis, submitted in Royal Institute of Technology (KTH), 2007. 
[39] S. Murali and G. De Micheli, “SUNMAP: a tool for automatic topology selection and 
generation for NoCs,” San Diego, CA, USA, 2004, pp. 914-919. 
[40] Edmund Lee, “Interconnect driver design for long wires in field-programmable gate 
arrays,” M.S.c Thesis, The University of British Columbia, June 2006. 
[41] J. Nurmi, H. Tenhunen, J. Isoaho, A. Jantsch, “Interconnect-Centric Design for 
Advanced SOC and NOC,” 2004,  ISBN: 978-1-4020-7835-4. 
                                                                                                                                References 
 
164 
 
[42] “Design of VLSI Systems” Online Available at: 
http://lsmwww.epfl.ch/Education/former/2002-2003/VLSIDesign/ch04/ch04.html. 
[43] Raphael User’s Guide, Technology Modeling Associates, 1997. 
[44] K. Nabors and J. White, “Fastcap: A Multipole Accelerated 3-D Capacitance 
Extraction Program,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and 
Systems, Vol. 10, No. 11, pp. 1447-1459, Nov. 1991. 
[45] Jue-Hsien Chern, Jean Huang, Lawrence Arledge, Ping-Chung Li, and Ping Yang, 
“Multilevel Metal Capacitance Models For CAD Design Synthesis Systems,” IEEE 
Electron Device Letters, vol. 13, no. 1, Jan. 1992. 
[46] Shyh-Chyi Wong, Trent Gwo-Yann Lee, Dye-Jyun Ma, and Chuan-Jane Chao, “An 
Empirical Three-Dimensional Crossover Capacitance Model for Multilevel Interconnect 
VLSI Circuits,” IEEE Trans. on Semiconductor Manufacturing, vol. 13, no. 2, May 2000. 
[47] T. Sakurai, “Closed-Form Expressions for Interconnection Delay, Coupling, and 
Crosstalk in VLSI’s”, IEEE Trans. on Electron Devices, Vol. 40, No. 1, pp. 118-124, Jan. 
1993. 
[48] S. Simon Wong et al “On-Chip Interconnect Inductance –Friend or Foe,” Proceedings 
of the Fourth International Symposium on Quality Electronic Design (ISQED’03). 
[49] K. Banerjee and A. Mehrotra, "Analysis of On-Chip Inductance Effects for 
Distributed RLC Interconnects," IEEE Transactions on Computer-Aided Design of 
Integrated Circuits and Systems, vol. 21, pp. 904-915, 2002. 
[50] International technology roadmap for semiconductors 2007. Online available at: 
http://www.itrs.net/Links/2007ITRS/2007_Chapters/2007_Interconnect.pdf 
[51] H. Bakoglu, “Circuits, Interconnections, and Packaging for VLSI,” Addison-Wesley, 
1990. 
[52] W. C. Elmore, "The Transient Response of Damped Linear Networks with Particular 
Regard to Wideband Amplifiers," Applied Physics, pp. 55-63, 1948. 
[53] M. J. Mills, “Variation Aware Design of Data Receiver Circuits for On-Chip Optical 
Interconnect,” Master of Engineering Thesis, MIT EECS, 2002. 
[54] R. Gandikota, D. Blaauw, Dennis Sylvester, “Modeling crosstalk in statistical static 
timing analysis,” DAC 2008, June 8-13 2008. 
                                                                                                                                References 
 
165 
 
[55] T. Xiao and M. Marek-Sadowska, “Gate sizing to eliminate crosstalk induced timing 
violation,” in Proc. Intl. Conf. on Computer Design, 2001, pp. 186–191. 
[56] K. Agarwal, D. Sylvester, and D. Blaauw, “Modeling and analysis of crosstalk noise 
in coupled RLC interconnects,” IEEE trans. on Comp. Aided Design of Int. Cir. and Sys., 
vol. 25, no. 5, May 2006. 
[57] J. V. R. Ravindra, M. B. Srinivas, “Modelling and Analysis of Crosstalk for 
Distributed RLC interconnects using difference model approach,” SBCCI’07. 
[58] O. S. Nakagawa, D. Sylvester, J. McBride, and S.-Y. Oh, “On-Chip Cross Talk Noise 
Model for Deep-Submicrometer ULSI Interconnect,” The Hewlett-Packard Journal, pp. 
39-45, May 1998. 
[59] J. Cong and L. He, “An efficient technique for device and interconnect optimization in 
deep submicron designs,” in Proc. Int. Symp. Physical Design, 1998, pp. 45-51. 
[60] D. Sylvester and K. Keutzer, “Getting to the bottom of deep submicron II: a global 
paradigm,” Proceedings of International Symposium on Physical Design, pages 193-200, 
1999. 
[61] G. Chen and E. Friedman, “Low power repeaters driving RC interconnects with delay 
and bandwidth constraints,” in Proc. of ASIC/SOC, pp. 335-339, 2004. 
[62] Patrick P. Gelsinger, 41st DAC keynote, DAC 2004, (www.dac.com). 
[63] Anantha P. Chandrakasan and Robert W. Brodersen, “Minimizing Power 
Consumption in CMOS Circuits,” Dept. of EECS, University of California at Berkeley. 
[64] Ndubuisi Ekekwe, “Power dissipation and interconnect noise challenges in nanometer 
CMOS technologies,” IEEE Potentials, 2010. 
[65] S. Markov, “Gate Leakage Variability in Nano-CMOS Transistors,” PhD thesis, 
Glasgow University, 2009. 
[66] S. Saxena, C. Hess, H. Karbasi, A. Rossoni, S. Tonello, P. McNamara, S. Lucherini, 
S. Minehane, C. Dolainsky, and M. Quarantelli, “Variation in transistor performance and 
leakage in nanometer-scale technologies,” IEEE Transactions on Electron Devices, vol. 55, 
p. 131, 2008. 
[67] H. Tuinhout, “Impact of parametric mismatch and fluctuations on performance and 
yield of deep-submicron CMOS technologies,” ESSDERC 2002 - Proceedings of the 28th 
European Solid-State Device Research Conference, Florence, Italy, pp. 95-101, 2002. 
[68] R. H. J. M. Otten and G. S. Garcea, “Are wires plannable?,” in Proc. Int. Workshop on 
System-Level Interconnect Prediction, pp. 59-66, 2001. 
[69] K. A. Bowman, S. G. Duvall, and J. D. Meindl, “Impact of die-to-die and within-die 
parameter fluctuations on the maximum clock frequency distribution for gigascale 
integration,” IEEE J. Solid-State Circuits, vol. 37, no. 2, pp. 183–190, Feb. 2002. 
[70] K. Agarwal, R. Rao, D. Sylvester and R. Brown, “Parametric yield analysis and 
optimization in leakage dominated technologies,” IEEE Trans. on Very Large Scale 
Integration (VLSI) Systems, vol. 15, no. 6, June 2007. 
[71] G. Roy, F. Adamu-Lema, A. R. Brown, S. Roy, A. Asenov, “Intrinsic parameter 
fluctuations in conventional MOSFETs until the end of the ITRS: A statistical simulation 
study,” in Journal of Physics: Conference Series 38 (2006), pp. 188-191. 
[72] M. F. Bukhori, Scott Roy and Asen Asenov, “Statistical Simulation of RTS Amplitude 
Distribution in Realistic Bulk MOSFETs Subject to Random Discrete Dopants,” ULIS 
2008, pp. 171-174, 2008. 
[73] H. Mahmoodi, S. Mukhopadhyay and Kaushik Roy, “Estimation of Delay Variations 
due to Random-Dopant Fluctuations in Nanoscale CMOS Circuits,” IEEE Journal of 
Solid-State Circuits, vol. 40, no. 9, September 2005. 
[74] B. Cheng, S. Roy, A.R. Brown, C. Millar, A. Asenov, “Evaluation of statistical 
variability in 32 and 22 nm technology generation LSTP MOSFETs,” Solid-State 
Electronics 53 (2009) 767–772. 
[75] Yiming Li, Chih-Hong Hwang, and Tien-Yeh Li, “Discrete-dopant-induced timing 
fluctuation and suppression in nanoscale CMOS circuit,” IEEE Trans. on Circuits and 
Systems-II, Express Briefs, vol. 56, no. 5, May 2009. 
[76] B.J. Cheng, S. Roy, A. Asenov, "Impact of random dopant fluctuation on bulk CMOS 
6-T SRAM scaling", ESSDERC, Montreux, Switzerland, Sept. 2006. 
[77] B. Cheng, S. Roy, G. Roy, F. Adamu-Lema and A. Asenov, "Impact of intrinsic 
parameter fluctuations in decanano MOSFETs on yield and functionality of SRAM cells," 
Solid-State Electronics, Vol. 49, No. 5, pp. 740–746, 2005. 
[78] International Technology Roadmap for Semiconductors, 2005. 
[79] J. Liu, L. R. Zheng and H. Tenhunen, “Interconnect intellectual property for Network-
on-Chip (NoC)” Journal of Systems Architecture, 50(2-3), 65-79, 2004. 
[80] M. Bhushan, M. B. Ketchen, S. Polonsky and A. Gattiker, “Ring Oscillator 
Based Technique for Measuring Variability Statistics,” Proceedings of the International 
Conference on Microelectronic Test Structures, pp. 87-92, 2006. 
[81] B. J. Cheng, S. Roy, G. Roy, A. Asenov, “Integrating 'atomistic', intrinsic parameter 
fluctuations into compact model circuit analysis,” in ESSDERC '03: 33rd Conference on 
European Solid-State Device Research, 16-18 September 2003, pp. 437-440, Estoril, 
Portugal. 
[82] B. S. Cherkauer and E. G. Friedman, “Design of tapered buffers with local 
interconnect capacitance,” IEEE Journal of Solid State Circuits, 30(2), 151-155, 1995. 
[83] M. Nemes, “Driving large capacitances in MOS LSI systems,” IEEE J. Solid-State 
Circuits, 19(1), 159-161, 1984. 
[84] TSMC 0.18µm process 1.8 volt in SAGE-X™ standard cell library, databook, Artisan 
Components, Inc., Release 3.1, 2001. 
[85] N. Lotze, M. Ortmanns, Y. Manoli, “Variability of flip-flop timing at sub-threshold 
voltages,” Proceedings of the 13th International Symposium on Low Power Electronics and 
Design, ISLPED 2008. 
[86] D. Pamunuwa, H. Tenhunen, “Repeater insertion to minimise delay in coupled 
interconnects,” 14th International Conference on VLSI Design, 2001. 
[87] P. Borjesson and C.-E. Sundberg, “Simple approximations of the error function Q(x) 
for communications applications,” IEEE Transactions on Communications, 27(3), 
639-643, 1979. 
[88] Cristiano Forzan, Davide Pandini, “Statistical static timing analysis: A survey,” 
Integration, The VLSI Journal, 42 (2009) 409–435. 
[89] Izumi Nitta, T. Shibuya, K. Homma, “Statistical static timing analysis technology,” 
Fujitsu Scientific and Technical Journal, 43(4), pp. 516-523, October 2007. 
[90] E. Salman, A. Dasdan, F. Taraporevala, K. Kucukcakar, and E.G. Friedman, 
“Pessimism reduction in static timing analysis using interdependent setup and hold 
times,” Proceedings of the 7th ISQED’06, page 6, March 2006. 
[91] S. Srivastava and J. Roychowdhury, “Interdependent latch setup/hold time 
characterization via Euler-Newton curve tracing on state-transition equations,” DAC, 
pp. 136-141, June 2007. 
[92] Saibal Mukhopadhyay, “A generic method for variability analysis of nanoscale 
circuits,” ICICDT 2008, pp. 285-288, 2008. 
[93] Sani Nassif et al., “High performance CMOS variability in the 65nm regime and 
beyond,” IEDM Digest of Technical Papers, pp. 569-571, 10-12 Dec. 2007. 
[94] Urban Kovac et al., “Statistical simulation of random dopant induced threshold voltage 
fluctuations for 35 nm channel length MOSFET,” Microelectronics Reliability 48 
(2008) 1572-1575. 
[95] MathWorks. Available online at: 
http://www.mathworks.com/help/toolbox/stats/br5k833-1.html#br5k833-2 
[96] J. Heinrich, “A guide to the Pearson type IV distribution,” 
CDF/MEMO/STATISTICS/PUBLIC/6820. 1, 2. 
[97] Y. Nagahara, “The PDF and CF of Pearson type IV distributions and the ML 
estimation of the parameters,” Statistics and Probability Letters 43 (1999) 251-264. 
[98] Michel, N. and M. Stoitsov, “Fast computation of the Gauss hypergeometric function 
with all its parameters complex with application to the Pöschl-Teller-Ginocchio 
potential wave functions,” Computer Physics Communications, 178, 535-551, 2008. 
[99] M. Abramowitz and I. A. Stegun, editors, “Handbook of Mathematical Functions with 
Formulas, Graphs, and Mathematical Tables,” Courier Dover Publications, 1965. 
Available online at: 
http://knovel.com/web/portal/browse/display?_EXT_KNOVEL_DISPLAY_bookid=528&VerticalID=0 
[100] David J. DeBrota, J. J. Swain, S. D. Roberts, S. Venkatraman, “Input modeling with 
the Johnson system of distributions,” Proceedings of the 1988 Winter Simulation 
Conference, 1988. 
[101] Manish Garg, “High performance pipelining method for static circuits using 
heterogeneous pipelining elements,” Proceedings of ESSCIRC’03, pp. 185-188. 
[102] J. L. Hennessy et al., “Computer Architecture: A Quantitative Approach,” Morgan 
Kaufmann, 3rd edition, May 2002. 
[103] A. Datta, S. Bhunia, S. Mukhopadhyay, K. Roy, “Delay modeling and statistical 
design of pipelined circuit under process variation,” IEEE Transactions on CAD of 
Integrated Circuits and Systems, vol. 25, no. 11, pp. 2427-2436, Nov. 2006. 
[104] Min Pan, C.C.N Chu and H. Zhou, “Timing yield estimation using statistical static 
timing analysis,” ISCAS 2005. 
[105] C. E. Clark, “The greatest of a finite set of random variables”, Operations Research 
9(2), March-April, 1961, pp. 145-162. 
[106] D. Blaauw, K. Chopra, A. Srivastava and L. Scheffer, “Statistical timing analysis: 
From basic principles to state of the art,” IEEE Transactions on Computer Aided 
Design of Integrated Circuits and Systems, vol. 27, no. 4, April, 2008. 
[107] Faiz-ul-Hassan, Binjie Cheng, Wim Vanderbauwhede, Fernando-Rodriguez, “Impact of 
device variability in the communication structures for future synchronous SoC designs,” SoC 
2009, Finland. 
[108] L. Zhang, W. Chen, Y. Hu, J. A. Gubner and C.C-P. Chen, “Correlation-preserved 
non-gaussian statistical timing analysis with quadratic timing model,” DAC ’05, pp. 83-
88. 
[109] L. Xie and A. Davoodi, “Fast and accurate statistical static timing analysis with 
skewed process parameter variation,” ISQED, 2008. 
[110] Chun-Yu Chuang, Wai-Kei Mak, “Accurate closed form parametrized block-based 
statistical timing analysis applying skew-normal distribution,” ISQED’09, 2009. 
[111] K. Chopra, B. Zhai, D. Blaauw, D. Sylvester, “A new statistical max operation in 
propagating skewness in statistical timing analysis,” ICCAD’06, November 5-9, 2006. 
[112] A. M. Ross, “Useful bounds on the expected maximum of correlated normal variables,” 
ISE Working Paper 03W-004, Aug 2003. 
[113] H. Chang, et al, “Statistical timing analysis under spatial correlations,” IEEE TCAD 
24(9), 2005. 
[114] K. Banerjee , S. J. Souri, P. Kapur , and K. C. Saraswat, “3-D ICs: A novel chip 
design for improving deep-submicrometer interconnect performance and systems-on-
chip integration,” Proc. IEEE, vol. 89, pp. 602-633, May 2001. 
[115] J. Cong, “Challenges and Opportunities for Design Innovations in Nanometer 
Technologies,” In SRC Design Sci. Concept Paper, 1997. 
[116] Giuseppe S. Garcea, Nick P. van der Meijs, Ralph H. J. M. Otten, “Simultaneous 
analytic area and power optimization for repeater insertion,” Proceedings of ICCAD 
’03. 
[117] Y. Cao, P. Gupta, A. B. Kahng, D. Sylvester and J. Yang, “Design sensitivities to 
variability: extrapolations and assessments in nanometer VLSI,” in Proc. of the 15th 
Annual IEEE International ASIC/ SOC conf., pp. 411-415, Sept. 2002. 
[118] S. R. Nassif, “Modeling and forecasting of manufacturing variations,” in Proc. of the 
ASP-DAC, pp. 145-149, Feb. 2001. 
[119] A. P. Chandrakasan and R. W. Brodersen, “Sources of power consumption,” in Low 
Power Digital CMOS Design, Norwell, MA: Kluwer, 1995. 
[120] Rajeev Rao et al, “Parametric yield estimation considering leakage variability,” IEEE 
DAC 2004, 442-447. 
[121] E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, “QNoC: QoS architecture and design 
process for Network on Chip,” Journal of Systems Architecture, vol. 50, pp. 105-128, 2004. 
[122] L. Zhang, Y. Hu, “Statistical timing analysis in sequential circuit for on chip global 
interconnect pipelining,” Proceedings of Design Automation Conference, 2004, pp. 
904-907. 
[123] N. Magen, A. Kolodny, U. Weiser, N. Shamir, “Interconnect-power dissipation in a 
microprocessor”, SLIP Conf., Feb., 2004. 
[124] R. H. Havemann and J. A. Hutchby, “High-performance interconnects: an integration 
overview,” Proc. IEEE, vol. 89, no. 5, pp. 586-601, May 2001. 
[125] S. Takahashi, M. Edahiro, and Y. Hayashi, “Interconnect design strategy: Structures, 
repeaters and materials with strategic system performance analysis (S2PAL) model,” IEEE 
Trans. Electron Devices, vol. 48, no. 2, pp. 239-251, Feb. 2001. 
[126] H. Shah, P. Shiu, B. Bell, M. Aldredge, N. Sopory, and J. Davis, “Repeater insertion 
and wire sizing optimization for throughput-centric VLSI global interconnects,” in Proc. 
IEEE/ACM Int. Conf. Computer-Aided Design, 2002, pp. 280-284. 
[127] A. B. Kahng, S. Muddu, and E. Sarto, “Interconnect optimization strategies for high-
performance VLSI designs,” in Proc. IEEE Int. Conf. VLSI Design, 1999, pp. 464–469. 
[128] D. Pamunuwa, L.-R. Zheng, and H. Tenhunen, “Optimising bandwidth over deep 
sub-micron interconnect,” in Proc. IEEE Int. Symp. Circuits Syst., 2002, pp. IV/193–
IV/196. 
[129] A. Naeemi, R. Venkatesan, and J. D. Meindl, “System-on-a-chip global interconnect 
optimization,” in Proc. ASIC/SoC Conf., Sep. 2002, pp. 399–403. 
[130] Z. Lin, C. Spanos, L. Milor and Y. Lin, “Circuit sensitivity to interconnect 
variation,” IEEE Transactions on Semiconductor Manufacturing, vol. 11, pp. 557-568, 
Nov. 1998. 
[131] T. Fukuoka, A. Tsuchiya, H. Onodera, “Worst-case delay analysis considering the 
variability of transistors and interconnects,” ISPD’07, March, 2007. 
[132] V. Wason and K. Banerjee, “A probabilistic framework for power-optimal repeater 
insertion in global interconnects under parameter variations,” ISLPED’05, August 2005. 
[133] K. Agarwal, M. Agarwal, D. Sylvester and D. Blaauw, “Statistical interconnect 
metrics for physical design optimization,” IEEE Trans. on Comp. Aided Design of 
Integrated Circuits and Systems, vol. 25, no. 7, July 2006. 
[134] Li-Rong Zheng, Dinesh Pamunuwa, Hannu Tenhunen, “Accurate a priori signal 
integrity estimation using a multilevel dynamic interconnect model for deep submicron 
VLSI design,” in Proc. ESSCIRC, Sep. 2000, pp. 324-327. 
[135] D. Pamunuwa and H. Tenhunen, “On dynamic delay and repeater insertion in 
distributed capacitively coupled interconnects,” Proc. of ISQED’02. 
[136] J. Liu, L-R Zheng, D. Pamunuwa, H. Tenhunen, “A global wire planning scheme for 
network-on-chip,” Proc. of the ISCAS’03. 
[137] D. Pamunuwa, Li-Rong Zheng, and Hannu Tenhunen, “Maximizing throughput over 
parallel wire structures in the deep submicron regime,” IEEE Trans. on VLSI Systems, vol. 
11, no. 2, April 2003. 
[138] S.J. Lee, et al, “Adaptive Network-on-Chip with wave-front train serialization 
scheme,” Proc. of VLSI Circuits, pp. 104-107, 2005. 
[139] R. Arunachalam, K. Rajagopal, L. T. Pileggi, “TACO: Timing analysis with coupling,” 
Proc. of Design Automation Conf., pp. 266-269, June 2000. 
[140] R. Gandikota, D. Blaauw, D. Sylvester, “Modeling crosstalk in statistical static 
timing analysis,” DAC 2008.  
[141] L. Dou, Z. Wang, “One high-efficiency analysis method for high-speed circuit 
network containing distributed parameter elements,” J. of Control Theory and 
Applications, vol. 3, no. 2, 117-120, 2005. 
[142] F. Rodriguez, Faiz-ul-Hassan, Wim Vanderbauwhede, “Passive skew amplification 
in coupled RC transmission lines,” to be submitted to IEEE Trans., 2011. 
[143] Y. I. Ismail and E. G. Friedman, “Effects of inductance on the propagation delay and 
repeater insertion in VLSI circuits,” IEEE Transactions on Very Large Scale Integration 
(VLSI) Systems, Vol. 8, No. 2, pp. 195-206, April 2000. 
[144] James R. Thompson, Richard A. Tapia, “Nonparametric Function Estimation, 
Modelling and Simulation,” Society for Industrial and Applied Mathematics (SIAM), 1990. 
[145] N. A. Kamsani, B. Cheng, S. Roy and A. Asenov, “Statistical circuit simulation 
with the effect of random discrete dopants in nanometer MOSFET devices,” In: Design 
Automation and Test in Europe: Workshop W2, Impact of Process Variability on Design 
and Test, 10-14 March 2008, Munich, Germany. 
[146] Arkadiy Morgenshtein, Israel Cidon, Avinoam Kolodny, Ran Ginosar, “Low leakage 
repeaters for NoC Interconnects,” 2005. 
 
 
 
