Optimising bandwidth over deep sub-micron interconnect. by Pamunuwa, Dinesh B. et al.
Optimising Bandwidth Over Deep Sub-micron Interconnect^1
Dinesh Pamunuwa, Li-Rong Zheng and Hannu Tenhunen 
Royal Institute of Technology (KTH), IMIT LECS 
Electrum 229, SE-164 40 Kista, Sweden 
dinesh|lrzheng|hannu@ele.kth.se
ABSTRACT 
In deep sub-micron (DSM) circuits proper analysis of 
interconnect delay is very important. When relatively long 
wires are placed in parallel, it is essential to include the 
effects of cross-talk on delay. In a parallel wire structure, 
the exact spacing and size of the wires determine both the 
resistance and the distribution of the capacitance between 
the ground plane and the adjacent signal-carrying conductors,
and have a direct effect on the delay. Repeater insertion,
depending on whether it is optimal or constrained,
affects the delay in different ways. Considering all these
effects we show that there is a clear optimum configura- 
tion for the wires which maximises the total bandwidth. 
Our analysis is valid for lossy interconnects as are typical 
of wires in DSM technologies. 
1. INTRODUCTION 
Interconnects in deep sub-micron technologies are typi- 
cally very lossy so that the RC delay dominates. In order 
to keep the resistance to a minimum the aspect ratio 
(width/height) of wires is kept low, which however gives
rise to increased inter-wire capacitance. This inter-wire 
capacitance results in cross-talk which has an effect on the 
delay, depending on how the aggressor lines switch. 
Cross-talk is of especial significance in uniformly coupled 
parallel wires, causing unpredictable delays [1]. Recently
there has been a profusion of research into block oriented 
architectures[2] which are being proposed as being suita- 
ble to overcome the interconnect bottleneck and to cope 
with complexity. The intra-block communication link in 
all of these will consist of a large number of parallel
wires, with uniform coupling over most of the wire length 
in all probability. The question we pose and attempt to 
answer in this paper is, given a fixed area in which to dis- 
tribute the interconnect, what is the best configuration of 
the wires to obtain the highest bandwidth? Is it to have a 
few fat wires and a high signaling frequency, or a large 
number of small wires with a lower signaling frequency, 
or anything in between? How does the wire spacing affect 
overall bandwidth? What effect does repeater insertion 
have? 
In such an analysis, it is essential that the wire capacitance
is distributed over a ground component and a coupling
component, and the effect of cross-talk on delay is
taken into account. Our electrical model for investigating
delay is shown in Fig. 1. Each line, except the two peripheral
lines, is coupled on both sides to aggressors. The
exact effect of cross-talk on delay depends on the switch- 
ing alignment of the aggressors with respect to the victim. 
In [1] and [3] we give an analysis of delay in such uniformly
coupled lines with different switching patterns.
Since, in general, it is necessary to design for the worst
case, the appropriate expressions (which are reproduced in
this article) are used for calculation of bandwidth. It is
equally important to have accurate closed-form equations
for the capacitance terms shown in the figure. We use the
expressions given in [4], which are empirical equations
exhibiting good accuracy.
Figure 1. Configuration for investigating effect of cross-talk
Figure 2. Minimum Bit Period
1. The funding support of Si& and that of Vinnova via the Socwan and Exsite Programs is gratefully acknowledged.
0-7803-7448-7/02/$17.00 ©2002 IEEE
Authorized licensed use limited to: Lancaster University Library. Downloaded on May 07,2010 at 14:54:42 UTC from IEEE Xplore.  Restrictions apply.
The rest of this paper is structured as follows. In
section 2 we consider the intrinsic delay of the line, and 
derive expressions for the optimal bandwidth. In section 3 
we include the effects of the non-ideal drivers and con- 
sider repeater insertion. Finally we give our conclusions. 
2. LINE DELAY 
All lines are uniformly coupled to two aggressor lines,
except the two corner-most lines which are coupled to one
aggressor. It has been shown that the dominant time constant
approximation for the worst-case 50% delay for such
lines is accurate to over 95% [3]. Eq. (1) gives the delay
for the middle conductors while (2) gives the delay for the
corner conductors^1, where the resistance and capacitances
refer to the total values for the lines (see Fig. 1).
T_{0.5,mid} = 0.4 R C_s + 1.51 R C_c    (1)
T_{0.5,corn} = 0.4 R C_s + 0.75 R C_c    (2)
The delay is matched to the bit period by apportioning
some percentage of the period to the rise and fall times so
that a sufficient margin is allowed (Fig. 2). Usually 3
delays are considered to be sufficient for the response to
pass the 90% threshold. We chose to be more conservative
and use 4. Hence the bit period is given by (3) and (4).
T_{p,mid} = 4 T_{0.5,mid}    (3)
T_{p,corn} = 4 T_{0.5,corn}    (4)
For N conductors, (5) gives the total bandwidth.
BW = \frac{2}{T_{p,corn}} + \frac{N-2}{T_{p,mid}}    (5)
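As a rough illustration of how Eqs. (1) through (5) chain together, the following Python sketch computes the two worst-case delays, the corresponding bit periods and the total bandwidth. The function names and the numerical values in the example call are assumptions made for illustration; they are not taken from the paper.

```python
def worst_case_delays(r, c_s, c_c):
    """Return (T_mid, T_corn), the worst-case 50% delays of Eqs. (1) and (2).

    r   -- total line resistance (ohm)
    c_s -- total capacitance to ground (F)
    c_c -- total coupling capacitance, as drawn in Fig. 1 (F)
    """
    t_mid = 0.4 * r * c_s + 1.51 * r * c_c   # Eq. (1): middle conductors
    t_corn = 0.4 * r * c_s + 0.75 * r * c_c  # Eq. (2): corner conductors
    return t_mid, t_corn


def total_bandwidth(n, r, c_s, c_c, margin=4.0):
    """Total bandwidth of n parallel lines from Eqs. (3)-(5).

    margin is the number of 50% delays allotted to one bit period
    (4 in the text). The result is in bits per second if one bit is
    sent per bit period on each line.
    """
    if n < 2:
        raise ValueError("Eq. (5) assumes at least two conductors")
    t_mid, t_corn = worst_case_delays(r, c_s, c_c)
    t_p_mid = margin * t_mid     # Eq. (3)
    t_p_corn = margin * t_corn   # Eq. (4)
    return 2.0 / t_p_corn + (n - 2) / t_p_mid  # Eq. (5)


# Illustrative values only (assumed, not from the paper).
bw = total_bandwidth(n=8, r=1e3, c_s=50e-15, c_c=100e-15)
print(f"{bw / 1e9:.2f} Gbit/s")
```

Because the two corner lines have a smaller coupling term, they contribute slightly more bandwidth per line than the N-2 middle lines.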
Now with reference to Fig. 1, R, C_s, and C_c are defined
by equations (6) through (12). The expressions for the
fringing and mutual capacitances (where \varepsilon_k is the
permittivity and \beta an empirical constant for the technology) were
originally presented in [4]. These are accurate to over 90%
when the following inequalities are satisfied.
0.3 < (w/h) < 30,  0.3 < (t/h) < 10,  0.3 < (s/h) < 10
For DSM, typical geometries are well within this range.
C_p = \varepsilon_k \frac{w l}{h}    (9)
C_c = C_f - C_f' + \varepsilon_k l \left[ 0.03\left(\frac{w}{h}\right) + 0.83\left(\frac{t}{h}\right) - 0.07\left(\frac{t}{h}\right)^{0.222} \right] \left(\frac{h}{s}\right)^{1.34}
It must also be borne in mind that the delay expressions we
give here are only valid for lossy lines, where the inductance
has a minor effect and the delay is appreciably
greater than the time of flight delay at the speed of light.
Typically interconnects in DSM are such lines, where the
line behaviour is diffusive in nature even for very fast rise
times.
1. Considering the different delays on the corner conductors may be an
unnecessary refinement for certain applications where the same signaling
frequency is used on all the lines.
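The parallel-plate and mutual capacitance expressions, together with the accuracy range quoted for the model of [4], could be coded along the following lines. This is only a sketch: the function names are hypothetical, the fringing terms C_f and C_f' are taken as inputs, and the reading of the exponents (0.222 on t/h and 1.34 on h/s) is an assumption about the printed formula.

```python
def in_validity_range(w, t, s, h):
    """True if the geometry lies in the range where the model of [4]
    is quoted as accurate to over 90%."""
    return (0.3 < w / h < 30) and (0.3 < t / h < 10) and (0.3 < s / h < 10)


def parallel_plate_capacitance(eps, w, length, h):
    """Eq. (9): C_p = eps * w * l / h."""
    return eps * w * length / h


def coupling_capacitance(eps, w, t, s, h, length, c_f, c_f_prime):
    """Mutual capacitance C_c between adjacent lines.

    c_f and c_f_prime are the fringing capacitance terms; they are
    supplied by the caller in this sketch.
    """
    bracket = 0.03 * (w / h) + 0.83 * (t / h) - 0.07 * (t / h) ** 0.222
    return c_f - c_f_prime + eps * length * bracket * (h / s) ** 1.34


# Geometry of the kind used in the examples (metres); a wire 0.5 um wide,
# 0.21 um thick, spaced 0.3 um apart, 0.2 um above the ground plane.
print(in_validity_range(w=0.5e-6, t=0.21e-6, s=0.3e-6, h=0.2e-6))
```

Checking the range before evaluating the capacitances is a cheap way of flagging configurations for which the empirical fit should not be trusted.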
From the geometry of Fig. 1, we get the following relation.
W_T = N W + (N-1) S    (13)
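Relation (13) can be solved for the wire width once the number of wires and the spacing are chosen, which is what makes only two of the three variables independent. The sketch below, with hypothetical function names and all lengths in micrometres, enumerates the configurations that respect a minimum wire width.

```python
def wire_width(total_width, n, s):
    """Solve Eq. (13), W_T = N*W + (N-1)*S, for the wire width W."""
    return (total_width - (n - 1) * s) / n


def feasible_configurations(total_width, spacings, w_min, n_max=1000):
    """Yield (N, S, W) for every N >= 2 and every listed spacing S
    whose resulting width W is at least w_min."""
    for s in spacings:
        for n in range(2, n_max + 1):
            w = wire_width(total_width, n, s)
            if w < w_min:
                break  # W only shrinks as N grows, so stop for this S
            yield n, s, w


# A 15 um channel with 19 wires spaced 0.3 um apart.
print(round(wire_width(15.0, 19, 0.3), 3))
```

For a 15 µm channel, N=19 with S=0.3 µm gives a width of about 0.505 µm, and N=32 with S=0.3 µm gives about 0.178 µm, which agrees with the rounded values annotated on the bandwidth plots.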
Our problem definition is, for a constant width W_T, what
are the N (number of conductors), S (spacing between conductors),
and W (width of the conductors) values that give
the optimum bandwidth. Of the three, only two are independent,
as the third is defined by (13) for any values that
the other two may take. We choose to vary N and S. The
variables are discrete as S and W are dictated by the process
as well, and there are geometrical limits which cannot
be exceeded. As a first example, given in Fig. 3 is a plot of
the total bandwidth varying with N and S for a width of
15µm in a representative 50nm technology. Copper wires
are assumed with β=1.65, h=0.2µm, and t=0.21µm. The
minimum wire width and spacing are each assumed to be
0.1µm. It can be seen that there is a clear optimum which
does not translate to the maximum parallelism possible
under the technology constraints.
Figure 3. Variation of Bandwidth with N and S (marked optimum: N=19, S=0.3µm, W=0.5µm)
Figure 4: Repeater Insertion in a long interconnect
3. REPEATER INSERTION
The above analysis considered only the intrinsic delay
of the lines. In practice the source will be a MOS driver
(usually an inverter) with a considerable output impedance.
Also, to reduce delay, the long lines in Fig. 1 are broken
up into shorter sections, with a repeater driving each
section. The analysis for repeater insertion is carried out
by characterizing the non-linear buffers (inverters) by an
output resistance R_drv and input capacitance C_drv. If the
number of repeaters including the original driver is k, and
the size of each repeater is h times a minimum sized
inverter (all lines are buffered in a similar fashion), the
output impedance of an h sized driver becomes R_drvm/h,
and the input capacitance h*C_drvm, where R_drvm and C_drvm
are the output resistance and input capacitance of a minimum
sized inverter respectively in that particular technology.
This configuration is sketched out in Fig. 4, where the
interconnect symbol refers to a capacitively coupled interconnect
as shown in Fig. 1. A complete analysis is given in [1] and
the total delay is as defined by (14), where t_r refers to the
rise time of the signal (it is assumed the rise times are typically
much less than the delay of the line). Additionally
the optimum k and h for minimising delay are defined by
(15) and (16).
T_{0.5} = k \left[ 0.7 \frac{R_{drvm}}{h} \left( \frac{C_s}{k} + h C_{drvm} + 2.2 \frac{2 C_c}{k} \right) + \frac{R}{k} \left( 0.4 \frac{C_s}{k} + 1.51 \frac{C_c}{k} + 0.7 h C_{drvm} \right) \right]    (14)
k_{opt} = \sqrt{ \frac{0.4 R C_s + 1.51 R C_c}{0.7 R_{drvm} C_{drvm}} }    (15)
h_{opt} = \sqrt{ \frac{0.7 R_{drvm} C_s + 3.1 R_{drvm} C_c}{0.7 R C_{drvm}} }    (16)
Now using (14), the total delay which includes the delay
of the repeaters can be calculated. It must be mentioned
here that it is possible to pipeline the bits with a bit per
section. In fact it is possible to obtain a gain in the bandwidth
by introducing repeaters to pipeline bits, even when
there is no significant improvement in the overall line
delay. We choose to ignore pipelining in this particular
analysis. If pipelining is carried out, the bandwidth is simply
multiplied by the appropriate coefficient.
For the same boundary conditions as those corresponding
to the plot in Fig. 3, the total bandwidth for changing N
and S where the repeaters are optimally sized is plotted in
Fig. 5. It can be seen that the maximum bandwidth is
obtained when the parallelism is the maximum allowed by
the physical constraints of the technology. This result is
logical because the buffers, which are optimally sized for
each configuration, compensate for the increased resistance
and cross-talk effect. However, optimal repeater
insertion results in a large number of huge buffers. Also it
has been shown in [1] that the delay curve is quite flat, and
the sizes can be reduced with little increase in delay.
Figure 5: Variation of Bandwidth with Optimal Buffering
Figure 6: Variation of Bandwidth with a Fixed Number and Size of Buffers for Each Line
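The buffered delay of Eq. (14) and the optimum number and size of repeaters of Eqs. (15) and (16) can be evaluated with a few lines of Python. The sketch below is an illustration under assumed names and assumed component values; it also treats k as a real number, whereas in practice it would be rounded to an integer.

```python
import math


def repeater_delay(k, h, r, c_s, c_c, r_drvm, c_drvm):
    """Eq. (14): worst-case 50% delay of a line split into k sections,
    each driven by a repeater h times the minimum size."""
    driver = 0.7 * (r_drvm / h) * (c_s / k + h * c_drvm + 2.2 * 2.0 * c_c / k)
    wire = (r / k) * (0.4 * c_s / k + 1.51 * c_c / k + 0.7 * h * c_drvm)
    return k * (driver + wire)


def optimal_k(r, c_s, c_c, r_drvm, c_drvm):
    """Eq. (15): number of repeaters that minimises the delay."""
    return math.sqrt((0.4 * r * c_s + 1.51 * r * c_c)
                     / (0.7 * r_drvm * c_drvm))


def optimal_h(r, c_s, c_c, r_drvm, c_drvm):
    """Eq. (16): repeater size that minimises the delay."""
    return math.sqrt((0.7 * r_drvm * c_s + 3.1 * r_drvm * c_c)
                     / (0.7 * r * c_drvm))


# Illustrative values only (assumed, not from the paper).
params = dict(r=5e3, c_s=100e-15, c_c=200e-15, r_drvm=10e3, c_drvm=1e-15)
k_opt = optimal_k(**params)
h_opt = optimal_h(**params)
print(k_opt, h_opt, repeater_delay(k_opt, h_opt, **params))
```

Expanding (14) shows that the delay separates into a part depending only on k and a part depending only on h, each of the form a/x + b*x, which is why the two optima can be written independently of each other.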
Instead of optimal repeater insertion, if a constraint is 
imposed on the number and size of buffers for each line, 
the optimal configuration does not equate to the maximum 
number of wires. Given in Fig. 6 is a plot of the bandwidth 
when a constraint of k=1 and h=20 is laid down for each
line. The optimal configuration corresponds to N=44, so 
that the Nhk product is 880. 
Typically the constraint would be on the total area occupied
by the buffers, and hence h and k would be affected
by N. If (18) describes the area constraint on the buffers,
the optimum configuration is the solution to the constrained
optimisation problem of maximising (5) subject
to (18).
N k h \le A_{max}    (18)
This adds a third independent variable to the objective
function (5), of either k or h, since A_max is a constant. It is a
simple matter to incorporate all the relevant equations presented
here into an iterative algorithm that can be used to
obtain a computer generated solution. As a final example,
assume that A_max is set to 500 for the same boundary conditions.
It turns out that the optimal configuration is when
k=1, and shown in Fig. 7 is a plot of the bandwidth where
k=1 and h changes according to N.
Figure 7: Variation of Bandwidth with Constrained Buffering (marked optimum: N=32, S=0.3µm, W=0.18µm)
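One possible shape for the iterative algorithm mentioned above is an exhaustive search over N, S and k, with h set by the area budget of (18). The Python sketch below is a simplified illustration, not the authors' implementation: all names are hypothetical, every line is given the middle-conductor delay of (14), h is capped at the optimum of (16), and toy_line_rc is a crude parallel-plate stand-in for the capacitance model of [4] with assumed material constants and line length.

```python
import math


def buffered_delay(k, h, r, c_s, c_c, r_drvm, c_drvm):
    """Eq. (14), worst-case 50% delay with k repeaters of size h."""
    driver = 0.7 * (r_drvm / h) * (c_s / k + h * c_drvm + 4.4 * c_c / k)
    wire = (r / k) * (0.4 * c_s / k + 1.51 * c_c / k + 0.7 * h * c_drvm)
    return k * (driver + wire)


def toy_line_rc(w, s, length=2e-3, t=0.21e-6, h=0.2e-6,
                rho=2.2e-8, eps=3.9 * 8.854e-12):
    """Crude stand-in for the wire model (NOT the model of [4]).
    All sizes in metres; rho, eps and length are assumed values."""
    r = rho * length / (w * t)
    c_s = eps * w * length / h   # plate capacitance to ground only
    c_c = eps * t * length / s   # plate capacitance between sidewalls
    return r, c_s, c_c


def constrained_optimum(total_width, spacings, w_min, a_max, line_rc,
                        r_drvm, c_drvm, k_values=(1, 2, 3, 4), margin=4.0):
    """Search N, S and k for the highest bandwidth with N*k*h <= a_max.
    Returns (bandwidth, N, S, W, k, h) or None if nothing is feasible."""
    best = None
    for s in spacings:
        n = 2
        while True:
            w = (total_width - (n - 1) * s) / n       # Eq. (13)
            if w < w_min:
                break
            r, c_s, c_c = line_rc(w, s)
            h_cap = math.sqrt((0.7 * r_drvm * c_s + 3.1 * r_drvm * c_c)
                              / (0.7 * r * c_drvm))   # Eq. (16)
            for k in k_values:
                h = min(a_max / (n * k), h_cap)       # Eq. (18)
                if h < 1.0:
                    continue  # smaller than a minimum sized inverter
                t_half = buffered_delay(k, h, r, c_s, c_c, r_drvm, c_drvm)
                bw = n / (margin * t_half)            # simplified Eq. (5)
                if best is None or bw > best[0]:
                    best = (bw, n, s, w, k, h)
            n += 1
    return best


result = constrained_optimum(15e-6, [0.1e-6, 0.2e-6, 0.3e-6], 0.1e-6, 500,
                             toy_line_rc, r_drvm=10e3, c_drvm=1e-15)
print(result)
```

Because the search space is small and each evaluation is a handful of multiplications, brute force is adequate here; replacing toy_line_rc with the full capacitance expressions changes only the function passed in.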
4. CONCLUSIONS 
In this paper we have carried out an analysis of delay
with worst-case cross-talk in capacitively coupled lossy
interconnects. In the important configuration of a large
number of parallel lines, several factors combine to affect
the delay in various ways. Increased parallelism is desirable
in general, but when the total area that is allowed for
the wires is constrained, this results in smaller, more
tightly coupled wires, causing greater line delay. Repeater
insertion, and especially area constrained repeater insertion,
further complicates the issue. However, we have demonstrated
models and a method of building an objective function
that takes all these factors into account and predicts
the optimum configuration. Additional considerations
such as native word width constraints can easily be incorporated.
We propose these equations and the method of
analysis as being suitable for calculations early in the
design flow, as the simplicity of the expressions allows for a
large number of iterations.
5. ACKNOWLEDGEMENTS 
Productive discussions with Johnny Oberg, Axel Jant- 
sch and Mikael Millberg which helped identify the 
requirements from a systems perspective are gratefully 
acknowledged. 
6. REFERENCES 
[1] D. Pamunuwa and H. Tenhunen, “On dynamic delay
and repeater insertion in distributed capacitively coupled
interconnects”, in Proc. ISQED, Mar. 2002.
[2] D. Sylvester and K. Keutzer, “Getting to the bottom of
deep submicron II: a global wiring paradigm”, in Proc.
ISPD, 1999, pp. 193-200.
[3] H. Tenhunen and D. Pamunuwa, “On dynamic delay
and repeater insertion”, in Proc. ISCAS, May 2002.
[4] L-R. Zheng, D. Pamunuwa and H. Tenhunen, “Accurate
a priori signal integrity estimation using a
dynamic interconnect model for deep submicron VLSI
design”, in Proc. ESSCIRC, Sept. 2000, pp. 324-327.
[5] J. Rubinstein, P. Penfield and M. Horowitz, “Signal
delay in RC tree networks”, IEEE Trans. Computer
Aided Design, vol. CAD-2, no. 3, pp. 202-211, July
1983.
[6] H. Bakoglu, Circuits, Interconnections, and Packaging
for VLSI, Reading, MA: Addison Wesley, 1990.
[7] Y. Ismail and E. Friedman, “Effects of inductance on
the propagation delay and repeater insertion in VLSI
circuits”, IEEE Trans. VLSI Systems, vol. 8,
pp. 195-206, April 2000.
[8] S. Dar and M. Franklin, “Optimum buffer circuits for driving
long uniform lines”, IEEE J. Solid State Circuits,
vol. 26, pp. 32-40, Jan. 1991.
