Interconnect delay modeling under exponential input by Vembu, Rajesh K
Retrospective Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 
1-1-2003 
Interconnect delay modeling under exponential input 
Rajesh K. Vembu 
Iowa State University 
Recommended Citation 
Vembu, Rajesh K., "Interconnect delay modeling under exponential input" (2003). Retrospective Theses 
and Dissertations. 20072. 
https://lib.dr.iastate.edu/rtd/20072 
Interconnect delay modeling under exponential input 
by 
Rajesh K. Vembu 
A thesis submitted to the graduate faculty 
in partial fulfillment of the requirements for the degree of 
MASTER OF SCIENCE 
Major: Computer Engineering 
Program of Study Committee: 
Chris Chong-Nuen Chu, Major Professor 
Akhilesh Tyagi 
Paul Sacks 
Iowa State University 
Ames, Iowa 
2003 
Copyright © Rajesh K. Vembu, 2003. All rights reserved. 
Graduate College 
Iowa State University 
This is to certify that the master's thesis of 
Rajesh K. Vembu 
has met the thesis requirements of Iowa State University 
Signatures have been redacted for privacy 
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1. INTRODUCTION
CHAPTER 2. EXPONENTIAL DELAY MODEL
2.1 Why Exponential Input?
2.2 Exponential Delay Model
2.2.1 Driving Point Admittance Model
2.2.2 Response of a RC-tree to an Exponential Input
2.2.3 Approximation
2.2.4 Delay Calculation
2.3 Performance Measure of Exponential Delay Model
CHAPTER 3. APPLICATION TO WIRE SIZING
3.1 Problem Formulation
3.2 Lagrangian Relaxation Framework
3.3 Optimization of Individual Components
3.4 Results
CHAPTER 4. DISCUSSION AND FUTURE WORK
APPENDIX. SPECIAL MODELS AND NUMERICAL TECHNIQUES
ACKNOWLEDGEMENTS
LIST OF TABLES
Table 2.1 Comparison between Elmore, D2M, PRIMO, H-gamma, WED and EDM for a small test circuit shown in Figure 2.5
Table 2.2 Absolute relative error of Elmore, D2M, EDM to Hspice averaged over 10 random ten RC circuits
Table 2.3 Minimum, maximum and standard deviation of the ratio of EDM to Hspice for randomly generated RC trees
Table 2.4 Minimum, maximum, standard deviation and average relative error of EDM and PERI with respect to Hspice for randomly generated RC trees
Table 2.5 Delays (ps) for the tree shown in Figure 2.6 for risetimes 1 ps, 100 ps, 500 ps, 1 ns
Table 3.1 0.18 µm Technology parameters
Table 3.2 Run time and area results
Table 3.3 Hspice delays after wiresizing
LIST OF FIGURES
Figure 2.1 The model of a wire segment of length l and width w by a π-type RC circuit
Figure 2.2 Modeling the circuit downstream of node j by the π model
Figure 2.3 Combining parallel branch admittance
Figure 2.4 2-segment RC network for finding the response
Figure 2.5 An RC tree example
Figure 2.6 A RC tree with 5 sinks
Figure 2.7 Comparison of Hspice and Exponential Delay Metric (Zoomed)
Figure 2.8 Comparison of original and approximated Exponential response
Figure 2.9 Comparison of original and approximated Exponential response (Zoomed)
Figure 3.1 Convergence sequence
Figure A.1 Circuit for calculating response at leaf node
Figure A.2 Downstream driver model
Figure A.3 Computation of moments
ABSTRACT 
Interconnect has become the dominant factor in determining the performance of deep submicron
VLSI designs. As feature sizes shrink and process technologies advance, the resistance per unit
length of interconnect continues to increase, the capacitance per unit length remains roughly
constant, and transistor (gate) delay continues to decrease. As a result, interconnect delay
increasingly dominates logic delay, and this trend is expected to continue. Because interconnect
delay is the main bottleneck in realizing high-speed circuits, understanding it thoroughly and
computing it efficiently and accurately have assumed greater significance in physical design,
optimization and fast verification.
In this thesis, an interconnect delay model under exponential input is presented. Because
of its simple closed-form expression, fast computation, and fidelity with respect to simulation,
the Elmore delay model remains popular. More accurate delay computation methods are
typically CPU-intensive and/or difficult to implement. To bridge this gap between accuracy and
efficiency/simplicity, a new RC delay metric for interconnects is proposed that is as efficient
as the Elmore metric but more accurate. Moreover, no interconnect delay model in the
literature considers an exponential input waveform. The proposed Exponential Delay Metric
uses an exponential waveform as input and captures resistive shielding effects by modeling the
downstream circuit by a π-model. An application of the delay model to interconnect
optimization by wire sizing is also presented. Experimental results show that the proposed
delay model is significantly more accurate than the existing interconnect delay models.
CHAPTER 1. INTRODUCTION 
As CMOS process technologies scale into the nanometer regime, interconnect networks are
becoming increasingly dominant in terms of total path delay or total path capacitance.
Consequently, physical design optimization tools such as floorplanning, placement, and routing
are becoming more "timing-centric" than the previous generation of tools. For such a tool to be
effective, it must compute interconnect delay efficiently, since several million such delay
calculations are required during design optimization.
Circuit simulators such as SPICE and AS/X are excellent when one wants to compute
delays very accurately, but they are quite inefficient, especially for linear circuits. Since
interconnects are linear circuits, one can apply model-order reduction techniques such as
asymptotic waveform evaluation (AWE) [1] to compute delays with virtually the same accuracy as
simulation, but with higher efficiency. Essentially, these methods compute the first few dominant
poles and the corresponding residues of the impulse response of the circuit. For step inputs, the
impulse response is then inverted to obtain the step response as a sum of exponentials. This
nonlinear equation is then solved for the 50% point to yield the step input delay. For non-step
inputs, convolution is performed with the input waveform before the nonlinear solution. As
noted in [2], three moments or, equivalently, two poles are typically adequate to obtain
reasonable accuracy within AWE. One can compute two poles directly from the circuit moments
and then iteratively solve a single equation by one of several methods, e.g., Newton-Raphson
iterations. Model order reduction methods using moments or Krylov subspace methods have
effectively addressed the interconnect modeling problem for back-end design verification;
however, all of these methods produce transfer function models for which a transcendental
equation must be solved to obtain the delay. The implicit nature of these models makes them
impractical for
most front-end design applications or for use in the inner-loop during design optimization. 
Given its explicit nature of a simple closed form and ease of calculation, Elmore delay [3]
is widely used as the delay metric of choice within physical design optimization algorithms.
Although it has been proven to be an upper bound on the propagation delay [4], Elmore delay
is known to be extremely inaccurate in some cases because it ignores the effect of resistive
shielding. For actual deep submicrometer technologies, the Elmore delay can result in errors
of several hundred percent [5]. Moreover, it can be shown that the first moment of the impulse
response is sometimes incapable of even approximating delay sensitivities with respect to
changing path resistances, which is becoming increasingly important due to resistive shielding
effects. This lack of correlation between front-end delay metrics and back-end model order
reduction methods can produce convergence problems for top-down design methodologies.
Various delay metrics have been proposed that achieve better accuracy than Elmore delay,
such as those proposed in [6], [8]. Among them, some metrics such as [7] and [2] try to
construct a stable two-pole approximation and provide an explicit solution to the signal delay.
Another metric in [8] simplifies the two-pole approximation to a single pole by matching the
transfer function. The resulting delay metric is simple but its accuracy is far from satisfactory.
More recently, an empirical metric, D2M, was proposed [6]. Despite its simplicity, D2M has
remarkably high accuracy at the far-end nodes.
In almost all of these works, the delay metric assumes a step excitation. In any practical
application, the interconnect is driven by a non-linear device and the driving point waveform
is not a step. It is common practice for timing analyzers to replace the non-linear driver by
an ideal voltage source generating a saturated ramp signal that has the same 10-90 transition
time as the original waveform. Thus, any practical delay metric should be able to handle
non-zero input slew. Recognizing this, the authors in [9] propose adding a third dimension to
the two-dimensional lookup table used for computing step delay. While accurate, this
method requires a carefully constructed table, which is made harder by the fact that input slews
can vary over a wide range, especially during the initial stages of design. For the above-cited
methods that compute delay directly as a function of moments, either through a lookup table
[10, 9] or an explicit formula [7, 6], a form of "moment massaging" was proposed in [9]
to handle non-step inputs. The impulse response moments are modified to account for the
non-zero input slews and these modified moments are used in the original step formula. The
advantage of this approach is that the delay metric remains a closed-form formula even for
ramp inputs. However, the formula is only valid for very fast input slews and has large error
for even moderately slow inputs. The authors in [13] derive equations for the approximating
function used in RC tree analysis under the assumption of a ramp input, and present
lower and upper delay bounds for it. However, no work has considered an exponential
input waveform for the calculation of the interconnect delay.
In this thesis, we propose a new delay metric for interconnect trees under exponential input
called EDM (Exponential Delay Metric), which is accurate yet easy to compute. The response
to the exponential input for a 2-segment RC network is computed first; it is then applied to the
entire interconnect tree by recursively reducing the downstream circuit at each node to a π model.
The response, which is a sum of four exponential functions, is approximated by a sum of two
exponential functions: the squared error between the two is formulated by the least-squares
method and minimized using the Newton-Raphson technique. The delay is then computed by
fitting the response with a quadratic function. This model is then used for interconnect
optimization such as wire sizing. The delay model focuses exclusively on RC trees that have no
directional coupling path to ground, a reasonable assumption for most on-chip interconnects.
The remainder of the thesis is organized as follows. Chapter 2 describes the exponential delay
metric. Chapter 3 presents a simple wire-sizing application of the delay metric. The thesis is
concluded with a discussion and future work in Chapter 4.
CHAPTER 2. EXPONENTIAL DELAY MODEL 
2.1 Why Exponential Input? 
In this chapter, a delay metric for RC interconnect trees considering an exponential input 
excitation is presented. 
In almost all the interconnect delay models proposed so far, the delay metric assumes a step
excitation. However, in most practical applications, the interconnect is driven by a non-linear
device and the driving-point waveform is not a step. Hence a more practical solution is to
model the input so that the input slew is taken into account. A few delay models have
modeled the input by a linear saturated ramp that has the same 10-90 transition
time as the original waveform, but a more realistic and practical approach is to model
the input by an exponential waveform, as it characterizes the original waveform better than either
an idealized step waveform or a linear ramp. However, no previous work in the literature has
presented a delay model for an exponential input, although some models
like PERI [35] and AWE can be modified to compute the delay for an exponential input.
The rest of the chapter is organized as follows. First a brief description of the wire model 
and the downstream model used for the delay model is given. Then the delay metric is explained 
in detail followed by the approximation technique and the method to calculate the delay. A 
performance measure of the proposed delay model concludes the chapter. 
2.2 Exponential Delay Model 
In the Exponential Delay Model (EDM) formulation, the driver is modeled as a resistor
since the driver is assumed to be in the linear region. A wire is modeled as a π-type
resistor-capacitor (RC) model. Figure 2.1 shows an interconnect wire of length l and width w and
its corresponding RC model. The values $R_{int}$ and $C_{int}$ represent the unit interconnect
resistance and capacitance respectively and are calculated using equations 2.1 and 2.2.

Figure 2.1 The model of a wire segment of length l and width w by a π-type RC circuit.

$$R_{int} = \frac{r_o}{w} \tag{2.1}$$

$$C_{int} = C_a w + C_f \tag{2.2}$$

where
$C_a$ is the unit area capacitance in fF/µm²
$C_f$ is the unit effective fringing capacitance in fF/µm
$r_o$ is the sheet resistance in Ω/square
2.2.1 Driving Point Admittance Model
For each node, EDM requires the computation of a π model that models the downstream
subtree. O'Brien and Savarino [11] proposed a method for computing the first three Taylor
series coefficients of the driving point admittance $Y_j(s)$ at node j. These coefficients are used
to calculate the three components of their proposed π model ($R_{\pi j}$, $C_{nj}$, $C_{fj}$) at node j,
as shown in Figure 2.2.

Figure 2.2 Modeling the circuit downstream of node j by the π model

Their algorithm starts at the leaf nodes of an RC tree and works back to the source (driver)
in a finite sequence of steps in a bottom-up fashion. Suppose that the input admittance of the
circuit downstream of node j is $Y_j(s)$; then it can be expanded in a Taylor series about s = 0 as
follows:

$$Y_j(s) = y_{0j} + y_{1j}s + y_{2j}s^2 + y_{3j}s^3 + \cdots \tag{2.3}$$

The first three moments ($y_{1j}$, $y_{2j}$ and $y_{3j}$) of the admittance for node j are enough to
calculate the values of the three components of the π model. The moment $y_{0j}$ in equation
2.3 is zero since there is no DC path to ground.
The first three moments of the driving-point admittance at node j are calculated in a
bottom-up fashion as follows:

$$y_{1,j} = y_{1,j+1} + C_j \tag{2.4}$$
(2.5) 
(2.6) 
When the bottom-up recursive algorithm faces a split in the circuit with k branches (k ≥ 2),
as in Figure 2.3, the parallel admittances simply add together as follows:

$$y_{1,P} = \sum_{m=1}^{k} y_{1,D,m} \tag{2.7}$$

$$y_{2,P} = \sum_{m=1}^{k} y_{2,D,m} \tag{2.8}$$

$$y_{3,P} = \sum_{m=1}^{k} y_{3,D,m} \tag{2.9}$$

where $y_{1,D,m}$, $y_{2,D,m}$ and $y_{3,D,m}$ are the first, second and third moments of the
mth branch, respectively.

Figure 2.3 Combining parallel branch admittance

Figure 2.3 shows how to combine k parallel branches (k ≥ 2) in the case of multiple-load
circuits.
The three components of the π model are then given by

$$C_{fj} = \frac{y_{2j}^2}{y_{3j}} \tag{2.10}$$

$$C_{nj} = y_{1j} - C_{fj} \tag{2.11}$$

$$R_{\pi j} = -\frac{y_{3j}^2}{y_{2j}^3} \tag{2.12}$$

The admittance moments can be computed recursively. There are two cases: (a) the
computation for incorporating a single RC segment (for this rules 1 and 2 of [11] are used) and
(b) combining parallel branches. One of these two cases can be applied at each node during
a bottom-up tree traversal, yielding a π model at each node. The base case assumes that
$Y_i(s) = 0$ if i is a sink. Note that $y_{1,i}$ is equal to the downstream capacitance $C_{di}$.
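
The parallel-branch rule (2.7)-(2.9) and the π-model formulas (2.10)-(2.12) can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, moments are passed as plain (y1, y2, y3) tuples, and the handling of a purely capacitive load (where y2 or y3 is zero and the formulas are undefined) is an assumption. The helper `pi_moments` expands the admittance of a known π circuit, Y(s) = s C_n + s C_f / (1 + s R C_f), so that the formulas can be checked by a round trip.

```python
def combine_parallel(branches):
    """Add the (y1, y2, y3) moment triples of parallel branches, as in (2.7)-(2.9)."""
    y1 = sum(b[0] for b in branches)
    y2 = sum(b[1] for b in branches)
    y3 = sum(b[2] for b in branches)
    return (y1, y2, y3)


def pi_model(y1, y2, y3):
    """Return (C_near, R_pi, C_far) from the first three admittance moments.

    Implements (2.10)-(2.12). For a purely capacitive load (y2 == 0 or y3 == 0)
    those formulas are undefined; returning a single capacitor is an assumption.
    """
    if y2 == 0.0 or y3 == 0.0:
        return (y1, 0.0, 0.0)
    c_far = y2 * y2 / y3            # (2.10)
    c_near = y1 - c_far             # (2.11)
    r_pi = -(y3 * y3) / (y2 ** 3)   # (2.12)
    return (c_near, r_pi, c_far)


def pi_moments(c_near, r_pi, c_far):
    """First three moments of Y(s) = s*C_near + s*C_far / (1 + s*R_pi*C_far)."""
    return (c_near + c_far, -r_pi * c_far ** 2, r_pi ** 2 * c_far ** 3)


# Round trip: moments of a known pi circuit give back the same pi circuit.
moments = pi_moments(2.0, 3.0, 5.0)
print(pi_model(*moments))
```

The round trip works because a π circuit has exactly three free parameters, which is why three moments are sufficient to determine it.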
2.2.2 Response of a RC-tree to an Exponential Input 
In order to find the delay at every node in an interconnect tree, the response to the
exponential input at that node has to be computed. Since the downstream circuit is modeled as a
π-RC segment, the equivalent circuit at that node is a 2-segment RC network as shown in
Figure 2.4.
Consider an input waveform of the form $V_{in} = 1 - \exp(-t/t_r)$, where $V_{in}$ is the input
voltage and $t_r$ is the rise time of the exponential input. When this voltage is applied to a
2-segment RC network, from first principles, the output will be a sum of 3 exponential
functions. The output can be found by solving the differential equations obtained by applying
nodal analysis to the 2-segment RC network.
As the waveform propagates through the interconnect tree, the number of exponential
terms in the response increases, making it difficult to find the response and the
delay, since a new set of equations has to be derived for each case. In order
to reuse the set of equations derived for the response of a 2-segment RC network
to an exponential input, the sum of 3 exponential functions would have to be approximated by a
single exponential function. However, experiments showed that a single exponential function
does not capture the shape of the sum of 3 exponential functions well.
A better approach is to approximate the response by a sum of 2 exponential
functions. Note that applying a sum of 2 exponential functions to a 2-segment RC network
results in a response which is a sum of 4 exponential functions. The next section explains
the procedure to approximate the sum of four exponential functions by a sum of 2 exponential
functions. The single exponential input at the driver node is expressed as a sum of 2 exponential
inputs, with both coefficients equal to 0.5 and both time constants equal to $t_r$, i.e.,

$$1 - \exp(-t/t_r) = a_1\left(1 - \exp(-t/b_1)\right) + a_2\left(1 - \exp(-t/b_2)\right) \tag{2.13}$$

where

$$a_1 = a_2 = 0.5 \quad \text{and} \quad b_1 = b_2 = t_r \tag{2.14}$$
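The response to the sum of 2 exponential inputs at node x shown in Figure 2.4 is given by

$$V_{ox} = \sum_{i=1}^{4} l_i\left(1 - \exp(-t/m_i)\right) \tag{2.15}$$

Figure 2.4 2-segment RC network for finding the response

where

$$l_1 = -\frac{a_1 b_1^2}{b_1(b - b_1) - a}\left[1 - \frac{c_2 r_2}{b_1}\right] \tag{2.16}$$

$$l_2 = -\frac{a_2 b_2^2}{b_2(b - b_2) - a}\left[1 - \frac{c_2 r_2}{b_2}\right] \tag{2.17}$$

$$l_3 = -k_1\left[1 - \frac{c_2 r_2}{m_3}\right] \tag{2.18}$$

$$l_4 = -k_2\left[1 - \frac{c_2 r_2}{m_4}\right] \tag{2.19}$$

$$m_1 = b_1 \tag{2.20}$$

$$m_2 = b_2 \tag{2.21}$$

$$m_3 = \frac{2a}{b - \sqrt{b^2 - 4a}} \tag{2.22}$$

$$m_4 = \frac{2a}{b + \sqrt{b^2 - 4a}} \tag{2.23}$$

$$a = c_1 c_2 r_1 r_2 \tag{2.24}$$

$$b = c_1 r_1 + c_2 r_2 + c_2 r_1 \tag{2.25}$$

$$k_2 = \left[-1 - \frac{a_1 b_1 (b_1 - m_3)}{b_1(b - b_1) - a} - \frac{a_2 b_2 (b_2 - m_3)}{b_2(b - b_2) - a}\right]\frac{m_4}{m_4 - m_3} \tag{2.26}$$

$$k_1 = -k_2 - \left[\frac{a_1 b_1^2}{b_1(b - b_1) - a} + \frac{a_2 b_2^2}{b_2(b - b_2) - a} + 1\right] \tag{2.27}$$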
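
As an illustration, the closed-form response (2.15)-(2.27) can be evaluated with a few lines of Python. This is a sketch under stated assumptions rather than the thesis code: the function names are hypothetical, r1 and c1 are taken to belong to the segment nearest the source with x the node between the two resistors (consistent with the expression for b in (2.25)), and each input time constant b_i is assumed to differ from m3 and m4 so that no denominator vanishes.

```python
import math


def two_segment_response(r1, c1, r2, c2, a1, b1, a2, b2):
    """Coefficients l and time constants m of the response (2.15) at node x."""
    a = c1 * c2 * r1 * r2                      # (2.24)
    b = c1 * r1 + c2 * r2 + c2 * r1            # (2.25)
    root = math.sqrt(b * b - 4.0 * a)          # real for any positive R, C values
    m3 = 2.0 * a / (b - root)                  # (2.22)
    m4 = 2.0 * a / (b + root)                  # (2.23)
    d1 = b1 * (b - b1) - a                     # common denominators
    d2 = b2 * (b - b2) - a
    k2 = (-1.0
          - a1 * b1 * (b1 - m3) / d1
          - a2 * b2 * (b2 - m3) / d2) * m4 / (m4 - m3)          # (2.26)
    k1 = -k2 - (a1 * b1 ** 2 / d1 + a2 * b2 ** 2 / d2 + 1.0)    # (2.27)
    tau2 = c2 * r2
    l = [
        -a1 * b1 ** 2 / d1 * (1.0 - tau2 / b1),   # (2.16)
        -a2 * b2 ** 2 / d2 * (1.0 - tau2 / b2),   # (2.17)
        -k1 * (1.0 - tau2 / m3),                  # (2.18)
        -k2 * (1.0 - tau2 / m4),                  # (2.19)
    ]
    m = [b1, b2, m3, m4]                          # (2.20)-(2.23)
    return l, m


def evaluate(l, m, t):
    """Value of sum_i l_i * (1 - exp(-t / m_i)) at time t."""
    return sum(li * (1.0 - math.exp(-t / mi)) for li, mi in zip(l, m))


# Example: unit R and C values, exponential input with t_r = 0.5 split as in (2.14).
l, m = two_segment_response(1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5)
print(l, m, evaluate(l, m, 1.0))
```

Two quick sanity checks follow from the circuit itself: the coefficients l_i should sum to 1 (the final value of the input), and the initial slope, the sum of l_i / m_i, should be zero because the input starts at zero.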
2.2.3 Approximation 
As mentioned before, the sum of four exponential functions must be approximated by a
sum of two exponential functions if this set of equations is to be used recursively to find the
response. Various numerical techniques are available for this kind of approximation.
In this work, a two-step approximation procedure is used. First, the squared difference
error between the original function (sum of four exponential terms) and the approximating
function (sum of two exponential terms) is formulated by the least-squares method. Then this
error is minimized by the Newton-Raphson method.
2.2.3.1 Least Square Method 
The least-squares technique is a well-known, effective and widely used method for
approximating functions by minimizing the squared difference error. Hence the least-squares
method was used to perform the approximation.
Let f(t) be the sum of four exponential functions and g(t) be the sum of two exponential
functions:

$$f(t) = \sum_{i=1}^{4} l_i\left(1 - \exp(-t/m_i)\right) \tag{2.28}$$

$$g(t) = x_1\left(1 - \exp(-t/y_1)\right) + x_2\left(1 - \exp(-t/y_2)\right) \tag{2.29}$$

The error function e is defined as

$$e = \int_0^{\infty} \left| f(t) - g(t) \right|^2 dt \tag{2.30}$$

(2.31)
The error is a function of $x_1, x_2, y_1, y_2, l_1, l_2, l_3, l_4, m_1, m_2, m_3, m_4$, as given by equation
(2.31). The objective is to find the unknowns $x_1, x_2, y_1, y_2$ that minimize the error function.
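
When f and g settle to the same final value (the l_i sum to 1 and x1 + x2 = 1), the difference f(t) - g(t) is a pure sum of decaying exponentials, and the integral in (2.30) can be evaluated term by term using the identity that the integral of exp(-t/p) exp(-t/q) over [0, ∞) equals pq/(p + q). The Python sketch below evaluates the error this way and cross-checks it against a plain trapezoidal integration. The function names and the numerical settings are illustrative assumptions, and the closed form is derived here from (2.28)-(2.30), so it may differ in layout from (2.31).

```python
import math


def squared_error(l, m, x, y):
    """Closed-form value of (2.30), assuming sum(l) == sum(x) so constants cancel."""
    # f - g = sum_j x_j exp(-t/y_j) - sum_i l_i exp(-t/m_i)
    coeffs = [-li for li in l] + list(x)
    taus = list(m) + list(y)
    total = 0.0
    for ci, ti in zip(coeffs, taus):
        for ck, tk in zip(coeffs, taus):
            total += ci * ck * ti * tk / (ti + tk)
    return total


def squared_error_numeric(l, m, x, y, t_end=200.0, steps=200000):
    """Trapezoidal estimate of (2.30) on [0, t_end]; used only as a cross-check."""
    def diff_sq(t):
        f = sum(li * (1.0 - math.exp(-t / mi)) for li, mi in zip(l, m))
        g = sum(xj * (1.0 - math.exp(-t / yj)) for xj, yj in zip(x, y))
        return (f - g) ** 2

    h = t_end / steps
    total = 0.5 * (diff_sq(0.0) + diff_sq(t_end))
    for n in range(1, steps):
        total += diff_sq(n * h)
    return total * h


l = [0.5, 0.5, 0.9, -0.9]
m = [0.5, 0.5, 2.6, 0.4]
x = [0.95, 0.05]
y = [2.6, 0.4]
print(squared_error(l, m, x, y), squared_error_numeric(l, m, x, y))
```

The closed form costs a fixed number of arithmetic operations per evaluation, which matters because the error (or its derivatives) is evaluated repeatedly inside the iterative minimization.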
2.2.3.2 Newton Raphson Method 
Determining the values of $x_1, x_2, y_1, y_2$ that minimize the error function involves solving
a set of non-linear equations obtained by partially differentiating the error function
with respect to the unknowns. It should also be noted that $x_1 + x_2 = 1$ (to satisfy the boundary
condition g(t) = f(t) = 1 at t = ∞). Because of this relation, the number of unknowns to
be determined is just three.
Performing the partial differentiations results in a set of 3 non-linear equations, as shown
in equation (2.32). The Newton-Raphson method is used to solve this set of non-linear equations
because it is simple, fast, and converges well given a good initial guess.

$$\frac{\partial e}{\partial x_1} = \frac{x_1 (y_1 - y_2)^2}{y_1 + y_2} + \frac{y_2 (y_1 - y_2)}{y_1 + y_2} - \sum_{i=1}^{4} \frac{2\, l_i m_i^2 (y_1 - y_2)}{(y_1 + m_i)(y_2 + m_i)} \tag{2.32}$$

Sometimes the solution obtained from the Newton-Raphson method is undesirable
(e.g. negative values for $y_1$ or $y_2$). When such a case arises, the bisection method is used to
find the solution. However, in most cases, the values obtained using Newton-Raphson
are quite accurate. After many experiments, it was found that the initial guess
$y_1 = \max(m_i)$, $y_2 = \min(m_i)$ and $x_1 = f(y_1, y_2)$ (obtained by equating the partial
derivative of the error with respect to $x_1$ to zero) greatly reduces the error function.
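
The initial guess described above can be sketched in Python. With x2 = 1 - x1 and the l_i summing to 1, setting the partial derivative of the error with respect to x1 to zero and solving for x1 gives x1 = (2 (y1 + y2) S - y2) / (y1 - y2), where S is the sum over i of l_i m_i² / ((y1 + m_i)(y2 + m_i)). That expression is derived here under those assumptions; the function names are hypothetical, the largest and smallest time constants are assumed to be distinct, and the Newton-Raphson iteration and bisection fallback themselves are not shown.

```python
def initial_guess(l, m):
    """Starting point (x1, y1, y2) for the minimization: extreme time constants
    for y1, y2 and the x1 that zeroes the derivative of the error w.r.t. x1."""
    y1 = max(m)
    y2 = min(m)   # assumed different from y1
    s = sum(li * mi * mi / ((y1 + mi) * (y2 + mi)) for li, mi in zip(l, m))
    x1 = (2.0 * (y1 + y2) * s - y2) / (y1 - y2)
    return x1, y1, y2


def squared_error(l, m, x, y):
    """Closed-form integral of (f - g)^2 over [0, inf), assuming sum(l) == sum(x)."""
    coeffs = [-li for li in l] + list(x)
    taus = list(m) + list(y)
    return sum(ci * ck * ti * tk / (ti + tk)
               for ci, ti in zip(coeffs, taus)
               for ck, tk in zip(coeffs, taus))


l = [0.5, 0.5, 0.9, -0.9]
m = [0.5, 0.5, 2.6, 0.4]
x1, y1, y2 = initial_guess(l, m)
print(x1, y1, y2, squared_error(l, m, [x1, 1.0 - x1], [y1, y2]))
```

For fixed y1 and y2 the error is a convex quadratic in x1, so this x1 is the best coefficient for that pair of time constants; the iteration then only has to refine y1 and y2 together with x1.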
2.2.4 Delay Calculation 
Once the response at a node is determined, the next task is to calculate the delay. However,
the delay cannot be obtained in closed form from the response, because doing so involves
solving a non-linear equation. Hence, it has to be solved numerically. To reduce the time spent
in calculating the delay numerically, the response, which is a sum of two exponential functions,
is fitted by a second-order polynomial. This is accomplished by expanding the response in a
Taylor series around a suitable point, such as the Elmore delay or a scaled Elmore delay.
Sometimes, however, the Elmore delay or scaled Elmore delay is not a good choice, as it may be
very far from the actual delay value. Instead, a point calculated in the same way as the Elmore
delay (which is defined for a step response) is used. The Elmore delay is equal to the area
between the final value and the step response, i.e.,

$$E.D. = \int_0^{\infty} \left(1 - V_i(x)\right) dx \tag{2.33}$$

where $V_i(x)$ is the step response. Similarly, the corresponding area for the response at the
node can be used as the point around which the response is expanded in a Taylor series; it is
equal to $x_1 y_1 + x_2 y_2$, where $x_1, x_2$ are the coefficients and $y_1, y_2$ are the time constants
of the response at that node.
Then the quadratic equation is solved to find the time at which the function reaches 0.5.
The delay is then the difference between the time instant at which the response reaches 50% of
its maximum value and the time instant at which the input signal reaches 50% of its maximum
value.
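
The delay calculation can be sketched in Python as follows. This is an illustrative sketch, not the thesis code: the function names are hypothetical, the response is assumed to have the two-exponential form g(t) = x1 (1 - exp(-t/y1)) + x2 (1 - exp(-t/y2)) with a positive slope at the expansion point, the root of the quadratic nearest the expansion point is chosen, and the fallback to a linear fit when the quadratic has no real root is an assumption. The 50% time of the input 1 - exp(-t/t_r) is t_r ln 2.

```python
import math


def t50_quadratic(x1, y1, x2, y2):
    """Estimate the time at which g(t) reaches 0.5 from a second-order Taylor fit."""
    t0 = x1 * y1 + x2 * y2                 # expansion point: area above the response
    e1 = math.exp(-t0 / y1)
    e2 = math.exp(-t0 / y2)
    g0 = x1 * (1.0 - e1) + x2 * (1.0 - e2)           # g(t0)
    g1 = x1 / y1 * e1 + x2 / y2 * e2                 # g'(t0)
    g2 = -(x1 / y1 ** 2 * e1 + x2 / y2 ** 2 * e2)    # g''(t0)
    # Solve qa*d^2 + qb*d + qc = 0 for the offset d = t - t0.
    qa, qb, qc = 0.5 * g2, g1, g0 - 0.5
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return t0 - qc / qb                # assumed fallback: linear fit only
    # Root closest to t0, written so that qa == 0 causes no division by zero.
    return t0 - 2.0 * qc / (qb + math.sqrt(disc))


def delay(x1, y1, x2, y2, t_r):
    """50% delay relative to the exponential input 1 - exp(-t / t_r)."""
    return t50_quadratic(x1, y1, x2, y2) - t_r * math.log(2.0)


# Single-exponential check: the exact 50% time of 1 - exp(-t) is ln 2.
print(t50_quadratic(1.0, 1.0, 0.0, 1.0), math.log(2.0))
```

In the single-exponential check the quadratic fit lands within about 0.004 time units of ln 2, which illustrates why expanding around a point close to the true 50% crossing matters: the error of a second-order fit grows with the cube of the distance from the expansion point.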
2.3 Performance Measure of Exponential Delay Model 
In order to test the Exponential delay model, a set of randomly generated interconnect
tree structures was used. A C program was written to generate RC trees; it takes
the number of sinks as input and generates a random tree topology with random values for
resistors and capacitors. As a first example, consider the RC tree in Figure 2.5. The
delay values obtained from the delay metrics Elmore, D2M, PRIMO, H-gamma, WED and
the Exponential delay model are compared. A quick comparison between all the delay metrics
was possible since the same RC circuit was used in [34]. The Exponential delay model
assumes an exponential input of 1 ps rise time while all other delay metrics assume a step
input signal. The actual delays are determined by Hspice simulation. The Hspice delay for the
given circuit with an exponential input of 1 ps rise time is within 1% of that for
step input excitation. The delay values listed under Hspice are the ones using step input.
As can be seen from Table 2.1, like all other delay models except H-gamma,
the exponential delay metric performs poorly at the near-end driver node, although still better
than the other delay metrics. The error at that point is around 13%, while the maximum error at
any other node is less than 3% when compared to the Hspice delay value. The exponential
delay metric performs well at both near-end and far-end nodes, with the exception
of the driver node. It also tends to under-estimate the delay. This is attributed to the error
introduced by the various approximations performed while computing the delay.

Figure 2.5 An RC tree example
The next experiment analyzes the delay metrics for single-sink RC trees. A chain of
ten RC segments connected in series is used, with randomly generated resistor and capacitor
values. These test cases are quite challenging for a delay metric since near-end
nodes have significant resistive shielding. A single-sink circuit also permits one to track
accuracy trends from near-end to far-end nodes. Let node 1 be the node closest to the source,
node 2 the second closest, etc.
To determine actual delays, Hspice was used. Ten random circuits were generated and delays
were computed with Elmore, D2M and the Exponential Delay Metric as well as Hspice. Table
2.2 presents the average absolute relative error of each metric with respect to Hspice over all
10 circuits. A value close to zero implies excellent correspondence to Hspice.
Table 2.1 Comparison between Elmore, D2M, PRIMO, H-gamma, WED and EDM for a small test circuit shown in Figure 2.5

Node   Elmore   D2M   PRIMO   H-gamma   WED   EDM   HSPICE
1      552      299   241     194       246   170   196
2      804      514   498     486       485   459   476
3      996      696   699     701       698   688   700
4      1128     830   836     840       855   833   844
5      1200     905   909     912       943   906   919
6      684      420   376     355       386   365   374
7      756      492   450     431       470   466   453

Absolute relative error of quantity x with respect to y is defined as

$$\text{AbsoluteRelativeError}(x, y) = \frac{|x - y|}{y}$$
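
As a small worked example of this definition, the Python snippet below evaluates the error for the node 1 entries of Table 2.1 (EDM 170, HSPICE 196). The function name is illustrative.

```python
def absolute_relative_error(x, y):
    """|x - y| / y, the error of quantity x with respect to the reference y."""
    return abs(x - y) / y


# Node 1 of Table 2.1: EDM delay versus the HSPICE reference.
edm_node1 = 170.0
hspice_node1 = 196.0
print(absolute_relative_error(edm_node1, hspice_node1))
```

The result is 26/196, or about 0.133, which is the driver-node error of roughly 13% quoted for EDM in this example.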
From the set of simulations described above, the following observations were made:
1. The Elmore and D2M delay metrics are poor approximations at the near end, where they
overestimate the delay by a huge factor, even though D2M delays correspond closely to Hspice
delays at the far-end nodes.
2. The EDM delay metric performs far better than the other two delay metrics, even at
the driver node. At the far-end nodes it closely matches Hspice; the error at the
farthest node is as low as 0.7% on average.
The remaining experiments consider only Hspice and the Exponential Delay Metric, since the
other delay metrics cannot handle exponential input. Table 2.3 examines the stability of the
delay metric, i.e., how prone the metric is to extreme error. For each of the 8 randomly
generated interconnect trees, the minimum and maximum ratio of the delay metric to
Hspice and the standard deviation of these ratios are determined. A delay metric is stable
if it has a small standard deviation and a small maximum ratio. A total of 190 components and 95
sinks were considered for measuring delay values.
The following observations were made:
1. For all the RC trees, the maximum ratio always occurs at the driver node, except for the
tree T1 with 3 sinks. From this it can be inferred that, like any other available delay metric,
the exponential delay metric is most susceptible to error at the driver node.
2. The maximum standard deviation was found to be 0.337, while the maximum ratio
of the Exponential delay metric to Hspice was found to be 3.09 and the minimum ratio
0.763.

Table 2.2 Absolute relative error of Elmore, D2M, EDM to Hspice averaged over 10 random ten RC circuits

RC           Avg. Abs. Rel. Error of Delay Metric to Hspice
Element      Elmore    D2M      EDM
1            36.49     27.20    3.95
2            9.800     7.251    0.805
3            3.266     2.843    0.158
4            1.542     1.301    0.025
5            0.810     0.637    0.005
6            0.557     0.405    0.003
7            0.423     0.282    0.004
8            0.358     0.231    0.004
9            0.325     0.206    0.006
10           0.313     0.198    0.007
To measure the accuracy of the proposed model, the relative error of the EDM
metric with respect to Hspice was calculated on the set of randomly generated trees. The PERI
[35] method was extended to handle exponential inputs and was tested on the same set of random
trees to provide a comparison between the proposed model and the existing techniques.
Table 2.4 gives the maximum, minimum, standard deviation and average of the absolute
relative errors for all the cases.
As can be seen from the results, PERI has a very high maximum error. This
was observed at the driver node in all the cases under consideration. Even though both
metrics give a very accurate measure of delay at the far-end nodes, EDM outperforms PERI at
the near-end nodes. The accuracy of the PERI method also depends on the step delay metric
used for calculating the delay under exponential input. In the experiments carried
out, Hspice step delays were used as the step delay metric for PERI; using a less accurate
step delay metric would have resulted in more error.
Table 2.3 Minimum, maximum and standard deviation of the ratio of EDM to Hspice for randomly generated RC trees

Tree            Min. Ratio   Max. Ratio   SD
T1 (3 sinks)    0.763        1.03         0.099
T2 (5 sinks)    0.964        1.05         0.021
T3 (7 sinks)    0.972        1.12         0.038
T4 (10 sinks)   0.923        1.75         0.176
T5 (12 sinks)   0.905        1.99         0.205
T6 (15 sinks)   0.907        2.82         0.337
T7 (20 sinks)   0.886        3.05         0.327
T8 (23 sinks)   0.883        3.09         0.311

Table 2.4 Minimum, maximum, standard deviation and average relative error of EDM and PERI with respect to Hspice for randomly generated RC trees

                Min. Error       Max. Error       SD               Avg. Rel. Error
Tree            EDM     PERI     EDM     PERI     EDM     PERI     EDM      PERI
T1 (3 sinks)    0.002   0.004    0.237   2.64     0.093   0.537    0.0487   0.448
T2 (5 sinks)    0.000   0.003    0.047   2.68     0.016   0.845    0.0125   0.272
T3 (7 sinks)    0.000   0.002    0.124   2.64     0.035   0.704    0.0176   0.192
T4 (10 sinks)   0.020   0.000    0.754   2.63     0.164   0.588    0.0753   0.134
T5 (12 sinks)   0.000   0.000    0.985   2.61     0.197   0.533    0.0623   0.111
T6 (15 sinks)   0.004   0.000    1.825   2.62     0.329   0.478    0.0816   0.088
T7 (20 sinks)   0.004   0.000    2.050   2.62     0.321   0.414    0.0766   0.066
T8 (23 sinks)   0.003   0.000    2.092   2.42     0.305   0.357    0.0706   0.053
From the above observations, it can be concluded that the exponential delay model is both
highly stable and accurate (except at the driver node). Attempts were made to improve
the accuracy at the driver node by modeling the downstream circuit by a 2-segment RC network
instead of the traditional π-model. However, the new model under-estimates the
delay by a huge margin. The special model is presented in the Appendix.
The next set of experiments was carried out to analyze the performance of the delay model
for various input slews. A randomly generated tree with 5 sinks (Figure 2.6) was chosen for
performance evaluation, the results of which are given in Table 2.5.
From the values it can be seen that, the driver node has the least accuracy while the far-end 
nodes match the Hspice delay value very closely. In fact the maximum relative error in any 
Figure 2.6 An RC tree with 5 sinks.
Table 2.5 Delays (ps) for the tree shown in Figure 2.6 for rise times 1 ps,
100 ps, 500 ps, 1 ns
        1 ps              100 ps            500 ps            1 ns
Node    EDM     Hspice    EDM     Hspice    EDM     Hspice    EDM      Hspice
1       3.751   2.785     4.618   4.572     4.048   8.38      0.5866   12.411
2       1854    1951      1912    1984      2274    2189      2653     2526
3       2624    2532      2686    2565      2994    2760      3330     3071
4       4167    4200      4179    4231      4397    4374      4635     4593
5       5325    5323      5353    5354      5498    5495      5709     5702
6       5856    5885      5881    5916      6035    6055      6231     6256
7       5632    5626      5660    5657      5795    5797      5995     6004
8       5851    5843      5880    5874      6016    6015      6215     6221
9       6119    6148      6145    6179      6278    6318      6462     6518
10      5964    5996      5990    6028      6122    6167      6306     6367
of the sink nodes is just 8.4%. The delay model also performs creditably
across the full range of input rise times tested (1 ps to 1 ns). Its worst behavior
occurs at the driver node for the slowest varying signal. These results show that the
model can handle a wide range of input slews.
Figure 2.7 compares Hspice with the Exponential Delay Metric. The graph
shows how close the three waveforms (Hspice, the original four-exponential response, and
the two-exponential approximation) are to each other. It was generated for an input
of 100 ps at the driver node of the interconnect tree with 7 sinks. A zoomed version is
used to show how accurate the metric is at the 50% point. Figure 2.8 compares the
original and approximated exponential responses, and Figure 2.9 shows a zoomed version of
the same comparison. The graphs show that the approximated function tracks the original
function closely.
Figure 2.7 Comparison of Hspice and Exponential Delay Metric (Zoomed)
[Plot title: Comparison between Hspice and Exponential Delay Model. x-axis: time in seconds. Curves: Hspice, orig fn (4 exp), approx fn (2 exp).]
Figure 2.8 Comparison of original and approximated Exponential response
[Plot title: Approximation of Sum of 4 exponential by sum of 2 exponential function. x-axis: time in seconds. Curves: orig fn (4 exp), approx fn (2 exp).]
Figure 2.9 Comparison of original and approximated Exponential response (Zoomed)
[Plot title: Zoomed version of Approximation of Sum of 4 exponential by sum of 2 exponential function. x-axis: time in seconds. Curves: orig fn (4 exp), approx fn (2 exp).]
CHAPTER 3. APPLICATION TO WIRE SIZING 
In this chapter, an algorithm for delay minimization of interconnect trees by wire sizing is
presented. The algorithm is based on the Lagrangian relaxation technique [19], which transforms
the original problem into a sequence of subproblems called the Lagrangian relaxation
subproblems. A greedy algorithm is then used to solve these subproblems.
Sizing circuit components has proven to be an effective way to improve the
performance of interconnects, because both resistance and capacitance depend on the
width of a component when its length is fixed. Therefore, for a given circuit, the delay can still
be improved by changing the widths of the components.
In the past, gate delay was the dominant factor in determining circuit performance. Therefore,
gate sizing and transistor sizing have been extensively studied in the literature [26, 14, 33].
As process technology has advanced, interconnect delay has played an increasingly important
role in determining the performance of the circuit, and hence wire sizing has become an
active research topic in the past few years [18, 20, 21]. Besides sizing gates and wires, buffer
insertion and buffer sizing are known to be effective techniques for reducing delay and have
been extensively studied in the literature [22, 23, 24, 25].
Wire sizing has become an increasingly popular technique for improving circuit performance.
Since a significant portion of the total circuit delay now comes from interconnects,
it makes sense to adjust the wire widths to achieve a better combination of resistance and
capacitance and thereby minimize the delay.
3.1 Problem Formulation 
The objective of the wire sizing technique is to minimize the total area of the interconnects
subject to a bound on the maximum delay for the interconnect tree. It can be easily
modified to give algorithms for optimizing several other objectives (e.g., minimizing maximum
delay or minimizing total area subject to arrival time specifications at outputs). Convergence
to global optimal solutions is guaranteed for all cases.
The wire sizing problem can be described as minimizing the total component area with respect
to component sizes $s_1, \ldots, s_n$ subject to the constraint that the maximum delay from any input
driver to any output load is at most some constant $A_0$.
Given a circuit with one input driver, $t$ output loads, and $n$ wire segments, the segment
widths are allowed to vary in order to optimize some objective. For $1 \le i \le t$, let $C_i^L$
be the load capacitance of the $i$th output load. A fictitious component called the output
component is added, which consists of all the $t$ output loads. Let a node be a connection point
between 2 components or the output point of the output component.
The nodes are labeled by indices $1, \ldots, n$ starting from the driver node, with 0 being the
output point of the output component. For $0 \le i \le n$, let input($i$) be the set of indices of
components directly connected to the inputs of component $i$. For $1 \le i \le n$, let output($i$) be
the set of indices of components directly connected to the output of component $i$.
For each component $i$, the area of the component is $\alpha_i s_i$, where $s_i$ is a vector which contains
the widths of all wire segments and $\alpha_i$ is a vector which contains the constants corresponding
to each element in $s_i$. The arrival time of the component (i.e., the maximum delay from
any input to the output side of the component) is $a_i$, and the delay associated with this
component is $D_i$. The way to formulate the problem as a constrained optimization problem
with a polynomial number of constraints is described below. This formulation is called the
primal problem (PP). PP is a geometric program.
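\[
\begin{aligned}
\text{PP}:\quad \text{Minimize}\quad & \sum_{i=1}^{n} \alpha_i s_i \\
\text{Subject to}\quad & a_j \le A_0, && j \in \text{input}(0) \\
& a_j + D_i \le a_i, && i = 1, \ldots, n \text{ and } \forall j \in \text{input}(i) \\
& \text{All wire widths are feasible}
\end{aligned}
\tag{3.1}
\]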
There are many standard methods for solving geometric programs. However, because of
the special structure of PP, it can be solved very efficiently by Lagrangian
relaxation. Lagrangian relaxation is a general technique for solving constrained optimization
problems. The basic idea of Lagrangian relaxation is outlined in the next section. More details
can be found in [17, 15, 16].
3.2 Lagrangian Relaxation Framework 
Lagrangian relaxation is a useful technique for handling constrained optimization problems.
This technique relaxes the explicit constraints by bringing them into the
objective function with a set of associated weights, called Lagrangian multipliers. A new
objective function is thus formed as the sum of the original objective function and the weighted
constraints.
Following the Lagrangian relaxation procedure, a non-negative value called the Lagrangian
multiplier $\lambda$ is introduced for each constraint on arrival time. The resulting objective
function is as follows:
\[
L_\lambda = \sum_{i=1}^{n} \alpha_i s_i + \sum_{j \in \text{input}(0)} \lambda_{j0}\,(a_j - A_0)
+ \sum_{i=1}^{n} \sum_{j \in \text{input}(i)} \lambda_{ji}\,(a_j + D_i - a_i)
\tag{3.2}
\]
Let $s = (s_1, s_2, \ldots, s_n)$ and $a = (a_1, a_2, \ldots, a_n)$. Then the Lagrangian relaxation
subproblem associated with the Lagrange multipliers $\lambda$ is:
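\[
\begin{aligned}
\text{LRS}/\lambda:\quad \text{Minimize}\quad & L_\lambda(s, a) \\
\text{Subject to}\quad & \text{All wire widths feasible}
\end{aligned}
\tag{3.3}
\]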
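Let the function $Q(\lambda)$ be the optimal value of the problem LRS/$\lambda$. A Lagrangian dual
problem can be defined as follows:
\[
\begin{aligned}
\text{LDP}:\quad \text{Maximize}\quad & Q(\lambda) \\
\text{Subject to}\quad & \lambda \ge 0
\end{aligned}
\tag{3.4}
\]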
The original problem formulation is given in [19]. Theorem 6.2.4 of [17] can be used to show
that if $\lambda$ is the optimal solution of LDP, then the optimal solution of LRS/$\lambda$ is also the
optimal solution of the original problem.
As shown in [19], the Kuhn-Tucker conditions can be used to derive a set of optimality
conditions on $\lambda$:
\[
\sum_{i \in \text{output}(k)} \lambda_{ki} = \sum_{j \in \text{input}(k)} \lambda_{jk}
\quad \text{for } 1 \le k \le n
\tag{3.5}
\]
Equation 3.5 means that the sum of Lagrangian multipliers at the output side of a component
has to be equal to the sum of Lagrangian multipliers at the input side of the component.
Let $\mu_i = \sum_{j \in \text{input}(i)} \lambda_{ji}$ for $0 \le i \le n$, $\mu = (\mu_0, \ldots, \mu_n)$, and
$L_\mu(s) = \sum_{i=1}^{n} \mu_i D_i + \sum_{i=1}^{n} \alpha_i s_i$.
It is shown in [19] that after rearranging the terms,
$L_\lambda(s, a) = \sum_{i=1}^{n} \mu_i D_i + \sum_{i=1}^{n} \alpha_i s_i - \mu_0 A_0$.
Since the term $\mu_0 A_0$ does not depend on $s$, minimizing $L_\mu$ is the same as minimizing $L_\lambda$.
For any $\lambda$ that satisfies the optimality conditions on $\lambda$, solving LRS/$\lambda$ is equivalent to solving
the following problem:
\[
\begin{aligned}
\text{LRS}/\mu:\quad \text{Minimize}\quad & L_\mu(s) \\
\text{Subject to}\quad & \text{All wire widths feasible}
\end{aligned}
\tag{3.6}
\]
For any fixed $\mu$, LRS/$\mu$ can be solved by a greedy algorithm. The greedy algorithm is
based on an iterative process that re-sizes the wire segments in
topological order. Optimizing one component while the rest of the components in the circuit
are held fixed is called the local optimization of that component. Drivers and loads (sinks) are
considered fixed since the algorithm is mainly a post-layout optimization technique.
In each iteration, the algorithm traverses the tree in topological order to find the optimal
size for each component. The algorithm for LRS/$\mu$ is explained in the next section.
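The greedy pass described above can be sketched on a toy instance. This is only an illustration under stated assumptions, not the thesis implementation: the circuit is a hypothetical chain of three wire segments, the Elmore delay stands in for the exponential delay model (so that each local optimization has a closed form), and all constants and function names are invented for the example.

```python
import math

# Hypothetical toy instance: a chain of wire segments between a driver and one sink.
R_DRIVER, C_LOAD = 10.0, 50.0        # driver resistance, sink load (arbitrary units)
R0, C0 = 0.05, 0.04                  # wire resistance / capacitance per unit length at unit width
LENGTHS = [200.0, 300.0, 250.0]      # segment lengths, driver side first
W_MIN, W_MAX = 0.5, 10.0             # feasible width range


def objective(widths, mu):
    """L_mu = mu * (Elmore delay to the sink) + total wire area."""
    caps = [C0 * l * w for l, w in zip(LENGTHS, widths)]
    ress = [R0 * l / w for l, w in zip(LENGTHS, widths)]
    delay = R_DRIVER * (sum(caps) + C_LOAD)
    for i, r in enumerate(ress):
        downstream = sum(caps[i + 1:]) + C_LOAD
        delay += r * (caps[i] / 2 + downstream)
    area = sum(l * w for l, w in zip(LENGTHS, widths))
    return mu * delay + area


def resize_segment(widths, i, mu):
    """Locally optimal width of segment i with all other widths held fixed.

    As a function of w_i alone the objective is A*w_i + B/w_i + const,
    whose minimizer on [W_MIN, W_MAX] is sqrt(B/A) clipped to the bounds.
    """
    upstream_r = R_DRIVER + sum(R0 * LENGTHS[j] / widths[j] for j in range(i))
    downstream_c = C_LOAD + sum(C0 * LENGTHS[k] * widths[k]
                                for k in range(i + 1, len(widths)))
    a = mu * C0 * LENGTHS[i] * upstream_r + LENGTHS[i]
    b = mu * R0 * LENGTHS[i] * downstream_c
    return min(W_MAX, max(W_MIN, math.sqrt(b / a)))


def greedy_lrs(mu, passes=20):
    """Repeat local optimizations in topological order (driver side first)."""
    widths = [W_MIN] * len(LENGTHS)
    history = [objective(widths, mu)]
    for _ in range(passes):
        for i in range(len(widths)):
            widths[i] = resize_segment(widths, i, mu)
        history.append(objective(widths, mu))
    return widths, history


widths, history = greedy_lrs(mu=10.0)
```

Because each local step exactly minimizes the objective in one width while the others are fixed, the recorded objective values never increase from one pass to the next. With the exponential delay model no such closed form exists, which is why the thesis replaces `resize_segment` by a numerical one-dimensional search.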
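Let $\Omega_\lambda = \{\lambda \ge 0 : \lambda$ satisfies the optimality conditions (i.e., Kuhn-Tucker conditions) on $\lambda\}$.
Instead of considering all $\lambda \ge 0$, only $\lambda \in \Omega_\lambda$ needs to be considered. Therefore, LDP can
be redefined as follows:
\[
\begin{aligned}
\text{LDP}:\quad \text{Maximize}\quad & Q(\lambda) \\
\text{Subject to}\quad & \lambda \in \Omega_\lambda
\end{aligned}
\tag{3.7}
\]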
The algorithm for LDP is the outer loop of the Lagrangian relaxation framework. Starting
from an arbitrary point $\lambda$, the algorithm iteratively moves from the current point to a new
point following the subgradient direction [17]. At step $k$, LRS/$\lambda$ is first solved by solving
the simpler LRS/$\mu$. Then for each relaxed constraint, the subgradient is the amount by which
the constraint is violated, i.e., the term that multiplies the corresponding multiplier in
Equation 3.2 (for example, $a_j - A_0$), evaluated at the current solution.
The subgradient direction is the vector of all the subgradients. $\lambda$ is moved to a new point by
multiplying the subgradient direction by a step size $\rho_k$ and adding the result to $\lambda$. After each move,
$\lambda$ is projected back to the nearest point in $\Omega_\lambda$, so that the simpler LRS/$\mu$ can be solved instead
of LRS/$\lambda$ in the next iteration. The procedure is repeated until the program converges.
For choosing the step size, it is known (see Theorem 8.9.2 of [17]) that if the step size
sequence $\{\rho_k\}$ satisfies the conditions $\lim_{k \to \infty} \rho_k = 0$ and $\sum_{k=1}^{\infty} \rho_k = \infty$, then the subgradient
optimization method will always converge. Once the algorithm SOLVE_LDP converges, the $s$
computed by the algorithm SOLVE_LRS/$\mu$ is the wire sizing solution of the circuit.
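The outer loop can be illustrated on a deliberately small problem. The sketch below is an assumption-laden toy, not the thesis algorithm: there is a single wire of width s, a hypothetical delay model D(s) = 1/s, area equal to s, and one delay constraint D(s) <= A0, so the projection onto the feasible multiplier set reduces to clipping a single multiplier at zero. The step size 1/k is one choice that satisfies the two conditions quoted above.

```python
import math

A0 = 2.0                   # delay bound (hypothetical units)
S_MIN, S_MAX = 0.1, 10.0   # feasible width range


def solve_lrs(lam):
    """Minimize s + lam * (1/s - A0) over [S_MIN, S_MAX]; the minimizer is sqrt(lam), clipped."""
    return min(S_MAX, max(S_MIN, math.sqrt(lam)))


def solve_ldp(lam=1.0, iterations=5000):
    """Subgradient ascent on the dual with step sizes rho_k = 1/k."""
    for k in range(1, iterations + 1):
        s = solve_lrs(lam)                 # inner subproblem for the current multiplier
        subgradient = 1.0 / s - A0         # constraint violation D(s) - A0
        rho = 1.0 / k                      # rho_k -> 0 while sum(rho_k) diverges
        lam = max(0.0, lam + rho * subgradient)   # move, then project back onto lam >= 0
    return lam, solve_lrs(lam)


lam_opt, s_opt = solve_ldp()
```

For this toy problem the optimum is known analytically: the smallest width meeting the bound is s = 1/A0 = 0.5, with multiplier 0.25, and the iteration settles near those values.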
3.3 Optimization of Individual Components 
As mentioned earlier, LRS/$\mu$ is solved by greedily optimizing the components one at a
time. This section discusses the optimization of individual components. Since a closed form
expression for the delay of the components is not available, the wire sizing problem has to
be handled numerically. This is achieved by using the golden ratio search algorithm to generate
continuous wire widths. For each candidate wire width, the new resistance and capacitance of that wire
are found. Then the tree is traversed in reverse topological order to update the driving point
admittances at each node. The delay at each node is calculated by traversing the tree in
topological order, and a new value for the objective function is found. This procedure is carried
out until the search converges. The optimum width is the one that results in
the minimum objective function value. The optimum width always lies within the bounds,
since the golden ratio search only generates widths within the bounds specified.
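The two traversals can be sketched as follows. This is a simplified stand-in rather than the thesis code: instead of the full driving point admittance it propagates only the first moment (the downstream capacitance), and it evaluates the Elmore delay instead of the exponential delay model. The function name, the node numbering convention and the example values are assumptions made for the illustration.

```python
def two_pass_delays(parent, seg_r, node_c, r_driver):
    """Bottom-up capacitance pass followed by a top-down Elmore delay pass.

    parent[i] is the parent of node i (-1 for the root); nodes are numbered so
    that every parent has a smaller index than its children.
    seg_r[i] is the resistance of the segment entering node i.
    """
    n = len(parent)
    # Reverse topological order: accumulate downstream capacitance at each node.
    cap = list(node_c)
    for i in range(n - 1, 0, -1):
        cap[parent[i]] += cap[i]
    # Topological order: delay at a node = delay at its parent + R_segment * downstream cap.
    delay = [0.0] * n
    delay[0] = r_driver * cap[0]
    for i in range(1, n):
        delay[i] = delay[parent[i]] + seg_r[i] * cap[i]
    return cap, delay


# Small tree: node 0 is the driver output, node 1 branches to sinks 2 and 3.
cap, delay = two_pass_delays(parent=[-1, 0, 1, 1],
                             seg_r=[0.0, 2.0, 1.0, 1.0],
                             node_c=[0.0, 1.0, 2.0, 3.0],
                             r_driver=1.0)
```

Changing one width changes the values seen by every ancestor in the first pass and by every descendant in the second, which is why both passes are repeated for each candidate width.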
This way of performing wire sizing involves updating the admittances and computing the delay
for every node each time the width of any wire segment in the tree changes, which makes it
slower for deeper trees. However, because no closed form
expression for delay in terms of wire widths is available, this is the only option that
gives a good delay optimization using wire sizing. Updating only the admittances of
the component under consideration and its immediate parents was also tried, but that resulted
in a large error in the delay values. Moreover, since the delay calculation at a sink requires
an updated response at its parent node, all the nodes have to be updated to get
a correct solution.
3.4 Results 
The algorithm was implemented in the C language and was tested on a Sun Ultra-2, 750 MHz
machine with 8 GB of memory. The wire sizing algorithm using the Elmore delay model was also
tested on the same machine. The 0.18 µm technology parameters listed in [32] were used for
the circuit parameters. Table 3.1 lists the circuit parameters used. The wire sizing algorithm
was run on all 8 random trees generated by the random RC tree generator.
Table 3.2 lists the delay bound, the initial area occupied by the interconnect components, the final
area obtained after wire sizing to meet the delay bound with the Elmore delay model
and with the exponential delay model, and the time taken for the wire sizing algorithm to execute.
The delay bounds were set manually based on the maximum delay.
The experimental results show that the area obtained by the wire sizing algorithm
with the proposed exponential delay model is smaller than that obtained with the Elmore delay
model. The maximum area saved using the exponential delay metric relative to the Elmore delay
metric, after performing wire sizing to meet the delay bounds, is as high as 48.7%. Considering
the current technology trend, this saving in interconnect area substantially increases the amount of
additional circuitry that can be placed on the chip.
Even though the time taken to perform the wire sizing using the exponential delay metric
is much higher than with the Elmore delay metric, the higher accuracy obtained
justifies the execution time, since it gives a more realistic measure of the
circuit performance. The difference in run time is due to the fact that the exponential delay
metric does not have a closed form solution for delay in terms of the width of the component.
Table 3.3 gives the delay values obtained by Hspice for the test circuits after
performing wire sizing using the Elmore delay model and the EDM delay model. From the results it
can be observed that the Hspice delay of each EDM solution is within about 2% of the
delay bound it was sized for, while the Hspice delays of the Elmore solutions deviate from the
bound by up to about 16%. These results show that the EDM delay metric is accurate for all the test cases.
Figure 3.1 shows the convergence sequence for the algorithm SOLVE_LDP on T3 (7 sinks).
It can be seen from the figure that the algorithm converges smoothly to the optimal solution.
The solid line represents the upper bound of the optimal solution and the dotted line represents
the lower bound. The lower bound values come from the optimal value $Q(\lambda)$ of LRS/$\lambda$
at the current iteration. Note that the optimal solution is always between the upper bound and
the lower bound, so these curves provide useful information about the distance between the
optimal solution and the current solution and help users decide when to stop the algorithm.
Figure 3.1 Convergence sequence
[Plot title: Convergence Sequence. x-axis: # Iterations (0 to 60). Curves: Upper Bound, Lower Bound.]
Table 3.1 0.18 µm Technology parameters
Parameter        Value
Rg (kΩ)          23.8
R0 (Ω/µm)        0.04256
Ca (fF/µm²)      0.0375
Ct (fF/µm)       0.04
Table 3.2 Run time and area results
Test            Delay Bound   Init Area   Final Area (µm²)     Runtime (s)
Circuit         (ps)          (µm²)       Elmore     EDM       Elmore   EDM
T1 (3 sinks)    1100          30576       95773      67053     0.00     0.78
T2 (5 sinks)    1700          51688       185077     133850    0.01     3.07
T3 (7 sinks)    2800          66360       217545     146268    0.00     22.13
T4 (10 sinks)   5000          90160       323763     214036    0.00     24.3
T5 (12 sinks)   5700          111608      291515     220150    0.01     16.62
T6 (15 sinks)   7000          144480      410604     302267    0.00     26.26
T7 (20 sinks)   7700          178416      452719     316145    0.01     287.4
T8 (23 sinks)   9000          197512      491255     349107    0.01     322.9
Table 3.3 Hspice delays after wire sizing
Test            Delay Bound   Hspice Delay (ps)
Circuit         (ps)          With EDM Solution   With Elmore Solution
T1 (3 sinks)    1100          1110                1193
T2 (5 sinks)    1700          1720                1834
T3 (7 sinks)    2800          2817                2608
T4 (10 sinks)   5000          5091                4523
T5 (12 sinks)   5700          5809                5264
T6 (15 sinks)   7000          7141                6435
T7 (20 sinks)   7700          7840                6466
T8 (23 sinks)   9000          9159                7568
CHAPTER 4. DISCUSSION AND FUTURE WORK 
The performance of VLSI circuits is improving at an astonishing speed as new materials
and fabrication techniques advance from one generation to the next. Besides new generations of process
technology, better circuit design also improves the performance of circuits. Industry
faces a big problem at this point: circuit design tools cannot keep up with the fast-changing
process technology. This has been a common problem that keeps the semiconductor industry
from advancing to a higher level. Among physical design problems, interconnect problems may be
the least conquered. Interconnect optimization is
a relatively new field in circuit design as well as design automation. This thesis has presented
a delay model and an optimization technique to address this field.
A newly developed interconnect delay model under exponential excitation was used for
calculating delay and was applied to interconnect optimization. Even though this model
is quite accurate and the first of its kind to consider exponential input, it can be tuned further
to produce better results more efficiently. A better approximation technique could be developed to
handle the approximation that is performed while calculating the response and hence
the delay. Also, if a closed form expression for the response and/or delay can be obtained,
this model could replace existing delay models, as it would be accurate, fast and efficient
and could also be applied to optimization.
In pursuit of these features, several approximation methods
were tried. A few of them are worth discussing as they are standard techniques
that have been used in other applications. The Padé approximation was used to
approximate the sum of four exponential terms by a sum of two exponential terms. The
response in the time domain was transformed into the frequency domain and expanded in a Taylor
series around $s = 0$. Since the form of the resultant function was also known in advance (i.e., a sum of
two exponential terms), it was also transformed to the frequency domain and expanded in a Taylor
series around $s = 0$. Matching the coefficients of the first three terms in the expansions
gives a set of equations, which are solved to find the unknown variables in the approximating
function. Even though the approximation obtained by this method is quite good, it is not
stable, as it sometimes gives infeasible solutions. Hence this method cannot be used to perform
the required approximation, as there is no control over the conditioning of the matrix.
Also, this kind of approximation neglects the higher powers as well as negative powers,
which causes error because it does not consider the entire time period or frequency spectrum.
Considering more terms in the approximation leads to an overdetermined
system, which is again unwanted.
Another method that was attempted was to fix a constant ratio between the 2 exponents in the
resulting function, so that only 2 variables had to be solved for. However, even this does not
work, because the coefficients are then effectively fixed: they can be expressed in closed
form in terms of the exponents. The whole solution thus depends on a single variable,
and the approximation is excellent or poor depending on its value.
So a good approximation technique, if developed, preferably one yielding a closed form
approximation and delay formula, could result in a very good delay model.
Also, as clock frequencies become higher and rise/fall times become
shorter, the inductance effect can no longer be ignored. An interconnect
delay model that includes inductance effects might be needed in future work. Currently,
however, the main problems associated with inductance are delay modeling with exponential
input and noise modeling for any input. Wire sizing and especially buffer insertion can
be effective methods to reduce interconnect coupling noise [27, 28, 29, 30]. Since noise in
high-speed circuits is becoming an increasingly important issue, future work is likely
to include noise optimization as a part of the algorithm's objective.
With an eye on the future, a new interconnect delay model under exponential input excitation
that considers the inductance effect was attempted. Equations to model the downstream as a π
RLC model were also derived. This is analogous to what O'Brien and Savarino developed for
RC trees. A set of rules as in [11] was developed to accomplish this. It is the first step
towards developing the delay model. Equations for the response of a 2-segment RLC
circuit to a sum of 2 exponential functions were also derived. However, they turned out to be very
complicated and, due to time constraints, could not be pursued further. Nevertheless, it
is a very challenging problem which, if solved in the future, could change the way interconnect
delay modeling and optimization are handled.
On the wire sizing front, since no closed form expression for delay in terms of wire
widths is available for the exponential delay model, the interconnect optimization is
very time consuming and elaborate. If a simple closed form expression
for delay can be found, the wire sizing procedure can be refined further and
made to run faster. The model could then be used at any stage of the VLSI design cycle.
The algorithms introduced in this thesis mainly target post-layout optimization. That
is, the structure and the lengths of the interconnects are already known before the optimization
procedure. However, these algorithms can also be used as modules within pre-layout wire and
buffer planning, such as in placement and routing tools. Since buffer insertion and wire sizing
affect the area and floor plan of a chip, these algorithms can be integrated into higher levels
in the physical design flow. Furthermore, the timing closure problem in deep sub-micron design is
becoming a nightmare for many designers. Traditional synthesis tools can only guess at what
the actual wire loads might be, or rely on early floor planning and placement information in
an attempt to estimate timing. If the physical design of the synthesis result cannot meet the timing
requirement, iteration back to synthesis is required. As the difficulties in delay estimation of
deep sub-micron design increase, more and longer cycles between synthesis and physical design
are required to achieve timing closure. The algorithms introduced in this thesis offer a relatively
good trade-off between estimation accuracy and run time. Therefore, it is also possible to utilize
these algorithms for early estimation in synthesis.
It is predicted in [31] that the number of buffers inserted into a high-speed VLSI circuit will
keep increasing. When the processing technology advances to 70 nm (0.07µ ), [31] predicted 
that there will have about SOOK buffers needed to be inserted into a chip. This implies that the 
quality of buffer insertion will have a greater effect on the performance of a circuit. Also, buffer
insertion will consume more design time in a typical design cycle, which will make the buffer
insertion algorithm a more critical part of design tools.
For decades, the semiconductor industry has followed Moore's law, which predicted that
the speed and number of transistors on a VLSI chip double about every 18 months. This has
been achieved by the advancement of process technology, new materials, and better circuit
design. Many times in the past, people have questioned the possibility of keeping Moore's
law valid because of some unforeseen obstacles. Every time, however, new process or design
paradigms have led the way out and kept Moore's law valid. Currently, interconnect
issues in deep sub-micron process technologies are obstacles that can keep the industry from moving
forward.
APPENDIX. SPECIAL MODELS AND NUMERICAL TECHNIQUES 
Golden Ratio Search 
If $f(x)$ is known to be unimodal on $[a, b]$, it is possible to replace the interval with a
subinterval on which $f(x)$ takes on its minimum (or maximum) value.
The golden search requires that two interior points
\[ c = a + (1 - r)(b - a) \]
and
\[ d = a + r(b - a) \]
be used, where $r$ is the golden ratio
\[ r = \frac{\sqrt{5} - 1}{2} = 0.61803 \]
This results in
\[ a < c < d < b \]
The condition that $f(x)$ is unimodal guarantees that the function values $f(c)$ and $f(d)$
are less than $\max[f(a), f(b)]$ when there is a minimum, or more than $\min[f(a), f(b)]$ when
there is a maximum.
For the case when there is a minimum:
If $f(c) \le f(d)$, then the minimum must occur in the subinterval $[a, d]$; $b$ is replaced
with $d$ and the search is continued in the new subinterval. If $f(d) \le f(c)$, then the minimum
must occur in $[c, b]$; $a$ is replaced with $c$ and the search is continued.
For the case when there is a maximum:
If $f(c) \ge f(d)$, then the maximum must occur in the subinterval $[a, d]$; $b$ is replaced
with $d$ and the search is continued in the new subinterval. If $f(d) \ge f(c)$, then the maximum
must occur in $[c, b]$; $a$ is replaced with $c$ and the search is continued.
Unimodal Function: A function $f(x)$ is unimodal on $[a, b]$ if there exists a unique number
$p$ in $[a, b]$ such that
- $f(x)$ is decreasing on $[a, p]$ and increasing on $[p, b]$ (for a minimum)
or
- $f(x)$ is increasing on $[a, p]$ and decreasing on $[p, b]$ (for a maximum)
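The minimization case of the search can be sketched as follows. This is a minimal illustration rather than the thesis code: the function name, the tolerance and the test objective (a hypothetical width cost of the form w + 4/w) are assumptions.

```python
import math

R = (math.sqrt(5) - 1) / 2   # 0.61803...


def golden_ratio_search(f, a, b, tol=1e-8):
    """Return an approximate minimizer of a unimodal function f on [a, b]."""
    c = a + (1 - R) * (b - a)
    d = a + R * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc <= fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc      # old c becomes the new d
            c = a + (1 - R) * (b - a)
            fc = f(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd      # old d becomes the new c
            d = a + R * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical one-dimensional width cost with its minimum at w = 2.
w_best = golden_ratio_search(lambda w: w + 4.0 / w, 0.5, 10.0)
```

Because $r^2 = 1 - r$, one of the two interior points of the shrunken interval coincides with an interior point already evaluated, so each iteration costs a single new function evaluation. Every point generated stays inside the initial interval, which is what keeps the widths within their bounds.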
Special Case for calculating response at leaf node 
Since the downstream at each node is reduced to a π model according to [11], the reuse of
the response equation derived for a general 2-segment RC network was possible. However, at
a leaf node or sink node this is not possible, since only one RC segment is present.
So a special case has to be considered for the leaf node response.
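Let $V_i(t)$ be the input and $V_o(t)$ be the output response. Writing nodal equations at node
$x$ shown in Figure A.1,
\[ V_i(t) - V_o(t) = iR \tag{A.1} \]
and
\[ i = C\,\frac{\partial V_o(t)}{\partial t} \tag{A.2} \]
Substituting for $i$ in Equation A.1,
\[ V_i(t) - V_o(t) = CR\,V_o' \tag{A.3} \]
where $V_o' = \partial V_o(t)/\partial t$.
Figure A.1 Circuit for calculating response at leaf node
Solving the differential equation (A.3) for initial conditions $V_i(0) = V_o(0) = 0$, the response
$V_o(t)$ can be found.
In general,
\[ V_i(t) = a_1(1 - \exp(-t/b_1)) + a_2(1 - \exp(-t/b_2)) \tag{A.4} \]
and
\[ V_o(t) = \sum_{i=1}^{3} l_i\,(1 - \exp(-t/m_i)) \tag{A.5} \]
where
\[ l_1 = \frac{-a_1 b_1}{CR - b_1} \tag{A.6} \]
\[ l_2 = \frac{-a_2 b_2}{CR - b_2} \tag{A.7} \]
\[ l_3 = 1 + \frac{a_1 b_1}{CR - b_1} + \frac{a_2 b_2}{CR - b_2} \tag{A.8} \]
\[ m_1 = b_1 \tag{A.9} \]
\[ m_2 = b_2 \tag{A.10} \]
\[ m_3 = b_3 \tag{A.11} \]
Here $b_3 = CR$ is the time constant of the leaf segment itself, which is the third exponent
that appears when (A.3) is solved for the input (A.4).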
Figure A.2 Downstream driver model 
However, in order to apply the same approximation technique, this sum of three exponential
functions is expressed as a sum of four exponential functions, with the coefficients $l_1$ and $l_2$
unchanged and $l_3' = l_4' = l_3/2$, and with the exponents $m_1$ and $m_2$ unchanged and $m_3' = m_4' = m_3$.
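The leaf-node formulas can be checked numerically. The sketch below is an illustration under stated assumptions: the input is normalized so that a1 + a2 = 1, the third time constant is taken to be the segment's own RC product (which is what solving the first-order equation for this input gives), RC differs from b1 and b2, and the function names and numerical values are invented for the example.

```python
import math


def leaf_response(a1, b1, a2, b2, r, c):
    """Coefficients l_i and time constants m_i of V_o(t) = sum l_i * (1 - exp(-t/m_i)).

    Assumes a1 + a2 = 1 (unit final value) and r*c different from b1 and b2.
    """
    tau = r * c
    l1 = -a1 * b1 / (tau - b1)
    l2 = -a2 * b2 / (tau - b2)
    l3 = 1 + a1 * b1 / (tau - b1) + a2 * b2 / (tau - b2)
    return [l1, l2, l3], [b1, b2, tau]


def ode_residual(t, a1, b1, a2, b2, r, c):
    """V_i - V_o - RC * dV_o/dt at time t; zero if the closed form solves the leaf equation."""
    ls, ms = leaf_response(a1, b1, a2, b2, r, c)
    v_in = a1 * (1 - math.exp(-t / b1)) + a2 * (1 - math.exp(-t / b2))
    v_out = sum(l * (1 - math.exp(-t / m)) for l, m in zip(ls, ms))
    dv_out = sum(l / m * math.exp(-t / m) for l, m in zip(ls, ms))
    return v_in - v_out - r * c * dv_out


params = (0.6, 2.0, 0.4, 5.0, 1.0, 1.0)   # a1, b1, a2, b2, R, C (hypothetical values)
coeffs, taus = leaf_response(*params)
```

For these values the coefficients sum to one, so the output settles at the same final value as the input, and the residual of the differential equation vanishes at every sampled time.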
Special Driver Model 
After analyzing the results generated by the exponential delay model, it was found that
the error was largest at the driver node. In order to rectify this and improve the
accuracy at the driver node, a new model for the downstream was developed.
The new model is a 2-segment RC model rather than the traditional π model. However,
constructing this model requires the first four terms of the driving point admittance.
Let the driving point admittance at node 1 (i.e., the driver node) be expressed as:
\[ Y(s) = y_0 + y_1 s + y_2 s^2 + y_3 s^3 + y_4 s^4 + \cdots \tag{A.12} \]
Since there is no dc path to ground, the term $y_0$ is zero.
By equating the first four terms $y_1, y_2, y_3, y_4$, the component values shown in Figure
A.2 are found to be:
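\[ R_1 = \frac{y_3^2 - y_2 y_4}{2 y_1 y_2 y_3 - y_4 y_1^2} \tag{A.13} \]
\[ R_2 = \frac{-y_4 + R_1^2 y_1^2 y_2 - 2 R_1 y_1 (y_3 + R_1 y_1 y_2) - R_1 (y_2 + R_1 y_1^2)^2}{(y_2 + R_1 y_1^2)^2} \tag{A.14} \]
\[ C_2 = \frac{-y_2 - R_1 y_1^2}{R_2} \tag{A.15} \]
\[ C_1 = y_1 - C_2 \tag{A.16} \]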
Figure A.3 Computation of moments 
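The admittance terms for the circuit shown in Figure A.3 are calculated
as follows:
\[ y_{1,i} = y_{1,i+1} + C_{i+1} \tag{A.17} \]
\[ y_{2,i} = y_{2,i+1} - R_{i+1}\,(y_{1,i+1})^2 \tag{A.18} \]
\[ y_{3,i} = y_{3,i+1} - 2 R_{i+1}\,(y_{1,i+1})(y_{2,i+1}) + R_{i+1}^2\,(y_{1,i+1})^3 \tag{A.19} \]
\[ y_{4,i} = y_{4,i+1} - 2 R_{i+1}\,(y_{1,i+1})(y_{3,i+1}) - R_{i+1}\,(y_{2,i+1})^2 + 3 R_{i+1}^2\,(y_{1,i+1})^2 (y_{2,i+1}) - R_{i+1}^3\,(y_{1,i+1})^4 \tag{A.20} \]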
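The propagation of the first four admittance terms can be sketched for a simple RC ladder. This is an illustrative sketch, not the thesis code. It assumes that each capacitor is lumped at the downstream end of its resistor and is added to the first term before the resistor is crossed; the function names are hypothetical.

```python
def through_resistor(y, r):
    """Terms (y1..y4) seen looking into a series resistance r that feeds terms y."""
    y1, y2, y3, y4 = y
    return (
        y1,
        y2 - r * y1 ** 2,
        y3 - 2 * r * y1 * y2 + r ** 2 * y1 ** 3,
        y4 - 2 * r * y1 * y3 - r * y2 ** 2 + 3 * r ** 2 * y1 ** 2 * y2 - r ** 3 * y1 ** 4,
    )


def add_capacitor(y, c):
    """A grounded capacitor only adds to the first term."""
    return (y[0] + c, y[1], y[2], y[3])


def ladder_moments(segments):
    """segments: list of (R, C) pairs from the driver outward; C sits at the far end of R."""
    y = (0.0, 0.0, 0.0, 0.0)
    for r, c in reversed(segments):
        y = through_resistor(add_capacitor(y, c), r)
    return y


# Two identical segments with R = 1 and C = 1 (arbitrary units).
moments = ladder_moments([(1.0, 1.0), (1.0, 1.0)])
```

For this two-segment ladder the result is (2, -5, 13, -34), which agrees with expanding the exact input admittance of the same ladder as a power series in $s$.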
Delay Computation
The objective of the delay model is to compute the delays at all the nodes in the
interconnect tree. But since the response is a sum of two exponential terms (after approximation),
there is no direct analytical formula for the delay in terms
of the coefficients and exponents of the response.
Hence, numerical techniques have to be employed in order to compute the time at which the
response reaches 50% of the maximum value. In this section, a method to calculate the delay
from the response is presented.
Let $g(t)$ be the original response, a sum of two exponential functions. Expanding $g(t)$ in a
Taylor series around a point $a$ and taking the first three terms,
\[ g(t) = a_1(1 - \exp(-t/b_1)) + a_2(1 - \exp(-t/b_2)) \tag{A.21} \]
\[
\begin{aligned}
g(t) \approx{} & a_1\left[1 - \exp(-a/b_1) + \frac{t - a}{b_1}\exp(-a/b_1) - \frac{(t - a)^2}{2 b_1^2}\exp(-a/b_1)\right] \\
& + a_2\left[1 - \exp(-a/b_2) + \frac{t - a}{b_2}\exp(-a/b_2) - \frac{(t - a)^2}{2 b_2^2}\exp(-a/b_2)\right]
\end{aligned}
\tag{A.22}
\]
Solving the quadratic function A.22 in $t$ for $g(t) = 0.5$ yields the time taken for the
response to reach 50%. From this value, say $t_{0.5}$, the delay at that node can be determined
and is given by
\[ \text{Delay}_i = t_{0.5,i} - t_{0.5,in} \]
where
$\text{Delay}_i$ is the delay at node $i$
$t_{0.5,i}$ is the time taken for the response at node $i$ to reach 50%
$t_{0.5,in}$ is the time taken by the input to reach 50% of the maximum value and is given by
\[ t_{0.5,in} = \ln(2)\,t_r \]
where $t_r$ is the input rise time.
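The quadratic solve can be sketched as follows. This is a minimal illustration under assumptions: how the expansion point $a$ is chosen is not specified here, so it is passed in as a parameter, and of the two roots of the quadratic the one nearest to $a$ is taken. Function names are hypothetical.

```python
import math


def crossing_time(a1, b1, a2, b2, a, level=0.5):
    """Time at which the second-order Taylor expansion of g(t) about t = a reaches `level`."""
    e1, e2 = math.exp(-a / b1), math.exp(-a / b2)
    g0 = a1 * (1 - e1) + a2 * (1 - e2)               # g(a)
    g1 = a1 * e1 / b1 + a2 * e2 / b2                 # g'(a)
    g2 = -(a1 * e1 / b1 ** 2 + a2 * e2 / b2 ** 2)    # g''(a)
    # Solve 0.5*g2*d^2 + g1*d + (g0 - level) = 0 for d = t - a.
    qa, qb, qc = 0.5 * g2, g1, g0 - level
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        raise ValueError("expansion point too far from the crossing")
    roots = [(-qb + math.sqrt(disc)) / (2 * qa),
             (-qb - math.sqrt(disc)) / (2 * qa)]
    return a + min(roots, key=abs)                   # root nearest the expansion point


def node_delay(t_half_node, rise_time):
    """Delay_i = t_0.5,i - ln(2) * t_r."""
    return t_half_node - math.log(2) * rise_time


# Single exponential 1 - exp(-t): the exact 50% time is ln 2.
t_half = crossing_time(1.0, 1.0, 0.0, 1.0, a=0.7)
```

The accuracy of the result depends on how close the expansion point is to the true crossing; in the example, expanding at 0.7 reproduces ln 2 to well within the fourth decimal place.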
BIBLIOGRAPHY 
[1] L. T. Pillage and R. A. Rohrer, "Asymptotic waveform evaluation for timing analysis,"
IEEE Trans. Computer-Aided Design, vol. 9, pp. 352-366, Apr. 1990.
[2] B. Tutuianu, F. Dartu, and L. Pileggi, "Explicit RC-circuit delay approximation based on
the first three moments of the impulse response," in Proc. IEEE/ACM Design Automation
Conf., June 1996, pp. 611-616.
[3] W. C. Elmore, "The transient response of damped linear network with particular regard
to wideband amplifiers," J. Applied Physics, 1948, pp. 55-63.
[4] R. Gupta, B. Tutuianu, and L. T. Pileggi, "The Elmore delay as a bound for RC trees
with generalized input signals," IEEE Trans. Computer-Aided Design, vol. 16, no. 1, pp.
95-104, Jan. 1997.
[5] L. T. Pileggi, "Timing metrics for physical design of deep submicron technologies," in
Proc. ACM/SIGDA International Symposium on Physical Design, pp. 28-33, 1998.
[6] C. J. Alpert, A. Devgan, and C. V. Kashyap, "RC delay metric for performance optimization,"
IEEE Trans. Computer-Aided Design, vol. 20, no. 5, pp. 571-582, May 2001.
[7] A. B. Kahng and S. Muddu, "An analytic delay model for RLC interconnects," IEEE
Trans. Computer-Aided Design, vol. 16, pp. 1507-1514, December 1997.
[8] A. B. Kahng and S. Muddu, "A general methodology for response and delay computations
in VLSI interconnects," Technical Report TR-940015, UCLA CS Department, Univ. California,
Los Angeles, 1994.
42 
[9] T. Lin, E. Acar, and L. T. Pileggi, "H-gamma: An RC delay metric based on Gamma 
distribution approximation of the homogeneous response," Proc. IEEE/ACM Intl. Conj. 
Computer-Aided Design, 1998. 
[10] R. Kay and L. T. Pileggi, "PRIMO:Probability interpretation of moments for delay cal-
culation," Proc. IEEE/ACM Design Automation Conference, 1998. 
[11] P. R. O'Brien and T. L. Savarino, "Modeling the driving-point characteristic of resistive 
interconnect for accurate delay estimation," in Proc. IEEE/ACM Int. Conj. Computer-
Aided Design, Nov. 1989, pp. 510-515. 
[12] J. Cong and K. S. Leung, "Optimal wiresizing under Elmore delay model,'' IEEE Trans. 
Computer-Aided Design, vol. 14, pp. 321-336, Mar. 1995. 
[13] E. Friedman and J.H. Mulligan,Jr., "Ramp Input Response of RC Thee Networks," in 
IEEE ASIC Conference 1996. 
[14] J.P. Fishburn and A. E. Dunlop," TILOS: A posynomial programming approach to tran-
sistor sizing,'' In Proc. IEEE Intl. Conj. on Computer-Aided Design, pp. 326-328, 1985 
[15) M.L. Fisher, "An application oriented guide to lagrangian relaxation,'' Interfaces, vol. 15, 
no. 2 , pp. 10-21, March-April 1985. 
[16) D. G. Luenberger, Linear and Nonlinear programming, Addison Wesley, second edition, 
1984. 
[17) M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming : Theory and 
Algorithms, John Wiley & Sons, Inc., second edition, 1993. 
[18] J. Cong and K. S. Leung, "Optimal wiresizing under the distributed elmore delay model,'' 
IEEE Transactions on Computer-Aided Design, vol. 14, no. 3, pp. 321-336, March 1995. 
[19) C. P. Chen, C. C. N. Chu and D. F. Wong, "Fast and exact simultaneous gate and wire 
sizing by lagrangian relaxation," IEEE Transactions on Computer-Aided Design, vol. 18, 
no. 7, pp. 1014-1025, July 1999. 
43 
[20] C. P. Chen, H. Zhou and D. F. Wong, "Optimal non-uniform wire-sizing under the elmore 
delay model," Proc. ACM/IEEE International Conference on Computer-Aided Design, 
pp. 38-43, 1996. 
[21] S. S. Sapatnekar, "RC interconnect optimization under the elmore delay model," Proc. 
ACM/IEEE Design Automation Conj., pp. 387-391, 1994. 
[22] L. P. P. P. van Ginneken, "Buffer placement in distributed RC-tree networks for minimal 
elmore delay," Proc. Intl. Symp. on Circuits and Systems, pp. 865-868, 1990. 
[23] C. Alpert and A. Devgan, "Wire segmenting for improved buffer insertion," Proc. 
ACM/IEEE Design Automation Conj., pp. 588-593, 1997. 
[24] S. Dhar and M. A. Franklin, "Optimal buffer circuits for driving long uniform lines,'' IEEE 
J. Solid-State Circuit, vol. 26, no. 10, pp. 32-40, January 1991. 
[25] N. Hedenstierna and K. 0. Jeppson," CMOS circuit speed and buffer optimization," IEEE 
Trans. Computer-Aided Design, vol. 6, no. 2, pp. 270-281, March 1987. 
[26] M. A. Cirit, "Transistor sizing in CMOS circuits," Proc. ACM/IEEE Design Automation 
Conj., pp. 121-124, 1987. 
[27] A. Devgan, " Efficient coupled noise estimation for on chip interconnects," Proc. 
ACM/IEEE Intl. Conj. on Computer-Aided Design, pp. 147-153, 1997. 
[28] L. H. Chen and M. Marek-Sadowska, " Aggressor alignment for worst-case coupling noise,'' 
Proc. A CM Intl. Symposium on Physical Design, pp. 48-54, 2000. 
[29] L. He and K. M. Lepak, " Simultaneous shield insertion and net ordering for capacitive 
and inductive coupling minimization," Proc. ACM Intl. Symposium on Physical Design, 
pp. 55-60, 2000. 
[30] Y. I. Ismail, E. G. Friedman and J. L. Neves, "Equivalent elmore delay for RLC trees," 
IEEE Transactions on Computer-Aided Design, vol. 19, no. 1, pp. 83-97, January 2000. 
44 
[31] J. Cong, " Challenges and opportunities for design innovations in nanometer technologies," 
Semiconductor Research Corporation working paper, 1999. 
[32] J. Cong, L. He, K. Y. Khoo, C. K. Koh, and Z. Pan, "Interconnect design for deep submi-
cron ICs," Proc. ACM/IEEE Intl. Conj. on Computer-Aided Design, pp. 478-485, 1997. 
[33] S.S. Sapatnekar, V. B. Rao, P.M. Vaidya and S. M. Kang, "An exact solution to the tran-
sistor sizing problem for CMOS circuits using convex optimization," IEEE Transactions 
on Computer-Aided Design, vol. 12, no. 11, pp. 1621-1634, November 1993. 
[34] F. Liu, C. Kashyap, C. J. Alpert, "A delay metric for RC circuits based on the weibull 
distribution," Proc. IEEE/ACM Intl. Conj. Computer-Aided Design, 2002. 
[35] C. V. Kashyap, C. J. Alpert, F. Liu, and A. Devgan, "Closed form expressions for extend-
ing step delay and slew metrics to ramp inputs," Proc. ACM Intl. Symposium of Physical 
Design, pp. 24-29, April 2003. 
ACKNOWLEDGEMENTS 
I would like to express my sincere appreciation and thanks to my advisor, Dr. Chris C.-N. 
Chu, for providing me with an opportunity to perform research in the field of Electronic Design 
Automation. His encouragement, guidance and suggestions made working on this thesis an 
exciting experience. I would also like to thank my group members for participating in useful 
discussions and for their invaluable suggestions. 
I would also like to thank Dr. Paul Sacks and Dr. Roger Alexander for their invaluable
suggestions.
I am also greatly indebted to my family, who have been a constant source of encouragement 
and support through all the years of my education. 
I would also like to thank my friends for standing by me at all times and for
providing some enjoyable moments during my stay at Iowa State University.
