Noise-Constrained Performance Optimization by Simultaneous Gate and Wire Sizing Based on Lagrangian Relaxation
Hui-Ru Jiang^1, Jing-Yang Jou^1, and Yao-Wen Chang^2
^1 Department of Electronics Engineering, National Chiao Tung University, Hsinchu 30010, Taiwan
^2 Department of Computer and Information Science, National Chiao Tung University, Hsinchu 30010, Taiwan
Abstract
Noise, as well as area, delay, and power, is one of the most important concerns in the design of deep sub-micron ICs. Existing algorithms cannot handle simultaneous switching conditions of signals for noise minimization. In this paper, we model not only physical coupling capacitance, but also simultaneous switching behavior for noise optimization. Based on Lagrangian relaxation, we present an algorithm that can optimally solve the simultaneous noise, area, delay, and power optimization problem by sizing circuit components. Our algorithm, with linear memory requirement overall and linear runtime per iteration, is very effective and efficient. For example, for a circuit of 6144 wires and 3512 gates, our algorithm solves the simultaneous optimization problem using only 2.1 MB of memory and 47 minutes of runtime to achieve a precision of within 1% error on a SUN UltraSPARC-I workstation.
1 Introduction
With decreasing feature sizes, higher clock rates, and increasing interconnect densities, noise is becoming a concern of comparable importance to power, area, and timing in integrated circuits [13]. While power, area, and timing have been extensively discussed in the recent literature, e.g., [2, 3, 4, 11], relatively little work has been done on noise.
Noise profoundly affects the performance of a circuit, especially in the deep sub-micron regime. Noise is an unwanted variation that makes the behavior of a manufactured circuit deviate from the expected response [12]. The deleterious influences of noise include malfunctioning and timing change, caused by switching behavior. Crosstalk is a type of noise introduced by an unwanted coupling between a node and its neighboring wire or between two neighboring wires. For example, two adjacent wires form a coupling capacitor and a mutual inductor. The inductive effects [10] must be considered as circuit frequencies increase above 500 MHz. These effects are beyond the scope of this paper.
In this paper, we focus on the capacitive effects of crosstalk. We refer to the capacitance created by the physical geometry as the physical coupling capacitance. The physical coupling capacitance is directly proportional to the overlap length of adjacent wires and is inversely proportional to the distance between them. There exist other models to view the physical coupling capacitance from different perspectives, e.g., [6, 15]. Coupling capacitance is dominated not only by physical geometry, but also by switching conditions [9]. However, the existing literature handles only physical coupling capacitance. The influence of switching conditions can be explained by the Miller and the anti-Miller effects [1]. Assume that the physical coupling capacitance between two neighboring wires is $C_c$. The Miller effect occurs when the adjacent wires switch in opposite directions; the equivalent coupling is $2C_c$. In contrast, the anti-Miller effect happens when the adjacent wires switch in the same direction; the equivalent coupling is 0. In the presence of the anti-Miller effect, the transition of wires can be shortened so that the logic values become stable earlier. If two wires have very large physical coupling capacitance but possess the same switching behavior, the inter-wire crosstalk can be very small. Hence, it is often too pessimistic to consider only the Miller effect. However, the anti-Miller effect is hard to take into account because of its uncertainty. Although some previous work has mentioned this problem, no published work has solved it so far.
The work of Hui-Ru Jiang and Jing-Yang Jou was partially supported by the National Science Council of Taiwan ROC under Grant No. NSC88-2215-E-009-070. Email: huiru@cis.nctu.edu.tw, jyjou@bestmap.ee.nctu.edu.tw
The work of Yao-Wen Chang was partially supported by the National Science Council of Taiwan ROC under Grant Nos. NSC88-2622-E-009-004 and NSC88-2218-E-009-056. Email: ywchang@cis.nctu.edu.tw
In this paper, we model not only physical coupling capacitance but also simultaneous switching behavior for crosstalk optimization. We first consider a more accurate model, compared with most of the literature, of crosstalk between wire $i$ and wire $j$:
$$\text{crosstalk}(i,j) = \text{switching similarity}(i,j) \times \text{coupling capacitance}(i,j). \tag{1}$$
For this model, we propose a two-stage strategy to minimize the crosstalk in a circuit. In the first stage, using geometry wire ordering, we place the wires with similar switching behavior in closer proximity; this switching similarity problem is NP-hard [15]. Therefore, we resort to heuristics to deal with it. In the second stage, we minimize the inter-wire physical coupling capacitance by sizing wires. We formulate the constraints for physical coupling capacitance in a posynomial (positive polynomial) form, which can be optimally solved by Lagrangian relaxation.
The second stage not only deals with the crosstalk problem, but also optimizes area, power, and delay by sizing gates and wires. Gate and wire sizing has been extensively studied in the literature for optimizing area, power, and/or delay (e.g., [2, 3, 4]). In previous work, Lagrangian relaxation has been proven to be an effective approach for simultaneous performance optimization [2, 3]; this fact encourages us to adopt the Lagrangian relaxation method for our problem. In this paper, based on Lagrangian relaxation, we present an algorithm that can optimally solve the simultaneous crosstalk, area, power, and delay optimization problem by sizing circuit components. Our algorithm, with linear memory requirement overall and linear runtime per iteration, is very effective and efficient. For example, for a circuit of 6144 wires and 3512 gates, our algorithm solves the simultaneous optimization problem using only 2.1 MB of memory and 47 minutes of runtime to achieve a precision of within 1% error on a SUN UltraSPARC-I workstation.
2 Problem Description
In this section, we introduce the representation of a circuit and
some notation used throughout the paper, present circuit and delay
models, and formulate a performance optimization problem.
DAC 99, New Orleans, Louisiana
(c) 1999 ACM 1-58113-109-7/99/06 $5.00
2.1 Circuit Representation and Modeling
A digital circuit can be partitioned into two parts: combinational and sequential. We can improve the performance by optimizing the combinational part. Hence, we focus on combinational circuits. The way we interpret a circuit is similar to that used in [3].
Figure 1: A circuit with three input drivers, seven wires, three gates, and one output load, in which the gate and wire sizes can be varied.
Consider a combinational circuit with $s$ primary inputs, $t$ primary outputs, and $n$ gates or wires. The sizes of gates and wires can be changed according to our objectives. For the $i$th primary input, $1 \le i \le s$, we have one corresponding input resistor, $R^D_i$, as its input driver. Similarly, for the $j$th primary output, $1 \le j \le t$, we have one corresponding output capacitor, $C^L_j$, as its output load. Figure 1 depicts an example.
Figure 2: (a) Two artificial nodes, 0 and 14, are added into the circuit depicted in Figure 1. (b) The corresponding circuit graph.
A component is a circuit element: a gate, a wire, or an input driver. A node is located at the output of a component, either connecting two components or linking one primary output to one output load. Thus, a circuit has $n+s$ nodes. Figure 2 illustrates the circuit graph of the circuit given in Figure 1. A circuit graph $H = (V, E)$ is a directed acyclic graph with $n+s+2$ nodes. The set $V$ of nodes consists of two additional artificial nodes as well as $n+s$ nodes corresponding to the $n+s$ components. One added node is viewed as the source, $\tilde{s}$, connecting to every input driver, the other as the sink, $\tilde{t}$, consisting of all output loads. Let $S = \{\tilde{s}\}$ and $T = \{\tilde{t}\}$. Therefore, the node set $V = G \cup W \cup R \cup S \cup T$ contains the set $G$ of gates, the set $W$ of wires, the set $R$ of input drivers, the set $S$ containing the source, and the set $T$ containing the sink. The nodes are indexed such that if node $i$ is an input of node $j$, then $i < j$. Such an indexing can be obtained by topological sorting. Hence, the index of $\tilde{s}$ is 0, and that of $\tilde{t}$ is $n+s+1$. For $1 \le i \le n+s$, index $i$ refers to a component. On the other hand, the set $E$ of edges represents the connections between nodes. An edge $(i, j)$, an ordered pair, connects node $i$ to node $j$, $1 \le i < j \le n+s$, if data flow from node $i$ to node $j$. Additional edges are added to connect $\tilde{s}$ to the $s$ input drivers and to connect the $t$ primary outputs to $\tilde{t}$. The connectivity relationship between parents and children is defined by $input()$ and $output()$, where $input(i) = \{j \mid (j, i) \in E\}$ and $output(i) = \{j \mid (i, j) \in E\}$.
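The $input()$ and $output()$ sets and the indexing rule can be illustrated with a minimal Python sketch. The function names and the six-node example graph are hypothetical and are not taken from the paper; the sketch only shows how the two sets follow from the edge list.

```python
from collections import defaultdict


def build_connectivity(edges):
    """Return (inputs, outputs): inputs[i] = {j | (j, i) in E}, outputs[i] = {j | (i, j) in E}."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        outputs[src].add(dst)  # dst is a child of src
        inputs[dst].add(src)   # src is a parent of dst
    return inputs, outputs


def is_topologically_indexed(edges):
    """Check the labeling rule: every edge (i, j) must satisfy i < j."""
    return all(i < j for i, j in edges)


# Hypothetical graph: source 0, input drivers 1 and 2, wire 3, gate 4, sink 5.
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
inputs, outputs = build_connectivity(edges)
```

Here `inputs[4]` collects the two parents of the gate, and `is_topologically_indexed(edges)` confirms that the labels respect the data flow.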
Figure 3 illustrates the gate and wire models used in this paper. We choose the $\pi$ model [12] to approximate wire behavior. For a gate $i$ with size $x_i$, the resistance $r_i$ is $\hat{r}_i / x_i$, and the capacitance $c_i$ is $\hat{c}_i x_i$, where $\hat{r}_i$ and $\hat{c}_i$ are the resistance and capacitance of gate $i$ with unit size, respectively. For a wire $j$ with size $x_j$, the resistance $r_j$ is $\hat{r}_j / x_j$, and the capacitance $c_j$ is $\hat{c}_j x_j + f_j$, where $\hat{r}_j$ and $\hat{c}_j$ are the respective resistance and capacitance of wire $j$ with unit size, and $f_j$ is the fringing capacitance of wire $j$. In addition, for an input driver $i$, $1 \le i \le s$, $r_i$ is equal to the input resistor $R^D_i$.
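As a small illustration of these size-dependent RC models, the following Python sketch evaluates the resistance and capacitance of a gate and of a wire. The function names and the numeric values are illustrative assumptions, not part of the paper.

```python
def gate_rc(r_unit, c_unit, size):
    """Gate model: r = r_unit / size, c = c_unit * size."""
    return r_unit / size, c_unit * size


def wire_rc(r_unit, c_unit, fringe, size):
    """Wire model: r = r_unit / size, c = c_unit * size + fringing capacitance."""
    return r_unit / size, c_unit * size + fringe


# Doubling the size halves the resistance and doubles the size-dependent capacitance.
gate_r, gate_c = gate_rc(r_unit=10.0, c_unit=0.16, size=2.0)
wire_r, wire_c = wire_rc(r_unit=0.07, c_unit=0.024, fringe=0.01, size=2.0)
```

The fringing term is the only part of the wire capacitance that does not scale with the size.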
Figure 3: A gate or a wire is modeled as a combination of RC elements. A gate is the loading of its upstream, but is the driver of its downstream. A wire is represented by the $\pi$ model.
With the circuit model, a combinational circuit can be transformed to a network of resistors and capacitors. Figure 4 illustrates the resulting circuit modeling for the circuit shown in Figure 1. In the transformed circuit, for $1 \le i \le n+s$, $upstream(i)$ is the set of all the nodes except $i$ on the paths from node $i$ to all reachable drivers; similarly, $downstream(i)$ is the set of all the nodes on the paths from node $i$ to all reachable loads. For instance, in Figure 4, $upstream(10) = \{6\}$ and $downstream(2) = \{2, 5, 7\}$. We adopt the Elmore delay model [7] to compute the delays of gates and wires. The delay $D_i$ of node $i$ is $r_i C_i$, where $C_i$ is the downstream capacitance of $i$ including self-loading. In Section 4, $C_i$ also contains the physical coupling capacitance, considering the impact of crosstalk on delay. For the time being, $R_i$ refers to the upstream resistance of node $i$, whereas $R_i$ means the weighted upstream resistance of node $i$ in Section 4.
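To make the delay expression $D_i = r_i C_i$ concrete, here is a minimal Python sketch for the simplest possible case: one input driver, one wire represented by the $\pi$ model, and one output load. The function name and the numbers are hypothetical; a full circuit would accumulate the downstream capacitance over the circuit graph instead.

```python
def elmore_chain(r_driver, r_wire, c_wire, c_load):
    """Elmore delays for driver -> wire (pi model) -> load.

    The pi model puts c_wire / 2 at each end of the wire resistance, so the
    driver sees the whole wire capacitance plus the load, while the wire
    resistance only sees the far-end half plus the load.
    """
    d_driver = r_driver * (c_wire + c_load)
    d_wire = r_wire * (c_wire / 2.0 + c_load)
    return d_driver, d_wire


d_driver, d_wire = elmore_chain(r_driver=100.0, r_wire=10.0, c_wire=2.0, c_load=1.0)
path_delay = d_driver + d_wire
```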
In the circuit graph $H$ of a circuit, each node $i$ is tagged with some attributes, including size $x_i$, node type $G$, $W$, $S$ or $T$, unit-width resistance $\hat{r}_i$, unit-width capacitance $\hat{c}_i$, and fringing capacitance $f_i$ ($f_i = 0$ if $i \in G$). Thus, we shall optimize a circuit by manipulating the corresponding circuit graph, ignoring the transformed RC network.
Figure 4: Before analysis, a circuit is transformed to an RC network. The delay $D_i$ lumped in $r_i$ is computed by $r_i C_i$, e.g., $D_2 = R^D_2 C_2$, where $C_2$ represents the capacitance for all the capacitors in the shaded area.
2.2 Problem Description
In practice, area is the greatest concern in circuit design. This paper aims to minimize area subject to noise, timing, and power constraints. Let $A$, $X$, $D$ and $P$ denote the total area, the total crosstalk, the delay on the critical path, and the total power of the circuit, respectively, and let $X_B$, $D_B$ and $P_B$ denote the upper bounds of the total crosstalk, the delay on the critical path, and the total power of the circuit, respectively. A generic formulation of this problem is given as follows.
$$\mathcal{M}: \quad \text{Minimize } A \quad \text{Subject to } \Phi \le \Phi_B, \ \forall\, \Phi \in \{X, D, P\}.$$
In Section 4, we will give more detailed problem definitions and present our algorithms for the problem.
3 Crosstalk Modeling
In this section, we focus on the crosstalk problem. We deal in turn with the two crucial factors that affect the crosstalk: physical coupling capacitance and switching behavior.
3.1 Physical Coupling Capacitance
We compute the crosstalk between two wires $i$ and $j$ using the model mentioned in Equation (1). Figure 5 depicts a simple case where two parallel wires $i$ and $j$, belonging to different routing trees, have coupling capacitance.
Figure 5: The physical coupling capacitance between two wires. (Wires $i$ and $j$ have sizes (widths) $x_i$ and $x_j$, overlap length $l_{ij}$, and distance $d_{ij}$; $\hat{f}_{ij}$ is the unit-length fringing capacitance between $i$ and $j$.)
According to Figure 5, the physical coupling capacitance $c_{ij}$ between two neighboring wires $i$ and $j$ can be calculated as follows:
$$c_{ij} = \frac{\hat{f}_{ij}\, l_{ij}}{d_{ij} - \frac{x_i + x_j}{2}} = \left(\frac{\hat{f}_{ij}\, l_{ij}}{d_{ij}}\right) \frac{1}{1 - \frac{x_i + x_j}{2 d_{ij}}}, \tag{2}$$
where $x_i$ and $x_j$ are the sizes of wires $i$ and $j$ ($x_i, x_j > 0$), $\hat{f}_{ij}$ is the unit-length fringing capacitance between wires $i$ and $j$, $l_{ij}$ is the overlap length of wires $i$ and $j$, and $d_{ij}$ is the middle-to-middle distance between wires $i$ and $j$. Equation (2) reflects the impact of wire sizing on crosstalk. If $x_i$ increases, $c_{ij}$ consequently increases. This change would also cause variation in delay. In Equation (2), the first term, $\hat{f}_{ij} l_{ij} / d_{ij}$, is a constant computed from technology files, and the second term, $(1 - (x_i + x_j)/2d_{ij})^{-1}$, is what we are concerned with. Let $x = (x_i + x_j)/2d_{ij}$; the second term becomes $(1 - x)^{-1}$, $0 < x < 1$. For the term $(1 - x)^{-1}$, we have the following properties.
Theorem 1 Let $f(x) = \frac{1}{1 - x}$, $|x| < 1$.
(1) $f(x) = \sum_{n=0}^{\infty} x^n$;
(2) If $\hat{f}(x) = \sum_{n=0}^{k-1} x^n$, then the error ratio $\triangleq \frac{f(x) - \hat{f}(x)}{f(x)} = x^k$.
Theorem 1 says that $(1 - x)^{-1}$ can be approximated by $\sum_{n=0}^{k-1} x^n$, the first $k$ terms in the summation. The error ratio is small; for example, for the case $x = 0.25$, the error ratio is less than 6.3%, 1.6%, 0.4%, and 0.1% when $k$ is 2, 3, 4, and 5, respectively. For ease of presentation, we choose $k = 2$, and thus $f(x) \approx \sum_{n=0}^{1} x^n = 1 + x$. Extensions to a larger $k$ are simple.
Therefore, Equation (2) can be approximated as follows:
$$c_{ij} \approx \frac{\hat{f}_{ij}\, l_{ij}}{d_{ij}} \left(1 + \frac{x_i + x_j}{2 d_{ij}}\right) = \tilde{c}_{ij} \left(1 + \frac{x_i + x_j}{2 d_{ij}}\right), \tag{3}$$
where $\tilde{c}_{ij} = \hat{f}_{ij} l_{ij} / d_{ij}$ is a constant. Note that Equation (3) is in a posynomial form [8], an important property to guarantee the optimality of our algorithm presented in Section 4.
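The quality of the truncated-series approximation can be checked numerically. The Python sketch below compares Equation (2) with its $k$-term approximation (Equation (3) corresponds to $k = 2$) and evaluates the error ratio, which Theorem 1 states is $x^k$. The function names and the sample geometry are assumptions chosen so that $x = 0.25$.

```python
def coupling_exact(f_unit, length, dist, xi, xj):
    """Equation (2): f * l / (d - (xi + xj) / 2)."""
    return f_unit * length / (dist - (xi + xj) / 2.0)


def coupling_approx(f_unit, length, dist, xi, xj, k=2):
    """First k terms of the geometric series; k = 2 gives Equation (3)."""
    x = (xi + xj) / (2.0 * dist)
    return (f_unit * length / dist) * sum(x ** n for n in range(k))


def error_ratio(f_unit, length, dist, xi, xj, k=2):
    """(exact - approximate) / exact, which should equal x**k."""
    exact = coupling_exact(f_unit, length, dist, xi, xj)
    return (exact - coupling_approx(f_unit, length, dist, xi, xj, k)) / exact


# Sample geometry with x = (0.5 + 0.5) / (2 * 2.0) = 0.25.
ratios = {k: error_ratio(1.0, 1.0, 2.0, 0.5, 0.5, k) for k in (2, 3, 4, 5)}
```

For this geometry the ratios are $0.25^k$, in line with the percentages quoted after Theorem 1.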
3.2 Switching Behavior
For two adjacent wires with coupling $C_c$, one is disturbed when the other switches. In the worst case, the two wires simultaneously switch in different directions. As a result, the transitions on these wires are longer than expected. This phenomenon, called the Miller effect [1], is like the effect caused by large loading. In contrast, the anti-Miller effect benefits the transitions. When two neighboring wires toggle in the same direction, they help each other. Consequently, the transition time is reduced. This phenomenon is like the effect caused by small loading.
To take advantage of the switching conditions for crosstalk minimization, we analyze the switching behavior of signals. The test patterns are available from the logic simulation stage. When analyzing the switching behavior, we first assume each gate or wire is of the minimum size or of other sizes extracted from profiles. The similarity of switching behavior between two wires $i$ and $j$ is then defined as follows:
$$similarity(i, j) = \frac{\int_0^{T_D} f(i, t)\, f(j, t)\, dt}{T_D},$$
where $T_D$ is the simulation duration and $f(i, t)$ is the normalized waveform of wire $i$ at time $t$: $f(i, t) = 1$ if node $i$ is high, and $f(i, t) = -1$ if node $i$ is low. For any two wires $i$ and $j$, $-1 \le similarity(i, j) \le 1$. The closer $similarity$ is to $-1$, the less similar their behavior; the closer $similarity$ is to 1, the more similar their behavior.
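For waveforms sampled at uniform time steps, the integral in the similarity definition reduces to an average of products. The Python sketch below assumes such uniform sampling, with each sample equal to +1 (high) or -1 (low); the function name and the sample waveforms are hypothetical.

```python
def similarity(wave_i, wave_j):
    """Average of f(i, t) * f(j, t) over uniformly spaced samples in {+1, -1}."""
    if len(wave_i) != len(wave_j) or not wave_i:
        raise ValueError("waveforms must be non-empty and of equal length")
    return sum(a * b for a, b in zip(wave_i, wave_j)) / len(wave_i)


same = similarity([1, 1, -1, -1], [1, 1, -1, -1])       # identical switching
opposite = similarity([1, 1, -1, -1], [-1, -1, 1, 1])   # always opposite
unrelated = similarity([1, 1, -1, -1], [1, -1, -1, 1])  # agree half of the time
```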
[Figure 6 graphic: waveforms $f(4,t)$, $f(5,t)$, $f(7,t)$, and $f(8,t)$ over the duration $T_D$; a graph on wires 4, 5, 7, and 8 with each edge labeled by similarity, and the same graph with each edge labeled by the adjusted weight $1 - similarity$. The wire ordering with the minimum effective loading is <7,5,4,8> or <5,7,4,8>.]
Figure 6: The waveforms of wires and the similarity between wires.
Two wires with most similar switching behavior are assigned to closer tracks to minimize the effective loading. The problem of minimizing the effective loading is equivalent to a graph-theoretic one. We build a complete graph $K_n$ for $n$ wires. In $K_n$, each node $i$ corresponds to a wire $i$, and every edge $(i, j)$ is associated with a $weight(i, j)$ equal to $1 - similarity(i, j)$. An ordering is a sequence composed of all nodes, $\langle w_1, w_2, \ldots, w_n \rangle$. Accordingly, the total effective loading between neighboring wires is $\sum_{i=1}^{n-1} weight(w_i, w_{i+1})$. Hence, the Switching Similarity problem $\mathcal{SS}$ is defined in the following.
$\mathcal{SS}$: Given $n$ wires and their switching behavior, find an ordering for the wires such that the total effective loading between neighboring wires is minimized.
The MCWO problem [15], which is NP-complete, can be reduced to the $\mathcal{SS}$ problem. Therefore, the $\mathcal{SS}$ problem is NP-hard, and we resort to heuristics. Specifically, we need an approximation algorithm with a performance guarantee. However, we have the negative result shown below.
Theorem 2 If $P \ne NP$ and $\rho \ge 1$, there is no polynomial-time approximation algorithm with ratio bound $\rho$ for the $\mathcal{SS}$ problem.
Algorithm: WOSS (Wire Ordering for the $\mathcal{SS}$ Problem)
Input: the complete graph $K_n$ for $n$ wires
Output: A wire ordering $O$
A1. $O \leftarrow \langle w_1, w_2 \rangle$, where $(w_1, w_2)$ = minimum-weighted edge.
A2. for $k = 3$ to $n$ do
      Choose a minimum-weighted edge $(w_{k-1}, j)$, $j \notin \{w_1, w_2, \ldots, w_{k-1}\}$;
      $O \leftarrow \langle w_1, w_2, \ldots, w_{k-1}, w_k \rangle$, where $w_k = j$.
Figure 7: Wire ordering for the $\mathcal{SS}$ Problem.
We propose an efficient heuristic named WOSS for the $\mathcal{SS}$ problem, as shown in Figure 7. Basically, the WOSS algorithm performs a depth-first search on the complete graph $K_n$ in $O(n^2)$ time.
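The two steps of WOSS can be sketched in Python as follows. This is one possible reading of Figure 7, not the authors' implementation: the function names, the dictionary-based weight table, and the tie-breaking rule (first candidate in list order) are assumptions. The example weights are one assignment of the Figure 6 edge labels to wire pairs that is consistent with the optimal orderings stated there; the assignment itself is an assumption, since the figure is not recoverable.

```python
def woss(wires, weight):
    """Greedy wire ordering: start from the lightest edge, then repeatedly
    append the unplaced wire with the lightest edge to the last placed wire."""
    if len(wires) < 2:
        return list(wires)

    def w(a, b):
        return weight[frozenset((a, b))]

    # A1: pick the minimum-weighted edge of the complete graph.
    pairs = [(a, b) for idx, a in enumerate(wires) for b in wires[idx + 1:]]
    first, second = min(pairs, key=lambda e: w(*e))
    order = [first, second]
    remaining = [x for x in wires if x not in (first, second)]
    # A2: extend the ordering from its last wire.
    while remaining:
        nxt = min(remaining, key=lambda j: w(order[-1], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def total_loading(order, weight):
    """Sum of weights between neighboring wires in the ordering."""
    return sum(weight[frozenset(p)] for p in zip(order, order[1:]))


# Assumed weights (1 - similarity) for wires 4, 5, 7, 8.
weight = {
    frozenset((5, 7)): 0.07, frozenset((4, 5)): 0.93, frozenset((4, 7)): 0.93,
    frozenset((4, 8)): 1.07, frozenset((5, 8)): 1.93, frozenset((7, 8)): 1.93,
}
order = woss([4, 5, 7, 8], weight)
loading = total_loading(order, weight)
```

Both the initial edge search and the extension loop examine $O(n^2)$ candidate edges in total, which matches the stated running time.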
By solving the Switching Similarity problem, we obtain a geometry ordering for all wires that targets the minimum effective loading, and hence the adjacency relationship between wires. The neighborhood $N(i)$ of wire $i$ is defined as the set of adjacent wires; the dominating index of $N(i)$, denoted by $I(i)$, of wire $i$ is defined as the set of adjacent wires with indexes greater than $i$. For instance, in Figure 6, if we choose $\langle 5, 7, 4, 8 \rangle$ as the resulting track assignment, $N(5) = \{7\}$, $N(7) = \{5, 4\}$, $N(4) = \{7, 8\}$ and $N(8) = \{4\}$; $I(5) = \{7\}$, $I(7) = \emptyset$, $I(4) = \{8\}$ and $I(8) = \emptyset$.
4 Optimal Area Minimization Under Crosstalk, Delay, and Power Constraints
In this section, we give the problem formulation and an algo-
rithm for simultaneous area, crosstalk, delay, and power optimiza-
tion. Since area is typically the most important concern in VLSI
design, we choose area as the objective function of the optimization
problem.
4.1 Problem Formulation
For each component $i$, $s+1 \le i \le n+s$, the corresponding area is proportional to its size $x_i$. Given the unit-sized area $\alpha_i$, the area of component $i$ is $\alpha_i x_i$; the total area of a circuit is thus $\sum_{i=s+1}^{n+s} \alpha_i x_i$. The areas of input drivers are ignored, i.e., $x_i = 0$, $1 \le i \le s$. This is also true for output loads. If the crosstalk, power, and delay bounds of a circuit are $X_B$, $P_B$ and $A_B$, respectively, we have
$$\sum_{i \in W} \sum_{j \in I(i)} c_{ij} \le X_B;$$
$$V^2 f \sum_{i=s+1}^{n+s} c_i \le P_B;$$
$$\sum_{i \in \pi} D_i \le A_B, \ \forall\, \pi \in \Pi;$$
where $\pi$ is one path of the path set $\Pi$. Note that, though not presented here, the above crosstalk constraint can easily be extended to the case with a distributed crosstalk bound on each net. The optimization problem we want to solve can be formulated as follows.
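$$\mathcal{P}: \quad \begin{array}{ll} \text{Minimize} & \sum_{i=s+1}^{n+s} \alpha_i x_i \\ \text{Subject to} & \sum_{i \in \pi} D_i \le A_B, \ \forall\, \pi \in \Pi, \\ & \sum_{i \in W} \sum_{j \in I(i)} c_{ij} \le X_B, \\ & V^2 f \sum_{i=s+1}^{n+s} c_i \le P_B, \\ & L_i \le x_i \le U_i, \ \forall\, s+1 \le i \le n+s. \end{array}$$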
From Section 3.1, the crosstalk between two adjacent wires $i$ and $j$ is their inter-wire physical coupling capacitance, $\tilde{c}_{ij}(1 + (x_i + x_j)/2d_{ij})$. Hence, the crosstalk constraint can be simplified by subtracting $\sum_{i \in W} \sum_{j \in I(i)} \tilde{c}_{ij}$ from both sides; the constraint becomes $\sum_{i \in W} \sum_{j \in I(i)} \tilde{c}_{ij} (x_i + x_j)/2d_{ij} \le X_B - \sum_{i \in W} \sum_{j \in I(i)} \tilde{c}_{ij}$. If we define $X_0$ as $X_B - \sum_{i \in W} \sum_{j \in I(i)} \tilde{c}_{ij}$ and $\hat{c}_{ij}$ as $\tilde{c}_{ij}/2d_{ij}$, the modified crosstalk constraint is $\sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x_i + x_j) \le X_0$. Assume the supply voltage $V$ and frequency $f$ are fixed. The power constraint can be simplified by dividing both sides by $V^2 f$. Let $P_0$ be $P_B / (V^2 f)$. The power constraint becomes $\sum_{i=s+1}^{n+s} c_i \le P_0$.
Because, in deep sub-micron technology, the interconnect densities of a circuit can be very high, the circuit graph could be very dense. Hence, the size of the path set $\Pi$ can be far greater than, or even grow exponentially with, the circuit size. It is prohibitively expensive to traverse all paths to check the constraints. To overcome this problem, we associate $a_i$ with each node $i$, which represents the arrival time of that node [3]. Let $m = n+s+1$ and $A_0 = A_B$ in the following discussion. We have
$$\begin{array}{ll} a_j \le A_0, & j \in input(m) \quad \text{/* primary outputs */} \\ a_j + D_i \le a_i, & i = s+1, \ldots, n+s \text{ and } \forall j \in input(i) \\ D_i \le a_i, & i = 1, \ldots, s \quad \text{/* primary inputs */} \end{array}$$
Consequently, the problem $\mathcal{P}$ can be modified as follows.
$$\mathcal{PP}: \quad \begin{array}{ll} \text{Minimize} & \sum_{i=s+1}^{n+s} \alpha_i x_i \\ \text{Subject to} & a_j \le A_0, \quad j \in input(m), \\ & a_j + D_i \le a_i, \quad i = s+1, \ldots, n+s \text{ and } \forall j \in input(i), \\ & D_i \le a_i, \quad i = 1, \ldots, s, \\ & \sum_{i=s+1}^{n+s} c_i \le P_0, \\ & \sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x_i + x_j) \le X_0, \\ & L_i \le x_i \le U_i, \ \forall\, s+1 \le i \le n+s. \end{array}$$
The objective function and constraints of the problem $\mathcal{PP}$ are all in the posynomial form. Through variable transformation, a convex programming problem is obtained. Hence, problem $\mathcal{PP}$ has a unique global optimum.
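The arrival-time constraints replace path enumeration with one pass over the topologically indexed nodes. The Python sketch below computes the tightest arrival times satisfying $a_j + D_i \le a_i$ and $D_i \le a_i$, and then checks the bound at the primary outputs. The function names, the example graph, and the delay values are hypothetical.

```python
def arrival_times(inputs, delay, num_components):
    """Smallest a_i with a_j + D_i <= a_i for every j in input(i).

    Node 0 is the artificial source with arrival time 0, so an input driver
    (whose only input is node 0) gets a_i = D_i.
    """
    a = {0: 0.0}
    for i in range(1, num_components + 1):  # indices are topologically sorted
        a[i] = max(a[j] for j in inputs[i]) + delay[i]
    return a


def meets_delay_bound(a, sink_inputs, bound):
    """Check a_j <= A_0 for every j in input(m)."""
    return all(a[j] <= bound for j in sink_inputs)


# Hypothetical circuit: drivers 1 and 2, wire 3, gate 4; node 4 feeds the sink.
inputs = {1: {0}, 2: {0}, 3: {1}, 4: {2, 3}}
delay = {1: 1.0, 2: 2.0, 3: 0.5, 4: 1.5}
a = arrival_times(inputs, delay, num_components=4)
```

The pass is linear in the number of edges, whereas the number of source-to-sink paths can grow exponentially.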
4.2 Lagrangian Relaxation
To solve the problem $\mathcal{PP}$, we apply Lagrangian relaxation by introducing one Lagrange multiplier for each constraint: $\beta$ for the power constraint, $\gamma$ for the crosstalk constraint, and $\lambda_{ji}$ for each delay constraint. $\lambda_{ji}$ can be viewed as a timing weight on edge $(j, i)$. Let $\mathbf{x} = (x_{s+1}, \ldots, x_{n+s})$ and $\mathbf{a} = (a_1, \ldots, a_{n+s})$. The Lagrangian function, therefore, is
$$\begin{aligned} L_{\lambda,\beta,\gamma}(\mathbf{x}, \mathbf{a}) = {} & \sum_{i=s+1}^{n+s} \alpha_i x_i + \sum_{j \in input(m)} \lambda_{jm} (a_j - A_0) + \sum_{i=s+1}^{n+s} \sum_{j \in input(i)} \lambda_{ji} (a_j + D_i - a_i) \\ & + \sum_{i=1}^{s} \lambda_{0i} (D_i - a_i) + \beta \left( \sum_{i=s+1}^{n+s} c_i - P_0 \right) + \gamma \left( \sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x_i + x_j) - X_0 \right). \end{aligned}$$
The corresponding Lagrangian relaxation subproblem is
$$\mathcal{LRS}1: \quad \text{Minimize } L_{\lambda,\beta,\gamma}(\mathbf{x}, \mathbf{a}) \quad \text{Subject to } L_i \le x_i \le U_i, \ \forall\, s+1 \le i \le n+s.$$
To solve the Lagrangian relaxation subproblem, we derive the optimality conditions by the Kuhn-Tucker conditions [16].
Theorem 3 The optimality conditions on Lagrange multipliers are given by
$$\sum_{k \in output(i)} \lambda_{ik} = \sum_{j \in input(i)} \lambda_{ji}, \quad \text{for } 1 \le i \le n+s. \tag{4}$$
Theorem 3 says that the sum of in-degree multipliers equals that of out-degree multipliers for every node except the source. This theorem is analogous to Kirchhoff's Current Law [5]: the algebraic sum of the currents flowing into a node equals that of the currents leaving the node at all times.
Theorem 4 For any $\lambda$ satisfying Equation (4) in Theorem 3, solving $\mathcal{LRS}1$ is equivalent to solving
$$\mathcal{LRS}2: \quad \text{Minimize } L_{\mu,\beta,\gamma}(\mathbf{x}) \quad \text{Subject to } L_i \le x_i \le U_i, \ \forall\, s+1 \le i \le n+s,$$
where $\mu = (\mu_1, \ldots, \mu_m)$, $\mu_i = \sum_{j \in input(i)} \lambda_{ji}$ for $1 \le i \le m$, and
$$L_{\mu,\beta,\gamma}(\mathbf{x}) = \sum_{i=s+1}^{n+s} \alpha_i x_i + \beta \left( \sum_{i=s+1}^{n+s} c_i - P_0 \right) + \gamma \left( \sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x_i + x_j) - X_0 \right) + \sum_{i=1}^{n+s} \mu_i D_i.$$
We derive the optimal sizing solution and present a greedy, optimal algorithm to solve the Lagrangian relaxation subproblem $\mathcal{LRS}2$.
Theorem 5 Let $\tilde{\mathbf{x}} = (\tilde{x}_{s+1}, \ldots, \tilde{x}_{n+s})$ be a solution; then the optimal resizing of component $i$ is given by $x_i^* = \min(U_i, \max(L_i, opt_i))$, where
$$opt_i = \sqrt{\frac{\mu_i\, \hat{r}_i \left( C'_i + \sum_{j \in N(i)} \hat{c}_{ij} x_j \right)}{\alpha_i + (\beta + R_i)\, \hat{c}_i + \gamma \sum_{j \in N(i)} \hat{c}_{ij}}}.$$
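The resizing rule of Theorem 5 is a closed-form square root followed by clamping to the size bounds. The Python sketch below evaluates it for a single component; the function name, the argument names, and the numeric values are assumptions made for illustration, and the multiplier symbols follow the reconstruction used in this section.

```python
import math


def optimal_size(mu_i, r_unit, c_down, coupling, alpha_i, beta, gamma,
                 r_up, c_unit, lower, upper):
    """Clamp opt_i to [lower, upper].

    coupling is a list of (c_hat_ij, x_j) pairs, one per neighbor j in N(i);
    c_down plays the role of C'_i and r_up the role of R_i.
    """
    numerator = mu_i * r_unit * (c_down + sum(c * xj for c, xj in coupling))
    denominator = alpha_i + (beta + r_up) * c_unit + gamma * sum(c for c, _ in coupling)
    opt = math.sqrt(numerator / denominator)
    return min(upper, max(lower, opt))


# No neighbors: sqrt(1 * 4 * 1 / 1) = 2.
isolated = optimal_size(1.0, 4.0, 1.0, [], 1.0, 0.0, 0.0, 0.0, 1.0, 0.1, 10.0)
# Same component, but the upper bound is tighter than opt_i.
clamped = optimal_size(1.0, 4.0, 1.0, [], 1.0, 0.0, 0.0, 0.0, 1.0, 0.1, 1.5)
# One neighbor with c_hat = 1 and size 3, gamma = 3: sqrt(16 / 4) = 2.
coupled = optimal_size(1.0, 4.0, 1.0, [(1.0, 3.0)], 1.0, 0.0, 3.0, 0.0, 1.0, 0.1, 10.0)
```

A larger crosstalk multiplier $\gamma$ enlarges the denominator and therefore pushes the component toward a smaller size.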
In summary, we have the following theorem.
Theorem 6 $(\mathbf{x}^*, \mathbf{a}^*)$ is an optimal sizing solution if and only if there exist a vector $\lambda^* = (\lambda^*_{01}, \ldots, \lambda^*_{m-1\,m})$, $\beta^*$, and $\gamma^*$ such that
(1) $\sum_{k \in output(i)} \lambda^*_{ik} = \sum_{j \in input(i)} \lambda^*_{ji}$, $\forall\, 1 \le i \le n+s$;
(2) $\lambda^*_{jm} (a_j - A_0) = 0$, $\forall j \in input(m)$;
    $\lambda^*_{ji} (a_j + D_i - a_i) = 0$, $\forall\, s+1 \le i \le n+s$, $j \in input(i)$;
    $\lambda^*_{0i} (D_i - a_i) = 0$, $\forall\, 1 \le i \le s$;
    $\beta^* \left( \sum_{i=s+1}^{n+s} c_i - P_0 \right) = 0$;
    $\gamma^* \left( \sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x^*_i + x^*_j) - X_0 \right) = 0$;
(3) $a^*_j \le A_0$, $\forall j \in input(m)$;
    $a^*_j + D_i \le a^*_i$, $\forall\, s+1 \le i \le n+s$, $j \in input(i)$;
    $D_i \le a^*_i$, $\forall\, 1 \le i \le s$;
    $\sum_{i=s+1}^{n+s} c_i \le P_0$;
    $\sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x^*_i + x^*_j) \le X_0$;
(4) $\lambda^*_{ji} \ge 0$, $\forall\, 1 \le i \le m$, $j \in input(i)$; $\beta^* \ge 0$; $\gamma^* \ge 0$;
(5) $x^*_i = \min(U_i, \max(L_i, opt_i))$, $s+1 \le i \le n+s$, where
$$opt_i = \sqrt{\frac{\mu_i\, \hat{r}_i \left( C'_i + \sum_{j \in N(i)} \hat{c}_{ij} x_j \right)}{\alpha_i + (\beta + R_i)\, \hat{c}_i + \gamma \sum_{j \in N(i)} \hat{c}_{ij}}}.$$
In the above theorem, (1) is the optimality condition, (2) reflects the complementary slackness conditions, (3) represents the constraints, (4) restricts the multipliers to be non-negative, and (5) is the optimal sizing.
We propose a greedy algorithm LRS in Figure 8 to optimally solve the Lagrangian relaxation subproblem $\mathcal{LRS}2$ (and equivalently to solve $\mathcal{LRS}1$). As mentioned earlier, the Lagrangian relaxation problem has a unique global optimum. This property guarantees that a greedy algorithm can find the optimal solution.
The following gives the Lagrangian dual problem.
$$\mathcal{LDP}: \quad \text{Maximize } D(\lambda, \beta, \gamma) \quad \text{Subject to } \lambda \text{ in the optimal condition},$$
where $D(\lambda, \beta, \gamma) = \min L_{\lambda,\beta,\gamma}(\mathbf{x}, \mathbf{a})$.
If $\lambda$ is the optimal solution of the $\mathcal{LDP}$ problem, then $\lambda$ also optimizes the $\mathcal{PP}$ problem. We present Algorithm OGWS listed in Figure 9 to solve $\mathcal{LDP}$.
Theorem 7 Algorithm OGWS converges to the global optimum.
Subroutine: LRS (Lagrangian Relaxation Subroutine)
Input: the circuit graph $H$ and Lagrange multipliers $\mu$, $\beta$, $\gamma$
Output: $\mathbf{x} = (x_{s+1}, \ldots, x_{n+s})$ which minimizes $L_{\mu,\beta,\gamma}(\mathbf{x})$
S1. $x_i \leftarrow L_i$, $\forall\, s+1 \le i \le n+s$.
S2. Compute $C'_i$, $\forall\, s+1 \le i \le n+s$, by traversing $H$ in the reverse topological order.
S3. Compute $R_i$, $\forall\, s+1 \le i \le n+s$, by traversing $H$ in the topological order.
S4. for $i = s+1$ to $n+s$ do
      $x_i \leftarrow \min(U_i, \max(L_i, opt_i))$, where
      $opt_i = \sqrt{\dfrac{\mu_i\, \hat{r}_i \left( C'_i + \sum_{j \in N(i)} \hat{c}_{ij} x_j \right)}{\alpha_i + (\beta + R_i)\, \hat{c}_i + \gamma \sum_{j \in N(i)} \hat{c}_{ij}}}$.
S5. Repeat S2-S4 until no improvement.
Figure 8: Lagrangian Relaxation Subroutine.
5 Experimental Results
We implemented our algorithm in the C language on a SUN UltraSPARC-I workstation and tested it on the ISCAS85 benchmark circuits. The circuit sizes ranged from 640 to 9656. The supply voltage was set to 3.3 V, and the working frequency was set to 200 MHz. The unit-sized resistance and capacitance of a gate were $10\ \Omega \cdot \mu m$ and $0.16\ fF/\mu m$, and those of a wire were $0.07\ \Omega \cdot \mu m$ and $0.024\ fF/\mu m$, respectively. The respective lower and upper bounds for a gate or wire size are $0.1\ \mu m$ and $10\ \mu m$. Table 1 shows the experimental results, where #G denotes the number of gates, #W denotes the number of wires, tot denotes the total number of gates and wires, Init denotes the initial values before sizing, Fin denotes the final values after sizing, ite denotes the number of iterations, time denotes the runtime, mem denotes the memory requirement, and Impr(%) denotes the average improvement in %. The improvement for each term is calculated by $\frac{Init - Fin}{Init} \times 100\%$.
The results show that our algorithm, on average, improved the respective area, noise, power, and delay by 87.90%, 89.67%, 86.82% and 5.3% after wire and gate sizing. Further, our algorithm is effective and efficient. For example, for the largest circuit, c7552, with 3512 gates and 6144 wires, our algorithm needed only 47 min and 2.1 MB storage to achieve the precision of within 1% error.
Note that the results show that sizing does not benefit delay much. When a component is enlarged, it increases not only the loading of the components on the upstream path of the sized component and the driving capability for the components on the downstream path, but also the physical coupling capacitance. Consequently, up-sizing increases the delay of the upstream part while decreasing the delay of the downstream part. Similarly, down-sizing reduces the delay of the upstream part and harms that of the downstream part. As a result, the delay over the whole circuit would not be significantly improved.
In Figures 10(a) and (b), the storage requirement and runtime per
iteration (denoted by the vertical axis) are plotted as functions of the
total number of gates and wires in a circuit (represented by the hori-
zontalaxis),respectively. Figures10(a)and(b) showthatthe runtime
per iteration and the storage requirements of our algorithm approach
linear in the total number of gates and wires. As revealed by Fig-
ure 10, some points deviate from the linear line; a probable reason
is that these circuits are not regular and their structures are different
from each other.

Ckt         Ckt Size        Noise (pF)      Delay (ps)        Power (mW)      Area (µm²)   ite   time   mem
Name      #G    #W   tot    Init    Fin     Init      Fin     Init     Fin    Init    Fin       (sec)  (KB)
c1355    546  1064  1610   20.53   2.14  1005.57  1098.90   228.34   28.45   48299   5203    9     56  1096
c1908    880  1498  2378   24.55   2.45  1444.57  1338.62   357.09   41.45   71338   7369   13    155  1184
c2670   1193  2076  3269   33.46   3.35  1480.65  1499.87   486.38   58.45   98067  10319    7    444  1320
c3540   1669  2939  4608   50.24   5.03  1713.47  1685.51   682.19   79.53  138242  14292    8    553  1472
c432     214   426   640    7.89   0.95  1442.28   958.20    89.95   18.35   19200   2984    7     21   976
c499     514   928  1442   16.37   1.72   875.81   799.31   211.25   27.88   43259   4834   10     97  1072
c5315   2307  4386  6693   82.06   8.23  1649.38  1548.37   959.28  113.92  200803  20768    7   1321  1752
c6288   2416  4800  7216   95.36   9.53  4888.33  4494.26  1015.03  129.94  216495  23341   14   2705  1808
c7552   3512  6144  9656  103.30  10.33  1615.32  1619.37  1433.49  168.91  289707  30120    7   2823  2120
c880     383   729  1112   13.12   1.35   931.49   794.43   159.30   22.14   33359   3827   12     94  1032
Impr(%)                -         89.67%              5.3%           86.82%         87.90%                 -

Table 1: Experimental results in noise, delay, power, and area.
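As a quick arithmetic check on the noise column of Table 1, the sketch below averages the per-circuit relative reductions. That the Impr(%) row is such an average is an assumption, not something the text states; the helper name `mean_improvement` is hypothetical. Under this assumption the noise entry of 89.67% is reproduced to within rounding.

```python
# (initial, final) noise values in pF, copied from Table 1.
noise = {
    "c1355": (20.53, 2.14),
    "c1908": (24.55, 2.45),
    "c2670": (33.46, 3.35),
    "c3540": (50.24, 5.03),
    "c432": (7.89, 0.95),
    "c499": (16.37, 1.72),
    "c5315": (82.06, 8.23),
    "c6288": (95.36, 9.53),
    "c7552": (103.30, 10.33),
    "c880": (13.12, 1.35),
}


def mean_improvement(pairs):
    """Average of the per-circuit relative reductions, in percent."""
    ratios = [(init - fin) / init for init, fin in pairs]
    return 100.0 * sum(ratios) / len(ratios)


noise_impr = mean_improvement(list(noise.values()))
print(f"average noise improvement: {noise_impr:.2f}%")
```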
Algorithm: OGWS (Optimal Gate and Wire Sizing)
Input: the circuit graph $H$
Output: $\lambda$, $\beta$, $\gamma$ which maximize $\min L_{\lambda,\beta,\gamma}(x)$
A1. $k \leftarrow 1$;
    $\lambda \leftarrow$ arbitrary vector in the optimality condition;
    $\beta \leftarrow$ an arbitrary positive number;
    $\gamma \leftarrow$ an arbitrary positive number.
A2. $\mu = (\mu_1, \ldots, \mu_{n+s+1})$, where $\mu_i = \sum_{j \in input(i)} \lambda_{ji}$.
A3. Call LRS and compute $a_1, \ldots, a_{n+s}$.
A4. Adjust multipliers $\lambda_{ji}$'s, $\beta$, $\gamma$:
    for $i = 1$ to $n+s+1$ do
      forall $j \in input(i)$ do
        $$\lambda_{ji} \leftarrow \begin{cases}
          \lambda_{ji} + \rho_k (a_j - A_0) & \text{if } i \in T \\
          \lambda_{ji} + \rho_k (a_j + D_i - a_i) & \text{if } i \in G \cup W \\
          \lambda_{ji} + \rho_k (D_i - a_i) & \text{if } i \in R
        \end{cases}$$
    $\beta \leftarrow \beta + \rho_k \left( \sum_{i=s+1}^{n+s} c_i - P_0 \right)$
    $\gamma \leftarrow \gamma + \rho_k \left( \sum_{i \in W} \sum_{j \in I(i)} \hat{c}_{ij} (x_i + x_j) - X_0 \right)$
    where the step size $\rho_k$ satisfies $\lim_{k \to \infty} \rho_k = 0$
    and $\sum_{j=1}^{k} \rho_j \to \infty$.
A5. Project $\lambda$ onto the nearest point in the optimality condition.
A6. $k \leftarrow k + 1$.
A7. If $\left( \sum_{i=s+1}^{n+s} \alpha_i x_i - L_{\lambda,\beta,\gamma}(x) \right) >$ error bound, goto A2.
Figure 9: Optimal Gate and Wire Sizing Algorithm.
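The multiplier-update loop of Figure 9 can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's LRS subroutine: it has a single multiplier, minimizes the area $\sum_i \alpha_i x_i$ subject to one delay-like constraint $\sum_i 1/x_i \le D_0$ with box bounds, and solves the relaxed subproblem in closed form. The names `solve_subproblem` and `lagrangian_sizing`, the step size `10 / k`, and the use of an absolute gap in the stopping test are all assumptions made for this example.

```python
import math


def solve_subproblem(lam, alpha, lower, upper):
    """Minimize sum(alpha_i * x_i) + lam * sum(1 / x_i) over the box.

    Each term is convex in x_i, so the unconstrained minimizer
    sqrt(lam / alpha_i) is simply clamped to [lower, upper].
    """
    return [min(max(math.sqrt(lam / a), lower), upper) for a in alpha]


def lagrangian_sizing(alpha, d0, lower=0.1, upper=10.0, tol=1e-3, max_iter=10000):
    lam = 1.0  # as in A1: an arbitrary positive starting multiplier
    for k in range(1, max_iter + 1):
        x = solve_subproblem(lam, alpha, lower, upper)  # role of A3
        violation = sum(1.0 / xi for xi in x) - d0  # constraint residual
        area = sum(a * xi for a, xi in zip(alpha, x))
        lagrangian = area + lam * violation
        if abs(area - lagrangian) <= tol:  # as in A7, with an absolute gap
            break
        rho = 10.0 / k  # rho_k -> 0 while the partial sums diverge
        lam = max(0.0, lam + rho * violation)  # as in A4, kept non-negative
    return lam, x, area


lam, x, area = lagrangian_sizing(alpha=[1.0, 4.0], d0=1.5)
print(lam, x, area)
```

For this toy instance the optimum can be derived by hand: the constraint is tight at $\lambda = 4$, giving $x = (2, 1)$ and an area of 6, which the iteration should approach.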
6 Concluding Remarks
We have modeled the crosstalk optimization problem by considering
both the switching conditions and the physical coupling capacitance.
We have proposed a two-stage method for crosstalk minimization: the
first stage handles geometry wire ordering by exploiting the
switching conditions to reduce the effective loading; the second
stage then simultaneously optimizes physical coupling capacitance,
power, and delay. Based on the Lagrangian relaxation method, our
OGWS algorithm can economically optimize all the above objectives.
The experimental results show that our algorithm is very effective
for performance optimization, especially for noise, power, and area
minimization.
7 Acknowledgments
The authors would like to thank Dr. Chung-Ping Chen of Intel
Corp. for his valuable suggestions and kind help.
[Figure 10 plots: (a) "Storage vs. Circuit Size", vertical axis in MB, horizontal axis #gates+#wires; (b) "Runtime vs. Circuit Size", vertical axis in seconds, horizontal axis #gates+#wires.]
Figure 10: The storage and runtime requirements of our algorithm vs. circuit
size.
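The linear storage trend can be checked roughly against the "tot" and "mem" columns of Table 1 with an ordinary least-squares fit. The sketch below is only an illustration; the helper `linear_fit` is hypothetical and written in plain Python so that it needs no external library.

```python
# (total gates + wires, memory in KB), copied from Table 1.
points = [
    (1610, 1096), (2378, 1184), (3269, 1320), (4608, 1472), (640, 976),
    (1442, 1072), (6693, 1752), (7216, 1808), (9656, 2120), (1112, 1032),
]


def linear_fit(data):
    """Least-squares line y = slope * x + intercept, plus R^2."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in data)
    ss_tot = sum((y - mean_y) ** 2 for _, y in data)
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r_squared = linear_fit(points)
print(f"{slope:.4f} KB per component, intercept {intercept:.1f} KB, R^2 = {r_squared:.4f}")
```

An R^2 close to 1 here would be consistent with the claim that storage grows linearly with the total number of gates and wires.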
References
[1] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI,
Addison-Wesley Pub. Company Inc., 1990.
[2] C.-P. Chen, Y.-W. Chang and D. F. Wong, “Fast Performance-Driven Optimization
for Buffered Clock Trees Based on Lagrangian Relaxation,” Proc. DAC,
pp. 405–408, June 1996.
[3] C.-P. Chen, C. C. N. Chu and D. F. Wong, “Fast and Exact Simultaneous Gate and
Wire Sizing by Lagrangian Relaxation,” Proc. ICCAD, pp. 617–624, Nov. 1998.
[4] D.-S. Chen and M. Sarrafzadeh, “An Exact Algorithm for Low Power
Library-Specific Gate Re-Sizing,” Proc. DAC, June 1996.
[5] L. O. Chua, C. A. Desoer and E. S. Kuh, Linear and Nonlinear Circuits,
McGraw-Hill Book Company, 1987.
[6] A. Devgan, “Efficient Coupled Noise Estimation for On-Chip Interconnects,”
Proc. ICCAD, pp. 147–151, Nov. 1997.
[7] W. C. Elmore, “The Transient Response of Damped Linear Networks with
Particular Regard to Wide Band Amplifiers,” J. Applied Physics, 19(1), 1948.
[8] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research, 5th ed.,
McGraw-Hill Publishing, 1990.
[9] M. Marek-Sadowska, “Impact of Deep Sub-micron Technologies on Physical
Design,” Lecture notes and Private Communication, Aug. 1998.
[10] Y. Massoud, S. Majors, T. Bustami and J. White, “Layout Techniques for
Minimizing On-Chip Interconnect Self Inductance,” Proc. DAC, pp. 566–571, June 1998.
[11] M. Nemani and F. N. Najm, “High-Level Area and Power Estimation for VLSI
Circuits,” Proc. ICCAD, pp. 114–119, Nov. 1997.
[12] J. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice-Hall, Inc.,
1996.
[13] K. L. Shepard, “Design Methodologies for Noise in Digital Integrated Circuits,”
Proc. DAC, pp. 94–99, June 1998.
[14] H.-P. Tseng, L. Scheffer, and C. Sechen, “Timing and Crosstalk Driven Area
Routing,” Proc. DAC, pp. 378–381, June 1998.
[15] A. Vittal and M. Marek-Sadowska, “Crosstalk Reduction for VLSI,” IEEE Trans.
CAD, Vol. 16, No. 3, pp. 290–298, Mar. 1997.
[16] W. L. Winston, Operations Research: Applications and Algorithms, 3rd ed., Int
Thomson Publishing, 1994.
[17] T. Xue, E. S. Kuh and D. Wang, “Post Global Routing Crosstalk Risk Estimation
and Reduction,” Proc. ICCAD, pp. 302–309, Nov. 1996.