RIP: An Efficient Hybrid Repeater Insertion Scheme for Low Power
Xun Liu Yuantao Peng
North Carolina State University
Raleigh, NC 27695
 
{xunliu,ypeng}@ncsu.edu
Marios C. Papaefthymiou
University of Michigan
Ann Arbor, MI 48109
marios@eecs.umich.edu
Abstract
This paper presents a novel repeater insertion algorithm
for interconnect power minimization. The novelty of our ap-
proach is in the judicious integration of an analytical solver
and a dynamic programming based method. Specifically, the
analytical solver chooses a concise repeater library and a
small set of repeater location candidates such that the dy-
namic programming algorithm can be performed fast with
little degradation of the solution quality. In comparison with
previously reported repeater insertion schemes, within com-
parable runtimes, our approach achieves up to 37% higher
power savings. Moreover, for the same design quality, our
scheme attains a speedup of two orders of magnitude.
1. Introduction
This paper presents a novel hybrid repeater insertion
technique for low-power global interconnect designs. Given
a two-pin interconnect and its timing budget, our algorithm
derives the number of repeaters, and the width and loca-
tion of each repeater so that the timing constraint is satis-
fied and power dissipation is minimized. The hybrid nature
of our scheme stems from its judicious combination of an
analytical repeater insertion solver and a dynamic program-
ming (DP) based approach. Specifically, our algorithm pro-
ceeds in three steps. First, an initial repeater insertion solu-
tion is derived using DP with a very coarse repeater library.
Second, an analytical procedure is applied to refine the ini-
tial solution and derive a new repeater library and a set of
location candidates that fit the current design. Finally, the
DP algorithm is repeated with the new library and location
set for the low-power repeater insertion solution.
Our hybrid algorithm maintains the advantages of both
analytical and DP-based schemes, producing high-quality
interconnect designs efficiently. Furthermore, it is highly
practical due to the adoption of a realistic interconnect
model. Specifically, the interconnects are represented as a
sequence of wire segments with fixed lengths and distinct
RC characteristics, as derived from a routing procedure.
Moreover, our algorithm can handle forbidden zones, i.e.,
parts of interconnects through macrocells in which no re-
peater can be placed, and is thus applicable to nets routed in
real design scenarios.
We have implemented our repeater insertion algorithm
in a software tool, called RIP, and applied it to the design
of low-power global interconnects. In comparison with
the conventional DP algorithms, our scheme achieves power
reductions of up to 37% with comparable runtimes. More-
over, for the same design quality, RIP achieves shorter run-
times by two orders of magnitude.
The remainder of the paper is organized as follows. Sec-
tion 2 describes previous research on repeater insertion.
The problem of low-power repeater insertion for multi-layer
two-pin interconnects is formulated in Section 3. In Sec-
tion 4, analytical constraints on the repeater widths and lo-
cations are derived that must be satisfied to minimize re-
peater power dissipation. Our algorithm is presented in Sec-
tion 5. Section 6 presents our experimental results. Section
7 summarizes our paper.
2. Previous Work on Repeater Insertion
Extensive work on repeater insertion has appeared in
the literature [4, 19]. Repeater insertion has been applied
to interconnect designs with various objectives such as
delay minimization [3, 12, 17] and power minimization
[5, 10, 13, 15, 16]. Several circuit models have been pro-
posed to compute the delay and power dissipation of re-
peaters such as the switch-level RC model [8] and mo-
ment matching model [1]. In analytical repeater insertion
schemes, the optimization objectives are described using
analytical functions of repeater width and location. The op-
timal repeater insertion solutions can be derived by set-
ting the derivatives of these functions to zero and solving
the ensuing equations [6, 7]. Analytical schemes assume
that the repeater widths and/or locations can be continu-
ously changed. In actual designs, however, repeater widths
are discrete due to layout design rules. In addition, the re-
peater locations are restricted to areas not occupied by cir-
cuit blocks. Consequently, when practical interconnects are
considered, the analytical objective functions become very
complex or even intractable.
To address the limitation of analytical schemes, a re-
peater insertion algorithm based on DP was proposed in
[11] and improved in [20] for interconnect delay mini-
mization. This algorithm has been modified for intercon-
nect power reduction [14, 18, 21]. In these DP schemes,
the possible widths and locations of the repeaters are dis-
crete and finite. The algorithms choose the best solution out
of all the possibilities. As a result, the efficiency of the DP
Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’05) 
1530-1591/05 $ 20.00 IEEE 
schemes may be significantly affected by the given repeater
library and potential repeater locations. Specifically, if the
allowable repeater widths and locations are too limited, the
quality of the interconnect may degrade substantially. On
the other hand, when the numbers of repeater widths and
locations are large, the DP algorithms become very time-
consuming [14].
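The DP idea described above can be sketched for a two-pin net as follows. This is a minimal illustration under stated assumptions, not the algorithm of [11], [14], or [20]: the wire is uniform (one resistance r and capacitance c per unit length), repeaters follow the switch-level RC model with unit output resistance Rs, input capacitance Co, and output capacitance Cp, and each partial solution is a (downstream capacitance, downstream delay, total repeater width) triple that is pruned by dominance. All function and parameter names are hypothetical, and units are arbitrary but consistent. A brute-force enumeration is included only as a cross-check.

```python
from itertools import product


def wire(state, r, c, length):
    """Extend a (cap, delay, width) state upstream across a wire piece."""
    cap, delay, width = state
    return (cap + c * length,
            delay + r * length * (0.5 * c * length + cap),
            width)


def gate(state, w, Rs, Co, Cp):
    """Place a repeater (or the driver) of width w in front of the state."""
    cap, delay, width = state
    return (Co * w, delay + Rs * Cp + Rs / w * cap, width + w)


def prune(states):
    """Keep only states not dominated in capacitance, delay, and width."""
    kept = []
    for s in sorted(set(states)):
        if not any(all(k[i] <= s[i] for i in range(3)) for k in kept):
            kept.append(s)
    return kept


def dp_insert(gaps, lib, w_d, w_r, tau_t, r, c, Rs, Co, Cp):
    """Minimum total repeater width meeting tau_t, or None if infeasible.

    gaps holds the wire lengths between consecutive points
    driver, x_1, ..., x_k, receiver (k + 1 entries).
    """
    states = [(Co * w_r, 0.0, 0.0)]
    for length in reversed(gaps[1:]):
        states = [wire(s, r, c, length) for s in states]
        # At each candidate location: leave it empty or try every width.
        states = prune(states + [gate(s, w, Rs, Co, Cp)
                                 for s in states for w in lib])
    finals = [gate(wire(s, r, c, gaps[0]), w_d, Rs, Co, Cp) for s in states]
    feasible = [width - w_d for _, delay, width in finals if delay <= tau_t]
    return min(feasible) if feasible else None


def brute_force(gaps, lib, w_d, w_r, tau_t, r, c, Rs, Co, Cp):
    """Enumerate every assignment; exponential, for checking only."""
    best = None
    for choice in product([None] + list(lib), repeat=len(gaps) - 1):
        s = (Co * w_r, 0.0, 0.0)
        for length, w in zip(reversed(gaps[1:]), reversed(choice)):
            s = wire(s, r, c, length)
            if w is not None:
                s = gate(s, w, Rs, Co, Cp)
        _, delay, width = gate(wire(s, r, c, gaps[0]), w_d, Rs, Co, Cp)
        if delay <= tau_t and (best is None or width - w_d < best):
            best = width - w_d
    return best
```

The number of surviving states grows with the number of distinct total widths that can be formed from the library, which is one way to see why a fine-grained library makes the power-aware DP slow.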
DP algorithms perform well for interconnect delay re-
duction, since the delay of a min-delay interconnect design
is insensitive to its repeater widths and locations [9]. Con-
sequently, a small-size coarse-granularity repeater library
can be derived and applied to all interconnect designs with
limited performance loss [2]. However, when power reduc-
tion is considered, i.e., the problem addressed in this paper,
DP schemes become less effective. Power dissipation of
repeaters is sensitive to repeater widths since, to a
first-order approximation, gate capacitance depends linearly on
repeater width. Consequently, fine granularities are needed
for repeater widths, so that repeater widths close to the op-
tima can be used. However, the runtime complexity of DP
schemes becomes pseudo-polynomial when power mini-
mization is considered [14]. The use of fine granularities
leads to large numbers of possible repeater widths, result-
ing in excessively long runtimes.
3. Problem Formulation
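[Figure 1: a driver of width wd and a receiver of width wr connected by segments (r1, c1, l1), (r2, c2, l2), ..., (rm, cm, lm); two forbidden zones span (zs1, ze1) and (zs2, ze2).]
Figure 1. Non-uniform two-pin interconnect.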
Figure 1 shows the structure of a multi-layer two-pin in-
terconnect. The sizes of the driver and load are equal to wd
and wr, respectively. The interconnect is made of m seg-
ments connected in a linear order as derived by a routing
tool. The ith segment has a given length li, and the resis-
tance and capacitance per unit length are ri and ci, respec-
tively. In a realistic interconnect routing scenario, the inter-
connect may go through some macro-blocks, in which no
repeater can be placed. The portions of interconnect within
the macro-blocks are represented as forbidden zones, whose
ranges are labeled by $(z_{s_i}, z_{e_i})$, $i = 1, \ldots, b$.
The problem of low-power repeater insertion for multi-
layer two-pin interconnects can be described as follows:
Problem LPRI Let $m$ be the number of interconnect segments,
and let $l_i$, $r_i$ and $c_i$ be the length, resistance per unit
length, and capacitance per unit length of segment
$i \in \{1, 2, \ldots, m\}$, respectively. Furthermore, let $b$ be the number
of forbidden zones and $(z_{s_i}, z_{e_i})$, $i = 1, \ldots, b$, be the range
of each zone $i$. Given the widths of the driver and receiver
$(w_d, w_r)$ and a timing target $\tau_t$, compute the number of
repeaters $n$ and the width $w_j$ and location $x_j$ of each repeater
$j \in \{1, 2, \ldots, n\}$ such that $\forall j$, $x_j \notin (z_{s_i}, z_{e_i})$, $i = 1, \ldots, b$, the
delay of the interconnect is equal to or less than $\tau_t$, and the
total power $P$ of the repeaters is minimized.
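The inputs of Problem LPRI can be collected in a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, positions are measured from the driver along the net, and each forbidden zone is treated as an open interval, following the condition $x_j \notin (z_{s_i}, z_{e_i})$.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    length: float  # l_i
    r: float       # resistance per unit length, r_i
    c: float       # capacitance per unit length, c_i


@dataclass
class Net:
    segments: List[Segment]
    forbidden: List[Tuple[float, float]]  # (z_s, z_e) of each zone
    w_d: float    # driver width
    w_r: float    # receiver width
    tau_t: float  # timing target

    def total_length(self) -> float:
        return sum(s.length for s in self.segments)

    def is_legal(self, x: float) -> bool:
        """True if a repeater may be placed at position x."""
        if not 0.0 <= x <= self.total_length():
            return False
        return not any(zs < x < ze for zs, ze in self.forbidden)
```

For example, with one zone `(1000.0, 2000.0)` a repeater at 1500.0 is rejected, while the zone boundaries themselves remain legal under the open-interval assumption.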
4. Low-power Repeater Insertion Analysis
4.1. Repeater delay and power
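[Figure 2: a repeater of width $w_i$, modeled by an output resistance $R_s/w_i$ and an output capacitance $C_p w_i$, drives segments $(r_{i1}, c_{i1}, l_{i1})$, ..., $(r_{ik_i}, c_{ik_i}, l_{ik_i})$. Each segment $j$ is drawn as a π section with series resistance $r_{ij} l_{ij}$ and two shunt capacitances $c_{ij} l_{ij}/2$. The stage is terminated by the input capacitance $C_o w_{i+1}$ of the next repeater.]
Figure 2. Circuit model of a repeater stage.
[Figure 3: repeaters of widths $w_0, w_1, w_2, \ldots, w_n, w_{n+1}$ separated by lumped wire loads $(R_0, C_0), (R_1, C_1), \ldots, (R_{n-1}, C_{n-1}), (R_n, C_n)$.]
Figure 3. Repeaters in a non-uniform net.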
In our analysis of repeater delays, we use the widely
adopted Elmore delay model, although more accurate an-
alytical delay models can be used by replacing the El-
more delay with the corresponding delay functions. Fig-
ure 2 shows the circuit model of a single repeater stage in
which the driving repeater is represented using the switch-
level RC model. Each interconnect segment is described us-
ing the lumped-RC π model. The receiving repeater is mod-
eled as a capacitor. The Elmore delay of such a stage is de-
rived as:
τi  RsCp 
Rs
wi
  ∑
j  1  ki
li jci j  Cowi  1   ∑
j  1  ki
li jri jCowi  1
 ∑
j  1  ki
 
1
2
li jci j  ∑
h  j  1  ki
lihcih  ri jli j  (1)
where $R_s$, $C_o$, and $C_p$ are the output resistance, input capacitance,
and output capacitance of a unit-size repeater, $k_i$ is the
number of segments between repeaters $i$ and $i+1$, and
$(r_{ij}, c_{ij}, l_{ij})$ are the unit-length resistance, unit-length capacitance,
and length of segment $j \in \{1, 2, \ldots, k_i\}$. When $n$ repeaters
are inserted into a multi-layer two-pin net as shown
in Figure 3, the total delay can be written as:
$$\tau_{total} = \sum_{i=0}^{n} \tau_i \tag{2}$$
where $\tau_i$ is the Elmore delay of each stage. Note that, to simplify
Equation (2), we denote the widths of the interconnect
driver $w_d$ and receiver $w_r$ as $w_0$ and $w_{n+1}$, respectively.
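Equations (1) and (2) translate directly into code. The following is a minimal sketch: the function names and the representation of a stage as a list of `(r, c, l)` tuples are assumptions made for illustration, and units are arbitrary but consistent.

```python
def stage_delay(w_i, w_next, segs, Rs, Co, Cp):
    """Elmore delay of one repeater stage, following Eq. (1).

    segs lists the (r, c, l) triples of the wire segments between
    repeater i (width w_i) and repeater i+1 (width w_next).
    """
    c_wire = sum(c * l for _, c, l in segs)
    # Self-loading plus the driver resistance charging wire and load.
    delay = Rs * Cp + Rs / w_i * (c_wire + Co * w_next)
    downstream = c_wire
    for r, c, l in segs:
        downstream -= c * l  # capacitance of the segments after this one
        # Segment resistance sees half of its own capacitance, all later
        # segments, and the input capacitance of the next repeater.
        delay += r * l * (0.5 * c * l + downstream + Co * w_next)
    return delay


def total_delay(widths, stages, Rs, Co, Cp):
    """Eq. (2): widths = [w0, w1, ..., wn, wn+1]; stages has n+1 entries."""
    return sum(
        stage_delay(widths[i], widths[i + 1], stages[i], Rs, Co, Cp)
        for i in range(len(stages))
    )
```

Splitting a segment into two pieces with the same `r` and `c` leaves `stage_delay` unchanged, which is a quick sanity check on the nested sum in Equation (1).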
From [5], the short-circuit power of repeaters is very
small for advanced VLSI technologies. The total power of
repeaters can therefore be approximated by the sum of the
dynamic power and leakage power and is given by:
$$P = \alpha v_{dd}^2 f C_{total\,load} + \sum_{i=1}^{n} \beta w_i \tag{3}$$
where $\alpha$ is the signal activity, $v_{dd}$ is the supply voltage, $f$ is the
clock frequency, $C_{total\,load}$ is the total gate capacitance, and
$\beta$ is a constant. Since $C_{total\,load}$ is a linear function of the
total repeater width, Equation (3) can be rewritten as follows:
$$P = c + \gamma \sum_{i=1}^{n} w_i \tag{4}$$
where $c$ and $\gamma$ are constants. Consequently, the minimization
of the repeater power dissipation is equivalent to the
minimization of the total repeater width $p = \sum_{i=1}^{n} w_i$. Notice
that the power dissipation due to the interconnects is a constant
and is therefore not considered during our low-power
repeater insertion derivation.
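The step from Equation (3) to Equation (4) can be written out under one explicit assumption: the linear dependence of the load capacitance on the total repeater width is expressed with a slope $\kappa$ and an offset $C_0$. Both symbols are introduced here for illustration only and do not appear in the original derivation.

```latex
% Assumed linear model: C_{total load} = \kappa \sum_i w_i + C_0
\begin{aligned}
P &= \alpha v_{dd}^2 f \Bigl(\kappa \sum_{i=1}^{n} w_i + C_0\Bigr)
     + \beta \sum_{i=1}^{n} w_i \\
  &= \underbrace{\alpha v_{dd}^2 f\, C_0}_{c}
     + \underbrace{\bigl(\alpha v_{dd}^2 f\, \kappa + \beta\bigr)}_{\gamma}
       \sum_{i=1}^{n} w_i .
\end{aligned}
```

Since $\gamma > 0$ whenever $\kappa$ and $\beta$ are positive, minimizing $P$ and minimizing $p$ select the same solution.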
4.2. Constraints on repeater widths
In Problem LPRI, it can be proved that, when the power
dissipation is minimized, we have
$$\tau_{total} = \tau_t \tag{5}$$
Specifically, if $\tau_{total} < \tau_t$, the repeater widths can be
further decreased to reduce power. As a result, we can use
Lagrangian relaxation to analyze the optimal repeater insertion
solutions. In particular, we define
$$L = p + \lambda(\tau_{total} - \tau_t) \tag{6}$$
When power is minimized, we should have
$$\frac{\partial L}{\partial w_i} = 0, \quad i = 1, \ldots, n \tag{7}$$
Combining Equations (2), (6), and (7), we have
$$1 + \lambda\left[C_o\left(R_{i-1} + \frac{R_s}{w_{i-1}}\right) - \frac{R_s\,(C_i + C_o w_{i+1})}{w_i^2}\right] = 0, \quad i = 1, \ldots, n \tag{8}$$
where $R_{i-1} = \sum_{j=1}^{k_{i-1}} r_{(i-1)j}\, l_{(i-1)j}$ is the total interconnect
resistance between repeaters $i-1$ and $i$, and
$C_i = \sum_{j=1}^{k_i} c_{ij} l_{ij}$ is the total interconnect capacitance between
repeaters $i$ and $i+1$, as shown in Figure 3.
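For fixed repeater locations, Equations (5) and (8) can be solved numerically. The sketch below is one simple way to do so and is not the authors' solver: instead of a Newton-Raphson iteration it updates each $w_i$ from Equation (8) in repeated sweeps for a given $\lambda$, and then bisects on $\lambda$ until the delay meets the target. It assumes the target is achievable, works with lumped stage loads $(R_i, C_i)$, and omits the wire-only term of Equation (1), which does not depend on the widths (so `tau_t` here is the target minus that constant). All names are hypothetical.

```python
import math


def total_delay(w, R, C, Rs, Co, Cp):
    """Width-dependent part of Eq. (2) for lumped loads (R[i], C[i]).

    w = [w0, w1, ..., wn, wn+1]; R and C have n + 1 entries.
    """
    return sum(
        Rs * Cp + Rs / w[i] * (C[i] + Co * w[i + 1]) + R[i] * Co * w[i + 1]
        for i in range(len(R))
    )


def widths_for_lambda(lam, w0, wn1, R, C, Rs, Co, sweeps=1000, tol=1e-12):
    """Satisfy Eq. (8) for a fixed multiplier by sweeping over the w_i."""
    n = len(R) - 1
    w = [w0] + [1.0] * n + [wn1]
    for _ in range(sweeps):
        change = 0.0
        for i in range(1, n + 1):
            a = Co * (R[i - 1] + Rs / w[i - 1])  # first bracket term of Eq. (8)
            b = Rs * (C[i] + Co * w[i + 1])      # numerator of the second term
            new = math.sqrt(lam * b / (1.0 + lam * a))
            change = max(change, abs(new - w[i]) / new)
            w[i] = new
        if change < tol:
            break
    return w


def solve_widths(tau_t, w0, wn1, R, C, Rs, Co, Cp, steps=200):
    """Bisect on log(lambda) until the delay meets tau_t, as in Eq. (5)."""
    lo, hi = 1e-9, 1e9
    for _ in range(steps):
        lam = math.sqrt(lo * hi)
        w = widths_for_lambda(lam, w0, wn1, R, C, Rs, Co)
        if total_delay(w, R, C, Rs, Co, Cp) > tau_t:
            lo = lam  # too slow: a larger multiplier widens the repeaters
        else:
            hi = lam
    return widths_for_lambda(hi, w0, wn1, R, C, Rs, Co), hi
```

Each update is the exact minimizer of $L$ in Equation (6) with respect to a single $w_i$, so the sweep behaves like coordinate descent on the Lagrangian. A Newton-Raphson solver would typically need fewer iterations.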
4.3. Constraints on repeater locations
We next develop a set of constraints on the repeater locations.
Specifically, we denote the locations of the repeaters
as $x_i$, $i = 1, 2, \ldots, n$. Since $\tau_{total}$ is a function of repeater
widths $w_i$ and locations $x_i$, using the first-order Taylor
expansion, we have
$$\Delta\tau_{total} = \sum_{i=1}^{n} \frac{\partial \tau_{total}}{\partial w_i}\,\Delta w_i + \sum_{i=1}^{n} \frac{\partial \tau_{total}}{\partial x_i}\,\Delta x_i \tag{9}$$
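Since $\tau_{total} = \tau_t$ for the optimal repeater insertion design,
$$\Delta\tau_{total} = 0 \tag{10}$$
Combining Equations (9) and (10), we have
$$\sum_{i=1}^{n} \frac{\partial \tau_{total}}{\partial w_i}\,\Delta w_i + \sum_{i=1}^{n} \frac{\partial \tau_{total}}{\partial x_i}\,\Delta x_i = 0 \tag{11}$$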
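From Equation (7), we have
$$1 + \lambda \frac{\partial \tau_{total}}{\partial w_i} = 0 \;\Rightarrow\; \frac{\partial \tau_{total}}{\partial w_i} = -\frac{1}{\lambda} \tag{12}$$
Combining Equations (11) and (12), we have
$$\sum_{i=1}^{n} \Delta w_i = \lambda \sum_{i=1}^{n} \frac{\partial \tau_{total}}{\partial x_i}\,\Delta x_i \tag{13}$$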
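[Figure 4: (a) repeater $i$ between repeaters $i-1$ and $i+1$, with lumped wire loads $(R_{i-1}, C_{i-1})$ and $(R_i, C_i)$; (b) and (c) the circuit around repeater $i$ before and after it moves downstream by $\Delta x_i$. A π section with resistance $r_{i1}\Delta x_i$ and two capacitances $c_{i1}\Delta x_i/2$ switches from the output side to the input side of the repeater. Repeaters are drawn with output resistance $R_s/w$, output capacitance $C_p w$, and input capacitance $C_o w$.]
Figure 4. Repeater downstream movement.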
The computation of $\partial\tau_{total}/\partial x_i$ can be described with the
aid of Figure 4. When repeater $i$ moves downstream by
a small distance $\Delta x_i$, a small resistive and capacitive load
switches from the output to the input side of repeater $i$, as
shown in Figure 4 (b) and (c), in which the circuits in the
dotted boxes represent the repeaters. As a result, the delay
increase between repeaters $i-1$ and $i$ can be calculated as
$$\Delta\tau_{i-1,i} = \left[C_o w_i r_{i1} + \left(R_{i-1} + \frac{R_s}{w_{i-1}}\right) c_{i1}\right]\Delta x_i \tag{14}$$
Similarly, the delay increase between repeaters $i$ and $i+1$ is
$$\Delta\tau_{i,i+1} = -\left[(C_i + C_o w_{i+1})\, r_{i1} + \frac{R_s c_{i1}}{w_i}\right]\Delta x_i \tag{15}$$
The total delay change due to the repeater movement can
then be derived by combining Equations (14) and (15) as
follows:
$$\Delta\tau_{total} = \Delta\tau_{i-1,i} + \Delta\tau_{i,i+1} \tag{16}$$
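From Equation (16), we can derive the right-hand derivative
of $\tau_{total}$ with respect to $x_i$ by letting $\Delta x_i$ approach zero:
$$\left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{+} = \lim_{\Delta x_i \to 0^{+}} \frac{\Delta\tau_{total}}{\Delta x_i} = C_o r_{i1}(w_i - w_{i+1}) + R_s c_{i1}\left(\frac{1}{w_{i-1}} - \frac{1}{w_i}\right) + c_{i1} R_{i-1} - r_{i1} C_i \tag{17}$$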
Similarly, the left-hand derivative of $\tau_{total}$ with respect to $x_i$
is
$$\left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{-} = C_o r_{(i-1)k_{i-1}}(w_i - w_{i+1}) + R_s c_{(i-1)k_{i-1}}\left(\frac{1}{w_{i-1}} - \frac{1}{w_i}\right) + c_{(i-1)k_{i-1}} R_{i-1} - r_{(i-1)k_{i-1}} C_i \tag{18}$$
where $r_{(i-1)k_{i-1}}$ and $c_{(i-1)k_{i-1}}$ are the resistance and capacitance
per unit length of the interconnect at the input of
repeater $i$.
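Equations (17) and (18) share one expression and differ only in which per-unit-length wire parameters are used. A minimal sketch, with a hypothetical function name and argument order:

```python
def location_derivative(r, c, w_prev, w_i, w_next, R_prev, C_i, Rs, Co):
    """One-sided derivative of the total delay with respect to x_i.

    Pass (r, c) of the wire at the output of repeater i for the
    right-hand derivative, Eq. (17), and (r, c) of the wire at its
    input for the left-hand derivative, Eq. (18). R_prev and C_i are
    the lumped loads R_{i-1} and C_i of Figure 3.
    """
    return (Co * r * (w_i - w_next)
            + Rs * c * (1.0 / w_prev - 1.0 / w_i)
            + c * R_prev
            - r * C_i)
```

When the repeater sits at a boundary between two layers, calling the function once with each pair of wire parameters gives the two one-sided values, which in general differ.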
For the repeater insertion solution that minimizes power
dissipation, the total repeater width cannot be further
reduced, namely
$$\sum_{i=1}^{n} \Delta w_i \ge 0 \tag{19}$$
Consequently, from Equation (13), we have
$$\lambda \sum_{i=1}^{n} \frac{\partial \tau_{total}}{\partial x_i}\,\Delta x_i \ge 0 \tag{20}$$
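Furthermore, since the movements of repeaters are independent
of each other, we have
$$\forall i, \quad \lambda \frac{\partial \tau_{total}}{\partial x_i}\,\Delta x_i \ge 0 \tag{21}$$
Therefore, taking $\Delta x_i > 0$ and $\Delta x_i < 0$ in turn, we have
$$\lambda \left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{+} \ge 0 \tag{22}$$
$$\lambda \left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{-} \le 0 \tag{23}$$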
When repeater $i$ is located inside an interconnect segment,
we have $r_{i1} = r_{(i-1)k_{i-1}}$ and $c_{i1} = c_{(i-1)k_{i-1}}$. It follows that
$\left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{+} = \left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{-}$, and consequently, Inequalities (22)
and (23) can be rewritten as
$$C_o r_{i1}(w_i - w_{i+1}) + R_s c_{i1}\left(\frac{1}{w_{i-1}} - \frac{1}{w_i}\right) + c_{i1} R_{i-1} - r_{i1} C_i = 0 \tag{24}$$
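As a quick consistency check of Equation (24), consider a special case that is assumed here purely for illustration: a uniform wire with per-unit-length parameters $r$ and $c$, and three neighboring repeaters of equal width. The stage lengths $d_{i-1}$ and $d_i$ are symbols introduced for this example.

```latex
% Assumptions: r_{i1} = r, c_{i1} = c, w_{i-1} = w_i = w_{i+1},
% R_{i-1} = r d_{i-1}, C_i = c d_i.
\begin{aligned}
&C_o r\,(w_i - w_{i+1}) = 0, \qquad
 R_s c\left(\frac{1}{w_{i-1}} - \frac{1}{w_i}\right) = 0, \\
&\Rightarrow\; c\,R_{i-1} - r\,C_i = 0
 \;\Rightarrow\; c\,r\,d_{i-1} = r\,c\,d_i
 \;\Rightarrow\; d_{i-1} = d_i .
\end{aligned}
```

Under these assumptions the condition reduces to equal spacing of the repeaters, which matches the familiar result for uniform lines.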
5. Low-Power Repeater Insertion Algorithm
5.1. Analytical insertion
Solving the constraints in Section 4 for the optimal re-
peater widths and locations remains challenging, since τtotal
is not an explicit analytical function of xi. Furthermore, Re-
lations (22) and (23) are inequalities. We next present al-
gorithm REFINE, an iterative scheme that can derive a low-
power repeater insertion solution from an initial solution.
Figure 5 shows the pseudocode of our algorithm RE-
FINE. Given the specification of a multi-layer two-pin net
and an initial repeater insertion solution, REFINE derives a
new solution so that the total repeater width is minimized.
Specifically, in Line 1, REFINE calculates the optimal repeater
widths of the initial solution by solving the non-linear
Equations (5) and (8) using the Newton-Raphson method. The
parameter $\lambda$ is computed concurrently. It then calculates
the total repeater width and initializes the loop-control
variable $\varepsilon$ in Line 2. The optimization procedure is performed
iteratively using the while loop in Lines 3–9. In
Line 4, the left-hand and right-hand partial derivatives of
$\tau_{total}$ with respect to the repeater location are computed for
each repeater using Equations (17) and (18). If Relation (22)
(or (23)) does not hold, repeater $i$ is moved downstream
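REFINE
1 Compute $w_i$ and $\lambda$
2 $w_{total} \leftarrow \sum_i w_i$ and $\varepsilon \leftarrow \infty$
3 while $\varepsilon > \varepsilon_0$
4     Calculate $\left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{+}$ and $\left(\frac{\partial \tau_{total}}{\partial x_i}\right)_{-}$ for each repeater
5     Move repeaters according to the results of Line 4
6     Update lumped $(R_i, C_i)$ driven by each repeater $i$
7     Solve for $w_i$ and $\lambda$
8     $w_{total\_old} \leftarrow w_{total}$, $w_{total} \leftarrow \sum_i w_i$
9     $\varepsilon \leftarrow (w_{total\_old} - w_{total}) / w_{total\_old}$
10 return $x_i$, $w_i$
Figure 5. Algorithm REFINE.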
(or upstream) by a preselected distance. Based on Equation
(13), such a movement results in a reduction of the total
repeater width. (If neither relation holds, the direction that
yields the larger reduction is chosen.) A repeater is not
moved if the movement would place it inside a forbidden
zone. After the movement, the lumped RC loads driven
by each repeater are updated in Line 6. In Line 7, REFINE
recalculates the width of each repeater and $\lambda$ using Equations
(5) and (8). The total repeater width is then updated
in Line 8. The iteration continues until the improvement is
smaller than a preselected threshold $\varepsilon_0$. REFINE returns the
repeater insertion solution in Line 10.
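The movement rule of Lines 4 and 5 can be sketched for a single repeater as follows. This is an illustrative reading of the text rather than the RIP implementation: the function name, the `is_legal` callback, and the use of the first-order estimate from Equation (13) to compare the two directions are assumptions.

```python
def refine_move(x, lam, d_plus, d_minus, step, is_legal):
    """Return the new location of one repeater after a REFINE move.

    d_plus and d_minus are the right- and left-hand derivatives from
    Eqs. (17) and (18); step is the preselected (positive) distance;
    is_legal(x) reports whether x lies outside every forbidden zone.
    """
    # First-order change of the total width from Eq. (13).
    down = lam * d_plus * step    # moving downstream, delta x = +step
    up = -lam * d_minus * step    # moving upstream, delta x = -step
    if min(down, up) >= 0.0:
        return x                  # Relations (22) and (23) both hold
    target = x + step if down <= up else x - step
    return target if is_legal(target) else x
```

For instance, with `lam = 2.0`, `d_plus = -1.0`, and `d_minus = 3.0`, both relations are violated and the upstream move is preferred because its estimated width reduction is larger.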
5.2. Repeater insertion heuristic
Though algorithm REFINE can quickly derive low-power
repeater solutions, it assumes that the repeater width is con-
tinuously changeable, which is not practical. Furthermore, it
requires an initial solution. In this section, we address these
issues by combining REFINE with a DP algorithm.
RIP
1 Run DP with a coarse repeater library and locations
2 Run algorithm REFINE using the result from Line 1
3 Create a new library B and a set of candidate locations S
4 Run DP with B and S.
Figure 6. Repeater Insertion Algorithm RIP.
The pseudocode of our hybrid algorithm, called RIP, is
given in Figure 6. RIP first performs a DP algorithm with
coarse repeater widths and location candidates to derive an
initial solution. Algorithm REFINE is then used to improve
the initial solution in Line 2. In Line 3, RIP generates a new
repeater library by rounding each repeater width from RE-
FINE to its nearest valid discrete width. RIP also creates a
small set of repeater location candidates by choosing sev-
eral locations around the ones from REFINE. In Line 4, the
DP algorithm is performed again with the new repeater li-
brary and location set to compute the final solution.
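Line 3 of RIP can be sketched as two small helpers. The code below is an assumption-laden illustration: it takes the valid discrete widths to be multiples of a granularity `g` that are no smaller than `w_min`, and it places `k` candidate locations at a fixed `pitch` on each side of every refined location while skipping forbidden zones. The function names and parameters are hypothetical.

```python
import math


def build_library(widths, g, w_min):
    """Round each refined width to the nearest multiple of g (>= w_min)."""
    return sorted({max(w_min, math.floor(w / g + 0.5) * g) for w in widths})


def candidate_locations(xs, k, pitch, length, forbidden):
    """Candidate sites: each refined location plus k sites on either side."""
    sites = set()
    for x in xs:
        for j in range(-k, k + 1):
            p = x + j * pitch
            inside_net = 0.0 <= p <= length
            blocked = any(zs < p < ze for zs, ze in forbidden)
            if inside_net and not blocked:
                sites.add(p)
    return sorted(sites)
```

Because several refined widths often round to the same value, the resulting library is usually much smaller than a uniform library with the same granularity, which is what keeps the second DP pass fast.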
6. Experimental Evaluation
We have applied Algorithm RIP to various intercon-
nect designs to demonstrate its effectiveness. Our intercon-
nects and repeaters were generated using 0.18µm technol-
ogy. Since our technique was used for global interconnects,
each net was routed on metal4 and metal5 only. The num-
ber of segments for each net varied from 4 to 10. The length
of each segment ranged from 1000 µm to 2500 µm. A sin-
gle forbidden zone was created for each net. The length of
the zone ranged from 20% to 40% of the total interconnect
length. The location of forbidden zone was uniformly dis-
tributed along the corresponding interconnect.
During the application of RIP, a DP algorithm was per-
formed in conjunction with a library of 5 repeaters. The
smallest repeater width and the width granularity were set
to 80u, where u is the minimal repeater width. The candi-
date locations were uniformly distributed along the inter-
connects with a granularity of 200 µm, excluding the for-
bidden zone. The ensuing results were then improved using
our analytical solver REFINE. The repeater widths and lo-
cations of the refined solution were used to generate a new
repeater library and a new set of location candidates. In par-
ticular, we rounded each repeater size to the nearest width
in a repeater library with a granularity of 10u. The possi-
ble locations of repeaters included the locations derived by
REFINE plus 10 locations before and after, with granular-
ity 50 µm. Finally, the DP algorithm was performed using
the new library and location set.
As a comparison, we used the DP algorithm in [14], one
of the most prevalent and well cited schemes, to derive low-
power repeater insertion solutions for the same intercon-
nects and timing targets. We chose a repeater library of size
10 so that the total runtime of the DP scheme was compara-
ble with that of our proposed scheme. We set the minimum
repeater width to 10u and changed the granularity of the re-
peater widths. The candidate locations were uniformly dis-
tributed along the interconnects with a granularity of 200
µm, except for the forbidden zone.
Table 1 shows the experimental results from 20 interconnect
designs. We designed each interconnect 20 times with
timing targets ranging from $1.05\,\tau_{min}$ to $2.05\,\tau_{min}$, where
$\tau_{min}$ is the minimum delay of the net. Columns 2–3 compare
our scheme with the DP scheme in [14] with library size 10
and granularity $g = 10u$. As can be seen, our scheme
achieves maximal power savings $\Delta_{Max}$ of up to 37.14%. Our
scheme always succeeded in deriving solutions that satisfy
the timing constraint, whereas the DP scheme resulted in
several violations $V_{DP}$ (6 out of 20 on average), as shown in
Column 3. When the library granularity $g$ increases to 20u
and 40u, the DP scheme can derive valid repeater insertion
solutions for all the timing targets. As shown in Columns 5
and 7, our scheme achieves average power reductions $\Delta_{Mean}$
of 3.6% and 9.5% over the DP scheme, respectively.
         g = 10u              g = 20u                g = 40u
Net   ∆Max (%)   VDP     ∆Max (%)   ∆Mean (%)   ∆Max (%)   ∆Mean (%)
1      22.95      7       10.00       4.72       28.57      11.41
2      17.39      6       13.33       2.57       15.79       7.11
3      26.19      6        7.69       2.85       21.05       8.47
4      15.87      3       20.00       2.60       21.43       8.35
5      20.69      9       13.33       4.83       21.43       9.96
6      15.69      4       15.38       3.27       28.57       9.98
7      15.69      7       10.53       2.27       28.57       8.96
8      30.19      7       20.00       5.97       30.00      12.19
9      20.00      7       10.00       4.27       28.57      11.31
10     13.89      6        7.69       2.78       21.05       8.37
11     23.40      6       12.50       2.12       30.00       9.43
12     10.96      6        7.69       3.14       20.00       7.44
13     17.86      8       10.00       3.95       21.43       8.54
14     29.82      9        9.09       3.35       28.57      10.20
15     19.23      5       10.00       2.65       21.43       8.32
16     37.14      8       17.11       5.53       21.43      11.62
17     24.62      6        9.09       3.19       26.67       9.84
18      4.35      4        8.33       2.47       21.43       8.41
19     11.11      7        9.09       3.36       21.43       9.80
20     29.58      7       16.13       5.50       21.43      10.90
Ave    20.33      6       11.8        3.6        23.94       9.53
Table 1. Power reduction for two-pin nets.
[Figure 7: two plots of power improvement (%) versus timing constraint (ns), for timing constraints from 2.5 ns to 5.5 ns. Panel (a) is divided into zones I, II, and III.]
Figure 7. Power savings over DP scheme [14]
with repeater granularity (a) 10u (b) 40u.
Figure 7 reveals the relation between the timing targets
and the power savings of our approach over the DP scheme
with a repeater library of size 10. When the repeater gran-
ularity is small, large-size repeaters are missing in the li-
brary. As a result, the DP scheme fails to find any valid solu-
tion when the timing target is very tight, as shown by zone I
in Figure 7(a). When the timing target is very loose, i.e.,
zone III, the DP scheme performs as well as our approach,
since large-size repeaters are not required. In fact, there ex-
ist a limited number of cases when the DP scheme provides
better solutions. Our scheme achieves the best power sav-
ings when the timing target is in zone II. On the other hand,
if the repeater library has a coarse width granularity, our
scheme always outperforms the DP scheme as shown in
Figure 7(b). The power savings increase when the timing
target becomes loose, since such interconnect designs re-
quire various small-size repeaters, which seldom exist in a
coarse-granularity library.
To demonstrate the favorable runtime-quality tradeoff
of our scheme, we compared Algorithm RIP with the DP
scheme that used a repeater library with a fixed width range
of (10u, 400u). The repeater width granularity gDP varied
from 40u to 10u, resulting in different numbers of repeaters
in the library. Table 2 shows that, when $g_{DP}$ decreases,
the average power savings ($\Delta$) of our scheme over the DP
scheme decrease from 14.2% to 0.3%. However, the runtime
$T_{DP}$ increases significantly. The average runtime of our
scheme is about 0.17 seconds, more than 200 times faster
than the DP scheme when both schemes produce comparable
results. It is worth mentioning that when $g_{DP} = 10u$, the
DP scheme used the same repeater library as that of RIP.
Still, Algorithm RIP achieved slightly better power savings
than the DP scheme. Such a seemingly surprising result is
due to the fact that RIP judiciously chooses the candidate
repeater locations, which have a smaller local granularity
than that of the DP scheme.
gDP (u)   ∆ (%)   TDP (s)   Speedup
40        14.2     1.0         6
30         7.8     1.9        11
20         4.0     5.8        34
10         0.3    34.45      203
Table 2. Power savings and speedup tradeoff.
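The speedup column of Table 2 is consistent with dividing each DP runtime by the average RIP runtime of about 0.17 seconds quoted in the text. The short check below assumes that this is how the column was obtained and that values were rounded to the nearest integer; the variable names are illustrative.

```python
rip_runtime = 0.17  # seconds, average runtime of RIP quoted in the text

# DP runtimes in seconds from Table 2, keyed by width granularity in u.
dp_runtimes = {40: 1.0, 30: 1.9, 20: 5.8, 10: 34.45}

speedups = {g: round(t / rip_runtime) for g, t in dp_runtimes.items()}
```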
7. Conclusion
In this paper, we propose a hybrid repeater insertion al-
gorithm for interconnect power minimization. The novelty
of our scheme is its judicious combination of an analytical
solver with a DP scheme. Specifically, the analytical solver
provides a concise repeater library and limited repeater lo-
cation candidates, facilitating the efficient execution of the
DP scheme. Our approach is highly practical, capable of
handling multi-layer nets with forbidden zones. When ap-
plied to two-pin interconnects, our approach achieves up to
37% greater power savings or a speedup of more than 2
orders of magnitude in comparison with conventional DP
schemes.
Our greedy analytical solver REFINE can be improved
in several ways. Specifically, better power savings may be
achieved if repeaters are allowed to move across small-size
forbidden zones. Moreover, REFINE may be performed sev-
eral times for further power reduction. We are currently ex-
tending our hybrid scheme to the design of low-power in-
terconnect trees.
References
[1] C. J. Alpert, A. Devgan, and S. T. Quay. Buffer insertion
with accurate gate and interconnect delay computation. In
DAC, June 1999.
[2] C. J. Alpert, R. G. Gandham, J. L. Neves, and S. T. Quay.
Buffer library selection. In ICCD, Sept. 2000.
[3] C. J. Alpert, J. Hu, S. S. Sapatnekar, and P. G. Villarrubia.
A practical methodology for early buffer and wire resource
allocation. IEEE Trans. CAD, 22(5), May 2003.
[4] H. B. Bakoglu. Circuits, Interconnects, and Packaging for
VLSI. Reading, MA: Addison-Wesley, 1990.
[5] K. Banerjee and A. Mehrotra. A power-optimal repeater in-
sertion methodology for global interconnects in nanometer
designs. IEEE Trans. VLSI, 49(11):2001–2007, Nov. 2002.
[6] C.-C. N. Chu and D. F. Wong. Closed form solution to si-
multaneous buffer insertion/sizing and wire sizing. In Inter.
Symp. on Physical Design, Apr. 1997.
[7] C. C. N. Chu and D. F. Wong. A new approach to simultane-
ous buffer insertion and wire sizing. In Inter. Conf. on CAD,
Nov. 1997.
[8] J. Cong, L. He, C. K. Koh, and P. H. Madden. Performance
optimization of VLSI interconnect layout. Integration, the
VLSI Journal, 21(1):1–94, Jan. 1996.
[9] J. Cong, T. Kong, and D. Z. Pan. Buffer block planning for
interconnect-driven floorplanning. In Inter. Conf. on CAD,
Nov. 1999.
[10] G. S. Garcea, N. P. van der Meijs, and R. H. Otten. Simul-
taneous analytical area and power optimization for repeater
insertion. In Inter. Conf. on CAD, Nov. 2003.
[11] L. Ginneken. Buffer placement in distributed RC-tree net-
works for minimal Elmore delay. In Inter. Symp. on Circuits
and Systems, 1990.
[12] N. Hedenstierna and K. O. Jeppson. CMOS circuit speed and
buffer optimization. IEEE Trans. CAD, 6(2):270–280, Feb.
1987.
[13] P. Kapur, G. Chandra, and K. C. Saraswat. Power estima-
tion in global interconnect and its reduction using a novel re-
peater optimization methodology. In DAC, June 2002.
[14] J. Lillis, C. K. Cheng, and T.-T. Y. Lin. Optimal wire sizing
and buffer insertion for low power and a generalized delay
model. J. of Solid-State Circuits, 31(3):437–447, Mar. 1996.
[15] X. Liu, Y. Peng, and M. C. Papaefthymiou. Practical re-
peater insertion for low power: What repeater library do we
need? In DAC, June 2004.
[16] A. Nalamalpu and W. P. Burleson. A practical approach
to DSM repeater insertion: Satisfying delay constraints
while minimizing area and power. In IEEE International
ASIC/SOC Conference, Sept. 2001.
[17] M. Nekili and Y. Savaria. Optimal methods of driving inter-
connections in VLSI circuits. In Inter. Symp. on Circuits and
Systems, May 1993.
[18] T. Okamoto and J. Cong. Buffered Steiner tree construction
with wire sizing for interconnect layout optimization. In In-
ter. Conf. on CAD, 1996.
[19] R. Otten. Global wires harmful? In Inter. Symp. on Physical
Design, Apr. 1998.
[20] W. Shi and Z. Li. An O(nlogn) time algorithm for optimal
buffer insertion. In DAC, June 2003.
[21] H. Zhou, D. F. Wong, I.-M. Liu, and A. Aziz. Simultane-
ous routing and buffer insertion with restrictions on buffer
locations. IEEE Trans. CAD, 19(7):819–824, 2000.
