An O(bn^2) Time Algorithm for Optimal Buffer Insertion with b Buffer
  Types by Li, Zhuo & Shi, Weiping
An O(bn^2) Time Algorithm for Optimal Buffer Insertion with b
Buffer Types *
Zhuo Li
Dept. of Electrical Engineering
Texas A&M University
College Station, Texas 77843, USA.
zhuoli@ee.tamu.edu
Abstract
Buffer insertion is a popular technique to reduce the
interconnect delay. The classic buffer insertion algorithm of
van Ginneken has time complexity $O(n^2)$, where $n$ is the
number of buffer positions. Lillis, Cheng and Lin extended
van Ginneken's algorithm to allow $b$ buffer types in time
$O(b^2 n^2)$. For modern design libraries that contain
hundreds of buffers, it is a serious challenge to balance the
speed and performance of the buffer insertion algorithm.
In this paper, we present a new algorithm that computes
the optimal buffer insertion in $O(bn^2)$ time. The reduction
is achieved by the observation that the $(Q, C)$ pairs of the
candidates that generate the new candidates must form a
convex hull. On industrial test cases, the new algorithm is
faster than the previous best buffer insertion algorithms by
orders of magnitude.
1. Introduction
Delay optimization techniques for interconnect are
increasingly important for achieving timing closure of high
performance designs. One popular technique for reducing
interconnect delay is buffer insertion. A recent study by
Saxena et al. [10] projects that 35% of all cells will be
intra-block repeaters for the 45 nm node. Consequently,
algorithms that can efficiently insert buffers are essential for
design automation tools.
In 1990, van Ginneken [14] proposed an optimal buffer
insertion algorithm for one buffer type. His algorithm has
time complexity $O(n^2)$, where $n$ is the number of
candidate buffer positions. Lillis, Cheng and Lin [7] extended
van Ginneken's algorithm to allow $b$ buffer types in time
$O(b^2 n^2)$.
* This research was supported by the NSF grants CCR-0098329,
01 13668, 512-0266-2001.
Weiping Shi
Dept. of Electrical Engineering
Texas A&M University
College Station, Texas 77843, USA.
wshi@ee.tamu.edu
Recently, Shi and Li [12] presented a new algorithm
with time complexity $O(n \log n)$ for 2-pin nets, and
$O(n \log^2 n)$ for multi-pin nets, for one buffer type. Several
works have built upon van Ginneken's algorithm and its
extension for multiple buffer types to include wire sizing,
simultaneous tree construction [8, 6, 5, 9], noise
constraints [2] and resource minimization [7].
Modern design libraries may contain hundreds of different
buffers with different input capacitances, driving
resistances, intrinsic delays, power levels, etc. If every buffer
available for the given technology is supplied, it is stated in
[3] that the current algorithms could take days or
even weeks for large designs, since all these algorithms are
quadratic in terms of $b$. Alpert et al. [3] studied how to
reduce the size of the buffer library with a clustering
algorithm. Though the buffer library size is reduced, the
solution quality is often degraded accordingly.
In this paper, we propose a new algorithm that performs
optimal buffer insertion with $b$ buffer types in $O(bn^2)$ time.
Our speedup is achieved by the observation that the
candidates that generate new buffered candidates must lie on
the convex hull of $(Q, C)$. Experimental results show that
our algorithm is significantly faster than previous best
algorithms.
Section 2 formulates the problem. Section 3 describes
the new algorithm. Simulation results are given in Section 4
and conclusions are given in Section 5.
2. Preliminary
A net is given as a routing tree $T = (V, E)$, where
$V = \{s_0\} \cup V_s \cup V_n$ and $E \subseteq V \times V$.
Vertex $s_0$ is the source vertex and also the root of $T$,
$V_s$ is the set of sink vertices, and $V_n$ is the set of
internal vertices. Each sink vertex $s \in V_s$ is associated
with sink capacitance $C(s)$ and required arrival time $RAT(s)$.
A buffer library $B$ contains different types of buffers and
its size is represented by $b$. For
Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’05) 
1530-1591/05 $ 20.00 IEEE 
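each buffer type $B_i \in B$, the intrinsic delay is $K(B_i)$,
driving resistance is $R(B_i)$, and input capacitance is $C(B_i)$.
A function $f : V_n \to 2^B$ specifies the types of buffers
allowed at each internal vertex. Each edge $e \in E$ is
associated with lumped resistance $R(e)$ and capacitance $C(e)$.
Following previous researchers [14, 7, 9, 15], we use
the Elmore delay for the interconnect and the linear delay
for buffers. For each edge $e = (v_i, v_j)$, signals travel from
$v_i$ to $v_j$. The delay of $e$ is
$$D(e) = R(e) \left( \frac{C(e)}{2} + C(v_j) \right),$$
where $C(v_j)$ is the downstream capacitance at $v_j$. For any
buffer type $B_i$ at vertex $v_j$, the buffer delay is
$$D(v_j) = R(B_i) \cdot C(v_j) + K(B_i),$$
where $C(v_j)$ is the downstream capacitance at $v_j$. When a
buffer $B_i$ is inserted, the capacitance viewed from
upstream is $C(B_i)$.
For any vertex $v \in V$, let $T(v)$ be the subtree
downstream from $v$, with $v$ being the root. Once we decide
where to insert buffers in $T(v)$, we have a candidate $\alpha$ for
$T(v)$. The delay from $v$ to sink $s$ under $\alpha$ is
$$D(v, s, \alpha) = \sum_{e = (v_i, v_j)} \left( D(v_i) + D(e) \right),$$
where the sum is over all edges $e$ in the path from $v$ to $s$. If
$v_i$ is a buffer in $\alpha$, then $D(v_i)$ is the buffer delay. If
$v_i$ is not a buffer in $\alpha$, then $D(v_i) = 0$. The slack of
$v$ under $\alpha$ is
$$Q(v, \alpha) = \min_{s \in T(v)} \{ RAT(s) - D(v, s, \alpha) \}.$$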
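The delay model above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the type names `Wire`, `BufferType` and `Cand`, and the helper `add_wire`, are hypothetical and do not come from the paper. `add_wire` shows how a (Q, C) pair changes when a wire is placed in front of a candidate, which follows directly from the Elmore delay of the edge.

```c
#include <assert.h>

/* Hypothetical parameter bundles; the field names are assumptions. */
typedef struct { double R, C; } Wire;           /* lumped R(e), C(e)      */
typedef struct { double R, C, K; } BufferType;  /* R(B_i), C(B_i), K(B_i) */
typedef struct { double Q, C; } Cand;           /* slack, downstream cap  */

/* Elmore delay of edge e driving downstream capacitance c_down. */
double wire_delay(Wire e, double c_down)
{
    assert(c_down >= 0.0);
    return e.R * (e.C / 2.0 + c_down);
}

/* Linear delay of a buffer driving downstream capacitance c_down. */
double buffer_delay(BufferType b, double c_down)
{
    assert(c_down >= 0.0);
    return b.R * c_down + b.K;
}

/* Candidate seen from the upstream end of wire e:
 * slack drops by the wire delay, capacitance grows by C(e). */
Cand add_wire(Cand a, Wire e)
{
    Cand up;
    up.Q = a.Q - wire_delay(e, a.C);
    up.C = a.C + e.C;
    return up;
}
```

For example, a wire with R(e) = 2 and C(e) = 4 in front of a candidate with Q = 20 and C = 3 gives a wire delay of 2 * (4/2 + 3) = 10, so the upstream candidate has Q = 10 and C = 7.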
Buffer Insertion Problem: Given routing tree $T = (V, E)$,
sink capacitance $C(s)$ and $RAT(s)$ for each sink $s$,
capacitance $C(e)$ and resistance $R(e)$ for each edge $e$,
possible buffer position $f$ and buffer library $B$, find a
candidate $\alpha$ for $T$ that maximizes $Q(s_0, \alpha)$.
The effect of a candidate to the upstream is described
by slack $Q$ and downstream capacitance $C$. Define
$C(v, \alpha)$ as the downstream capacitance at node $v$ under
candidate $\alpha$. For any two candidates $\alpha_1$ and $\alpha_2$ of
$T(v)$, we say $\alpha_1$ dominates $\alpha_2$ if
$Q(v, \alpha_1) \ge Q(v, \alpha_2)$ and $C(v, \alpha_1) \le C(v, \alpha_2)$.
The set of nonredundant candidates of $T(v)$, which we denote
as $N(v)$, is the set of candidates such that no candidate in
$N(v)$ dominates any other candidate in $N(v)$, and every
candidate of $T(v)$ is dominated by some candidate in $N(v)$.
Once we have $N(s_0)$, the candidate that gives the maximum
$Q(s_0, \alpha)$ can be found easily. The number of total
nonredundant candidates is at most $n + 1$ for one buffer type
and $bn + 1$ for $b$ buffer types, where $n$ is the number of
candidate buffer positions.
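The dominance rule can be illustrated with a small filter. The sketch below is not taken from the paper: it assumes the candidates are already stored in an array sorted by nondecreasing C, and the names `Cand`, `dominates` and `prune_dominated` are hypothetical. After filtering, both Q and C are strictly increasing along the array, which is the ordering the later sections rely on.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double Q, C; } Cand;   /* slack, downstream capacitance */

/* Returns 1 if x dominates y: Q(x) >= Q(y) and C(x) <= C(y). */
int dominates(Cand x, Cand y)
{
    return x.Q >= y.Q && x.C <= y.C;
}

/* In-place filter. Input: a[0..k-1] sorted by nondecreasing C.
 * Output: the nonredundant candidates in a[0..m-1]; returns m. */
size_t prune_dominated(Cand *a, size_t k)
{
    size_t m = 0;
    assert(a != NULL || k == 0);
    for (size_t i = 0; i < k; i++) {
        /* The last kept candidate has the largest Q so far, so it is
         * the only one that needs to be checked against a[i]. */
        if (m > 0 && dominates(a[m - 1], a[i]))
            continue;
        /* Equal C but larger Q: a[i] makes the last kept one redundant. */
        while (m > 0 && dominates(a[i], a[m - 1]))
            m--;
        a[m++] = a[i];
    }
    return m;
}
```

With the (Q, C) pairs (1, 1), (3, 2), (2, 3), (5, 4), the third candidate is dropped because (3, 2) has both higher slack and lower capacitance.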
3. New Algorithm
The previous best algorithm for multiple buffer types by
Lillis, Cheng and Lin [7] consists of three major operations:
1) adding buffers at a buffer position in $O(b^2 n)$ time, 2)
adding a wire in $O(bn)$ time, and 3) merging two branches
in $O(bn_1 + bn_2)$ time, where $n_1$ and $n_2$ are the numbers
of buffer positions in the two branches. As a result, their
algorithm has time complexity $O(b^2 n^2)$. In this section,
we show that the time complexity of the first operation,
adding buffers at a buffer position, can be reduced to $O(bn)$,
and thus our algorithm can achieve total time complexity
$O(bn^2)$.
Assume we have computed the set of nonredundant
candidates $N(v_1)$ for $T(v_1)$, and now reach a buffer position
$v$, see Fig. 1. Wire $(v, v_1)$ has 0 resistance and capacitance.
Define $P_i(\alpha)$ as the slack if we add a buffer type $B_i$ at
$v$ for any candidate $\alpha$ in $N(v_1)$:
$$P_i(\alpha) = Q(v_1, \alpha) - R(B_i) \cdot C(v_1, \alpha) - K(B_i). \qquad (1)$$
If we do not insert any buffer at $v$, then every candidate
for $T(v_1)$ is a candidate for $T(v)$. If we insert a buffer at
$v$, then for every buffer type $B_i$, $i = 1, \ldots, b$, there
will be a new candidate $\beta_i$ with
$$Q(v, \beta_i) = \max_{\alpha \in N(v_1)} P_i(\alpha), \qquad C(v, \beta_i) = C(B_i).$$
Note that some of the new candidates $\beta_i$ could be
redundant. The algorithm of Lillis, Cheng and Lin takes
$O(b^2 n)$ time to generate all $\beta_i$ and insert the
nonredundant ones into the list of nonredundant candidates $N(v)$.
Figure 1. $T(v)$ consists of buffer position $v$ and $T(v_1)$.
We show how to generate all $\beta_i$ in $O(bn)$ time. Since
all candidates discussed in this section are in $N(v_1)$, we
will write $Q(\alpha)$ for $Q(v_1, \alpha)$, and $C(\alpha)$ for
$C(v_1, \alpha)$. Suppose buffers in the buffer library are sorted
according to their driving resistance $R(B_i)$ in non-increasing
order, $R(B_1) \ge R(B_2) \ge \ldots \ge R(B_b)$. If some buffer
types are not allowed at $v$, we simply omit them without
affecting the rest of the algorithm. For any buffer type
$B_i \in B$, define the best candidate for $B_i$ as the candidate
$\alpha_i$ such that $\alpha_i$ maximizes $P_i(\alpha)$ among all
candidates of $N(v_1)$. If there are multiple $\alpha$'s that
maximize $P_i(\alpha)$, the one with minimum $C$ is chosen.
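As a reference point, the definition of the best candidate can be coded as a plain linear scan, one scan per buffer type. This is a sketch of the definition only, not the faster method developed in this section; the names `Cand`, `BufferType`, `slack_with_buffer` and `best_candidate` are hypothetical, and the candidate array is assumed to be sorted by increasing C so that the first maximizer found is the one with minimum C.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double Q, C; } Cand;           /* slack, downstream cap  */
typedef struct { double R, C, K; } BufferType;  /* R(B_i), C(B_i), K(B_i) */

/* P_i(alpha) from Eq. (1): slack after placing buffer b before a. */
double slack_with_buffer(BufferType b, Cand a)
{
    return a.Q - b.R * a.C - b.K;
}

/* Index of the best candidate for buffer b.
 * cands[0..k-1] is assumed sorted by increasing C, k > 0. */
size_t best_candidate(const Cand *cands, size_t k, BufferType b)
{
    size_t best = 0;
    assert(k > 0);
    for (size_t j = 1; j < k; j++) {
        /* Strict '>' keeps the smaller-C candidate on ties. */
        if (slack_with_buffer(b, cands[j]) > slack_with_buffer(b, cands[best]))
            best = j;
    }
    return best;
}
```

Calling `best_candidate` once per buffer type on a list of up to bn + 1 candidates costs O(b^2 n) per buffer position, which is the bound the rest of the section improves.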
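Lemma 1 For any two buffer types $B_i$ and $B_j$, where
$i > j$, let their best candidates be $\alpha_i$ and $\alpha_j$
respectively. Then we must have $C(\alpha_i) \ge C(\alpha_j)$.
Proof: From the definition of $\alpha_i$ and $\alpha_j$, we have
$P_i(\alpha_i) \ge P_i(\alpha_j)$ and $P_j(\alpha_j) \ge P_j(\alpha_i)$.
Consequently,
$$Q(\alpha_i) - R(B_i) C(\alpha_i) \ge Q(\alpha_j) - R(B_i) C(\alpha_j),$$
$$Q(\alpha_j) - R(B_j) C(\alpha_j) \ge Q(\alpha_i) - R(B_j) C(\alpha_i).$$
Adding the two inequalities and cancelling the $Q$ terms gives
$$(R(B_j) - R(B_i)) (C(\alpha_i) - C(\alpha_j)) \ge 0.$$
Since $i > j$, $R(B_j) \ge R(B_i)$. If $R(B_j) > R(B_i)$, then
$C(\alpha_i) \ge C(\alpha_j)$. If $R(B_j) = R(B_i)$, then it is
easy to get $P_i(\alpha_i) = P_i(\alpha_j)$ and
$P_j(\alpha_i) = P_j(\alpha_j)$. From the definition, when there
are multiple $\alpha$'s that maximize $P_i(\alpha)$, the one with
minimum $C$ is chosen. Thus $\alpha_i$ and $\alpha_j$ should be the
same candidate, which means $C(\alpha_i) = C(\alpha_j)$.
Lemma 1 implies that the best candidates $\alpha_1, \ldots, \alpha_b$
for buffer types $B_1, \ldots, B_b$ are in nondecreasing order of
$C$. However, this is not enough for an $O(bn)$ time algorithm.
In the following, we define the concept of convex pruning, which
is important in generating new candidates $\beta_i$.
Convex pruning: Let $\alpha_1$, $\alpha_2$ and $\alpha_3$ be three
nonredundant candidates of $T(v_1)$ such that
$C(\alpha_1) < C(\alpha_2) < C(\alpha_3)$. If
$$\frac{Q(\alpha_2) - Q(\alpha_1)}{C(\alpha_2) - C(\alpha_1)} < \frac{Q(\alpha_3) - Q(\alpha_2)}{C(\alpha_3) - C(\alpha_2)}, \qquad (2)$$
then we prune candidate $\alpha_2$.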
Convex pruning can be explained by Figure 2. Consider
$Q$ as the Y-axis and $C$ as the X-axis. Then the set of
nonredundant candidates $N(v_1)$ is a set of points in the
two-dimensional plane. Candidate $\alpha_2$ in the above definition
is shown in Figure 2(a), and is pruned in Figure 2(b). Call
the candidates after convex pruning $M(v_1)$. It can be seen
that $N(v_1)$ is a monotonically increasing sequence, while
$M(v_1)$ is a convex hull.
Figure 2. (a) Nonredundant candidates $N(v_1)$
on the $(Q, C)$ plane. (b) Nonredundant candidates $M(v_1)$
after convex pruning.
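Function ConvexPruning performs convex pruning
for any list of nonredundant candidates sorted in increasing
$Q$ and $C$ order. The following C code defines the data
structure for each candidate in the list:

    typedef struct Candidate {
        double Q, C;
        struct Candidate *next, *prev;   /* double link list */
    } Candidate;

The candidate with minimum $C$ is the first candidate in the
list. We add a dummy candidate pointed by header to
simplify the algorithm. Function LeftTurn checks if a1,
a2 and a3 form a left turn on the plane. It is the same as
the condition in Eq. (2).

    void ConvexPruning(Candidate *header)
    {
        Candidate *a1, *a2, *a3;

        /* Assumption: the dummy header carries Q = -infinity, so
           LeftTurn(header, a2, a3) is never true. */
        a1 = header;
        a2 = a1->next;
        if (a2 == NULL)
            return;                      /* empty list */
        a3 = a2->next;
        while (a3 != NULL) {
            if (LeftTurn(a1, a2, a3)) {
                /* prune a2 and move backward */
                free(a2);
                a1->next = a3;
                a3->prev = a1;
                a2 = a1;
                a1 = a1->prev;
            } else {
                /* move forward */
                a3 = a3->next;
                a2 = a2->next;
                a1 = a1->next;
            }
        }
    }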
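The same scan can be written compactly over an array, which makes it easy to test in isolation. The sketch below is an array-based variant, not the linked-list routine above: the names `Cand`, `left_turn` and `convex_prune` are hypothetical, and `left_turn` evaluates Eq. (2) in cross-multiplied form, which is valid because the capacitances are strictly increasing and both denominators are therefore positive.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double Q, C; } Cand;   /* slack, downstream capacitance */

/* Eq. (2) cross-multiplied; requires C(a1) < C(a2) < C(a3). */
int left_turn(Cand a1, Cand a2, Cand a3)
{
    return (a2.Q - a1.Q) * (a3.C - a2.C) < (a3.Q - a2.Q) * (a2.C - a1.C);
}

/* In-place convex pruning of a[0..k-1], sorted by increasing Q and C.
 * The kept prefix a[0..m-1] acts as a stack; returns m. */
size_t convex_prune(Cand *a, size_t k)
{
    size_t m = 0;
    assert(a != NULL || k == 0);
    for (size_t i = 0; i < k; i++) {
        /* "Move backward": pop while the top of the stack lies
         * below the chord from its predecessor to a[i]. */
        while (m >= 2 && left_turn(a[m - 2], a[m - 1], a[i]))
            m--;
        a[m++] = a[i];                   /* "move forward" */
    }
    return m;
}
```

For the (Q, C) pairs (1, 1), (2, 2), (4, 3), (5, 4), (5.5, 5), the slopes between neighbours are 1, 2, 1 and 0.5, so (2, 2) is pruned and the remaining four candidates have decreasing slopes 1.5, 1, 0.5.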
Lemma 2 Given any set of $k$ nonredundant candidates
sorted in increasing $Q$ and $C$ order, function ConvexPruning
performs convex pruning in $O(k)$ time.
Proof: This procedure is known as Graham's scan in
computational geometry [4]. It finds the convex hull of a set of
points in sorted order in linear time.
It is well known that a set of points forms a convex hull if
and only if there are no consecutive $\alpha_1$, $\alpha_2$ and
$\alpha_3$ that satisfy Eq. (2). Therefore, ConvexPruning is
correct since it checks all consecutive candidates.
To analyze the time complexity, consider the number of
forward and backward moves. Each time ConvexPruning
moves backward, it deletes a candidate. Therefore, there can
be at most $k$ backward moves. The number of forward moves is
the size of the list plus the number of backward moves.
Therefore the number of forward moves is at most $2k$. Hence
the time complexity is $O(k)$.
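Lemma 3 For any buffer type $B_i \in B$, its best
candidate $\alpha_i$ that maximizes $P_i(\alpha)$ is not pruned by
ConvexPruning.
Proof: Consider any candidate $\alpha$ with $C(\alpha) > C(\alpha_i)$.
According to the definition of $\alpha_i$, we have
$P_i(\alpha_i) \ge P_i(\alpha)$. Therefore,
$$\frac{Q(\alpha) - Q(\alpha_i)}{C(\alpha) - C(\alpha_i)} \le R(B_i).$$
Similarly, for any candidate $\alpha'$ with $C(\alpha') < C(\alpha_i)$,
we have $P_i(\alpha_i) > P_i(\alpha')$, because ties are broken in
favor of minimum $C$. Therefore,
$$\frac{Q(\alpha_i) - Q(\alpha')}{C(\alpha_i) - C(\alpha')} > R(B_i) \ge \frac{Q(\alpha) - Q(\alpha_i)}{C(\alpha) - C(\alpha_i)},$$
where $\alpha'$ is any candidate with $C(\alpha') < C(\alpha_i)$ and
$\alpha$ is any candidate with $C(\alpha) > C(\alpha_i)$. According
to the definition of convex pruning, $\alpha_i$ is not pruned.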
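Lemma 4 Let the set of nonredundant candidates after
ConvexPruning be $M(v_1)$, and assume $M(v_1)$ are
sorted in increasing $Q$ and $C$ order. Consider any three
candidates $\alpha_1$, $\alpha_2$, $\alpha_3$ in $M(v_1)$ such that
$C(\alpha_1) < C(\alpha_2) < C(\alpha_3)$. For any buffer type
$B_i \in B$, if $P_i(\alpha_1) \ge P_i(\alpha_2)$, then
$P_i(\alpha_2) \ge P_i(\alpha_3)$.
Proof: From the definition of convex pruning, we have
$$\frac{Q(\alpha_2) - Q(\alpha_1)}{C(\alpha_2) - C(\alpha_1)} \ge \frac{Q(\alpha_3) - Q(\alpha_2)}{C(\alpha_3) - C(\alpha_2)}.$$
If $P_i(\alpha_1) \ge P_i(\alpha_2)$, then
$$\frac{Q(\alpha_2) - Q(\alpha_1)}{C(\alpha_2) - C(\alpha_1)} \le R(B_i).$$
Therefore the slope from $\alpha_2$ to $\alpha_3$ is also at most
$R(B_i)$, which gives $P_i(\alpha_2) \ge P_i(\alpha_3)$.
Lemma 4 implies that for any buffer type $B_i$, if candidate
$\alpha$ maximizes $P_i$ among its previous and next consecutive
candidates in $M(v_1)$, then $\alpha$ maximizes $P_i$ among
all candidates in $M(v_1)$.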
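Function AddBuffer identifies $\alpha_i$ from $M(v_1)$ and
generates new candidates $\beta_i$, $i = 1, \ldots, b$. Nonredundant
candidates in $N(v_1)$ are stored in increasing $(Q, C)$ order
using a double link list pointed by header. Buffer types are
sorted in non-increasing driver resistance order and stored
in array B. Function P(i, a) computes $P_i(\alpha)$ as defined
in Eq. (1).

    Candidate *AddBuffer(Candidate *header)
    {
        Candidate *a1, *a2;
        Candidate *beta;
        int i;

        ConvexPruning(header);
        /* entries beta[1..b] are used; b is the library size */
        beta = malloc((b + 1) * sizeof(Candidate));
        /* Assumption: the dummy header carries Q = -infinity, so the
           first comparison always moves past it. */
        a1 = header;
        a2 = a1->next;
        for (i = 1; i <= b; i++) {
            while (a2 != NULL) {
                if (P(i, a1) < P(i, a2)) {
                    a1 = a1->next;
                    a2 = a1->next;
                } else
                    break;
            }
            /* generate new candidate */
            beta[i].Q = P(i, a1);
            beta[i].C = B[i].C;
        }
        /* sort beta's in nondecreasing C order */
        return beta;
    }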
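The single forward sweep can be sketched on arrays as follows. This is an illustration under stated assumptions rather than the listing above: the names `Cand`, `BufferType`, `P` and `add_buffer` are hypothetical, the input `hull` is assumed to be already convex-pruned and sorted by increasing C, and the buffers are assumed to be sorted by non-increasing R. The pointer `j` never moves backward, which is what Lemma 1 permits, and it stops at the first local maximum, which is what Lemma 4 permits.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double Q, C; } Cand;           /* slack, downstream cap  */
typedef struct { double R, C, K; } BufferType;  /* R(B_i), C(B_i), K(B_i) */

/* P_i(alpha) from Eq. (1). */
static double P(BufferType b, Cand a)
{
    return a.Q - b.R * a.C - b.K;
}

/* hull[0..m-1]: candidates after convex pruning, increasing C, m >= 1.
 * B[0..b-1]:    buffer types sorted by non-increasing R.
 * beta[0..b-1]: receives one new candidate per buffer type. */
void add_buffer(const Cand *hull, size_t m,
                const BufferType *B, size_t b, Cand *beta)
{
    size_t j = 0;                        /* shared across buffer types */
    assert(m >= 1);
    for (size_t i = 0; i < b; i++) {
        /* Advance while the next hull point is strictly better;
         * the strict '<' keeps the minimum-C maximizer on ties. */
        while (j + 1 < m && P(B[i], hull[j]) < P(B[i], hull[j + 1]))
            j++;
        beta[i].Q = P(B[i], hull[j]);
        beta[i].C = B[i].C;
    }
}
```

Because `j` only increases, the loop body runs at most m + b times in total, matching the O(bn + b) bound when the hull holds up to bn + 1 candidates.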
Theorem 1 If $v$ is a buffer position, wire $(v, v_1)$ is a wire
with zero resistance and capacitance, and the nonredundant
candidates of $N(v_1)$ are stored in increasing $Q$ and $C$ order,
then function AddBuffer generates all new candidates $\beta_i$
in $O(bn)$ time.
Proof: Let the set of nonredundant candidates after
ConvexPruning be $M(v_1)$. From Lemma 3, we
know that all best candidates $\alpha_i$ are in $M(v_1)$. From
Lemma 1 and Lemma 4, starting from the first candidate
in $M(v_1)$, function AddBuffer can find all $\alpha_i$ in
the increasing order of $i$.
Now consider the time complexity. Function
ConvexPruning takes $O(bn)$ time according to
Lemma 2. The for loop takes $O(bn + b) = O(bn)$ time.
It takes only $O(b \log b)$ time to sort the entire buffer
library in terms of the input capacitance $C(B_i)$ and
establish the order from buffer index $i$ to the order in $C(B_i)$.
Each time function AddBuffer is called, the new
candidates $\beta_i$ can be sorted in nondecreasing $C$ order by
using the index in $O(b)$ time.
Theorem 2 Given a set of nonredundant candidates $N(v_1)$
sorted in increasing $Q$ and $C$ order, all $b$ new candidates
$\beta_i$ can be inserted in $O(bn)$ time.
Proof: Since $\beta_i$ are in the nondecreasing order of
capacitance $C(B_i)$ and the given set of nonredundant
candidates are in nondecreasing order of $C$, it takes
$O(bn + b) = O(bn)$ time to merge the two sorted lists.
Since the operation of adding a buffer is reduced to
$O(bn)$ time from Theorem 1 and 2, it is easy to see that
buffer insertion with $b$ buffer types can be done in worst
case $O(bn^2)$ time with our new algorithm.
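The merge in Theorem 2 can be sketched as an ordinary two-way merge that drops dominated candidates on the fly. The code below is a minimal illustration, not the authors' implementation: the names `Cand` and `merge_candidates` are hypothetical, both inputs are assumed to be sorted by nondecreasing C, and the output buffer is assumed to have room for k + b entries.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double Q, C; } Cand;   /* slack, downstream capacitance */

/* Merges old[0..k-1] (unbuffered candidates) with beta[0..b-1] (new
 * buffered candidates), both sorted by nondecreasing C, into out[].
 * Keeps only nonredundant candidates; returns how many were kept. */
size_t merge_candidates(const Cand *old, size_t k,
                        const Cand *beta, size_t b, Cand *out)
{
    size_t i = 0, j = 0, m = 0;
    assert(out != NULL);
    while (i < k || j < b) {
        Cand next;
        if (j == b || (i < k && old[i].C <= beta[j].C))
            next = old[i++];
        else
            next = beta[j++];
        if (m > 0 && next.Q <= out[m - 1].Q)
            continue;                    /* dominated by the last kept one */
        if (m > 0 && next.C == out[m - 1].C)
            out[m - 1] = next;           /* same C, strictly higher Q */
        else
            out[m++] = next;
    }
    return m;
}
```

Each input element is visited once, so the cost is linear in k + b. With unbuffered (Q, C) pairs (1, 1), (4, 3), (6, 5) and new candidates (2, 2), (3, 4), the candidate (3, 4) is dropped because (4, 3) dominates it.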
4. Simulation
Both the algorithm of Lillis et al. [7] and the new
algorithm are implemented in C and run on a Sun SPARC
workstation with 400 MHz and 2 GB memory. The device and
interconnect parameters are based on TSMC 180 nm
technology. We have 4 different buffer libraries, with sizes
8, 16, 32 and 64 respectively. $R(B_i)$ is chosen from 180 Ω
to 7000 Ω, $C(B_i)$ is chosen from 0.7 fF to 23 fF and
$K(B_i)$ is chosen from 29 ps to 36.4 ps. The sink
capacitances range from 2 fF to 41 fF. The wire resistance is
0.076 Ω/µm and the wire capacitance is 0.118 fF/µm. Table 1
shows that for large industrial circuits, the new algorithm is up
to 11 times faster than Lillis' algorithm. The memory usage
is not shown in the table, but there is only about 2%
memory overhead due to the double linked list used by the new
algorithm. When $b$ is small, our algorithm has a little
time overhead compared to Lillis' algorithm, due to
function ConvexPruning.
Fig. 3 shows the time complexity curves of the two
algorithms for the net with 1944 sinks and 33133 buffer
positions with respect to the size of buffer library $b$. In the
figure, the y axis is normalized to the running time of the case
when the buffer library size is 8. Though the worst case time
complexity of Lillis' algorithm is quadratic in terms of $b$, it
behaves more like a linear function of $b$, as observed in
earlier work. The time complexity curve of our algorithm is
also linear, but has a much smaller slope.
Figure 3. Comparison of normalized running
time with respect to buffer library size among
two algorithms. Number of sinks is 1944 and
number of buffer positions is 33133.
Fig. 4 shows the time complexity of the two
algorithms for the net with 1944 sinks, with respect to the
number of buffer positions $n$. The buffer library size is 32. In
the figure, the y axis is normalized to the running time of
the case with 1943 buffer positions. We can see that while
Lillis' and our algorithms both behave quadratically, our
algorithm shows a much slower growth trend, since the
operation of adding a buffer becomes more dominant among
the three major operations when $n$ increases.
5. Conclusion
We presented a new algorithm for optimal buffer
insertion with $b$ buffer types of worst case time $O(bn^2)$.
This is an improvement of the previous best $O(b^2 n^2)$
algorithm [7]. Simulation results show our new algorithm is
significantly faster than previous best algorithms for large
industrial circuits with large buffer libraries. Our algorithm
can also be applied to reduce buffer cost. We leave the
details to the journal version.
References
[1] C. Alpert and A. Devgan. Wire segmenting for improved
buffer insertion. In DAC, pages 588-593, 1997.
[2] C. J. Alpert, A. Devgan, and S. T. Quay. Buffer insertion for
noise and delay optimization. In DAC, pages 362-367, 1998. 
[Figure 3 plot: x-axis "Buffer Library Size" (0 to 70); y-axis "Normalized Running Time" (0 to 12); curves labeled O(bn^2) and O(b^2n^2).]
Figure 4. Comparison of normalized running
time with respect to buffer positions among
two algorithms. Number of sinks is 1944 and
number of buffer types is 32.
[3] C. J. Alpert, R. G. Gandham, J. L. Neves, and S. T. Quay.
Buffer library selection. In pages 221-226, 2000.
[4] R. L. Graham. An efficient algorithm for determining the
convex hull of a finite planar set. Information Processing
Letters, 1972.
[5] M. Hrkic and J. Lillis. Buffer tree synthesis with
consideration of temporal locality, sink polarity requirements,
solution cost and blockages. In ISPD, pages 98-103, 2002.
[6] M. Hrkic and J. Lillis. S-tree: a technique for buffered routing
tree synthesis. In DAC, pages 578-583, 2002.
[7] J. Lillis, C. K. Cheng, and T.-T. Y. Lin. Optimal wire
sizing and buffer insertion for low power and a generalized
delay model. IEEE Trans. Solid-State Circuits, 1996.
[8] J. Lillis, C.-K. Cheng, T.-T. Y. Lin, and C.-Y. Ho. New
performance driven routing techniques with explicit
and simultaneous wire sizing. In DAC, pages
400, 1996.
[9] T. Okamoto and J. Cong. Buffered Steiner tree construction
with wire sizing for interconnect layout optimization. In
CAD, pages 44-49, 1996.
[10] P. Saxena, N. Menezes, P. Cocchini, and D. A. Kirkpatrick.
Repeater scaling and its impact on CAD. IEEE Trans. CAD,
[11] W. Shi and Z. Li. A fast algorithm for optimal buffer
insertion. IEEE Trans. CAD, to appear.
[12] W. Shi and Z. Li. An O(n log n) time algorithm for optimal
buffer insertion. In pages 580-585, 2003.
[13] W. Shi, Z. Li, and C. J. Alpert. Complexity analysis and
speedup techniques for optimal buffer insertion with
minimum cost. In ASPDAC, pages 609-614, 2004.
[14] L. P. P. van Ginneken. Buffer placement in distributed
tree network for minimal delay. In ISCAS, pages
868, 1990.
[15] H. Zhou, D. F. Wong, I. M. Liu, and A. Aziz. Simultaneous
routing and buffer insertion with restrictions on buffer
locations. IEEE Trans. CAD, 2000.
Table 1. Simulation results for industrial test
cases, where m is the number of sinks, n is
the number of buffer positions, and b is the
library size.
[Table body lost in extraction; surviving values of m: 337, 1944, 2676.]
[Figure 4 plot: x-axis "Buffer Positions" (0 to 7 x 10^4); y-axis "Normalized Running Time" (0 to 180); curves labeled O(bn^2) and O(b^2n^2).]
