Delay optimization of linear depth boolean circuits with prescribed input arrival times  by Rautenbach, Dieter et al.
Journal of Discrete Algorithms 4 (2006) 526–537
www.elsevier.com/locate/jda
Delay optimization of linear depth boolean circuits
with prescribed input arrival times
Dieter Rautenbach ∗, Christian Szegedy, Jürgen Werber
Forschungsinstitut für Diskrete Mathematik, Lennéstr. 2, D-53113 Bonn, Germany
Available online 11 July 2005
Abstract
We consider boolean circuits C over the basis Ω = {∨,∧} with inputs x1, x2, . . . , xn for which
arrival times t1, t2, . . . , tn ∈ N0 are given. For 1 ≤ i ≤ n we define the delay of xi in C as the sum of
ti and the number of gates on a longest directed path in C starting at xi . The delay of C is defined as
the maximum delay of an input.
Given a function of the form
f (x1, x2, . . . , xn) = gn−1(gn−2(. . . g3(g2(g1(x1, x2), x3), x4) . . . , xn−1), xn)
where gj ∈ Ω for 1 ≤ j ≤ n − 1 and arrival times for x1, x2, . . . , xn, we describe a cubic-time
algorithm that determines a circuit for f over Ω that is of linear size and whose delay is at most 1.44
times the optimum delay plus some small constant.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Circuit; Straight-line program; Depth; Delay; Computer arithmetic; VLSI design
1. Motivation
The motivation for the present work is a problem in VLSI design. At one of the final
stages in the design process of a chip, the tool that performs the so-called static timing
analysis [2–4] detects paths of ‘negative slack’. These are paths on which the propagation
of the signal is too slow to guarantee the correct functioning of the chip. The analysis
tool reports these paths, which usually consist of a sequence of gates g1, g2, . . . , gm that
perform some elementary logical operation on their inputs (see Fig. 1).
Fig. 1.
* Corresponding author.
E-mail addresses: rauten@or.uni-bonn.de (D. Rautenbach), szegedy@or.uni-bonn.de (C. Szegedy),
werber@or.uni-bonn.de (J. Werber).
doi:10.1016/j.jda.2005.06.006
The output of the final gate gm is a boolean function f (x1, . . . , xn) of the inputs. If
we are given an arrival time, say t(xi), for each input xi and a delay, say d(gj ), for each
gate gj , then static timing analysis will determine the arrival time of the output of gate
gm, i.e., the time at which the evaluation of f terminates, as the maximum, over all paths
from an input xi to the output of gm, of the sum of t(xi) and all gate delays along the
path. If for example for the path in Fig. 1, m = 3, g1 is a 3-and, g2 is a 2-or and g3 is
a 2-nand (for undefined terminology we refer to [9] or [12]), then f (x1, x2, x3, x4, x5) =
¬(((x1 ∧ x2 ∧ x3) ∨ x4) ∧ x5) and the evaluation of f terminates at
$$\max\{\, t(x_1) + d(g_1) + d(g_2) + d(g_3),\ t(x_2) + d(g_1) + d(g_2) + d(g_3),\ t(x_3) + d(g_1) + d(g_2) + d(g_3),\ t(x_4) + d(g_2) + d(g_3),\ t(x_5) + d(g_3) \,\}.$$
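The maximum above is a longest-path computation on the gate graph. The following is a minimal sketch of that computation, not the procedure of any particular timing tool; the function name `output_arrival_time`, the dictionary layout, and all numeric arrival times and gate delays are hypothetical values chosen for illustration.

```python
from functools import lru_cache

def output_arrival_time(gate_inputs, gate_delay, input_arrival, output):
    """Arrival time at `output`: the maximum, over all input-to-output paths,
    of the input's arrival time plus the gate delays along the path."""
    @lru_cache(maxsize=None)
    def arrival(node):
        if node in input_arrival:              # primary input x_i
            return input_arrival[node]
        # a gate settles once its latest input has arrived, plus its own delay
        return max(arrival(p) for p in gate_inputs[node]) + gate_delay[node]
    return arrival(output)

# The example path: g1 is a 3-and, g2 a 2-or, g3 a 2-nand.
gate_inputs = {"g1": ("x1", "x2", "x3"), "g2": ("g1", "x4"), "g3": ("g2", "x5")}
gate_delay = {"g1": 3, "g2": 2, "g3": 1}                         # hypothetical d(g_j)
input_arrival = {"x1": 0, "x2": 4, "x3": 1, "x4": 7, "x5": 2}    # hypothetical t(x_i)

result = output_arrival_time(gate_inputs, gate_delay, input_arrival, "g3")
```

With these assumed numbers the critical terms are t(x2) + d(g1) + d(g2) + d(g3) and t(x4) + d(g2) + d(g3), which coincide.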
In order to guarantee that the chip works correctly, we have to find a faster representation
of f. This leads us to the algorithmic problem which we state more precisely in the next
section.
2. Problem
We consider boolean circuits [9,12] over the basis Ω = {∨,∧} whose elements have
fan-in 2 for functions f : {0,1}^n → {0,1} of the form
$$f(x_1, x_2, \ldots, x_n) = g_{n-1}(g_{n-2}(\ldots g_3(g_2(g_1(x_1, x_2), x_3), x_4) \ldots, x_{n-1}), x_n) \tag{1}$$
where gj ∈ Ω for 1 ≤ j ≤ n − 1. Clearly, (1) immediately leads to a circuit similar to the one in
Fig. 1.
If we are given a non-negative integer arrival time ti ∈ N0 = {0,1,2, . . .} for input xi for
1 ≤ i ≤ n, then we define the delay delay(xi) of xi in some circuit C as the sum of ti and
the number of gates on a longest directed path in C starting at xi. The delay delay(C) of
C is defined as the maximum delay of an input in C. Given a function f and arrival times
as above, we denote the minimum delay of a circuit for f by delay(f ). For some first and
fundamental results on this notion of delay we refer the reader to [8].
There is a simple lower bound on the achievable delay extending a classical observation
of Winograd [13].
Lemma 1. If f : {0,1}^n → {0,1} is computable over Ω and dependent on each of its inputs
x1, x2, . . . , xn, which have arrival times t1, t2, . . . , tn ∈ N0, then
$$\mathrm{delay}(f) \ge \left\lceil \log_2\left( \sum_{i=1}^{n} 2^{t_i} \right) \right\rceil. \tag{2}$$
Proof. The existence of a circuit C for f over Ω with delay T implies the existence of a
rooted binary tree with n leaves of depths at most (T − t1), (T − t2), . . . , (T − tn) ∈ N0.
By Kraft’s inequality, such a tree exists if and only if $\sum_{i=1}^{n} 2^{-(T - t_i)} \le 1$ or, equivalently,
$T \ge \log_2\left(\sum_{i=1}^{n} 2^{t_i}\right)$, and the proof is complete. □
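The bound (2) is easy to evaluate exactly with integer arithmetic. The sketch below assumes a non-empty list of non-negative integer arrival times; the name `delay_lower_bound` is ours and does not appear in the paper.

```python
def delay_lower_bound(arrival_times):
    """Return ceil(log2(sum_i 2**t_i)), the lower bound (2), without floating point."""
    total = sum(1 << t for t in arrival_times)   # sum of 2**t_i, at least 1
    # for an integer total >= 1, ceil(log2(total)) equals (total - 1).bit_length()
    return (total - 1).bit_length()
```

For example, four inputs that all arrive at time 0 give a bound of 2, while arrival times (3, 0, 0) give 4, because 8 + 1 + 1 = 10 exceeds 8.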
Note that if $f(x_1, x_2, \ldots, x_n) = \bigvee_{i=1}^{n} x_i$ or $f(x_1, x_2, \ldots, x_n) = \bigwedge_{i=1}^{n} x_i$, then a tree as
considered in the above proof immediately leads to a circuit for f of minimum delay and
can obviously be constructed in polynomial time (see [8]).
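One way to obtain such a tree is a Huffman-style greedy that repeatedly combines the two signals that are available earliest. This particular procedure is our assumption for illustration; the text itself only points to [8] for the construction. The name `min_delay_same_gate` is hypothetical, and the sketch computes only the resulting delay, not the gate list.

```python
import heapq

def min_delay_same_gate(arrival_times):
    """Delay of a tree of identical 2-input gates (all OR, or all AND) built by
    always combining the two signals with the smallest arrival times."""
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # the new gate's output is ready one unit after its later input
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]
```

On small instances the result coincides with the bound (2), for example arrival times (3, 0, 0) yield delay 4 and (1, 1, 2) yield delay 3.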
Our main result is a cubic-time dynamic programming algorithm that produces a circuit
for functions f as in (1) whose delay is at most about 1.44 times the value of the lower
bound (2). We describe this algorithm first for the function f0 : {0,1}^{2n} → {0,1} with
$$f_0(x_1, y_1, x_2, y_2, \ldots, x_n, y_n) = ((\ldots(((x_1 \wedge y_1) \vee x_2) \wedge y_2) \vee \ldots) \vee x_n) \wedge y_n. \tag{3}$$
The function f0 is known in computer arithmetic [10,11]. It can be used to perform the
carry-bit calculation for the addition of two n-bit binary numbers (for details see [9]). As
part of their circuits for addition, Brent [1] and Khrapchenko [5] both described circuits for
f0 of depth $\log_2(n) + O(\sqrt{\log(n)})$ (cf. also [6]). Nevertheless, their original constructions
and analysis hardly generalize to the case of arrival times and would certainly not lead to
polynomial time algorithms.
The existence of relevant signal arrival time differences has been acknowledged in some
recent engineering publications [7,14] that propose constructions for binary adders taking
these differences into account. The greedy approaches used by Liu et al. [7] and Yeh and
Jen [14] lead to adders for two n-bit binary numbers that are of size O(n^2) but for which
no delay bound has been proved. Our algorithm allows the construction of an adder for two
n-bit binary numbers which is also of quadratic size but provably has at most about 1.44
times the minimum delay. In [8] we describe circuits for the prefix problem taking arrival
times into account which immediately leads to adders for two n-bit binary numbers which
are of size O(n log(log(n))) and have at most about twice the minimum delay.
In Section 3 we first describe the algorithm for functions as in (3). In Section 4, we
analyse the delay of the circuits constructed in Section 3. In Section 5, we describe the
algorithm for functions as in (1) and state the main result. Finally, in Section 6 we make
some concluding remarks.
3. Algorithm for f0 as in (3)
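For 1 ≤ l ≤ n − 1 the function f0 satisfies the following identity.
$$\begin{aligned}
f_0(x_1, y_1, x_2, y_2, \ldots, x_n, y_n)
&= ((\ldots(((x_1 \wedge y_1) \vee x_2) \wedge y_2) \vee \ldots) \vee x_n) \wedge y_n \\
&= ((\ldots((((x_1 \wedge y_1 \wedge y_2) \vee (x_2 \wedge y_2)) \vee x_3) \wedge y_3) \ldots) \vee x_n) \wedge y_n \\
&= \ldots \\
&= \bigvee_{i=1}^{n} \left( x_i \wedge \bigwedge_{j=i}^{n} y_j \right) \\
&= \left( \left( \bigvee_{i=1}^{l} \left( x_i \wedge \bigwedge_{j=i}^{l} y_j \right) \right) \wedge \left( \bigwedge_{j=l+1}^{n} y_j \right) \right) \vee \left( \bigvee_{i=l+1}^{n} \left( x_i \wedge \bigwedge_{j=i}^{n} y_j \right) \right) \\
&= \left( f_0(x_1, y_1, \ldots, x_l, y_l) \wedge \left( \bigwedge_{j=l+1}^{n} y_j \right) \right) \vee f_0(x_{l+1}, y_{l+1}, \ldots, x_n, y_n).
\end{aligned} \tag{4}$$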
Note that we commit a small abuse of notation by using ‘f0’ to denote formally different
functions. We now describe the algorithm for f0.
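Identity (4) can be confirmed by exhaustive evaluation for small n. The sketch below is only a sanity check, not part of the algorithm; the helper names `f0_nested`, `f0_split` and `identity_holds` are ours.

```python
from itertools import product

def f0_nested(xs, ys):
    """Evaluate ((...(((x1 & y1) | x2) & y2) | ...) | xn) & yn as written in (3)."""
    value = xs[0] & ys[0]
    for x, y in zip(xs[1:], ys[1:]):
        value = (value | x) & y
    return value

def f0_split(xs, ys, l):
    """Right-hand side of (4) for a split position 1 <= l <= n - 1."""
    tail_and = int(all(ys[l:]))                      # AND of y_{l+1}, ..., y_n
    return (f0_nested(xs[:l], ys[:l]) & tail_and) | f0_nested(xs[l:], ys[l:])

def identity_holds(n):
    """Check (4) for every input assignment and every split position."""
    return all(
        f0_nested(xs, ys) == f0_split(xs, ys, l)
        for xs in product((0, 1), repeat=n)
        for ys in product((0, 1), repeat=n)
        for l in range(1, n)
    )
```

Calling `identity_holds(n)` for n up to 5 or so covers every split position in a fraction of a second.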
Algorithm 1.
Input: Integers n ∈ N = {1,2, . . .} and t1, s1, t2, s2, . . . , tn, sn ∈ N0.
Output: A circuit C0(t1, s1, t2, s2, . . . , tn, sn) over Ω with inputs x1, y1, . . . , xn, yn that has
the two outputs f0(x1, y1, x2, y2, . . . , xn, yn) and $\bigwedge_{j=1}^{n} y_j$.
In what follows, we use ti as the arrival time for xi and si as the arrival time for
yi for 1 ≤ i ≤ n. Furthermore, we denote the subcircuit of C0(t1, . . . , sn) that computes
f0(x1, . . . , yn) by C0,f0(t1, . . . , sn) and the subcircuit of C0(t1, . . . , sn) that computes
$\bigwedge_{j=1}^{n} y_j$ by C0,∧(t1, . . . , sn).
Step 1 If n = 1, then let the circuit C0(t1, s1) be as in Fig. 2.
Step 2 If n ≥ 2, recursively construct C0(t1, . . . , sn) using C0(t1, . . . , sl) and C0(tl+1, . . . ,
sn) for some 1 ≤ l ≤ n − 1 such that
$$\max\{\, \mathrm{delay}(C_{0,f_0}(t_1, \ldots, s_l)) + 1,\ \mathrm{delay}(C_{0,f_0}(t_{l+1}, \ldots, s_n)) \,\}$$
is minimized.
The output of C0,f0(t1, . . . , sn) is calculated exactly as in (4) with one ∧-gate and one
∨-gate using the output of C0,f0(t1, . . . , sl), the output of
C0,f0(tl+1, . . . , sn) and the output of C0,∧(tl+1, . . . , sn).
Furthermore, the output of C0,∧(t1, . . . , sn) is calculated with one ∧-gate using
the output of C0,∧(t1, . . . , sl) and the output of C0,∧(tl+1, . . . , sn). See Fig. 3 for
an illustration.
Fig. 2.
Fig. 3. C(x1, . . . , yn).
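The two steps can be sketched as a memoized recursion over index intervals. This is an illustrative reading of Algorithm 1, not the authors' implementation. Since Fig. 2 is not reproduced here, the base case is an assumption: a single ∧-gate computing x1 ∧ y1, with y1 itself serving as the ∧-output, which is consistent with the gate count 4n − 3 and the base delay max{t1, s1} + 1 stated in Lemma 2. The names `build_c0` and `evaluate` and the gate-list format are ours.

```python
from functools import lru_cache

def build_c0(t, s):
    """t[i], s[i] are the arrival times of x_{i+1}, y_{i+1}.
    Returns (gates, f_out, and_out, delay); gates is a topologically ordered
    list of tuples (name, op, input_a, input_b)."""
    n = len(t)

    @lru_cache(maxsize=None)
    def delay_f(i, j):
        """Delay of the f0-subcircuit on the pairs with indices i, ..., j - 1."""
        if j - i == 1:
            return max(t[i], s[i]) + 1
        return min(max(delay_f(i, k) + 2, delay_f(k, j) + 1) for k in range(i + 1, j))

    gates = []

    def new_gate(op, a, b):
        name = f"g{len(gates)}"
        gates.append((name, op, a, b))
        return name

    def build(i, j):
        if j - i == 1:
            # Step 1 (assumed shape of Fig. 2): one AND gate; the AND-output is y itself
            return new_gate("and", f"x{i + 1}", f"y{i + 1}"), f"y{i + 1}"
        # Step 2: choose the split position that minimizes the resulting delay
        k = min(range(i + 1, j), key=lambda m: max(delay_f(i, m) + 2, delay_f(m, j) + 1))
        f_left, and_left = build(i, k)
        f_right, and_right = build(k, j)
        # identity (4): (f_left AND and_right) OR f_right
        f_out = new_gate("or", new_gate("and", f_left, and_right), f_right)
        and_out = new_gate("and", and_left, and_right)
        return f_out, and_out

    f_out, and_out = build(0, n)
    return gates, f_out, and_out, delay_f(0, n)

def evaluate(gates, assignment):
    """Evaluate all gates for a dict mapping input names to 0/1."""
    values = dict(assignment)
    for name, op, a, b in gates:
        values[name] = (values[a] & values[b]) if op == "and" else (values[a] | values[b])
    return values
```

Minimizing max{A + 2, B + 1} selects the same split as minimizing max{A + 1, B} in Step 2, since both expressions differ by the constant 1. Memoizing `delay_f` over the O(n^2) intervals, each with O(n) candidate splits, gives the cubic running time.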
We collect some observations in the following lemma.
Lemma 2.
(i) Algorithm 1 works correctly.
(ii) The number of ∨- or ∧-gates in C0(t1, . . . , sn) is 4n − 3.
(iii) In C0(t1, . . . , sn) all inputs have fan-out at most 3 and all ∧- or ∨-gates have fan-out
at most two.
(iv) delay(C0,f0(t1, s1)) = max{t1, s1} + 1.
(v) delay(C0,∧(t1, . . . , sn)) ≤ delay(C0,f0(t1, . . . , sn)) − 1.
(vi) delay(C0,f0(t1, . . . , sn)) equals
$$\min_{1 \le l \le n-1} \max\{\, \mathrm{delay}(C_{0,f_0}(t_1, \ldots, s_l)) + 2,\ \mathrm{delay}(C_{0,f_0}(t_{l+1}, \ldots, s_n)) + 1 \,\}.$$
(vii) Algorithm 1 can be implemented to run in cubic time.
Proof. (i) follows from (4). (ii), (iii) and (iv) are obvious. (v) follows easily by induction
and immediately implies (vi). (vii) is valid, since Algorithm 1 only needs to calculate the
delays of the $\binom{n}{2}$ circuits C0,f0(ti, si, . . . , tj, sj) for 1 ≤ i < j ≤ n using the recursion given
by (iv) and (vi). This can clearly be done in cubic time. □
In order to analyse the quality of the construction we study the recursion in Lemma 2(iv)
and (vi) in the next section.
4. Growth
For n ≥ 2 and non-negative integers a, b, a1, b1, . . . , an, bn ∈ N0 let D0 be defined
recursively by
$$\begin{aligned}
D_0(a, b) &= \max\{a, b\} + 1, \\
D_0(a_1, b_1, \ldots, a_n, b_n) &= \min_{1 \le l \le n-1} \max\{\, D_0(a_1, b_1, \ldots, a_l, b_l) + 2,\ D_0(a_{l+1}, b_{l+1}, \ldots, a_n, b_n) + 1 \,\}.
\end{aligned} \tag{5}$$
Clearly, this corresponds to the recursion in Lemma 2. If we define D1 similarly by
$$\begin{aligned}
D_1(a) &= a, \\
D_1(a_1, \ldots, a_n) &= \min_{1 \le l \le n-1} \max\{\, D_1(a_1, \ldots, a_l) + 2,\ D_1(a_{l+1}, \ldots, a_n) + 1 \,\},
\end{aligned} \tag{6}$$
then the following properties are immediate. In order to simplify our notation we write
(A,B) to denote the vector $(a_1, a_2, \ldots, a_{n_A}, b_1, b_2, \ldots, b_{n_B})$ where $A = (a_1, a_2, \ldots, a_{n_A})$
and $B = (b_1, b_2, \ldots, b_{n_B})$.
Lemma 3. Let $a, a_1, a_2, \ldots, a_n, a'_1, a'_2, \ldots, a'_n, b_1, b_2, \ldots, b_n \in \mathbb{N}_0$ be such that $a_i \le a'_i$ for
1 ≤ i ≤ n. Let $A \in \mathbb{N}_0^{n_A}$ and $B \in \mathbb{N}_0^{n_B}$ with $n_A + n_B \ge 1$. Then
(i) $D_0(a_1, b_1, \ldots, a_n, b_n) = D_1(\max\{a_1, b_1\} + 1, \max\{a_2, b_2\} + 1, \ldots, \max\{a_n, b_n\} + 1)$,
(ii) $D_1(a_1 + a, a_2 + a, \ldots, a_n + a) = D_1(a_1, a_2, \ldots, a_n) + a$,
(iii) $D_1(a_1, a_2, \ldots, a_n) \le D_1(a'_1, a'_2, \ldots, a'_n)$, and
(iv) $D_1(A, B) \le D_1(A, a, B)$.
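Recursions (5) and (6) share the same min-max step and differ only in their base values, which is why property (i) is immediate. A minimal sketch of the bottom-up evaluation, assuming non-empty inputs, is given below; the names `interval_recursion`, `d1` and `d0` are ours. The three nested loops make the cubic running time visible.

```python
def interval_recursion(base):
    """Evaluate the common min-max recursion of (5) and (6) on base values
    base[0], ..., base[n-1] by dynamic programming over intervals, in O(n^3)."""
    n = len(base)
    # table[i][j] is the value of the recursion on base[i], ..., base[j]
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        table[i][i] = base[i]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            table[i][j] = min(
                max(table[i][l] + 2, table[l + 1][j] + 1) for l in range(i, j)
            )
    return table[0][n - 1]

def d1(a):
    """D1(a_1, ..., a_n) as in (6)."""
    return interval_recursion(list(a))

def d0(pairs):
    """D0(a_1, b_1, ..., a_n, b_n) as in (5); pairs is a list of (a_i, b_i)."""
    return interval_recursion([max(a, b) + 1 for a, b in pairs])
```

For instance, `d1([0, 0, 0])` evaluates to 3, and shifting every entry of a vector by a constant shifts the result by the same constant, as stated in (ii).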
Before we proceed to the analysis, we give a combinatorial interpretation for D1. Let n
non-negative integers a1, a2, . . . , an ∈ N0 be given. We consider rooted binary trees with
root r in which every left branch is labelled with length 2, every right branch is labelled
with length 1 and the leaves are labelled in left-to-right order with u1, u2, . . . , un.
If D denotes the maximum over all 1 ≤ i ≤ n of the sum of ai and the total length of the
path from ui to r , then D1(a1, a2, . . . , an) equals the minimum value of D over all such
binary trees. See Fig. 4 for some examples of optimal trees where all edges of length 2 are
pointing left.
Let $F_k$ denote the kth Fibonacci number, i.e., $F_0 = 0$, $F_1 = 1$ and $F_n = F_{n-1} + F_{n-2}$
for n ≥ 2. For k ∈ N let Z(k) denote the vector of k zeros.
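Lemma 4. Let k ∈ N0 and l, n, m ∈ N. Let $A \in \mathbb{N}_0^{n}$ and $B \in \mathbb{N}_0^{m}$.
(i) $\max\{ i \in \mathbb{N} \mid D_1(Z(i)) \le k \} = F_{k+1}$.
(ii) $D_1(A, l) \le D_1(A, Z(F_{l+1}))$.
(iii) $D_1(l, B) \le D_1(Z(F_{l+2}), B)$.
(iv) $D_1(A, l, B) \le D_1(A, Z(F_{l+3} - 1), B)$.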
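Part (i) can be observed numerically, since D1(Z(i)) depends only on the length i. The following sketch tabulates these values from recursion (6) and compares the largest admissible length with the Fibonacci numbers; the helper names `d1_zeros`, `fib` and `largest_length_within` are ours, and the table must be long enough to contain lengths beyond F_{k+1} for the comparison to be meaningful.

```python
def d1_zeros(max_len):
    """Return a list d with d[i] = D1(Z(i)) for 1 <= i <= max_len (d[0] is unused)."""
    d = [None, 0]
    for n in range(2, max_len + 1):
        # split Z(n) into a left part of length l and a right part of length n - l
        d.append(min(max(d[l] + 2, d[n - l] + 1) for l in range(1, n)))
    return d

def fib(k):
    """k-th Fibonacci number with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def largest_length_within(k, d):
    """Largest tabulated i with D1(Z(i)) <= k."""
    return max(i for i in range(1, len(d)) if d[i] <= k)
```

With a table of length 100, the values of `largest_length_within(k, d)` for k = 0, 1, 2, 3, 4 are 1, 1, 2, 3, 5.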
Fig. 4.
Proof. (i) Let $\max(k) = \max\{ i \in \mathbb{N} \mid D_1(Z(i)) \le k \}$. It is easy to verify that max(0) = 1
and max(1) = 1.
By (6), for l ≥ 2 we have $D_1(Z(l)) = \max\{ D_1(Z(l_1)) + 2, D_1(Z(l_2)) + 1 \}$ for some
$l_1, l_2 \in \mathbb{N}$ with $l_1 + l_2 = l$. This immediately implies the recursion max(k) = max(k − 2) +
max(k − 1) for k ≥ 2 and thus we obtain $\max(k) = F_{k+1}$, which completes the proof of (i).
(ii) For contradiction, we assume that (A, l) is a counterexample of minimum length
n + 1.
First, we assume that $D_1(A, Z(F_{l+1})) = \max\{ D_1(A_1) + 2, D_1(A_2, Z(F_{l+1})) + 1 \}$ for
some non-trivial $A_1$ and some $A_2$ with $(A_1, A_2) = A$.
If either $A_2$ is non-trivial or l ≥ 2, then (6) and (i) or the choice of (A, l) imply the
contradiction
$$\begin{aligned}
D_1(A, l) &\le \max\{ D_1(A_1) + 2,\ D_1(A_2, l) + 1 \} \\
&\le \max\{ D_1(A_1) + 2,\ D_1(A_2, Z(F_{l+1})) + 1 \} \\
&= D_1(A, Z(F_{l+1})).
\end{aligned}$$
If $A_2$ is trivial ($A_1 = A$) and l = 1, then $D_1(A_2, l) + 1 = D_1(1) + 1 = 2 \le D_1(A_1) + 2$ and
we obtain a similar contradiction.
Therefore, there is some $1 \le r \le F_{l+1} - 1$ such that
$$D_1(A, Z(F_{l+1})) = \max\{ D_1(A, Z(F_{l+1} - r)) + 2,\ D_1(Z(r)) + 1 \}. \tag{7}$$
By (6), we have $D_1(A, l) \le \max\{ D_1(A) + 2,\ l + 1 \}$.
If $D_1(A) + 2 \ge l + 1$, then (7) implies the contradiction
$$D_1(A, l) \le D_1(A) + 2 \le D_1(A, Z(F_{l+1} - r)) + 2 \le D_1(A, Z(F_{l+1})).$$
Hence $l + 1 > D_1(A) + 2$ and $D_1(A, l) \le l + 1$.
If $r \ge F_l + 1$, then (i) implies the contradiction
$$D_1(A, l) \le l + 1 \le D_1(Z(F_l + 1)) + 1 \le D_1(Z(r)) + 1 \le D_1(A, Z(F_{l+1})).$$
Therefore, $r \le F_l$ which implies $F_{l+1} - r \ge F_{l-1}$. Again by (i), we obtain the contradiction
$$D_1(A, l) \le l + 1 \le D_1(Z(F_{l-1} + 1)) + 2 \le D_1(A, Z(F_{l-1})) + 2 \le D_1(A, Z(F_{l+1})).$$
This final contradiction completes the proof of (ii).
(iii) This proof is very similar to the proof of (ii) and we just include it for the sake of
completeness. For contradiction, we assume that (l, B) is a counterexample of minimum
length 1 + m.
As before, this implies that there is some $1 \le r \le F_{l+2} - 1$ such that
$$D_1(Z(F_{l+2}), B) = \max\{ D_1(Z(r)) + 2,\ D_1(Z(F_{l+2} - r), B) + 1 \}. \tag{8}$$
By (6), we have $D_1(l, B) \le \max\{ l + 2,\ D_1(B) + 1 \}$.
If $D_1(B) + 1 \ge l + 2$, then (8) implies the contradiction
$$D_1(l, B) \le D_1(B) + 1 \le D_1(Z(F_{l+2} - r), B) + 1 \le D_1(Z(F_{l+2}), B).$$
Hence $l + 2 > D_1(B) + 1$ and $D_1(l, B) \le l + 2$.
If $r \ge F_l + 1$, then (i) implies the contradiction
$$D_1(l, B) \le l + 2 \le D_1(Z(F_l + 1)) + 2 \le D_1(Z(r)) + 2 \le D_1(Z(F_{l+2}), B).$$
Therefore, $r \le F_l$ which implies $F_{l+2} - r \ge F_{l+1}$. Again by part (i), we obtain the contradiction
$$D_1(l, B) \le l + 2 \le D_1(Z(F_{l+1} + 1)) + 1 \le D_1(Z(F_{l+1}), B) + 1 \le D_1(Z(F_{l+2}), B).$$
This final contradiction completes the proof of (iii).
(iv) For contradiction, we assume that (A, l, B) is a counterexample of minimum length
n + 1 + m.
As before, this implies that there is some $1 \le r \le F_{l+3} - 2$ such that
$$D_1(A, Z(F_{l+3} - 1), B) = \max\{ D_1(A, Z(r)) + 2,\ D_1(Z(F_{l+3} - 1 - r), B) + 1 \}. \tag{9}$$
If $r \ge F_{l+1}$, then (6), (9) and (ii) imply the contradiction
$$\begin{aligned}
D_1(A, l, B) &\le \max\{ D_1(A, l) + 2,\ D_1(B) + 1 \} \\
&\le \max\{ D_1(A, Z(F_{l+1})) + 2,\ D_1(B) + 1 \} \\
&\le \max\{ D_1(A, Z(r)) + 2,\ D_1(Z(F_{l+3} - 1 - r), B) + 1 \} \\
&= D_1(A, Z(F_{l+3} - 1), B).
\end{aligned}$$
Therefore, $r \le F_{l+1} - 1$ which implies that $F_{l+3} - 1 - r \ge F_{l+2}$ and (6), (9) and (iii) imply
the contradiction
$$\begin{aligned}
D_1(A, l, B) &\le \max\{ D_1(A) + 2,\ D_1(l, B) + 1 \} \\
&\le \max\{ D_1(A) + 2,\ D_1(Z(F_{l+2}), B) + 1 \} \\
&\le \max\{ D_1(A, Z(r)) + 2,\ D_1(Z(F_{l+3} - 1 - r), B) + 1 \} \\
&= D_1(A, Z(F_{l+3} - 1), B).
\end{aligned}$$
This final contradiction completes the proof of (iv). □
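Theorem 1. If a1, a2, . . . , an ∈ N0, then
$$D_1(a_1, a_2, \ldots, a_n) \le D_1\left( Z\left( \sum_{i=1}^{n} (F_{a_i + 3} - 1) \right) \right) < \log_{\frac{\sqrt{5}+1}{2}}\left( \sum_{i=1}^{n} 2^{a_i} \right) + 2 \approx 1.44 \log_2\left( \sum_{i=1}^{n} 2^{a_i} \right) + 2.$$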
Proof. The first inequality follows immediately from Lemmas 3 and 4(iv).
By Lemma 4(i), $D_1(Z(l)) = k$ implies that $l > F_k \ge \left( \frac{\sqrt{5}+1}{2} \right)^{k-2}$ for k ∈ N and l ∈ N.
Therefore, $D_1(Z(l)) < \log_{\frac{\sqrt{5}+1}{2}}(l) + 2$. Since $F_{i+3} - 1 \le 2^i$ for i ∈ N0, the remaining
inequalities follow. □
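The constant 1.44 in Theorem 1 comes from changing the base of the logarithm from the golden ratio to 2. The short computation below spells this out; the symbol $\varphi$ is our shorthand for $(\sqrt{5}+1)/2$ and is not used in the paper, and the decimal values are rounded.

```latex
\[
\varphi = \frac{\sqrt{5}+1}{2} \approx 1.6180, \qquad
\log_{\varphi} x = \frac{\log_2 x}{\log_2 \varphi}, \qquad
\frac{1}{\log_2 \varphi} \approx \frac{1}{0.6942} \approx 1.4404 .
\]
```

Hence $\log_{\varphi} x \approx 1.44 \log_2 x$, which is the factor appearing in the bounds of this section.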
Corollary 1. If a1, b1, a2, b2, . . . , an, bn ∈ N0, then
$$D_0(a_1, b_1, a_2, b_2, \ldots, a_n, b_n) < \log_{\frac{\sqrt{5}+1}{2}}\left( \sum_{i=1}^{n} (2^{a_i} + 2^{b_i}) \right) + 3 \approx 1.44 \log_2\left( \sum_{i=1}^{n} (2^{a_i} + 2^{b_i}) \right) + 3.$$
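Proof. By Lemma 3 and Theorem 1, we obtain
$$\begin{aligned}
D_0(a_1, b_1, a_2, b_2, \ldots, a_n, b_n)
&= D_1(\max\{a_1, b_1\} + 1, \max\{a_2, b_2\} + 1, \ldots, \max\{a_n, b_n\} + 1) \\
&= D_1(\max\{a_1, b_1\}, \max\{a_2, b_2\}, \ldots, \max\{a_n, b_n\}) + 1 \\
&< \log_{\frac{\sqrt{5}+1}{2}}\left( \sum_{i=1}^{n} 2^{\max\{a_i, b_i\}} \right) + 3 \\
&< \log_{\frac{\sqrt{5}+1}{2}}\left( \sum_{i=1}^{n} (2^{a_i} + 2^{b_i}) \right) + 3
\end{aligned}$$
and the proof is complete. □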
5. Algorithm for f as in (1)
We now describe the algorithm for functions f as in (1).
Algorithm 2.
Input: A function f with inputs x1, x2, . . . , xn as in (1) specified by gates g1, g2, . . . ,
gn−1 ∈ Ω and an arrival time t(xi) for xi for 1 ≤ i ≤ n.
Output: A circuit Cf for f over Ω.
Step 1 Set $t_1 \leftarrow t(x_1)$ and $s_1 \leftarrow 0$.
For 1 ≤ i ≤ n − 1 set $t_{i+1} \leftarrow t(x_{i+1})$ and $s_{i+1} \leftarrow 0$, if $g_i = \vee$.
For 1 ≤ i ≤ n − 1 set $t_{i+1} \leftarrow 0$ and $s_{i+1} \leftarrow t(x_{i+1})$, if $g_i = \wedge$.
Step 2 Use Algorithm 1 to construct the circuit C0,f0(t1, s1, t2, s2, . . . , tn, sn) on the inputs
$x'_1, x''_1, x'_2, x''_2, \ldots, x'_n, x''_n$ with arrival times $t_i$ for $x'_i$ and $s_i$ for $x''_i$ for
1 ≤ i ≤ n.
Step 3 Set $x'_1 \leftarrow x_1$ and $x''_1 \leftarrow 1$.
For 1 ≤ i ≤ n − 1 set $x'_{i+1} \leftarrow x_{i+1}$ and $x''_{i+1} \leftarrow 1$, if $g_i = \vee$.
For 1 ≤ i ≤ n − 1 set $x'_{i+1} \leftarrow 0$ and $x''_{i+1} \leftarrow x_{i+1}$, if $g_i = \wedge$.
Step 4 The circuit Cf arises from the circuit constructed so far by eliminating all constant
inputs using the relations x ∨ 0 = x ∧ 1 = x, x ∨ 1 = 1 and x ∧ 0 = 0.
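Steps 1 and 3 amount to a simple translation of f into an instance of f0. The sketch below illustrates only that translation and checks by exhaustive evaluation that the substituted f0 agrees with f; it does not perform Step 2 or the constant elimination of Step 4. The function names and the encoding of gates as the strings "or" and "and" are our assumptions.

```python
def to_f0_instance(gates, arrival):
    """gates[i - 1] is g_i ("or" or "and"); arrival[i] is t(x_{i+1}).
    Returns (t, s, wiring), where wiring[i] = (x'_{i+1}, x''_{i+1}) and each entry
    is either ("var", index into x) or ("const", bit)."""
    t, s = [arrival[0]], [0]                        # Step 1 for the first pair
    wiring = [(("var", 0), ("const", 1))]           # Step 3 for the first pair
    for i, g in enumerate(gates, start=1):
        if g == "or":
            t.append(arrival[i])
            s.append(0)
            wiring.append((("var", i), ("const", 1)))
        else:
            t.append(0)
            s.append(arrival[i])
            wiring.append((("const", 0), ("var", i)))
    return t, s, wiring

def f_direct(gates, xs):
    """Evaluate f as in (1): g_{n-1}(... g_2(g_1(x_1, x_2), x_3) ..., x_n)."""
    value = xs[0]
    for g, x in zip(gates, xs[1:]):
        value = (value | x) if g == "or" else (value & x)
    return value

def f0_wired(wiring, xs):
    """Evaluate f0 as in (3) after substituting the inputs according to wiring."""
    def read(src):
        kind, v = src
        return xs[v] if kind == "var" else v
    p, q = wiring[0]
    value = read(p) & read(q)
    for p, q in wiring[1:]:
        value = (value | read(p)) & read(q)
    return value
```

For gates (∧, ∨) and arrival times (2, 5, 1), the translation yields t = (2, 0, 1) and s = (0, 5, 0): each constant input receives arrival time 0, so it never dominates the delay.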
Lemma 5. Algorithm 2 works correctly and can be implemented to run in cubic time.
Proof. Using the identities x∨y = (x∨y)∧1 and x∧y = (x∨0)∧y, it is straightforward
to check that Cf computes f (cf. Fig. 5). Hence Algorithm 2 works correctly. Its time
complexity follows from the time complexity of Algorithm 1 and the fact that considering
each of the less than 8n − 3 ∨- or ∧-gates of C0,f0(t1, s1, t2, s2, . . . , tn, sn) once in non-
increasing distance from the output gate, step 4 can be done in linear time. 
Fig. 5.
Theorem 2.
(i) If C0,f0 denotes the circuit generated by Algorithm 1 for f0 as in (3) given arrival
times for the inputs, then delay(C0,f0) ≤ 1.44 delay(f0) + 3.
(ii) If Cf denotes the circuit generated by Algorithm 2 for f as in (1) given arrival times
for the inputs, then delay(Cf) ≤ 1.44 delay(f) + 4.44.
Proof. (i) This follows immediately from Lemma 1 and Corollary 1.
(ii) Using the same notation as above, we have
$$\sum_{i=1}^{n} (2^{t_i} + 2^{s_i}) \le 2 \sum_{i=1}^{n} 2^{t(x_i)}.$$
By Lemma 1 and Corollary 1, we obtain
$$\begin{aligned}
\mathrm{delay}(C_f) &\le \mathrm{delay}(C_{0,f_0}(t_1, s_1, t_2, s_2, \ldots, t_n, s_n)) \\
&\le 1.44 \log_2\left( \sum_{i=1}^{n} (2^{t_i} + 2^{s_i}) \right) + 3 \\
&\le 1.44 \log_2\left( 2 \sum_{i=1}^{n} 2^{t(x_i)} \right) + 3 \\
&\le 1.44 \log_2\left( \sum_{i=1}^{n} 2^{t(x_i)} \right) + 4.44 \\
&\le 1.44\, \mathrm{delay}(f) + 4.44
\end{aligned}$$
and the proof is complete. □
6. Conclusion
We have described a simple cubic-time algorithm for the construction of circuits for
functions as in (1) whose delay is at most 1.44 times the lower bound plus some small
constant. Our algorithm is essentially the first mathematically justified method that allows
for the redesign of the logic on longer critical paths at late stages of the VLSI design
process.
As we mentioned, the functions as in (3) are closely related to addition. As a conse-
quence, we can construct circuits over the basis {∨,∧,¬} for the addition of two binary
n-digit numbers whose delay is at most 1.44 times the optimal delay plus some small
constant. Unfortunately, the number of gates of these circuits is quadratic in n. In [8] we
describe circuits for the same task whose delay is essentially at most twice the lower bound
and whose size is O(n log(log(n))).
In view of the practical motivation explained in the first section, it is obvious that many
technical details not contained in the mathematical abstraction can actually be incorporated
in the algorithm. This motivation is also the reason for controlling the number of gates and
the maximum fan-out.
References
[1] R. Brent, On the addition of binary numbers, IEEE Trans. Comput. 19 (1970) 758–759.
[2] R.B. Hitchcock, Timing verification and the timing analysis program, in: Proc. 19th IEEE Design Automation Conference, 1982, pp. 594–604.
[3] R.B. Hitchcock, G.L. Smith, D.D. Cheng, Timing analysis of computer hardware, IBM J. Res. Develop. 26 (1982) 100–105.
[4] N.P. Jouppi, Timing analysis for nMOS VLSI, in: Proc. 20th IEEE Design Automation Conference, 1983, pp. 411–418.
[5] V.M. Khrapchenko, Asymptotic estimation of addition time of parallel adder, Syst. Th. Res. 19 (1970) 105–122.
[6] R.E. Ladner, M.J. Fischer, Parallel prefix computation, J. ACM 27 (1980) 831–838.
[7] J. Liu, S. Zhou, H. Zhu, C.-K. Cheng, An algorithmic approach for generic parallel adders, in: Proc. ICCAD ’03, 2003, pp. 734–740.
[8] D. Rautenbach, C. Szegedy, J. Werber, Fast circuits for functions whose inputs have specified arrival times, Technical Report No. 03933, Forschungsinstitut für Diskrete Mathematik, Universität Bonn, 2003.
[9] J.E. Savage, Models of Computation: Exploring the Power of Computing, Addison-Wesley Longman, Reading, MA, 1998.
[10] E.E. Swartzlander (Ed.), Computer Arithmetic, vol. I, IEEE Computer Society Press, 1990, 378 p.
[11] E.E. Swartzlander (Ed.), Computer Arithmetic, vol. II, IEEE Computer Society Press, 1990, 396 p.
[12] I. Wegener, The Complexity of Boolean Functions, Wiley-Teubner Series in Computer Science, B.G. Teubner, Stuttgart, Wiley, Chichester, 1987.
[13] S. Winograd, On the time required to perform addition, J. ACM 12 (1965) 277–285.
[14] W.-C. Yeh, C.-W. Jen, Generalized earliest-first fast addition algorithm, IEEE Trans. Comput. 52 (2003) 1233–1242.
