Faster Carry Bit Computation for Adder Circuits with Prescribed Arrival
  Times by Brenner, Ulrich & Hermann, Anna
Research Institute for Discrete Mathematics, University of Bonn
June 15, 2018 (arXiv:1710.08267v2 [cs.DS])
Abstract
We consider the fundamental problem of constructing fast circuits for the carry bit
computation in binary addition. Up to a small additive constant, the carry bit computation
reduces to computing an And-Or path, i.e., a formula of type t0 ∧ (t1 ∨ (t2 ∧ (. . . tm−1) . . . )
or t0 ∨ (t1 ∧ (t2 ∨ (. . . tm−1) . . . ). We present an algorithm that computes the fastest known
Boolean circuit for an And-Or path with given arrival times a(t0), . . . , a(tm−1) for the
input signals. Our objective function is delay, a natural generalization of depth with respect
to arrival times. The maximum delay of the circuit we compute is
$\log_2 W + \log_2\log_2 m + \log_2\log_2\log_2 m + 4.3$, where $W := \sum_{i=0}^{m-1} 2^{a(t_i)}$.
Note that $\lceil \log_2 W \rceil$ is a lower bound on
the delay of any circuit depending on inputs $t_0, \dots, t_{m-1}$ with prescribed arrival times. Our
method yields the fastest circuits for And-Or paths, carry bit computation and adders in
terms of delay known so far.
1 Introduction
An And-Or path is a Boolean formula of type
t0 ∧ (t1 ∨ (t2 ∧ (. . . tm−1) . . . ) or t0 ∨ (t1 ∧ (t2 ∨ (. . . tm−1) . . . ) .
We assume that for each Boolean input variable ti (i ∈ {0, . . . ,m−1}), an arrival time a(ti) ∈ N
is given. Our goal is to find a Boolean circuit using only And and Or gates that computes
the Boolean function of a given And-Or path and minimizes the maximum delay of the inputs.
Here, the delay of an input ti in a Boolean circuit is its arrival time a(ti) plus the length of a
maximum directed path in the circuit starting at ti. Thus, the concept of delay minimization
generalizes the concept of depth minimization. When only depth is considered, one has to assume
that all input signals are available at the same time (uniform arrival times), which is not the
case on real-world chips. Hence, taking non-uniform arrival times into account leads to a much
more realistic problem formulation.
1.1 Applications of And-Or Path Optimization
Realizing And-Or paths with fast circuits is important for VLSI design in different aspects.
Computing fast And-Or paths is, up to a small additive constant, equivalent to constructing
fast adder circuits. To see this, assume we want to compute the sum of two binary numbers
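Author     | Upper bound on delay                                         | Size                                | Maximum fanout
[15]; [7]  | $1.441 \log_2 W + 2.674$                                     | $O(m)$                              | $2$
[14]; [18] | $(1+\varepsilon)\lceil\log_2 W\rceil + 3/\varepsilon + 5$    | $O(m/\varepsilon)$                  | $2$
[14]; [18] | $(1+\varepsilon)\lceil\log_2 W\rceil + 3/\varepsilon + 5$    | $O(m)$                              | $2^{1/\varepsilon}$
[18]       | $\lceil\log_2 W\rceil + 2\sqrt{2\log_2 m - 2} + 6$           | $O(m\sqrt{\log_2 m})$               | $2$
[18]       | $\lceil\log_2 W\rceil + 2\sqrt{2\log_2 m - 2} + 6$           | $O(m)$                              | $2^{\sqrt{2\log_2 m - 2}} + 1$
Here       | $\log_2 W + \log_2\log_2 m + \log_2\log_2\log_2 m + 4.3$     | $O(m \log_2 m \log_2\log_2 m)$      | $O(\log_2 m)$
Table 1: Known upper bounds on delay of And-Or paths with non-uniform arrival times.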
$x = (x_{r-1}, \dots, x_0)$ and $y = (y_{r-1}, \dots, y_0)$ with most significant bit $r-1$. Starting with
$c_1 = x_0 \land y_0$, the carry bits can be computed recursively via
$$c_{i+1} = (x_i \land y_i) \lor \big((x_i \lor y_i) \land c_i\big)
= \underbrace{(x_i \land y_i)}_{t_0} \lor \Big(\underbrace{(x_i \lor y_i)}_{t_1} \land \big(\underbrace{(x_{i-1} \land y_{i-1})}_{t_2} \lor (\underbrace{(x_{i-1} \lor y_{i-1})}_{t_3} \land \dots)\big)\Big)$$
for 0 < i < r. The computation of the ith carry bit is hence essentially the evaluation of an And-
Or path of length 2i. Once all carry bits are known, the sum can be computed with an additional
delay of 2. Adders with non-uniform input arrival times occur, e.g., as a part of multiplication
units (see [21]). Depth optimization of adders (and thus of And-Or paths for the carry bit
computation) is a classical and well-studied optimization problem, see Sklansky [17], Brent [1],
Khrapchenko [9], Kogge and Stone [11], Ladner and Fischer [13], and Brent and Kung [2].
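The reduction from carry bits to And-Or paths can be checked by brute force. The following is a minimal sketch, not taken from the paper: the helper names `and_or_path`, `carry_bits` and `carry_inputs` are hypothetical, and the input layout (generate and propagate signals interleaved, closed by $c_1 = x_0 \land y_0$) is an assumption that gives a path on $2i+1$ inputs for $c_{i+1}$.

```python
def and_or_path(t, start_with_or):
    """Evaluate t0 op (t1 op' (t2 op (...))) with alternating And/Or gates."""
    value = t[-1]
    # Walk from the innermost input outwards; gate i combines t[i] with the rest.
    for i in range(len(t) - 2, -1, -1):
        use_or = start_with_or if i % 2 == 0 else not start_with_or
        value = (t[i] or value) if use_or else (t[i] and value)
    return value

def carry_bits(x, y):
    """Carry bits c_1, ..., c_r of x + y; x[0] and y[0] are the least significant bits."""
    carries, c = [], False
    for xi, yi in zip(x, y):
        c = (xi and yi) or ((xi or yi) and c)
        carries.append(c)
    return carries

def carry_inputs(x, y, i):
    """Inputs t0, t1, ... of the And-Or path for c_{i+1} (assumed layout)."""
    t = []
    for j in range(i, 0, -1):
        t += [x[j] and y[j], x[j] or y[j]]  # generate and propagate of position j
    t.append(x[0] and y[0])                 # c_1 closes the path
    return t
```

For every position `i`, `and_or_path(carry_inputs(x, y, i), True)` should agree with `carry_bits(x, y)[i]`, since the outermost gate of the carry recursion is an Or gate.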
Another application of And-Or path optimization is the comparison of binary numbers since
lexicographic comparison can be expressed as an And-Or path (see, e.g., Grinchuk [6]).
More generally, And-Or path optimization is used to speed up timing-critical paths on VLSI
chips. Note that the most critical path P on a chip can be decomposed into, e.g., And gates and
inverter gates. Using De Morgan’s laws, the inverters can be pushed to the inputs of P , making
P a path consisting of And and Or gates (not necessarily alternating). In [20], Werber et al.
successfully applied their And-Or path optimization algorithm presented in [15] for iteratively
improving such paths in a late stage of physical design. In this context, it is essential that the
objective function of the algorithm used is delay instead of depth since typically, input signals
of the most critical path will not arrive simultaneously.
1.2 Previous Work
For And-Or paths with uniform arrival times (i.e., only depth is considered), the currently fastest
Boolean circuit has been proposed by Grinchuk [6], reaching a depth of $\log_2 m + \log_2\log_2 m + O(1)$.
This is close to the best known lower bounds on depth: Khrapchenko [10] showed that any circuit
for an And-Or path has a depth of at least $\log_2 m + 0.15 \log_2\log_2\log_2 m + \Theta(1)$. The result is
based on a lower bound of
$\Theta\left(\frac{m \log_2 m \, \log_2\log_2 m}{\log_2\log_2\log_2\log_2 m}\right)$
on the product of size and depth of a Boolean
formula for an And-Or path (see [4]). For monotone circuits, i.e., circuits without negations (and
the circuit built in [6] is a monotone circuit), this lower bound can be improved to $\Theta(m \log_2^2 m)$
(see [3]). This directly implies a lower bound of $\log_2 m + \log_2\log_2 m + \Theta(1)$ on the depth of a
monotone circuit for an And-Or path.
For And-Or paths with non-uniform arrival times $a(t_0), \dots, a(t_{m-1})$, the value $\lceil\log_2 W\rceil$ is
a lower bound on the achievable delay, where $W := \sum_{i=0}^{m-1} 2^{a(t_i)}$ (see, e.g., [15]). No stronger
lower bounds on delay are known. Rautenbach et al. [15] presented an algorithm computing a
Boolean circuit for an And-Or path with delay at most $1.441 \log_2 W + 3$. This delay bound
was improved to $1.441 \log_2 W + 2.674$ by Held and Spirkl [7]. In both of these circuits, so-called
2-input prefix gates are used, and it can be shown that any And-Or path realization based on
prefix gates has a delay of at least $\log_\varphi\left(\sum_{i=0}^{m-1} \varphi^{a(t_i)}\right)$, where
$\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$ is the golden
ratio (see [7]). In particular, this implies that any prefix-based And-Or path realization has a
depth of at least $1.44 \log_2 m - 1$. Without using prefix gates, Rautenbach et al. [14] presented a
circuit for an And-Or path with delay at most $(1+\varepsilon)\lceil\log_2 W\rceil + c_\varepsilon$ (for any $\varepsilon > 0$), where
$c_\varepsilon$ is a number depending only on $\varepsilon$. Spirkl [18] specified the delay bound to
$(1+\varepsilon)\lceil\log_2 W\rceil + 6/\varepsilon + 8 + 5\varepsilon$
and improved it to $(1+\varepsilon)\lceil\log_2 W\rceil + 3/\varepsilon + 5$. Moreover, Spirkl [18] described a circuit with a
delay of at most $\lceil\log_2 W\rceil + 2\sqrt{2\log_2 m - 2} + 6$. Note that for any $\varepsilon > 0$, this is actually a
better delay bound than $(1+\varepsilon)\lceil\log_2 W\rceil + 3/\varepsilon + 5$ (because
$\varepsilon\log_2 W + 3/\varepsilon + 5 \ge 2\sqrt{3\log_2 W} + 5 \ge 2\sqrt{3\log_2 m} + 5 \ge 2\sqrt{2\log_2 m - 2} + 6$).
Up to now, this is the fastest circuit for And-Or paths
with non-uniform arrival times. Table 1 summarizes these results in comparison with our delay
bound. We also state size (i.e., number of gates) and maximum fanout of the constructed circuits.
Note that some methods trade off size against fanout and provide two different circuits.
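To get a feeling for the delay bounds in Table 1, they can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, and the formulas are transcribed from the table (the bound of this paper and the bound of [18] are evaluated for $m \ge 3$).

```python
import math

def log_weight(arrival_times):
    """log2 of W = sum_i 2^a(t_i) for natural-number arrival times."""
    return math.log2(sum(2 ** a for a in arrival_times))

def lower_bound(arrival_times):
    """ceil(log2 W), a lower bound on the delay of any circuit."""
    return math.ceil(log_weight(arrival_times))

def bound_prefix_based(arrival_times):
    """1.441 log2 W + 2.674, Table 1 row [15]; [7]."""
    return 1.441 * log_weight(arrival_times) + 2.674

def bound_spirkl(arrival_times):
    """ceil(log2 W) + 2 sqrt(2 log2 m - 2) + 6, Table 1 rows [18]."""
    m = len(arrival_times)
    return lower_bound(arrival_times) + 2 * math.sqrt(2 * math.log2(m) - 2) + 6

def bound_this_paper(arrival_times):
    """log2 W + log2 log2 m + log2 log2 log2 m + 4.3 (requires m >= 3)."""
    m = len(arrival_times)
    loglog = math.log2(math.log2(m))
    return log_weight(arrival_times) + loglog + math.log2(loglog) + 4.3
```

For example, `bound_this_paper([0] * 1024)` evaluates the new bound for 1024 inputs that all arrive at time 0, where $\log_2 W = 10$.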
1.3 Our Contribution
In this paper, we present an algorithm with running time $O(m^2 \log_2 m)$ that computes a Boolean
circuit for And-Or paths (with m ≥ 3) using only two-input And and Or gates with a delay of
at most
$$\log_2 W + \log_2\log_2 m + \log_2\log_2\log_2 m + 4.3 ,$$
size $O(m \log_2 m \log_2\log_2 m)$ and maximum fanout $\log_2 m + \log_2\log_2 m + \log_2\log_2\log_2 m + 3.3$.
In terms of delay, this yields the currently best known circuits for And-Or paths and thus also
adder circuits. In particular, we improve the previously best known delay bound of
$\lceil\log_2 W\rceil + 2\sqrt{2\log_2 m - 2} + 6$ by Spirkl [18] for each m ≥ 3 as well as asymptotically. The construction of
the circuit is based on a recursive approach similar to the algorithm of Grinchuk [6] for uniform
arrival times.
The rest of the paper is organized as follows. In Section 2, we introduce basic definitions
and results. We give a formal description of the problem (Subsection 2.1), define splitting steps
which allow us to partition an instance into smaller sub-instances (Subsection 2.2) and introduce
a measure for deciding which instances admit an And-Or path realization with a given delay
(Subsection 2.3). Section 3 classifies these instances, which is the major step of the paper. In
Section 4, we deduce how to construct circuits realizing And-Or paths with a delay of at most
log2W+log2 log2m+log2 log2 log2m+4.3 and analyze the size and fanout as well as the runtime
needed to compute such circuits. In Section 5, we extend this result to paths that also consist
of And and Or gates only, but not necessarily alternatingly.
2 Preliminaries
2.1 Problem Formulation
We denote the set of natural numbers including zero by N. Our notation regarding Boolean
functions and circuits is based on Savage [16]. Given $r \in \mathbb{N}$ and a Boolean function
$h : \{0,1\}^r \to \{0,1\}$ with Boolean input variables (shorter, inputs) $x_0, \dots, x_{r-1}$, we write
$x = (x_0, \dots, x_{r-1})$ as a shorthand for all inputs with fixed ordering. If r = 0, we write x = ().
[Figure 1: Examples for (extended) And-Or paths. (a) And-Or path g∗((t0, . . . , t5)). (b) Extended And-Or path f((s0, s1, s2), (t0, . . . , t4)).]
Definition 1. Let Boolean input variables $t = (t_0, \dots, t_{m-1})$ for some $m \in \mathbb{N}$ with $m > 0$ be
given. We call each of the recursively defined functions
$$g(t) = \begin{cases} t_0 & m = 1 \\ t_0 \land g^*((t_1, \dots, t_{m-1})) & m > 1 \end{cases}
\quad\text{and}\quad
g^*(t) = \begin{cases} t_0 & m = 1 \\ t_0 \lor g((t_1, \dots, t_{m-1})) & m > 1 \end{cases}$$
an And-Or path on m inputs.
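The mutual recursion of Definition 1 translates directly into code. This is a small illustrative sketch; the function names `g` and `g_star` mirror the symbols of the definition, and inputs are assumed to be given as a non-empty tuple of booleans.

```python
def g(t):
    """And-Or path whose outermost gate is an And gate (Definition 1)."""
    return t[0] if len(t) == 1 else (t[0] and g_star(t[1:]))

def g_star(t):
    """And-Or path whose outermost gate is an Or gate (Definition 1)."""
    return t[0] if len(t) == 1 else (t[0] or g(t[1:]))
```

For three inputs, `g((t0, t1, t2))` evaluates `t0 and (t1 or t2)`, and `g_star((t0, t1, t2))` evaluates `t0 or (t1 and t2)`.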
We want to realize a given And-Or path by a Boolean circuit over the basis {∧,∨}. A
Boolean circuit over the basis {∧,∨} is a directed acyclic graph such that
• the nodes with indegree 0, called inputs, are labeled by Boolean variables,
• there is only one node with outdegree 0, called output, and it has indegree exactly 1,
• each of the remaining nodes has indegree exactly 2 and outdegree at least 1 and is labeled
either as an And gate or as an Or gate.
The logical function on the input variables computed by the output of the circuit can be obtained
by combining the logical functions represented by the gates recursively. A circuit is a realization
of the And-Or path g(t) if and only if the output signal equals g(t) for all input values (corre-
spondingly for g∗). Figure 1 (a) shows a Boolean circuit which is a straightforward realization
of the And-Or path g∗((t0, . . . , t5)). We omit drawing the output in such pictures – the unique
predecessor of the output will always be the bottommost gate.
We assume that each input variable is associated with a prescribed arrival time.
Definition 2. Let $r \in \mathbb{N}$ and a Boolean function $h : \{0,1\}^r \to \{0,1\}$ on Boolean input variables
x = (x0, . . . , xr−1) with arrival times a : {x0, . . . , xr−1} → N be given. Consider a circuit C
computing h. For i = 0, . . . , r− 1, the delay of input xi is defined as a(xi) + l(xi), where l(xi)
denotes the maximum number of gates of any path in C starting at input xi. The delay of the
circuit C is the maximum delay of any input.
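Definition 2 can be made concrete with a toy circuit representation. The sketch below is an assumption for illustration only: a circuit is a nested tuple `(gate, left, right)` with input names as strings, so shared subcircuits are simply repeated, and the helper names are hypothetical.

```python
def path_lengths(node, depth=0, out=None):
    """l(x): maximum number of gates on a path from input x to the output."""
    out = {} if out is None else out
    if isinstance(node, str):            # an input: record the longest path seen
        out[node] = max(out.get(node, 0), depth)
    else:                                # a gate ('and' or 'or') with two predecessors
        for child in node[1:]:
            path_lengths(child, depth + 1, out)
    return out

def circuit_delay(node, arrival):
    """Maximum over all inputs x of a(x) + l(x), as in Definition 2."""
    return max(arrival[x] + l for x, l in path_lengths(node).items())

# Straightforward realization of g*((t0, t1, t2, t3)) = t0 or (t1 and (t2 or t3)).
path_circuit = ('or', 't0', ('and', 't1', ('or', 't2', 't3')))
```

With all arrival times equal to 0, `circuit_delay(path_circuit, ...)` is the depth 3 of the circuit; raising the arrival time of `t0` to 3 increases the delay to 4, because `t0` passes through only one gate.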
Note that for uniform arrival times, e.g., a ≡ 0, the delay of a circuit is simply the depth of
the circuit.
Our goal is computing fast circuits for And-Or paths, i.e., solving the following problem:
Problem 3 (And-Or Path Optimization Problem). For m ∈ N with m > 0, consider Boolean
input variables t = (t0, . . . , tm−1) with arrival times a : {t0, . . . , tm−1} → N. Find circuits over
the basis {∧,∨} that compute the And-Or paths g(t) and g∗(t) with minimum possible delay.
Note that adding the same constant to all arrival times does not change the problem. This
is why we only allow non-negative arrival times in this formulation. Moreover, forbidding non-
integral arrival times is not a significant restriction because rounding up all arrival times will
change the delay of any circuit by less than 1. Therefore, we only consider natural numbers as
arrival times.
2.2 Recursive Circuit Construction
We construct fast circuits for And-Or paths in a recursive way. Before describing all details of
the approach, we explain the idea of the induction step.
Assume we want to realize the And-Or path $g^*(t) = t_0 \lor (t_1 \land (t_2 \lor (t_3 \land (\dots t_{m-1} \dots))))$. We
subdivide the inputs $t_0, \dots, t_{m-1}$ into two groups $t_0, \dots, t_{2k}$ and $t_{2k+1}, \dots, t_{m-1}$. Recursively, we
compute fast circuits for the And-Or paths on each of these input sets. These two circuits can
be combined to a circuit for the whole And-Or path as illustrated in an example with m = 12
and k = 3 in Figure 2; the general construction is described in Lemma 5. The output of the
circuit for the subinstance t2k+1, . . . , tm−1 is combined with every second input of t0, . . . , t2k by
using only And gates. Just one additional Or gate (labeled with “G” in the picture) is needed
to compute a circuit for the whole And-Or path. It is not too difficult to check that the circuits
in Figure 2 (a) and (b) are logically equivalent. To see this, note that the output of the circuit in
(a) is “true” if and only if there is an i ∈ {0, 2, 4, 6, 8, 10, 11} such that the input signals at ti and
at every tj with j odd and j ∈ {1, . . . , i − 1} are “true”. This is also a sufficient and necessary
condition for a “true” as an output of the circuit in (b).
Note that while the left input of gate G in the example is the output of an And-Or path, the
right input of G is the output of a function combining the And-Or path for t2k+1, . . . , tm−1 with
a multiple-input AND. The occurrence of such functions in our recursion requires generalizing
the concept of And-Or paths.
Definition 4. Given n,m ∈ N, m > 0, and inputs s0, . . . , sn−1, t0, . . . , tm−1 subdivided into
s = (s0, . . . , sn−1) and t = (t0, . . . , tm−1), we define the extended And-Or paths
f(s, t) = s0 ∧ f((s1, . . . , sn−1), t) and f∗(s, t) = s0 ∨ f∗((s1, . . . , sn−1), t) ,
where f(s, t) = g(t), f∗(s, t) = g∗(t) in the case that s = (). We call the input variables s
symmetric inputs and the input variables t alternating inputs, respectively.
Figure 1 (b) shows the extended And-Or path f(s, t) for s = (s0, s1, s2) and t = (t0, . . . , t4).
We shall always assume that the set of input variables contained in s and t are disjoint sets
indexed by s0, . . . , sn−1 and t0, . . . , tm−1. Note that expanding the definitions of f(s, t) and
f∗(s, t) given in Definition 4 yields
$$f(s, t) = s_0 \land \dots \land s_{n-1} \land g(t) = s_0 \land \dots \land s_{n-1} \land t_0 \land (t_1 \lor (t_2 \land (t_3 \lor (\dots t_{m-1} \dots)))) ,$$
$$f^*(s, t) = s_0 \lor \dots \lor s_{n-1} \lor g^*(t) = s_0 \lor \dots \lor s_{n-1} \lor t_0 \lor (t_1 \land (t_2 \lor (t_3 \land (\dots t_{m-1} \dots)))) ,$$
where, for m odd, the innermost operation of f(s, t) and f∗(s, t) is ∨ and ∧, respectively, and
vice versa for m even.
[Figure 2: Performing the alternating split on g∗((t0, . . . , t11)) = t0 ∨ (t1 ∧ (. . . (t10 ∨ t11) . . . )). (a) And-Or path on t = (t0, . . . , t11). (b) Alternating split with t′ = (t0, . . . , t6); the additional Or gate is labeled G.]
Due to the duality principle of Boolean algebra, over the basis {∧,∨}, any realization for
f(s, t) yields a realization of f∗(s, t) and vice versa by switching all And and Or gates. In order
to compute fast realizations for f and f∗, we will apply two methods that allow realizing f and
f∗ recursively. Each of these methods reduces the problem of realizing f(s, t) to the problem of
realizing extended And-Or paths with strictly less symmetric or alternating inputs.
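The duality between f and f∗ can be checked exhaustively for small instances. The following sketch is illustrative and uses hypothetical helper names; it relies on the standard fact that switching all And and Or gates of a circuit yields the dual function, i.e., the negation of the original function evaluated on negated inputs.

```python
from itertools import product

def g(t):
    return t[0] if len(t) == 1 else (t[0] and g_star(t[1:]))

def g_star(t):
    return t[0] if len(t) == 1 else (t[0] or g(t[1:]))

def f(s, t):
    """Extended And-Or path: s0 and ... and s_{n-1} and g(t)."""
    return all(s) and g(t)

def f_star(s, t):
    """Extended And-Or path: s0 or ... or s_{n-1} or g*(t)."""
    return any(s) or g_star(t)

def duality_holds(n, m):
    """Check f*(s, t) == not f(not s, not t) for all inputs with |s| = n, |t| = m."""
    for bits in product([False, True], repeat=n + m):
        s, t = bits[:n], bits[n:]
        neg_s = tuple(not b for b in s)
        neg_t = tuple(not b for b in t)
        if f_star(s, t) != (not f(neg_s, neg_t)):
            return False
    return True
```

Calling `duality_holds(n, m)` for small `n` and `m` confirms the identity on all $2^{n+m}$ input assignments.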
First, we formally describe the well-known method which is depicted in Figure 2 for the
special case that s = (), m = 12 and k = 3.
Lemma 5. Let input variables s and t and an integer k with $0 \le k < \frac{m-1}{2}$ be given. Denote
by t′ the (odd-length) prefix $t' = (t_0, t_1, \dots, t_{2k})$ of t, and by t′′ the remaining inputs of t, i.e.,
$t'' = (t_{2k+1}, \dots, t_{m-1})$. Then, we have
$$f^*(s, t) = f^*(s, t') \lor f(\hat{t}', t'') , \tag{1}$$
where $\hat{t}' := (t'_1, t'_3, t'_5, \dots, t'_{2k-1})$ contains every second entry of $t'$.
Proof. At first, we prove the following claim:
Claim: For any Boolean variable x, we have
g∗((t0, . . . , t2k, x)) = g∗((t0, . . . , t2k)) ∨ (t1 ∧ t3 ∧ . . . ∧ t2k−1 ∧ x) . (2)
Proof of the claim: We prove the claim by induction on k. For k = 0, we have
g∗((t0, . . . , t2k, x)) = g∗((t0, x)) = t0 ∨ x = g∗((t0, . . . , t2k)) ∨ (t1 ∧ t3 ∧ . . . ∧ t2k−1 ∧ x) .
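Assuming statement (2) holds for some $k \ge 0$ and setting $x_1 := t_{2k+1} \land (t_{2k+2} \lor x)$ and
$x_2 := t_{2k+1} \land t_{2k+2}$, we compute
$$\begin{aligned}
g^*((t_0, \dots, t_{2(k+1)}, x))
&\overset{\text{Def. 1}}{=} g^*((t_0, \dots, t_{2k}, x_1)) \\
&\overset{(2)}{=} g^*((t_0, \dots, t_{2k})) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land x_1) \\
&= g^*((t_0, \dots, t_{2k})) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land t_{2k+1} \land (t_{2k+2} \lor x)) \\
&= g^*((t_0, \dots, t_{2k})) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land x_2) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land t_{2k+1} \land x) \\
&\overset{(2)}{=} g^*((t_0, \dots, t_{2k}, x_2)) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land t_{2k+1} \land x) \\
&\overset{\text{Def. 1}}{=} g^*((t_0, \dots, t_{2k+2})) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land t_{2k+1} \land x) .
\end{aligned}$$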
This proves the inductive step and thus the claim.
From this claim and Definition 1, we conclude
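$$\begin{aligned}
f^*(s, t) &\overset{\text{Def. 4}}{=} s_0 \lor \dots \lor s_{n-1} \lor g^*(t) \\
&\overset{\text{Def. 1}}{=} s_0 \lor \dots \lor s_{n-1} \lor g^*((t_0, \dots, t_{2k}, g(t''))) \\
&\overset{(2)}{=} s_0 \lor \dots \lor s_{n-1} \lor g^*((t_0, \dots, t_{2k})) \lor (t_1 \land t_3 \land \dots \land t_{2k-1} \land g(t'')) \\
&\overset{\text{Def. 4}}{=} f^*(s, t') \lor f(\hat{t}', t'') .
\end{aligned}$$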
The previous lemma allows us to recursively construct a circuit for f∗(s, t): Inductively, we
can assume that we can compute a fast circuit for f∗(s, t′) and f(t̂′, t′′), respectively. Combining
these circuits as in equation (1) yields a good circuit for f∗(s, t). We call this way to construct
a circuit for f∗(s, t) an alternating split.
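Equation (1) can also be validated exhaustively on small instances. The sketch below is not part of the paper's construction; the helper names are hypothetical, and it only compares truth tables for every admissible split position k.

```python
from itertools import product

def g(t):
    return t[0] if len(t) == 1 else (t[0] and g_star(t[1:]))

def g_star(t):
    return t[0] if len(t) == 1 else (t[0] or g(t[1:]))

def f(s, t):
    return all(s) and g(t)

def f_star(s, t):
    return any(s) or g_star(t)

def alternating_split(s, t, k):
    """Right-hand side of equation (1) for the prefix t' = (t_0, ..., t_2k)."""
    t_prefix, t_rest = t[:2 * k + 1], t[2 * k + 1:]
    t_hat = t_prefix[1::2]               # t_1, t_3, ..., t_{2k-1}
    return f_star(s, t_prefix) or f(t_hat, t_rest)

def lemma5_holds(n, m):
    """Compare both sides of (1) for all inputs and all k with 0 <= k < (m-1)/2."""
    for bits in product([False, True], repeat=n + m):
        s, t = bits[:n], bits[n:]
        for k in range(m // 2):          # exactly the integers k with 2k + 1 <= m - 1
            if alternating_split(s, t, k) != f_star(s, t):
                return False
    return True
```

The instance of Figure 2 corresponds to `n = 0`, `m = 12` and `k = 3`, which is small enough to be checked the same way.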
Similarly, if n > 0, the definition of f(s, t) (see Definition 4) yields two symmetric splits
indicated by the equations
f(s, t) = (s0 ∧ . . . ∧ sn−1) ∧ g(t) , (3)
f(s, t) = (s0 ∧ . . . ∧ sn−1 ∧ t0) ∧ g∗((t1, . . . , tm−1)) . (4)
Here, a recursive circuit construction is needed only for the arising And-Or path, since a
delay-optimum symmetric And-tree can be constructed via Huffman coding [8] as recapitulated in
Remark 8.
Note that dualizing the splits (1), (3) and (4) yields the analogous splits
f(s, t) = f(s, t′) ∧ f∗(t̂′, t′′) (5)
f∗(s, t) = (s0 ∨ . . . ∨ sn−1) ∨ g∗(t) (6)
f∗(s, t) = (s0 ∨ . . . ∨ sn−1 ∨ t0) ∨ g((t1, . . . , tm−1)) (7)
2.3 Delay and Weight
We will realize extended And-Or paths with good delay using the recursive methods defined in
Subsection 2.2. For this, we need to classify inputs by their weight.
Definition 6. Given inputs $x = (x_0, \dots, x_{r-1})$ and arrival times a, for $i = 0, \dots, r-1$, the
weight of input $x_i$ is $W(x_i) := 2^{a(x_i)}$, and the weight of x is $W(x) := \sum_{i=0}^{r-1} W(x_i)$.
Note that in this definition, the weights W (xi) and W (x) do not only depend on the inputs,
but also on the input arrival times.
Remark 7. By Definition 6, we have W (xi) ≥ 1 for any input xi with a(xi) ≥ 0.
The definition of the weight is motivated by the fact that for symmetric binary functions, i.e.,
And trees or Or trees, the optimum delay achievable for any realization can be derived from
the weight of the inputs directly.
Remark 8. A binary tree on inputs $x = (x_0, \dots, x_{r-1})$ with weight W(x) can be realized with
delay d if and only if Kraft's inequality $W(x) \le 2^d$ is fulfilled [12], see also Golumbic [5]. A
delay-optimum tree can be computed in runtime $O(r \log_2 r)$ via a greedy algorithm based on
Huffman coding [8]. A short proof of this can be found in Werber [19].
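The greedy construction mentioned in Remark 8 can be sketched with a priority queue. This is an illustrative sketch under the assumption that the greedy rule is "repeatedly combine the two signals that are available earliest"; the function names are hypothetical, and only the resulting delay is computed, not the tree itself.

```python
import heapq

def greedy_tree_delay(arrival_times):
    """Delay of the tree built by repeatedly joining the two earliest signals."""
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)   # one two-input gate on top of a and b
    return heap[0]

def kraft_delay(arrival_times):
    """Smallest natural number d with sum_i 2^a(x_i) <= 2^d."""
    weight = sum(2 ** a for a in arrival_times)
    return (weight - 1).bit_length()          # equals ceil(log2(weight)) for weight >= 1
```

On small instances, `greedy_tree_delay` and `kraft_delay` return the same value, in line with the statement that delay d is achievable exactly when $W(x) \le 2^d$.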
Note that when t has at most 2 entries, f(s, t) is a symmetric binary And tree.
Lemma 9. For Boolean input variables s and t with m ≤ 2 and arrival times a, we can realize
f(s, t) with delay d if and only if $W(s) + W(t) \le 2^d$.
For the case that t has more than 2 entries, we give an upper bound on the delay of f(s, t)
in Section 3.
3 Bounding the Weight for Given Delay
We will prove an upper bound on the delay of And-Or paths by a reverse argument similarly
as in Grinchuk’s proof for the case of uniform input arrival times [6]. Grinchuk fixes a depth
bound d and the number n of symmetric inputs s, and determines how many alternating inputs
t an And-Or path may have such that f(s, t) can be realized with depth d. Similarly, given
symmetric inputs s with a fixed weight w and a fixed delay bound d, we will determine for which
alternating inputs t a realization for f(s, t) with delay d can be guaranteed. Since it is difficult
to classify these t exactly, we distinguish different alternating inputs t by their weight only.
Definition 10. Let ζ := 1.9 be a fixed constant.
We aim at proving the following statement:
Theorem 11. Let $d, w \in \mathbb{N}$ with $d > 1$ and $0 \le w < 2^{d-1}$ be given. Consider Boolean input
variables s and t with W(s) = w and
$$W(t) \le \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} .$$
Then, there is a circuit realizing f(s, t) with delay at most d.
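The weight budget of Theorem 11 is easy to tabulate. The sketch below is illustrative only: `weight_budget` and `smallest_delay` are hypothetical helper names, and `smallest_delay` merely searches for the first d at which the hypothesis of Theorem 11 is met; it is not the derivation of the final delay bound.

```python
import math

ZETA = 1.9  # the constant fixed in Definition 10

def weight_budget(d, w):
    """Right-hand side of Theorem 11: zeta * (2^(d-1) - w) / (d * log2(d))."""
    return ZETA * (2 ** (d - 1) - w) / (d * math.log2(d))

def smallest_delay(weight_t, w=0):
    """First d > 1 with w < 2^(d-1) and W(t) <= weight_budget(d, w)."""
    d = 2
    while w >= 2 ** (d - 1) or weight_t > weight_budget(d, w):
        d += 1
    return d
```

For instance, `weight_budget(4, 0)` equals 1.9, and three alternating inputs with arrival time 0 (weight 3, no symmetric inputs) first satisfy the hypothesis at d = 6.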
Remark 12. In Section 4, Theorem 11 will allow us to deduce the desired upper bound of
log2W +log2 log2m+log2 log2 log2m+4.3 on the delay of And-Or paths. Note that the choice
of the constant ζ influences the additive constant (here 4.3) in the delay bound. If we chose
ζ := 1, we would need to replace the additive constant by 5. An improvement of the additive
term to 4.2 would only be possible for ζ ≥ 1.992, for which we cannot prove Theorem 11.
Most parts of the proof of Theorem 11 will work for any ζ with 1 ≤ ζ < 2; only at the end of
the proof of Lemma 23 we demand ζ ≤ 1.9. A slightly larger choice of ζ would be possible, but
using ζ = 1.9 keeps calculations simpler.
For proving Theorem 11, we would like to proceed by induction in d making use of the
restructuring formulas presented in Section 2.2. The following remark explains why the inductive
step from d to d+ 1 does not work directly.
Remark 13. Assume that Theorem 11 holds for some $d > 1$ and all $0 \le w < 2^{d-1}$. In
order to prove Theorem 11 for $d + 1$, we need to show that, given input variables s and t with
$0 \le w := W(s) < 2^d$ and $W(t) \le \zeta \frac{2^d - w}{(d+1)\log_2(d+1)}$, f(s, t) can be realized with delay $d + 1$. In
the case that $w < 2^{d-1}$, we would like to apply the alternating split $f(s, t) = f(s, t') \land f^*(\hat{t}', t'')$
given by equation (5). If we choose the prefix t′ of t such that
$$W(t') \le \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} \tag{8}$$
(whenever this is possible), the induction hypothesis and the assumption $w < 2^{d-1}$ allow us to
construct a circuit for f(s, t′) with delay d. Thus, in order to construct a circuit with delay $d + 1$
for f(s, t), it remains to prove that $f^*(\hat{t}', t'')$ admits a circuit with delay d. Again, by induction
hypothesis, we need to show that $W(t'') \le \zeta \frac{2^{d-1} - W(\hat{t}')}{d \log_2(d)}$. But the only thing we know about
W(t′′) is that
$$W(t'') = W(t) - W(t') . \tag{9}$$
Even if we choose the prefix t′ maximal with (8), this will not give us a meaningful upper
bound on W(t′′) since W(t′) might be arbitrarily small in comparison to W(t).
Note that this is what distinguishes our proof from Grinchuk's [6]: For arrival times all 0,
choosing t′ maximal with (8) works well, since then, by maximality, we have
$W(t') > \zeta \frac{2^{d-1} - w}{d \log_2(d)} - 2$,
hence equation (9) yields $W(t'') = W(t) - W(t') < W(t) - \zeta \frac{2^{d-1} - w}{d \log_2(d)} + 2$. It turns out that this
upper bound on W(t′′) suffices to prove that $f^*(\hat{t}', t'')$ can be realized with delay d.
When arbitrary arrival times are present, a different proof idea is needed.
Thus, instead of proving Theorem 11 via induction on d, we strengthen the induction hy-
pothesis and prove the stronger Theorem 15.
Definition 14. Let $m \in \mathbb{N}$ with $m > 0$. For inputs $t = (t_0, \dots, t_{m-1})$ with arrival times a, we
denote by $\Lambda_t$ the weight of the last two (or fewer) entries of t, i.e.,
$$\Lambda_t := \begin{cases} W(t_0), & m = 1 , \\ W(t_{m-2}) + W(t_{m-1}), & m > 1 . \end{cases}$$
Theorem 15. Let $d, w \in \mathbb{N}$ with $d > 1$ and $0 \le w < 2^{d-1}$ be given. Consider Boolean input
variables s and t with W(s) = w and
$$W(t) \le \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t . \tag{10}$$
Then, there is a circuit realizing f(s, t) with delay at most d.
Note that since Λt ≥ 0, Theorem 15 implies Theorem 11. The proof of Theorem 15 is the
most important part of this paper and covers the rest of this section. First, we observe two ways
to express requirement (10) differently.
Remark 16. Assuming the conditions of Theorem 15, the following statements are equivalent
to requirement (10):
$$\sum_{i=0}^{m-3} W(t_i) + \frac{\Lambda_t}{d} \le \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} \tag{11}$$
$$d \cdot \sum_{i=0}^{m-3} W(t_i) + \Lambda_t \le \zeta \, \frac{2^{d-1} - w}{\log_2(d)} \tag{12}$$
Statement (11) can be obtained from requirement (10) by subtracting $\frac{d-1}{d} \Lambda_t$. From this,
multiplication with d yields statement (12).
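The equivalence of (10), (11) and (12) can be made tangible by computing the slack of each requirement, i.e., right-hand side minus left-hand side. The sketch below is illustrative; `requirement_slacks` is a hypothetical helper, and arrival times of t are passed as a non-empty list of natural numbers.

```python
import math

def requirement_slacks(arrivals_t, d, w, zeta=1.9):
    """Slack (right-hand side minus left-hand side) of (10), (11) and (12)."""
    weights = [2 ** a for a in arrivals_t]
    lam = sum(weights[-2:])      # Lambda_t: weight of the last two (or fewer) entries
    head = sum(weights[:-2])     # sum of W(t_i) for i = 0, ..., m-3
    budget = zeta * (2 ** (d - 1) - w) / (d * math.log2(d))
    slack10 = budget + (d - 1) / d * lam - (head + lam)
    slack11 = budget - (head + lam / d)
    slack12 = d * budget - (d * head + lam)
    return slack10, slack11, slack12
```

Up to floating-point rounding, the first two slacks coincide and the third is d times the second, so all three requirements hold or fail together.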
Next, we give an upper bound on the weight W (t) + w.
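Lemma 17. Assuming the conditions of Theorem 15, we have
$$W(t) + w \le \begin{cases} 2^{d-1} & \text{if } d \ge 2^{\zeta} , \\ \dfrac{2^d}{\log_2(d)} & \text{otherwise} . \end{cases} \tag{13}$$
Proof. Using inequation (12), we obtain
$$W(t) + w \overset{(12)}{\le} \zeta \, \frac{2^{d-1} - w}{\log_2(d)} + w = \frac{\zeta 2^{d-1} + (\log_2(d) - \zeta) w}{\log_2(d)} . \tag{14}$$
If $d \ge 2^{\zeta}$, the condition $w < 2^{d-1}$ yields
$$W(t) + w \overset{(14)}{\le} \frac{\zeta 2^{d-1} + (\log_2(d) - \zeta) w}{\log_2(d)} \le \frac{2^{d-1} (\zeta + \log_2(d) - \zeta)}{\log_2(d)} = 2^{d-1} .$$
Otherwise, if $d < 2^{\zeta}$, the condition $w \ge 0$ yields
$$W(t) + w \overset{(14)}{\le} \frac{\zeta 2^{d-1} + (\log_2(d) - \zeta) w}{\log_2(d)} \le \frac{\zeta 2^{d-1}}{\log_2(d)} \overset{\zeta < 2}{<} \frac{2^d}{\log_2(d)} .$$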
The equivalent requirements (10), (11) and (12) as well as Lemma 17 will be used extensively
when proving Theorem 15. For this proof, we proceed by induction on d. In Lemma 18, we will
show as a base case that Theorem 15 holds for d ≤ 3. Then, in Theorem 19, we will prove the
inductive step: Assuming that Theorem 15 holds for some $d \ge 3$ and all $0 \le w < 2^{d-1}$, we will
prove the statement for $d + 1$ and all $0 \le w < 2^d$.
Lemma 18. Assuming the conditions of Theorem 15 for d = 2, 3, we can realize f(s, t) with
delay d.
Proof. First assume that m ≤ 2. Recall that in this case, f(s, t) is a symmetric binary tree. By
inequation (13), we know that $W(t) + w \le 2^d$. Hence, by Remark 8, we can realize f(s, t) with
delay d using Huffman coding.
Now let m ≥ 3. Requirement (11) yields
$$1 + \frac{2}{d} \overset{m \ge 3,\ \text{Rem. 7}}{\le} \sum_{i=0}^{m-3} W(t_i) + \frac{\Lambda_t}{d} \overset{(11)}{\le} \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} \overset{\zeta < 2}{<} \frac{2^d - 2w}{d \log_2(d)} . \tag{15}$$
For d = 2, this leads to the contradiction
$$2 = 1 + \frac{2}{2} \overset{(15)}{<} \frac{4 - 2w}{2 \cdot 1} \overset{w \ge 0}{\le} 2 ,$$
i.e., for d = 2, we always have m ≤ 2 and have already proven the required statement.
Similarly, if d = 3, we obtain
$$\frac{5}{3} = 1 + \frac{2}{3} \overset{(15)}{<} \frac{8 - 2w}{3 \cdot \log_2(3)} < \begin{cases} 2 & w = 0 \\ \frac{4}{3} & w \ge 1 \end{cases} .$$
In the case that w ≥ 1, this is a contradiction; and for w = 0, the only remaining case is m = 3
with $a(t_0) = a(t_1) = a(t_2) = 0$, for which $f(s, t) = t_0 \land (t_1 \lor t_2)$ can obviously be constructed with delay
2 < 3.
Theorem 19. Assume inductively that for some $d \ge 3$ and all $0 \le w < 2^{d-1}$, Theorem 15 holds.
Then, for inputs s and t with $w := W(s)$ such that $0 \le w < 2^d$ and
$$W(t) \le \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} + \frac{d}{d+1} \Lambda_t , \tag{16}$$
we can realize f(s, t) with delay $d + 1$.
As a sub-calculation for the proof of this theorem, we need the following lemma.
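Lemma 20. In the situation of Theorem 19, we have
$$\zeta \, \frac{2^{d-1}}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} - \frac{d}{d+1} \Lambda_t
\ge \zeta \, \frac{2^{d-1} \log_2(d+1) - (2^d - w) \log_2(d)}{d \log_2(d) \log_2(d+1)} .$$
Proof. Using the bound $\Lambda_t \le \zeta \frac{2^d - w}{\log_2(d+1)}$ implied by inequation (12) (applied with $d + 1$ in place of d), we calculate
$$\begin{aligned}
&\zeta \, \frac{2^{d-1}}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} - \frac{d}{d+1} \Lambda_t \\
&\quad = \zeta \, \frac{2^{d-1} (d+1) \log_2(d+1) - (2^d - w) d \log_2(d)}{d (d+1) \log_2(d) \log_2(d+1)} - \frac{1}{d(d+1)} \Lambda_t \\
&\quad \overset{(12)}{\ge} \zeta \, \frac{2^{d-1} (d+1) \log_2(d+1) - (2^d - w) d \log_2(d)}{d (d+1) \log_2(d) \log_2(d+1)} - \frac{1}{d(d+1)} \, \zeta \, \frac{2^d - w}{\log_2(d+1)} \\
&\quad = \zeta \, \frac{2^{d-1} \log_2(d+1) - (2^d - w) \log_2(d)}{d \log_2(d) \log_2(d+1)} .
\end{aligned}$$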
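The algebra in Lemma 20 can be sanity-checked numerically. The sketch below is illustrative and uses hypothetical helper names; it evaluates both sides of the inequality and the largest value of $\Lambda_t$ permitted by (12) with $d + 1$ in place of d, at which the two sides coincide.

```python
import math

ZETA = 1.9  # the constant fixed in Definition 10

def lemma20_lhs(d, w, lam):
    """Left-hand side of Lemma 20 for Lambda_t = lam."""
    log_d, log_d1 = math.log2(d), math.log2(d + 1)
    return (ZETA * 2 ** (d - 1) / (d * log_d) + (d - 1) / d * lam
            - ZETA * (2 ** d - w) / ((d + 1) * log_d1) - d / (d + 1) * lam)

def lemma20_rhs(d, w):
    """Right-hand side of Lemma 20."""
    log_d, log_d1 = math.log2(d), math.log2(d + 1)
    return ZETA * (2 ** (d - 1) * log_d1 - (2 ** d - w) * log_d) / (d * log_d * log_d1)

def max_lambda(d, w):
    """Largest Lambda_t allowed by (12) with d + 1 in place of d."""
    return ZETA * (2 ** d - w) / math.log2(d + 1)
```

The difference of the two sides equals `(max_lambda(d, w) - lam) / (d * (d + 1))`, which is non-negative for every admissible `lam`.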
This is the only ingredient needed to prove Theorem 19 for the case that $2^{d-1} \le w < 2^d$.
Lemma 21. Theorem 19 holds for all w satisfying $2^{d-1} \le w < 2^d$.
Proof. The symmetric split (3) yields the realization
$$f(s, t) = (s_0 \land \dots \land s_{n-1}) \land g(t) . \tag{17}$$
Since $w < 2^d$, Remark 8 allows the construction of a symmetric tree on inputs s with delay d.
In order to show that f((), t) = g(t) can be realized with delay d, by induction hypothesis, it
suffices to show the second inequality in
$$W(t) \overset{(16)}{\le} \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} + \frac{d}{d+1} \Lambda_t \le \zeta \, \frac{2^{d-1}}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t .$$
Subtracting the left-hand side from the right-hand side, we prove this via
$$\begin{aligned}
&\zeta \, \frac{2^{d-1}}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} - \frac{d}{d+1} \Lambda_t \\
&\quad \overset{\text{Lem. 20}}{\ge} \zeta \, \frac{2^{d-1} \log_2(d+1) - (2^d - w) \log_2(d)}{d \log_2(d) \log_2(d+1)} \\
&\quad \overset{w \ge 2^{d-1}}{\ge} \zeta \, \frac{2^{d-1} (\log_2(d+1) - \log_2(d))}{d \log_2(d) \log_2(d+1)} \ge 0 .
\end{aligned}$$
Hence, applying the symmetric split (17) yields a realization for f(s, t) with delay d + 1.
In the case $0 \le w < 2^{d-1}$, we need a bound on the logarithm of consecutive integers.
Remark 22. For $d \ge 3$, we have $d \ge \ln(2)(d+1)$ and thus
$$\log_2(d+1) - \log_2(d) = \frac{\ln(d+1) - \ln(d)}{\ln(2)} = \int_d^{d+1} \frac{1}{\ln(2)\, x} \, dx \ge \frac{1}{\ln(2)(d+1)} \ge \frac{1}{d} . \tag{18}$$
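Inequality (18) is simple to confirm numerically for a range of d. The following sketch is illustrative only; the function name `remark22_chain` is hypothetical, and it returns the three quantities of the chain so that their order can be inspected.

```python
import math

def remark22_chain(d):
    """The three terms of (18): log2(d+1) - log2(d), 1/(ln(2)(d+1)) and 1/d."""
    difference = math.log2(d + 1) - math.log2(d)
    integral_bound = 1 / (math.log(2) * (d + 1))
    return difference, integral_bound, 1 / d
```

For d = 3, the chain evaluates to roughly 0.415, 0.361 and 0.333, in decreasing order as claimed.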
Now we will prove Theorem 19 for the case that $0 \le w < 2^{d-1}$.
Lemma 23. Theorem 19 holds for each w satisfying $0 \le w < 2^{d-1}$.
Proof. We prove this lemma via a case distinction. In Case 2, we will consider a prefix t′ of the
inputs t with weight at most $\zeta \frac{2^{d-1} - w}{d \log_2(d)}$ in order to proceed similarly as indicated in Remark 13.
If the weight of t0 is already larger than this, such a prefix does not exist. We deal with this
situation in Case 1.
Case 1: Assume that
$$W(t_0) > \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} . \tag{19}$$
The symmetric split (4) yields
$$f(s, t) = (s_0 \land \dots \land s_{n-1} \land t_0) \land g^*((t_1, t_2, \dots, t_{m-1})) . \tag{20}$$
Due to inequation (13) and $d + 1 \ge 4 > 2^{\zeta}$, we have $W(t_0) + w \le W(t) + w \le 2^d$. Hence, by
Remark 8, we can realize $s_0 \land \dots \land s_{n-1} \land t_0$ as a binary tree with delay d. Thus, we will check
inductively that $f^*((), (t_1, t_2, \dots, t_{m-1})) = g^*((t_1, t_2, \dots, t_{m-1}))$ can be realized with delay d.
Note that requirement (16) and condition (19) imply
$$W((t_1, t_2, \dots, t_{m-1})) < \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} + \frac{d}{d+1} \Lambda_t - \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} ,$$
which we claim to be at most $\zeta \frac{2^{d-1}}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t$. This can be shown by
$$\begin{aligned}
&\zeta \, \frac{2^{d-1}}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \zeta \, \frac{2^d - w}{(d+1) \log_2(d+1)} - \frac{d}{d+1} \Lambda_t + \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} \\
&\quad \overset{\text{Lem. 20}}{\ge} \zeta \, \frac{2^{d-1} \log_2(d+1) - (2^d - w) \log_2(d)}{d \log_2(d) \log_2(d+1)} + \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} \\
&\quad = \zeta \, \frac{(2^d - w)(\log_2(d+1) - \log_2(d))}{d \log_2(d) \log_2(d+1)} \overset{w < 2^d}{\ge} 0 .
\end{aligned}$$
Thus, realization (20) yields a delay of d + 1 for f(s, t), which proves the lemma for the case
that $W(t_0) > \zeta \frac{2^{d-1} - w}{d \log_2(d)}$.
Case 2: Assume that $W(t_0) \le \zeta \frac{2^{d-1} - w}{d \log_2(d)}$.
Therefore, we can consider a maximum odd-length prefix $t' = (t_0, t_1, \dots, t_{2k})$ of t with
$0 \le k \le \frac{m-1}{2}$ such that
$$W(t') \le \zeta \, \frac{2^{d-1} - w}{d \log_2(d)} . \tag{21}$$
We define $t'' := (t_{2k+1}, \dots, t_{m-1})$.
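Choosing the maximum odd-length prefix in Case 2 amounts to a single scan over the inputs, because all weights are positive and prefix weights therefore only grow. The sketch below is illustrative; `max_odd_prefix` is a hypothetical helper that takes the arrival times of t and the right-hand side of (21) as `budget`.

```python
def max_odd_prefix(arrivals_t, budget):
    """Largest k with W((t_0, ..., t_2k)) <= budget, or None if even t_0 is too heavy."""
    best_k, weight = None, 0
    for i, a in enumerate(arrivals_t):
        weight += 2 ** a                 # weight of the prefix (t_0, ..., t_i)
        if weight > budget:
            break                        # longer prefixes can only be heavier
        if i % 2 == 0:                   # prefix ends at an even index i = 2k
            best_k = i // 2
    return best_k
```

A return value of `None` corresponds to Case 1 of the proof, where already $W(t_0)$ exceeds the budget; otherwise t′ consists of the first `2 * k + 1` inputs and t′′ of the rest.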
If t′′ is empty, there is nothing to show since, by induction hypothesis, we can construct
f(s, t) = f(s, t′) with a delay of d < d + 1 due to $w < 2^{d-1}$. Otherwise, we will realize f(s, t)
with delay d + 1 using the alternating split (5) for some prefix t∗ of t to be determined, i.e.,
$$f(s, t) = f(s, t^*) \land f^*(\hat{t}^*, t^{**}) , \tag{22}$$
where $t^* = (t_0, t_1, \dots, t_{2l})$ for some $0 \le l < \frac{m-1}{2}$ and $t^{**} := (t_{2l+1}, \dots, t_{m-1})$. Our main
argument, which is presented in Case 2 (ii), requires that $\{t_{2k+1}, t_{2k+2}\} \cap \{t_{m-2}, t_{m-1}\} = \emptyset$, i.e.,
that t′′ has at least 4 elements. Thus, in Case 2 (i), we treat the remaining instances, in which t′′ has at most
3 elements.
Case 2 (i): Assume that t′′ consists of at most 3 elements.
We set t∗ := t′, thus t∗∗ = t′′. By induction hypothesis and due to w < 2d−1, inequation (21)
allows realizing f(s, t′) with delay d. Hence, we have to show that f∗(t̂′, t′′) can be realized with
delay d.
If t′′ has at most 2 elements, by Remark 8, we can realize f∗(t̂′, t′′) as a binary tree with delay
d since W (t̂′) +W (t′′) ≤W (t), which is at most 2d due to inequation (13) and d+ 1 ≥ 4 > 2ζ .
If t′′ contains exactly 3 elements, we can show that the realization f∗(t̂′, t′′) = (t̂′ ∨ t2k+1) ∨
(t2k+2 ∧ t2k+3) yields delay d using Remark 8: Since the last two elements of t are not among t̂′
and t2k+1, we have
W (t̂′) +W (t2k+1)
(11)
≤ ζ 2
d − w
(d+ 1) log2(d+ 1)
ζ<2,w≥0
<
2d+1
(d+ 1) log2(d+ 1)
d≥2
≤ 2d−1 .
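It remains to show that $W(t_{2k+2}) + W(t_{2k+3}) \le 2^{d-1}$. Since t contains at least 2 elements apart
from $t_{2k+2}$ and $t_{2k+3}$, the weight $\Lambda_t = W(t_{2k+2}) + W(t_{2k+3})$ can be bounded by
\[ \Lambda_t \overset{(12),\ \text{Rem. 7}}{\le} \zeta \frac{2^{d} - w}{\log_2(d+1)} - 2(d+1) \overset{\zeta<2,\, w\ge 0}{<} \frac{2^{d+1}}{\log_2(d+1)} - 2(d+1) \overset{d\ge 1}{\le} 2^{d-1} . \]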
[Figure 3: Illustration of the choice of t∗ on inputs s0, s1, s2, t0, . . . , t9, marking the prefixes t′ and t̃, the suffix t∗∗ and the weights Λt̃ and Λt. (a) In the case that $W(\tilde t) \le \frac{2^{d-1}-w}{d \log_2(d)}$, we set t∗ := t̃. (b) In the case that $W(\tilde t) > \frac{2^{d-1}-w}{d \log_2(d)}$, we set t∗ := t′.]
Hence, when t′′ has at most 3 elements, f(s, t) can be realized with delay d+ 1 using split (22)
with t∗ := t′.
Case 2 (ii): Assume that t′′ contains at least 4 elements.
Note that the first two elements $t_{2k+1}$ and $t_{2k+2}$ of t′′ and the last two elements $t_{m-2}$ and
$t_{m-1}$ of t are disjoint sets. Set $\tilde t := (t_0, \dots, t_{2k+2})$. We need to find an appropriate prefix t∗
of t for realization (22) such that both f(s, t∗) and f∗(t̂∗, t∗∗) can be realized with delay d by
induction hypothesis. We choose t∗ depending on the weight of t˜:
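a. If $W(\tilde t) \le \zeta \frac{2^{d-1}-w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_{\tilde t}$, we set $t^{*} := \tilde t$.
b. If $W(\tilde t) > \zeta \frac{2^{d-1}-w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_{\tilde t}$, we set $t^{*} := t'$. Note that in this case, we in particular have
\[ W(t^{*}) = W(t') = W(\tilde t) - \Lambda_{\tilde t} > \zeta \frac{2^{d-1} - w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_{\tilde t} - \Lambda_{\tilde t} = \zeta \frac{2^{d-1} - w}{d \log_2(d)} - \frac{1}{d} \Lambda_{\tilde t} . \]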
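Figure 3 visualizes the case distinction. In either case, the weight of t∗ will be of the form
\[ W(t^{*}) = \zeta \frac{2^{d-1} - w}{d \log_2(d)} + \delta \quad \text{with} \quad -\frac{1}{d} \Lambda_{\tilde t} \le \delta \le \frac{d-1}{d} \Lambda_{\tilde t} . \tag{23} \]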
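The case distinction a/b can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `choose_split_prefix` and its arguments are hypothetical, `threshold` stands for the value $\zeta \frac{2^{d-1}-w}{d \log_2(d)}$ computed by the caller, and `w_next_two` holds the weights of $t_{2k+1}$ and $t_{2k+2}$.

```python
def choose_split_prefix(w_t_prime, w_next_two, threshold, d):
    """Pick t* as in cases a and b (hypothetical helper, not from the paper).

    w_t_prime  -- W(t'), weight of the maximum odd-length prefix
    w_next_two -- weights of the two inputs following t'
    threshold  -- zeta * (2**(d-1) - w) / (d * log2(d)), supplied by the caller
    Returns (choice, W(t*), delta) with delta = W(t*) - threshold.
    """
    lam_tilde = sum(w_next_two)           # Lambda of t~: its last two inputs
    w_t_tilde = w_t_prime + lam_tilde     # W(t~) = W(t') + Lambda_{t~}
    if w_t_tilde <= threshold + (d - 1) / d * lam_tilde:
        choice, w_star = "t_tilde", w_t_tilde   # case a: extend the prefix
    else:
        choice, w_star = "t_prime", w_t_prime   # case b: keep t'
    return choice, w_star, w_star - threshold


# Two small instances with threshold 5 and d = 4.
case_a = choose_split_prefix(4, (1, 1), 5.0, 4)
case_b = choose_split_prefix(4, (4, 4), 5.0, 4)
```

In both sample instances the returned `delta` lies in the interval stated in (23).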
The upper bound on δ allows us to realize f(s, t∗) with delay d by induction hypothesis since
$w < 2^{d-1}$. It remains to show that f(t̂∗, t∗∗) can be realized with delay d. Note that since t∗
does not contain any of the last two elements of t, we have
\[ W(\widehat{t^{*}}) \overset{(11)}{\le} \zeta \frac{2^{d} - w}{(d+1) \log_2(d+1)} \overset{w\ge 0}{\le} \zeta \frac{2^{d}}{(d+1) \log_2(d+1)} \overset{\zeta<2}{<} 2^{d-1} \]
for d ≥ 2 and thus, by induction hypothesis, it suffices to prove that
\[ W(t^{**}) \le \zeta \frac{2^{d-1} - W(\widehat{t^{*}})}{d \log_2(d)} + \frac{d-1}{d} \Lambda_{t^{**}} . \tag{24} \]
Due to requirement (16), we have
\[ W(t^{**}) = W(t) - W(t^{*}) \le \zeta \frac{2^{d} - w}{(d+1) \log_2(d+1)} + \frac{d}{d+1} \Lambda_t - W(t^{*}) . \]
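Since $W(\widehat{t^{*}}) \le W(t^{*})$ and $\Lambda_{t^{**}} = \Lambda_t$, inequation (24) is thus implied if we prove the following
claim.
Claim: We have
\[ \zeta \frac{2^{d-1} - W(t^{*})}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \zeta \frac{2^{d} - w}{(d+1) \log_2(d+1)} - \frac{d}{d+1} \Lambda_t + W(t^{*}) \ge 0 . \]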
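Proof of the claim: We first only bound the summands depending on $W(t^{*})$ or $\Lambda_t$.
\begin{align*}
& -\zeta \frac{W(t^{*})}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \frac{d}{d+1} \Lambda_t + W(t^{*}) \\
&= W(t^{*}) \left( 1 - \frac{\zeta}{d \log_2(d)} \right) - \frac{1}{d(d+1)} \Lambda_t \\
&\overset{(23)}{\ge} \left( \zeta \frac{2^{d-1} - w}{d \log_2(d)} - \frac{1}{d} \Lambda_{\tilde t} \right) \frac{d \log_2(d) - \zeta}{d \log_2(d)} - \frac{1}{d(d+1)} \Lambda_t \\
&= \zeta \frac{(2^{d-1} - w)(d \log_2(d) - \zeta)}{d^2 \log_2^2(d)} - \frac{d \log_2(d) - \zeta}{d^2 \log_2(d)} \Lambda_{\tilde t} - \frac{1}{d(d+1)} \Lambda_t \\
&= \zeta \frac{(2^{d-1} - w)(d \log_2(d) - \zeta)}{d^2 \log_2^2(d)} - \frac{1}{d} \left( \Lambda_{\tilde t} + \frac{\Lambda_t}{d+1} \right) + \frac{\zeta \Lambda_{\tilde t}}{d^2 \log_2(d)} \\
&\overset{(11)}{\ge} \zeta \frac{(2^{d-1} - w)(d \log_2(d) - \zeta)}{d^2 \log_2^2(d)} - \frac{1}{d} \left( \zeta \frac{2^{d} - w}{(d+1) \log_2(d+1)} - W(t') \right) + \frac{\zeta \Lambda_{\tilde t}}{d^2 \log_2(d)} \\
&\overset{\text{Rem. 7}}{\ge} \zeta \frac{(2^{d-1} - w)(d \log_2(d) - \zeta)}{d^2 \log_2^2(d)} - \zeta \frac{2^{d} - w}{d(d+1) \log_2(d+1)} + \frac{2\zeta + d \log_2(d)}{d^2 \log_2(d)} \tag{25}
\end{align*}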
Note that in the last two steps, we used that t2k+1, t2k+2, tm−2, tm−1 are four different inputs
which are not contained in t′ and that t′ is not empty. Based on inequation (25), the left-hand
side of the claim can be bounded from below by
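\begin{align*}
& \zeta \frac{2^{d-1} - W(t^{*})}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t - \zeta \frac{2^{d} - w}{(d+1) \log_2(d+1)} - \frac{d}{d+1} \Lambda_t + W(t^{*}) \\
&\overset{(25)}{\ge} \zeta \left( \frac{(2^{d-1} - w)(d \log_2(d) - \zeta)}{d^2 \log_2^2(d)} - \frac{2^{d} - w}{d(d+1) \log_2(d+1)} + \frac{2\zeta + d \log_2(d)}{\zeta d^2 \log_2(d)} + \frac{2^{d-1}}{d \log_2(d)} - \frac{2^{d} - w}{(d+1) \log_2(d+1)} \right) \\
&= \zeta \left( \frac{2^{d-1} - w + 2^{d-1}}{d \log_2(d)} - \zeta \frac{2^{d-1} - w}{d^2 \log_2^2(d)} - \frac{(2^{d} - w)(d+1)}{d(d+1) \log_2(d+1)} + \frac{2\zeta + d \log_2(d)}{\zeta d^2 \log_2(d)} \right) \\
&= \zeta \left( \frac{2^{d} - w}{d \log_2(d)} - \zeta \frac{2^{d-1} - w}{d^2 \log_2^2(d)} - \frac{2^{d} - w}{d \log_2(d+1)} + \frac{2\zeta + d \log_2(d)}{\zeta d^2 \log_2(d)} \right) \\
&= \frac{\zeta}{d^2 \log_2^2(d) \log_2(d+1)} \Bigl( \log_2(d+1) \bigl( (2^{d} - w) d \log_2(d) - \zeta (2^{d-1} - w) \bigr) \\
&\qquad - (2^{d} - w) d \log_2^2(d) + \Bigl( 2 + \frac{1}{\zeta} d \log_2(d) \Bigr) \log_2(d) \log_2(d+1) \Bigr) ,
\end{align*}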
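which is required to be non-negative. After multiplying with the denominator and dividing by
ζ, we apply the bound $\log_2(d+1) \ge \log_2(d) + \frac{1}{d}$ stated in Remark 22, and thus can prove the
claim if we show that
\begin{align*}
& \log_2(d+1) \bigl( (2^{d} - w) d \log_2(d) - \zeta (2^{d-1} - w) \bigr) - (2^{d} - w) d \log_2^2(d) + \Bigl( 2 + \frac{1}{\zeta} d \log_2(d) \Bigr) \log_2(d) \log_2(d+1) \\
&\overset{\zeta<2,\,(18)}{\ge} (2^{d} - w) \log_2(d) - \Bigl( \log_2(d) + \frac{1}{d} \Bigr) \zeta (2^{d-1} - w) + \Bigl( 2 + \frac{1}{\zeta} d \log_2(d) \Bigr) \log_2(d) \log_2(d+1) \\
&\overset{\zeta\ge 1,\, w\ge 0}{\ge} 2^{d} \log_2(d) - \Bigl( \log_2(d) + \frac{1}{d} \Bigr) \zeta 2^{d-1} + \Bigl( 2 + \frac{1}{\zeta} d \log_2(d) \Bigr) \log_2(d) \log_2(d+1)
\end{align*}
is at least 0. Note that for d ≥ 7, this is already implied by
\[ 2^{d} \log_2(d) - \Bigl( \log_2(d) + \frac{1}{d} \Bigr) \zeta 2^{d-1} \overset{\zeta=1.9}{=} 2^{d-1} \Bigl( 0.1 \log_2(d) - \frac{1.9}{d} \Bigr) \overset{d\ge 7}{\ge} 0 . \]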
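For 3 ≤ d ≤ 6, we have
\begin{align*}
& 2^{d} \log_2(d) - \Bigl( \log_2(d) + \frac{1}{d} \Bigr) \zeta 2^{d-1} + \Bigl( 2 + \frac{1}{\zeta} d \log_2(d) \Bigr) \log_2(d) \log_2(d+1) \\
&\overset{\zeta=1.9}{=} \log_2(d) \Bigl( 0.1 \cdot 2^{d-1} + \Bigl( 2 + \frac{1}{1.9} d \log_2(d) \Bigr) \log_2(d+1) \Bigr) - \frac{1.9}{d} 2^{d-1} \\
&\overset{d\ge 3}{\ge} \log_2(3) \Bigl( 0.1 \cdot 2^{2} + \Bigl( 2 + \frac{1}{1.9} \cdot 3 \log_2(3) \Bigr) \cdot 2 \Bigr) - \frac{1.9}{d} 2^{d-1} \\
&> 14 - \frac{1.9}{d} 2^{d-1} \\
&\overset{d\le 6}{\ge} 14 - \frac{1.9}{6} \cdot 32 \\
&> 0 .
\end{align*}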
This proves the claim. Note that this is the only place where we used the definition ζ = 1.9.
Thus, by induction hypothesis, we can find a realization with delay d for f(t̂∗, t∗∗). Split (22)
hence also provides a realization with delay d + 1 for f(s, t) if t′′ contains at least 4 elements.
This concludes the proof.
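The two numerical facts used at the end of the proof of the claim can be spot-checked with a short script. The sketch below is not part of the proof; the function names are hypothetical, and floating-point evaluation is assumed to be accurate enough for these magnitudes.

```python
import math

ZETA = 1.9  # the constant zeta fixed in the paper


def claim_lower_bound(d):
    """Last expression of the chain: it has to be non-negative."""
    lg = math.log2(d)
    return (2**d * lg
            - (lg + 1 / d) * ZETA * 2**(d - 1)
            + (2 + d * lg / ZETA) * lg * math.log2(d + 1))


def large_d_term(d):
    """Bracket 0.1*log2(d) - 1.9/d that settles the case d >= 7."""
    return 0.1 * math.log2(d) - 1.9 / d


small_cases = {d: claim_lower_bound(d) for d in range(3, 7)}
large_cases = {d: large_d_term(d) for d in range(7, 65)}
```

All values in `small_cases` are positive and all values in `large_cases` are non-negative, whereas `large_d_term(6)` is negative, which is why the range 3 ≤ d ≤ 6 needs the separate estimate.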
Proof of Theorem 19. Lemma 23 proves the theorem in the case that $0 \le w < 2^{d-1}$, while
Lemma 21 proves it for the remaining case that $2^{d-1} \le w < 2^{d}$.
Finally, we can prove Theorem 15.
Proof of Theorem 15. We prove the theorem by induction on d. For d ≤ 3, Lemma 18 provides
a realization of f(s, t) with delay d. Now we can assume that the theorem holds for some d ≥ 3,
and prove the inductive step via Theorem 19.
4 Constructing Fast Circuits
Based on Theorem 11, we could now show that there is a circuit realizing the And-Or path
$t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots ))$ with delay at most $\log_2 W + \log_2 \log_2 W + \log_2 \log_2 \log_2 W + 5$. Instead,
we will prove a stronger result: By modifying the instance, we can diminish the dependency on
W . The modification is based on the observation that we can round up small arrival times to
the same value without losing too much for the maximum delay. Moreover, shifting all arrival
times by some number does not change the problem. Both modifications allow us to reduce the
problem to instances with a total arrival time weight of at most 2m.
Theorem 24. Let m ∈ N with m ≥ 3, Boolean variables $t_0, \dots, t_{m-1}$ and arrival times
$a : \{t_0, \dots, t_{m-1}\} \to \mathbb{N}$ be given, and define $W := \sum_{i=0}^{m-1} 2^{a(t_i)}$. There is a circuit realizing the
And-Or path $t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots ))$ with delay at most
\[ \log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4.3 . \]
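Proof. We compute new arrival times $\tilde a : \{t_0, \dots, t_{m-1}\} \to \mathbb{N}$ by setting
\[ \tilde a(t_i) := \max\{0,\ a(t_i) - \lceil \log_2 W - \log_2 m \rceil\} \]
for all i ∈ {0, . . . , m − 1}. We define $\tilde W := \sum_{i=0}^{m-1} 2^{\tilde a(t_i)}$ and partition the input indices into
$I_1 := \{i \in \{0, \dots, m-1\} \mid \tilde a(t_i) = 0\}$ and $I_2 := \{0, \dots, m-1\} \setminus I_1$. Then, we have
\[ \tilde W = \sum_{i \in I_1} 2^{0} + 2^{-\lceil \log_2 W - \log_2 m \rceil} \sum_{i \in I_2} 2^{a(t_i)} \le m + \frac{2^{\log_2 m}}{2^{\log_2 W}} W = 2m . \]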
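The rounding step can be tried out on a concrete instance. The following sketch is illustrative only: the function name `round_arrival_times` is hypothetical, and the shift $\lceil \log_2 W - \log_2 m \rceil$ is computed with integer arithmetic as the smallest k ≥ 0 with $m \cdot 2^{k} \ge W$ (valid because $W \ge m$ when all arrival times are non-negative integers).

```python
def round_arrival_times(arrival):
    """Return (rounded arrival times, shift) for integer arrival times >= 0."""
    m = len(arrival)
    total_weight = sum(2**a for a in arrival)        # W
    shift = 0
    while (m << shift) < total_weight:               # smallest k with m*2^k >= W
        shift += 1
    rounded = [max(0, a - shift) for a in arrival]   # the modified times
    return rounded, shift


arrival = [0, 3, 5, 1]
rounded, shift = round_arrival_times(arrival)
rounded_weight = sum(2**a for a in rounded)          # the modified weight
```

For this instance W = 43 and m = 4, so the shift is 4 and the modified weight is 5, which is at most 2m = 8.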
Define $\tilde d := \lfloor \log_2 m + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 3.3 \rfloor$.
Claim: There is a circuit C realizing the And-Or path $t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots ))$ with arrival
times $\tilde a$ with delay at most $\tilde d$.
Proof of the claim: Let M := 500. If m < M, we have $1.441 \log_2 \tilde W + 2.674 \le 1.441 \log_2(2m) +
2.674 = 1.441 \log_2 m + 4.115 \le \log_2 m + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 3.3$. Since the And-Or
path optimization method presented in [7] computes a circuit with delay at most $\lfloor 1.441 \log_2 \tilde W +
2.674 \rfloor$, this proves the claim for m < M.
Hence assume m ≥ M. For proving the claim, by Theorem 11, it is sufficient to show
\[ 2m \le \zeta \frac{2^{\tilde d - 1}}{\tilde d \log_2 \tilde d} . \tag{26} \]
Note that the mapping $x \mapsto \frac{2^{x-1}}{x \log_2 x}$ is strictly increasing for x ≥ 2. Moreover, we have
$\tilde d \ge \log_2 m + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 2.3$. Since
\[ \log_2 m + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 2.3 \le 1.8 \log_2 m \tag{27} \]
for m ≥ M, inequation (26) is hence valid if
\[ 2m \le \zeta \frac{\frac{1}{2} \cdot m \cdot \log_2 m \cdot \log_2 \log_2 m \cdot 2^{2.3}}{1.8 \cdot \log_2 m \cdot \log_2(1.8 \cdot \log_2 m)} . \]
This is equivalent to
\[ 1.8 \cdot \log_2(1.8 \cdot \log_2 m) \le \zeta\, 2^{0.3} \cdot \log_2 \log_2 m, \tag{28} \]
which is true for m ≥ M since ζ = 1.9. This proves the claim.
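Inequations (27) and (28) are easy to spot-check numerically. The script below is a sanity check on a few sample values of m, not a proof; the function names are hypothetical and ζ = 1.9 is taken from the text.

```python
import math

ZETA = 1.9


def holds_27(m):
    """log2 m + log2 log2 m + log2 log2 log2 m + 2.3 <= 1.8 log2 m."""
    lg = math.log2(m)
    return lg + math.log2(lg) + math.log2(math.log2(lg)) + 2.3 <= 1.8 * lg


def holds_28(m):
    """1.8 log2(1.8 log2 m) <= zeta * 2^0.3 * log2 log2 m."""
    lg = math.log2(m)
    return 1.8 * math.log2(1.8 * lg) <= ZETA * 2**0.3 * math.log2(lg)


samples = [500, 1000, 10**4, 10**6, 2**64]
checks = [(holds_27(m), holds_28(m)) for m in samples]
```

Every pair in `checks` is `(True, True)`. At m = 256, inequation (27) fails, which illustrates why a threshold M on m is needed for this part of the argument.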
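Since we have $a(t_i) \le \tilde a(t_i) + \lceil \log_2 W - \log_2 m \rceil$ for all i ∈ {0, . . . , m − 1}, the circuit C has,
for the initial arrival times $a : \{t_0, \dots, t_{m-1}\} \to \mathbb{N}$, a delay of at most
\begin{align*}
& \lfloor \log_2 m + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 3.3 \rfloor + \lceil \log_2 W - \log_2 m \rceil \\
&\le \log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4.3 .
\end{align*}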
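To get a feeling for the gap between the bound of Theorem 24 and the lower bound $\lceil \log_2 W \rceil$ mentioned in the abstract, both can be evaluated for a given list of arrival times. This is a sketch with hypothetical function names; it only evaluates the two formulas and constructs no circuit.

```python
import math


def theorem24_delay_bound(arrival):
    """log2 W + log2 log2 m + log2 log2 log2 m + 4.3 for m >= 3 inputs."""
    m = len(arrival)
    if m < 3:
        raise ValueError("the bound is stated for m >= 3")
    total_weight = sum(2**a for a in arrival)
    lg_m = math.log2(m)
    return (math.log2(total_weight) + math.log2(lg_m)
            + math.log2(math.log2(lg_m)) + 4.3)


def delay_lower_bound(arrival):
    """ceil(log2 W), computed exactly on the integer weight W >= 1."""
    total_weight = sum(2**a for a in arrival)
    return (total_weight - 1).bit_length()


uniform = [0] * 8                        # eight inputs, all arriving at time 0
upper = theorem24_delay_bound(uniform)   # roughly 9.55
lower = delay_lower_bound(uniform)       # 3
```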
Remark 25. In the proof, we apply method [7] for small instances. Without this trick, we
would obtain a delay bound of $\log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 7$. Moreover, for
sufficiently large values of m, the delay bound in the previous theorem can be improved slightly to
$\log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4 + \varepsilon$ for any constant ε > 0: Note that the factor 1.8 in
inequation (27) can be decreased to a value arbitrarily close to 1 if m is sufficiently large. Thus,
also the factor $\zeta\, 2^{0.3}$ in inequation (28) becomes arbitrarily close to 1 for large values of m. This
leads to the stated delay bound.
The following theorem shows that the circuit described in Theorem 24 does not only exist,
but can also be computed efficiently.
Theorem 26. There is an algorithm that computes for given m ≥ 3 the circuit in Theorem 24
in time $O(m^2 \log_2 m)$.
Proof. As main subroutine, we will use Algorithm 1.
Claim: Given input variables $s = (s_0, \dots, s_{n-1})$ and $t = (t_0, \dots, t_{m-1})$ with arrival times
$a : \{t_0, \dots, t_{m-1}, s_0, \dots, s_{n-1}\} \to \mathbb{N}$, Algorithm 1 computes a Boolean circuit realizing f(s, t) with
delay at most d, where d is the smallest natural number with $w := W(s) < 2^{d-1}$ and
$W(t) \le \zeta \frac{2^{d-1}-w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t$. The number of computation steps of Algorithm 1 is
\[ O(m(m+n) \log_2(m+n) + m \log_2 \log_2(W')) , \]
where $W' = \sum_{i=0}^{m-1} 2^{a(t_i)} + \sum_{i=0}^{n-1} 2^{a(s_i)}$.
Proof of the claim: We apply the recursive approach described in Algorithm 1 which arises
from the proof of Theorem 15: In line 1, we compute the minimum d ∈ N such that
$W(t) \le \zeta \frac{2^{d-1}-w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t$. We have $d \in O(\log_2(W'))$, so d can be computed by binary search in
$O(\log_2 \log_2(W'))$ steps. Note that in line 1, we have $w < 2^{d-1}$ since otherwise, we would obtain
a contradiction to $\Lambda_t \le W(t)$ since
\[ W(t) \le \zeta \frac{2^{d-1} - w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t < \frac{d-1}{d} \Lambda_t < \Lambda_t . \]
Thus, Theorem 15 provides a circuit realizing f(s, t) with delay d. Lemma 18 computes this
realization if d ≤ 3 (see lines 4 to 7). For d > 3, Lemmata 21 (see lines 11 to 13) and 23 (see
lines 22 to 37) construct the circuit recursively. Hence, the claimed delay bound is fulfilled by
Theorem 15.
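Line 1 of Algorithm 1 can be illustrated by a direct search for the smallest admissible d. The sketch below makes several assumptions that are not in the text: the function name is hypothetical, the search starts at d = 2 because $d \log_2(d)$ vanishes at d = 1, and a plain linear scan replaces the binary search mentioned above for brevity.

```python
import math

ZETA = 1.9


def smallest_admissible_d(w, weight_t, lam_t, d_start=2):
    """Smallest d >= d_start with
    W(t) <= zeta*(2^(d-1) - w)/(d*log2(d)) + (d-1)/d * Lambda_t.
    """
    d = d_start
    while weight_t > (ZETA * (2**(d - 1) - w) / (d * math.log2(d))
                      + (d - 1) / d * lam_t):
        d += 1            # right-hand side grows without bound, so this ends
    return d


# Four inputs of weight 1 each, no symmetric inputs: W(t) = 4, Lambda_t = 2.
d_example = smallest_admissible_d(w=0, weight_t=4, lam_t=2)
```

For this small instance the scan stops at d = 5. The actual algorithm handles d ≤ 3 separately via Lemma 18.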
We prove the bound on the number of computation steps of Algorithm 1 by counting the
number of steps needed for a single call excluding the recursive calls (i.e., lines 12, 18, 27, 35)
and bounding the number of recursion steps.
Note that the number of recursive calls of the algorithm is bounded by 2m since in each of the
recursive calls in lines 18, 27, 35, the number m of alternating inputs decreases by 1, and since
in the only other recursive call in line 12, m remains the same, but in this case, we recursively
compute f((), t), thus m will decrease in the next recursive call.
Note that in each call of Algorithm 1, we compute at most one symmetric binary tree.
Since each symmetric tree we construct has at most m+ n inputs, due to Remark 8, this takes
O((m+ n) log2(m+ n)) steps per tree.
If we precompute the weight for each consecutive subset of t, computing the prefix t′ in line
22 (or finding out that no such prefix exists) requires O(log2m) steps using binary search.
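The precomputation and binary search described here can be sketched with prefix sums. The names below are hypothetical; the sketch assumes all weights are positive (they are powers of two), so the prefix sums are strictly increasing and `bisect` applies.

```python
from bisect import bisect_right
from itertools import accumulate


def prefix_weights(weights):
    """prefix[j] = W(t_0, ..., t_{j-1}), with prefix[0] = 0."""
    return [0] + list(accumulate(weights))


def max_odd_prefix_length(prefix, bound):
    """Length of the longest odd-length prefix of weight <= bound, or None."""
    j = bisect_right(prefix, bound) - 1   # largest j with prefix[j] <= bound
    if j % 2 == 0:
        j -= 1                            # fall back to the next odd length
    return j if j >= 1 else None


prefix = prefix_weights([1, 2, 4, 8, 16])   # [0, 1, 3, 7, 15, 31]
```

After the linear-time precomputation, each query costs $O(\log_2 m)$ comparisons. Weights of arbitrary consecutive subsets follow as differences of two prefix sums.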
Algorithm 1: Circuit Construction
Input: Inputs s = (s0, . . . , sn−1) and t = (t0, . . . , tm−1), arrival times a(si), a(ti) ∈ N.
Output: Circuit C computing f(s, t).
1 Set w := W(s). Choose d ∈ N minimum such that $W(t) \le \zeta \frac{2^{d-1}-w}{d \log_2(d)} + \frac{d-1}{d} \Lambda_t$.
2 if d ≤ 3 then
3 if m ≤ 2 then
4 Obtain a circuit for f(s, t) as a binary tree s0 ∧ . . . ∧ sn−1 ∧ t0 ∧ . . . ∧ tm−1.
5 end
6 else
7 Obtain a circuit for f(s, t) via f(s, t) = t0 ∧ (t1 ∨ t2).
8 end
9 end
10 else if $2^{d-2} \le w$ then
11 Construct a binary tree realizing s0 ∧ . . . ∧ sn−1.
12 Recursively realize f((), t).
13 Obtain a circuit for f(s, t) via f(s, t) = (s0 ∧ . . . ∧ sn−1) ∧ f((), t).
14 end
15 else
16 if $W(t_0) > \zeta \frac{2^{d-2}-w}{(d-1) \log_2(d-1)}$ then
17 Construct a binary symmetric tree realizing s0 ∧ . . . ∧ sn−1 ∧ t0.
18 Recursively realize f∗((), (t1, . . . , tm−1)).
19 Obtain a circuit for f(s, t) via
f(s, t) = (s0 ∧ . . . ∧ sn−1 ∧ t0) ∧ f∗((), (t1, t2, . . . , tm−1)).
20 end
21 else
22 Choose a maximum odd-length prefix t′ of t with $W(t') \le \zeta \frac{2^{d-2}-w}{(d-1) \log_2(d-1)}$.
23 Set t′′ := t\t′.
24 Set t˜ := t′ ∪ t′′0 ∪ t′′1 .
25 Set t∗ := t′ if t′′ contains at most 3 elements or $W(\tilde t) > \zeta \frac{2^{d-2}-w}{(d-1) \log_2(d-1)} + \frac{d-2}{d-1} \Lambda_{\tilde t}$, and t∗ := t̃ otherwise.
26 Set t∗∗ := t\t∗.
27 Recursively realize f(s, t∗).
28 if t′′ contains at most 2 elements then
29 Realize f∗(t̂∗, t∗∗) as a binary symmetric tree.
30 end
31 else if t′′ contains exactly 3 elements then
32 Directly construct a circuit realizing f∗(t̂′, t′′) = (t̂′ ∨ t′′0) ∨ (t′′1 ∧ t′′2).
33 end
34 else
35 Recursively realize f∗(t̂∗, t∗∗).
36 end
37 Obtain a circuit for f(s, t) via f(s, t) = f(s, t∗) ∧ f∗(t̂∗, t∗∗).
38 end
39 end
Apart from this, there are only constantly many steps in each recursive call.
Hence, the number of steps needed for each recursive call of Algorithm 1, excluding lines 12,
18, 27, and 35, is at most O((m+ n) log2(m+ n) + log2 log2(W ′)). Since there are at most 2m
recursive calls, we have at most O(m(m+n) log2(m+n)+m log2 log2(W ′)) steps in total, which
finishes the proof of the claim.
Now we can prove the theorem. We follow the proof of Theorem 24, also using its notation.
For m < M , we construct the circuit described in [7] such that nothing more is to show due
to the properties collected in Table 1. For m ≥ M, we compute the modified instance with
arrival times $\tilde a$ and weight $\tilde W$ in linear time. Then, we call Algorithm 1 with the modified arrival
times $\tilde a$. Since $\tilde W \in O(m)$, the sizes of all numbers occurring in the algorithm are polynomial
in m. Applying the claim with s = (), hence n = 0, and $W' = \tilde W$, we obtain a running time of
$O(m^2 \log_2(m))$.
Our main objective when designing good circuits for And-Or paths is delay. Still, there are
other metrics to be regarded during circuit construction such as the size, i.e., the total number
of gates used in the circuit, and maximum fanout, i.e., the maximum number of successors of
any input or gate.
Theorem 27. The circuit computed in Theorem 26 has size at most
10m log2m log2 log2m
and maximum fanout at most
log2m+ log2 log2m+ log2 log2 log2m+ 3.3 .
Proof. In order to prove the fanout bound, we show the following claim.
Claim: In the circuit computed by Algorithm 1, each gate has fanout exactly 1, each input in s
has fanout exactly 1 and each input in t has fanout at most d.
Proof of the claim: Note that each gate constructed has fanout 1. We prove the bound on the
maximum fanout of the inputs by induction on d.
Note that in the realizations computed by Lemma 18 which is used in lines 4 to 7, each input
has fanout 1. In the realizations provided in lines 13 and 19, inputs of s only occur in exactly
one binary tree and thus have a fanout of 1. Since we can inductively assume that f((), t) and
f((), (t1, . . . , tm−1)) fulfill the claimed fanout bound, respectively, each input in t has fanout at
most d− 1 < d.
In the realization in line 37, inductively, the circuit for f(s, t∗) computed in line 27 has fanout
at most 1 for inputs in s and fanout at most d − 1 for inputs in t∗. Since inputs of s do not
occur in f∗(t̂∗, t∗∗), it remains to show that inputs of t have fanout at most d in the realization
for f(s, t). If t′′ has at most 3 inputs, lines 29 and 32 show that each input of t has fanout at
most 1 in the realization of f∗(t̂∗, t∗∗), which proves the claimed fanout bounds. Otherwise, we
realize f∗(t̂∗, t∗∗) recursively in line 35 and inductively can assume that the inputs of t̂∗ have
fanout at most 1 and the inputs of t∗∗ fanout at most d − 1 in this realization. Together with
the recursive fanout bounds for the realization of f(s, t∗), this shows the claimed fanout bounds
for the circuit constructed for f(s, t). This proves the claim.
For the bound on the size, note that the combinatorial depth of the circuit is at most
$\log_2 m + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 3.3$. Since it consists of 2-input gates only, the number of gates is
bounded from above by $2^{3.3}\, m \log_2 m \log_2 \log_2 m < 10\, m \log_2 m \log_2 \log_2 m$.
Remark 28. For given m ≥ 3, with the additional use of buffers, the circuit constructed in
Theorem 26 can be transformed into a logically equivalent circuit with maximum fanout 2, but
with delay at most
\[ \log_2 W + 2 \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 6 \]
and size at most $O(m \log_2 m \log_2 \log_2 m)$.
To see this, first note that the circuit constructed in [7] which we use for small instances
already has a maximum fanout of 2. Secondly, note that the circuit C constructed in Theorem 26
has fanout larger than 1 only at the inputs. Write
f := log2m+ log2 log2m+ log2 log2 log2m+ 3.3
for the maximum possible fanout of C. For each input ti, we can replace the outgoing edges of
ti by a delay-optimum buffer tree with maximum fanout 2 for each buffer (compare Remark 8).
This increases the size by at most f − 1 and, since we can assume that m ≥ 500, the delay by at
most log2 f ≤ log2 log2m+ 1. This yields the stated properties of the transformed circuit.
5 More General Boolean Functions
In this section, we extend Theorem 24 from And-Or paths to similar functions that do not
alternate between ∧ and ∨ regularly, but arbitrarily.
Definition 29. We call a Boolean function of the form
\[ h(t, \circ_1, \dots, \circ_{m-1}) := t_0 \circ_1 \Bigl( t_1 \circ_2 \bigl( t_2 \circ_3 ( \dots \circ_{m-2} (t_{m-2} \circ_{m-1} t_{m-1}) \dots ) \bigr) \Bigr) , \]
where $t = (t_0, \dots, t_{m-1})$ are Boolean input variables and for each i ∈ {1, . . . , m − 1}, the symbol
$\circ_i$ denotes a two-input gate over the basis {∧, ∨}, a generalized And-Or path.
Theorem 30. Given Boolean input variables t = (t0, . . . , tm−1) and gates ◦1, . . . , ◦m−1, there is
a circuit realizing the generalized And-Or path h(t, ◦1, . . . , ◦m−1) with delay at most
log2(W ) + log2 log2(c+ 1) + log2 log2 log2(c+ 1) + 5.3 ,
size at most 10(c + 1) log2(c + 1) log2 log2(c + 1) + m − c − 1 and maximum fanout at most
log2(c+ 1) + log2 log2(c+ 1) + log2 log2 log2(c+ 1) + 3.3, where c denotes the number of changes
between ∧ and ∨ or vice versa.
Proof. We first prove the delay bound. We partition the inputs t0, . . . , tm−1 of h into c + 1
maximal groups P0, . . . , Pc of consecutive inputs that feed the same kind of gate, see Figure 4 (a).
We denote the common gate type of the gates fed by the inputs Pb by Gb. Note that for each
b ∈ {0, . . . , c− 1}, the group Pb contains at least 1 input, while Pc contains at least 2 inputs.
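The partition into maximal groups can be sketched as follows. The encoding is an assumption made for illustration: `gates[i-1]` holds the type of gate $\circ_i$ as the string `"and"` or `"or"`, input $t_i$ with i ≤ m − 2 feeds $\circ_{i+1}$, and the last input $t_{m-1}$ feeds $\circ_{m-1}$. The function name is hypothetical.

```python
from itertools import groupby


def partition_into_groups(gates):
    """Return [(gate type, [input indices])] for the maximal groups P_0..P_c."""
    m = len(gates) + 1
    # Gate type fed by each input; the last two inputs share the final gate.
    fed_gate = [gates[min(i, m - 2)] for i in range(m)]
    groups, start = [], 0
    for gate, run in groupby(fed_gate):
        size = len(list(run))
        groups.append((gate, list(range(start, start + size))))
        start += size
    return groups


gates = ["and", "and", "or", "and"]     # types of gates 1..4, so m = 5
groups = partition_into_groups(gates)
changes = len(groups) - 1               # c, the number of gate type changes
```

For this instance the groups are {t0, t1}, {t2} and {t3, t4}, so c = 2 and the last group indeed contains two inputs.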
For each b ∈ {0, . . . , c}, we can build a symmetric binary $G_b$-tree on the inputs $P_b$ using
Huffman coding. By Remark 8, this yields a Boolean circuit $C_b$ with output $t'_b$ with delay
\[ a(t'_b) = \left\lceil \log_2 \Bigl( \sum_{t_i \in P_b} W(t_i) \Bigr) \right\rceil . \tag{29} \]
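The Huffman-style construction behind (29) can be simulated without building any gates: repeatedly combine the two earliest signals, and the new signal becomes available one time unit after the later of the two. The sketch below uses hypothetical function names and assumes integer arrival times; it tracks only the arrival time at the root.

```python
import heapq


def symmetric_tree_delay(arrival):
    """Arrival time at the root when always merging the two earliest signals."""
    heap = list(arrival)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + 1)   # one 2-input gate
    return heap[0]


def ceil_log2_weight(arrival):
    """ceil(log2(sum of 2^a)), computed exactly with integers."""
    return (sum(2**a for a in arrival) - 1).bit_length()


examples = [[0, 0, 0, 0], [0, 1, 2, 3], [3, 0, 0], [5]]
delays = [symmetric_tree_delay(a) for a in examples]
bounds = [ceil_log2_weight(a) for a in examples]
```

On these examples the greedy delay coincides with the value in (29).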
Denote the outputs of the circuits $C_0, \dots, C_c$ by $t'_0, \dots, t'_c$, respectively. Without loss of
generality, we may assume that $\circ_1 = \wedge$. Then, we can express $h(t, \circ_1, \dots, \circ_{m-1})$ as an And-Or
path as follows:
\[ h(t, \circ_1, \dots, \circ_{m-1}) = g(t'_0, \dots, t'_c) \tag{30} \]
[Figure 4: Illustration of the proof of Theorem 30. (a) A generalized And-Or path on inputs t0, . . . , t11, partitioned into the groups P0, . . . , P4. (b) Equivalent circuit after performing Huffman coding on the groups Pb.]
In Figure 4 (b), you can see the circuit arising from the generalized And-Or Path in Figure 4 (a)
in this way: The yellow gates are used for the circuits Cb on the input groups Pb; and their outputs
feed an And-Or path drawn with red And and green Or gates.
Write $W' := \sum_{b=0}^{c} W(t'_b)$. Theorem 24 yields a circuit C for $g((t'_0, \dots, t'_c))$ and thus also for
$h(t, \circ_1, \dots, \circ_{m-1})$ with delay at most
\[ \log_2(W') + \log_2 \log_2(c+1) + \log_2 \log_2 \log_2(c+1) + 4.3 . \tag{31} \]
Note that the weight W′ can be bounded by
\[ W' = \sum_{b=0}^{c} 2^{a(t'_b)} \overset{(29)}{\le} \sum_{b=0}^{c} 2^{\log_2 \left( \sum_{t_i \in P_b} W(t_i) \right) + 1} = 2 \sum_{b=0}^{c} \sum_{t_i \in P_b} W(t_i) = 2 W(t) . \]
Hence, the delay of C stated in (31) can be bounded by
\begin{align*}
& \log_2(W') + \log_2 \log_2(c+1) + \log_2 \log_2 \log_2(c+1) + 4.3 \\
&\le \log_2(W) + \log_2 \log_2(c+1) + \log_2 \log_2 \log_2(c+1) + 5.3 ,
\end{align*}
which finishes the proof of the delay bound.
For bounding the size of the arising circuit, note that for each b ∈ {0, . . . , c}, the circuit Cb
has |Pb| − 1 gates. Together with the size bound for the circuit realizing the And-Or path (30)
on c+ 1 inputs shown in Theorem 27, we obtain a total size of at most
\begin{align*}
& \sum_{b=0}^{c} (|P_b| - 1) + 10(c+1) \log_2(c+1) \log_2 \log_2(c+1) \\
&= m - c - 1 + 10(c+1) \log_2(c+1) \log_2 \log_2(c+1) .
\end{align*}
Since in the circuits $C_b$, b = 0, . . . , c, every node has exactly one successor and each
input of the And-Or path (30) occurs only once, the maximum fanout occurs in the circuit for
the And-Or path (30). Due to Theorem 27, this fanout is thus at most
log2(c+ 1) + log2 log2(c+ 1) + log2 log2 log2(c+ 1) + 3.3 .
References
[1] Richard Brent. On the addition of binary numbers. IEEE Transactions on Computers,
19(8):758–759, 1970.
[2] Richard P. Brent and H.-T. Kung. A regular layout for parallel adders. IEEE Transactions
on Computers, 31(3):260–264, 1982.
[3] Beate Commentz-Walter. Size-depth tradeoff in monotone Boolean formulae. Acta Informatica,
12(3):227–243, 1979.
[4] Beate Commentz-Walter and Juergen Sattler. Size-depth tradeoff in non-monotone Boolean
formulae. Acta Informatica, 14(3):257–269, 1980.
[5] Martin C. Golumbic. Combinatorial merging. IEEE Transactions on Computers,
25(11):1164–1167, 1976.
[6] Mikhail I. Grinchuk. Sharpening an upper bound on the adder and comparator depths.
Journal of Applied and Industrial Mathematics, 3(1):61–67, 2009.
[7] Stephan Held and Sophie Spirkl. Fast prefix adders for non-uniform input arrival times.
Algorithmica, 77(1):287–308, 2017.
[8] David A. Huffman. A method for the construction of minimum-redundancy codes. Proceedings
of the Institute of Radio Engineers, 40(9):1098–1101, 1952.
[9] Valerii M. Khrapchenko. Asymptotic estimation of addition time of parallel adder. Syst.
Theory Res., 19:105–122, 1970.
[10] Valerii M. Khrapchenko. On possibility of refining bounds for the delay of a parallel adder.
Journal of Applied and Industrial Mathematics, 2(2):211–214, 2008.
[11] Peter M. Kogge and Harold S. Stone. A parallel algorithm for the efficient solution of a
general class of recurrence equations. IEEE Transactions on computers, 100(8):786–793,
1973.
[12] Leon Gordon Kraft. A device for quantizing, grouping, and coding amplitude-modulated
pulses. PhD thesis, Massachusetts Institute of Technology, 1949.
[13] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. Journal of the ACM
(JACM), 27(4):831–838, 1980.
[14] Dieter Rautenbach, Christian Szegedy, and Jürgen Werber. Asymptotically optimal Boolean
circuits for functions of the form gn−1(gn−2(...g3(g2(g1(x1, x2), x3), x4)..., xn−1), xn) given
input arrival times. Research Institute for Discrete Mathematics, University of Bonn, 2003.
[15] Dieter Rautenbach, Christian Szegedy, and Jürgen Werber. Delay optimization of linear
depth Boolean circuits with prescribed input arrival times. Journal of Discrete Algorithms,
4(4):526–537, 2006.
[16] John E. Savage. Models of Computation, volume 136. Addison-Wesley Reading, MA, 1998.
[17] Jack Sklansky. Conditional-sum addition logic. IRE Transactions on Electronic computers,
(2):226–231, 1960.
[18] Sophie Spirkl. Boolean circuit optimization. Master’s thesis, University of Bonn, 2014.
[19] Jürgen Werber. Logic Restructuring for Timing Optimization in VLSI Design. PhD thesis,
University of Bonn, 2007.
[20] Jürgen Werber, Dieter Rautenbach, and Christian Szegedy. Timing optimization by
restructuring long combinatorial paths. In Proceedings of the 2007 IEEE/ACM International
Conference on Computer-Aided Design, pages 536–543. IEEE Press, 2007.
[21] Reto Zimmermann. Binary Adder Architectures for Cell-Based VLSI and Their Synthesis.
PhD thesis, Swiss Federal Institute of Technology, Zurich (ETH), 1998.
