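We consider the fundamental problem of constructing fast circuits for the carry bit computation in binary addition. Up to a small additive constant, the carry bit computation reduces to computing an And-Or path, i.e., a formula of type $t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots))$ or $t_0 \vee (t_1 \wedge (t_2 \vee (\dots t_{m-1}) \dots))$.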
INTRODUCTION
An And-Or path is a Boolean formula of type $t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots))$ or $t_0 \vee (t_1 \wedge (t_2 \vee (\dots t_{m-1}) \dots))$.
We assume that for each Boolean input variable $t_i$ ($i \in \{0, \dots, m-1\}$), an arrival time $a(t_i) \in \mathbb{N}$ is given. Our goal is to find a Boolean circuit using only And and Or gates that computes the Boolean function of a given And-Or path and minimizes the maximum delay of the inputs. Here, the delay of an input $t_i$ in a Boolean circuit is its arrival time $a(t_i)$ plus the length of a longest directed path in the circuit starting at $t_i$. Thus, the concept of delay minimization generalizes the concept of depth minimization. When only depth is considered, one has to assume that all input signals are available at the same time (uniform arrival times), which is not the case on real-world chips.
Previous Work
For And-Or paths with uniform arrival times (i.e., when only depth is considered), the currently fastest Boolean circuit has been proposed by Grinchuk [6], reaching a depth of $\log_2 m + \log_2\log_2 m + O(1)$. This is close to the best known lower bounds on depth: Khrapchenko [10] showed that any circuit for an And-Or path has a depth of at least $\log_2 m + 0.15 \log_2\log_2\log_2 m + \Theta(1)$. The result is based on a lower bound of $\Theta\left(\frac{m \log_2 m \log_2\log_2 m}{\log_2\log_2\log_2\log_2 m}\right)$ on the product of size and depth of a Boolean formula for an And-Or path (see Commentz-Walter and Sattler [4]). For monotone circuits, i.e., circuits without negations (and the circuit built in Ref. [6] is a monotone circuit), this lower bound can be improved to $\Theta(m \log_2^2 m)$ (see Commentz-Walter [3]). This directly implies a lower bound of $\log_2 m + \log_2\log_2 m + \Theta(1)$ on the depth of a monotone circuit for an And-Or path.
For And-Or paths with non-uniform arrival times $a(t_0), \dots, a(t_{m-1})$, the value $\log_2 W$ is a lower bound on the achievable delay, where $W := \sum_{i=0}^{m-1} 2^{a(t_i)}$ (see, e.g., Rautenbach et al. [15]). No stronger lower bounds on delay are known. Rautenbach et al. [15] presented an algorithm computing a Boolean circuit for an And-Or path with delay at most $1.441 \log_2 W + 3$.
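As a small numerical illustration of this lower bound, the following sketch computes $W$ for integer arrival times together with $\lceil \log_2 W \rceil$. Since arrival times and path lengths are integers, every delay is an integer, so the bound $\log_2 W$ may be rounded up. The function names are ours and purely illustrative.

```python
def weight(arrival_times):
    """W = sum of 2**a(t_i) over all inputs (integer arrival times a(t_i) >= 0)."""
    return sum(2 ** a for a in arrival_times)


def delay_lower_bound(arrival_times):
    """Smallest integer d with 2**d >= W, i.e. ceil(log2 W), computed exactly."""
    w = weight(arrival_times)
    return (w - 1).bit_length()


# Uniform arrival times: W = m, so the bound is ceil(log2 m).
print(delay_lower_bound([0] * 8))       # 3
# One late input dominates: W = 2**5 + 3 = 35, so the bound is 6.
print(delay_lower_bound([5, 0, 0, 0]))  # 6
```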
Faster Carry Bit Computation for Adder Circuits with Prescribed Arrival Times 45:3 
This delay bound was improved to $1.441 \log_2 W + 2.674$ by Held and Spirkl [7]. In both of these circuits, so-called 2-input prefix gates are used, and it can be shown that any And-Or path realization based on prefix gates has a delay of at least $\log_\varphi\left(\sum_{i=0}^{m-1} \varphi^{a(t_i)}\right)$, where $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$ is the golden ratio (see Ref. [7]). In particular, this implies that any prefix-based And-Or path realization has a depth of at least $1.44 \log_2 m - 1$. Without using prefix gates, Rautenbach et al. [14] presented a circuit for an And-Or path with delay at most $(1 + \varepsilon) \log_2 W + c_\varepsilon$ (for any $\varepsilon > 0$), where $c_\varepsilon$ is a number depending only on $\varepsilon$. Spirkl [18] made this delay bound explicit as $(1 + \varepsilon) \log_2 W + \frac{6}{\varepsilon} + 8 + 5\varepsilon$ and improved it to $(1 + \varepsilon) \log_2 W + \frac{3}{\varepsilon} + 5$. Moreover, Spirkl [18] described a circuit with a delay of at most $\log_2 W + 2\sqrt{2\log_2 m - 2} + 6$. Note that for any $\varepsilon > 0$, this is actually a better delay bound than $(1 + \varepsilon) \log_2 W + \frac{3}{\varepsilon} + 5$: by the inequality of arithmetic and geometric means and since $W \geq m$, we have $\varepsilon \log_2 W + \frac{3}{\varepsilon} + 5 \geq 2\sqrt{3 \log_2 W} + 5 \geq 2\sqrt{3 \log_2 m} + 5 \geq 2\sqrt{2\log_2 m - 2} + 6$. Up to now, this is the fastest circuit for And-Or paths with non-uniform arrival times. Table 1 summarizes these results in comparison with our delay bound. We also state the size (i.e., the number of gates) and the maximum fanout of the constructed circuits. Note that some methods trade off size against fanout and provide two different circuits.

Table 1
| Author | Upper bound on delay | Size | Maximum fanout |
|---|---|---|---|
| [15]; [7] | $1.441 \log_2 W + 2.674$ | $O(m)$ | 2 |
| [14]; [18] | $(1 + \varepsilon) \log_2 W + \frac{3}{\varepsilon} + 5$ | $O(\frac{m}{\varepsilon})$ | 2 |
| [14]; [18] | $(1 + \varepsilon) \log_2 \dots$ | | |
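The inequality chain comparing the two bounds of Spirkl can be spot-checked numerically. The sketch below evaluates each link of the chain for a few values of $\varepsilon$, $m$, and $W \geq m$; it is a sanity check of the stated inequalities, not a proof, and the helper name is ours.

```python
import math


def chain_holds(eps, log_w, log_m, tol=1e-9):
    """Check eps*log2 W + 3/eps + 5 >= 2*sqrt(3 log2 W) + 5
    >= 2*sqrt(3 log2 m) + 5 >= 2*sqrt(2 log2 m - 2) + 6."""
    lhs = eps * log_w + 3 / eps + 5
    am_gm = 2 * math.sqrt(3 * log_w) + 5        # AM-GM applied to eps*x and 3/eps
    w_ge_m = 2 * math.sqrt(3 * log_m) + 5       # uses log2 W >= log2 m
    spirkl = 2 * math.sqrt(2 * log_m - 2) + 6   # additive term of Spirkl's bound
    return lhs >= am_gm - tol and am_gm >= w_ge_m - tol and w_ge_m >= spirkl - tol


for m in (2, 3, 10, 1000, 10**6):
    for extra in (0.0, 1.0, 7.5):               # log2 W = log2 m + extra
        for eps in (0.01, 0.1, 0.5, 1.0, 3.0):
            assert chain_holds(eps, math.log2(m) + extra, math.log2(m))
```

The middle link needs $W \geq m$, which holds because every input contributes $2^{a(t_i)} \geq 1$ to $W$.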
All the And-Or path circuits presented in the table can be used directly to obtain adder circuits with the same delays by computing each carry bit separately. However, the size of the arising circuits would be super-quadratic. The construction of linear-size adder circuits with our delay guarantee is part of ongoing work.
Our Contribution
In this article, we present an algorithm with running time $O(m^2 \log_2 m)$ that computes a Boolean circuit for And-Or paths (with $m \geq 3$) using only two-input And and Or gates with a delay of at most $\log_2 W + \log_2\log_2 m + \log_2\log_2\log_2 m + 4.3$, size $O(m \log_2 m \log_2\log_2 m)$, and maximum fanout $\log_2 m + \log_2\log_2 m + \log_2\log_2\log_2 m + 3.3$. In terms of delay, this yields the currently best known circuits for And-Or paths and thus also for adder circuits. In particular, we improve the previously best known delay bound of $\log_2 W + 2\sqrt{2\log_2 m - 2} + 6$ by Spirkl [18] for each $m \geq 3$ as well as asymptotically. The construction of the circuit is based on a recursive approach similar to the algorithm of Grinchuk [6] for uniform arrival times.

The rest of the article is organized as follows. In Section 2, we introduce basic definitions and results. We give a formal description of the problem (Section 2.1), define splitting steps that allow us to partition an instance into smaller sub-instances (Section 2.2), and introduce a measure for deciding which instances admit an And-Or path realization with a given delay (Section 2.3).
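To see the improvement over Spirkl's bound concretely, the following sketch compares the two additive terms (everything except $\log_2 W$) for many values of $m \geq 3$. It only evaluates the stated formulas; the function names are ours.

```python
import math


def new_additive_term(m):
    """log2 log2 m + log2 log2 log2 m + 4.3 (requires m >= 3)."""
    lm = math.log2(m)
    return math.log2(lm) + math.log2(math.log2(lm)) + 4.3


def spirkl_additive_term(m):
    """2 * sqrt(2 log2 m - 2) + 6."""
    return 2 * math.sqrt(2 * math.log2(m) - 2) + 6


# The new term is smaller for every m checked, and the gap grows with m.
assert all(new_additive_term(m) < spirkl_additive_term(m) for m in range(3, 100_000))
for k in (10, 20, 64, 1000):
    m = 2 ** k
    print(k, round(new_additive_term(m), 2), round(spirkl_additive_term(m), 2))
```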
U. Brenner and A. Hermann
Section 3 classifies these instances, which is the major step of the article. In Section 4, we deduce how to construct circuits realizing And-Or paths with a delay of at most $\log_2 W + \log_2\log_2 m + \log_2\log_2\log_2 m + 4.3$ and analyze their size and fanout as well as the running time needed to compute such circuits. In Section 5, we extend this result to paths that also consist of And and Or gates only, but do not necessarily alternate.
PRELIMINARIES

Problem Formulation
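We denote the set of natural numbers including zero by $\mathbb{N}$. Our notation regarding Boolean functions and circuits is based on Savage [16]. Given $r \in \mathbb{N}$ and a Boolean function $h: \{0,1\}^r \to \{0,1\}$ with Boolean input variables (shorter, inputs) $x_0, \dots, x_{r-1}$, we write $x = (x_0, \dots, x_{r-1})$ as a shorthand for all inputs with fixed ordering. If $r = 0$, we write $x = ()$.

Definition 2.1. Let Boolean input variables $t = (t_0, \dots, t_{m-1})$ for some $m \in \mathbb{N}$ with $m > 0$ be given. We call each of the recursively defined functions

$$g(t) = \begin{cases} t_0 & \text{if } m = 1, \\ t_0 \wedge g^*((t_1, \dots, t_{m-1})) & \text{if } m \geq 2, \end{cases} \qquad g^*(t) = \begin{cases} t_0 & \text{if } m = 1, \\ t_0 \vee g((t_1, \dots, t_{m-1})) & \text{if } m \geq 2 \end{cases}$$

an And-Or path on $m$ inputs.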
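The recursive definition translates directly into code. The following sketch evaluates $g$ and $g^*$ on a tuple of Boolean values; the function names `and_or_path` and `or_and_path` are ours and purely illustrative.

```python
def and_or_path(t):
    """g(t): t0 for m = 1, otherwise t0 AND g*((t1, ..., t_{m-1}))."""
    if len(t) == 1:
        return t[0]
    return t[0] and or_and_path(t[1:])


def or_and_path(t):
    """g*(t): t0 for m = 1, otherwise t0 OR g((t1, ..., t_{m-1}))."""
    if len(t) == 1:
        return t[0]
    return t[0] or and_or_path(t[1:])


# g((t0, t1, t2)) = t0 AND (t1 OR t2)
print(and_or_path((True, False, True)))   # True
print(or_and_path((False, True, False)))  # False, since g* = t0 OR (t1 AND t2)
```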
We want to realize a given And-Or path as a Boolean circuit over the basis $\{\wedge, \vee\}$. A Boolean circuit over the basis $\{\wedge, \vee\}$ is a directed acyclic graph such that
- the nodes with indegree 0, called inputs, are labeled by Boolean variables,
- there is only one node with outdegree 0, called output, and the output has indegree exactly 1,
- each of the remaining nodes has indegree exactly 2 and outdegree at least 1, and is labeled either as an And gate or as an Or gate.
The logical function on the input variables computed by the output of the circuit can be obtained by recursively combining the logical functions represented by the gates. A circuit is a realization of the And-Or path $g(t)$ if and only if the output signal equals $g(t)$ for all input values (correspondingly for $g^*$). Figure 1 (a) shows a Boolean circuit that is a straightforward realization of the And-Or path $g^*((t_0, \dots, t_5))$. We omit drawing the output in such pictures; the unique predecessor of the output will always be the bottommost gate. We assume that each input variable is associated with a prescribed arrival time.
Fig. 2. Performing the alternating split on $g^*((t_0, \dots, t_{11})) = t_0 \vee (t_1 \wedge (\dots (t_{10} \vee t_{11}) \dots))$.
Definition 2.2. Let $r \in \mathbb{N}$ and a Boolean function $h: \{0,1\}^r \to \{0,1\}$ on Boolean input variables $x = (x_0, \dots, x_{r-1})$ with arrival times $a: \{x_0, \dots, x_{r-1}\} \to \mathbb{N}$ be given. Consider a circuit $C$ computing $h$. For $i = 0, \dots, r-1$, the delay of input $x_i$ is defined as $a(x_i) + l(x_i)$, where $l(x_i)$ denotes the maximum number of gates on any directed path in $C$ starting at input $x_i$. The delay of the circuit $C$ is the maximum delay of any input.
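Because every directed path ends at the unique output, the delay of a circuit equals the earliest time at which the bottommost gate can produce its result, so it can be computed in one pass over the gates. The sketch below assumes a hypothetical encoding of a circuit as a dictionary mapping each gate name to `(op, left, right)`, and evaluates a chain realization of $g^*((t_0, \dots, t_5))$ in the spirit of the straightforward circuit of Figure 1 (a).

```python
def circuit_delay(gates, arrival, out):
    """Delay of a circuit: max over inputs x of a(x) + (number of gates on a
    longest path from x), computed as the ready time of the bottommost gate."""
    memo = {}

    def ready(node):
        if node in arrival:               # an input is ready at its arrival time
            return arrival[node]
        if node not in memo:              # a gate is ready one unit after its later input
            _, left, right = gates[node]
            memo[node] = max(ready(left), ready(right)) + 1
        return memo[node]

    return ready(out)


def chain_or_and(m):
    """Chain realization of g*((t0, ..., t_{m-1})); returns (gates, bottommost gate)."""
    gates, prev = {}, f"t{m - 1}"
    for i in range(m - 2, -1, -1):
        op = "or" if i % 2 == 0 else "and"  # g* starts with an Or at t0
        gates[f"g{i}"] = (op, f"t{i}", prev)
        prev = f"g{i}"
    return gates, prev


gates, out = chain_or_and(6)
uniform = {f"t{i}": 0 for i in range(6)}
print(circuit_delay(gates, uniform, out))               # 5: the depth of the chain
print(circuit_delay(gates, {**uniform, "t5": 3}, out))  # 8: t5 traverses all 5 gates
```

With uniform arrival times the computed delay is just the depth, matching the remark below.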
Note that for uniform arrival times, e.g., a ≡ 0, the delay of a circuit is simply the depth of the circuit.
Our goal is computing fast circuits for And-Or paths, i.e., solving the following problem:

Problem 2.3 (And-Or Path Optimization Problem). For $m \in \mathbb{N}$ with $m > 0$, consider Boolean input variables $t = (t_0, \dots, t_{m-1})$ with arrival times $a: \{t_0, \dots, t_{m-1}\} \to \mathbb{N}$. Find circuits over the basis $\{\wedge, \vee\}$ that compute the And-Or paths $g(t)$ and $g^*(t)$ with minimum possible delay.
Note that adding the same constant to all arrival times does not change the problem. This is why we only allow non-negative arrival times in this formulation. Moreover, forbidding non-integral arrival times is not a significant restriction because rounding up all arrival times will change the delay of any circuit by less than 1. Therefore, we only consider natural numbers as arrival times.
Recursive Circuit Construction
We construct fast circuits for And-Or paths in a recursive way. Before describing all details of the approach, we explain the idea of the induction step.
Suppose we want to realize the And-Or path $g^*(t) = t_0 \vee (t_1 \wedge (t_2 \vee (t_3 \wedge (\dots t_{m-1} \dots))))$. We subdivide the inputs $t_0, \dots, t_{m-1}$ into two groups $t_0, \dots, t_{2k}$ and $t_{2k+1}, \dots, t_{m-1}$. Recursively, we compute fast circuits for the And-Or paths on each of these input sets. These two circuits can be combined into a circuit for the whole And-Or path, as illustrated in an example with $m = 12$ and $k = 3$ in Figure 2; the general construction is described in Lemma 2.5. The output of the circuit for the subinstance $t_{2k+1}, \dots, t_{m-1}$ is combined with every second input of $t_0, \dots, t_{2k}$ using only And gates. Just one additional Or gate (labeled "G" in the picture) is needed to compute a circuit for the whole And-Or path. It is not too difficult to check that the circuits in Figure 2 are logically equivalent: the output of the circuit in Figure 2 (a) is "true" if and only if there is an $i \in \{0, 2, 4, 6, 8, 10, 11\}$ such that the input signals at $t_i$ and at every $t_j$ with $j$ odd and $j \in \{1, \dots, i-1\}$ are "true." This is also a necessary and sufficient condition for a "true" as an output of the circuit in Figure 2 (b).
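The equivalence argument can be confirmed by brute force for $m = 12$ and $k = 3$. The sketch below (function names ours) checks, over all $2^{12}$ input vectors, both the characterization of when $g^*$ is true and the split $g^*(t) = g^*((t_0, \dots, t_6)) \vee (t_1 \wedge t_3 \wedge t_5 \wedge g((t_7, \dots, t_{11})))$, which is our reading of the combined circuit in Figure 2 (b).

```python
from itertools import product


def g(t):       # g(t) = t0 AND g*((t1, ...)), with g((t0,)) = t0
    return t[0] if len(t) == 1 else (t[0] and g_star(t[1:]))


def g_star(t):  # g*(t) = t0 OR g((t1, ...)), with g*((t0,)) = t0
    return t[0] if len(t) == 1 else (t[0] or g(t[1:]))


def characterization(t):
    """True iff some t_i with i even and i < m-1, or with i = m-1, is true
    together with every odd-indexed t_j for j < i."""
    m = len(t)
    candidates = list(range(0, m - 1, 2)) + [m - 1]
    return any(t[i] and all(t[j] for j in range(1, i, 2)) for i in candidates)


def fig2_split(t):
    """Combined circuit for m = 12, k = 3: one Or gate G joining two parts."""
    return g_star(t[:7]) or (t[1] and t[3] and t[5] and g(t[7:]))


for t in product((False, True), repeat=12):
    assert g_star(t) == characterization(t) == fig2_split(t)
```

For $m = 12$ the candidate indices are exactly $\{0, 2, 4, 6, 8, 10, 11\}$, as in the argument above.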
Note that while the left input of gate G in the example is the output of an And-Or path, the right input of G is the output of a function combining the And-Or path for t 2k+1 , . . . , t m−1 with a multiple-input AND. The occurrence of such functions in our recursion requires generalizing the concept of And-Or paths.
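Definition 2.4. Given $n, m \in \mathbb{N}$, $m > 0$, and inputs $s_0, \dots, s_{n-1}, t_0, \dots, t_{m-1}$ subdivided into $s = (s_0, \dots, s_{n-1})$ and $t = (t_0, \dots, t_{m-1})$, we define the extended And-Or paths

$$f(s, t) := s_0 \wedge \dots \wedge s_{n-1} \wedge g(t) \qquad \text{and} \qquad f^*(s, t) := s_0 \vee \dots \vee s_{n-1} \vee g^*(t),$$

where $f(s, t) := g(t)$ and $f^*(s, t) := g^*(t)$ in the case that $s = ()$. We call the input variables $s$ symmetric inputs and the input variables $t$ alternating inputs, respectively. We shall always assume that the sets of input variables contained in $s$ and $t$ are disjoint and indexed by $s_0, \dots, s_{n-1}$ and $t_0, \dots, t_{m-1}$. Note that expanding the definitions of $f(s, t)$ and $f^*(s, t)$ given in Definitions 2.1 and 2.4 yields

$$f(s, t) = s_0 \wedge \dots \wedge s_{n-1} \wedge \bigl(t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots))\bigr), \qquad f^*(s, t) = s_0 \vee \dots \vee s_{n-1} \vee \bigl(t_0 \vee (t_1 \wedge (t_2 \vee (\dots t_{m-1}) \dots))\bigr),$$

where, for $m$ odd, the innermost operation of $f(s, t)$ and $f^*(s, t)$ is $\vee$ and $\wedge$, respectively, and vice versa for $m$ even.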
Due to the duality principle of Boolean algebra, over the basis {∧, ∨}, any realization for f (s, t ) yields a realization of f * (s, t ) and vice versa by switching all And and Or gates. In order to compute fast realizations for f and f * , we will apply two methods that allow realizing f and f * recursively. Each of these methods reduces the problem of realizing f (s, t ) to the problem of realizing extended And-Or paths with strictly fewer symmetric or alternating inputs.
First, we formally describe the well-known method, which is depicted in Figure 2 for the special case that $s = ()$, $m = 12$, and $k = 3$.

Lemma 2.5. Let input variables $s$ and $t$ and an integer $k$ with $0 \leq k < \frac{m-1}{2}$ be given. Denote by $t'$ the (odd-length) prefix $t' = (t_0, t_1, \dots, t_{2k})$ of $t$, and by $t''$ the remaining inputs of $t$, i.e., $t'' = (t_{2k+1}, \dots, t_{m-1})$. Then, we have

$$f^*(s, t) = f^*(s, t') \vee f(\overline{t'}, t''), \qquad (1)$$

where $\overline{t'} := (t_1, t_3, t_5, \dots, t_{2k-1})$ contains every second entry of $t'$.
Proof. At first, we prove the following claim:
Claim. For any Boolean variable x, we have
Proof of the Claim. We prove the claim by induction on $k$. For $k = 0$, we have
Assuming Statement (2) holds for some k ≥ 0 and setting
This proves the inductive step and thus the claim. From this claim and Definition 2.1, we conclude
The previous lemma allows us to construct a circuit for $f^*(s, t)$ recursively: inductively, we can assume that we can compute fast circuits for $f^*(s, t')$ and $f(\overline{t'}, t'')$, respectively. Combining these circuits as in Equation (1) yields a good circuit for $f^*(s, t)$. We call this way to construct a circuit for $f^*(s, t)$ an alternating split.
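Equation (1) can likewise be checked exhaustively for small instances. The following sketch assumes the reading of Definition 2.4 in which $f(s, t) = s_0 \wedge \dots \wedge s_{n-1} \wedge g(t)$ and $f^*(s, t) = s_0 \vee \dots \vee s_{n-1} \vee g^*(t)$ (with $f((), t) = g(t)$ and $f^*((), t) = g^*(t)$), and compares both sides of the alternating split for all admissible $k$. Function names are illustrative.

```python
from itertools import product


def g(t):       # And-Or path: g((t0,)) = t0, else t0 AND g*((t1, ...))
    return t[0] if len(t) == 1 else (t[0] and g_star(t[1:]))


def g_star(t):  # Or-And path: g*((t0,)) = t0, else t0 OR g((t1, ...))
    return t[0] if len(t) == 1 else (t[0] or g(t[1:]))


def f(s, t):       # assumed reading: s0 AND ... AND s_{n-1} AND g(t)
    return all(s) and g(t)


def f_star(s, t):  # assumed reading: s0 OR ... OR s_{n-1} OR g*(t)
    return any(s) or g_star(t)


def alternating_split_holds(n, m, k):
    """Compare f*(s, t) with f*(s, t') OR f(every second entry of t', t'')."""
    for bits in product((False, True), repeat=n + m):
        s, t = bits[:n], bits[n:]
        prefix, rest = t[: 2 * k + 1], t[2 * k + 1 :]  # t' and t''
        every_second = prefix[1::2]                    # (t1, t3, ..., t_{2k-1})
        if f_star(s, t) != (f_star(s, prefix) or f(every_second, rest)):
            return False
    return True


# All admissible k (0 <= k < (m-1)/2, i.e. t'' non-empty) for small n and m.
for n in range(3):
    for m in range(2, 9):
        for k in range(m // 2):
            assert alternating_split_holds(n, m, k)
```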
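Similarly, if $n > 0$, the definition of $f(s, t)$ (see Definition 2.4) yields two symmetric splits indicated by the equations

$$f(s, t) = (s_0 \wedge \dots \wedge s_{n-1}) \wedge f((), t), \qquad (3)$$

$$f(s, t) = (s_0 \wedge \dots \wedge s_{n-1} \wedge t_0) \wedge f^*((), (t_1, \dots, t_{m-1})) \quad (m \geq 2). \qquad (4)$$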
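Here, a recursive circuit construction is only required for the arising And-Or path, since a delay-optimum symmetric And tree can be constructed via Huffman coding [8], as explained in Remark 2.8. Note that dualizing the Splits (1), (3), and (4), i.e., exchanging $\wedge$ and $\vee$ as well as $f$ and $f^*$, yields the analogous splits

$$f(s, t) = f(s, t') \wedge f^*(\overline{t'}, t''), \qquad (5)$$

$$f^*(s, t) = (s_0 \vee \dots \vee s_{n-1}) \vee f^*((), t),$$

$$f^*(s, t) = (s_0 \vee \dots \vee s_{n-1} \vee t_0) \vee f((), (t_1, \dots, t_{m-1})) \quad (m \geq 2).$$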
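The symmetric splits and the dual alternating split can be verified in the same brute-force manner. This sketch assumes the reading $f(s, t) = s_0 \wedge \dots \wedge s_{n-1} \wedge g(t)$ and $f^*(s, t) = s_0 \vee \dots \vee s_{n-1} \vee g^*(t)$ of Definition 2.4, and checks Splits (3), (4), and (5) on all inputs for small $n$ and $m$. The helper names are ours.

```python
from itertools import product


def g(t):
    return t[0] if len(t) == 1 else (t[0] and g_star(t[1:]))


def g_star(t):
    return t[0] if len(t) == 1 else (t[0] or g(t[1:]))


def f(s, t):       # assumed: s0 AND ... AND s_{n-1} AND g(t); f((), t) = g(t)
    return all(s) and g(t)


def f_star(s, t):  # assumed dual: s0 OR ... OR s_{n-1} OR g*(t)
    return any(s) or g_star(t)


def splits_hold(n, m):
    for bits in product((False, True), repeat=n + m):
        s, t = bits[:n], bits[n:]
        # Split (3): all symmetric inputs go into one And tree.
        if f(s, t) != (all(s) and f((), t)):
            return False
        # Split (4): additionally pull t0 into that tree (needs m >= 2).
        if m >= 2 and f(s, t) != (all(s + t[:1]) and f_star((), t[1:])):
            return False
        # Split (5): dual alternating split for every admissible k.
        for k in range(m // 2):
            prefix, rest = t[: 2 * k + 1], t[2 * k + 1 :]
            if f(s, t) != (f(s, prefix) and f_star(prefix[1::2], rest)):
                return False
    return True


assert all(splits_hold(n, m) for n in range(3) for m in range(1, 8))
```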
Delay and Weight
We will realize extended And-Or paths with good delay using the recursive methods defined in Section 2.2. For this, we need to classify inputs by their weight.
Definition 2.6. Given inputs $x = (x_0, \dots, x_{r-1})$ and arrival times $a$, for $i = 0, \dots, r-1$, the weight of $x_i$ is $W(x_i) := 2^{a(x_i)}$, and the weight of $x$ is $W(x) := \sum_{i=0}^{r-1} W(x_i)$. Note that in this definition, the weights $W(x_i)$ and $W(x)$ do not only depend on the inputs, but also on the input arrival times.
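The definition of the weight is motivated by the fact that for symmetric binary functions, i.e., And trees or Or trees, the optimum delay achievable by any realization can be derived directly from the weight of the inputs.

Remark 2.8. For inputs $x$ with arrival times $a$ and $d \in \mathbb{N}$, there is an And tree (or Or tree) on $x$ with delay at most $d$ if and only if $W(x) \leq 2^d$, i.e., the optimum delay is $\lceil \log_2 W(x) \rceil$ [12]; see also Golumbic [5]. A delay-optimum tree can be computed in running time $O(r \log_2 r)$ via a greedy algorithm based on Huffman coding [8]. A short proof of this can be found in Werber [19].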
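The greedy construction can be sketched with a priority queue: repeatedly combine the two earliest available signals by one gate, whose output becomes available one time unit after the later of the two. The sketch below is our own code; it tracks only arrival times rather than building the tree, and checks that the resulting delay equals $\lceil \log_2 W(x) \rceil$ for random integer arrival times.

```python
import heapq
import random


def greedy_tree_delay(arrival_times):
    """Huffman-style greedy for a binary And (or Or) tree: always merge the two
    earliest signals; the new gate output is ready at max(x, y) + 1."""
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 1:
        x = heapq.heappop(heap)
        y = heapq.heappop(heap)
        heapq.heappush(heap, max(x, y) + 1)
    return heap[0]


def ceil_log2_weight(arrival_times):
    """ceil(log2 W(x)) with W(x) = sum of 2**a(x_i), computed exactly."""
    return (sum(2 ** a for a in arrival_times) - 1).bit_length()


rng = random.Random(0)
for _ in range(2000):
    times = [rng.randint(0, 8) for _ in range(rng.randint(1, 15))]
    assert greedy_tree_delay(times) == ceil_log2_weight(times)
```

Using a binary heap gives the $O(r \log_2 r)$ running time mentioned above.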
Note that when $t$ has at most two entries, $f(s, t)$ is a symmetric binary And tree.

Lemma 2.9. For Boolean input variables $s$ and $t$ with $m \leq 2$ and arrival times $a$, we can realize $f(s, t)$ with delay $\lceil \log_2 (W(s) + W(t)) \rceil$.
For the case that t has more than two entries, we give an upper bound on the delay of f (s, t ) in Section 3.
BOUNDING THE WEIGHT FOR GIVEN DELAY
We will prove an upper bound on the delay of And-Or paths by a reverse argument, similar to Grinchuk's proof for the case of uniform input arrival times [6]. Grinchuk fixes a depth bound $d$ and the number $n$ of symmetric inputs $s$, and determines how many alternating inputs $t$ an And-Or path may have such that $f(s, t)$ can be realized with depth $d$. Similarly, given symmetric inputs $s$ with a fixed weight $w$ and a fixed delay bound $d$, we will determine for which alternating inputs $t$ a realization of $f(s, t)$ with delay $d$ can be guaranteed. Since it is difficult to classify these $t$ exactly, we distinguish different alternating inputs $t$ by their weight only.
In order to facilitate the formulation of our main statement, we fix the constant $\zeta := 1.9$; see also Remark 3.3. We aim at proving the following statement:
Then, there is a circuit realizing f (s, t ) with delay at most d.
We will prove Theorem 3.2 by induction on $d$. Based on this, we will deduce the desired upper bound of $\log_2 W + \log_2\log_2 m + \log_2\log_2\log_2 m + 4.3$ on the delay of And-Or paths in Section 4. For example, if we chose $\zeta := 1$, we would need to replace the additive constant by 5. On the other hand, the higher $\zeta$, the larger the minimum value of $d$ for which the induction step works, which increases the proof complexity for the base case.
Most parts of the proof of Theorem 3.2 will work for any $\zeta$ with $1 \leq \zeta < 2$; only at the end of the proof of Lemma 3.14 do we demand $\zeta \leq 1.9$. A slightly larger choice of $\zeta$ would be possible, but using $\zeta = 1.9$ keeps calculations simpler. An improvement of the additive term to, e.g., 4.2 would only be possible for $\zeta \geq 1.992$, for which we cannot prove Theorem 3.2.
For proving Theorem 3.2, we would like to proceed by induction on d making use of the restructuring formulas presented in Section 2.2. The following remark explains why the inductive step from d to d + 1 does not work directly.
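Remark 3.4. Suppose that Theorem 3.2 holds for some $d > 1$ and all $0 \leq w < 2^{d-1}$. In order to prove Theorem 3.2 for $d + 1$, we need to show that, given input variables $s$ and $t$ with $0 \leq w := W(s) < 2^d$ and $W(t) \leq \zeta \frac{2^d - w}{(d+1)\log_2(d+1)}$, the extended And-Or path $f(s, t)$ can be realized with delay $d + 1$. In the case that $w < 2^{d-1}$, we would like to apply the alternating split $f(s, t) = f(s, t') \wedge f^*(\overline{t'}, t'')$ given by Equation (5), where $t'$ is an odd-length prefix of $t$ and $t''$ consists of the remaining inputs. If we choose the prefix $t'$ of $t$ such that

$$W(t') \leq \zeta \frac{2^{d-1} - w}{d \log_2 d} \qquad (8)$$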
(whenever this is possible), the induction hypothesis and the assumption $w < 2^{d-1}$ allow us to construct a circuit for $f(s, t')$ with delay $d$. Thus, in order to construct a circuit with delay $d + 1$ for $f(s, t)$, it remains to prove that $f^*(\overline{t'}, t'')$ admits a circuit with delay $d$. Again, by the induction hypothesis, we need to show that

$$W(t'') \leq \zeta \frac{2^{d-1} - W(\overline{t'})}{d \log_2 d}.$$
But the only thing we know about $W(t'')$ is that

$$W(t'') = W(t) - W(t') \leq \zeta \frac{2^d - w}{(d+1)\log_2(d+1)} - W(t').$$
Even if we choose the prefix $t'$ maximal with Inequality (8), this will not give us a meaningful upper bound on $W(t'')$, since $W(t')$ might be arbitrarily small in comparison to $W(t)$.
Note that this is what distinguishes our proof from Grinchuk's [6]: for arrival times all 0, choosing $t'$ maximal with Inequality (8) works well, since then, by maximality, we have $W(t') > \dots$
It turns out that this upper bound on $W(t'')$ suffices to prove that $f^*(\overline{t'}, t'')$ can be realized with delay $d$.
When arbitrary arrival times are present, a different proof idea is needed.
Thus, instead of proving Theorem 3.2 via induction on $d$, we strengthen the induction hypothesis and prove the stronger Theorem 3.6.

Definition 3.5. Let $m \in \mathbb{N}$ with $m > 0$. For inputs $t = (t_0, \dots, t_{m-1})$ with arrival times $a$, we denote by $\Lambda_t$ the weight of the last two (or fewer) entries of $t$, i.e.,

$$\Lambda_t := \begin{cases} W(t_0) & \text{if } m = 1, \\ W(t_{m-2}) + W(t_{m-1}) & \text{if } m \geq 2. \end{cases}$$
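For concreteness, $\Lambda_t$ depends only on the arrival times of the last two (or fewer) entries of $t$. The helper below is our own illustrative name for this quantity.

```python
def last_two_weight(arrival_times):
    """Lambda_t: weight 2**a of the last two entries of t (the only entry if m = 1)."""
    if not arrival_times:
        raise ValueError("t must contain at least one input")
    return sum(2 ** a for a in arrival_times[-2:])


print(last_two_weight([5, 0, 1, 2]))  # 2**1 + 2**2 = 6
print(last_two_weight([3]))           # 8
```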
Note that since Λ t ≥ 0, Theorem 3.6 implies Theorem 3.2. The proof of Theorem 3.6 is the most important part of this article and covers the rest of this section. First, we observe two ways to express Requirement (10) differently. Remark 3.7. Assuming the conditions of Theorem 3.6, the following statements are equivalent to Requirement (10):
Statement (11) can be obtained from Requirement (10) 
Proof. Using Inequality (12), we obtain
If d ≥ 2 ζ , the condition w < 2 d −1 yields
Otherwise, if d < 2 ζ , the condition w ≥ 0 yields
The equivalent Requirements (10), (11), and (12) as well as Lemma 3.8 will be used extensively when proving Theorem 3.6. For this proof, we proceed by induction on $d$. In Lemma 3.9, we will show as a base case that Theorem 3.6 holds for $d \leq 3$. Then, in Theorem 3.10, we will prove the inductive step: assuming that Theorem 3.6 holds for some $d \geq 3$ and all $0 \leq w < 2^{d-1}$, we will prove the statement for $d + 1$ and all $0 \leq w < 2^d$.

Lemma 3.9. Assuming the conditions of Theorem 3.6 for $d \in \{2, 3\}$, we can realize $f(s, t)$ with delay $d$.
Proof. First assume that $m \leq 2$. Recall that in this case, $f(s, t)$ is a symmetric binary And tree. By Inequality (13), we know that $W(t) + w \leq 2^d$. Hence, by Remark 2.8, we can realize $f(s, t)$ with delay $d$ using Huffman coding. Now let $m \geq 3$. Requirement (11) yields
For d = 2, this leads to the contradiction
i.e., for $d = 2$, we always have $m \leq 2$ and have already proven the required statement.
Similarly, if d = 3, we obtain
In the case that $w \geq 1$, this is a contradiction; and for $w = 0$, the only remaining case is $m = 3$ with $a(t_0) = a(t_1) = a(t_2) = 0$, for which $f(s, t) = t_0 \wedge (t_1 \vee t_2)$ can obviously be realized with delay $2 < 3$.
Theorem 3.10. Assume inductively that for some $d \geq 3$ and all $0 \leq w < 2^{d-1}$, Theorem 3.6 holds. Then, for inputs $s$ and $t$ with $w := W(s)$ such that $0 \leq w < 2^d$ and
we can realize f (s, t ) with delay (d + 1).
As a sub-calculation for the proof of this theorem, we need the following lemma.
Lemma 3.11. In the situation of Theorem 3.10, we have
Proof. Using the bound on $\Lambda_t$ implied by Inequality (12), we calculate
This is the only ingredient needed to prove Theorem 3.10 for the case that $2^{d-1} \leq w < 2^d$.

Lemma 3.12. Theorem 3.10 holds for all $w$ satisfying $2^{d-1} \leq w < 2^d$.
Proof. The symmetric Split (3) yields the realization

$$f(s, t) = (s_0 \wedge \dots \wedge s_{n-1}) \wedge f((), t). \qquad (17)$$
Since $w < 2^d$, Remark 2.8 allows the construction of a symmetric tree on inputs $s$ with delay $d$. In order to show that $f((), t) = g(t)$ can be realized with delay $d$, by the induction hypothesis, it suffices to show the second inequality in $W(t) \overset{(16)}{\leq} \zeta \frac{2^d - w}{(d+1)\log_2(d+1)} \dots$
Subtracting the left-hand side from the right-hand side, we prove this via
Hence, applying the symmetric Split (17) yields a realization for f (s, t ) with delay d + 1.
In the case 0 ≤ w < 2 d −1 , we need a bound on the logarithm of consecutive integers.
Remark 3.13. For d ≥ 3, we have d ≥ ln(2)(d + 1) and thus
Now we will prove Theorem 3.10 for the case that 0 ≤ w < 2 d −1 .
Lemma 3.14. Theorem 3.10 holds for each $w$ satisfying $0 \leq w < 2^{d-1}$.
Proof. We prove this lemma via a case distinction. In Case 2, we will consider a prefix $t'$ of the inputs $t$ with weight at most $\zeta \frac{2^{d-1} - w}{d \log_2 d}$ in order to proceed similarly as indicated in Remark 3.4. If the weight of $t_0$ is already larger than this, such a prefix does not exist. We deal with this situation in Case 1.
Case 1: Assume that

$$W(t_0) > \zeta \frac{2^{d-1} - w}{d \log_2 d}. \qquad (19)$$
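The symmetric Split (4) yields

$$f(s, t) = (s_0 \wedge \dots \wedge s_{n-1} \wedge t_0) \wedge f^*((), (t_1, t_2, \dots, t_{m-1})).$$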
Due to Inequality (13) and $d + 1 \geq 4 > 2^\zeta$, we have $W(t_0) + w \leq W(t) + w \leq 2^d$. Hence, by Remark 2.8, we can realize $s_0 \wedge \dots \wedge s_{n-1} \wedge t_0$ as a binary tree with delay $d$. Thus, we will check inductively that $f^*((), (t_1, t_2, \dots, t_{m-1})) = g^*((t_1, t_2, \dots, t_{m-1}))$ can be realized with delay $d$. Note that Requirement (16) and Condition (19) imply
which we claim to be at most
This can be shown by 1 for f (s, t ) , which proves the lemma for the case that
. Therefore, we can consider a maximum odd-length prefix t = (t 0 ,
We define $t'' := (t_{2k+1}, \dots, t_{m-1})$. If $t''$ is empty, there is nothing to show since, by the induction hypothesis, we can construct $f(s, t) = f(s, t')$ with a delay of $d < d + 1$ due to $w < 2^{d-1}$. Otherwise, we will realize $f(s, t)$ with delay $d + 1$ using the alternating split (5) for some prefix $t^*$ of $t$ to be determined, i.e.,

$$f(s, t) = f(s, t^*) \wedge f^*(\overline{t^*}, t^{**}), \qquad (22)$$
where $t^* = (t_0, t_1, \dots, t_{2l})$ for some $0 \leq l < \frac{m-1}{2}$ and $t^{**} := (t_{2l+1}, \dots, t_{m-1})$. Our main argument, which is presented in Case 2 (ii), requires that $\{t_{2k+1}, t_{2k+2}\} \cap \{t_{m-2}, t_{m-1}\} = \emptyset$, i.e., that $t''$ has at least four elements. Thus, in Case 2 (i), we treat the $t''$ with at most two elements, and at the beginning of Case 2 (ii), those with exactly three elements.
Case 2 (i): Assume that $t''$ consists of at most two elements. We set $t^* := t'$; thus, $t^{**} = t''$. By the induction hypothesis and due to $w < 2^{d-1}$, Inequality (21) allows realizing $f(s, t')$ with delay $d$. Since $t''$ has at most two elements, by Remark 2.8, we can realize $f^*(\overline{t'}, t'')$ as a binary tree with delay $d$ since $W(\overline{t'}) + W(t'') \leq W(t)$, which is at most $2^d$ due to Inequality (13) and $d + 1 \geq 4 > 2^\zeta$.

Case 2 (ii): Assume that $t''$ contains at least three elements. Set $\tilde{t} := (t_0, \dots, t_{2k+2})$. We need to find an appropriate prefix $t^*$ of $t$ for Realization (22) such that both $f(s, t^*)$ and $f^*(\overline{t^*}, t^{**})$ can be realized with delay $d$ by the induction hypothesis. We choose $t^*$ depending on the weight of $\tilde{t}$:
Λt , we set t * := t . Note that in this case, we, in particular, have
Λt . Figure 3 visualizes the case distinction. In either case, the weight of t * will be of the form
The upper bound on $\delta$ allows us to realize $f(s, t^*)$ with delay $d$ by the induction hypothesis since $w < 2^{d-1}$. It remains to show that $f^*(\overline{t^*}, t^{**})$ can be realized with delay $d$. The case that $t''$ contains exactly three elements still needs to be treated separately. Here, Case (a) is easy since we have $t^{**} = (t_{m-1})$; hence, $f^*(\overline{t^*}, t^{**})$ is a binary tree that can be realized with delay $d$ by Huffman coding since $W(t) \leq 2^d$ due to Inequality (13). In Case (b), we show that the realization
yields delay d: The binary tree t ∨ t 2k+1 can be realized with delay d − 1 using Remark 2.8 since It remains to show Λ t ≤ 2 d −1 , so assume the contrary. W.l.o.g., since f (s, t ) is logically symmetric in t 2k+2 and t 2k+3 , we may assume W (t 2k+3 ) = max{W (t 2k+3 ),W (t 2k+2 )}. Due to Inequality (13) and
and
where the last step can be verified by hand for d = 3, and for d ≥ 4 is implied by
This is a contradiction, concluding the case that t has exactly three elements.
Finally, we can prove Theorem 3.6.
Proof of Theorem 3.6. We prove the theorem by induction on d. For d ≤ 3, Lemma 3.9 provides a realization of f (s, t ) with delay d. Now we can assume that the theorem holds for some d ≥ 3, and prove the inductive step via Theorem 3.10.
CONSTRUCTING FAST CIRCUITS
Based on Theorem 3.2, we could now show that there is a circuit realizing the And-Or path $t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots))$ with delay at most $\log_2 W + \log_2\log_2 W + \log_2\log_2\log_2 W + 5$. Instead, we will prove a stronger result: by modifying the instance, we can diminish the dependency on $W$. The modification is based on the observation that we can round up small arrival times to the same value without losing too much in the maximum delay. Moreover, shifting all arrival times by the same number does not change the problem. Both modifications allow us to reduce the problem to instances with a total arrival time weight of at most $2m$.

Claim. There is a circuit $C$ realizing the And-Or path $t_0 \wedge (t_1 \vee (t_2 \wedge (\dots t_{m-1}) \dots))$ with arrival times $\tilde{a}$ with delay at most $\tilde{d}$. $\dots \cdot m \cdot \log_2 m \cdot \log_2\log_2 m \cdot 2^{2.3} \dots 1.8 \cdot \log_2 m \cdot \log_2(1.8 \cdot \log_2 m)$.
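The rounding-and-shifting idea can be illustrated as follows. This is a sketch of one possible rule chosen by us, not necessarily the modification used in the article: with $\theta$ the largest integer satisfying $m \cdot 2^\theta \leq W$, raise every arrival time below $\theta$ to $\theta$, and then subtract $\theta$ from all arrival times. Raising arrival times adds at most $m \cdot 2^\theta \leq W$ to the weight, so before shifting the weight is at most $2W$ (costing at most one unit in the $\log_2 W$ term), while the shifted weight stays below $4m$; note that this constant is weaker than the bound $2m$ stated above.

```python
def round_and_shift(arrival_times):
    """Hypothetical variant of the instance modification (our own choice):
    theta = largest integer with m * 2**theta <= W; raise arrival times below
    theta to theta, then shift all arrival times down by theta."""
    m = len(arrival_times)
    w = sum(2 ** a for a in arrival_times)
    theta = 0
    while m * 2 ** (theta + 1) <= w:   # W >= m always holds, so theta >= 0
        theta += 1
    return [max(a, theta) - theta for a in arrival_times], theta


times = [0, 0, 1, 9, 12, 12, 3]
new_times, theta = round_and_shift(times)
w = sum(2 ** a for a in times)
w_new = sum(2 ** a for a in new_times)
assert w_new < 4 * len(times)          # small total weight after shifting
assert w_new * 2 ** theta <= 2 * w     # rounding at most doubles the weight
```

Since arrival times are only increased before the shift, a circuit with delay $D$ for the modified instance has delay at most $D + \theta$ for the original one.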
Note that whenever two non-disjoint sequences of inputs are considered as alternating inputs in the algorithm, one of the sequences must be a subset of the other. Therefore, the number of alternating input sequences considered by the algorithm can be bounded by m. Moreover, in each of the recursive calls in lines 19, 28, and 36, the number of alternating inputs decreases by 1, and in the only other recursive call in line 13, the number remains the same, but in this case, we recursively compute f ((), t ); thus, the number of alternating inputs will decrease in the next recursive call. Thus, the number of recursive calls in the algorithm is bounded by 2m.
Note that in each call of Algorithm 1, we compute at most one symmetric binary tree. Since each symmetric tree we construct has at most m + n inputs, due to Remark 2.8, this takes O((m + n) log 2 (m + n)) steps per tree.
If we precompute the weight of each consecutive subsequence of $t$, computing the prefix $t'$ in line 23 (or finding out that no such prefix exists) requires $O(\log_2 m)$ steps using binary search.
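The binary search can be sketched with prefix sums: since all weights are positive, the weights of the odd-length prefixes $(t_0)$, $(t_0, t_1, t_2)$, ... increase strictly, so the longest odd-length prefix within a given weight bound is found by bisection. The helper names and the way the bound is passed in are our own assumptions; Algorithm 1 itself is not reproduced here.

```python
from bisect import bisect_right
from itertools import accumulate


def odd_prefix_weights(arrival_times):
    """Weights of the odd-length prefixes (t0), (t0, t1, t2), ... (precomputed once)."""
    sums = list(accumulate(2 ** a for a in arrival_times))  # sums[j] = W((t0, ..., tj))
    return sums[0::2]


def longest_odd_prefix(odd_weights, bound):
    """Length 2k+1 of the longest odd-length prefix with weight <= bound, or None."""
    count = bisect_right(odd_weights, bound)  # O(log m) binary search
    return None if count == 0 else 2 * count - 1


odd = odd_prefix_weights([0, 0, 0, 0, 0])   # prefix weights 1, 3, 5
print(longest_odd_prefix(odd, 4))           # 3
print(longest_odd_prefix(odd, 0.5))         # None: even (t0) alone is too heavy
```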
Apart from this, there are only constantly many steps in each recursive call. Hence, the number of steps needed for each recursive call of Algorithm 1, excluding lines 13, 19, 28, and 36, is at most $O((m+n)\log_2(m+n) + \log_2\log_2 W)$. Since there are at most $2m$ recursive calls, we have at most $O(m(m+n)\log_2(m+n) + m\log_2\log_2 W)$ steps in total, which finishes the proof of the claim.

Now we can prove the theorem. We follow the proof of Theorem 4.1, also using its notation. For $m < M$, we construct the circuit described in Ref. [7], and nothing more needs to be shown due to the properties collected in Table 1. For $m \geq M$, we compute the modified instance with arrival times $\tilde{a}$ and weight $\tilde{W}$ in linear time. Then, we call Algorithm 1 with the modified arrival times $\tilde{a}$. Since $\tilde{W} \in O(m)$, the sizes of all numbers occurring in the algorithm are polynomial in $m$. Applying the claim with $s = ()$, hence $n = 0$, and $W = \tilde{W}$, we obtain a running time of $O(m^2 \log_2 m)$.
Our main objective when designing good circuits for And-Or paths is delay. Still, there are other metrics to be regarded during circuit construction such as the size, i.e., the total number of gates used in the circuit, and maximum fanout, i.e., the maximum number of successors of any input or gate. Proof. In order to prove the fanout bound, we show the following claim.
Claim. In the circuit computed by Algorithm 1, each gate has fanout exactly 1, each input in s has fanout exactly 1, and each input in t has fanout at most d.
Proof of The Claim. Note that each gate constructed has fanout 1. We prove the bound on the maximum fanout of the inputs by induction on d.
Note that in the realizations computed by Lemma 3.9, which is used in lines 5 to 8, each input has fanout 1. In the realizations provided in lines 14 and 20, inputs of $s$ only occur in exactly one binary tree and thus have a fanout of 1. Since we can inductively assume that $f((), t)$ and $f((), (t_1, \dots, t_{m-1}))$ fulfill the claimed fanout bound, respectively, each input in $t$ has fanout at most $d - 1 < d$.
In the realization in line 38, inductively, the circuit for $f(s, t^*)$ computed in line 36 has fanout at most 1 for inputs in $s$ and fanout at most $d - 1$ for inputs in $t^*$. Since inputs of $s$ do not occur in $f^*(\overline{t^*}, t^{**})$, it remains to show that inputs of $t$ have fanout at most $d$ in the realization of $f(s, t)$.
If $t''$ has at most three inputs, lines 30 and 33 show that each input of $t$ has fanout at most one in the realization of $f^*(\overline{t^*}, t^{**})$, which proves the claimed fanout bounds. Otherwise, we realize $f^*(\overline{t^*}, t^{**})$ recursively in line 36 and can inductively assume that the inputs of $\overline{t^*}$ have fanout at most one and the inputs of $t^{**}$ fanout at most $d - 1$ in this realization. Together with the recursive fanout bounds for the realization of $f(s, t^*)$, this shows the claimed fanout bounds for the circuit constructed for $f(s, t)$. This proves the claim.
For the bound on the size, note that the combinatorial depth of the circuit is at most $\log_2 m + \log_2\log_2 m + \log_2\log_2\log_2 m + 3.3$. Since the circuit consists of 2-input gates only, each of which has fanout 1 by the claim, its gates form a binary tree of this depth, so the number of gates is bounded from above by $2^{3.3} m \log_2 m \log_2\log_2 m < 10 m \log_2 m \log_2\log_2 m$.

Remark 4.5. For given $m \geq 3$, with the additional use of buffers, the circuit constructed in Theorem 4.3 can be transformed into a logically equivalent circuit with maximum fanout 2, but with delay at most $\log_2 W + 2\log_2\log_2 m + \log_2\log_2\log_2 m + 6$ and size at most $O(m \log_2 m \log_2\log_2 m)$.
To see this, first note that the circuit constructed in Ref. [7], which we use for small instances, already has a maximum fanout of 2. Secondly, note that the circuit $C$ constructed in Theorem 4.3 has fanout larger than 1 only at the inputs. Write $f := \log_2 m + \log_2\log_2 m + \log_2\log_2\log_2 m + 3.3$ for the maximum possible fanout of $C$. For each input $t_i$, we can replace the outgoing edges of $t_i$ by a delay-optimum buffer tree with maximum fanout 2 for each buffer (compare Remark 2.8). This increases the size by at most $m(f - 1)$, and, since we can assume that $m \geq 500$, the delay increases by at most $\log_2 f \leq \log_2\log_2 m + 1$. This yields the stated properties of the transformed circuit.
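The assumption $m \geq 500$ is what makes $\log_2 f \leq \log_2\log_2 m + 1$, i.e., $f \leq 2\log_2 m$, hold. The following sketch evaluates this inequality over a range of $m$ (a spot check, not a proof) and shows that it fails for smaller values such as $m = 100$. The function names are ours.

```python
import math


def max_fanout(m):
    """f = log2 m + log2 log2 m + log2 log2 log2 m + 3.3 (requires m >= 3)."""
    lm = math.log2(m)
    return lm + math.log2(lm) + math.log2(math.log2(lm)) + 3.3


def buffer_delay_bound_holds(m):
    """Check log2 f <= log2 log2 m + 1, which is equivalent to f <= 2 log2 m."""
    return math.log2(max_fanout(m)) <= math.log2(math.log2(m)) + 1


assert all(buffer_delay_bound_holds(m) for m in range(500, 200_000))
print(buffer_delay_bound_holds(100))  # False: the assumption m >= 500 matters
```

Equivalently, the condition reads $\log_2 m - \log_2\log_2 m - \log_2\log_2\log_2 m \geq 3.3$, whose left-hand side grows with $m$ in this range.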
MORE GENERAL BOOLEAN FUNCTIONS
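In this section, we extend Theorem 4.1 from And-Or paths to similar functions that do not alternate between $\wedge$ and $\vee$ regularly, but arbitrarily.

Definition 5.1. We call a Boolean function of the form

$$h(t, \bullet_1, \dots, \bullet_{m-1}) = t_0 \bullet_1 (t_1 \bullet_2 (t_2 \bullet_3 (\dots (t_{m-2} \bullet_{m-1} t_{m-1}) \dots))),$$

where $t = (t_0, \dots, t_{m-1})$ are Boolean input variables and, for each $i \in \{1, \dots, m-1\}$, the symbol $\bullet_i$ denotes a two-input gate over the basis $\{\wedge, \vee\}$, a generalized And-Or path.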
Theorem 5.2. Given Boolean input variables $t = (t_0, \dots, t_{m-1})$ and gates $\bullet_1, \dots, \bullet_{m-1}$, there is a circuit realizing the generalized And-Or path $h(t, \bullet_1, \dots, \bullet_{m-1})$ with delay at most $\log_2 W + \log_2\log_2(c+1) + \log_2\log_2\log_2(c+1) + 5.3$, size at most $10(c+1)\log_2(c+1)\log_2\log_2(c+1) + m - c - 1$, and maximum fanout at most $\log_2(c+1) + \log_2\log_2(c+1) + \log_2\log_2\log_2(c+1) + 3.3$, where $c$ denotes the number of changes between $\wedge$ and $\vee$ or vice versa.
Proof. We first prove the delay bound. We partition the inputs $t_0, \dots, t_{m-1}$ of $h$ into $c + 1$ maximal groups $P_0, \dots, P_c$ of consecutive inputs that feed the same kind of gate; see Figure 4 (a). We denote the common gate type of the gates fed by the inputs $P_b$ by $G_b$. Note that for each $b \in \{0, \dots, c-1\}$, the group $P_b$ contains at least one input, while $P_c$ contains at least two inputs.
For each $b \in \{0, \dots, c\}$, we can build a symmetric binary $G_b$-tree on the inputs $P_b$ using Huffman coding. By Remark 2.8, this yields a Boolean circuit $C_b$ whose output $\tilde{t}_b$ has delay $\lceil \log_2 W(P_b) \rceil$. Without loss of generality, we may assume that $\bullet_1 = \wedge$. Then, we can express $h(t, \bullet_1, \dots, \bullet_{m-1})$ as an And-Or path as follows:

$$h(t, \bullet_1, \dots, \bullet_{m-1}) = g((\tilde{t}_0, \dots, \tilde{t}_c)). \qquad (30)$$
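The regrouping step can be checked by brute force. The sketch below uses our own encoding (`ops[i]` is the gate $\bullet_{i+1}$, given as the string `"and"` or `"or"`): it evaluates a generalized And-Or path directly, evaluates it again via the maximal groups $P_b$ and an alternating path over the group outputs, and confirms that both agree; it also returns the number $c$ of changes between gate types.

```python
from itertools import product


def generalized_path(t, ops):
    """h = t0 op1 (t1 op2 (... (t_{m-2} op_{m-1} t_{m-1}))), ops[i] = op_{i+1}."""
    value = t[-1]
    for i in range(len(t) - 2, -1, -1):
        value = (t[i] and value) if ops[i] == "and" else (t[i] or value)
    return value


def via_groups(t, ops):
    """Evaluate h via the maximal groups P_0, ..., P_c; return (value, c)."""
    # Input t_i feeds gate op_{i+1}; the last input t_{m-1} feeds op_{m-1}.
    kinds = [ops[min(i, len(ops) - 1)] for i in range(len(t))]
    groups = []                                  # list of (gate kind, inputs)
    for kind, x in zip(kinds, t):
        if groups and groups[-1][0] == kind:
            groups[-1][1].append(x)
        else:
            groups.append((kind, [x]))
    outs = [all(xs) if kind == "and" else any(xs) for kind, xs in groups]
    value = outs[-1]                             # alternating path over group outputs
    for b in range(len(outs) - 2, -1, -1):
        value = (outs[b] and value) if groups[b][0] == "and" else (outs[b] or value)
    return value, len(groups) - 1


for m in range(2, 8):
    for ops in product(("and", "or"), repeat=m - 1):
        changes = sum(ops[i] != ops[i + 1] for i in range(m - 2))
        for t in product((False, True), repeat=m):
            assert via_groups(t, ops) == (generalized_path(t, ops), changes)
```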
In Figure 4 (b), we can see the circuit arising from the generalized And-Or path in Figure 4 (a) in this way: the white gates are used for the circuits $C_b$ on the input groups $P_b$, and their outputs feed an And-Or path. Write $\tilde{W} := \sum_{b=0}^{c} W(\tilde{t}_b)$. Theorem 4.1 yields a circuit $C$ for $g((\tilde{t}_0, \dots, \tilde{t}_c))$ and thus also for $h(t, \bullet_1, \dots, \bullet_{m-1})$ with delay at most

$$\log_2 \tilde{W} + \log_2\log_2(c+1) + \log_2\log_2\log_2(c+1) + 4.3. \qquad (31)$$

Note that the weight $\tilde{W}$ can be bounded by

$$\tilde{W} = \sum_{b=0}^{c} 2^{\lceil \log_2 W(P_b) \rceil} < \sum_{b=0}^{c} 2\, W(P_b) = 2W.$$
Hence, the delay of $C$ stated in Statement (31) can be bounded by $\log_2 \tilde{W} + \log_2\log_2(c+1) + \log_2\log_2\log_2(c+1) + 4.3 \leq \log_2 W + \log_2\log_2(c+1) + \log_2\log_2\log_2(c+1) + 5.3$, which finishes the proof of the delay bound. For bounding the size of the arising circuit, note that for each $b \in \{0, \dots, c\}$, the circuit $C_b$ has $|P_b| - 1$ gates. Together with the size bound for the circuit realizing the And-Or path (30) on $c + 1$
