It is well known which symmetric Boolean functions can be computed by constant-depth, polynomial-size, unbounded fan-in circuits, i.e. which are contained in the complexity class $AC^0$. This result is sharpened. Symmetric Boolean functions in $AC^0$ can be computed by unbounded fan-in circuits with the following properties. If the optimal depth of $AC^0$-circuits is $d$, the depth is at most $d + 2$, the number of wires is almost linear, namely $n \log^{O(1)} n$, and the number of gates is subpolynomial (but superpolylogarithmic), namely $2^{O(\log^{\delta} n)}$ for some $\delta < 1$.
Introduction
Symmetric functions form an important subclass of Boolean functions including all kinds of counting functions. A Boolean function $f : \{0,1\}^n \to \{0,1\}$ is called symmetric if $f(x_1, \ldots, x_n)$ depends on the input only via $x_1 + \cdots + x_n$, the number of ones in the input. Hence, symmetric functions $f$ can be described by value vectors $v(f) = (v_0, \ldots, v_n)$ where $v_i$ is the output of $f$ on inputs with exactly $i$ ones.
It is a classical result of Boolean complexity theory that all symmetric Boolean functions are contained in $NC^1$. They can even be computed by fan-in 2 circuits of logarithmic depth and linear size. We are interested in the more massively parallel, unbounded fan-in circuits. $AC^0$ is the class of Boolean functions computable by unbounded fan-in circuits of constant depth and polynomial size. The following theorem by Moran (1987) and Brustmann and Wegener (1987) is based on the lower bounds of Boppana (1984) and Håstad (1986) and the upper bounds due to Ajtai and Ben-Or (1984), Denenberg, Gurevich and Shelah (1986) and Fagin, Klawe, Pippenger and Stockmeyer (1985).
Theorem 1: A sequence of symmetric Boolean functions $f = (f_n)$ with value vectors $v(f_n) = (v^n_0, \ldots, v^n_n)$ is contained in $AC^0$ iff $v^n_{g(n)} = \cdots = v^n_{n-g(n)}$ for some polylogarithmic function $g$, i.e. $g(n) = O(\log^k n)$ for some $k$.
This theorem does not answer whether symmetric functions in $AC^0$ can be computed by $AC^0$-circuits which are efficient compared with the well-known logarithmic depth, linear size, fan-in 2 circuits for symmetric functions. A similar question has been asked for adders. The carry-look-ahead method leads to an unbounded fan-in circuit with optimal depth 3 (adders of depth 2 need exponential size), $\Theta(n^2)$ gates and $\Theta(n^3)$ wires. But these adders cannot compete with the well-known fan-in 2 adders of linear size and logarithmic depth. The problem of determining the complexity of unbounded fan-in constant depth adders has been solved by Chandra, Fortune and Lipton (1983) for the upper bounds and by Dolev, Dwork, Pippenger and Wigderson (1983) for the lower bounds. For any recursive function $g(n)$ where $g(n) \to \infty$ as $n \to \infty$ there are constant depth adders of size $O(n g(n))$, but there do not exist linear size adders of constant depth. The results of this paper go in the same direction.
It is proved that symmetric Boolean functions in $AC^0$ can be computed by unbounded fan-in circuits with constant depth, an almost linear number of $n \log^{O(1)} n$ wires and a subpolynomial but superpolylogarithmic number of $2^{O(\log^{\delta} n)}$ gates for some $\delta < 1$. We improve the best known upper bounds for the depth of $AC^0$-circuits by a factor of approximately 2. If the optimal depth of $AC^0$-circuits is $d$, the depth of our circuits is at most $d + 2$.
In Section 2 we review some known results and discuss some lower bounds. In Section 3 we reformulate the method of Denenberg, Gurevich and Shelah (1986) on which our circuit design presented in Section 4 is based.
Known results and simple lower bounds
Many of the known results are stated only for threshold functions, which form some kind of basis for all symmetric functions. The threshold function $T^n_k$ on $n$ variables computes 1 exactly on those inputs with at least $k$ ones, i.e. $v_l(T^n_k) = 1$ iff $l \ge k$. $NT^n_k := \neg T^n_{k+1}$ is the corresponding negative threshold function, $v_l(NT^n_k) = 1$ iff $l \le k$. $E^n_k := T^n_k \land NT^n_k$ is called the exactly function, $v_l(E^n_k) = 1$ iff $l = k$. A symmetric function $f$ is obviously the disjunction of all $E^n_k$ where $v_k(f) = 1$.
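The definitions above can be made concrete with a minimal Python sketch. It represents each symmetric function by its value vector as a list; the function names (`threshold`, `neg_threshold`, `exactly`, `disjunction_of_exactly`) are illustrative choices and not notation from the paper.

```python
def threshold(n, k):
    """Value vector of T^n_k: v_l = 1 iff l >= k."""
    return [1 if l >= k else 0 for l in range(n + 1)]

def neg_threshold(n, k):
    """Value vector of NT^n_k: v_l = 1 iff l <= k."""
    return [1 if l <= k else 0 for l in range(n + 1)]

def exactly(n, k):
    """Value vector of E^n_k = T^n_k AND NT^n_k: v_l = 1 iff l == k."""
    return [t & nt for t, nt in zip(threshold(n, k), neg_threshold(n, k))]

def evaluate(v, x):
    """Evaluate the symmetric function with value vector v on the 0-1 input x."""
    return v[sum(x)]

def disjunction_of_exactly(v):
    """Rebuild a value vector as the disjunction of all E^n_k with v_k = 1."""
    n = len(v) - 1
    result = [0] * (n + 1)
    for k, vk in enumerate(v):
        if vk:
            result = [a | b for a, b in zip(result, exactly(n, k))]
    return result
```

For example, `evaluate(threshold(4, 2), [1, 0, 1, 0])` looks up the entry for two ones in the value vector of $T^4_2$.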
We have already mentioned that the upper bounds of Theorem 1 have been proved independently in three papers. Different methods have been used. For $T^n_k$ where $k = \lfloor \log^m n \rfloor$, Ajtai and Ben-Or (1984) proved the existence of circuits of depth $2m + 3$ and size $\Theta(n^{2m+4} \log^{m+1} n)$. The circuit whose existence has been proved by Fagin, Klawe, Pippenger and Stockmeyer (1985) also has depth $2m + 3$ but the size is larger, namely $n^{\Theta(m^2)}$. Denenberg, Gurevich and Shelah (1986) could even construct $AC^0$-circuits. They were only interested in the qualitative result that the circuits are $AC^0$-circuits. A direct implementation leads to circuits which are less efficient with respect to depth and size compared with the other circuits. Our circuit design uses the main idea of Denenberg, Gurevich and Shelah (1986) (see also Mayr (1985)), the efficient coding of the cardinality of small subsets of $\{0, \ldots, n-1\}$ by short 0-1-vectors, see Lemma 1. Since we work directly with circuits and do not use the notation of logics, we are not concerned with the size of the "universe". This simplifies our approach. Furthermore, we present an iterative circuit design and use some implementation tricks. This leads to the uniform design of monotone circuits for $T^n_k$, where $k = \lfloor \log^m n \rfloor$, with the following characteristics. The depth is $\lfloor m \rfloor + 3$ ($m$ need not be an integer), the number of gates is $g = 2^{O(\log^{m/(\lfloor m \rfloor + 1)} n \, \log\log n)}$ and, hence, $o(n^{\varepsilon})$ for all $\varepsilon > 0$, and the number of wires equals $O(n \log^{2m+2} n)$. By using some more wires, namely $O(ng)$ wires, we can decrease the depth to $\lfloor m \rfloor + 2$, which is optimal if $m$ is not an integer.
Independently of our approach and with some other methods, Newman, Ragde and Wigderson (1990) have worked in the same direction. They have designed uniform circuits of depth $O(m)$, approximately $4m$, with $O(n)$ gates and $O(n \log^{2m} n)$ wires. Because of their use of hash functions the circuit is not monotone. This design beats our bounds only for the number of wires and is worse otherwise.
For constant k better results are possible. Friedman (1984) has investigated the formula size of threshold functions. Using his methods the following theorem can be proved in a straightforward way.
Theorem 2: For constant $k$, $T^n_k$ can be computed by unbounded fan-in circuits of depth 3 with $O(\log n)$ gates and $O(n \log n)$ wires.
In order to appreciate the new upper bounds we discuss some lower bounds. Each Boolean function can be computed with depth 2 by its DNF. But what is the minimal depth of circuits with polynomial size? And what is the minimal size if the depth is bounded or even unbounded? The following lower bound has been proved by Boppana (1984) for monotone circuits and by Håstad (1986) [...]
There are not too many lower bounds on the number of gates and wires of unbounded fan-in, unbounded depth circuits for functions in $AC^0$ or $NC^1$. Hromkovic (1985) proved some lower bounds by a communication complexity approach and Wegener (1990) adapted the elimination method for proving lower bounds on the complexity of the parity function. For threshold functions we only know the following simple lower bounds.
Proposition 1: Unbounded fan-in circuits for $T^n_k$ and $1 \le k \le n/2$ need at least $n$ wires and $k$ gates.
Proof: Obviously, at least one wire has to leave each variable $x_i$. If $x_i$ enters an $\lor$-gate or $\bar{x}_i$ enters an $\land$-gate, we can eliminate this gate by setting $x_i = 1$. Otherwise one gate can be eliminated by setting $x_i = 0$. This procedure can be repeated at least $k$ times before $T^n_k$ is replaced by a constant function.
The method of Denenberg, Gurevich and Shelah
We reformulate the method of Denenberg, Gurevich and Shelah (1986) in a generalized form. We use a representation supporting our circuit design. The method is based on a number-theoretic theorem allowing a succinct coding of the cardinality of small subsets of $\{0, \ldots, n-1\}$. Let $\mathrm{res}(i, j) := (i \bmod j) \in \{0, \ldots, j-1\}$.
Lemma 1: Let $L := L(n)$. For large $n$ one can choose for each small subset $S$ of $\{0, \ldots, n-1\}$, i.e. $|S| \le L$, some number $u < L^2 \log n$ such that $\mathrm{res}(i, u) \ne \mathrm{res}(j, u)$ for all different $i, j \in S$.
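The statement of the lemma can be explored with a small Python sketch that searches for the smallest separating modulus $u$ of a given set $S$. The helper name `smallest_separating_modulus` is hypothetical, and the logarithm is taken to base 2, which is an assumption consistent with the factor $\ln 2$ appearing in the proof. Since the lemma only claims the bound for large $n$, the example merely checks one concrete instance.

```python
from math import log2

def res(i, j):
    """res(i, j) = i mod j, as defined in the text."""
    return i % j

def smallest_separating_modulus(S):
    """Smallest u such that all elements of S have pairwise different residues mod u.

    The loop terminates because u = max(S) + 1 always separates S.
    """
    S = sorted(set(S))
    u = 1
    while len({res(i, u) for i in S}) < len(S):
        u += 1
    return u

# One concrete instance: n = 1024, L = 3, S = {0, 6, 12}.
n, L = 1024, 3
u = smallest_separating_modulus([0, 6, 12])
bound = L * L * log2(n)  # L^2 log n with base-2 logarithm (assumption)
```

Here the moduli 1 to 4 each map two elements of $S$ to the same residue, while modulo 5 the residues are 0, 1 and 2.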
Proof: The proof relies on the prime number theorem in the following form. Let $S \subseteq \{0, \ldots, n-1\}$ be some small set, i.e. $|S| \le L$. Let $u$ be the smallest number such that $\mathrm{res}(i, u) \ne \mathrm{res}(j, u)$ for all different $i, j \in S$. For large $n$, either $u < L^2 \log n$ or $\psi(u-1) > (u-1) \ln 2$. We prove that $\psi(u-1) > (u-1) \ln 2$ implies $u < L^2 \log n$ and, hence, we prove the lemma.
Let $a$ be the least common multiple of all $|i - j|$, where $i, j \in S$ and $i \ne j$, and let $b$ be the product of all $p^k$ where $p$ is prime and $p^k \le u - 1 < p^{k+1}$. By definition of $u$, $\mathrm{res}(i, p^k) = \mathrm{res}(j, p^k)$ for some different $i, j \in S$. Hence, $|i - j|$ is a multiple of $p^k$.
This implies that $p^k$ and also $b$ divides $a$. Hence, $b \le a$. Also, $a < n^{L(L-1)/2}$, since, by assumption, $S$ has at most $\binom{L}{2}$ subsets $\{i, j\}$ where $i \ne j$, and $|i - j| < n$ for $i, j \in S$. By definition, $\ln b = \psi(u-1)$.
Combining all our inequalities we have $(u-1) \ln 2 < \psi(u-1) = \ln b \le \ln a < L^2 \ln n$ or $u < L^2 \log n$.
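The divisibility step of the proof can be checked numerically. The following Python sketch computes, for a given set $S$, the smallest separating modulus $u$, the least common multiple $a$ of all differences, and $b$, which equals the least common multiple of $1, \ldots, u-1$ (the product of the maximal prime powers not exceeding $u-1$). The function name `proof_quantities` is an illustrative choice.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def proof_quantities(S):
    """Return (u, a, b) as used in the proof, for a set S with at least two elements."""
    S = sorted(set(S))
    # u: smallest modulus under which all residues of S are pairwise different
    u = 1
    while len({i % u for i in S}) < len(S):
        u += 1
    # a: least common multiple of all differences |i - j|
    a = reduce(lcm, (j - i for i, j in combinations(S, 2)), 1)
    # b: lcm(1, ..., u-1), i.e. the product of all maximal prime powers <= u-1
    b = reduce(lcm, range(1, u), 1)
    return u, a, b
```

Every modulus below $u$ fails to separate $S$ and therefore divides some difference, which is why `a % b == 0` holds for any input set.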
This lemma can be applied several times. For the second application $n$ is replaced by $n' := L^2 \log n$ and $S$ is replaced by $S' := \{ i \mid i = \mathrm{res}(j, u) \text{ for some } j \in S \}$. It should be emphasized that $u = u(S)$ depends on $S$. For our circuit design this repeated application of the lemma is only of limited use. In order to keep the depth of the circuit small, we work with $L = \log^{\delta} n$ for some $\delta < 1$ close to 1. After the first application of the lemma, $u$ can be estimated by $\log^{1+2\delta} n$ and the upper bound for $u$ does not become smaller than $(\log^{2\delta} n) \log\log n$.
We shall see that large $L$ corresponds to large size and small depth, and small $L$ corresponds to small size and large depth. For the size of the circuit the function $L^L$ is important.
4. The construction of small depth, small size circuits
We start with a simple but important subcircuit.
Lemma 2: $NT^n_1$ can be computed by a circuit of depth 3 with $3\lceil \log n \rceil + 1$ gates and $(n + 3)\lceil \log n \rceil$ wires. The output gate of the circuit is an $\land$-gate.
Proof: We assume that $n = 2^k$, otherwise one may consider $NT^N_1$ where $N = 2^{\lceil \log n \rceil}$ and may replace $N - n$ inputs by zeros. $NT^n_1$ is the conjunction of all prime clauses $\bar{x}_i \lor \bar{x}_j$, $i \ne j$. We use a so-called separating system to compute these prime clauses. By $(\bar{x}_0 \land \cdots \land \bar{x}_{n/2-1}) \lor (\bar{x}_{n/2} \land \cdots \land \bar{x}_{n-1})$ we compute with 3 gates and $n + 2$ wires the conjunction of all prime clauses $\bar{x}_i \lor \bar{x}_j$ where the first bit of the $k$-bit number $i$ equals 0 while the first bit of $j$ equals 1. The same can be done for the other $k - 1$ bits of the numbers $0, \ldots, n - 1$. Finally, the $k$ outputs of the $\lor$-gates are combined by an $\land$-gate.
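The separating-system construction from the proof can be simulated directly. The Python sketch below evaluates the depth-3 circuit on a 0-1 input and counts gates and wires as in the proof; the function name `nt1_circuit` and the choice to count wires for the padded length $N$ are assumptions of this illustration.

```python
def nt1_circuit(x):
    """Evaluate NT_1 (at most one 1 in x) by the separating system.

    Returns (value, gates, wires). Inputs are padded with zeros to length
    N = 2^k, and wires are counted for the padded circuit.
    """
    n = len(x)
    k = max(1, (n - 1).bit_length())      # k = ceil(log2 n), at least 1
    N = 2 ** k
    neg = [1 - b for b in list(x) + [0] * (N - n)]   # negated inputs
    gates = wires = 0
    or_outputs = []
    for bit in range(k):
        # AND of negated inputs whose index has this bit equal to 0, resp. 1
        zero_side = all(neg[i] for i in range(N) if not (i >> bit) & 1)
        one_side = all(neg[i] for i in range(N) if (i >> bit) & 1)
        or_outputs.append(zero_side or one_side)
        gates += 3            # two AND-gates and one OR-gate
        wires += N + 2        # N wires into the AND-gates, 2 into the OR-gate
    gates += 1                # output AND-gate
    wires += k                # one wire from each OR-gate
    return int(all(or_outputs)), gates, wires
```

Two ones at different positions differ in some index bit, so for that bit both conjunctions vanish and the output is 0. For $n = 8$ the counts are $3 \cdot 3 + 1$ gates and $(8 + 3) \cdot 3$ wires, matching Lemma 2.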
If we would like to decrease the depth to 2, $\Theta(n^2)$ gates and wires are necessary and sufficient.
We are interested in the design of efficient circuits for symmetric functions in $AC^0$. Since $f$ and $\neg f$ have the same complexity, it is by Theorem 1 sufficient to consider symmetric functions $f$ where $v_k(f) = 1$ only for some $k$ where $k$ or $n - k$ is bounded by a polylogarithmic function. Furthermore, by duality, we may restrict ourselves to functions $f$ where $v_k(f) = 1$ only for some $k$ where $k$ is bounded by a polylogarithmic function.
Let $L = L(n)$ be a function specified later and let $f$ be a symmetric function where $v_k(f) = 1$ only for some $k \le L$.
The first step of our circuit design is an application of the coding lemma. For all $u \in \{1, \ldots, U\}$ where $U := \lfloor L^2 \log n \rfloor$ we compute in parallel the following information: [...]
We compute $f_u(y^u)$ by its conjunctive normal form, i.e. in depth 2, from $y^u$ and $\bar{y}^u$. Hence, $f_u(y^u)$ is computed on the third level by an $\land$-gate. Therefore, also $c_u \land f_u(y^u)$ can be computed on the third level. Finally, $f$ is computed in depth 4.
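The coding step can be sketched functionally in Python. The definitions of $y^u$ and $c_u$ are not spelled out in the surviving text, so the sketch assumes that $y^u_j$ is the disjunction of all $x_i$ with $\mathrm{res}(i, u) = j$ and that $c_u = 1$ iff no residue class contains two ones (a valid coding). The names `encode` and `f_via_coding` are hypothetical, and the sketch evaluates values only, without modelling gates or depth.

```python
from math import floor, log2

def encode(x, u):
    """Assumed coding: y_j = OR of x_i with i mod u == j; valid iff no class holds two ones."""
    y = [0] * u
    valid = True
    for i, bit in enumerate(x):
        if bit:
            if y[i % u]:
                valid = False     # two ones collide in the same residue class
            y[i % u] = 1
    return y, valid

def f_via_coding(x, v, L):
    """Evaluate the symmetric function with value vector v (v_k = 1 only for k <= L)
    as the disjunction over u of c_u AND f_u(y^u)."""
    n = len(x)
    U = floor(L * L * log2(n))    # U = floor(L^2 log n), base-2 logarithm assumed
    for u in range(1, U + 1):
        y, c_u = encode(x, u)
        # f_u has the same value vector as f, restricted to u variables
        if c_u and v[sum(y)]:
            return 1
    return 0
```

For a valid coding the number of ones in $y^u$ equals the number of ones in $x$, so invalid codings are simply skipped. Lemma 1 guarantees a valid $u \le U$ only for large $n$; in the small case $n = 8$, $L = 2$ one has $U = 12 \ge n$, so the modulus $u = n$ is always valid.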
We still have to estimate the size of the conjunctive normal forms. Since $v_i = 0$ for $i > L$ [...]
Theorem 4: i) Symmetric functions $f$ on $n$ variables where $v_k(f) = 1$ only for some $k \le L$ can be computed by an unbounded fan-in circuit of depth 4 with $O(L^{2L+4} \log^{L+2} n)$ gates and $O(L^{2L+6} \log^{L+3} n + n L^2 \log^2 n)$ wires. ii) If $L = L(n) = O(\log^{\delta} n)$ for some $\delta < 1$, the number of gates is bounded by $2^{O(\log^{\delta} n \, \log\log n)}$ and the number of wires by $O(n \log^{2+2\delta} n)$.
We make some remarks. The number of gates (in part ii) of the theorem) is subpolynomial, i.e. $o(n^{\varepsilon})$ for each $\varepsilon > 0$, but superpolylogarithmic. The upper bound is superpolynomial if $L = \Omega(\log n)$. We leave it to the reader to discuss functions $L$ where $L = o(\log n)$ but $L = \omega(\log^{\delta} n)$ for all $\delta < 1$. We also leave it to the reader to design circuits where Lemma 1 is applied more than once.
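To get a feeling for why the gate bound is subpolynomial, the following Python sketch evaluates the expression $L^{2L+4} \log^{L+2} n$ from Theorem 4 in the logarithmic domain for $L = \log^{\delta} n$ and reports the exponent $e$ such that the expression equals $n^{e}$. Constant factors hidden in the $O$-notation are ignored, the logarithm base is taken to be 2, and the function names are illustrative.

```python
from math import log2

def log2_gate_bound(log2_n, delta):
    """log2 of L^(2L+4) * (log n)^(L+2) for L = (log n)^delta, given log2 n."""
    L = log2_n ** delta
    return (2 * L + 4) * log2(L) + (L + 2) * log2(log2_n)

def polynomial_exponent(log2_n, delta):
    """Exponent e such that the gate expression equals n^e."""
    return log2_gate_bound(log2_n, delta) / log2_n

# The exponent shrinks as n grows, as expected for a subpolynomial bound.
small = polynomial_exponent(2 ** 10, 0.5)   # n = 2^(2^10)
large = polynomial_exponent(2 ** 20, 0.5)   # n = 2^(2^20)
```

With $\delta = 1/2$ the exponent falls from roughly 0.66 at $\log n = 2^{10}$ to roughly 0.04 at $\log n = 2^{20}$, which illustrates that the bound is only meaningful for very large $n$.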
The threshold functions are of particular interest. Our circuit works for $NT^n_{k-1}$ only with the negative variables $\bar{x}_i$. The conjunctive normal form for $f_u = NT^u_{k-1}$ consists only of the $\binom{n}{k}$ prime clauses containing only $n - k$ negative $y^u$-variables each. By Lemma 2, the $NT_1$-circuits work only with negative variables. By construction, $\bar{y}^u_j$ is the conjunction of some negative variables. Hence, we do not need the positive $y^u$- and $x$-variables. In order to compute $T^n_k = \neg NT^n_{k-1}$ we apply de Morgan's rules to this circuit and obtain a monotone circuit for $T^n_k$ of the same size as stated in Theorem 4.
Up to now we have not designed efficient circuits for all symmetric functions in $AC^0$. For the general case let us assume that $v_k = 1$ only for some $k \le L^m$ where $L = O(\log^{\delta} n)$ for some $\delta < 1$ and $m$ is a constant. For $U := \lfloor L^{2m} \log n \rfloor$ we compute as before all $y^u$, $\bar{y}^u$ and $c_u$ with $O(U^2 \log n)$ gates and $O(nU \log n)$ wires. $y^u$ and $\bar{y}^u$ are computed in depth 1 and $c_u$ in depth 3.
Since $v_k$ may equal 1 for $k = L^m$, the normal forms for $f_u$ may have nonpolynomial size. We can only test $y^u$ for up to $L$ ones with normal forms of very small size. We do these computations for all subvectors of $y^u$. Afterwards we may test for $L^2$ ones by testing $L$ pieces for $L$ ones each. After $m$ steps of this type we can test $y^u$ for up to $L^m$ ones. We explain these ideas now in more detail.
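The idea of testing for $L^2$ ones by testing $L$ pieces for $L$ ones each can be illustrated by a recursive Python sketch. It is only a functional model under the assumption that the pieces are consecutive subvectors; it does not reproduce the specific auxiliary functions of the construction, and the name `at_most` is hypothetical.

```python
from itertools import combinations_with_replacement

def at_most(y, lo, hi, level, L):
    """True iff y[lo:hi] contains at most L**level ones, decided piecewise.

    Level 1 is the base case (a small normal form in the circuit). At level
    l > 1 we look for a split into L consecutive pieces, each containing at
    most L**(l-1) ones; this corresponds to one disjunctive stage.
    """
    if level == 1:
        return sum(y[lo:hi]) <= L
    # Choose L-1 interior breakpoints lo <= b_1 <= ... <= b_{L-1} <= hi.
    for cuts in combinations_with_replacement(range(lo, hi + 1), L - 1):
        bounds = (lo, *cuts, hi)
        if all(at_most(y, bounds[t], bounds[t + 1], level - 1, L)
               for t in range(L)):
            return True
    return False
```

The split exists whenever the total is at most $L^l$: cutting greedily after every $L^{l-1}$-th one yields at most $L$ pieces. Conversely, $L$ pieces with at most $L^{l-1}$ ones each contain at most $L^l$ ones in total.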
Let $u$ be fixed and let $[i, j]$ denote the vector $(y^u_i, \ldots, y^u_j)$. We are interested in the following functions and their negations. [...] We describe how we compute the functions $p_l, \bar{p}_l, \ldots, s_l, \bar{s}_l$ by disjunctive forms using the functions $p_{l-1}, \ldots, s_{l-1}, t, \bar{t}$. Then we know by de Morgan's rules also conjunctive forms of the same complexity for these functions.
For even $l$ we use the disjunctive forms and for odd $l$ the conjunctive ones. Then the first level of stage $l$ is of the same type as the second level of stage $l - 1$ and these two levels can be merged. Up to now we have considered the computation of $f$ on the input $(y^u, \bar{y}^u)$ under the assumption that $c_u = 1$. We still have to eliminate invalid computations. In any case we have negations only at the inputs. If the last level is an $\lor$-level, we feed all gates on level 3, an $\land$-level, with the appropriate $c_u$. Furthermore, we combine the outputs for the different $u$ on the last level by a disjunction which does not increase the depth. Invalid computations feed 0 into this output disjunction. If all computations are invalid, we compute 0, which is also the correct result. If the last level is an $\land$-level, we feed all gates on level 4, an $\lor$-level, with the appropriate $c_u$. Furthermore, we combine the outputs for the different $u$ on the last level by a conjunction. Invalid computations feed 1 into this output conjunction. In order to obtain the correct result also in the case where all $u$ are invalid, we also feed the disjunction of all $c_u$ into the output conjunction.
The depth of our circuit is $m + 2$, if $m \ge 2$. The number of gates can be estimated by $O(L^{m+1} U^{L+4}) = 2^{O(\log^{\delta} n \, \log\log n)}$.
The number of bits in all $(y^u, \bar{y}^u, c_u)$ is very small. Hence, the number of wires which enter the gates on the first two levels dominates the number of all other wires. Therefore, the number of wires can be estimated by $O(nU \log n) = O(n \log^{2m\delta+2} n)$.
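The last estimate follows by substituting the parameter choices made for the general case, namely $U = \lfloor L^{2m} \log n \rfloor$ and $L = O(\log^{\delta} n)$. Written out as a short derivation:

```latex
\begin{align*}
U &= \lfloor L^{2m} \log n \rfloor
   = O\bigl((\log^{\delta} n)^{2m} \log n\bigr)
   = O\bigl(\log^{2m\delta + 1} n\bigr), \\
n U \log n &= O\bigl(n \log^{2m\delta + 1} n \cdot \log n\bigr)
   = O\bigl(n \log^{2m\delta + 2} n\bigr).
\end{align*}
```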
We have proved the following theorem.
Theorem 5: Let $L = O(\log^{\delta} n)$ for some $\delta < 1$. Symmetric functions $f$ on $n$ variables, where $v_k(f) = 1$ only for some $k \le L^m$ and $m \ge 2$ is a constant, can be computed by unbounded fan-in circuits of depth $m + 2$ with $2^{O(\log^{\delta} n \, \log\log n)}$ gates and $O(n \log^{2m\delta+2} n)$ wires.
In order to compare our results with the results of the other papers we consider the special case of $T^n_k$ and $k = \lfloor \log^m n \rfloor$. Let $L := \log^{m/(m+1)} n$. Then $L^{m+1} = \log^m n$. We have proved the following theorem.
Theorem 6: The threshold functions $T^n_k$ where $k = \lfloor \log^m n \rfloor$ for constant $m$ can be computed by unbounded fan-in circuits of depth $m + 3$ with $2^{O(\log^{m/(m+1)} n \, \log\log n)}$ gates and $O(n \log^{2m+2} n)$ wires.
In order to compare our results with the other papers we consider the special case of threshold functions. We design circuits for $NT$-functions working on the negative variables $\bar{x}_i$ only and apply de Morgan's rules to obtain monotone circuits for threshold functions. We consider $NT^n_k$. W.l.o.g. (reductions by projection) we assume that $k = L^h$ and that $L$ is an integer.
We compute as before $c_u$ (for all $u \le U$) in depth 3. We also compute all $\bar{y}^u$ in depth 1. Now we are interested in the following functions. [...] The functions for $l = 1$ again have simple disjunctive normal forms. We describe how we compute the functions $p_l$ and $q_l$ by disjunctive forms (for odd $l$) and by conjunctive forms (for even $l$) using the functions $p_{l-1}$ and $q_{l-1}$. [...] If the last level is an $\lor$-level, we combine all outputs on the last level by a disjunction. If the last level is an $\land$-level, we combine all outputs on the last level by a conjunction. If $NT^n_k(x) = 1$, the invalid codings cause no problem, since they only underestimate the number of ones. If $NT^n_k(x) = 0$, invalid computations may feed ones into the output gate. Hence, we feed the disjunction of all $c_u$ into the output gate.
For $k = \lfloor \log^m n \rfloor$ ($m$ not necessarily an integer but a constant) we choose $L = \lfloor \log^{m/(\lfloor m \rfloor + 1)} n \rfloor$ and $h = \lfloor m \rfloor + 1$. (In order to be precise we have to use the appropriate reductions.) By the design above we have proved the following theorem.
Theorem 6: The threshold functions $T^n_k$ where $k = \lfloor \log^m n \rfloor$ for constant $m$ can be computed by monotone unbounded fan-in circuits of depth $\lfloor m \rfloor + 3$ with $2^{O(\log^{m/(\lfloor m \rfloor + 1)} n \, \log\log n)}$ gates and $O(n \log^{2m+2} n)$ wires.
The depth of our circuits differs, by Theorem 3, from the lower bound for polynomial-size circuits only by 2 if $m$ is an integer, and by 1 otherwise.
In order to minimize the depth we can do even better. The coded variables $\bar{y}^u$ are computed by $\land$-gates on the first level, and the functions $p_1$ and $q_1$ are computed by their disjunctive normal forms. Hence, the level of the computation of $\bar{y}^u$ and the first level of the computation of $p_1$ and $q_1$ can be merged. This increases the number of wires to $O(ng)$ where $g$ is the number of gates. If $m \ge 2$, the total depth is decreased to $\lfloor m \rfloor + 2$. For $1 \le m < 2$, we would like to get by with depth 3 and have to feed the disjunction of all $c_u$ into the output gate, an $\land$-gate, on depth 3. In this case we compute the $NT_1$-functions $c_u$ by their disjunctive normal forms in depth 2. Also in depth 2 we can compute the disjunction of all $c_u$. In this case we have increased the number of gates to $O(nU)$ and the number of wires to $O(n^2 U)$.
Theorem 7: The threshold functions $T^n_k$ where $k = \lfloor \log^m n \rfloor$ can be computed by monotone unbounded fan-in circuits of depth $\lfloor m \rfloor + 2$ with polynomial size. If $m \ge 2$, the number of gates is subpolynomial and the number of wires is larger only by a linear factor.
By Theorem 3 this depth is optimal if $m$ is not an integer. Theorem 7 also holds for $m < 1$, since in that case $T^n_k$ has disjunctive normal forms of polynomial size.
